Study Reveals 35% of New Websites Are AI-Generated Since 2022


A study by researchers from Stanford University, Imperial College London, and the Internet Archive found that 35% of all websites published since late 2022 are AI-generated or AI-assisted. The findings, published in the paper "The Impact of AI-Generated Text on the Internet", show how quickly the web has changed since the launch of ChatGPT and similar tools.

Key Findings: AI’s Growing Dominance on the Web

The research team analyzed data from the Internet Archive to examine websites created between August 2022 and May 2025. Their analysis uncovered several critical trends:

35% of new websites are now AI-generated or AI-assisted, up from 0% before ChatGPT’s release.
AI-generated content has led to a more cheerful and less verbose online landscape.
The study tested six major critiques of AI-generated text, including concerns about factual accuracy, source citation, and semantic diversity.

AI’s Impact on Online Discourse and Content

The researchers addressed pressing questions about AI’s role in shaping the internet:

Does AI shrink viewpoints and reduce semantic diversity?
Does it contribute to the spread of disinformation through hallucinations?
Does AI-generated writing feel more sanitized and less nuanced?
Does it fail to cite sources properly?
Does it produce low-semantic-density text?
Has it led to a monoculture where unique voices disappear in favor of generic, uniform styles?

Methodology: How Researchers Identified AI-Generated Websites

The team used a multi-step approach to analyze websites:

They partnered with the Internet Archive to retrieve archived snapshots of websites from August 2022 to May 2025 using the Wayback Machine’s CDX Server API.
The raw HTML of each snapshot was downloaded and stored for processing.
Researchers then used Pangram v3, an AI-detection tool, to identify AI-generated websites. Pangram v3 was selected because it had the highest detection rate among the tools tested.
Once AI-generated websites were identified, the team used them as samples to test their six hypotheses about AI’s impact on the web.
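
The retrieval step above can be sketched in Python. This is a minimal illustration and not the researchers' actual pipeline: the helper names (build_cdx_query, parse_cdx_rows, raw_snapshot_url), the default date range, the status filter, and the limit are all assumptions chosen for the example. It builds a query against the Wayback Machine's public CDX endpoint, turns the JSON response into records, and derives the URL from which a snapshot's raw HTML could be downloaded.

```python
from urllib.parse import urlencode

# Public CDX Server endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain, start="202208", end="202505", limit=50):
    """Build a CDX query URL listing captures of one domain.

    The defaults mirror the August 2022 to May 2025 window described in
    the article; the other parameter choices are assumptions.
    """
    params = {
        "url": domain,
        "from": start,               # timestamp prefix (YYYYMM)
        "to": end,
        "output": "json",            # response is a JSON array of arrays
        "filter": "statuscode:200",  # keep successful captures only
        "collapse": "digest",        # skip adjacent captures with identical content
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows):
    """Convert the JSON response (header row, then records) into dicts."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def raw_snapshot_url(capture):
    """Return the archived page URL for one capture.

    The id_ suffix after the timestamp asks for the original HTML,
    without the Wayback Machine toolbar or rewritten links.
    """
    return (
        "https://web.archive.org/web/"
        f"{capture['timestamp']}id_/{capture['original']}"
    )
```

Fetching the query URL with any HTTP client returns the rows that parse_cdx_rows expects, and each resulting capture can be passed to raw_snapshot_url to locate the HTML to store for later processing.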

Testing AI’s Influence on Factual Accuracy and Source Citation

The researchers evaluated factual accuracy and source citation as follows:

To assess factual accuracy, they extracted fact-based claims from AI-generated websites and hired human fact-checkers to verify them.
To determine whether AI cites sources properly, the team analyzed outbound links from AI-generated content.
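
The outbound-link analysis can be illustrated with a short Python sketch. The article does not describe how the team extracted or classified links, so the approach here is an assumption: collect every anchor's href from a page's HTML and keep only http(s) links whose host differs from the page's own. The names LinkCollector and outbound_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def outbound_links(html, page_url):
    """Return absolute http(s) links that point to a different host."""
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).hostname
    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        is_web_link = parsed.scheme in ("http", "https")
        if is_web_link and parsed.hostname and parsed.hostname != page_host:
            result.append(absolute)
    return result
```

Called with a page's HTML and its URL, outbound_links returns the external links in document order, so a simple count per page gives one rough measure of how often content points to outside sources. The host comparison is exact, so a subdomain such as www.example.com counts as external to example.com; a real analysis would likely normalize hosts first.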

"I find the sheer speed of the AI takeover of the web quite staggering. After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years. We're witnessing, in my opinion, a major transformation of the digital landscape in a fraction of the time it took to build in the first place."

— Jonáš Doležal, AI researcher at Stanford and co-author of the study, speaking to 404 Media

Concerns Over Semantic Diversity and Stylistic Uniformity

The study raises concerns that AI-generated content may be contributing to a decline in semantic and stylistic diversity. Critics argue that AI tools, trained on vast datasets, may inadvertently promote a generic and uniform writing style, eroding the unique voices that once defined the web.

The researchers also explored whether AI-generated text leads to a reduction in viewpoints and an increase in disinformation, as hallucinations—false or misleading content generated by AI—become more prevalent.

Implications for the Future of the Internet

The findings underscore how quickly and deeply AI has changed the digital landscape. With 35% of new websites now AI-generated or AI-assisted, the study highlights the need for further research into AI’s long-term effects on content quality, diversity, and reliability.

The research team’s work serves as a critical reminder of the challenges and opportunities presented by AI in shaping the future of online communication.

Source: 404 Media
