A groundbreaking study by researchers from Stanford University, Imperial College London, and the Internet Archive has revealed that 35% of all websites published since late 2022 are AI-generated or AI-assisted. The findings, published in the paper "The Impact of AI-Generated Text on the Internet", highlight the rapid transformation of the web since the launch of ChatGPT and similar tools.

Key Findings: AI’s Growing Dominance on the Web

The research team analyzed Internet Archive data covering websites created between August 2022 and May 2025. Their analysis produced several key findings:

  • 35% of new websites are now AI-generated or AI-assisted, up from 0% before ChatGPT’s release.
  • AI-generated content has led to a more cheerful and less verbose online landscape.
  • The study tested six major critiques of AI-generated text, including concerns about factual accuracy, source citation, and semantic diversity.

AI’s Impact on Online Discourse and Content

The researchers addressed pressing questions about AI’s role in shaping the internet:

  • Does AI narrow the range of viewpoints and reduce semantic diversity?
  • Does it contribute to the spread of disinformation through hallucinations?
  • Does AI-generated writing feel more sanitized and less nuanced?
  • Does it fail to cite sources properly?
  • Does it produce text with low semantic density?
  • Has it led to a monoculture where unique voices disappear in favor of generic, uniform styles?

Methodology: How Researchers Identified AI-Generated Websites

The team used a multi-step approach to analyze websites:

  1. They partnered with the Internet Archive to retrieve archived snapshots of websites from August 2022 to May 2025 using the Wayback Machine’s CDX Server API.
  2. The raw HTML of each snapshot was downloaded and stored for processing.
  3. The researchers then used Pangram v3, an AI-detection tool, to identify AI-generated websites. They chose Pangram v3 because it had the highest detection rate of the tools they tested.
  4. Once AI-generated websites were identified, the team used them as samples to test their six hypotheses about AI’s impact on the web.
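For readers curious what steps 1 and 2 involve, here is a minimal Python sketch of querying the Wayback Machine's CDX Server API and building the address of a snapshot's raw HTML. It is not the researchers' code. The date range follows the study period, but the status-code and MIME-type filters, the result limit, and the helper names (`build_cdx_query`, `parse_cdx_json`, `raw_snapshot_url`, `fetch_snapshots`) are illustrative assumptions. The `id_` suffix in the snapshot URL asks the Wayback Machine for the archived page without its own banner and link rewriting. The Pangram v3 detection step is left out because the article does not describe how the tool was called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str, start: str = "202208", end: str = "202505", limit: int = 100) -> str:
    """Build a CDX query for captures of `url` between two timestamp prefixes (yyyyMM...)."""
    params = {
        "url": url,
        "from": start,
        "to": end,
        "output": "json",                      # first row of the response holds the field names
        "fl": "timestamp,original,mimetype,statuscode",
        # Assumed filters: keep only successful HTML captures.
        "filter": ["statuscode:200", "mimetype:text/html"],
        "limit": str(limit),
    }
    # doseq=True repeats the "filter" key once per list entry.
    return CDX_ENDPOINT + "?" + urlencode(params, doseq=True)


def parse_cdx_json(text: str) -> list[dict[str, str]]:
    """Turn the CDX JSON rows (header row first) into one dict per capture."""
    if not text.strip():
        return []
    rows = json.loads(text)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def raw_snapshot_url(timestamp: str, original: str) -> str:
    """Address of the archived HTML as originally captured (the "id_" modifier)."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def fetch_snapshots(url: str) -> list[dict[str, str]]:
    """Hypothetical helper: run the CDX query over the network and parse the result."""
    with urlopen(build_cdx_query(url), timeout=30) as response:
        return parse_cdx_json(response.read().decode("utf-8"))
```

Each dict returned by `fetch_snapshots("example.com")` can then be passed to `raw_snapshot_url` to download and store that capture's HTML, which corresponds to step 2.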

Testing AI’s Influence on Factual Accuracy and Source Citation

The researchers used two main methods to evaluate these questions:

  • To assess factual accuracy, they extracted fact-based claims from AI-generated websites and hired human fact-checkers to verify them.
  • To determine whether AI cites sources properly, the team analyzed outbound links from AI-generated content.
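To show what analyzing outbound links can look like in practice, here is a minimal Python sketch that collects the links on a page pointing to other websites. The article does not say how the researchers defined a citation. This sketch assumes that any `http` or `https` link to a different host (ignoring a leading `www.`) counts as outbound. The function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _host(url: str) -> str:
    """Lowercased hostname with a leading "www." stripped (assumed normalization)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def outbound_links(html: str, page_url: str) -> list[str]:
    """Return absolute http(s) links that point to a host other than the page's own."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    page_host = _host(page_url)
    result = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)          # resolve relative links such as "/about"
        if urlparse(absolute).scheme not in ("http", "https"):
            continue                                # skip mailto:, javascript:, etc.
        host = _host(absolute)
        if host and host != page_host:
            result.append(absolute)
    return result
```

Counting `len(outbound_links(html, url))` per page gives a simple per-site citation signal. A stricter definition, such as excluding social-media or advertising domains, would need extra filtering.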

"I find the sheer speed of the AI takeover of the web quite staggering. After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years. We're witnessing, in my opinion, a major transformation of the digital landscape in a fraction of the time it took to build in the first place."

Jonáš Doležal, AI researcher at Stanford and co-author of the study, speaking to 404 Media

Concerns Over Semantic Diversity and Stylistic Uniformity

The study examined concerns that AI-generated content may be contributing to a decline in semantic and stylistic diversity. Critics argue that AI tools, trained on vast datasets, may inadvertently promote a generic and uniform writing style, eroding the unique voices that once defined the web.

The researchers also explored whether AI-generated text narrows the range of viewpoints and spreads disinformation through hallucinations, that is, false or misleading content produced by AI models.

Implications for the Future of the Internet

The findings underscore the rapid and profound impact of AI on the digital landscape. With 35% of new websites now AI-generated or AI-assisted, the study highlights the need for further research into AI's long-term effects on content quality, diversity, and reliability.

The research team’s work serves as a critical reminder of the challenges and opportunities presented by AI in shaping the future of online communication.

Source: 404 Media