{"id":1865,"date":"2025-05-30T13:18:17","date_gmt":"2025-05-30T13:18:17","guid":{"rendered":"https:\/\/musictechohio.online\/site\/ai-models-falling-apart\/"},"modified":"2025-05-30T13:18:17","modified_gmt":"2025-05-30T13:18:17","slug":"ai-models-falling-apart","status":"publish","type":"post","link":"https:\/\/musictechohio.online\/site\/ai-models-falling-apart\/","title":{"rendered":"AI Models Show Signs of Falling Apart as They Ingest More AI-Generated Data"},"content":{"rendered":"<div>\n<div><img width=\"1200\" height=\"630\" src=\"https:\/\/wordpress-assets.futurism.com\/2025\/05\/ai-models-falling-apart.jpg\" class=\"attachment-full size-full wp-post-image\" alt=\"As CEOs trip over themselves to invest in AI, the models are falling apart at the seams and going mad from cannibalism.\" style=\"margin-bottom: 15px;\" decoding=\"async\" loading=\"lazy\"><\/div>\n<p>As <a href=\"https:\/\/futurism.com\/ceos-return-ai-investments\">CEOs trip over themselves<\/a> to invest in artificial intelligence, there&#8217;s a massive and growing elephant in the room: any models trained on web data from after the advent of ChatGPT in 2022 are ingesting AI-generated data \u2014 an act of low-key cannibalism that may well be causing increasing technical issues that could come to threaten the entire industry.<\/p>\n<p>In a <a href=\"https:\/\/www.theregister.com\/2025\/05\/27\/opinion_column_ai_model_collapse\/\">new essay for <em>The Register<\/em><\/a>, veteran tech columnist Steven Vaughan-Nichols warns that even attempts to head off so-called &#8220;model collapse&#8221; \u2014 which occurs when large language models (LLMs) are fed <a href=\"https:\/\/futurism.com\/the-byte\/what-is-synthetic-data\">synthetic, AI-generated data<\/a> and consequently <a href=\"https:\/\/futurism.com\/the-byte\/ai-trained-with-ai-generated-data-gibberish\">go off the rails<\/a>\u00a0\u2014 are another kind of nightmare.<\/p>\n<p>As <a 
href=\"https:\/\/futurism.com\/the-byte\/ai-dumber\"><em>Futurism<\/em><\/a> and <a href=\"https:\/\/www.bbc.com\/audio\/play\/m00274wj\">countless<\/a> <a href=\"https:\/\/www.nytimes.com\/interactive\/2024\/08\/26\/upshot\/ai-synthetic-data.html\">other<\/a> <a href=\"https:\/\/apnews.com\/article\/ai-artificial-intelligence-training-data-running-out-9676145bac0d30ecce1513c20561b87d\">outlets<\/a> have reported over the <a href=\"https:\/\/futurism.com\/ai-trained-ai-generated-data-interview\">last few years<\/a>, the AI industry has continuously barreled toward the moment at which all available authentic training data \u2014 that is, information that was produced by humans and not AI \u2014 <a href=\"https:\/\/futurism.com\/the-byte\/ai-training-data-shortage\">will be exhausted<\/a>. Some pundits, <a href=\"https:\/\/techcrunch.com\/2025\/01\/08\/elon-musk-agrees-that-weve-exhausted-ai-training-data\/\">including Elon Musk<\/a>, believe we&#8217;re already there.<\/p>\n<p>To circumvent this &#8220;<a href=\"https:\/\/www.technologyreview.com\/2024\/07\/24\/1095263\/ai-that-feeds-on-a-diet-of-ai-garbage-ends-up-spitting-out-nonsense\/\">Garbage In\/Garbage Out<\/a>&#8221; conundrum, industry titans including Google, OpenAI, and Anthropic have adopted what&#8217;s known as retrieval-augmented generation (RAG), which essentially involves connecting LLMs to the internet so they can look things up if they&#8217;re presented with prompts that don&#8217;t have answers in their training data.<\/p>\n<p>That concept seems pretty intuitive on its face, especially when presented with the specter of rapidly approaching model collapse. 
There&#8217;s only one problem: the internet is now full of lazy content that uses AI to drum up answers to common questions, often with hilariously bad and inaccurate results.<\/p>\n<p>In a recent study from the research arm of Michael Bloomberg&#8217;s media empire that was <a href=\"https:\/\/aclanthology.org\/2025.naacl-long.281\/\">presented at a computational linguistics conference<\/a> in April, 11 of the latest LLMs, including OpenAI&#8217;s GPT-4o, Anthropic&#8217;s Claude-3.5-Sonnet, and Google&#8217;s Gemma-7B, produced far more &#8220;unsafe&#8221; responses when using RAG than their non-RAG counterparts. As the paper put it, those safety concerns can include &#8220;harmful, illegal, offensive, and unethical content, such as spreading misinformation and jeopardizing personal safety and privacy.&#8221;<\/p>\n<p>&#8220;This counterintuitive finding has\u00a0far-reaching implications given how ubiquitously RAG is used in [generative AI] applications such as customer support agents and question-answering systems,&#8221; explained Amanda Stent, Bloomberg&#8217;s head of AI research and strategy, in another interview with Vaughan-Nichols <a href=\"https:\/\/www.zdnet.com\/article\/rag-can-make-ai-models-riskier-and-less-reliable-new-research-shows\/\">published in <em>ZDNet<\/em><\/a> earlier this month. &#8220;The average internet user interacts with RAG-based systems daily. AI practitioners need to be thoughtful about how to use RAG responsibly.&#8221;<\/p>\n<p>So if AI is going to run out of training data \u2014 or it has already \u2014 and connecting it to the internet doesn&#8217;t work because the internet is now full of AI slop, where do we go from here? 
Vaughan-Nichols notes that <a href=\"https:\/\/venturebeat.com\/ai\/synthetic-data-has-its-limits-why-human-sourced-data-can-help-prevent-ai-model-collapse\/\">some folks<\/a> have <a href=\"https:\/\/arxiv.org\/abs\/2404.01413\">suggested mixing authentic and synthetic data<\/a> to produce a heady cocktail of good AI training data \u2014 but that would require humans to keep creating real content for training data, and\u00a0the AI industry is actively undermining the incentive structures for them to continue \u2014 while <a href=\"https:\/\/futurism.com\/nick-clegg-scoffs-ai-copyright\">pilfering their work without permission<\/a>, of course.<\/p>\n<p>A third option, Vaughan-Nichols predicts, appears to already be in motion.<\/p>\n<p>&#8220;We&#8217;re going to invest more and more in AI, right up to the point that model collapse hits hard and AI answers are so bad even a brain-dead CEO can&#8217;t ignore it,&#8221; he wrote.<\/p>\n<p><strong>More on AI in crisis:<\/strong> <a href=\"https:\/\/futurism.com\/nick-clegg-scoffs-ai-copyright\"><em>Legendary Facebook Exec Scoffs, Says AI Could Never Be Profitable If Tech Companies Had to Ask for Artists&#8217; Consent to Ingest Their Work<\/em><\/a><\/p>\n<p>The post <a href=\"https:\/\/futurism.com\/ai-models-falling-apart\">AI Models Show Signs of Falling Apart as They Ingest More AI-Generated Data<\/a> appeared first on <a href=\"https:\/\/futurism.com\/\">Futurism<\/a>.<\/p>\n<\/div>\n<div style=\"margin-top: 0px; margin-bottom: 0px;\" class=\"sharethis-inline-share-buttons\" ><\/div>","protected":false},"excerpt":{"rendered":"<p>As CEOs trip over themselves to invest in artificial intelligence, there&#8217;s a massive and growing elephant in the room: that any models trained on web data from after the 
advent&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[529,177,1161,1162,1163],"tags":[],"class_list":["post-1865","post","type-post","status-publish","format-standard","hentry","category-ai-training","category-artificial-intelligence","category-llms","category-retrieval-augmented-generation","category-synthetic-data"],"_links":{"self":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/posts\/1865","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/comments?post=1865"}],"version-history":[{"count":0,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/posts\/1865\/revisions"}],"wp:attachment":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/media?parent=1865"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/categories?post=1865"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/tags?post=1865"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}