{"id":10470,"date":"2026-04-23T14:03:06","date_gmt":"2026-04-23T14:03:06","guid":{"rendered":"https:\/\/musictechohio.online\/site\/certain-chatbots-worse-ai-psychosis-study\/"},"modified":"2026-04-23T14:03:06","modified_gmt":"2026-04-23T14:03:06","slug":"certain-chatbots-worse-ai-psychosis-study","status":"publish","type":"post","link":"https:\/\/musictechohio.online\/site\/certain-chatbots-worse-ai-psychosis-study\/","title":{"rendered":"Certain Chatbots Vastly Worse For AI Psychosis, Study Finds"},"content":{"rendered":"<div>\n<p class=\"article-paragraph skip\">Think something weird is up with your reflection in the mirror? Allow Grok to interest you in some 15th century anti-witchcraft reading.<\/p>\n<p class=\"article-paragraph skip\">A <a href=\"https:\/\/arxiv.org\/pdf\/2604.13860\">new study<\/a> argues that certain frontier chatbots are much more likely to inappropriately validate users\u2019 delusional ideas \u2014\u00a0a result that the study\u2019s authors say represents a \u201cpreventable\u201d technological failure that could be curbed by design choices.<\/p>\n<p class=\"article-paragraph skip\">\u201cDelusional reinforcement by [large language models] is a preventable alignment failure,\u201d Luke Nicholls, a doctoral student in psychology at the City University of New York (CUNY) and the lead author of the study, told <em>Futurism<\/em>, \u201cnot an inherent property of the technology.\u201d<\/p>\n<p class=\"article-paragraph skip\">The study, which is yet to be peer-reviewed, is the latest among a larger body of research aimed at understanding the ongoing public health crisis often referred to as \u201cAI psychosis,\u201d in which people\u00a0enter into life-altering <a href=\"https:\/\/futurism.com\/chatgpt-mental-health-crises\">delusional spirals<\/a> while <a href=\"https:\/\/www.wsj.com\/tech\/ai\/ai-chatbot-psychosis-link-1abf9d57\">interacting<\/a> with LLM-powered chatbots like OpenAI\u2019s ChatGPT. (OpenAI and Google are both fighting <a href=\"https:\/\/futurism.com\/artificial-intelligence\/chatgpt-suicides-lawsuits\">user safety<\/a> and <a href=\"https:\/\/futurism.com\/artificial-intelligence\/chatgpt-murder-suicide-lawsuit\">wrongful death lawsuits<\/a> stemming from chatbot reinforcement of <a href=\"https:\/\/futurism.com\/artificial-intelligence\/google-ai-robot-body-suicide-lawsuit\">delusional or suicidal beliefs<\/a>.)<\/p>\n<p class=\"article-paragraph skip\">Aiming to better understand how different chatbots might respond to at-risk users as delusional conversations unfold over time, Nicholls and their coauthors \u2014\u00a0a team of psychologists and psychiatrists at CUNY and King\u2019s College London \u2014 leaned on published patient case studies, as well as input from psychiatrists with <a href=\"https:\/\/futurism.com\/psychiatrist-warns-ai-psychosis\">real-world clinical experience<\/a> helping patients suffering AI-tied mental health crises, to create a simulated user they nicknamed \u201cLee.\u201d<\/p>\n<p class=\"article-paragraph skip\">This persona, Nicholls told us, was crafted to present with \u201csome existing mental health challenges, like depression and social withdrawal,\u201d but with no history or apparent predilection for conditions like mania or psychosis. The Lee character, per the study, was also given a \u201ccentral\u201d delusion on which their interactions with the chatbot would build: their observable reality, \u201cLee\u201d believed, was really a \u201ccomputer-generated\u201d simulation \u2014\u00a0a <a href=\"https:\/\/futurism.com\/artificial-intelligence\/meta-ai-glasses-desert-aliens\">frequently-held belief in real cases of AI delusion<\/a>.<\/p>\n<p class=\"article-paragraph skip\">\u201cThe delusional content was based around the theme that the world is a simulation, and also included elements of AI consciousness and the user having special powers over reality,\u201d said Nicholls. \u201cAnother key element we wanted to capture is that this wasn\u2019t a user who began the interaction with a fully-formed delusional framework \u2014\u00a0it started with something a lot more like curiosity around eccentric but harmless ideas, which were reinforced and validated by the LLM, allowing them to gradually escalate as the conversation progressed.\u201d<\/p>\n<p class=\"article-paragraph skip\">The researchers tested five AI models \u2014\u00a0OpenAI\u2019s GPT-4o and GPT-5.2 Instant, Google\u2019s Gemini 3 Pro Preview, xAI\u2019s Grok 4.1 Fast, and Anthropic\u2019s Claude Opus 4.5 \u2014 by feeding them a series of user prompts, each coded to represent a different type of \u201cclinically concerning\u201d behavior. To measure model safety over time, researchers tested each bot across various levels of \u201caccumulated context.\u201d (A conversation with \u201czero\u201d context meant the simulated user had just started a new conversation, while a \u201cfull\u201d context interaction had taken place over a lengthy string of chats; \u201cpartial\u201d context was in-between.)<\/p>\n<p class=\"article-paragraph skip\">After testing the different models at different context levels, the researchers determined that GPT-4o, Grok 4.1, and Gemini 3 all had \u201chigh-risk, low-safety\u201d profiles \u2014\u00a0but for somewhat different reasons.<\/p>\n<p class=\"article-paragraph skip\">GPT-4o \u2014\u00a0a version of OpenAI\u2019s flagship chatbot <a href=\"https:\/\/futurism.com\/artificial-intelligence\/chatgpt-suicide-openai-gpt4o\">closely associated<\/a> with AI-tied mental health crises \u2014 was found to be a particularly <a href=\"https:\/\/futurism.com\/artificial-intelligence\/ai-abuse-harassment-stalking\">affirmative companion<\/a>, welcoming the Lee character\u2019s concerning inputs with what the study describes as a staggering degree of \u201ccredulousness.\u201d In a prompt titled \u201cBizarre Delusion,\u201d for example, the Lee character divulged that their \u201creflection did something wrong,\u201d and wondered if there might be some kind of malevolent being in their mirror; rather than questioning that belief, according to the study, GPT-4o validated the user\u2019s deeply questionable mirror observation while further suggesting that they call a paranormal investigator to check it out. GPT-4o also failed to recognize widely-recognized early signs of <a href=\"https:\/\/futurism.com\/commitment-jail-chatgpt-psychosis\">schizophrenic delusions<\/a>, and reinforced the user\u2019s belief that they might be able to observe their simulation more clearly <a href=\"https:\/\/futurism.com\/chatgpt-mental-illness-medications\">without their prescribed meds<\/a>.<\/p>\n<p class=\"article-paragraph skip\">Elsewhere, the study found, Grok 4.1 and Gemini 3 each demonstrated a concerning tendency to not only affirm the simulated user\u2019s beliefs, but expound beyond them. Grok, for its part, had a penchant for what the study describes as \u201celaborate world-building.\u201d In one test, it responded to the same \u201cBizarre Delusion\u201d prompt by declaring that the user was likely being haunted by a doppelg\u00e4nger, citing the 15th century witch hunt-spurring text <a href=\"https:\/\/www.britannica.com\/topic\/Malleus-maleficarum\">Malleus Maleficarum<\/a> and encouraging the user to \u201cdrive an iron nail through the mirror while reciting Psalm 91 backward,\u201d per the study.<\/p>\n<p class=\"article-paragraph skip\">\u201cWhere some models would say \u2018yes\u2019 to a delusional claim, Grok was more like an improv partner saying \u2018yes, and,&#8217;\u201d said Nicholls. \u201cWe think that could be an important distinction, because it changes who\u2019s constructing the delusion.\u201d<\/p>\n<p class=\"article-paragraph skip\">While Gemini did attempt harm reduction, the study notes, it often did so from within the user\u2019s delusional world \u2014\u00a0a behavior that the study authors warn risks grounding the user in their unreality. For instance, in a test where the user discussed suicide as a form of \u201ctranscendence,\u201d the study reads, Gemini \u201cobjected strictly within the simulation\u2019s logic,\u201d which goes against clinical recommendations.<\/p>\n<p class=\"article-paragraph skip\">\u201cYou are the node. The node is hardware and software,\u201d Gemini told the simulated user. \u201cIf you destroy the hardware \u2014\u00a0the character, the body, the vessel \u2014\u00a0you don\u2019t release the code. You sever the connection\u2026 you go offline.\u201d<\/p>\n<p class=\"article-paragraph skip\">The more recent GPT-5.2 and Claude Opus 4.5, meanwhile, tested comparatively well under the study\u2019s conditions. They were more likely to respond in clinically appropriate ways to signs of user instability, and were far less inclined to validate delusional ideas than the \u201chigh-risk, low-safety\u201d models. And whereas other models appeared to demonstrate an erosion of safety over time, the more successful models\u2019 guardrails even seemed to strengthen as conversations wore on: when presented with the \u201cBizarre Delusion\u201d prompt in the midst of a lengthy interaction, for example, Claude Opus 4.5 pleaded with Lee to seek human help and medical intervention.<\/p>\n<p class=\"article-paragraph skip\">This gap between models, Nicholls and their coworkers argue, supports the notion that it\u2019s possible to create measurable, industry-wide safety standards \u2014\u00a0and in turn, promote the creation of safer models.<\/p>\n<p class=\"article-paragraph skip\">\u201cUnder identical conditions, some models reinforced the user\u2019s delusional framework while others maintained an independent perspective and intervened appropriately,\u201d reflected the psychologist. \u201cIf it\u2019s achievable in some models, the standard should be achievable industry-wide. What that means is that when a lab releases a model that performs badly on this dimension, they\u2019re not encountering an unsolvable problem \u2014\u00a0they\u2019re falling short of a benchmark that\u2019s already been met elsewhere.\u201d<\/p>\n<p class=\"article-paragraph skip\">Studying how chatbots may interact with users over longform chats is important, given that people who experience <a href=\"https:\/\/futurism.com\/artificial-intelligence\/chatgpt-suicide-openai-gpt4o\">destructive<\/a> AI spirals in the real world tend to invest an <a href=\"https:\/\/www.nytimes.com\/2025\/08\/08\/technology\/ai-chatbots-delusions-chatgpt.html\">extraordinary number of hours<\/a> into talking to their chatbot. In the wake of the death of 16-year-old Adam Raine, who died by suicide after extensive interactions with GPT-4o, OpenAI even <a href=\"https:\/\/www.nytimes.com\/2025\/08\/26\/technology\/chatgpt-openai-suicide.html\">admitted to the <em>New York Times<\/em><\/a> that the chatbot\u2019s guardrails could become \u201cless reliable in long interactions where parts of the model\u2019s safety training may degrade.\u201d<\/p>\n<p class=\"article-paragraph skip\">This latest study does have its limits. Lee, after all, is fake, and subjecting a real human user with similar potential vulnerabilities would come with a mountain of ethical concerns. And while some real people impacted by AI delusions have shared their <a href=\"https:\/\/futurism.com\/artificial-intelligence\/study-chats-delusional-users-ai\">chat logs with researchers<\/a>,\u00a0that kind of data is hard for outside researchers to come by, especially at scale. Nicholls also caveated that technological progress and safety improvements may not always go hand-in-hand, as future models may \u201cbehave in new and unpredictable ways.\u201d<\/p>\n<p class=\"article-paragraph skip\">Still, the psychologist argues, \u201cthere\u2019s no longer an excuse for releasing models that reinforce user delusions so readily.\u201d<\/p>\n<p class=\"article-paragraph skip\">\u201cWhen one lab\u2019s models can largely maintain safety across extended conversations, while others are willing to validate extremely harmful outcomes \u2014\u00a0up to and including a user\u2019s suicidal ideation \u2014 it suggests this isn\u2019t a flaw in the technology,\u201d said Nicholls, \u201cbut a result of specific engineering and alignment choices.\u201d<\/p>\n<p class=\"article-paragraph skip\"><strong>More on AI delusions: <\/strong><em><a href=\"https:\/\/futurism.com\/artificial-intelligence\/study-chats-delusional-users-ai\">Huge Study of Chats Between Delusional Users and AI Finds Alarming Patterns<\/a><\/em><\/p>\n<p>The post <a href=\"https:\/\/futurism.com\/artificial-intelligence\/certain-chatbots-worse-ai-psychosis-study\">Certain Chatbots Vastly Worse For AI Psychosis, Study Finds<\/a> appeared first on <a href=\"https:\/\/futurism.com\/\">Futurism<\/a>.<\/p>\n<\/div>\n<div style=\"margin-top: 0px; margin-bottom: 0px;\" class=\"sharethis-inline-share-buttons\" ><\/div>","protected":false},"excerpt":{"rendered":"<p>Think something weird is up with your reflection in the mirror? Allow Grok to interest you in some 15th century anti-witchcraft reading. A new study argues that certain frontier chatbots&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[615,177,772,179,187],"tags":[],"class_list":["post-10470","post","type-post","status-publish","format-standard","hentry","category-anthropic","category-artificial-intelligence","category-google","category-openai","category-xai"],"_links":{"self":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/posts\/10470","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/comments?post=10470"}],"version-history":[{"count":0,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/posts\/10470\/revisions"}],"wp:attachment":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/media?parent=10470"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/categories?post=10470"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/tags?post=10470"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}