{"id":4370,"date":"2025-08-08T20:20:37","date_gmt":"2025-08-08T20:20:37","guid":{"rendered":"https:\/\/musictechohio.online\/site\/gpt-5-demo-dumb-errors\/"},"modified":"2025-08-08T20:20:37","modified_gmt":"2025-08-08T20:20:37","slug":"gpt-5-demo-dumb-errors","status":"publish","type":"post","link":"https:\/\/musictechohio.online\/site\/gpt-5-demo-dumb-errors\/","title":{"rendered":"GPT-5 Launch Demo Plagued With Catastrophically Dumb Errors"},"content":{"rendered":"<div>\n<div><img loading=\"lazy\" width=\"2400\" height=\"1260\" src=\"https:\/\/wordpress-assets.futurism.com\/2025\/08\/gpt-5-demo-dumb-errors.jpg\" class=\"attachment-full size-full wp-post-image\" alt=\"OpenAI's attempt to show off its latest GPT-5 model's awesome performance stats produced wildly embarrassing gaffes.\" style=\"margin-bottom: 15px;\" decoding=\"async\"><\/div>\n<p><span style=\"font-weight: 400;\">OpenAI&#8217;s GPT-5 is finally here and already powering ChatGPT, but it <a href=\"https:\/\/futurism.com\/gpt-5-sucks\">hasn&#8217;t made a great first impression<\/a>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In a livestream dedicated to the release, OpenAI tried to show off its newest large language model, which CEO Sam Altman called a &#8220;significant step along the path to AGI,&#8221; but instead turned heads with some catastrophically dumb errors.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Across several examples, bar graphs meant to show off GPT-5&#8217;s awesome benchmark performance looked professional at first glance but, on closer inspection, turned out to be horribly inaccurate nonsense.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The gaffes were <a href=\"https:\/\/x.com\/shreyk0\/status\/1953509438255464603\">flagged on social media<\/a> and <\/span><a href=\"https:\/\/www.theverge.com\/news\/756444\/openai-gpt-5-vibe-graphing-chart-crime\"><span style=\"font-weight: 400;\">highlighted by <\/span><i><span style=\"font-weight: 400;\">The 
Verge<\/span><\/i><\/a><span style=\"font-weight: 400;\">. The <\/span><a href=\"https:\/\/x.com\/EgeErdil2\/status\/1953505551570415718\"><span style=\"font-weight: 400;\">most egregious example<\/span><\/a><span style=\"font-weight: 400;\"> is a bar graph of coding benchmark scores <\/span><span style=\"font-weight: 400;\">for GPT-5 and older models. Somehow, the bar for GPT-5&#8217;s score of 52.8 percent accuracy is nearly twice as tall as the bar for the o3 model&#8217;s score of 69.1 percent. Even more bafflingly, that 69.1 percent bar is exactly the same height as the bar representing GPT-4o&#8217;s score of 30.8 percent. Make it make sense!<\/span><\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">this screenshot from GPT-5 livestream has to be among the worst chart crimes of the century <a href=\"https:\/\/t.co\/HXsK2CWCon\">pic.twitter.com\/HXsK2CWCon<\/a><\/p>\n<p>\u2014 Ege Erdil (@EgeErdil2) <a href=\"https:\/\/twitter.com\/EgeErdil2\/status\/1953505551570415718?ref_src=twsrc%5Etfw\">August 7, 2025<\/a><\/p>\n<\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p><span style=\"font-weight: 400;\">OpenAI hasn&#8217;t confirmed whether it used GPT-5 to generate the graphs (at this point, it has every reason not to), but it&#8217;s an incredibly embarrassing mistake from a company that&#8217;s <a href=\"https:\/\/www.ft.com\/content\/ab1ef47e-3c5b-49e0-afea-7c8be9d351e4\">being valued<\/a> in the region of half a trillion smackeroos.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It&#8217;s also a little poetic. 
Some research suggests that newer models could actually be <\/span><a href=\"https:\/\/futurism.com\/ai-industry-problem-smarter-hallucinating\"><span style=\"font-weight: 400;\">getting dumber<\/span><\/a> in key ways,<span style=\"font-weight: 400;\"> hallucinating more frequently <\/span><span style=\"font-weight: 400;\">than earlier versions. One <\/span><a href=\"https:\/\/venturebeat.com\/ai\/anthropic-researchers-discover-the-weird-ai-problem-why-thinking-longer-makes-models-dumber\/\"><span style=\"font-weight: 400;\">study<\/span><\/a><span style=\"font-weight: 400;\"> even found that the longer these new reasoning models &#8220;think,&#8221; the more their performance deteriorates. Other <\/span><span style=\"font-weight: 400;\">research implicates the AI slop that&#8217;s <\/span><a href=\"https:\/\/futurism.com\/ai-models-falling-apart\"><span style=\"font-weight: 400;\">increasingly poisoning AI models&#8217; training data<\/span><\/a><span style=\"font-weight: 400;\">. And circling back to GPT-5&#8217;s bar graph, there&#8217;s OpenAI trying to spin its lower score of 52.8 percent as somehow better than its predecessor&#8217;s 69.1.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Altman, playing it cool, tried to laugh off the blunder.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8220;[W]ow a mega chart screwup from us earlier,&#8221; he <\/span><a href=\"https:\/\/x.com\/sama\/status\/1953513280594751495\"><span style=\"font-weight: 400;\">tweeted<\/span><\/a>, in his typical lowercase patois<span style=\"font-weight: 400;\">. 
&#8220;wen GPT-6?!&#8221;<\/span><\/p>\n<p>OpenAI corrected the charts in its <a href=\"https:\/\/openai.com\/index\/introducing-gpt-5\/\">blog post<\/a>, but the originals are still in the livestream.<\/p>\n<p><span style=\"font-weight: 400;\">Human error may or may not be to blame for the charts, but following GPT-5&#8217;s release, users were quick to expose how error-prone <\/span><span style=\"font-weight: 400;\">its image- and diagram-generating capabilities remain. One user asked ChatGPT to draw a map of two cities in Virginia with their neighborhoods labeled, and it returned names that were <\/span><a href=\"https:\/\/bsky.app\/profile\/pulpandpolitics.bsky.social\/post\/3lvvgdaxz522m\"><span style=\"font-weight: 400;\">complete gobbledygook<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And in what should&#8217;ve been a layup for GPT-5, Ed Zitron of the &#8220;<\/span><a href=\"https:\/\/www.wheresyoured.at\/\"><span style=\"font-weight: 400;\">Where&#8217;s Your Ed At?<\/span><\/a><span style=\"font-weight: 400;\">&#8221; newsletter <\/span><a href=\"https:\/\/bsky.app\/profile\/edzitron.com\/post\/3lvua4fgc722k\"><span style=\"font-weight: 400;\">found<\/span><\/a><span style=\"font-weight: 400;\"> that the AI couldn&#8217;t even nail a simple map of the US. Ever think of visiting &#8220;West Wigina,&#8221; &#8220;Delsware,&#8221; &#8220;Fiorata,&#8221; or &#8220;Rhoder land&#8221;? 
Or maybe &#8220;Tonnessee&#8221; and &#8220;Mississipo&#8221;?<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The irony is that back in March, OpenAI <a href=\"https:\/\/futurism.com\/openai-new-image-generator-perfect-text\">bragged<\/a> that an update to its previous GPT-4o model meant ChatGPT could now excel at rendering text in images.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8220;As you can tell now it&#8217;s very good at text,&#8221; one of the <\/span><a href=\"https:\/\/x.com\/OpenAI\/status\/1904602845221187829\"><span style=\"font-weight: 400;\">example generated images<\/span><\/a><span style=\"font-weight: 400;\"> read. &#8220;Look at all this accurate text!&#8221;<\/span><\/p>\n<p>Sounds like they might&#8217;ve spoken too soon. Or maybe AI models really are going backwards.<\/p>\n<p><strong>More on OpenAI: <\/strong><em><a href=\"https:\/\/futurism.com\/gpt-5-sucks\">GPT-5 Users Say It Seriously Sucks<\/a><\/em><\/p>\n<p>The post <a href=\"https:\/\/futurism.com\/gpt-5-demo-dumb-errors\">GPT-5 Launch Demo Plagued With Catastrophically Dumb Errors<\/a> appeared first on <a href=\"https:\/\/futurism.com\/\">Futurism<\/a>.<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>OpenAI&#8217;s GPT-5 is finally here and already powering ChatGPT, but it hasn&#8217;t made a great first impression. 
In a livestream dedicated to the release, OpenAI tried to show off its&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[177,196,2935,179],"tags":[],"class_list":["post-4370","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","category-chatgpt","category-gpt-5","category-openai"],"_links":{"self":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/posts\/4370","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/comments?post=4370"}],"version-history":[{"count":0,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/posts\/4370\/revisions"}],"wp:attachment":[{"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/media?parent=4370"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/categories?post=4370"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/musictechohio.online\/site\/wp-json\/wp\/v2\/tags?post=4370"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}