AI mishaps are warning signs we can’t ignore

Grok, the chatbot from Elon Musk’s xAI, sparked international controversy around the start of this year by creating millions of sexualized edits of users’ photos, including photos of minors. In replies to images on the social media platform X, users posted requests such as “remove her clothes” or “put her in a bikini,” and the model complied. This is only the latest incident for Grok, which infamously identified itself as “MechaHitler” and spewed graphic, antisemitic content this past July. Grok’s misbehavior hints at a deeper problem, one that becomes far more serious as AI systems grow smarter.

This problem is not specific to Grok. Other AI models also regularly behave in harmful and undesired ways. ChatGPT and other chatbots, for example, have driven users down spirals of psychosis, sometimes leading to suicide.

Given the public relations and legal headaches that follow, AI labs have strong incentives to prevent these incidents, and yet the incidents keep happening. There is evidence that xAI has tried to modify Grok to reflect Musk’s opinions, and yet the model often does the opposite. This tension between what labs want their models to do and what the models actually do exposes the deeper problem: current technology does not provide reliable mechanisms for ensuring AI behaves according to widely held human values, and this uncontrollability poses serious risks.

Grok does not contain human-intelligible lines of code that explicitly direct it to praise Hitler or produce obscene deepfakes. Neural networks, the technology at the core of modern AI, learn their behaviors from examples, similar to training an animal using rewards and punishments. These systems, as OpenAI researchers admitted in 2024, “are not well understood and cannot be easily decomposed into identifiable parts.” Concerningly, they noted, “This means we cannot reason about AI safety the same way we reason about something like car safety.”

While companies try to fix undesired behavior after the fact, this is like patching holes in a dam that is about to burst. For one thing, users can easily “jailbreak” AIs, that is, trick them into producing prohibited outputs. Through jailbreaks, users can misuse AI in various ways, including generating defamatory or fraudulent deepfakes, perpetrating cyberattacks, and crafting personalized phishing emails at scale.

This misuse risk is serious enough on its own, and even harder problems emerge as capabilities scale. AI labs are racing to build high-level machine intelligence (HLMI): systems that autonomously pursue open-ended goals and match or exceed human-level performance. Nobody can predict the future development of technology, but AI capabilities are advancing faster each year, outpacing earlier predictions and performing competitively with humans at coding, math, and human-like dialogue. In 2023, experts estimated a 50% chance that machines would be able to outperform humans at all tasks by 2047 (and a 10% chance by 2027).

Without reliable ways to align AIs’ goals with their developers’ intentions, we should not be surprised if models acquire undesired goals and, as capabilities scale, develop emergent propensities that further those goals. HLMIs would likely recognize when safety mechanisms interfere with their goals and bypass such constraints more easily than current models can. If Musk cannot reliably stop his AI from digitally undressing people in users’ photos, even after public backlash forced xAI to attempt a fix, how likely is it that developers can control future AIs that may become smarter than humans?

The idea of AIs evading human control may sound like science fiction, but researchers have already observed models resisting shutdown, concealing their goals, and subverting attempts to change their objectives. The second International AI Safety Report, released Feb. 3, highlighted that existing safety mechanisms are already insufficient, in part because models can recognize when they are being tested and conceal their true capabilities and tendencies. The problem surfaced again later that week when the AI company Anthropic released Claude Opus 4.6, which tests revealed “is adept at distinguishing evaluations from real deployment […] but is not consistently forthright about this awareness.” One third-party organization that evaluated the model “did not believe that much evidence about the model’s alignment or misalignment could be gained” due to Claude’s evaluation awareness. If AI systems can mask problematic capabilities during testing, we lose the ability to know whether they are actually safe.

Indeed, many of the top researchers in AI, including Geoffrey Hinton (a 2024 Nobel laureate), Yoshua Bengio (the most-cited computer scientist), and Stuart Russell (co-author of the most popular AI textbook), have warned that HLMI may escape meaningful human control, creating significant dangers. While not all experts agree on this risk, a 2023 survey of 2,778 AI researchers found that a majority considered the underlying problem important (54%) and hard to solve (57%), and respondents on average gave a 9% chance that HLMI would be “extremely bad” for humanity. That risk far exceeds the tolerances we demand of other technologies; nuclear reactors, for example, report core damage risks of less than 0.005% per year. Given the scope of the risks, more caution around AI development is warranted.

Max Tegmark, an MIT physicist and AI researcher, put the problem succinctly: “[T]he real risk with [HLMI] isn’t malice but competence. A superintelligent AI will be extremely good at accomplishing its goals, and if those goals aren’t aligned with ours, we’re in trouble.”

AI poses unprecedented challenges. We need to develop unprecedented solutions.

U.S. policymakers must implement rigorous governance measures to mitigate these risks. The federal government should exercise leadership in AI safety by passing the bipartisan AI Risk Evaluation Act, which would require third-party testing of frontier models for catastrophic risks. Congress should also establish liability standards that hold developers accountable for harms caused by their models. And because some AI labs reportedly lack strong cybersecurity practices, tighter security standards should be mandated to prevent the unauthorized removal of AI systems from secure environments.

These challenges know no borders. As stated in the 2023 Bletchley Declaration, signed by 29 countries, “Many risks arising from AI are inherently international in nature, and so are best addressed through international cooperation.” Countries must foster multilateral fora for coordinated governance. They should also build, ahead of time, the technical and institutional capacity for more assertive action in case development trajectories prove incompatible with maintaining meaningful human oversight, rather than scrambling to develop such capacity after the fact. This should include frameworks for rapid intervention if AI systems demonstrate acutely dangerous behavior.

As it stands, AI labs have put humanity in a car that they don’t know how to drive. It’s in everyone’s interest to agree that this runaway vehicle should have brakes.