The Co-opting of Safety

We dig into how the concept of AI "safety" has been co-opted and weaponized by tech companies. Starting with examples like Grok's MechaHitler episode, we explore how real safety engineering differs from AI "alignment," the myth of the alignment tax, and why this semantic confusion matters for actual safety.


Links
  • JMLR article - Underspecification Presents Challenges for Credibility in Modern Machine Learning
  • Trail of Bits paper - Towards Comprehensive Risk Assessments and Assurance of AI-Based Systems
  • SSRN paper - Uniqueness Bias: Why It Matters, How to Curb It
Additional Referenced Papers
  • NeurIPS paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
  • ICML paper - AI Control: Improving Safety Despite Intentional Subversion
  • ICML paper - DarkBench: Benchmarking Dark Patterns in Large Language Models
  • OSF preprint - Current Real-World Use of Large Language Models for Mental Health
  • Anthropic preprint - Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Inciting Examples
  • Ars Technica article - US government agency drops Grok after MechaHitler backlash, report says
  • The Guardian article - Musk’s AI Grok bot rants about ‘white genocide’ in South Africa in unrelated chats
  • BBC article - Update that made ChatGPT 'dangerously' sycophantic pulled
Other Sources
  • London Daily article - UK AI Safety Institute Rebrands as AI Security Institute to Focus on Crime and National Security
  • Vice article - Prominent AI Philosopher and ‘Father’ of Longtermism Sent Very Racist Email to a 90s Philosophy Listserv
  • LessWrong blogpost - "notkilleveryoneism" sounds dumb (see comments)
  • EA Forum blogpost - An Overview of the AI Safety Funding Situation
  • Book by Dmitry Chernov and Didier Sornette - Man-made Catastrophes and Risk Information Concealment
  • Euronews article - OpenAI adds mental health safeguards to ChatGPT, saying chatbot has fed into users’ ‘delusions’
  • Pleias website
  • Wikipedia page on Jaywalking