The Co-opting of Safety
We dig into how the concept of AI "safety" has been co-opted and weaponized by tech companies. Starting with examples like Mecha-Hitler Grok, we explore how real safety engineering differs from AI "alignment," the myth of the alignment tax, and why this semantic confusion matters for actual safety.
Chapters
- (00:00) - Intro
- (00:21) - Mecha-Hitler Grok
- (10:07) - "Safety"
- (19:40) - Underspecification
- (53:56) - This time isn't different
- (01:01:46) - Alignment Tax myth
- (01:17:37) - Actually making AI safer
Links
- JMLR article - Underspecification Presents Challenges for Credibility in Modern Machine Learning
- Trail of Bits paper - Towards Comprehensive Risk Assessments and Assurance of AI-Based Systems
- SSRN paper - Uniqueness Bias: Why It Matters, How to Curb It
Additional Referenced Papers
- NeurIPS paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
- ICML paper - AI Control: Improving Safety Despite Intentional Subversion
- ICML paper - DarkBench: Benchmarking Dark Patterns in Large Language Models
- OSF preprint - Current Real-World Use of Large Language Models for Mental Health
- Anthropic preprint - Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Inciting Examples
- Ars Technica article - US government agency drops Grok after MechaHitler backlash, report says
- The Guardian article - Musk’s AI Grok bot rants about ‘white genocide’ in South Africa in unrelated chats
- BBC article - Update that made ChatGPT 'dangerously' sycophantic pulled
Other Sources
- London Daily article - UK AI Safety Institute Rebrands as AI Security Institute to Focus on Crime and National Security
- Vice article - Prominent AI Philosopher and ‘Father’ of Longtermism Sent Very Racist Email to a 90s Philosophy Listserv
- LessWrong blogpost - "notkilleveryoneism" sounds dumb (see comments)
- EA Forum blogpost - An Overview of the AI Safety Funding Situation
- Book by Dmitry Chernov and Didier Sornette - Man-made Catastrophes and Risk Information Concealment
- Euronews article - OpenAI adds mental health safeguards to ChatGPT, saying chatbot has fed into users’ ‘delusions’
- Pleias website
- Wikipedia page on Jaywalking