Resource Center
For machine learning researchers
Risks from Advanced AI
- FAQ on Catastrophic AI Risks by Yoshua Bengio (2023)
- The Alignment Problem from a Deep Learning Perspective (Ngo et al., 2022), Twitter thread
- Why I Think More NLP Researchers Should Engage with AI Safety Concerns by Sam Bowman (2022)
- Researcher Perceptions of Current and Future AI by Vael Gates (2022)
- More is Different for AI by Jacob Steinhardt (2022)
Technical Research
- Goal Misgeneralization: Why correct specifications aren't enough for correct goals (Shah et al., 2022)
- Specification gaming: the flip side of AI ingenuity (Krakovna et al., 2020)
- Mechanistic interpretability: In-context Learning and Induction Heads (Olsson et al., 2022), Locating and Editing Factual Associations in GPT (Meng et al., 2022)
- Discovering Latent Knowledge in Language Models Without Supervision (Burns et al., 2022), Twitter thread
- Agendas: Unsolved Problems in ML Safety (Hendrycks et al., 2022), Concrete Problems in AI Safety (Amodei et al., 2016)
Other
- ∗ AI Safety Fundamentals Curriculum (best in-depth resource)
- Alignment Newsletter and ML Safety Newsletter
Interested in doing AI alignment research?
- Learn about the organizations and researchers in the space, funding and job opportunities, and guides for getting involved on the "What can I do?" page
For a general audience
- ∗ The Case for Taking AI Seriously as a Threat to Humanity by Kelsey Piper (2020)
- The Alignment Problem by Brian Christian (2020)
- Existential Risk from Power-Seeking AI by Joe Carlsmith (2021)
- Why AI Alignment Could Be Hard with Modern Deep Learning by Ajeya Cotra (2021)
- 80,000 Hours Podcast: Preventing an AI-related Catastrophe (2022)
- The Most Important Century by Holden Karnofsky (podcast, summaries, various articles)
- AI Safety YouTube channel by Robert Miles