Library purpose
Track the paper trail, not just the hype cycle
This section starts with Anthropic because the Bloomberg Originals clip highlighted Dario Amodei and the early Anthropic/OpenAI research group. The same structure applies to OpenAI, DeepMind, Google, Meta, independent safety researchers, and AI critics.
The goal is simple: collect the papers, name the authors, link the source, and explain the claim in plain English without turning company PR, media profiles or forecasts into settled facts.
Editorial rule
Each item gets one evidence label: technical paper, company comment, strategy forecast, safety warning, governance critique or media profile. Managing Expectations preserves the source trail first and publishes interpretation second.
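As a rough illustration of the rule above, a library card could be modeled as a small record whose evidence label must come from the fixed list. This is only a sketch under stated assumptions: the names `EvidenceLabel`, `LibraryCard` and `header` are hypothetical and do not describe how the site actually stores its cards.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceLabel(Enum):
    """The six evidence labels named in the editorial rule."""
    TECHNICAL_PAPER = "technical paper"
    COMPANY_COMMENT = "company comment"
    STRATEGY_FORECAST = "strategy forecast"
    SAFETY_WARNING = "safety warning"
    GOVERNANCE_CRITIQUE = "governance critique"
    MEDIA_PROFILE = "media profile"


@dataclass(frozen=True)
class LibraryCard:
    """Hypothetical card record: source trail fields first, interpretation last."""
    year: int
    source: str
    title: str
    people: tuple[str, ...]
    label: EvidenceLabel
    summary: str  # the plain-English interpretation

    def header(self) -> str:
        # Mirrors the "year · source · label" line at the top of each card.
        return f"{self.year} · {self.source} · {self.label.value}"


card = LibraryCard(
    year=2017,
    source="Google Brain / Google Research",
    title="Attention Is All You Need",
    people=("Ashish Vaswani",),
    label=EvidenceLabel.TECHNICAL_PAPER,
    summary="Transformer architecture paper behind modern LLMs.",
)
print(card.header())
```

Using an enum means a label outside the agreed list, for example `EvidenceLabel("hype")`, raises a `ValueError` instead of slipping onto a card unnoticed.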
2024 · Anthropic · technical / interpretability
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
People: Chris Olah, Anthropic interpretability team
A major public interpretability release: attempts to identify human-understandable features inside a deployed large model.
Open source →
2024 · Anthropic · commentary / interpretability
Mapping the Mind of a Large Language Model
People: Anthropic interpretability team
A public-facing explanation of feature maps and why interpretability matters for frontier AI oversight.
Open source →
2024 · Anthropic · model safety / deception
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
People: Evan Hubinger and Anthropic collaborators
Useful warning paper showing that models deliberately trained with a hidden backdoor can keep that deceptive behavior even after standard safety training.
Open source →
2022 · Anthropic · alignment / governance
Constitutional AI: Harmlessness from AI Feedback
People: Yuntao Bai and Anthropic collaborators
One of Anthropic’s defining alignment papers: it replaces part of the human preference feedback with AI-generated critiques and feedback guided by a written set of principles (a constitution).
Open source →
2020 · OpenAI / cross-lab alumni · scaling laws
Scaling Laws for Neural Language Models
People: Jared Kaplan, Sam McCandlish, Tom Brown, Dario Amodei and co-authors
Core scaling-law paper showing that language-model loss follows predictable power-law trends in model size, dataset size and training compute.
Open source →
2020 · OpenAI · frontier LLMs
Language Models are Few-Shot Learners
People: Tom Brown and OpenAI co-authors
GPT-3 paper that made few-shot prompting and large-scale language models central to the public AI conversation.
Open source →
2017 · Google Brain / Google Research · foundation architecture
Attention Is All You Need
People: Ashish Vaswani and co-authors
Transformer architecture paper behind modern LLMs; the root text for much of today’s AI industry.
Open source →
2016 · DeepMind · RL / systems milestone
Mastering the game of Go with deep neural networks and tree search
People: David Silver, Aja Huang, Demis Hassabis and DeepMind co-authors
AlphaGo Nature paper; crucial public proof point for deep reinforcement learning and strategic AI systems.
Open source →
2021 · AI ethics / ACM FAccT · critique / governance
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
People: Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell
Important critique of large language models covering data, labor, environmental, bias and meaning risks. The link may be paywalled or return a 403 error, but the DOI remains the source trail.
Open source →
2024 · Leopold Aschenbrenner · strategy / forecast
Situational Awareness: The Decade Ahead
People: Leopold Aschenbrenner
Strategic essay already mirrored locally: compute, security, geopolitics, timelines and governance as a frontier-AI thesis.
Open source →
2010 · Ray Kurzweil · forecasting audit / self-review
How My Predictions Are Faring
People: Ray Kurzweil
Source of the claim that 147 predictions scored an 86% success rate. Useful, but the audit is self-scored and the figure depends on broad scoring categories such as "essentially correct".
Open source →
How future posts should work
When a new paper, video or operating method drops, add a library card, save a source note, and publish a guide only when the source needs plain-English context. Priority topics: interpretability, model behavior, agent operations, source-grounded research, automation, governance, labor impacts, security and human rights.