AI Papers Library — Leaders, Papers & Comments

Library purpose

Track the paper trail, not just the hype cycle

This section starts with Anthropic because the Bloomberg Originals clip highlighted Dario Amodei and the early Anthropic/OpenAI research group. The same structure applies to OpenAI, DeepMind, Google, Meta, independent safety researchers, and AI critics.

The goal is simple: collect the papers, name the authors, link the source, and explain the claim in plain English without turning company PR, media profiles or forecasts into settled facts.

Editorial rule

Each item gets an evidence label: technical paper, company comment, strategy forecast, safety warning, governance critique or media profile. Managing Expectations should preserve the source trail first, then publish interpretation second.
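
The editorial rule above can be sketched as a small data structure. This is a minimal illustration, not the library's actual schema: the class names, field names and the `ready_to_publish` check are all hypothetical, chosen only to show one label per item and the source trail coming before interpretation.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceLabel(Enum):
    """The six evidence labels named in the editorial rule."""
    TECHNICAL_PAPER = "technical paper"
    COMPANY_COMMENT = "company comment"
    STRATEGY_FORECAST = "strategy forecast"
    SAFETY_WARNING = "safety warning"
    GOVERNANCE_CRITIQUE = "governance critique"
    MEDIA_PROFILE = "media profile"


@dataclass
class LibraryCard:
    # Hypothetical field names; the library does not publish a schema.
    title: str
    people: list[str]
    year: int
    label: EvidenceLabel      # exactly one evidence label per item
    source: str = ""          # DOI, arXiv ID or URL: the source trail
    interpretation: str = ""  # plain-English note, written second

    def ready_to_publish(self) -> bool:
        """Source trail first: a card with no source is not publishable."""
        return bool(self.source.strip())


card = LibraryCard(
    title="Attention Is All You Need",
    people=["Ashish Vaswani and co-authors"],
    year=2017,
    label=EvidenceLabel.TECHNICAL_PAPER,
    source="arXiv:1706.03762",
)
print(card.label.value, card.ready_to_publish())
```

Using an enum rather than a free-text field means a forecast or media profile cannot be filed without an explicit label, which is the point of the rule.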

Leader lanes to follow

🏛️

Anthropic founders and research team

People: Dario Amodei, Daniela Amodei, Jack Clark, Chris Olah, Jared Kaplan, Sam McCandlish, Tom Brown and colleagues

Watch for: frontier model scaling, alignment, interpretability, constitutional AI, model security

🚀

OpenAI leadership and research alumni

People: Sam Altman, Ilya Sutskever, Greg Brockman, Mira Murati, Tom Brown and co-authors

Watch for: large language models, reinforcement learning, multimodal systems, agentic deployment

🧪

Google DeepMind

People: Demis Hassabis, Shane Legg, David Silver, Oriol Vinyals and teams

Watch for: deep reinforcement learning, AlphaGo/AlphaFold lineage, Gemini-era frontier systems

🌐

Google / Meta AI research leaders

People: Vaswani et al., Jeff Dean, Yann LeCun, Joelle Pineau and open-research teams

Watch for: transformers, open models, foundation-model infrastructure, world-model arguments

⚖️

Independent safety and academic warning voices

People: Yoshua Bengio, Geoffrey Hinton, Stuart Russell, Roman Yampolskiy, Emily Bender and others

Watch for: alignment, governance, interpretability, labor/social impacts, language-model criticism

Living paper and source library

2024 · Anthropic · technical / interpretability

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

People: Chris Olah, Anthropic interpretability team

A major public interpretability release: attempts to identify human-understandable features inside a deployed large model.

Open source →

2024 · Anthropic · commentary / interpretability

Mapping the Mind of a Large Language Model

People: Anthropic interpretability team

A public-facing explanation of feature maps and why interpretability matters for frontier AI oversight.

Open source →

2024 · Anthropic · model safety / deception

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

People: Evan Hubinger and Anthropic collaborators

Useful warning paper showing that deliberately trained backdoor/deceptive behavior can persist through standard safety training, so a model can appear safe while the hidden behavior remains.

Open source →

2022 · Anthropic · alignment / governance

Constitutional AI: Harmlessness from AI Feedback

People: Yuntao Bai and Anthropic collaborators

One of Anthropic’s defining alignment papers: replacing part of human preference feedback with AI feedback guided by written principles (a constitution), including self-critique and revision.

Open source →

2020 · OpenAI / cross-lab alumni · scaling laws

Scaling Laws for Neural Language Models

People: Jared Kaplan, Sam McCandlish, Tom Brown, Dario Amodei and co-authors

Core scaling-law paper tying loss, compute, data and model size to predictable frontier-model performance trends.

Open source →

2020 · OpenAI · frontier LLMs

Language Models are Few-Shot Learners

People: Tom Brown and OpenAI co-authors

GPT-3 paper that made few-shot prompting and large-scale language models central to the public AI conversation.

Open source →

2017 · Google Brain / Google Research · foundation architecture

Attention Is All You Need

People: Ashish Vaswani and co-authors

Transformer architecture paper behind modern LLMs; the root text for much of today’s AI industry.

Open source →

2016 · DeepMind · RL / systems milestone

Mastering the game of Go with deep neural networks and tree search

People: David Silver, Aja Huang, Demis Hassabis and DeepMind co-authors

AlphaGo Nature paper; crucial public proof point for deep reinforcement learning and strategic AI systems.

Open source →

2021 · AI ethics / ACM FAccT · critique / governance

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

People: Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell

Important critique of large language models: data, labor, environmental, bias and meaning risks. The link may be paywalled or return a 403 error, but the DOI is the source trail.

Open source →

2024 · Leopold Aschenbrenner · strategy / forecast

Situational Awareness: The Decade Ahead

People: Leopold Aschenbrenner

Strategic essay already mirrored locally: compute, security, geopolitics, timelines and governance as a frontier-AI thesis.

Open source →

2010 · Ray Kurzweil · forecasting audit / self-review

How My Predictions Are Faring

People: Ray Kurzweil

Source of the 147-prediction / 86% success-rate claim. Useful, but self-scored, and the success rate depends on broad scoring categories such as "essentially correct".

Open source →

2026 · Anthropic Frontier Red Team · physical-agent safety warning

Project Fetch: Phase two

People: Michael Ilie, C. Daniel Freeman, Kevin K. Troy

Company red-team update reporting that Claude Code/Opus 4.7 completed several robodog setup and control tasks much faster than earlier human teams, while still struggling with precise physical fetching.

Open source → Read note →

2026 · Anthropic Economic Research · company economic research report / AI labor-use telemetry

Anthropic Economic Index report: Cadences

People: Anthropic Economic Index team

Company report on Claude usage rhythms, output types, weekend/workday shifts, agentic work and survey signals about how automated users expect AI to affect work.

Open source → Read note →

2026 · Google DeepMind · frontier-lab technical safety report / agent-control roadmap

GDM AI Control Roadmap

People: Mary Phuong, Erik Jenner, Laurent Simon, Lewis Ho, Rohin Shah, Sebastian Farquhar, Scott Coull; blog by Rohin Shah and Four Flynn

Technical roadmap for treating increasingly capable AI agents as systems needing threat models, monitoring coverage, detection/prevention tiers, escalation and response controls.

Open source → Read note →

2026 · Anthropic / AE Studio · technical safety research / modular-pretraining access control

An off switch for dual-use knowledge in AI models

People: AE Studio in collaboration with Anthropic

Early GRAM research on isolating selected dual-use knowledge into switchable/removable modules during training, rather than relying only on refusals and output filters.

Open source → Read note →

2026 · Anthropic / Andon Labs · frontier red-team benchmark / physical-agent safety warning

Project Pilot: Can AI control a drone? / Drone-Bench

People: Anthropic Frontier Red Team and Andon Labs

Physical-agent benchmark testing whether frontier models can write code for an indoor drone locate-and-follow task, with strong caveats around reliability, privacy and oversight.

Open source → Read note →

Comment and media watch-list

Tony Robbins / Ray Kurzweil interview

Current public comments on AGI by 2029, AI agents, longevity escape velocity and human/AI merger; treat as interview/commentary, not proof.

Bloomberg Originals: The Circuit

Inside Anthropic, the $965 Billion AI Juggernaut. Used as a watch-list prompt for the Anthropic team frame: Dario/Daniela Amodei, Jack Clark, Chris Olah, Jared Kaplan, Tom Brown and Sam McCandlish as industry figures whose papers and comments should be tracked.

Anthropic Research

Anthropic research feed. Primary source for new Anthropic papers, interpretability releases, model-behavior papers and safety notes.

LawZero

Yoshua Bengio’s Scientist AI / safer AI work. Track for Bengio’s comments and papers around non-agentic or safer AI designs.

Google Scholar / arXiv / DOI pages

Primary paper indexes. Use primary paper pages before social interpretations. Store DOI/arXiv/title/author/date in the library.

Staff and agent operating manuals

Hermes Self-Improving Knowledge Base Manual

Downloadable PDF playbook for staff and approved agents: build the Obsidian/LLM Wiki, connect it to Hermes, ingest meetings/videos/docs/email safely, run lint checks, and publish reviewed outputs without exposing secrets.

Open the HTML version

Editable web version of the same guide, useful for copying staff prompts, folder structures, SOP templates, and automation checklists.

Source trail

Source note and transcript trail for the Jack Roberts video that prompted this manual.

How future posts should work

When a new paper, video or operating method drops, add a library card, save a source note, and publish a guide only when the source needs plain-English context. Priority topics: interpretability, model behavior, agent operations, source-grounded research, automation, governance, labor impacts, security and human rights.
