# Project Fetch Phase Two: physical-agent source note (2026-06-22)

## Item added

- **Title:** Project Fetch: Phase two
- **Source:** Anthropic Research / Frontier Red Team
- **Publication date:** 2026-06-18
- **Authors listed by source:** Michael Ilie, C. Daniel Freeman, and Kevin K. Troy
- **URL:** https://www.anthropic.com/research/project-fetch-phase-two
- **Managing Expectations label:** company technical red-team report / physical-agent safety warning
- **Public article:** `blog/articles/project-fetch-phase-two-physical-agent-safety.html`

## Why this was selected for the weekly AI Papers Library update

Anthropic’s Frontier Red Team item is meaningful for the library because it moves the AI-agent discussion from screens, coding tasks and cyber workflows toward **off-the-shelf physical tools**. The key public-interest point is not that robotics is “solved”; it is that a general-purpose model plus Claude Code was reported to complete several previously human-assisted robodog setup/control tasks much faster than the August 2025 human teams.

## Source-grounded facts to preserve

- Anthropic describes the post as a **Frontier Red Team** update.
- The post says Anthropic previously ran an August 2025 experiment in which employee teams used Claude, or did not use Claude, to perform tasks with an off-the-shelf robotic quadruped.
- In the phase-two update, Anthropic says the researcher’s role was limited to plugging a laptop running Claude Code into the robot, entering the initial prompt, approving commands, and approving movement to the next task.
- Anthropic reports that **Claude Opus 4.7, operating without human assistance, was about 20 times faster than the fastest human team at all tasks completed by participants less than a year earlier**.
- Anthropic also states that this does **not** mean LLMs have solved robotics, and says the models still struggled with the precise “fetching”/beach-ball movement part of the test.
- Anthropic frames the larger implication as progress toward models using off-the-shelf physical tools for limited purposes, with more research needed on bespoke physical-tool use, control policies, and robotic-system design.

## Editorial caveats

- Treat this as a **company-run red-team experiment**, not an independent robotics benchmark.
- Do not state that Claude has solved robotics, general physical autonomy, or real-world embodied safety.
- Do not translate speed on a warehouse-style robodog task into claims about industrial robots, drones, weapons, vehicles, homes or hospitals without separate evidence.
- Preserve the risk framing: this is a warning signal about AI agents plus physical affordances, not a verdict that physical-agent risk is imminent in every setting.

## Sources checked during this weekly run

Primary / near-primary sources checked:

1. Anthropic Research feed: https://www.anthropic.com/research
   Latest items visible included Project Fetch: Phase two (2026-06-18), Agentic coding and persistent returns to expertise (2026-06-16), Paving the way for agents in biology (2026-06-08), Measuring LLMs’ impact on N-day exploits (2026-06-08), Making Claude a chemist (2026-06-05), and AI-enabled cyber-threat mapping items.
2. Anthropic Project Fetch page: https://www.anthropic.com/research/project-fetch-phase-two
3. Anthropic Claude Code expertise report: https://www.anthropic.com/research/claude-code-expertise
4. Anthropic N-day exploits report: https://www.anthropic.com/research/n-days
5. OpenAI RSS / sitemap sources: https://openai.com/news/rss.xml and OpenAI research/safety/publication sitemaps.
   Noted recent items included Deployment Simulation (2026-06-16), LifeSciBench (2026-06-17), and AI chemist / health-intelligence posts, but no single OpenAI item was selected over Project Fetch for this run.
6. Google DeepMind news page: https://deepmind.google/discover/blog/
   Noted recent Responsibility & Safety items including “Securing the future of AI agents” and “Investing in multi-agent AI safety research.”
7. LawZero home/research pages: https://lawzero.org/en and https://lawzero.org/en/research
   Latest research page still highlighted the May 2026 Scientist AI and activation-noise/dropout items; no newer research item was selected.

## Plain-English takeaway

Project Fetch Phase Two belongs in the library because it is a useful, source-visible marker of **agentic AI crossing into physical tooling**. The sober reading: models are getting better at writing code and choosing interfaces for unfamiliar hardware, while still failing at parts of embodied control that require physical nuance. That combination is exactly why physical-agent safety needs careful evidence tracking instead of hype.