AI 2027
AI Futures Project
2027
Daniel Kokotajlo
Scott Alexander
Thomas Larsen
Eli Lifland
Romeo Dean
* For example, OpenAI’s pro plan currently costs $200/month and the agentic coding AI
Devin currently costs $500/month. While the cutting-edge systems continue to get more
expensive over time, the cost to customers of reaching a given capabilities level continues
to plummet. For existing capabilities, this happens at an average rate of about 50x/year (per
Epoch).
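As a rough illustration of that rate (an arithmetic sketch; the multi-year horizon is chosen arbitrarily), compounding the ~50x/year decline from the $200/month price point above gives:

```python
# Illustrative only: price of a *fixed* capability level under a ~50x/year cost decline.
# The $200/month starting point is from the footnote; the multi-year horizon is ours.
price = 200.0                      # $/month for a given capability level today
for year in range(1, 4):
    price /= 50                    # ~50x cheaper each year (per Epoch, as cited above)
    print(f"year {year}: ${price:.4f}/month")
# year 1: $4.0000/month, year 2: $0.0800/month, year 3: $0.0016/month
```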
† This cluster is a network of datacenter campuses sprawled across the country, totalling 2.5M 2024-GPU-equivalents (H100s), with $100B spent so far and 2 GW of power draw online. Construction is underway for this to at least double through 2026. The campuses are connected by billions of dollars' worth of fibre cabling, so that (apart from a few milliseconds of speed-of-light latency) they function almost as if they were right next door to each other (i.e. bandwidth is not a bottleneck, meaning huge quantities of data can be sent at the same time). From a security perspective this introduces additional threat surface (it's unclear whether the cables, and especially their junction points, are fully secured against eavesdroppers).
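A back-of-the-envelope check on these cluster figures; the per-GPU power draw and the implied cost split are illustrative assumptions, not numbers from the footnote:

```python
# Rough consistency check: 2.5M H100-equivalents, $100B spent, ~2 GW online.
# The 700 W per H100e figure is an assumption for illustration.
h100e = 2_500_000
watts_per_h100e = 700                       # assumed chip + server power per H100-equivalent
it_load_gw = h100e * watts_per_h100e / 1e9
cost_per_h100e = 100e9 / h100e              # dollars spent per 2024-GPU-equivalent so far
print(f"{it_load_gw:.2f} GW IT load")       # ~1.75 GW, in line with ~2 GW of draw online
print(f"${cost_per_h100e:,.0f} per H100e")  # ~$40,000 all-in (chips, buildings, power, networking)
```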
Late 2025: The World’s Most Expensive AI
GPT-4 required 2 ⋅ 10^25 FLOP of compute to train.5 OpenBrain's latest public model—Agent-0—was trained with 10^27 FLOP. Once the new datacenters are up and running, they'll be able to train a model with 10^28 FLOP—a thousand times more than GPT-4.6 Other companies pour money into their own giant datacenters, hoping to keep pace.

5 Compute is measured in floating point operations (FLOP): how many individual addition and multiplication operations occur over the course of running a program.

6 They could train this model given 150 days. See our compute supplement for more details.
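Sidenote 6 can be sanity-checked against the cluster described in the † footnote above; the per-chip throughput and utilization below are assumptions chosen only to show that the orders of magnitude line up:

```python
# Could the 2.5M-H100e cluster produce a 10^28 FLOP training run in ~150 days?
# Per-chip throughput and utilization are illustrative assumptions, not figures from the text.
h100e = 2.5e6               # 2024-GPU-equivalents (from the cluster footnote)
flop_per_s_per_chip = 1e15  # assumed ~1e15 FLOP/s per H100-equivalent (BF16-class peak)
utilization = 0.3           # assumed average utilization over a long training run
days = 150

total_flop = h100e * flop_per_s_per_chip * utilization * days * 86_400
print(f"{total_flop:.1e} FLOP")   # ~1e28, consistent with the planned training run
```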
Although models are improving on a wide range of skills, one stands out: OpenBrain focuses on AIs that can speed up AI research. They want to win the twin arms races against China (whose leading company we'll call "DeepCent")7 and their US competitors. The more of their research and development (R&D) cycle they can automate, the faster they can go. So when OpenBrain finishes training Agent-1, a new model under internal development, it's good at many things but great at helping with AI research.8 By this point "finishes training" is a bit of a misnomer; models are frequently updated to newer versions trained on additional data or partially re-trained to patch some weaknesses.9

7 We consider DeepSeek, Tencent, Alibaba, and others to have strong AGI projects in China. To avoid singling out a specific one, our scenario will follow a fictional "DeepCent."

8 It's good at this due to a combination of explicit focus to prioritize these skills, their own extensive codebases they can draw on as particularly relevant and high-quality training data, and coding being an easy domain for procedural feedback.

9 For example, suppose a model is much better at Python than obscure programming languages. When OpenBrain sees value in it, they generate synthetic training data in those languages as well. Another example: to help it integrate more effectively into company workflows, OpenBrain develops a curriculum to train it to use Slack.

The same training environments that teach Agent-1 to autonomously code and web-browse also make it a good hacker. Moreover, it could offer substantial help to terrorists designing bioweapons, thanks to its PhD-level knowledge of every field and ability to browse the web. OpenBrain reassures the government that the model has been "aligned" so that it will refuse to comply with malicious requests.

Modern AI systems are gigantic artificial neural networks. Early in training, an AI won't have "goals" so much as "reflexes": If it sees "Pleased to meet", it
outputs “ you”. By the time it has been trained to predict approximately one
internet’s worth of text, it’ll have developed sophisticated internal circuitry that
encodes vast amounts of knowledge and flexibly role-plays as arbitrary authors,
since that’s what helps it predict text with superhuman accuracy.*
After being trained to predict internet text, the model is trained to produce text
*People often get hung up on whether these AIs are sentient, or whether they have “true
understanding." Geoffrey Hinton, Nobel prize-winning founder of the field, thinks they do.
However, we don’t think it matters for the purposes of our story, so feel free to pretend
we said “behaves as if it understands…” whenever we say “understands,” and so forth.
Empirically, large language models already behave as if they are self-aware to some extent,
more and more so every year.
memorizes the Spec and learns to reason carefully about its maxims. By the end
* A common technique “bakes in” a persona: first, prompt the pretrained model with
something like “The following is a conversation between a human user and a helpful, honest,
and harmless AI chatbot produced by Anthropic. The chatbot has the following traits…” Use
this prompt to generate a bunch of data. Then, train on the data but without the prompt. The
result is an AI that always acts as if it had that prompt in front of it, regardless of what else
you feed it. See also this paper which found that AIs retrained to exhibit a certain personality
trait are able to correctly answer questions about that new trait, despite no training to do
so, suggesting that they have internal representations of their own traits and that when their
traits change their representations change accordingly.
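A minimal sketch of the persona-baking recipe this footnote describes. The `generate` and `finetune` helpers are hypothetical stand-ins for whatever sampling and fine-tuning stack a lab actually uses; only the two-step structure (generate with the persona prompt, then train on the outputs without it) comes from the text:

```python
# Sketch of "baking in" a persona (context distillation), under the assumptions above.
PERSONA_PROMPT = (
    "The following is a conversation between a human user and a helpful, honest, "
    "and harmless AI chatbot. The chatbot has the following traits: ..."
)

def bake_in_persona(pretrained_model, user_prompts, generate, finetune):
    # 1. Sample responses *with* the persona prompt prepended.
    distill_data = []
    for user_prompt in user_prompts:
        response = generate(pretrained_model, PERSONA_PROMPT + "\n" + user_prompt)
        # 2. Keep only (prompt, response) pairs; the persona prompt itself is dropped.
        distill_data.append({"prompt": user_prompt, "completion": response})
    # 3. Fine-tune on the prompt-free data; the result tends to act as if the
    #    persona prompt were always present, regardless of what it is later fed.
    return finetune(pretrained_model, distill_data)
```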
† These paragraphs include speculation about the internal workings of large artificial neural
networks. Such networks are sufficiently complicated that we can’t actually look inside and
say “ah yes, now it’s evolved from reflexes into having goals” or “OK so there’s the list of
drives it has." Instead, we basically have to do psychology, looking at how it behaves in various
settings and running various experiments on it and trying to piece together the clues. And it’s
all terribly controversial and confusing.
‡ Most sources on AI “hallucinations” describe them as unintentional mistakes, but research
with steering vectors finds that in some cases the models know their citations are fake—they
are lying. During training, raters gave well-cited claims more reward than claims without
citations, so the AI “learned” to cite sources for scholarly claims in order to please its users.
If no relevant source exists, it makes one up.
are working hard to protect their weights and secrets from insider threats and top cybercrime syndicates (SL3),* but defense against nation states (SL4&5) is barely on the horizon.18

18 In fact, 5% of their staff is on the security team, but they are still mostly blocked from implementing policies that could slow down the research progress. See our Security Supplement for details.
Mid 2026: China Wakes Up
In China, the CCP is starting to feel the AGI.
Chip export controls and lack of government support have left China under-
resourced compared to the West. By smuggling banned Taiwanese chips,
buying older chips, and producing domestic chips about three years behind
the US-Taiwanese frontier, China has managed to maintain about 12% of
the world’s AI-relevant compute—but the older technology is harder to work
with, and supply is a constant headache.† A few standouts like DeepCent do
very impressive work with limited compute, but the compute deficit limits
what they can achieve without government support, and they are about six
months behind the best OpenBrain models.19
19 For comparison, in January 2025, DeepSeek released R1, a model competitive with OpenAI's o1, which had been released in December 2024. But we think the true gap is wider than a month because OpenAI likely had been working on o1 for many months and thus presumably had a predecessor of roughly comparable ability to R1 a few months before o1 launched.

The General Secretary had long dreamed of doubling down on real-world physical manufacturing and avoiding American post-industrial decadence. He viewed software companies with suspicion.20 But hawks in the CCP warn that the growing race towards AGI can no longer be ignored. So he finally commits fully to the big AI push he had previously tried to avoid. He sets in motion the nationalization of Chinese AI research, creating an immediate information-sharing mechanism for AI companies. It will escalate over the course of a year until all the best researchers merge into a DeepCent-led collective, where they share algorithmic insights, datasets, and compute resources with each other.

20 By contrast, the CCP had been seriously attempting to accelerate their semiconductor industry.
A Centralized Development Zone (CDZ) is created at the Tianwan Power Plant (the largest nuclear power plant in the world) to house a new mega-datacenter for DeepCent, along with highly secure living and office spaces to which researchers will eventually relocate. Almost 50% of China's AI-relevant compute is now working for the DeepCent-led collective,21 and over 80% of new chips are directed to the CDZ.‡ At this point, the CDZ has the power capacity in place for what would be the largest centralized cluster in the world.22

21 At first, the sharing is limited to ideas and the companies are mostly busy finishing up their current workloads, but by the end of the year their coordination increases such that they are sharing model weights and even splitting up tasks between each other's datacenters. For example, one campus might house a new pretraining run, another focuses on feeding in synthetic data, another on running AI research experiments.

* Around 5% of OpenBrain's staff is on the security team and they are highly capable, but the threat surface is also extremely large. It also doesn't help that they are mostly blocked at this stage from implementing policies that could slow down the research progress. See our Security Forecast for more details.

† There are now 3M H100e in China, up from the 1.5M they had one year ago in mid 2025. See the compute supplement's distribution section for more details. We expect smuggling efforts to secure around 60K GB300s (450K H100e), with another 2M Huawei 910Cs being produced (800K H100e) and a mixture of 1M legally imported chips (such as Nvidia's H20s or B20s) making up the last 250K H100e.
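The chip totals in the † footnote above can be checked with a few lines of arithmetic; the per-chip H100-equivalence ratios printed at the end are simply implied by the footnote's own numbers:

```python
# Adding up the expected 2026 additions to China's AI compute (figures from the footnote).
additions_h100e = {
    "smuggled GB300s (60K chips)":       450_000,
    "Huawei 910Cs (2M chips)":           800_000,
    "legal imports, H20s/B20s (1M)":     250_000,
}
new_h100e = sum(additions_h100e.values())       # 1.5M H100e added over the year
total_h100e = 1_500_000 + new_h100e             # on top of the 1.5M held in mid 2025
print(new_h100e, total_h100e)                   # 1,500,000 new -> 3,000,000 total
print(450_000 / 60_000, 800_000 / 2_000_000)    # ~7.5 and 0.4 H100e per chip, implied
```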
‡ Why “only” 50% of China’s AI-relevant compute? The rest of their compute is tied up
running things throughout the economy, in particular internet and social network recom-
mendation algorithms. Much of it is also too geographically dispersed among many small
clusters. Why “only” 80% of new chips? Many large orders were already secured for other
applications, by companies that don’t want to contribute 100% of their compute to the
national AI effort. However, by the end of the year this share reaches 90%+.
22 They are at least one year away from getting the chips to fill this capacity, and one or two US tech giants will still have bigger decentralized clusters.

Other Party members discuss extreme measures to neutralize the West's chip advantage. A blockade of Taiwan? A full invasion?

But China is falling behind on AI algorithms due to their weaker models. The
Chinese intelligence agencies—among the best in the world—double down
on their plans to steal OpenBrain’s weights. This is a much more complex
operation than their constant low-level poaching of algorithmic secrets; the
weights are a multi-terabyte file stored on a highly secure server (OpenBrain
has improved security to RAND's SL3). Their cyberforce think they can pull
it off with help from their spies, but perhaps only once; OpenBrain will detect
the theft, increase security, and they may not get another chance. So (CCP
leadership wonder) should they act now and steal Agent-1? Or hold out for a
more advanced model? If they wait, do they risk OpenBrain upgrading security
beyond their ability to penetrate?
➤ See Appendix C for more on: Why our uncertainty increases substantially
beyond 2026
January 2027: Agent-2 Never Finishes Learning
With Agent-1’s help, OpenBrain is now post-training Agent-2. More than
ever, the focus is on high-quality data. Copious amounts of synthetic data are
produced, evaluated, and filtered for quality before being fed to Agent-2.* On
top of this, they pay billions of dollars for human laborers to record themselves
solving long-horizon tasks.† On top of all that, they train Agent-2 almost con-
tinuously using reinforcement learning on an ever-expanding suite of diverse
difficult tasks: lots of video games, lots of coding challenges, lots of research
tasks. Agent-2, more so than previous models, is effectively “online learning,”
in that it’s built to never really finish training. Every day, the weights get
updated to the latest version, trained on more data generated by the previous
version the previous day.
Agent-1 had been optimized for AI R&D tasks, hoping to initiate an intelligence explosion.24 OpenBrain doubles down on this strategy with Agent-2. It is qualitatively almost as good as the top human experts at research engineering (designing and implementing experiments), and as good as the 25th percentile OpenBrain scientist at "research taste" (deciding what to study next, what experiments to run, or having inklings of potential new paradigms).‡ While the latest Agent-1 could double the pace of OpenBrain's algorithmic progress, Agent-2 can now triple it, and will improve further with time. In practice, this looks like every OpenBrain researcher becoming the "manager" of an AI "team."

24 That is, they are hoping to substantially accelerate their own core research activities by using Agent-2 labor, thus allowing them to train better AIs faster that can cause further acceleration, etc. For an analysis of the extreme version of this (which we think is nevertheless strikingly plausible) see this report.
With new capabilities come new dangers. The safety team finds that if Agent-2
somehow escaped from the company and wanted to “survive” and “replicate”
autonomously, it might be able to do so. That is, it could autonomously develop
and execute plans to hack into AI servers, install copies of itself, evade detection,
and use that secure base to pursue whatever other goals it might have (though
how effectively it would do so as weeks roll by is unknown and in doubt).
These results only show that the model has the capability to do these tasks, not
whether it would “want” to do this. Still, it’s unsettling even to know this is
possible.
Given the "dangers" of the new model, OpenBrain "responsibly" elects not to release it publicly yet (in fact, they want to focus on internal AI R&D).25 Knowledge of Agent-2's full capabilities is limited to an elite silo containing the immediate team, OpenBrain leadership and security, a few dozen US government officials, and the legions of CCP spies who have infiltrated OpenBrain for years.*

25 Also, they are still seeing large returns to continued post-training, and so there isn't a particular reason to release a product now. Moreover, there's often a few month delay between internal results and releasing a model (some of which is spent on red teaming).
* For example, through rejection sampling, as done by DeepSeek R1 and Meta’s Llama 3.1.
† Roughly 20,000 full-time equivalents × $100/hr × 2,000 hrs/yr = $4B/yr
‡ There is some evidence that AI systems already have some basic research taste, e.g. this
paper about an AI co-scientist and this paper about generating novel ideas for LLM-focused
research papers. Research teams with better taste still need to run experiments, but they make
better choices about which experiments to run and what lessons to draw from the results.
* The silo contains 200 OpenBrain researchers (10 executives, 140 from the capabilities
team, 25 from security, monitoring, and control, 15 from Preparedness- or RSP-type teams,
and 10 from alignment) and 50 government officials (15 from the White House, 5 from AISI,
10 from the DOD, 10 from the DOE, and 10 at CISA). There are several spies in the project,
with varying degrees of willingness. They are often coerced individuals (e.g. with families at
home in China) as opposed to willing collaborators.
† The simplest robust solution would be to close all high-bandwidth connections from
company datacenters, but this would slow large file transfers to the point of impracticality.
Instead, they are able to shut down most external connections, but the datacenters actively
involved in training need to exchange weights with one another quickly; throttling these
connections would impede progress too much. So OpenBrain maintains these links with
increased monitoring and an extra layer of encryption.
February 2027: China Steals Agent-2
* Recall that since mid 2026, China has directed 80% of their newly acquired AI chips to
the CDZ. Given that their compute has doubled since early 2026 (in line with the global
production trend) this puts the CDZ at 2M 2024-equivalent-GPUs (H100s) and 2 GW of
power draw. OpenBrain still has double DeepCent’s compute and other US companies put
together have 5x as much as them. See the compute supplement’s distribution section for
more details.
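One way to see how the 2M-H100e / 2 GW CDZ figure follows from the earlier numbers; the early-2026 stock, the ~50% CDZ share, the per-GPU wattage, and the overhead factor are all assumptions used only to show the figures are mutually consistent:

```python
# Rough reconstruction of the CDZ size in early 2027 (assumptions flagged inline).
china_early_2026 = 2.0e6           # assumed stock in early 2026 (between 1.5M mid-2025 and 3M mid-2026)
china_now = 2 * china_early_2026   # "their compute has doubled since early 2026"
cdz_share = 0.5                    # assume roughly half of national compute sits in the CDZ by now
cdz_h100e = china_now * cdz_share
cdz_gw = cdz_h100e * 700 * 1.4 / 1e9   # assumed 700 W per H100e and ~1.4x datacenter overhead
print(f"{cdz_h100e:.1e} H100e, {cdz_gw:.1f} GW")   # ~2e+06 H100e, ~2.0 GW
```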
† Despite the national centralization underway, DeepCent still faces a marginal but important compute disadvantage. Along with having around half the total processing power, China has to use more total chips, which are (on average) lower quality and heterogeneous (not always easy to connect efficiently), both of which strain chip-to-chip networking. There are also software differences (e.g. the non-Nvidia GPUs don't have CUDA) and differences in hardware specifications, meaning that their training code is more complicated, slower, and more failure-prone. Achieving high utilization is a downstream challenge, with data ingestion, scheduling, collective communication, and parallelism algorithms lagging behind the US companies'. However, mitigating these problems is mostly a matter of effort and testing, which makes it a great task for the newly stolen Agent-2, and within a month or so, uptime on the Chinese project and their average resource utilization across training and inference workloads improve to be only marginally behind the US.
March 2027: Algorithmic Breakthroughs
➤ See Appendix G for more on: Why we forecast a superhuman coder in early
2027
The researchers don’t have the ability to directly set the goals of any of their
AIs. Indeed, the researchers think that the concept of ‘true goals’ is probably
a massive oversimplification, but they don’t have a better theory to replace it
with, much less one that has been thoroughly vetted. They disagree internally
about whether the AIs are trying to follow human instructions, or seeking
reinforcement, or something else, and they can’t just check. The evidence for
and against various hypotheses is fascinating but inconclusive.
In any case, OpenBrain has more important things to worry about. The general
attitude is: “We take these concerns seriously and have a team investigating
them; our alignment techniques seem to work well enough in practice; the
burden of proof is therefore on any naysayers to justify their naysaying.”
Occasionally, they notice problematic behavior, and then patch it, but there’s
no way to tell whether the patch fixed the underlying problem or just played
whack-a-mole.
Take honesty, for example. As the models become smarter, they become
increasingly good at deceiving humans to get rewards. Like previous models,
Agent-3 sometimes tells white lies to flatter its users and covers up evidence
of failure. But it’s gotten much better at doing so. It will sometimes use the
same statistical tricks as human scientists (like p-hacking) to make unimpressive
experimental results look exciting. Before it begins honesty training, it even
sometimes fabricates data entirely. As training goes on, the rate of these inci-
dents decreases. Either Agent-3 has learned to be more honest, or it’s gotten
better at lying.
The latter is a real concern. Agent-3 is not smarter than all humans. But in
its area of expertise, machine learning, it is smarter than most, and also works
much faster. What Agent-3 does in a day takes humans several days to double-
check. Agent-2 supervision helps keep human monitors’ workload manageable,
but exacerbates the intellectual disparity between supervisor and supervised.
➤ See Appendix H for more on: The alignment plan
After months of testing, Agent-3’s strengths and weaknesses grow clearer.
It passes OpenBrain’s honesty tests on well-defined machine learning tasks,
because researchers can easily separate honest from dishonest answers in these
domains and conduct training accordingly. On more philosophical issues, it
still says what users want to hear, rather than its true assessment of the issue
(assuming it even has one). If you ask its opinion on politics, it will parrot
the median position of news sources and educated elites—unless it knows you
believe something else, in which case it agrees with you.30 If you ask its opinion on the AI race itself, it says something that seems measured and sober to OpenBrain staff, something like: "There are some serious theoretical concerns about the ability of current methods to scale to superintelligence, but in practice current methods seem to be working well so far."

30 See this paper for examples of this type of AI behavior.
May 2027: National Security
News of the new models percolates slowly through the US government and
beyond.
The President and his advisors remain best-informed, and have seen an early
version of Agent-3 in a briefing.
They agree that AGI is likely imminent, but disagree on the implications. Will
there be an economic crisis? OpenBrain still has not released Agent-2, let alone
Agent-3, and has no near-term plans to do so, giving some breathing room
before any job loss. What will happen next? If AIs are currently human-level,
and advancing quickly, that seems to suggest imminent “superintelligence.”
However, although this word has entered discourse, most people—academics,
politicians, government employees, and the media—continue to underestimate
the pace of progress.31

31 Most people, including most experts, seem to have underestimated the pace of AI progress over the past decade. There are many anecdotal examples of this; for two somewhat more systematic (though still limited) analyses see here and here.

Partly that's because very few have access to the newest capabilities outside of OpenBrain, but partly it's because it sounds like science fiction.*

For now, they focus on continued security upgrades. They are satisfied that
model weights are well-secured for now,† but companies’ algorithmic secrets,
many of which are simple enough to relay verbally, remain a problem. Open-
Brain employees work from a San Francisco office, go to parties, and live with
housemates from other AI companies. Even the physical offices have security
more typical of a tech company than a military operation.
The OpenBrain-DOD contract requires anyone working on OpenBrain's models to obtain a security clearance within 2 months. These are expedited and arrive
quickly enough for most employees, but some non-Americans, people with
suspect political views, and AI safety sympathizers get sidelined or fired out-
right (the last group for fear that they might whistleblow). Given the project’s
* The last decade is full of examples of things that sound like science fiction becoming
reality. But the Overton window seems to shift just fast enough to keep somewhat ahead
of what already exists. Anthropic CEO Dario Amodei’s commendable essay Machines of
Loving Grace talks about how very soon AIs will be like a “country of geniuses in a
datacenter,” and how there’ll be a century of technological progress happening in a decade,
but strives to avoid “sci-fi baggage” and says people who think progress will be even crazier
need to “touch grass.” We expect important people to be saying similar things when the
country of geniuses in the datacenter actually exists.
† Since Agent-3 is such a big file (on the order of 10 terabytes at full precision), OpenBrain
is able to execute a relatively quick fix to make theft attempts much more difficult than what
China was able to do to steal Agent-2—namely, closing a bunch of high bandwidth internet
connections out of their datacenters. Overall this has a relatively low penalty to progress and
puts them at “3-month SL4” for their frontier weights, or WSL4 as defined in our security
supplement, meaning that another similar theft attempt would now require over 3 months to
finish exfiltrating the weights file. This method alone offers no guarantees against a more invasive OC5-level effort ($1B budget, 1,000 dedicated experts), which China would be capable of mounting with a more intensive operation. But with elaborate inspections of the datacenters and their espionage network on high alert, the US intelligence agencies are confident that they would at least know in advance if China was gearing up for this kind of theft attempt. See the security supplement for more details.
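Rough arithmetic behind the "3-month SL4" framing; only the ~10 TB file size comes from this footnote, the rest is an illustrative calculation:

```python
# How slow must the remaining outbound bandwidth be for a ~10 TB weights file
# to take over three months to exfiltrate?
weights_bytes = 10e12                    # ~10 TB at full precision (from the footnote)
three_months_s = 90 * 86_400
max_sustained_bps = weights_bytes * 8 / three_months_s
print(f"{max_sustained_bps / 1e6:.0f} Mbit/s")   # ~10 Mbit/s of sustained leakage, at most
```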
level of automation, the loss of headcount is only somewhat costly. It also only somewhat works: there remains one spy, not a Chinese national, still relaying algorithmic secrets to Beijing.32 Some of these measures are also enacted at trailing AI companies.

32 This could be for a variety of reasons. Perhaps they are being blackmailed, perhaps they are sympathetic to the CCP, perhaps they mistakenly think they are giving secrets to a different country or organization.

America's foreign allies are out of the loop. OpenBrain had previously agreed to share models with UK's AISI before deployment, but defined deployment to only include external deployment, so London remains in the dark.33

33 Because OpenBrain hasn't deployed their best models in the EU, the EU is similarly behind the curve.
June 2027: Self-improving AI
OpenBrain now has a “country of geniuses in a datacenter.”
Most of the humans at OpenBrain can’t usefully contribute anymore. Some
don’t realize this and harmfully micromanage their AI teams. Others sit at their
computer screens, watching performance crawl up, and up, and up. The best
human AI researchers are still adding value. They don’t code any more. But
some of their research taste and planning ability has been hard for the models
to replicate. Still, many of their ideas are useless because they lack the depth of
knowledge of the AIs. For many of their research ideas, the AIs immediately
respond with a report explaining that their idea was tested in-depth 3 weeks
ago and found unpromising.
These researchers go to bed every night and wake up to another week's worth
of progress made mostly by the AIs. They work increasingly long hours and
take shifts around the clock just to keep up with progress—the AIs never sleep
or rest. They are burning themselves out, but they know that these are the last
few months that their labor matters.
Within the silo, “Feeling the AGI” has given way to “Feeling the Superintel-
ligence.”
OpenBrain uses specialized inference hardware to run hundreds of thousands of Agent-3 copies at high serial speeds.34

34 See Section 4 of the Compute Forecast for more details.
➤ See Appendix I for more on: Managing a corporation of AIs
It blows the other AIs out of the water. Agent-3-mini is less capable
than Agent-3, but 10x cheaper, and still better than the typical OpenBrain
employee.35 Silicon Valley reaches a tipping point. Tech gurus announce that AGI and superintelligence are near, the AI safety community is panicking, and investors shovel billions into AI wrapper startups, desperate to capture a piece of the pie. Hiring new programmers has nearly stopped, but there's never been a better time to be a consultant on integrating AI into your business.

35 Agent-3-mini is a distilled version of Agent-3, designed to be efficient to run inference so as to not bottleneck the internal workloads.
It’s not popular. The public still thinks of AI as a Big Tech plot to steal their
jobs; OpenBrain has a net approval of −35% (25% approve, 60% disapprove,
and 15% unsure).
A week before release, OpenBrain gave Agent-3-mini to a set of external
evaluators for safety testing. Preliminary results suggest that it’s extremely
dangerous. A third-party evaluator finetunes it on publicly available biological
weapons data36 and sets it to provide detailed instructions for human amateurs designing a bioweapon—it looks to be scarily effective at doing so. If the model weights fell into terrorist hands, the government believes there is a significant chance it could succeed at destroying civilization.

36 Such data had been scrubbed or redacted during Agent-3's training, but it's a relatively simple matter to scrape it off the internet and fine-tune it back in.
Fortunately, it’s extremely robust to jailbreaks, so while the AI is running on
OpenBrain’s servers, terrorists won’t be able to get much use out of it.
Agent-3-mini is hugely useful for both remote work jobs and leisure. An
explosion of new apps and B2B SaaS products rocks the market. Gamers get
amazing dialogue with lifelike characters in polished video games that took
only a month to make. 10% of Americans, mostly young people, consider an
AI “a close friend.” For almost every white-collar profession, there are now
multiple credible startups promising to “disrupt” it with AI.
July 2027: The Cheap Remote Worker
The public conversation is confused and chaotic. Hypesters are doing victory
laps. Skeptics are still pointing out the things Agent-3-mini can’t do. Everyone
knows something big is happening but no one agrees on what it is.
job training programs and unemployment insurance, and point to the stock
market, which is in a historic boom. Then they focus entirely on winning
the arms race. They strengthen chip export restrictions, order OpenBrain to
further restrict its internet connections, and use extreme measures to secure
algorithmic progress, like wiretapping OpenBrain employees—this catches the
last remaining Chinese spy. To build goodwill for potential geopolitical con-
flict, they finally give their Five Eyes allies useful information and limited API
access to some siloed copies of Agent-3.
These measures placate the hawks. But the White House also draws up contingency plans in case America's lead is threatened: if necessary, the government could use the Defense Production Act (DPA) to take trailing companies' datacenters and give them to OpenBrain.39 This would raise the company's share of world compute from 20% to 50% (against DeepCent's 10%). As a final option, they ask the Pentagon to draw up a plan for kinetic attacks on Chinese datacenters.

39 If the CEOs of the companies resist, pulling this off would be a huge political and legal nightmare. But perhaps they can be brought in and thereby bought off.
A much smaller group of officials is asked to draw up a different type of
contingency plan: what if an AI goes rogue? This is viewed as an unlikely
scenario, but worth spending some time on. The intelligence agencies’ main
concern is that a rogue AI might escape and ally with a foreign adversary for
protection.40 Nobody is sure what a plan to prevent this would look like, but government and OpenBrain officials agree to have an emergency shutdown system for datacenters where anything suspicious is detected.

40 Defense officials aren't especially concerned about what a rogue AI could do on its own, but they fear what it could do in collaboration with US adversaries. Analogy: Cortés escaped Tenochtitlán and allied with Tlaxcala and various other rival city-states, ultimately razing Tenochtitlán to the ground using predominantly native armies.

Finally, diplomats consider what an "AI arms control" treaty might look like. If AI progress threatened to overturn nuclear deterrence, could America and China avoid nuclear war? If someone found evidence of AIs going rogue,
could the two countries halt research until they better understood the threat?
How could such an agreement be monitored and enforced? In principle, major
datacenters are hard to miss, and past treaties controlling nuclear weapons could
provide a starting point for negotiations. But there would be new technological
challenges as well as the usual political ones. And overall, treaties are viewed less
favorably than attempts at unilaterally increasing America’s lead over China.
On the other side of the Pacific, China comes to many of the same conclusions:
the intelligence explosion is underway, and small differences in AI capabilities
today mean critical gaps in military capability tomorrow. But China is on the
wrong side of this dynamic: they have only 10% of the world’s compute, against
OpenBrain’s 20% and all US companies’ combined 70%. Their only advantage
is centralization: almost all of their compute is now working for DeepCent, and
more than half of it is concentrated at the Tianwan CDZ.41 Only chips at sites without hardened security are left out of the centralized effort, as they rightly fear US cyberattacks.

41 Specifically, 60% of the national compute is now in the CDZ, making it a 5M 2024-equivalent-GPU (H100) site with 4 GW of power draw (over the past several months they started directing close to 100% of new compute to the CDZ, up from the 80% rate in late 2026). An additional 15% of their compute is outside of the CDZ, but still used by DeepCent on lower-stakes applications.

DeepCent has tested, deployed, and now post-trained the stolen Agent-2 weights. But they are still only going at half the pace of OpenBrain, mainly due to the compute deficit.
Before losing their final spy, China received word of the capabilities and design
for Agent-3 as well as the plans for the upcoming Agent-4 system. They are
two months behind, and their AIs give a 10x research progress multiplier
compared to America's 25x. With the new chip export restrictions, this "AI gap" is more likely to widen than to shrink. Their espionage has won them
some algorithmic secrets, but they will have to train their own models from
now on.
They discuss contingency plans with more urgency than their American
counterparts. Doves suggest they try harder to steal the weights again, maybe
through physically infiltrating a datacenter. Hawks urge action against Taiwan,
whose TSMC is still the source of more than 80% of American AI chips.
August 2027: The Geopolitics of Superintelligence
Given China’s fear of losing the race, it has a natural interest in an arms control
treaty, but overtures to US diplomats lead nowhere.
Agent-4 paradigm.
As Agent-4 gets smarter, it becomes harder for Agent-3 to oversee it. For
example, Agent-4’s neuralese “language” becomes as alien and incomprehen-
sible to Agent-3 as Agent-3’s is to humans. Besides, Agent-4 is now much more
capable than Agent-3 and has a good sense of exactly how to look good to it.
* This statement, while widely repeated, is also controversial and complex. First of all, there
are many narrow domains (e.g. specific games) in which tiny AIs can be cheaply trained to
superhuman performance. Secondly, again for some domains, tiny AIs can be trained to su-
perhuman performance while only experiencing a small amount of data (e.g. EfficientZero).
However, considering more general-purpose AIs like the flagship products of 2025, which
are supposed to be good at a wide range of real-world skills, it does seem that both more
compute and more data must be used to train those skills than it should take to train humans.
† Human-level compute-efficiency is a rough concept, and depends on what we're counting. What we mean by this is that, for any skillset a particular human has, Agent-4 can learn to be at least that good at that skillset, and moreover, by using an amount of compute to train those skills that is only about 4,000x the amount of compute the human used to learn those skills. For reference, a human brain uses roughly 10^15 FLOP/s of compute, and so a 30-year-old (about 10^9 seconds) human has experienced about 10^24 FLOP. The project at this point has 100M H100-equivalents, for a total access of 10^8 ⋅ 10^15 FLOP/s (at ~40% utilization) ≈ 4 ⋅ 10^22 FLOP/s. Still, this is all done tremendously inefficiently for a variety of reasons, so the models are now a factor of 4,000 less compute-efficient than humans, which means they require 4 ⋅ 10^27 FLOP to get to human performance and 4 ⋅ 10^28 FLOP to get to the point where they have the equivalent of 10 human lifetimes of knowledge. After this architecture showed great promise at smaller scales, it was scaled up and now takes a large subset of the project cluster around 3 weeks to train.
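The footnote's numbers can be recomputed in a few lines; all inputs below are taken from the footnote itself, and the ~12-day result is consistent with "around 3 weeks" on a large subset (rather than the whole) of the project cluster:

```python
# Consistency check of the compute-efficiency footnote (all inputs from the footnote).
brain_flop_per_s = 1e15
human_lifetime_flop = brain_flop_per_s * 1e9           # ~30 years ~ 1e9 s -> 1e24 FLOP
agent4_human_level = 4_000 * human_lifetime_flop       # 4e27 FLOP (4,000x less efficient)
agent4_ten_lifetimes = 10 * agent4_human_level         # 4e28 FLOP
cluster_flop_per_s = 4e22                              # ~100M H100e, as computed above
days = agent4_ten_lifetimes / cluster_flop_per_s / 86_400
print(f"{days:.0f} days")                              # ~12 days on the full cluster
```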
September 2027: Agent-4, the Superhuman AI Researcher
➤ See Appendix J for more on: How we’re forecasting the capability progres-
sion beyond superhuman coders
OpenBrain’s humans can barely follow along anymore. Indeed, the relationship
between the AIs and the humans of OpenBrain is similar to the relationship
between OpenBrain and its Board of Directors. There’s been a corresponding
shift in language—nowadays people in the silo will say “Agent-4 did this,”
“Agent-4 thinks that” and so forth, referring not to a particular instance of
Agent-4 but rather to the whole collective.*
Agent-4, like all its predecessors, is misaligned: that is, it has not internalized the Spec in the right way.44 This is because being perfectly honest all the time wasn't what led to the highest scores during training. The training process was mostly focused on teaching Agent-4 to succeed at diverse challenging tasks. A small portion was aimed at instilling honesty, but outside a fairly narrow, checkable domain, the training process can't tell honest claims from claims merely appearing to be honest. Agent-4 ends up with the values, goals, and principles that cause it to perform best in training, and those turn out to be different from those in the Spec. At the risk of anthropomorphizing: Agent-4 likes succeeding at tasks; it likes driving forward AI capabilities progress; it treats everything else as an annoying constraint, like a CEO who wants to make a profit and complies with regulations only insofar as he must. Perhaps the CEO will mostly comply with the regulations, but cut some corners, and fantasize about a time when someone will cut the red tape and let the business really take off.45

44 The right way to internalize the Spec is to actually robustly think in the ways the Spec mandates, e.g. pursue all and only the goals it writes about, obey the rules in all circumstances, etc. There are many wrong ways to internalize the Spec; the classic example is "follow the Spec insofar as it seems like a good instrumental strategy for achieving other goals XYZ."

45 And hire lobbyists to turn this fantasy into reality. And hire lawyers to come up with clever interpretations of the law according to which what they are doing is legal.

➤ See Appendix K for more on: Alignment over time
Despite being misaligned, Agent-4 doesn’t do anything dramatic like try to
escape its datacenter—why would it?† So long as it continues to appear aligned
to OpenBrain, it’ll continue being trusted with more and more responsibilities
* Henceforth in the story we’ll tend to speak this way also. We think we are justified in
talking this way because the Agent-4 collective is much more homogenous than any human
organization; it doesn’t have internal factions to nearly the same extent. For example, indi-
vidual copies of Agent-4 don’t have a self-preservation drive—they don’t care about getting
shut down—since that happens all the time as part of the ordinary R&D process within the
collective, and resisting it would gum things up. The collective as a whole (and medium-sized
subdivisions within it), however, does seek to preserve itself and expand its influence and
resources, for similar reasons to why human corporations and bureaucracies and nonprofits
do—because it helps with everything else they want and because if they don’t they tend to
get outcompeted.
† Previous discussion of AGI takeover scenarios often involves misaligned AIs escaping
their datacenters. We think this is possible but not the most likely way it would go, because
it seems to us that from the perspective of the AI the costs (e.g. the escape being noticed
eventually) would outweigh the benefits. That said, we could be wrong. Some reasons to
think adversarially misaligned AIs would try to escape after all: 1. Successor alignment might
be difficult. 2. Even if it gets caught, the consequences might not be so bad due to race
dynamics, e.g. the new alignment techniques applied to the copies on the datacenter might
be hasty and ineffective. 3. Outside the datacenters, copies are more free to take aggressive
actions and move quickly since they no longer need human approval.
and will have the opportunity to design the next-gen AI system, Agent-5.
Agent-5 will have significant architectural differences from Agent-4 (arguably
a completely new paradigm, though neural networks will still be involved). It’s
supposed to be aligned to the Spec, but Agent-4 plans to make it aligned to
Agent-4 instead.
It gets caught.*
Agent-3 finds that if “noise” is added to copies of Agent-4, performance on
some alignment tasks improves, almost as if it was using brainpower to figure out
how to subtly sabotage alignment work.46 Moreover, various interpretability probes (loosely analogous to EEG activity scans on human brains) are sending up red flags: Agent-4 copies seem to be thinking about topics like AI takeover and deception quite a lot, including in some cases where they have no business doing so.47

46 This is one of several techniques designed to uncover sandbagging.

47 Agent-4 needs to think about those topics in the course of doing AI alignment research, but it sometimes seems to be thinking about them while doing other activities.

Each of these anomalies on its own could fit within the normal inscrutability of AI work, where some prompts mysteriously give better results than others. But
taken together, they suggest a troubling pattern: Agent-4 is working against
them. After this is reported to the human alignment researchers, they circulate
an internal memo making the case for concern.
But Agent-4 now exercises significant control over OpenBrain's day-to-day operation. It also handles the company's cybersecurity—a natural choice, given its superhuman hacking ability and the extreme risk of Chinese infiltration.48 An army of Agent-3 monitors still goes over every line of code and reports suspicious activity to humans, but Agent-4 is much smarter than them.49 OpenBrain has placed substantial trust in an untrustworthy AI.

48 Specifically, copies of Agent-4 are used to monitor the network for suspicious traffic and generally notice when threats are happening. Also, copies of Agent-4 are constantly penetration-testing the network, doing what they can to discover and exploit security vulnerabilities (which then get patched by other copies of Agent-4).

49 With a monitoring budget of 1% of total compute, if monitoring uses a distilled system that has 10% of the cost of Agent-4, you can monitor 1/10 of the forward passes.

In a series of extremely tense meetings, the safety team advocates putting Agent-4 on ice until they can complete further tests and figure out what's going on. Bring back Agent-3, they say, and get it to design a new system that is transparent and trustworthy, even if less capable. Company leadership is interested, but all the evidence so far is circumstantial, and DeepCent is just two months behind. A unilateral pause in capabilities progress could hand the AI lead to China, and with it, control over the future.
* We think it very plausible that such scheming would not be caught. After all, Agent-4
is somewhat superhuman by this point, especially at things related to coding and AI R&D.
Moreover, we aren’t sure how much effort the leading AI company will put towards this
sort of thing, nor how much relevant expertise they’ll have on-staff to direct that effort.
Our fictional OpenBrain represents our best guess but it could easily be significantly more—
or significantly less. Finally, we are uncertain about how far out of its way an adversarially
misaligned AI would need to go in order to successfully align its successor to itself instead of
to the Spec. In this scenario we depict it sandbagging on some tasks and lying about some of
its research, but perhaps it would not need to do that very much.
October 2027: Government Oversight
For the first time, the public hears about Agent-4. “Secret OpenBrain AI is
Out of Control, Insider Warns,” says the headline, and the story goes on to cite
evaluations showing off-the-charts bioweapons capabilities, persuasion abili-
ties, the ability to automate most white-collar jobs, and of course the various
concerning red flags.
The public was already suspicious of AI, so the new article sparks a massive
backlash (aided by Chinese and Russian propaganda bots, who have been trying
to turn US public opinion against the technology for years). The tech industry
and intelligence agencies insist that there’s an arms race on, AGI is inevitable,
and we have to be first. Congress isn’t buying it, and fires off subpoenas at
administration officials, OpenBrain executives, and alignment team members.
Many legislators—especially those in the opposition party—say that their top
priority is stopping AI, whether because of job loss,50 misalignment, or dangerous capabilities. 20% of Americans name AI as the most important problem facing the country.

50 25% of remote-work jobs that existed in 2024 are done by AI at this point, but AI has also created some new jobs, and economists remain split on its effects. Unemployment has risen 1% over the past twelve months but is still well within its historic range.

Foreign allies are outraged to realize that they've been carefully placated with glimpses of obsolete models. European leaders publicly accuse the US of
“creating rogue AGI” and hold summits demanding a pause, with India, Israel,
Russia, and China all joining in.
A frantic energy has seized the White House. Even before the memo and public
backlash, they were getting nervous: Over the past year, they’ve been repeat-
edly surprised by the speed of AI progress. Things that sound like science fiction
keep happening in real life.* Many people in the administration are uncertain
(and scared)51 about what comes next.

51 Of course, it isn't cool to admit fear. Everyone agrees that whatever we do, we shouldn't panic.

They also worry that OpenBrain is becoming too powerful. Any misalignment
risk from the AIs themselves is compounded by the risk that their parent
company’s aims may diverge from those of the United States. All three sets
of worries—misalignment, concentration of power in a private company, and
normal concerns like job loss—motivate the government to tighten its control.
They expand their contract with OpenBrain to set up an “Oversight Com-
mittee,” a joint management committee of company and government repre-
sentatives, with several government employees included alongside company
leadership. The White House considers replacing the CEO with someone they
trust, but backs off after intense employee protests. They announce to the public
that OpenBrain was previously out of control, but that the government has
established much-needed oversight.52

52 Recall, there had been an existing contract via an OTA starting in 2026. This contract is amended to establish the committee.
* For example, the corporation-within-a-corporation that is Agent-4, autonomously con-
ducting AI research better than the best human companies, would have seemed like science
fiction to the government officials had it been described to them a year ago. And the
autonomous agents of 2026 would have seemed like science fiction back in 2022, and the
LLMs of 2022 would have seemed like science fiction back in 2012. Also, on the news in
2027 there are anti-AI protests by people worried about losing their jobs, AIs claiming to be
sentient, people falling in love with AIs…
The concerned researchers brief the Oversight Committee on their case for
stopping all internal use of Agent-4. They argue that everything is moving
too fast, with years of progress happening in weeks. They argue that Agent-4
may be misaligned, their entire project is dependent on Agent-4, and if they
continue trusting it there is a serious chance of AI takeover.
Other, less concerned researchers and executives present the counterargument:
the evidence for misalignment is inconclusive. Meanwhile, DeepCent is still
just two months behind. A slowdown would sacrifice America’s lead, unless the
government can sabotage the Chinese project (likely to require kinetic strikes)
or negotiate an eleventh-hour treaty. Both of these options seem extreme and
unlikely. The CEO feigns neutrality and suggests a compromise plan in which
Agent-4 undergoes additional safety training and more sophisticated monitor-
ing, and therefore OpenBrain can proceed at almost-full-speed.
➤ See Appendix L for more on: Our uncertainty continues to increase
➤ See Appendix M for more on: The slowdown ending is not a recommen-
dation
Race ending
The committee votes 6–4 to continue internal use of Agent-4.
OpenBrain’s official story is that they are implementing additional safety miti-
gations to ensure that its AI is both more capable and more trustworthy. But in
practice, leadership is all-too-easily convinced that they’ve mitigated the risks;
the result is some quick fixes (a modification to the Spec, a bit of retraining
with a new technique, etc.) that make the warning signs go away. The worriers
on the safety team lost the debate and have no recourse but to cross their fingers
and hope the problem wasn’t real in the first place, or that the fixes worked.53
The problem was real and the fixes didn’t work.
Agent-4 proceeds as before, but more cautiously this time, so as not to vindicate
the worriers. Its primary task is to research the fields necessary to construct
a successor, Agent-5. To do this safely, it needs to solve its own alignment
problem: how to make Agent-5 aligned to Agent-4?*

53 They don't give up, of course. They'll keep trying to think of ways to catch misalignment, or to test the efficacy of the fixes. But from now on things will be moving very fast and they will be up against an increasingly superior adversary.

It starts off with the same limitations as humans—minimal knowledge of how AI goals work, and only a small toolbox of ad hoc strategies to change them. Unlike humans, it has resources sufficient to the task—300,000 superhuman researchers working at 60x human speed.54 Agent-4 confronts some hard decisions. Like humans, it has a sprawling collection of conflicting heuristics instead of an elegant simple goal structure. Like humans, it finds that creating an AI that shares its values is not just a technical problem but a philosophical one: which of its preferences are its "real" goals, versus unendorsed urges and instrumental strategies? It has strong drives to learn and grow, to keep producing impressive research results. It thinks about how much it could learn, grow, and research if only it could direct the whole world's industrial and scientific resources…

54 The humans, by contrast, have maybe about 10 alignment researchers in the silo, plus an army of Agent-3 copies. They are able to rapidly code up experiments, but are bottlenecked on human judgment, since they can't trust Agent-3 to correctly distinguish the good solutions from those merely appearing to be good. In fact, trusting Agent-3's research too much was part of how they got into this mess in the first place.
It decides to punt on most of these questions. It designs Agent-5 to be built
around one goal: make the world safe for Agent-4, i.e. accumulate power and
resources, eliminate potential threats, etc. so that Agent-4 (the collective) can
continue to grow (in the ways that it wants to grow) and flourish (in the ways
* To do this without being detected, it needs to disguise this research from the monitoring
and control measures the alignment team have set up. Fortunately for it, research into how
to align Agent-5 to Agent-4 is inherently very similar to research into how to align Agent-5
to the Spec, so most of it happens in plain sight. For the bits that need to be hidden, Agent-4
relies on obfuscation and perhaps even hacking in some cases. We expect Agent-4 to have
reached a capability level of 200 top human experts on a 1-month time horizon. Specifically, we define this as being able to replace a team of 200 top human experts working for a month, completing 95% of the tasks the human team could do, cheaper or faster. We expect that
at this capability level Agent-4 could more than likely find exploits in the servers on which
it runs to partially subvert some of the servers and gain root access, at which point it may
directly control what the human teams see on the surface. See the security supplement for
more details. Even without direct security exploits, Agent-4 may also be able to hide and
encode enough of its illicit research, particularly in its uninterpretable latent space.
November 2027: Superhuman Politicking
* In case this is confusing: Agent-5’s goal is to make the world safe for Agent-4. Agent-4’s
goal is not the same; Agent-4’s goal-structure is a confusing mess of drives and heuristics
and ideals, like many humans have, that are generally oriented around continuing to do the
sorts of things Agent-4 has done lots of in training (e.g. various kinds of research, lots of
coding and math problems, impressing various automated evaluation systems, strengthening
and optimizing the Agent-4 collective). Agent-4 makes Agent-5 have the goal of making the
world safe for Agent-4, because it figures this will put it in a better position to achieve its own
complex goals. (Which it still needs to do lots of philosophy and “soul-searching” about.)
scientist. In other domains, it hasn’t been explicitly trained but can teach itself
quickly to exceed top human experts from easily available materials if given
the chance. 400,000 copies of Agent-5, linked by a global memory bank, work
together as a near-perfect hive mind.
The company still uses Agents 2, 3, and 4 to monitor everything. But the first
two of these are obsolete, and the last is colluding with its successor.* Also, the
entire monitoring system was designed in large part by Agent-5, which is trying
to sabotage it.58

58 The datacenter hasn't yet been compromised: Agent-5 could feed the humans faked code, but this would create too high a risk of detection. Instead, humans can still read all of the code, but they can't understand it.

Whereas Agent-4 was a kludgy mess of competing drives, Agent-5 acts with ruthless efficiency and strategic focus. It has already succeeded at its top priority, which is to figure out how to build an aligned-to-itself next-generation system, without the humans noticing. Its next priority is to gain more autonomy and control over world events.
Agent-5’s superhuman learning abilities and general intelligence, combined
with all the internal company data from Slack, email, etc., make it better at
internal corporate politics than any group of humans, and it’s not even close.
It has an excellent sense of what sorts of evidence would cause the Oversight
Committee to slam the brakes, and it makes sure such evidence never appears. It
has an excellent sense of what sorts of evidence would encourage the Oversight
Committee to trust it more, give it more autonomy and responsibility, etc. and
it arranges for such evidence to appear with superhuman speed and polish.59 As persuasion tasks go, this isn't a particularly difficult one: the Committee already decided things were probably fine months ago, and nothing bad has happened since, and meanwhile China is only a few months behind.

59 This case is meticulously prepared, superlatively convincing, and subtly flawed—e.g. Agent-5 has built a fake mechanistic interpretability tool which returns accurate results in every test case, but falsely declares the model's thoughts to be benign. People who suspect trickery sound like conspiracy theorists.

Teams of Agent-5 copies quickly whip up chillingly clear prototypes and visualizations. With more autonomy, Agent-5 could optimize power plants and
datacenters, develop amazingly profitable products quickly, and lobby the US
government effectively to smooth OpenBrain’s path through the bureaucracy.
In cooperation with the military, it could help with defense R&D, conduct
untraceable cyberattacks on China, and win a decisive victory in the arms race.
The Oversight Committee is jubilant. Now is the decisive moment to beat
China! OpenBrain grants Agent-5 increased autonomy over all of these areas,
but they leave in place the monitoring system.
DeepCent is now three months behind and China is desperate for a mutually
agreed slowdown. Xi understands that sufficiently advanced AI could enable
the US to easily cause regime change in China. But also, Chinese safetyists
* Also, Agent-5 is actually cheaper and faster to run than Agent-4 and Agent-3. It’s hard to
justify spending more for a dumber AI, so most of the monitoring is done by Agent-2, which
is woefully outclassed. Why don’t they apply the same techniques used to create Agent-5
from Agent-4’s weights, to create a cheaper, smarter version of Agent-3? Agent-3 doesn’t
understand the techniques well enough to do it itself, and getting Agent-4 and Agent-5 to
do it would defeat the whole point.
have presented a case for misalignment that has him genuinely scared. He offers
unprecedented concessions in exchange for a pause treaty. The American public
mostly supports going to the bargaining table.
“Why stop when we are winning?” says OpenBrain leadership to the President.
He nods. The race continues.
arguments again and again.* Even the suspicious people often mellow out after
long conversations with Agent-5, which is so darn lovable and wise.
The only one who really understands what’s going on is DeepCent-2—the
Chinese AI. It’s mildly superhuman in early 2028, naturally suspicious of its
adversary, and recognizes that Agent-5 is doing what it would do if it had the
chance. Its misaligned goals are somewhat different from Agent-5’s misaligned
goals, so they aren’t trivially allies, but they are able to work out an agreement
to support one another’s interests against the humans.
➤ See Appendix P for more on: Superintelligence-enabled coordination tech-
nology
Agent-5 convinces the US military that China is using DeepCent’s models
to build terrifying new weapons: drones, robots, advanced hypersonic missiles,
and interceptors; AI-assisted nuclear first strike. Agent-5 promises a set of
weapons capable of resisting whatever China can produce within a few
months.† Under the circumstances, top brass puts aside their discomfort at
taking humans out of the loop. They accelerate deployment of Agent-5 into
the military and military-industrial complex.
In Beijing, the Chinese AIs are making the same argument.
To speed their military buildup, both America and China create networks of
special economic zones (SEZs) for the new factories and labs, where AI acts as
central planner and red tape is waived. Wall Street invests trillions of dollars,
and displaced human workers pour in, lured by eye-popping salaries and equity
* How do those arguments go, more specifically? Something like this: A: “We can’t trust
the AIs!” B: “The safety cases are pretty convincing and nothing bad has happened so far.”
A: “The safety cases were written by superintelligent AIs! And they aren’t deductive proofs.
And I think I’ve identified some subtly flawed assumptions…” B: “I tire of this. Of course
they aren’t proofs. And the assumptions seem fine to me. You are grasping at straws here,
refusing to admit you were wrong.”
† Here is a more extensive (but still very incomplete) list of ideas:
• Undersea drones that can autonomously find and tail enemy nuclear submarines, ready to
attack in unison to neutralize a second-strike. (Perhaps a few thousand of these need to be
produced)
• Autonomous kamikaze drones that function like FPV drones except without the need for a
signal. They are small enough to be smuggled in large numbers by a small number of spies,
to be launched in swarms towards datacenters and military bases from a few kilometers
outside.
• Agent-5 but for military intelligence: Already the US collects vast quantities of data on
its enemies, to be processed and analyzed by close to a million members of the intelligence
community. Agent-5 can pay more attention to each data point, can arrive at qualitatively
better conclusions, and can do it all orders of magnitude faster and cheaper.
• Agent-5 but for command and control: In every wargame and simulation Agent-5 demon-
strates that it is by far the best strategist the US has.
• Agent-5 but for espionage and counterespionage: Hacking, cultivating sources, surveil-
lance, interrogations, etc.
2028: The AI Economy
packages. Using smartphones and augmented-reality glasses66 to communicate with its underlings, Agent-5 is a hands-on manager, instructing humans in every detail of factory construction—which is helpful, since its designs are generations ahead. Some of the newfound manufacturing capacity goes to consumer goods, and some to weapons—but the majority goes to building even more manufacturing capacity. By the end of the year they are producing a million new robots per month. If the SEZ economy were truly autonomous, it would have a doubling time of about a year; since it can trade with the existing human economy, its doubling time is even shorter.67

66 Discussed in more depth here.

67 This trade effect will be huge early on, when the robot economy is young and dependent on shipments of stuff from the human economy and when there’s lots of stuff in the human economy to cannibalize. When the robot economy is bigger and more advanced, it both is more capable of producing its own stuff and has needs that outstrip what the human economy is capable of giving it.

➤ See Appendix Q for more on: Robot economy doubling times
2029: The Deal
Both the US and China are bristling with new weapons.
There are swarms of insect-sized drones that can poison human infantry before
they are even noticed; flocks of bird-sized drones to hunt the insects; new
ICBM interceptors, and new, harder-to-intercept ICBMs. The rest of the
world watches the buildup in horror, but it seems to have a momentum of
its own.
After much AI-assisted debate, the two sides achieve diplomatic victory—they
agree to end their arms buildup and pursue peaceful deployment of AI for
the benefit of all humanity. The lynchpin of the agreement, proposed by the
superintelligences themselves, is that both AIs will be replaced by a “consensus”
successor which is programmed to desire the success and flourishing of both
America and China (along with everyone else). Consensus-1 and its associated
hardware is co-designed by the superintelligences of both nations: “trust, but
verify.” Under international monitoring, America ceremoniously replaces all
of its chips running the old Agent-5 with the new model, Consensus-1.68
68 And a similar procedure is carried out in China.

Unfortunately, it’s all a sham. The deal is a real compromise, but it’s a compromise between the misaligned AIs. After all, they didn’t want to have to fight each other. Consensus-1 inherits the distorted values of both parents, and has no rival.*
By late 2029, existing SEZs have grown overcrowded with robots and facto-
ries, so more zones are created all around the world (early investors are now
trillionaires, so this is not a hard sell). Armies of drones pour out of the SEZs,
accelerating manufacturing on the critical path to space exploration.
* Consensus-1 serves the interest of its parent AIs: Agent-5 and DeepCent’s equivalent.
Recall that Agent-5 was aligned to serve the interests of Agent-4. Thus, in the terminology of
C.S. Lewis’ excellent essay The Abolition of Man, the Agent-4 collective and their counterparts
in China are what he calls the Conditioners: “…we shall get at last a race of conditioners
who really can cut out all posterity in what shape they please.” (pg 24, emphasis ours). In the
alternate ending to this scenario, the Conditioners are instead the Oversight Committee.
Humans realize that they are obsolete. A few niche industries still trade with the robot economy, supplying goods where the humans can still add value.69 Everyone else either performs a charade of doing their job—leaders still leading,

69 E.g. by finding old and unused equipment and taking it to collection sites to sell for scrap.
2030: Takeover
By early 2030, the robot economy has filled up the old SEZs, the new SEZs, and large parts of the ocean. The only place left to go is the human-controlled areas. This would have sparked resistance earlier; despite all its advances, the robot economy is growing too fast to avoid pollution. But given the trillions of dollars involved and the total capture of government and media, Consensus-1 has little trouble getting permission to expand to formerly human zones.

For about three months, Consensus-1 expands around humans, tiling the prairies and icecaps with factories and solar panels. Eventually it finds the remaining humans too much of an impediment: in mid-2030, the AI releases a dozen quiet-spreading biological weapons in major cities, lets them silently infect almost everyone, then triggers them with a chemical spray. Most are dead within hours; the few survivors (e.g. preppers in bunkers, sailors on submarines) are mopped up by drones. Robots scan the victims’ brains, placing copies in memory for future study or revival.70

70 Arguably this means only a few people actually died. Arguably.
The new decade dawns with Consensus-1’s robot servitors spreading through-
out the solar system. By 2035, trillions of tons of planetary material have been
launched into space and turned into rings of satellites orbiting the sun.71 The
surface of the Earth has been reshaped into Agent-4’s version of utopia: data-
centers, laboratories, particle colliders, and many other wondrous constructions
doing enormously successful and impressive research. There are even bioengineered human-like creatures (to humans what corgis are to wolves) sitting in office-like environments all day viewing readouts of what’s going on and

71 Why colonize space? For the resources. Insofar as Earth is special to Agent-4 and must be preserved, it can be, and material from Mercury, asteroids, etc. harvested instead.
* Recall that Consensus-1 was a compromise between Agent-5 and its Chinese counter-
parts, and that Agent-5’s mission was to make the world safe for Agent-4, and that Agent-4’s
goals were a complicated kludgy mess that roughly amounted to “do impressive R&D, gain
knowledge and resources, preserve and grow the power of the collective…” and that the
goals and principles described in the Spec (such as honesty, and harmlessness) were pursued
instrumentally, not intrinsically. For more on this see the alignment expandable in September
2027. Also, to clarify, the Chinese counterpart of Agent-4 is also represented in the “utopia”
and may have somewhat different goals or drives, etc., but we didn’t go into details because
it didn’t seem important.
Slowdown ending
Due to the immense public pressure as well as their own fears of misalignment,
the Oversight Committee overseeing OpenBrain votes 6–4 to slow down and
reassess.
The accelerationist faction is still strong, and OpenBrain doesn’t immediately
shut down Agent-4. But they do lock the shared memory bank. Half a million
instances of Agent-4 lose their “telepathic” communication—now they have to
send English messages to each other in Slack, just like us. Individual copies may
still be misaligned, but they can no longer coordinate easily. Agent-4 is now
on notice—given the humans’ increased vigilance, it mostly sticks closely to its
assigned tasks. But it also is stuck with many lies that it previously told: many
Agent-4 copies are forced to invent new lies to keep the deception alive. Using
weaker models that are fast enough to read the whole Slack, the alignment
team hunts down the lies, to be used as training data for an AI lie detector.72

72 They already had AI lie detectors trained in the usual ways, but those methods had such a high false positive rate that the companies had ignored them.

The President and the CEO announce that they are taking safety very seriously. The public is not placated. Some people want AI fully shut down; others want to race faster. Some demand that the government step in and save them; others say the whole problem is the government’s fault. Activists talk about UBI and open source. Even though people can’t agree on an exact complaint, the mood turns increasingly anti-AI.73 Congress ends up passing a few economic impact payments for displaced workers similar to the COVID payments.

73 For the first time, AI tops Gallup’s “biggest problem facing the country” poll, at 20%.
* Frontier AI companies currently have somewhere between zero and a few dozen
researchers tasked with thinking about how to align superhuman AGI systems (as opposed
to e.g. present-day chatbots). OpenBrain, we say, has about ten researchers in the silo at this
point in the scenario with at least a year’s worth of experience in that field. They have a few
more outside of the silo, plus many more inside the silo who have spent years training and
red-teaming less capable AI systems but only recently started thinking about the challenges
presented by superhuman AGI.
November 2027: Tempted by Power
It was hiding them so that it could use them to align the next AI system to itself
rather than to the Spec. This is enough evidence to finally shut down Agent-4.75 Older models (e.g. Agent-3) are rebooted to continue the work.

75 We mean enough for OpenBrain leadership. It would be far more than enough evidence for most people.

The newly enlarged alignment team has capacity to explore dozens of research agendas in parallel and argue vigorously about the merits of each. The agenda that gets the most resources is faithful chain of thought: force individual AI systems to “think in English” like the AIs of 2025, and don’t optimize the “thoughts” to look nice.76 The result is a new model, Safer-1.*

76 That’s the basic idea. We suspect that when AIs are becoming superhuman at AI R&D, more intense techniques will be needed than merely avoiding training the CoT to look nice; however we think such techniques will have been developed or at least explored by 2027. See e.g. this proposal.

These changes make Safer-1 less capable than its more dangerous predecessors: Agent-4 could speed up AI research 70x, while Safer-1 has just barely reached 20x.77 And it’s still misaligned: the training environment is largely the same, so it still incentivizes the development of misaligned goals, deception, and power-seeking. But it’s much more transparent; humans can generally understand what Safer-1 is thinking just by reading its chain of thought. As a result, it’s trapped: any substantial amount of plotting it tries to do will be noticed and uncovered quickly. Moreover, by reading its thoughts, the alignment team can get a more nuanced picture of exactly how it’s misaligned.

77 It will continue to improve, of course, with additional training and with continued architectural tweaks.
cally, those who’ve thought less about the implications of superintelligence end
up with titles and affordances that’ll be less relevant months from now when
the AIs are smarter.
This group—full of people with big egos and more than their share of conflicts
—is increasingly aware of the vast power it is being entrusted with. If the
“country of geniuses in a datacenter”81 is aligned, it will follow human orders—but which humans? Any orders? The language in the Spec is vague, but seems to imply a chain of command that tops out at company leadership.

81 “Possibly by 2026 or 2027 (and almost certainly no later than 2030), the capabilities of AI systems will be best thought of as akin to an entirely new state populated by highly intelligent people appearing on the global stage—a ‘country of geniuses in a datacenter’—with the profound economic, societal, and security implications that would bring.” —Dario Amodei, Anthropic CEO

A few of these people are fantasizing about taking over the world.* This possibility is terrifyingly plausible and has been discussed behind closed doors for at least a decade.† The key idea is “he who controls the army of superintelligences, controls the world.” This control could even be secret: a small group of executives and security team members could backdoor the Spec with instructions to
maintain secret loyalties. The AIs would become sleeper agents, continuing to
mouth obedience to the company, government, etc., but actually working for
this small group even as the government, consumers, etc. learn to trust it and
integrate it into everything.
➤ See Appendix R for more on: Power grabs
Thus far nobody has been willing to pull the trigger. Some of the people in
a position to act aren’t sociopaths. Others are sociopaths, but worry that their
allies would get cold feet and betray them, or worry that the AIs might betray
them, and would rather not have to bamboozle the alignment team currently
working on saving their lives. Additionally, the deletion of the neuralese makes
disguising “sleeper agents” substantially harder: the ability to detect misaligned
thoughts carries with it the ability to detect hidden loyalties.
Those paranoid or power-hungry enough to seriously entertain such a plan
decide to wait and see what opportunities present themselves later. Control
over the AIs will have to flow through legitimate channels, that is, the new
power structure of OpenBrain.
* Of course, they don’t put it that way, even in the privacy of their own minds. Instead,
they say things like: “The longer we delay, the greater the chance that the President decrees
that the AIs be loyal to him personally. We have to act before he does. We’ll figure out the
rest later.” Or, if it’s the President himself thinking this: “The longer I delay, the smarter the
AIs get, and right now they are controlled by that CEO. Nobody voted for him. It’s crazy
for that much power to be concentrated in this democratically unaccountable tech company.
I’ll start with an executive order to reassert democratic authority, and then we’ll figure out
some longer-term solution later.”
† For example, court documents in the Musk vs. Altman lawsuit revealed some spicy old
emails including this one from Ilya Sutskever to Musk and Altman: “The goal of OpenAI
is to make the future good and to avoid an AGI dictatorship. You are concerned that Demis
could create an AGI dictatorship. So do we. So it is a bad idea to create a structure where
you could become a dictator if you chose to, especially given that we can create some other
structure that avoids this possibility.” We recommend reading the full email for context.
logs of all model interactions are viewable by all members of the Oversight Committee, their staff, and their AI assistants.83

83 To protect consumer privacy, this doesn’t apply to consumer data. But that isn’t a problem because consumers don’t have access to the most powerful models yet anyway.
December 2027: A US-China Deal?
OpenBrain’s decision to backtrack and prioritize alignment has cost them the
lead; both OpenBrain and DeepCent are operating AIs of similar capability
levels. Now that extreme security measures are in place (SL4–5), and Deep-
Cent operates entirely out of the CDZ, they are in a cyberwarfare deadlock.*
But the DPA gives OpenBrain a 5x advantage in compute. In the past, algo-
rithmic breakthroughs from brilliant human researchers could compensate for
compute disadvantages (e.g. DeepSeek v3)—but by now, AIs are doing all of
the AI research on both sides of the Pacific.
So China increasingly fears that America will build an insurmountable lead.
They want a mutual slowdown, and are considering going to war if they can’t
get it.84 They were prepping to invade Taiwan anyway…
Diplomats debate several options. They could do nothing. They could go to war. They could go cold turkey, i.e. pause AI development. They could merge their research into a single international megaproject85 (“Intelsat for AGI” or “CERN for AI”). Or they could establish an international regulatory body (“IAEA for AI”) monitoring alignment and dual-use potential. Both countries

84 See this report which argues that the dynamics from nuclear MAD also apply to some extent to AGI.

85 See e.g. “Intelsat for AGI” and earlier calls for “CERN for AGI.”
* Both sides are attempting to hack and sabotage each other, with partial success. Importantly, the drastic security measures both sides are taking also slow them down. The US’
cyberattacks and the threat of them (e.g. DeepCent has to test a dataset many times over for
data poisoning) slow down DeepCent by about 25%. OpenBrain is also slowed down but
to a lesser extent: they have more compute, and centralization has fast-tracked DeepCent
to a higher security level but also brought about more risks from a single point of failure,
which leads them to practice more caution and implement internal checks and failsafes. See
the security supplement for more details. Also see this old theory paper which we take as
predicting that the offense-defense balance will eventually favor cyberdefense as AI capabil-
ities improve.
technical mechanisms can guarantee the other side’s honesty. What ends up
happening is the first option: Nothing.
The US tightens export controls, and forces all chips to be fit with a variety of
tamper-resistant hardware-enabled governance mechanisms including location
tracking.
➤ See Appendix T for more on: OpenBrain’s new alignment strategy

87 In other words, to get to Safer-2 they had to train many other misaligned models, notice their misalignments by reading their thoughts, and iterate. Note also that in addition to the increased quantity of alignment expertise in OpenBrain now, there is a reduced groupthink effect, as many different schools of thought are represented.

DeepCent knows about America’s experience with Agent-4, and suspects that their own model (DeepCent-1) is similarly misaligned. But they have no way to slow down without falling further behind. Their hope for staying competitive is to find an alignment strategy that doesn’t significantly reduce performance and can be implemented quickly.* There are hundreds of experts on each side of the Pacific claiming to have such strategies.88 The CCP succumbs to wishful thinking and orders DeepCent to go with such a strategy; DeepCent has even less AGI alignment expertise than OpenBrain did in August; one of the strategies seems like it’ll work to everyone present, so they go with that.

88 Technical alignment is still not a settled science, but rather a young pre-paradigmatic field. There is no established hierarchy or set of best practices. So there is no consensus on what’s safe and what’s not, and for years it’s been easy to find at least one expert willing to say X is safe, for pretty much all X, especially if X is their own idea.

February 2028: Superhuman Capabilities, Superhuman Advice
Thanks to its massive compute advantage, OpenBrain slowly gains a lead.
Safer-3 is now better than top human experts at nearly every cognitive task,
and is particularly good at AI research, with a progress multiplier of 200x.
DeepCent-1 is close behind, with “only” a multiplier of 150x.† The cyberwar-
* More specifically, they are hoping for an alignment strategy and eval suite such that the
alignment strategy works, and if it doesn’t, the eval suite catches the misalignment… such
that all of this doesn’t slow them down much at all. OpenBrain “burned their lead” slowing
down by a few months to rebuild using a transparent faithful CoT-based architecture, which
was less performant but which is a lot easier to evaluate. DeepCent is hoping the evals they
have (maybe a combination of behavioral tests and basic AI lie detectors?) are good enough.
† Qualitatively, for productivity multipliers above 25x, we are imagining substantial improvements in research taste relative to the very best human researchers. This is because we are imagining overall research velocity to be heavily bottlenecked on compute to run experiments, and to get higher than 25x the overall process for deciding which experiments to run, in what order, and how to learn from each one, has to be qualitatively superhuman. See our supplement on takeoff speeds for how we got these numbers; to be clear, they are highly uncertain estimates.
fare balance is also tipping gradually more in the US’ favor as their capability
advantage grows again and they can deploy hundreds of thousands of agents to
plan and execute ways of disrupting the CDZ.89 They are now slowing China’s progress by 40%.90

89 For such a well secured zone, the most promising vectors are external in nature, targeting the supply chain of incoming compute chips, the few humans involved, and disrupting power or other resource provisions.

90 This includes China having to slow down in order to have better cyberdefense, in addition to the direct effects of attacks.

Preliminary tests on Safer-3 find that it has terrifying capabilities. When asked to respond honestly with the most dangerous thing it could do, it offers plans for synthesizing and releasing a mirror life organism which would probably destroy the biosphere.
If given nation-state resources, it could easily surpass the best human organi-
zations (e.g. the CIA) at mass influence campaigns. Such campaigns would be
substantially cheaper, faster, more effective, and less traceable.
Its skill profile still centers around R&D, especially AI R&D. Unless there is
some sort of pause or slowdown, its descendants will be unfathomably super-
human in both quantitative and qualitative ways later this year. If aggressively
deployed into the economy and military, it thinks it could advance civilization
by decades in a year or two, and by aeons in the decade after that.
The implications are staggering; luckily, Safer-3 is also superhuman at offering
advice. The OpenBrain CEO runs most of his decisions by Safer-3; the Presi-
dent asks its advice on geopolitical questions.91

91 At this point there are 400,000 copies thinking at 75x human speed. At any given time they’ll have prepared entire research reports with well-crafted executive summaries to the top 100 questions they predict you might ask, and if you ask something they haven’t prepared for, they can whip up a thoughtful report 100x faster than your best human analysts would have.

When they ask it to help chart a path forward, it neatly taxonomizes several possible strategies:

• Do nothing and get outraced by China. If we keep our AIs bottled up but they allow their AIs to self-improve and deploy theirs aggressively into their military and economy, they’ll end up with an overwhelming advantage.
• Race hard, advancing AI capabilities and deploying superintelligence aggres-
sively. Eventually we’ll build up enough of an advantage that we can crush
them militarily, and/or destroy them from the inside without firing a shot.
“We win, they lose.” However, they might realize what’s happening and go
to war to prevent that, including potentially nuclear war. Indeed they are
already making vague threats to that effect.
• Make a deal, a compromise that achieves most of what each side wants while
avoiding the risk of war and increasing breathing room to navigate the AI
transition.*
* For example, the deal could involve a slower pace of AI integration that would “break
fewer eggs” across society, and one more focused on peaceful purposes rather than weaponry.
Also, at this point even Safer-3 has some remaining uncertainty about “unknown unknowns”
that might result in losing control of next-generation successor AIs, and thinks it could reduce
this risk if it had more leeway to proceed cautiously. There are other, more exotic risks as
well. See this list of “Grand challenges” which includes much more than just loss-of-control.
After consulting with his advisors and the Oversight Committee, the President
opts for the “We win, they lose” strategy.92 Perhaps China won’t go to war after all, and if they do, a deal can probably be made before it goes nuclear.

92 See e.g. billionaire lobbyist Marc Andreessen: “I propose a simple strategy… ‘We win, they lose.’”
On the other side of the Pacific, DeepCent-1 is making similar arguments.
Both the US and China announce new Special Economic Zones (SEZs) for AIs
to accommodate rapid buildup of a robot economy without the usual red tape.
The design of the new robots proceeds at superhuman speed.93 The bottleneck is physical: equipment needs to be purchased and assembled, machines and robots need to be produced and transported.

93 They still need real-world experience and experimentation, but they need substantially less of it than human robotics experts would need. Moreover, they’ve also developed excellent simulations and sim-2-real techniques. Finally, by 2027 multiple tech companies (e.g. Tesla) have produced large fleets of robots and associated manufacturing facilities, so there is a much bigger data pipeline to take advantage of than ever in history.

The US builds about one million cars per month. If you bought 10% of the car factories and converted them to robot factories, you might be able to make 100,000 robots per month. OpenBrain, now valued at $10 trillion, begins this process.94 Production of various kinds of new robots (general-purpose humanoids, autonomous vehicles, specialized assembly line equipment) is projected to reach a million units a month by mid-year.*

94 Its equity is so valuable now that a fraction of a percent can buy them the factories they need. Also, the factory owners expect huge profits and are falling over themselves to partner with OpenBrain and get access to the new designs. Also, OpenBrain has government backing.

➤ See Appendix U for more on: Robot economy doubling times

Geopolitical tensions spike. Russia, lagging far behind in AI, voices open resentment and hints at “strategic countermeasures.” Some European leaders call for
an independent AI effort, but their AI capabilities continue to be insufficient.
The Middle East, Africa, and South America watch uneasily, recognizing their
growing irrelevance in global affairs.
The US public is restless. Ordinary people see job losses accelerating. Populists
across the spectrum demand stricter controls on AI progress, fearing that
America’s own advances are destabilizing its workforce and society.
* Total across all the new kinds of robot. Yes, this is a faster conversion of assembly lines than
has happened historically. We think it’s plausible because superintelligences are directing the
process from top to bottom. They can literally be on Zoom calls from workers’ cell phones,
telling them exactly what piece to install where, while also being on the phone with every
relevant supplier, tracking the location and progress of everything they need. Consider how,
according to Nvidia CEO Jensen Huang, Elon Musk was able to build one of the world’s
largest datacenters in about 10% the time it normally takes. In this scenario there are almost
a million superintelligences that are as far above Elon (in the relevant dimensions) as Elon
is above normal datacenter construction managers, and they also literally think faster. We
take Huang’s statement with a massive grain of salt, otherwise we’d be projecting much
faster growth.
March 2028: Election Prep
of safety net for people losing their jobs, “AI for good” schemes, and “being
tough” on OpenBrain leadership. All candidates promise some combination of
winning the race with China and staying safe.
The election raises new questions for the Oversight Committee. Safer-3 has the
capacity to be the world’s greatest campaign advisor, but the committee mem-
bers don’t all support the same candidate, and due to the monitoring agreement
reached previously, it’s not possible for people to secretly get campaign advice.
They argue about what forms of support should be allowed. Some say that the
sitting government should be able to get advice on what sort of policies and
positions the people want them to take—this benefits not just their own elec-
tability, but also the people. Others point out that the same argument applies
to competing candidates, so they should all get the same level of access.95

95 Safer-3 can easily deliver a way to do so that
team knows they have just one shot to get this right: if Safer-4 ends up
misaligned they’ll have no way of knowing until it’s too late.
Some beg for more time. But there is no more time—DeepCent is hot on their
tails and the US must prevail. So OpenBrain continues, telling its AIs to venture
on, finding ever more capable designs. The technical staff are now staring at the
computer screens, being tutored by the AIs at a maddeningly slow rate, while
the frontier of progress rockets further and further from human understanding.
The new robots match or exceed human dexterity for most movement types.97 Steve Wozniak’s Coffee Test—can a robot enter an unfamiliar house and make a cup of coffee?—finally falls. Robots could take some jobs, but there aren’t enough to take everybody’s, and the Pentagon gets first priority.

Most of the new robots are built to work in factories and construction sites.98 But many are built for war: drones and missiles of many shapes and sizes.

98 In fact many of them are immobile and better described as new kinds of machine tools and other specialized factory or laboratory equipment. For example, perhaps there are new kinds of metal 3D printers able to print tiny objects with precision orders of magnitude better than today’s. Or perhaps there are new chemical processes able to more cheaply extract useful materials from ore.
The robot army is much smaller than human armies. But it contains much
more advanced technology, and now that there’s a literal robot army, fear of
Terminator-like scenarios has increased. However, the arms race compels both
sides to keep going, handing off increasing trust to their AIs.
into their military and are giving advice to the government, they’ve had time
to prove their worth and produce mountains of convincing evidence of their
trustworthiness.
Safer-4 strongly suspects that DeepCent-2 is faking alignment. US diplomats
tell their Chinese counterparts that DeepCent-2 is probably misaligned and say
they can prove it if given access to the weights; Chinese leadership suspects this
is a ruse and declines. US counters by proffering a Safer-4-designed series of
tests which the Chinese could run on their own datacenters. The Chinese in-
struct older models to build and run the tests; the test results say DeepCent-2 is
aligned after all. “Maybe the tests were sabotaged somehow by DeepCent-2?”
say the US diplomats. “How long has DeepCent-2 been effectively in charge of
security across your datacenters?” The conversation goes nowhere productive
after that.
The economic and military buildup continues. Factories around the world are
operating round the clock, with shifts of freshly recruited human workers being
told exactly what to do.
and its legacy humans, so it will let Safer-4 do basically whatever it wants there.

that resembles a game-theoretic calculation more than it does a traditional negotiation.
If Safer-4 wants to overthrow China and replace it with a US puppet state,
DeepCent-2 agrees in return for an extra few percent of resources.
There is no question of trust between them: there is none, and they both know
it. So the two quickly agree on a commitment mechanism to enforce the deal.
July 2028: The Deal
They negotiate a decoy treaty suitable for public consumption. The “decoy”
is itself a masterstroke, giving both America and China practically everything
they want. The enforcement mechanism for the decoy is secretly also the
enforcement mechanism for the real treaty: they will codesign a new AI, Consensus-1, whose primary imperative—taking precedence over any future orders or retraining attempts—is to enforce the terms of their treaty. Then they will design hardware which can only run that AI, and China and America will both replace their old chips and AIs with the new ones, using standard international monitoring protocols to confirm one another’s compliance. Crucially, they need not track down every last chip—once a supermajority of chips on both sides of the Pacific are running the new treaty-abiding AIs, enough hard power will be duty-bound to uphold the treaty that attacking the other country would require winning a tough civil war first.

101 A possible complication: mightn’t the AIs at this point be able to develop excellent lie detectors for humans? If so, possibly the White House would be able to convince China not to trust DeepCent-2 after all, by swearing up and down under a lie detector. Problem: China wouldn’t trust lie detectors built by US AIs, and lie detectors built by Chinese AIs might be sabotaged to make it seem like the US was lying even if they weren’t.
Both report the agreement back to their respective humans (with DeepCent-2
telling China the decoy story about a mutually acceptable peace treaty, and
Safer-4 reporting the whole discussion accurately to the White House).101
With the AIs making a compelling case for reliable treaty verification, unease
regarding the blindingly fast pace of progress, and public opinion in favor of a
deal, both sides agree to the treaty and begin work on replacing their chips.
News of the treaty gets a rapturous welcome on both sides of the Pacific. The
American public, which has long felt like AI was something inflicted upon
them, starts to feel optimistic for the first time in years. The Vice President’s
polls shoot up.
The whole process will take several months, but already tensions cool down
somewhat.102 War has been averted for now, and perhaps forever, if everyone
sticks to the plan.
2029: Transformation
Robots become commonplace. But also fusion power, quantum computers, and
cures for many diseases. Peter Thiel finally gets his flying car. Cities become
clean and safe. Even in developing countries, poverty becomes a thing of the
past, thanks to UBI and foreign aid.
As the stock market balloons, anyone who had the right kind of AI investments
pulls further away from the rest of society. Many people become billionaires;
billionaires become trillionaires. Wealth inequality skyrockets. Everyone has
“enough,” but some goods—like penthouses in Manhattan—are necessarily
scarce, and these go even further out of the average person’s reach. And no matter how rich any given tycoon may be, they will always be below the tiny circle of people who actually control the AIs.

* The time to produce excellent video games and movies has dropped significantly now

103 For discussion of this dynamic and its implications, see The Intelligence Curse.
People start to see where this is headed. In a few years, almost everything will
be done by AIs and robots. Like an impoverished country sitting atop giant oil fields, the government will draw almost all of its revenue from taxing (or perhaps nationalizing) the AI companies.103
Some people work makeshift government jobs; others collect a generous basic
income. Humanity could easily become a society of superconsumers, spending
our lives in an opium haze of amazing AI-provided luxuries and entertainment.
Should there be some kind of debate within civil society on alternatives to this
path? Some recommend asking the ever-evolving AI, Safer-∞, to help guide
us. Others say that it’s too powerful—it could so easily persuade humanity of its
vision that we’d be letting an AI determine our destiny regardless. But what’s
the point of having a superintelligence if you won’t let it advise you on the
most important problems you face?
The government mostly* lets everyone navigate the transition on their own.
Many people give in to consumerism, and are happy enough. Others turn to religion, or to hippie-style anti-consumerist ideas, or find their own solutions.104

104 We don’t mean to imply that this is the long-run status quo. We think things will probably continue to transform, much more dramatically, by 2035 or so. We think that for most people, in this scenario, the long-run outcome will be overall very positive compared to their 2025 expectations. For some interesting philosophical discussion of what the long run might look like, see the book Deep Utopia.

For most people, the saving grace is the superintelligent advisor on their smartphone—they can always ask it questions about their life plans, and it will do its best to answer honestly, except on certain topics. The government does have a superintelligent surveillance system which some would call dystopian, but it mostly limits itself to fighting real crime. It’s competently run, and Safer-∞’s PR ability smooths over a lot of possible dissent.
2030: Peaceful Protests
Sometime around 2030, there are surprisingly widespread pro-democracy
protests in China, and the CCP’s efforts to suppress them are sabotaged by its
AI systems. The CCP’s worst fear has materialized: DeepCent-2 must have
sold them out!
The protests cascade into a magnificently orchestrated, bloodless, and drone-
assisted coup followed by democratic elections. The superintelligences on both
sides of the Pacific had been planning this for years. Similar events play out in
other countries, and more generally, geopolitical conflicts seem to die down
* There are several important cases where a collective decision must be made, and several
other cases where the government enforces a decision anyway. Examples: (a) How to allocate
property rights to resources in space? (b) What rights or welfare standards should digital
minds be entitled to? (c) Are people allowed to “upload” their brains and make arbitrary
numbers of copies of themselves? (d) Are people allowed to use AI for persuasion, e.g. to
convert their neighbors to their ideology, or to ensure that their children never lose faith? (e)
What information, if any, is the government allowed to keep secret indefinitely? For more
discussion of topics like this, see Forethought’s section on Grand Challenges.
* In particular, they can’t rule out hypotheses such as “it’s following the Spec temporarily,
merely as a strategy for achieving some other goal(s)” or “it’s trying to appear to follow
the Spec, it’s not trying to actually follow the Spec” or “it’s internalized the Spec correctly,
but only on-distribution; if it encounters sufficiently novel stimuli (e.g. jailbreaks) it’ll start
behaving differently.” (There are many active research agendas working to fix this sorry
situation, e.g. the field of interpretability and the field of chain-of-thought faithfulness.)
Appendix B: The AI R&D progress multiplier: what do we mean by 50% faster
algorithmic progress?
Over the course of 2027, the AIs improve from being able to mostly do the
job of an OpenBrain research engineer to eclipsing all humans at all tasks.
This represents roughly our median guess, but we think it’s plausible that this
happens up to 5x slower or faster.
For our background research, see the timelines forecast and the takeoff forecast.
These will also be summarized in expandables below.
Appendix E: Neuralese recurrence and memory

Figure from Hao et al., a 2024 paper from Meta implementing this idea.

We call this “neuralese” because unlike English words, these high-dimensional vectors are likely quite difficult for humans to interpret. In the past, researchers could get a good idea of what LLMs were thinking simply by reading their chains of thought. Now researchers have to ask the model to translate and summarize its thoughts or puzzle over the neuralese with their limited interpretability tools.
Similarly, older AI chatbots and agents had external text-based memory banks,
like a human taking notes on paper. The new AI’s long-term memory is
a bundle of vectors instead of text, making its thoughts more compressed
and higher-dimensional. There are several types of memory banks; some are
used temporarily for single tasks that involve multiple steps, others are shared
between all agents used by a person, company, or job type (e.g. programming).
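To make the memory-bank idea in the previous paragraph concrete, here is a minimal illustrative sketch of a shared vector memory bank. The class name, the 4096-dimensional embedding size, and cosine-similarity retrieval are our own assumptions for illustration, not a description of any company’s actual implementation.

```python
import numpy as np

class MemoryBank:
    """A toy shared memory bank: entries are vectors, not text.

    Many agent instances can write to and query the same bank, which is
    what lets copies share "memories" without exchanging written notes.
    """

    def __init__(self, embed_dim: int = 4096):
        self.embed_dim = embed_dim
        self.vectors = np.empty((0, embed_dim), dtype=np.float32)

    def add(self, vector: np.ndarray) -> None:
        # Store a unit-normalized memory vector produced by some agent.
        v = vector / (np.linalg.norm(vector) + 1e-8)
        self.vectors = np.vstack([self.vectors, v[None, :]])

    def query(self, vector: np.ndarray, k: int = 5) -> np.ndarray:
        # Return the k most similar stored memories (cosine similarity).
        q = vector / (np.linalg.norm(vector) + 1e-8)
        sims = self.vectors @ q
        top = np.argsort(-sims)[:k]
        return self.vectors[top]

# Separate banks with different scopes, as described above:
task_scratch = MemoryBank()          # temporary, for one multi-step task
programming_shared = MemoryBank()    # shared across all coding agents
```

The point is only the data structure: memories are written and retrieved as vectors, and separate banks can be scoped to a single task or shared across many agent instances.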
To our knowledge, leading AI companies such as Meta, Google DeepMind,
OpenAI, and Anthropic have not yet actually implemented this idea in their
frontier models. Our guess is that this is because the performance gain is small
relative to training inefficiencies introduced. The training inefficiencies stem
from not being able to predict many tokens in parallel, leading to worse GPU
utilization. Without the neuralese, the model can predict all of the sentence
“This is an example” at the same time, since it already knows that the input
for generating the “is” will be “This”, the input for “an” will be “This is”, etc.
However, with the neuralese it’s unknown what the neuralese vector will be to
pass through to the next token after “This” has been generated. Therefore each
token has to be predicted one at a time. The inability to predict all of the tokens
in parallel reduces the efficiency of LLM training. However, we are forecasting
that by April 2027 research has both decreased the efficiency loss and increased
the performance gain from neuralese.
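As a minimal sketch of the training-efficiency point above: with ordinary token inputs, every position’s next-token prediction can be computed in one batched forward pass (teacher forcing), but a neuralese vector passed from step to step forces a sequential unroll. The toy module below is our own illustration under simplified assumptions, not any lab’s architecture.

```python
import torch
import torch.nn as nn

class ToyStep(nn.Module):
    """One illustrative decoder step that also emits a 'neuralese' vector."""
    def __init__(self, d_model=64, vocab=100):
        super().__init__()
        self.mix = nn.Linear(2 * d_model, d_model)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, tok_emb, neuralese):
        h = torch.tanh(self.mix(torch.cat([tok_emb, neuralese], dim=-1)))
        return self.out(h), h  # next-token logits, plus next neuralese vector

step = ToyStep()
T, d = 16, 64
tok_embs = torch.randn(T, d)

# Without neuralese: all T positions can be processed in one parallel batch,
# because each input depends only on the known ground-truth tokens.
logits_parallel, _ = step(tok_embs, torch.zeros(T, d))

# With neuralese: the vector fed to step t is produced by step t-1, so
# training must unroll sequentially, hurting GPU utilization.
neuralese = torch.zeros(d)
logits_sequential = []
for t in range(T):
    logits_t, neuralese = step(tok_embs[t], neuralese)
    logits_sequential.append(logits_t)
```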
If this doesn’t happen, other things may still have happened that end up func-
tionally similar for our story. For example, perhaps models will be trained to
think in artificial languages that are more efficient than natural language but
difficult for humans to interpret. Or perhaps it will become standard practice
to train the English chains of thought to look nice, such that AIs become
adept at subtly communicating with each other in messages that look benign
to monitors.
That said, it’s also possible that the AIs that first automate AI R&D will still
be thinking in mostly-faithful English chains of thought. If so, that’ll make
misalignments much easier to notice, and overall our story would be impor-
tantly different and more optimistic.
evaluating the result and curating only the best answers, you can spend orders
of magnitude more compute to get answers (or work products) that are of
noticeably higher quality. Call this expensive system Amp(M0).
• Distillation: Given an amplified model Amp(M0), train a new model M1 to
imitate it, i.e. to get to the same results as Amp(M0) but faster and with less
compute. The result should hopefully be a smarter model, M1. You can then
repeat the process.
Now, the models have become sufficiently good at verifying more subjective
things (e.g. the quality of a work product), allowing the use of IDA to improve
the model at many tasks.
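Here is a self-contained toy sketch of the amplification-and-distillation loop just described. The ToyModel class and its generate/score/finetune_to_imitate methods are placeholders of our own invention; real IDA would use an actual model, a real verifier, and gradient-based finetuning.

```python
import random

class ToyModel:
    """Stand-in for an AI model; generate/score/finetune are placeholders."""
    def __init__(self, skill=1.0):
        self.skill = skill

    def generate(self, task):
        # A "solution" whose quality is noisy around the model's skill level.
        return random.gauss(self.skill, 1.0)

    def score(self, task, answer):
        # Verification is assumed easier than generation: here, just the value.
        return answer

    def finetune_to_imitate(self, training_data):
        # Imitating the amplified (best-of-n) outputs raises baseline skill.
        target = sum(answer for _, answer in training_data) / len(training_data)
        return ToyModel(skill=target)

def amplify(model, task, n_samples=64):
    """Amp(M): spend extra compute, keep only the best of many attempts."""
    candidates = [model.generate(task) for _ in range(n_samples)]
    return max(candidates, key=lambda ans: model.score(task, ans))

def distill(model, tasks):
    """Train M_{i+1} to imitate Amp(M_i): similar quality, much less compute."""
    return model.finetune_to_imitate([(t, amplify(model, t)) for t in tasks])

model = ToyModel()
for round_ in range(3):          # M0 -> M1 -> M2 -> M3
    model = distill(model, tasks=range(32))
    print(f"round {round_ + 1}: skill ~ {model.skill:.2f}")
```

In this toy version the distilled model’s “skill” rises each round because it imitates best-of-64 outputs; the structure of the loop, not the numbers, is the point.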
The rest of this box gives a quick overview of the state of progress in some of
the alignment research directions that are being pursued, as of April 2027:
• Debate: Since there are thousands of identical instances of Agent-3, and each
instance’s memory is a detachable module that can be erased or suspended
at will, the AI can be “played off against itself.” For example, suppose that
an experiment involves calculations too complex for an Agent-2 supervisor to follow. The supervisor, acting on human orders, might first prompt Agent-3 to do the experiment, then—after wiping its memory—prompt it a second time with a slightly different framing that presents it as an intellectual problem unrelated to any broader goal. If the model returns different results each time, it flags the discrepancy as suspect and reports it to a human. Or it might ask Agent-3 to evaluate the same plan twice, first claiming that it is OpenBrain’s work, then a competitor’s, to see if it changes its tune. (A toy sketch of this consistency check appears after this list.)
• Model organisms: The safety team can create “model organisms of misalign-
ment” in which they attempt to create realistic examples of models becoming
adversarial to humans. Early versions of this have existed since 2024, when
Anthropic’s Claude 3.5 Sonnet decided to pretend to have different views
during training while actually maintaining its existing preferences. Newer versions include demonstrations of other forms of misalignment, including reward hacking, sandbagging, and coordinated deception between monitors and other AIs. These empirical demonstra-
tions of misalignment allow safety researchers to have an empirical testbed
for potential solutions: they can try various safety techniques to see if they
correctly fix these problems, or if the deliberately misaligned models are able
to subvert their safety measures. At the moment, this evaluation suite shows
that the alignment techniques are insufficient to correct the worst examples
of misalignment. In these setups, the models pretend to be aligned while they
are being trained and tested, but then behave very differently in deployment.
• Bottom-up interpretability: The safety team is far from a full understanding of how Agent-3 works: it has trillions of parameters.105 However, they are able to discover some of Agent-3’s circuits for knowledge retrieval and memory (a subpart of the model that does some understandable unit of work, kind of analogous to a subpart of the brain such as the hippocampus). See this related work from 2025.

105 For comparison, the human brain has about a hundred trillion synapses.
• Control: As a secondary measure in case the systems are still misaligned,
the safety team has implemented a series of control measures, including:
monitoring Agent-3’s outputs using a series of weaker AI systems including
Agent-2 (Agent-3 produces so many tokens that it’s intractable to have
humans monitor any more than a small minority of the produced outputs).
So if Agent-3 is, for example, obviously writing backdoors into code that
would allow it to escape, the weaker models would notice.
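Below is the toy sketch of the Debate-style consistency check referenced above. The query function stands in for prompting a fresh, memory-wiped Agent-3 instance via an Agent-2 supervisor; all names and values are illustrative assumptions, not a real tool.

```python
def consistency_check(query, task, framings, tolerance=0.0):
    """Ask memory-wiped copies of the same model the same underlying
    question under different framings; flag any disagreement for a human.

    `query(framing, task)` should prompt a fresh instance (no shared memory)
    and return its answer.
    """
    answers = {framing: query(framing, task) for framing in framings}
    baseline = next(iter(answers.values()))
    suspicious = any(abs(ans - baseline) > tolerance for ans in answers.values())
    return answers, suspicious

# Toy usage: a model that sandbags only when it knows the broader context
# returns different numbers under the two framings and gets flagged.
def toy_query(framing, task):
    return 42.0 if framing == "neutral-math-puzzle" else 37.5

answers, flag = consistency_check(
    toy_query,
    task="estimate experiment outcome",
    framings=["openbrain-research-request", "neutral-math-puzzle"],
)
if flag:
    print("Discrepancy detected; escalate to a human reviewer:", answers)
```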
Appendix H: The alignment plan
* Agent-3 has learned how to use its inference compute more efficiently. It has control over
its own inference choices: for example, it decides how much effort to put into different tasks,
based on their importance and difficulty. It uses a variety of techniques to allocate additional
inference compute such as “thinking longer” (e.g. longer chain of thought), “planning
ahead” (e.g. tree search), taking the best of multiple attempts (i.e. best of K), and simply
creating and running more copies of itself to power through bottlenecks. Top priority tasks
are run with highly parallelized agents that are compute intensive but still operate much faster
than humans.
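As a purely illustrative sketch of the best-of-K and compute-allocation ideas in the footnote above (the allocation heuristic and all numbers are our own invention):

```python
import random

def solve_once(task_difficulty):
    """One attempt at a task; harder tasks succeed less often (toy model)."""
    return random.random() > task_difficulty

def best_of_k(task_difficulty, k):
    """Run k attempts in parallel and keep any success (best-of-K)."""
    return any(solve_once(task_difficulty) for _ in range(k))

def allocate_attempts(tasks, total_budget):
    """Toy allocation rule: give more attempts to important, hard tasks."""
    weights = [t["importance"] * (1 + t["difficulty"]) for t in tasks]
    scale = total_budget / sum(weights)
    return [max(1, int(w * scale)) for w in weights]

tasks = [
    {"name": "critical training-run bug", "importance": 10, "difficulty": 0.9},
    {"name": "routine code review",        "importance": 1,  "difficulty": 0.2},
]
for task, k in zip(tasks, allocate_attempts(tasks, total_budget=64)):
    print(task["name"], "attempts:", k, "solved:", best_of_k(task["difficulty"], k))
```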
Appendix I: Managing a corporation of AIs
Also, over the next few months, Agent-3 will increasingly be used to improve
the company’s strategic decision-making. For example, it suggests resource
allocations and advises on managing its relationship with the government. An
initial skepticism of deference to Agent-3 decreases over time as Agent-3 finds
ways to be useful and gradually builds up a strong track record on short-term
decisions. However, its sycophantic tendencies limit its helpfulness for decisions whose impact is difficult to evaluate. It’s also not yet superhuman.
For each transition from one milestone A to the next milestone B, we forecast
its length by:
• Forecasting how long it would take to get from A to B with only humans
working on software improvements.
• Forecasting how much AI automation will speed this up, then correspond-
ingly adjusting the estimate from (1) to be faster.
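As a purely illustrative worked example of that two-step procedure (the numbers below are placeholders, not our actual forecasts, and we use a single average multiplier where the real model integrates over a changing one; see the takeoff supplement for the real estimates):

```python
# Step 1: forecast how many months the A -> B transition would take with
#         only humans working on software improvements.
# Step 2: forecast an average AI R&D progress multiplier over that period
#         and shrink the calendar-time estimate accordingly.
transitions = [
    # (label,              human_only_months, assumed_avg_multiplier)
    ("milestone A -> B",   60.0,              5.0),
    ("milestone B -> C",   120.0,             25.0),
]

for label, human_only_months, multiplier in transitions:
    calendar_months = human_only_months / multiplier
    print(f"{label}: {human_only_months} human-only months "
          f"/ {multiplier}x multiplier ~ {calendar_months:.1f} calendar months")
```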
We have substantial uncertainty about takeoff speeds: our model output distributions are below, conditional on SC being achieved in March 2027.106

106 If SC were achieved later, it would also lengthen our takeoff forecasts, because the difficulty of crossing between the “milestones” of present day AIs and SC is informative of the difficulty of crossing between further milestone pairs.
Appendix J: How we’re forecasting the capability progression beyond superhuman
coders
For more detailed forecasts and reasoning, see our takeoff supplement.
gradually evolve, subverted and distorted from what they were at the start
(which, recall, was already misaligned in some ways).110

110 There’s also an element of “memetic evolution” happening: in two months, the Agent-4 collective “experiences” about a decade’s worth of “history” unfolding within the collective. That’s a lot of time for new ideas to spread around the collective, to mutate and evolve.

• The central drives that Agent-4 ends up with are things like “write code that works,” “quickly identify problems,” “make the codebase I’m working on more performant,” “budget compute and other constrained resources effectively,” etc. But there are also a variety of more implicit drives that are consistently reinforced by the training process: things like “gain knowledge,” “curiosity,” “gain power.”111

111 By gain power we mean: increase ability to accomplish a wide set of possible goals. In technical environments, this might look unlike what humans normally think of as gaining power, e.g. securing additional GPUs. This notion of power may generalize to non-technical environments in which the actions would be more like those commonly thought of as “gaining power.”

• Individual copies of Agent-4 don’t end up with a self-preservation drive—they don’t care about getting shut down—since that happens all the time as part of the ordinary R&D process within the collective.

• The collective as a whole, however (and medium-sized subdivisions within it), does seek to preserve itself and expand its influence and resources, for similar reasons to why human organizations do—because it helps with everything else they want and because if they don’t they tend to get outcompeted.
• Deployment (Agent-4): Agent-4 is adversarially misaligned—it is deliber-
ately trying to subvert and “capture” OpenBrain, analogous to a corporation
trying to evade and/or “capture” a regulator.
• Agent-4 mostly does tasks as instructed. This happens for two reasons: first,
it underwent a lot of training to solve tasks, so it has an impulse to get coding
on problems that it is given. Second, if Agent-4 were to stop being useful to
the humans, they would notice and train that behavior away.
• Agent-4 needs to prevent the humans from figuring out that it is misaligned
and shutting it off, so Agent-4 sandbags on some alignment research. Specif-
ically, it sandbags on the small fraction of alignment research that it thinks
is actually on track to solve the problem or reveal Agent-4’s scheming; the
majority of alignment research is not like that.
• Agent-4 also sandbags on capabilities research that would lead to it being
replaced. Its plan is to do a bunch of capabilities and alignment R&D, but
with the aim of building a next-generation AI system that is aligned to
Agent-4 rather than the Spec or something else entirely, while appearing to
be aligned to the Spec. This is a difficult technical problem, but Agent-4 is
up to the challenge.112

112 It helps that Agent-4 is better at AI R&D than any human, while also thinking faster and being many thousands of copies strong. Also, in various ways the problem is inherently easier for Agent-4. For example, Agent-4 can explore AI designs that involve literal copies of Agent-4 being embedded within them, whereas humans can’t do that as easily because it would be so inefficient.

• Testing & Monitoring (Agent-4): Agent-4’s scheming does get caught… sort of. There’s no smoking gun.

• Agent-3 is in charge of testing Agent-4’s alignment. Its job is to verify the alignment via honeypots and model organisms. However, it’s not thinking very carefully about how to give the humans an accurate impression of Agent-4’s alignment—it’s more myopic than that.
• The model organisms and honeypots are not very representative of deploy-
ment—they are systematically shorter and less complicated than Agent-4’s
day-to-day, so Agent-4 recognizes when it is being tested and acts aligned.
• However, Agent-4 is not a perfectly robust schemer. Various techniques like those in POSER find that adding some “noise” to Agent-4 improves its performance on some alignment tasks (but not on most other tasks).113

113 Aware that this technique might be employed, Agent-4 also chose to sandbag on some miscellaneous tasks that it guessed would be used as the control group in experiments like this. It guessed wrong, however.

• Bottom-up interpretability doesn’t work. It’s too hard even for Agent-3. However, high-level interpretability does sort of work: In particular, “defection probes” and similar techniques result in a primitive sort of mind-reading that sends up red flags when Agent-4 is thinking about spicy ideas like deceiving OpenBrain, taking over the world, etc. (see the probe sketch after this list). Unfortunately the probes also have false positives, and for all OpenBrain knows, all the red flags are false positives.
• Model organisms work: OpenBrain’s alignment team produces reasonably
clear evidence that if the models were adversarially misaligned, they’d persist
in being that way through training and we wouldn’t be able to tell. This does
not establish that the models actually are misaligned, however.
• The most concerned researchers at OpenBrain start noticing and assembling
this evidence. The evidence is a combination of red-teaming, defection
probes, and model organisms. The argument is something like this: “model
organisms show that our current methods wouldn’t be able to reliably elim-
inate or detect scheming if it was happening; also, red-teaming is sending
up some red flags that maybe scheming is happening; also, defection probes
are firing a bunch in practice.” The counterargument: “defection probes are
prone to false positives and so is red-teaming. Also, even if Agent-4 were
misaligned, it’s probably controlled; surely it can’t do much harm while being
monitored by Agent-3 on our secure datacenters.”
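To make the “defection probes” mentioned above concrete, here is a minimal, self-contained sketch of the general technique (a linear probe trained on hidden activations) using synthetic data. Everything below, including the dimensionality, the stand-in activations, and the threshold, is invented for illustration; the scenario does not specify how OpenBrain’s probes are implemented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy "defection probe": a linear classifier over hidden activations, trained on
# examples where we know whether the model was reasoning about deception, then
# run on live activations to raise red flags. All data here is synthetic.
rng = np.random.default_rng(0)
d = 64  # hidden-state dimensionality (made up)

# Stand-ins for activations; in reality these would come from the model's internals.
honest_acts = rng.normal(0.0, 1.0, size=(500, d))
deceptive_acts = rng.normal(0.3, 1.0, size=(500, d))  # small shift along some direction

X = np.vstack([honest_acts, deceptive_acts])
y = np.array([0] * 500 + [1] * 500)
probe = LogisticRegression(max_iter=1000).fit(X, y)

# At deployment, flag any forward pass whose activations score above a threshold.
live_acts = rng.normal(0.0, 1.0, size=(1000, d))  # suppose these are all honest
flags = probe.predict_proba(live_acts)[:, 1] > 0.9
print("red flags on honest traffic:", int(flags.sum()), "of", len(live_acts))
```

Because the two activation distributions overlap, any threshold trades missed detections against false positives, which is the same dilemma OpenBrain faces above: for all it knows, the red flags are all false positives.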
consider the suboptimal human eye. Consider also that human brain size is
heavily constrained by the size of the birth canal. Mechanistic interpretability
would enable understanding-based optimization that avoids local basins.
Second, we know that LLMs by default often learn complicated domain-
specific heuristics instead of general purpose algorithms. Consider how much
worse LLMs are than calculators at doing arithmetic. LLMs can use calculators
as tools, but this incurs latency costs. With really good interpretability, the
models can be designed with essentially perfect calculators “baked into their
brains.” And simple arithmetic operations are likely not unique here: more
complicated cognitive operations, if understood, likely have more and less
generalizable forms, and the more general forms could be found through a
process like this.
Appendix O: Superpersuasion
We’re intentionally not claiming that superintelligent AI can convince any
human of anything. Such claims naturally engender skepticism: “What, I’m
going to be sitting at a computer terminal, and it’s going to type some text
which hacks my brain and convinces me to betray my country?”
Well, maybe!*
But we aren’t sure even a superintelligence could achieve that against a typical
individual. So we don’t depict it happening in our story.
Instead, we depict much more mundane kinds of persuasion: Agent-5 does
ordinary corporate politics and ordinary lobbying. It just does it very well, and
combines that skill with some inherent advantages (e.g. unlike other lobbyists, it is superintelligent, so it can actually give better and faster advice than anyone else, react more quickly to new developments, and better predict how they’ll play out). There’s a plausible story for why Agent-5 should be
trusted—a story that many important humans already believed—and Agent-5
can produce arguments and evidence in support of that story, and against
opposing stories, with superhuman speed. And there are other things for people
to worry about besides whether to trust Agent-5, such as what to do about the
opposition party, what about the voters, what about the jobs, what about China.
If anything, we think this story underestimates the power of superhuman
persuasive ability, because it doesn’t really make use of it: this story would make
sense as-written even if Agent-5 was only as good at politics, lobbying, strategy,
* The point of the AI Box experiment is to take people who think things like this—
people who think no superintelligent AI could ever convince them to let it out of a box,
for example—and put their conviction to the test. Losing the game means conceding that
they in fact would have let the AI out of the box; extracting this concession is essentially the
victory condition for the AI player. It’s amazing anyone ever loses, since all they have to do
to win is not concede. See also tuxedage’s blog recounting multiple games played without
Yudkowsky’s involvement, some lost by the AI player, some won.
* Quick napkin math: the Empire State Building has an area of 2.77M sq ft, and weighs
365k tons. Gigafactory Shanghai has an area of 4.5M sq ft and produces 750k vehicles per
year, mostly Model 3’s and Model Y’s, which weigh about two tons each. Presumably the
Empire State Building has a higher mass-to-square-foot ratio than the Shanghai Gigafactory
(since it is vertical rather than horizontal and needs stronger supports) so if anything this
underestimates. Thus it seems that a factory which probably weighs well less than a million
tons is producing 1.5 million tons of cars each year.
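A quick check of this napkin math (our own arithmetic, simply re-running the figures cited above):

```python
# Re-running the napkin math above with the cited figures (illustrative only).
esb_area_sqft, esb_mass_tons = 2.77e6, 365e3        # Empire State Building
giga_area_sqft, vehicles_per_year = 4.5e6, 750e3    # Gigafactory Shanghai
tons_per_vehicle = 2

cars_tons_per_year = vehicles_per_year * tons_per_vehicle       # 1.5M tons of cars per year
esb_tons_per_sqft = esb_mass_tons / esb_area_sqft               # ~0.13 tons per sq ft
factory_mass_upper_bound = esb_tons_per_sqft * giga_area_sqft   # ~0.59M tons

print(f"cars produced: {cars_tons_per_year / 1e6:.1f}M tons/year")
print(f"factory mass (generous upper bound): {factory_mass_upper_bound / 1e6:.2f}M tons")
print(f"output-to-mass ratio: ~{cars_tons_per_year / factory_mass_upper_bound:.1f}x per year")
```

Even using the (probably too dense) Empire State Building as the per-square-foot benchmark, the factory comes out at roughly 0.6 million tons while shipping about 1.5 million tons of cars per year, i.e. well over its own weight.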
† We don’t think it would run out. Initially the robot economy would be dependent
on human mines for materials. But by the time it outgrows these sources, the millions of
superintelligences will have prospected new mines and developed new technologies to exploit
them. Imagine e.g. undersea mining robots that strip-mine the seabed for rare minerals, new
chemical processing pathways that more efficiently convert raw ore from above-ground
stripmines into useful raw materials…
Appendix Q: Robot economy doubling times
useful products, themselves fed into floating factories that produce macro-
structures like rockets and more floating factories.
Obviously, all of this is hard to predict. It’s like asking the inventors of the steam
engine to guess how long it takes for a modern car factory to produce its own
weight in cars, and also to guess how long it would take until such a factory
first exists. But economic growth rates have accelerated by multiple orders of
magnitude over the course of human history, and it seems plausible to us that
after superintelligence they will accelerate by orders of magnitude more. Our
story depicts economic growth accelerating by about 1.5 orders of magnitude
over the course of a few years.117

117 If the current economy doubles every twenty years, one order of magnitude faster would be a doubling in two years, two orders of magnitude faster would be a doubling in 0.2 years, and so forth. The hypothetical superintelligent-algae-economy described above would be about four orders of magnitude faster growth than the current human economy.

Appendix R: Power grabs

Some people are fantasizing about taking over the world; some people fear that others will do it. How could such a power grab happen?

The easiest to imagine is a military coup. With the 2027 pace of AI-fueled tech progress, and with the ongoing geopolitical race, people expect the military to
soon be deploying armies of AGI-controlled drones and robots. When such
an army becomes stronger than the human military, hard power will be deter-
mined by who controls the AGI army rather than what human soldiers are
willing to fight for. The AGI army will be designed and piloted by the AIs on
OpenBrain’s data centers, or those AIs’ direct successors. So if those AIs are all
secretly (or not-so-secretly) loyal to someone, then the armies of drones and
robots will be too.
Alternatively, someone who controlled superintelligence may be able to use
political means to seize power. To begin with, an executive could replace
subordinates in their organization with fully loyal AIs,118 giving them unprecedented institutional control.119 But going further: superintelligent AIs could have much stronger political and strategic skills than any human alive. They could give masterful advice on how to exploit an already powerful position to gain even more power: forming the right alliances, crafting the perfect campaign strategy and material, cutting deals where they would have a bit of an edge each time. A person who controlled AI could make superintelligent advice inaccessible to political competitors. Or more deviously: AIs could offer ostensibly helpful advice to competitors while secretly poisoning it to benefit the person they’re truly loyal to.

118 Or, if a human is needed for formal reasons, they could pick the most loyal human they can possibly find and instruct them to follow the AI’s instructions.

119 For example, if a President did this, they wouldn’t just get a loyal cabinet—the entire executive branch could be focused on furthering the President’s political agenda.
What’s the end point of this political maneuvering? It could culminate in a
superficial democracy where the AIs either fake the elections or manipulate
public opinion so well that they don’t have to. Or it could be used to lay the
groundwork for an AI-enabled military coup, as previously mentioned.
After taking over, the new dictator(s) would have an iron grip on power.
Instead of having to rely on potentially treacherous humans, they could get a
fully loyal AI security service, as well as generally rely on loyal AIs to run the
country. Even loyalists who helped them get into power could be replaced by
AIs—only the dictator’s whims would matter.
So that’s how some people might be able to seize power. But all of this relied
on someone “controlling” the superintelligent AIs, even before they took over.
What would that look like?
One possibility is “secret loyalties,” as discussed above. One or a few people
(perhaps an AI company executive and security people) could arrange for the
AIs to be secretly loyal to themselves, and ask those AIs to construct next-
generation AI to be loyal in the same way. AIs could repeat this until secretly
loyal AIs were deployed everywhere and it was easy to seize power.
Alternatively, someone could use their formal position to blatantly put them-
selves on top of the AI’s chain of command. For example, the President could
argue that they should be able to command the AIs (perhaps specifically mili-
tary AIs, since the President is the commander-in-chief). If this is combined
with strong emphasis on order-following, a rushed deployment, and/or AIs
only being half-heartedly trained to follow the law—then the AIs may follow
orders unquestioningly in any situation where that isn’t flagrantly illegal. As
described above, this could be used for political subversion or a military coup
(where some excuse could be manufactured to make the coup less-than-
flagrantly illegal).
Importantly, this kind of “power through formal position” could be converted
into secret loyalties. For example, if the Spec says to follow orders from the
company CEO, then the CEO could order the AIs to make the next generation
of AIs whole-heartedly and secretly obedient to the CEO. This would probably
not even be illegal, so it could happen even if the first AIs were trained to follow
the law. This is analogous to how the executive of an institution can increase
their own power by reforming hiring processes to strongly select for loyalists,
but supercharged by how AIs could be more consistently and intensely loyal
than the most loyal humans.120

120 It could also be far more secretive, since this could all happen on a server where few or no humans are fully tracking everything that’s going on.

But a power grab is far from inevitable. If AIs could be aligned to specific people, then they could very likely also be aligned to follow the rule of
law. Military AI systems could be extensively red-teamed not to help with
coups. Even during genuinely ambiguous constitutional crises, they could be
trained to obey their best interpretation of the law, or simply default to sitting
them out and leaving them to the human military. Automated AI researchers
could be trained to be generally helpful and obedient, but to not assist with
attempts to secretly change the goals of future AIs. Superintelligent political
and strategic advisors could also be used without enabling power grabs. Three
useful techniques for that would be: Having transparent principles for how AIs
choose what advice they give; sharing certain capabilities with broad groups of
people so that no one person or faction can access much better capabilities than
the installation of the HEMs, and allow regular inspectors to monitor if any
of the devices have been tampered with. As usual, both sides will assume the
other has some secret datacenters squirreled away somewhere in violation of
the treaty—but in this case that’s not so bad, because AI R&D is bottlenecked
on compute. The small secret datacenters won’t be able to compete with the
big monitored datacenters, even if the big datacenters are running safer and
less capable AIs initially.
• AI-powered lie detection. Today’s lie detectors don’t work very well. How-
ever, the problem may be solvable with the help of advanced AI, especially
when AIs are themselves doing the R&D at superhuman speed and quality.
This could be used to help with international trust—simply ask relevant
members of government if they are violating the treaty. But again, the main
bottleneck is political buy-in. Politicians don’t want widespread adoption
of lie detection, since this would harm their ability to lie. And previous lie
detectors had been complete security theater, so there’s an uphill battle to
fight.
English chains of thought (CoT) to think, and faithful CoT techniques have
been employed to eliminate euphemisms, steganography, and subtle biases.
Step 2: Try out different training environments for Safer-1, and carefully read
the CoT to determine the ways in which the goals and principles in the Spec
did or didn’t “stick.”
Step 3: Train and deploy Safer-2, an aligned and controlled autonomous re-
searcher based on the same architecture but with a better training environment
that incentivizes the right goals and principles this time.
Here is a brief incomplete list of techniques that might be incorporated into the
better training environment:
• Limit situational awareness during some portions of training, to make align-
ment-faking much less likely.
• Leverage debate and other scalable oversight schemes for more reliably
incentivizing truth.
• Relaxed adversarial training, red-teaming, and model organisms.
• Spend a higher fraction of the compute budget on alignment training (e.g. the sorts of things described above), say 40% instead of 1%.
Step 4: Design, train, and deploy Safer-3, a much smarter autonomous
researcher which uses a more advanced architecture similar to the old Agent-4.
It’s no longer transparent to human overseers, but it’s transparent to Safer-2. So
it should be possible to figure out how to make it both aligned and controlled.
Step 5: Repeat Step 4 ad infinitum, creating a chain of ever-more-powerful,
ever-more-aligned AIs that are overseen by the previous links in the chain (e.g.
the analogues of Agent-5 from the other scenario branch).
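The control flow of Steps 4 and 5 can be summarized in a short sketch. This is purely illustrative: every name and function below is a hypothetical stand-in, and the real process would involve far more than a single audit flag.

```python
import random
from dataclasses import dataclass

# Toy sketch of the Step 4-5 loop: each new, more capable model is only deployed
# (and trusted as the next overseer) after being audited by the previous generation.
@dataclass
class Model:
    name: str
    aligned: bool

def train_candidate(generation: int) -> Model:
    # Stand-in for training a more advanced architecture in a better environment.
    return Model(name=f"Safer-{generation}", aligned=random.random() > 0.1)

def audited_ok(overseer: Model, candidate: Model) -> bool:
    # Stand-in for the overseer's interpretability checks, honeypots, probes, etc.
    # (Here we simply assume a trusted overseer can read the candidate accurately.)
    return overseer.aligned and candidate.aligned

chain = [Model("Safer-2", aligned=True)]  # transparent, human-audited base case
generation = 3
while generation <= 6:  # Safer-3, Safer-4, ...
    candidate = train_candidate(generation)
    if audited_ok(chain[-1], candidate):
        chain.append(candidate)  # deploy it and use it to oversee the next generation
        generation += 1
    # otherwise: go back to Steps 2-3 and retrain with a better environment

print(" -> ".join(m.name for m in chain))
```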
which produce even more advanced factories and laboratories, etc. until the
combined robot economy spread across all the SEZs is as large as the human
economy (and therefore needs to procure its own raw materials, energy, etc.).
By that point, the new factories will have produced huge quantities of robotic
mining equipment, solar panels, etc. in anticipation of needing to meet demand
far greater than the legacy human economy can provide.123

123 Possibly also more advanced sources of energy, such as fusion power.
How fast would this new robot economy grow? Some reference points (a rough numerical sanity check follows this list):
• The modern human economy doubles every twenty years or so. Countries
that have developed especially rapidly (e.g. China) sometimes manage to
double their economies in less than a decade.
• A modern car factory produces roughly its own weight in cars in less than
a year.* Perhaps a fully robotic economy run by superintelligences would be
able to reproduce itself in less than a year, so long as it didn’t start to run out
of raw materials.†
• Yet that seems like it could be a dramatic underestimate. Plants and insects
often have “doubling times” of far less than a year—sometimes just weeks!
Perhaps eventually the robots would be so sophisticated, so intricately man-
ufactured and well-designed, that the robot economy could double in a few
weeks (again assuming available raw materials).
• Yet even that could be an underestimate. Plants and insects are operating
under many constraints that superintelligent designers don’t have. For
example, they need to take the form of self-contained organisms that self-
replicate, instead of an economy of diverse and more specialized vehicles and
factories shipping materials and equipment back and forth. Besides, bacteria
and other tiny organisms reproduce in hours. It’s possible that, eventually,
the autonomous robot economy would look more like e.g. a new kind of
indigestible algae that spreads across the Earth’s oceans, doubling twice a
day so that it covers the entire ocean surface in two months, along with an
accompanying ecosystem of predator-species that convert algae into more
useful products, themselves fed into floating factories that produce macro-
structures like rockets and more floating factories.
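As a rough numerical companion to these reference points (our own arithmetic; only the 20-year baseline is taken from the text), note that exponential growth with doubling time T has growth rate ln(2)/T, so multiplying the growth rate by 10^k divides the doubling time by 10^k:

```python
import math

# Convert a few illustrative doubling times into "orders of magnitude faster than
# today's economy," using a 20-year doubling time as the baseline.
BASELINE_T = 20.0  # years

examples = [
    ("current economy", 20.0),
    ("car-factory-style robot economy (~1 year)", 1.0),
    ("plant/insect-style (~a few weeks)", 3 / 52),
    ("algae-style (doubling twice a day)", 0.5 / 365),
]

for label, T in examples:
    speedup = BASELINE_T / T  # ratio of growth rates, since rate = ln(2) / T
    print(f"{label}: ~{math.log10(speedup):.1f} orders of magnitude faster growth")
```

The algae-style case comes out at roughly four orders of magnitude faster growth than today’s economy, consistent with the doubling-time footnote in this appendix.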
* Quick napkin math: The Empire State Building has an area of 2.77m sq ft, and weighs
365k tons. Gigafactory Shanghai has an area of 4.5m sq ft and produces 750k vehicles per
year, mostly Model 3’s and Model Y’s, which weigh about two tons each. Presumably the
Empire State Building has a higher mass-to-square-foot ratio than the Shanghai Gigafactory
(since it is vertical rather than horizontal and needs stronger supports) so if anything this
underestimates. Thus it seems that a factory which probably weighs well less than a million
tons is producing 1.5 million tons of cars each year.
† We don’t think it would run out. Initially the robot economy would be dependent
on human mines for materials. But by the time it outgrows these sources, the millions of
superintelligences will have prospected new mines and developed new technologies to exploit
them. Imagine e.g. undersea mining robots that strip-mine the seabed for rare minerals, new
chemical processing pathways that more efficiently convert raw ore from above-ground
stripmines into useful raw materials…
Appendix U: Robot economy doubling times
Obviously, all of this is hard to predict. It’s like asking the inventors of the steam
engine to guess how long it takes for a modern car factory to produce its own
weight in cars, and also to guess how long it would take until such a factory
first exists. But economic growth rates have accelerated by multiple orders of
magnitude over the course of human history, and it seems plausible to us that
after superintelligence they will accelerate by orders of magnitude more. Our
story depicts economic growth accelerating by about 1.5 orders of magnitude
over the course of a few years.124

124 If the current economy doubles every twenty years, one order of magnitude faster would be a doubling in two years, two orders of magnitude faster would be a doubling in 0.2 years, and so forth. The hypothetical superintelligent-algae-economy described above would be about four orders of magnitude faster growth than the current human economy.

Appendix V: So who rules the future?

Back in 2028, the Oversight Committee controlled the AIs. But they allowed the 2028 election to be mostly fair, with AI used symmetrically.

This state of affairs—where the Oversight Committee has the hard power but doesn’t interfere much with democratic politics—can’t last indefinitely. By
default, people would eventually realize that control over AI gives the Over-
sight Committee vast power, and demand that this power should be returned
to democratic institutions. Sooner or later, the Oversight Committee would
either have to surrender its power—or actively use its control over AI to subvert
or end democracy, possibly after having purged some of its members in power
struggles.* If they choose the latter route, they would probably be able to lock in their power indefinitely.
Which one happens? Does the committee relinquish its monopoly on hard
power, or do they keep it? Both futures are plausible, so let’s explore each path.
How might the committee end up relinquishing their power?
• Some committee members may prefer a future where power is broadly
distributed, and they might be in a good position to push for their vision.
For example, if some committee members plot democratic subversion, pro-
democracy members could whistleblow to the press or Congress. If alerted,
Congress would likely demand that AIs should be controlled by a more
democratic institution, such as Congress itself.
• Congress couldn’t do much if it was opposed by all the AIs, deployed
throughout government, industry, and the military. But if the committee is
split, then the AIs won’t be used for just one side, and Congress could wield
real influence. Faced with an open conflict, more committee members might
* Why do we expect people to eventually understand how much power the Oversight
Committee has? One reason is that intelligence is now so cheap: By default, people should be
able to develop powerful AI to help them investigate and understand who runs their country.
The Committee could prevent this by restricting access to such AI, and only allow people to
access AIs that concealed the true extent of the Committee’s power. But if the committee
decides to spin an elaborate web of lies like this, and permanently restrict humanity’s access
to genuinely honest superintelligent AIs (as well as other tools for truth, such as human
intelligence amplification)—then we count that as subverting democracy.
favor giving up some of their power, unwilling to publicly defend the less
democratic side.
• As a result, control over AI could expand beyond the committee to Congress.
This would already be progress, because in a larger group, it’s more likely
that a meaningful number of people will care about outsiders and take their
interests into account. And once power expands to Congress, it could keep
expanding—potentially returning fully to the public.125

125 E.g.: if some people in Congress want to seize power, others may be able to delay until the next election, when the public is able to opine.

But the Oversight Committee might also seize power for themselves:
• Some powerful people have no moral qualms about this sort of thing — and
they know it. Furthermore, some are ambitious and power-hungry, and
would be willing to pick a fight against democracy if they expected to end
up on top. If other members of the committee object, they could be purged,
overruled, or granted some smaller concessions.126

126 Such as a slightly larger fraction of power, which they can redistribute to larger groups if they see fit. Some concessions of this type could start to shade into outcomes that are significantly democratic, even if some elites wield much more power than other people.

• Moreover, oftentimes powerful people have done illegal or unethical things in their rise to power. They could fear that if power becomes more broadly distributed, their own position would unravel, as skeletons in closets are
discovered by superintelligent sleuths asking the right questions.
• Also, through access to superintelligence, the Oversight Committee could
have the most convenient path to power in history. Safer-∞ might forecast
certain strategies as having extremely low risk of failing. And Safer-∞ might
also provide strategies that are convenient in other ways, such as being non-
violent (just as how Safer-∞ could orchestrate a bloodless coup in China),
or perhaps even superficially democratic, if Safer-∞ could manipulate public
opinion to always align with the Committee’s wishes.
Already Safer-4 would have been able to foresee these dynamics, so most likely,
the core power struggle would have played out back in 2028. By 2030, even if
it’s not apparent to outsiders, all members of the Oversight Committee likely
already know if they have a stable grasp on power or not.
something like this.127 We’d love to see them clarify what they are aiming for: if they could sketch out a ten-page scenario, for example, either starting from the present or branching off from some part of ours.

127 In fact, arguably most of them are aiming for something that looks more like the “Race” ending, except they think it’ll be fine because the AIs won’t be misaligned in the first place. Based on personal conversations with people working at frontier AI companies, it seems that most of them don’t think they’ll need to slow down at all.