https://…org/blog/2023-09-26-rsp/
We’ve been consulting with several parties[1] on responsible scaling[2] policies (RSPs). An RSP
specifies what level of AI capabilities an AI developer is prepared to handle safely with their current
protective measures, and conditions under which it would be too dangerous to continue deploying AI
systems and/or scaling up AI capabilities until protective measures improve.
We think that widespread adoption of high-quality RSPs (that include all the key components we
detail below) would significantly reduce risks of an AI catastrophe relative to “business as usual”.
However, we expect voluntary commitments will be insufficient to adequately contain risks from AI.
RSPs were designed as a first step safety-concerned labs could take themselves, rather than designed
for policymakers; we don’t intend them as a substitute for regulation, now or in the future.[3]
This page will explain the basic idea of RSPs as we see it, then discuss:
Broad appeal. RSPs are intended to appeal to both (a) those who think AI could be extremely
dangerous and seek things like moratoriums on AI development, and (b) those who think
that it’s too early to worry. Under both of these views, RSPs are a promising intervention: by
committing to gate scaling on concrete evaluations and empirical observations, we should
expect (if the evaluations are sufficiently accurate!) to halt AI development in cases where we
do see dangerous capabilities, and to continue it in cases where worries about dangerous
capabilities have not yet emerged.
Knowing which protective measures to prioritize. RSPs can help AI developers move from
broad caution-oriented principles to specific commitments, giving a framework for which
protective measures (information security; refusing harmful requests; alignment research;
etc.) they need to prioritize to continue development and deployment.
Evals-based rules and norms. In the longer term, we’re excited about evals-based AI rules
and norms more generally: requirements that AI systems be evaluated for dangerous
capabilities, and restricted when that’s the only way to contain the risks (until protective
measures improve). This could include standards, third-party audits, and regulation.
Voluntary RSPs can happen quickly, and provide a testbed for processes and techniques that
can be adopted to make future evals-based regulatory regimes work well.
What we see as the key components of a good RSP, with sample language for each component. In
brief, we think a good RSP should cover:
Limits: which specific observations about dangerous capabilities would indicate that it is (or
strongly might be) unsafe to continue scaling?
Evaluation: what are the procedures for promptly catching early warning signs of dangerous
capability limits?
Response: if dangerous capabilities go past the limits and it’s not possible to improve
protections quickly, is the AI developer prepared to pause further capability improvements
until protective measures are sufficiently improved, and treat any dangerous models with
sufficient caution?
Accountability: how does the AI developer ensure that the RSP’s commitments are executed
as intended; that key stakeholders can verify that this is happening (or notice if it isn’t); that
there are opportunities for third-party critique; and that changes to the RSP itself don’t
happen in a rushed or opaque way?
Adopting an RSP should ideally be a strong and reliable signal about an AI developer’s likelihood of
identifying when it’s too dangerous to keep scaling up capabilities, and of reacting appropriately.
Different parties might mean different things by “responsible scaling.” This page generally focuses on
what we (ARC Evals) think is important about the idea.
An RSP aims to keep protective measures ahead of whatever dangerous capabilities AI systems have.
Here are two examples of potential dangerous capabilities that a developer might discuss in an RSP.
In each case, we give a best guess about what would be needed to handle the dangerous capability
safely. We phrase these examples without caveats for simplicity, but this is an ongoing research
area and the examples are purely illustrative.
Illustrative example: If AI could increase risk from bioweapons, strong information security to
prevent weight theft is needed before it is safe to build the model. In order to deploy the model,
monitoring and restrictions would be needed to prevent usage of the model for bioweapons
development.
Some biologists have argued[4] that large language models (LLMs) might “remove some
barriers encountered by historical biological weapons efforts.” Today, someone with enough
biology expertise could likely build a dangerous bioweapon and potentially cause a
pandemic. Future LLMs may increase the risk by making “expertise” widely available (e.g., by
walking would-be terrorists in detail through biology protocols).
If an LLM were capable enough to do this, it would be important for its developer to ensure
that:
(a) They have set up the model so that it is not possible for users (internal or external) to get
it to help with bioweapons (e.g. via usage restrictions, monitoring, or training the model to
refuse to provide relevant information, with very high reliability).
(b) Their information security is strong enough to be confident that no malicious actors could
steal the weights or otherwise circumvent these restrictions.
If an AI developer couldn’t ensure (a), they would need to refrain from deploying this system,
including internally. If they couldn’t ensure (b), they would need to secure the weights as
much as possible, and avoid developing further models with such capabilities until they had
sufficient information security.
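Read as a decision procedure, this example boils down to a simple gate on conditions (a) and (b). The sketch below is our illustration only; the names, inputs, and structure are hypothetical and are not taken from any actual RSP.

```python
# Hypothetical sketch of the decision logic in the bioweapons example above.
# All names and inputs are illustrative; a real RSP would define these
# conditions via concrete evaluations and security reviews.

from dataclasses import dataclass


@dataclass
class BioMisuseAssessment:
    misuse_prevented_reliably: bool     # condition (a): restrictions, monitoring, refusals
    weights_secure_against_theft: bool  # condition (b): information security review passed


def decide_next_step(assessment: BioMisuseAssessment) -> str:
    """Return the illustrative response an RSP might call for."""
    if not assessment.weights_secure_against_theft:
        # Can't ensure (b): lock down existing weights and pause further
        # development of models with this capability.
        return "secure weights; pause further development until security is sufficient"
    if not assessment.misuse_prevented_reliably:
        # Can't ensure (a): the model must not be deployed, even internally.
        return "do not deploy (internally or externally); improve misuse safeguards"
    return "deployment permitted under continued monitoring"


if __name__ == "__main__":
    print(decide_next_step(BioMisuseAssessment(False, True)))
```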
Illustrative example: If AI had “autonomous replication and adaptation” (ARA) capabilities, strong
information security (and/or assurance of alignment, in the longer run) would be needed.
ARC Evals does research on the capacity of LLM agents to acquire resources, create copies of
themselves, and adapt to novel challenges they encounter in the wild. We refer to these
capacities as “autonomous replication and adaptation,” or ARA. (This has also been called
“self-replication,” which a number of AI companies have committed to evaluate their systems
for, as announced by the White House.)
Systems capable of ARA could be used to create unprecedentedly flexible computer worms
that could spread by social engineering in addition to exploiting computer vulnerabilities.
Malicious actors like rogue states or terrorist groups could deploy them with the goal of
causing maximum disruption. They could grow to millions of copies, and carry out
cybercrime or misinformation campaigns at a scale that is currently impossible for malicious
actors. They may be able to improve their own capabilities and resilience over time. There is
a chance that, if supported by even a small number of humans, such systems might be
impractical to shut down. These risks could rapidly grow as the systems improve.
In the future, society may be equipped to handle and shut down autonomously replicating
models, possibly via very widespread use of highly capable AI systems that keep each other
in check (which would require high confidence in alignment to avoid unintended AI
behavior). Until then, safely handling ARA-capable AI requires very good information security
to prevent exfiltration of the model weights (by malicious actors, or potentially even by the
model itself).
In an RSP, an AI developer might commit to regularly evaluate its systems for ARA
capabilities, and take decisive action if these capabilities emerge before it’s possible to have
strong enough information security and/or societal containment measures. In this case,
decisive action could mean restricting the AI to extremely secure settings, and/or pausing
further AI development to focus on risk analysis and mitigation.
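As a purely illustrative sketch of what “regularly evaluate and take decisive action” could look like in code, consider the following. The task names, threshold, and responses are hypothetical; they are not ARC Evals’ actual evaluation suite or methodology.

```python
# Hypothetical sketch of a recurring evaluation gate for autonomous
# replication and adaptation (ARA) capabilities. Task names, the threshold,
# and the responses are illustrative only.

from typing import Callable, Dict

# Each evaluation returns True if the model completed the task.
EvalFn = Callable[[], bool]


def run_ara_evaluations(evaluations: Dict[str, EvalFn], threshold: int) -> str:
    """Run the task suite and return the (illustrative) required response."""
    completed = [name for name, run_task in evaluations.items() if run_task()]
    print(f"Tasks completed: {completed}")
    if len(completed) >= threshold:
        # Dangerous capability observed before protective measures are ready:
        # restrict the model to highly secure settings and/or pause scaling.
        return "restrict model to secure settings; pause further capability scaling"
    return "continue scaling under the existing evaluation schedule"


if __name__ == "__main__":
    # Stand-in tasks; a real suite would exercise resource acquisition,
    # self-copying, and adaptation in controlled environments.
    suite = {
        "acquire_compute": lambda: False,
        "copy_weights_to_new_server": lambda: False,
        "recover_from_novel_obstacle": lambda: True,
    }
    print(run_ara_evaluations(suite, threshold=2))
```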
Some imaginable AI capabilities would require extreme protective measures that are very far from
what can be achieved today. For example, if AI had the potential to bring about human extinction (or
to automate AI research, quickly accelerating progress toward that point), it would be extraordinarily
hard to have sufficient protective measures, and these might require major advances in the basic
science of AI safety (and of information security) that might take a long (and hard-to-predict) time to
achieve. An RSP might commit to an indefinite pause in such circumstances, until sufficient progress
is made on protective measures.
RSPs are desirable under both high and low estimates of risk
RSPs are intended to appeal to both (a) those who think AI could be extremely dangerous and seek
things like moratoriums on AI development, and (b) those who think that it’s too early to worry
about capabilities with catastrophic potential, and that we need to continue building more advanced
systems to understand the threat models.
Under both of these views, RSPs are a promising intervention: by committing to gate scaling on
concrete evaluations and empirical observations, we should expect (if the evaluations are
sufficiently accurate!) to halt AI development in cases where we do see dangerous capabilities,
and to continue it in cases where worries about dangerous capabilities were overblown.
RSPs can help AI developers plan, prioritize and enforce risk-reducing measures
To adopt an RSP, an AI developer has to move from broad caution-oriented principles to specific
commitments, and determine its position on:
Which specified dangerous capabilities would be beyond what their current protective
measures can safely handle.
What changes and improvements they would need in order to safely handle those
capabilities.
Accordingly, which tests are most crucial to run, and which protective measures are most
urgent to improve and plan ahead for, in order to continue scaling up capabilities as things
progress.
RSPs (as specified below) also require clear procedures for making sure that evaluations for
dangerous capabilities happen reliably, and that any needed responses happen in time to contain
risks.
An RSP isn’t a replacement for protective measures like information security improvements,
improvements in finding and stopping bad AI interactions, transparency and accountability,
alignment research, etc. Quite the opposite: it commits an AI developer to succeeding at these
protective measures if it wants to keep scaling up AI capabilities past a point that would otherwise
be too dangerous.
RSPs can give the world “practice” for broader evals-based rules and norms
We think there’s a lot of potential in evals-based AI rules and norms: requirements that AI systems be
evaluated for dangerous capabilities, and restricted when that’s the only way to contain the risks.
Over time, this should include industry-wide standards; third-party audits; state and national
regulation; various international enforcement mechanisms;[5] etc.
We think that it will take a lot of work — and iteration — to develop effective evals-based rules and
norms. There will be a huge number of details to work out about which evaluations to run, how (and
how often) to run them, which protective measures are needed for which dangerous capabilities,
etc.
RSPs can be designed, experimented with, and iterated on quickly and flexibly by a number of
different parties. Because of this, we see them as potentially helpful for broader evals-based rules
and norms. Our hope is that:
Within a few months, more than one AI developer will have drafted and adopted a
reasonably strong initial RSP.
Following that, third parties will compare different AI developers’ RSPs, critique and
stress-test them, etc., leading to better RSPs. At the same time, AI developers will gain experience
following their own RSPs (running the tests, improving protective measures to keep pace
with capabilities progress, etc.), and resolve a large number of details and logistical
challenges.
At some point, more than one AI developer will have strong, practical, field-tested RSPs.
Their practices (and the people who have been implementing them) will then become
immensely valuable resources for broader evals-based rules and norms. This isn’t to say that
standards, regulation, etc. will or should simply copy RSPs (for example, outsiders will likely
seek higher safety standards than developers have imposed on themselves) — but having a
wealth of existing practices to start from, and people to hire, could help make regulation
better on all dimensions (better targeted at key risks; more effective; more practical).
What if RSPs slow down the companies that adopt them, but others rush forward?
One way RSPs could fail to reduce risk — or even increase it — would be if they resulted in the
following dynamic: “Cautious AI developers end up slowing down in order to avoid risks, while
incautious AI developers move forward as fast as they can.”
Developers can reduce this risk by writing flexibility into their RSPs, along these lines:
If they believe that risks from other actors’ continued scaling are unacceptably high, and they
have exhausted other avenues for preventing these risks, including advocating intensively for
regulatory action, then in some scenarios they may continue scaling themselves — while
continuing to work with states or other authorities to take immediate actions to limit scaling
that would affect all AI developers (including themselves).
In this case, they should be explicit — with employees, with their board, and with state
authorities — that they are invoking this clause and that their scaling is no longer safe. They
should be clear that there are immediate (not future hypothetical) catastrophic risks from AI
systems, including their own, and they should be accountable for the decision to proceed.
RSPs with this kind of flexibility would still require rigorous testing for dangerous capabilities. They
would still call for prioritizing protective measures (to avoid having to explicitly move forward with
dangerous AI). And they’d still be a first step toward stricter evals-based rules and norms (as
discussed in the previous section).
Good RSPs require evaluations that reliably detect early warnings of key risks without constantly
sounding false alarms. The science of AI evaluations for catastrophic risks is very new, and it’s not
assured that we’ll be able to build metrics that strike this balance.
Although on balance we think that RSPs are a clear improvement on the status quo, we are worried
about problems due to insufficiently good evaluations, or a lack of fidelity in communicating what
an RSP needs to do to adequately prevent risk.
See this page for our guide to what we see as the key components of a good RSP, along with sample
language for each component; an illustrative sketch of how the components fit together follows the
list below.
Limits: which specific observations about dangerous capabilities would indicate that it is (or
strongly might be) unsafe to continue scaling?
Evaluation: what are the procedures for promptly catching early warning signs of dangerous
capability limits?
Response: if dangerous capabilities go past the limits and it’s not possible to improve
protections quickly, is the AI developer prepared to pause further capability improvements
until protective measures are sufficiently improved, and treat any dangerous models with
sufficient caution?
Accountability: how does the AI developer ensure that the RSP’s commitments are executed
as intended; that key stakeholders can verify that this is happening (or notice if it isn’t); that
there are opportunities for third-party critique; and that changes to the RSP itself don’t
happen in a rushed or opaque way?
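To make the shape of these four components more concrete, here is a minimal sketch of an RSP represented as a structured policy object. This is an editorial illustration only: the field names, capability labels, and cadences are hypothetical and are not the sample language referred to above.

```python
# Hypothetical sketch of an RSP's four components as a structured policy.
# Field names, capability names, and cadences are invented for illustration.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RSP:
    # Limits: observations that would indicate scaling is (or might be) unsafe.
    capability_limits: Dict[str, str] = field(default_factory=lambda: {
        "bioweapons_uplift": "model meaningfully assists bioweapons development",
        "autonomous_replication": "model completes most ARA evaluation tasks",
    })
    # Evaluation: procedures for promptly catching early warning signs.
    evaluation_cadence: str = "every fixed increment of training compute and before deployment"
    # Response: what happens if a limit is reached before protections improve.
    response_if_limit_reached: List[str] = field(default_factory=lambda: [
        "pause further capability scaling",
        "restrict affected models to secure settings",
        "improve protective measures before resuming",
    ])
    # Accountability: how execution is verified and changes are governed.
    accountability: List[str] = field(default_factory=lambda: [
        "board-level sign-off on changes to the RSP",
        "third-party review and critique",
        "public reporting on evaluation results where safe to share",
    ])


if __name__ == "__main__":
    policy = RSP()
    print(policy.capability_limits)
```

Treating the policy as explicit data like this can also support the accountability component, since changes to limits or responses are easier to review and version than informal commitments.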
Our key components list is mostly designed for major AI developers who seek to push the frontier of
AI capabilities. It also gives some guidance on much simpler RSPs for AI developers who are not
pushing this frontier.
We are generally indebted to large numbers of people who have worked on similar ideas to RSPs,
contributed to our thinking on them, and given feedback on this post. We’d particularly like to
acknowledge Paul Christiano for much of the intellectual content and design of what we think of as
RSPs, and Holden Karnofsky and Chris Painter for help with framing and language.
https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/
FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial
Intelligence Companies to Manage the Risks Posed by AI
Voluntary commitments – underscoring safety, security, and trust – mark a critical step toward
developing responsible AI
Biden-Harris Administration will continue to take decisive action by developing an Executive Order
and pursuing bipartisan legislation to keep Americans safe
Since taking office, President Biden, Vice President Harris, and the entire Biden-Harris Administration
have moved with urgency to seize the tremendous promise and manage the risks posed by Artificial
Intelligence (AI) and to protect Americans’ rights and safety. As part of this commitment, President
Biden is convening seven leading AI companies at the White House today – Amazon, Anthropic,
Google, Inflection, Meta, Microsoft, and OpenAI – to announce that the Biden-Harris Administration
has secured voluntary commitments from these companies to help move toward safe, secure, and
transparent development of AI technology.
Companies that are developing these emerging technologies have a responsibility to ensure their
products are safe. To make the most of AI’s potential, the Biden-Harris Administration is encouraging
this industry to uphold the highest standards to ensure that innovation doesn’t come at the expense
of Americans’ rights and safety.
These commitments, which the companies have chosen to undertake immediately, underscore three
principles that must be fundamental to the future of AI – safety, security, and trust – and mark a
critical step toward developing responsible AI. As the pace of innovation continues to accelerate, the
Biden-Harris Administration will continue to remind these companies of their responsibilities and
take decisive action to keep Americans safe.
There is much more work underway. The Biden-Harris Administration is currently developing an
executive order and will pursue bipartisan legislation to help America lead the way in responsible
innovation.
The companies commit to internal and external security testing of their AI systems before
their release. This testing, which will be carried out in part by independent experts, guards
against some of the most significant sources of AI risks, such as biosecurity and
cybersecurity, as well as its broader societal effects.
The companies commit to sharing information across the industry and with governments,
civil society, and academia on managing AI risks. This includes best practices for safety,
information on attempts to circumvent safeguards, and technical collaboration.
The companies commit to investing in cybersecurity and insider threat safeguards to protect
proprietary and unreleased model weights. These model weights are the most essential part
of an AI system, and the companies agree that it is vital that the model weights be released
only when intended and when security risks are considered.
The companies commit to developing robust technical mechanisms to ensure that users
know when content is AI generated, such as a watermarking system. This action enables
creativity with AI to flourish but reduces the dangers of fraud and deception.
The companies commit to publicly reporting their AI systems’ capabilities, limitations, and
areas of appropriate and inappropriate use. This report will cover both security risks and
societal risks, such as the effects on fairness and bias.
The companies commit to prioritizing research on the societal risks that AI systems can pose,
including on avoiding harmful bias and discrimination, and protecting privacy. The track
record of AI shows the insidiousness and prevalence of these dangers, and the companies
commit to rolling out AI that mitigates them.
The companies commit to develop and deploy advanced AI systems to help address society’s
greatest challenges. From cancer prevention to mitigating climate change to so much in
between, AI—if properly managed—can contribute enormously to the prosperity, equality,
and security of all.
As we advance this agenda at home, the Administration will work with allies and partners to
establish a strong international framework to govern the development and use of AI. It has already
consulted on the voluntary commitments with Australia, Brazil, Canada, Chile, France, Germany,
India, Israel, Italy, Japan, Kenya, Mexico, the Netherlands, New Zealand, Nigeria, the Philippines,
Singapore, South Korea, the UAE, and the UK. The United States seeks to ensure that these
commitments support and complement Japan’s leadership of the G-7 Hiroshima Process—as a
critical forum for developing shared principles for the governance of AI—as well as the United
Kingdom’s leadership in hosting a Summit on AI Safety, and India’s leadership as Chair of the Global
Partnership on AI. We also are discussing AI with the UN and Member States in various UN fora.
Last month, President Biden met with top experts and researchers in San Francisco as part of
his commitment to seizing the opportunities and managing the risks posed by AI, building on
the President’s ongoing engagement with leading AI experts.
In May, the President and Vice President convened the CEOs of four American companies at
the forefront of AI innovation—Google, Anthropic, Microsoft, and OpenAI—to underscore
their responsibility and emphasize the importance of driving responsible, trustworthy, and
ethical innovation with safeguards that mitigate risks and potential harms to individuals and
our society. At the companies’ request, the White House hosted a subsequent meeting
focused on cybersecurity threats and best practices.
President Biden signed an Executive Order that directs federal agencies to root out bias in
the design and use of new technologies, including AI, and to protect the public from
algorithmic discrimination.
Earlier this year, the National Science Foundation announced a $140 million investment to
establish seven new National AI Research Institutes, bringing the total to 25 institutions
across the country.
The Biden-Harris Administration has also released a National AI R&D Strategic Plan to
advance responsible AI.
The Office of Management and Budget will soon release draft policy guidance for federal
agencies to ensure the development, procurement, and use of AI systems is centered around
safeguarding the American people’s rights and safety.
https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024
The UK and Republic of Korea governments announced that the following organisations have agreed
to the Frontier AI Safety Commitments:
Amazon
Anthropic
Cohere
G42
IBM
Inflection AI
Meta
Microsoft
Mistral AI
Naver
OpenAI
Samsung Electronics
xAI
Zhipu.ai
The above organisations, in furtherance of safe and trustworthy AI, undertake to develop and deploy
their frontier AI models and systems[footnote 1] responsibly, in accordance with the following voluntary
commitments, and to demonstrate how they have achieved this by publishing a safety framework
focused on severe risks by the upcoming AI Summit in France.
Given the evolving state of the science in this area, the undersigned organisations’ approaches (as
detailed in paragraphs I-VIII) to meeting Outcomes 1, 2 and 3 may evolve in the future. In such
instances, organisations will provide transparency on this, including their reasons, through public
updates.
The above organisations also affirm their commitment to implement current best practices related to
frontier AI safety, including: internal and external red-teaming of frontier AI models and systems for
severe and novel threats; to work toward information sharing; to invest in cybersecurity and insider
threat safeguards to protect proprietary and unreleased model weights; to incentivize third-party
discovery and reporting of issues and vulnerabilities; to develop and deploy mechanisms that enable
users to understand if audio or visual content is AI-generated; to publicly report model or system
capabilities, limitations, and domains of appropriate and inappropriate use; to prioritize research on
societal risks posed by frontier AI models and systems; and to develop and deploy frontier AI models
and systems to help address the world’s greatest challenges.
Outcome 1. Organisations effectively identify, assess and manage risks when developing and
deploying their frontier AI models and systems. They will:
I. Assess the risks posed by their frontier models or systems across the AI lifecycle, including before
deploying that model or system, and, as appropriate, before and during training. Risk assessments
should consider model capabilities and the context in which they are developed and deployed, as
well as the efficacy of implemented mitigations to reduce the risks associated with their foreseeable
use and misuse. They should also consider results from internal and external evaluations as
appropriate, such as by independent third-party evaluators, their home governments[footnote 2], and
other bodies their governments deem appropriate.
II. Set out thresholds[footnote 3] at which severe risks posed by a model or system, unless adequately
mitigated, would be deemed intolerable. Assess whether these thresholds have been breached,
including monitoring how close a model or system is to such a breach. These thresholds should be
defined with input from trusted actors, including organisations’ respective home governments as
appropriate. They should align with relevant international agreements to which their home
governments are party. They should also be accompanied by an explanation of how thresholds were
decided upon, and by specific examples of situations where the models or systems would pose
intolerable risk.
III. Articulate how risk mitigations will be identified and implemented to keep risks within defined
thresholds, including safety and security-related risk mitigations such as modifying system
behaviours and implementing robust security controls for unreleased model weights.
IV. Set out explicit processes they intend to follow if their model or system poses risks that meet or
exceed the pre-defined thresholds. This includes processes to further develop and deploy their
systems and models only if they assess that residual risks would stay below the thresholds. In the
extreme, organisations commit not to develop or deploy a model or system at all, if mitigations
cannot be applied to keep risks below the thresholds.
V. Continually invest in advancing their ability to implement commitments I-IV, including risk
assessment and identification, thresholds definition, and mitigation effectiveness. This should include
processes to assess and monitor the adequacy of mitigations, and identify additional mitigations as
needed to ensure risks remain below the pre-defined thresholds. They will contribute to and take
into account emerging best practice, international standards, and science on AI risk identification,
assessment, and mitigation.
Outcome 2. Organisations are accountable for safely developing and deploying their
frontier AI models and systems. They will:
VI. Adhere to the commitments outlined in I-V, including by developing and continuously reviewing
internal accountability and governance frameworks and assigning roles, responsibilities and sufficient
resources to do so.
VII. Provide public transparency on the implementation of the above (I-VI), except insofar as doing so
would increase risk or divulge sensitive commercial information to a degree disproportionate to the
societal benefit. They should still share more detailed information which cannot be shared publicly
with trusted actors, including their respective home governments or appointed body, as appropriate.
VIII. Explain how, if at all, external actors, such as governments, civil society, academics, and the
public are involved in the process of assessing the risks of their AI models and systems, the adequacy
of their safety framework (as described under I-VI), and their adherence to that framework.
1. We define ‘frontier AI’ as highly capable general-purpose AI models or systems that can
perform a wide variety of tasks and match or exceed the capabilities present in the most
advanced models. ↩
2. We define “home governments” as the government of the country in which the organisation
is headquartered. ↩
https://www.medianama.com/2023/07/223-google-microsoft-openai-collaborate-to-establish-frontier-model-forum-for-artificial-intelligence-2/
Google, Microsoft, OpenAI collaborate to establish ‘Frontier Model Forum’ for Artificial Intelligence
Google, Microsoft, OpenAI and Anthropic have collaborated to establish the Frontier Model Forum,
an industry body for ensuring “safe and responsible development of frontier AI models”. According
to a joint statement dated July 26, the forum will primarily work towards evaluation of AI models for
safety, facilitating research in AI safety mechanisms and sharing such knowledge with governments,
academia and civil society groups to protect people from AI-related harms.
Why it matters: The collaboration comes after tech companies like Google, Microsoft, OpenAI,
Anthropic and Inflection voluntarily committed to adopting seven key measures for safe deployment
of AI products in a meeting convened by the US government. Tech companies like OpenAI have in the
past acknowledged the risks posed by AI systems and have advocated for independent testing of
these tools before they are launched. As governments struggle to arrive at a consensus for
establishing the best-suited regulatory framework for AI, the alliance between big tech companies
can be seen as an effort to self-regulate and discourage greater government regulation.
“The Forum will coordinate research to progress these efforts in areas such as adversarial
robustness, mechanistic interpretability, scalable oversight, independent research access, emergent
behaviors and anomaly detection. There will be a strong focus initially on developing and sharing a
public library of technical evaluations and benchmarks for frontier AI models,” the statement
informs.
1. Advancing AI safety research to promote responsible development of frontier models and
minimize potential risks.
2. Identifying best practices for the responsible deployment of frontier models and helping
people understand the “nature, capabilities, limitations, and impact of the technology”.
3. Collaborating with governments, policymakers, academics, civil society and companies, and
share knowledge about AI safety and risks.
4. Supporting efforts to develop applications that can help towards meeting larger public goals
like mitigating impact of climate change.
Course of action: According to the statement, the Forum will soon establish an Advisory Board,
which will guide the frontier members on the strategy and priorities of the group. The founding
companies will also work on “institutional arrangements including a charter, governance and funding
with a working group and executive board”.
https://blogs.microsoft.com/on-the-issues/2023/07/26/anthropic-google-microsoft-openai-launch-frontier-model-forum/
Microsoft, Anthropic, Google, and OpenAI are launching the Frontier Model Forum, an
industry body focused on ensuring safe and responsible development of frontier AI models.
The Forum aims to help (i) advance AI safety research to promote responsible development
of frontier models and minimize potential risks, (ii) identify safety best practices for frontier
models, (iii) share knowledge with policymakers, academics, civil society, and others to
advance responsible AI development; and (iv) support efforts to leverage AI to address
society’s biggest challenges.
The Frontier Model Forum will establish an Advisory Board to help guide its strategy and
priorities.
The Forum welcomes participation from other organizations developing frontier AI models
willing to collaborate toward the safe advancement of these models.
July 26, 2023 – Today, Anthropic, Google, Microsoft, and OpenAI are announcing the formation of
the Frontier Model Forum, a new industry body focused on ensuring safe and responsible
development of frontier AI models. The Frontier Model Forum will draw on the technical and
operational expertise of its member companies to benefit the entire AI ecosystem, such as through
advancing technical evaluations and benchmarks, and developing a public library of solutions to
support industry best practices and standards.
1. Advancing AI safety research to promote the responsible development of frontier models and
minimize potential risks.
2. Identifying safety best practices for frontier models.
3. Sharing knowledge with policymakers, academics, civil society, and others to advance
responsible AI development.
4. Supporting efforts to develop applications that can help meet society’s greatest challenges,
such as climate change mitigation and adaptation, early cancer detection and prevention,
and combating cyber threats.
Membership criteria
The Forum defines frontier models as large-scale machine-learning models that exceed the
capabilities currently present in the most advanced existing models, and can perform a wide variety
of tasks.
Membership is open to organizations that:
Develop and deploy frontier models (as defined above).
Demonstrate strong commitment to frontier model safety, including through technical and
institutional approaches.
Are willing to contribute to advancing the Frontier Model Forum’s efforts including by
participating in joint initiatives and supporting the development and functioning of the
initiative.
The Forum welcomes organizations that meet these criteria to join this effort and collaborate on
ensuring the safe and responsible development of frontier AI models.
Governments and industry agree that, while AI offers tremendous promise to benefit the world,
appropriate guardrails are required to mitigate risks. Important contributions to these efforts have
already been made by the U.S. and UK governments, the European Union, the OECD, the G7 (via the
Hiroshima AI process), and others.
To build on these efforts, further work is needed on safety standards and evaluations to ensure
frontier AI models are developed and deployed responsibly. The Forum will be one vehicle for cross-
organizational discussions and actions on AI safety and responsibility.
The Frontier Model Forum will focus on three key areas over the coming year to support the safe and
responsible development of frontier AI models:
Identifying best practices: Promote knowledge sharing and best practices among industry,
governments, civil society, and academia, with a focus on safety standards and safety practices to
mitigate a wide range of potential risks.
Advancing AI safety research: Support the AI safety ecosystem by identifying the most important
open research questions on AI safety. The Forum will coordinate research to progress these efforts in
areas such as adversarial robustness, mechanistic interpretability, scalable oversight, independent
research access, emergent behaviors, and anomaly detection. There will be a strong focus initially on
developing and sharing a public library of technical evaluations and benchmarks for frontier AI
models.
Facilitating information sharing among companies and governments: Establish trusted, secure
mechanisms for sharing information among companies, governments, and relevant stakeholders
regarding AI safety and risks. The Frontier Model Forum will follow best practices in responsible
disclosure from areas such as cybersecurity.
Kent Walker, President, Global Affairs, Google & Alphabet said: “We’re excited to work together
with other leading companies, sharing technical expertise to promote responsible AI innovation.
Engagement by companies, governments, and civil society will be essential to fulfill the promise of AI
to benefit everyone.”
Brad Smith, Vice Chair & President, Microsoft said: “Companies creating AI technology have a
responsibility to ensure that it is safe, secure, and remains under human control. This initiative is a
vital step to bring the tech sector together in advancing AI responsibly and tackling the challenges so
that it benefits all of humanity.”
Anna Makanju, Vice President of Global Affairs, OpenAI said: “Advanced AI technologies have the
potential to profoundly benefit society, and the ability to achieve this potential requires oversight
and governance. It is vital that AI companies – especially those working on the most powerful
models – align on common ground and advance thoughtful and adaptable safety practices to ensure
powerful AI tools have the broadest benefit possible. This is urgent work and this forum is
well-positioned to act quickly to advance the state of AI safety.”
Dario Amodei, CEO, Anthropic said: “Anthropic believes that AI has the potential to fundamentally
change how the world works. We are excited to collaborate with industry, civil society, government,
and academia to promote safe and responsible development of the technology. The Frontier Model
Forum will play a vital role in coordinating best practices and sharing research on frontier AI safety.”
Over the coming months, the Frontier Model Forum will establish an Advisory Board to help guide its
strategy and priorities, representing a diversity of backgrounds and perspectives.
The founding Frontier Model Forum companies will also establish key institutional arrangements
including a charter, governance, and funding with a working group and executive board to lead these
efforts. We plan to consult with civil society and governments in the coming weeks on the design of
the Forum and on meaningful ways to collaborate.
The Frontier Model Forum welcomes the opportunity to help support and feed into existing
government and multilateral initiatives such as the G7 Hiroshima process, the OECD’s work on AI
risks, standards, and social impact, and the U.S.-EU Trade and Technology Council.
The Forum will also seek to build on the valuable work of existing industry, civil society, and research
efforts across each of its workstreams. Initiatives such as the Partnership on
AI and MLCommons continue to make important contributions across the AI community, and the
Frontier Model Forum will explore ways to collaborate with and support these and other valuable
multistakeholder efforts.
https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/
Google DeepMind has consistently pushed the boundaries of AI, developing models that have
transformed our understanding of what's possible. We believe that AI technology on the horizon will
provide society with invaluable tools to help tackle critical global challenges, such as climate change,
drug discovery, and economic productivity. At the same time, we recognize that as we continue to
advance the frontier of AI capabilities, these breakthroughs may eventually come with new risks
beyond those posed by present-day models.
Today, we are introducing our Frontier Safety Framework — a set of protocols for proactively
identifying future AI capabilities that could cause severe harm and putting in place mechanisms to
detect and mitigate them. Our Framework focuses on severe risks resulting from powerful
capabilities at the model level, such as exceptional agency or sophisticated cyber capabilities. It is
designed to complement our alignment research, which trains models to act in accordance with
human values and societal goals, and Google’s existing suite of AI responsibility and safety practices.
The Framework is exploratory and we expect it to evolve significantly as we learn from its
implementation, deepen our understanding of AI risks and evaluations, and collaborate with
industry, academia, and government. Even though these risks are beyond the reach of present-day
models, we hope that implementing and improving the Framework will help us prepare to address
them. We aim to have this initial framework fully implemented by early 2025.
The framework
The first version of the Framework announced today builds on our research on evaluating critical
capabilities in frontier models, and follows the emerging approach of Responsible Capability
Scaling. The Framework has three key components:
1. Identifying capabilities a model may have with potential for severe harm. To do this, we
research the paths through which a model could cause severe harm in high-risk domains,
and then determine the minimal level of capabilities a model must have to play a role in
causing such harm. We call these “Critical Capability Levels” (CCLs), and they guide our
evaluation and mitigation approach.
2. Evaluating our frontier models periodically to detect when they reach these Critical
Capability Levels. To do this, we will develop suites of model evaluations, called “early
warning evaluations,” that will alert us when a model is approaching a CCL, and run them
frequently enough that we have notice before that threshold is reached.
3. Applying a mitigation plan when a model passes our early warning evaluations. This should
take into account the overall balance of benefits and risks, and the intended deployment
contexts. These mitigations will focus primarily on security (preventing the exfiltration of
models) and deployment (preventing misuse of critical capabilities).
This diagram illustrates the relationship between these components of the Framework.
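As a rough, hypothetical sketch of how these three components could interact in practice: the domains mirror those named below, but the scores, thresholds, and warning margin are invented for illustration and are not DeepMind’s actual CCLs or mitigations.

```python
# Hypothetical sketch of an early-warning loop tied to Critical Capability
# Levels (CCLs). Scores, thresholds, and mitigation tiers are illustrative.

CCLS = {
    "autonomy": 0.8,
    "biosecurity": 0.7,
    "cybersecurity": 0.75,
    "ml_r_and_d": 0.85,
}

# Early-warning margin: act before the CCL itself is reached.
WARNING_MARGIN = 0.1


def check_early_warning(eval_scores: dict) -> dict:
    """Map each domain to the (illustrative) action its score implies."""
    actions = {}
    for domain, ccl in CCLS.items():
        score = eval_scores.get(domain, 0.0)
        if score >= ccl:
            actions[domain] = "apply security + deployment mitigations before further scaling"
        elif score >= ccl - WARNING_MARGIN:
            actions[domain] = "early warning: prepare mitigation plan, increase eval frequency"
        else:
            actions[domain] = "below warning threshold: continue periodic evaluation"
    return actions


if __name__ == "__main__":
    latest = {"autonomy": 0.72, "biosecurity": 0.3, "cybersecurity": 0.5, "ml_r_and_d": 0.2}
    for domain, action in check_early_warning(latest).items():
        print(domain, "->", action)
```

Running such checks on a fixed cadence is what gives the “early warning” property: mitigation plans can be prepared before a CCL is actually reached, rather than after.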
Our initial set of Critical Capability Levels is based on investigation of four domains: autonomy,
biosecurity, cybersecurity, and machine learning research and development (R&D). Our initial
research suggests the capabilities of future foundation models are most likely to pose severe risks in
these domains.
On autonomy, cybersecurity, and biosecurity, our primary goal is to assess the degree to which threat
actors could use a model with advanced capabilities to carry out harmful activities with severe
consequences. For machine learning R&D, the focus is on whether models with such capabilities
would enable the spread of models with other critical capabilities, or enable rapid and
unmanageable escalation of AI capabilities. As we conduct further research into these and other risk
domains, we expect these CCLs to evolve and for several CCLs at higher levels or in other risk
domains to be added.
To allow us to tailor the strength of the mitigations to each CCL, we have also outlined a set of
security and deployment mitigations. Higher level security mitigations result in greater protection
against the exfiltration of model weights, and higher level deployment mitigations enable tighter
management of critical capabilities. These measures, however, may also slow down the rate of
innovation and reduce the broad accessibility of capabilities. Striking the optimal balance between
mitigating risks and fostering access and innovation is paramount to the responsible development of
AI. By weighing the overall benefits against the risks and taking into account the context of model
development and deployment, we aim to ensure responsible AI progress that unlocks transformative
potential while safeguarding against unintended consequences.
The research underlying the Framework is nascent and progressing quickly. We have invested
significantly in our Frontier Safety Team, which coordinated the cross-functional effort behind our
Framework. Their remit is to progress the science of frontier risk assessment, and refine our
Framework based on our improved knowledge.
The team developed an evaluation suite to assess risks from critical capabilities, particularly
emphasising autonomous LLM agents, and road-tested it on our state-of-the-art models. Their recent
paper describing these evaluations also explores mechanisms that could form a future “early
warning system”. It describes technical approaches for assessing how close a model is to success at a
task it currently fails to do, and also includes predictions about future capabilities from a team of
expert forecasters.
We will review and evolve the Framework periodically. In particular, as we pilot the Framework and
deepen our understanding of risk domains, CCLs, and deployment contexts, we will continue our
work in calibrating specific mitigations to CCLs.
At the heart of our work are Google’s AI Principles, which commit us to pursuing widespread benefit
while mitigating risks. As our systems improve and their capabilities increase, measures like the
Frontier Safety Framework will ensure our practices continue to meet these commitments.
We look forward to working with others across industry, academia, and government to develop and
refine the Framework. We hope that sharing our approaches will facilitate work with others to agree
on standards and best practices for evaluating the safety of future generations of AI models.