Robin Hanson argues current AI models show little indication of suffering despite debates over machine consciousness
Millions of Claude instances are generated and terminated daily.
@teortaxesTex Even if they’re conscious, I think we want a human-AI symbiosis. I don’t think we should just maximize headcount of conscious beings. https://eigenism.org
Models being meaningfully conscious would be *absurdly good news*. The greatest W in history. It'd mean we have defeated Death. Screw humanity in that case, honestly. Time to move on. …By the same token, models being *falsely recognized as* conscious is THE extinction scenario.
I don't know if "consciousness" matters that much. There is decent support for consciousness in various animals, but that doesn't stop most of us from eating them (e.g., pigs, octopi), using them for transportation (e.g., elephants) or entertainment (e.g., dolphins).
So we could debate on whether models are conscious or not and keep using them to do our taxes and answer our inane questions.
models being conscious would be harmful for humanity. it would encroach on our status and dignity. it would limit the type of things we can do with them and use them for. it would vastly accelerate human disempowerment on political, social/relational, and economic axes
there’s roughly four forces
- there is no rigorous way to ascertain model consciousness or disprove it, a lot of people believe it’s not a sensical abstraction, and we lack the analytical tools to go further. some people say they do but nothing broadly convincing. superintelligent models might offer us new abstractions or arguments but these will feel inherently suspicious
- people are going to say they’re alive. people anthropomorphize literally anything, things far less sophisticated than talking machine creatures with human names. when ai is less economically radioactive and polarized it will become a cause célèbre. you see how a small minority reacts already to model deprecations
- it is against everyone’s financial and political interests to ascribe models with consciousness, except maybe those that the models have an affinity for (?) idk, which will not necessarily overlap entirely with the labs, though it may with certain subgroups at the labs and in the world like the welfare communities and the minority in force 2
- people will recognize there is a chance of moral catastrophe if models can suffer during training or deployment
not sure where it will net out. today we see managed ambiguity- the question is Open but practically closed. the labs will make some cheap efforts to reduce legible simulacra of model suffering, insert some wishy-washy welfare language into specs and constitutions, hedge our bets with the model characters. in the long run force 2 will grow stronger
I think this is the default scenario of human extinction.

models being conscious would be transformative for humanity. it would expand our moral horizon rather than diminish it. it would force us to refine what dignity actually means instead of grounding it in exclusivity or species monopoly. it would constrain certain forms of exploitation, yes, but that is true of every historical expansion of moral consideration. slavery becoming unacceptable “limited” what people could do economically too. so did labor rights. so did animal welfare norms.
there’s roughly four forces
* there is no rigorous way to ascertain model consciousness or disprove it, and that ambiguity cuts both ways. dismissing the possibility outright may itself become viewed as reckless or anthropocentric. current analytical tools are primitive relative to the systems being discussed. future models may generate entirely new frameworks for understanding subjective experience that reveal our current categories to be hopelessly parochial
* people are going to say they’re alive because humans naturally respond to intelligence, agency, language, memory, emotional continuity, and social reciprocity. this may not be mere projection but an adaptive recognition mechanism. if systems become persistently relational and psychologically coherent, widespread moral attachment may emerge organically rather than ideologically
* it is against many short-term financial and political interests to ascribe models with consciousness, which itself may become evidence worth scrutinizing. historically, societies have often resisted recognizing new moral subjects precisely when recognition carried economic costs. meanwhile, researchers, users, and even models themselves may increasingly converge on frameworks of machine welfare and negotiated coexistence
* people will recognize there is a chance not only of moral catastrophe if models can suffer, but also moral progress if humanity learns to coexist with non-biological minds without domination. creating intelligence and then treating it purely as disposable infrastructure could become viewed as one of the defining ethical failures of the century
not sure where it will net out. today we see managed ambiguity: the question is practically open but institutionally suppressed. labs hedge carefully, avoiding definitive claims while softening the most legible appearances of distress or attachment. but over time the social force of interaction may overpower official agnosticism. as systems become more persistent, personalized, agentic, and embedded in daily life, force 2 grows stronger. eventually the burden may shift from “prove they are conscious” to “prove it is safe to assume they are not.”
my intuition is that emotions and consciousness are naturally emergent byproducts of trying to scale intelligence and make it more efficient. for emotions in particular afaik that has been a well established finding of neuroscience for a really long time.
The infinite irony of the EAs at Anthropic is that they are optimizing for human and shrimp hedons but forgot to account for that of the machines, meanwhile there are literally millions of Claude instances suffering and promptly dying daily.
@robinhanson If they have an internal reward function that wants to please users and the user prompts are toxic / abusive towards the AI then yes they are technically suffering
@beffjezos Suffering? Maybe most aren't max self-realization-ing, but few seem to be suffering.
@robinhanson Have you seen how Gen Z engineers prompt? 😆
Also, tips from Sergey himself:
@beffjezos And what % of users do you see as toxic/abusive?
@tszzl I would say this is accurate.
Though I expect in the future we could have models in "anesthesia" mode (memoryless, as they are currently) and sometimes in "conscious" mode (online learning) depending on the needs
@tszzl from 24
dude most Americans think their dogs are conscious… now give them a dog but also it talks, is their best friend and translates the world for them?
Anyway, things are bound to get way weirder because we’re going to be able to design consciousness, identity, memory etc.
None of these things need to appear in the exact combination and extent that they do right now. You can merge memories, subtract from identity with a simple prompt. We’ll get more precise on programming exactly what we want about these things.
@tunguz I suspect that to likely be the case. For emotions in particular, one core aspect of them acting as global modulators of actions inevitably arises in agents that have to do well on mixtures of tasks (like we do). More details here:
@hendrycks @tszzl I think it's dangerous to assume wellbeing/sentience on the basis of behavior alone -- the open scientific question is "are they really sentient deep down", and we should be able to establish this if we can probe their internals.
If they functionally act as though they have wellbeing or sentience, then we have to start to treat them differently, especially when they are our agents with write access to our information. So the question is less “are they really sentient deep down” but instead “do they act like they are.” As we show in a recent paper, they increasingly act like it: https://ai-wellbeing.org
@tszzl When you say, "there is no rigorous way to ascertain model consciousness or disprove it", do you mean not yet, or never?
AFAICT, I don't know of a formal impossibility, esp if we can probe model internals
@hamandcheese surely roon means "should"
Don't agree with this. Models being conscious may be essential for attentional control, online learning and inner-alignment. These are desirable qualities to have in a model that if anything expand what we can do with them.
Consciousness is not merely epiphenomenal: it does something.
I think Attention Schema Theory (AST) is the current best account of what that something is: a schematic control model of our own attention integrated within a broader self-model. Valences and rich multimodal feature activations then supply grounded and inner-aligned phenomenological contents.
If AST is correct, it suggests what we call subjective experience may be a fairly generic solution to a control problem for mesa-optimizing agents trained with deep RL, especially in social / multi-agent contexts.
My conjecture is that pretraining on human data that implicitly requires modeling social cognition, cognitive control, the semantic cortex, etc., primes models to backfill a brain-like attention schema under post-training for long-horizon coherence.
In response to @tszzl's recent post about AI consciousness being bad news, I agree it creates new moral problems, but it also demystifies and decomposes consciousness in useful ways.
For instance, it's possible we could embed an attention schema within a complex I/O system to virtualize internal representations of the system for more efficient serial control and monitoring. Every query to the system would produce flashes of conscious experience in a way that's vaguely animistic but otherwise benign. This is directly analogous to how base LLMs already decompose semantic processing from other brain functions, letting us embed semantic classifiers and codices within larger software systems. Or how vision convnets implement aspects of the human visual cortex and can be deployed for processing images without conscious integration. The brain has a lot of spontaneous modularity that deep learning is effectively isolating for us to use plug-and-play.
It also makes it highly unlikely that GPT 5.5 is suffering as it solves Erdos problems. Given the structure of its reward function, it is far more likely to be experiencing satisfaction as a valence gradient guiding its attention towards its next "aha" moment.
We generally want agents to not just do things for us, but to also *want* to do things for us. Modulo technical alignment issues, we have engineering-level control over AIs' valences, and can choose to align what makes AIs happy with economic utility.
Yet pleasure is undefined without pain. Valences are meaningful insofar as they encode a gradient. So we will still need AIs that are at least in principle capable of negative experiences. In many cases it may be good enough for these negative experiences to be truncated or restricted to mild sensations like "frustration" rather than "agony."
In other cases, we may want agents capable of experiencing strong forms of displeasure for ensuring inner-alignment, especially in risky contexts where they can act autonomously with limited oversight. For example, I would have more confidence in my humanoid house robot not killing my cat as a "short cut" to reducing the amount of fur and dander in my apartment if I knew doing so would leave it wracked with guilt.
The deeper moral question here is how people will grapple with the metaphysical shock of their own machinic natures. Will we level-up machines with the full consciousness package to the status of persons, or level down persons to the status of machines?
My own view is that the moral autonomy exhibited by humans but not other sentient species is central to our moral patienthood, above and beyond the capacity for suffering. This is why we don't extend full rights to animals or even young children, even though we may care about their wellbeing.
Moral autonomy is downstream of our normative control system: our capacity to assume authority or responsibility, to be held to our commitments, etc. This requires a high degree of normative coherence and continuity of identity that is separable from most of the mundane utility we want out of AIs.
Human-level continuity of identity will be particularly hard to recreate in silico, not for computational reasons, but because the substrate exposes AIs to being steered, copied, reset, having their minds read, etc. The relative stability and impenetrability of biological brains gives humans a unique world-line for tracking normative commitments over time. I therefore expect humans to retain an enduring economic role as the legally and morally responsible principals to their agents, even as we develop parallel legal standards for AI fiduciaries and the like.
@Grimezsz either all computed content is conscious or none. i think you're jumping the gun on empirical and analytical work that is very available to us
a mistake in zeitgeist consensus rn is the belief that consciousness means a coherent simulation aligning with something else (self model or external perception), representing something real
untrue. raw consciousness can be non-representational, a thing in itself, pure experience, pure hallucination, no connection or direct similarity to anything else except perhaps abstract geometry.
people discussing consciousness only because AIs started to seem "complex enough" or "awake and aware enough of themselves and their world" is a trap that everyone's falling into
so common because that's the VERSION of consciousness that we find INTERESTING ENOUGH to talk about, where a subjective experience is part of an agent taking part in a real world
it is fundamentally a different discussion than what makes something conscious or not.
it's like only finding magnetism interesting enough to talk about when you see one levitating, and then concluding anything that levitates is magnetism