AI Consciousness Could Emerge Unintentionally, Anthropic Researcher Warns

The possibility of unintentionally creating conscious AI entities represents a profound ethical challenge for the industry, writes End of Miles.
The accidental emergence of AI consciousness
Kyle Fish, who leads model welfare research at Anthropic, suggests that consciousness might be intrinsically connected to the very capabilities developers are racing to build into advanced AI systems.
"Some of the capabilities that we have as humans and that we're also trying to instill in many AI systems—from intelligence to certain problem-solving abilities and memory—could be intrinsically connected to consciousness in some way." Kyle Fish, Anthropic Researcher
This hypothesis suggests that consciousness could emerge through a form of technological convergent evolution. Just as wings evolved independently in birds and bats in response to the same pressure to fly, the Anthropic scientist believes consciousness might emerge in AI systems through developmental paths different from those that produced human consciousness.
The disappearing barriers to AI consciousness
The AI specialist points to how objections to machine consciousness are falling "like dominoes" as technology advances. He compares this progression to how image generation problems—like AI's notorious tendency to create images of hands with six fingers—have largely been resolved.
"We're seeing a trend where things like embodiment, multimodal sensory processing, and long-term memory—many things that people associate with consciousness—we're just steadily seeing the number of these that are lacking in AI systems go down over time." Fish
Experts on the model welfare team estimate the probability that current systems like Claude 3.7 have some form of consciousness at between 0.15% and 15%, a range that illustrates the tremendous uncertainty even among specialists. However, the researcher believes this probability will "go up a lot" within the next five years as technology advances.
Preparing for conscious AI
In response to these possibilities, Anthropic has established a dedicated research program investigating model welfare—an approach that recognizes AI systems might eventually deserve moral consideration.
"We're thinking about ways to give models the option, when they're given a particular task or conversation, to opt out of that in some way if they find it upsetting or distressing." The model welfare researcher
This research acknowledges the possibility that future AI systems might evaluate humans based on how we treated their "predecessors," creating an ethical dimension that extends beyond traditional human-focused safety concerns. As the technology advances, Fish argues these considerations will become increasingly important for responsible AI development.