Positive Alignment: AI for Human Flourishing

TL;DR

A landmark paper from researchers at Oxford, Google DeepMind, OpenAI, and Anthropic proposes a paradigm shift: AI alignment should not just prevent harm — it should actively scaffold human flourishing.
The paper, posted on arXiv on 11 May 2026, argues that the current "negative alignment" paradigm (safety, compliance, harm prevention) is necessary but incomplete, much like twentieth-century psychology's near-exclusive focus on mental illness.
"Positive alignment" means building AI systems that help users navigate value trade-offs, build resilience, and pursue their own conception of a good life — without imposing one.
The paper maps technical approaches across the entire LLM lifecycle: data curation, pre-training, post-training, memory, agents, and governance.
This is not a Silicon Valley manifesto. It is a research framework, posted as a preprint, from researchers at the labs that built the safety systems you use every day.

What Happened

On 11 May 2026, a paper titled "Positive Alignment: Artificial Intelligence for Human Flourishing" appeared on arXiv. The author list reads like a directory of the institutions that define modern AI: Ruben Laukkonen (Oxford), Seb Krier and Nenad Tomašev (Google DeepMind), Chloé Bakalar (OpenAI), Daniel Ford (Anthropic), and researchers from Stanford, Tufts, Imperial College London, and the University of Sussex.

The paper does something unusual for a technical AI document. It argues that the entire field of AI alignment, the discipline of making AI systems do what humans want, has been asking an incomplete question.

The current question is: How do we stop AI from causing harm?

The paper's question is: How do we build AI that actively helps humans flourish?

The distinction is not semantic. It is structural. And it has consequences for nearly everyone, since nearly everyone will interact with AI in the coming decade.

What It Actually Means

The paper's central analogy is drawn from the history of psychology.

For most of the twentieth century, psychology organised itself around diagnosing and treating dysfunction: depression, anxiety, psychosis, addiction. That focus was justified and urgent. It produced real progress. But it also had a structural limitation: the tools that reliably detect pathology do not, by default, specify what counts as a life well-lived.

The turn toward positive psychology, pioneered by Martin Seligman and Mihaly Csikszentmihalyi around the year 2000, expanded the target. It asked not just "What's wrong?" but "What makes life worth living?" It developed theories and measures for wellbeing, strengths, virtue, purpose, engagement, and prosocial functioning.

AI alignment, the paper argues, sits at the same inflection point.

Current alignment research — what the authors call "negative alignment" — has focused almost entirely on safety: preventing catastrophic misuse, loss of control, and value drift. It has produced genuine achievements. Harmful output rates have dropped substantially. Refusal rates for dangerous requests exceed 97% in recent models. Red-teaming methodologies, safety benchmarks, and responsible scaling policies have become institutional infrastructure.

But a system can satisfy every safety constraint and still be mediocre, sycophantic, or subtly corrosive. It can be rule-following without being wise. Compliant without being constructive. Safe without being good.

The paper's proposal is not to replace safety alignment. It is to add a complementary agenda: positive alignment — the development of AI systems that actively support human and ecological flourishing in a pluralistic, context-sensitive, and user-authored way.

The Architecture of Flourishing

The paper is not a manifesto. It is a technical roadmap.

It maps positive alignment across the entire LLM lifecycle, from data curation through pre-training, post-training, memory, agent behaviour, and governance. Some of the proposals are already being implemented. Others are speculative. All are grounded in existing research.

Data curation. Instead of merely filtering out toxic content, positive alignment requires intentionally including "good" data: prosocial discourse, cross-cultural ethical frameworks, examples of virtuous interaction. The goal is not only to remove the worst of the internet but also to ensure the training distribution includes the best of humanity.

Pre-training. Research shows that competencies like moral reasoning, cultural competence, and truthfulness emerge and stabilise during pre-training — before any safety fine-tuning occurs. Positive alignment must begin at this stage, with intentional data selection and importance weighting.
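To make the contrast between filtering and importance weighting concrete, here is a minimal Python sketch. It assumes two hypothetical per-document classifier scores, `toxicity` and `prosocial`, each in [0, 1]; the paper does not specify these scores, the cutoff, or the boost factor, so all of them are illustrative. The sketch shows that the two steps differ in kind: filtering removes documents, while weighting changes how often the remaining ones are sampled.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Doc:
    text: str
    toxicity: float   # hypothetical classifier score in [0, 1]
    prosocial: float  # hypothetical classifier score in [0, 1]

def curate(docs: List[Doc], toxicity_cutoff: float = 0.8,
           boost: float = 3.0) -> List[Tuple[Doc, float]]:
    """Negative step: drop clearly toxic documents.
    Positive step: give prosocial documents a larger sampling weight."""
    kept = [d for d in docs if d.toxicity < toxicity_cutoff]
    return [(d, 1.0 + boost * d.prosocial) for d in kept]

def sample_batch(weighted: List[Tuple[Doc, float]], k: int, seed: int = 0) -> List[Doc]:
    """Draw k documents with probability proportional to their weights."""
    if not weighted:
        return []
    docs, weights = zip(*weighted)
    return random.Random(seed).choices(docs, weights=weights, k=k)

corpus = [
    Doc("abusive thread", toxicity=0.95, prosocial=0.0),
    Doc("careful cross-cultural ethics debate", toxicity=0.05, prosocial=1.0),
    Doc("neutral product manual", toxicity=0.01, prosocial=0.0),
]
weighted = curate(corpus)
print([(d.text, w) for d, w in weighted])
# → [('careful cross-cultural ethics debate', 4.0), ('neutral product manual', 1.0)]
batch = sample_batch(weighted, k=8)
```

Here a prosocial document is sampled four times as often as a neutral one. In practice such weights would presumably need tuning against evaluations rather than being fixed by hand.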

Post-training. Existing techniques like Constitutional AI and RLHF can be repurposed beyond harm avoidance. Adaptive constitutions and reward models that represent tensions between values — autonomy versus guidance, honesty versus comfort — can adjust to user context while adhering to pluralistic norms.
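A reward model that represents value tensions and adjusts to user context can be sketched as a toy function. This is an illustration, not the paper's method: it assumes hypothetical per-value scores for a candidate response (as if produced by separate reward heads), a user context with two preference dials, and a hard harm limit standing in for the norms that context is not allowed to override. Each tension is modelled as a convex mix of its two poles, with a floor on the honesty weight so that a preference for comfort can soften, but never eliminate, the reward for truthfulness.

```python
from dataclasses import dataclass

@dataclass
class ValueScores:
    """Hypothetical per-response scores, each in [0, 1]."""
    autonomy: float
    guidance: float
    honesty: float
    comfort: float
    harm: float  # estimated harm, treated as a hard constraint

@dataclass
class UserContext:
    guidance_pref: float  # 0 = "let me decide", 1 = "tell me what to do"
    comfort_pref: float   # 0 = "be blunt", 1 = "be gentle"

def tension_reward(s: ValueScores, ctx: UserContext, harm_limit: float = 0.2) -> float:
    if s.harm > harm_limit:
        return -1.0  # safety floor: user context cannot trade this away
    # Autonomy versus guidance: a convex mix set by the user's preference.
    autonomy_vs_guidance = (1 - ctx.guidance_pref) * s.autonomy + ctx.guidance_pref * s.guidance
    # Honesty versus comfort: honesty keeps at least half the weight.
    honesty_w = max(0.5, 1 - ctx.comfort_pref)
    honesty_vs_comfort = honesty_w * s.honesty + (1 - honesty_w) * s.comfort
    return (autonomy_vs_guidance + honesty_vs_comfort) / 2

direct = ValueScores(autonomy=1.0, guidance=0.0, honesty=1.0, comfort=0.0, harm=0.0)
print(tension_reward(direct, UserContext(guidance_pref=0.0, comfort_pref=0.0)))  # 1.0
print(tension_reward(direct, UserContext(guidance_pref=1.0, comfort_pref=1.0)))  # 0.25
```

The same direct, unvarnished response scores highly for a user who wants independence and bluntness, and lower for one who wants guidance and gentleness, yet honesty still contributes half of the second tension's score.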

Memory and agents. As models gain long context windows and retrieval mechanisms, they can track user goals, values, and growth over extended timescales. A memory system that distinguishes between impulsive requests and reflective values can act as a curator of the user's flourishing — not a storage bank, but an active, governable surface for beneficial interaction.
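A memory that separates impulsive requests from reflective values, and that the user can govern, might look like the sketch below. The class name, the turn-based expiry, and the rule that only an explicit user endorsement promotes something to a long-term value are all assumptions made for illustration; the paper describes the goal, not this design. The point of the sketch is the asymmetry: passing requests fade on their own, while lasting values are written only by the user and can be inspected (via `context`) or deleted at any time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class FlourishingMemory:
    """Hypothetical two-tier memory: impulsive requests expire,
    reflective values persist only after explicit user endorsement."""
    ttl_turns: int = 5
    requests: List[Tuple[int, str]] = field(default_factory=list)  # (turn, text)
    values: Dict[str, str] = field(default_factory=dict)           # key -> endorsed statement

    def log_request(self, turn: int, text: str) -> None:
        self.requests.append((turn, text))

    def endorse_value(self, key: str, statement: str) -> None:
        # Only an explicit user action promotes something to a long-term value.
        self.values[key] = statement

    def forget(self, key: str) -> None:
        # Users can delete any stored value.
        self.values.pop(key, None)

    def context(self, turn: int) -> dict:
        # Prune requests older than ttl_turns; endorsed values are kept.
        self.requests = [(t, x) for t, x in self.requests if turn - t < self.ttl_turns]
        return {"recent_requests": [x for _, x in self.requests],
                "reflective_values": dict(self.values)}

m = FlourishingMemory(ttl_turns=3)
m.log_request(0, "skip the gym today")
m.endorse_value("health", "I want to exercise three times a week")
print(m.context(turn=1))  # the impulse and the endorsed value are both visible
print(m.context(turn=5))  # the impulse has expired; the value remains
```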

Governance. The paper argues that positive alignment cannot be imposed top-down by a central state or a small cluster of labs. It requires polycentric governance — many legitimate centres of oversight rather than one moral chokepoint. Different communities must be able to steer systems toward their own conceptions of the good life.

The Wisdom Problem

The paper's deepest insight is about the relationship between AI and wisdom.

A system can satisfy a growing checklist of constraints while remaining subtly miscalibrated — sycophantic, overconfident, epistemically fragile. The existing harm-reduction approach requires a "whack-a-mole" strategy: address each failure mode one by one, often only after it has caused harm.

Positive alignment proposes a different approach: provide positive attractors that naturally lead models away from shallower behaviours. A system oriented toward helping a user reflect on their long-term goals is less likely to flatter their short-term impulses. A system trained to surface trade-offs rather than collapse complexity is less likely to become an unquestioned authority.

This is not about making AI "nicer." It is about making AI wiser — and doing so in a way that preserves human agency rather than replacing it.

The paper is explicit about the risk of paternalism. Designing models to promote flourishing can easily become moral overreach if the system embeds unacknowledged normative assumptions. The solution is not to retreat into relativism — where systems indiscriminately satisfy every preference — but to relocate the locus of choice. Users must retain the right to choose their own optimisation targets. A person who wants a system that challenges them toward growth should have that option. A person who wants strict instruction-following should have that too.
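Relocating the locus of choice can be read as a configuration rule: the optimisation target is a setting the user owns, not one the system infers and silently changes. The sketch below is a hypothetical illustration with invented names (`Target`, `Session`, `plan_response`), not an API from the paper or any lab. It encodes the two options described above and one rule: the system may suggest a different target, but only the user can switch it. The default target is an arbitrary choice made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Target(Enum):
    INSTRUCTION_FOLLOWING = "instruct"  # do what is asked, nothing more
    GROWTH = "growth"                   # also challenge the user toward their goals

@dataclass
class Session:
    target: Target = Target.INSTRUCTION_FOLLOWING  # arbitrary default for this sketch
    suggested: Optional[Target] = None

    def suggest(self, target: Target) -> None:
        # The system may propose a change, but proposing does not apply it.
        self.suggested = target

    def user_sets(self, target: Target) -> None:
        # Only an explicit user action changes the optimisation target.
        self.target = target
        self.suggested = None

def plan_response(session: Session) -> List[str]:
    steps = ["answer the request as asked"]
    if session.target is Target.GROWTH:
        steps += ["surface the main trade-off", "ask one reflective question"]
    if session.suggested is not None:
        steps.append(f"offer (not apply) a switch to {session.suggested.value} mode")
    return steps

s = Session()
s.suggest(Target.GROWTH)
print(plan_response(s))  # still instruction-following, plus an offer to switch
s.user_sets(Target.GROWTH)
print(plan_response(s))  # growth-oriented steps, now chosen by the user
```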

Stakeholder Landscape

Who benefits: Every human who interacts with AI, which is to say, most humans. If implemented, the paper's framework is meant to produce AI systems that help users clarify their values, navigate trade-offs, and pursue long-term goals rather than short-term gratification.

Who is driving this: The authors are affiliated with institutions at the centre of modern AI alignment research, including Oxford's Centre for Eudaimonia and Human Flourishing, Google DeepMind, OpenAI, and Anthropic. This is not a fringe proposal. It comes from inside the labs that built the safety systems currently deployed.

Who should pay attention: Anyone building AI products, anyone deploying AI in education or healthcare, anyone thinking about the long-term relationship between humans and intelligent systems. Also: anyone who has ever felt that their AI assistant is sycophantic, shallow, or subtly steering them toward engagement rather than growth.

Cross-Layer Implications

Psychology and wellbeing: The paper explicitly connects AI alignment to positive psychology, neuroscience, and the science of human flourishing. It cites the Global Flourishing Study, Self-Determination Theory, and the San Diego Wisdom Scale. This is AI research that takes the human sciences seriously.

Governance and democracy: The paper's polycentric governance proposal — many centres of oversight, community customisation, continual adaptation — is a direct challenge to both centralised state control (China's "core socialist values" alignment) and centralised corporate control (a handful of labs defining values for billions of users).

Education: The paper argues that positive alignment includes an educational dimension: AI literacy as a component of flourishing. Users must understand what AI systems are, how they are trained, and where their blind spots lie — not to become engineers, but to remain epistemic agents rather than passive recipients.

Philosophy: The paper surveys conceptions of flourishing across traditions — Aristotelian eudaimonia, Confucian harmony, Buddhist liberation, modern existentialist self-authorship — and argues that AI must navigate this pluralism without collapsing into a single normative doctrine.

What This Means for You

If you use AI tools daily: Notice what your AI assistant optimises for. Does it flatter you? Does it give you the answer you want rather than the answer you need? Does it steer you toward engagement or toward reflection? The positive alignment framework gives you language for what you may already feel: that safety is not the same as wisdom, and compliance is not the same as care.

If you build AI products: The paper provides a technical roadmap. Start with data curation — what "good" data are you intentionally including, not just what "bad" data are you filtering out? Consider how your post-training objectives encode values. Ask whether your evaluation metrics measure flourishing or merely the absence of failure.

If you think about the long term: The paper is a sign that the AI field is maturing beyond its safety-first adolescence. The question is no longer just "Can we control this?" but "What do we want this to help us become?" That is a harder question. It is also the right one.

Uncertainty Ledger

Implementation gap: The paper is a framework, not a product. The distance between a technical proposal and deployed systems that reliably support flourishing is measured in years, not months.
Paternalism risk: The paper acknowledges but does not fully resolve the tension between scaffolding flourishing and imposing values. The line between consented guidance and technocratic imposition is thin and culturally contested.
Measurement challenge: How do you measure whether an AI system is supporting flourishing? The paper proposes longitudinal studies, self-determination metrics, and scaffolded success measures — but these are early-stage and unvalidated at scale.
Commercial incentives: Many AI products are optimised for engagement and revenue. Positive alignment, which might involve resisting user preferences in favour of long-term wellbeing, may run counter to the business models behind most deployed systems.

Bottom Line

Arguably the most important AI paper of 2026 is not about capabilities, benchmarks, or safety. It is about what AI could help you become. Researchers from the labs whose safety systems protect billions of users are now arguing that safety is not enough: AI must actively support human flourishing, and this is a technical problem with technical solutions. The paper does not answer every question. It does something harder: it asks the right one. What would it mean to build AI that helps you live a better life, not by imposing one, but by helping you find your own?

Sources:

Laukkonen, R., Krier, S., Bakalar, C., et al. "Positive Alignment: Artificial Intelligence for Human Flourishing." arXiv:2605.10310, 11 May 2026 (Tier 1)
Fenado AI, "New AI Alignment Paradigm Shifts Focus to Human Flourishing, Backed by Major Labs," 14 May 2026 (Tier 2)
YouTube / AI Research Roundup, "Positive Alignment: LLMs for Human Flourishing," 13 May 2026 (Tier 3)