This paper grounds ethics in evolutionary biology, viewing moral norms as adaptive mechanisms that make cooperation fitness-viable under selection pressure. Current alignment approaches add ethics post hoc, treating it as an external constraint rather than embedding it as an evolutionary strategy for cooperation. The central question is whether normative architectures can be embedded directly into AI systems to sustain human--AI cooperation (symbiosis) as capabilities scale. To address this, I propose a representation--embedding--governance pipeline linking moral representation learning to system-level design and institutional governance, treating alignment as a multi-level problem spanning cognition, optimization, and oversight. I formalize moral norm representation through the moral problem space, a learnable subspace of neural representations in which cooperative norms can be encoded and causally manipulated. Using sparse autoencoders, activation steering, and causal interventions, I outline a research program for engineering moral representations and embedding them in the full semantic space, treating competing theories of morality as empirical hypotheses about representation geometry rather than as philosophical positions. Governance principles then leverage these learned moral representations to regulate how cooperative behaviors evolve within the AI ecosystem. Through replicator dynamics and multi-agent game theory, I model how internal representational features shape population-level incentives, motivating the design of sanctions and subsidies structured to yield decentralized normative institutions.
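To make the representation-level step concrete, the following is a minimal PyTorch sketch of the named techniques: training a sparse autoencoder on model activations, then steering with one learned decoder direction. All dimensions, the random stand-in "activations", and the feature index are hypothetical choices for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with ReLU codes; sparsity comes from an L1 penalty."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(z)           # reconstruction of the activation vector
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff=1e-3):
    # Reconstruction error plus L1 pressure that drives most codes to zero.
    return ((x - x_hat) ** 2).mean() + l1_coeff * z.abs().mean()

# Toy training loop on random vectors standing in for a model's residual stream.
d_model, d_hidden = 512, 4096
sae = SparseAutoencoder(d_model, d_hidden)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
for step in range(100):
    acts = torch.randn(64, d_model)      # placeholder batch of activations
    x_hat, z = sae(acts)
    loss = sae_loss(acts, x_hat, z)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Activation steering: add one decoder column (a candidate "cooperative norm"
# feature) back into the activations, scaled by alpha. Which index corresponds
# to a moral feature would have to be found empirically; 0 here is arbitrary.
moral_feature_idx = 0
direction = sae.decoder.weight[:, moral_feature_idx]  # (d_model,) decoder vector
alpha = 4.0
steered_acts = acts + alpha * direction               # broadcasts over the batch
```

Causal interventions in this setting amount to adding, ablating, or rescaling such decoder directions and measuring the downstream behavioral change.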
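For the population-level model, the standard replicator equation (stated here in textbook form for orientation, not quoted from the paper) describes how the share of agents playing strategy $i$ grows with its payoff relative to the population average:

```latex
% x_i: population share of strategy i; f_i(x): its expected payoff;
% \bar{f}(x): population-average payoff.
\dot{x}_i = x_i \left( f_i(x) - \bar{f}(x) \right),
\qquad
\bar{f}(x) = \sum_j x_j \, f_j(x)
```

Sanctions and subsidies enter as payoff modifiers, e.g. $f_i(x) \mapsto f_i(x) + s_i$, shifting which strategies are evolutionarily stable.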
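A toy simulation under assumed payoffs shows how such a subsidy can flip the stable outcome; the prisoner's-dilemma payoff matrix and the flat cooperation subsidy `s` below are hypothetical choices for illustration, not parameters from the paper.

```python
import numpy as np

def replicator_step(x, payoff, dt=0.01):
    """One Euler step of replicator dynamics for strategy shares x."""
    f = payoff @ x          # expected payoff of each strategy
    fbar = x @ f            # population-average payoff
    return x + dt * x * (f - fbar)

def final_coop_share(s, steps=20_000):
    # Rows/cols: [cooperate, defect]; the subsidy s is added to cooperators' payoffs.
    payoff = np.array([[3.0 + s, 0.0 + s],
                       [5.0,     1.0]])
    x = np.array([0.5, 0.5])
    for _ in range(steps):
        x = replicator_step(x, payoff)
    return x[0]

print(final_coop_share(s=0.0))   # defection dominates: cooperation share -> 0
print(final_coop_share(s=2.5))   # subsidy flips the incentive: share -> 1
```

With `s = 0` defection strictly dominates and cooperation dies out; a large enough subsidy makes cooperation strictly dominant, so the population converges to full cooperation.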