Every engineering discipline pays a safety tax. Bridges are overbuilt, aircraft have redundant systems, and pharmaceutical development includes years of clinical trials. Artificial intelligence is no different: the alignment tax is the capability cost of ensuring AI systems remain beneficial.
Paul Christiano framed this concept precisely: if we could build powerful AI that is not aligned, the alignment tax is the additional cost (in time, compute, or capability) required to build an equally powerful system that is aligned. The central question for AI governance is whether this tax is small enough to be borne voluntarily or large enough to create dangerous competitive incentives to skip it.
This simulator models the dynamics with three coupled equations. Capability compounds at a base rate discounted by alignment overhead: C(t) = (R·(1-o))^t, where R is the base growth rate and o is the fraction of resources devoted to alignment, so capability grows whenever R·(1-o) > 1. Risk rises with capability and falls with alignment quality q, saturating at 1-q: Risk = C·(1-q)/(C+1). Social welfare combines both: W = C·(1-Risk).
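The model is small enough to reproduce directly. Here is a minimal Python sketch of the three equations; the parameter values (t = 10, R = 1.5, o = 0.10, q = 0.6) are illustrative choices, not the simulator's defaults.

```python
def capability(t, R, o):
    """Capability after t periods: base rate R discounted by overhead o."""
    return (R * (1 - o)) ** t

def risk(C, q):
    """Catastrophe risk: rises with capability C, saturates at 1 - q."""
    return C * (1 - q) / (C + 1)

def welfare(C, q):
    """Social welfare: capability discounted by risk."""
    return C * (1 - risk(C, q))

# Illustrative values, not taken from the simulator.
C = capability(t=10, R=1.5, o=0.10)
print(f"C = {C:.1f}, Risk = {risk(C, 0.6):.3f}, W = {welfare(C, 0.6):.1f}")
```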
The key finding is that optimal alignment overhead is never zero. Even a small investment in safety produces outsized welfare gains when capabilities are high, because the marginal cost of a catastrophe scales with capability. Conversely, excessive alignment overhead starves capability development, reducing welfare through the other channel.
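This finding presupposes that overhead buys quality: holding q fixed, welfare simplifies to W = C·(C·q+1)/(C+1), which rises monotonically in C, so the optimum would sit at o = 0. The sketch below therefore adds a hypothetical saturating link, q = 1 - e^(-k·o) (the simulator's actual coupling is not stated here), and grid-searches for the welfare-maximizing overhead.

```python
import math

def welfare_at(o, t=10, R=1.5, k=5.0):
    """Welfare when overhead o buys quality q = 1 - exp(-k*o) (assumed link)."""
    C = (R * (1 - o)) ** t          # capability under overhead o
    q = 1 - math.exp(-k * o)        # hypothetical overhead-to-quality coupling
    return C * (1 - C * (1 - q) / (C + 1))

# Grid search over o in [0, 1): the maximizer is interior, not zero.
o_star = max((i / 1000 for i in range(1000)), key=welfare_at)
print(f"optimal overhead ~ {o_star:.2f}, welfare ~ {welfare_at(o_star):.2f}")
```

With these illustrative parameters the optimum lands near o = 0.07: less overhead collapses quality, while more starves capability.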
The model reveals a phase transition in optimal strategy as alignment quality improves. When alignment techniques are crude (low quality), the best strategy is to slow capability development. When techniques are refined (high quality), the best strategy is to accelerate capability development alongside proportional alignment investment. This underscores the importance of alignment research as a multiplier on the entire AI enterprise.
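Under the same hypothetical coupling, sweeping the technique-quality parameter k traces this shift in strategy: as techniques improve, the welfare-maximizing overhead falls and capability at the optimum rises.

```python
import math

def optimum(k, t=10, R=1.5):
    """Welfare-maximizing overhead and resulting capability, given technique quality k."""
    def W(o):
        C = (R * (1 - o)) ** t
        q = 1 - math.exp(-k * o)    # same hypothetical coupling as above
        return C * (1 - C * (1 - q) / (C + 1))
    o = max((i / 1000 for i in range(1000)), key=W)
    return o, (R * (1 - o)) ** t

for k in (0.5, 1, 2, 5, 10, 20):
    o, C = optimum(k)
    print(f"k={k:>4}: o* = {o:.2f}, capability at optimum = {C:.1f}")
```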