Some initial musings on the politics of longtermist trajectory change
What thinking beyond extinction might mean for AI Governance
Summary
Trajectory shaping is, for many of us, a deeply important intervention, and one that, on the margin, we think more people should be working on.
However, given the dire situation we are in, this work fairly rapidly needs to be concretised. We need to acknowledge the “politics” of the situation, and better understand how we can get from where we are to a good future. This probably means a diversity of theories of change.
There strike me as being two general strategies for trajectory change: lock in your values from today, or set up the conditions that allow for a (Long) Reflection.
Both of these are woefully neglected, and need more resources and ideas flowing to them. We then want to move quickly from the “seminar room” to concrete interventions that can be competitive in an uncaring world.
Introduction
As I see it, there are two approaches to ensuring your values dominate the future: locking them in, or setting up the correct deliberation process. I say “ensuring your values dominate the future”, which in many contexts sounds overly imposing, even imperial, and that is not what I intend. To a moral realist, this may mean “the correct values”, and one generally assumes a correct set of values is closer to your values than to an arbitrary set of values, say, paperclips. To a moral anti-realist, or at least to some anti-realists, this might just mean your object-level values directly. There is more meta-ethical complexity as to what this statement means, but I don’t think it matters much here.
Now, before starting on the two strategies, I want to give the lay of the land. We are not in a good situation. Timelines to AGI are, by any reasonable definition of “short”, short. Public trust, and the ability to engage in good-faith deliberation, is declining. Political polarisation makes compromise undesirable. Governments seem asleep at the wheel, while AI companies equivocate over, and often lobby against, regulation. It is plausible that AGI will be developed under race conditions, as the feasibility of international cooperation declines. AGI companies, who many had hoped would be friendly to our cause, are, at least in my view, rather not. On top of all of that, we are in a world that is often completely indifferent to some of the most significant parts of my value system.
The world cares little for my values. Sometimes what the forces of competition allow, and sometimes which intellectual ideas gain prominence and what powerful actors are happy to countenance, leads to improvements. Often, this is still because people stood up and actively opposed entrenched forces. But even then, even as material wealth has uplifted much of the world, humanity continues to carry out some of the worst atrocities we have ever carried out, because the vast majority of the world is deeply uncaring towards many of the things that truly matter (e.g. animal welfare). Despite the world getting better in many ways, because of this indifference to relatively subtle values, it is my judgement that the world has been getting worse. I have some sympathy for the Yudkowskian-flavoured idea that “near misses” can lead to catastrophes; although I have less confidence than Yudkowsky that such a catastrophe involves killing all of humanity, rather than something vastly different but morally quite similar.
Whilst the world cares little about my values now, as various forms of political competition heat up, it will likely care about them even less - unless I do something about it. In the face of political competition, simplistic logics of survival thrive, and we are likely to see universal values skewed in favour of more nationalistic flavours. We are likely to see weird beliefs about the importance of Digital Minds dismissed as luxury beliefs. This may even come from people some perceive as ‘allies’, as they put aside some of our weird beliefs in the fight for more credibility. Whilst Anthropic have, to their credit, hired an AI Welfare researcher, Amodei’s “Machines of Loving Grace” essay seems to engage with almost none of the most important questions of post-AGI governance and politics. It seems very far removed from the real questions about how to have a good future and how to ensure a Long Reflection. Even from outside the labs, essays like “The Intelligence Curse” seem focused not just narrowly on human control, but on human welfare in a way that seems insufficiently caring towards other sentient beings.
Importantly, a “vibe” I often feel is missing from the space of people who primarily want to change the trajectory of the future is one of ruthlessly understanding the political situation we are in, and of determination to win despite it. That situation is of course changing and very uncertain, with different actors and values plausibly mattering to different degrees. Importantly, however, it can’t be ignored. It is likely within a tense political situation - be it one of geostrategic or economic competition, corporate competition, or policymakers asleep at the wheel as the world changes beneath them - that we will have to try and make our way to the Promised Land.
In a short window, we have to try and move an often indifferent world - distracted and engaged with various other considerations and pressures - towards as good a future as we can. Because this may be our only shot. Value lock-in is much more plausible post-AGI than pre-AGI. Space expansion might mean much of our “species” moves beyond our causal control quite rapidly. In the worst-case scenarios, AI wipes us out or obsoletes us, so these next few years might literally be our only chance to impact the value of the future at all.
So, if we are to have this influence, we need to think about how we are going to do it in these bad circumstances. This requires us to move this work far outside of the seminar room and into the ruthless world of politics. Playing politics here may involve playing for proxy goals - stuff that makes the world better but also rhymes with the security politics of the day. It may involve direct public advocacy on specific goals that have public support. It may involve getting really good at model spec writing and trying to control that key input. Or it may involve bravely advocating for unpopular things to act as a radical flank, hoping we can pull the world closer to ourselves. This may be advocacy, or policy, or even technical work - technology is a form of politics, and one that, given our community’s skills, we might be particularly good at leveraging. But all of this requires approaching trajectory change with ruthless pragmatism, and trying to improve on the failings of the AI XRisk community that came before us.
One final note before explaining the different theories of change. My argument is not that the few people explicitly doing work in this space at present have failed to think about the broader politics of AI, or that they have bad theories of change. My point is that there is a tendency to see these things as quite separate, which might lead to the assumption that “Forethought are on the ball on this” and “Eleos are doing the relevant Digital Minds work”. However, these organisations have very specific theories of change, which may fail under a number of different scenarios, and which certainly don’t cover that much of the ground that needs covering. In the same way as we (rightfully) have not put all our eggs in one basket with avoiding extinction, and have taken approaches that differ based on a variety of assumptions, we need to do the same here.
Strategies for a good future
The first strategy takes our (rough) object-level values and tries to align the AGIs to these. One simple version of the strategy involves racing, winning, setting oneself up as the chief decisionmaker on Earth, then deciding (perhaps under AI advice) what values ought to dominate the future. A similar strategy involves winning the race and aligning an AI to a particular moral system - say, setting up your singleton to be a utilitarian. But there are risks involved in this strategy - it’s deeply uncooperative, and it’s very similar to a strategy where takeover happens (so it may make that more likely). This isn’t to say it’s a particularly bad strategy - it is a bad strategy, but every strategy we have is truly terrible. It’s not the only one either. We can try and convince many, or all, of the AI companies to take on some of our values - say, that animal welfare matters. We could advocate for labs and decisionmakers to take Digital Sentience seriously today, in the hope that this will leave them better prepared when they make decisions that affect the longterm future. We might choose to make ourselves very good model spec writers, so we can smuggle in our values without any real accountability. Or we might take a completely different approach, and focus on novel technical interventions that approximate the things we care about - for example, because we care about S-Risks, we focus on cooperative AI.
Many of these strategies are already being pursued, but there are certainly more that could be. Even the ways we go about, for example, getting the labs to care about the things we care about can take a plethora of different approaches. We could take a protest strategy similar to the animal welfare movement’s, or we could try and befriend the labs and convince them to embed values similar to ours. We could even go as far as simply supporting the lab with values most similar to ours to completely win the race. There are more, depending on what values we want propagated, and I’m sure there are many I haven’t thought of yet. The operative point is that there seems to be a lot more space to do this sort of normative alignment work than is currently being covered. This is particularly true if you also consider the number of other strategic parameters we need to be concerned with - timelines, value drift during superalignment, degree of securitisation, which actors will be powerful, whether there is a national project, etc.
The second strategy, and the one I think has been worked on even less, is about setting up a governance structure such that post-AGI we can do the sorts of deliberation and reflection needed to bring about a good future.
This strategy is notably still normative - it is not value neutral. But rather than directly establishing the ethical norms we want AI systems to follow, or at least roughly specifying them, we try and create a structure that can carry out the sorts of deliberative or epistemic processes we think ought to be followed in deciding the value of the future. There are a number of different arrangements here - we could try and create deliberative assemblies, we could use various bargaining-theoretic setups, we could all agree to let AIs “solve philosophy”, or we could split up the universe between us. Some of these seem prudent to me, others awful. But they all rest on particular normative premises - premises that, if you buy that anything even close to the orthogonality thesis also applies to epistemic normativity, we can’t just have sufficiently smart AIs intuit and solve for us. We have to build these conditions, and there looks to be very little thinking going on at the moment about how we move from where we are now to a world where we can have such a Long (or Short) Reflection.
One important step is figuring out what such a post-AGI world would actually look like. Far too little work has focused on realistically thinking about “post-AGI political philosophy”: what the state of affairs will be post-AGI/superintelligence, and how to govern it. Some of the former FHI crowd have done some of this, but to a large extent that’s it. Many of the remaining visions seem either focused on the wrong questions entirely (like narrow, present-day balance-of-power questions), or to present a vision of the future far too tethered to the present (as I think Amodei, for example, does). This sort of conceptual work seems highly neglected, urgent and rather difficult.
It's not clear to me exactly what a concrete strategy to bring about such a reflection could be. There is the obvious one, as above - win the race, impose your conditions. But this is pretty difficult, and requires doing something similar to carrying out a (hopefully bloodless) coup; it's not obvious that this is what we want as our only strategy. Another is to promote cooperation between the major powers developing AGI - but then we need to make sure that this cooperation instrument can allow a transition from an international arena based primarily on competition and power to one where values can actually prevail. There may be more - from technical interventions promoting the right types of cooperation between AIs, to work on what the model spec should be - but this essay is not primarily about laying out what these strategies are. The point is just that, when trying to bring them about, we need to really think through a theory of change that works given our constraints.
There are two more points in this vicinity. Firstly, it is plausible that these interventions may overlap with interventions to reduce loss-of-control risk. However, I would be sceptical if we thought all the good interventions looked the same. Given how relatively little thought has currently been put into this problem, we might think this convergence is too convenient. Secondly, and similarly, I have in a few cases heard fairly dismissive responses claiming this problem is easy to solve - such as: if EAs control even a small amount of value going into the future, then we will be more patient, and so can trade for more resources. This, whilst in some ways plausible, in its most dismissive form massively misses much of the point - we need a state of affairs where such trades are viable, which doesn’t necessarily come about by default. So we still need to figure out a way to influence post-AGI politics such that we can do this. I’m also sceptical it will be so easy for a number of other reasons beyond the scope of this post. But I think we should also be sceptical of simple solutions, presented in conversations at parties, as the solution to the key problem of our time - these feel too neat, too convenient, and too much like we don’t need any additional strategy to bring these futures about. Given how little work has been done, I’d be reluctant to accept such conclusions so quickly.
Of course, there also exist proposals somewhere in the middle of these two strategies. Really, these are two ends of a spectrum, but I think they do a good job of focusing the space of strategies that can be taken.
One tangent, though one I think is relevant: technical work is often one of our best forms of governance - this, in fact, applies to AI Safety more broadly. When people think about influencing the value of the future, this often conjures up philosophers; and when I talk about concretising the work, campaigners or people working on governance. But many agendas that seem promising actually focus on technical measures. There are many reasons to prefer technical measures as a way of playing politics. It's ‘home turf’ for a lot of EAs, so we have a comparative advantage. It often happens with less oversight than much explicit policy work - when our values are esoteric, this can be a good (although maybe uncooperative) thing. Technical work can often be done unilaterally, or at least, once it has been done unilaterally, multilateral adoption is easier than it often is for policy work. Technical lock-in can make it harder for others to dislodge the work we have done. And many technological solutions can turn difficult, zero-sum policy situations into ones that are more positive-sum or that can get broader buy-in; we can often simply bypass difficult decisions. But technical work is still done in the context of the politics of the AI space, something I think many EAs often forget. Technical measures (including solving alignment!) are one measure amongst many in our governance arsenal - and in many, but by no means all, situations they are amongst our best tools.
The need to think bigger in AI Governance work
What we care about is not whether AI systems will be interpretable. It is not whether a particular compute governance regime will work. It is not whether terrorists will steal the model weights and use them to create a bioweapon. We care about making the future good. We need more work - even a small bit more - on that more fundamental question.
Firstly, more work needs to be done to take us from where we are now - mostly seminar-room conversations and rather narrow theories of change - to a plethora of views on how to do pragmatic value shaping properly. We need to do the theory-of-change work to concretise the conceptual work that has been done, and do the conceptual work that still needs doing so it can then be concretised. Organisations like Forethought are doing some of this, but there definitely seems to be more work needed here - especially in trying to spin up ideas that are plausible under very short timelines. This also needs people with many different sorts of experience - security, technical, political, with the labs, public advocacy, etc. There were some great discussions on the EA Forum about the importance of trajectory-changing interventions, and I’d like to see the same level of intellectual activity generating concrete proposals for what this can look like. I’d like 80,000 Hours to think more on this and direct people to careers in it. I’d like more people in this community doing research projects for programmes like IRG, FIG, ERA etc. on what we should actually, concretely do to improve the value of the future.
One objection to this is that maybe there just aren’t tractable, concrete interventions to find. This isn’t impossible, but I don’t think it should be the default assumption. Certainly, I think we have to try to really figure this out first. From my sense of the interventions people are already pursuing, it strikes me as unlikely that we won’t be able to find more potentially tractable ones. This space feels nascent enough that, given some approaches already seem to work, it would be surprising if we had exhausted it yet.
Secondly, we need to act on these plans. One solution is to set up more organisations doing this - I saw some exciting work in this vicinity at the AI, Animals and Digital Minds conference in London I attended, but I found myself wanting to see more organisations, and wanting more “edge” to them. A different way to do work with similar outcomes is for existing organisations to integrate (longtermist) value-shaping priorities into their theories of change more than they currently do. I see some work at some organisations that seems essentially only meant to help with human welfare or representation in the very short term - given the pressures we’re under, this seems mostly not like what we should prioritise, even if it is valuable. If just these parts of these organisations’ budgets went more towards longtermist value shaping, that would be a big shift compared to where we are today.
And, on top of this, when trying to do this work, we need to do something I think EAs are generally uncomfortable with - understand power and conflict properly. Sometimes it's prudent to ally with power. Sometimes it's prudent to oppose it. Sometimes there are positive-sum trades, but sometimes we are in a conflict, and we just have to win. I expect that conflict (even if there are compromises to be won, which I expect there often are) will be more common in this work than in general AI Safety, so we need to be much better prepared for it than we have been in the past.
I’m generally pretty excited about exploring these areas, and so if anyone else is too, I’d be very excited for people to reach out to me to chat!
I should say, this is my first time trying to publish fairly unpolished writing and thoughts - writing more like I speak than in a more polished essay/paper format. So if people have feedback on the style, length of the pieces, etc., I'd be really interested to hear it.
What would it mean for an AI to be interpretable? Is AI at present interpretable?