You're Not One "You" - How Decision Theories Are Talking Past Each Other
I think there’s a lot of cross-talk and confusion around how different decision theories approach a class of decision problems, specifically in the criticisms FDT proponents have of more established theories. I want to briefly go through why I think these disagreements come from some muddy abstractions (namely, treating the agents in a decision problem as one homogeneous/continuous agent rather than as multiple distinct agents at different timesteps). I think spelling this out explicitly shows why FDT is a bit confused in its criticisms of more “standard” recommendations, and moreover why the evidence in its favour (outperforming on certain problems) ends up being a kind of circular argument.
I’m going to assume familiarity with decision theory (if you haven’t explored the space much, I recommend Joe Carlsmith's excellent post running through the basics). I’m also going to mostly focus on FDT vs EDT here, because I think the misunderstanding FDT has is pretty much the same when compared to either EDT or CDT, and I personally like EDT more. The rough TL;DR of what I want to say is:
The presentation of many decision theory problems abstracts away the fact that there are really multiple distinct agents involved
There is no a priori reason why these agents are or should be perfectly aligned
A number of supposed issues with EDT approaches to some problems (dynamic inconsistency/need to pay to pre-commit etc.) are not issues when viewed through the lens of “you” in the decision problem being a bunch of imperfectly aligned agents
Viewing the problems this way reveals a “free parameter” in deciding which choices are rational - namely, how to choose which “sub-agent”’s preferences to prioritise
There doesn’t seem to be a normative, non-circular answer to this bit
Part 1: One “you”, many agents
It’s an overwhelmingly natural intuition to think of future and past (and counterfactual) versions of you as “you” in a very fundamental sense. It’s a whole other can of worms as to why, but it feels pretty hard-coded into us that the person who is, say, standing on the stage tomorrow in Newcomb’s problem is the exact same agent as you. This is a very useful abstraction, and it works so well partly because other versions of you are so well aligned with you, and so similar to you. But from a bottom-up perspective they’re distinct agents - they tend to work pretty well together, but there’s no a priori reason why they should judge the same choices in the same way, or even have the same preferences/utility functions. There are toy models where they clearly don’t - e.g. if for some reason you only ever care about immediate reward at any timestep.
It’s a distinction that’s usually so inconsequential as to be pretty ignorable (unless you’re trying to rationalise your procrastination), but a lot of the thornier decision problems that split people into camps - I think - largely derive their forcefulness from this distinction being swept under the rug. Loosely speaking, the problems work by driving a wedge between what’s good for some agents in the set-up and what’s good for others; by applying this fuzzy abstraction we lose track of that, and see what looks like a single agent doing sub-optimally for themselves, rather than mis-aligned agents failing to perfectly co-operate.
To spell it out, let’s consider two flavours of a fun thought experiment:
Single-agent Counterfactual coin-toss
You’re offered a fair coin toss on which you’ll win $2 on Heads if and only if it’s predicted (by a virtually omniscient predictor etc.) that you will pay $1 on Tails. You agree to the game and the coin lands Tails - what do you do?
I think this thought experiment is one of the cleanest separators of the intuitions of various decision theories. CDT obviously refuses to pay, reasoning that refusing causes them to keep their $1. EDT reasons that, conditioned on all the information they have - including the fact that the coin is already definitely Tails - their expected value is higher if they don’t pay, so they don’t. FDT - as I understand it - pays, however, since the ex-ante EV of your decision algorithm is higher if it pays on Tails (it expects to win $0.50 on average, whereas EDT and CDT both know they’ll get nothing on Heads, because they know they won’t pay on Tails). This ability to “cooperate” with counterfactual versions of you is a large part of why people like FDT, while the fact that you’re “locally” giving up free money on the very branch you know you’re on feels equally weird to others. I think the key to understanding what’s going on here is that the abstraction mentioned above - treating all of these branches as containing the same agent - is muddying the water.
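To put concrete numbers on this, here’s a rough toy sketch in Python (the function names are made up purely for illustration; the payoffs and the perfectly accurate predictor are just those from the setup above) comparing the two possible policies from the two vantage points:

```python
# Toy sketch of the counterfactual coin-toss (illustrative names only).
# Policy: either you pay $1 on Tails or you don't. The predictor is
# assumed perfectly accurate, so the $2 Heads prize only exists for
# agents who are (correctly) predicted to pay on Tails.

P_HEADS = 0.5

def ex_ante_ev(pays_on_tails: bool) -> float:
    """EV before the toss, i.e. what me-before-the-toss cares about."""
    heads_payoff = 2.0 if pays_on_tails else 0.0   # prize only if predicted to pay
    tails_payoff = -1.0 if pays_on_tails else 0.0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff

def payoff_given_tails(pays_on_tails: bool) -> float:
    """Payoff as seen by me-seeing-Tails, who knows the coin already landed Tails."""
    return -1.0 if pays_on_tails else 0.0

for pays in (True, False):
    label = "pay on Tails" if pays else "refuse on Tails"
    print(f"{label}: ex-ante EV = {ex_ante_ev(pays):+.2f}, "
          f"payoff given Tails = {payoff_given_tails(pays):+.2f}")

# pay on Tails: ex-ante EV = +0.50, payoff given Tails = -1.00
# refuse on Tails: ex-ante EV = +0.00, payoff given Tails = +0.00
```

Paying on Tails is the better policy by ex-ante EV and the worse choice by the lights of the agent who already knows the coin landed Tails - which is exactly the disagreement above.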
Consider the much less interesting version of this problem:
Multi-agent Counterfactual coin-toss
Alice gets offered the coin toss, and Bob will get paid $2 on Heads iff it’s predicted that Clare will pay $1 on Tails. Assume also that:
Alice cares equally about Bob and Clare, whereas Bob and Clare care only about themselves
What do the various agents think in this case? Alice thinks that this is a pretty good deal and wants Clare to pay on Tails, since it raises the EV across the set of people she cares about. Bob obviously thinks the deal is fantastic and that Clare should pay. Clare, however, understandably feels a bit screwed over and isn’t inclined to play ball. If she cared about Bob’s ex-ante expected value from the coin-toss (i.e. if she had the same preferences as Alice), she would pay - but she doesn’t, so she doesn’t.
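Here are the same kind of toy numbers from each person’s point of view (again, the helper names are purely illustrative and the predictor is assumed perfect):

```python
# Each person evaluates "Clare pays on Tails" by the expected value over
# the set of people they care about. Assumes a perfect predictor, so
# Bob's $2 Heads prize exists only if Clare would in fact pay.

P_HEADS = 0.5

def expected_payoff(person: str, clare_pays: bool) -> float:
    if person == "Bob":    # $2 on Heads, but only if Clare is predicted to pay
        return P_HEADS * (2.0 if clare_pays else 0.0)
    if person == "Clare":  # pays $1 on Tails, if she pays at all
        return (1 - P_HEADS) * (-1.0 if clare_pays else 0.0)
    raise ValueError(person)

care_sets = {"Alice": ["Bob", "Clare"], "Bob": ["Bob"], "Clare": ["Clare"]}

for agent, cared_about in care_sets.items():
    gain = sum(expected_payoff(p, True) - expected_payoff(p, False)
               for p in cared_about)
    print(f"{agent}: change in EV over their care set if Clare pays = {gain:+.2f}")

# Alice: +0.50, Bob: +1.00, Clare: -0.50
# (and conditional on Tails - the only branch where Clare actually acts -
#  her choice is simply -$1 for paying vs $0 for walking away)
```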
The key point I want to make is that we can think of the single-agent coin-toss involving just “you” as actually being the same as a multi-agent coin-toss, just with Alice, Bob, and Clare being more similar and related in a bunch of causal ways. If me-seeing-Tails is a different agent to me-seeing-Heads is a different agent to me-before-the-toss, then it’s not necessarily irrational for “you” to not pay on Tails, or for “you” to not pay the driver in Parfit’s Hitchhiker, because thinking of these agents as the same “you” that thought it was a great deal ex-ante and wanted to commit is just a convenient abstraction that breaks down here.

One of the main ways they’re different agents, which is also the way that’s most relevant to this problem, is that they plausibly care about different things. IF (and I’ll come back to this “if” in a sec) what rational agents in these problems are trying to do is something like “I want the expected value over future possible versions of me to be maximised”, then at different stages in the experiment different things are being maximised over, since the set of possible future people is not identical for each agent. For example, in the coin-toss case the set for me-seeing-Tails contains only people who saw Tails, whereas for me-before-the-toss it doesn’t. If both “me”s are trying to maximise EV over these different sets, it’s not surprising that they disagree on what’s the best choice, any more than it’s surprising that Clare and Alice disagree above.
And I think an EDT proponent says the above - i.e. “maximise the EV of future possible mes” - is what rational agents are doing, and so we should accept that rational agents will decline to pay the driver in Parfit’s Hitchhiker, not pay on Tails above, etc. But crucially, through the above lens this isn’t a failure of rationality so much as a sad consequence of having imperfectly aligned agents pulling in different directions. Moreover, EDT proponents will still say things like “you should try to find a way to pre-commit to paying the driver” or “you should alter yourself in such a way that you have to pay on Tails”, because those are rational things for the ex-ante agent to do given what they are trying to maximise. I think some FDT proponents see this as an advantage of their theory - “look at the hoops this idiot has to jump through to arrive at the decision we can just see is rational”. But this is misguided, since properly viewed these aren’t weird hacks to make a single agent less predictably stupid, but rather a natural way in which an agent would try to coordinate with other, misaligned agents.
Part 2: FDT response and circularity
Note however that what I just said isn’t the only way we can posit what rational agents try to maximise - we could claim they’re thinking something like “What maximises the ex-ante EV of agents running the same decision algorithm as me in this problem?”, in which case me-seeing-Tails should indeed pay, since paying makes his decision algorithm more profitable ex-ante in expectation. This is, as I understand it, the crux between classic decision theories like EDT and CDT on the one hand and things like FDT on the other. The disagreement is really encapsulated in FDT’s rejection of the “Sure Thing” principle. FDT says that it’s rational for you to forgo a “sure thing” (walking away with your free $1 on Tails), because if your decision algorithm forgoes it, that algorithm makes more money ex-ante in expectation. In other words, in this specific situation you (i.e. the specific, distinct agent who just flipped Tails and is now eyeing the door) might unfortunately be losing money, but on average FDT agents who take this bet are walking around richer for it! I don’t think EDT actually disagrees with any FDT assessment here, it just disagrees that this is the correct framing of what a rational actor is trying to maximise. If what a rational agent should do is maximise the ex-ante EV of its decision algorithm in this problem, then FDT’s recommendations are right - but why is this what they should be maximising?
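To make the symmetry explicit, here’s one more toy sketch in the same illustrative style (same made-up names and assumptions as before), scoring both policies under both proposed maximands:

```python
# Score the two coin-toss policies under the two proposed maximands.
# "Ex-ante algorithm EV" is the FDT-flavoured metric; "EV given my
# evidence" (the coin has already landed Tails) is the EDT-flavoured one.
# Illustrative names only; perfect predictor assumed as before.

P_HEADS = 0.5

def ex_ante_algorithm_ev(pays_on_tails: bool) -> float:
    # The $2 Heads prize exists only if the algorithm is predicted to pay.
    return (P_HEADS * (2.0 if pays_on_tails else 0.0)
            + (1 - P_HEADS) * (-1.0 if pays_on_tails else 0.0))

def ev_given_tails(pays_on_tails: bool) -> float:
    # Conditional on Tails, paying just costs $1.
    return -1.0 if pays_on_tails else 0.0

policies = {"pay on Tails": True, "refuse on Tails": False}
print("best by ex-ante metric:",
      max(policies, key=lambda k: ex_ante_algorithm_ev(policies[k])))
print("best by conditional metric:",
      max(policies, key=lambda k: ev_given_tails(policies[k])))
# best by ex-ante metric: pay on Tails        (the FDT-style recommendation)
# best by conditional metric: refuse on Tails (the EDT-style recommendation)
```

Each metric simply anoints its own theory’s recommendation.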
I think an FDT proponent here says “Well ok, EDT has an internally consistent principle here too, but the FDT metric is better because the agents do better overall in expectation. Look at all those rich FDT agents walking around!”. But this is clearly circular. They do better in expectation according to the ex-ante agent, but the whole point of viewing the disagreement through these more “fine-grained” agents is that this isn’t the only agent through whose lens we can evaluate a given problem. In other words, we can’t justify choosing a principle which privileges a specific agent in the problem (in this case, the ex-ante agent) by appealing to how much better the principle does for that agent. It’s no better than the EDT agent insisting EDT is better because, for any given agent conditioning on all their evidence, it maximises their EV and FDT doesn’t.
So really the question that first needs to be answered, before you can give a verdict on what’s rational to do on Tails, is “Given a decision problem with multiple distinct agents involved, how should they decide what to maximise over?” If the answer is that they should maximise the EV of “downstream” future agents, they’ll end up with EDT decisions, and be misaligned with other agents. If the answer is that they should maximise the ex-ante EV of agents running their decision algorithm, they’ll all be aligned and end up with FDT decisions. But the difference in performance of these decisions can’t be used to answer the question, because the evaluation of that performance depends on how you answered the question in the first place. To be fair to FDT proponents, this line of reasoning is just as circular when used by an EDT agent. I bring it up as a failing of FDT proponents here, though, because I see them levelling this kind of performance-based argument against EDT, whereas my read of EDT criticisms of FDT is more like “Huh? Shouldn’t you figure out that whole counterpossible thing before you say your theory is even coherent, let alone better?”
Part 3: Is there an objective answer then?
So if this way of deciding between decision theories is circular, how do we decide which one to use? Is there some other way to fill in this “free parameter” of what it’s rational to be maximising over? I’m not sure. We can rely on our intuitions somewhat - if both theories can coherently claim to perform better by their own metrics, we can ask which metric feels more “natural” to use. For most people this will probably favour EDT-like verdicts, given how overwhelmingly intuitive things like the Sure Thing principle are. This seems pretty weak though - intuitions are incredibly slippery in the cases where these theories come apart, and I think you can think your way into finding either one intuitive.
My hunch is instead that some of the machinery of decision theory just doesn’t survive at this level of zooming in - that is, once we stop abstracting away the multiple agents. It’s equipped to adjudicate decisions for an agent with defined goals/preferences, but it just doesn’t seem to have an answer to “what exactly should these multiple agents all be caring about?” It seems closer to the realm of game theory - but even there the players have well-defined goals, whereas here we’re essentially arguing over whether the players in a game should a priori have aligned goals. It just seems like a non-starter. Which is a very anticlimactic way to end a post, but there you go.