It sounds obvious, but it’s worth noting that all human values are conditional. You value dogs as pets and not as food because you live in a country that outlaws dog meat, have tastier options, and have inherited a moral aversion to eating pets. Unfortunately, when speaking of “human values,” we often pretend they exist in a Platonic realm, separated from the circumstances of a particular time and place. Is boredom a human value? The recovery rate for heroin addicts is less than 30%, and that’s despite the horrible side effects and societal pressures against the drug. I can easily imagine a scenario where all humans do is take (increasingly large amounts of) heroin for as long as they live. Since you’re reading this blog, you’re probably familiar with the so-called “Rationalist” community and its concern with the control problem. This post touches on some of my thoughts on directly and tangentially related problems.
I’ve heard the control problem stated as “guaranteeing that a self-modifying AI has human-aligned goals.” I’m skeptical that anything but the worst aspects of humanity will come of this. Human goals, whether for individuals or groups, are conditional on circumstances, so aligning an AI with any particular group of humans’ goals may produce wildly different goals in the AI depending on when the alignment is done. If the AI went polling for human values now, it might conclude that humans value democracy; I can imagine another point in time where it would conclude that humans value socialism or some other form of government. It’s easy to say that humans’ governmental preferences are merely instrumental and reflect some deeper terminal goal, but taken to its logical conclusion, this argument ends in a situation where the only value humans possess is “utility.”
Humans have very limited imaginations. It’s hard to imagine what human values will be under circumstances that no human or group of humans has ever experienced. Infants and centenarians have very different values; it seems likely that humans who live to be 400 will have different values still. What will human values look like when increasingly intelligent friendly AIs seek to maximize those values while simultaneously affecting them? What will human values look like when an artificial intelligence can perform every task a human can, only much better? Do we expect humans will want the AI to play dumb, or not to exist at all, to avoid feeling obsolete? If humans are alright with being second-class intellects, they may happily give in to a wireheaded future.
One objection I see to this line of thought is the claimed existence of some universal state which maximizes human value. While this may be true, I fear that state is one where all humans receive maximal stimulation to their reward systems, which is not the sort of state most people who discuss friendly AI mean when they speak of maximizing human values. I think (although I could be very wrong) that most who are interested in friendly AI wish for the AI to maintain a state where humans are able to pursue things like art, science, and mathematics, even at the expense of pure utility. But the preservation of these things means nothing by itself. We already have programs that can produce complex proofs of mathematical theorems, and we use them. If math is pursued for the joy of the process, then we’ve already begun to give up on that front. If math is pursued for the status it brings to those who are successful at it, well then, is it really worth preserving? If math is pursued because the answers it provides can be used to make better and better gadgets which quickly approach perfect hedon-delivery systems, well, you see where this is going… Of course these are not the only reasons for pursuing math, but some seem intuitively more meaningful than others, and it appears to me that utility and meaning can be in stark contrast. I think that if humans are not careful, they will gladly trade meaning for utility. Any friendly AI needs to weigh meaningfulness alongside utility when it attempts to optimize for something, and meaningfulness seems harder to reduce to physical properties than utility; for utility, at least, we can point to dopamine and serotonin in the brain.
There are a lot of unsolved control problems. We’re not very good at stopping young men from becoming radicalized on the internet; that is to say, we’re not good at preventing divergent human values in individuals (or the circumstances that lead to them). I don’t think we have a good sense of how the terms in human utility functions change over time. Clearly they change as we age, but it’s difficult to control for the change in circumstances that also occurs as we age. If there is some convergence toward pure hedonism as people get older, we are not equipped to deal with it. If there is some large divergence of values as we age, we may not be equipped to deal with that either. The poor ability of humans to accurately compute their future utility also leads to some difficult-to-control problems.
Try to imagine experiencing “ten times the utility you experienced in the best moment of your life.” If you’re anything like me, this is difficult. That difficulty makes you less susceptible to trade-offs that favor your own utility over the status quo (see this for potential evidence). This failure of humans to accurately calculate their own utility functions is part of what it means to be human, but it can also have disastrous or wasteful effects; just consider the time and resources spent acquiring things that don’t bring us much happiness at all (for the macro version, see this). For an easy example to analyze, consider habitual gamblers. Attempts to control this basically boil down to two things: present very obvious and hard-to-ignore negative utility (sometimes this occurs naturally, i.e. “rock bottom”), generally in the form of social stigma, jail time, or fines; or do the normal reinforcement-learning thing and reward not-gambling, typically through positive social reinforcement or even tokens like the “chips” handed out in AA. These aren’t exactly effective, given that gambling addiction relapse rates are around 80%. And this is for a scenario that is well understood and not pervasive. If either of those conditions fails to hold, I imagine the odds of controlling the addiction will be much worse.
Individual humans are difficult to control, but gods are likely harder. Now’s the point in this post where I tell you to read Scott Alexander’s Meditations on Moloch, if you haven’t already. TL;DR: Moloch is the personification (deification?) of a set of problems arising from humans acting according to self-interest at the expense of group interest and better outcomes overall, the simplest example being two defectors in the prisoner’s dilemma. The solution to most of these problems is a mob boss: an outsider who can coordinate and enforce the better solution (cooperate/cooperate). But we can’t always guarantee a mob boss, or even that the mob boss isn’t stuck in a prisoner’s dilemma of his own.
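The defect/defect trap can be made concrete with a payoff matrix. A minimal sketch, using the standard textbook payoff values (chosen for illustration, not taken from Scott’s post):

```python
# Prisoner's dilemma sketch. Payoffs are (row player, column player);
# higher is better. The specific numbers are the usual textbook values.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),  # the good outcome Moloch destroys
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),  # where self-interest lands everyone
}

def best_response(opponent_move):
    """Return the move that maximizes the row player's own payoff,
    given the opponent's move."""
    return max(("cooperate", "defect"),
               key=lambda my_move: PAYOFFS[(my_move, opponent_move)][0])

# Whatever the other player does, defecting pays more for the individual...
assert best_response("cooperate") == "defect"
assert best_response("defect") == "defect"
# ...yet mutual defection is worse for both than mutual cooperation.
assert PAYOFFS[("defect", "defect")] < PAYOFFS[("cooperate", "cooperate")]
```

Each player’s individually rational move is to defect regardless of what the other does, so without an outside enforcer the stable outcome is the worst-but-one cell of the matrix.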
To summarize: humans are difficult to control. Environments which have huge effects on human values are also difficult to control, and it’s difficult to imagine the unforeseen states that human utility calculations were never designed for. Humans may trade meaning for utility, individual utility for group utility, or anything for utility, really, and they may even believe they are maximizing their utility while doing exactly the opposite. The situation gets worse among groups of humans when Moloch-like problems enter the picture, especially on top of the problems with individual humans.
Sorry, no solutions here.