Artificial Morality (or How Do We Teach Robots to Love)

One Saturday morning I came upon the website 80000 Hours. The idea of the site is to direct our activity to maximise impact. They have a list of world problems here. One of the most pressing is explained as the artificial intelligence “control problem” : how do we control forces that can out think us? This got me thinking. Here are those thoughts.


The Definition Problem (You Say Semantics…)

As with any abstraction, we are first faced with the problems of definition. You could base a doctorate on this alone.

At its heart, ‘morality’ is about ‘right’ and ‘wrong’. These can be phrased as ‘good’ and ‘bad’ or ‘should’ and ‘should not’.

This is about where the agreement ends.

Let’s start with scope. Does morality apply to an internal world of thought as well as an external world of action? Religions often feature the concept of ‘immoral’ thoughts; however, most would agree that action is the final arbiter. Without getting too metaphysical, I would argue that thoughts (or data routines) are immoral to the extent that they cause physical change in the world in a manner that increases the likelihood of an immoral action (even though that action need not actually occur). For example, ruminating on killing is immoral in the sense that it leads to physical changes in the brain that make a person more likely to kill in future situations.

The main show in morality revolves around the moral groupings: just what is ‘right’ or ‘wrong’? This is where the mud tends to be thrown.

‘Morality’ itself has had a bad rap lately. There are overhangs from periods of dogmatic and repressive religious control. Post modernism, together with advanced knowledge of other cultures, has questioned the certainties that, at least in Europe and North America, supported the predominantly Judeo-Christian moral viewpoint. This has lead to some voices questioning the very basis of morality: if the moral groupings seem arbitrary, do we even need them?

As with other subjects, I think the existential panic that post modernism delivered is constructive for our thinking on morality, but we should use it to build from firmer foundations rather than abandon the building altogether. The body of knowledge from other cultures helps us map the boundaries and commonalities in human morality that can teach us how to construct an artificial machine morality.

Interestingly, morality does appear to be a binary classification. For me concepts, such as an action being half moral or a quarter immoral don’t really make sense. When thinking of morality, it is similarly hard to think of a category that is neither moral nor immoral. There is the concept of amorality – but this indicates the absence of a classification. Hence, morality is a binary classification that can itself be applied in a binary manner.

An Aside on Tribalism

Morality has deep ties to its abstractive cousins: politics and religion. Moral groupings are often used to indicate tribal affiliations in these areas. Indeed, some suggest that the differences in moral groupings have come about to better delineate social groupings. This means that disagreement often becomes heated as definitions are intrinsically linked to a definition of (social) self.

Fear of falling into the wrong side of a social grouping can often constrain public discourse on morality. This is possibly one of the reasons for the limited field size described in the 80000 hours problem profile.

Another, often overlooked point, is that those with the strongest personal views on morality tend to lie on the right of the political spectrum (i.e. be conservative), whereas those writing about morality in culture and academia tend to lie on the left (i.e. be liberal in the US sense). Hence, those writing “objectively” about morality tend to view the subject from a different subjective viewpoint than those who feel most passionately about right and wrong. This sets up a continuing misunderstanding. In my reading I have felt that those on the left tend to underestimate the visceral pull of morality, while those on the right tend to over-emphasise a fixed rules based approach.

Seductive Rules

Programmers and engineers love rules. A simple set of rules appears as a seductive solution to the problem of morality: think the Ten Commandments or Asimov’s Three Laws. However, this does not work in practice. This is clear from nature. Social life is far too complex.

Rules may be better thought of as a high-level surface representation of an underlying complex  probabilistic decision-making process. As such, in many situations the rules and behaviour will overlap. This gives us the causative fallacy that the rules cause the behaviour, whereas in reality similarities in genetics and culture lead human beings to act in ways that can be clustered and labelled as ‘rules’.

This is most apparent at edge cases of behaviour – in certain situations humans act in a reasonable or understandable way that goes against the rules. For example, “Thou shall not kill” unless you are at war, in which case you should. Or “Thou shall not steal”, unless your family is starving and those you are stealing from can afford it. Indeed, it is these messy edge cases that form the foundations of a lot of great literature.

However, we should not see rules of human behaviour as having no use – they are the human-intelligible labels we apply to make sense of the world and to communicate. Like the proverbial iceberg tip, they can also guide us to the underlying mechanisms. They can also provide a reference test set to evaluate an artificial morality: does our morality system organic arrive at well-known human moral rules without explicit programming?

How Humans Do It (Lord of the Flies)

When we evaluate artificial intelligence we need to understand we are doing this relative to human beings. For example, an artificial morality may be possible that goes against commonly-agreed moral groupings in a human based morality. Or we could come up with a corvid morality that overlapped inexactly with a human morality. However, the “control problem” defined in the 80000hours article is primarily concerned with constructing an artificial morality that is beneficial for, and consistent with generally held concepts of, humanity.

As with many philosophical abstracts, human morality likely arises from the interplay of multiple adaptive systems.  I will look at some of the key suspects below.

(Maternal) Love is All You Need

In at least mammals, the filial bond is likely at the heart of many behavioural aspects that are deemed ‘good’ across cultures. The clue is kind of in the name: the extended periods of nursing found in mammals, and the biological mechanisms such as oxytocin to allow this, provide for a level of self-sacrifice and concern that human beings respect and revere. The book Affective Neuroscience gives a good basic grounding in these mechanisms.

This, I think, also solves much of the control problem – parents are more intelligent than their children but (when things are working) do not try to exterminate them as a threat at any opportunity.

Indeed, it is likely not a coincidence that the bureaucratic apparatus that forms the basis for the automation of artificial intelligence first arose in China. This is a country whose Confucian/Daoist morality prizes filial respect, and extends it across non-kin hierarchies.

If our machines cared for us as children we may not control them, but they would act in our best interest.

Moreover, one of the great inventions of the mono-theistic religions of the Middle East, was the extension of filial love (think Father and Son) to other human beings. The concepts of compassion and love that at least Christian scholars developed in the first millennium (AD) had at their basis not the lust of romantic love but the platonic love of parent and child. This in turn was driven by the problem of regulating behaviour in urban societies that were growing increasing distant from any kind of kin relationship.

Social Grouping

The approach discussed above does have its limitations. These are played out all over history. Despite those mono-theistic religions extending the filial bond, they were not able to extend it to all humanity; it hit a brick wall at the limits of social groups.

Although it goes in and out of fashion, it may be that the group selection mechanisms explored by clever people such as Edward O. Wilson, are at play. Are social group boundaries necessary for the survival of those within the group? Is there something inherently flawed, in the form of long-term survival, if the filial bond is extended too far? Or is this limitation only in the constraints of the inherited biology of human beings?

Returning to morality, Jared Diamond notes in The World Until Yesterday that many tribal cultures group human beings into ‘within tribe’ and ‘outside tribe’, wherein the latter are classed as ‘enemies’ that may be ‘morally’ killed. Furthermore, many tribal cultures are plagued by a tit-for-tat cycle of killing, which was deemed the ‘right’ action until the later arrival of a state mechanism where justice was out-sourced from the tribe. We are reminded that “Thou shall not kill” does not apply to all those smitten in the Old Testament.

For machines and morality, this seems an issue. Would an artificial intelligence need to define in and out groups for it to be accepted and trusted by human being? If so how can we escape cataclysmic conflict? Do you program a self driving car to value all life equally, or those of your countries citizens above others? As has been pointed out by many, bias may be implicit in our training data. Does our culture and observed behaviour train artificial intelligence systems to naturally favour one group over another? (Groups being defined by a collection of shared features detected from the data). If so this may be an area where explicit guidance is required.


Marc Hauser in Moral Minds touches on how many visceral feelings of right and wrong may be driven, or built upon, our capacity for disgust.

Disgust as an emotion has clearly defined facial expressions (see the work of Paul Ekman) that are shared across different human groups, indicating a deep shared biological basis in the brain.

Disgust is primarily an emotion of avoidance. It is best understood as a reaction to substances and situations that may be hazardous to our health. For example, disgust is a natural reaction to faeces, tainted foods and water supplies, vomit and decaying flesh. This makes us avoid these items and thus avoid the diseases (whether viral, bacterial or fungal) that accompany them. The feeling itself is based around a sensing and control of digestive organs such as the stomach and colon, the feeling is the pre-cursor to adaptive behaviours to purge the body of possibly disease-ridden consumables.

Hauser discusses research that suggests that the mechanisms of disgust have been extended to more abstract categories of items. When considering these items, people who have learned (or possibly inherited) an association feel an echo of the visceral disgust emotion that guides their decision making. There are also possible links to the natural strength of the disgust emotion in people and their moral sense: those who feel disgust more strongly tend also to be those who have a clearer binary feeling of right and wrong.

This is not to say that this linking of disgust and moral sense is always adaptive (and possibly ‘right’). Disgust is often a driving factor in out-group delineation. It may also underlie aversion to homosexuality among religious conservatives. However, it is often forgotten in moral philosophy, which tends to avoid ‘fluffy’ ‘feelings’ and subjective minefield this opens up.

Any artificial morality needs to bear disgust in mind though. Not only does it suggest one mechanism for implementing a moral sense at a nuts and bolts level, any implementation that ignores it will likely slip into the uncanny valley when it comes to human appraisal.


Another overlooked component of a human moral sense is fear.

Fear is another avoidance emotion that is primarily driven through the amygdala. Indeed, there may be overlaps between fear and disgust, as implemented in the brain. The other side of fear is the kick-starting of the ‘fight’ reflex, the release of epinephrine, norepinephrine and cortisol.

In moral reasoning, fear, like disgust, may be a mechanism to provide quick decision making. Fear responses may be linked to cultural learning (e.g. the internalised response to an angry or fearful parent around dangerous or ‘bad’ behaviours) and may guide the actual decision itself, e.g. pushing someone off a bridge or into a river is ‘bad’ because of the associated fear of falling or drowning, which gives us a feeling of ‘badness’.

Frontal Lobes

The moral reasoning discussed above forms the foundations of our thoughts. The actual thoughts themselves, including their linguistic expression in notes such as this, are also driven and controlled by the higher executive areas of the frontal lobes and prefrontal cortex. These areas are the conductor, who oversees the expression of neural activity over time in the rest of the cortex, including areas associated with sensory and motor processing.

In the famous case of Phineas Gage, violent trauma to the frontal lobes led to a decline in ‘moral’ behaviour and an increase in the ‘immoral’ vices of gambling, drinking and loose women. Hence, they appear to form a necessary part of our equipment for moral reasoning. Indeed, any model of artificial morality would do well to model the action of the prefrontal cortex and its role in inhibiting behaviour that is believed to be morally unsound.

The prefrontal cortex may also have another role: that of storyteller to keep our actions consistent. You see this behaviour often with small children: in order to keep beliefs regarding behaviour consistent in the face of often quite obvious inconsistencies, elaborate (and often quite hilarious) stories are told. It is also found in split brain patients to explain a behaviour caused by a side of the brain that is inaccessible to consciousness. Hence, human beings highly rate, and respond to, explanations of moral behaviour that are narratively consistent, even if they deviate from the more random and chaotic nature of objective reality. This is the critical front-end of our moral apparatus.

Where Does Culture Fit In?

Culture fits in as the guiding force for growth of the mechanisms discussed above. Causation is two-way, the environment drives epigenetic changes and neural growth and as agents we shape our environment. This all happens constantly over time.

Often it is difficult to determine the level at which a behaviour is hard-wired. The environmental human beings now live in around the world has been largely shaped by human beings. Clues for evaluating the depth of mechanisms, and for determining the strength of any association, include: universal expression across cultures, appearance in close genetic relatives such as apes and mammals, independent evolution in more distant cousins (e.g. tool use and social behaviour in birds), and consistency of behaviour over recent recorded time (10k years).

My own inclination is that culture guides expression, but it is difficult if not impossible to overwrite inherited behaviour. This is both good and bad. For example, evidence points to slavery and genocide as being cultural, they come and go throughout history. However, it is very difficult to train yourself not to gag when faced with the smell of fresh vomit or a decaying corpse.

A Note on Imperfection

Abuse. Murder. Violence. Post-natal depression. Crimes of passion. War. Things can and do go wrong. Turn on the news, it’s there for all to see.

Humans have a certain acceptance that humans are imperfect. Again a lot of great art revolves around this idea. People make mistakes. However, I’d argue that a machine that made mistakes wouldn’t last long.

A machine that reasons morally would necessarily not be perfect. To deal with the complexity of reality machines would need to reason probabilistically. This then means we have to abandon certainty, in particular the certainty of prediction. Classification rates in many machine learning tasks plateau at an 80-90% success rate, with progress then being measured for years in fractions of a percent. Would we be happy with a machine that only seems to be right 80-90% of the time?

Saying this I do note a tendency towards expecting perfection in society in reason years. When something goes wrong someone is to blame. Politicians need to step down; CEOs need to resign. There are lawsuits for negligence. This I feel is the flipside of technological certainty. We can predict events on a quantum scale and have supercomputers in our pockets; surely we can control the forces of nature? Maybe the development of imperfect yet powerful machines will allow us to regain some of our humanity.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s