Bio-Inspired Robotics for Beginners

There are several well-known problems with modern “deep” learning approaches, including the need for large quantities of training data and a lack of robustness. These problems are related. Neural network architectures are trained by computing an error between some “ground truth” in the training data and the architecture’s prediction of that same data, given a set of associated input data. The error is then propagated backwards, using the derivative of each component of the architecture, to update its parameters. This feels to me like doing everything in reverse.

Animal brains offer an alternative view of intelligence. Animals are not taught complex behaviours by providing millions of labelled training examples. Instead, they learn the natural (local) structure of their environment over time.

What Do We Know that Might Help?

So, what do we know about animal brains that could help us build artificial intelligent systems?

Quite a lot it turns out.

Mammals have evolved a clever solution to the problem of predicting their environment. They have multiple sense organs that provide information in the form of electrical signals. These electrical signals are provided to neural structures. There are roughly three tiers: the brainstem, the midbrain and the cortex. These represent different levels of processing. They also show us the pathway of evolution: newer structures have been built on top of older structures, and older structures have been co-opted to provide important support functions for the newer structures.

In the mammalian brain there are some commonalities in how information is received and processed. The cortex appears to be the main computational device. Different sensory modalities appear to be processed using similar cortical processes. Pre-processing and feedback are handled by midbrain structures such as the thalamus and basal ganglia.

In terms of knowledge of sensory processing, we know most about the visual system. We then know similar amounts about auditory and motor systems, including perception of the muscles and skin (somatosensory). We know the least about smell and interoception, our sensing of internal signals relating to organs and keeping our body in balance. All the sensory systems use nerve fibres to communicate information in the form of electrical signals.

Cortex I/O

In mammalian brains, the patterns of sensory input to the brain and motor output are fairly well conserved across species. All input and output (apart from smell) is routed via the thalamus. The basal ganglia appear to be a support structure for at least motor control. The brain is split into two halves, and each half receives input from the opposite side of the body. Most mammals have the following:

  • V1 – the primary visual cortex – an area of the cortex that receives an input from the eye (the retina) via the thalamus;
  • A1 – the primary audio cortex – an area of the cortex that receives an input from the ear via the thalamus;
  • S1 – the primary somatosensory cortex – an area of the cortex that receives an input from touch, position, pain and temperature sensors that are positioned over the skin and within muscle structures (again via the thalamus);
  • Insula – an area of the cortex that receives an input from the body, indicating its internal state, from the thalamus; and
  • M1 – the primary motor cortex – an area of the cortex that provides an output to the muscles (it receives feedback from the thalamus and basal ganglia).

For a robotic device, we normally have access to the following:

  • Image data – for video, frames (two-dimensional matrices) at a certain resolution (e.g. 640 by 480) with a certain frame rate (e.g. 30-60 frames per second) and a certain number of channels (e.g. 3 for RGB). The cortex receives image information via the lateral geniculate nucleus (LGN) of the thalamus. The image information is split down the centre, so each hemisphere receives one half of the visual field. The LGN has been shown to perform a Difference of Gaussians (DoG) computation to provide something similar to an edge image. The image that is formed in V1 of the cortex is also mapped to a polar representation, with axes representing an angle of rotation and a radius (or visual degree from the horizontal). A minimal sketch of this pre-processing follows this list.
  • Audio – in one form, a one-dimensional array of intensities or amplitudes (typically 44100 samples per second) with two channels (e.g. left and right). The cochlea of the ear actually outputs frequency information as opposed to raw sound (e.g. pressure) information. This can be approximated in a robotic device by taking the Fast Fourier Transform to get a one-dimensional array of frequencies and amplitudes. 
  • Touch sensors / motor positions / capacitive or resistive touch – this is more ropey and has a variety of formats. We can normally pre-process to a position (in 2 or 3 dimensions) and an intensity. This could be multi-channel, e.g. at each position we could have a temperature reading and a pressure reading. On a LEGO Mindstorms ev3 robot, we have a Motor class that provides information such as the current position of a motor in pulses of the rotary encoder, the current motor speed, whether the motor is running and certain motor parameters.
  • Computing device information – this is again more of a jump. An equivalent of interoception could be seen as information on running processes and system utilization. If the robotic device has a battery this could also include battery voltages and currents. In Python, we can use something like psutil to get some of this information. On a LEGO Mindstorms ev3 robot, we have PowerSupply classes that provide information on the battery power.
  • Motor commands – robotic devices may have one or more linear or rotary motors. Typically, these are controlled by sending motor commands. This may be to move to a particular relative position, to move at a given speed for a given time and/or to rotate by a given number of rotations.
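
To make the image-data bullet concrete, here is a minimal sketch of that pre-processing chain (Difference of Gaussians, a left/right split, then a polar mapping). It assumes OpenCV, NumPy and a webcam; the blur sigmas and the logPolar scale factor are arbitrary illustrative values, not a model of the real LGN.

import cv2
import numpy as np

# Grab a single frame from the webcam
cam = cv2.VideoCapture(0)
ret, frame = cam.read()
cam.release()

if ret:
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Difference of Gaussians as a rough stand-in for the LGN's centre-surround response
    dog = cv2.GaussianBlur(grey, (0, 0), 1.0) - cv2.GaussianBlur(grey, (0, 0), 3.0)
    # Split down the centre: each "hemisphere" gets one half of the visual field
    height, width = dog.shape
    left_field, right_field = dog[:, :width // 2], dog[:, width // 2:]
    # Log-polar mapping of one half-field (axes become angle and log-radius)
    centre = (left_field.shape[1] / 2.0, left_field.shape[0] / 2.0)
    polar_left = cv2.logPolar(left_field, centre, 40.0,
                              cv2.INTER_LINEAR + cv2.WARP_FILL_OUTLIERS)
    print(polar_left.shape)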

We will leave smell for now (although it’s a large part of many mammals’ sensory repertoire). It probably slots in best as part of interoception. One day sensors may provide an equivalent data stream.

We can summarise this rough mapping as follows:

Brain                       Robot
Eyes / Retina > V1          Video camera > Split L/R > Polar mapping
Ears / Cochlea > A1         Microphone > Channel split L/R > FFT
Outer body + muscle > S1    Multisensor > Position + value
Interoception > Insula      Device measurements > Numeric array
M1 > Muscles                Numeric commands > Motors

One advantage of the “deep” learning movement is that it is now conventional to represent all information as multidimensional arrays, such as Numpy arrays within Python. Indeed, we can consider “intelligence” as a series of transformations between different multidimensional arrays.
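
As a trivial illustration, each of the modalities above can be held as a Numpy array; the shapes below simply reuse the example figures from this post and are placeholders rather than a real interface.

import numpy as np

# Placeholder arrays using the example sizes mentioned above
video_frame = np.zeros((480, 640, 3), dtype=np.uint8)    # rows x columns x colour channels
audio_chunk = np.zeros((44100, 2), dtype=np.int16)       # one second of stereo samples
touch_event = np.zeros(3, dtype=np.float32)              # e.g. x, y, intensity
intero_state = np.zeros(4, dtype=np.float32)             # e.g. CPU %, memory %, volts, amps
motor_command = np.zeros(2, dtype=np.float32)            # e.g. target position, speed

# "Intelligence" as transformations between arrays: a stand-in transformation
def transform(sensor_array):
    return sensor_array.astype(np.float32).mean()

print(transform(video_frame), transform(audio_chunk))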

Cortex Properties 

The mammalian cortex is a two-dimensional sheet. This provides a strong constraint on the computational architecture. Human brains appear wrinkly because they are trapped within our skulls and need to maximise their surface area. Mouse brains are quite smooth.

The cortex is a layered structure; the full stack of layers is only a few millimetres thick. There are between four and six layers, depending on the area of the cortex. The layers contain a large number of implementing neurons, which provide a combination of excitatory and inhibitory connections. Different layers receive different inputs and provide different outputs:

  • layer 1 appears to distribute feedback over a large area;
  • layer 4 receives input from the thalamus;
  • layers 3 and 4 receive input from other parts of the cortex;
  • layer 2 provides feedback to a neighbouring cortical area;
  • layer 3 provides a feed-forward output to other, more distant cortical areas; and
  • layers 5 and 6 provide feedback to the thalamus, with layer 5 also providing a feed-forward output to a neighbouring cortical area.

Computation occurs vertically through the layers, and information is passed within the plane of the cortical sheet.

The two-dimensional cortical sheet of many mammals appears to have a common general topology. The input and output areas appear reasonably fixed, and are likely genetically determined. The visual, somatosensory and motor areas appear aligned. This may be what creates “embodied” intelligence; we think conceptually using a common co-ordinate system. For example, the half-image from the eyes is aligned bottom-to-top within the visual processing areas, which is aligned with the feet-to-head axis of the body as felt and the feet-to-head axis of the body as acted upon (i.e. V1, S1 and M1 are aligned). This makes sense – it is more efficient to arrange our maps in this way.

From Finlay et al.

The cortex also appears to have neuronal groupings with certain functional roles. This is best understood in the visual processing areas, where different cortical columns are found to relate to different portions of the visual field; each column has a receptive field equivalent to a small group of pixels (say somewhere around 1000). Outside of the visual areas the evidence is shakier.

The cortex of higher mammals also appears to have a fairly uniform volume but a differing neuronal density, which follows something like a diffusion gradient. Within the visual areas towards the back of the brain there are a large number of neurons per square millimetre; towards the front of the brain there are fewer neurons per square millimetre. In the baboon this ratio is around 4:1. However, because the volume of the cortical sheet is reasonably constant, the neurons towards the front of the brain are more densely connected (as there is room). A simple gradient in neuron number may provide an information bottleneck that forces a compression of neural representations, leading to greater abstraction.

Considering a two-dimensional computing sheet gives us an insight into two cortical pathways that have been described for vision. A first processing pathway (the dorsal stream) is the line drawn from V1 to S1 and M1. This pathway is biased towards motion, i.e. information that is useful for muscle control. A second pathway (the ventral stream) is the line drawn from V1 to A1. This pathway is biased towards object recognition, which makes sense because we can correlate audio and visual representations of objects to identify them. What is also interesting is that the lower visual field abuts the first processing pathway and the upper visual field abuts the second processing pathway – this may reflect the fact that the body is orientated below the horizontal line of vision, so it makes sense to map the lower visual field to the body in a more one-to-one manner.

The cortical sheet also gives us an insight into implementing efficient motor control. You will see that S1, the primary somatosensory area, is adjacent to M1, the primary motor cortex. This means that activation of the motor cortex, e.g. to move muscles, will activate the somatosensory representations regardless of any somatosensory input from the thalamus. We thus have two ways of activating the body representations, one being generated from the motor commands and one being generated by the input from the body. This gives us the possibility to compare and integrate these signals in time to provide control. For example, if the signal received from the body did not match the proposed representation from the motor commands then either the motor commands may be modified or our sensing of our body is modified (or both).

So to recap:

  • the main computing machinery of the brain is a two-dimensional sheet;
  • it has an embodied organisation – the relative placement of processing areas has functional relevance;
  • it has a density gradient that is aligned with the embodied organisation; and
  • it appears to use common, repeated computing units.

Cortex to Robot

This suggests a plan for organising computation for a robotic device.

Computers typically work according to a one-dimensional representation: data in memory. If we are creating a bio-inspired robotic device, we need to extend this to two dimensions, where the two dimensions indicate an ordering of processing units. It might be easier to imagine a football field of interconnected electronic calculators.

We can then align our input arrays with the processing units. At input and output areas of any computing architecture there could be a one-to-one organisation of array values to processing units. The number of processing units may decrease along an axis of the two-dimensional representation, while their interconnections increase. In fact, a general pattern is that connectivity is local at points of input or output but becomes global in between these points. These spaces in between store more abstract representations.
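
As a toy sketch of that idea (my own construction, not taken from the neuroscience literature), the "sheet" can be modelled as a sequence of stages whose unit counts fall along one axis, mimicking the density gradient and the resulting bottleneck:

import numpy as np

# Toy "cortical sheet": unit counts fall linearly along one axis
input_units = 1024        # one unit per input value at the sensory edge (kept small here)
num_stages = 5
ratio = 4                 # cf. the roughly 4:1 density gradient mentioned earlier

stage_sizes = np.linspace(input_units, input_units // ratio, num_stages).astype(int)
print(stage_sizes)        # e.g. [1024  832  640  448  256]

# Random projections between stages as a stand-in for local-to-global connectivity
activity = np.random.rand(input_units)
for size_in, size_out in zip(stage_sizes[:-1], stage_sizes[1:]):
    weights = np.random.randn(size_out, size_in) / np.sqrt(size_in)
    activity = np.tanh(weights @ activity)
print(activity.shape)     # the compressed, more "abstract" representation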

This architecture of the cortex reminds us of another architecture that has developed with “deep” learning: the autoencoder.

Standard Autoencoder

Indeed, this is not by accident – the structure of the visual processing areas of the brain has been the inspiration for these kinds of structures. Taking a two-dimensional view of computation also allows us to visualise connectivity using a two-dimensional graph structure.
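
For completeness, here is a minimal sketch of a standard autoencoder written in plain Numpy: a single tanh bottleneck layer trained by gradient descent on random data. A real system would use a deep-learning library; the layer sizes, learning rate and epoch count here are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 64))              # 100 toy "sensor" vectors of length 64

n_in, n_hidden = 64, 8                 # the bottleneck forces a compressed representation
W1 = rng.normal(0, 0.1, (n_in, n_hidden))
W2 = rng.normal(0, 0.1, (n_hidden, n_in))
lr = 0.1

for epoch in range(200):
    H = np.tanh(X @ W1)                # encode
    X_hat = H @ W2                     # decode (linear output layer)
    error = X_hat - X
    loss = (error ** 2).mean()
    # Backpropagate the reconstruction error
    d_out = error / len(X)
    grad_W2 = H.T @ d_out
    d_hidden = d_out @ W2.T * (1 - H ** 2)
    grad_W1 = X.T @ d_hidden
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

print(loss)                            # reconstruction error after training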

Onward!

So we can start our bio-inspired robot by constructing a setup where different sensory modalities are converted into Numpy arrays and then processed so that we can compare them in a two-dimensional structure. The picture becomes murkier once we look at how associations are formed and how the thalamus comes into the equation. However, by looking at possible forms of our signals in a common processing environment we can begin to join the dots.


Capturing Live Audio and Video in Python

In my robotics projects I want to capture live audio and video data and convert it into Numpy multi-dimensional arrays for further processing. To save you several days, this blog post explains how I go about doing this.

Audio / Video Not Audio + Video

A first realisation is that you need to capture audio and video independently. You can record movie files with audio, but as far as I could find there is no simple way to live capture both audio and video data.

Video

For video processing, I found there were two different approaches that could be used to process video data:

  • OpenCV in Python; and
  • Wrapping FFMPEG using SubProcess.

OpenCV

The default library for video processing in Python is OpenCV. Things have come a long way since my early experiences with OpenCV in C++ over a decade ago. Now there is a nice Python wrapper and you don’t need to touch any low-level code. The tutorials here are a good place to start.

I generally use Conda/Anaconda these days to manage my Python environments (the alternative being old skool virtual environments). Setting up a new environment with Jupyter Notebook and OpenCV is now straightforward:

conda install opencv jupyter

As a note – installing OpenCV in Conda seems to have been a pain up to a few years ago. There are thus several out-of-date Stack Overflow answers that come up in searches, referring to installing from specific sources (e.g. from menpo). This appears not to be needed now.

One problem I had in Linux (Ubuntu 18.04) is that the GTK libraries didn’t play nicely in the Conda environment. I could capture images from the webcam but not display them in a window. This led me to look for the alternative visualisation strategies that I describe below.

A good place to start with OpenCV is this video tutorial. As drawing windows led to errors, I designed a workaround that uses PIL (the Python Imaging Library) and IPython to generate an image from the Numpy array and then show it at about 30 fps. The code separates out each of the YUV components and displays them next to each other. This is useful for bio-inspired processing.

# Imports
import PIL
import io
import cv2
import matplotlib.pyplot as plt
from IPython import display
import time
import numpy as np

# Function to convert array to JPEG for display as video frame
def showarray(a, fmt='jpeg'):
    f = io.BytesIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display.display(display.Image(data=f.getvalue()))

# Initialise camera
cam = cv2.VideoCapture(0)
# Optional - set to YUV mode (remove for BGR)
cam.set(16, 0)
# These allow for a frame rate to be printed
t1 = time.time()

# Loops until an interrupt
try:
    while(True):
        t2 = time.time()
        # Capture frame-by-frame
        ret, frame = cam.read()
        # Join components horizontally
        joined_array = np.concatenate(
        (
            frame[:,:,0], 
            frame[:, 1::2, 1], 
            frame[:, 0::2, 1]
        ), axis=1)
        # Use above function to show array
        showarray(joined_array)
        # Print frame rate
        print(f"{int(1/(t2-t1))} FPS")
        
        # Display the frame until new frame is available
        display.clear_output(wait=True)
        t1 = t2
except KeyboardInterrupt:
    # Release the camera when interrupted
    cam.release()
    print("Stream stopped")</code></pre>

In the above code, “frame” is a three-dimensional tensor or array where the first dimension relates to rows of the image (e.g. the y-direction of the image), the second dimension relates to columns of the image (e.g. the x-direction of the image) and the third dimension relates to the three colour channels. Often for image processing it is useful to separate out the channels and just work on a single channel at a time (e.g. equivalent to a 2D matrix or grayscale image).

FFMPEG

An alternative to using OpenCV is to use subprocess to wrap FFMPEG, a command-line video and audio processing utility.

This is a little trickier as it involves accessing the video buffers. I have based my solution on this guide by Zulko here.

#Imports
import subprocess as sp
import numpy as np
import matplotlib.pyplot as plt

FFMPEG_BIN = "ffmpeg"
# Define command line command
command = [ FFMPEG_BIN,
            '-i', '/dev/video0',
            '-f', 'image2pipe',
            '-pix_fmt', 'rgb24',
            '-an','-sn', #-an, -sn disables audio and sub-title processing respectively
            '-vcodec', 'rawvideo', '-']
# Open pipe
pipe = sp.Popen(command, stdout = sp.PIPE, bufsize=(640*480*3))

# Display a few frames
no_of_frames = 5
fig, axes = plt.subplots(no_of_frames, 1)

for i in range(0, no_of_frames):
    # Get the raw byte values from the buffer
    raw_image = pipe.stdout.read(640*480*3)
    # transform the byte read into a numpy array
    image = np.frombuffer(raw_image, dtype='uint8')
    image = image.reshape((480,640, 3))
    # Flush the pipe
    pipe.stdout.flush()
    axes[i].imshow(image)

I had issues flushing the pipe in a Jupyter notebook, so I ended up using the OpenCV method. It is also trickier to work out the byte structure for YUV data.

Audio

My middle daughter generates a lot of noise.

For audio, there are also a number of options. I have tried PyAudio and AlsaAudio.

PyAudio appears to be the preferred option. However, I am quickly learning that audio/video processing in Python is not yet as polished as pure image processing or building a neural network.

PyAudio provides a series of wrappers around the PortAudio libraries. However, I had issues getting this to work in a Conda environment. Initially, no audio devices showed up. After a long time working through Stack Overflow, I found that installing from the Conda-Forge source did allow me to find audio devices (see here). But even though I could see the audio devices, I still had errors opening an audio stream. (One tip for both audio and video: look at your terminal output when capturing – the low-level errors will be displayed there rather than in a Jupyter notebook.)

AlsaAudio

Given my difficulties with PyAudio I then tried AlsaAudio. I had more success with this.

My starting point was the code for recording audio that is provided in the AlsaAudio Github repository. The code below records a snippet of audio then loads it from the file into a Numpy array. It became the starting point for a streaming solution.

# Imports
import alsaaudio
import time
import numpy as np

# Setup Audio for Capture
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, device="default")
inp.setchannels(1)
inp.setrate(44100)
inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)
inp.setperiodsize(160)

# Record a short snippet
with open("test.wav", 'wb') as f:
    loops = 1000000
    while loops > 0:
        loops -= 1
        # Read data from device
        l, data = inp.read()
      
        if l:
            f.write(data)
            time.sleep(.001)

f = open("test.wav", 'rb')

# Open the device in playback mode. 
out = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, device="default")

# Set attributes: Mono, 44100 Hz, 16 bit little endian frames
out.setchannels(1)
out.setrate(44100)
out.setformat(alsaaudio.PCM_FORMAT_S16_LE)

# The period size controls the internal number of frames per period.
# The significance of this parameter is documented in the ALSA api.
# We also have 2 bytes per sample so 160*2 = 320 = number of bytes read from buffer
out.setperiodsize(160)

# Read data from stdin
data = f.read(320)
numpy_array = np.frombuffer(data, dtype='<i2')
while data:
    out.write(data)
    data = f.read(320)
    decoded_block = np.frombuffer(data, dtype='<i2')
    numpy_array = np.concatenate((numpy_array, decoded_block))

The numpy_array is then a long array of sound amplitudes.
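
Following on from the first post, the frequency representation can be approximated by taking the FFT of a window of these amplitudes with Numpy. This is a rough sketch that continues from the code above; the window length is arbitrary.

# Magnitude spectrum of one short window of the recording
window = numpy_array[:4096].astype(np.float64)
spectrum = np.abs(np.fft.rfft(window))
freqs = np.fft.rfftfreq(len(window), d=1.0 / 44100)
print(freqs[np.argmax(spectrum)])     # dominant frequency in the window (Hz)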

Sampler Object

I found a nice little Gist for computing the FFT here. This uses a Sampler object to wrap the AlsaAudio object.

from collections import deque
import struct
import sys
import threading
import alsaaudio
import numpy as np

# some const
# 44100 Hz sampling rate (for 0-22050 Hz view, 0.0227ms/sample)
SAMPLE_FREQ = 44100
# 66000 samples buffer size (near 1.5 second)
NB_SAMPLE = 66000

class Sampler(threading.Thread):
    def __init__(self):
        # init thread
        threading.Thread.__init__(self)
        self.daemon = True
        # init ALSA audio
        self.inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL, device="default")
        # set attributes: Mono, frequency, 16 bit little endian samples
        self.inp.setchannels(1)
        self.inp.setrate(SAMPLE_FREQ)
        self.inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)
        self.inp.setperiodsize(512)
        # sample FIFO
        self._s_lock = threading.Lock()
        self._s_fifo = deque([0] * NB_SAMPLE, maxlen=NB_SAMPLE)

    def get_sample(self):
        with self._s_lock:
            return list(self._s_fifo)

    def run(self):
        while True:
            # read data from device
            l, data = self.inp.read()
            if l > 0:
                # extract and format sample (normalize sample to 1.0/-1.0 float)
                raw_smp_l = struct.unpack('h' * l, data)
                smp_l = (float(raw_smp / 32767) for raw_smp in raw_smp_l)
                with self._s_lock:
                    self._s_fifo.extend(smp_l)
            else:
                print('sampler error occur (l=%s and len data=%s)' % (l, len(data)), file=sys.stderr)
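
A minimal usage sketch (my own addition, not part of the original Gist): start the Sampler as a daemon thread and poll its FIFO, which returns the most recent buffer of normalised samples.

import time
import numpy as np

sampler = Sampler()
sampler.start()                        # run() loops in the background, filling the FIFO

for _ in range(5):
    time.sleep(0.5)
    samples = np.array(sampler.get_sample())
    print(len(samples), samples.min(), samples.max())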

Next Steps

This is where I am so far.

The next steps are:

  • look into threading and multiprocessing so that we can run parallel audio and video sampling routines (a rough sketch follows this list);
  • extend the audio (and video?) processing to obtain the FFT; and
  • optimise for speed of capture.
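
On the first bullet, here is a rough sketch of how the pieces might fit together: the Sampler class above runs audio capture in its own thread while video frames are read in the main loop. This is untested glue code under those assumptions rather than a finished solution.

import cv2
import numpy as np

sampler = Sampler()                    # audio capture runs in its own daemon thread
sampler.start()

cam = cv2.VideoCapture(0)
try:
    while True:
        ret, frame = cam.read()        # video sampled in the main loop
        audio = np.array(sampler.get_sample())
        if ret:
            # Both modalities are now Numpy arrays, ready for joint processing
            print(frame.shape, audio.shape)
except KeyboardInterrupt:
    cam.release()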

Artificial Morality (or How Do We Teach Robots to Love)

One Saturday morning I came upon the website 80000 Hours. The idea of the site is to direct our activity to maximise impact. They have a list of world problems here. One of the most pressing is explained as the artificial intelligence “control problem”: how do we control forces that can out-think us? This got me thinking. Here are those thoughts.

The Definition Problem (You Say Semantics…)

As with any abstraction, we are first faced with the problems of definition. You could base a doctorate on this alone.

At its heart, ‘morality’ is about ‘right’ and ‘wrong’. These can be phrased as ‘good’ and ‘bad’ or ‘should’ and ‘should not’.

This is about where the agreement ends.

Let’s start with scope. Does morality apply to an internal world of thought as well as an external world of action? Religions often feature the concept of ‘immoral’ thoughts; however, most would agree that action is the final arbiter. Without getting too metaphysical, I would argue that thoughts (or data routines) are immoral to the extent that they cause physical change in the world in a manner that increases the likelihood of an immoral action (even though that action need not actually occur). For example, ruminating on killing is immoral in the sense that it leads to physical changes in the brain that make a person more likely to kill in future situations.

The main show in morality revolves around the moral groupings: just what is ‘right’ or ‘wrong’? This is where the mud tends to be thrown.

‘Morality’ itself has had a bad rap lately. There are overhangs from periods of dogmatic and repressive religious control. Post-modernism, together with advanced knowledge of other cultures, has questioned the certainties that, at least in Europe and North America, supported the predominantly Judeo-Christian moral viewpoint. This has led to some voices questioning the very basis of morality: if the moral groupings seem arbitrary, do we even need them?

As with other subjects, I think the existential panic that post modernism delivered is constructive for our thinking on morality, but we should use it to build from firmer foundations rather than abandon the building altogether. The body of knowledge from other cultures helps us map the boundaries and commonalities in human morality that can teach us how to construct an artificial machine morality.

Interestingly, morality does appear to be a binary classification. For me, concepts such as an action being half moral or a quarter immoral don’t really make sense. When thinking of morality, it is similarly hard to think of a category that is neither moral nor immoral. There is the concept of amorality – but this indicates the absence of a classification. Hence, morality is a binary classification that can itself be applied in a binary manner.

An Aside on Tribalism

Morality has deep ties to its abstractive cousins: politics and religion. Moral groupings are often used to indicate tribal affiliations in these areas. Indeed, some suggest that the differences in moral groupings have come about to better delineate social groupings. This means that disagreement often becomes heated as definitions are intrinsically linked to a definition of (social) self.

Fear of falling into the wrong side of a social grouping can often constrain public discourse on morality. This is possibly one of the reasons for the limited field size described in the 80000 hours problem profile.

Another, often overlooked, point is that those with the strongest personal views on morality tend to lie on the right of the political spectrum (i.e. be conservative), whereas those writing about morality in culture and academia tend to lie on the left (i.e. be liberal in the US sense). Hence, those writing “objectively” about morality tend to view the subject from a different subjective viewpoint than those who feel most passionately about right and wrong. This sets up a continuing misunderstanding. In my reading I have felt that those on the left tend to underestimate the visceral pull of morality, while those on the right tend to over-emphasise a fixed, rules-based approach.

Seductive Rules

Programmers and engineers love rules. A simple set of rules appears as a seductive solution to the problem of morality: think the Ten Commandments or Asimov’s Three Laws. However, this does not work in practice. This is clear from nature. Social life is far too complex.

Rules may be better thought of as a high-level surface representation of an underlying complex  probabilistic decision-making process. As such, in many situations the rules and behaviour will overlap. This gives us the causative fallacy that the rules cause the behaviour, whereas in reality similarities in genetics and culture lead human beings to act in ways that can be clustered and labelled as ‘rules’.

This is most apparent at edge cases of behaviour – in certain situations humans act in a reasonable or understandable way that goes against the rules. For example, “Thou shall not kill” unless you are at war, in which case you should. Or “Thou shall not steal”, unless your family is starving and those you are stealing from can afford it. Indeed, it is these messy edge cases that form the foundations of a lot of great literature.

However, we should not see rules of human behaviour as having no use – they are the human-intelligible labels we apply to make sense of the world and to communicate. Like the proverbial iceberg tip, they can also guide us to the underlying mechanisms. They can also provide a reference test set to evaluate an artificial morality: does our morality system organically arrive at well-known human moral rules without explicit programming?

How Humans Do It (Lord of the Flies)

When we evaluate artificial intelligence we need to understand that we are doing this relative to human beings. For example, an artificial morality may be possible that goes against commonly-agreed moral groupings in a human-based morality. Or we could come up with a corvid morality that overlapped inexactly with a human morality. However, the “control problem” defined in the 80000 Hours article is primarily concerned with constructing an artificial morality that is beneficial for, and consistent with generally held concepts of, humanity.

As with many philosophical abstractions, human morality likely arises from the interplay of multiple adaptive systems. I will look at some of the key suspects below.

(Maternal) Love is All You Need

In at least mammals, the filial bond is likely at the heart of many behavioural aspects that are deemed ‘good’ across cultures. The clue is kind of in the name: the extended periods of nursing found in mammals, and the biological mechanisms such as oxytocin to allow this, provide for a level of self-sacrifice and concern that human beings respect and revere. The book Affective Neuroscience gives a good basic grounding in these mechanisms.

This, I think, also solves much of the control problem – parents are more intelligent than their children but (when things are working) do not try to exterminate them as a threat at any opportunity.

Indeed, it is likely not a coincidence that the bureaucratic apparatus that forms the basis for the automation of artificial intelligence first arose in China. This is a country whose Confucian/Daoist morality prizes filial respect, and extends it across non-kin hierarchies.

If our machines cared for us as children we may not control them, but they would act in our best interest.

Moreover, one of the great inventions of the mono-theistic religions of the Middle East was the extension of filial love (think Father and Son) to other human beings. The concepts of compassion and love that at least Christian scholars developed in the first millennium (AD) had at their basis not the lust of romantic love but the platonic love of parent and child. This in turn was driven by the problem of regulating behaviour in urban societies that were growing increasingly distant from any kind of kin relationship.

Social Grouping

The approach discussed above does have its limitations. These are played out all over history. Despite those mono-theistic religions extending the filial bond, they were not able to extend it to all humanity; it hit a brick wall at the limits of social groups.

Although it goes in and out of fashion, it may be that the group selection mechanisms explored by clever people such as Edward O. Wilson are at play. Are social group boundaries necessary for the survival of those within the group? Is there something inherently flawed, in terms of long-term survival, if the filial bond is extended too far? Or is this limitation only down to the constraints of the inherited biology of human beings?

Returning to morality, Jared Diamond notes in The World Until Yesterday that many tribal cultures group human beings into ‘within tribe’ and ‘outside tribe’, wherein the latter are classed as ‘enemies’ that may be ‘morally’ killed. Furthermore, many tribal cultures are plagued by a tit-for-tat cycle of killing, which was deemed the ‘right’ action until the later arrival of a state mechanism where justice was out-sourced from the tribe. We are reminded that “Thou shall not kill” does not apply to all those smitten in the Old Testament.

For machines and morality, this seems an issue. Would an artificial intelligence need to define in-groups and out-groups for it to be accepted and trusted by human beings? If so, how can we escape cataclysmic conflict? Do you program a self-driving car to value all life equally, or to value your country’s citizens above others? As has been pointed out by many, bias may be implicit in our training data. Does our culture and observed behaviour train artificial intelligence systems to naturally favour one group over another? (Groups being defined by a collection of shared features detected from the data.) If so, this may be an area where explicit guidance is required.

Disgust

Marc Hauser in Moral Minds touches on how many visceral feelings of right and wrong may be driven by, or built upon, our capacity for disgust.

Disgust as an emotion has clearly defined facial expressions (see the work of Paul Ekman) that are shared across different human groups, indicating a deep shared biological basis in the brain.

Disgust is primarily an emotion of avoidance. It is best understood as a reaction to substances and situations that may be hazardous to our health. For example, disgust is a natural reaction to faeces, tainted foods and water supplies, vomit and decaying flesh. This makes us avoid these items and thus avoid the diseases (whether viral, bacterial or fungal) that accompany them. The feeling itself is based around a sensing and control of digestive organs such as the stomach and colon; it is the precursor to adaptive behaviours that purge the body of possibly disease-ridden consumables.

Hauser discusses research that suggests that the mechanisms of disgust have been extended to more abstract categories of items. When considering these items, people who have learned (or possibly inherited) an association feel an echo of the visceral disgust emotion that guides their decision making. There are also possible links to the natural strength of the disgust emotion in people and their moral sense: those who feel disgust more strongly tend also to be those who have a clearer binary feeling of right and wrong.

This is not to say that this linking of disgust and moral sense is always adaptive (or indeed ‘right’). Disgust is often a driving factor in out-group delineation. It may also underlie aversion to homosexuality among religious conservatives. However, it is often forgotten in moral philosophy, which tends to avoid ‘fluffy’ ‘feelings’ and the subjective minefield they open up.

Any artificial morality needs to bear disgust in mind though. Not only does it suggest one mechanism for implementing a moral sense at a nuts-and-bolts level; any implementation that ignores it will likely slip into the uncanny valley when it comes to human appraisal.

Fear

Another overlooked component of a human moral sense is fear.

Fear is another avoidance emotion, primarily driven by the amygdala. Indeed, there may be overlaps between fear and disgust, as implemented in the brain. The other side of fear is the kick-starting of the ‘fight’ reflex: the release of epinephrine, norepinephrine and cortisol.

In moral reasoning, fear, like disgust, may be a mechanism to provide quick decision making. Fear responses may be linked to cultural learning (e.g. the internalised response to an angry or fearful parent around dangerous or ‘bad’ behaviours) and may guide the actual decision itself, e.g. pushing someone off a bridge or into a river is ‘bad’ because of the associated fear of falling or drowning, which gives us a feeling of ‘badness’.

Frontal Lobes

The moral reasoning discussed above forms the foundations of our thoughts. The actual thoughts themselves, including their linguistic expression in notes such as this, are also driven and controlled by the higher executive areas of the frontal lobes and prefrontal cortex. These areas are the conductor that oversees the expression of neural activity over time in the rest of the cortex, including areas associated with sensory and motor processing.

In the famous case of Phineas Gage, violent trauma to the frontal lobes led to a decline in ‘moral’ behaviour and an increase in the ‘immoral’ vices of gambling, drinking and loose women. Hence, they appear to form a necessary part of our equipment for moral reasoning. Indeed, any model of artificial morality would do well to model the action of the prefrontal cortex and its role in inhibiting behaviour that is believed to be morally unsound.

The prefrontal cortex may also have another role: that of storyteller to keep our actions consistent. You see this behaviour often with small children: in order to keep beliefs regarding behaviour consistent in the face of often quite obvious inconsistencies, elaborate (and often quite hilarious) stories are told. It is also found in split brain patients to explain a behaviour caused by a side of the brain that is inaccessible to consciousness. Hence, human beings highly rate, and respond to, explanations of moral behaviour that are narratively consistent, even if they deviate from the more random and chaotic nature of objective reality. This is the critical front-end of our moral apparatus.

Where Does Culture Fit In?

Culture fits in as the guiding force for the growth of the mechanisms discussed above. Causation is two-way: the environment drives epigenetic changes and neural growth, and as agents we shape our environment. This all happens constantly over time.

Often it is difficult to determine the level at which a behaviour is hard-wired. The environment human beings now live in around the world has been largely shaped by human beings. Clues for evaluating the depth of mechanisms, and for determining the strength of any association, include: universal expression across cultures, appearance in close genetic relatives such as apes and other mammals, independent evolution in more distant cousins (e.g. tool use and social behaviour in birds), and consistency of behaviour over recent recorded time (10k years).

My own inclination is that culture guides expression, but it is difficult if not impossible to overwrite inherited behaviour. This is both good and bad. For example, evidence points to slavery and genocide as being cultural; they come and go throughout history. However, it is very difficult to train yourself not to gag when faced with the smell of fresh vomit or a decaying corpse.

A Note on Imperfection

Abuse. Murder. Violence. Post-natal depression. Crimes of passion. War. Things can and do go wrong. Turn on the news, it’s there for all to see.

Humans have a certain acceptance that humans are imperfect. Again a lot of great art revolves around this idea. People make mistakes. However, I’d argue that a machine that made mistakes wouldn’t last long.

A machine that reasons morally would necessarily not be perfect. To deal with the complexity of reality machines would need to reason probabilistically. This then means we have to abandon certainty, in particular the certainty of prediction. Classification rates in many machine learning tasks plateau at an 80-90% success rate, with progress then being measured for years in fractions of a percent. Would we be happy with a machine that only seems to be right 80-90% of the time?

Saying this, I do note a tendency towards expecting perfection in society in recent years. When something goes wrong someone is to blame. Politicians need to step down; CEOs need to resign. There are lawsuits for negligence. This I feel is the flipside of technological certainty. We can predict events on a quantum scale and have supercomputers in our pockets; surely we can control the forces of nature? Maybe the development of imperfect yet powerful machines will allow us to regain some of our humanity.

Twitter Robots on a Raspberry Pi

Or how to get very quickly write-restricted by Twitter.

This is a short guide to playing around with the Twitter API using Python on a Raspberry Pi (or any other Linux machine).

Overview

The process has four general steps:

  1. Setup a new Twitter account and create a new Twitter app;
  2. Setup the Raspberry Pi to access Twitter;
  3. Write code to return tweets associated with given search terms; and
  4. Write code to post tweets based on the given search terms.

Step 1 – Create New Twitter Account and App

First it helps to have an email alias to avoid spam in your main email account. I found the site www.33mail.com, which offers you multiple email addresses for free in the form [X]@[your-username].33mail.com.

Next, register a new Twitter account using the email alias. To do this simply log out of any active Twitter accounts, then go to www.twitter.com and sign up for a new account. It’s pretty quick and straightforward. Skip all the ‘suggested follows’ rubbish.

I used public domain images from WikiMedia Commons (this British Library collection on Flickr is also great) for the profile and added a relevant bio, location and birthday for my Robot.

Once the new Twitter account is set up log in. Then go to http://dev.twitter.com and create a new application. Once the new application is created you can view the associated consumer key and secret (under the ‘Access Keys’ tab). You can also request an access token and secret on the same page.

Some points to note:

  • You need to register a phone number with your new Twitter account before it will allow you to create a new application. One phone number can be linked with up to 10 Twitter accounts. Beware that SMS notifications etc. will be sent to the most recently added account – this was fine by me as I typically avoid being pestered by SMS.
  • You are asked to enter a website for your application. However, this can just be a placeholder. I used “http://www.placeholder.com”.

Step 2 – Setup Raspberry Pi

I have a headless Raspberry Pi I access via SSH with my iPad. Any old Linux machine will do though.

To configure the computer do the following:

  • Create a GitHub repository (mine is here).
  • SSH into the computer.
  • Clone the remote repository (e.g. ‘git clone [repository link]’). I find it easier to use SSH for communication with the GitHub servers (see this page for how to set up SSH keys).
  • CD into the newly generated folder (e.g. ‘cd social-media-bot’).
  • Initialise a new virtual environment using virtualenv and virtualenvwrapper. I found this blogpost very helpful for doing this. (Once you have installed those two tools via ‘pip’, use ‘mkvirtualenv social-media-bot’ to set up the environment, then ‘workon social-media-bot’ to work within it. For other commands (I haven’t used any yet) see here.)
  • Install Twitter tools and other required libraries. This was as simple as typing ‘pip install twitter’ (within the ‘social-media-bot’ virtualenv).

Step 3 – Write Search Code

As with previous posts I decided to use ConfigParser (or configparser with Python 3+) to hide specific secrets from GitHub uploads.

My Python script thus uses a settings.cfg file structured as follows:
----
[twitter_settings]
ACCESS_TOKEN = [Your token here]
ACCESS_SECRET = [Your secret here]
CONSUMER_KEY = [Your key here]
CONSUMER_SECRET = [Your secret here]

[query_settings]
query_term = [Your query term here]
last_tweet_id = 0

[response_settings]
responses =
 'String phrase 1.';
 'String phrase 2.';
 'String phrase 3.'
----

Create this file in the directory with the Python code. 

  • The first section (‘twitter_settings’) stores the Twitter app access keys that you copy and paste from the ‘Access Keys’ tab of the Twitter developer webpage. 
  • The second section (‘query_settings’) stores the query term (e.g. ‘patent’) and a variable that keeps track of the highest tweet ID returned by the last search.
  • The third section (‘response_settings’) contains string phrases that I can randomly select for automated posts.

The Python code for accessing Twitter is called twitter_bot.py. Have a look on GitHub – https://github.com/benhoyle/social-media-bot.
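
Loading those settings is a couple of lines with ConfigParser. Here is a minimal sketch using the Python 2 module name to match the rest of the code; the actual repository code may differ slightly:

import ConfigParser

parser = ConfigParser.ConfigParser()
parser.read('settings.cfg')

query_term = parser.get('query_settings', 'query_term')
last_tweet_id = parser.getint('query_settings', 'last_tweet_id')
responses = [r.strip().strip("'")
             for r in parser.get('response_settings', 'responses').split(';')]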

The comments should hopefully make the code self-explanatory. Authentication, which is often fairly tricky, is a doddle with the Python ‘Twitter’ library – simply create a new ‘oauth’ object using the access keys loaded from settings.cfg and use this to initiate the connection to the Twitter API:

settings_dict = dict(parser.items('twitter_settings'))
oauth = OAuth(settings_dict['access_token'], settings_dict['access_secret'], settings_dict['consumer_key'], settings_dict['consumer_secret'])

# Initiate the connection to Twitter REST API
twitter = Twitter(auth=oauth)

The script then searches for tweets containing the query term. 


tweets = twitter.search.tweets(q=expanded_query_term, lang='en', result_type='recent', count='10', since_id=last_tweet_id)['statuses']

Points to note:

  • There is a lot of ‘noise’ in the form of retweets and replies on Twitter. I wanted to look for original, stand-alone tweets. To filter out retweets add ' -RT' to the query string. To filter out replies use ' -filter:replies' (this isn’t part of the documented API but appeared to work). See the snippet after this list.
  • I found that search terms often meant something else in languages other than English. Using lang='en' limited the search to English-language posts.
  • The parameters for the API function map directly onto the API parameters as found here: https://dev.twitter.com/rest/reference/get/search/tweets.
  • The ‘since_id’ parameter searches from a given tweet ID. The code saves the highest tweet ID from each search so that only new results are found.
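
The expanded_query_term passed to the search call above is just the query term with those filters appended; something along these lines (a sketch, the exact string in the repository may differ):

# Filter out retweets and replies from the search results
expanded_query_term = query_term + ' -RT -filter:replies'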

Step 4 – Posting Replies

A reply is then posted from the account associated with the token, key and secrets. The reply text is one of the string phrases from the responses section, selected at random, and it is posted as a reply to each returned tweet that contains the query term.


# Extract tweetID, username and text of tweets returned from search
tweets = [{
    'id_str': tweet['id_str'],
    'screen_name': tweet['user']['screen_name'],
    'original_text': tweet['text'],
    'response_text': '@' + tweet['user']['screen_name'] + ' ' + random.choice(responses)
    } for tweet in tweets if query_term in tweet['text'].lower()]

# Posting on Twitter
for tweet in tweets:
    # Leave a random pause (75 to 120 seconds) between posts to avoid rate limits
    twitter.statuses.update(status=tweet['response_text'], in_reply_to_status_id=tweet['id_str'])
    # print tweet['original_text'], '\n', tweet['response_text'], tweet['id_str']
    time.sleep(random.randint(75, 120))

There is finally a little bit of code to extract the maximum tweetID from the search results and save it in the settings.cfg file.


# Record the highest tweet ID returned by the search so the next search starts from it
# (this avoids replying to the same tweet twice)
id_ints = [int(t['id_str']) for t in tweets]

# Add highest tweetID to settings.cfg file
parser.set('query_settings', 'last_tweet_id', str(max(id_ints)))

# Write updated settings.cfg file
with open('settings.cfg', 'wb') as configfile:
    parser.write(configfile)

I have the script scheduled as a cron job that runs every 20 minutes. I had read that the rate limits were around 1 tweet per minute so you will see above that I leave a random gap of around a minute between each reply. I had to hard-code the path to the settings.cfg file to get this cron job working – you may need to modify for your own path.

I also found that it wasn’t necessarily clear cut as to how to run a cron command within a virtual environment. After a bit of googling I found a neat little trick to get this to work: use the Python executable in the ‘.virtualenvs’ ‘bin’ folder for the project. Hence the command to add to the crontab was:


~/.virtualenvs/social-media-bot/bin/python ~/social-media-bot/twitter_bot.py
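
For completeness, the full crontab entry for a 20-minute schedule would look something like the following (paths as set up above; adjust for your own machine):

*/20 * * * * ~/.virtualenvs/social-media-bot/bin/python ~/social-media-bot/twitter_bot.py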

Result

This all worked rather nicely. For about an hour. Then Twitter automatically write-restricted my application. 

A bit of googling took me to this article: https://support.twitter.com/articles/76915. It appears you are only allowed to post replies to users if you are a large multi-national airline. Nevermind.

Maybe automated tweet ‘quoting’, favouriting or just posting would work better next time. Still, it was an enjoyable play-around with the dynamics of the Twitter API. It should be easy to incorporate tweets into future projects.

Hacker News Update: Raspicam & WeMo

A quick update on my recent discoveries.

Raspicam

I now have a Raspberry Pi Camera Board (Raspicam)!

There is a brilliant combo deal on at the moment allowing you to buy a Raspicam, Model A + 4GB SD card for about £35 (including VAT + shipping!)! That’s £35 for a device that can run OpenCV with a camera capable of 30fps at HD resolutions. I will leave you to think about that for a moment.

The downside is that the software is still not quite there. The Raspicam couples directly to the Raspberry Pi; this means it is not (at the moment) available as a standard USB video device (e.g. /dev/video0 on Linux). Now most Linux software and packages like SimpleCV work based on a standard USB video device. This means as of 24 October 2013 you cannot use SimpleCV with the Raspicam.

However, not to fret! The Internet is on it. I imagine that we will see better drivers for the Raspicam from the official development communities very soon. While we wait:

WeMo and Python

As you will see from the previous posts, I have been using IFTTT as a make-shift interface between my Raspberry Pi and my WeMo Motion detector and switch. This morning, though, I found a Python module that appears to enable you to control the Switch and listen to motion events via Python. Hurray!

The module is called ouimeaux (there is a French theme this week). Details can be found here: link.

Very soon I hope to adapt my existing code to control my Hue lights based on motion events (e.g. turn on when someone walks in the room, turn off when no motion). Watch this space.

Face Tracking Robot Arm

Ha – awesome – I have made a face-tracking robot arm. The 12-year-old me is so jealous.

Here’s how I did it (on Ubuntu 12.04, but it should be portable to the Raspberry Pi):

I installed SimpleCV: http://simplecv.org/.
(I love this – it makes it so simple to prototype.)

I built this robot arm: http://www.maplin.co.uk/robotic-arm-kit-with-usb-pc-interface-266257.

I installed pyusb: http://sourceforge.net/apps/trac/pyusb/.

(I did first try sudo apt-get install python-usb – it was already installed but didn’t work, giving me errors when trying to import usb.core. I found on the web that the solution was to remove python-usb and install from the above site (e.g. download the zip, extract, run setup.py).)

I stuck a Microsoft Lifecam Cinema on the top of the assembled robot arm.

I adapted the code below from a SimpleCV example and the arm control code (calling it arm_track.py).


from SimpleCV import Camera, Display
import usb.core, usb.util, time

# Allocate the name 'RoboArm' to the USB device
RoboArm = usb.core.find(idVendor=0x1267, idProduct=0x0000)

# Check if the arm is detected and warn if not
if RoboArm is None:
    raise ValueError("Arm not found")

# Create a variable for duration
Duration = 1

# Define a procedure to execute each movement
def MoveArm(Duration, ArmCmd):
    # Start the movement
    RoboArm.ctrl_transfer(0x40, 6, 0x100, 0, ArmCmd, 1000)
    # Stop the movement after waiting the specified duration
    time.sleep(Duration)
    ArmCmd = [0, 0, 0]
    RoboArm.ctrl_transfer(0x40, 6, 0x100, 0, ArmCmd, 1000)

cam = Camera()

disp = Display(cam.getImage().size())

# Get centre of field of vision (x, y)
centre = []
centre.append(cam.getImage().size()[0] / 2)
centre.append(cam.getImage().size()[1] / 2)

while disp.isNotDone():
    img = cam.getImage()
    # Look for a face
    faces = img.findHaarFeatures('face')
    if faces is not None:
        # Get the largest face
        faces = faces.sortArea()
        bigFace = faces[-1]
        # Draw a green box around the face
        bigFace.draw()
        face_location = bigFace.coordinates()
        print face_location, centre
        # Horizontal offset of the face from the centre, scaled to a movement duration
        offset = (face_location[0] - centre[0]) / float(200)
        if offset < 0:
            print "clockwise", offset
            MoveArm(abs(offset), [0, 2, 0])  # Rotate base clockwise
            time.sleep(abs(offset))
        else:
            print "anticlockwise", offset
            MoveArm(abs(offset), [0, 1, 0])  # Rotate base anticlockwise
            time.sleep(abs(offset))

    img.save(disp)