# An Introduction to the Predictive Brain

Through a variety of sources, including Sam Harris’ discussion with Anil Seth and Lisa Feldman Barrett’s How Emotions Are Made, I’ve been hearing a lot recently about the “Predictive Brain”. This is a theory of cognition that has rapidly gained ground over the last couple of decades.

Talk of a “predictive brain”, in my reading, can be broken down into theories in several key areas:

• the “Bayesian” brain, or the application of work on Bayesian probabilities to cognition;
• predictive coding, a specific framework for modelling information flow between cortical areas, originally developed from work on the visual system; and
• feedback circuitry within the brain, or the ongoing discovery of general patterns of feedback within the cortex and mid-brain structures.

## The Bayesian Brain

Many of us will know Bayes Theorem:

$P(Y|X)=\frac{P(X|Y)P(Y)}{P(X)}$

The Bayesian brain hypothesis is that the brain is performing some form of computation that may be modelled using Bayesian probability frameworks.

Within the context of the brain, we can treat ‘X’ as our sensory input (e.g. signals from the retina or cochlea). This is typically in the form of the firing output of groups of neurons.

The ‘Y’ varies depending on how Bayes Theorem is being applied. In many cases, it appears to be applied in a relatively general manner. For example, in this Tutorial Introduction to Bayesian Models of Cognitive Development by Amy Perfors et al, ‘Y’ is taken to refer to a “hypothesis” ($h_i$). Bayes Theorem thus provides a way to compare the probabilities of different hypotheses given (the ‘|’ symbol) our sensory input (‘X’). If we calculate $P(h_1|X), P(h_2|X), \dots P(h_n|X)$, we can choose the hypothesis with the highest probability. This then becomes the “explanation” for our data. The idea is that the brain is (somehow) performing an equivalent comparison.
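The hypothesis comparison described above can be sketched in a few lines of Python. This is only an illustration: the hypothesis names and all of the probabilities are invented for the example, and the function name `posterior` is my own rather than anything from the Perfors tutorial.

```python
def posterior(priors, likelihoods):
    """Return P(h | X) for every hypothesis h using Bayes Theorem."""
    # Numerator of Bayes Theorem for each hypothesis: P(X | h) * P(h)
    unnormalised = {h: likelihoods[h] * priors[h] for h in priors}
    # P(X): the evidence, obtained by summing over all hypotheses
    evidence = sum(unnormalised.values())
    return {h: value / evidence for h, value in unnormalised.items()}


# Hypothetical numbers, chosen only for illustration.
priors = {"tiger": 0.01, "bush": 0.60, "shadow": 0.39}       # P(h)
likelihoods = {"tiger": 0.90, "bush": 0.05, "shadow": 0.02}  # P(X | h)

posteriors = posterior(priors, likelihoods)
# Choose the hypothesis with the highest posterior probability.
best = max(posteriors, key=posteriors.get)
```

With these made-up numbers the winning “explanation” is “bush”: the input fits “tiger” best, but tigers are rare enough that the prior outweighs the likelihood.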

As you can imagine, this first approach is fairly high level. It treats “hypotheses” as synonymous with human “reasons” for the data. In reality, the brain may be performing hundreds of thousands of low-level inferences that are difficult to put into words. In these cases, our “hypotheses” may relate to feature components such as possible orientations of observed lines or a pronounced phoneme.

However, I have also seen Bayes Theorem used to model activity in lower level neural circuits (normally in the cortex and in the visual areas). In these cases:

• P(Y|X) can be seen as the probability of some form of neural process, i.e. the output of a neural circuit, given a particular sensory input, e.g. a context at a particular time. This is our “prediction” in a model of the “predictive brain”. In probability terms, it is known as the “posterior”.
• P(X) is the probability of our sensory input per se. This can be thought of as a measure of how likely the sensory input is, outside of any particular context, e.g. how often has the neural circuit experienced this particular pattern of input firing. In probability terms, it is known as the “evidence” or “marginal likelihood”. It acts as a normalising factor in Bayes Theorem, i.e. acts so that P(Y|X) is a true probability with a value of between 0 and 1.
• P(Y) is the probability of the output of the neural circuit. Many neural circuits implement some form of function on their inputs, such as acting as integrators. ‘Y’ may be considered a pattern of firing that arises from the neural circuit, and so P(Y) indicates a measure of how often this output pattern of firing is experienced. In probability terms, this is known as the “prior”. It encapsulates what is known about the output before experiencing the sensory input at a particular time.
• P(X|Y) is part of the “magic sauce” of Bayes Theorem. It is a probability of the sensory input given or assuming a particular neural circuit output. For example, for all the instances where you see a given pattern of output firing for a particular neural circuit, how common is the sensory input that is experienced? It is known as the “likelihood”.
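The normalising role of P(X) can be made explicit. Assuming, purely for illustration, that the circuit has a discrete set of mutually exclusive output patterns $y_1, \dots, y_n$, the evidence is the sum of likelihood times prior over all of them:

$P(X)=\sum_{j=1}^{n}P(X|y_j)P(y_j)$

Substituting this into Bayes Theorem gives the posterior for any one output pattern $y_i$:

$P(y_i|X)=\frac{P(X|y_i)P(y_i)}{\sum_{j=1}^{n}P(X|y_j)P(y_j)}$

Written this way, it is clear that the posteriors over all output patterns sum to 1, and that only the likelihood and the prior need to be known to compute them.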

Some neuroscientists are thus looking at ways Bayesian models of probabilistic computation are implemented by neural circuits. Questions arise such as:

• How do the terms of the Bayesian model relate to structures in the brain, such as cortical columns, neurons, cortical layers and mid-brain structures?
• How is the Bayesian model applied by the brain? The evidence appears to be steering towards the presence of a hierarchical model of inference, e.g. there are large numbers of neural circuits performing computations in parallel that may each be approximated using Bayesian models.
• How do we relate the way data is encoded and communicated in the brain to numeric values? Neurons have axons, dendrites and synapses and come in a variety of flavours. Neurons fire, and they fire at different rates depending on the sensory input and the results of computations. Synapses are modulated chemically, and different types of synapse may be present for a single neuron, where each type of synapse may have different chemical and temporal properties.
• How do we relate our high-level, top-down probabilistic models of computation (e.g. which “hypothesis” is more likely) to low-level, bottom-up probabilistic models of neural circuits?

Bayesian models are useful as they provide a framework to make predictions in a mathematical manner. They also decompose the prediction into a number of probabilistic components, where the components may be easier to measure and/or compute.

## Predictive Coding

Predictive coding is a theory that may model the activity of lower level neural circuitry. It was originally presented in the context of visual processing performed by the cortex (* short prayer interlude for those brave monkeys, cats and mice *).

Predictive coding models cortical sensory regions as containing functionally distinct sub-populations of neurons:

• a population that attempts to predict an input based on a current hypothesis; and
• a population that determines an error between the actual inputs and the predicted inputs.

At a high level, predictive coding is based around the idea that your brain issues a storm of predictions, simulates the consequences as if they were present, and checks and corrects those predictions against actual sensory input. The book How Emotions Are Made brilliantly explains some of the high level thinking.

Again, we can ask the question: what is a “hypothesis” for a cortical sensory region?

I have seen “hypotheses” explained at both high and low levels. As before, at a high level, a “hypothesis” may be something like “Tiger?”. At a lower level, a “hypothesis” may be something like “face?”. And at a very low level, a “hypothesis” may be something like “line at 45 degrees in a small part of my upper right visual field?” or “tone change from 5kHz to 3kHz?”.

Predictive coding is a theory that is rooted in the cortex of the brain, the wrinkly table-cloth-sized sheet of pink-grey matter that most people visualise when they think of the brain.

It has been known for a while that sensory processing in the cortex of the brain is configured hierarchically. For example, there are areas of the visual cortex that receive input from the retina (via the thalamus – important to note for later), perform cortical “computation” and pass “data” via patterns of firing to different areas of the visual cortex (many of them neighbouring areas). In the Figure below, visual input arrives at V1 and then is processed from left to right.

The processing of the cortex is a processing hierarchy as “higher” cortical regions have a larger receptive field (e.g. working up to the full visual or auditory field) and receive complex inputs, e.g. inputs from many abstract “lower” cortical regions. These “lower” cortical regions have smaller, more specific receptive fields. At the bottom of the hierarchy you have either sensory input or motor output.

In fact, the cortex has multiple hierarchies: an input hierarchy for sensory modalities and an output hierarchy for motor outputs, where the connecting “pinch point” of cortex is fairly wide and deep and contains the abstractions of the associative areas.

It has also been known for a while that the cortex has a layered structure. This was discovered through early staining experiments that showed different bands of neuronal density.

Most sensory areas of the cortex have around six layers. Each layer has been found to have a different function.

The functional neuron populations required for predictive coding may be found in this layered structure of the cortex. Layers 2 and 3 provide an output from a cortical column; this may be seen as a feed-forward output that is received by layer 4 of a subsequent (higher level) cortical column. In predictive coding models this is seen as a “prediction error”. Layers 5 and 6 also provide an output from a cortical column; this may be seen as a feed-back output that is received by layer 1 of (many) previous (lower) cortical columns. In predictive coding models this is seen as a “prediction”. In this manner, a processing hierarchy is generated.

More detail on the configuration of these neural circuits may be found in the excellent paper by AM Bastos et al – Canonical Microcircuits for Predictive Coding. Another great paper for explaining predictive coding in the context of the visual cortex is Rajesh PN Rao’s (semi-famous) paper – Predictive Coding in the Visual Cortex. Rao’s paper sets out a rather nice model of predictive coding applied to images that I will have to try to implement.

Predictive coding theories have some nice properties. If a neural circuit can successfully predict the input it should be expecting, it does not output a prediction error. This is efficient – populations of neurons only expend energy when a sensory input cannot be predicted. This also provides a model for how cascades of activity can pass through the hierarchical areas of the brain – prediction errors are passed “upwards” until they meet a neural circuit that can successfully predict its input, and this then leads to a feedback cascade with the successful prediction being passed down the hierarchy.

Let’s try to explain this in words with an image processing example.

Let’s take two layers of cortex that receive an input image. Before receipt of the image our prediction from our top layer (2) is a null or resting prediction (say 0). This is passed to the first layer (1). The first layer (1) receives the image (I) and applies a function to it to generate what Rao calls a set of “causes” for the input. As we have a non-zero input, these causes will be non-zero. A “prediction error” is calculated between the causes as generated by the first layer (1) and the prediction (which may be said to be a prediction of the causes). This error is then passed to the top layer (2). This error will be non-zero as our initial prediction is zero and our causes in the first layer (1) are non-zero: what we expect from the top down at the start is not what we see from the bottom up. The top layer (2) receives the error and applies a function to determine a set of second order causes (or layer 2 causes). These second order causes are then sent to the first layer (1) as the prediction from the top layer (2). The first layer (1) thus receives a modified prediction from the top layer (2) and a new prediction error is generated. The process can repeat over time until the system stabilises.
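To make the loop above a little more concrete, here is a toy sketch in plain Python. It is not Rao’s model: the layer 1 function (averaging adjacent pixel pairs), the identity mapping from layer 2 causes to the top-down prediction, the update rule and the `rate` parameter are all assumptions chosen to keep the example tiny.

```python
def layer1_causes(image):
    """Hypothetical layer 1 function: average each adjacent pair of pixels."""
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image) - 1, 2)]


def run_predictive_loop(image, steps=50, rate=0.5):
    """Iterate the two-layer loop and record the squared error at each step."""
    causes1 = layer1_causes(image)   # bottom-up "causes" for the input
    causes2 = [0.0] * len(causes1)   # top layer starts from a resting state
    error_history = []
    for _ in range(steps):
        # Top-down prediction of the layer 1 causes (identity mapping assumed).
        prediction = list(causes2)
        # Prediction error computed in layer 1 and passed up to layer 2.
        error = [c - p for c, p in zip(causes1, prediction)]
        error_history.append(sum(e * e for e in error))
        # Assumed update: layer 2 nudges its causes to reduce the error.
        causes2 = [c2 + rate * e for c2, e in zip(causes2, error)]
    return causes1, causes2, error_history


image = [1.0, 3.0, 2.0, 6.0]  # made-up four-pixel "image"
causes1, causes2, error_history = run_predictive_loop(image)
```

The first entry of `error_history` is non-zero, because the resting prediction does not match the bottom-up causes, and the error then shrinks on every pass until the top-down prediction matches what layer 1 sees. This is the “stabilising” behaviour described above, in its simplest possible form.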

There are some gaps in my understanding of predictive coding. I need to play around with some actual models to see how information flows up and down the processing hierarchy. A good place to start is the Predictive Coding Networks described here. This video lecture by David Cox is also great.

## Feedback in the Brain

The theories of predictive coding are built upon the levels of feedback that are observed in the visual cortex. From several decades of research we now know that there are multiple levels of feedback occurring in the brain. This feedback provides the basis for many theories of “prediction” as it represents pathways for information to flow in a top-to-bottom manner (in addition to the conventional feed-forward, bottom-to-top manner). Here “top” can be seen as more complex integrated representations and “bottom” can be seen as closer to raw sensory input.

At a first level, we have feedback within cortical layers. This is explained nicely in Bastos’ Canonical Microcircuits for Predictive Coding. Within cortical layers there appear to be recurrent connections (for example within layers 2 and 3 and layers 5 and 6).

At a next level, we have feedback within cortical columns. For example, neurons in layers 2 and 3 excite neurons in layers 5 and 6 and neurons in many of the layers both excite and inhibit neurons in upper layers.

We then have feedback between cortical areas. This resembles the feedback modelled with predictive coding.

As well as feedback within the cortex, there are also loops that extend between the cortex and sub-cortical structures such as the thalamus and the basal ganglia. Sensory input arrives at the thalamus from sense organs and is projected to the cortex. However, there appear to be 10-100 times more connections from the cortex to the thalamus than from the thalamus to the cortex. This suggests that the thalamus applies some kind of sensory gating and/or attention based on cortical feedback. The basal ganglia appears to be an adapted early muscle control centre that is important for action selection and error identification. The basal ganglia receives cortical inputs and also projects to the thalamus.

These various levels of feedback may embody error signals that indicate differences between what is being perceived and what is being predicted. For example, connections from the cortex to the thalamus may gate sensory input in the case where we have successful prediction.

## Summing Up

In this post we have looked at some of the ways the brain may be said to “predict” the outside world.

The brain may be modelled using Bayesian approaches. Predictive coding provides a way to understand perception. And the brain has many feedback and recurrent couplings that appear to pass information from higher processing areas to lower processing areas.

The challenge now is to use this knowledge to start building intelligent systems.