Tuesday, 7 November 2017

Reading the brain

Contents
  • Introduction
  • Brain work. Scanning the brain and maps of the brain
  • Mental arithmetic. Some facts and figures
  • Progress. Progress with reading words in the brain with fMRI scanners
  • Digression one: prediction. Predicting the next sound, the next letter or the next word
  • Digression two: searching for documents. Where google started out
  • Digression three: Trump cells. Neurons which fire for Trump and Trump alone
  • Digression four: other lines of inquiry. Rounding out the story of the search for the brain
  • Conclusions
  • References
Introduction

A post mainly prompted by reference 1, written under the Damasio umbrella, but with plenty of support from elsewhere, in particular from LGE (leading google engineer) Mikolov and his colleagues at the googleplex, for which see reference 2.

A post about our growing ability to read what is going on in a human brain when that human is either reading or being read to. We are not that far off, for example, being able to tell from the outside when you are thinking about aardvarks. Rather primitive mammals, with some description to be found at reference 3.

Supplemented with four digressions about related lines of inquiry.

Brain work

The idea here is to put people inside scanners, to stimulate them with words or stories in some organised way and to capture their experience in a number of images of the brain, one for, say, every couple of seconds. These images of the brain might be thought of as coloured pictures of the cortical sheet, spread out flat in a recognised way, with one part for each hemisphere and with the colours – often graded from red through to blue – being a proxy for the activity of that particular bit of the cortical sheet at that particular time.

Figure 1
The image above, taken from reference 5, is derived from fMRI images rather than being one, but it does give the idea.

A lot of work has gone into the development of algorithms for flattening a cortical sheet out and then comparing one with another, perhaps by registering each newly flattened cortex to a reference flattened cortex, perhaps that called ‘Colin’ and looked after by McGill University. A matter of geometry and gross structure rather than being a matter of neurons, neuron mixture or neuron activation. A recent paper which touches on this can be found at reference 10.

Then, if we collect enough fMRI data about the neural activation resulting from this or that stimulus, can we work backwards? Can we start with the images and say something about the words, something about the story?

The answer to which question seems to be yes.

It would be a bonus if the brain mapped data about words onto the two cortical sheets in a pleasing way, in the same way for everybody, rather in the way of the models that phrenologists used to have in their consulting rooms. A functional description of the brain, rather than the more structural definition offered by Brodmann.

Figure 2 - the phrenological view

Figure 3 - the Brodmann view

Figure 4 - a combined view
Figure 4 is taken from reference 11 where it has the legend: ‘Cortex is delineated (A) and reconstructed (B). Cuts are applied to avoid gross distortions (C), and individual surfaces are subsequently flattened (D) and overlaid with the Colin atlas (E)’.

But it seems that the pleasing map is not to be. There may be some pleasing mapping of the activation associated with higher level concepts onto the cortical sheets, but not of the details, not of the individual words.

Now given that this large and important part of the brain is a two-dimensional sheet, this desire for a pleasing map is a manifestation of the more general desire to map complicated phenomena into low dimensional real spaces.

On which current thinking is that rather than try to map the words themselves onto a plane or a three-dimensional space, we come up with a number of properties or features of words, perhaps hundreds of them, with every word taking some real value for each of those properties, giving us a vector for each word. Then, if we have been clever about choosing those properties and putting them into order, we get something attractive if we map all our words onto the first two or three dimensions. And if we have done our work well, words like ‘boat’ will turn up close to words like ‘ship’, ‘boats’ and ‘boating’. And, by extension, to phrases like ‘boat yard’.
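
By way of illustration, here is a minimal sketch of what 'close to' might mean for such vectors. The four properties and all the values are invented for the purpose, and cosine similarity is just one common choice of closeness measure, not one taken from the references.

```python
import math

# Hypothetical, hand-made property values, purely for illustration:
# (floats on water, carries people, found in a kitchen, is plural)
WORD_VECTORS = {
    "boat":   [0.9, 0.8, 0.0, 0.0],
    "boats":  [0.9, 0.8, 0.0, 1.0],
    "ship":   [1.0, 0.9, 0.0, 0.0],
    "teapot": [0.0, 0.0, 1.0, 0.0],
    "kettle": [0.0, 0.0, 0.9, 0.0],
}

def cosine(u, v):
    """Cosine of the angle between two vectors: 1 for same direction, 0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def neighbours(word, vectors=WORD_VECTORS):
    """Return all the other words, most similar first."""
    target = vectors[word]
    others = [w for w in vectors if w != word]
    return sorted(others, key=lambda w: cosine(target, vectors[w]), reverse=True)

print(neighbours("boat"))  # 'ship' and 'boats' come first, the kitchenware last
```

In real systems the properties number in the hundreds and are learned rather than hand-made, but the idea of ranking by closeness is the same.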

Another way into this is to turn our set of words into a graph, with the words being the nodes of the graph and the links between words, links with strength and perhaps direction, being the edges. And then one tries to visualise the graph in some small number of dimensions in the same way as before.

Then maybe, those dimensions will map onto the surface of the brain, the cortical sheet in some pleasing way. So dimension X of our vector space maps nicely onto area Y of the brain, giving us a 21st century version of the phrenological head. Unfortunately, they do not. Workers have come up with some small number of dimensions, but these dimensions map onto the cortical sheet in a very distributed way. There are lots of voxels contributing to dimension X, lots of voxels which might be a bit clustered, or which perhaps have a bit of a gradient across the brain, but which come from all over. Not like the nice neat phrenological head at all.

Mental arithmetic

Some relevant numbers:
  • Reading speed in English is of the order of 100-200 words a minute, say of the order of 2 or 3 words per second
  • The cortical sheet is a few millimetres thick, of the order of 2 to 4mm. We work with two dimensional, square tiles of sheet, rather than three dimensional voxels of brain, the usual unit of fMRI
  • The area of the cortical sheet varies across individuals, but taking the two hemispheres together, is of the order of 5,000cm^2. Or 125,000 2mm tiles
  • fMRI does not work quite like this, rather working in terms of successive slices through the whole brain, but one can get from slices to tiles
  • fMRI scan times are coming down, but let us say we get a complete brain image every 1 or 2 seconds, so that each image spans several words at the 2 or 3 words per second with which we started.
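
The sums behind the figures in this list can be checked in a few lines. This is a sketch which takes the numbers in the list at face value; the variable names are ours.

```python
# Figures taken from the list above, at face value.
sheet_area_cm2 = 5_000      # both hemispheres together
tile_side_mm = 2            # side of one square tile

sheet_area_mm2 = sheet_area_cm2 * 100        # 1 cm^2 = 100 mm^2
tile_area_mm2 = tile_side_mm ** 2            # a 2mm tile covers 4 mm^2
tiles = sheet_area_mm2 // tile_area_mm2

words_per_second = (2, 3)   # reading speed, low and high
seconds_per_image = (1, 2)  # one complete brain image, fast and slow

fewest_words = words_per_second[0] * seconds_per_image[0]
most_words = words_per_second[1] * seconds_per_image[1]

print(f"{tiles:,} tiles")                                   # 125,000 tiles
print(f"{fewest_words} to {most_words} words per image")    # 2 to 6 words per image
```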
The fMRI numbers have been taken from a quick search of the Internet, and are probably improving all the time – that is to say scan times are coming down and spatial resolution is going up. But note that:
  • There is some trade-off between temporal and spatial resolution
  • Temporal resolution is physiology limited and far less than that of EEG (see digression 4 below).
We make the assumption that we can extract what is special about a word from the fMRI signal, possibly, in the jargon of the trade, some sort of a contrast, giving us a trace of some seconds from the time that the word was uttered, with the traces of successive words overlapping. We might also assume that such signals are additive. Statistical magic might then mean that we can work back from the composite fMRI signal to get the sequence of words.
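
A toy version of working back from an additive, overlapping signal might look like the sketch below. Everything in it is an assumption made for illustration: the four-tile signatures, the trace which lingers at half strength for one further image, one image per word and the complete absence of noise. Real fMRI analysis is a good deal more statistical than this.

```python
# Hypothetical signatures: the activation each word evokes across four tiles.
SIGNATURES = {
    "the": [1.0, 0.0, 0.0, 0.2],
    "cat": [0.0, 1.0, 0.3, 0.0],
    "sat": [0.2, 0.0, 1.0, 0.0],
    "mat": [0.0, 0.3, 0.0, 1.0],
}
# Assumed trace: full strength in the word's own image, half strength in the next.
TRACE = [1.0, 0.5]

def composite(words):
    """One image per word, each the sum of the overlapping traces."""
    n_tiles = len(next(iter(SIGNATURES.values())))
    images = []
    for t in range(len(words)):
        image = [0.0] * n_tiles
        for lag, weight in enumerate(TRACE):
            if t - lag >= 0:
                signature = SIGNATURES[words[t - lag]]
                image = [x + weight * s for x, s in zip(image, signature)]
        images.append(image)
    return images

def decode(images):
    """Subtract the tails of words already decoded, then pick the nearest signature."""
    words = []
    for t, image in enumerate(images):
        residual = list(image)
        for lag in range(1, len(TRACE)):
            if t - lag >= 0:
                signature = SIGNATURES[words[t - lag]]
                residual = [r - TRACE[lag] * s for r, s in zip(residual, signature)]
        nearest = min(
            SIGNATURES,
            key=lambda w: sum((r - s) ** 2 for r, s in zip(residual, SIGNATURES[w])),
        )
        words.append(nearest)
    return words

sentence = ["the", "cat", "sat", "mat"]
print(decode(composite(sentence)))
```

The additive assumption is what makes the subtraction step legitimate; without it the tails of earlier words could not simply be taken away.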

In which we have the implicit assumption that large scale patterns of neural activity in the brain do carry semantic information. An assumption which is justified by the fact that we can recover such semantic information from fMRI scans. But these large scale patterns across the brain are not the arrangement proposed for LWS (see reference 6), where the content of consciousness is hypothesised to reside in a small, compact area of cortex, just a few cm^2 – remembering here that the content of consciousness is not the same as the machinery which might be needed to generate that content.

Progress

Despite the size, expense and general clunkiness of the machinery, plenty of experiments have been done to try and find the fMRI signature of individual words or objects. So, for example, after suitable training, the computer can tell what picture, what picture from a limited collection of pictures of suitably different everyday objects, a subject is looking at from their fMRI image. Or at least predict with reasonable accuracy, with the predictions being well above what you would get by chance.

Another example of this sort of thing was described recently at reference 7, where the computer and the scanner between them could tell whether you were thinking about (for example) playing tennis or the smell of roast beef, given that it knew that you were thinking about one of them.

There is plenty of interest in how all this varies between individuals, over time and across languages. Does a Chinese person generate the same fMRI image for a cat as a Russian person? On which last, current thinking seems to be that ‘cat’ is too specific. Different people are going to have built ‘cat’ into their brains in different ways, albeit along the same general lines. Different people are going to be different, different cultures and languages are going to be different.

So generalising a bit, reference 1 is interested in the fMRI images which result from stories, and in the way that they might vary between individuals and across languages. With each such image being converted to a vector of something over 200,000 real numbers.

The work described there involved three groups of 30 subjects, native speakers of English, Chinese and Farsi. Then 40 stories, sourced in English, précis’d down into 150 words or so and translated to the other two languages.

We train the computer on large amounts of data in all three languages and then turn the 40 stories into something called paragraph vectors (of real numbers) in each of the three languages. See the end of the section that follows for a bit more about these vectors.

Then, in the context of a suitable experimental protocol, we present the written stories to the subjects in their own native language.

We then ask the computer to match the fMRI scans, each expressed as one of the vectors of 200,000 real numbers mentioned above, to one of the paragraph vectors, first from among those in the right language, then from among those in one of the wrong languages. And it turns out that, to a reasonable degree of accuracy, the computer can pull off both tricks.
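
The matching step might be sketched as follows. This is not the method of reference 1: we simply assume that some earlier, learned step has already mapped each scan into the same small space as the paragraph vectors, and then pick the paragraph vector pointing in the most similar direction. The story names, the three dimensions and all the numbers are invented.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical paragraph vectors, one per story.
STORY_VECTORS = {
    "shipwreck": [0.9, 0.1, 0.2],
    "wedding":   [0.1, 0.8, 0.3],
    "burglary":  [0.2, 0.2, 0.9],
}

# Hypothetical scans, assumed already mapped into the same space,
# each paired with the story the subject was actually reading.
SCANS = [
    ([0.8, 0.2, 0.1], "shipwreck"),
    ([0.3, 0.3, 0.8], "burglary"),
    ([0.2, 0.7, 0.2], "wedding"),
]

def best_match(scan_vector, story_vectors=STORY_VECTORS):
    """Name of the story whose vector is most similar to the scan."""
    return max(story_vectors, key=lambda name: cosine(scan_vector, story_vectors[name]))

hits = sum(best_match(scan) == truth for scan, truth in SCANS)
print(f"{hits} of {len(SCANS)} scans matched to the right story")
```

Matching across languages would, on this sketch, just mean swapping in the paragraph vectors computed from the translated stories.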

So the brain is producing some higher level representation of the stories, distributed across considerable chunks of cortex, which can be matched by a computer with the paragraph vectors and which seems, at least to some degree, to be independent of the language involved. The news here being not so much that such representations exist, which many have taken for granted for a long time, but that at least one of them has been found.

Digression one: prediction

A lot of language processing on computers is organised around problems of the form ‘if x(1), x(2), x(3), … x(n) is a sequence of somethings, perhaps written words or spoken phonemes, presented one after the other, what is your best guess for x(n+1)?’. Or slightly more complicated, ‘what is the probability distribution for x(n+1)?’. These problems are interesting because languages contain plenty of regularity and the distribution for x(n+1) is far from independent of what has gone before.

And a good problem, in part, because you are getting feedback about errors as you go and can gradually tune up your machinery for prediction.

For present purposes we are mainly concerned with sequences of symbols drawn from finite vocabularies: somebody or something has done some pre-processing of the raw sound in the air or the raw marks on the page into words.

One might have the computer starting from scratch and having to learn about the domain in question as it goes along. Or one might have a period of training before you ask the computer to do any prediction – with the data being used for training sometimes being called the corpus.

Figure 5
There are a number of complications:
  • The amount of error, the amount of noise in the sequence. Some computers are going to be better at coping with error than others.
  • Whether the computer has any concept of one symbol being like or unlike another. So, for example, ‘boy’ is very like ‘boys’ and is quite like ‘child’, but is quite unlike ‘teapot’
  • What about misspellings? Computers are quite good at detecting and correcting misspellings, but what does one do here?
  • Whether the computer allows one symbol to have more than one meaning. Symbols like, for example, ‘bark’, ‘nail’, ‘jam’ or ‘mine’
  • Whether the vocabularies include punctuation or not. Things to mark the ends of words in sequences of letters and things to mark the end of sentences in sequences of words. Classical Latin, for example, did not usually include spaces between words
  • There are lots of alphabets. Upper and lower-case letters? Numerals? Special characters? One of the various varieties of Unicode? – for which see reference 4.
A lot of work on prediction was done with things called n-grams. An n-gram is some fixed, usually small, number of words, say five or less, and the idea is to say what might come after any given n-gram, a useful and quite successful simplification of the more general prediction problem. An n-gram might be considered as an ordered set, in which the order of words mattered, or an unordered set, in which it did not. Over time quite a lot of bells and whistles were added to this basic model.
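
The simplest case, predicting the next word from just the one word before it, can be sketched in a few lines. The tiny corpus is invented and the function names are ours; real systems use longer contexts, far more text and various smoothing tricks.

```python
from collections import Counter, defaultdict

# A made-up corpus, far too small for real use.
CORPUS = "the cat sat on the mat and the cat ate the fish".split()

def train(words):
    """Count, for each word, the words that were seen to follow it."""
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def distribution(counts, word):
    """Estimated probability distribution for the word after `word`."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()}

def best_guess(counts, word):
    """The single most likely next word, or None if `word` was never seen."""
    dist = distribution(counts, word)
    return max(dist, key=dist.get) if dist else None

model = train(CORPUS)
print(distribution(model, "the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(best_guess(model, "the"))    # cat
```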

The next thing was to map one’s symbols onto fixed length, real valued vectors of properties, arranged so that symbols which were similar or otherwise related mapped onto similar vectors. An example of such a property would be the probability that the word X is used within N words of the subject word. Another might be 1 if the word was the name for something living, 0 otherwise. Another might be 1 if the word was the name for something which moved about, 0 otherwise – with vitality and movement being core concepts for humans. One might have several hundred such properties – meaning, inter alia, that one needed a big computer and plenty of time. Distances, easily defined on such vectors, made sense in the real world of meanings. There was even the rather surprising result that one could do sums with such vectors. So up to a point we had it that [uncle]-[male]+[female]=[aunt], where [abc] denotes the vector for ‘abc’. Prediction machines which worked in terms of such vectors did a lot better than those which did not, addressing, inter alia, the second of the issues listed above.
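
The sums with vectors can be illustrated with a toy example. The four 0/1 properties below are hand-made for the purpose, which is why the sum comes out exactly; with learned vectors of several hundred real values the result is only the nearest of the candidates, hence the 'up to a point'.

```python
# Hypothetical properties, in order:
# (is a relative, is male, is female, belongs to the parents' generation)
VECTORS = {
    "male":    [0, 1, 0, 0],
    "female":  [0, 0, 1, 0],
    "uncle":   [1, 1, 0, 1],
    "aunt":    [1, 0, 1, 1],
    "brother": [1, 1, 0, 0],
    "sister":  [1, 0, 1, 0],
}

def analogy(start, minus, plus):
    """Compute [start]-[minus]+[plus] and return the nearest other word."""
    target = [s - m + p for s, m, p in
              zip(VECTORS[start], VECTORS[minus], VECTORS[plus])]
    candidates = [w for w in VECTORS if w not in (start, minus, plus)]
    return min(
        candidates,
        key=lambda w: sum((a - b) ** 2 for a, b in zip(VECTORS[w], target)),
    )

print(analogy("uncle", "male", "female"))    # aunt
print(analogy("brother", "male", "female"))  # sister
```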

A further development was to let the computer – a neural network sort of computer – work out these properties for itself. The properties which best supported the prediction task.

Another further development was the application of the same techniques to map arbitrary amounts, arbitrary blobs of text, not just words, onto similar vectors, sometimes called paragraph vectors because they describe paragraphs rather than words.

So it turns out that attacking the problem of predicting the next symbol, with which we started this section, was a good way to produce, to generate these vectors, which, in turn, turned out to have other interesting applications. The same word and paragraph vectors which are useful with this kind of prediction are also useful when trying to predict how words and paragraphs might stimulate the brain. Or, contrariwise, in working out which words and paragraphs caused the stimulation of the brain observed (by an fMRI scanner).

Google make available lots of resources about all this, including the results of training relevant algorithms on huge amounts of text data. They are not going to tell you exactly how their search works, but they are giving out lots of hints. See, for example, reference 8.

Current thinking is that neural networks, perhaps recurrent neural networks, are a good way to do prediction, being rather more flexible in the ways that they can exploit the regularities of speech and language than conventionally programmed mathematical and statistical algorithms. Remembering here that a lot of mathematics and statistics has gone into building the neural network machinery in the first place.

Figure 6
We attempt to summarise some of this in the diagram above.

In the blue, we have where we believe the present paper to be, that is to say reference 1. We have one world which uses many languages. Most languages have various bodies of text (aka corpuses or corpora). We can train computers on such a body of text to produce word vectors. We can break down such bodies into paragraphs and train computers to produce paragraph vectors. With the difference that we get a vector for each and every paragraph. We don’t get a vector for each and every time that a word is used, just for a word.

In the green, we have a couple of elaborations between corpus and paragraph. Left for books, right for the Internet. But it may well be that things are best left simple; one can do well enough without these complications.

Digression two: searching for documents

Google came to fame and fortune by coming up with the best way of finding needles in haystacks.

That is to say you give it a few words, a few search terms, aka a search condition, and it whizzes through all the billions of documents on the Internet that it knows about and tells you about the ones which best match your search condition.

In theory, one way to do this is to map all the documents into some multi-dimensional vector space, a space in which one has the concepts of both direction and distance. Then to map the search condition into that same space. The search results are then the documents nearby, ranked by order of distance.
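
A minimal sketch of this sort of search follows. It is in no way a description of how Google actually works: each document and each search condition simply becomes a vector of word counts, and documents are ranked by cosine similarity, a measure of direction standing in here for distance. The three documents are invented.

```python
import math
from collections import Counter

# A made-up collection of documents.
DOCUMENTS = {
    "doc1": "aardvarks are rather primitive mammals",
    "doc2": "fmri scanners capture images of the brain",
    "doc3": "the brain of the aardvark",
}

def to_vector(text):
    """Map a piece of text of any length onto a vector of word counts."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity of two count vectors; 0.0 if either is empty."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def search(query):
    """Names of the documents sharing words with the query, best match first."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(text)), name) for name, text in DOCUMENTS.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

print(search("brain images"))  # ['doc2', 'doc3']
```

Note that a search for 'aardvark' would miss doc1, which only contains 'aardvarks'. Raw word counts know nothing of one word being like another, which is where the word vectors of the previous digression come in.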

An important extra twiddle being to combine the search terms you supply with what Google knows about you and your searching habits to make an enlarged search condition.

The relevance here being the need to turn sets of words, with the sizes of those sets varying by several orders of magnitude, into vectors.

Digression three: Trump cells

So far we have been looking at whole brains of healthy volunteers using clunky but non-invasive fMRI scanners.

A rather different strand of activity concerns the identifying of individual neurons which fire for, are sensitive to, particular things, like Marilyn Monroe, Donald Trump, Beethoven or Victoria railway station in London. See, for example, reference 9. With at least some of these neurons firing however Marilyn, for example, is presented. Side view, back view, top view, voice or name – assuming that is that she has a distinctive voice which is known to the subject. These neurons are mostly firing for a concept rather than any particular image.

This data is collected by electrodes inserted into the brain in the margins of operations to relieve otherwise intractable epilepsy. Such electrodes might collect signals from a hundred or so individual neurons (with the cerebral cortex as a whole containing some 15 billion), or possibly very small groups of neighbourly neurons. Fishing expeditions, but fishing expeditions which are more productive than one might expect, with a surprisingly large fraction of the neurons sampled being of interest.

The place cells and grid cells of rats are yet another, Nobel Prize winning, strand of activity.

This kind of firing is invisible to an fMRI scanner, which operates at a different scale, but underpins the network activity which is so visible. Network activity which is needed, inter alia, to assemble the kind of information postulated for LWS.

We imagine that different people are going to have neurons which track, which activate for different concepts.

We also imagine that these neurons are fairly stable, although they are going to vary in detail from person to person and from time to time. Again, unlike the LWS, the contents of which are being more or less completely refreshed all the time.

Digression four: other lines of inquiry

We close by mentioning, to round out the story, two other lines of inquiry.

Figure 7
First, EEG, the recording of electrical activity from the surface of the brain, usually using electrodes taped to the surface of the scalp. Far cheaper and easier than fMRI, with good temporal resolution but bad spatial resolution.

Sometimes there is occasion, opportunity or need to place an array of electrodes directly on a patch of exposed brain – so moving in the direction of the electrodes of the previous digression.
That apart, there is some overlap between what can be detected by EEG and what can be detected by fMRI.

Second, dissection and analysis of dead brains. A line of inquiry with a very long history, documented, I dare say, by Egyptians living thousands of years ago. One which tells us a great deal about the physical structure of the brain; its growth, its geometry and cellular composition. Structure which serves to both inform and constrain other, more electrical lines of inquiry.

Conclusions

We may not yet be able to read minds with machines, to read inner thoughts with machines, but we are making good progress.

Maybe getting subjects to dictate to scanners, that is to say the subjects read aloud and the computer, working from the fMRI product, writes down what is said, could form the basis of an annual University Challenge. Along the lines of the annual robot football competition. Perhaps we should approach some benevolent oligarch for funding.

Along the way, we have learned that meaning is spread across the brain, not just concentrated in the LWS of reference 6, and that meaning is not expressed on the surface of the brain in the nice neat way of the phrenologists. The real world is rather more messy than they allowed.

Furthermore, for the moment, the LWS itself remains inaccessible. Firstly, we have postulated its existence, somewhere deep inside the brain, but have not located it. For a while, the claustrum looked like a candidate, but it has now been lost to counterexample (see, for example, reference 12). Secondly, even if we had somewhere to look, we do not yet have machinery capable of recording, let alone unpicking, the electrical goings-on in a few square centimetres of cerebral cortex involving millions of neurons and embedded deep inside the living brain. Neither fMRI from the outside, EEG with its electrodes on the scalp, nor electrodes on the inside cut the mustard.

References

Reference 1: Decoding the Neural Representation of Story Meanings across Languages - Morteza Dehghani and others – 2017.

Reference 2: Distributed Representations of Words and Phrases and their Compositionality - Tomas Mikolov and others – 2013.

Reference 3: https://en.wikipedia.org/wiki/Aardvark.

Reference 4: http://www.unicode.org/standard/principles.html.

Reference 5: A Continuous Semantic Space Describes the Representation of Thousands of Object and Action Categories across the Human Brain - Alexander G. Huth and others – 2012.

Reference 6: http://psmv3.blogspot.co.uk/2017/09/geometry-and-activation-in-world-of.html.

Reference 7: http://psmv3.blogspot.co.uk/2017/09/ruminations.html.

Reference 8: https://code.google.com/archive/p/word2vec/.

Reference 9: Explicit encoding of multimodal percepts by single neurons in the human brain - Quian Quiroga R, Kraskov A, Koch C, Fried I – 2009.

Reference 10: Cortical Flattening Applied to High-Resolution 18F-FDG PET - Johannes C. Klein and others – 2017.

Reference 11: http://www.asociacioneducar.com/. The probable source of Figure 3 although google found the version used here. But try searching for ‘asociación educar ciencias y neurociencias’.

Reference 12: The effect of claustrum lesions on human consciousness and recovery of function - Aileen Chau, Andres M. Salazar, Frank Krueger, Irene Cristofori, Jordan Grafman – 2015.
