Monday, 13 November 2017

More Google

Figure 1
Last week I was reading about scanning the brain while it was processing language, reading which resulted in reference 1. This week I am reading about image processing, landing at one point at reference 2, another product of the Googleplex. One connection between the two is that both sorts of processing commonly involve large neural networks running on computers, something which Google is clearly very good at. Not only do they win at Go, a few years back they also won one of the various image processing challenges documented at reference 3. For all I know they have carried on winning.

Some of Google’s interest stems from their work with driverless cars and the need to be able to sort out the streams of image data from their on-board cameras. Another line of inquiry mines the huge numbers of house number images in Street View to help investigate how getting computers to read numbers and words might best be integrated with getting them to take an intelligent interest in pictures – an interest which can sometimes be helped along if the computer can read the incidental numbers and words. Like the labels on bottles of wine. See reference 4.

Without pretending to understand much of the detail of such neural networks, I have attempted the overview diagram included above.

Starting in the middle, the red boxes are the neural network and its supporting environment. This is supposed here to consist of a large number of weights to be applied to the inputs of the rather smaller, but still large, number of neurons (possibly as many as hundreds of thousands); some system parameters (controlling things like the details of the optimisation involved in training and of the competition between candidate labels); and a small number of design constraints (things like the speed with which an image must be labelled, the accuracy required of the labelling and the amount of computer power that is available).

The neurons are arranged in layers, with one layer feeding into the next. In the beginning there were only a few layers, but Google has since learned how to train networks with many more, yielding impressive improvements in capability and performance.
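
By way of illustration, here is a toy sketch in Python of that layered arrangement, with made-up sizes and random weights standing in for anything Google might actually use. Each layer multiplies its input by a matrix of weights and passes the result, via a simple non-linearity, on to the next.

```python
import numpy as np

# A toy layered network: each layer multiplies its input by a weight matrix
# and applies a simple non-linearity. The sizes are made up for illustration;
# real image networks are very much larger.
rng = np.random.default_rng(0)

layer_sizes = [784, 256, 64, 10]    # input pixels -> two hidden layers -> label scores
weights = [rng.normal(0, 0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(image_vector):
    """Feed an image through the layers, one layer feeding into the next."""
    activation = image_vector
    for w in weights[:-1]:
        activation = np.maximum(0, activation @ w)   # ReLU on the hidden layers
    return activation @ weights[-1]                   # one raw score per candidate label

scores = forward(rng.normal(size=784))
print(scores.shape)    # (10,) - one score for each word in the vocabulary
```

The non-linearity matters: without it, a stack of layers would collapse into one big matrix multiplication and buy nothing over a single layer.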

Moving to the left, the green boxes are the images which we want the computer to label, using the vocabulary included at the far right. Words like ‘red’, ‘stripey’, ‘coypu’ and ‘octopus’, where we allow more than one label per image. For the moment we suppose that the images are about things and that we are interested in what those things are, not in what they might be doing. We suppose also that most of the images have been labelled by humans and that most of those labels are available to the computer for training purposes.

The validation images are used to tune the system parameters and the testing images are used to test the tuned system. Does the system actually label previously unseen images in the way intended? With the separation of validation from training being roughly comparable to the separation of system from unit testing in regular systems.
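
A toy sketch of the three-way split, supposing for the sake of argument ten thousand labelled images divided 80/10/10; my numbers, not anybody's recipe.

```python
import numpy as np

# A toy three-way split: most images for training, a slice for validation,
# a slice held back for testing. Sizes are made up for illustration.
rng = np.random.default_rng(1)
n_images = 10000
indices = rng.permutation(n_images)

train_idx = indices[:8000]            # used to adjust the weights
validation_idx = indices[8000:9000]   # used to tune the system parameters
test_idx = indices[9000:]             # never looked at until the very end

print(len(train_idx), len(validation_idx), len(test_idx))   # 8000 1000 1000

# The drill: train on the training images for each candidate setting of the
# system parameters, keep the setting that does best on the validation images,
# and only then report how the winner does on the test images.
```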

Figure 2 - taken from reference 6
In the beginning the images might be relatively easy, things like individual, distinctive animals, the sort of animals you might see in zoos and aquaria, for example zebras, elephants and lobsters, set against neutral backgrounds. Then, gradually, the task is made harder, with things like half-empty bottles of water against a cluttered background. The ImageNet people have a huge database of images suitable for this sort of work.

The usual drill is that you present the system with images, compare the labels that it outputs with the labels that are desired, and then feed the differences back into the system as incremental changes to the weights. Do it again. And again. Gradually, if you have set things up properly, the weights will converge to a place where the system correctly labels all kinds of images, including ones that it has not seen before.
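
A toy version of this drill, with a single layer of weights standing in for the whole network and made-up numbers standing in for the images. A sketch of the idea, not of anything Google would run.

```python
import numpy as np

# A toy version of the training drill: a single layer of weights, toy inputs
# standing in for images, and desired labels made up from a hidden rule.
rng = np.random.default_rng(2)
inputs = rng.normal(size=(200, 20))                  # 200 toy "images", 20 numbers each
hidden_rule = rng.normal(size=20)
desired = (inputs @ hidden_rule > 0).astype(float)   # the labels we want reproduced

weights = np.zeros(20)
learning_rate = 0.1
for step in range(500):
    outputs = 1 / (1 + np.exp(-(inputs @ weights)))  # the system's current answers
    differences = outputs - desired                  # compare with what is desired
    # Feed the differences back as an incremental change to the weights
    weights -= learning_rate * (inputs.T @ differences) / len(desired)

accuracy = np.mean((outputs > 0.5) == desired)
print(f"fraction labelled correctly after training: {accuracy:.0%}")
```

The differences here are just output minus desired label; in a deep network the same correction is pushed back through the layers by what is called backpropagation.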

A new-to-me drill is that instead of feeding the differences back into the weights, you feed them back into the images. So you have an image of a zebra which you want the trained system to label as a whale.

Figure 3
What is the minimum you have to do to the zebra in order to trick the system into thinking that it is a whale? With the rather surprising answer, not very much – although I think it is fair to say that the resultant images are unlikely to turn up in the real world. And with this being the source of the generated variations in the green box at the top.
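
A toy version of the trick, again with a single linear layer standing in for the trained system and random numbers standing in for the zebra. The single nudge in the direction of the sign of the gradient is in the spirit of what the literature calls the fast gradient sign method; the point here is only that a small, carefully chosen change to each pixel is enough to change the winning label.

```python
import numpy as np

# A toy version of the zebra-to-whale trick. A fixed linear layer stands in for
# the trained system, random numbers stand in for the picture, and 'zebra' and
# 'whale' are just its two output scores. The weights stay fixed; only the
# image is nudged.
rng = np.random.default_rng(3)
weights = rng.normal(size=(1000, 2))       # already-trained weights, held fixed
image = rng.normal(size=1000)              # stands in for the zebra picture

def label(x):
    return ["zebra", "whale"][int(np.argmax(x @ weights))]

current = int(np.argmax(image @ weights))
other = 1 - current
# For a linear model, the gradient of (other score - current score) with
# respect to the image is just the difference of the two weight columns.
gradient = weights[:, other] - weights[:, current]

epsilon = 0.2                              # biggest change allowed to any one pixel
adversarial = image + epsilon * np.sign(gradient)

print(label(image), "->", label(adversarial))
print("largest change to any pixel:", np.max(np.abs(adversarial - image)))
```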

Figure 4
A less perverse wheeze, harking back to a game one can play with children, would be to tell the computer that you want to see a zebra in a cloudscape. How much do you have to tweak the clouds to get a zebra? With the example at Figure 3 above reflecting the animal picture background of the system involved, and with some details at Figure 4.

Putting it another way, instead of tweaking the system to fit the images, you tweak the images to fit the system.
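
And a toy version of the cloudscape game: hold the trained weights fixed, start from faint noise standing in for the clouds, and keep nudging the image up the zebra score until the system is convinced. Again, made-up sizes and weights; a sketch only.

```python
import numpy as np

# A toy version of the cloudscape game. A fixed linear-plus-softmax layer
# stands in for the trained system, faint noise stands in for the clouds,
# and label 3 is declared, arbitrarily, to be 'zebra'. The weights stay
# fixed; the image is nudged, step by step, up the zebra score.
rng = np.random.default_rng(4)
n_pixels, n_labels = 1000, 10
weights = rng.normal(size=(n_pixels, n_labels))   # already-trained weights, held fixed
zebra = 3

def zebra_probability(x):
    scores = x @ weights
    exps = np.exp(scores - scores.max())          # softmax over the label scores
    return exps[zebra] / exps.sum()

clouds = 0.1 * rng.normal(size=n_pixels)          # the starting cloudscape
image = clouds.copy()
step = 0.001
while zebra_probability(image) < 0.99:
    # Crude but serviceable: follow the gradient of the zebra *score*, which
    # for a linear layer is simply that label's column of weights.
    image += step * weights[:, zebra]

print("zebra probability in the clouds:  ", round(float(zebra_probability(clouds)), 3))
print("zebra probability after tweaking: ", round(float(zebra_probability(image)), 3))
print("average change per pixel:         ", round(float(np.mean(np.abs(image - clouds))), 4))
```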

A more obviously practical application, testing complex systems, is to be found at reference 5.

Reference 1: http://psmv3.blogspot.co.uk/2017/11/reading-brain.html

Reference 2: https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

Reference 3: http://image-net.org/challenges/LSVRC/2016/index

Reference 4: http://ufldl.stanford.edu/housenumbers/

Reference 5: DeepXplore: Automated Whitebox Testing of Deep Learning Systems - Kexin Pei and others - 2017. See https://arxiv.org/pdf/1705.06640.pdf. With arXiv being hosted by Cornell University.

Reference 6: ImageNet Large Scale Visual Recognition Challenge - Olga Russakovsky and others - 2014. See https://arxiv.org/pdf/1409.0575v3.pdf.
