Saturday, 27 January 2018

Computers that play chess

Figure 1
From the newsletter published by the people at reference 1, I learn about the work reported at reference 2. Getting computers to beat grandmasters at chess is clearly not good enough any more.

People have been trying to get computers to play board games such as Checkers (also known as Draughts), Chess, Shogi (a more complicated variation of chess) and Go for a very long time. See, for example, the much cited paper from 1959 about checkers at reference 4.

All these games are two player board games played on a rectangular grid, from eight by eight for Chess to nineteen by nineteen for Go. One sort of piece in Checkers and Go, a dozen or so in Chess and Shogi. Perhaps black pieces for player one, white pieces for player two. A repertoire of permitted moves, which might involve the capture of one of the opponent’s pieces. Games can result in a win, a draw or a loss. At higher levels, played with rules about time. All the games are open, with both players having perfect knowledge of the rules and of the play so far, memory permitting. Lots of history. Lots of competition.

All these games are discrete and finite. There is nothing like, for example, position in a war game, where a unit may move to any position in the game world, not constrained to some small number of positions on a grid, maybe not even to positions on some two-dimensional surface. Where the attributes of units are orders of magnitude more complicated than those of the pieces in these games. And moves alternate in an orderly way – which may not be the case in a war game, played along realistic lines. Three properties – discrete, finite and alternating – which make them attractive targets for computer programs. But with the catch that, with the possible exception of checkers, there are still far too many possibilities for brute force, for the exhaustive search of possibilities, using any computer likely to be available any time soon. Something more cunning than brute force is needed. And even if this were not true, cunning is more interesting, more entertaining than brute force.

In the event, most of the many computer programs used to play these games have been organised in a straightforward way, at least at the highest level, with an evaluation function and a search. The evaluation function says whether this or that position is good or bad. The search looks ahead from the current position and uses the evaluation function to score the possibilities and so to decide what to do.

Evaluation functions are often defined by means of features and weights. One has a long list of features that any given position may or may not have and one has a long list of weights. And one has some function which combines the features and the weights to give a score. Frequently the function is simply the weighted sum, the scalar product. And one of the tasks of the program is to learn the weights that work, that result in wins.
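A toy version of the feature-and-weight idea, with the function taken to be the scalar product mentioned above. The three chess features and their weights here are invented for illustration, not taken from any real program:

```python
def evaluate(features, weights):
    """Score a position as the weighted sum (scalar product) of its features."""
    return sum(f * w for f, w in zip(features, weights))

# Three made-up features of a chess position: material balance,
# mobility (count of legal moves) and king exposure.
features = [2.0, 31.0, -1.0]
weights = [1.0, 0.1, 0.5]

print(evaluate(features, weights))  # weighted sum of the three features
```

Learning then amounts to adjusting the numbers in `weights` until positions that lead to wins score highly.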

Given the convention that player one on his move wants to maximise the evaluation function and that player two on his move wants to minimise it (usually, in these programs, the same function), the search is usually some variation on the alpha-beta variety of the minimax search. Wikipedia knows all about these matters and one might also take a look at reference 6.
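A minimal sketch of minimax with alpha-beta pruning, on a toy game tree rather than a real board. A node is either a number (a leaf scored by the evaluation function) or a list of child nodes:

```python
import math

def alphabeta(node, maximising, alpha=-math.inf, beta=math.inf):
    """Minimax search with alpha-beta pruning over a toy game tree."""
    if isinstance(node, (int, float)):
        return node  # a leaf: just return its evaluation
    if maximising:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:  # the minimiser would never allow this branch
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:  # the maximiser would never allow this branch
                break
        return value

# A textbook three-branch tree: the minimax value at the root is 6.
tree = [[3, 5], [6, 9], [1, 2]]
print(alphabeta(tree, True))  # 6
```

The pruning is what makes the search affordable: whole branches are skipped once it is clear the opponent would never let play reach them.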

Huge amounts of effort have been put into building and tuning these two functions, using huge amounts of domain specific knowledge. Some of the programs include libraries of beginnings and endings and are able to concentrate on what lies in between. Some of them include cunningly crafted algorithms to deal with particular problems and issues. Some of them try to encode the wisdom of the masters in some other way.

Huge amounts of time were put into training these programs, training which often made use of the libraries of expert chess games which are now available. Other programs trained by playing themselves or each other.

All this reached the point, relatively recently, where computer programs supported by large computers could beat world ranking humans.

AlphaZero

But now, a lot of this has been swept away by the success of the people at Google with their AlphaZero program, the program reported on at references 1 and 2.

All this program knows about is this class of games in general and the bare rules of the games that it can play in particular. The only slightly odd looking bits of information that it needs are the length of a reasonable game and the maximum length that it need bother with. It has no libraries of grandmaster games or anything else. It has no cunningly crafted algorithms doing special, game specific things. And it can learn to play world standard chess, or whatever, more or less overnight, with Figure 1 above illustrating the convincing results of testing this claim by competition with other world ranking programs, including the previous incarnation of itself.

One of the interesting things that this program does is generate winning moves that a human player would regard as bizarre. In which it seems to be going beyond what humans have managed so far.

I did not notice anything about how big a computer is needed to run this program, how it compares, for example, with the sort of thing needed to forecast the weather or make sense of the goings on in the Large Hadron Collider.

Bits and pieces

Along the way, a lot of work has been done on something called TD(λ) learning, where ‘TD’ is temporal difference and ‘λ’ is a parameter which controls the use of information from past times. With the issue being how to relate outcomes, good or bad, to actions (in this case moves) which might have taken place some time ago, and with TD(λ) learning being all about learning as you go along, without waiting for the final outcome. A tool which has been around for a long time, successful a long time ago on backgammon, but now seen to be of wider applicability. See reference 3.
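A sketch of one TD(λ) update for a linear value function of the feature-and-weight kind, V(s) = w · x(s). The eligibility trace remembers recently visited features, which is how credit for a good or bad outcome flows back to earlier moves without waiting for the end of the game. The learning rate, discount and λ values are illustrative, not taken from any particular program:

```python
def td_lambda_step(w, e, x_s, x_next, reward, alpha=0.1, gamma=1.0, lam=0.7):
    """Update weights w and eligibility traces e in place after one move."""
    v_s = sum(wi * xi for wi, xi in zip(w, x_s))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v_s        # the temporal-difference error
    for i in range(len(e)):                      # decay old traces, mark current state
        e[i] = gamma * lam * e[i] + x_s[i]
    for i in range(len(w)):                      # nudge weights along the traces
        w[i] += alpha * delta * e[i]
    return delta

w, e = [0.0, 0.0], [0.0, 0.0]
td_lambda_step(w, e, x_s=[1.0, 0.0], x_next=[0.0, 1.0], reward=1.0)
print(w)  # [0.1, 0.0]
```

The first feature's weight has moved towards the reward; had earlier positions also activated that feature, their lingering traces would have shared in the update.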

Along my way, in writing this post, I came across the work of John Holland, in particular his bucket brigade algorithm and his Echo program. The latter now seems to live at the interesting outfit called the Santa Fe Institute of reference 7. But both Google and Bing were oddly ineffective in turning up a proper description of the bucket brigade. So far, the best they can do is the introductory reference 8, as it happens the work of a fellow blogger.

Feature vectors also feature in the language work noticed at reference 9 and the image work noticed at reference 10 – both involving, as it happens, engineers from Google.

Following reference 5, I was interested to see layers coming into play with AlphaZero, with information being stored on a large number of planes, with each plane carrying one bit, or perhaps one byte, of information about each position on the board.
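A toy version of the plane idea: one binary plane per piece type, each plane a grid of 0s and 1s saying where pieces of that type stand. The piece names are illustrative; the real input described at reference 2 stacks many more planes, including planes for past positions:

```python
def planes_for(board, piece_types, size=8):
    """board maps (row, col) -> piece name; returns one 0/1 plane per type."""
    planes = {}
    for piece in piece_types:
        planes[piece] = [[1 if board.get((r, c)) == piece else 0
                          for c in range(size)] for r in range(size)]
    return planes

# A nearly empty board with just the two kings on it.
board = {(0, 4): 'white_king', (7, 4): 'black_king'}
planes = planes_for(board, ['white_king', 'black_king'])
print(planes['white_king'][0][4])  # 1
```

The attraction of this layout is that the whole stack of planes is exactly the sort of regular, image-like input that a layered neural network is good at digesting.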

I do not pretend to understand much of what the Google team have done. But I believe that much of their progress has stemmed from their having learned how to train neural networks with lots of layers, with previous efforts having been stuck at a very small number of layers. With more layers neural networks can do much more complicated stuff, stuff that one could probably not contemplate using the hand crafted code of old. I imagine that this includes better training for evaluation functions.

I also believe that they have moved decisively away from the alpha-beta searches which have dominated the field hitherto, towards a more statistical approach. There is talk of methods going by the name of ‘Monte Carlo’, Monte Carlo tree search in particular.
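The flavour of the Monte Carlo approach can be caught in a few lines: rather than searching exhaustively, one estimates each move's worth by averaging the results of sampled playouts, and uses an upper-confidence rule to decide which move to sample next. This is a bare-bones sketch of the selection rule only; the numbers are invented, and the real program guides its sampling with a neural network:

```python
import math

def ucb_score(wins, visits, parent_visits, c=1.4):
    """Upper confidence bound: average result plus an exploration bonus."""
    if visits == 0:
        return math.inf  # an untried move is always worth one look
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def choose_move(stats, parent_visits):
    """Pick the move with the highest UCB score; stats maps move -> (wins, visits)."""
    return max(stats, key=lambda m: ucb_score(*stats[m], parent_visits))

# Made-up playout statistics for three candidate moves.
stats = {'e4': (60, 100), 'd4': (55, 90), 'g4': (0, 0)}
print(choose_move(stats, 190))  # g4
```

The exploration bonus shrinks as a move is visited more often, so the search settles on the moves whose sampled results hold up, without ever having to enumerate the whole tree.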

I associate to the march of progress in other spheres, with years of work, satisfying careers even, being swept away by some invention or other. With the way of life of the likes of Silas Marner being swept away, in his case, by the advent of the mechanical, the industrial loom. And of Edward Casaubon being pipped at the post by some Germans from some place called Tübingen.

Conclusions

Computer programs can now beat humans at these kinds of games and to that extent the glamour has gone out of trying to write them. But the Google team will no doubt be looking to leverage their inventions in other fields of endeavour.

Meanwhile, I wonder what lessons can be extracted from the way in which AlphaZero plays its games. What does AlphaZero tell us about these games and the way in which they should be played? Or if it is still too much down to the brute force of big computers to be of much interest to a human player, what does it tell us about how a brain might do other things?

It is not enough, at least not enough for me, to say that we have built a black box which solves the puzzle. I want to know all about the workings of the black box. Analogies might be the structures postulated by Freud – such as the id and the superego – to explain the workings of the mind, the chemical structures which do explain the workings of chemicals, without needing to worry all the time about the myriad activities of sub-atomic particles, or the diagrams which are drawn to make sense of what one sees down a microscope.

PS: Google’s personnel people clearly search far and wide for their engineers, if the names on the paper referenced below are anything to go by.

References

Reference 1: http://www.kurzweilai.net/.

Reference 2: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm - David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis – 2017. 

Reference 3: Reinforcement learning in board games - Imran Ghory – 2004. Written as a final year dissertation at the University of Bristol and which, inter alia, contains useful background and tutorial material.

Reference 4: Some studies in machine learning using the game of checkers - Arthur L. Samuel - 1959. But beware: there are a paltry 19 footnotes and references. Clearly not a proper academic at all.

Reference 5: http://psmv3.blogspot.co.uk/2018/01/an-introduction-to-lws-n.html.

Reference 6: http://chessprogramming.wikispaces.com/Alpha-Beta.

Reference 7: https://www.santafe.edu/.

Reference 8: http://vinodwadhawan.blogspot.co.uk/2013/05/79-bucket-brigade-in-john-hollands.html.

Reference 9: http://psmv3.blogspot.co.uk/2017/11/reading-brain.html.

Reference 10: http://psmv3.blogspot.co.uk/2017/11/more-google.html.
