psmv3: More meeting in the middle

The other day I talked (at reference 1) about meeting in the middle in the context of bridges.

Now, this is also something that one does in the context of brains. One can look at the insides, all the billions of cells in the brain, and try to work up and out. Or one can look at the outside, either from right outside, perhaps with a scanner, or from the inside, as a person with subjective experience might, and try to work down and in: to figure out what would be necessary to generate what one sees. This is a process sometimes called reverse engineering, something that Far Eastern engineers (Chinese, Japanese or Korean) are said to be very good at; they want the product, while we are vain and want the glory. The hope is that the two ways of doing things will meet in the middle.

A more tractable version of this problem is to try to figure out how an image from the internet, displayed on your telephone, is put together. This problem is illustrated above, where I have gone to the BBC website and right-clicked to bring up the HTML code, with the result that the stuff on the right-hand side describes most of what I see on the left. As can be seen, this is a reasonably complex business.

A more tractable version still is to try to figure out how the Powerpoint slide that you see in the conference room or lecture theatre is put together. What might you find if you inspected the pptx file underlying what you are seeing? What would have to be there, at least in some sense, for you to be seeing what you are?

In the present post, I focus on just one feature of Powerpoint: the jump within a document.

We consider the Powerpoint file as a stream of characters, and suppose that it can be broken into a series of segments: a header segment, followed by a series of body segments and finished off with a trailer segment, where a body segment is either a slide segment or an information segment. The distinction is that while both kinds are necessary, what you actually see is mainly specified in the slide segments, with a one-to-one correspondence between the slides that you see and the slide segments. We are vague about what the information segments might do and about how many of them there might be, but I feel sure that they are there, rather like all the padding in DNA.

The header segment will contain data about the presentation as a whole, for example the name of the font to be used by default, in the absence of any further specification. The trailer segment might contain some statistics, like the number of slides, which can be checked against the rest of the file: integrity checks, just to make sure that nothing has gone wrong during construction. And if you look very hard at a slide segment you might be able to find some of the words that you see when the slide is displayed, just as, if you look very hard at the right-hand side of the illustration above, you might be able to see some of the words which appear on the left-hand side. It will probably help if you click to enlarge.
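To make the segment picture a little more concrete, here is a minimal sketch in Python of the file structure speculated about above. It assumes, purely for illustration and not from the real pptx format, that a presentation is just a header, a list of body segments and a trailer; the class names, the default font value and the `check_integrity` method are all hypothetical. The only point it makes is the trailer's slide count being checked against the slide segments actually present.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical segment types: illustrative names, not the real pptx format.
@dataclass
class Header:
    default_font: str  # used when nothing more specific is given

@dataclass
class SlideSegment:
    text: str  # mainly determines what you see

@dataclass
class InfoSegment:
    payload: str  # necessary, but not directly visible

@dataclass
class Trailer:
    slide_count: int  # a statistic to check against the rest of the file

@dataclass
class Presentation:
    header: Header
    body: List[Union[SlideSegment, InfoSegment]] = field(default_factory=list)
    trailer: Trailer = field(default_factory=lambda: Trailer(0))

    def slides(self) -> List[SlideSegment]:
        # Slide segments correspond one-to-one with the slides you see.
        return [seg for seg in self.body if isinstance(seg, SlideSegment)]

    def check_integrity(self) -> bool:
        # The trailer's count should agree with what is actually there.
        return self.trailer.slide_count == len(self.slides())

deck = Presentation(
    header=Header(default_font="Calibri"),
    body=[SlideSegment("Title"), InfoSegment("layout data"), SlideSegment("Red apple")],
    trailer=Trailer(slide_count=2),
)
print(deck.check_integrity())  # True
```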

On a Powerpoint slide, there will often be words as well as pictures. Some of the words and phrases will be in blue underline, the convention being that such a word or phrase marks a hyperlink: if you (or the presenter) click on it, you will be taken through hyperspace and deposited somewhere else; in the case that concerns us here, somewhere else in the same Powerpoint presentation.

So what we do now is speculate about how this might be done.

From errors in such links that I have turned up in the past, one possibility is that the link comes in the form of an integer, either positive or negative but not zero. In the case of a positive number, jump forward so many slides; in the case of a negative number, jump back so many slides. In the code, this might be expressed as something like ‘… {anthropomorphic:-432} …’, where anthropomorphic is the bit which is to be displayed in blue underline and minus 432 is the jump number which takes you back to the definition of this interesting-sounding word, first seen some hours previously. Curly brackets and the colon are then special characters which are either otherwise forbidden or subject to special treatment, which I shall not go into here, beyond saying that you could ask Google about escape characters.
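A minimal sketch of this relative-jump scheme, assuming the hypothetical `{text:jump}` syntax above and numbering slides from zero. The regular expression and the function names `find_links` and `resolve` are illustrative only, and escape characters are ignored altogether.

```python
import re

# Hypothetical link syntax: {display text:signed, non-zero jump}.
LINK = re.compile(r"\{([^{}:]+):(-?[1-9]\d*)\}")

def find_links(slide_text):
    """Return (display_text, jump) pairs found in one slide's text."""
    return [(m.group(1), int(m.group(2))) for m in LINK.finditer(slide_text)]

def resolve(current, jump, slide_count):
    """Turn a relative jump from slide `current` into a 0-based slide number."""
    target = current + jump
    if not 0 <= target < slide_count:
        raise IndexError(f"jump {jump} from slide {current} lands outside the deck")
    return target

text = "... {anthropomorphic:-432} ..."
print(find_links(text))          # [('anthropomorphic', -432)]
print(resolve(500, -432, 1000))  # 68
```

Note that the link itself says nothing about where it came from: the same `-432` means something different on every slide, which is what makes the maintenance problem below possible.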

A problem with this implementation is that while it is easy to insert such a link in edit mode, such links need to be maintained. Every time you insert a new slide or delete an old one, Powerpoint needs to find every link which that insertion or deletion affects, that is, every link which spans the inserted or deleted slide, and adjust it accordingly, adding or subtracting one. If you delete a whole bunch of slides it needs to do something slightly more complicated. My suspicion was that Powerpoint was sometimes getting this wrong, a supposition which would have accounted for the errors in links that I was getting in my rather large Powerpoint, then running at more than 1,000 slides. Maybe the problem was that Powerpoint, for some reason that one can only guess at, only allowed such numbers to have at most three digits, with strange things (which standards people sometimes rather kindly call implementation-defined) happening if you went to four. Such things do happen in the world of bits and bytes, for what at the time seem like perfectly good reasons, however silly they might look when they go wrong.
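The bookkeeping described above can be sketched as follows, assuming each slide simply carries a list of relative jumps. The helper names `insert_slide` and `delete_slide` are hypothetical. A link only changes when the inserted or deleted slide falls between its source and its target; a link whose target is deleted is left dangling, marked here as `None`.

```python
from typing import List, Optional

Links = List[List[Optional[int]]]  # one list of relative jumps per slide

def insert_slide(links: Links, pos: int) -> Links:
    """Insert an empty slide at index pos, adjusting every relative jump."""
    def moved(i: int) -> int:
        # Old slides at or after the insertion point move down by one.
        return i + 1 if i >= pos else i

    result: Links = []
    for src, jumps in enumerate(links):
        result.append([None if j is None else moved(src + j) - moved(src) for j in jumps])
    result.insert(pos, [])  # the new slide has no links yet
    return result

def delete_slide(links: Links, pos: int) -> Links:
    """Delete the slide at index pos; links pointing at it become None (dangling)."""
    def moved(i: int) -> int:
        # Old slides after the deleted one move up by one.
        return i - 1 if i > pos else i

    result: Links = []
    for src, jumps in enumerate(links):
        if src == pos:
            continue  # the deleted slide's own links go with it
        adjusted: List[Optional[int]] = []
        for j in jumps:
            if j is None or src + j == pos:
                adjusted.append(None)
            else:
                adjusted.append(moved(src + j) - moved(src))
        result.append(adjusted)
    return result

deck = [[], [], [-2]]            # slide 2 links back to slide 0
bigger = insert_slide(deck, 1)
print(bigger)                    # [[], [], [], [-3]]
print(delete_slide(bigger, 1))   # [[], [], [-2]]
print(delete_slide(deck, 0))     # [[], [None]]: the target has gone
```

Deleting a whole bunch of slides could be done by repeated single deletions, highest index first; a real implementation would presumably do it in one pass, which is where the "slightly more complicated" comes in.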

Another possibility was that Powerpoint assigned every slide, as it was created, a unique reference number, a number which would not be reused (allowing reuse is apt to cause complications) and which would be included in every slide segment. Your link could then be in the form of a reliable, absolute address rather than an unreliable, relative one. The pain would come at execution time: instead of just nipping backwards or forwards through so many slides, Powerpoint would potentially have to search through the whole file to find the address in question, which might take a while in the case of a very big presentation, although this seems a bit unlikely with today's machines.
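A minimal sketch of the reference-number scheme, assuming a hypothetical `Deck` class that hands out reference numbers from an ever-increasing counter, so that none is ever reused. Resolving a link is a linear scan of the whole deck, which is the execution-time pain just mentioned.

```python
import itertools

class Deck:
    """Hypothetical deck in which every slide gets a never-reused reference number."""

    def __init__(self):
        self._next_ref = itertools.count(1)  # increases forever, so never reused
        self.slides = []                      # (ref, text) pairs in display order

    def add_slide(self, text, pos=None):
        ref = next(self._next_ref)
        self.slides.insert(len(self.slides) if pos is None else pos, (ref, text))
        return ref

    def jump_to(self, ref):
        """Resolve an absolute link by scanning the whole deck: O(n) per jump."""
        for number, (slide_ref, _text) in enumerate(self.slides):
            if slide_ref == ref:
                return number
        raise KeyError(ref)

deck = Deck()
definition = deck.add_slide("Anthropomorphic means ...")
deck.add_slide("Some hours later")
deck.add_slide("A new opening slide", pos=0)  # shifts everything along
print(deck.jump_to(definition))  # 1: the link survives without any adjustment
```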

So one makes an index. One includes a table in the header segment which maps from reference number (the absolute address) to slide number, and another table which maps from slide number to absolute position in the file, that is, the serial number of the character in the character stream which makes up the file. In the olden days, this might have been the actual address on a disc unit, the track and sector numbers, if I have remembered the jargon aright. Then, given the reference number of the slide you want to jump to, you get the slide number from a search of the first table, a rather quicker business than a search of the whole file, and you get the absolute position from a search of the second table. And off you go.
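The two tables might be sketched like this, assuming a hypothetical `<slide ref=N>…</slide>` segment syntax for the character stream. Python dictionaries stand in for the tables, so each lookup is a hash lookup rather than a search; a sorted table with binary search would be closer to the description above, with much the same effect.

```python
def build_file(slides):
    """slides: list of (ref, text). Returns the character stream and the two index tables."""
    parts, ref_to_number, number_to_offset = [], {}, {}
    position = 0
    for number, (ref, text) in enumerate(slides):
        segment = f"<slide ref={ref}>{text}</slide>"  # hypothetical segment syntax
        ref_to_number[ref] = number            # first table: reference -> slide number
        number_to_offset[number] = position    # second table: slide number -> position
        parts.append(segment)
        position += len(segment)
    return "".join(parts), ref_to_number, number_to_offset

def follow_link(ref, ref_to_number, number_to_offset):
    """Two table lookups instead of a scan of the whole file."""
    number = ref_to_number[ref]
    return number, number_to_offset[number]

stream, by_ref, by_number = build_file([(7, "Intro"), (3, "Anthropomorphic"), (12, "Summary")])
number, offset = follow_link(3, by_ref, by_number)
print(number, offset)              # 1 26
print(stream[offset:offset + 13])  # <slide ref=3>
```

The reference numbers deliberately run out of order here: they record when a slide was created, not where it currently sits.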

This leaves someone else to remember to generate a new reference number when you copy and paste a slide in edit mode, rather than just copying the old reference along with the rest of the old slide.
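The trap can be shown in a few lines, assuming slides are hypothetical dictionaries with a `ref` field and a shared counter hands out reference numbers. The naive paste copies the old reference, so two slides answer to the same address; the careful paste allocates a fresh one.

```python
import copy
import itertools

_next_ref = itertools.count(1)  # hypothetical allocator of never-reused reference numbers

def new_slide(text):
    return {"ref": next(_next_ref), "text": text}

def paste_naive(slide):
    # Copies everything, old reference included: two slides now share one address.
    return copy.deepcopy(slide)

def paste_fresh(slide):
    # Copies the content but gives the copy a reference number of its own.
    pasted = copy.deepcopy(slide)
    pasted["ref"] = next(_next_ref)
    return pasted

original = new_slide("Red apple")
print(paste_naive(original)["ref"] == original["ref"])  # True: a link to it is now ambiguous
print(paste_fresh(original)["ref"] == original["ref"])  # False
```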

Which is all fine and dandy, but you now have a much more complicated piece of machinery, and there is a lot more to go wrong. There is a lot more code for Microsoft to look after, and to test every time a significant change is made in any other part of the system, just in case there is some unexpected and untoward interaction. Or side effect, as they say in Big Pharma.

Yet another possibility is that, rather than a relative slide number, the link is itself an absolute position in the file. Hopefully Microsoft did not select this one, since any edit to an earlier slide would shift every position after it.
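Why that would be a bad choice can be shown in a few lines, assuming (hypothetically) that the file is nothing more than the slides' text run together. A link stored as a raw character position goes stale as soon as an earlier slide grows or shrinks, even though the target slide itself is untouched.

```python
def offsets(slides):
    """Character position at which each slide starts in the concatenated stream."""
    positions, position = [], 0
    for text in slides:
        positions.append(position)
        position += len(text)
    return positions

slides = ["Intro. ", "Anthropomorphic means ... ", "Summary."]
link = offsets(slides)[1]                  # link stored as a raw file position
slides[0] = "Introduction, now longer. "   # an innocent edit to an earlier slide
stream = "".join(slides)
print(stream[link:].startswith("Anthropomorphic"))                # False: the link is broken
print(stream[offsets(slides)[1]:].startswith("Anthropomorphic"))  # True: the slide merely moved
```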

An even more exotic possibility is that the link contains a search term rather than an address. So when you go to jump, Powerpoint executes the search (for example, find the slide which contains the words ‘red’ and ‘apple’) and takes you to the first such slide that it turns up. Hopefully there is only one. The upside of this one is that you avoid the need for obscure reference numbers, reference numbers which will almost certainly confuse some poor sap of a maintenance programmer some years down the line.
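A minimal sketch of a search-term link, assuming a slide matches when it contains every search word, ignoring case and punctuation. The function names are hypothetical. The example deliberately has two matching slides, which shows the weakness: the first hit wins and nobody is told about the second.

```python
import re

def words(text):
    """Lower-cased set of the words on a slide, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def matching_slides(slides, terms):
    wanted = {term.lower() for term in terms}
    return [number for number, text in enumerate(slides) if wanted <= words(text)]

def jump_by_search(slides, terms):
    """Follow a search-term link to the first matching slide."""
    hits = matching_slides(slides, terms)
    if not hits:
        raise LookupError(f"no slide contains all of {terms}")
    return hits[0]  # any later matches are silently ignored

slides = ["A green pear.", "A red apple, on a plate.", "Another red apple."]
print(matching_slides(slides, ["red", "apple"]))  # [1, 2]: more than one such
print(jump_by_search(slides, ["red", "apple"]))   # 1
```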

And no doubt, if I gave it a bit more time, I could come up with other possibilities.

The lesson which I draw from all this is that even in this seemingly simple example, there is plenty of scope for complication under the covers. From the safety and comfort of the lecture theatre, it is hard to work out what is going on under them. So people who want to work out brains, beware!

The good news is that there are usually errors, and from the errors one can sometimes get a handle on the machinery; errors are often a lot more revealing about what is going on inside than normal workings, with normal workings often, by design, doing a very thorough job of hiding what is going on inside. Good news which people who look at brains exploit to the full.

PS: it is, of course, always possible that the file which underlies a Powerpoint presentation is not organised by slide at all, but rather by feature. First it keeps all the letters from A thru M, then all the letters N thru Z, then the numbers, then the special characters, then the rectangles, then the ellipses, then the text boxes and so on. It then assembles a slide from all these bits and pieces when it is needed: just in time, as the supply chain jargon goes. Which, I believe, is roughly the way that the brain does things.
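A feature-organised file might be sketched like this, assuming (hypothetically) one table per kind of element rather than per letter, with each entry tagged by the slide it belongs to. The geometry tuples are an assumed (x, y, width, height). Nothing resembling a slide exists in storage; `assemble` puts one together only when it is asked for.

```python
from collections import defaultdict

# Hypothetical feature-organised store: one table per kind of element,
# each entry tagged with the number of the slide it belongs to.
store = {
    "text_box": [(0, "More meeting in the middle"), (1, "Bridges"), (1, "Brains")],
    "rectangle": [(1, (10, 10, 200, 50))],  # assumed (x, y, width, height)
    "ellipse": [(0, (50, 50, 30, 30))],
}

def assemble(slide_number):
    """Gather one slide's elements from every feature table, just in time."""
    slide = defaultdict(list)
    for kind, entries in store.items():
        for owner, element in entries:
            if owner == slide_number:
                slide[kind].append(element)
    return dict(slide)

print(assemble(1))  # {'text_box': ['Bridges', 'Brains'], 'rectangle': [(10, 10, 200, 50)]}
```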

Reference 1: http://psmv3.blogspot.co.uk/2016/10/meeting-in-middle.html.

Saturday, 15 October 2016
