Thursday, 28 September 2017

Scoring for music

Introduction

Posts about LWS to date have been biased towards vision, and it might be thought that, being built on vision, LWS will not be much use for other modalities, such as taste, touch, smell and sound.

Here we try to tip the balance the other way a bit, to do something with sound, with the big difference that sound is expressed in time while vision is expressed in (more or less two dimensional) space.

Furthermore, our frames of consciousness have hitherto been rather static, rather like the frames of a cinema film, assembled by the compiler for host consumption one after the other. But if we have it that a frame lasts for the order of a second, we are going to have to update frames as we go along, as we do not think that our subjective experience of sound can be more than a few tens of milliseconds behind the real action, if for no other reason than that it would otherwise be very hard for musicians in a chamber group, who do not have the advantage of visual cues from a conductor, to keep time with each other. We shall say something about this frame update below.

We notice three sorts of sound: western classical music, speech and other. In what follows we are mainly concerned with the first of these.

In the past, for example at reference 1, we have talked of threads, sometimes corresponding to the staves of a musical score. But threads have been rather lost of late, vaguely subsumed into layers.

We talked at reference 5 of doing colour using three texture nets strung across the one region, corresponding roughly to the RGB setup in Microsoft Office on computers, which gives colour as the weighted sum of red, green and blue, with the weights taking integer values in the range [0, 255].

Sound, on the other hand, has a frequency distribution, with each component having a frequency and an amplitude, both non-negative reals, which is a rather different problem.

We believe in mapping data, the data content of consciousness, to LWS in a way which preserves spatial and frequency appearances, rather in the way that somatotopic organisation has been demonstrated in some areas of the brain proper. But with audible sound lying in the range 10Hz to 4,000Hz, some sort of scaling is going to be needed to capture this in a direct way in the firing rates of neurons, or in the activation of our neural nets.
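By way of illustration, the sort of scaling we have in mind might be a logarithmic one, mapping frequency onto a bounded node density. The function name and the density band here are our assumptions, not part of LWS-N proper:

```python
import math

# Illustrative sketch: map an audible frequency onto a bounded texture
# net node density, log-scaled. The density band [D_MIN, D_MAX] and the
# function name are assumptions for the purpose of illustration.

F_MIN, F_MAX = 10.0, 4000.0   # frequency range quoted in the text, in Hz
D_MIN, D_MAX = 1.0, 100.0     # assumed band of node densities

def pitch_to_density(freq_hz: float) -> float:
    """Log-scale a frequency into the assumed node density band."""
    f = min(max(freq_hz, F_MIN), F_MAX)   # clamp to the quoted range
    t = (math.log(f) - math.log(F_MIN)) / (math.log(F_MAX) - math.log(F_MIN))
    return D_MIN + t * (D_MAX - D_MIN)
```

On this scaling the octave from 100Hz to 200Hz occupies the same slice of the density band as the octave from 1,000Hz to 2,000Hz, which keeps the mapping even in musical terms.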

In what follows we try out a scheme for the expression of western classical music in the world of LWS-N. Perhaps the sort of thing illustrated by the page of score included below.

Four lines of music, one for each of the four instruments of a string quartet: first violin, second violin, viola and cello. The lines are known to musicians as staves, not to be confused with the staves of a barrel. The lines are broken down in time into bars, indicated by the vertical lines, with each bar containing the same number of beats and so being of the same duration in time. Each bar is made up of a sequence of notes and rests of various denominations, adding up to the appropriate number of beats, with the pitch of the notes indicated by their vertical position on the stave and their volume often indicated by various signs and marks below the stave in question, for example an ‘f’ for forte, loud, or a ‘p’ for piano, quiet. All this, somehow, needs to be mapped to, coded by, LWS-N. That is not to say that the subjective experience is the same as the score, but we are suggesting that LWS-N is organised in much the same way.

Figure 1
Our wheeze is to say that a line of music is a linear layer object, with each region representing the sort of note which might appear in a score, a crotchet, a quaver or whatever, with the density in space of the texture net nodes coding for frequency. Perhaps not a one-to-one map between notes & rests and regions, but something along those lines. See references 2 and 3 for earlier takes on linear objects.

Activation washes across the region in real time, then moves on to the next, never to return.

Whereas activation of a visual scene washes around the region, or perhaps around the layer as a whole, for the duration of the frame. Put another way, the frame of sound grows through its duration, whereas the frame of sight is much more static – although we have described elsewhere wheezes to deal with movement in the visual field.

In the case of our quartet, the duration of the frame might well be some small number of bars, perhaps just one, while the content of the frame might well be some larger number, including a few bars of history before and a few of prediction after.

Tone

At reference 5 we talked about coding for colour using texture nets. Here we propose using texture nets to code for the tone of a note, where by tone we mean something a bit broader than frequency or pitch. A tone is rarely pure: it is rarely just a single frequency of sound, much more a blended object with overtones, undertones and noise. Something of this can be seen in the post about bells at reference 6.

Therefore, just as with colour, we allow more than one texture net to the region, where the note in question has more than one component, which it usually will, and where it makes sense to separate out components.

The first idea was that we would code for pitch using the density of the vertices of texture nets, perhaps high densities for high tones, low densities for low tones. With the density sometimes varying across the region, in time or otherwise, with the result that a region can be a lot more complicated than one of the notes of Figure 1.

But there are other possibilities, some of which we will mention below.

The linear layer object

We introduced shape nets in reference 7, nets which defined the shape of layer objects, but did not go on to define linear layer objects. In the illustration below we show a modified version of the shape net shown there as Figure 11.

Figure 2
Figure 2 is the shape net of a linear layer object, one with six regions or parts.

Rule one: a linear layer object is connected.

Rule two: a linear layer object has at least two regions.

Rule three: two of the regions have just one neighbour each; any other regions have exactly two. The first two are the object’s terminal regions, the remainder its interior regions.

We might add a rule four which gives the object a direction, with a beginning and an end. This has been expressed in Figure 2 by having a source top left and a sink upper middle, at the other end of the object.
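Rules one to three can be checked mechanically. A minimal sketch, assuming the shape net is given as an adjacency list mapping each region to its neighbours, with the function name being ours:

```python
# Minimal sketch: check rules one to three for a candidate linear layer
# object, given as an adjacency list mapping each region to its
# neighbours. The function name is an assumption.

def is_linear_layer_object(adj: dict) -> bool:
    regions = list(adj)
    # Rule two: at least two regions.
    if len(regions) < 2:
        return False
    # Rule three: exactly two terminal regions with one neighbour each,
    # every other region with exactly two.
    degrees = [len(adj[r]) for r in regions]
    if degrees.count(1) != 2 or any(d not in (1, 2) for d in degrees):
        return False
    # Rule one: connected - walk out from one terminal region.
    start = next(r for r in regions if len(adj[r]) == 1)
    seen, frontier = {start}, [start]
    while frontier:
        for n in adj[frontier.pop()]:
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return len(seen) == len(regions)
```

Rule four, direction, would need an extra marking of source and sink, which this sketch does not attempt.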

The line of music

Our basic construct is the line of music, expressed as a linear layer object, in which the regions express the notes. This is illustrated in the figure which follows.

Figure 3
So we do sound in two dimensions: volume on the Y axis and time on the X axis.
We have shown this linear layer object as a series of rectangles. The height of the rectangles, the regions, expresses the volume, while their width expresses the duration. This is fine when we have rectangles, unlikely in neural practice, so we will need to define some proxy for height and width in the more general case.
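One such proxy might be area divided by duration, which recovers the height exactly for a rectangle and gives a mean height otherwise. A sketch under that assumption, with all of the names being ours:

```python
from dataclasses import dataclass

# Sketch under assumptions: a region is reduced to its extent on the
# time axis plus its area. For a rectangle, area divided by duration
# is exactly the height; for an irregular region it is the mean height,
# serving as the proxy for volume. All names here are assumptions.

@dataclass
class NoteRegion:
    t_start: float   # left edge on the X (time) axis
    t_end: float     # right edge
    area: float      # area of the region, whatever its shape

    @property
    def duration(self) -> float:
        return self.t_end - self.t_start

    @property
    def volume(self) -> float:
        # mean height: exact for rectangles, a proxy otherwise
        return self.area / self.duration
```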

Activation moves from left to right, with the current activation, the current time marked by the vertical red line, here called the time line.

Figure 4
Tone is expressed by the texture nets of the regions, here marked by colour. So in this line we have both the tone and the volume changing from left to right through time. And, as noted above, while any one region may have more than one texture net, expressing more than one note at a time, we have not yet devised a neat way to show this on the page.

We also have gaps, marked by uncoloured rectangles, which serve to maintain the connection between the regions of the linear object which codes for this line of the music.

We might have the convention that a narrow uncoloured rectangle, here called a gap region, separates two notes which are played one after the other with no scored interval. No gap region means that the notes are run together, while a big gap region expresses a scored rest, of the sort taken from Wikipedia (reference 4) in Figure 5 below.

Figure 5
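The gap convention might be sketched as a simple three-way classification, with the width threshold being an assumed free parameter:

```python
# Sketch of the gap convention: no gap means the notes run together, a
# narrow gap separates notes played one after the other, a wide gap is
# a scored rest. The threshold value is an assumed free parameter.

NARROW_MAX = 0.1   # assumed maximum width of a separating gap, in beats

def classify_gap(width: float) -> str:
    if width == 0:
        return "run together"
    if width <= NARROW_MAX:
        return "separated"
    return "rest"
```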
Several lines of music

Figure 6
Here we have three linear layer objects, the blue, the green and the pink, with the regions running across the page. And while the regions do not have to be rectangles, it does make things easier if the linear objects are parallel, with their times lining up.

Figure 7
So in Figure 7 we have a few bars of each of two lines of music, with the idea that the supporting neurons handle the synchronisation of activation along the two lines – but Figure 6 certainly looks simpler.

We allow a degree of prediction, thus accounting for the presence of data to the right of the red time line in Figure 6.

There is a certain amount of activation to the left of the time line, representing the current phrase being held in working memory, rather less to the right, representing expectations, with the detail here depending on person, time and occasion. A knowledgeable musician, or someone who knew the piece of music concerned well, might have stronger expectations than others.
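The window of history and expectation around the time line might be sketched as a slice over the bars of the piece. The window sizes, and the function itself, are our assumptions; a knowledgeable listener might simply carry a larger prediction window:

```python
# Sketch: the content of the frame at bar i, as a window over the bars
# of the piece. History sits to the left of the time line, expectation
# to the right. The window sizes are assumed free parameters.

def frame_content(bars, i, n_history=3, n_prediction=2):
    """Return (history, current bar, expectation) around bar i."""
    history = bars[max(0, i - n_history):i]
    prediction = bars[i + 1:i + 1 + n_prediction]
    return history, bars[i], prediction
```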

On this account, listening to music could easily give much the same result as reading a score, with both being organised in roughly the same way, at least as far as Figure 6 is concerned. But while there might be one layer object for each instrument or voice, things may be expressed at a higher level. We may have a tenor line standing for all ten tenors in the choir. Or a string line standing for all the strings in an orchestra, without distinguishing violins from double basses.

Observations

There is plenty of coding space to spare here. We have made little use of the shape of our regions, which does not seem quite right.

There is an arbitrariness about using up for volume and across for time which is unappealing. But perhaps we are stuck with something of the sort if we stick with mapping stuff onto our patch of neurons in a more or less pictorial fashion.

We have not said much about why the texture nets coding for colour give rise to a different subjective experience from those coding for sound. There are, no doubt, various possibilities:

  • The density of the vertices of the texture net spanning the part, or a neighbourhood within the part. Maybe different density bands for the different modes.
  • The mean number of edges per vertex in the texture net. The mean number of edges per region.
  • The regularity of the texture net, with the highest score for a regular net of triangles and low scores for very mixed nets.
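To put numbers on the last two of these bullets, one might summarise a net by the mean number of edges per vertex, together with a regularity score built from the spread of those counts. The scoring formula is entirely our assumption:

```python
from statistics import mean, pstdev

# Illustrative only: summarise a texture net by the mean number of edges
# per vertex and a regularity score, 1.0 when every vertex has the same
# degree (as in a regular net of triangles), falling towards zero for
# very mixed nets. The scoring formula is an assumption.

def net_statistics(degrees: list) -> tuple:
    """Mean degree and regularity of a net, from its vertex degrees."""
    return mean(degrees), 1.0 / (1.0 + pstdev(degrees))
```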

We have disturbed the organisation of consciousness into frames. The talk here is of activation sweeping across the growing page, rather than swirling around the static page.

We might have activation sweeping across the frame in both cases. But a visual frame is all set up at the outset and, possibly because the density of vertices is low, activation can sweep across it again and again during the duration of a frame. An aural frame, on the other hand, is being built up, from left to right, for the duration, with a frame only being stopped and a new one started when the music comes to some sort of a period, or when the space allocated to the frame fills up. In the latter case the activation does not get to the end and so just carries on.

Conclusions

We have sketched a way of expressing western classical music in the world of LWS-N, following reasonably closely the way in which it is written down on paper.

Plenty of details to be filled in, but the direction seems promising.

References

Reference 1: http://psmv3.blogspot.co.uk/2016/08/describing-consciousness.html.

Reference 2: http://psmv3.blogspot.co.uk/2017/01/lines.html.

Reference 3: http://psmv3.blogspot.co.uk/2017/07/rules-supplemental.html.

Reference 4: https://en.wikipedia.org/wiki/Rest_(music).

Reference 5: http://psmv3.blogspot.co.uk/2017/09/coding-for-colour.html.

Reference 6: http://psmv3.blogspot.co.uk/2017/01/virtual-pitch.html.

Reference 7: http://psmv3.blogspot.co.uk/2017/09/geometry-and-activation-in-world-of.html.

Group search key: srd.
