psmv3: On pitch

In the course of the advertisement at reference 1, I mentioned the business of pitch, as in the stuff in music, rather than the stuff you paint on the bottom of boats. Today, I thought to put pen to paper, or at least my two index fingers to key. Many learned tomes and many learned papers on the subject notwithstanding.

Something of a salutary tale, in that while we might start out by thinking that pitch is a simple matter, we then look at all the work which has been done, at all the knowledge that has been accumulated – and find that there is still a great deal to learn about how exactly the brain accomplishes this particular miracle.

Introduction

Pitch is a subjective attribute of some sounds. Generally speaking, sounds either have pitch or they do not, and the population of sounds which have pitch will vary a bit from person to person and from time to time; variation which we neglect for present purposes. Very roughly speaking the noises made by mammals have pitch, with all mammals having vocal apparatus not so very different in principle from, say, an oboe, whereas a lot of the other noises in the jungle do not. Or put another way, the sounds produced by vibrating strings and vibrating columns of air have pitch, a consequence of the physics of such things - remembering here that the vocal and neural apparatus which does pitch evolved a long time before we got around to either of them.

Pitch has one very important property. If one makes two sounds with pitch, one after the other, most people can reliably say which one has a higher pitch than the other. A property which is also transitive, so if sound A has a higher pitch than sound B and sound B has a higher pitch than sound C, then sound A has a higher pitch than sound C. Which is more or less equivalent to saying that there is a map from the world of pitch to the world of positive real numbers. Some people, for example my late brother, have something called absolute pitch, which means that not only can they say this sound has a higher pitch than that sound, they can say that this sound has a pitch of the F# above middle C, that is to say 370Hz – although I imagine most musicians do not bother much with the Hz. But if you do, see reference 6.

But a map which is not one-to-one. We can happily say that two sounds have the same pitch which are otherwise quite different. One sound might be the long note produced by a clarinet while another, with the same pitch, might be the short note produced by a piano.

Pitch has another property called salience, roughly speaking how clear it is that the sound in question has a pitch. This because while what we say above about sounds either having pitch or not is fair enough, there are still plenty of sounds which are in between. Remembering here that salience has nothing much to do with how loud the sound in question is.

And pitch is very useful, a very useful way of classifying sound, rather as colour is a very useful way of classifying light. It is very helpful, for example, in deciding what sort of a person is speaking to one and what sort of a state they are in. Maybe it is one of the cues used to sort out the various streams of sound at the cocktail party (which features in so many essays about sound). And if you could not do pitch, you could not do Chinese at all, which would be awkward as there are a lot of people who do.

Spectra

Figure 1

One way into this is to look at the frequency spectrum produced by the sound, the sort of thing you get if you sing to your laptop when it is looking at reference 2. The snap above is the first line of me singing ‘Jack and Jill went up the hill’. If it helps, the note called middle C by musicians, around the middle of a piano keyboard, is around 262Hz, towards the bottom of the snap, which is in kHz.
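For anyone who does bother with the Hz, the usual conversion from note names to frequencies can be sketched in a few lines. This assumes twelve-tone equal temperament with the A above middle C tuned to 440Hz, a common convention rather than anything taken from reference 6. The function name and the way of counting semitones are made up for illustration.

```python
# Sketch: frequency of a note in twelve-tone equal temperament.
# Assumption: the A above middle C (A4) is tuned to 440Hz.

A4_HZ = 440.0

def note_frequency(semitones_from_a4: int) -> float:
    """Frequency of the note a given number of semitones above (or below) A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)

middle_c = note_frequency(-9)   # middle C is 9 semitones below A4
f_sharp = note_frequency(-3)    # the F# above middle C is 3 semitones below A4

print(round(middle_c, 1), round(f_sharp, 1))
```

On these assumptions middle C comes out at a little under 262Hz and the F# above it at very nearly 370Hz.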

Now if you played a note on your clarinet to your laptop rather than singing to it, what you should get, aside from noise, is a set of horizontal lines, each one standing for one of the harmonics of the note in question. Often, all those harmonics bar the one at the bottom will be overtones of the one at the bottom. So if the one at the bottom has the frequency 100Hz, then the other strong ones might be the simple (integral) multiples 200Hz, 400Hz and 500Hz. With 300Hz being a deliberate omission; they don’t all need to be there. And the pitch of the note would be 100Hz. As would that of a pure note with frequency 100Hz played by the computer or of a mixture like 100Hz, 200Hz and 300Hz played by some other instrument.

Or, and this was the intriguing property of bells I talked of at reference 3, the one at the bottom might be missing. So the pitch of a mixture like 200Hz, 300Hz and 500Hz would still be 100Hz. Noting that the spectrum illustrating that post is loudness up by frequency across, while the one illustrating this post is loudness by colour, frequency up and time across.

Rules for pitch

So Rule No.1 is that the pitch is given by the highest common divisor of the frequencies of the principal components of the sound in question. In this we allow a bit of rounding, so we might round 103.4Hz to 100Hz. With exactly how much being one of many tricky questions.
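Rule No.1 can be sketched as a search for the largest frequency which divides all the components, near enough. What follows is only one way of doing the rounding: the 4% tolerance, the limit on the search and the function name are all assumptions made for illustration, not settled answers to the tricky question of how much rounding to allow.

```python
# Sketch of Rule No.1: pitch as an approximate highest common divisor.
# Assumptions: a 4% tolerance and a search limited to the lowest component
# divided by 1, 2, ... 20. Both numbers are illustrative only.

def rule_one_pitch(frequencies, tolerance=0.04, max_divisor=20):
    """Return an approximate common divisor of the frequencies, or None."""
    lowest = min(frequencies)
    # Try the largest candidates first, so the first hit is the highest divisor.
    for divisor in range(1, max_divisor + 1):
        candidate = lowest / divisor
        # Nearest whole-number multiple of the candidate for each component.
        multiples = [max(1, round(f / candidate)) for f in frequencies]
        if all(abs(f - n * candidate) <= tolerance * n * candidate
               for f, n in zip(frequencies, multiples)):
            # Average the implied fundamentals to smooth out the rounding.
            return sum(f / n for f, n in zip(frequencies, multiples)) / len(frequencies)
    return None

print(rule_one_pitch([100, 200, 400, 500]))   # fundamental present
print(rule_one_pitch([200, 300, 500]))        # fundamental missing
print(rule_one_pitch([103.4, 200, 300]))      # slightly mistuned bottom component
```

The first two mixtures both come out at 100Hz, while the mistuned one comes out at a little over 101Hz, the average of what the three components imply.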

If you fancy your singing voice, you can try singing scales to your laptop, which does then produce quite respectable lines of this sort, with no omissions, bottom or otherwise.

The first complication after that is that Rule No.1 does not work when the principal components have rather high frequencies compared to that of the highest common divisor, say a mixture of 2,000Hz, 2,100Hz and 2,200Hz. Rule No.2 is that in cases like this the pitch is the average.

In other circumstances again, it might be the loudest, so 2,000Hz might be picked from the three frequencies just given. This is Rule No.3.
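Rules No.2 and No.3 are simple enough to sketch side by side. The loudness figures attached to the three components below are invented for the example, as are the function names; nothing here says when the brain might prefer one rule to the other.

```python
# Sketch of Rule No.2 (average) and Rule No.3 (loudest component).
# Each component is a (frequency in Hz, loudness) pair; the loudness
# values are hypothetical, chosen so that 2,000Hz is the loudest.

def rule_two_pitch(components):
    """Plain average of the component frequencies."""
    return sum(freq for freq, _ in components) / len(components)

def rule_three_pitch(components):
    """Frequency of the loudest component."""
    return max(components, key=lambda c: c[1])[0]

mixture = [(2000, 1.0), (2100, 0.6), (2200, 0.4)]

print(rule_two_pitch(mixture))
print(rule_three_pitch(mixture))
```

With these numbers Rule No.2 gives 2,100Hz and Rule No.3 gives 2,000Hz, where a strict reading of Rule No.1 would have given 100Hz.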

Another quite different way of looking at pitch is to say for Rule No.4 that a sound has pitch if it repeats, with the length of the repeat lying roughly in the range 0.25ms to 25ms. With the ear being able to detect pitch on the basis of some quite small number of repeats. The frequency of the pitch is given by the reciprocal of the length of the repeat, so something from 4,000Hz to 40Hz, with our sense of pitch deteriorating quite quickly outside of this range. With one technical worry being that to do 25ms, quite a long time in brain speak, the brain needs to be able to store information, in one way or another, for several times that sort of time, say 50ms or more.

This sound might just be a few milliseconds of noise, completely random. But if that completely random segment of sound repeats, it will acquire pitch.
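Rule No.4 can be tried out on exactly this sort of repeated noise. The sketch below looks for the shortest shift, between 0.25ms and 25ms, at which the sound lines up with itself; this is a plain autocorrelation search, a standard signal processing technique swapped in for illustration, not a claim about what the ear does. The sample rate, the threshold of 0.9 and the function name are assumptions.

```python
import random

# Sketch of Rule No.4: find the shortest repeat between 0.25ms and 25ms.
# Assumptions: 8,000 samples a second and a similarity threshold of 0.9.

def repeat_period(samples, sample_rate, threshold=0.9):
    """Return the length of the shortest repeat in seconds, or None."""
    min_lag = max(1, round(0.00025 * sample_rate))
    max_lag = min(round(0.025 * sample_rate), len(samples) // 2)
    for lag in range(min_lag, max_lag + 1):
        # Compare the sound with a copy of itself shifted by `lag` samples.
        head, tail = samples[:-lag], samples[lag:]
        dot = sum(a * b for a, b in zip(head, tail))
        norm = (sum(a * a for a in head) * sum(b * b for b in tail)) ** 0.5
        if norm > 0 and dot / norm >= threshold:
            return lag / sample_rate
    return None

rng = random.Random(0)
sample_rate = 8000
segment = [rng.gauss(0, 1) for _ in range(80)]       # 10ms of random noise
repeated = segment * 5                                # the same 10ms, five times
unrepeated = [rng.gauss(0, 1) for _ in range(400)]    # 50ms of fresh noise

period = repeat_period(repeated, sample_rate)
print(1 / period)                                     # pitch in Hz
print(repeat_period(unrepeated, sample_rate))         # no repeat found
```

The repeated segment gives a repeat of 10ms and so a pitch of 100Hz, while the unrepeated noise gives no repeat, and so no pitch, at all. Note that the search is given 50ms of sound in order to be allowed to look for repeats of up to 25ms.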

Figure 2

With the famous Fourier series (of reference 4) providing a bridge back to where we started, an expansion which says that something that repeats can be expressed as a sum of sinusoidal terms. It is illustrated in the snap above, with ‘x’ being time, the ‘n’ being positive integers, that is to say whole numbers, the ‘A’s being real numbers, the weights, as it were, ‘P’ being the period or duration of the repeat, the ‘ϕ’s being the phases of the various terms and with each term corresponding, more or less, to a horizontal line on the spectrogram of Figure 1. And with ‘π’ being pi, the very important (and transcendental) number which tells us the circumference and area of a circle for a given radius.

One measure of the complexity of such a periodic sound is the number of terms one needs to add up to obtain a reasonable approximation to what we started with.
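The sum of sinusoidal terms can be sketched directly. This assumes the amplitude and phase form given at reference 4, with term n being A times the cosine of (2πnx/P − ϕ), and leaves out any constant term; the function name and the particular weights are made up. The example uses the bell-like mixture of 200Hz, 300Hz and 500Hz, that is to say terms 2, 3 and 5 of a repeat of 10ms.

```python
import math

# Sketch: a periodic sound as a sum of sinusoidal terms.
# Assumption: amplitude and phase form, term n = A_n * cos(2*pi*n*x/P - phi_n),
# with no constant term. The weights and phases below are illustrative only.

def fourier_sum(x, amplitudes, phases, period):
    """Value at time x of the sum of the given terms, keyed by n."""
    return sum(
        a * math.cos(2 * math.pi * n * x / period - phases.get(n, 0.0))
        for n, a in amplitudes.items()
    )

period = 0.01                               # 10ms, so a pitch of 100Hz
amplitudes = {2: 1.0, 3: 0.5, 5: 0.25}      # 200Hz, 300Hz and 500Hz; no 100Hz term
phases = {3: 0.3}                           # unlisted phases are taken as zero

x = 0.001
now = fourier_sum(x, amplitudes, phases, period)
one_period_on = fourier_sum(x + period, amplitudes, phases, period)
half_period_on = fourier_sum(x + period / 2, amplitudes, phases, period)
print(now, one_period_on, half_period_on)
```

The first two values agree, to within rounding, and the third does not: the mixture repeats every 10ms and no sooner, even though there is no 100Hz term in it. Which is Rule No.1 and Rule No.4 arriving at the same answer by different routes.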

If we worked harder, we could probably come up with a set of rules which defined pitch, an algorithm if you will; something which makes what was subjective into something which is objective. Something which you could code into your laptop and which would be able to tell you at what pitch you were singing or what the pitch was, if any, of any other sort of sound you might care to provide.

Sounds which are not pitches

Then no doubt experiments have been done on asking people about their perception of notes which have been mixed up. Do they hear a single pitch or several notes, each with their own pitch? Guessing, the mixture of one note and a second note an octave above the first (double the frequency) sounds very like one note with the pitch of the first, as all the components of the second are also components of the first.
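The arithmetic behind that guess can be checked for idealised notes, that is to say notes whose components are exact whole-number multiples of the bottom one. The function name, the whole-number frequencies and the 2,000Hz ceiling are assumptions made to keep the sketch short.

```python
# Sketch: components of idealised notes as sets of whole-number frequencies.
# Assumption: every harmonic up to the ceiling is present.

def harmonics(fundamental_hz, limit_hz):
    """All whole-number multiples of the fundamental up to the limit."""
    return set(range(fundamental_hz, limit_hz + 1, fundamental_hz))

low = harmonics(100, 2000)
octave_above = harmonics(200, 2000)
fifth_above = harmonics(150, 2000)

print(octave_above <= low)    # is every component of the octave already in the low note?
print(fifth_above <= low)     # and of the note a fifth above?
```

The octave adds nothing new to the low note, while the fifth brings in components such as 150Hz and 450Hz which the low note does not have.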

But what about more complicated mixtures, perhaps of notes of varying durations? What about sounds which vary in time, what about music? Or the sound of a mountain stream? All kinds of possibilities, which lead to the thought that sounds with pitch are actually particularly simple forms of sound, only making up a small proportion of all possible sounds, rather in the way that the integers only make up a small proportion of all possible numbers.

Brains

What is more tricky still is finding out how the brain does all this, remembering here that most of us have a very good sense of pitch - and I dare say that it has been demonstrated that some animals do too. This despite the huge amount of work which has been done over the years on the perception of pitch, the subjective experience.

It has been demonstrated that the signals running from the ears to the brain stem contain all that is needed to do the sums - but how exactly the brain actually does the sums is quite another question.

Another tricky question is where and how the answer is stored in the brain. No-one, at least at the time reference 5 was written, has found neurons in the brain which code for pitch, in the way that people have found neurons in the brain which code for particular faces, perhaps those of Donald Trump or Jeremy Corbyn. Perhaps the answer is that it is not like that, that there is not a bit of brain containing neurons tuned for all the various pitches of interest.

So plenty of work yet to be done.

References

Reference 1: http://psmv3.blogspot.co.uk/2018/04/advertisement.html.

Reference 2: https://www.auditoryneuroscience.com/spectrogram.

Reference 3: http://psmv3.blogspot.co.uk/2017/01/virtual-pitch.html.

Reference 4: https://en.wikipedia.org/wiki/Fourier_series.

Reference 5: Auditory Neuroscience: Making Sense of Sound - Jan Schnupp, Israel Nelken and Andrew King - 2012.

Reference 6: http://peabody.sapp.org/class/st2/lab/notehz/. Being the work of Craig Stuart Sapp now or lately at the Computer Music Department, Peabody Conservatory of Music, Johns Hopkins University.

Friday, 6 April 2018
