‘High Fidelity’ reproduction was something we could only strive towards in the 1960s, because the intrinsic noise level of even the best analog tape recordings was simply too high. Then the introduction of Dolby A noise reduction made near-perfect master recordings possible. With the introduction of the CD around 1983 the public could suddenly hear the full quality of the master recording, but a new problem arrived with it, and it was never properly explained. Some people thought the new CDs sounded ‘harsh’, and in some cases they did.
The Need for ‘Headroom’
For many years a myth circulated in the Hi-Fi world that the new CD players had very high output levels and would overload normal inputs. Devices were even sold to attenuate the output. The truth was that CD players did produce higher levels than other devices, but only on brief peaks. Manufacturers had not chosen to make CDs louder; in fact the ‘typical’ loudness of a 1980s CD played without any attenuation was much the same as from radio or compact cassette, but the waveform contained brief peaks at levels that had never been seen before – recording engineers had increased the available ‘headroom’. It was true that some audio system inputs could not cope, and clipped the brief peaks harshly, causing distortion at all volume settings. Even where the inputs coped, power amplifiers and speakers could not handle the brief peaks at anything but the quietest listening levels. At typical volume settings, the peaks were far above what the power amplifier could deliver, and were simply ‘clipped’, giving harsh distortion. Some blamed the power amps, noticing that older valve (tube) amps sounded much better, and a new market sprang up in valve amps, sold to high-end connoisseurs at ridiculous prices. Others blamed the new ‘digital’ sound as somehow fundamentally flawed – which sometimes it was, because digital convertors, their associated filters, and ‘dithering’ had yet to be perfected. The biggest problem, though, went unrecognised: the problem of ‘Headroom Management’.
High fidelity recordings, like live sound, have enormous ‘dynamic range’, but the term dynamic range can have many meanings. Music can obviously have loud and quiet passages, varying over minutes, and loud sounds within those passages (like drum beats and cymbal clashes) lasting seconds or less; but looked at on a much shorter time-scale, the waveform of music also contains very brief peaks, often arising from the initial strike of a cymbal or drum, or the initial plucking of a string. These peaks are so loud that even the best professional equipment of today cannot usually reproduce them. In modern terms, audio waveforms are to some extent ‘fractal’ – the closer you look, the higher the peaks – and this is even true when you look with wider bandwidth, because impulsive sounds like cymbal crashes contain frequencies way above what we can hear. So the peaks get even higher when we use wide-bandwidth microphones, and though their components may be inaudible they still have to be handled properly as peaks, without clipping, if they are not to cause mysterious effects. Herein lies the cause of yet more myths concerning the ‘different’ sound of high-sample-rate digital ‘super-audio’, but that’s a topic for detailed examination elsewhere.
Though very loud, brief peaks do not necessarily sound loud, because our ears cannot respond instantly to bursts of sound; in fact they may hardly contribute to the apparent loudness of a recording. What they do add is ‘sparkle’, especially on cymbals, bells, and plucked strings. In the early days of recording, brief peaks were simply lost in the process of recording to analog tape. Recording engineers knew that the more they pushed up the recording level, so that the meter went ‘into the red’, the less intrusive the inevitable tape hiss would be. The ‘soft-limiting’ characteristic inherent in the very process of tape magnetisation, together with the brief duration (milliseconds) of short peaks, meant that this could be done without obvious distortion, and the BBC, realising this, developed the PPM (Peak Programme Meter), with an ‘integration time’ of about 10 milliseconds – just slow enough to ignore the very peaks that the engineer could safely afford to lose to soft-limiting. For decades fast peaks were banished from recordings, not seen on meters, and hardly missed, because systems able to record and reproduce them simply didn’t exist. Audio lacked sparkle.
When CDs arrived, some thought they were artificial – too bright and ‘electronic-sounding’; others liked the new brightness, but few realised that it was not just the super-flat frequency response and ultra-low noise level that were making the difference. Engineers had been recording at lower levels for some time, now that Dolby had drastically reduced the noise level of tape, so fast peaks were being better preserved on master tapes. CD had an even lower noise level than the master tapes, so these could be transferred directly to the new medium without losing anything – provided that full-scale digital level was reserved for the accommodation of fast peaks only, with the average level kept well down. Most early CDs – Pink Floyd, The Beatles, and all the ‘back-catalogue’, as well as the new Dire Straits and others – had a ‘typical’ level around 18dB below maximum, and although most originated as analog tapes they are to this day the best test material available to the listening public. Headroom had arrived, but not for long.
As noted already, power amps and speakers were being strained by the new fast-peak content. Where previously the problem of headroom management had been handled almost unnoticed by the ubiquitous analog tape machine, now it was shifted to the listening equipment, which could only clip the peaks – the worst thing possible, because clipping produces high-order harmonics, the harshest form of distortion. Most audio systems simply wouldn’t go loud enough without distorting. Brief peaks 20dB above ‘normal’ programme level have an amplitude ten times higher, but this corresponds to 100 times the power. A typical home system, with 100 watts per channel and domestic speakers (86dB SPL for 1 watt), can produce a maximum sound level of around 105dB SPL at the listening position, and though this sounds a lot, it falls short of what is needed to produce realistic levels on music. Remember, this is the maximum level. Because music bounces up and down in level, it must spend much of its time below this, so on a sound level meter measuring A-weighted RMS (root mean square) the maximum reading on music will always be lower, perhaps 90dBA SPL. But fast peaks need around 20dB more, though they contribute little to the loudness – and that’s 10 times the amplitude, or 100 times the power: 10,000 watts to produce 125dB SPL. Yes, a home audio system, if it is to cope with high-fidelity reproduction at a level that is quite loud, but still not as loud as the real thing, needs 10kW of amplifiers – and speakers to match. That’s approaching the sort of power level used at outdoor pop concerts. It sounds all wrong, but it’s true!
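The arithmetic is easy to check with a few lines of Python (a minimal sketch using only the figures quoted above; the function name is mine, and room effects are ignored apart from the ~1dB the text assumes between the 1-metre rating point and the listening position):

```python
import math

def spl_from_power(power_watts, sensitivity_db_1w):
    """SPL a speaker produces at its rated measuring distance for a
    given amplifier power, from its 1-watt sensitivity figure."""
    return sensitivity_db_1w + 10 * math.log10(power_watts)

# The figures used in the text: 100 W into an 86 dB/1 W domestic speaker.
print(spl_from_power(100, 86))        # 106.0 dB SPL at 1 m (~105 dB at the seat)

# Fast peaks sit ~20 dB higher: +20 dB is 10x the amplitude, 100x the power.
peak_power = 100 * 10 ** (20 / 10)
print(peak_power)                     # 10000.0 watts
print(spl_from_power(peak_power, 86)) # 126.0 dB SPL at 1 m (~125 dB at the seat)
```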
Before going any further, it should be noted that installing 10kW amps in the average listening room would be dangerous – we need brief peaks at that level, but we also need seriously good protection from the possibility of continuous sound anywhere near it (120dB SPL or more), if we are not to suffer permanent hearing damage the first time we plug in a phono lead with the amp turned on!
Radio broadcasters traditionally worked with 8dB of headroom, so in order to cope with CDs they had to use compression, and a new breed of compressor arrived. Where early compressors like the dbx units had suffered from ‘pumping’ – the tendency of the whole sound to pump up and down in level following the bass – new systems like ‘Optimod’ split the signal into many frequency bands, compressed each one individually, and then remixed them. In effect they were re-composing the original balance of the music, but because they got rid of fast peaks they produced a ‘punchy’ sound, which could now be turned up loud without distorting. The poorer the replay system, the more it gained from the new compression, and radio stations realised that they could compete for attention by pushing the compression to the extreme. It wasn’t hi-fi, but it was generally acceptable, and soon most CDs would also use extreme compression in order to compete. A compressed CD allowed the volume to be turned up without distortion! And the days of high fidelity were over. A true measurement of distortion on the output of a typical top-quality audio system today – one based on comparing the output with the original sound – would give a figure of quite possibly 1000%; forget those 0.001% THD equipment specs. Of course, we don’t measure distortion that way, and we don’t put distortion specs on compressors, but we probably should.
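The band-splitting idea can be illustrated with a toy two-band sketch in Python. Everything here is an illustrative simplification of my own – a single one-pole crossover and a crude static gain law – whereas a real ‘Optimod’-style processor uses many bands, properly designed crossover filters, and carefully tuned time constants:

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Split off the low band with a one-pole filter; the high band is
    simply the remainder, so low + high reconstructs the input exactly."""
    a = math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    low, state = [], 0.0
    for x in samples:
        state = (1 - a) * x + a * state
        low.append(state)
    return low

def compress_band(band, threshold, ratio):
    """Crude static compression: samples above the threshold are pulled
    down by the given ratio (no attack/release dynamics)."""
    out = []
    for x in band:
        mag = abs(x)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag, x))
    return out

def two_band_compress(samples, sample_rate, crossover_hz=200,
                      threshold=0.5, ratio=4.0):
    low = one_pole_lowpass(samples, crossover_hz, sample_rate)
    high = [x - l for x, l in zip(samples, low)]
    # Compress each band independently, then remix.
    return [l + h for l, h in
            zip(compress_band(low, threshold, ratio),
                compress_band(high, threshold, ratio))]
```

Because each band gets its own gain law, a loud bass note no longer drags the treble down with it – which is exactly why the multiband approach avoided the ‘pumping’ of the early single-band designs.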
Compression became commonplace, and as cleverer compressors became available, the concept of high fidelity was gradually abandoned, leading to a strange dichotomy. Modern audio sounds good. It is loud and ‘clean’ and noise-free, but it is almost never ‘realistic’. What’s missing is ‘Headroom’. We at last have the technical ability to record with very high fidelity; in fact such are the wonders of modern electronics that almost anyone can do so. We also have the technology to make such recordings available without having to compromise their quality. The problem is that compression is necessary. It’s necessary if the in-car listener is to hear anything but the occasional loud passage above the noise of even a modern car interior. It’s necessary, to a lesser degree, if the average modern sound system is to give its best at the loudness levels people expect for ‘partying’, without the sound being atrociously distorted. And it’s necessary to some degree if broadcasters are to maintain a reasonable ‘signal to noise ratio’ on historically compromised (FM) transmitters. But a compressed signal can never truly be described as ‘high fidelity’.
A Solution for High Fidelity Recording and Reproduction
Compression should be deferred, as far as possible, to the very last stage of the listening chain.
In a separate article, ‘Analysing Programme Levels’, I have shown that current 24-bit recordings can capture the full ‘true dynamic range’ of just about any music source, and that 16-bit recordings such as CDs can preserve most of this range – certainly more than even the best currently available loudspeakers can reproduce. Given that fast peaks need to be preserved for maximum realism, it therefore makes sense to leave around 18dB of headroom on all recordings, using just a small amount of soft limiting to reduce the range of the master recording. If the soft-limiting characteristic were standardised, operating so as to make the top 6dB of all 16-bit recordings span 12dB of input range, then it would even be possible to restore the original peaks on any future system considered capable of handling them. While such a scheme cannot operate without introducing some distortion (because the soft limiting generates harmonics that fall beyond the 20kHz bandwidth of a CD and are lost), this would probably be completely inaudible, given that these levels only accommodate occasional fast peaks.
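In dB terms the proposed characteristic is just a fixed 2:1 law above a knee, and because it is fixed it is exactly invertible. A minimal Python sketch of the idea (a hypothetical illustration of the scheme described above, with my own choice of a hard knee at –6dB, operating on dB levels rather than on sample values as a real soft limiter would):

```python
def limit_db(level_db, knee_db=-6.0, ratio=2.0):
    """Standardised limit: levels above the knee are compressed 2:1,
    so the top 12 dB of input spans the top 6 dB of output."""
    if level_db <= knee_db:
        return level_db
    return knee_db + (level_db - knee_db) / ratio

def restore_db(level_db, knee_db=-6.0, ratio=2.0):
    """Exact inverse: a future system with enough headroom can
    recover the original peak levels from the standardised curve."""
    if level_db <= knee_db:
        return level_db
    return knee_db + (level_db - knee_db) * ratio

print(limit_db(-20.0))            # -20.0: ordinary levels pass untouched
print(limit_db(6.0))              # 0.0: a +6 dB peak just fits full scale
print(restore_db(limit_db(6.0)))  # 6.0: the original peak is recoverable
```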
It then has to be acknowledged that compression is needed on FM broadcasts, but it could be mild, as currently applied to BBC Radios 3 and 4. These stations operate successfully at programme levels well below those of all other broadcasts in order to preserve headroom, though unfortunately they often fail to make use of the headroom available, because compressors are used in the studio or on outside broadcasts without proper consideration of the entire chain.
Digital broadcasts – DAB and DTV – can easily carry the full 16-bit dynamic range through their bit-rate-reduction codecs such as MPEG-2, and should really be working to a new standard with 18dB of headroom. Instead, they seem mostly to be driven by the heavily compressed signals that feed their corresponding FM transmitters, often at close to maximum level. This is far from ideal. If the feed from the studio were always based on 18dB of headroom, and only the FM transmitter were provided with a compressor at its input, then listeners could expect real gains when they listened to digital broadcasts. This assumes that digital radios and set-top boxes have the low noise level needed for such operation, which unfortunately is not currently the case. My Freeview box has processor buzz on it measuring –50dB, 468-weighted, which is 16dB worse than true 16-bit performance (–66dB), and most PCs, even with external ‘sound cards’, have noise levels in the range –50 to –60dB.
Another weakness of the current approach in broadcasting concerns video production. Currently, most television broadcasters require programmes to be submitted with ‘legal sound’, meaning that the sound must be checked on a PPM (IEC type II) and fall within the traditional +8dB limit for broadcasting (PPM6). This makes no sense, because simply reducing the level of the sound in a programme to stay within +8dB is not usually an option: loudness is a matter of artistic composition, and so compression must be used to achieve the desired loudness while staying within the rules. This produces a programme master with dull sound, when it would really be much better to work with 18dB of headroom while editing, and produce a master capable of being used in cinemas or over future broadcast systems with properly managed headroom. Once again, the compressor should be considered a temporary ‘fix’ at the input of the current television transmitter.
With headroom maintained as far as possible right up to the listener’s equipment, it then becomes sensible to consider compression as a means of giving the listener a better experience. Car radios in particular need a compressor option if listeners are to hear anything other than just the loudest passages above engine and road noise. Home systems need a compressor to allow for quiet background listening, and maybe some soft limiting at the power amplifier to ensure that when it is overdriven it overloads gracefully. All these should be optional, though – providing a really effective extra button instead of all those gimmicks like ‘jazz setting’ and ‘speech’, which make no sense at all when we are simply trying to reproduce the original!
Compressing Without Headroom Loss
Once the need for both headroom and compression is properly understood, a new distinction becomes clear. Since our ears respond only partially to brief peaks, they contribute little to overall loudness, even though they sit at levels well above normal. So they don’t need to be removed during compression! In fact there are two reasons to compress: to avoid overloading the channel, or to raise the level of quiet passages for an easier listening experience. When headroom is maintained from studio to listener, it should be perfectly possible to compress using a slow attack time, in a way that raises the level of quiet passages but leaves the fast peaks untouched, providing easier listening without the horrible ‘squashed’ sound currently heard on most broadcasts. This distinction does not yet seem to have been recognised.
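Such a slow, ‘quiet-lifting’ compressor can be sketched in a few lines of Python. This is a toy illustration under my own assumed parameters (target level, maximum boost, 50ms time constant), not a production design – but it shows the principle: the gain is driven by a slow envelope, so it cannot react within the milliseconds a fast peak lasts, and the peak rides through at its natural level relative to the material around it:

```python
import math

def slow_compress(samples, sample_rate, time_const_ms=50.0,
                  target=0.25, max_boost=4.0):
    """Upward compression driven by a slow envelope follower: sustained
    quiet passages are lifted towards the target level, but the gain
    changes far too slowly to squash millisecond-scale peaks."""
    a = math.exp(-1.0 / (sample_rate * time_const_ms / 1000.0))
    env = target                            # start at unity gain
    out = []
    for x in samples:
        env = a * env + (1 - a) * abs(x)    # slow envelope (attack = release here)
        gain = min(max_boost, target / max(env, 1e-9))
        gain = max(gain, 1.0)               # only ever lift quiet material
        out.append(x * gain)
    return out

# A quiet passage (peak 0.05) is lifted; a loud one (peak 0.5) is left alone.
sr = 8000
quiet = [0.05 * math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
lifted = slow_compress(quiet, sr)
```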