Digital sampling, ‘PCM sampling’, or just ‘sampling’, is the process of representing a signal waveform as a series of numbers that record the signal’s amplitude, measured at regular intervals. This process is widely used in modern audio and video systems, including television and telephone networks.
Strictly speaking, sampling must be regarded as a separate process from digitising. Sampling produces a series of values that may be represented in various ways: the output can be a series of analog pulses (pulse-amplitude modulation, sometimes called pulse-height modulation), pulse-position modulation or pulse-width modulation. Most commonly, though, the samples are represented by binary numbers, because these are amenable to storage and processing in digital systems; this is known as pulse-code modulation (PCM).
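The distinction between the two stages can be sketched in Python. This is a minimal illustration, assuming a signal given as a function of time in seconds with values scaled to the range -1.0 to 1.0; the names `sample`, `digitise` and `tone` are hypothetical, and round-to-nearest 16-bit two's-complement coding is assumed as a typical PCM choice.

```python
import math

def sample(signal, fs, n):
    """Sampling: measure signal(t) at regular intervals of 1/fs seconds.
    The results are still continuous-valued (floating-point) numbers."""
    return [signal(k / fs) for k in range(n)]

def digitise(values, bits=16):
    """Digitising (PCM): round each value in [-1.0, 1.0] to a signed integer code."""
    full_scale = 2 ** (bits - 1)            # 32768 for 16-bit
    lo, hi = -full_scale, full_scale - 1    # two's-complement code range
    return [max(lo, min(hi, math.floor(v * full_scale + 0.5))) for v in values]

def tone(t):
    """A 1 kHz sine wave at half of full scale (an illustrative test signal)."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)

samples = sample(tone, fs=48000, n=48)   # one millisecond of signal
codes = digitise(samples)                # the PCM representation of the same samples
```

`samples` holds the sampled but not yet digitised values; `codes` holds the same 48 values after PCM coding, for example 16384 at each positive peak of the half-scale sine.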
The basic theory of digital sampling as applied to audio and video is often misunderstood because of the assumption that the samples are the signal. This misconception is understandable, since it is indeed possible to listen to digital samples directly, or to view video samples directly, but doing so must be regarded as a ‘cheap and cheerful’ approach that leaves out a vital component of basic sampling theory: the ‘reconstruction filter’.
It is intuitive that sampling an original waveform and then presenting the sample values as joined-up segments will produce an approximation to the original, but this view misses the real insight that Nyquist had.
What Nyquist realised was that if an original signal is first filtered (band-limited) to remove all frequencies above half the sampling rate, known as the Nyquist frequency, then the exact band-limited waveform can be reproduced by passing the samples through a ‘reconstruction filter’, which is simply another low-pass filter with its cut-off at the Nyquist frequency. There is no approximation and no distortion: what goes in comes out, apart from any components above the Nyquist frequency. Such components are a common source of colouration in poorly designed digital equipment: if they reach the sampler they fold back (alias) into the audio band, and if they are left in the output they can intermodulate in later stages to produce audible artefacts.
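The difference between joined-up segments and an ideal reconstruction filter can be checked numerically. The sketch below assumes time measured in sample periods and a tone at 0.3 cycles per sample (below the Nyquist frequency of 0.5). It reconstructs a value halfway between two samples in two ways: by linear interpolation, and by Whittaker-Shannon interpolation, in which each sample drives a sinc pulse, the impulse response of an ideal low-pass filter cut off at the Nyquist frequency. The finite record of 2001 samples and the helper names are assumptions; a true brick-wall filter would need infinitely many samples.

```python
import math

def sinc(u):
    """Normalised sinc: impulse response of an ideal low-pass filter cut off at fs/2."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct_ideal(samples, t):
    """Whittaker-Shannon interpolation: sum of one sinc pulse per sample.
    t is in sample periods (sample n sits at t = n)."""
    return sum(x * sinc(t - n) for n, x in enumerate(samples))

def reconstruct_linear(samples, t):
    """'Joined-up segments': a straight line between neighbouring samples."""
    n = int(math.floor(t))
    frac = t - n
    return (1 - frac) * samples[n] + frac * samples[n + 1]

f = 0.3                 # tone frequency in cycles per sample (Nyquist is 0.5)
N = 2001                # finite record length (assumption)
t0 = N // 2 + 0.5       # a point halfway between two samples near the middle
samples = [math.cos(2 * math.pi * f * (n - t0)) for n in range(N)]

true_value = 1.0        # cos(0) at t = t0
err_ideal = abs(reconstruct_ideal(samples, t0) - true_value)
err_linear = abs(reconstruct_linear(samples, t0) - true_value)
```

Here the straight-line estimate is off by about 0.41 of the signal amplitude, while the sinc sum agrees with the true value to well under 1%; the small residual comes from truncating the sinc pulses to a finite record.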
Errors resulting from the Nyquist limitation
This is only literally true if both filters (the band-limiting filter before sampling and the reconstruction filter after it) are ideal ‘brick-wall filters’, in other words filters that pass everything below the Nyquist frequency and cut off totally above it. Even if such filters were realisable in practice, basic theory says that they would have infinite delay: they would take forever to produce any output. This must not be seen as an obstacle to perfect reproduction, though. By leaving a ‘guard band’ between the top of the wanted band and the Nyquist frequency (on CD, for example, between 20 kHz and 22.05 kHz), it is possible to use imperfect filters to obtain output that is as accurate as we care to make it, within the bandwidth limitation.
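The cost of a narrow guard band can be estimated with Kaiser's empirical formula for the order of a Kaiser-window FIR low-pass filter. This is one design method among many, used here only as an illustration; the 96 dB stopband attenuation, the placement of the stopband edge exactly at the Nyquist frequency, and the helper names are assumptions. A linear-phase FIR filter of order M delays the signal by M/2 samples, so as the guard band shrinks towards zero, both the order and the delay grow without limit.

```python
import math

def kaiser_fir_order(atten_db, pass_hz, stop_hz, fs):
    """Kaiser's empirical estimate of the FIR order needed for a given
    stopband attenuation and transition (guard) band."""
    delta_omega = 2 * math.pi * (stop_hz - pass_hz) / fs   # transition width, rad/sample
    return math.ceil((atten_db - 7.95) / (2.285 * delta_omega))

def linear_phase_delay_s(order, fs):
    """A linear-phase FIR filter of order M delays every frequency by M/2 samples."""
    return (order / 2) / fs

fs = 44100
for pass_hz in (20000, 21000, 21900, 22000):
    m = kaiser_fir_order(96, pass_hz, fs / 2, fs)
    print(f"pass {pass_hz} Hz: order {m}, delay {1000 * linear_phase_delay_s(m, fs):.2f} ms")
```

With CD parameters and a 20 kHz passband, the estimate comes out at roughly 130 taps, a delay of about 1.5 ms; squeezing the passband up to 22 kHz pushes the delay to tens of milliseconds.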
Quantising errors resulting from the process of digitisation
In digital sampling, the accuracy of the resulting waveform is also affected by the stepwise nature of the digitising process, which gives rise to what is referred to as ‘quantisation error’. This error, which varies from sample to sample, is not necessarily random: it may be correlated with the signal, producing serious audible distortion in audio systems that do not take steps to eliminate it. Some early CDs suffered from quantising distortion, which was especially audible on quiet piano notes, adding a granular noise that sounded like ‘sand in the speakers’; it could also be heard as spurious tones accompanying higher frequencies. Quantising distortion soon became a thing of the past, though, with a better understanding of ‘dither’: adding a low level of noise to the signal before it is digitised, in order to randomise the individual sample errors and hence ‘de-correlate’ them from the signal, so that all that is heard is noise (hiss).
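The effect can be shown with a steady input lying 0.3 of a quantising step (LSB) above zero. The minimal sketch below uses simple rectangular (uniform) dither of ±0.5 LSB purely as an illustration; the seed, level and sample count are arbitrary assumptions. Without dither every sample rounds to the same value, so the error is a fixed function of the signal and the sub-step detail is lost; with dither the error becomes random and the average output recovers the input level.

```python
import math
import random

def quantise(v):
    """Round to the nearest quantisation level (1 LSB = 1.0)."""
    return math.floor(v + 0.5)

rng = random.Random(1)   # fixed seed so the sketch is repeatable
level = 0.3              # a steady input only 0.3 LSB above zero
n = 100_000

plain = [quantise(level) for _ in range(n)]
dithered = [quantise(level + rng.uniform(-0.5, 0.5)) for _ in range(n)]

mean_plain = sum(plain) / n         # every sample rounds to 0: the 0.3 LSB is lost
mean_dithered = sum(dithered) / n   # averages back to about 0.3
```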
Digital sampling in audio
Audio waveforms are commonly sampled at 44.1 kHz (CD) or at 48, 96 or 192 kHz (professional audio). CDs use 16-bit digital representation, and would sound ‘granular’ because of quantising distortion, were it not for the addition of a small amount of noise, known as ‘dither’, to the signal before digitisation. Adding dither eliminates this granularity and gives very low distortion, at the expense of a small increase in noise level. Measured using ITU-R 468 noise weighting, the resulting noise is about 66 dB below alignment level, or 84 dB below digital full scale (FS), which is somewhat lower than the microphone noise level on most recordings, and hence of no consequence (see ‘Programme Levels’ for more on this).
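A conversion to 16-bit codes with dither might look like the sketch below. It assumes floating-point input scaled to -1.0 to 1.0 and uses triangular-PDF dither made from the sum of two independent uniform values, a common choice; `to_pcm16` is a hypothetical helper, not a library function.

```python
import math
import random

def to_pcm16(samples, rng):
    """Convert floats in [-1.0, 1.0] to 16-bit integer codes with TPDF dither.

    The dither is the sum of two independent uniform values of +/-0.5 LSB,
    giving a triangular distribution spanning +/-1 LSB."""
    scale = 32768
    out = []
    for x in samples:
        dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        code = math.floor(x * scale + dither + 0.5)   # add dither, then round
        out.append(max(-32768, min(32767, code)))     # clip to the 16-bit range
    return out

rng = random.Random(0)                   # fixed seed for repeatability
silence = to_pcm16([0.0] * 1000, rng)    # digital silence in, dithered codes out
```

Fed with digital silence, the output flickers between -1, 0 and +1: this is the small, steady hiss that replaces the granular distortion.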
Optimising dither waveforms
In a seminal paper published in the AES Journal, Lipshitz and Vanderkooy pointed out that noise types with different probability density functions (PDFs) behave differently when used as dither signals, and suggested optimal levels of dither signal for audio. Gaussian noise requires a higher level to eliminate distortion fully than rectangular-PDF (RPDF) or triangular-PDF (TPDF) noise. TPDF noise, with a peak-to-peak amplitude of 2 LSB, has the advantage of eliminating distortion with a lower level of added noise than Gaussian noise, while also minimising ‘noise modulation’. Noise modulation refers to audible changes in the residual noise on low-level music, which are found to draw attention to the noise.
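Noise modulation can be demonstrated by measuring the variance of the total error (output minus input) at several steady input levels. This sketch assumes 1 LSB = 1.0, round-to-nearest quantisation and an arbitrary seed, and the helper names are hypothetical. RPDF dither of 1 LSB peak-to-peak leaves the error variance dependent on the input level, whereas TPDF dither of 2 LSB peak-to-peak keeps it at a constant 1/4 LSB² whatever the level.

```python
import math
import random

def total_error_variance(level, dither, rng, n=100_000):
    """Variance of (quantised output - input) for a steady input `level` in LSB."""
    errs = [math.floor(level + dither(rng) + 0.5) - level for _ in range(n)]
    mean = sum(errs) / n
    return sum((e - mean) ** 2 for e in errs) / n

def rpdf(rng):
    return rng.uniform(-0.5, 0.5)                            # 1 LSB peak-to-peak

def tpdf(rng):
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)   # 2 LSB peak-to-peak

rng = random.Random(2)
levels = (0.0, 0.25, 0.5)
results = {name: [total_error_variance(x, d, rng) for x in levels]
           for name, d in (("RPDF", rpdf), ("TPDF", tpdf))}
for name, variances in results.items():
    print(name, [round(v, 3) for v in variances])
```

The RPDF row runs from zero noise at 0.0 LSB to 0.25 LSB² at 0.5 LSB, so the hiss rises and falls with the music; the TPDF row stays close to 0.25 throughout.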
Noise shaping for lower audibility
A complementary technique, often used together with dither, is noise shaping. This involves a feedback process in which the final digitised signal is compared with the original, and the instantaneous errors on successive past samples are integrated and used to determine whether the next sample is rounded up or down. This smooths out the errors in a way that alters the spectrum of the noise. By the neat device of inserting a weighting filter in the feedback path, the noise can be shifted towards frequencies where, according to the ‘equal-loudness contours’, the human ear is least sensitive, producing a lower subjective noise level (typically 68 to 70 dB below alignment level, ITU-R 468 weighted).
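The feedback idea can be sketched with the simplest possible shaper, in which the rounding error made on each sample is subtracted from the next input before it is rounded. This is a minimal first-order illustration, not a description of any particular product: a practical system would use a higher-order weighting filter matched to the ear's sensitivity and would normally add dither as well. The test signal and the helper name are assumptions. Because the output error is the difference between successive rounding errors, its running sum (the integrated error) never exceeds half a step, which keeps low-frequency error small and pushes the noise towards high frequencies.

```python
import math
from itertools import accumulate

def noise_shape_first_order(samples):
    """Requantise to integer steps with first-order error feedback.

    The output error is e[n] - e[n-1], so its spectrum rises with
    frequency (noise transfer function 1 - z^-1)."""
    out, prev_err = [], 0.0
    for x in samples:
        v = x - prev_err              # feed back the previous rounding error
        y = math.floor(v + 0.5)       # round to the nearest step
        prev_err = y - v              # error made on this sample, |e| <= 0.5
        out.append(y)
    return out

signal = [20.3 * math.sin(2 * math.pi * k / 64) for k in range(4096)]  # levels in LSB
shaped = noise_shape_first_order(signal)
err = [y - x for y, x in zip(shaped, signal)]
running = list(accumulate(err))            # the integrated error
print(max(abs(r) for r in running))        # stays within half a step
```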
24-bit and 96kHz pro-audio formats
24-bit audio does not require added dither, because in practice the noise level of the analogue-to-digital converter is far higher than the level of any dither that might be applied; the converter's own noise already acts as dither.
The trend towards higher sampling rates, at two or four times the basic requirement, has not been justified theoretically or shown to make any audible difference, even under the most critical listening conditions. Nevertheless, a lot of 96 kHz equipment is now used in studio recording, and ‘super-audio’ formats are being promised to consumers, mostly as a DVD option.
Most articles that claim to justify a need for more than 16 bits and 48 kHz state that the ‘dynamic range’ of 16-bit audio is 96 dB, a figure commonly derived from the simple ratio of full-scale level to one quantising step, which is 2 to the power 16 (65,536). This calculation ignores several facts: peak level is not the maximum permitted sine-wave signal level (a sine wave's rms value is its peak divided by √2); the quantising step size is not the rms noise level (undithered rounding error has an rms value of one step divided by √12); and even if it were, a flat rms figure would not represent perceived loudness without the application of the ITU-R 468 weighting function. A proper analysis of typical programme levels throughout the audio chain shows that the capabilities of well-engineered 16-bit recording far exceed those of the very best hi-fi systems, with microphone noise and loudspeaker headroom being the real limiting factors.
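The arithmetic behind these points can be sketched for an ideal 16-bit quantiser. The calculation is unweighted, so its results are not directly comparable with the ITU-R 468-weighted figures quoted earlier; it assumes a full-scale sine wave as the reference, ideal rounding for the undithered case, and TPDF dither for the dithered case.

```python
import math

BITS = 16
steps = 2 ** BITS

# The commonly quoted figure: full-scale range divided by one quantising step.
naive_db = 20 * math.log10(steps)

# A full-scale sine peaks at 2**(BITS-1) steps; its rms is that divided by sqrt(2).
sine_rms = 2 ** (BITS - 1) / math.sqrt(2)

# Rounding error is uniform over one step, so its rms is 1/sqrt(12) of a step.
noise_rms_plain = 1 / math.sqrt(12)
# TPDF dither adds variance 1/6 step^2, making the total error variance 1/4.
noise_rms_tpdf = math.sqrt(1 / 12 + 1 / 6)

snr_plain_db = 20 * math.log10(sine_rms / noise_rms_plain)
snr_tpdf_db = 20 * math.log10(sine_rms / noise_rms_tpdf)
print(f"{naive_db:.2f} {snr_plain_db:.2f} {snr_tpdf_db:.2f}")
```

The commonly quoted 96.3 dB rises to about 98.1 dB when a full-scale sine is compared with the rms rounding noise, and falls to about 93.3 dB once TPDF dither is added.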
There is, however, a good case for 24-bit recording in the live studio, because it allows greater headroom (often 24 dB or more, rather than 18 dB) to be left on the recording without compromising the noise floor. This means that brief peaks are not harshly clipped, but can be compressed or soft-limited later to suit the final medium.