Audio Processing Explained by Jeff DePolo

Audio Processing In Mobile Radio
By Jeff DePolo WN3A

A question posed to the Repeater-Builder email list:
I have very little experience with using audio processing. But of the two-way or amateur repeater systems I have heard using it, usually it sounds like crap. Mostly for that reason, I have never used audio compression. It does seem to work well in broadcast. What tricks do the broadcast techs use to make it work well?

The reply by Jeff DePolo, WN3A:
Since one of the things I do for a living is tailoring audio processing for broadcast stations, I'll take a quick shot at this. It's one of those topics that to really do justice would take megabytes, but I'll try to keep it short.

First off, compression is only one of many forms of processing done in a typical broadcast audio chain. A modern FM stereo broadcast processing chain typically looks like:

1. Wideband AGC. Each of the two channels (left and right) has a slow automatic gain control, sometimes operating independently, sometimes coupled together (to prevent excessive left-to-right gain mismatch). AGC is slow gain-riding, sort of an "extra hand on the console pot" to keep the audio levels sane going into the later stages of processing. Attack and release rates for AGC are usually in the range of 1 to 10 dB per second in a typical setup (just to give some idea of how fast it responds). The AGC will also have an upper gain limit, such as +10 dB max, to prevent it from "sucking up" background noise excessively during very quiet passages. AGCs will usually also have a gating circuit to prevent the AGC from gaining up the background noise during silence.

2. Multiband compression. Each of the two channels' audio spectrum is split into multiple sub-bands, such as a low-bass region, a middle-bass region, a midrange region, an upper-midrange region, and a high-end region. Some processor manufacturers give nicknames to these bands to make them more meaningful, such as "bottom end", "warmth", "presence", "brilliance", etc. Each sub-band is then compressed separately (though often there is some intentional coupling between sub-bands). Multiband compression tends to make everything sound roughly the same spectrally, while at the same time increasing the overall loudness. Compression is faster than AGC, typically on the order of 10 to 20 dB per second. The deeper the multiband compressors are driven into gain reduction, and the quicker the attack and release times, the more "smashed" and the less "open" the audio sounds. Multiband compression may make gain adjustments to each of the sub-bands on the order of a few dB at moderate processing levels, to maybe 20 dB when really driven hard. Like the AGC, each sub-band may have a gating control to prevent it from doing strange things during passages that have very unequal spectral power distributions (such as a bass-heavy drum solo or a flute with primarily upper-midrange content). Different formats require different amounts of compression, attack/release times, etc., depending on what the desired sound is to be. Done excessively, multiband compression can "color" the sound of the material, making some frequency ranges sound abnormally louder or quieter than they should be. Everything's a compromise in processing.

3. Either as part of multiband compression, or as a separate stage, come various stages of limiting. Because the attack time of compression isn't instantaneous, there will always be brief amplitude peaks that sneak past the compression. Left unattenuated, these peaks would cause overmodulation, or additional distortion in some of the following hard-limiting stages. Limiting attempts to tame those brief peaks. Limiting is very fast, but again not quite instantaneous. Done right, and not excessively, limiting should be relatively transparent, unlike clipping. There will typically be some form of clipping either incorporated into the limiting stages, or separately after the limiters, to catch the remaining overshoots. How hard the clippers are driven (i.e. how high the relative amplitude peaks are leaving the limiter) affects the overall loudness as well as the distortion products. There are various DSP forms of "distortion-cancelling clipping" that attempt to clip/hard-limit but cancel some of the more objectionable distortion products, either via actual distortion cancelling in the spectral sense, or by using "look-ahead limiting", where the audio is actually delayed a few milliseconds to allow the clipper to act more like a softer limiter rather than producing square clipping edges.

4. Preemphasis is then added, and the audio goes to a high-frequency limiter to prevent the preemphasized high-frequency region of the audio from exceeding modulation limits.

5. The two audio channels then go to the stereo generator to generate the L+R main carrier modulation and the L-R stereophonic subcarrier. In most modern processors, a composite clipper follows the stereo generator to again catch any overshoots and/or to gain a little extra loudness at the expense of added distortion. The 19 kHz pilot tone then gets injected after the composite clipping stages, and then the composite audio goes to the exciter for transmission.
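To make the order of operations in item 5 concrete, here is a minimal Python sketch of a stereo generator with a composite clipper, assuming preemphasized, already-limited left/right samples in the range -1 to 1. The function name `stereo_composite` and the 0.9 program / 0.1 pilot split are illustrative assumptions (pilot injection is commonly somewhere around 8 to 10 percent). The point it shows: the L-R subcarrier (a suppressed-carrier signal at 38 kHz, twice the pilot frequency) is clipped together with L+R, while the 19 kHz pilot is added afterward so the clipper never distorts it.

```python
import math

PILOT_HZ = 19_000.0  # stereo pilot; the L-R subcarrier sits at 2 x pilot = 38 kHz


def stereo_composite(left, right, fs, clip_level=0.9, pilot_level=0.1):
    """Build an FM stereo composite (MPX) signal, sample by sample.

    left/right are preemphasized, already-limited audio in [-1, 1]. The
    composite clipper acts on the program (L+R plus the L-R subcarrier) only;
    the pilot is added afterwards so clipping never distorts it. Levels are
    illustrative: 0.9 program + 0.1 pilot = 1.0 (100% modulation).
    """
    out = []
    for n, (l, r) in enumerate(zip(left, right)):
        t = n / fs
        subcarrier = math.sin(2 * math.pi * 2 * PILOT_HZ * t)
        program = 0.5 * (l + r) + 0.5 * (l - r) * subcarrier   # peaks never exceed max(|l|, |r|)
        program = max(-clip_level, min(clip_level, program))   # composite clipper
        out.append(program + pilot_level * math.sin(2 * math.pi * PILOT_HZ * t))
    return out
```

The sample rate has to be well above twice the top of the L-R sidebands (53 kHz), so something like 192 kHz is a reasonable choice for experimenting with this sketch.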

If you go back through and analyze the above five items, one thing that should stand out is that processing generally follows a slow-to-fast progression as far as time constants go. You start out with a slow AGC, follow it with a faster compressor, then a faster limiter, and finally an instantaneous clipper. Re-arranging that order is rarely beneficial.
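The slow-to-fast ordering can be sketched with a single hypothetical gain-rider stage whose gain may only move a fixed number of dB per second, used three times with increasing rates and followed by an instantaneous clipper. This is a simplified Python illustration, not how any particular processor works: the names `gain_rider` and `slow_to_fast_chain`, the targets, the gate, and the 2000 dB/s limiter rate are assumptions, and the "compressor" and "limiter" here are infinite-ratio stages that simply pull the peak envelope down to a target.

```python
import math


def gain_rider(x, fs, target, rate_db_per_s, max_gain_db, gate=0.0, env_ms=10.0):
    """Feed-forward gain stage whose gain can change by at most rate_db_per_s.

    The stage steers its gain so the input's peak envelope lands on `target`.
    With max_gain_db > 0 it can add gain (AGC-like); with max_gain_db = 0 it
    can only pull gain down (an infinite-ratio compressor/limiter). While the
    envelope is at or below `gate`, the gain is frozen so silence isn't pumped up.
    """
    step_db = rate_db_per_s / fs                      # largest gain change per sample
    decay = math.exp(-1.0 / (env_ms * 1e-3 * fs))    # peak-envelope release
    env, gain_db, out = 0.0, 0.0, []
    for s in x:
        env = max(abs(s), env * decay)
        if env > gate:
            wanted = min(20 * math.log10(target / env), max_gain_db)
            gain_db += max(-step_db, min(step_db, wanted - gain_db))
        out.append(s * 10 ** (gain_db / 20))
    return out


def clipper(x, level):
    """Instantaneous hard clipper: the fastest 'time constant' of all."""
    return [max(-level, min(level, s)) for s in x]


def slow_to_fast_chain(x, fs):
    """AGC -> compressor -> limiter -> clipper, each faster than the last.

    The AGC (5 dB/s) and compressor (20 dB/s) rates follow the ballpark
    figures in the text; the limiter rate, targets and gate are assumptions.
    """
    y = gain_rider(x, fs, target=0.5, rate_db_per_s=5, max_gain_db=10, gate=0.01)
    y = gain_rider(y, fs, target=0.7, rate_db_per_s=20, max_gain_db=0)
    y = gain_rider(y, fs, target=0.9, rate_db_per_s=2000, max_gain_db=0)
    return clipper(y, 1.0)
```

Putting the clipper first instead would let the gain-adding AGC raise peaks back above the clip level afterward, so nothing would be left to catch the final overshoots.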

In addition, there may be other "effects" boxes within the chain somewhere, such as spatial enhancers to "widen" the perceived stereo field, frequency equalization of varying forms, noise reduction, harmonic enhancers, etc.

Now, that's kind of the simple description of how a typical multiband FM processor works. There are various permutations of the above, lots of "patented processes" that make each processor somewhat unique, but that's the high-level view of the various pieces of the puzzle. Virtually all multiband processors nowadays use a whole lot of DSP horsepower to accomplish all of this. Some stations, especially those that try to stay on the "natural" side, may only have one integrated AGC/multiband compressor/limiter/stereo generator in the chain. Some of the more esoteric arrangements may have 5 or more different boxes in the chain.

So, why does it work so well in broadcast but not in two-way? Well, several big differences:

1. Broadcast uses 75 microsecond preemphasis, whereas two-way radio applies preemphasis continuously across the whole voice band (6 dB per octave). As such, in broadcast, most of the "meat" of the audio spectrum is not preemphasized, since the 75 microsecond curve only starts rising significantly above roughly 2.1 kHz. If you were to try to use a single-band compressor on discriminator (preemphasized) audio in a repeater, it would sound like hell, because high-frequency noise components would be higher in relative amplitude than the meat of the voice region due to preemphasis, thereby driving the control stages in the compressor into gain reduction. In other words, noisy signals would sound unnaturally quiet. Compression on a well-quieted, deemphasized signal can work, but given #2 below, only moderate amounts of compression are really practical.

2. In broadcast, you have a much higher S/N ratio to work with. You're starting off with good, quiet program material, in some cases 90 dB or more of S/N, not noisy HT users or mobile-flutter-laden signals with as little as a few dB of average S/N, and maybe 40 dB on a "full-quieting" signal. As such, you can get away with doing 10 or 20 dB of gain adjustment without bringing up the noise floor so much as to be objectionable. Likewise, in broadcast, you have much wider dynamic range due to the lower noise floor (higher S/N ratio).

3. In FM broadcast, you have 15 kHz of audio bandwidth. In two-way, you have 3 kHz nominally. Multiband compression isn't too practical for two-way because the bands would end up being excessively narrow making voices sound unnatural. And single-band compression has the S/N problems per #2 above.

4. In broadcast, you're mostly dealing with music. You can introduce a LOT more distortion and multiband compression coloration in music without it being too objectionable. In two-way, we're dealing with voice. The human ear is exceptionally critical of distortion and frequency response aberrations when listening to the human voice. As such, you can't get away with all of the loudness-increasing techniques on voice that you can with music. FM talk/news stations will have a much tamer processing chain configuration than your typical rocker or urban station, for obvious reasons.

5. In two-way, your source material is rather random in terms of quality. There is clipping/limiting done in the user radio, with relatively little regard to distortion content. You may have signals with hiss, others with mobile flutter (which can really wreak havoc on a compressor since the flutter noise is often higher in amplitude than the average voice level), bad audio quality in terms of frequency response, etc. So the deck is stacked against you as far as quality goes, even before you get to the processing-related obstacles.
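A few of the numbers behind points 1 and 3 can be checked with quick arithmetic. The Python sketch below assumes an ideal 75 microsecond network (the U.S. broadcast value), an ideal 6 dB per octave two-way curve referenced to 1 kHz, and hypothetical band edges of 50 Hz to 15 kHz for FM broadcast and 300 Hz to 3 kHz for two-way. It compares how much each preemphasis curve rises across the voice band, and how wide five equal-log bands would be in each case.

```python
import math

TAU_BROADCAST = 75e-6  # 75 microsecond FM broadcast preemphasis (U.S. value)


def broadcast_preemphasis_db(f_hz, tau=TAU_BROADCAST):
    """Gain of an ideal 75 us preemphasis network, |1 + j*2*pi*f*tau|, in dB."""
    return 10 * math.log10(1 + (2 * math.pi * f_hz * tau) ** 2)


def two_way_preemphasis_db(f_hz, ref_hz=1000.0):
    """Ideal 6 dB/octave across the whole voice band, referenced to 1 kHz."""
    return 20 * math.log10(f_hz / ref_hz)


def octaves_per_band(f_low, f_high, n_bands):
    """Width of each band, in octaves, if the range is split into equal-log bands."""
    return math.log2(f_high / f_low) / n_bands


corner_hz = 1 / (2 * math.pi * TAU_BROADCAST)   # where the 75 us curve is up 3 dB
for f in (300, 1000, 3000):
    print(f"{f:5d} Hz  broadcast {broadcast_preemphasis_db(f):5.1f} dB"
          f"   two-way {two_way_preemphasis_db(f):6.1f} dB")
print(f"5 bands over 50-15000 Hz: {octaves_per_band(50, 15000, 5):.2f} octaves each")
print(f"5 bands over 300-3000 Hz: {octaves_per_band(300, 3000, 5):.2f} octaves each")
```

Across 300 Hz to 3 kHz the broadcast curve rises only about 4.7 dB, while two-way preemphasis rises a full 20 dB. Likewise, five bands in a 3 kHz voice channel come out only about two-thirds of an octave wide each, versus roughly 1.6 octaves for FM broadcast.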

So realistically, what kind of audio processing can you do on a repeater that will make it sound louder/better and yet not be objectionable?

AGC has limited use because you don't have much dynamic range to begin with. If you figure that someone who is under-modulated might only be 6 dB or so below what you would consider "fully modulated" audio, you would only have use for up to 6 dB of AGC. But because you don't want noise to affect the AGC too drastically, you would have to low-pass the audio so that only the prime voice region drives the AGC's control voltage. This is usually referred to as a form of "side chain processing", where the control stage is steered not by the full-range audio that it is acting on, but rather by an external source, in this case low-pass filtered audio. From experience and experimentation, a low-pass filter in the 800 to 1500 Hz range is what you would want to use to drive the AGC. Here AGC would help make everyone sound about the same level without really making anyone particularly louder than the next.
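Here is a minimal Python sketch of the side-chain idea: the detector listens to a low-passed copy of the audio, while the gain it computes is applied to the full-band audio. The 1 kHz side-chain cutoff is picked from the 800 to 1500 Hz range mentioned above, and the +6 dB gain ceiling matches the figure in the text. The one-pole (6 dB/octave) filter, the 20 ms envelope, the gate level, the 10 dB/s rate, and the function names are assumptions for illustration; a real side-chain filter would likely be steeper.

```python
import math


def one_pole_lowpass(x, fs, cutoff_hz):
    """Simple 6 dB/octave low-pass used to build the side chain."""
    a = math.exp(-2 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for s in x:
        y = (1 - a) * s + a * y
        out.append(y)
    return out


def sidechain_agc(x, fs, target=0.5, max_gain_db=6.0, rate_db_per_s=10.0,
                  sc_cutoff_hz=1000.0, env_ms=20.0, gate=1e-3):
    """AGC whose control voltage comes from low-passed audio (the side chain),
    while the gain itself is applied to the full-band audio x.

    Returns the processed audio and the final gain in dB.
    """
    side = one_pole_lowpass(x, fs, sc_cutoff_hz)
    step_db = rate_db_per_s / fs
    decay = math.exp(-1.0 / (env_ms * 1e-3 * fs))
    env, gain_db, out = 0.0, 0.0, []
    for s, c in zip(x, side):
        env = max(abs(c), env * decay)           # detector hears the side chain only
        if env > gate:                            # hold gain during silence
            wanted = min(20 * math.log10(target / env), max_gain_db)
            gain_db += max(-step_db, min(step_db, wanted - gain_db))
        out.append(s * 10 ** (gain_db / 20))     # ...but the gain acts on full-band audio
    return out, gain_db


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))
```

With this filter at 16 kHz sampling, a 3 kHz component reaches the detector roughly 8.5 dB lower than an equal-amplitude 500 Hz component, so hiss and flutter get much less say in the gain.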

Like AGC, some compression can be acceptable provided that it's done right. Compression and limiting are where you gain loudness; very fast compression is, for all practical purposes, limiting. The amount of compression and the attack and release times have to be tailored to human speech such that brief peaks don't "punch holes" in the audio, yet the compressor is still fast enough to produce a meaningful increase in perceived loudness.
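The attack/release behavior described above can be sketched as a simple feed-forward compressor in Python. The threshold, 3:1 ratio, 5 ms attack, and 150 ms release are assumed starting points for illustration, not values from the author's setup; tailoring them to speech is exactly the tuning job described above.

```python
import math


def compressor(x, fs, threshold_db=-20.0, ratio=3.0, attack_ms=5.0, release_ms=150.0):
    """Feed-forward compressor sketch.

    An envelope follows |x| quickly on the way up (attack) and slowly on the
    way down (release). Above threshold, each dB of envelope yields only
    1/ratio dB more output, i.e. the gain drops (1 - 1/ratio) dB per dB over.
    Returns the output audio and the gain (dB) applied at each sample.
    """
    a_att = math.exp(-1.0 / (attack_ms * 1e-3 * fs))
    a_rel = math.exp(-1.0 / (release_ms * 1e-3 * fs))
    env, out, gains = 0.0, [], []
    for s in x:
        r = abs(s)
        a = a_att if r > env else a_rel           # rising: attack, falling: release
        env = a * env + (1 - a) * r
        over_db = 20 * math.log10(max(env, 1e-9)) - threshold_db
        gain_db = -over_db * (1 - 1 / ratio) if over_db > 0 else 0.0
        out.append(s * 10 ** (gain_db / 20))
        gains.append(gain_db)
    return out, gains
```

A longer release keeps the gain from bouncing back up between syllables (the "holes"), while a short attack is what buys the loudness; the two are set independently for that reason.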

Now comes the hard part: preemphasis. Since AGC and compression will only work well with deemphasized audio, even if you manage to get well-limited audio out of the compression/limiting, you're then forced to throw a wrench in the works by adding preemphasis. So how do you deal with limiting the peaks to 5 kHz deviation after preemphasis? That's the real challenge.

Hard clipping will produce odd-order harmonics for a sinusoidal wave. That is, if you feed a perfect sine wave into a clipper, you'd get a squared-off waveform containing the original fundamental tone, along with odd-order harmonic products (3rd, 5th, etc.). If the audio isn't a perfect sinusoid (I don't know many people that talk in pure tones), then clipping will also produce even-order harmonics wherever the positive peaks don't have the same amplitude as the negative peaks. In reality, even-order harmonics aren't that big of an enemy. It's odd-order harmonics that make voices sound "harsh". So taming those is the challenge.
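The odd/even harmonic behavior is easy to check numerically. This Python sketch clips one cycle of a sine wave, once symmetrically and once with the positive peaks clipped harder than the negative ones, and measures the first six harmonics with a plain DFT. The clip levels are arbitrary choices for illustration.

```python
import cmath
import math


def clip(x, pos_level, neg_level):
    """Clip positive peaks at +pos_level and negative peaks at -neg_level."""
    return [min(pos_level, max(-neg_level, s)) for s in x]


def harmonic_amplitudes(period, max_harmonic=6):
    """Amplitude of harmonics 1..max_harmonic of a signal holding exactly one period."""
    n = len(period)
    amps = []
    for k in range(1, max_harmonic + 1):
        c = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(period))
        amps.append(2 * abs(c) / n)
    return amps


N = 256
sine = [math.sin(2 * math.pi * i / N) for i in range(N)]
symmetric = harmonic_amplitudes(clip(sine, 0.5, 0.5))    # same clip level both ways
asymmetric = harmonic_amplitudes(clip(sine, 0.5, 0.8))   # positive peaks clipped harder
for k, (s, a) in enumerate(zip(symmetric, asymmetric), start=1):
    print(f"H{k}: symmetric {s:.4f}   asymmetric {a:.4f}")
```

The symmetric case shows only odd harmonics (the 3rd comes out around 0.14 of full scale), while the asymmetric case adds a clearly visible 2nd harmonic.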

If you figure that the high-frequency cutoff for two-way radio is 3 kHz, then you can clip anything above 1 kHz without causing all that much audible distortion, because the odd-order harmonics (3 kHz and above) will be filtered off. Meanwhile, everything below 1 kHz should be limited more softly to prevent excessive distortion. So how can you do this? Basically you'd use a two-band limiter, with the low band using soft limiting and the high band using clipping. To think about it another way, build a crossover at 1 kHz, feed the low-pass side to a soft limiter (fast compressor), and the high-pass side to a clipper. Mix the outputs of the low-side limiter and the high-side clipper back together equally to get a reasonably well-limited end product. Finally, as in broadcast, a final clipper would follow (preceding the 3 kHz low-pass filter) to catch the overshoots. There will be overshoots induced in this two-band process, both because the slower low-side limiter is not instantaneous, so brief peaks will sneak past it, and because you're adding back together the high-pass and low-pass halves of the audio spectrum, which were limited independently and not necessarily phase-coherently. After this "final clipper" would come the 3 kHz low-pass "splatter filter".
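The two-band arrangement can be sketched in Python roughly as follows. This is a simplified illustration under several assumptions: the crossover is a gentle one-pole low-pass with the high band taken as the difference (so the two bands always sum back to the input), the soft limiter is a fast envelope-following gain reduction, all thresholds are arbitrary, and the splatter filter is only a 2nd-order Butterworth (using the widely published RBJ "Audio EQ Cookbook" biquad formulas), far gentler than a real splatter filter.

```python
import math


def one_pole_lowpass(x, fs, cutoff_hz):
    a = math.exp(-2 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for s in x:
        y = (1 - a) * s + a * y
        out.append(y)
    return out


def soft_limit(x, fs, threshold, attack_ms=1.0, release_ms=50.0):
    """Fast compressor: envelope rises quickly, falls slowly; gain pulls it to threshold."""
    a_att = math.exp(-1.0 / (attack_ms * 1e-3 * fs))
    a_rel = math.exp(-1.0 / (release_ms * 1e-3 * fs))
    env, out = 0.0, []
    for s in x:
        r = abs(s)
        a = a_att if r > env else a_rel
        env = a * env + (1 - a) * r
        out.append(s * (1.0 if env <= threshold else threshold / env))
    return out


def clip(x, level):
    return [max(-level, min(level, s)) for s in x]


def two_band_limiter(x, fs, crossover_hz=1000.0, low_thr=0.6, high_clip=0.4,
                     final_clip=1.0):
    low = one_pole_lowpass(x, fs, crossover_hz)
    high = [s - l for s, l in zip(x, low)]   # complementary split: low + high == x
    low = soft_limit(low, fs, low_thr)       # soft limiting for the voice "meat"
    high = clip(high, high_clip)             # harmonics of >1 kHz land above 3 kHz
    mixed = [l + h for l, h in zip(low, high)]
    return clip(mixed, final_clip)           # catches overshoots from both bands


def splatter_filter(x, fs, cutoff_hz=3000.0, q=1 / math.sqrt(2)):
    """2nd-order Butterworth low-pass (RBJ cookbook biquad); illustrative only."""
    w0 = 2 * math.pi * cutoff_hz / fs
    alpha, cw = math.sin(w0) / (2 * q), math.cos(w0)
    a0 = 1 + alpha
    b0, b1, b2 = (1 - cw) / 2 / a0, (1 - cw) / a0, (1 - cw) / 2 / a0
    a1, a2 = -2 * cw / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, s, y1, y
        out.append(y)
    return out
```

Note that the final clip guarantees the peak limit only ahead of the splatter filter; the filter itself can push peaks back over it, which is the subject of the next few paragraphs.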

Soft-limiting the meat of the voice region below 1 kHz has another benefit besides reducing harmonic distortion: less noticeable intermodulation distortion. Clipping is a non-linear function, which generates intermodulation products in addition to harmonic products. Substantial IM on voice will make it sound "garbled", "watery", or "mushy", to use some familiar adjectives. Vocal nuances will tend to be masked by the IM distortion, resulting in an overall degradation in clarity and intelligibility. IM in the higher frequency region (above the 1 kHz crossover in this example) may make sibilant sounds ("S" sounds, "TH" sounds, etc.) somewhat exaggerated, "steamy", or "lispy", but with only 3 kHz of audio bandwidth, it won't be too objectionable due to the low-pass filtering. So the goal is to soft-limit the meat of the voice region, and then have a clipper just catch whatever brief peaks sneak past the limiter. Provided the clipper is only acting on very short-duration and infrequent peaks, the distortion it introduces should not be objectionable, or even perceptible, if it's done right.
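A two-tone test shows the intermodulation problem directly. In this Python sketch (tone frequencies, levels, and clip level are arbitrary choices), two equal tones at 700 and 900 Hz are hard-clipped, and the third-order products at 2 x 700 - 900 = 500 Hz and 2 x 900 - 700 = 1100 Hz are measured with a single-bin DFT.

```python
import cmath
import math

FS = 8000
F1, F2 = 700, 900     # two "voice" components (illustrative)


def tone_amplitude(x, f, fs=FS):
    """Amplitude of the component at f Hz (x must span a whole number of cycles of f)."""
    c = sum(s * cmath.exp(-2j * math.pi * f * n / fs) for n, s in enumerate(x))
    return 2 * abs(c) / len(x)


clean = [0.5 * math.sin(2 * math.pi * F1 * n / FS) + 0.5 * math.sin(2 * math.pi * F2 * n / FS)
         for n in range(FS)]
clipped = [max(-0.5, min(0.5, s)) for s in clean]   # symmetric hard clipper

# Third-order IM lands at 2*F1 - F2 and 2*F2 - F1, inside the voice band,
# where a later low-pass filter cannot remove it.
for f in (2 * F1 - F2, 2 * F2 - F1):
    print(f"{f} Hz: clean {tone_amplitude(clean, f):.4f}  clipped {tone_amplitude(clipped, f):.4f}")
```

Both products land squarely in the voice band, where no splatter filter can touch them. With a symmetric clipper, the even-order difference product at 200 Hz stays absent, in line with the odd/even harmonic behavior described earlier.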

Now, to make things even worse, the 3 kHz low-pass "splatter filter" actually causes some overmodulation. What, you say? How can that be? It's just a low-pass filter; how can it cause overmodulation problems?

Before I explain that, think about this. Have you ever set up an exciter (MICOR, MASTR II, whatever flavor) using a fixed audio tone (say, 1 kHz), adjusting the deviation control pot to yield 5 kHz deviation? The limiter in most two-way radios is really just a clipper; no fancy compression or AGC or anything. So theoretically, whatever tone you stick into the input should be hard-limited at 5 kHz deviation by the clipper. But if you crank up the audio generator some more, say another 6 dB, the deviation will creep up somewhat, maybe to 5.5 kHz. If you really slam it hard, you might see 6 kHz deviation or more. Why? Is the clipper failing to clip? Nope. The problem is caused by the low-pass "splatter" filter. Here's why...

As I mentioned before, clipping produces odd-order harmonics. The low-pass filter's job is to scrub off those clipping harmonics to prevent the bandwidth from exceeding limits. Occupied bandwidth in FM is a function of the deviation AND the audio bandwidth. The goal is to keep both properly limited to prevent the signal from getting too wide and "splattering" onto adjacent channels. The problem comes in when the audio is excessively clipped, which puts more and more energy into the harmonics. The splatter filter attenuates those harmonics; that's its job. In order for the signal to remain perfectly limited (clipped), all of those harmonics need to be maintained, both in amplitude and in phase coherence, but obviously we can't do that. As the harmonics are filtered off, the fundamental will overshoot the preset clipping point. And the more harmonic content is filtered off, the more overshoot there will be.
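The size of that overshoot can be estimated with a worked example. In the limit of very hard clipping, a sine becomes a square wave whose Fourier series contains only odd harmonics, (4/π)(sin θ + sin 3θ/3 + sin 5θ/5 + ...). If an ideal brick-wall filter strips everything above its cutoff, the remaining fundamental alone has a peak of 4/π, about 1.27 times the clipping level. The Python sketch below, assuming an ideal filter and a 1 kHz test tone, evaluates that peak for a given cutoff; the function name is hypothetical.

```python
import math


def filtered_square_peak(fundamental_hz, cutoff_hz, clip_level=1.0, points=20000):
    """Peak of a hard-clipped tone after an ideal brick-wall low-pass.

    Very hard clipping turns a sine into a square wave of amplitude clip_level,
    whose Fourier series is (4/pi) * clip_level * sum(sin(k*theta)/k) over odd k.
    The ideal filter keeps only the odd harmonics at or below cutoff_hz.
    """
    kept = list(range(1, int(cutoff_hz // fundamental_hz) + 1, 2))
    peak = 0.0
    for i in range(points):
        theta = math.pi * i / points   # half a cycle suffices: the other half is the negative
        v = 4 * clip_level / math.pi * sum(math.sin(k * theta) / k for k in kept)
        peak = max(peak, abs(v))
    return peak


for cutoff in (2500, 3000):
    ratio = filtered_square_peak(1000, cutoff)
    print(f"cutoff {cutoff} Hz: peak = {ratio:.3f} x clip level, "
          f"i.e. {5.0 * ratio:.2f} kHz if clipped at 5 kHz deviation")
```

With the clipper set for 5 kHz deviation, keeping only the fundamental would let the peaks reach about 6.4 kHz, and keeping the 3rd harmonic as well brings that to about 6.0 kHz. A real splatter filter is not ideal and real clipping is rarely a perfect square wave, so actual numbers will differ, but the order of magnitude matches the 5.5 to 6 kHz readings described above.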

I went through all of that explanation of low-pass filter induced overshoots just to be thorough. But in reality, if you set your peak deviation at a reasonable level (say, 4.5 kHz), as long as you're not driving the $^!+ out of the clipper, the slight increase in peak modulation due to hot user audio won't be so severe as to cause serious overdeviation problems.

Alright, this ended up being way longer than I wanted, and I know I still didn't do it full justice. If you're still reading this, you're either very inquisitive or very bored. On my main system, I am using a processing setup similar to what I described above as "reasonable" for processing two-way audio. In addition, I use something that most broadcast chains don't have, and that's a downward expander (I'll save an explanation of what that is in the interest of brevity). So how does it sound overall? Well, it sounds OK in my opinion. I've never had any complaints, and I doubt that most people even realize it's being done. That's the cardinal rule in processing: don't do it so excessively that it draws attention to itself. I'm not thrilled with it; there are some drawbacks that just can't be avoided in the two-way domain. But I think it sounds better than without it, particularly in its ability to make everyone sound somewhat similar in terms of amplitude without grossly degrading the S/N ratio. The purists set repeater audio up for "one to one" input to output; "make it sound like simplex" is some people's goal. Listen around on simplex sometime and tell me how many people sound like hell. If you can do something to improve on that, why not? After all, all broadcast stations do it. But some of the gimmicks out there aimed at the two-way repeater market, such as simple frequency-insensitive AGCs (to wit: ACC RC850) or combination EQ/compressors (C3I, etc.), are not of the same caliber as the kinds of trickery done in the broadcast world. But then again, they don't cost tens of thousands of dollars either. My opinion, based on field and bench experience, is that most of them do more harm than good as far as overall audio quality goes.
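For readers unfamiliar with the term, the static curve of a generic downward expander is simple to sketch. This is the textbook idea only, not the author's configuration, which he deliberately doesn't describe: above a threshold the gain is unity, and below it the gain drops by (ratio - 1) dB for every dB the input falls, so low-level noise between words is pushed further down. The function name, threshold, and ratio are arbitrary assumptions.

```python
def downward_expander_gain_db(level_db, threshold_db=-45.0, ratio=2.0):
    """Static gain curve of a generic downward expander.

    At or above threshold the gain is 0 dB. Below it, each dB the input falls
    makes the output fall `ratio` dB, so quiet noise between words is pushed
    further down while normal speech passes untouched.
    """
    if level_db >= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (ratio - 1.0)
```

For example, an input 10 dB below the threshold gets 10 dB of extra attenuation with a 2:1 ratio, ending up 20 dB below the threshold at the output.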

--- Jeff

This web page, this web site, the information presented in and on its pages and in these modifications and conversions is © Copyrighted 1995 and (date of last update) by Kevin Custer W3KKC and multiple originating authors. All Rights Reserved, including that of paper and web publication elsewhere.