Could you elaborate on this a bit more?
The problem I see is that you don't want to increase the gain so much that the peaks are outside the (-1.0, +1.0) normalized range, otherwise you'll get clipping. How would you propose to calculate an appropriate gain?

Well, to avoid clipping entirely, we would have to actually compress the song, as Tony described. That would, however, be much more complicated than just increasing a 'gain' field, so it isn't feasible.
But clipping a few samples doesn't affect sound quality nearly as much as many think. Just to make sure I know what I'm talking about, I loaded an already normalized song (Uriah Heep - Lady In Black) into CoolEdit and amplified it by 10 dB (roughly a factor of 3 in amplitude). Clipping, of course, occurred all over the place; nearly every beat was clipped. But listening to it, it still sounded good. Of course, I could hear a difference, but this is a rather extreme example.
Thinking about it, I now see that we actually have
two questions here:
1. What's the right 'metric' to measure the 'loudness' of a song?

The whole idea of normalization is that you don't have to fumble with the volume control all the time. That means that all the songs should have the same subjective 'loudness'. When we want to write a program that lets the user decide how much he wants to amplify a certain song, then we need to tell him how loud it actually is. If this has to be just one value, then it really should be averaged over the whole song, not just a few samples. The average energy in the sound signal (expressed in dB) is not a bad metric for how loud music sounds to the human ear. CoolEdit uses RMS, expressed in dBFS (relative to a full-scale sine or square wave), in its Statistics panel.
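A minimal sketch of that metric, assuming the samples are already normalized floats in (-1.0, +1.0) and using CoolEdit's sine-wave convention (0 dBFS = the RMS of a full-scale sine, i.e. 1/sqrt(2)):

```python
import math

def rms_dbfs(samples):
    """RMS level of normalized (-1.0..+1.0) samples, in dB relative to
    a full-scale sine wave (whose RMS is 1/sqrt(2))."""
    if not samples:
        raise ValueError("empty signal")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    full_scale_sine_rms = 1.0 / math.sqrt(2.0)
    return 20.0 * math.log10(rms / full_scale_sine_rms)

# A full-scale 440 Hz sine at 44.1 kHz should come out at 0 dBFS;
# halving the amplitude should drop it by about 6 dB.
sine = [math.sin(2 * math.pi * 440 * t / 44100) for t in range(44100)]
print(rms_dbfs(sine))                      # ~0.0
print(rms_dbfs([0.5 * s for s in sine]))   # ~-6.02
```

(The 'square wave' reference would just drop the 1/sqrt(2) factor.)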
2. How can we automatically find the optimum gain?

I agree that this is not trivial. But just using the peak value could even achieve the opposite of what we intend. Imagine two songs, both with the same subjective loudness. The first is highly compressed (little dynamic range), whereas the second uses all the available dynamic range. All the samples in the first will be very close to its average value, none of them reaching 100%. Peak-normalizing that song would make it subjectively much louder. On the other hand, the uncompressed song will most likely already contain some 100% samples, so normalizing would not change its loudness at all. Net effect: normalizing actually introduces a difference in loudness instead of eliminating it. (In a nutshell: peak normalization will generally make compressed songs sound louder than uncompressed ones.)
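The effect is easy to demonstrate with made-up numbers. The two toy 'songs' below are constructed to have nearly the same RMS (our stand-in for subjective loudness), but very different peaks; after peak normalization the compressed one ends up several times louder:

```python
import math

def rms(samples):
    """Root-mean-square level of the samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def peak_normalize(samples):
    """Scale the signal so its largest absolute sample hits 1.0."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]

# 'Compressed' song: every sample near the average, peak only 0.2.
compressed = [0.2, -0.2] * 50
# 'Uncompressed' song: mostly quiet, but a few samples already at 100%.
uncompressed = [0.1] * 97 + [1.0, -1.0, 1.0]

# Same loudness before...            rms ~0.200 vs ~0.199
# ...but after peak normalization:   rms 1.0   vs ~0.199
print(rms(compressed), rms(uncompressed))
print(rms(peak_normalize(compressed)), rms(peak_normalize(uncompressed)))
```

The compressed song gets a gain of 5 (1/0.2) while the uncompressed one gets a gain of 1, so peak normalization has introduced a loudness difference that wasn't there before.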
So, peak detection is not really a good criterion. We want the subjective loudness to be equal, so the calculation could be based on what I described under 1. However, I agree that this alone is not good enough, as it could lead to some clipping distortion. What I suggest is that we let the user enter a desired loudness, then check how much clipping would occur. If this is over a certain limit, we can either warn the user, or decrease the target loudness until we fall under that limit.
The definition of this limit needs some more experimentation and discussion, I think. Some ideas:
- no more than x samples in a row are clipped (at 44.1 kHz, that's roughly x/44100 seconds)
- samples are clipped by no more than y%, i.e. no sample exceeds (100+y)% of full scale
- the clipped energy (max(sample - fullscale, 0), summed over the whole waveform) is below z.
Oh, and btw:
CoolEdit 2000 can already do peak normalization of MP3 files. :-)
Daniel