#1
Posted to rec.audio.tech
Mathematics behind mixing voice over music
Hello,
I was discussing this topic in another area and was pointed in this direction. My other post can be seen here: http://groups.google.com/group/rec.a...ead/thread/a8b...

What kind of math are you using to mix values from one audio file with another? In the post above, one suggestion was to add the values of two or more audio samples and use their average as the resulting value (excuse my lack of terminology). In another forum, I was told this was not a good idea; instead, I should strictly add the values (but this could cause clipping). The main point I understand now is that mixing is an art, not just a science. As I am attempting to automate the process of combining voice over music on my server (for fast podcast polishing), I need to science up. What are your thoughts on the following brainstorms?

Option 1: Sample A is the voice. Sample B is the music, pre-adjusted to be a bit softer. For the length of the voice content, values from sample A are added to sample B. If clipping would occur when adding two values together, that extra amount is subtracted from the music value before it is added to the voice value. One problem I can think of with this method is that the resulting music volume will jump around when the voice picks up. Would this be noticeable or an issue?

Option 2: Both samples A and B are analyzed, and the highest value is returned from each. Every value in the music sample is then lowered or raised based on ((highest A + highest B) - clip value). A problem I see here is that if the voice clipped at some point, there would be no music through the whole thing!

Option 3: Perhaps there is a way to smooth out option 1. Let's say the voice clipped somewhere and the music disappears; the music would then fade back gradually to its normal level. I'm not quite sure yet how to solve this one, but some late-night mocha and loud music may resolve it.

The main thing here is giving the voice priority over the music, or one audio sample over the other. I wonder what I would do to give both samples equal priority when mixing. Average? I appreciate your feedback on this.
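For what it's worth, Option 1 is easy to sketch. Here is a minimal Python illustration (the sample values, function name, and 16-bit clip levels are my own assumptions, not anything from the thread):

```python
# Sketch of "Option 1": add voice to pre-softened music, and when the sum
# would clip, subtract the excess from the music's contribution.
# Assumes signed 16-bit samples in the range -32768..32767.
CLIP_MAX = 32767
CLIP_MIN = -32768

def mix_option1(voice, music):
    """Mix two equal-length lists of 16-bit sample values."""
    out = []
    for v, m in zip(voice, music):
        s = v + m
        if s > CLIP_MAX:        # positive overshoot: take it out of the music
            m -= s - CLIP_MAX
        elif s < CLIP_MIN:      # negative overshoot likewise
            m -= s - CLIP_MIN
        out.append(v + m)
    return out

print(mix_option1([30000, -31000, 100], [5000, -4000, 200]))
# [32767, -32768, 300]
```

As the post itself anticipates, this amounts to hard limiting: the music level jumps exactly when the voice peaks, which is the artifact Option 3 tries to smooth out.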
#2
Posted to rec.audio.tech
"Ultrus" wrote in message ups.com... Hello, I was discussing this topic in another area and was pointed in this direction. My other post can be seen he http://groups.google.com/group/rec.a...ead/thread/a8b... You should use simple addition of the data streams from each source. If you sum two signals of the same amplitude in phase, you will get a 6dB increase in the output (doubling, e.g., 1 + 1 = 2). This can cause clipping if you don't make adjustments for that: You can reduce both inputs by 6 dB (e.g, 0.5 + 0.5 = 1), or you can allocate one additional bit in the summing register (each additional bit doubles the dynamic range). After you sum the bits, you can right shift the summing register to restore the result to the original bit lanes (which reduces the output by 6dB). Similarly, if you are summing multiple channels, you need to account for the build-up from all of the channels. With some software configuration, DSP chips can do all this automatically for you. Their summing registers are usually much wider than the audio data busses, and they can be programmed to shift the bits back into the desired bus lanes in real time, after summing. If you are doing this in a general-purpose computer, you might just use integer arithmetic (32 or 64 bits), and right shift the result by 1 bit. Then, cast to a short (16 bits) for output to wav format. |
#3
Posted to rec.audio.tech
"Ultrus" writes:
> [...]

Ultrus,

Believe it or not, this type of operation, while seemingly simple, is actually quite involved, as I think you're finding out. To go into all the "science" behind some of the options, even superficially, would require a significant amount of time and writing.

If you don't care about the absolute highest audio quality, you can simply sum the two channels using a 32-bit integer, then right-shift the result one bit and store the least-significant word back to a 16-bit integer. (This assumes 16-bit samples.) This will guarantee that you never clip, and it will probably sound just fine for your application.

Sorry I can't give you any better news, but I've spent my life studying these types of things (and other electrical engineering topics), and it is not really reasonable to expect someone to explain them in one or two usenet posts.

--
% Randy Yates           % "With time with what you've learned,
%% Fuquay-Varina, NC    % they'll kiss the ground you walk
%%% 919-577-9882        % upon."
%%%%                    % '21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr
#4
Posted to rec.audio.tech
Hello Karl,
Thanks for your feedback on this. It helped clarify several items!
#5
Posted to rec.audio.tech
> Sorry I can't give you any better news, but I've spent my life
> studying these types of things (and other electrical engineering topics), and it is not really reasonable to expect someone to explain them in one or two usenet posts.

Hello Randy,

Thanks for your feedback. I agree that the topic I'm looking into is much more complex than I originally anticipated. As maximum quality is not my concern, the simple techniques discussed here will work great. In the future, I will look into "ducking" (a new term I learned this morning). It would be great to give the voice priority over any music that tries to compete.
#6
Posted to rec.audio.tech
"Karl Uppiano" wrote ...
"Ultrus" wrote ... Hello, I was discussing this topic in another area and was pointed in this direction. My other post can be seen he http://groups.google.com/group/rec.a...ead/thread/a8b... You should use simple addition of the data streams from each source. If you sum two signals of the same amplitude in phase, you will get a 6dB increase in the output (doubling, e.g., 1 + 1 = 2). This can cause clipping if you don't make adjustments for that: You can reduce both inputs by 6 dB (e.g, 0.5 + 0.5 = 1), or you can allocate one additional bit in the summing register (each additional bit doubles the dynamic range). After you sum the bits, you can right shift the summing register to restore the result to the original bit lanes (which reduces the output by 6dB). Of course, note that shifting the binary value right by one bit is the very definition of division by 2. :-) |
#7
Posted to rec.audio.tech
"Randy Yates" wrote in message ... "Ultrus" writes: [...] Ultrus, Believe it or not, this type of operation, while seemingly simple, is actually quite involved, as I think you're finding out. To go into all the "science" behind some of the options, even superficially, would require a significant amount of time and writing. If you don't care about the absolute highest audio quality, you can simply sum the two channels using a 32-bit integer, then right-shift the result one-bit and store the least-signficant word back to a 16-bit integer. Of course this is assuming 16-bit samples. This will guarantee that you'll never clip, and will probably sound just fine for your application. I decided not to discuss dithering in my earlier post, but right-shifting the data will delete any dither from the original data streams. Ideally, it would need to be added back in. |
#8
Posted to rec.audio.tech
"Ultrus" wrote in message ups.com... Hello Karl, Thanks for your feedback on this. It helped clearify several items! I forgot to mention: be sure to use signed integers, and do not use floating point! Depending on the computer language you use, floating point arithmetic might be the default, and you might have to go through some gymnastics to enforce pure integer arithmetic. Although floating point gives the illusion of more dynamic range and more precision, it is actually not appropriate for most audio DSP applications. |
#9
Posted to rec.audio.tech
Karl Uppiano wrote:
"Ultrus" wrote in message ups.com... Hello Karl, Thanks for your feedback on this. It helped clearify several items! I forgot to mention: be sure to use signed integers, and do not use floating point! Depending on the computer language you use, floating point arithmetic might be the default, and you might have to go through some gymnastics to enforce pure integer arithmetic. Although floating point gives the illusion of more dynamic range and more precision, it is actually not appropriate for most audio DSP applications. That's an interesting comment: Digital broadcast mixers that I'm familiar with used all to be fixed point but more and more are going over to floating point with newer DSP implementations. Any reason why? Anything to do with the ubiquity of Sharc processors? S. |
#10
Posted to rec.audio.tech
"Serge Auckland" wrote in message ... Karl Uppiano wrote: "Ultrus" wrote in message ups.com... Hello Karl, Thanks for your feedback on this. It helped clearify several items! I forgot to mention: be sure to use signed integers, and do not use floating point! Depending on the computer language you use, floating point arithmetic might be the default, and you might have to go through some gymnastics to enforce pure integer arithmetic. Although floating point gives the illusion of more dynamic range and more precision, it is actually not appropriate for most audio DSP applications. That's an interesting comment: Digital broadcast mixers that I'm familiar with used all to be fixed point but more and more are going over to floating point with newer DSP implementations. Any reason why? Anything to do with the ubiquity of Sharc processors? S. Well, IEEE 754 single-precision floating point numbers are represented in 32 bits, with a 23-bit mantissa, 8-bit exponent and a sign bit, so you could conceivably use these for studio-quality DAW applications (24-bit integers, including the sign bit, are generally used for "studio quality" applications). All else being equal, if you're going to use 32 bits anyway, I guess single-precision floating point might be a more flexible representation. Having said that, I am not convinced that there is a need for the exponent in digital audio applications, especially since you never get more than 24 bits of resolution (144 dB) anyway. Perhaps the advanced math packages are more readily available for floating point. Floating point is more compute-intensive, which could present a problem for real-time processing. Processors keep getting faster though... |
#11
Posted to rec.audio.tech
"Karl Uppiano" writes:
"Randy Yates" wrote in message ... "Ultrus" writes: [...] Ultrus, Believe it or not, this type of operation, while seemingly simple, is actually quite involved, as I think you're finding out. To go into all the "science" behind some of the options, even superficially, would require a significant amount of time and writing. If you don't care about the absolute highest audio quality, you can simply sum the two channels using a 32-bit integer, then right-shift the result one-bit and store the least-signficant word back to a 16-bit integer. Of course this is assuming 16-bit samples. This will guarantee that you'll never clip, and will probably sound just fine for your application. I decided not to discuss dithering in my earlier post, but right-shifting the data will delete any dither from the original data streams. Ideally, it would need to be added back in. Hi Karl, If you added it back in, you'd again run the risk of overflow. I think you're mis-stating the situation. You really don't need or want to "add back in the original dither." Instead, the real goal is to "requantize" the sum "nicely" back to the original bit-width. In general, the sum of two N-bit values produces an N+1-bit result. Forcing those N+1 bits back into an N-bit word requires some type of (re)quantization no matter how you approach it. This is the key issue. There are two methods of requantization: 1. Truncating (what we have both suggested to Ultrus). 2. Rounding (better - the resulting error has zero-mean). In addition, there are architectures that improve on basic requantization in various ways: 1. Dithering following by requantization. 2. Noise-shaping by placing feedback around the quantizer. There are almost endless possibilities in how the noise-shaping is performed: a. Simple zeros at DC. b. Zeros at psychoacoustically significant places. c. Higher-order filtering with psychoacoustic optimizations. 3. Noise-shaping and dithering. These all produce results of varying quality. 
-- % Randy Yates % "Though you ride on the wheels of tomorrow, %% Fuquay-Varina, NC % you still wander the fields of your %%% 919-577-9882 % sorrow." %%%% % '21st Century Man', *Time*, ELO http://home.earthlink.net/~yatescr |
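To make the requantization options concrete, here is a hedged Python sketch of bringing a wide sum back down by `shift` bits via truncation, rounding, and dither-then-truncate (the function names and the 1-output-LSB triangular dither are my own choices for illustration, not a canonical implementation):

```python
import random

# Requantize a wide integer value down by `shift` bits (e.g. a 17-bit sum
# of two 16-bit samples back to 16 bits with shift=1).

def requant_truncate(x, shift=1):
    return x >> shift                      # floor: error has nonzero mean

def requant_round(x, shift=1):
    half = 1 << (shift - 1)
    return (x + half) >> shift             # zero-mean error

def requant_tpdf(x, shift=1):
    # Triangular (TPDF) dither spanning +-1 LSB of the *output* word,
    # added before truncation; decorrelates the quantization error
    # from the signal at the cost of a slightly higher noise floor.
    lsb = 1 << shift
    d = random.randint(0, lsb - 1) + random.randint(0, lsb - 1) - (lsb - 1)
    return (x + d) >> shift

print(requant_truncate(5), requant_round(5))  # 2 3
```

Noise shaping would add a feedback path here, feeding each sample's quantization error back into the next input; that is where the "zeros at DC" and psychoacoustic variants come in.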
#12
Posted to rec.audio.tech
"Randy Yates" wrote in message ... "Karl Uppiano" writes: "Randy Yates" wrote in message ... "Ultrus" writes: [...] Ultrus, Believe it or not, this type of operation, while seemingly simple, is actually quite involved, as I think you're finding out. To go into all the "science" behind some of the options, even superficially, would require a significant amount of time and writing. If you don't care about the absolute highest audio quality, you can simply sum the two channels using a 32-bit integer, then right-shift the result one-bit and store the least-signficant word back to a 16-bit integer. Of course this is assuming 16-bit samples. This will guarantee that you'll never clip, and will probably sound just fine for your application. I decided not to discuss dithering in my earlier post, but right-shifting the data will delete any dither from the original data streams. Ideally, it would need to be added back in. Hi Karl, If you added it back in, you'd again run the risk of overflow. I think you're mis-stating the situation. You really don't need or want to "add back in the original dither." Instead, the real goal is to "requantize" the sum "nicely" back to the original bit-width. In general, the sum of two N-bit values produces an N+1-bit result. Forcing those N+1 bits back into an N-bit word requires some type of (re)quantization no matter how you approach it. This is the key issue. There are two methods of requantization: 1. Truncating (what we have both suggested to Ultrus). 2. Rounding (better - the resulting error has zero-mean). In addition, there are architectures that improve on basic requantization in various ways: 1. Dithering following by requantization. 2. Noise-shaping by placing feedback around the quantizer. There are almost endless possibilities in how the noise-shaping is performed: a. Simple zeros at DC. b. Zeros at psychoacoustically significant places. c. Higher-order filtering with psychoacoustic optimizations. 3. Noise-shaping and dithering. 
These all produce results of varying quality. When I say "adding back in" I really meant re-dithering by some means, although since dither typically has a triangular probability density of 1/3 LSB, as a practical matter, summing it in is unlikely to overflow any registers. The methods for generating/applying dither could be done with the noise-shaping technique, as you mention. Van Der Kooy and Lipschitz described an algorithm like that in an article from the late 1980s I think it was. It also occurred to me that if the OP were to sum two properly dithered signals without changing their original amplitude, the dither would involve the two LSBs, and no re-dithering would be necessary at all. It would remain in the new LSB after the right shift. If there was any gain reduction in either channel prior to summing, then it would be advisable to re-dither. |
#13
Posted to rec.audio.tech
On 2007-02-10, Ultrus wrote:
> What kind of math are you using to mix values from one audio file with another? [...] The main point I understand now is mixing is an art, not just a science.

If you are mixing two signals then, as has been pointed out, the basic mathematics is just addition:

  output = I1 + I2   [1]

Yes, that will clip if the sum of the inputs is too big to be represented by the mixer's output word length. Of course, the same happens on an analogue mixer due to its finite power supply voltage rail. The typical solution is the same - scale the inputs to a lower level:

  output = a * I1 + b * I2   [2]

where a and b are constants in the range 0.000... to 1.000... (and I am talking radix-2 numbers here, so the level below 1.000... is 0.111...). The products above are the same as having level controls on analogue inputs. Some people in this thread have suggested a = b = 0.1000 (0.5 in decimal), which will always avoid clipping for a two-signal mixer but does not allow you to balance the two levels appropriately if required.

However, whenever you scale a signal in audio processing (i.e. multiply it) you will, in general, get word-length growth at the bottom (LSB) end. If you truncate or round each individual operation back to some lower level of precision, you will get non-linear quantization distortion. So either you must dither each product before truncation, or keep enough bits for the quantization distortion not to matter. Ideally keep all bits (note that a 16-bit word times an 8-bit word will, in general, have 16 + 8 = 24 bits in the product, etc.).

However, even if you can avoid dithering at intermediate stages, you do need to dither the final output of the mixer if you want it reduced to the same word length as the inputs and free from (non-linear) quantization products.

A way to approach a particular problem like mixing two signals of 16-bit length (as an example) is to use 24-bit fixed-length signed arithmetic. Add 8 zeros to the bottom of the input words before you start to process them as 24-bit words, and you can do a reasonable amount of simple rounded/truncated but undithered arithmetic, like [2] above, with little risk of overflow (if you scale the inputs suitably) and little risk of audible quantization distortion. Then dither the 24-bit result by generating a dither signal of the appropriate type and resolution, adding it to the 24-bit processed output before finally truncating it to 16 bits.

For very simple operations you may not need as many as 8 extra bits, and for more complex processing it may be better to use more.

--
John Phillips
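A rough Python rendering of the recipe above, under my own assumptions (TPDF dither spanning +-1 LSB of the final 16-bit word, and example gains a = 0.9, b = 0.4; a sketch, not a vetted implementation):

```python
import random

def mix_16bit(voice, music, a=0.9, b=0.4):
    """Mix per equation [2] using 8 extra fraction bits, then dither down."""
    out = []
    for v, m in zip(voice, music):
        # promote to 24-bit fixed point: 8 fractional bits at the bottom
        acc = round(a * (v << 8)) + round(b * (m << 8))
        # TPDF dither spanning +-1 LSB of the 16-bit output (1 LSB = 256 here),
        # added before the final truncation back to 16 bits
        d = random.randint(0, 255) + random.randint(0, 255) - 255
        out.append(max(-32768, min(32767, (acc + d) >> 8)))
    return out

mixed = mix_16bit([10000, -20000], [3000, 500])
```

The first sample lands within a count or two of 0.9*10000 + 0.4*3000 = 10200; the remaining wiggle is the dither doing its job.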
#14
Posted to rec.audio.tech
"Karl Uppiano" wrote in message
news:P1nzh.6928$Yn4.5950@trnddc03 "Randy Yates" wrote in message ... "Ultrus" writes: [...] Ultrus, Believe it or not, this type of operation, while seemingly simple, is actually quite involved, as I think you're finding out. To go into all the "science" behind some of the options, even superficially, would require a significant amount of time and writing. If you don't care about the absolute highest audio quality, you can simply sum the two channels using a 32-bit integer, then right-shift the result one-bit and store the least-signficant word back to a 16-bit integer. Of course this is assuming 16-bit samples. This will guarantee that you'll never clip, and will probably sound just fine for your application. I decided not to discuss dithering in my earlier post, but right-shifting the data will delete any dither from the original data streams. .... as will just about any means for obtaining significant amounts of gain reduction. Ideally, it would need to be added back in. Agreed. |
#15
Posted to rec.audio.tech
"Arny Krueger" wrote in message . .. "Karl Uppiano" wrote in message news:P1nzh.6928$Yn4.5950@trnddc03 "Randy Yates" wrote in message ... "Ultrus" writes: [...] Ultrus, Believe it or not, this type of operation, while seemingly simple, is actually quite involved, as I think you're finding out. To go into all the "science" behind some of the options, even superficially, would require a significant amount of time and writing. If you don't care about the absolute highest audio quality, you can simply sum the two channels using a 32-bit integer, then right-shift the result one-bit and store the least-signficant word back to a 16-bit integer. Of course this is assuming 16-bit samples. This will guarantee that you'll never clip, and will probably sound just fine for your application. I decided not to discuss dithering in my earlier post, but right-shifting the data will delete any dither from the original data streams. ... as will just about any means for obtaining significant amounts of gain reduction. Ideally, it would need to be added back in. Agreed. I actually doubt that the OP's application is critical enough to require dither, but it is an interesting discussion. I wonder why he doesn't just use off the shelf (free) software to perform this operation. Even the free stuff has lots more features than I usually use. I don't know if they bother to re-dither or not. I am using some free DAW software to transcribe my old vinyl library to digital, and the vinyl provides *way* more dither than I actually need anyway :-) |
#16
Posted to rec.audio.tech
> I actually doubt that the OP's application is critical enough to require
> dither, but it is an interesting discussion.

Yes. This is interesting and I'm trying to take everything in, but I'm following the "Keep It Simple, Stupid" method while scripting.

> I wonder why he doesn't just use off-the-shelf (free) software to perform this operation. Even the free stuff has lots more features than I usually use.

I'm editing audio from the source using scripts that rip apart 16-bit audio files, giving me an array of values ranging from -32768 to 32767. I play with those values, messing them up in limitless ways, then convert the array back into an audio file. I've looked for good audio software that I could connect to from scripts on a server, but have not had the success I desire.

Thanks to the great input above, I should be able to mix my full-volume voice with soft music in the background. To take it a step further, let's say my music volume was a little louder, and I want it to "duck" when it competes with the voice. Would you think the following formula would work for each "word"/array value of the voice sample and music sample?

In English: if voice value plus music value is greater than the clip value, then the new value equals voice value plus music value minus clip value, plus voice value. Otherwise, if voice value plus music value is less than the negative clip value, then the new value equals voice value plus music value minus the negative clip value, plus voice value. Otherwise, the new value equals voice value plus music value.

How I would write it in PHP:

    if ($voiceValue + $musicValue > $clipValue) {
        $newValue = $voiceValue + $musicValue - $clipValue + $voiceValue;
    } else if ($voiceValue + $musicValue < -$clipValue) {
        $newValue = $voiceValue + $musicValue - (-$clipValue) + $voiceValue;
    } else {
        $newValue = $voiceValue + $musicValue;
    }

Thoughts on this? I did not increase the bit depth and drop it back down while mixing; I just subtracted any music that was interfering with the voice.
#17
Posted to rec.audio.tech
"Arny Krueger" writes:
"Karl Uppiano" wrote in message news:P1nzh.6928$Yn4.5950@trnddc03 "Randy Yates" wrote in message ... "Ultrus" writes: [...] Ultrus, Believe it or not, this type of operation, while seemingly simple, is actually quite involved, as I think you're finding out. To go into all the "science" behind some of the options, even superficially, would require a significant amount of time and writing. If you don't care about the absolute highest audio quality, you can simply sum the two channels using a 32-bit integer, then right-shift the result one-bit and store the least-signficant word back to a 16-bit integer. Of course this is assuming 16-bit samples. This will guarantee that you'll never clip, and will probably sound just fine for your application. I decided not to discuss dithering in my earlier post, but right-shifting the data will delete any dither from the original data streams. ... as will just about any means for obtaining significant amounts of gain reduction. Ideally, it would need to be added back in. Agreed. You're both confused in several respects. Dither isn't some companion to the signal. It is something that is added to the signal prior to a quantizer in order to linearize the quantization. After the quantization, the signal is just the signal, NOT the signal plus the dither. Neither is it gain that "takes out the dither," since the dither isn't there in the first place. Rather, gain, in general, and for either positive or negative (dB) values, increases the resolution and requires a requantization step to go back to less resolution. There are exceptions of course, e.g., when the gain is a power of two. Finally, the entire operation of shifting right one bit and taking the least-significant word is one of requantization, not gain. -- % Randy Yates % "I met someone who looks alot like you, %% Fuquay-Varina, NC % she does the things you do, %%% 919-577-9882 % but she is an IBM." %%%% % 'Yours Truly, 2095', *Time*, ELO http://home.earthlink.net/~yatescr |
#18
Posted to rec.audio.tech
"Ultrus" wrote in message ups.com... I actually doubt that the OP's application is critical enough to require dither, but it is an interesting discussion. Yes. This is interesting and trying to take everything in, but I'm following the "Keep It Simple Stupid" method while scripting. I wonder why he doesn't just use off the shelf (free) software to perform this operation. Even the free stuff has lots more features than I usually use. I'm editing audio from the source using scripts that rip apart 16 bit audio files, giving me an array of values ranging from -32768 to 32767. I play with those values, messing them up in limitless ways, then covert the array back into an audio file. I've looked for good audio software that I could connect to from scripts on a server, but have not had the success I desire. Thanks to great info from input above, I should be able to mix my full volume voice with soft music in the background. To take it a step further, let's say my music volume was a little louder, and I want it to "duck" when it competes with the vocie. Would you think the following formula would work for each "word"/array value of voice sample and music sample? in English: if voice value plus music value is greater than clip value, then new value equals voice value plus music value minus clip value plus voice value otherwise if voice value plus music value is less than negative clip value, then new value equals voice value plus music value minus negative clip value plus voice value otherwise new value equals voice value plus music value how I would write it in php: if($voiceValue + $musicValue $clipValue) { $newValue = $voiceValue + $musicValue - $clipValue + $voiceValue; } else if($voiceValue + $musicValue -$clipValue) { $newValue = $voiceValue + $musicValue - -$clipValue + $voiceValue; } else { $newValue = $voiceValue + $musicValue; } Thoughts on this? I did not increase the bit rate and drop it back down while mixing. 
I just subtracted any music that was interfering with the voice. Ducking the music is something that audio engineers used to do manually (with a gain control on analog mixers) or automatically, using compressors and such. In the digital realm, it is done more or less as you describe, although, you need to fade the music in and out: you can't abruptly change the gain without making clicks and other noises. This means you probably need to buffer a few seconds of audio, so you can "look ahead" to see if clipping would occur, and then fade the music by a rate that allows you to reach your target level at the right time. It's always something, isn't it? :-) |
#19
Posted to rec.audio.tech
> It's always something, isn't it? :-)

Yes, the challenges bring me joy. The fading makes sense. I'm off to think on this for a bit.
#20
Posted to rec.audio.tech
"Randy Yates" wrote in message ... "Arny Krueger" writes: "Karl Uppiano" wrote in message news:P1nzh.6928$Yn4.5950@trnddc03 "Randy Yates" wrote in message ... "Ultrus" writes: [...] Ultrus, Believe it or not, this type of operation, while seemingly simple, is actually quite involved, as I think you're finding out. To go into all the "science" behind some of the options, even superficially, would require a significant amount of time and writing. If you don't care about the absolute highest audio quality, you can simply sum the two channels using a 32-bit integer, then right-shift the result one-bit and store the least-signficant word back to a 16-bit integer. Of course this is assuming 16-bit samples. This will guarantee that you'll never clip, and will probably sound just fine for your application. I decided not to discuss dithering in my earlier post, but right-shifting the data will delete any dither from the original data streams. ... as will just about any means for obtaining significant amounts of gain reduction. Ideally, it would need to be added back in. Agreed. You're both confused in several respects. Dither isn't some companion to the signal. It is something that is added to the signal prior to a quantizer in order to linearize the quantization. After the quantization, the signal is just the signal, NOT the signal plus the dither. I totally get that. Neither is it gain that "takes out the dither," since the dither isn't there in the first place. Rather, gain, in general, and for either positive or negative (dB) values, increases the resolution and requires a requantization step to go back to less resolution. There are exceptions of course, e.g., when the gain is a power of two. Increasing the gain of a dithered signal will preserve the dither along with its linearizing effects (but the noise floor will be increased also, just like analog). 
Reducing the gain will reduce the dither along with it, and right shifting an optimally dithered signal (triangular probability density at 1/3 LSB) will eliminate it completely. If there is more dither than optimally necessary, it might survive some gain reduction. Finally, the entire operation of shifting right one bit and taking the least-significant word is one of requantization, not gain. It is re-quantization, but it is to restore the gain to fit into the 16-bit target. Every right shift reduces the gain by 6 dB. Every left shift increases the gain by 6 dB. I was suggesting adding two 16-bit samples, which could require one additional bit. I suggested that the accumulator register be, I don't know, 32 bits (a convenient word size in most computers today) and far more than enough to hold the overflow. Then right-shift one bit to restore the original dynamic range for the 16-bit target. This is a re-quantization operation, which ideally requires re-dithering, because of the truncation or rounding that will take place. % Randy Yates % "I met someone who looks alot like you, %% Fuquay-Varina, NC % she does the things you do, %%% 919-577-9882 % but she is an IBM." %%%% % 'Yours Truly, 2095', *Time*, ELO http://home.earthlink.net/~yatescr |
#21
Posted to rec.audio.tech
mathmatics behind mixing voice over music
On Feb 11, 4:12 pm, "Karl Uppiano" wrote:
Neither is it gain that "takes out the dither," since the dither isn't there in the first place. Rather, gain, in general, and for either positive or negative (dB) values, increases the resolution and requires a requantization step to go back to less resolution. There are exceptions of course, e.g., when the gain is a power of two. Increasing the gain of a dithered signal will preserve the dither along with its linearizing effects (but the noise floor will be increased also, just like analog). Reducing the gain will reduce the dither along with it, and right shifting an optimally dithered signal (triangular probability density at 1/3 LSB) will eliminate it completely. If there is more dither than optimally necessary, it might survive some gain reduction. And this is where you're getting lost. Reducing the gain will NEVER "eliminate it completely," because "it" is not separate or separable form the signal itself. And the dither, even though it may be only a fraction of an LSB in the original signal, does not have an effect which is limited to that small portion of the signal. It has an effect across the entire range of the signal. Consider, for example, what might happen to a high-level signal when dither is added. Say our range is that of a 16-bit signed integer, -32768 to +32767. COnsider a signal whose value is 0.67. Add a random +-1/2 LSB dither signal to that and truncate. the result is that roughly 2/3 of the time, the result will be 1, and 1/3 of the time, it will be 0. Average a large enough collection of those, and the result is ... 0.67. Now consider a signal whose original value is, oh, 29,355.67. Add a +-1/2 LSB random signal to that, and the effect will be that about 2/3 of the time, the value once truncated to an integer, will be 29,356, and 1/3 of the time it will truncate to 29,355. Look at the average, and the result is, ... 29,355.67. That's how dither works. The dither has not only affected tiny signals, it's affected the big ones. 
Now, truncate enough, and, yes, the EFFECTS will be less and less, simply because you've reduced the dynamic range enough. |
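The averaging argument above is easy to verify numerically. A quick sketch (a *rounding* quantizer is used here so the stated 2/3-vs-1/3 split comes out as described; with pure truncation the same averaging effect appears, just offset by half an LSB):

```python
import random

random.seed(1)

def dithered_average(x: float, trials: int = 200_000) -> float:
    """Quantize x with +/-0.5 LSB uniform dither and rounding,
    many times, then average: the sub-LSB part of x survives."""
    total = 0
    for _ in range(trials):
        total += round(x + random.uniform(-0.5, 0.5))
    return total / trials

print(dithered_average(0.67))        # close to 0.67
print(dithered_average(29355.67))    # close to 29355.67
```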
#22
Posted to rec.audio.tech
mathmatics behind mixing voice over music
"Karl Uppiano" writes:
[...] "Randy Yates" wrote in message After the quantization, the signal is just the signal, NOT the signal plus the dither. I totally get that. [...] followed by: Increasing the gain of a dithered signal will preserve the dither ... You have just created a contradiction. (but the noise floor will be increased also, just like analog). Reducing the gain will reduce the dither along with it, and right shifting an optimally dithered signal (triangular probability density at 1/3 LSB) will eliminate it completely. This is complete nonsense, as I've said, what, 3 times now that THERE IS NO DITHER IN THE SIGNAL. Please write on the board 100 times, "THERE IS NO DITHER IN THE SIGNAL. THERE IS NO DITHER IN THE SIGNAL. ..." Also, I don't know where you came up with, or what you even mean by, "triangular probability density at 1/3 LSB." A TPDF that has a probability density function ranging from -1 to +1 bit (or equivalently, with a variance of delta^2 / 4, where delta is the value of the LSB) is required to completely decorrelate the mean and variance (i.e., the first and second moments) of the quantization noise from the input signal. This is shown in Wannamaker's excellent thesis [wannamaker]. --Randy @article{wannamaker, title = "{The Theory of Dithered Quantization}", author = "{Robert~A.~Wannamaker}", journal = "Ph.D. Thesis University of Waterloo Applied Mathematics Department", year = "1997"} -- % Randy Yates % "With time with what you've learned, %% Fuquay-Varina, NC % they'll kiss the ground you walk %%% 919-577-9882 % upon." %%%% % '21st Century Man', *Time*, ELO http://home.earthlink.net/~yatescr |
#23
Posted to rec.audio.tech
mathmatics behind mixing voice over music
wrote in message ps.com... On Feb 11, 4:12 pm, "Karl Uppiano" wrote:

Neither is it gain that "takes out the dither," since the dither isn't there in the first place. [...] If there is more dither than optimally necessary, it might survive some gain reduction.

And this is where you're getting lost. Reducing the gain will NEVER "eliminate it completely," because "it" is not separate or separable from the signal itself. And the dither, even though it may be only a fraction of an LSB in the original signal, does not have an effect which is limited to that small portion of the signal. It has an effect across the entire range of the signal. [...] That's how dither works. The dither has not only affected tiny signals, it's affected the big ones. Now, truncate enough, and, yes, the EFFECTS will be less and less, simply because you've reduced the dynamic range enough.

I understand place value. But whenever a signal is re-quantized, it needs to be re-dithered as well, because the quantization introduces a whole new set of truncation and rounding errors that are not random, and are correlated with the signal itself. The original dither does not mitigate that. If the original random noise is large enough, the new errors may be negligible, even randomized somewhat. But now you have a noise level problem, which, though less severe than the correlated error signals, is still higher than you might want. I suspect you understand all that, but perhaps I was not making myself clear.
#24
Posted to rec.audio.tech
mathmatics behind mixing voice over music
"Ultrus" wrote...
I've looked for good audio software that I could connect to from scripts on a server, but have not had the success I desire.

Audacity is one of the more popular freeware audio editing/processing applications, and it is open-source.
#25
Posted to rec.audio.tech
mathmatics behind mixing voice over music
"Randy Yates" wrote in message ... "Karl Uppiano" writes: [...] "Randy Yates" wrote in message After the quantization, the signal is just the signal, NOT the signal plus the dither. I totally get that. [...] followed by: Increasing the gain of a dithered signal will preserve the dither ... You have just created a contradiction. (but the noise floor will be increased also, just like analog). Reducing the gain will reduce the dither along with it, and right shifting an optimally dithered signal (triangular probability density at 1/3 LSB) will eliminate it completely. This is complete nonsense, as I've said, what, 3 times now that THERE IS NO DITHER IN THE SIGNAL. Say it as many times as you want. There is no discrete representation for a sample that is part way between two quantization levels. The difference between the input and the encoded sample is noise. As the signal changes, the noise level changes, correlated with the input. That produces strong spectral lines (intermodulation and THD mostly) that are far more audible than random noise. By randomizing the input signal by exactly the right amount, you can spread the error energy over a wide spectrum, instead of concentrating it at a specific frequency. The quantizer still cannot represent the sample exactly, but since the signal is changing randomly, the quantization error is also random. But this randomization amounts to white noise, and it does add a bit to the noise floor, or more pointedly, it puts a noise floor where one belongs, but did not exist, in an un-dithered system. When done optimally, it is the same noise energy, but re-allocated to different frequencies. Implementations vary, but oversampling quantizers use noise shaping to add or subtract errors from adjacent samples -- a form of negative feedback, and thus de-correlate the errors. The basic physics -- and the end result -- is the same. Please write on the board 100 times, "THERE IS NO DITHER IN THE SIGNAL. THERE IS NO DITHER IN THE SIGNAL. 
..." Holy crap, dude. Get a grip. What I refer to as a "properly dithered signal" differs from the datastream that would have been created without dither. Dither added, errors re-allocated -- same thing. What I refer to as "preserving dither" is the same as not introducing new errors. Also, I don't know where you came up with, or what you even mean by, "triangular probability density at 1/3 LSB." A TPDF that has a probability density function ranging from -1 to +1 bit (or equivalently, with a variance of delta^2 / 4, where delta is the value of the LSB) is required to completely decorrelate the mean and variance (i.e., the first and second moments) of the quantization noise from the input signal. This is shown in Wannamaker's excellent thesis [wannamaker]. I read a different paper from you. That's how I came up with it. Van Der Kooy & Lipschitz from the early '90s. I cannot locate it now, but it was published in AES. --Randy @article{wannamaker, title = "{The Theory of Dithered Quantization}", author = "{Robert~A.~Wannamaker}", journal = "Ph.D. Thesis University of Waterloo Applied Mathematics Department", year = "1997"} -- % Randy Yates % "With time with what you've learned, %% Fuquay-Varina, NC % they'll kiss the ground you walk %%% 919-577-9882 % upon." %%%% % '21st Century Man', *Time*, ELO http://home.earthlink.net/~yatescr |
#26
Posted to rec.audio.tech
mathmatics behind mixing voice over music
"Karl Uppiano" writes:
"Randy Yates" wrote in message ... "Karl Uppiano" writes: [...] "Randy Yates" wrote in message After the quantization, the signal is just the signal, NOT the signal plus the dither. I totally get that. [...] followed by: Increasing the gain of a dithered signal will preserve the dither ... You have just created a contradiction. (but the noise floor will be increased also, just like analog). Reducing the gain will reduce the dither along with it, and right shifting an optimally dithered signal (triangular probability density at 1/3 LSB) will eliminate it completely. This is complete nonsense, as I've said, what, 3 times now that THERE IS NO DITHER IN THE SIGNAL. Say it as many times as you want. There is no discrete representation for a sample that is part way between two quantization levels. True, but that doesn't make what you've got "signal + dither" - it is just "signal." You see the *effects* of the dither that was added prior to the quantizer, but the dither itself is gone bye-bye. The difference between the input and the blah blah blah. Thanks, but I've already read up on the theory, and I take my theory from engineering papers and texts, not someone on usenet. But this randomization ... does add a bit to the noise floor ... When done optimally, it is the same noise energy Well which is it? Does it add energy (more correctly, power) or doesn't it? The answer is that nRPDF dither (see again Wannamaker for this nomenclature) adds n+1 times the noise power of the basic delta^2/12 quantizer. So rectangular (uniform) dither (1RPDF) adds 3 dB, 2RPDF == TPDF adds 4.77 dB, etc. Implementations vary, but oversampling quantizers use noise shaping to add or subtract errors from adjacent samples -- a form of negative feedback, and thus de-correlate the errors. The basic physics -- and the end result -- is the same. Huh???!?? When did we go from talking about dither to noise-shaping? Noise-shaping is definitely NOT the same as dither, even in the "end result." 
Who said anything about oversampling the signal??? Please write on the board 100 times, "THERE IS NO DITHER IN THE SIGNAL. THERE IS NO DITHER IN THE SIGNAL. ..." Holy crap, dude. Get a grip. What I refer to as a "properly dithered signal" differs from the datastream that would have been created without dither. Dither added, errors re-allocated -- same thing. What I refer to as "preserving dither" is the same as not introducing new errors. Also, I don't know where you came up with, or what you even mean by, "triangular probability density at 1/3 LSB." A TPDF that has a probability density function ranging from -1 to +1 bit (or equivalently, with a variance of delta^2 / 4, where delta is the value of the LSB) is required to completely decorrelate the mean and variance (i.e., the first and second moments) of the quantization noise from the input signal. This is shown in Wannamaker's excellent thesis [wannamaker]. I read a different paper from you. That's how I came up with it. Van Der Kooy & Lipschitz from the early '90s. I cannot locate it now, but it was published in AES. You mean @article{resolutionbelowlsb, title = "{Resolution Below the Least Significant Bit in Digital Systems with Dither}", author = "{John~Vanderkooy, Stanley~P.~Lip****z}", journal = "Journal of the Audio Engineering Society", year = "1984", month = "February"} ??? -- % Randy Yates % "Maybe one day I'll feel her cold embrace, %% Fuquay-Varina, NC % and kiss her interface, %%% 919-577-9882 % til then, I'll leave her alone." %%%% % 'Yours Truly, 2095', *Time*, ELO http://home.earthlink.net/~yatescr |
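The noise-power figures quoted above follow from a one-line calculation: nRPDF dither makes the total error power (n+1) times the bare quantizer's delta^2/12, so the penalty in dB is just 10*log10(n+1). A worked check:

```python
import math

def added_noise_db(n: int) -> float:
    """Total error power of an nRPDF-dithered quantizer relative to
    the undithered delta^2/12 quantizer, in dB: 10*log10(n + 1)."""
    return 10 * math.log10(n + 1)

print(round(added_noise_db(1), 2))  # 1RPDF (uniform): 3.01 dB
print(round(added_noise_db(2), 2))  # 2RPDF (TPDF):    4.77 dB
```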
#27
Posted to rec.audio.tech
mathmatics behind mixing voice over music
Audacity is one of the more popular freeware audio editing/processing applications and it is open-source.

Hello Richard, I'm a fan of Audacity. I contacted the developers a while back on this, and it seems like they're not that far yet. I can't use their software through the command line or a similar method. It's on my wish list, however.
#28
Posted to rec.audio.tech
mathmatics behind mixing voice over music
"Randy Yates" wrote in message ... "Karl Uppiano" writes: "Randy Yates" wrote in message ... "Karl Uppiano" writes: [...] "Randy Yates" wrote in message After the quantization, the signal is just the signal, NOT the signal plus the dither. I totally get that. [...] followed by: Increasing the gain of a dithered signal will preserve the dither ... You have just created a contradiction. (but the noise floor will be increased also, just like analog). Reducing the gain will reduce the dither along with it, and right shifting an optimally dithered signal (triangular probability density at 1/3 LSB) will eliminate it completely. This is complete nonsense, as I've said, what, 3 times now that THERE IS NO DITHER IN THE SIGNAL. Say it as many times as you want. There is no discrete representation for a sample that is part way between two quantization levels. True, but that doesn't make what you've got "signal + dither" - it is just "signal." You see the *effects* of the dither that was added prior to the quantizer, but the dither itself is gone bye-bye. Exactly. It was transformed. That was the "blah blah blah" part which you so blithely dismiss: The difference between the input and the blah blah blah. Thanks, but I've already read up on the theory, and I take my theory from engineering papers and texts, not someone on usenet. Well, that's a load off. I was afraid you were depending on me. But this randomization ... does add a bit to the noise floor ... When done optimally, it is the same noise energy Well which is it? Does it add energy (more correctly, power) or doesn't it? Of course it has to add a small amount of power -- the dither has to modulate the encoder even during complete silence so the signal doesn't kick up noise out of nothing. The answer is that nRPDF dither (see again Wannamaker for this nomenclature) adds n+1 times the noise power of the basic delta^2/12 quantizer. So rectangular (uniform) dither (1RPDF) adds 3 dB, 2RPDF == TPDF adds 4.77 dB, etc. 
I'm glad you have the documents handy. I'm working from memory. If I were going to go off and implement one of these things, I would most certainly refresh my memory before doing anything else. I haven't needed to in over 15 years, so that information is swapped out to disk or something. I did not feel compelled to spend days reading up to be all authoritative so I could respond to a question that did not require detailed dither design parameters in the first place. Implementations vary, but oversampling quantizers use noise shaping to add or subtract errors from adjacent samples -- a form of negative feedback, and thus de-correlate the errors. The basic physics -- and the end result -- is the same. Huh???!?? When did we go from talking about dither to noise-shaping? Noise-shaping is definitely NOT the same as dither, even in the "end result." Who said anything about oversampling the signal??? When did we do that? Well, I would estimate it happened at the very instant I typed it. Of course noise shaping is not the same as dither, but noise shaping definitely has an impact on how you implement dither in a particular design. Oversampling converters using noise shaping can do some clever implementations, both of which IIRC were described in the article I referenced, which *was* about dither. Please write on the board 100 times, "THERE IS NO DITHER IN THE SIGNAL. THERE IS NO DITHER IN THE SIGNAL. ..." Holy crap, dude. Get a grip. What I refer to as a "properly dithered signal" differs from the datastream that would have been created without dither. Dither added, errors re-allocated -- same thing. What I refer to as "preserving dither" is the same as not introducing new errors. Also, I don't know where you came up with, or what you even mean by, "triangular probability density at 1/3 LSB." 
A TPDF that has a probability density function ranging from -1 to +1 bit (or equivalently, with a variance of delta^2 / 4, where delta is the value of the LSB) is required to completely decorrelate the mean and variance (i.e., the first and second moments) of the quantization noise from the input signal. This is shown in Wannamaker's excellent thesis [wannamaker]. I read a different paper from you. That's how I came up with it. Van Der Kooy & Lipschitz from the early '90s. I cannot locate it now, but it was published in AES. You mean @article{resolutionbelowlsb, title = "{Resolution Below the Least Significant Bit in Digital Systems with Dither}", author = "{John~Vanderkooy, Stanley~P.~Lip****z}", journal = "Journal of the Audio Engineering Society", year = "1984", month = "February"} ??? That might have been it. There were a lot of articles going around at that time. It could have been a series. I thought it was from 1992 (e.g., http://www.aes.org/e-lib/browse.cfm?elib=7047 seems to ring a bell, and I seem to recall Wannamaker was credited as well). I had it in my stack of stuff, but it went missing when we moved to our new dumpster. -- % Randy Yates % "Maybe one day I'll feel her cold embrace, %% Fuquay-Varina, NC % and kiss her interface, %%% 919-577-9882 % til then, I'll leave her alone." %%%% % 'Yours Truly, 2095', *Time*, ELO http://home.earthlink.net/~yatescr |
#29
Posted to rec.audio.tech
mathmatics behind mixing voice over music
Audacity is one of the more popular freeware audio editing/processing applications and it is open-source.

Hello Richard, I'm a big fan of Audacity. I spoke with the developers about this topic. While it is in the works long term, there is currently no way for me to access the software from the command line or a related method. It is on my wish list, however.
#30
Posted to rec.audio.tech
mathmatics behind mixing voice over music
On Feb 11, 10:55 pm, "Ultrus" wrote:
Audacity is one of the more popular freeware audio editing/processing applications and it is open-source. Hello Richard, I'm a big fan of Audacity. I spoke with the developers about this topic. While it is in the works long term, there is currently not a way I can access the software from the command line or related method. It is on my wish list however.

Ahhh, oops. This went into a second page. Sorry for the repeated post.
#33
Posted to rec.audio.tech
mathmatics behind mixing voice over music
John Phillips writes:
[...] I think KU contends that if you have a signal, add dither at a level appropriate to a specific quantizer (call it quantizer X) and you scale the signal+dither to a lower level before quantizing with quantizer X, then this fails to eliminate quantization distortion.

No, I don't think that is what Mr. Uppiano is contending. Instead, he is contending that if you scale (gain) the output of quantizer X using a gain of 1/2 (or less), you lose the "dither." What I have been contending is that THERE IS NO DITHER IN THE OUTPUT OF THE QUANTIZER; thus if you scale it, you don't "lose" it, since you don't lose what you never had.

Think of it this way. Let's say you properly dither the input signal x[n] to quantizer A and the resulting output is y[n]. Then you want to gain y[n] down by a factor of 0.48147583 = 15777 / 32768 (just for example). The proper way to do this would be to first multiply y[n] by 15777. You then have a 32-bit number (assuming 16-bit signals). You then must requantize this value back to a 16-bit number, so again you dither (at bit position 16) and requantize (by taking the most-significant word). The result is that the original signal (y[n]) IS STILL PRESENT. This is because, by dithering the result of the gain, the signal components are preserved (albeit noisier), a la "resolution below the least significant bit with dither."

I do agree with dpierce that such a scaling does not completely eliminate the effects on the most significant bits of adding the dither. However, my prejudice is that by doing this scaling before quantizing, the amplitude of the dither relative to the dither amplitude requirement for quantizer X becomes insufficient to completely de-correlate the signal from the quantization error. So in this sense, although the dither cannot disappear completely through scaling to a lower level, it does, I think, become ineffective.

I have written software to successfully dither signals before re-quantization and I have verified that: (a) with TPDF dither at 2 LSB p-p WRT the quantizer, this works to completely eliminate quantization effects; and (b) with dither at a lower level, the quantization distortion is NOT completely eliminated. However, I have not actually tried to add the right level of dither, scale the signal to a lower level and then quantize. I will do so soon (tonight if I have the time) and see in practice what happens - it's only a few lines of code (!).

I can tell you right now that it won't work properly that way. The dither level must be matched to the quantizer step.

-- % Randy Yates % "My Shangri-la has gone away, fading like %% Fuquay-Varina, NC % the Beatles on 'Hey Jude'" %%% 919-577-9882 % %%%% % 'Shangri-La', *A New World Record*, ELO http://home.earthlink.net/~yatescr
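The 15777/32768 gain example in this post can be sketched directly. This illustration uses a rounding requantizer with TPDF dither in units of the output LSB (the shift count is chosen so the gain is exactly 15777/32768; the exact bit position is simplified from the description above):

```python
import random

random.seed(3)

def gain_and_requantize(y: int, num: int = 15777, den_bits: int = 15) -> int:
    """Gain a 16-bit sample by num/2**den_bits in integer arithmetic,
    then TPDF-dither and requantize back to 16 bits."""
    product = y * num                                          # widened intermediate value
    d = random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)  # TPDF, +/-1 output LSB
    return round(product / (1 << den_bits) + d)

# The dithered requantizer is unbiased: averaged over many runs,
# the output converges on the exact (sub-LSB) gained value.
exact = 20000 * 15777 / 32768
mean = sum(gain_and_requantize(20000) for _ in range(50_000)) / 50_000
print(exact, mean)
```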
#34
Posted to rec.audio.tech
mathmatics behind mixing voice over music
On 2007-02-12, Randy Yates wrote:
John Phillips writes: [...]

No, I don't think that is what Mr. Uppiano is contending. Instead, he is contending that if you scale (gain) the output of quantizer X using a gain of 1/2 (or less), you lose the "dither."

Ah, if so then I agree that after the quantizer the dither is no longer relevant/functional. All you have is a noisier quantized signal.

[...] I can tell you right now that it won't work properly that way. The dither level must be matched to the quantizer step.

That's what I believe.

-- John Phillips
#35
Posted to rec.audio.tech
mathmatics behind mixing voice over music
On 2007-02-12, Don Pearce wrote:
On 12 Feb 2007 10:29:26 GMT, John Phillips wrote: [...]

An interesting discussion. I am trying to follow the precise point of contention. I think KU contends that if you have a signal, add dither at a level appropriate to a specific quantizer (call it quantizer X) and you scale the signal+dither to a lower level before quantizing with quantizer X, then this fails to eliminate quantization distortion. I do agree with dpierce that such a scaling does not completely eliminate the effects on the most significant bits of adding the dither. However, my prejudice is that by doing this scaling before quantizing, the amplitude of the dither relative to the dither amplitude requirement for quantizer X becomes insufficient to completely de-correlate the signal from the quantization error. So in this sense, although the dither cannot disappear completely through scaling to a lower level, it does, I think, become ineffective.

I have written software to successfully dither signals before re-quantization and I have verified that: (a) with TPDF dither at 2 LSB p-p WRT the quantizer, this works to completely eliminate quantization effects; and (b) with dither at a lower level, the quantization distortion is NOT completely eliminated. However, I have not actually tried to add the right level of dither, scale the signal to a lower level and then quantize. I will do so soon (tonight if I have the time) and see in practice what happens - it's only a few lines of code (!).

When you say you scale the signal to a lower level, do you mean you reduce the amplitude of the entire signal, including the added dither? If so, the dither no longer functions, because it must cover +/- 1 LSB to de-quantize that step. That is why the dither is always an integral part of the A-to-D process - so it doesn't get affected by changes to the signal level.

Yes - that's how I understand it, from theory and practice.

Of course real-world signals will almost always carry their own noise at far higher level than any added dither signal, so quantization products are, in practice, rarely a problem. Any noise will de-correlate a quantized signal - it doesn't need to be specially added.

Often true, but after the first quantization (the A/D in the case of analogue signals) I think you still need to attend to the dithering if you need to suppress quantization products. (You don't always need to, of course.)

-- John Phillips
#36
Posted to rec.audio.tech
mathmatics behind mixing voice over music
On 12 Feb 2007 12:53:18 GMT, John Phillips
wrote:
On 2007-02-12, Don Pearce wrote:
On 12 Feb 2007 10:29:26 GMT, John Phillips wrote:
On 2007-02-11, wrote:
On Feb 11, 4:12 pm, "Karl Uppiano" wrote:

Neither is it gain that "takes out the dither," since the dither isn't there in the first place. Rather, gain, in general, and for either positive or negative (dB) values, increases the resolution and requires a requantization step to go back to less resolution. There are exceptions of course, e.g., when the gain is a power of two.

Increasing the gain of a dithered signal will preserve the dither along with its linearizing effects (but the noise floor will be increased also, just like analog). Reducing the gain will reduce the dither along with it, and right shifting an optimally dithered signal (triangular probability density at 1/3 LSB) will eliminate it completely. If there is more dither than optimally necessary, it might survive some gain reduction.

And this is where you're getting lost. Reducing the gain will NEVER "eliminate it completely," because "it" is not separate or separable from the signal itself. And the dither, even though it may be only a fraction of an LSB in the original signal, does not have an effect which is limited to that small portion of the signal. It has an effect across the entire range of the signal.

An interesting discussion. I am trying to follow the precise point of contention. I think KU contends that if you have a signal, add dither at a level appropriate to a specific quantizer (call it quantizer X), and you scale the signal+dither to a lower level before quantizing with quantizer X, then this fails to eliminate quantization distortion. I do agree with dpierce that such a scaling does not completely eliminate the effects on the most significant bits of adding the dither.

However my prejudice is that by doing this scaling before quantizing, the amplitude of the dither relative to the dither amplitude requirement for quantizer X becomes insufficient to completely de-correlate the signal from the quantization error. So in this sense, although the dither cannot disappear completely through scaling to a lower level, it does, I think, become ineffective.

I have written software to successfully dither signals before re-quantization and I have verified that: (a) with TPDF dither at 2 LSB p-p WRT the quantizer this works to completely eliminate quantization effects; and (b) with dither at a lower level the quantization distortion is NOT completely eliminated. However I have not actually tried to add the right level of dither, scale the signal to a lower level and then quantize. I will do so soon (tonight if I have the time) and see in practice what happens - it's only a few lines of code (!).

When you say you scale the signal to a lower level, do you mean you reduce the amplitude of the entire signal, including the added dither? If so, the dither no longer functions, because it must cover +/- 1 LSB to de-quantize that step. That is why the dither is always an integral part of the A-to-D process - so it doesn't get affected by changes to the signal level.

Yes - that's how I understand it, from theory and practice. Of course real-world signals will almost always carry their own noise at far higher level than any added dither signal, so quantization products are, in practice, rarely a problem.

Any noise will de-correlate a quantized signal - it doesn't need to be specially added.

Often true, but after the first quantization (the A/D in the case of analogue signals) I think you still need to attend to the dithering if you need to suppress quantization products. (You don't always need to, of course.)

I would hope that after the first quantization everything is done in high precision floating point so that no further dithering is needed until the final quantization to generate the CD file. Dithering is necessary whenever you change the amplitude of a quantized signal, so doing the maths on non-quantized FP numbers is far better.

d

--
Pearce Consulting
http://www.pearce.uk.com
#37 | Posted to rec.audio.tech | mathmatics behind mixing voice over music
"John Phillips" wrote in message ...
On 2007-02-12, Don Pearce wrote:

Of course real-world signals will almost always carry their own noise at far higher level than any added dither signal, so quantization products are, in practice, rarely a problem.

I strongly agree that the real-world analog signals we record will almost always carry their own noise, at far higher level than an optimal dither signal would have, at the point where that signal is initially digitized. Modest alterations of real-world signals will very often produce quantization errors that are small enough to be effectively decorrelated by the noise that is embedded in the signal.

Any noise will de-correlate a quantized signal - it doesn't need to be specially added.

"Any" is a very strong word. It is true that any noise signal containing a TPDF component sufficient to decorrelate the quantization errors generated by a given process will effectively decorrelate all of the quantization errors that the process generates. However, this is not a perfectly general solution for all kinds of noise and any amount of noise. It is not even a solution that is general for all practical cases.

Often true, but after the first quantization (the A/D in the case of analogue signals) I think you still need to attend to the dithering if you need to suppress quantization products. (You don't always need to, of course.)

Agreed.
#38 | Posted to rec.audio.tech | mathmatics behind mixing voice over music
On 2007-02-12, Don Pearce wrote:
On 12 Feb 2007 12:53:18 GMT, John Phillips wrote:

... after the first quantization (the A/D in the case of analogue signals) I think you still need to attend to the dithering if you need to suppress quantization products. (You don't always need to, of course.)

I would hope that after the first quantization everything is done in high precision floating point so that no further dithering is needed until the final quantization to generate the CD file.

In the code I use to generate test CDs for various purposes I use double precision (64-bit) floating point, so that I can do any maths required and not worry about anything except the final quantization into 16-bit representation.

Dithering is necessary whenever you change the amplitude of a quantized signal, so doing the maths on non-quantized FP numbers is far better.

Indeed, it's just so much simpler.

--
John Phillips
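That final float-to-16-bit step can be sketched as follows. This is a minimal illustration, assuming a 64-bit float pipeline with full scale at +/-1.0; the function name and the scaling convention are our own, not from the thread.

```python
import random

random.seed(0)

def to_int16_dithered(x):
    # Scale a float sample (full scale +/-1.0) onto the 16-bit grid,
    # add TPDF dither spanning 2 LSB peak-to-peak, then round and clip.
    scaled = x * 32768.0
    tpdf = random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)
    q = round(scaled + tpdf)
    return max(-32768, min(32767, q))

# A float sample sitting exactly between two 16-bit codes lands on either
# neighbour at random, so the sub-LSB value survives on average.
mid = 100.5 / 32768.0
codes = [to_int16_dithered(mid) for _ in range(10_000)]
print(sum(codes) / len(codes))  # averages very close to 100.5
```

Without the dither term, every one of those samples would round to the same code and the half-LSB of information would be lost, which is exactly why the dithering is deferred to this one place.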
#39 | Posted to rec.audio.tech | mathmatics behind mixing voice over music
"Randy Yates" wrote in message ...
John Phillips writes:

[...] I think KU contends that if you have a signal, add dither at a level appropriate to a specific quantizer (call it quantizer X), and you scale the signal+dither to a lower level before quantizing with quantizer X, then this fails to eliminate quantization distortion.

No, I don't think that is what Mr. Uppiano is contending. Instead, he is contending that if you scale (gain) the output of quantizer X using a gain of 1/2 (or less), you lose the "dither."

No, that is not what I am saying. I would like to set the record straight. What I have been contending is that THERE IS NO DITHER IN THE OUTPUT OF THE QUANTIZER, thus if you scale it, you don't "lose" it, since you don't lose what you never had. I think you already understand this; I just want to restate what I have been trying to say all along: viewed in the time domain, dither works by continuously and randomly modulating the quantizer at the LSB level such that the average quantizer output over time is proportional to the original analog value, even if the average input is between quantization levels. Viewed in the frequency domain, because of the non-linearities involved in discrete quantization, the signal and the dither combine in complex ways that depend on the proximity to a quantization level. So the spectrum of the quantizer output will not be the same as the dithered analog version, but since the dither is random, the new signal is randomized too. Just different.

I think your statement that THERE IS NO DITHER IN THE OUTPUT OF THE QUANTIZER is misleading, because the quantized output of a dithered input has a noise floor where none existed before -- instead of absolute silence except when a signal kicks up correlated noise -- and it is infinitesimally linear, with random uncertainty instead of stair-steps. Turning off dither changes the output in measurable ways, or there would be no point in using it.

I'm sorry if my wording seemed to imply that truncating the LSB would remove the dither. You can still see evidence of dither having been applied, but without the critical information encoded in the LSB, it does little or no good anymore.

Think of it this way. Let's say you properly dither the input signal x[n] to quantizer A and the resulting output is y[n]. Then you want to gain y[n] down by a factor of 0.48147583 = 15777 / 32768 (just for example). The proper way to do this would be to first multiply y[n] by 15777. You then have a 32-bit number (assuming 16-bit signals). You then must requantize this value back to a 16-bit number, so again you dither (at bit position 16) and requantize (by taking the most-significant word). The result is that the original signal (y[n]) IS STILL PRESENT. This is because, by dithering the result of the gain, the signal components are preserved (albeit noisier), a la "resolution below the least significant digit with dither."

Yes, because you never destroyed the effects of the original dither in the first place, and then you re-dithered prior to re-quantizing. No problem here.

I do agree with dpierce that such a scaling does not completely eliminate the effects on the most significant bits of adding the dither. However my prejudice is that by doing this scaling before quantizing, the amplitude of the dither relative to the dither amplitude requirement for quantizer X becomes insufficient to completely de-correlate the signal from the quantization error. So in this sense, although the dither cannot disappear completely through scaling to a lower level, it does, I think, become ineffective.

I have written software to successfully dither signals before re-quantization and I have verified that: (a) with TPDF dither at 2 LSB p-p WRT the quantizer this works to completely eliminate quantization effects; and (b) with dither at a lower level the quantization distortion is NOT completely eliminated. However I have not actually tried to add the right level of dither, scale the signal to a lower level and then quantize. I will do so soon (tonight if I have the time) and see in practice what happens - it's only a few lines of code (!).

I can tell you right now that it won't work properly that way. The dither level must be matched to the quantizer step.

--
% Randy Yates % "My Shangri-la has gone away, fading like
%% Fuquay-Varina, NC % the Beatles on 'Hey Jude'"
%%% 919-577-9882 %
%%%% % 'Shangri-La', *A New World Record*, ELO
http://home.earthlink.net/~yatescr
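The gain-then-requantize recipe described above (multiply a 16-bit sample by the Q15 gain 15777, dither at the discarded bit positions, take the upper word) can be sketched in integer arithmetic. This is our own Python illustration; the constants follow the 15777/32768 example in the post, and the dither-then-shift construction is a standard equivalent of dithered rounding, not code from the thread.

```python
import random

random.seed(1)

GAIN = 15777  # the post's example gain: 15777 / 32768 = 0.48147583...

def gain_and_requantize(y):
    # y is a 16-bit sample. Multiplying by the Q15 gain gives a ~31-bit
    # intermediate whose bottom 15 bits must be requantized away.
    acc = y * GAIN
    half = 1 << 14  # rounding offset: half an output LSB
    # Zero-mean TPDF dither spanning 2 output LSBs, applied at the bits
    # that the final shift discards.
    tpdf = random.randrange(1 << 15) + random.randrange(1 << 15) - (1 << 15) + 1
    return (acc + half + tpdf) >> 15  # dithered rounding back to 16 bits

# The sub-LSB part of the product survives in the average output:
outs = [gain_and_requantize(10000) for _ in range(20_000)]
print(sum(outs) / len(outs))  # averages near 10000 * 15777 / 32768 = 4814.75...
```

Each individual output is one of the two codes bracketing the exact product, but the time average recovers the fractional value, which is the "resolution below the least significant digit" the post refers to.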
#40 | Posted to rec.audio.tech | mathmatics behind mixing voice over music
"Karl Uppiano" writes:
You can still see evidence of dither having been applied, but without the critical information encoded in the LSB, it does little or no good anymore.

I think I see what you're trying to say, but you're wrong. The fallacy in your thinking can be exposed by examining a real scenario using actual implementation details, e.g., assuming two's complement arithmetic.

Consider the following thought experiment: we digitize a sine wave with a peak-to-peak amplitude of 1/8 LSB and a DC offset of -1/4 LSB, using dither, into a 16-bit signed, two's complement digital signal. Let's also assume our analog sine wave is noiseless, just for heuristic purposes. So if we didn't have dither, the digital signal would just be zero. With dither, we get a very noisy sine wave. Put another way, assuming the dither is less than or equal to 1 LSB peak-to-peak, the digital signal resulting from the dither will be bouncing between 0 and -1, which in 16-bit signed two's complement is 0000h and FFFFh (hexadecimal).

So if we hack off the LSB, guess what? We've still got a whole lotta variation in the signal. We didn't "lose" the effect of the dither completely - we still have 15 bits banging back and forth. The bottom line is that you can't view the dither as just wiggling the LSB. It's simply not the case.

To return to our topic, the proper way to view the situation is that the signal is, well, the signal. It is a (e.g.) 16-bit word in which each bit is significant. And when we multiply that 16-bit word by another 16-bit word, for example when performing a gaining operation, the result MUST be 32 bits in length in order to simultaneously maintain precision and avoid overflow. So when we convert this 32-bit result back to 16 bits, we necessarily REQUANTIZE it. And again, just as in any quantization step, we must use some form of linearization (e.g., dither, dither with noise-shaping, etc.) if we want to maintain "resolution below the least-significant bit."

--
% Randy Yates % "How's life on earth?
%% Fuquay-Varina, NC % ... What is it worth?"
%%% 919-577-9882 % 'Mission (A World Record)',
%%%% % *A New World Record*, ELO
http://home.earthlink.net/~yatescr
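The thought experiment above is easy to simulate. In the Python sketch below (the signal period and the rectangular dither are our simplifications; the post allows any dither up to 1 LSB peak-to-peak), the dithered output bounces between 0 and -1, and discarding the LSB still leaves the upper 15 bits toggling:

```python
import math
import random

random.seed(3)

N = 10_000
# Sine at 1/8 LSB peak-to-peak (amplitude 1/16 LSB), DC offset of -1/4 LSB,
# all expressed in units of one LSB.
x = [-0.25 + (1 / 16) * math.sin(2 * math.pi * n / 100) for n in range(N)]

# Rectangular dither, 1 LSB peak-to-peak, then quantize to the integer grid.
q = [round(v + random.uniform(-0.5, 0.5)) for v in x]

print(sorted(set(q)))  # [-1, 0]: the output toggles between FFFFh and 0000h

# -1 is 0xFFFF in 16-bit two's complement, so every one of the 16 bits
# differs from 0x0000. Drop the LSB and 15 bits still bang back and forth.
upper15 = sorted({(v & 0xFFFF) >> 1 for v in q})
print([hex(u) for u in upper15])  # ['0x0', '0x7fff']
```

The density of -1 versus 0 codes also tracks the sine, so the sub-LSB tone is recoverable by averaging even after the LSB is discarded.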