#1   Posted to rec.audio.tech
Ultrus
mathematics behind mixing voice over music

Hello,
I was discussing this topic in another area and was pointed in this
direction. My other post can be seen here:
http://groups.google.com/group/rec.a...ead/thread/a8b...

What kind of math are you using to mix values from one audio file with
another? In the post above, one suggestion was to add the values of two
or more audio samples and use their average as the resulting value
(excuse my lack of terminology). In another forum, I was told this was
not a good idea; instead, I was to strictly add the values (but this
could cause clipping). The main point I understand now is that mixing
is an art, not just a science.

As I am attempting to automate the process of combining voice over
music from my server (for fast podcast polishing), I need to science
up. What are your thoughts on the following brainstorms?

Option 1
Sample A is the voice. Sample B is the music, pre-adjusted to a
slightly softer volume. For the length of the voice content, values
from sample A are added to sample B. If clipping occurs when adding
two values together, the excess is subtracted from the music value
before it is added to the voice value.
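To make Option 1 concrete, here is a rough sketch of what I mean (Python just for illustration; the sample values are made up, and the clip limits are the usual 16-bit range):

```python
# Option 1 sketch: the voice has priority. When voice + music would
# clip, the overshoot is subtracted from the music value first.
CLIP_MAX, CLIP_MIN = 32767, -32768  # 16-bit sample limits

def mix_option1(voice, music):
    out = []
    for v, m in zip(voice, music):
        s = v + m
        if s > CLIP_MAX:
            m -= s - CLIP_MAX  # remove the positive overshoot from the music
        elif s < CLIP_MIN:
            m -= s - CLIP_MIN  # remove the negative overshoot from the music
        out.append(v + m)
    return out

print(mix_option1([30000, 100], [5000, 200]))  # [32767, 300]
```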

One problem I can think of using this method is that the resulting
music volume will jump around when the voice picks up. Would this be
noticeable or an issue?

Option 2
Both samples A and B are analyzed, and the highest value in each is
returned. As a result, every value in the music sample is raised or
lowered based on ((highest A + highest B) - clip value).

A problem I see here is if the voice clipped at some point, there
would be no music through the whole thing!

Option 3
Perhaps there is a way to smooth out Option 1. Let's say the voice
clipped somewhere and the music momentarily disappears. The music
would then fade back gradually to its normal level. I'm not quite sure
yet how to solve this one, but some late-night mocha and loud music
may resolve it.

The main thing here is giving voice priority over music, or one audio
sample over the other. I wonder what I would do to get both samples
equal priority when mixing. Average?

I appreciate your feedback on this.

#2   Posted to rec.audio.tech
Karl Uppiano
mathematics behind mixing voice over music


"Ultrus" wrote in message
ups.com...
Hello,
I was discussing this topic in another area and was pointed in this
direction. My other post can be seen here:
http://groups.google.com/group/rec.a...ead/thread/a8b...


You should use simple addition of the data streams from each source. If you
sum two signals of the same amplitude in phase, you will get a 6dB increase
in the output (doubling, e.g., 1 + 1 = 2).

This can cause clipping if you don't make adjustments for that: You can
reduce both inputs by 6 dB (e.g., 0.5 + 0.5 = 1), or you can allocate one
additional bit in the summing register (each additional bit doubles the
dynamic range). After you sum the bits, you can right shift the summing
register to restore the result to the original bit lanes (which reduces the
output by 6dB).

Similarly, if you are summing multiple channels, you need to account for the
build-up from all of the channels. With some software configuration, DSP
chips can do all this automatically for you. Their summing registers are
usually much wider than the audio data busses, and they can be programmed to
shift the bits back into the desired bus lanes in real time, after summing.
If you are doing this in a general-purpose computer, you might just use
integer arithmetic (32 or 64 bits), and right shift the result by 1 bit.
Then, cast to a short (16 bits) for output to wav format.
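For example, a minimal sketch of that sum-and-shift in Python (illustrative only; the sample arrays are made up):

```python
# Sum two 16-bit streams sample by sample in wide integers, then
# right-shift one bit (a 6 dB reduction) so the result fits back
# into the 16-bit range.
def mix_sum_shift(a, b):
    return [(x + y) >> 1 for x, y in zip(a, b)]

print(mix_sum_shift([1000, 32767, -32768], [500, 32767, -32768]))
# [750, 32767, -32768]
```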


#3   Posted to rec.audio.tech
Randy Yates
mathematics behind mixing voice over music

"Ultrus" writes:
[...]


Ultrus,

Believe it or not, this type of operation, while seemingly simple, is
actually quite involved, as I think you're finding out. To go into all
the "science" behind some of the options, even superficially, would
require a significant amount of time and writing.

If you don't care about the absolute highest audio quality, you can
simply sum the two channels using a 32-bit integer, then right-shift
the result one bit and store the least-significant word back to a
16-bit integer. Of course, this assumes 16-bit samples. This
will guarantee that you'll never clip, and will probably sound
just fine for your application.

Sorry I can't give you any better news, but I've spent my life
studying these types of things (and other electrical engineering
topics), and it is not really reasonable to expect someone to
explain them in one or two usenet posts.
--
% Randy Yates % "With time with what you've learned,
%% Fuquay-Varina, NC % they'll kiss the ground you walk
%%% 919-577-9882 % upon."
%%%% % '21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr
#4   Posted to rec.audio.tech
Ultrus
mathematics behind mixing voice over music

Hello Karl,
Thanks for your feedback on this. It helped clarify several items!

#5   Posted to rec.audio.tech
Ultrus
mathematics behind mixing voice over music

Sorry I can't give you any better news, but I've spent my life
studying these types of things (and other electrical engineering
topics), and it is not really reasonable to expect someone to
attempt to try to explain them in one or two usenet posts.


Hello Randy,
Thanks for your feedback. I agree that the topic I'm looking into is
much more complex than I originally anticipated. As maximum quality is
not my concern, the simple techniques discussed here will work great.
In the future, I will look into "ducking" (a new term I learned this
morning). It would be great to give the voice priority over any music
that tries to compete.




#6   Posted to rec.audio.tech
Richard Crowley
mathematics behind mixing voice over music

"Karl Uppiano" wrote ...
"Ultrus" wrote ...
Hello,
I was discussing this topic in another area and was pointed in this
direction. My other post can be seen here:
http://groups.google.com/group/rec.a...ead/thread/a8b...


You should use simple addition of the data streams from each source.
If you sum two signals of the same amplitude in phase, you will get a
6dB increase in the output (doubling, e.g., 1 + 1 = 2).

This can cause clipping if you don't make adjustments for that: You
can reduce both inputs by 6 dB (e.g., 0.5 + 0.5 = 1), or you can
allocate one additional bit in the summing register (each additional
bit doubles the dynamic range). After you sum the bits, you can right
shift the summing register to restore the result to the original bit
lanes (which reduces the output by 6dB).


Of course, note that shifting the binary value right by one
bit is the very definition of division by 2. :-)

#7   Posted to rec.audio.tech
Karl Uppiano
mathematics behind mixing voice over music


"Randy Yates" wrote in message
...
"Ultrus" writes:
[...]


Ultrus,

Believe it or not, this type of operation, while seemingly simple, is
actually quite involved, as I think you're finding out. To go into all
the "science" behind some of the options, even superficially, would
require a significant amount of time and writing.

If you don't care about the absolute highest audio quality, you can
simply sum the two channels using a 32-bit integer, then right-shift
the result one bit and store the least-significant word back to a
16-bit integer. Of course this is assuming 16-bit samples. This
will guarantee that you'll never clip, and will probably sound
just fine for your application.


I decided not to discuss dithering in my earlier post, but right-shifting
the data will delete any dither from the original data streams. Ideally, it
would need to be added back in.


#8   Posted to rec.audio.tech
Karl Uppiano
mathematics behind mixing voice over music


"Ultrus" wrote in message
ups.com...
Hello Karl,
Thanks for your feedback on this. It helped clarify several items!


I forgot to mention: be sure to use signed integers, and do not use floating
point! Depending on the computer language you use, floating point arithmetic
might be the default, and you might have to go through some gymnastics to
enforce pure integer arithmetic.

Although floating point gives the illusion of more dynamic range and more
precision, it is actually not appropriate for most audio DSP applications.


#9   Posted to rec.audio.tech
Serge Auckland
mathematics behind mixing voice over music

Karl Uppiano wrote:
"Ultrus" wrote in message
ups.com...
Hello Karl,
Thanks for your feedback on this. It helped clarify several items!


I forgot to mention: be sure to use signed integers, and do not use floating
point! Depending on the computer language you use, floating point arithmetic
might be the default, and you might have to go through some gymnastics to
enforce pure integer arithmetic.

Although floating point gives the illusion of more dynamic range and more
precision, it is actually not appropriate for most audio DSP applications.



That's an interesting comment: the digital broadcast mixers that I'm
familiar with all used to be fixed point, but more and more are going
over to floating point with newer DSP implementations. Any reason why?
Anything to do with the ubiquity of SHARC processors?

S.
#10   Posted to rec.audio.tech
Karl Uppiano
mathematics behind mixing voice over music


"Serge Auckland" wrote in message
...
Karl Uppiano wrote:
"Ultrus" wrote in message
ups.com...
Hello Karl,
Thanks for your feedback on this. It helped clarify several items!


I forgot to mention: be sure to use signed integers, and do not use
floating point! Depending on the computer language you use, floating
point arithmetic might be the default, and you might have to go through
some gymnastics to enforce pure integer arithmetic.

Although floating point gives the illusion of more dynamic range and more
precision, it is actually not appropriate for most audio DSP
applications.


That's an interesting comment: the digital broadcast mixers that I'm
familiar with all used to be fixed point, but more and more are going over
to floating point with newer DSP implementations. Any reason why? Anything
to do with the ubiquity of SHARC processors?

S.


Well, IEEE 754 single-precision floating point numbers are represented in 32
bits, with a 23-bit mantissa, 8-bit exponent and a sign bit, so you could
conceivably use these for studio-quality DAW applications (24-bit integers,
including the sign bit, are generally used for "studio quality"
applications). All else being equal, if you're going to use 32 bits anyway,
I guess single-precision floating point might be a more flexible
representation.

Having said that, I am not convinced that there is a need for the exponent
in digital audio applications, especially since you never get more than 24
bits of resolution (144 dB) anyway. Perhaps the advanced math packages are
more readily available for floating point. Floating point is more
compute-intensive, which could present a problem for real-time processing.
Processors keep getting faster though...




#11   Posted to rec.audio.tech
Randy Yates
mathematics behind mixing voice over music

"Karl Uppiano" writes:

"Randy Yates" wrote in message
...
"Ultrus" writes:
[...]


Ultrus,

Believe it or not, this type of operation, while seemingly simple, is
actually quite involved, as I think you're finding out. To go into all
the "science" behind some of the options, even superficially, would
require a significant amount of time and writing.

If you don't care about the absolute highest audio quality, you can
simply sum the two channels using a 32-bit integer, then right-shift
the result one-bit and store the least-signficant word back to a
16-bit integer. Of course this is assuming 16-bit samples. This
will guarantee that you'll never clip, and will probably sound
just fine for your application.


I decided not to discuss dithering in my earlier post, but right-shifting
the data will delete any dither from the original data streams. Ideally, it
would need to be added back in.


Hi Karl,

If you added it back in, you'd again run the risk of overflow. I think
you're mis-stating the situation. You really don't need or want to
"add back in the original dither." Instead, the real goal is to
"requantize" the sum "nicely" back to the original bit-width.

In general, the sum of two N-bit values produces an N+1-bit result.
Forcing those N+1 bits back into an N-bit word requires some type
of (re)quantization no matter how you approach it. This is the key
issue.

There are two methods of requantization:

1. Truncating (what we have both suggested to Ultrus).
2. Rounding (better - the resulting error has zero-mean).

In addition, there are architectures that improve on basic
requantization in various ways:

1. Dithering followed by requantization.
2. Noise-shaping by placing feedback around the quantizer.

There are almost endless possibilities in how the
noise-shaping is performed:

a. Simple zeros at DC.
b. Zeros at psychoacoustically significant places.
c. Higher-order filtering with psychoacoustic optimizations.

3. Noise-shaping and dithering.

These all produce results of varying quality.
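To illustrate the two basic methods, a small sketch (my own example) of requantizing a sum back down by one bit:

```python
# Requantize by one bit: truncation vs. rounding.
def requant_truncate(s):
    return s >> 1        # arithmetic shift = floor; the error is biased

def requant_round(s):
    return (s + 1) >> 1  # rounding; the resulting error has zero mean

print(requant_truncate(5), requant_round(5))    # 2 3
print(requant_truncate(-5), requant_round(-5))  # -3 -2
```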
--
% Randy Yates % "Though you ride on the wheels of tomorrow,
%% Fuquay-Varina, NC % you still wander the fields of your
%%% 919-577-9882 % sorrow."
%%%% % '21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr
#12   Posted to rec.audio.tech
Karl Uppiano
mathematics behind mixing voice over music


"Randy Yates" wrote in message
...
"Karl Uppiano" writes:

"Randy Yates" wrote in message
...
"Ultrus" writes:
[...]

Ultrus,

Believe it or not, this type of operation, while seemingly simple, is
actually quite involved, as I think you're finding out. To go into all
the "science" behind some of the options, even superficially, would
require a significant amount of time and writing.

If you don't care about the absolute highest audio quality, you can
simply sum the two channels using a 32-bit integer, then right-shift
the result one-bit and store the least-signficant word back to a
16-bit integer. Of course this is assuming 16-bit samples. This
will guarantee that you'll never clip, and will probably sound
just fine for your application.


I decided not to discuss dithering in my earlier post, but right-shifting
the data will delete any dither from the original data streams. Ideally, it
would need to be added back in.


Hi Karl,

If you added it back in, you'd again run the risk of overflow. I think
you're mis-stating the situation. You really don't need or want to
"add back in the original dither." Instead, the real goal is to
"requantize" the sum "nicely" back to the original bit-width.

In general, the sum of two N-bit values produces an N+1-bit result.
Forcing those N+1 bits back into an N-bit word requires some type
of (re)quantization no matter how you approach it. This is the key
issue.

There are two methods of requantization:

1. Truncating (what we have both suggested to Ultrus).
2. Rounding (better - the resulting error has zero-mean).

In addition, there are architectures that improve on basic
requantization in various ways:

1. Dithering followed by requantization.
2. Noise-shaping by placing feedback around the quantizer.

There are almost endless possibilities in how the
noise-shaping is performed:

a. Simple zeros at DC.
b. Zeros at psychoacoustically significant places.
c. Higher-order filtering with psychoacoustic optimizations.

3. Noise-shaping and dithering.

These all produce results of varying quality.


When I say "adding back in" I really meant re-dithering by some means,
although since dither typically has a triangular probability density of 1/3
LSB, as a practical matter, summing it in is unlikely to overflow any
registers.

The methods for generating/applying dither could be done with the
noise-shaping technique, as you mention. Vanderkooy and Lipshitz
described an algorithm like that in an article from the late 1980s, I
think.

It also occurred to me that if the OP were to sum two properly dithered
signals without changing their original amplitude, the dither would involve
the two LSBs, and no re-dithering would be necessary at all. It would remain
in the new LSB after the right shift. If there was any gain reduction in
either channel prior to summing, then it would be advisable to re-dither.


#13   Posted to rec.audio.tech
John Phillips
mathematics behind mixing voice over music

On 2007-02-10, Ultrus wrote:
What kind of math are you using to mix values from one audio file with
another? In the post above, one suggestion was to add the values of 2
or more audio samples, and find the average of them and use the
resulting value for each value set (excuse my lack of terminology). In
another forum, I was told this was not a good idea. Instead, I was
strictly to add values (but this could cause clipping). The main point
I understand now is mixing is an art, not just a science.


If you are mixing two signals then, as has been pointed out, the basic
mathematics is just addition:

output = I1 + I2 [1]

Yes that will clip if the sum of the inputs is too big to be represented
by the mixer's output word length. Of course, the same happens on an
analogue mixer due to its finite power supply voltage rail. The typical
solution is the same - scale the inputs to a lower level:

output = a * I1 + b * I2 [2]

Where a and b are constants in the range 0.000... to 1.000... (and I am
talking radix 2 numbers here so the level below 1.000... is 0.111...).
The products above are the same as having level controls on analogue
inputs.

Some people in this thread have suggested a = b = 0.1000 (0.5 in decimal)
which will always avoid clipping for a two-signal mixer, but does not
allow you to balance the two levels appropriately if required.

However whenever you scale a signal in audio processing (i.e. multiply it)
you will, in general, get word-length growth at the bottom (LSB) end.
If you truncate or round each individual operation back to some lower
level of precision you will get non-linear quantization distortion.
So either you must dither each product before truncation or keep enough
bits for the quantization distortion not to matter. Ideally keep all
bits (note that a 16-bit word times an 8-bit word will, in general,
have 16+8 = 24 bits in the product, etc.).

However, even if you can avoid dithering at intermediate stages, you do
need to dither the final output of the mixer if you want it to be reduced
to the same word length as the inputs and be free from (non-linear)
quantization products.

A way to approach a particular problem like mixing two signals of 16-bit
length (as an example) is to use 24-bit fixed-length signed arithmetic.

Add 8 zeros to the bottom of the input words before you start to process
them as 24-bit words and you can do a reasonable amount of simple
rounded/truncated but undithered arithmetic, like [2] above, with little
risk of overflow (if you scale the inputs suitably) and little risk of
audible quantization distortion.

Then dither the 24-bit result by generating a dither signal of the
appropriate type and resolution, and adding it to the 24-bit processed
output before finally truncating it to 16 bits.

For very simple operations you may not need to use as many as 8 extra
bits and for more complex processing it may be better to use more.
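Putting the recipe together, a rough sketch (Python for illustration; the 0.5 gains and the one-LSB TPDF dither amplitude are assumptions, not prescriptions):

```python
import random

# Promote 16-bit inputs to a 24-bit working length (8 extra LSBs),
# mix with gains a and b as in [2], add TPDF dither spanning one
# 16-bit LSB (= 256 in 24-bit units), then truncate back to 16 bits.
def mix_24bit_dithered(i1, i2, a=0.5, b=0.5):
    out = []
    for x, y in zip(i1, i2):
        acc = int(a * (x << 8)) + int(b * (y << 8))  # 24-bit domain
        acc += random.randint(0, 255) + random.randint(0, 255) - 255  # TPDF
        out.append(max(-32768, min(32767, acc >> 8)))  # final truncation
    return out
```

With a = b = 0.5 the undithered sum can never leave the 16-bit range, so the clamp only guards against the dither itself.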

--
John Phillips
#14   Posted to rec.audio.tech
Arny Krueger
mathematics behind mixing voice over music

"Karl Uppiano" wrote in message
news:P1nzh.6928$Yn4.5950@trnddc03
"Randy Yates" wrote in message
...
"Ultrus" writes:
[...]


Ultrus,

Believe it or not, this type of operation, while
seemingly simple, is actually quite involved, as I think
you're finding out. To go into all the "science" behind
some of the options, even superficially, would require a
significant amount of time and writing. If you don't care about the
absolute highest audio
quality, you can simply sum the two channels using a
32-bit integer, then right-shift the result one-bit and
store the least-signficant word back to a 16-bit
integer. Of course this is assuming 16-bit samples. This will guarantee
that you'll never clip, and will probably
sound just fine for your application.


I decided not to discuss dithering in my earlier post,
but right-shifting the data will delete any dither from
the original data streams.


... as will just about any means for obtaining significant amounts of gain
reduction.

Ideally, it would need to be added back in.


Agreed.


#15   Posted to rec.audio.tech
Karl Uppiano
mathematics behind mixing voice over music


"Arny Krueger" wrote in message
...
"Karl Uppiano" wrote in message
news:P1nzh.6928$Yn4.5950@trnddc03
"Randy Yates" wrote in message
...
"Ultrus" writes:
[...]

Ultrus,

Believe it or not, this type of operation, while
seemingly simple, is actually quite involved, as I think
you're finding out. To go into all the "science" behind
some of the options, even superficially, would require a
significant amount of time and writing. If you don't care about the
absolute highest audio
quality, you can simply sum the two channels using a
32-bit integer, then right-shift the result one-bit and
store the least-signficant word back to a 16-bit
integer. Of course this is assuming 16-bit samples. This will guarantee
that you'll never clip, and will probably
sound just fine for your application.


I decided not to discuss dithering in my earlier post,
but right-shifting the data will delete any dither from
the original data streams.


... as will just about any means for obtaining significant amounts of gain
reduction.

Ideally, it would need to be added back in.


Agreed.


I actually doubt that the OP's application is critical enough to require
dither, but it is an interesting discussion. I wonder why he doesn't just
use off the shelf (free) software to perform this operation. Even the free
stuff has lots more features than I usually use. I don't know if they bother
to re-dither or not. I am using some free DAW software to transcribe my old
vinyl library to digital, and the vinyl provides *way* more dither than I
actually need anyway :-)




#16   Posted to rec.audio.tech
Ultrus
mathematics behind mixing voice over music

I actually doubt that the OP's application is critical enough to require
dither, but it is an interesting discussion.


Yes, this is interesting, and I'm trying to take everything in, but I'm
following the "Keep It Simple, Stupid" method while scripting.

I wonder why he doesn't just
use off the shelf (free) software to perform this operation. Even the free
stuff has lots more features than I usually use.


I'm editing audio from the source using scripts that rip apart 16-bit
audio files, giving me an array of values ranging from -32768 to
32767. I play with those values, messing them up in limitless ways,
then convert the array back into an audio file. I've looked for good
audio software that I could connect to from scripts on a server, but
have not had the success I desire.

Thanks to the great info above, I should be able to mix my full-volume
voice with soft music in the background. To take it a step further,
let's say my music volume was a little louder, and I want it to "duck"
when it competes with the voice. Do you think the following formula
would work for each "word"/array value of the voice sample and music
sample?

in English:

if voice value plus music value is greater than clip value, then
new value equals voice value plus music value minus clip value plus
voice value
otherwise if voice value plus music value is less than negative clip
value, then
new value equals voice value plus music value minus negative clip
value plus voice value
otherwise
new value equals voice value plus music value

how I would write it in php:

if ($voiceValue + $musicValue > $clipValue) {
    $newValue = $voiceValue + $musicValue - $clipValue + $voiceValue;
} else if ($voiceValue + $musicValue < -$clipValue) {
    $newValue = $voiceValue + $musicValue - -$clipValue + $voiceValue;
} else {
    $newValue = $voiceValue + $musicValue;
}

Thoughts on this? I did not increase the word length and drop it back
down while mixing. I just subtracted any music that was interfering
with the voice.

#17   Posted to rec.audio.tech
Randy Yates
mathematics behind mixing voice over music

"Arny Krueger" writes:

"Karl Uppiano" wrote in message
news:P1nzh.6928$Yn4.5950@trnddc03
"Randy Yates" wrote in message
...
"Ultrus" writes:
[...]

Ultrus,

Believe it or not, this type of operation, while
seemingly simple, is actually quite involved, as I think
you're finding out. To go into all the "science" behind
some of the options, even superficially, would require a
significant amount of time and writing. If you don't care about the
absolute highest audio
quality, you can simply sum the two channels using a
32-bit integer, then right-shift the result one-bit and
store the least-signficant word back to a 16-bit
integer. Of course this is assuming 16-bit samples. This will guarantee
that you'll never clip, and will probably
sound just fine for your application.


I decided not to discuss dithering in my earlier post,
but right-shifting the data will delete any dither from
the original data streams.


... as will just about any means for obtaining significant amounts of gain
reduction.

Ideally, it would need to be added back in.


Agreed.


You're both confused in several respects.

Dither isn't some companion to the signal. It is something that is
added to the signal prior to a quantizer in order to linearize the
quantization. After the quantization, the signal is just the signal,
NOT the signal plus the dither.

Neither is it gain that "takes out the dither," since the dither isn't
there in the first place. Rather, gain, in general, and for either
positive or negative (dB) values, increases the resolution and
requires a requantization step to go back to less resolution. There
are exceptions of course, e.g., when the gain is a power of two.

Finally, the entire operation of shifting right one bit and taking
the least-significant word is one of requantization, not gain.
--
% Randy Yates % "I met someone who looks alot like you,
%% Fuquay-Varina, NC % she does the things you do,
%%% 919-577-9882 % but she is an IBM."
%%%% % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr
#18   Posted to rec.audio.tech
Karl Uppiano
mathematics behind mixing voice over music


"Ultrus" wrote in message
ups.com...

Thanks to great info from input above, I should be able to mix my full
volume voice with soft music in the background. To take it a step
further, let's say my music volume was a little louder, and I want it
to "duck" when it competes with the vocie. Would you think the
following formula would work for each "word"/array value of voice
sample and music sample?

in English:

if voice value plus music value is greater than clip value, then
new value equals voice value plus music value minus clip value plus
voice value
otherwise if voice value plus music value is less than negative clip
value, then
new value equals voice value plus music value minus negative clip
value plus voice value
otherwise
new value equals voice value plus music value

how I would write it in php:

if ($voiceValue + $musicValue > $clipValue) {
    $newValue = $voiceValue + $musicValue - $clipValue + $voiceValue;
} else if ($voiceValue + $musicValue < -$clipValue) {
    $newValue = $voiceValue + $musicValue - -$clipValue + $voiceValue;
} else {
    $newValue = $voiceValue + $musicValue;
}

Thoughts on this? I did not increase the word length and drop it back
down while mixing. I just subtracted any music that was interfering
with the voice.


Ducking the music is something that audio engineers used to do manually
(with a gain control on analog mixers) or automatically, using compressors
and such.

In the digital realm, it is done more or less as you describe, although
you need to fade the music in and out: you can't abruptly change the
gain without making clicks and other noises.

This means you probably need to buffer a few seconds of audio, so you can
"look ahead" to see if clipping would occur, and then fade the music by a
rate that allows you to reach your target level at the right time.
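A bare-bones sketch of such a ramped duck (Python; the threshold, depth, and per-sample fade rate are made-up parameters, and it omits the look-ahead buffering just described):

```python
# Duck the music under the voice: ramp the music gain toward a
# reduced level while the voice is loud, and back toward 1.0 when
# it is quiet, so the gain never jumps abruptly.
def duck(voice, music, threshold=2000, ducked=0.3, step=0.001):
    gain, out = 1.0, []
    for v, m in zip(voice, music):
        target = ducked if abs(v) > threshold else 1.0
        if gain < target:
            gain = min(target, gain + step)  # fade the music back in
        else:
            gain = max(target, gain - step)  # fade the music down
        out.append(v + int(gain * m))
    return out
```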

It's always something, isn't it? :-)


#19   Posted to rec.audio.tech
Ultrus
mathematics behind mixing voice over music

It's always something, isn't it? :-)

Yes, the challenges bring me joy.

The fading makes sense. I'm off to think on this for a bit.

#20   Posted to rec.audio.tech
Karl Uppiano
mathematics behind mixing voice over music


"Randy Yates" wrote in message
...
"Arny Krueger" writes:

"Karl Uppiano" wrote in message
news:P1nzh.6928$Yn4.5950@trnddc03
"Randy Yates" wrote in message
...
"Ultrus" writes:
[...]

Ultrus,

Believe it or not, this type of operation, while
seemingly simple, is actually quite involved, as I think
you're finding out. To go into all the "science" behind
some of the options, even superficially, would require a
significant amount of time and writing. If you don't care about the
absolute highest audio
quality, you can simply sum the two channels using a
32-bit integer, then right-shift the result one-bit and
store the least-signficant word back to a 16-bit
integer. Of course this is assuming 16-bit samples. This will guarantee
that you'll never clip, and will probably
sound just fine for your application.

I decided not to discuss dithering in my earlier post,
but right-shifting the data will delete any dither from
the original data streams.


... as will just about any means for obtaining significant amounts of
gain reduction.

Ideally, it would need to be added back in.


Agreed.


You're both confused in several respects.

Dither isn't some companion to the signal. It is something that is
added to the signal prior to a quantizer in order to linearize the
quantization. After the quantization, the signal is just the signal,
NOT the signal plus the dither.


I totally get that.

Neither is it gain that "takes out the dither," since the dither isn't
there in the first place. Rather, gain, in general, and for either
positive or negative (dB) values, increases the resolution and
requires a requantization step to go back to less resolution. There
are exceptions of course, e.g., when the gain is a power of two.


Increasing the gain of a dithered signal will preserve the dither along with
its linearizing effects (but the noise floor will be increased also, just
like analog). Reducing the gain will reduce the dither along with it, and
right shifting an optimally dithered signal (triangular probability density
at 1/3 LSB) will eliminate it completely. If there is more dither than
optimally necessary, it might survive some gain reduction.

Finally, the entire operation of shifting right one bit and taking
the least-significant word is one of requantization, not gain.


It is re-quantization, but it is to restore the gain to fit into the 16-bit
target. Every right shift reduces the gain by 6 dB. Every left shift
increases the gain by 6 dB.

I was suggesting adding two 16-bit samples, which could require one
additional bit. I suggested that the accumulator register be, I don't know,
32 bits (a convenient word size in most computers today) and far more than
enough to hold the overflow. Then right-shift one bit to restore the
original dynamic range for the 16-bit target. This is a re-quantization
operation, which ideally requires re-dithering, because of the truncation or
rounding that will take place.

% Randy Yates % "I met someone who looks alot like you,
%% Fuquay-Varina, NC % she does the things you do,
%%% 919-577-9882 % but she is an IBM."
%%%% % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr





  #21   Report Post  
Posted to rec.audio.tech
[email protected] dpierce@cartchunk.org is offline
external usenet poster
 
Posts: 402
Default mathmatics behind mixing voice over music

On Feb 11, 4:12 pm, "Karl Uppiano" wrote:
Neither is it gain that "takes out the dither," since the dither isn't
there in the first place. Rather, gain, in general, and for either
positive or negative (dB) values, increases the resolution and
requires a requantization step to go back to less resolution. There
are exceptions of course, e.g., when the gain is a power of two.


Increasing the gain of a dithered signal will preserve the dither along with
its linearizing effects (but the noise floor will be increased also, just
like analog). Reducing the gain will reduce the dither along with it, and
right shifting an optimally dithered signal (triangular probability density
at 1/3 LSB) will eliminate it completely. If there is more dither than
optimally necessary, it might survive some gain reduction.


And this is where you're getting lost. Reducing the gain will
NEVER "eliminate it completely," because "it" is not separate
or separable from the signal itself. And the dither, even though
it may be only a fraction of an LSB in the original signal, does
not have an effect which is limited to that small portion of the
signal. It has an effect across the entire range of the signal.

Consider, for example, what might happen to a high-level
signal when dither is added. Say our range is that of a
16-bit signed integer, -32768 to +32767.

Consider a signal whose value is 0.67. Add a random
+-1/2 LSB dither signal to that and truncate. The result is
that roughly 2/3 of the time, the result will be 1, and 1/3
of the time, it will be 0. Average a large enough collection
of those, and the result is ... 0.67.

Now consider a signal whose original value is, oh,
29,355.67. Add a +-1/2 LSB random signal to that,
and the effect will be that about 2/3 of the time, the
value once truncated to an integer, will be 29,356,
and 1/3 of the time it will truncate to 29,355. Look
at the average, and the result is, ... 29,355.67.

That's how dither works. The dither has not only
affected tiny signals, it's affected the big ones.

Now, truncate enough, and, yes, the EFFECTS
will be less and less, simply because you've reduced
the dynamic range enough.
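The averaging argument above is easy to check numerically. A small, hypothetical simulation (none of these names come from the thread); note that uniform dither on [0, 1) LSB followed by truncation is one common equivalent formulation of +-1/2 LSB dither followed by rounding:

```python
import math
import random

# Quick numeric check of the averaging argument: dither, truncate,
# and average. The fractional value survives quantization, for small
# and large inputs alike.

def dithered_quantize(x, rng):
    return math.floor(x + rng())   # uniform dither in [0, 1) LSB, then truncate

def long_run_average(x, n=200_000, seed=1):
    rng = random.Random(seed).random
    return sum(dithered_quantize(x, rng) for _ in range(n)) / n
```

Both `long_run_average(0.67)` and `long_run_average(29355.67)` come back close to their inputs, which is the point of the post: the dither's effect spans the whole signal range, not just the bottom bit.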


  #22   Report Post  
Posted to rec.audio.tech
Randy Yates Randy Yates is offline
external usenet poster
 
Posts: 839
Default mathmatics behind mixing voice over music

"Karl Uppiano" writes:
[...]
"Randy Yates" wrote in message
After the quantization, the signal is just the signal,
NOT the signal plus the dither.


I totally get that.
[...]


followed by:

Increasing the gain of a dithered signal will preserve the dither ...


You have just created a contradiction.

(but the noise floor will be increased also, just
like analog). Reducing the gain will reduce the dither along with it, and
right shifting an optimally dithered signal (triangular probability density
at 1/3 LSB) will eliminate it completely.


This is complete nonsense, as I've said, what, 3 times now that THERE IS
NO DITHER IN THE SIGNAL.

Please write on the board 100 times, "THERE IS NO DITHER IN THE SIGNAL.
THERE IS NO DITHER IN THE SIGNAL. ..."

Also, I don't know where you came up with, or what you even mean by,
"triangular probability density at 1/3 LSB." A TPDF that has a
probability density function ranging from -1 to +1 bit (or
equivalently, with a variance of delta^2 / 6, where delta is the value
of the LSB) is required to completely decorrelate the mean and
variance (i.e., the first and second moments) of the quantization
noise from the input signal. This is shown in Wannamaker's excellent
thesis [wannamaker].

--Randy

@article{wannamaker,
title = "{The Theory of Dithered Quantization}",
author = "{Robert~A.~Wannamaker}",
journal = "Ph.D. Thesis University of Waterloo Applied Mathematics Department",
year = "1997"}

--
% Randy Yates % "With time with what you've learned,
%% Fuquay-Varina, NC % they'll kiss the ground you walk
%%% 919-577-9882 % upon."
%%%% % '21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr
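The TPDF dither Randy describes (a triangular PDF spanning -1 to +1 LSB) is conventionally generated as the sum of two independent uniform variables. A sketch, with invented names:

```python
import random

# Sum of two independent uniforms on [-1/2, +1/2) LSB is triangular on
# (-1, +1) LSB, with variance 1/6 LSB^2. Illustrative code only.

def tpdf_dither(rng):
    return (rng.random() - 0.5) + (rng.random() - 0.5)

def dither_and_quantize(sample, rng):
    """Add TPDF dither (sample expressed in LSB units), round to a level."""
    return round(sample + tpdf_dither(rng))
```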
  #23   Report Post  
Posted to rec.audio.tech
Karl Uppiano Karl Uppiano is offline
external usenet poster
 
Posts: 232
Default mathmatics behind mixing voice over music


wrote in message
ps.com...
On Feb 11, 4:12 pm, "Karl Uppiano" wrote:
Neither is it gain that "takes out the dither," since the dither isn't
there in the first place. Rather, gain, in general, and for either
positive or negative (dB) values, increases the resolution and
requires a requantization step to go back to less resolution. There
are exceptions of course, e.g., when the gain is a power of two.


Increasing the gain of a dithered signal will preserve the dither along
with
its linearizing effects (but the noise floor will be increased also, just
like analog). Reducing the gain will reduce the dither along with it, and
right shifting an optimally dithered signal (triangular probability
density
at 1/3 LSB) will eliminate it completely. If there is more dither than
optimally necessary, it might survive some gain reduction.


And this is where you're getting lost. Reducing the gain will
NEVER "eliminate it completely," because "it" is not separate
or separable from the signal itself. And the dither, even though
it may be only a fraction of an LSB in the original signal, does
not have an effect which is limited to that small portion of the
signal. It has an effect across the entire range of the signal.

Consider, for example, what might happen to a high-level
signal when dither is added. Say our range is that of a
16-bit signed integer, -32768 to +32767.

Consider a signal whose value is 0.67. Add a random
+-1/2 LSB dither signal to that and truncate. The result is
that roughly 2/3 of the time, the result will be 1, and 1/3
of the time, it will be 0. Average a large enough collection
of those, and the result is ... 0.67.

Now consider a signal whose original value is, oh,
29,355.67. Add a +-1/2 LSB random signal to that,
and the effect will be that about 2/3 of the time, the
value once truncated to an integer, will be 29,356,
and 1/3 of the time it will truncate to 29,355. Look
at the average, and the result is, ... 29,355.67.

That's how dither works. The dither has not only
affected tiny signals, it's affected the big ones.

Now, truncate enough, and, yes, the EFFECTS
will be less and less, simply because you've reduced
the dynamic range enough.


I understand place value. But whenever a signal is re-quantized, it needs to
be re-dithered as well, because the quantization introduces a whole new set
of truncation and rounding errors that are not random, and are correlated
with the signal itself.

The original dither does not mitigate that. If the original random noise is
large enough, the new errors may be negligible, even randomized somewhat.
But now you have a noise level problem, which, though less severe than the
correlated error signals, is still higher than you might want.

I suspect you understand all that, but perhaps I was not making myself
clear.


  #24   Report Post  
Posted to rec.audio.tech
Richard Crowley Richard Crowley is offline
external usenet poster
 
Posts: 806
Default mathmatics behind mixing voice over music

"Ultrus" wrote...
I've looked for good audio software that I could connect
to from scripts on a server, but have not had the success
I desire.


Audacity is one of the more popular freeware audio
editing/processing applications and it is open-source.
  #25   Report Post  
Posted to rec.audio.tech
Karl Uppiano Karl Uppiano is offline
external usenet poster
 
Posts: 232
Default mathmatics behind mixing voice over music


"Randy Yates" wrote in message
...
"Karl Uppiano" writes:
[...]
"Randy Yates" wrote in message
After the quantization, the signal is just the signal,
NOT the signal plus the dither.


I totally get that.
[...]


followed by:

Increasing the gain of a dithered signal will preserve the dither ...


You have just created a contradiction.

(but the noise floor will be increased also, just
like analog). Reducing the gain will reduce the dither along with it, and
right shifting an optimally dithered signal (triangular probability
density
at 1/3 LSB) will eliminate it completely.


This is complete nonsense, as I've said, what, 3 times now that THERE IS
NO DITHER IN THE SIGNAL.


Say it as many times as you want. There is no discrete representation for a
sample that is part way between two quantization levels. The difference
between the input and the encoded sample is noise. As the signal changes,
the noise level changes, correlated with the input. That produces strong
spectral lines (intermodulation and THD mostly) that are far more audible
than random noise. By randomizing the input signal by exactly the right
amount, you can spread the error energy over a wide spectrum, instead of
concentrating it at a specific frequency. The quantizer still cannot
represent the sample exactly, but since the signal is changing randomly, the
quantization error is also random. But this randomization amounts to white
noise, and it does add a bit to the noise floor, or more pointedly, it puts
a noise floor where one belongs, but did not exist, in an un-dithered
system. When done optimally, it is the same noise energy, but re-allocated
to different frequencies.

Implementations vary, but oversampling quantizers use noise shaping to add
or subtract errors from adjacent samples -- a form of negative feedback, and
thus de-correlate the errors. The basic physics -- and the end result -- is
the same.

Please write on the board 100 times, "THERE IS NO DITHER IN THE SIGNAL.
THERE IS NO DITHER IN THE SIGNAL. ..."


Holy crap, dude. Get a grip. What I refer to as a "properly dithered signal"
differs from the datastream that would have been created without dither.
Dither added, errors re-allocated -- same thing. What I refer to as
"preserving dither" is the same as not introducing new errors.

Also, I don't know where you came up with, or what you even mean by,
"triangular probability density at 1/3 LSB." A TPDF that has a
probability density function ranging from -1 to +1 bit (or
equivalently, with a variance of delta^2 / 6, where delta is the value
of the LSB) is required to completely decorrelate the mean and
variance (i.e., the first and second moments) of the quantization
noise from the input signal. This is shown in Wannamaker's excellent
thesis [wannamaker].


I read a different paper from you. That's how I came up with it. Van Der
Kooy & Lipschitz from the early '90s. I cannot locate it now, but it was
published in AES.

--Randy

@article{wannamaker,
title = "{The Theory of Dithered Quantization}",
author = "{Robert~A.~Wannamaker}",
journal = "Ph.D. Thesis University of Waterloo Applied Mathematics Department",
year = "1997"}

--
% Randy Yates % "With time with what you've learned,
%% Fuquay-Varina, NC % they'll kiss the ground you walk
%%% 919-577-9882 % upon."
%%%% % '21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr





  #26   Report Post  
Posted to rec.audio.tech
Randy Yates Randy Yates is offline
external usenet poster
 
Posts: 839
Default mathmatics behind mixing voice over music

"Karl Uppiano" writes:

"Randy Yates" wrote in message
...
"Karl Uppiano" writes:
[...]
"Randy Yates" wrote in message
After the quantization, the signal is just the signal,
NOT the signal plus the dither.

I totally get that.
[...]


followed by:

Increasing the gain of a dithered signal will preserve the dither ...


You have just created a contradiction.

(but the noise floor will be increased also, just
like analog). Reducing the gain will reduce the dither along with it, and
right shifting an optimally dithered signal (triangular probability
density
at 1/3 LSB) will eliminate it completely.


This is complete nonsense, as I've said, what, 3 times now that THERE IS
NO DITHER IN THE SIGNAL.


Say it as many times as you want. There is no discrete representation for a
sample that is part way between two quantization levels.


True, but that doesn't make what you've got "signal + dither" - it is just
"signal." You see the *effects* of the dither that was added prior to the
quantizer, but the dither itself is gone bye-bye.

The difference between the input and the blah blah blah.


Thanks, but I've already read up on the theory, and I take my theory
from engineering papers and texts, not someone on usenet.

But this randomization ... does add a bit to the noise floor ...
When done optimally, it is the same noise energy


Well which is it? Does it add energy (more correctly, power) or doesn't
it?

The answer is that nRPDF dither (see again Wannamaker for this
nomenclature) adds n+1 times the noise power of the basic delta^2/12
quantizer. So rectangular (uniform) dither (1RPDF) adds 3 dB, 2RPDF ==
TPDF adds 4.77 dB, etc.
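The dB figures follow directly: with total noise power (n+1) times delta^2/12, the increase over the bare quantizer is 10*log10(n+1). A one-line check (illustrative function name):

```python
import math

# Added noise in dB for nRPDF dither, relative to the undithered
# quantizer's delta^2/12, per the (n+1) scaling quoted above.

def added_noise_db(n):
    return 10 * math.log10(n + 1)
```

`added_noise_db(1)` gives about 3.01 dB for RPDF and `added_noise_db(2)` about 4.77 dB for TPDF, matching the figures above.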

Implementations vary, but oversampling quantizers use noise shaping to add
or subtract errors from adjacent samples -- a form of negative feedback, and
thus de-correlate the errors. The basic physics -- and the end result -- is
the same.


Huh???!?? When did we go from talking about dither to noise-shaping?
Noise-shaping is definitely NOT the same as dither, even in the "end
result." Who said anything about oversampling the signal???

Please write on the board 100 times, "THERE IS NO DITHER IN THE SIGNAL.
THERE IS NO DITHER IN THE SIGNAL. ..."


Holy crap, dude. Get a grip. What I refer to as a "properly dithered signal"
differs from the datastream that would have been created without dither.
Dither added, errors re-allocated -- same thing. What I refer to as
"preserving dither" is the same as not introducing new errors.


Also, I don't know where you came up with, or what you even mean by,
"triangular probability density at 1/3 LSB." A TPDF that has a
probability density function ranging from -1 to +1 bit (or
equivalently, with a variance of delta^2 / 6, where delta is the value
of the LSB) is required to completely decorrelate the mean and
variance (i.e., the first and second moments) of the quantization
noise from the input signal. This is shown in Wannamaker's excellent
thesis [wannamaker].


I read a different paper from you. That's how I came up with it. Van Der
Kooy & Lipschitz from the early '90s. I cannot locate it now, but it was
published in AES.


You mean

@article{resolutionbelowlsb,
title = "{Resolution Below the Least Significant Bit in Digital Systems with Dither}",
author = "{John~Vanderkooy, Stanley~P.~Lipshitz}",
journal = "Journal of the Audio Engineering Society",
year = "1984",
month = "February"}

???
--
% Randy Yates % "Maybe one day I'll feel her cold embrace,
%% Fuquay-Varina, NC % and kiss her interface,
%%% 919-577-9882 % til then, I'll leave her alone."
%%%% % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr
  #27   Report Post  
Posted to rec.audio.tech
Ultrus Ultrus is offline
external usenet poster
 
Posts: 14
Default mathmatics behind mixing voice over music

Audacity is one of the more popular freeware audio
editing/processing applications and it is open-source.


Hello Richard,
I'm a fan of Audacity. I contacted the developers a while back on
this, and it seems like they're not that far yet. I can't use their
software through the command line or similar method. It's on my wish
list however.

  #28   Report Post  
Posted to rec.audio.tech
Karl Uppiano Karl Uppiano is offline
external usenet poster
 
Posts: 232
Default mathmatics behind mixing voice over music


"Randy Yates" wrote in message
...
"Karl Uppiano" writes:

"Randy Yates" wrote in message
...
"Karl Uppiano" writes:
[...]
"Randy Yates" wrote in message
After the quantization, the signal is just the signal,
NOT the signal plus the dither.

I totally get that.
[...]

followed by:

Increasing the gain of a dithered signal will preserve the dither ...

You have just created a contradiction.

(but the noise floor will be increased also, just
like analog). Reducing the gain will reduce the dither along with it,
and
right shifting an optimally dithered signal (triangular probability
density
at 1/3 LSB) will eliminate it completely.

This is complete nonsense, as I've said, what, 3 times now that THERE IS
NO DITHER IN THE SIGNAL.


Say it as many times as you want. There is no discrete representation for
a
sample that is part way between two quantization levels.


True, but that doesn't make what you've got "signal + dither" - it is just
"signal." You see the *effects* of the dither that was added prior to the
quantizer, but the dither itself is gone bye-bye.


Exactly. It was transformed. That was the "blah blah blah" part which you so
blithely dismiss:

The difference between the input and the blah blah blah.


Thanks, but I've already read up on the theory, and I take my theory
from engineering papers and texts, not someone on usenet.


Well, that's a load off. I was afraid you were depending on me.

But this randomization ... does add a bit to the noise floor ...
When done optimally, it is the same noise energy


Well which is it? Does it add energy (more correctly, power) or doesn't
it?


Of course it has to add a small amount of power -- the dither has to
modulate the encoder even during complete silence so the signal doesn't kick
up noise out of nothing.

The answer is that nRPDF dither (see again Wannamaker for this
nomenclature) adds n+1 times the noise power of the basic delta^2/12
quantizer. So rectangular (uniform) dither (1RPDF) adds 3 dB, 2RPDF ==
TPDF adds 4.77 dB, etc.


I'm glad you have the documents handy. I'm working from memory. If I were
going to go off and implement one of these things, I would most certainly
refresh my memory before doing anything else. I haven't needed to in over 15
years, so that information is swapped out to disk or something. I did not
feel compelled to spend days reading up to be all authoritative so I could
respond to a question that did not require detailed dither design parameters
in the first place.

Implementations vary, but oversampling quantizers use noise shaping to
add
or subtract errors from adjacent samples -- a form of negative feedback,
and
thus de-correlate the errors. The basic physics -- and the end result --
is
the same.


Huh???!?? When did we go from talking about dither to noise-shaping?
Noise-shaping is definitely NOT the same as dither, even in the "end
result." Who said anything about oversampling the signal???


When did we do that? Well, I would estimate it happened at the very instant
I typed it. Of course noise shaping is not the same as dither, but noise
shaping definitely has an impact on how you implement dither in a particular
design. Oversampling converters using noise shaping can do some clever
implementations, both of which IIRC were described in the article I
referenced, which *was* about dither.

Please write on the board 100 times, "THERE IS NO DITHER IN THE SIGNAL.
THERE IS NO DITHER IN THE SIGNAL. ..."


Holy crap, dude. Get a grip. What I refer to as a "properly dithered
signal"
differs from the datastream that would have been created without dither.
Dither added, errors re-allocated -- same thing. What I refer to as
"preserving dither" is the same as not introducing new errors.


Also, I don't know where you came up with, or what you even mean by,
"triangular probability density at 1/3 LSB." A TPDF that has a
probability density function ranging from -1 to +1 bit (or
equivalently, with a variance of delta^2 / 6, where delta is the value
of the LSB) is required to completely decorrelate the mean and
variance (i.e., the first and second moments) of the quantization
noise from the input signal. This is shown in Wannamaker's excellent
thesis [wannamaker].


I read a different paper from you. That's how I came up with it. Van Der
Kooy & Lipschitz from the early '90s. I cannot locate it now, but it was
published in AES.


You mean

@article{resolutionbelowlsb,
title = "{Resolution Below the Least Significant Bit in Digital Systems with Dither}",
author = "{John~Vanderkooy, Stanley~P.~Lipshitz}",
journal = "Journal of the Audio Engineering Society",
year = "1984",
month = "February"}

???


That might have been it. There were a lot of articles going around at that
time. It could have been a series. I thought it was from 1992 (e.g.,
http://www.aes.org/e-lib/browse.cfm?elib=7047 seems to ring a bell, and I
seem to recall Wannamaker was credited as well). I had it in my stack of
stuff, but it went missing when we moved to our new dumpster.

--
% Randy Yates % "Maybe one day I'll feel her cold embrace,
%% Fuquay-Varina, NC % and kiss her interface,
%%% 919-577-9882 % til then, I'll leave her alone."
%%%% % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr



  #31   Report Post  
Posted to rec.audio.tech
John Phillips John Phillips is offline
external usenet poster
 
Posts: 54
Default mathmatics behind mixing voice over music

On 2007-02-11, wrote:
On Feb 11, 4:12 pm, "Karl Uppiano" wrote:
Neither is it gain that "takes out the dither," since the dither isn't
there in the first place. Rather, gain, in general, and for either
positive or negative (dB) values, increases the resolution and
requires a requantization step to go back to less resolution. There
are exceptions of course, e.g., when the gain is a power of two.


Increasing the gain of a dithered signal will preserve the dither along with
its linearizing effects (but the noise floor will be increased also, just
like analog). Reducing the gain will reduce the dither along with it, and
right shifting an optimally dithered signal (triangular probability density
at 1/3 LSB) will eliminate it completely. If there is more dither than
optimally necessary, it might survive some gain reduction.


And this is where you're getting lost. Reducing the gain will
NEVER "eliminate it completely," because "it" is not separate
or separable from the signal itself. And the dither, even though
it may be only a fraction of an LSB in the original signal, does
not have an effect which is limited to that small portion of the
signal. It has an effect across the entire range of the signal.


An interesting discussion. I am trying to follow the precise point
of contention.

I think KU contends that if you have a signal, add dither at a level
appropriate to a specific quantizer (call it quantizer X) and you scale
the signal+dither to a lower level before quantizing with quantizer X
then this fails to eliminate quantization distortion.

I do agree with dpierce that such a scaling does not completely eliminate
the effects on the most significant bits of adding the dither. However, my
prejudice is that by doing this scaling before quantizing, the amplitude
of the dither relative to the dither amplitude requirement for quantizer
X becomes insufficient to completely de-correlate the signal from the
quantization error.

So in this sense, although the dither cannot disappear completely through
scaling to a lower level it does, I think, become ineffective.

I have written software to successfully dither signals before
re-quantization and I have verified that:

(a) with TPDF dither at 2 LSB p-p WRT the quantizer this works to
completely eliminate quantization effects; and

(b) with dither at a lower level the quantization distortion is NOT
completely eliminated.

However I have not actually tried to add the right level of dither,
scale the signal to a lower level and then quantize. I will do so soon
(tonight if I have the time) and see in practice what happens - it's
only a few lines of code (!).

--
John Phillips
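Phillips's findings (a) and (b) are reproducible in a few lines. A hypothetical re-creation (not his actual code): a sine whose amplitude is below 1/2 LSB vanishes entirely under plain rounding, but survives quantization when 2 LSB p-p TPDF dither is added first, visible here as a nonzero correlation with the original.

```python
import math
import random

# Quantize a sine of amplitude 0.4 LSB with and without TPDF dither.
# Without dither every sample rounds to zero; with dither the signal
# survives, albeit buried in noise.

def tpdf(rng):
    return (rng.random() - 0.5) + (rng.random() - 0.5)  # 2 LSB peak-to-peak

def experiment(n=50_000, amplitude=0.4, seed=7):
    rng = random.Random(seed)
    signal = [amplitude * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    plain = [round(s) for s in signal]                   # no dither
    dithered = [round(s + tpdf(rng)) for s in signal]
    correlation = lambda q: sum(qi * si for qi, si in zip(q, signal)) / n
    return correlation(plain), correlation(dithered)
```

The undithered correlation is exactly zero (the signal is gone), while the dithered one comes out near amplitude^2 / 2, the correlation of the signal with itself.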
  #32   Report Post  
Posted to rec.audio.tech
Don Pearce Don Pearce is offline
external usenet poster
 
Posts: 2,726
Default mathmatics behind mixing voice over music

On 12 Feb 2007 10:29:26 GMT, John Phillips
wrote:

On 2007-02-11, wrote:
On Feb 11, 4:12 pm, "Karl Uppiano" wrote:
Neither is it gain that "takes out the dither," since the dither isn't
there in the first place. Rather, gain, in general, and for either
positive or negative (dB) values, increases the resolution and
requires a requantization step to go back to less resolution. There
are exceptions of course, e.g., when the gain is a power of two.

Increasing the gain of a dithered signal will preserve the dither along with
its linearizing effects (but the noise floor will be increased also, just
like analog). Reducing the gain will reduce the dither along with it, and
right shifting an optimally dithered signal (triangular probability density
at 1/3 LSB) will eliminate it completely. If there is more dither than
optimally necessary, it might survive some gain reduction.


And this is where you're getting lost. Reducing the gain will
NEVER "eliminate it completely," because "it" is not separate
or separable from the signal itself. And the dither, even though
it may be only a fraction of an LSB in the original signal, does
not have an effect which is limited to that small portion of the
signal. It has an effect across the entire range of the signal.


An interesting discussion. I am trying to follow the precise point
of contention.

I think KU contends that if you have a signal, add dither at a level
appropriate to a specific quantizer (call it quantizer X) and you scale
the signal+dither to a lower level before quantizing with quantizer X
then this fails to eliminate quantization distortion.

I do agree with dpierce that such a scaling does not completely eliminate
the effects on the most significant bits of adding the dither. However, my
prejudice is that by doing this scaling before quantizing, the amplitude
of the dither relative to the dither amplitude requirement for quantizer
X becomes insufficient to completely de-correlate the signal from the
quantization error.

So in this sense, although the dither cannot disappear completely through
scaling to a lower level it does, I think, become ineffective.

I have written software to successfully dither signals before
re-quantization and I have verified that:

(a) with TPDF dither at 2 LSB p-p WRT the quantizer this works to
completely eliminate quantization effects; and

(b) with dither at a lower level the quantization distortion is NOT
completely eliminated.

However I have not actually tried to add the right level of dither,
scale the signal to a lower level and then quantize. I will do so soon
(tonight if I have the time) and see in practice what happens - it's
only a few lines of code (!).


When you say you scale the signal to a lower level, do you mean you
reduce the amplitude of the entire signal, including the added dither?
If so, the dither no longer functions, because it must cover +/- 1 lsb
to de-quantize that step. That is why the dither is always an integral
part of the AtoD process - so it doesn't get affected by changes to
the signal level.

Of course real-world signals will almost always carry their own noise
at far higher level than any added dither signal, so quantization
products are, in practice, rarely a problem. Any noise will
de-correlate a quantized signal - it doesn't need to be specially
added.

d
--
Pearce Consulting
http://www.pearce.uk.com
  #33   Report Post  
Posted to rec.audio.tech
Randy Yates Randy Yates is offline
external usenet poster
 
Posts: 839
Default mathmatics behind mixing voice over music

John Phillips writes:
[...]
I think KU contends that if you have a signal, add dither at a level
appropriate to a specific quantizer (call it quantizer X) and you scale
the signal+dither to a lower level before quantizing with quantizer X
then this fails to eliminate quantization distortion.


No, I don't think that is what Mr. Uppiano is contending. Instead, he
is contending that if you scale (gain) the output of quantizer X using
a gain of 1/2 (or less), you lose the "dither."

What I have been contending is that THERE IS NO DITHER IN THE OUTPUT
OF THE QUANTIZER, thus if you scale it, you don't "lose" it since you
don't lose what you never had.

Think of it this way. Let's say you properly dither the input signal x[n]
to quantizer A and the resulting output is y[n]. Then you want to gain
y[n] down by a factor of 0.48147583 = 15777 / 32768 (just for example).
The proper way to do this would be to first multiply y[n] by 15777. You
then have a 32-bit number (assuming 16-bit signals). You then must
requantize this value back to a 16-bit number, so again you dither
(at bit position 16) and requantize (by taking the most-significant
word).

The result is that the original signal (y[n]) IS STILL PRESENT. This
is because, by dithering the result of the gain, the signal components
are preserved (albeit noisier), a la "resolution below the least
significant digit with dither."
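Randy's gain-then-requantize procedure can be sketched in a few lines. This is an illustrative Python sketch, not anyone's posted code; the function names and the test samples are mine, and the gain word 15777/2^15 is the example value from the post:

```python
import random

def tpdf(step):
    # TPDF dither at 2 steps peak-to-peak: the sum of two independent
    # uniform variables, each spanning one step.
    return random.uniform(-step / 2, step / 2) + random.uniform(-step / 2, step / 2)

def gain_and_requantize(samples, num=15777, shift=15):
    # Gain 16-bit samples by num / 2**shift (~0.4815 for the values in
    # the post), keeping the full-width product, then dither at the new
    # step and round back down to 16 bits.
    step = 1 << shift
    out = []
    for s in samples:
        product = s * num                        # wide intermediate, no precision lost
        dithered = product + tpdf(step)          # decorrelate the requantization
        out.append(int(round(dithered / step)))  # back to a 16-bit value
    return out

print(gain_and_requantize([0, 1000, -1000, 32767, -32768]))
```

The output values sit within a couple of counts of `s * 15777 / 32768`, noisier but with the signal preserved, which is the point being made.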

I do agree with dpierce that such a scaling does not completely eliminate
the effects on the most significant bits of adding the dither. However my
prejudice is that, by doing this scaling before quantizing, the amplitude
of the dither relative to the dither amplitude requirement for quantizer
X becomes insufficient to completely de-correlate the signal from the
quantization error.

So in this sense, although the dither cannot disappear completely through
scaling to a lower level it does, I think, become ineffective.

I have written software to successfully dither signals before
re-quantization and I have verified that:

(a) with TPDF dither at 2 LSB p-p WRT the quantizer this works to
completely eliminate quantization effects; and

(b) with dither at a lower level the quantization distortion is NOT
completely eliminated.

However I have not actually tried to add the right level of dither,
scale the signal to a lower level and then quantize. I will do so soon
(tonight if I have the time) and see in practice what happens - it's
only a few lines of code (!).


I can tell you right now that it won't work properly that way. The
dither level must be matched to the quantizer step.
--
% Randy Yates % "My Shangri-la has gone away, fading like
%% Fuquay-Varina, NC % the Beatles on 'Hey Jude'"
%%% 919-577-9882 %
%%%% % 'Shangri-La', *A New World Record*, ELO
http://home.earthlink.net/~yatescr
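John's proposed experiment (add TPDF dither at the level the quantizer needs, then scale signal-plus-dither down, then quantize) really is only a few lines. A hedged Python sketch, with a made-up 0.3 LSB constant input and my own trial count, illustrating both his result (a) and the failure Randy predicts:

```python
import random

def quantize(x):
    # Mid-tread quantizer with a step of 1 unit (1 "LSB").
    return round(x)

def mean_output(level, scale, trials=20000):
    # Add TPDF dither at 2 LSB p-p (correct for this quantizer), then
    # scale signal-plus-dither by `scale`, then quantize, and average
    # the output over many trials.
    total = 0
    for _ in range(trials):
        d = random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)
        total += quantize(scale * (level + d))
    return total / trials

# scale = 1.0: the time-averaged output tracks the 0.3 LSB input,
# i.e. the dither fully linearizes the quantizer.
print(mean_output(0.3, 1.0))
# scale = 0.25: only 0.5 LSB p-p of dither reaches the quantizer, and
# the sub-LSB signal vanishes entirely (every output sample is zero).
print(mean_output(0.3, 0.25))
```

With the scaling, 0.25 * (0.3 + d) never leaves the interval (-0.175, 0.325), so the quantizer emits a constant zero: the dither level was not matched to the quantizer step, exactly as Randy says.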
  #34   John Phillips

On 2007-02-12, Randy Yates wrote:
John Phillips writes:
[...]
I think KU contends that if you have a signal, add dither at a level
appropriate to a specific quantizer (call it quantizer X) and you scale
the signal+dither to a lower level before quantizing with quantizer X
then this fails to eliminate quantization distortion.


No, I don't think that is what Mr. Uppiano is contending. Instead, he
is contending that if you scale (gain) the output of quantizer X using
a gain of 1/2 (or less), you lose the "dither."


Ah, if so then I agree that after the quantizer the dither is no longer
relevant/functional. All you have is a noisier quantized signal.

...
However I have not actually tried to add the right level of dither,
scale the signal to a lower level and then quantize. I will do so soon
(tonight if I have the time) and see in practice what happens - it's
only a few lines of code (!).


I can tell you right now that it won't work properly that way. The
dither level must be matched to the quantizer step.


That's what I believe.

--
John Phillips
  #35   John Phillips

On 2007-02-12, Don Pearce wrote:
On 12 Feb 2007 10:29:26 GMT, John Phillips
wrote:

On 2007-02-11, wrote:
On Feb 11, 4:12 pm, "Karl Uppiano" wrote:
Neither is it gain that "takes out the dither," since the dither isn't
there in the first place. Rather, gain, in general, and for either
positive or negative (dB) values, increases the resolution and
requires a requantization step to go back to less resolution. There
are exceptions of course, e.g., when the gain is a power of two.

Increasing the gain of a dithered signal will preserve the dither along with
its linearizing effects (but the noise floor will be increased also, just
like analog). Reducing the gain will reduce the dither along with it, and
right shifting an optimally dithered signal (triangular probability density
at 1/3 LSB) will eliminate it completely. If there is more dither than
optimally necessary, it might survive some gain reduction.

And this is where you're getting lost. Reducing the gain will
NEVER "eliminate it completely," because "it" is not separate
or separable from the signal itself. And the dither, even though
it may be only a fraction of an LSB in the original signal, does
not have an effect which is limited to that small portion of the
signal. It has an effect across the entire range of the signal.


An interesting discussion. I am trying to follow the precise point
of contention.

I think KU contends that if you have a signal, add dither at a level
appropriate to a specific quantizer (call it quantizer X) and you scale
the signal+dither to a lower level before quantizing with quantizer X
then this fails to eliminate quantization distortion.

I do agree with dpierce that such a scaling does not completely eliminate
the effects on the most significant bits of adding the dither. However my
prejudice is that, by doing this scaling before quantizing, the amplitude
of the dither relative to the dither amplitude requirement for quantizer
X becomes insufficient to completely de-correlate the signal from the
quantization error.

So in this sense, although the dither cannot disappear completely through
scaling to a lower level it does, I think, become ineffective.

I have written software to successfully dither signals before
re-quantization and I have verified that:

(a) with TPDF dither at 2 LSB p-p WRT the quantizer this works to
completely eliminate quantization effects; and

(b) with dither at a lower level the quantization distortion is NOT
completely eliminated.

However I have not actually tried to add the right level of dither,
scale the signal to a lower level and then quantize. I will do so soon
(tonight if I have the time) and see in practice what happens - it's
only a few lines of code (!).


When you say you scale the signal to a lower level, do you mean you
reduce the amplitude of the entire signal, including the added dither?
If so, the dither no longer functions, because it must cover +/- 1 lsb
to de-quantize that step. That is why the dither is always an integral
part of the AtoD process - so it doesn't get affected by changes to
the signal level.


Yes - that's how I understand it, from theory and practice.

Of course real-world signals will almost always carry their own noise
at far higher level than any added dither signal, so quantization
products are, in practice, rarely a problem. Any noise will
de-correlate a quantized signal - it doesn't need to be specially
added.


Often true, but after the first quantization (the A/D in the case of
analogue signals) I think you still need to attend to the dithering if
you need to suppress quantization products. (You don't always need to,
of course.)

--
John Phillips


  #36   Don Pearce

On 12 Feb 2007 12:53:18 GMT, John Phillips
wrote:

[...]
Often true, but after the first quantization (the A/D in the case of
analogue signals) I think you still need to attend to the dithering if
you need to suppress quantization products. (You don't always need to,
of course.)


I would hope that after the first quantization everything is done in
high precision floating point so that no further dithering is needed
until the final quantization to generate the CD file.

Dithering is necessary whenever you change the amplitude of a
quantized signal, so doing the maths on non-quantized FP numbers is
far better.
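Don's float-until-the-end pipeline might look like this in Python. A sketch only: the 1 kHz tone, the 0.25 gain, and the 64-sample length are all made up, and `to_16bit` is my name for the single final quantization step:

```python
import math
import random

def to_16bit(x):
    # The only quantization in the chain: TPDF dither at 2 LSB p-p of
    # the 16-bit step, applied once, at the very end.
    d = random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)
    v = int(round(x * 32767 + d))
    return max(-32768, min(32767, v))   # clamp to the 16-bit range

# All intermediate maths stays in floating point: the gain change
# introduces no intermediate quantization, so no intermediate dither.
tone = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(64)]
gained = [0.25 * s for s in tone]          # amplitude change in float
samples = [to_16bit(s) for s in gained]    # dither + quantize once
print(samples[:8])
```

Every amplitude change before `to_16bit` is exact (to double precision), which is why no re-dithering is needed until the final 16-bit conversion.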

d

--
Pearce Consulting
http://www.pearce.uk.com
  #37   Arny Krueger


"John Phillips" wrote in message
...
On 2007-02-12, Don Pearce wrote:


Of course real-world signals will almost always carry their own noise
at far higher level than any added dither signal, so quantization
products are, in practice, rarely a problem.


I strongly agree that the real-world analog signals we record will almost
always carry their own noise,
at far higher level than an optimal dither signal would have, at the point
where that signal is initially digitized.

Modest alterations of real-world signals will very often produce
quantization errors that are small enough to be effectively decorrelated by
the noise that is embedded in the signal.

Any noise will de-correlate a quantized signal - it doesn't need to be
specially added.


"Any" is a very strong word.

It is true that any noise signal containing a TPDF component sufficient
to decorrelate the quantization errors generated by a given process will
result in effective decorrelation of all of the quantization errors that
the process generates.

However, this is not a perfectly general solution for all kinds of noise and
any amount of noise. It is not even a solution that is general for all
practical cases.

Often true, but after the first quantization (the A/D in the case of
analogue signals) I think you still need to attend to the dithering if
you need to suppress quantization products. (You don't always need to,
of course.)


Agreed.


  #38   John Phillips

On 2007-02-12, Don Pearce wrote:
On 12 Feb 2007 12:53:18 GMT, John Phillips
wrote:
... after the first quantization (the A/D in the case of
analogue signals) I think you still need to attend to the dithering if
you need to suppress quantization products. (You don't always need to,
of course.)


I would hope that after the first quantization everything is done in
high precision floating point so that no further dithering is needed
until the final quantization to generate the CD file.


In the code I use to generate test CDs for various purposes I use double
precision (64-bit) floating point so that I can do any maths required
and not worry about anything except the final quantization into 16-bit
representation.

Dithering is necessary whenever you change the amplitude of a
quantized signal, so doing the maths on non-quantized FP numbers is
far better.


Indeed, it's just so much simpler.

--
John Phillips
  #39   Karl Uppiano


"Randy Yates" wrote in message
...
John Phillips writes:
[...]
I think KU contends that if you have a signal, add dither at a level
appropriate to a specific quantizer (call it quantizer X) and you scale
the signal+dither to a lower level before quantizing with quantizer X
then this fails to eliminate quantization distortion.


No, I don't think that is what Mr. Uppiano is contending. Instead, he
is contending that if you scale (gain) the output of quantizer X using
a gain of 1/2 (or less), you lose the "dither."


No, that is not what I am saying. I would like to set the record straight.

What I have been contending is that THERE IS NO DITHER IN THE OUTPUT
OF THE QUANTIZER, thus if you scale it, you don't "lose" it since you
don't lose what you never had.


I think you already understand this; I just want to restate what I have been
trying to say all along: Viewed in the time domain, dither works by
continuously and randomly modulating the quantizer at the LSB level such
that the average quantizer output over time is proportional to the original
analog value, even if the average input is between quantization levels.

Viewed in the frequency domain, because of the non-linearities involved with
discrete quantization, the signal and the dither will combine in complex
ways that depend on the proximity to a quantization level. So the spectrum
of the quantizer output will not be the same as the dithered analog version,
but since the dither is random, the new signal is randomized too. Just
different.

I think your statement that THERE IS NO DITHER IN THE OUTPUT OF THE
QUANTIZER is misleading, because the quantized output of a dithered input
has a noise floor where none existed before -- instead of absolute silence
except when a signal kicks up correlated noise. It is also linear down to
infinitesimal levels, with random uncertainty instead of stair-steps.
Turning off dither changes the output in measurable ways, or there would
be no point in using it. I'm sorry if my wording seemed to imply that
truncating the LSB would remove the dither. You can still see evidence of
dither having been applied, but without the critical information encoded
in the LSB, it does little or no good anymore.

Think of it this way. Let's say you properly dither the input signal x[n]
to quantizer A and the resulting output is y[n]. Then you want to gain
y[n] down by a factor of 0.48147583 = 15777 / 32768 (just for example).
The proper way to do this would be to first multiply y[n] by 15777. You
then have a 32-bit number (assuming 16-bit signals). You then must
requantize this value back to a 16-bit number, so again you dither
(at bit position 16) and requantize (by taking the most-significant
word).

The result is that the original signal (y[n]) IS STILL PRESENT. This
is because, by dithering the result of the gain, the signal components
are preserved (albeit are noisier) ala "resolution below the least
significant digit with dither."


Yes, because you never destroyed the effects of the original dither in the
first place, and then you re-dithered prior to re-quantizing. No problem
here.

  #40   Randy Yates

"Karl Uppiano" writes:

You can still see evidence of dither having been applied, but
without the critical information encoded in the LSB, it does little
or no good anymore.


I think I see what you're trying to say, but you're wrong. The
fallacy in your thinking can be exposed by examining a real scenario
using actual implementation details, e.g., assuming two's complement
arithmetic.

Consider the following thought experiment: We digitize a sine
wave with peak-to-peak amplitude of 1/8 LSB and with a DC offset
of -1/4 LSB using dither into a 16-bit signed, two's complement
digital signal. Let's also assume our analog sine wave is noiseless,
just for heuristic purposes.

So if we didn't have dither, the digital signal would just be zero.
With dither, we get a very noisy sine wave. More precisely, assuming the
dither is less than or equal to 1 LSB peak-to-peak, the digital signal
resulting from the dither will bounce between 0 and -1, which in
16-bit signed two's complement is 0000h and FFFFh (hexadecimal).

So if we hack off the LSB, guess what? We've still got a whole lotta
variation in the signal. We didn't "lose" the effect of the dither
completely - we still have 15 bits banging back and forth.

The bottom line is that you can't view the dither as just wiggling
the LSB. It's simply not the case.
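Randy's thought experiment is easy to run. A hedged Python sketch (the 100-sample sine period and the trial count are mine; amplitudes are in units of 1 LSB):

```python
import math
import random

def digitize(n):
    # The thought experiment: a sine of 1/8 LSB peak-to-peak with a DC
    # offset of -1/4 LSB, dithered at 1 LSB p-p, quantized to a 16-bit
    # two's complement code.
    x = -0.25 + (1 / 16) * math.sin(2 * math.pi * n / 100)
    d = random.uniform(-0.5, 0.5)       # 1 LSB p-p dither
    code = math.floor(x + d + 0.5)      # round to the nearest code
    return code & 0xFFFF                # 16-bit two's complement pattern

codes = {digitize(n) for n in range(5000)}
print(sorted(hex(c) for c in codes))    # just 0x0 and 0xffff, as predicted
# "Hack off the LSB": the two remaining 15-bit patterns still differ in
# every bit, so the dither's effect is anything but confined to the LSB.
print(sorted(hex(c >> 1) for c in codes))
```

The only codes produced are 0000h and FFFFh, and after dropping the LSB the surviving patterns are 0x0 and 0x7fff: all fifteen remaining bits still bang back and forth.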

To return to our topic, the proper way to view the situation is that
the signal is, well, the signal. It is a (e.g.) 16-bit word in which
each bit is significant. And when we multiply that 16-bit word by
another 16-bit word, for example, when performing a gaining operation,
the result MUST be 32 bits in length in order to simultaneously
maintain precision and avoid overflow. So then when we convert this
32-bit result back to 16 bits, we necessarily REQUANTIZE it.

And again, just as in any quantization step, we must use some form of
linearization (e.g., dither, dither with noise-shaping, etc.) if we
want to maintain "resolution below the least-significant bit."
--
% Randy Yates % "How's life on earth?
%% Fuquay-Varina, NC % ... What is it worth?"
%%% 919-577-9882 % 'Mission (A World Record)',
%%%% % *A New World Record*, ELO
http://home.earthlink.net/~yatescr
