digital photography vs. digital audio

Hi,

> I've formed some analogies:
> linearity of audio recorder = lens quality
> frequency range of recorder = color accuracy
> bit depth of recorder = pixel count
>
> Any truth to that?

Sort of, and not really.
There are actually a few more dimensions and frequency domains
involved than one would expect with video.

Let's start with a single pixel of a grey-scale camera.
- audio sample rate = video frame rate
The value of a video pixel is read, say, around 24 times per
second.
- audio bit depth = video bit depth
The value of a video pixel is read using a 10-, 12-, 14- or 16-bit
A/D converter.
- audio gain = pixel gain
An amplifier is used to increase the signal strength of a pixel
before it enters the A/D converter. There is also a bias component,
which seeds the pixel with some extra signal so that linearity
improves.
- audio noise = pixel noise
A pixel includes noise from the amplifiers, quantization noise
from the A/D converter, and electrical noise picked up by the sensor.
- diaphragm size = pixel well size.
A larger pixel well can receive more photons and thereby reduce the
noise relative to the signal.
- high-note aliasing = fast-motion aliasing.
It is not usually called aliasing in the video world, but the effect
is the same: motion faster than the frame rate can follow is not
captured correctly, and only motion slower than that is seen properly
(think of a wagon wheel that appears to spin slowly backwards on
film). A small numeric illustration follows the list.
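As a rough illustration of that temporal aliasing, here is a tiny sketch in Python/NumPy (the 24 fps frame rate and 23 rev/s spin are hypothetical numbers of my own, not anything from the post): a rotation faster than half the frame rate shows up as a slow, backwards apparent rotation, just like a tone above Nyquist folds down to a lower frequency.

```python
import numpy as np

frame_rate = 24.0       # frames per second (illustrative)
true_spin = 23.0        # revolutions per second, well above Nyquist (12 rev/s)

# Wheel angle at the moment each frame is captured.
frames = np.arange(8)
angle = 2 * np.pi * true_spin * frames / frame_rate

# Apparent per-frame rotation, wrapped into (-pi, pi]: what a viewer perceives.
apparent_step = np.angle(np.exp(1j * np.diff(angle)))
apparent_rps = apparent_step * frame_rate / (2 * np.pi)

print(apparent_rps)     # about -1 rev/s: the fast spin aliases to a slow
                        # backwards rotation (the wagon-wheel effect)
```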

As you see, every effect you have with audio from a single microphone
corresponds to something in a single grey-scale pixel of a video
camera. A small sketch of the shared core, sampling and quantizing a
value, follows below.
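To make the sample-rate and bit-depth bullets concrete, here is a minimal sketch (Python/NumPy; the 48 kHz rate, 1 kHz tone and half-scale level are illustrative values I picked, not from the post) that quantizes a sampled tone to different bit depths and measures the quantization noise, which is exactly what happens to a pixel value in the A/D converter.

```python
import numpy as np

def quantize(signal, bits):
    """Quantize a signal in [-1, 1] to the given bit depth (uniform steps)."""
    step = 2.0 / (2 ** bits)
    q = np.round(signal / step) * step        # snap to the nearest level
    return np.clip(q, -1.0, 1.0 - step)

sample_rate = 48_000                          # audio sample rate = frame rate
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)     # a 1 kHz tone at half scale

for bits in (8, 12, 16):
    err = tone - quantize(tone, bits)
    snr_db = 10 * np.log10(np.mean(tone ** 2) / np.mean(err ** 2))
    print(f"{bits:2d} bits -> quantization SNR of roughly {snr_db:.1f} dB")
```

Each extra bit buys roughly 6 dB of signal-to-noise, whether the value being converted is an audio sample or a pixel.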

Now imagine you have lots of microphones arranged in a grid, say
because you want to make a wall of sound whose output feeds a speaker
array (think surround sound taken to the extreme). This is very
similar to an image sensor.

- each microphone is different = each pixel is different
You have to know the bias and sensitivity of each pixel and
compensate for them, otherwise the differences show up as static
(fixed-pattern) noise. A sketch of such a two-point correction
follows below.
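One common way that per-pixel compensation is done in imaging is a two-point correction using a dark frame (bias) and a flat field (sensitivity). The sketch below is a generic Python/NumPy illustration with made-up numbers, not the calibration of any particular camera.

```python
import numpy as np

def flat_field_correct(raw, dark, flat):
    """Subtract the per-pixel bias (dark frame) and divide out the
    per-pixel sensitivity (flat field)."""
    gain = flat - dark
    gain = np.where(gain == 0, 1.0, gain)      # guard against dead pixels
    return (raw - dark) / gain * np.mean(gain)

rng = np.random.default_rng(0)
offset      = rng.normal(10.0, 2.0, (4, 4))   # per-pixel bias
sensitivity = rng.normal(1.0, 0.1, (4, 4))    # per-pixel gain

scene = np.full((4, 4), 100.0)                # a perfectly even grey scene
raw   = scene * sensitivity + offset          # what the sensor actually reports
dark  = offset                                # frame taken with the lens capped
flat  = 200.0 * sensitivity + offset          # frame of a uniform bright target

print(raw.std())                              # fixed-pattern ("static") noise
print(flat_field_correct(raw, dark, flat).std())   # essentially zero
```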

Now, if you have a lot of microphones you could give each one its own
A/D converter, but that may not be cost effective; for video at least
it is not. Instead each pixel has sample-and-hold circuitry, and the
pixels are quickly read out one by one by a single A/D converter
through the same amplifier. Larger image sensors divide the image
into two or four regions, each with its own amplifier and A/D
converter.
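A toy model of that shared readout path, just to show the structure (Python/NumPy; the gain and 12-bit range are made-up values): every held pixel value goes through the same amplifier and the same converter, one after another.

```python
import numpy as np

def read_out(held_values, gain=2.0, bits=12, full_scale=4096.0):
    """Multiplexed readout: each sample-and-hold value is pushed through
    the same amplifier and the same A/D converter, one pixel at a time."""
    max_code = 2 ** bits - 1
    codes = np.empty(held_values.shape, dtype=np.int32)
    for idx, value in np.ndenumerate(held_values):   # pixel by pixel
        amplified = value * gain                     # the shared amplifier
        code = int(round(amplified / full_scale * max_code))
        codes[idx] = min(max(code, 0), max_code)     # the shared 12-bit ADC
    return codes

held = np.random.default_rng(1).uniform(0.0, 2000.0, size=(4, 4))
print(read_out(held))
```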

Then there is the lens. You could use an array of directional
microphones, each pointing in a slightly different direction. You can
theoretically do the same with an image sensor: increase the well
depth of each pixel and point each well in a slightly different
direction. The problem is that very few photons arrive exactly
parallel to a well, so you need a lot of light, just as a long,
highly directional microphone receives a lot less sound. Instead we
use a lens to capture a lot of photons and steer them into the
correct pixel well.

In this array you also have aliasing: fine detail that lands on one
pixel but not the next. To keep that under control an optical
low-pass (anti-aliasing) filter is placed in front of the sensor,
which scatters the photons slightly over multiple pixel wells.
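The cure is the same as in audio: low-pass before you sample. Here is a rough Python/NumPy sketch, where a simple box blur stands in for the optical low-pass filter and the stripe pattern and sizes are just illustrative.

```python
import numpy as np

def box_blur(img, k):
    """Crude stand-in for an optical low-pass filter: average over k x k."""
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out / (k * k)

# Stripes much finer than the "pixel" spacing we are about to sample at.
x = np.arange(256)
scene = 0.5 + 0.5 * np.sin(2 * np.pi * x / 3.0)      # 3-sample period
scene = np.tile(scene, (256, 1))

naive    = scene[::8, ::8]                # sample every 8th point: aliases
filtered = box_blur(scene, 8)[::8, ::8]   # low-pass first, then sample

print(naive.std(), filtered.std())        # the naive version shows a bogus
                                          # coarse pattern; the filtered one
                                          # is far closer to flat grey
```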

Then there is color. You take a grey-scale camera and put a colored
filter in front of each pixel, so one pixel is used only for red,
another for green, and yet another for blue (a Bayer pattern). It is
like putting different sound-absorbing materials in front of each
microphone in the array, so that each responds only to a low, mid or
high frequency range (not a perfect analogy).
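As a sketch of how that works in practice, here is a minimal Bayer mosaic and a very crude demosaic in Python/NumPy; real camera pipelines interpolate far more carefully, so treat this only as an illustration of the idea.

```python
import numpy as np

def mosaic(rgb):
    """Keep only one color per pixel, in an RGGB Bayer layout."""
    h, w, _ = rgb.shape
    bayer = np.zeros((h, w))
    bayer[0::2, 0::2] = rgb[0::2, 0::2, 0]   # red sites
    bayer[0::2, 1::2] = rgb[0::2, 1::2, 1]   # green sites
    bayer[1::2, 0::2] = rgb[1::2, 0::2, 1]   # green sites
    bayer[1::2, 1::2] = rgb[1::2, 1::2, 2]   # blue sites
    return bayer

def demosaic(bayer):
    """Very crude demosaic: reuse each 2x2 cell's R, G and B for the
    whole cell (real pipelines interpolate between neighbors)."""
    h, w = bayer.shape
    rgb = np.zeros((h, w, 3))
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            r = bayer[y, x]
            g = (bayer[y, x + 1] + bayer[y + 1, x]) / 2.0
            b = bayer[y + 1, x + 1]
            rgb[y:y + 2, x:x + 2] = (r, g, b)
    return rgb

scene = np.random.default_rng(2).uniform(0.0, 1.0, (4, 4, 3))
print(demosaic(mosaic(scene)).shape)         # (4, 4, 3)
```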

In the video world they are running into the same issues the audio
world has with low bit depths. Current cameras have quite a low
contrast range, and storing each pixel value in 8 bits does not help;
it is like using a compressor/limiter on all the pixels to squeeze
them into such a small range of values. Luckily this is changing and
we are entering an era of HDR (high dynamic range) imaging: editing
applications use floating point, cameras output higher bit depths,
and algorithms have been found that increase the perceived
(psychological) dynamic range of a display, which is still only 8 bit.
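One simple example of such an algorithm is global tone mapping; the sketch below uses the well-known Reinhard curve x / (1 + x) purely as an illustration of squeezing floating-point HDR values onto an 8-bit display, not as what any particular camera or editor actually does.

```python
import numpy as np

def tone_map_reinhard(hdr):
    """Compress floating-point HDR luminance into [0, 1] with x / (1 + x),
    then quantize to 8 bits for the display."""
    mapped = hdr / (1.0 + hdr)                # bright values compressed hardest
    return np.round(mapped * 255).astype(np.uint8)

def linear_clip(hdr):
    """Naive alternative: scale by the brightest value and clip to 8 bits."""
    return np.round(np.clip(hdr / hdr.max(), 0.0, 1.0) * 255).astype(np.uint8)

# Scene luminances spanning a huge range: deep shadow up to bright sky.
hdr = np.array([0.01, 0.1, 1.0, 10.0, 100.0])

print(linear_clip(hdr))           # the two shadow values collapse to 0
print(tone_map_reinhard(hdr))     # every value keeps its own code
```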

OK, this is already a long post, so I will shut up now.

Cheers,
Take Vos
analogy boy