Woof! EXCELlent response! Way to go.

Pop

"Dick Pierce" wrote in message
om...
| "datanet" wrote in message
...
| Each WAV file is saved with certain parameters.
| The following is a typical example of such parameters ...
|
| PCM, 44.1kHz, 16bit, stereo, 172 kbps
|
| I know what the "44.1kHz" (sampling rate) means,
| and what the "172kbps" (bitrate) means.
|
| But what does the "16bit" refer to?
|
| Is there a FAQ that explains these items?
| Hmm, good reply, hel. Only you forgot 16 bits: That's the
| width of the data stream, meaning 16 bits at a time is
| transmitted in the data stream. Or, ultra-simply, a combination
| of 16 zeroes and ones of data at a time is transmitted around the
| system. The wider the virtual or literal buss (bit rate), the
| faster things are internally and the less problem with
| bottlenecks. 32 bit is twice as fast as 16 bit (well, about,
| anyway, neglecting some added control bits) and so on.
|
| Ultra simple indeed but, unfortunately, ultra wrong. The 16 bits
| cited have NOTHING WHATSOEVER to do with the "width of the data
| stream." All the hoo-ha above about "virtual or literal buss" and
| 16 vs 32 bits and bottlenecks is utterly irrelevant to the question
| at hand.
|
| It is simply declaring that each audio sample will have
| a precision of 16 bits.
|
| Now, to answer the original question definitively: A WAVE file
| is one particular kind of RIFF (Resource Interchange File Format)
| file. A RIFF file is composed of individual "chunks" of data; each
| chunk is identified by a 4-character chunk identifier and a 32-bit
| chunk size (in bytes).
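
For anyone who wants to see the chunk layout described above for themselves, here is a rough Python sketch. It builds a tiny WAV in memory with the standard `wave` module and then walks the chunks by hand. The helper names (`make_test_wav`, `list_chunks`) and the silent test data are made up for illustration, and the walker assumes a plain little-endian RIFF file with no nested LIST handling.

```python
import io
import struct
import wave


def make_test_wav() -> bytes:
    """Build a tiny 16-bit stereo 44.1 kHz WAV in memory (illustrative data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)        # bytes per sample, i.e. 16 bits
        w.setframerate(44100)
        w.writeframes(b"\x00\x00\x00\x00" * 10)  # 10 silent sample blocks
    return buf.getvalue()


def list_chunks(data: bytes):
    """Return (chunk_id, size) pairs for the chunks inside a RIFF/WAVE file."""
    # The file starts with 'RIFF', a 32-bit size, and the form type 'WAVE'.
    riff_id, _riff_size, form_type = struct.unpack_from("<4sI4s", data, 0)
    if riff_id != b"RIFF" or form_type != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        # Each chunk: 4-character identifier, then a 32-bit size in bytes.
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id, size))
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    return chunks


chunks = list_chunks(make_test_wav())
for chunk_id, size in chunks:
    print(chunk_id, size)
```

With the test data above, this should list a 16-byte 'fmt ' chunk followed by a 40-byte 'data' chunk (10 sample blocks of 4 bytes each).
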
|
| A WAVE file MUST have at least 2 such chunks to be valid:
|
| 1. The 'fmt ' chunk: this holds the parameters describing the
| properties of the audio data, and, in fact, it is the contents
| of the 'fmt ' chunk that you are seeing. The 'fmt ' chunk is
| broken down as follows:
|
| wFormatTag - a 16-bit integer holding a format indicator for
| the file. Some examples of which are:
|
| 1 - PCM
| 2 - ADPCM
| 6 - A-Law
| 7 - u-Law
| 48 - Dolby AC2
| 80 - MPEG
|
| nChannels - 16-bit integer of the number of audio channels
| in the file
|
| nSamplesPerSecond - 32-bit unsigned integer of the sample rate
| of the audio
|
| nAvgBytesPerSecond - 32 bit unsigned integer of the average
| buffer rate for the audio.
|
| nBlockAlign - block alignment (in bytes) for the waveform
| data.
|
| And, if the wFormatTag is 1 (PCM), the fmt chunk is extended to
| include:
|
| wBitsPerSample - number of bits per sample.
|
| If it is another format, the chunk extension will include relevant
| information for that format.
|
| 2. The 'data' chunk. This actually holds the audio data samples, as
| described in the 'fmt ' chunk.
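
As a hedged illustration of the 'fmt ' layout listed above, the sketch below unpacks a 16-byte PCM 'fmt ' body with Python's `struct` module. The names `PCM_FMT`, `PcmFormat`, `parse_pcm_fmt` and `describe` are hypothetical helpers, not part of any library, and the example body is packed by hand with values matching the file in the original question rather than read from a real file.

```python
import struct
from typing import NamedTuple

# Little-endian layout of a PCM 'fmt ' chunk body: two 16-bit fields,
# two 32-bit fields, then two more 16-bit fields (16 bytes in total).
PCM_FMT = struct.Struct("<HHIIHH")

# A few of the format tags listed in the post.
FORMAT_NAMES = {1: "PCM", 2: "ADPCM", 6: "A-Law", 7: "u-Law"}


class PcmFormat(NamedTuple):
    wFormatTag: int
    nChannels: int
    nSamplesPerSecond: int
    nAvgBytesPerSecond: int
    nBlockAlign: int
    wBitsPerSample: int


def parse_pcm_fmt(body: bytes) -> PcmFormat:
    """Decode the first 16 bytes of a 'fmt ' chunk body."""
    return PcmFormat(*PCM_FMT.unpack_from(body))


def describe(fmt: PcmFormat) -> str:
    """Render the fields roughly the way an audio editor displays them."""
    name = FORMAT_NAMES.get(fmt.wFormatTag, f"format {fmt.wFormatTag}")
    layout = {1: "mono", 2: "stereo"}.get(fmt.nChannels, f"{fmt.nChannels} channels")
    khz = fmt.nSamplesPerSecond / 1000
    return f"{name}, {khz:g}kHz, {fmt.wBitsPerSample}bit, {layout}"


# Hand-packed example body: PCM, stereo, 44.1 kHz, 16 bits per sample.
example_body = PCM_FMT.pack(1, 2, 44100, 176400, 4, 16)
fmt = parse_pcm_fmt(example_body)
print(describe(fmt))
```

Pairing this with a chunk walker that locates the 'fmt ' chunk would give the same kind of summary for a real file.
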
|
| There may well be other chunks as well. For example, EBU Tech 3285
| describes the so-called "Broadcast Wave Format" extensions, while
| AES-46 describes additional chunk information for use in radio
| applications.
|
| So you are, in essence, seeing a human-readable "dump" of the
| contents of the 'fmt ' chunk. Your reading:
|
| "PCM, 44.1kHz, 16bit, stereo, 172 kbps"
|
| is directly mappable as follows:
|
| wFormatTag = 1 "PCM"
| nChannels = 2 "Stereo"
| nSamplesPerSecond = 44100 "44.1 kHz"
| nAvgBytesPerSecond = 176400 "172 kbps" (a typo, perhaps?)
| nBlockAlign = 4
| wBitsPerSample = 16 "16 bits"
|
| Now, nBlockAlign is calculated as follows: each sample holds
| 16 bits. A byte is 8 bits, so 2 bytes are required to hold
| a single sample. The file is in stereo, so two samples are
| required (one for each channel) at each sample block, thus
| 2 (bytes/channel) * 2 channels = 4 bytes.
|
| Then, the nAvgBytesPerSecond is (in the case of PCM) simply
| the number of bytes per sample block times the number of sample
| blocks per second, 44,100 * 4 = 176,400. Now, they used the term
| "kilo" in the wrong sense here: they used the disk convention of
| 1024 as opposed to the normal 1000, thus 176,400/1024 = 172.27
| kiloBYTES per second.
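
The arithmetic above is easy to check. Here is a minimal Python sketch for the PCM case only; the function names are invented for illustration. It also prints the rate in kilobits per second, which is what "kbps" would normally suggest.

```python
def block_align(bits_per_sample: int, channels: int) -> int:
    """Bytes in one sample block: whole bytes per sample times channel count."""
    bytes_per_sample = (bits_per_sample + 7) // 8  # round up to whole bytes
    return bytes_per_sample * channels


def avg_bytes_per_second(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """For PCM: sample blocks per second times bytes per sample block."""
    return sample_rate * block_align(bits_per_sample, channels)


align = block_align(16, 2)                    # 2 bytes/channel * 2 channels
rate = avg_bytes_per_second(44100, 16, 2)     # 44,100 blocks/s * 4 bytes

print(align)                # 4
print(rate)                 # 176400
print(rate / 1024)          # 172.265625  (the "172" shown, in units of 1024 bytes)
print(rate / 1000)          # 176.4       (kilobytes per second, kilo = 1000)
print(rate * 8 / 1000)      # 1411.2      (kilobits per second)
```

So the "172" figure only makes sense as bytes per second divided by 1024; as a true bit rate the same stream is about 1411 kilobits per second.
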
|
| Now, the "16 bits" describes, as I mentioned, the width of each
| sample word, or its "precision." A 16 bit sample can represent,
| on a per-sample basis, one of 2^16 or 65,536 unique values.
| You can also view this as describing how wide a dynamic range
| a sample can represent. In the case of 16 bits, this is a
| dynamic range of around 96 dB, that is, the difference between
| the lowest unambiguous value and the highest that can be represented
| by a 16 bit sample stream is 96 dB.
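
The 65,536 and "around 96 dB" figures can be reproduced with a couple of lines of Python. This sketch assumes the usual simplification that dynamic range is 20*log10 of the ratio between the largest and smallest representable step, which works out to roughly 6 dB per bit; the function names are just for illustration.

```python
import math


def levels(bits: int) -> int:
    """Number of distinct values a sample of the given width can take."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, expressed in decibels."""
    return 20 * math.log10(levels(bits))


print(levels(16))                       # 65536
print(round(dynamic_range_db(16), 1))   # 96.3
```
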
|
| But, please, ignore the above nonsense about "virtual busses" and
| "bottlenecks," it's a load of irrelevant hooey.