Posted to rec.audio.pro
Stereo Raw
This article is meant to answer a few misconceptions about what stereo is and then to offer some analogies to correct those misconceptions.
"Stereo" is a generic term that means stereophonic, a field-type system of auditory perspective using more than one channel from microphones to playback speakers. It can be the legacy two-channel system for commercial recordings or any number of extra channels for surround sound, a center channel, or even full peripheral coverage, including above and below. It is reproduced by loudspeakers in a room, placed in positions which are geometrically similar to those of the instruments.
Stereo is differentiated from monophonic, a single channel on one loudspeaker in a room, and from binaural, a head-related system that uses a dummy head for recording and either headphones or, for loudspeakers, a crosstalk-removal circuit that isolates each channel to its respective ear. "Monaural" would be a single channel sent to one ear on a headphone, "diotic" being a single channel sent to both headphones equally. In general the suffix "phonic" means on loudspeakers and "aural" means on headphones, with the exception that "binaural" can be reproduced on speakers as mentioned above, in which case the term is simply loudspeaker binaural, because the signals are still isolated to each ear at the head of the listener.
The first main point is that there are a few misconceptions about what stereo is, rooted in some confusion between the head-related system of binaural and the field-type system of stereophonic. Some distinguished writers still believe that stereo is a "two ears, two speakers" head-related system intended to pipe the two recorded signals to the two ears, creating an illusion of a panorama of sounds as heard by the microphones. They believe that the recorded signals contain all of the spatial information necessary to decode the original sound field by means of the binaural localization mechanisms of human hearing, using the phase, amplitude, and timing cues in the recording.
Some believe that the system is based on the two ears and their separation and pickup patterns, their pinnae effects, Interaural Cross Correlation (IACC), and response curves, or transfer functions. Some have stated outright that they believe that the "problem" with stereo is interaural crosstalk. I hope to show that the stereophonic system has nothing whatsoever to do with the human hearing mechanism, the number of ears on our heads, their separation, pinnae, or response curves.
We all know how stereo lateralization works: summing localization between the two channels, in which the intensity or timing differences between channels make it possible to perceive an auditory event anywhere along a line between the speakers. With coincident miking techniques, where there is no timing difference between channels, the summing localization is based only on amplitude differences between channels. With some separation between pickup microphones there are both intensity and timing differences. Multi-miking techniques, with mikes picking up various parts of the sound field and pan-potted into the mix by means of intensity, can also be incorporated, either as the sole method of recording or as spot mikes for certain important instruments or for the voices of soloists or small groups.
So where does this alleged confusion between binaural and stereophonic come from? It could be from the innocent presumption that the use of two speakers for playback has something to do with our having two ears, which in turn may have arisen from the Blumlein patent and its method of recording with two coincident mikes. Meanwhile, at around the same time at Bell Labs, researchers were experimenting with multiple channels and placing speakers on the reproduction stage in positions similar to those of the instruments and microphones used for pickup.
One of their ideal but impractical methods of reproduction was called the "curtain of sound", in which a line of many microphones would record the performance and a similar line of speakers would reproduce it on another stage, or playback space. They defined binaural as a head-related system and stereophonic as a field-type system, in which the idea is to place many speakers on a sound stage and reconstruct the sound fields that existed in the original. Binaural, on the other hand, was always and only a two-channel head-related system based on the human hearing mechanism and recorded with a dummy head, the idea being that the headphones would present to the ears the identical signals that the dummy head picked up at the recording site. William Snow remarked that the binaural system brought the listener to the original performance location, whereas the stereophonic system brought the performers into our own listening rooms. Bell Labs ended up recommending a three-channel system, but practical limitations reduced it to the two-channel system that we know today.
So what is the major difference between a head-related and a field-type system? There are two fundamentally different ways to reproduce a sensory experience. You can reproduce the sensory input directly, as with binaural, or you can reproduce the object itself, the sound fields produced by the orchestra, and let the subject's own sensory apparatus pick it up in the normal way, just as it does with live sound. The sensory input system depends heavily on attempting to pick up the sounds in the same way that our own ears do, such as with the use of a dummy head shaped like our heads, with the number of ears, ear spacing, and pinnae as much like ours as feasible.
But the stereophonic system has nothing whatsoever to do with the number of ears on our heads, the spacing between them, their pinnae effects, or their frequency response (transfer functions), and the whole recording and reproduction process can be accomplished without any knowledge or consideration of those factors - NONE.
Compare it to the difference between sculpture and 3D photography. If we want to reproduce the image of an elephant, we could do it one of two ways. We could either take a 3D photograph in color and introduce the two halves of the image into our two eyes, or we could hire a sculptor to make a very realistic three-dimensional model of an elephant, even to the point of being life-sized and placed against a background, such that we could walk all around it and each of us perceive it with our native vision mechanism, the whole process accomplished with NO knowledge of the human vision mechanism. In fact, all beings who can see in three dimensions, such as animals or visitors from another planet, would behold the model in the same way as they would the live elephant, even with no knowledge of how they see, hear, or anything else, because we did the reproduction as a model of the real thing rather than as a direct sensory input.
I hope to show that the system of stereophonic sound depends ONLY on our knowledge and study of sound fields in rooms, and not upon knowledge of the human hearing mechanism, except for the very fortunate psychoacoustic fact that summing localization permits reducing the number of channels to fewer than the number of instruments being reproduced.
The raw, base example for purposes of illustration would be a team of researchers wishing to begin exploring systems of auditory perspective, starting with the field-type system. They go into the recording studio with a battery of microphones and a multi-channel recorder.
They close-mike each instrument but also include a small amount of the reverberance from the studio, as would be heard near the instrument. Some instruments, such as the piano or drum set, might call for more than one mike to capture the extent of the drum kit or the width of the piano. On playback, we select a good-sounding playback space and place the speakers, possibly selected for radiation patterns similar to those of their instruments, in positions in the room that are geometrically similar to the original. We now have a "they are here" system if no reverberance was recorded, or one modified a touch by the original hall sound if some was recorded. Notice also that if some was recorded, and if we use a little of the natural reflecting surfaces around the speakers in the same way that the original hall's walls did, the reflected sound from instruments on the right side would reflect from the right wall of the playback room, and so on, but the instruments themselves would remain anchored where they belong by means of the precedence effect. In total, this "model" of the original sound would be three-dimensional, having depth and width and appropriate ambience behind and around, and you could literally walk all around the model and hear it from various angles from anywhere in the room.
This is the raw model for the stereophonic system. I would first point out that the whole process was accomplished with NO knowledge of or reference to the human hearing mechanism and would be the same for all listeners, each one hearing the model with his or her own hearing system. It was recorded and reproduced with knowledge ONLY of sound fields in rooms, reconstructing them in the new space as a model of the original.
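As a rough illustration of what "geometrically similar" placement means in the raw model, here is a short Python sketch. The room dimensions, the instrument coordinates, and the names Position and similar_layout are all made up for the example; they do not come from any real session or library. It simply applies one scale factor to both axes, so the layout keeps its shape when moved into a smaller playback space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    x: float  # metres left (-) or right (+) of stage centre
    y: float  # metres back from the front edge of the stage


def similar_layout(instruments, studio_size, playback_size):
    """Scale instrument positions uniformly into the playback space.

    studio_size and playback_size are (width, depth) in metres.
    These are hypothetical inputs chosen only for illustration.
    """
    studio_w, studio_d = studio_size
    play_w, play_d = playback_size
    # One factor for both axes keeps the layout geometrically similar;
    # taking the smaller ratio makes sure everything fits in the room.
    scale = min(play_w / studio_w, play_d / studio_d)
    return {
        name: Position(p.x * scale, p.y * scale)
        for name, p in instruments.items()
    }


# Made-up studio layout: one close-miked channel per instrument.
instruments = {
    "piano": Position(-3.0, 2.0),
    "bass": Position(0.0, 3.0),
    "drums": Position(2.0, 4.0),
}

speakers = similar_layout(
    instruments, studio_size=(12.0, 8.0), playback_size=(6.0, 5.0)
)
for name, pos in speakers.items():
    print(f"{name}: x={pos.x:+.2f} m, y={pos.y:.2f} m")
```

Note that nothing in the sketch refers to the listener at all: only the positions of the sources and the size of the two spaces enter into it, which is the point of the raw model.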
I would then point out that this ideal system could be simplified down to fewer and fewer channels for a more practical system without losing too much, if we could only remember what it is that we are doing with the system and not lose sight of the fact that it is a field-type system, a literal reconstruction, or model, of the original, not a binaural system. We first reduce the number of channels to as few as two, thanks to summing localization being able to place all of the instruments anywhere along a line between the two speakers. We can then pull the speakers out from the walls and place them with some geometrical similarity to the original left and right positions of the orchestra. Next, we can customize the radiation patterns of the speakers to a lower direct-to-reflected ratio because of our closeness to the speakers, relative to our original distance from the orchestra. If we now treat the walls so that we get some of the reflected sound from the recording bouncing from the left, center, and right walls of the listening room, we stand a chance of having the various recordings make our playback rooms take on most of the important characteristics of the original acoustics.
Finally, so what? The answer is that this is a radical change in thinking about how the process works, from a two ears/two speakers process achieved with the direct sound output from two speakers to a three-dimensional model of a typical original sound field, a reconstruction of all aspects of the original within the listening room rather than a direct sensory input from the speakers to your ears. The paradigm to be sought is now sound fields in rooms rather than the "accuracy" of getting the signals intact from the speakers to your ears. The new model requires paying attention to the radiation patterns, room positioning, and acoustical qualities of the whole playback system.
In-wall speakers, nearfield speakers, dead rooms, highly focused sound from the speakers, all must be re-examined in light of the new theory. The total acoustical situation that we hear when our ears are free to hear it, without any attempt to isolate the channels at the ears or from the room, can be described visually as the image model of the fields in the room, whether it be the original concert hall or the playback room. What we are hearing is the total acoustical situation: direct, early reflected, and late reflected reverberant sound. All of these must be reproduced, which is to say reconstructed in front of us, or else it will sound different from the original. The preferred solution would be surround sound, but in any case the sound patterns within the room must be honored and the goal changed to realism rather than accuracy.
We are not "doing" accuracy with stereophonic recording, unless you want to hear the piano from underneath the lid, the singer from a foot in front of her tonsils, or the perspective from 9 feet above the head of the conductor. Rather, we are seeking realism, achieved in the final result by placing the microphones and speakers so that the sounds are displayed at a distance from us in the listening room, with signal processing or extra channels all around us, and NOT from the perspectives of any particular microphones.
Gary Eickmeier