#41
Posted to rec.audio.high-end
Do blind tests need being "calibrated" ?
On Jun 26, 10:50 am, ScottW wrote:
On Jun 24, 6:35 am, Scott wrote: On Jun 23, 7:44 pm, wrote: On Jun 22, 3:03 pm, Scott wrote: On Jun 18, 7:07 am, wrote:

My question is, is there a real need for calibration, or is this just a demand of audiophiles because the test came up with a negative result?

Of course it does, otherwise you have no way of gauging the sensitivity of the test. A null with no calibration has too many variables. Was the test sensitive to audible differences? Were the subjects sensitive to audible differences? No way to know, is there?

Assuming that there was a need for calibration: the M&M test was about the audibility of "bottlenecking" a hi-rez signal. What calibration signal would one use other than a bottlenecked signal (one would have to, otherwise some audiophiles would claim that the calibration signal was not adapted for its intended purpose)? So for calibration one would use the signal that is going to be tested.

The same signals one would use to test any setup for sensitivity to audible differences.

Interesting proposal. Could you define the categories of audible differences?

I could try, but really I think this would be a good question for JJ, who has done extensive tests for various thresholds of human hearing.

Let's just take a relatively simple one like frequency response and examine it with just a bit of speculative detail. I suppose single-tone amplitude would be obvious, with humans having different sensitivity to amplitude difference at different frequencies. That sensitivity to amplitude difference also changes with amplitude. You can't hear a difference in amplitude between two signals both of which you can't hear, nor do I suppose you can hear the difference between two signals of different amplitude when both of them are sufficient to make your ears bleed. Add a second single-frequency, fixed-amplitude masking tone. Measure sensitivity to variable-amplitude tones. That will vary with the frequency of the masking tone, the amplitude of the masking tone, the frequency of the tone we're measuring sensitivity to amplitude changes of, the amplitude of that tone, and probably a few other interactions I've failed to mention. Hmm... try to matrix that. How many signal conditions have we constructed? Let's see... 1 Hz frequency resolution, 0.5 dB amplitude resolution from threshold to pain... times all the interactions... Well, I'm going out on a limb to guess this signal matrix will be quite large. Now, how do these sensitivity tests with test signals correlate to real music signals used in ABX tests?

All very fair questions. Again I would defer to JJ, who routinely did such tests throughout his career, in many cases in examination of human thresholds of hearing.

Wait a second, we're now back to the beginning of trying to determine humans' ability to detect audible differences with different music signals on a system. Seems like we'd need to know that to begin to calibrate with "test signals".

Such is the value of a large body of data.
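The combinatorics being gestured at above can be made concrete with a toy calculation. A minimal sketch, assuming grid spans I've invented for illustration (the 1 Hz and 0.5 dB step sizes come from the post; the 20 Hz to 20 kHz band and the 120 dB threshold-to-pain span are my assumptions):

```python
# Back-of-envelope count of the "signal matrix" discussed above.
# Step sizes (1 Hz, 0.5 dB) are from the post; the 20 Hz-20 kHz band
# and 120 dB threshold-to-pain span are assumed for illustration.

probe_freqs = 20_000 - 20 + 1        # 1 Hz steps across the audio band
probe_levels = int(120 / 0.5) + 1    # 0.5 dB steps, threshold to pain
single_tone = probe_freqs * probe_levels

# Adding one masking tone multiplies in its own frequency/level grid.
with_masker = single_tone * probe_freqs * probe_levels

print(f"single-tone conditions: {single_tone:,}")   # ~4.8 million
print(f"plus one masking tone:  {with_masker:,}")   # ~2.3e13
```

Even this crude grid, before any of the interactions the post mentions, supports the "quite large" guess.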
#42
On Jun 26, 11:24 am, "Arny Krueger" wrote:
"Scott" wrote in message: On Jun 24, 12:35 pm, Audio Empire wrote:

But does that mean that they "can't" hear or that there are simply no differences TO hear?

Without testing for sensitivity the answer to your question is yes. The outcome of any test you run with a negative result can be interpreted as follows: it means that they either "can't" hear, or there are simply no differences to hear, or they simply can't discriminate those under that particular test. That includes any so-called sensitivity tests. Sensitivity tests without positive results would indicate either incomplete sensitivity tests or a complete lack of sensitivity to audible differences.

Let's review the current situation.

There is no "current situation" to review in regards to my assertions about ABX DBTs needing to be calibrated for sensitivity. It is an assertion about ABX DBTs in general. It is easy enough to screw up such a test.

This dodges addressing the current situation, where thousands of listening tests have been run to show an audible difference due to excess sample rates, with no known positive outcomes when appropriate experimental controls were in place.

No, it doesn't dodge anything. It is a basic truism about ABX DBTs and does not make any reference to any specific tests. Just continue to test on ABX with differences that are near the threshold of audibility, way beyond the threshold of listener fatigue, and you will likely get a false negative.

Since you specifically mention ABX, are you saying that there is no such thing as listener fatigue in sighted evaluations?

No.

Or are you saying that some other methodology, such as ABC/hr, is not at least equally fatiguing?

No.

Are you asserting that all of the thousands of failed tests were all due to listener fatigue?

No.

Or are you just dragging out an old, tired red herring?

What red herring, Arny? My assertion is about ABX DBTs in general. The only red herrings I see are the ones you just tried to drag out.
#43
On 6/26/2010 8:45 PM, Scott wrote:
On Jun 26, 11:24 am, "Arny Krueger" wrote: "Scott" wrote in message [snip]

It means that they either "can't" hear, or there are simply no differences to hear, or they simply can't discriminate those under that particular test. That includes any so-called sensitivity tests. Sensitivity tests without positive results would indicate either incomplete sensitivity tests or a complete lack of sensitivity to audible differences.

Sensitivity to what, precisely? You cannot verify sensitivity to unknown parameters. You can't generate test signals that effectively mimic uncharacterized differences between presentations. Look at it another way: if you could characterize the differences, and generate a representative test signal with which to "calibrate" the listeners, then the actual ABX test would be moot relative to the specific difference in question; the answer would be known based on the sensitivity test. And *that* would tell you precisely nothing, as none of the masking effects present in an actual musical ABX test would be present. If they are present, then the "sensitivity" test and the ABX test are identical, and you're back to square one.

And let's not give short shrift to the "not everything can be measured" crowd; that position precludes even the possibility of a sensitivity test, as you cannot ever generate a test signal, nor can you quantify any results related to the test.

Let's review the current situation.

There is no "current situation" to review in regards to my assertions about ABX DBTs needing to be calibrated for sensitivity. It is an assertion about ABX DBTs in general. It is easy enough to screw up such a test.

This dodges addressing the current situation, where thousands of listening tests have been run to show an audible difference due to excess sample rates, with no known positive outcomes when appropriate experimental controls were in place.

No, it doesn't dodge anything. It is a basic truism about ABX DBTs and does not make any reference to any specific tests.

No, it's not true, let alone a truism, about ABX DBTs. ABX has been shown many times to be capable of detecting audible differences when they exist. It's been used many times to determine that excessive sample rates don't result in audible differences, using the same methodology and bias controls. That test base testifies to the precision of the method, which is the only type of "calibration" that is relevant to this type of *difference discernment* test.

And once again, one would need to show how your supposed "calibration deficit" would in fact have relevance, were it to exist, in the context of actually applying ABX methods. For example, in situations like Arny referred to above, where the only countervailing "evidence" is gathered via listening tests that have the same "problems" (be they calibration, fatigue, whatever) as is claimed for ABX, *plus* additional uncontrolled error sources as well, an ABX test confirming the results expected based upon engineering and psychoacoustic knowledge has no requirement for accuracy (which would require some calibration activity), as it is not *quantifying* anything. It requires only establishment of precision, and that's provided by a database of many test/subject replicates.

Just continue to test on ABX with differences that are near the threshold of audibility, way beyond the threshold of listener fatigue, and you will likely get a false negative.

And so? Put earmuffs on the subjects and you'll get the same false negative. There are endless ways to screw up a test; how is that relevant to the discussion here? Failure to "calibrate" a method that does not seek to quantify anything is *not* one of those failure modes.

Keith Hughes
#44
On Jun 27, 10:41 am, KH wrote:
On 6/26/2010 8:45 PM, Scott wrote: On Jun 26, 11:24 am, "Arny Krueger" wrote: wrote in message [snip]

It means that they either "can't" hear, or there are simply no differences to hear, or they simply can't discriminate those under that particular test. That includes any so-called sensitivity tests. Sensitivity tests without positive results would indicate either incomplete sensitivity tests or a complete lack of sensitivity to audible differences.

Sensitivity to what, precisely?

Actual audible differences.

You cannot verify sensitivity to unknown parameters.

What unknown parameters are you talking about? We have a pretty good idea what the parameters of the thresholds of human hearing are.

You can't generate test signals that effectively mimic uncharacterized differences between presentations.

How are known audible differences uncharacterized?

Look at it another way: if you could characterize the differences, and generate a representative test signal with which to "calibrate" the listeners, then the actual ABX test would be moot relative to the specific difference in question; the answer would be known based on the sensitivity test.

That would be true if the claim under test were already known to be audibly different. Think about it.

And *that* would tell you precisely nothing, as none of the masking effects present in an actual musical ABX test would be present. If they are present, then the "sensitivity" test and the ABX test are identical, and you're back to square one.

No. The question is: can a given ABX test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe.

And let's not give short shrift to the "not everything can be measured" crowd; that position precludes even the possibility of a sensitivity test, as you cannot ever generate a test signal, nor can you quantify any results related to the test.

Why on earth would we want to address that crowd? How would that help make ABX DBTs better?

Let's review the current situation.

There is no "current situation" to review in regards to my assertions about ABX DBTs needing to be calibrated for sensitivity. It is an assertion about ABX DBTs in general. It is easy enough to screw up such a test.

This dodges addressing the current situation, where thousands of listening tests have been run to show an audible difference due to excess sample rates, with no known positive outcomes when appropriate experimental controls were in place.

No, it doesn't dodge anything. It is a basic truism about ABX DBTs and does not make any reference to any specific tests.

No, it's not true, let alone a truism, about ABX DBTs. ABX has been shown many times to be capable of detecting audible differences when they exist.

Really? How have ABX tests that never yielded any positives been shown to be capable of detecting audible differences without some sort of check for test sensitivity?

It's been used many times to determine that excessive sample rates don't result in audible differences, using the same methodology and bias controls. That test base testifies to the precision of the method, which is the only type of "calibration" that is relevant to this type of *difference discernment* test.

In the Middle Ages, putting a suspected heretic's hand in boiling water was also used many times, and the results were every bit as consistent. Doesn't mean it actually worked.

And once again, one would need to show how your supposed "calibration deficit" would in fact have relevance, were it to exist, in the context of actually applying ABX methods.

Can't say I am buying that. The burden of rigor is on those who do such tests. No one needs to show these folks the need for due rigor.

For example, in situations like Arny referred to above, where the only countervailing "evidence" is gathered via listening tests that have the same "problems" (be they calibration, fatigue, whatever) as is claimed for ABX, *plus* additional uncontrolled error sources as well, an ABX test confirming the results expected based upon engineering and psychoacoustic knowledge has no requirement for accuracy (which would require some calibration activity), as it is not *quantifying* anything. It requires only establishment of precision, and that's provided by a database of many test/subject replicates.

Not buying that. Consistent results are not proof per se that results are accurate.

Just continue to test on ABX with differences that are near the threshold of audibility, way beyond the threshold of listener fatigue, and you will likely get a false negative.

And so? Put earmuffs on the subjects and you'll get the same false negative. There are endless ways to screw up a test; how is that relevant to the discussion here? Failure to "calibrate" a method that does not seek to quantify anything is *not* one of those failure modes.

Unfortunately you snipped the context of my assertion, which was simply that there are many ways to get a false negative, and that the causes of false negatives and false positives tend to be different. It was not claimed that listener fatigue had any direct correlation to testing for test sensitivity. It's just one particular way to desensitize a test. If a test lacks due sensitivity it may yield false negative results. There are indeed "endless ways to screw up a test", and checking for sensitivity can actually prevent any number of those endless ways from creeping in and corrupting a given test. Perhaps you don't think such a precaution is a good idea. I think it is.
#45
My question was: What calibration signal would one use other than a
bottlenecked signal? On Jun 24, 2:35 pm, Scott answered:

The same signals one would use to test any setup for sensitivity to audible differences. True, nor do you "know" that any given test will reveal audible differences should there be audible differences. Given the body of knowledge on the thresholds of human hearing, that aspect of any given ABX DBT can be gauged before conducting any further ABX DBTs. How on earth would it ever be anything but a good idea to do so?

Are you saying that one could use any known audible difference to see whether the test is capable of judging the audibility of bottlenecking? So I could use, say, audible phase shift, IM distortion, or group delay? If ABX yields positive results you have shown that ABX is sensitive to phase shift, IM distortion, or group delay, but nothing more. It won't tell you that it will be sensitive to bottlenecking. Further, thresholds of human hearing are one thing; differences between CD players, amplifiers or, in this case, different formats, another. Either you hear a difference or you don't; there is no such thing as a threshold of perception of that difference. Therefore, you cannot use any arbitrary signal to calibrate the bottleneck test. You have to use a bottlenecked signal, so the calibration step would constitute the very test you want to calibrate.

In the end the AES is just a group of people with its own baggage. Show me one published scientific researcher who would suggest checking a DBT for sensitivity is anything other than a good idea.

You want names? Here are some: Ted Grusec (Canadian Department of Communications), Soren Bech (Bang & Olufsen), W. H. Schmidt (University for Technology and Economy, Berlin), Kaoru Watanabe (NHK), Stan Lipshitz (Waterloo University), Kaoru Ashihara (National Institute of Advanced Industrial Science and Technology), W. A. Munson (Bell Labs), W. A. Rosenblith (MIT), J. Hillenbrand (Dept. of Speech Pathology and Audiology, Western Michigan University), D. Kewley-Port (Speech Research Lab, Indiana University), Ian B. Thomas (Speech Communications Lab, U. of Massachusetts).

Not only AES; also JASA peer reviewers seem to have no objections against non-calibrated blind tests. Now it's your turn: name published scientists who do think that blind tests need to be calibrated. After all, you made the claim that they do!

Klaus
#46
"Scott" wrote in message
...

No. The question is: can a given ABX test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe.

Yet another example of anti-ABX bias: raising a general question about listening tests in such a way that it appears that only ABX tests are affected. Here's a corrected version: "The question is: can a given listening test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe."

Given their built-in bias towards false positive results, most audiophile listening evaluations cannot properly be called tests. They are useless for determining actual audible differences unless the differences are relatively gross.
#47
Klaus mentioned:
Further, thresholds of human hearing are one thing; differences between CD players, amplifiers or, in this case, different formats, another. Either you hear a difference or you don't; there is no such thing as a threshold of perception of that difference. Therefore, you cannot use any arbitrary signal to calibrate the bottleneck test. You have to use a bottlenecked signal, so the calibration step would constitute the very test you want to calibrate.

This is all rhetorical tap dancing. In a listening-alone context, thresholds of perception of difference are all that counts. All difference in a hi-fi bit of gear is electrical as to signal differences. All signal differences have, by experiment, a threshold at which the difference can be perceived. Distortion will serve. Any two bits of gear can be made to sound different if one causes one of them to produce enough more distortion than the other. As one lowers that difference, a threshold is reached beyond which no difference can be perceived, though by measurement it still exists.

If one claims that CD player x but not y sounds "sweeter", then one is making reference to electrical properties one thinks result in that difference. Controlled listening-alone testing will soon resolve this perception claim. If no difference beyond guessing can be identified, then the difference lies in one's brain's subjective perception-producing process, not in the electrical threshold potential of the gear. The perception "sweeter" toggling on and off as one knows or not which bit of gear is active confirms all.

As for the source of signal, it is irrelevant. If electrical differences are thought to produce a "sweeter" perception, then choose a signal source with which one claims to have experienced it while listening. The initial claim is the "calibration" by listening alone, as will be the follow-up listening-alone test of perceived difference.
#48
On Jun 29, 3:44 pm, wrote:
Klaus mentioned:

Further, thresholds of human hearing are one thing; differences between CD players, amplifiers or, in this case, different formats, another. Either you hear a difference or you don't; there is no such thing as a threshold of perception of that difference. Therefore, you cannot use any arbitrary signal to calibrate the bottleneck test. You have to use a bottlenecked signal, so the calibration step would constitute the very test you want to calibrate.

This is all rhetorical tap dancing. In a listening-alone context, thresholds of perception of difference are all that counts.

If you take stuff like distortion, you start with zero percent and increase the amount until you perceive it: threshold found. In the Meyer/Moran study I referred to, they converted the SACD signal to 16-bit/44.1 kHz and looked at whether or not one could hear the difference. Please care to explain how in this particular case a threshold of perception can exist? The amount of which parameter do you increase until you find this threshold? The same is valid when you compare two pieces of gear: the amount of which parameter do you increase to find a threshold?

Klaus
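The "increase the amount until you perceive it" procedure Klaus describes is essentially the ascending method of limits from psychophysics. A minimal sketch, with a simulated listener standing in for real blind trials (the 0.05% step size and the 1% detection point are invented for illustration):

```python
# Ascending method of limits: raise a distortion parameter in fixed
# steps until the listener reports detection. `listener_detects` is a
# hypothetical stand-in for an actual blind trial at that level.

def find_threshold(listener_detects, start=0.0, step=0.05, limit=10.0):
    """Return the first distortion level (%) detected, or None."""
    n = 0
    while start + n * step <= limit:
        level = start + n * step
        if listener_detects(level):
            return level
        n += 1
    return None  # nothing detected within the tested range

# Simulated listener who hears distortion only above 1%:
threshold = find_threshold(lambda pct: pct > 1.0)
print(threshold)  # first tested level above 1%, i.e. 1.05
```

Note this only works when there is a parameter to sweep, which is exactly Klaus's objection in the bottlenecking case.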
#49
Observed:
This is all rhetorical tap dancing. In a listening-alone context, thresholds of perception of difference are all that counts.

Klaus responded:

If you take stuff like distortion, you start with zero percent and increase the amount until you perceive it: threshold found. In the Meyer/Moran study I referred to, they converted the SACD signal to 16-bit/44.1 kHz and looked at whether or not one could hear the difference. Please care to explain how in this particular case a threshold of perception can exist? The amount of which parameter do you increase until you find this threshold? The same is valid when you compare two pieces of gear: the amount of which parameter do you increase to find a threshold?

You are focused on the wrong question. In psychoacoustics we might want to know what source of difference produces what threshold. Or in gear design one might be interested to know what signal source produces what threshold, so as to make the gear behave as desired. In listening-alone testing, frankly, we don't care, and dwelling on your question is only a diversion. If CD player x is said to be "sweeter" than y, we are to test the claimed difference, not to discover what electrical source might have produced it. In fact, if listening-alone testing establishes that no difference can be reliably spotted, any effort to answer your question is moot. Such a test result only adds to the body of research showing such perceived differences do not exist in electrical difference but are generated in the brain after reception of the signal at the ears, the perceived difference responding to non-acoustical/electrical stimuli when the gear being tested is known.
#50
wrote in message
... On Jun 29, 3:44 pm, wrote: Klaus mentioned:

Further, thresholds of human hearing are one thing; differences between CD players, amplifiers or, in this case, different formats, another. Either you hear a difference or you don't; there is no such thing as a threshold of perception of that difference. Therefore, you cannot use any arbitrary signal to calibrate the bottleneck test. You have to use a bottlenecked signal, so the calibration step would constitute the very test you want to calibrate.

This is all rhetorical tap dancing. In a listening-alone context, thresholds of perception of difference are all that counts.

If you take stuff like distortion, you start with zero percent and increase the amount until you perceive it: threshold found. In the Meyer/Moran study I referred to, they converted the SACD signal to 16-bit/44.1 kHz and looked at whether or not one could hear the difference. Please care to explain how in this particular case a threshold of perception can exist? The amount of which parameter do you increase until you find this threshold?

I've done extensive tests with other musical program material that was recorded at 24/96 under near-lab conditions in the interest of high actual recorded dynamic range and extended bandwidth. For example, some of my recordings had about 90 dB dynamic range, as compared to an exceptionally good commercial recording's dynamic range, which is in the area of 75 dB. I then reduced this signal's dynamic range and bandwidth progressively until I could find even one listener (of many tried) who could hear a difference. I removed one bit of resolution at a time by rudely stripping off bits without dither. I removed bandwidth by means of downsampling using brick-wall filtering. I found that resolution reduction became reliably audible at 14 bits of resolution, and that bandwidth reduction became reliably audible at 16 kHz bandwidth. Resolution reduction to 15 bits and bandwidth reduction to 19 kHz were undetectable.

The same is valid when you compare two pieces of gear: the amount of which parameter do you increase to find a threshold?

One can increase the noise, response errors and distortion of audio gear progressively, and in ways that simply extend the inaccuracies of the particular piece of gear, by passing audio signals through that gear again and again. I found, for example, that passing demanding musical signals (see above) through a power amplifier can become audible (due to small HF frequency response losses) after about 5 passes. Other types of equipment, such as the converters that I used in all of these experiments, could pass a signal in excess of 20 times without any reliably detectable audible change.
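The bit-stripping step described above can be sketched as follows. This is not the poster's actual procedure, just one common way to drop resolution from integer PCM samples without dither: zero out the low-order bits so a 16-bit word keeps only `bits` of resolution.

```python
# Truncate 16-bit PCM samples to `bits` of resolution by clearing the
# low-order bits; no dither is added, matching the description above.
# Python's >> on negative ints is arithmetic, so negatives round
# toward minus infinity.

def truncate_bits(samples, bits, word=16):
    drop = word - bits
    return [(s >> drop) << drop for s in samples]

pcm = [12345, -12345, 3, 0]
print(truncate_bits(pcm, 14))  # low 2 bits cleared from each sample
```

Truncating without dither correlates the quantization error with the signal, which is part of why such reductions can become audible sooner than dithered ones.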
#51
On Jun 29, 4:46 am, wrote:
You want names? Here are some: Ted Grusec (Canadian Department of Communications), Soren Bech (Bang & Olufsen), W. H. Schmidt (University for Technology and Economy, Berlin), Kaoru Watanabe (NHK), Stan Lipshitz (Waterloo University), Kaoru Ashihara (National Institute of Advanced Industrial Science and Technology), W. A. Munson (Bell Labs), W. A. Rosenblith (MIT), J. Hillenbrand (Dept. of Speech Pathology and Audiology, Western Michigan University), D. Kewley-Port (Speech Research Lab, Indiana University), Ian B. Thomas (Speech Communications Lab, U. of Massachusetts).

That certainly *is* a list of names.....

Not only AES; also JASA peer reviewers seem to have no objections against non-calibrated blind tests.

"Seem?"

Now it's your turn: name published scientists who do think that blind tests need to be calibrated. After all, you made the claim that they do!

Since you are not limiting it to audio, this could turn into one of those lists, like the long list of scientists who believe in evolution named Steve. You will find a lot of discussion on calibration in medical DBTs.
#52
On Jun 29, 7:44 am, "Arny Krueger" wrote:
"Scott" wrote in message ...

No. The question is: can a given ABX test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe.

Yet another example of anti-ABX bias: raising a general question about listening tests in such a way that it appears that only ABX tests are affected.

How is that an example of anti-ABX? I could just as easily say it's an example of pro-ABX. Where is the bias?

Here's a corrected version: "The question is: can a given listening test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe."

That is just a different version, not a corrected version. The question works just as well for listening tests as a general category or ABX as a specific subset.
#53
"Scott" wrote in message
... On Jun 29, 7:44 am, "Arny Krueger" wrote: "Scott" wrote in message ...

No. The question is: can a given ABX test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe.

Yet another example of anti-ABX bias: raising a general question about listening tests in such a way that it appears that only ABX tests are affected.

How is that an example of anti-ABX?

It makes it look like only ABX tests have any problems.

I could just as easily say it's an example of pro-ABX.

That's a problem: you say things like this so easily and glibly.

Where is the bias?

The fact that it makes a general problem look like it only applies to ABX.

Here's a corrected version: "The question is: can a given listening test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe."

That is just a different version, not a corrected version.

No, it corrects the writer's obvious misapprehension that only ABX tests have the problem that is being pointed out.

The question works just as well for listening tests as a general category or ABX as a specific subset.

That's exactly right, so explain why you made a point of picking on ABX tests, and didn't phrase it so it applied to listening tests in general?
#54
Scott observed:
... named Steve. You will find a lot of discussion on calibration in medical DBTs.

In a typical medical test, two or more groups are given different medical treatments and the results measured against a calibrated measure of some body function or the presence of some substance. In listening-alone tests this is meaningless, except for calibration of both arms of the test gear lineup, and for the statistical methods used having validity for the test used. We want to know if the difference statement "CD player x sounds sweeter than y" has its source in the signal reaching the ears or after it, in the brain. In which case, if the reproduction gear has been calibrated so as to make all things but the bit of gear being tested as equal as possible, nothing more is required in terms of calibration there. Results are measured against the odds of spotting differences being at the guessing level alone. Calibration, then, is for proper statistical testing methods to have been employed.
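The comparison against "the guessing level alone" that comes up throughout the thread is, in practice, a one-sided binomial test: how likely is a listener's score if they were purely guessing? A minimal sketch (the 12-of-16 example is hypothetical, not a result from any test discussed here):

```python
# P(X >= k) for X ~ Binomial(n, p): the chance of scoring k or more
# correct out of n ABX trials by guessing alone (p = 0.5). A small
# value argues against the "just guessing" explanation.

from math import comb

def p_value(k, n, p=0.5):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Example: 12 correct out of 16 trials.
print(round(p_value(12, 16), 4))  # 0.0384, under the usual 0.05 cutoff
```

Note this quantifies only the false-positive risk of a given score; it says nothing about the test's sensitivity, which is the very point under dispute in the thread.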
#55
On Jul 1, 9:59 am, "Arny Krueger" wrote:

"Scott" wrote in message ... On Jun 29, 7:44 am, "Arny Krueger" wrote: "Scott" wrote in message ...

No. The question is: can a given ABX test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe.

Yet another example of anti-ABX bias: raising a general question about listening tests in such a way that it appears that only ABX tests are affected.

How is that an example of anti-ABX?

It makes it look like only ABX tests have any problems.

How?

I could just as easily say it's an example of pro-ABX.

That's a problem: you say things like this so easily and glibly.

How is that a problem? These perceptions of anti-ABX are really in your head, not in my posts.

Where is the bias?

The fact that it makes a general problem look like it only applies to ABX.
#56
On Jul 1, 11:34 am, wrote: Scott observed:

... named Steve. You will find a lot of discussion on calibration in medical DBTs.

In a typical medical test, two or more groups are given different medical treatments and the results measured against a calibrated measure of some body function or the presence of some substance.

There is a lot more calibration going on than that.

In listening-alone tests this is meaningless, except for calibration of both arms of the test gear lineup, and for the statistical methods used having validity for the test used. We want to know if the difference statement "CD player x sounds sweeter than y" has its source in the signal reaching the ears or after it, in the brain. In which case, if the reproduction gear has been calibrated so as to make all things but the bit of gear being tested as equal as possible, nothing more is required in terms of calibration there. Results are measured against the odds of spotting differences being at the guessing level alone. Calibration, then, is for proper statistical testing methods to have been employed.

Let me demonstrate how this is an issue by using an extreme. Let's say the test had the wires screwed up, and the ABX test was in reality wired as an AAX test, with B accidentally cut out of the loop. How sensitive would this test be to audible differences between A and B? How would you know if you did a test under these circumstances, got a null, and never checked up on the setup? It's about testing the test, or at least checking to make sure everything is working as it should be. How this is a bad idea is still beyond me.
#57
Posted to rec.audio.high-end
"Scott" wrote in message
Let me demonstrate how this is an issue by using an extreme. Let's say the test had the wires screwed up and the ABX test was in reality wired as an AAX test and B was accidentally cut out of the loop. How sensitive would this test be to audible differences between A and B? How would you know if you did a test under these circumstances and got a null and never checked up on the setup? You need to come up with an example that ABX testing doesn't make impossible. Level matching prevents miswiring an ABX test because you need to have proper identification and wiring of A and B to get through the level matching step. It's about testing the test. The test tests itself during the setup phase.
#58
Posted to rec.audio.high-end
On Jul 1, 6:05 pm, Scott wrote:
Let me demonstrate how this is an issue by using an extreme. Let's say the test had the wires screwed up and the ABX test was in reality wired as an AAX test and B was accidentally cut out of the loop. How sensitive would this test be to audible differences between A and B? How would you know if you did a test under these circumstances and got a null and never checked up on the setup? So it's come to this. The thread began with a couple of quotes challenging the validity of a DBT published in a peer-reviewed journal, on the basis that they had not "calibrated" their test. And here we are, 60-odd posts later, learning that "calibration" means: make sure the equipment is wired correctly. I think I said before that the demand for "calibration" was just handwaving by people with no better case to make. This post seems to vindicate that assessment. bob
#59
Posted to rec.audio.high-end
Arny Krueger wrote:
I've done extensive tests with other musical program material that was recorded at 24/96 under near-lab conditions in the interest of high actual recorded dynamic range and extended bandwidth. For example, some of my recordings had about 90 dB dynamic range, as compared to an exceptionally good commercial recording's dynamic range, which is in the area of 75 dB. I've then reduced this signal's dynamic range and bandwidth progressively until I could find even one listener (of many tried) who could hear a difference. I removed one bit of resolution at a time by rudely stripping off bits without dither. I removed bandwidth by means of downsampling using brick-wall filtering. I found that resolution reduction became reliably audible with 14 bits resolution, and that bandwidth reduction became reliably audible with 16 kHz bandwidth. Resolution reduction to 15 bits and 19 kHz bandwidth reduction were undetectable. What were your criteria for establishing that an effect was reliably audible or (reliably) undetectable?
#60
Posted to rec.audio.high-end
On Jul 1, 5:59 pm, Scott wrote:
Since you are not limiting it to audio this could turn into one of those lists like the long list of scientists who believe in evolution named Steve. You will find a lot of discussion on calibration in medical DBTs. In other disciplines calibration might be needed, but this is an audio forum, and the thread is about blind tests in audio, not drugs or food. Just like audio, psychoacoustic research uses hearing as a detection tool, and 60 years of non-calibrated blind tests should have some weight, or so I presume. I only searched JASA and only for ABX, and I did not look at all the research referenced in the papers I found, so that list of names could be much longer. So far there's only the claim that blind tests need to be calibrated; no evidence, no indication of hearing-related research where calibration was indeed used. I therefore consider this claim a lame excuse for not accepting the particular study, its results and conclusions. Klaus
#61
Posted to rec.audio.high-end
"John Corbett" wrote in message
Arny Krueger wrote: I've done extensive tests with other musical program material that was recorded at 24/96 under near-lab conditions in the interest of high actual recorded dynamic range and extended bandwidth. For example, some of my recordings had about 90 dB dynamic range, as compared to an exceptionally good commercial recording's dynamic range, which is in the area of 75 dB. I've then reduced this signal's dynamic range and bandwidth progressively until I could find even one listener (of many tried) who could hear a difference. I removed one bit of resolution at a time by rudely stripping off bits without dither. I removed bandwidth by means of downsampling using brick-wall filtering. I found that resolution reduction became reliably audible with 14 bits resolution, and that bandwidth reduction became reliably audible with 16 kHz bandwidth. Resolution reduction to 15 bits and 19 kHz bandwidth reduction were undetectable. What were your criteria for establishing that an effect was reliably audible or (reliably) undetectable? You're asking a question you know the answer to, John. What's your point? Since statistics is your bag, what criteria would you use?
#62
Posted to rec.audio.high-end
Arny Krueger wrote:
"Scott" wrote in message Let me demonstrate how this is an issue by using an extreme. let's say the test had the wires screwed up and the ABX test was in reality wired as an AAX test and B was accidentally cut out of the loop. How sensitive would this test be to audible differences between A and B? How would you know if you did a test under these circumstances and a got a null and never checked up on the setup? You need to come up with an example that ABX testing doesn't make impossible. Level matching prevents miswiring an ABX test because you need to have proper identification and wiring of A and B to get through the level matching step. It's about testing the test. The test tests itself during the setup phase. Of course the sort of error that Scott described is not impossible. See http://www.bostonaudiosociety.org/ba...x_testing2.htm for a real-world example. |
#63
Posted to rec.audio.high-end
Arny Krueger wrote:
"John Corbett" wrote in message Arny Krueger wrote: I've done extensive tests with other musical program material that was recorded at 24/96 under near-lab conditions in the interest of high actual recorded dynamic range and extended bandwidth. For example, some of my recordings had about 90 dB dynamic range, as compared to an exceptionally good commercial recording's dynamic range which is in the area of 75 dB. I've then reduced this signal's dynamic range and bandwidth progressively until I could find even one listener (of many tried) who could hear a difference. I removed one bit of resolution at a time by rudely stripping off bits without dither. I removed bandwidth by means of downsampling using brick wall filtering. I found that resolution reduction became reliably audible with 14 bits resolution, and that bandwidth reduction became reliably audible with 16 KHz bandwidth. Resolution reduction to 15 bits and 19 KHz bandwidth reduction was undetectible. What were your criteria for establishing that an effect was reliably audible or (reliably) undetectable? You're asking a question you know the answer to, John. What's your point? Since statistics is your bag, what criteria would you use? A well-designed and carefully executed ABX test can provide strong evidence about audibility of a given stimulus. However, an ABX test with a small number of trials (e.g., 16) and small alpha level (such as .01) is inherently incapable of reliably detecting small effects. If you claim to have shown that something was inaudible, but are unwilling to provide evidence to support your claim, then maybe I should follow the advice you gave earlier in this very thread, and consider your claims to be unsubstantiated and unsupported. In your own words: Nobody has any obligation to do even one little thing to support their claims. Then, every reasonable person recognizes the claims for what they are, unsubstantiated, unsupported claims, and simply moves on. 
And, On one side we have people who defend unsubstantiated claims, and on the other side we have people who are themselves capable of making claims and supporting them with reliable, well-thought out evidence. |
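[Editorial note: Corbett's power objection can be made concrete. The sketch below, offered as an illustration rather than anything posted in the thread, computes the probability that a 16-trial, alpha = .01 ABX test rejects the guessing hypothesis for a listener whose true per-trial success rate is given.]

```python
from math import comb

def abx_power(trials, alpha, p_hear):
    """Power of a one-sided binomial ABX test: the probability of
    rejecting the guessing null (p = 0.5) when the listener's true
    per-trial success rate is p_hear."""
    def null_tail(k):
        # P(X >= k) under pure guessing
        return sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    # Smallest score whose p-value under guessing is at most alpha.
    crit = next(k for k in range(trials + 1) if null_tail(k) <= alpha)
    # Probability of reaching that score at the listener's true rate.
    return sum(comb(trials, i) * p_hear ** i * (1 - p_hear) ** (trials - i)
               for i in range(crit, trials + 1))

# A listener who is right 70% of the time passes a 16-trial, alpha = .01
# test only about 10% of the time:
power = abx_power(16, 0.01, 0.7)
```

With 16 trials and alpha = .01 the critical score is 14 correct, so even a listener hearing the difference on most trials usually produces a "null" result; this is the sense in which such a test cannot reliably detect small effects.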
#64
Posted to rec.audio.high-end
"John Corbett" wrote in message
Arny Krueger wrote: "John Corbett" wrote in message Arny Krueger wrote: I've done extensive tests with other musical program material that was recorded at 24/96 under near-lab conditions in the interest of high actual recorded dynamic range and extended bandwidth. For example, some of my recordings had about 90 dB dynamic range, as compared to an exceptionally good commercial recording's dynamic range, which is in the area of 75 dB. I've then reduced this signal's dynamic range and bandwidth progressively until I could find even one listener (of many tried) who could hear a difference. I removed one bit of resolution at a time by rudely stripping off bits without dither. I removed bandwidth by means of downsampling using brick-wall filtering. I found that resolution reduction became reliably audible with 14 bits resolution, and that bandwidth reduction became reliably audible with 16 kHz bandwidth. Resolution reduction to 15 bits and 19 kHz bandwidth reduction were undetectable. What were your criteria for establishing that an effect was reliably audible or (reliably) undetectable? You're asking a question you know the answer to, John. What's your point? Since statistics is your bag, what criteria would you use? A well-designed and carefully executed ABX test can provide strong evidence about audibility of a given stimulus. However, an ABX test with a small number of trials (e.g., 16) and a small alpha level (such as .01) is inherently incapable of reliably detecting small effects. The assertion without relevant substantiation is noted. This is a very old discussion point, with divergent opinions on both sides. I'm hardly married to 16 trials and 0.01 probability. It does turn out that in all observed circumstances where many more trials were attempted, the results converged to a random mean.
If you claim to have shown that something was inaudible, but are unwilling to provide evidence to support your claim, then maybe I should follow the advice you gave earlier in this very thread, and consider your claims to be unsubstantiated and unsupported. In your own words: Nobody has any obligation to do even one little thing to support their claims. Then, every reasonable person recognizes the claims for what they are, unsubstantiated, unsupported claims, and simply moves on. Interesting that we have yet another example of the same, above. And: On one side we have people who defend unsubstantiated claims, and on the other side we have people who are themselves capable of making claims and supporting them with reliable, well-thought-out evidence. Certainly, any well-thought-out evidence would be appreciated, but seeing nothing new...
#65
Posted to rec.audio.high-end
"John Corbett" wrote in message
Arny Krueger wrote: "Scott" wrote in message Let me demonstrate how this is an issue by using an extreme. Let's say the test had the wires screwed up and the ABX test was in reality wired as an AAX test and B was accidentally cut out of the loop. How sensitive would this test be to audible differences between A and B? How would you know if you did a test under these circumstances and got a null and never checked up on the setup? You need to come up with an example that ABX testing doesn't make impossible. Level matching prevents miswiring an ABX test because you need to have proper identification and wiring of A and B to get through the level matching step. It's about testing the test. The test tests itself during the setup phase. Of course the sort of error that Scott described is not impossible. See http://www.bostonaudiosociety.org/ba...x_testing2.htm for a real-world example. Waiting for a well-thought-out analysis, as opposed to yet another unsubstantiated assertion. Also, the reference cited above is dated 1984, which was merely 26 years ago. For something a little more current from the same publication, please see: http://www.bostonaudiosociety.org/explanation.htm