#41
Posted to rec.audio.high-end
Do blind tests need being "calibrated" ?
On Jun 26, 10:50 am, ScottW wrote:
On Jun 24, 6:35 am, Scott wrote: On Jun 23, 7:44 pm, wrote: On Jun 22, 3:03 pm, Scott wrote: On Jun 18, 7:07 am, wrote:

My question is, is there a real need for calibration, or is this just a demand of audiophiles because the test came up with a negative result?

Of course it does, otherwise you have no way of gauging the sensitivity of the test. A null with no calibration has too many variables. Was the test sensitive to audible differences? Were the subjects sensitive to audible differences? No way to know, is there?

Assuming that there was a need for calibration: the M&M test was about the audibility of "bottlenecking" a hi-rez signal. What calibration signal would one use other than a bottlenecked signal (one would have to, otherwise some audiophiles would claim that the calibration signal was not adapted for its intended purpose)? So for calibration one would use the signal that is going to be tested.

The same signals one would use to test any setup for sensitivity to audible differences.

Interesting proposal. Could you define the categories of audible differences?

I could try, but really I think this would be a good question for JJ, who has done extensive tests for various thresholds of human hearing.

Let's just take a relatively simple one like frequency response and examine it with just a bit of speculative detail. I suppose single-tone amplitude would be obvious, with humans having different sensitivity to amplitude difference at different frequencies. That sensitivity to amplitude difference also changes with amplitude. You can't hear a difference in amplitude between two signals both of which you can't hear, nor do I suppose you can hear the difference between two signals of different amplitude when both of them are sufficient to make your ears bleed. Add a second single-frequency, fixed-amplitude masking tone. Measure sensitivity to variable-amplitude tones. That will vary with the frequency of the masking tone, the amplitude of the masking tone, the frequency of the tone we're measuring sensitivity to amplitude changes of, the amplitude of that tone, and probably a few other interactions I've failed to mention. Hmm... try to matrix that. How many signal conditions have we constructed? Let's see... 1 Hz frequency resolution, 0.5 dB amplitude resolution from threshold to pain... times all the interactions... Well, I'm going out on a limb to guess this signal matrix will be quite large. Now, how do these sensitivity tests with test signals correlate to real music signals used in ABX tests?

All very fair questions. Again I would defer to JJ, who routinely did such tests throughout his career, in many cases in examination of human thresholds of hearing.

Wait a second, we're now back to the beginning of trying to determine humans' ability to detect audible differences with different music signals on a system. Seems like we'd need to know that to begin to calibrate with "test signals".

Such is the value of a large body of data.
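The combinatorics being gestured at above can be made concrete with a toy calculation. A minimal sketch, assuming grid spans I've invented for illustration (the 1 Hz and 0.5 dB step sizes come from the post; the 20 Hz to 20 kHz band and the 120 dB threshold-to-pain span are my assumptions):

```python
# Back-of-envelope count of the "signal matrix" discussed above.
# Step sizes (1 Hz, 0.5 dB) are from the post; the 20 Hz-20 kHz band
# and 120 dB threshold-to-pain span are assumed for illustration.

probe_freqs = 20_000 - 20 + 1        # 1 Hz steps across the audio band
probe_levels = int(120 / 0.5) + 1    # 0.5 dB steps, threshold to pain
single_tone = probe_freqs * probe_levels

# Adding one masking tone multiplies in its own frequency/level grid.
with_masker = single_tone * probe_freqs * probe_levels

print(f"single-tone conditions: {single_tone:,}")   # ~4.8 million
print(f"plus one masking tone:  {with_masker:,}")   # ~2.3e13
```

Even this crude grid, before any of the interactions the post mentions, supports the "quite large" guess.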
#42
On Jun 26, 11:24 am, "Arny Krueger" wrote:
"Scott" wrote in message: On Jun 24, 12:35 pm, Audio Empire wrote:

But does that mean that they "can't" hear or that there are simply no differences TO hear?

Without testing for sensitivity the answer to your question is yes. The outcome of any test you run with a negative result can be interpreted as follows: it means that they either "can't" hear, or there are simply no differences to hear, or they simply can't discriminate those under that particular test. That includes any so-called sensitivity tests. Sensitivity tests without positive results would indicate either incomplete sensitivity tests or a complete lack of sensitivity to audible differences.

Let's review the current situation.

There is no "current situation" to review in regards to my assertions about ABX DBTs needing to be calibrated for sensitivity. It is an assertion about ABX DBTs in general. It is easy enough to screw up such a test.

This dodges addressing the current situation, where thousands of listening tests have been run to show an audible difference due to excess sample rates, with no known positive outcomes when appropriate experimental controls were in place.

No, it doesn't dodge anything. It is a basic truism about ABX DBTs and does not make any reference to any specific tests. Just continue to test on ABX with differences that are near the threshold of audibility, way beyond the threshold of listener fatigue, and you will likely get a false negative.

Since you specifically mention ABX, are you saying that there is no such thing as listener fatigue in sighted evaluations?

No.

Or are you saying that some other methodology, such as ABC/hr, is not at least equally fatiguing?

No.

Are you asserting that all of the thousands of failed tests were all due to listener fatigue?

No.

Or are you just dragging out an old, tired red herring?

What red herring, Arny? My assertion is about ABX DBTs in general. The only red herrings I see are the ones you just tried to drag out.
#43
On 6/26/2010 8:45 PM, Scott wrote:
On Jun 26, 11:24 am, "Arny Krueger" wrote: "Scott" wrote in message [snip]

It means that they either "can't" hear, or there are simply no differences to hear, or they simply can't discriminate those under that particular test. That includes any so-called sensitivity tests. Sensitivity tests without positive results would indicate either incomplete sensitivity tests or a complete lack of sensitivity to audible differences.

Sensitivity to what, precisely? You cannot verify sensitivity to unknown parameters. You can't generate test signals that effectively mimic uncharacterized differences between presentations. Look at it another way: if you could characterize the differences, and generate a representative test signal with which to "calibrate" the listeners, then the actual ABX test would be moot relative to the specific difference in question; the answer would be known based on the sensitivity test. And *that* would tell you precisely nothing, as none of the masking effects present in an actual musical ABX test would be present. If they are present, then the "sensitivity" test and the ABX test are identical, and you're back to square one.

And let's not give short shrift to the "not everything can be measured" crowd; that position precludes even the possibility of a sensitivity test, as you cannot ever generate a test signal, nor can you quantify any results related to the test.

Let's review the current situation.

There is no "current situation" to review in regards to my assertions about ABX DBTs needing to be calibrated for sensitivity. It is an assertion about ABX DBTs in general. It is easy enough to screw up such a test.

This dodges addressing the current situation, where thousands of listening tests have been run to show an audible difference due to excess sample rates, with no known positive outcomes when appropriate experimental controls were in place.

No, it doesn't dodge anything. It is a basic truism about ABX DBTs and does not make any reference to any specific tests.

No, it's not true, let alone a truism, about ABX DBTs. ABX has been shown many times to be capable of detecting audible differences when they exist. It's been used many times to determine that excessive sample rates don't result in audible differences, using the same methodology and bias controls. That test base testifies to the precision of the method, which is the only type of "calibration" that is relevant to this type of *difference discernment* test.

And once again, one would need to show how your supposed "calibration deficit" would in fact have relevance, were it to exist, in the context of actually applying ABX methods. For example, in situations like Arny referred to above, where the only countervailing "evidence" is gathered via listening tests that have the same "problems" (be they calibration, fatigue, whatever) as is claimed for ABX, *plus* additional uncontrolled error sources as well, an ABX test confirming the results expected based upon engineering and psychoacoustic knowledge has no requirement for accuracy (which would require some calibration activity), as it is not *quantifying* anything. It requires only establishment of precision, and that's provided by a database of many test/subject replicates.

Just continue to test on ABX with differences that are near the threshold of audibility, way beyond the threshold of listener fatigue, and you will likely get a false negative.

And so? Put earmuffs on the subjects and you'll get the same false negative. There are endless ways to screw up a test; how is that relevant to the discussion here? Failure to "calibrate" a method that does not seek to quantify anything is *not* one of those failure modes.

Keith Hughes
#44
On Jun 27, 10:41 am, KH wrote:
On 6/26/2010 8:45 PM, Scott wrote: On Jun 26, 11:24 am, "Arny Krueger" wrote: wrote in message [snip]

It means that they either "can't" hear, or there are simply no differences to hear, or they simply can't discriminate those under that particular test. That includes any so-called sensitivity tests. Sensitivity tests without positive results would indicate either incomplete sensitivity tests or a complete lack of sensitivity to audible differences.

Sensitivity to what, precisely?

Actual audible differences.

You cannot verify sensitivity to unknown parameters.

What unknown parameters are you talking about? We have a pretty good idea what the parameters of the thresholds of human hearing are.

You can't generate test signals that effectively mimic uncharacterized differences between presentations.

How are known audible differences uncharacterized?

Look at it another way: if you could characterize the differences, and generate a representative test signal with which to "calibrate" the listeners, then the actual ABX test would be moot relative to the specific difference in question; the answer would be known based on the sensitivity test.

That would be true if the claim under test were already known to be audibly different. Think about it.

And *that* would tell you precisely nothing, as none of the masking effects present in an actual musical ABX test would be present. If they are present, then the "sensitivity" test and the ABX test are identical, and you're back to square one.

No. The question is: can a given ABX test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe.

And let's not give short shrift to the "not everything can be measured" crowd; that position precludes even the possibility of a sensitivity test, as you cannot ever generate a test signal, nor can you quantify any results related to the test.

Why on earth would we want to address that crowd? How would that help make ABX DBTs better?

Let's review the current situation.

There is no "current situation" to review in regards to my assertions about ABX DBTs needing to be calibrated for sensitivity. It is an assertion about ABX DBTs in general. It is easy enough to screw up such a test.

This dodges addressing the current situation, where thousands of listening tests have been run to show an audible difference due to excess sample rates, with no known positive outcomes when appropriate experimental controls were in place.

No, it doesn't dodge anything. It is a basic truism about ABX DBTs and does not make any reference to any specific tests.

No, it's not true, let alone a truism, about ABX DBTs. ABX has been shown many times to be capable of detecting audible differences when they exist.

Really? How have ABX tests that never yielded any positives been shown to be capable of detecting audible differences without some sort of check for test sensitivity?

It's been used many times to determine that excessive sample rates don't result in audible differences, using the same methodology and bias controls. That test base testifies to the precision of the method, which is the only type of "calibration" that is relevant to this type of *difference discernment* test.

In the Middle Ages, putting a suspected heretic's hand in boiling water was also used many times, and the results were every bit as consistent. Doesn't mean it actually worked.

And once again, one would need to show how your supposed "calibration deficit" would in fact have relevance, were it to exist, in the context of actually applying ABX methods.

Can't say I am buying that. The burden of rigor is on those who do such tests. No one needs to show these folks the need for due rigor.

For example, in situations like Arny referred to above, where the only countervailing "evidence" is gathered via listening tests that have the same "problems" (be they calibration, fatigue, whatever) as is claimed for ABX, *plus* additional uncontrolled error sources as well, an ABX test confirming the results expected based upon engineering and psychoacoustic knowledge has no requirement for accuracy (which would require some calibration activity), as it is not *quantifying* anything. It requires only establishment of precision, and that's provided by a database of many test/subject replicates.

Not buying that. Consistent results are not proof per se that results are accurate.

Just continue to test on ABX with differences that are near the threshold of audibility, way beyond the threshold of listener fatigue, and you will likely get a false negative.

And so? Put earmuffs on the subjects and you'll get the same false negative. There are endless ways to screw up a test; how is that relevant to the discussion here? Failure to "calibrate" a method that does not seek to quantify anything is *not* one of those failure modes.

Unfortunately you snipped the context of my assertion, which was simply that there are many ways to get a false negative, and that the causes of false negatives and false positives tend to be different. It was not claimed that listener fatigue had any direct correlation to testing for test sensitivity. It's just one particular way to desensitize a test. If a test lacks due sensitivity it may yield false negative results. There are indeed "endless ways to screw up a test", and checking for sensitivity can actually prevent any number of those endless ways from creeping in and corrupting a given test. Perhaps you don't think such a precaution is a good idea. I think it is.
#45
My question was: What calibration signal would one use other than a
bottlenecked signal? On Jun 24, 2:35 pm, Scott answered:

The same signals one would use to test any setup for sensitivity to audible differences. True, nor do you "know" that any given test will reveal audible differences should there be audible differences. Given the body of knowledge on the thresholds of human hearing, that aspect of any given ABX DBT can be gauged before conducting any further ABX DBTs. How on earth would it ever be anything but a good idea to do so?

Are you saying that one could use any known audible difference to see whether the test is capable of judging the audibility of bottlenecking? So I could use, say, audible phase shift, IM distortion, or group delay? If ABX yields positive results you have shown that ABX is sensitive to phase shift, IM distortion, or group delay, but nothing more. It won't tell you that it will be sensitive to bottlenecking. Further, thresholds of human hearing are one thing; differences between CD players, amplifiers or, in this case, different formats, another. Either you hear a difference or you don't; there is no such thing as a threshold of perception of that difference. Therefore, you cannot use any arbitrary signal to calibrate the bottleneck test. You have to use a bottlenecked signal, so the calibration step would constitute the very test you want to calibrate.

In the end the AES is just a group of people with its own baggage. Show me one published scientific researcher who would suggest checking a DBT for sensitivity is anything other than a good idea.

You want names? Here are some: Ted Grusec (Canadian Department of Communications), Soren Bech (Bang & Olufsen), W. H. Schmidt (University for Technology and Economy, Berlin), Kaoru Watanabe (NHK), Stan Lipshitz (Waterloo University), Kaoru Ashihara (National Institute of Advanced Industrial Science and Technology), W. A. Munson (Bell Labs), W. A. Rosenblith (MIT), J. Hillenbrand (Dept. of Speech Pathology and Audiology, Western Michigan University), D. Kewley-Port (Speech Research Lab, Indiana University), Ian B. Thomas (Speech Communications Lab, U. of Massachusetts).

Not only AES; also JASA peer reviewers seem to have no objections against non-calibrated blind tests. Now it's your turn: name published scientists who do think that blind tests need to be calibrated. After all, you made the claim that they do!

Klaus
#46
"Scott" wrote in message
...

No. The question is: can a given ABX test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe.

Yet another example of anti-ABX bias: raising a general question about listening tests in such a way that it appears that only ABX tests are affected. Here's a corrected version: "The question is: can a given listening test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe."

Given their built-in bias towards false positive results, most audiophile listening evaluations cannot properly be called tests. They are useless for determining actual audible differences unless the differences are relatively gross.
#47
Klaus mentioned:
Further, thresholds of human hearing are one thing; differences between CD players, amplifiers or, in this case, different formats, another. Either you hear a difference or you don't; there is no such thing as a threshold of perception of that difference. Therefore, you cannot use any arbitrary signal to calibrate the bottleneck test. You have to use a bottlenecked signal, so the calibration step would constitute the very test you want to calibrate.

This is all rhetorical tap dancing. In a listening-alone context, thresholds of perception of difference are all that counts. All difference in a hi-fi bit of gear is electrical as to signal differences. All signal differences have, by experiment, a threshold at which the difference can be perceived. Distortion will serve. Any two bits of gear can be made to sound different if one causes one of them to produce enough more distortion than the other. As one lowers that difference, a threshold is reached beyond which no difference can be perceived, though by measurement it still exists.

If one claims that CD player x but not y sounds "sweeter", then one is making reference to electrical properties one thinks result in that difference. Controlled listening-alone testing will soon resolve this perception claim. If no difference beyond guessing can be identified, then the difference lies in one's brain's subjective perception-producing process, not in the electrical threshold potential of the gear. The perception "sweeter" toggling on and off as one knows or not which bit of gear is active confirms all.

As for the source of signal, it is irrelevant. If electrical differences are thought to produce a "sweeter" perception, then choose a signal source with which one claims to have experienced it while listening. The initial claim is the "calibration" by listening alone, as will be the follow-up listening-alone test of perceived difference.
#48
On Jun 29, 3:44 pm, wrote:
Klaus mentioned:

Further, thresholds of human hearing are one thing; differences between CD players, amplifiers or, in this case, different formats, another. Either you hear a difference or you don't; there is no such thing as a threshold of perception of that difference. Therefore, you cannot use any arbitrary signal to calibrate the bottleneck test. You have to use a bottlenecked signal, so the calibration step would constitute the very test you want to calibrate.

This is all rhetorical tap dancing. In a listening-alone context, thresholds of perception of difference are all that counts.

If you take stuff like distortion, you start with zero percent and increase the amount until you perceive it: threshold found. In the Meyer/Moran study I referred to, they converted the SACD signal to 16-bit/44.1 kHz and looked at whether or not one could hear the difference. Please care to explain how in this particular case a threshold of perception can exist? The amount of which parameter do you increase until you find this threshold? The same is valid when you compare two pieces of gear: the amount of which parameter do you increase to find a threshold?

Klaus
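The "increase the amount until you perceive it" procedure Klaus describes is essentially the ascending method of limits from psychophysics. A minimal sketch, with a simulated listener standing in for real blind trials (the 0.05% step size and the 1% detection point are invented for illustration):

```python
# Ascending method of limits: raise a distortion parameter in fixed
# steps until the listener reports detection. `listener_detects` is a
# hypothetical stand-in for an actual blind trial at that level.

def find_threshold(listener_detects, start=0.0, step=0.05, limit=10.0):
    """Return the first distortion level (%) detected, or None."""
    n = 0
    while start + n * step <= limit:
        level = start + n * step
        if listener_detects(level):
            return level
        n += 1
    return None  # nothing detected within the tested range

# Simulated listener who hears distortion only above 1%:
threshold = find_threshold(lambda pct: pct > 1.0)
print(threshold)  # first tested level above 1%, i.e. 1.05
```

Note this only works when there is a parameter to sweep, which is exactly Klaus's objection in the bottlenecking case.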
#49
Observed:
This is all rhetorical tap dancing. In a listening-alone context, thresholds of perception of difference are all that counts.

Klaus responded:

If you take stuff like distortion, you start with zero percent and increase the amount until you perceive it: threshold found. In the Meyer/Moran study I referred to, they converted the SACD signal to 16-bit/44.1 kHz and looked at whether or not one could hear the difference. Please care to explain how in this particular case a threshold of perception can exist? The amount of which parameter do you increase until you find this threshold? The same is valid when you compare two pieces of gear: the amount of which parameter do you increase to find a threshold?

You are focused on the wrong question. In psychoacoustics we might want to know what source of difference produces what threshold. Or in gear design one might be interested to know what signal source produces what threshold, so as to make the gear behave as desired. In listening-alone testing, frankly, we don't care, and dwelling on your question is only a diversion. If CD player x is said to be "sweeter" than y, we are to test the claimed difference, not to discover what electrical source might have produced it. In fact, if listening-alone testing establishes that no difference can be reliably spotted, any effort to answer your question is moot. Such a test result only adds to the body of research showing such perceived differences do not exist in electrical difference but are generated in the brain after reception of the signal at the ears, the perceived difference responding to non-acoustical/electrical stimuli when the gear being tested is known.
#50
wrote in message
... On Jun 29, 3:44 pm, wrote: Klaus mentioned:

Further, thresholds of human hearing are one thing; differences between CD players, amplifiers or, in this case, different formats, another. Either you hear a difference or you don't; there is no such thing as a threshold of perception of that difference. Therefore, you cannot use any arbitrary signal to calibrate the bottleneck test. You have to use a bottlenecked signal, so the calibration step would constitute the very test you want to calibrate.

This is all rhetorical tap dancing. In a listening-alone context, thresholds of perception of difference are all that counts.

If you take stuff like distortion, you start with zero percent and increase the amount until you perceive it: threshold found. In the Meyer/Moran study I referred to, they converted the SACD signal to 16-bit/44.1 kHz and looked at whether or not one could hear the difference. Please care to explain how in this particular case a threshold of perception can exist? The amount of which parameter do you increase until you find this threshold?

I've done extensive tests with other musical program material that was recorded at 24/96 under near-lab conditions in the interest of high actual recorded dynamic range and extended bandwidth. For example, some of my recordings had about 90 dB dynamic range, as compared to an exceptionally good commercial recording's dynamic range, which is in the area of 75 dB. I then reduced this signal's dynamic range and bandwidth progressively until I could find even one listener (of many tried) who could hear a difference. I removed one bit of resolution at a time by rudely stripping off bits without dither. I removed bandwidth by means of downsampling using brick-wall filtering. I found that resolution reduction became reliably audible at 14 bits of resolution, and that bandwidth reduction became reliably audible at 16 kHz bandwidth. Resolution reduction to 15 bits and bandwidth reduction to 19 kHz were undetectable.

The same is valid when you compare two pieces of gear: the amount of which parameter do you increase to find a threshold?

One can increase the noise, response errors and distortion of audio gear progressively, and in ways that simply extend the inaccuracies of the particular piece of gear, by passing audio signals through that gear again and again. I found, for example, that passing demanding musical signals (see above) through a power amplifier can become audible (due to small HF frequency response losses) after about 5 passes. Other types of equipment, such as the converters that I used in all of these experiments, could pass a signal in excess of 20 times without any reliably detectable audible change.
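The bit-stripping step described above can be sketched as follows. This is not the poster's actual procedure, just one common way to drop resolution from integer PCM samples without dither: zero out the low-order bits so a 16-bit word keeps only `bits` of resolution.

```python
# Truncate 16-bit PCM samples to `bits` of resolution by clearing the
# low-order bits; no dither is added, matching the description above.
# Python's >> on negative ints is arithmetic, so negatives round
# toward minus infinity.

def truncate_bits(samples, bits, word=16):
    drop = word - bits
    return [(s >> drop) << drop for s in samples]

pcm = [12345, -12345, 3, 0]
print(truncate_bits(pcm, 14))  # low 2 bits cleared from each sample
```

Truncating without dither correlates the quantization error with the signal, which is part of why such reductions can become audible sooner than dithered ones.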
#51
On Jun 29, 4:46 am, wrote:
You want names? Here are some: Ted Grusec (Canadian Department of Communications), Soren Bech (Bang & Olufsen), W. H. Schmidt (University for Technology and Economy, Berlin), Kaoru Watanabe (NHK), Stan Lipshitz (Waterloo University), Kaoru Ashihara (National Institute of Advanced Industrial Science and Technology), W. A. Munson (Bell Labs), W. A. Rosenblith (MIT), J. Hillenbrand (Dept. of Speech Pathology and Audiology, Western Michigan University), D. Kewley-Port (Speech Research Lab, Indiana University), Ian B. Thomas (Speech Communications Lab, U. of Massachusetts).

That certainly *is* a list of names.....

Not only AES; also JASA peer reviewers seem to have no objections against non-calibrated blind tests.

"Seem?"

Now it's your turn: name published scientists who do think that blind tests need to be calibrated. After all, you made the claim that they do!

Since you are not limiting it to audio, this could turn into one of those lists, like the long list of scientists who believe in evolution named Steve. You will find a lot of discussion on calibration in medical DBTs.
#52
On Jun 29, 7:44 am, "Arny Krueger" wrote:
"Scott" wrote in message ...

No. The question is: can a given ABX test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe.

Yet another example of anti-ABX bias: raising a general question about listening tests in such a way that it appears that only ABX tests are affected.

How is that an example of anti-ABX? I could just as easily say it's an example of pro-ABX. Where is the bias?

Here's a corrected version: "The question is: can a given listening test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe."

That is just a different version, not a corrected version. The question works just as well for listening tests as a general category or ABX as a specific subset.
#53
"Scott" wrote in message
... On Jun 29, 7:44 am, "Arny Krueger" wrote: "Scott" wrote in message ...

No. The question is: can a given ABX test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe.

Yet another example of anti-ABX bias: raising a general question about listening tests in such a way that it appears that only ABX tests are affected.

How is that an example of anti-ABX?

It makes it look like only ABX tests have any problems.

I could just as easily say it's an example of pro-ABX.

That's a problem: you say things like this so easily and glibly.

Where is the bias?

The fact that it makes a general problem look like it only applies to ABX.

Here's a corrected version: "The question is: can a given listening test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe."

That is just a different version, not a corrected version.

No, it corrects the writer's obvious misapprehension that only ABX tests have the problem that is being pointed out.

The question works just as well for listening tests as a general category or ABX as a specific subset.

That's exactly right, so explain why you made a point of picking on ABX tests, and didn't phrase it so it applied to listening tests in general?
#54
Scott observed:
... named Steve. You will find a lot of discussion on calibration in medical DBTs.

In a typical medical test, two or more groups are given different medical treatments and the results measured against a calibrated measure of some body function or the presence of some substance. In listening-alone tests this is meaningless, except for calibration of both arms of the test gear lineup, and for the statistical methods used having validity for the test used. We want to know if the difference statement "CD player x sounds sweeter than y" has its source in the signal reaching the ears or after it, in the brain. In which case, if the reproduction gear has been calibrated so as to make all things but the bit of gear being tested as equal as possible, nothing more is required in terms of calibration there. Results are measured against the odds of spotting differences being at the guessing level alone. Calibration, then, is for proper statistical testing methods to have been employed.
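The comparison against "the guessing level alone" that comes up throughout the thread is, in practice, a one-sided binomial test: how likely is a listener's score if they were purely guessing? A minimal sketch (the 12-of-16 example is hypothetical, not a result from any test discussed here):

```python
# P(X >= k) for X ~ Binomial(n, p): the chance of scoring k or more
# correct out of n ABX trials by guessing alone (p = 0.5). A small
# value argues against the "just guessing" explanation.

from math import comb

def p_value(k, n, p=0.5):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Example: 12 correct out of 16 trials.
print(round(p_value(12, 16), 4))  # 0.0384, under the usual 0.05 cutoff
```

Note this quantifies only the false-positive risk of a given score; it says nothing about the test's sensitivity, which is the very point under dispute in the thread.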
#55
On Jul 1, 9:59 am, "Arny Krueger" wrote:

"Scott" wrote in message ... On Jun 29, 7:44 am, "Arny Krueger" wrote: "Scott" wrote in message ...

No. The question is: can a given ABX test setup reveal actual audible differences? The answer is yes once you show it can do so. Until then the answer is maybe.

Yet another example of anti-ABX bias: raising a general question about listening tests in such a way that it appears that only ABX tests are affected.

How is that an example of anti-ABX?

It makes it look like only ABX tests have any problems.

How?

I could just as easily say it's an example of pro-ABX.

That's a problem: you say things like this so easily and glibly.

How is that a problem? These perceptions of anti-ABX are really in your head, not in my posts.

Where is the bias?

The fact that it makes a general problem look like it only applies to ABX.
#56
On Jul 1, 11:34 am, wrote: Scott observed:

... named Steve. You will find a lot of discussion on calibration in medical DBTs.

In a typical medical test, two or more groups are given different medical treatments and the results measured against a calibrated measure of some body function or the presence of some substance.

There is a lot more calibration going on than that.

In listening-alone tests this is meaningless, except for calibration of both arms of the test gear lineup, and for the statistical methods used having validity for the test used. We want to know if the difference statement "CD player x sounds sweeter than y" has its source in the signal reaching the ears or after it, in the brain. In which case, if the reproduction gear has been calibrated so as to make all things but the bit of gear being tested as equal as possible, nothing more is required in terms of calibration there. Results are measured against the odds of spotting differences being at the guessing level alone. Calibration, then, is for proper statistical testing methods to have been employed.

Let me demonstrate how this is an issue by using an extreme. Let's say the test had the wires screwed up, and the ABX test was in reality wired as an AAX test, with B accidentally cut out of the loop. How sensitive would this test be to audible differences between A and B? How would you know if you did a test under these circumstances, got a null, and never checked up on the setup? It's about testing the test, or at least checking to make sure everything is working as it should be. How this is a bad idea is still beyond me.
#57
Posted to rec.audio.high-end
"Scott" wrote in message
Let me demonstrate how this is an issue by using an extreme. Let's say the test had the wires screwed up and the ABX test was in reality wired as an AAX test and B was accidentally cut out of the loop. How sensitive would this test be to audible differences between A and B? How would you know if you did a test under these circumstances and got a null and never checked up on the setup? You need to come up with an example that ABX testing doesn't make impossible. Level matching prevents miswiring an ABX test because you need to have proper identification and wiring of A and B to get through the level matching step. It's about testing the test. The test tests itself during the setup phase.
#58
Posted to rec.audio.high-end
On Jul 1, 6:05 pm, Scott wrote:
Let me demonstrate how this is an issue by using an extreme. Let's say the test had the wires screwed up and the ABX test was in reality wired as an AAX test and B was accidentally cut out of the loop. How sensitive would this test be to audible differences between A and B? How would you know if you did a test under these circumstances and got a null and never checked up on the setup? So it's come to this. The thread began with a couple of quotes challenging the validity of a DBT published in a peer-reviewed journal, on the basis that they had not "calibrated" their test. And here we are, 60-odd posts later, learning that "calibration" means: make sure the equipment is wired correctly. I think I said before that the demand for "calibration" was just handwaving by people with no better case to make. This post seems to vindicate that assessment. bob
#59
Posted to rec.audio.high-end
Arny Krueger wrote:
I've done extensive tests with other musical program material that was recorded at 24/96 under near-lab conditions in the interest of high actual recorded dynamic range and extended bandwidth. For example, some of my recordings had about 90 dB dynamic range, as compared to an exceptionally good commercial recording's dynamic range, which is in the area of 75 dB. I've then reduced this signal's dynamic range and bandwidth progressively until I could find even one listener (of many tried) who could hear a difference. I removed one bit of resolution at a time by rudely stripping off bits without dither. I removed bandwidth by means of downsampling using brick-wall filtering. I found that resolution reduction became reliably audible with 14 bits resolution, and that bandwidth reduction became reliably audible with 16 kHz bandwidth. Resolution reduction to 15 bits and 19 kHz bandwidth reduction were undetectable. What were your criteria for establishing that an effect was reliably audible or (reliably) undetectable?
#60
Posted to rec.audio.high-end
On Jul 1, 5:59 pm, Scott wrote:
Since you are not limiting it to audio this could turn into one of those lists like the long list of scientists who believe in evolution named Steve. You will find a lot of discussion on calibration in medical DBTs. In other disciplines calibration might be needed, but this is an audio forum, and the thread is about blind tests in audio, not drugs or food. Just like audio, psychoacoustic research uses hearing as a detection tool, and 60 years of non-calibrated blind tests should have some weight, or so I presume. I only searched JASA and only for ABX, and I did not look at all the research referenced in the papers I found, so that list of names could be much longer. So far there's only the claim that blind tests need to be calibrated; no evidence, no indication of hearing-related research where calibration was indeed used. I therefore consider this claim a lame excuse for not accepting the particular study, its results and conclusions. Klaus
#61
Posted to rec.audio.high-end
"John Corbett" wrote in message
Arny Krueger wrote: I've done extensive tests with other musical program material that was recorded at 24/96 under near-lab conditions in the interest of high actual recorded dynamic range and extended bandwidth. For example, some of my recordings had about 90 dB dynamic range, as compared to an exceptionally good commercial recording's dynamic range, which is in the area of 75 dB. I've then reduced this signal's dynamic range and bandwidth progressively until I could find even one listener (of many tried) who could hear a difference. I removed one bit of resolution at a time by rudely stripping off bits without dither. I removed bandwidth by means of downsampling using brick-wall filtering. I found that resolution reduction became reliably audible with 14 bits resolution, and that bandwidth reduction became reliably audible with 16 kHz bandwidth. Resolution reduction to 15 bits and 19 kHz bandwidth reduction were undetectable. What were your criteria for establishing that an effect was reliably audible or (reliably) undetectable? You're asking a question you know the answer to, John. What's your point? Since statistics is your bag, what criteria would you use?
#62
Posted to rec.audio.high-end
Arny Krueger wrote:
"Scott" wrote in message Let me demonstrate how this is an issue by using an extreme. let's say the test had the wires screwed up and the ABX test was in reality wired as an AAX test and B was accidentally cut out of the loop. How sensitive would this test be to audible differences between A and B? How would you know if you did a test under these circumstances and a got a null and never checked up on the setup? You need to come up with an example that ABX testing doesn't make impossible. Level matching prevents miswiring an ABX test because you need to have proper identification and wiring of A and B to get through the level matching step. It's about testing the test. The test tests itself during the setup phase. Of course the sort of error that Scott described is not impossible. See http://www.bostonaudiosociety.org/ba...x_testing2.htm for a real-world example. |
#63
Posted to rec.audio.high-end
Arny Krueger wrote:
"John Corbett" wrote in message Arny Krueger wrote: I've done extensive tests with other musical program material that was recorded at 24/96 under near-lab conditions in the interest of high actual recorded dynamic range and extended bandwidth. For example, some of my recordings had about 90 dB dynamic range, as compared to an exceptionally good commercial recording's dynamic range which is in the area of 75 dB. I've then reduced this signal's dynamic range and bandwidth progressively until I could find even one listener (of many tried) who could hear a difference. I removed one bit of resolution at a time by rudely stripping off bits without dither. I removed bandwidth by means of downsampling using brick wall filtering. I found that resolution reduction became reliably audible with 14 bits resolution, and that bandwidth reduction became reliably audible with 16 KHz bandwidth. Resolution reduction to 15 bits and 19 KHz bandwidth reduction was undetectible. What were your criteria for establishing that an effect was reliably audible or (reliably) undetectable? You're asking a question you know the answer to, John. What's your point? Since statistics is your bag, what criteria would you use? A well-designed and carefully executed ABX test can provide strong evidence about audibility of a given stimulus. However, an ABX test with a small number of trials (e.g., 16) and small alpha level (such as .01) is inherently incapable of reliably detecting small effects. If you claim to have shown that something was inaudible, but are unwilling to provide evidence to support your claim, then maybe I should follow the advice you gave earlier in this very thread, and consider your claims to be unsubstantiated and unsupported. In your own words: Nobody has any obligation to do even one little thing to support their claims. Then, every reasonable person recognizes the claims for what they are, unsubstantiated, unsupported claims, and simply moves on. 
And, On one side we have people who defend unsubstantiated claims, and on the other side we have people who are themselves capable of making claims and supporting them with reliable, well-thought out evidence. |
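[Editorial note: Corbett's power objection can be made concrete. The sketch below, offered as an illustration rather than anything posted in the thread, computes the probability that a 16-trial, alpha = .01 ABX test rejects the guessing hypothesis for a listener whose true per-trial success rate is given.]

```python
from math import comb

def abx_power(trials, alpha, p_hear):
    """Power of a one-sided binomial ABX test: the probability of
    rejecting the guessing null (p = 0.5) when the listener's true
    per-trial success rate is p_hear."""
    def null_tail(k):
        # P(X >= k) under pure guessing
        return sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    # Smallest score whose p-value under guessing is at most alpha.
    crit = next(k for k in range(trials + 1) if null_tail(k) <= alpha)
    # Probability of reaching that score at the listener's true rate.
    return sum(comb(trials, i) * p_hear ** i * (1 - p_hear) ** (trials - i)
               for i in range(crit, trials + 1))

# A listener who is right 70% of the time passes a 16-trial, alpha = .01
# test only about 10% of the time:
power = abx_power(16, 0.01, 0.7)
```

With 16 trials and alpha = .01 the critical score is 14 correct, so even a listener hearing the difference on most trials usually produces a "null" result; this is the sense in which such a test cannot reliably detect small effects.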
#64
Posted to rec.audio.high-end
"John Corbett" wrote in message
Arny Krueger wrote: "John Corbett" wrote in message Arny Krueger wrote: I've done extensive tests with other musical program material that was recorded at 24/96 under near-lab conditions in the interest of high actual recorded dynamic range and extended bandwidth. For example, some of my recordings had about 90 dB dynamic range, as compared to an exceptionally good commercial recording's dynamic range, which is in the area of 75 dB. I've then reduced this signal's dynamic range and bandwidth progressively until I could find even one listener (of many tried) who could hear a difference. I removed one bit of resolution at a time by rudely stripping off bits without dither. I removed bandwidth by means of downsampling using brick-wall filtering. I found that resolution reduction became reliably audible with 14 bits resolution, and that bandwidth reduction became reliably audible with 16 kHz bandwidth. Resolution reduction to 15 bits and 19 kHz bandwidth reduction were undetectable. What were your criteria for establishing that an effect was reliably audible or (reliably) undetectable? You're asking a question you know the answer to, John. What's your point? Since statistics is your bag, what criteria would you use? A well-designed and carefully executed ABX test can provide strong evidence about audibility of a given stimulus. However, an ABX test with a small number of trials (e.g., 16) and a small alpha level (such as .01) is inherently incapable of reliably detecting small effects. The assertion without relevant substantiation is noted. This is a very old discussion point, with divergent opinions on both sides. I'm hardly married to 16 trials and 0.01 probability. It does turn out that in all observed circumstances where many more trials were attempted, the results converged to a random mean.
If you claim to have shown that something was inaudible, but are unwilling to provide evidence to support your claim, then maybe I should follow the advice you gave earlier in this very thread, and consider your claims to be unsubstantiated and unsupported. In your own words: Nobody has any obligation to do even one little thing to support their claims. Then, every reasonable person recognizes the claims for what they are, unsubstantiated, unsupported claims, and simply moves on. Interesting that we have yet another example of the same, above. And: On one side we have people who defend unsubstantiated claims, and on the other side we have people who are themselves capable of making claims and supporting them with reliable, well-thought-out evidence. Certainly, any well-thought-out evidence would be appreciated, but seeing nothing new...
#65
Posted to rec.audio.high-end
"John Corbett" wrote in message
Arny Krueger wrote: "Scott" wrote in message Let me demonstrate how this is an issue by using an extreme. Let's say the test had the wires screwed up and the ABX test was in reality wired as an AAX test and B was accidentally cut out of the loop. How sensitive would this test be to audible differences between A and B? How would you know if you did a test under these circumstances and got a null and never checked up on the setup? You need to come up with an example that ABX testing doesn't make impossible. Level matching prevents miswiring an ABX test because you need to have proper identification and wiring of A and B to get through the level matching step. It's about testing the test. The test tests itself during the setup phase. Of course the sort of error that Scott described is not impossible. See http://www.bostonaudiosociety.org/ba...x_testing2.htm for a real-world example. Waiting for a well-thought-out analysis, as opposed to yet another unsubstantiated assertion. Also, the reference cited above is dated 1984, which was merely 26 years ago. For something a little more current from the same publication, please see: http://www.bostonaudiosociety.org/explanation.htm