#1
DVD audio vs. SACD
Found this JAES report that I thought some might find interesting, and some will not be happy about. Better to be informed than to guess. Find it here: http://www.hfm-detmold.de/eti/projek...paper_6086.pdf
#2
DVD audio vs. SACD
" wrote in message ...
> Found this JAES report that I thought some might find interesting, and some will not be happy about. Better to be informed than to guess. Find it here: http://www.hfm-detmold.de/eti/projek...paper_6086.pdf

This is the second time I have seen the article. It is certainly a test that is probably as good an abx test as could be done. And I suppose its origin arose out of the DVD-A vs. DSD wars that permeated 2001-2003. It would have been more interesting if either had been compared to ordinary CD. Two things strike me after having read the article this second time:

* The authors are obviously believers in abx. Accordingly they downplay the fact that there are differences as evidenced by four responders. But they fail to follow up by extending the test for these four, thereby giving a possible excuse as per standard abx practice.(a) Instead, they choose to emphasize that 141 could hear no difference, and they conclude that there is little difference between the technologies. It would be more accurate to say either a) there apparently is a difference, but most listeners can't hear it, or b) the testing methodology used doesn't allow most listeners to hear the difference; either of which could be true.

* They comment upon the stress and confusion of the test, but do their best to twist this into something positive, instead of reporting it for what it was...a stressful situation for the testers.

(a) Notice that these four are at the tail end of a near-Poisson distribution, not a bell curve. So this is unlikely to be simply a case of wide dispersion of listening abilities, a speculation further supported by their being drawn from a fairly coherent population of musicians in training. So even though additional testing was not done, it is reasonable to assume that these four truly did hear differences.
#4
DVD audio vs. SACD
"Harry Lavo" wrote in message ...
> [snip article link and intro]
> * The authors are obviously believers in abx.

That puts them in line with the rest of the scientific community doing audio research.

> Accordingly they downplay the fact that there are differences as evidenced by four responders. But they fail to follow up by extending the test for these four, thereby giving a possible excuse as per standard abx practice.(a) Instead, they choose to emphasize that 141 could hear no difference, and they conclude that there is little difference between the technologies.

Those 4 did so only with headphones; without them they did no better than anyone else.

> It would be more accurate to say either a) there apparently is a difference, but most listeners can't hear it, or b) the testing methodology used doesn't allow most listeners to hear the difference; either of which could be true.

It would be more appropriate to say there might be a better chance of hearing such differences with headphones.

> * They comment upon the stress and confusion of the test, but do their best to twist this into something positive, instead of reporting it for what it was...a stressful situation for the testers.

I don't see anybody doing any twisting, unless it's you commenting on the "belief in ABX." ABX and ABC/hr are SOP for audio testing.

> (a) Notice that these four are at the tail end of a near-Poisson distribution, not a bell curve. So this is unlikely to be simply a case of wide dispersion of listening abilities, a speculation further supported by their being drawn from a fairly coherent population of musicians in training. So even though additional testing was not done, it is reasonable to assume that these four truly did hear differences.

Nobody is claiming they didn't, only that for the vast majority of people such differences are unlikely to be heard without headphones. They didn't do further testing because they got very good results, and while I can't say for certain, there probably were time and money constraints. The bottom line seems to be that about 97-98% of people are not likely to hear any differences between the 2 formats.
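[The 97-98% figure above follows directly from the counts quoted in the thread (141 of 145 listeners at chance); a one-line check of that arithmetic, assuming those counts:]

```python
# 141 of the 145 listeners could not reliably tell the formats apart
at_chance, total = 141, 145
print(f"{100 * at_chance / total:.1f}%")  # 97.2%, i.e. roughly 97-98%
```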
#5
DVD audio vs. SACD
" wrote in message ...
> [snip]
> The bottom line seems to be that about 97-98% of people are not likely to hear any differences between the 2 formats.

In this test. That's all you can say for sure. However, it is not an uncommon phenomenon in abx testing. Sean Olive reportedly has to screen out the majority of potential testers because they cannot discriminate when he starts training for his abx tests, even when testing for known differences in sound.
#6
DVD audio vs. SACD
Harry Lavo wrote:
> In this test. That's all you can say for sure. However, it is not an uncommon phenomenon in abx testing. Sean Olive reportedly has to screen out the majority of potential testers because they cannot discriminate when he starts training for his abx tests, even when testing for known differences in sound.

Sean Olive doesn't do ABX tests. He doesn't "screen out" potential testers, either; the article Sully referred to used a couple of hundred listeners. What he has done is assemble an expert listening panel, specially trained to identify specific differences in frequency response. That's a tough task, and not everyone can do it, even with training. But it has nothing to do with either ABX or preference testing. This is the second time in a week you have misrepresented Mr. Olive's work, Harry. I suggest you cease referring to it until you learn something about it.

bob
#7
DVD audio vs. SACD
In article , "Harry Lavo" wrote (re http://www.hfm-detmold.de/eti/projek...paper_6086.pdf):

> It is certainly a test that probably is as good an abx test as could be done.

Huh?

> The authors are obviously believers in abx. Accordingly they downplay the fact that there are differences as evidenced by four responders.

Four listeners getting at least 15 of 20 correct in 145 attempts is hardly strong evidence. Is there any other evidence to support this "fact" that there are differences? Consider these facts:

For a single run of 20 trials, the probability that someone would get at least 15 correct just by guessing is .0207, which is about 1 chance in 48.

In an experiment with 145 runs (of 20 trials each), the expected number of apparently significant results (at the .05 level) is 3 if subjects are just guessing. That is, over all such experiments, the average number of 20-trial runs with "significant" results is 3 per 145-attempt experiment.

The probability that just guessing in an experiment with 145 attempts to get 15 or more correct in 20 trials would yield at least 4 successes is about .3529, so seeing four apparently successful results is not unusual enough to rule out chance.
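[The probabilities cited in the post above are plain binomial calculations; a short sketch reproducing them, using Python as an editorial choice (nothing in the thread specifies a tool):]

```python
from math import comb

# P(at least 15 of 20 correct) for a listener who is purely guessing (p = 0.5)
p_run = sum(comb(20, k) for k in range(15, 21)) / 2**20
print(round(p_run, 4))             # 0.0207, about 1 chance in 48

# Expected number of "significant" 20-trial runs among 145 guessing subjects
print(round(145 * p_run, 1))       # 3.0

# P(at least 4 apparently significant runs out of 145, everyone guessing)
p_four_plus = 1 - sum(comb(145, j) * p_run**j * (1 - p_run)**(145 - j)
                      for j in range(4))
print(round(p_four_plus, 4))       # 0.3529
```

[All three printed values match the figures John Corbett quotes, which supports his point that four high scorers among 145 runs is consistent with chance.]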
#8
DVD audio vs. SACD
wrote in message ...
> Sean Olive doesn't do ABX tests. He doesn't "screen out" potential testers, either; the article Sully referred to used a couple of hundred listeners. [snip] This is the second time in a week you have misrepresented Mr. Olive's work, Harry. I suggest you cease referring to it until you learn something about it.

Sean and Harman don't do ABX tests? Perhaps you should check that out with your buddy Stewart. Here is one of many quotes from his postings:

"Difficult to pin down a post from that tiny quote, but I have personally experienced numerous positive outcomes from ABX testing, *as you are well aware*, and of course it's a plain fact that Floyd Toole and the boys at Harman International get positive results all the time, otherwise they wouldn't use it as a development tool (along with KEF, B&W et al).

"Since I have stated all this before on several occasions, your wilful ignorance of it does raise a question of 'cherry picking' only those results which suit your preconceptions.

"The plain *fact* of the matter is that ABX is a very sensitive tool for *revealing* subtle differences, which is why major corporations use it as a development tool."

SOURCE: Newsgroups: rec.audio.high-end
From: (Stewart Pinkerton)
Date: Fri, 01 Nov 2002 18:24:43 GMT
Subject: HOW TO GET A POSITIVE ABX TEST.

Harman's involvement with ABX has been discussed here for years. Here's a quote from 1998 from *somebody* you might recognize:

"In article , "Trotsky wrote: Jacob, the ABX company went out of business years ago. You would think an idea that has this much relevance to Audio Reality would have companies lining up to license this technology, but in actuality all you have is a bunch of guys in southern Michigan trumpeting the joys of the ABX alternative lifestyle.

"References to ABX and ABC/hr tests have been posted here. The companies cited include AT&T, Harmon, CRC, Swedish Radio, and others. Standards organizations cited have been MPEG-Audio, CCIR, and others. Your failure to read citations has no meaning, and reveals only your failure to do your own homework.

-- "Copyright 1998, all rights reserved, except transmission by USENET and like facilities granted. This notice must be included. Any use by a provider charging in any way for the IP represented in and by this article and any inclusion in print or other media are specifically prohibited."

SOURCE: Newsgroups: rec.audio.opinion
From: (jj, curmudgeon and tiring philalethist)
Date: 1998/09/02
Subject: ABX Practice and Usage.

I quit because the discussion ended two posts ago with personal attacks. If Sean is screening for the ability to discriminate specific frequency response emphasis, then it is for discrimination tests, right? No different than training to hear codec anomalies. And what are the two leading audio discrimination tests? ABX and its close cousin ABC/hr. Sean's work has been discussed here and elsewhere on Usenet for many years(*), as the quote above from J.J. shows, and unless everybody involved in the discussions has been in error, Harman uses discrimination tests.

And BTW, I wasn't referring to "the article Sully referred to". I made a general reference to Sean and Harman-Kardon. Perhaps you should read more carefully.

(*) A search for "ABX" and "Harman" brings up 109 posts.
#10
DVD audio vs. SACD
"John Corbett" wrote in message ...
>> It is certainly a test that probably is as good an abx test as could be done.
> Huh?

Meaning, they paid attention to most details.

> Four listeners getting at least 15 of 20 correct in 145 attempts is hardly strong evidence. Is there any other evidence to support this "fact" that there are differences?

See below.

> Consider these facts: For a single run of 20 trials, the probability that someone would get at least 15 correct just by guessing is .0207, which is about 1 chance in 48. In an experiment with 145 runs (of 20 trials each), the expected number of apparently significant results (at the .05 level) is 3 if subjects are just guessing. [snip] The probability that just guessing in an experiment with 145 attempts to get 15 or more correct in 20 trials would yield at least 4 successes is about .3529, so seeing four apparently successful results is not unusual enough to rule out chance.

I didn't rule out chance...I said it was less likely, since the distribution was not a normal bell curve but rather something resembling a Poisson (an inference, BTW, heightened by the fact that they all used earphones, suggesting that the differences were there but masked by room ambience). Since the test proctors did not do follow-up evaluation of the four who scored well, it is impossible to know for sure whether or not these results were real, or chance. I believe the Poisson distribution and the use of headphones suggest "real".
#11
DVD audio vs. SACD
Harry Lavo wrote:
> Sean and Harman don't do ABX tests? Perhaps you should check that out with your buddy Stewart.

Since when is Stewart an authority on the testing practices of a company he does not work for? Once again, I suggest you try to actually read Sean Olive's research before commenting on it further.

> [snip irrelevancies]
> If Sean is screening for the ability to discriminate specific frequency response emphasis, then it is for discrimination tests, right?

Wrong.

> No different than training to hear codec anomalies.

Wrong.

> And BTW, I wasn't referring to "the article Sully referred to". I made a general reference to Sean and Harman-Kardon. Perhaps you should read more carefully.

Perhaps you should read, period. I was offering that as an example of the kind of work Olive does with untrained listeners.

> (*) A search for "ABX" and "Harman" brings up 109 posts.

So what? You've read second- and third-hand Usenet posts. I've read a fair chunk of Olive's research. Which of us is more likely to know what Olive's research is about? Now, somebody at Harman may use ABX tests. But they've never appeared in Olive's work that I know of (or yours, since you know nothing of his work). And your statement wasn't about "Harman." It was about Olive. And it was wrong.

bob
#12
DVD audio vs. SACD
Harry Lavo wrote:
>> The bottom line seems to be that about 97-98% of people are not likely to hear any differences between the 2 formats.
> In this test. That's all you can say for sure.

And in other tests, that percentage may even be higher, especially if speakers were used instead of headphones. Seems pretty conclusive to me that the difference between SACD and DVD-A is immaterial. Quite different than what some audiophiles had thought previously, isn't it?

> However it is not an uncommon phenomenon in abx testing.

It is not uncommon that in any large number of trials, there will be testees with high scores. If everyone simply guesses by tossing a coin, there will be some scoring a high percentage of right answers. It would have been interesting to retest those 4 testees, or let them try using speakers instead of headphones.

> Sean Olive reportedly has to screen out the majority of potential testers because they cannot discriminate when he starts training for his abx tests, even when testing for known differences in sound.

Of course even when there are known differences, not everyone can detect those differences. The known differences may be below the threshold of detectable differences, or maybe some of us do not have as good a listening acuity as we would like.
#13
DVD audio vs. SACD
Harry Lavo wrote:
>>> It is certainly a test that probably is as good an abx test as could be done.
>> Huh?
> Meaning, they paid attention to most details.

Are there any details that they missed?

> [snip Corbett's probability calculations]
> I didn't rule out chance...

Well, Harry, you have to understand what you wrote. Here is what you wrote: "Accordingly they downplay the fact that there are differences as evidenced by four responders." You said that it is a fact that there are differences. No, you ruled out chance, Harry.
#14
DVD audio vs. SACD
Harry Lavo wrote:
> This is the second time I have seen the article. It is certainly a test that probably is as good an abx test as could be done. [snip]
> (a) Notice that these four are at the tail end of a near-Poisson distribution, not a bell curve. So this is unlikely to be simply a case of wide dispersion of listening abilities, a speculation further supported by their being drawn from a fairly coherent population of musicians in training. So even though additional testing was not done, it is reasonable to assume that these four truly did hear differences.

But because there were switching noises involved, as discussed at length on page 8, one has to question whether the difference heard by those four, only when doing stereo headphone listening, was between DSD and PCM per se, or whether they were being cued by the noises. Apparently the four 'successes' are discussed at even greater length in the German master's thesis. The authors in the preprint certainly do not rule out the possibility that the four heard a real difference. But basically, given the extraordinary performance of four out of 145 similar listeners, the authors need to retest those four, under conditions where switching noise is not a factor, before they can make a claim of difference stronger than they have.

--
-S
"The most appealing intuitive argument for atheism is the mindblowing stupidity of religious fundamentalists." -- Ginger Yellow
#15
DVD audio vs. SACD
Chung wrote:
>> Meaning, they paid attention to most details.
> Are there any details that they missed?

They acknowledge two possible flaws in their work:

1) switching noise that they tried very hard to eliminate, but could not
2) lack of time to re-test the four outliers

Which is a shame in both cases, but given that this appears to have been master's thesis work, perhaps understandable.

--
-S
#16
DVD audio vs. SACD
"Harry Lavo" wrote in message ...
> I didn't rule out chance...I said it was less likely, since the distribution was not a normal bell curve but rather something resembling a Poisson (an inference, BTW, heightened by the fact that they all used earphones, suggesting that the differences were there but masked by room ambience).

But aren't the subjectivists always saying that listening should be done in real-world applications? I suggest to you that there is vastly more listening done by audiophiles in situations where there is room ambience than through headphones.

> Since the test proctors did not do follow-up evaluation of the four who scored well, it is impossible to know for sure whether or not these results were real, or chance. I believe the Poisson distribution and the use of headphones suggest "real".

Of course you do. The fact still remains that even if you wish to argue that there might be that tiny percentage of people who actually heard a difference, they are very likely not the type of people who might work for a subjectivist audio magazine. They are still not able to hear such differences without headphones, making the differences virtually the same as non-existent in normal conditions.
#17
DVD audio vs. SACD
"Chung" wrote in message ...
>> Meaning, they paid attention to most details.
> Are there any details that they missed?

Yes, they didn't do the follow-up tests. And they allowed communication between the tests.

> Well, Harry, you have to understand what you wrote. Here is what you wrote: "Accordingly they downplay the fact that there are differences as evidenced by four responders." You said that it is a fact that there are differences. No, you ruled out chance, Harry.

There *were* differences...the issue is whether or not they were due to chance. They outline all the reasons the differences were suspect, and their conclusion virtually ignores that apparently there was a difference. I'll stand by my statement, and the alternative conclusions that would have been more accurate.
#19
DVD audio vs. SACD
"chung" wrote in message ...
> [snip]
> Of course even when there are known differences, not everyone can detect those differences. The known differences may be below the threshold of detectable differences, or maybe some of us do not have as good a listening acuity as we would like.

I see. As objectivists we (not you, necessarily) broadcast Harman as one of the leaders in use of ABX in the industry when it suits us, and extol Sean as the epitome of Harman testing along with Floyd Toole, but when it comes to the specifics they don't use ABX. Hmmm, interesting.
#20
DVD audio vs. SACD
" wrote in message ...
> But aren't the subjectivists always saying that listening should be done in real-world applications? I suggest to you that there is vastly more listening done by audiophiles in situations where there is room ambience than through headphones.

Of course this is true. But if you have ever moved into the direct range of your speakers, or, as I do, listen to music at night while falling asleep with good headphones, you become well aware that you can hear much more detail than in normal in-room listening (note, however, that for listening to music on computers with crappy earphones this is not always, or perhaps even usually, true).

> Of course you do. The fact still remains that even if you wish to argue that there might be that tiny percentage of people who actually heard a difference, they are very likely not the type of people who might work for a subjectivist audio magazine. They are still not able to hear such differences without headphones, making the differences virtually the same as non-existent in normal conditions.

They were musician/engineers in training. Why would not some of them end up reviewing equipment, as many of the folks on RAP do?
#21
DVD audio vs. SACD
"Steven Sullivan" wrote in message ...
> They acknowledge two possible flaws in their work: 1) switching noise that they tried very hard to eliminate, but could not; 2) lack of time to re-test the four outliers. Which is a shame in both cases, but given that this appears to have been master's thesis work, perhaps understandable.

Thank you Steven, I forgot the "clicks". Although I suspect they made too much of them, since their own testing showed the subjects to be unaware.
#22
DVD audio vs. SACD
In article , "Harry Lavo" wrote:
> Thank you Steven, I forgot the "clicks". Although I suspect they made too much of them, since their own testing showed the subjects to be unaware.

There are several problems with the paper and with the discussion about it in this thread.

The authors (and Harry Lavo, too) overlooked the issue of multiple tests. For a single run of 20 trials, getting 15 or more correct would be significant at the .05 level. But that's not what they did---they did 145 of those runs. It is then much more likely that there will be some runs of 20 where 15 or more are correct. In fact, the probability of getting at least one run of 20 with at least 15 correct is over .95, so it would be "significant" if they did _not_ get some apparently successful results!

Their design is wastefully inefficient. They used a total of 2900 ABX trials, but they could have used fewer than 1800 and still had an experiment with type 1 error risk .05 and with 99% chance of detecting even threshold-level effects.

The plan was to compare DSD and PCM, but in fact they compared DSD+noise1 to PCM+noise2; in other words, the switching noises were confounded with the effects under study. The experiment actually performed cannot separate the effects of DSD and PCM from the effects of the different switching noises. Merely saying that the subjects didn't think they were affected doesn't do it.

One way to correct for multiple tests is to require 18 (instead of 15) correct for each 20-trial run; then there would have been two (not four) apparently successful subjects. However, due to the confounding, we cannot say that what they were responding to was actually a difference between DSD and PCM. Of course the authors should have retested those two subjects (with the switching noise problem fixed). But they didn't, so the evidence to support Harry Lavo's view simply is not there.

Harry also claimed:
> Notice that these four are at the tail-end of a near-Poisson distribution, not a Bell curve.
and
> ... the distribution was not a normal bell curve but rather something resembling a Poisson ...
and
> I didn't rule out chance...I said it was less likely since the distribution was not a normal bell curve but rather something resembling a Poisson (an inference, BTW, heightened by the fact that they all used earphones suggesting that the differences were there but masked by room ambience.) Since the test proctors did not do follow-up evaluation of the four who scored well, it is impossible to know for sure whether or not these results were real, or chance. I believe the Poisson distribution and the use of headphones suggests "real".

What is Harry talking about? I am a mathematician and statistician, and I know what a Poisson distribution is. Somehow I'm not sure Harry does---this distributional argument sounds fishy to me. ;-)

Maybe he's referring to Figure 10 in the paper, but that does _not_ appear to be from a Poisson distribution. If the subjects were guessing, we would expect their scores to follow a binomial distribution with mean 10 and standard deviation sqrt(5) = 2.236; that binomial distribution can be approximated by a normal distribution. These data have sample mean 10.03 and sample standard deviation 2.424, so they're not too far from what we'd expect.

What is the Poisson distribution that Harry thinks is a better fit? (If the subjects are guessing, the distribution of the total number of successful 20-trial runs in the entire experiment of 145 runs is binomial and can be approximated by a Poisson distribution with mean 3. However, the paper does not mention this, and all we have is one point from such a distribution anyway. I doubt that is what Harry means.)
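[The remaining figures in the post above -- the >.95 chance of at least one "significant" run, the stricter 18-of-20 cutoff, and the mean-10 / sd-2.236 guessing distribution -- can be verified with a short sketch under the same assumptions the post states (fair guessing, 145 independent runs of 20 trials); Python is used here purely for illustration:]

```python
from math import comb, sqrt

N_RUNS, N_TRIALS = 145, 20

def p_at_least(k, n=N_TRIALS, p=0.5):
    """P(k or more correct out of n trials when each answer is a fair guess)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Chance that at least one of 145 guessing subjects scores 15+ of 20:
p_any = 1 - (1 - p_at_least(15))**N_RUNS
print(round(p_any, 3))                    # 0.952 -- over .95, as claimed

# Raising the per-run cutoff to 18 of 20 keeps the experiment-wide
# false-alarm rate below .05:
print(round(N_RUNS * p_at_least(18), 3))  # 0.029

# Score of a single guessing subject: binomial, mean 10, sd sqrt(5)
print(N_TRIALS * 0.5, round(sqrt(N_TRIALS * 0.25), 3))  # 10.0 2.236
```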