John Corbett
 
DVD audio vs. SACD

In article , "Harry Lavo"
wrote:

"Steven Sullivan" wrote in message
...
Chung wrote:
Harry Lavo wrote:
"John Corbett" wrote in message
...
In article , "Harry Lavo"
wrote (re

http://www.hfm-detmold.de/eti/projek...paper_6086.pdf):


It is certainly a test that probably is as good an abx test as could be
done.

Huh?


Meaning, they paid attention to most details.


Are there any details that they missed?


They acknowledge two possible flaws in their work:

1) switching noise that they tried very hard to eliminate, but could not

2) lack of time to re-test the four outliers.

Which is a shame in both cases, but given that this appears to
have been master's thesis work, perhaps understandable.


Thank you Steven, I forgot the "clicks". Although I suspect they made too
much of them, since their own testing showed the subjects to be unawares.


There are several problems with the paper and with the discussion about it
in this thread.

The authors (and Harry Lavo, too) overlooked the issue of multiple tests.
For a single run of 20 trials, getting 15 or more correct would be
significant at the .05 level. But that's not what they did: they did 145
of those runs. Then it is much more likely that there will be some runs of
20 where 15 or more are correct. In fact, the probability of getting at
least one run of 20 with at least 15 correct is over .95, so it would be
"significant" if they did _not_ get some apparently successful results!

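The two figures above can be checked with a few lines of arithmetic. This is a minimal Python sketch, assuming pure guessing (p = 0.5 on every trial) and 145 independent runs; the helper name tail_prob is made up for illustration and is not from the paper.

```python
from math import comb

def tail_prob(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that one guessing subject gets 15 or more of 20 correct.
p_run = tail_prob(20, 15)

# Chance that at least one of 145 independent guessing runs does so.
p_any = 1 - (1 - p_run) ** 145

print(f"single run, 15+ of 20: {p_run:.4f}")
print(f"at least one such run in 145: {p_any:.4f}")
```

The per-run probability comes out near .02, which is why 15 of 20 passes a .05 test on its own, while the chance of seeing at least one such run among 145 is above .95.
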
Their design is wastefully inefficient. They used a total of 2900 ABX
trials, but they could have used fewer than 1800 and still had an
experiment with type 1 error risk .05 and with 99% chance of detecting
even threshold-level effects.

The plan was to compare DSD and PCM, but in fact they compared DSD+noise1
to PCM+noise2; in other words, the switching noises were confounded with
the effects under study. The experiment actually performed cannot
separate the effects of DSD and PCM from the effects of the different
switching noises. Merely saying that the subjects didn't think they were
affected doesn't do it.

One way to correct for multiple tests is to require 18 (instead of 15)
correct for each 20-trial run; then there would have been two (not four)
apparently successful subjects. However, due to the confounding, we
cannot say that what they were responding to was actually a difference
between DSD and PCM. Of course the authors should have retested those two
subjects (with the switching noise problem fixed). But they didn't, so
the evidence to support Harry Lavo's view simply is not there.
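
The post does not say which correction produces the figure of 18. One plausible reading, assumed here, is a Bonferroni-style correction that tests each of the 145 runs at level .05/145. A minimal Python sketch under that assumption (the function names are hypothetical):

```python
from math import comb

def tail_prob(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def min_correct(n_trials, n_runs, alpha=0.05):
    """Smallest score whose guessing-tail probability is <= alpha / n_runs.

    Assumes a Bonferroni correction; returns None if no score qualifies.
    """
    per_run_alpha = alpha / n_runs
    for k in range(n_trials + 1):
        if tail_prob(n_trials, k) <= per_run_alpha:
            return k
    return None

threshold = min_correct(20, 145)

# Resulting chance of at least one false positive across all 145 runs,
# assuming the runs are independent.
familywise = 1 - (1 - tail_prob(20, threshold)) ** 145

print(threshold, round(familywise, 4))
```

Under these assumptions the threshold is 18 correct out of 20, and the overall false-positive risk across the 145 runs stays below .05.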

Harry also claimed:

Notice that these four are at the tail-end of a near-Poisson distribution,
not a Bell curve.


and

... the distribution was not a normal bell curve but rather something
resembling a Poisson ...


and

I didn't rule out chance...I said it was less likely since the distribution
was not a normal bell curve but rather something resembling a Poisson (an
inference, BTW, heightened by the fact that they all used earphones
suggesting that the differences were there but masked by room ambience.)
Since the test proctors did not do follow-up evaluation of the four who
scored well, it is impossible to know for sure whether or not these results
were real, or chance. I believe the Poisson distribution and the use of
headphones suggests "real".


What is Harry talking about? I am a mathematician and statistician, and I
know what a Poisson distribution is. Somehow I'm not sure Harry
does---this distributional argument sounds fishy to me. ;-)

Maybe he's referring to Figure 10 in the paper, but that does _not_ appear
to be from a Poisson distribution. If the subjects were guessing, we
would expect their scores to follow a binomial distribution with mean 10
and standard deviation sqrt(5) = 2.236; that binomial distribution can be
approximated by a normal distribution. These data have sample mean 10.03
and sample standard deviation 2.424, so they're not too far from what we'd
expect. What is the Poisson distribution that Harry thinks is a better
fit?
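
The binomial figures can be reproduced as follows. This is a rough Python sketch, assuming 20 trials per run and p = 0.5. The Poisson comparison at the end is only one guess at what might be meant, since no particular Poisson distribution was named; it uses the fact that a Poisson distribution's standard deviation is the square root of its mean.

```python
from math import comb, sqrt

n, p = 20, 0.5

# Exact binomial probabilities for scores 0..20 under guessing.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))
sd = sqrt(sum((k - mean) ** 2 * q for k, q in enumerate(pmf)))

# Hypothetical comparison: a Poisson with the same mean (an assumption,
# not something stated in the thread) would have sd = sqrt(mean).
poisson_sd = sqrt(mean)

# Sample statistics quoted in the post.
sample_mean, sample_sd = 10.03, 2.424

print(f"binomial: mean {mean:.2f}, sd {sd:.3f}")
print(f"Poisson with same mean: sd {poisson_sd:.3f}")
print(f"sample: mean {sample_mean}, sd {sample_sd}")
```

On that reading, the sample standard deviation sits much closer to the binomial value than to the Poisson one.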

(If the subjects are guessing, the distribution of the total number of
successful 20-trial runs in the entire experiment of 145 runs is binomial
and can be approximated by a Poisson distribution with mean 3. However,
the paper does not mention this, and all we have is one point from such a
distribution anyway. I doubt that is what Harry means.)
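
For anyone who wants to see how close that approximation is, here is a small Python sketch. It assumes 145 independent runs of pure guessing, with "successful" meaning 15 or more correct out of 20; the variable and function names are illustrative only.

```python
from math import comb, exp, factorial

n_runs = 145

# Probability that a single guessing run scores 15 or more out of 20.
p_run = sum(comb(20, k) for k in range(15, 21)) / 2**20

# Expected number of "successful" runs; this is the Poisson mean.
lam = n_runs * p_run

def binom_pmf(k):
    """Exact probability of k successful runs out of n_runs."""
    return comb(n_runs, k) * p_run**k * (1 - p_run) ** (n_runs - k)

def poisson_pmf(k):
    """Poisson approximation with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

# Largest pointwise gap between the two distributions for k = 0..20.
max_gap = max(abs(binom_pmf(k) - poisson_pmf(k)) for k in range(21))

print(f"Poisson mean: {lam:.3f}")
for k in range(8):
    print(k, round(binom_pmf(k), 4), round(poisson_pmf(k), 4))
print(f"largest gap: {max_gap:.4f}")
```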