S888Wheel
 
Why DBTs in audio do not deliver (was: Finally ... The Furutech

I said

Wrong. In the Dave Clark test, listener #2 got 30/48 correct, with a 94%
statistical probability of hearing a difference. Listener #6 got 26/48, with
an 84% statistical probability of hearing differences. Listener #15 got 15/21
correct, with an 81% chance of hearing a difference.


Keith Hughs said

I'd like to know how these probabilities were calculated. For
example, in the case where the listener heard a difference in 30
of 48 trials, one can do a simple F-test between populations (i.e.
30/48 versus 24/48, the expectation if results are due to random
guessing). For this example, F-critical is 1.623, with
F-calculated of 1.067, i.e. the population variances are not
different at the .05 level. One can then compare the means of the
populations using a simple t-test for populations with equal
variances. The t-critical is then 1.99 (two-tailed) or 1.66
(one-tailed), with t-calculated = 1.23. Thus the means (i.e. the
RESULTS of the listener vs. random guessing) are not statistically
different at the 0.05 (95% confidence) level. Nor are they
significant at the 0.10 (90% confidence) level.
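
As an aside, Keith's F and t figures can be reproduced by treating each trial
as a 0/1 outcome and modeling "random guessing" as exactly 24 correct out of
48. A sketch of that reconstruction (the 0/1 setup is my assumption, not
something stated in his post):

    import numpy as np
    from scipy import stats

    # Listener: 30 correct (1) and 18 incorrect (0) in 48 trials
    listener = np.array([1] * 30 + [0] * 18)
    # Random guessing modeled as exactly 24 of 48 correct
    guessing = np.array([1] * 24 + [0] * 24)

    # F-test on the sample variances (larger over smaller)
    f_calc = guessing.var(ddof=1) / listener.var(ddof=1)  # ~1.067
    f_crit = stats.f.ppf(0.95, dfn=47, dfd=47)            # ~1.62

    # Equal-variance t-test on the two means
    t_calc, _ = stats.ttest_ind(listener, guessing, equal_var=True)  # ~1.23
    t_crit_two = stats.t.ppf(0.975, df=94)                # ~1.99
    t_crit_one = stats.t.ppf(0.95, df=94)                 # ~1.66

    print(f_calc, f_crit, t_calc, t_crit_two, t_crit_one)

Whether an F-test and t-test are the right tools for a binomial outcome is a
separate question; the sketch only checks the arithmetic.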


I got the numbers from the article.
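
For what it's worth, the 94% figure for the 30/48 score is consistent with a
one-tailed exact binomial test against chance (p = 0.5). A minimal sketch,
assuming that is how the article computed it:

    from math import comb

    def confidence_vs_chance(correct, trials):
        # One-tailed exact binomial test: how often pure guessing
        # (p = 0.5) scores `correct` or better out of `trials`
        p_value = sum(comb(trials, k)
                      for k in range(correct, trials + 1)) / 2**trials
        return 1.0 - p_value

    print(f"{confidence_vs_chance(30, 48):.1%}")  # 94.4%

That returns 94.4%, the same number I cite below for the Counterpoint
comparison.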

Bob said


I don't recall this article, but this conclusion seems to be well
supported by the data you cite. If the best performance of the group
wasn't statistically significant at a 95% confidence level, then it's
perfectly reasonable to say that no listener was able to identify the
amps in the test. (Note: Saying they couldn't is not the same as
saying they can't. As I noted above, we can never say definitively
that they can't; we can only surmise from their--and everybody
else's--inability to do so.)


I said


That is ridiculous. If all of them scored 94%, it would be reasonable to say
this?



Keith said


I believe Bob is talking about 95% confidence interval, *not* 95%
scores.


Yes, I know.

Keith said

And yes, it is very common to require a 95% confidence
level.


How often are 94%-confidence results regarded as a null when one is seeking
95% confidence, which is in fact a subjective cutoff? Bottom line: the tests
were inconclusive in this particular case.

I said


No. It all depends on how they fall into the bell curve. But even this is
problematic for two reasons. 1. The listeners were never tested for
sensitivity to subtle differences. The abilities of the participants will
profoundly affect any bell curve.



Keith said

No, the variance in those abilities is the *cause* of the bell
curve.


I was using the predicted bell-curve outcome one would get if there were no
audible differences. That bell curve is dictated by the number of samples.
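
A sketch of that dependence, assuming the null hypothesis of pure guessing
(p = 0.5); the 21 and 48 trial counts are from the test, the others are just
for illustration:

    from math import comb

    def min_correct(trials, confidence=0.95):
        # Smallest score whose one-tailed exact binomial p-value
        # against guessing (p = 0.5) clears the confidence level
        for correct in range(trials + 1):
            p = sum(comb(trials, k)
                    for k in range(correct, trials + 1)) / 2**trials
            if 1.0 - p >= confidence:
                return correct

    for n in (16, 21, 48, 100):
        print(n, min_correct(n))  # 12, 15, 31, 59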

Keith said

That's why sample size (panel size in this context) is so
important, and why the weight given to one or two individuals'
performances, in any test, must be limited.


I took that into consideration when I claimed the test results were
inconclusive. That is why I suggested that the thing that should have been
done was further testing of those individuals and the equipment that scored
near or beyond what the bell curve predicts. That is why I don't run around
claiming this test proves that some people hear differences. Let's also not
forget that we have no tests on listener or system sensitivity to subtle
audible differences, so we have unknown variables. Further, the use of many
amps in no obvious pattern for comparisons introduces another variable that
could profoundly affect the bell curve in unknown ways if some amps sound
like each other and some don't. All of this leaves the interpretation of
those results wide open.

Keith said

Also why such high criteria as a 95% CI are applied. Having a
limited population, one cannot always assume a bell curve, making
outliers more difficult to identify.


Sorry, but this is in some ways an arbitrary number. If a dozen people took
such tests and half of them scored between 90% and 94% confidence, then this
95% limit would be an unreasonable one. OTOH, if a hundred people took the
test and one scored at 95% confidence, I think you could argue that this does
fall within the predicted bell curve of a null result. As it is, as far as I
can tell, some of the results of the Clark test are close enough to the edge
of the bell curve prediction, or beyond it, to warrant further investigation
and to preclude anyone from drawing definitive conclusions. Statistical
analysis is not my strength in math, but that is how it looks to me.
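
A sketch of that intuition, with illustrative panel sizes: under the null,
each guessing listener independently has about a 5% chance of landing beyond
a 95% confidence cutoff, so outliers become expected as the group grows:

    for panel in (12, 25, 100):
        expected = panel * 0.05            # expected count beyond the cutoff
        at_least_one = 1 - 0.95 ** panel   # chance of at least one
        print(panel, round(expected, 1), f"{at_least_one:.0%}")
        # 12 -> 0.6, 46%;  25 -> 1.2, 72%;  100 -> 5.0, 99%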

I said

2. Many different amps were used. We have no way of knowing that we didn't
have a mix of some amps sounding different and some sounding the same. The
Counterpoint amp was identified with a probability of 94%. Given there were
8 different combinations, it fell out of the predicted bell curve, if my
math is right.


Keith said


What math did you use? Not having the data you're using to hand, I'm
curious how you calculated such a high probability.


I used the numbers given in the article. The article gave a 94% confidence
level on the Counterpoint amp compared to an NAD amp. There were a total of 8
comparisons between different amps. I was rounding by the way. The number is
actually 94.4%.
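
One standard way to weigh that single 94.4% score against the 8 comparisons
is to ask how often at least one of 8 would reach that confidence level by
guessing alone; a sketch, assuming the comparisons are independent:

    confidence = 0.944   # the Counterpoint vs. NAD score
    comparisons = 8
    print(f"{1 - confidence ** comparisons:.0%}")  # ~37%

That is the kind of adjustment that would have to be settled before calling
the result conclusive either way.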

I said


Bottom line is you cannot draw definitive
conclusions either way. If the one listener had made one more correct ID he
would have been well above 94%. I doubt that one can simply make 95%
probability a barrier of truth.
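
Using the same one-tailed exact binomial calculation as above (again my
assumption about the article's method), one more correct answer pushes the
confidence well past 95%:

    from math import comb

    for correct in (30, 31):
        p_value = sum(comb(48, k) for k in range(correct, 49)) / 2**48
        print(f"{correct}/48: {1 - p_value:.1%}")  # 94.4%, then 97.0%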



Keith said


You should read more scientific literature then. It is the most
commonly used confidence level IME. Especially with limited
population size.


I will look into it, but I will be quite surprised if that number does not
heavily depend on the sample sizes. It has to vary with sample size.

I said


It is ridiculous. Bell curves don't work that way.


Keith said

Nonsense, of course they do. The fact that there are two-tailed
bell curves, in most population responses, is the genesis for use
of high confidence intervals. Because there *are* tails, way out
on the edges of the population, allowance for these tails must be
made when comparing populations for significant heterogeneity.


Maybe you didn't get what I was trying to say. What does and does not fall
within the predictions of a bell curve depends heavily on the number of
samples.

I said


No foot stamping is needed. The test was quite inconclusive.


Keith said

From what's been presented here, I'd say it was quite conclusive.
The data do not support rejecting the null hypothesis (i.e. that
the amps sound the same), for the test population, under the test
protocol and conditions used. Not knowing the population size, or
protocol used, I wouldn't venture an opinion on whether it may or
may not be applicable to the broader population.


So you are drawing definitive conclusions from one test without even knowing
the sample size? I think you are leaping without looking. You are entitled to
your opinions. I don't find your arguments convincing so far.

I said


Had the test been put in front of a scientific peer review panel with the
same data and the same conclusions, that panel would have sent it back for
corrections. The analysis was wrong, scientifically speaking.


Keith said


Doesn't appear to be based on what's been posted here. You appear
to think that definitive results are required, in support of some
postulate, for the conclusion to be valid (or acceptable for a
review board). This is simply incorrect. Oftentimes, the data
obtained from a test are not useful, or positive, relative to
supporting a postulate, but that does not invalidate the data.
Clearly, failure to reject the null hypothesis, in any experiment,
does not invalidate the test, nor is it something requiring
"correction". It is merely part of the database that the author,
and others, build on in future research.


Where do I claim definitive results are required in support of some
postulate? I would say definitive results are required for one to make
claims of definitive results. Certainly such a test, as part of a body of
evidence, could be seen as supportive, but even that I think is dodgy, given
that some of the results, as far as I can see, would call for further
testing, and the absence of listener sensitivity data alone makes it
impossible to make any specific claims about the results. I never said the
test was not valid due to its failure to reject the null. I simply said I
think the results are such that they call for further investigation and are
on the border between a null and a positive. Not the sort of thing one can
base definitive conclusions on.

Keith said


It's instructive to also note that peer reviewed journals publish,
not infrequently, two or more studies that contradict one another.
You seem to believe this can't be the case, because the "wrong"
ones would be sent back for "correction".


Nonsense. What I think will be sent back is a poor analysis. If the analysis
fits the data, it won't be sent back. If someone is making claims of
definitive conclusions based on this test, one is jumping the gun. Scientific
research papers with huge sample sizes and apparently definitive results are
usually carefully worded to not make definitive claims. That is a part of
proper scientific prudence.

Keith said

A look through some select peer reviewed journals, such as the
"Journal of Pharmaceutical Science and Technology", will show that
to be mistaken.


Really? I'd like to see any peer reviewed published studies that make
definitive claims based on sample sizes of 25 participants, especially when
one had a 94% confidence score.

Keith said

Often in the case of biological studies, conflicting
data are obtained, often due to unknown (at the time) and/or
uncontrolled variables (often inherent to the specific population
under study). These data, while contradictory, are still of great
value for future studies.


I never suggested the data should be ignored or were valueless, only that
they were inconclusive. I think the many uncontrolled variables in this
specific test are problematic. Don't you?