Thread: Red Meat on ABX
John Corbett

"Arny Krueger" wrote:


Leventhal's little rant passed peer review just long enough for subsequent
disembowelment.


Arny, are you really stupid enough to believe what you claim?




Then there are the subsequent JAES papers that disemboweled Leventhal's
papers:

Comments on "Type 1 and Type 2 Errors in the Statistical Analysis of
Listening Tests" and Author's Replies 674942 bytes (CD aes4)
Author(s): Shanefield, Daniel; Clark, David; Nousaine, Tom; Leventhal, Les
Publication: Volume 35 Number 7/8 pp. 567-572; July 1987


This is a letter to the editor, and not a regular journal article.
See below for detailed comments on this, after we check the only real
journal articles you cited.




Transformed Binomial Confidence Limits for Listening Tests 468821 bytes (CD
aes5)
Author(s): Burstein, Herman
Publication: Volume 37 Number 5 pp. 363-367; May 1989
Abstract: A simple transformation of classical binomial confidence limits
provides exact confidence limits for the results of a listening test, such
as the popular ABX test. These limits are for the proportion of known
correct responses, as distinguished from guessed correct responses.
Similarly, a point estimate is obtained for the proportion of known correct
responses. The transformed binomial limits differ, often markedly, from
those obtained by the Bayesian method.


There is nothing in this article that debunks Leventhal's 1986 JAES
article "Type 1 and Type 2 Errors in the Statistical Analysis of Listening
Tests" which is the final version of the preprint.

The only places that Burstein even mentions Leventhal's JAES article are
in footnotes:

"Leventhal in [3] also justly suggests that c/n be evaluated in terms
of the probability of type 2 error."

and

"The relationship between p_k and p_c in Eq. (2) also applies to other
instances where it is desired to differentiate between the known response
rate and the correct response rate. For example, Leventhal [3] in his
Table 3 lists several effect sizes, namely, hypothesized correct-response
rates in the population. These can be converted to known-response rates
by Eq. (2). Thus an effect size of 0.75 becomes a known-response rate of
(2 X 0.75) - 1 = 0.50. It is assumed here, as in Leventhal's Table 3, that
a subject is choosing between two components. If more than two components
are involved, it would be necessary to use Eq. (2a) below."
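
To make the arithmetic in that footnote concrete, here is a minimal Python sketch of the conversion. The two-choice form p_k = 2*p_c - 1 is the one quoted above. The m-choice generalization is the usual correction-for-guessing formula and is only my assumption about what Eq. (2a) says; the function names are mine, not Burstein's.

```python
def known_rate(p_c, m=2):
    """Convert a correct-response rate p_c into a known-response rate p_k.

    Assumes a subject either knows the answer or guesses uniformly among
    m alternatives, so p_c = p_k + (1 - p_k) / m. For m = 2 this reduces
    to p_k = 2 * p_c - 1, the case quoted from Burstein's footnote.
    """
    return (m * p_c - 1) / (m - 1)


def correct_rate(p_k, m=2):
    """Inverse conversion: known-response rate back to correct-response rate."""
    return p_k + (1 - p_k) / m


print(known_rate(0.75))    # 0.5, matching (2 x 0.75) - 1 = 0.50
print(correct_rate(0.50))  # 0.75
```

Because the mapping is one-to-one, stating an effect size as p_k or as p_c carries the same information.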

Other than a bibliography entry, that is it.

Burstein does mention a different Leventhal paper (in the BAS Speaker) and
he indicates that he prefers p_k to p_c, but these are equivalent ways to
express effect size. There is no debunking of Leventhal in this paper.

Arny, you claimed this disemboweled Leventhal's paper, but
you can't find anything in this paper to support your claim.

Put up or shut up.







Approximation Formulas for Error Risk and Sample Size in ABX Testing 442116
bytes (CD aes4)
Author(s): Burstein, Herman
Publication: Volume 36 Number 11 pp. 879-883; November 1988
Abstract: When sampling from a dichotomous population with an assumed
proportion p of events having a defined characteristic, the binomial
distribution is the appropriate statistical model for accurately
determining: type 1 error risk (symbol); type 2 error risk (symbol); sample
size n based on specified (symbol) and (symbol) and assumptions about p; and
critical c (minimum number of events to satisfy a specified [symbol]). Table
3 in [1] presents such data for a limited number of sample sizes and p
values. To extend the scope of Table 3 to most n and p, we present
approximation formulas of substantial accuracy, based on the normal
distribution as an approximation of the binomial.


Far from debunking Leventhal's paper, this is a whole-hearted endorsement
of it.

It starts as follows.

"This paper is principally an extension of Table 3 in the paper by
Leventhal [1], which correctly stresses the frequent importance of
considering type 2 error risk as well as type 1 in testing a sample from a
dichotomous population."

Burstein goes on to write:

"Our purpose is to provide approximation formulas that extend the scope
of Leventhal's Table 3 to most values of sample size n and proportion p."



Later, as he presents his formulas, he gives worked examples and shows
that their numerical values agree with Leventhal's.
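
For readers who want to see what kind of calculation is at stake, here is a rough Python sketch comparing exact binomial error risks with a normal approximation. It uses the textbook normal approximation with a continuity correction, which is not necessarily the exact form of Burstein's formulas. The values n = 16, a 0.05 significance level, and an alternative p = 0.75 are illustrative choices of mine, and all function names are hypothetical.

```python
from math import comb, erf, sqrt


def binom_tail(n, c, p):
    """Exact P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))


def critical_c(n, alpha=0.05, p0=0.5):
    """Smallest score c whose exact tail probability under p0 is <= alpha."""
    for c in range(n + 2):  # c = n + 1 gives an empty tail, so this always returns
        if binom_tail(n, c, p0) <= alpha:
            return c


def normal_tail(n, c, p):
    """Normal approximation to P(X >= c), with continuity correction."""
    z = (c - 0.5 - n * p) / sqrt(n * p * (1 - p))
    return 0.5 * (1 - erf(z / sqrt(2)))


n, p1 = 16, 0.75                            # illustrative values, not from the papers
c = critical_c(n)                           # 12 correct out of 16
alpha_exact = binom_tail(n, c, 0.5)         # type 1 risk, about 0.038
beta_exact = 1 - binom_tail(n, c, p1)       # type 2 risk, about 0.370
alpha_approx = normal_tail(n, c, 0.5)
beta_approx = 1 - normal_tail(n, c, p1)

print(c, alpha_exact, beta_exact)
print(alpha_approx, beta_approx)
```

The point of the comparison is only that the approximation tracks the exact binomial figures closely enough to be useful for values of n and p that a printed table does not cover.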

Arny, you claimed this disemboweled Leventhal's paper, but
you can't find anything in this paper to support your claim.

Put up or shut up.




There remains the "letter to the editor" that you cited.

Shanefield observes that Leventhal provides some mathematical precision,
and he cautions that statistical and practical significance are not the
same.
In short, there is no serious conflict with Leventhal's paper.

Leventhal, of course, does not disembowel his own paper.

As another poster observed, Nousaine is confused about probability.
(Were he correct in his view, he'd think that when a subject is guessing
with p = .50, only half the trials were real trials!)

Clark's letter is a rant, and it mostly shows that Clark doesn't seem to
understand elementary statistics at all. That is not new; it has been
the case as far back as his 1982 JAES paper introducing ABX.

Also, in this letter to the editor, Clark says he never claimed that
differences were inaudible. Perhaps he forgot to read his own JAES paper,
where he wrote

"A 12-bit companded digital delay line was just audible. A 16-bit
linear system was not."

By the way, when I went back to Clark's original JAES paper, I noticed
several problems.

The overall impression is that Clark didn't really understand how
statistical methods are used in science, but he knew enough to look
scientific to an untrained reader.

Clark appears to make the common mistake about what the p-value computed
in a standard statistical test is. (It is NOT the "probability that a
particular score from an A/B/X test is due to chance".)
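
A small Python sketch may make the distinction clearer. The p-value is the probability of a score at least this high IF the subject is guessing. The probability that the subject WAS guessing, given the score, cannot be computed without a prior and a specific alternative; both of those are hypothetical inputs that I made up for illustration, as are the function names and the 12-of-16 score.

```python
from math import comb


def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def p_value(n, k):
    """P(score >= k | subject is guessing with p = 0.5)."""
    return sum(binom_pmf(n, j, 0.5) for j in range(k, n + 1))


def prob_guessing_given_score(n, k, p_alt, prior_guess):
    """Bayes' rule: P(guessing | score == k).

    p_alt and prior_guess are assumptions supplied by the analyst;
    nothing in a standard significance test provides them.
    """
    w_guess = binom_pmf(n, k, 0.5) * prior_guess
    w_alt = binom_pmf(n, k, p_alt) * (1 - prior_guess)
    return w_guess / (w_guess + w_alt)


n, k = 16, 12  # illustrative score
print("p-value:", p_value(n, k))
for prior in (0.5, 0.9, 0.99):
    print(prior, prob_guessing_given_score(n, k, p_alt=0.75, prior_guess=prior))
```

The p-value is one fixed number for a given score, while the "probability it was due to chance" moves all over the place as the prior changes. They are different quantities.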

When Clark mentions a JAES paper by Plenge et al. as an example of how
Clark can use statistical methods to increase sensitivity, he commits a
gaffe comparable to using an AC voltmeter to measure the (DC) voltage of a
battery.
Plenge's experiment involved Same/Different trials where each subject was
presented with 10 Same and 10 Different pairs of filters. Since the
design of that experiment fixed the numbers of same and different trials,
it was not inherently appropriate to apply a binomial distribution.
Clark's conclusion that some filters were audible is based on Clark's
bogus misapplication of the binomial distribution formula; Plenge does not
do that. Ironically, the design of ABX cleverly avoids that particular
problem.
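
Here is one way to see how a fixed split of trials can break the binomial model, sketched in Python. It rests on an assumption of mine that is NOT taken from Plenge's paper: a subject who hears nothing, knows there are 10 Same and 10 Different pairs, and so answers "Same" exactly 10 times at random. Under that assumption the number of correct answers follows a hypergeometric pattern rather than a binomial one. The function names and the threshold of 14 correct are hypothetical.

```python
from math import comb


def binomial_tail(n, c, p=0.5):
    """P(correct >= c) if all n trials were independent coin flips."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))


def fixed_count_tail(n_same, n_diff, c):
    """P(correct >= c) for a guesser who says 'Same' exactly n_same times.

    If k of the true Same pairs are labelled Same, then n_same - k true
    Different pairs are wrongly labelled Same, so the total number correct
    is k + (n_diff - (n_same - k)) = 2k + n_diff - n_same.
    """
    total = comb(n_same + n_diff, n_same)
    prob = 0.0
    for k in range(n_same + 1):
        if 2 * k + n_diff - n_same >= c:
            prob += comb(n_same, k) * comb(n_diff, n_same - k) / total
    return prob


c = 14  # illustrative threshold: at least 14 of 20 correct
print(binomial_tail(20, c))         # about 0.058
print(fixed_count_tail(10, 10, c))  # about 0.089
```

Under this assumed guessing strategy the binomial formula understates how often a deaf subject reaches 14 of 20. Whether Plenge's subjects behaved this way is a separate question; the sketch only shows that the fixed design can make the binomial figure the wrong one.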

Statistical science is more than simple calculations; it requires
understanding of how and when to use the techniques. Clark's rant about
Leventhal's JAES paper has to be seen as the work of someone who simply
does not grasp the issues involved.


Are you totally confused or what, Atkinson? Why are you so hot to cite a
paper that was subsequently debunked so thoroughly?


Arny, why are you so hot to cite references that you either haven't read,
or don't understand? The only debunking here is of your claims.