#41
A/B/X Testing (was: dB vs. Apparent Loudness)
ff123 wrote:
> On Tue, 8 Jun 2004 22:00:39 -0400, "Arny Krueger" wrote:
>> John Corbett wrote:
>>> Here are a few suggestions: (1) Establish two modes, training and testing.
>>
>> Shows how little time you've actually spent looking at the site, Corbett. There have always been two modes of operation at the PCABX web site, training and testing.
>
> He's talking specifically about the PC-ABX application, not the website.

GMAB, that's not what Corbett said at all. He based his argument on an out-of-context quote of something he found on the web site, something that is not part of the PCABX application. But even supposing he was talking about the PCABX application all by itself: arguing about the application in isolation is a straw-man argument, because the application is presented in the context of the web site. The core of the PCABX web site is not the PCABX application. In fact, the web site makes a strong point of presenting the PCABX application as a tool that can be replaced by a number of other similar tools, some of which may be superior to it in some respects.

I guess it would be interesting to hear you and Corbett pontificate about what the most important single thing on the PCABX web site is. I'm sure you'll both get it wrong. I'll give you a hint: count your fingers. What you are looking for is like your hands, and it is as important to subjective testing as your fingers are to your hands.

> The idea is to remove the errors associated with sequential testing (testing mode), while simultaneously allowing the listener to just noodle around (training mode).

Been there, done that, but in another critical part of the web site that Corbett shows no knowledge or understanding of.

> Training and testing modes could be used synergistically: use the training mode to estimate what the value of theta should be for the testing mode, so that the number of total trials can be suggested to control both type I and type II errors.
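[The sequential-testing error mentioned in this exchange can be illustrated with a short simulation. This is a sketch in plain Python, not code from PC-ABX or the PCABX site: a purely guessing listener watches a running exact binomial p-value and stops the moment it dips below 0.05, and the observed false-positive rate comes out far above the nominal 5%.]

```python
# Illustrative sketch (not PC-ABX code): optional stopping inflates type I error.
import random
from math import comb

def binom_p_value(correct, trials):
    """Exact one-sided binomial p-value for >= `correct` of `trials` at p = 0.5."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def guessing_run(max_trials=50, alpha=0.05):
    """Simulate a guessing listener who stops as soon as p < alpha."""
    correct = 0
    for n in range(1, max_trials + 1):
        correct += random.random() < 0.5  # coin-flip "answers"
        if binom_p_value(correct, n) < alpha:
            return True  # falsely declared an audible difference
    return False

random.seed(1)
false_positives = sum(guessing_run() for _ in range(2000)) / 2000
print(f"nominal alpha = 0.05, observed false-positive rate = {false_positives:.2f}")
```

[The inflation grows with the number of allowed "looks" at the data, which is why a pre-committed trial count, or a properly designed sequential test, is needed in the testing mode.]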
It seems to be difficult or impossible to convince statistics junkies that there is more to experimental design than statistics. We spent about 10 years looking at reducing type II errors by jacking up the number of trials by various means. We decided that beyond a certain point this was a bogus approach, practically speaking. The best way forward at that point turned out to be making major gains in listener sensitivity: not by jacking up the number of trials, but by better training and enabling the listener to do a better job of hearing differences.

That raises the question, "why not do both?" The answer should be found by stepping back from a headlong rush toward sensitivity for the sake of sensitivity. Instead, you have to look at the practical relevance of the results you are getting once you have a certain combination of listener sensitivity and statistical detection of differences in the results.

> (2) Another possibility would be for the user to propose what effect size (theta) he wants to detect...

Shows once again how little time you've actually spent looking at the site, Corbett. At the PCABX web site, users have always been able to specify what effect size (theta) they want to detect. Furthermore, the site has been structured to encourage them to start with larger effects and work down to smaller effects. The effects have been selected so that the larger effects are reasonably obvious. The smallest effects are difficult or impossible to detect. A number of intermediate-sized effects are also provided.

> None of this is quantitative.

Say what? It is very quantitative. The size of the effect is formally known for all samples on the PCABX site for which there are known and generally agreed-upon ways to quantify the effect size. This includes the vast majority of the samples. The only exceptions that come to mind are the perceptual coder samples, because AFAIK there are no known and generally agreed-upon ways to quantify the effect size for them.
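[The alpha/beta/theta machinery being argued about can be made concrete. The sketch below is illustrative — function names and defaults are mine, not anything from PCABX or PC-ABX. Given a target effect size theta (the listener's true probability of a correct ABX answer, above the 0.5 chance level), it searches for the smallest trial count whose exact binomial test holds the type I risk at alpha while keeping the type II risk at or below beta.]

```python
# Illustrative sketch: pick a trial count from alpha, beta, and theta.
from math import comb

def tail_prob(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def trials_needed(alpha=0.05, beta=0.05, theta=0.7, max_n=500):
    """Smallest n (with its critical value k) such that the exact binomial
    test has size <= alpha under guessing and power >= 1 - beta at theta."""
    for n in range(1, max_n + 1):
        # smallest critical value k with P(X >= k | p = 0.5) <= alpha
        k = next((k for k in range(n + 1) if tail_prob(n, k, 0.5) <= alpha), None)
        if k is not None and tail_prob(n, k, theta) >= 1 - beta:
            return n, k
    return None

n, k = trials_needed()
print(f"need {n} trials; declare a difference at {k} or more correct")
```

[The training mode would supply the theta estimate; the testing mode would then commit to the suggested n in advance, which is exactly what keeps the sequential-testing problem out of the result.]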
> Corbett is suggesting a quantitative (and generally accepted) way of specifying alpha, beta, and theta to come up with an appropriate number of total trials.

Here we go again: trying to find out information that is practically irrelevant by jacking up the total number of trials.

Corbett, I'm really wondering how you expect anybody to take you seriously, given your slap-dash analysis of the PCABX web site. You obviously never looked at any of it, even for a few seconds. All you've ever seen of it is the URL, right?

> It's hard to take you seriously when PC-ABX doesn't even calculate the right p-values, and you've never bothered to make even this simple correction!

You've got me confused with someone who is interested in wasting anybody's time, including my own, by splitting hairs.

> I have looked at both PC-ABX and your website (obviously). The statistical concerns with PC-ABX are valid:
>
> 1. PC-ABX calculates inaccurate p-values.

So what? On the best day of their life, p-values are just a guide toward a larger goal that transcends mere statistics.

> 2. PC-ABX allows a mode in which sequential testing errors are not controlled.

So what? Anybody who thinks that a purely statistical approach can actually quantify all relevant sequential testing errors has missed many important points of experimental design.

> 3. PC-ABX does not suggest the number of trials to perform based on listener-specified type II error risk and effect size.

So what? Anybody who thinks that a purely statistical approach can actually quantify type II error risk has missed many important points of experimental design.
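[For reference on the p-value point: the exact significance of m correct answers in n ABX trials is the one-sided binomial tail at p = 0.5. The thread does not say how PC-ABX's figures were inaccurate; the sketch below simply compares the exact value against one common small-sample shortcut, a normal approximation, which here understates the p-value (i.e., overstates significance).]

```python
# Illustrative sketch: exact ABX p-value vs. a plain normal approximation.
from math import comb, erf, sqrt

def exact_p(m, n):
    """P(X >= m | n trials, p = 0.5): the exact one-sided p-value."""
    return sum(comb(n, k) for k in range(m, n + 1)) / 2 ** n

def normal_approx_p(m, n):
    """Normal approximation without continuity correction, for comparison."""
    z = (m - n / 2) / sqrt(n / 4)
    return 0.5 * (1 - erf(z / sqrt(2)))

print(exact_p(12, 16))          # 12/16 correct: ~0.038
print(normal_approx_p(12, 16))  # approximation says ~0.023
```

[At the small trial counts typical of listening tests, the two can disagree enough to flip a result across the 0.05 line, which is presumably why the exact tail is worth computing.]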