#1
Arny Krueger wrote:
"JBorg" wrote:

If our empirical senses are faulty,

That's a scientific fact, which is well known. If our senses were not faulty there would be no need for microscopes or telephones.

what would be an example of a properly controlled listening "test" which would circumvent this problem?

Please see www.pcabx.com for examples.

Please see Dr. Corbett's commentary about pcabx dated 11/27/04, RAO
Thread Title: Let's do some "scieenccece" in the Hive
http://tinyurl.com/4kkxa

**************

I had someone do an experiment for me using PCABX. As soon as I saw the data, I knew something was wrong, as the numbers from PCABX could not possibly be right. It took only a few minutes to find these errors in Arny's code:

    ...
    ptable(12, 1) = 1.642
    ptable(12, 2) = 0.2
    ptable(13, 1) = 2.072
    ptable(13, 2) = 0.25     (should be 0.15)
    ptable(14, 1) = 2.706
    ptable(14, 2) = 0.2      (should be 0.1)
    ptable(15, 1) = 3.17
    ptable(15, 2) = 0.075
    ...

I sent Arny an e-mail reporting this, but I never got a reply to that e-mail. (He had replied to other e-mail I had sent to that address before that.)

Those typos are only part of the problem with PCABX. I've been teaching college and university math classes for over thirty years, so my BS detector is well calibrated. But its meter pegs when I read what Arny says about scientific and technical issues involving mathematics, statistics, and design of experiments.

You know the feeling when you are in a store and you overhear the salesman unloading a pile of BS on an unsuspecting customer? It's pretty much the same whether it is Radio Shack, or Best Buy, or Lafayette Radio, or an audiophile salon, and Arny brings it to the Internet. When someone follows Arny's advice on statistical design or analysis, you know it is a double blind experiment---it's a case of the blind leading the blind.

(1) What Arny calls the "probability you were guessing" is apparently what the rest of the world calls a "p-value".
I wrote "apparently" because PCABX cannot even calculate those numbers correctly; even if he had the right numbers, Arny obviously does not understand what they mean.

In an ABX experiment, a p-value is calculated under the assumption that the subject is guessing. For instance, if a subject gets 14 correct in 16 trials, we say p = .002 because IF someone is guessing (with 50% chance of a correct answer on each trial) THEN the probability that he will get 14, or 15, or 16 correct in 16 trials is approximately .002.

Arny has this bass-ackwards. He claims that IF someone gets 14 correct THEN the probability is .002 that the person was guessing. Of course there is absolutely NO logical or scientific support for that---it is entirely a result of Arny's failure to comprehend what the calculations are about. The fact that Arny refers to a p-value as a probability that the test subject was guessing is a dead giveaway that he has no clue about how statistical science works.

(2) There are several reasons why PCABX reports bogus numbers for p-values: One reason is the typos I already mentioned. Another is the fact that Arny based his calculations on part of what David Carlstrom presented as the statistical basis for the original ABX comparator. Carlstrom mentioned two tests---one was based on a binomial distribution and a second was based on a chi-squared distribution.

The binomial approach leads to an exact solution for testing

    H_0: theta = .5  vs  H_1: theta > .5

where theta is the single-trial probability of a correct answer. Thus theta = .5 means the subject is guessing with the same chance of success as flipping a fair coin, and theta > .5 means he is doing better than that. That is an appropriate test if you want to see if a subject is doing *better* than chance would cause him to do.

But Carlstrom made an error when he proposed the other test. He described a chi-squared procedure that tests H_0: theta = .5 vs H_1: theta not equal to .5.
Now this compares chance behavior to *different-from-chance* performance. Since that includes theta < .5 as well as theta > .5, the numbers generated this way are off by a factor of two from what would be comparable to the binomial test. This is obvious to anyone with real statistical training, but not to someone who naively copied a formula out of a book and coded it into a computer program. Of course a competent statistician would know how to adapt that chi-squared procedure to the sort of test that Carlstrom described with his binomial plan.

Arny's PCABX uses the flawed chi-squared approach, so his calculations are biased; PCABX reports larger p-values (hence less-significant results) than it should. (That error is not quite as far off as a factor of two because there are other errors from approximating a discrete distribution by a continuous one; since they are in the opposite directions, the errors partially cancel.) To see this effect search Google Groups for the Usenet article with

    Subject: Statistics and PCABX (was weakest Link in the Chain)
    Newsgroups: rec.audio.high-end
    Date: 2004-01-13

(3) Yet another issue is that some of the numbers PCABX returns are not calculated by standard procedures at all. Although Arny claims that PCABX follows recognized scientific practice, the fact is that some of the numbers PCABX returns are pure fabrication. Maybe because Arny did not understand what a p-value is, or maybe because he did not realize that he based his calculations on an inappropriate method, PCABX reports p-values of 1 when the observed data show less than half the trials with correct answers. This is NOT a standard calculation based on techniques in any textbook I'm aware of. It also does not agree with the methods described in http://www.pcavtech.com/abx/abx_p9.htm which Arny cited earlier in this thread as an authoritative reference.
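The one-sided versus two-sided point in (2) can be checked numerically. What follows is only a rough sketch, not PCABX's actual code: it assumes the chi-squared procedure is the plain Pearson statistic on correct/incorrect counts with 1 degree of freedom and no continuity correction, and the function names are made up for illustration. It also prints the chi-squared tail areas for the four critical values quoted from the table in the post.

```python
from math import comb, erfc, sqrt

def binomial_upper_tail(correct, trials, theta=0.5):
    """P(X >= correct) for X ~ Binomial(trials, theta): exact one-sided p-value."""
    return sum(
        comb(trials, k) * theta**k * (1 - theta)**(trials - k)
        for k in range(correct, trials + 1)
    )

def chi2_upper_tail_1df(x):
    """P(chi-squared with 1 degree of freedom > x), i.e. P(|Z| > sqrt(x))."""
    return erfc(sqrt(x / 2.0))

def chi2_statistic(correct, trials):
    """Pearson statistic against a 50/50 expectation (assumed form, no correction)."""
    expected = trials / 2.0
    wrong = trials - correct
    return ((correct - expected) ** 2 + (wrong - expected) ** 2) / expected

correct, trials = 14, 16
exact = binomial_upper_tail(correct, trials)
two_sided = chi2_upper_tail_1df(chi2_statistic(correct, trials))

print(f"exact one-sided binomial p = {exact:.4f}")
print(f"two-sided chi-squared p    = {two_sided:.4f}")
print(f"halved chi-squared p       = {two_sided / 2:.4f}")

# Tail areas for the critical values listed in the quoted table.
for critical in (1.642, 2.072, 2.706, 3.17):
    print(f"P(chi2_1 > {critical}) = {chi2_upper_tail_1df(critical):.3f}")
```

For 14 correct in 16 trials the exact tail is 137/65536, about .002, which matches the figure used in the post. Under the stated assumptions the two-sided chi-squared value comes out larger than the exact one-sided value, and halving it overshoots in the other direction, which is the partial cancellation the post describes.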
If Arny has a specific citation of a reference showing how someone with pencil and paper (and perhaps a simple calculator) can duplicate the numbers PCABX comes up with, I'd like to see it.

So it's clear that the analysis side of PCABX is broken in many ways. It is also the case that the experimental design part has problems. Although much effort went into refining experimental technique, there appears to be very little awareness of the rest of experimental design. Arny's Ten Commandments^H^H^H^H^H^H^H^H^H^H^H^HRequirements are NOT sufficient to make a good listening experiment.

No matter how well you try, the reality is that if a test has only one trial, there is a 50% type I error risk. The ONLY way to reduce that is statistical---you need more trials. Once you do that, there is the issue of how many trials to do, and how many of those are needed to pass the test. PCABX suggests 14 correct in 16 trials, even though that is a really bad choice. If the effect being tested is small, say near threshold, then the 14/16 test will usually (80% of the time) _fail_ to detect a real effect. If the effect is large, then 16 trials is wasteful. A test with far fewer trials may be adequate then. There are plenty of designs that are better than 14/16, but it would be hard to find one that is worse.

Once again, Arny gets it backwards. He starts with 16 trials, then picks 14 (it used to be 12) as a passing score. Of course a rational design might start with specified levels for type I and type II error risks, and then determine a sample size to achieve that performance.

For a graduated collection of tests, such as would be the case if the links in the table near the bottom of http://www.pcabx.com/training/index.htm actually worked, we would need only a few trials for the easy samples but many more for the harder ones if we wanted comparable sensitivity of the tests.
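The design argument above can be sketched in a few lines. This is an illustration only: the post does not say which value of theta "near threshold" refers to, so theta = 0.75 is an assumption here (it happens to give a miss rate close to the 80% quoted), and the alpha and beta targets and function names are hypothetical.

```python
from math import comb

def upper_tail(cutoff, trials, theta):
    """P(X >= cutoff) for X ~ Binomial(trials, theta)."""
    return sum(
        comb(trials, k) * theta**k * (1 - theta)**(trials - k)
        for k in range(cutoff, trials + 1)
    )

def smallest_design(alpha, beta, theta1, max_trials=200):
    """Smallest (trials, cutoff) with type I risk <= alpha under guessing
    and type II risk <= beta when the true success rate is theta1."""
    for trials in range(1, max_trials + 1):
        for cutoff in range(trials + 1):
            if upper_tail(cutoff, trials, 0.5) <= alpha:
                # Lowest cutoff meeting alpha gives the best power for this size.
                if 1 - upper_tail(cutoff, trials, theta1) <= beta:
                    return trials, cutoff
                break
    return None

# A single trial: a guesser passes half the time.
print("type I risk, 1 trial:", upper_tail(1, 1, 0.5))

# The 14-of-16 rule, evaluated at the assumed theta = 0.75.
print("type I risk, 14/16:", round(upper_tail(14, 16, 0.5), 4))
print("type II risk, 14/16 at theta=0.75:", round(1 - upper_tail(14, 16, 0.75), 3))

# Start from the error risks instead and solve for the sample size.
print("design for alpha=0.05, beta=0.20:", smallest_design(0.05, 0.20, 0.75))
```

The search walks upward in sample size and, for each size, takes the lowest passing score that keeps the type I risk within alpha, which is the order of decisions the post recommends: fix the error risks first, then let them determine the number of trials.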
Using the same number of trials for different levels means that the tests do not have the same power (sensitivity); the result is that subjects will seem to have a threshold-style response even if their true response were a linear function of stimulus level. If the true response has a threshold then it is confounded with the test's power function, making interpretation of the results difficult.

This is analogous to measuring a decreasing signal with a meter. As the signal level drops, the meter needs to be adjusted to read on a lower range (more sensitive) scale. If that is not done, a naive user may "see" that below some point there is apparently no response when actually there is some response below the current meter range. Using a fixed size of 16 trials over a broad range of stimulus levels will cause that sort of error, yet that is precisely what PCABX says to do.

The statistical science in PCABX is Completely Ridiculous & Absolutely Preposterous, which we can abbreviate as CRAP.

Lest anyone get the wrong impression, I want to be clear that I am in favor of properly-done scientific tests. ABX and similar tests can be properly done, but merely using an ABX data collection plan is no guarantee of a worthwhile experiment. A worthwhile experiment requires competent statistical design and analysis along with good experimental technique. No part is sufficient---all these are necessary. No matter how good the other parts are, if the statistical aspects are bungled, the experiment is ruined.

Now I do not claim that good statistical practice is enough to make a successful experiment, but I do argue that failing to get the statistical stuff right is enough to botch the experiment. It is much the same as noting that neither level matching nor time-synchronizing nor blinding alone will make a good experiment, but missing any one can easily ruin an otherwise-okay experiment.

*************************************

End report.
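The meter analogy in the report can be made concrete by tabulating how often a fixed 14-of-16 rule is passed as the true single-trial success rate theta rises in even steps. This is a hedged sketch: the grid of theta values is arbitrary and the function name is invented for illustration.

```python
from math import comb

def pass_probability(theta, trials=16, cutoff=14):
    """Chance of scoring at least `cutoff` correct in `trials` independent trials."""
    return sum(
        comb(trials, k) * theta**k * (1 - theta)**(trials - k)
        for k in range(cutoff, trials + 1)
    )

# Evenly spaced true success rates, from pure guessing to perfect detection.
thetas = [0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
curve = [(theta, pass_probability(theta)) for theta in thetas]

for theta, p in curve:
    print(f"theta = {theta:.2f}   P(pass 14/16) = {p:.3f}")
```

Although theta climbs linearly across the grid, the pass rate stays near zero at the low end and rises steeply only at the high end, so a fixed 16-trial test can make a gradual response look like a threshold.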
#2
"JBorg" wrote in message
Arny Krueger wrote:
"JBorg" wrote:

If our empirical senses are faulty,

That's a scientific fact, which is well known. If our senses were not faulty there would be no need for microscopes or telephones.

what would be an example of a properly controlled listening "test" which would circumvent this problem?

Please see www.pcabx.com for examples.

Please see Dr. Corbett's commentary about pcabx dated 11/27/04, RAO

Good idea. It shows what happens if one becomes obsessed with details, and loses the ability to figuratively see the forest for the trees. I see that none of the RAO trolls are bright enough to see the rather gross flaws in Corbett's little study.

Let me also recommend the following: http://www.amazon.com/exec/obidos/AS...534735-0115334
#3
On Tue, 30 Nov 2004 10:23:05 -0500, "Arny Krueger" wrote:

"JBorg" wrote in message

Arny Krueger wrote:
"JBorg" wrote:

If our empirical senses are faulty,

That's a scientific fact, which is well known. If our senses were not faulty there would be no need for microscopes or telephones.

what would be an example of a properly controlled listening "test" which would circumvent this problem?

Please see www.pcabx.com for examples.

Please see Dr. Corbett's commentary about pcabx dated 11/27/04, RAO

Good idea. It shows what happens if one becomes obsessed with details, and loses the ability to figuratively see the forest for the trees. I see that none of the RAO trolls are bright enough to see the rather gross flaws in Corbett's little study.

Let me also recommend the following: http://www.amazon.com/exec/obidos/AS...534735-0115334

Thanks for the recommendation. Looks like a great read.

BTW, god lies in the details, right? Or are you saying the God is just lying?
#4
"dave weil" wrote in message
On Tue, 30 Nov 2004 10:23:05 -0500, "Arny Krueger" wrote:

"JBorg" wrote in message

Arny Krueger wrote:
"JBorg" wrote:

If our empirical senses are faulty,

That's a scientific fact, which is well known. If our senses were not faulty there would be no need for microscopes or telephones.

what would be an example of a properly controlled listening "test" which would circumvent this problem?

Please see www.pcabx.com for examples.

Please see Dr. Corbett's commentary about pcabx dated 11/27/04, RAO

Good idea. It shows what happens if one becomes obsessed with details, and loses the ability to figuratively see the forest for the trees. I see that none of the RAO trolls are bright enough to see the rather gross flaws in Corbett's little study.

Let me also recommend the following: http://www.amazon.com/exec/obidos/AS...534735-0115334

Thanks for the recommendation. Looks like a great read.

I admit it, I immediately saw you in its target audience, Weil. Enjoy!
#5
On Tue, 30 Nov 2004 12:29:22 -0500, "Arny Krueger" wrote:

"dave weil" wrote in message

On Tue, 30 Nov 2004 10:23:05 -0500, "Arny Krueger" wrote:

"JBorg" wrote in message

Arny Krueger wrote:
"JBorg" wrote:

If our empirical senses are faulty,

That's a scientific fact, which is well known. If our senses were not faulty there would be no need for microscopes or telephones.

what would be an example of a properly controlled listening "test" which would circumvent this problem?

Please see www.pcabx.com for examples.

Please see Dr. Corbett's commentary about pcabx dated 11/27/04, RAO

Good idea. It shows what happens if one becomes obsessed with details, and loses the ability to figuratively see the forest for the trees. I see that none of the RAO trolls are bright enough to see the rather gross flaws in Corbett's little study.

Let me also recommend the following: http://www.amazon.com/exec/obidos/AS...534735-0115334

Thanks for the recommendation. Looks like a great read.

I admit it, I immediately saw you in its target audience, Weil. Enjoy!

Thanks, I will. Nice deceptive editing, BTW.
#6
"dave weil" wrote in message ...

On Tue, 30 Nov 2004 10:23:05 -0500, "Arny Krueger" wrote:

"JBorg" wrote in message

Arny Krueger wrote:
"JBorg" wrote:

If our empirical senses are faulty,

That's a scientific fact, which is well known. If our senses were not faulty there would be no need for microscopes or telephones.

what would be an example of a properly controlled listening "test" which would circumvent this problem?

Please see www.pcabx.com for examples.

Please see Dr. Corbett's commentary about pcabx dated 11/27/04, RAO

Good idea. It shows what happens if one becomes obsessed with details, and loses the ability to figuratively see the forest for the trees. I see that none of the RAO trolls are bright enough to see the rather gross flaws in Corbett's little study.

Let me also recommend the following: http://www.amazon.com/exec/obidos/AS...534735-0115334

Thanks for the recommendation. Looks like a great read.

BTW, god lies in the details, right? Or are you saying the God is just lying?

Who lies more, God or Google?
#7
In article, "Arny Krueger" wrote:

Let me also recommend the following: http://www.amazon.com/exec/obidos/AS...534735-0115334

Mr. Krueger, having not addressed any of the specific points raised in my earlier post, now tries misdirection. The above link actually has nothing to do with the current discussion. I am not the author of that work.

But if you follow that link, you might as well search for "Arnold Krueger" on the Amazon site while you are there. Here's what you'll find (at http://www.amazon.com/exec/obidos/AS...976328-1126447 ). The blurb there says:

"Over twenty years, a lawyer, a photograther and an artist pair-off with a pair of students, a pair of Frenchmen, a pair of twins and a horny married guy from New Jersey, never imagining they are fodder for a fond friend's fiction. Milt has his lover out, Rod has his feelers out, Jean has his leathers out and Sam has his lenses out!"

Folks, I am not making this up! I'm not saying this is our Arny, but "photograther" kinda makes you wonder. ;-)