#161
Harry Lavo wrote:
"Keith Hughes" wrote in message ... snip And this is audio, Harry, not social science. When you do taste testing for foods, you are free to rely on *all* organoleptic perceptual components as that *is* the context of use. That's why this type of test is not suitable for verification of *audible* differences - it is not designed to control the organoleptic components that contribute to a "preference". There is a heavy social science side to audio that has been ignored.... *Only* if you accept that "preference" is what you're testing for, not relative to discrimination. for the interpretation of sound, particularly musical value judgements (i.e. is the bass "right", does the orchestra sound "lifelike") are subjective judgements that can only be reported by people....no different than people reporting whether they liked a certain food, or color, or flavor, or thought a certain imitation sour cream mix tasted "almost like the real thing". Testing that ignores this aspect of the audiophile experience is automatically suspect, and that is one of the reasons why the abx test is not embraced by most audiophiles. Harry, this is a totally circular exercise, as I'm sure you know already. Your precept here is that preferences can be formed without the ability to discriminate between presentations, which is *not* a given, nor is it accepted by most objectivists. No, *you* are talking about a narrow subject. You are also talking about taking a test developed for use in a narrow way and applying it for use in a much broader way. That is why the potential test set must also be broadened. Now, *We* are...audio. snip Your irony escapes me. Probably because it was a typo, not irony. "No, *We* are implied verb 'talk'...audio object". snip Thank you for making my point. Your test is a population distribution test, *not* a discrimination test. In the scenario I presented, the results were not real *for me*, they were real. From that point, we can discuss investigative ways for determining cause and extent. One need continue with a larger sample size *ONLY* if distribution within the population is of interest. Sorry, Keith, there are very sophisticated probability measurements to determine significance between two distributed populations, Non-sequitur. The point is that your test *IS* a distribution test. I know there are adequate methods to model test results, that does not however make the test suitable for discrimination. snip If so, it would also show up in a monadic test if the test sample is large enough (it only takes a few percent far off the centerline of the bell-curve to create significant differences) But.. If it were just a few percent off the centerline, there would be a very low significance, buried in the noise. Tuchy's would likely identify them as outliers. In measuring probabilities against a null hypothesis in distributed samples, if their are outliers there is a reason for them.... No, if there's an identifiable cause, they are valid data, not outliers. A matter of definition. that's one of the beauties of using a distributed population. Their is virtually no chance of a true outlier screwing up the results. We're not talking about outliers screwing up the results. We're talking about if "only a few percent", of whatever population size, detect a difference (or are "far off the centerline"), that will *Not* be statistically significant. And if the 'few percent' are not distributed normally, or homoscedastically, they will likely show up as outliers. 
snip

Yes, you're talking about differences that have not been demonstrated under *any* test scenario other than sighted.

First, how many published component tests are you citing to support this fact? Cite them please. And of those, how many have *not* been ABX tests? In other words, how do you determine that the differences not being found are not the result of the test technique and environment itself?

I truly have no idea what you're talking about here. *I'm* not citing data, I'm saying that there have been no results discussed here (re cables, s/s amps etc.) about audible component differences that were observed *except* those based on sighted tests.

snip

And thus, the belief in ability to discriminate is based solely on sighted evaluations, right?

Wrong. ABX tests are only one kind of blind test.

Fine. What types of blind tests have been performed that have positive results for distinguishing cables? As you said, please cite them. Otherwise, my statement stands. If there are no blind tests that have shown utility for discriminating between components (e.g. cables, s/s amps), then perforce the data can only be based on sighted evaluations.

To the best of my knowledge nobody in the audio industry has yet had the motivation or resources to undertake the kind of validation testing I have proposed. That is another kind. Simple AB preference tests, done blind, are a third. I can probably name another four or six variations on these.

Irrelevant. That these tests exist says nothing about what tests have been used, as you are surely aware.

Again, my argument is not with blind, other than its practicality for home use in the purchase of equipment.

And I am not advocating such. The tests I've done were for my own edification after "hearing" sighted differences between cables that my (inexpert) knowledge of solid state physics said were untenable. Yes, this *is* a hobby, and I like "audio jewelry" as much as the next person. It never ceases to amaze me that folks here get so defensive when anyone "accuses" them of having preferences based on personal biases (many not associated with sonic characteristics); as though that were inherently wrong.

My problem is with short-snippet, comparative testing, of which ABX is the leading example. And the same goes for most other opponents whose position I have run into here on usenet. Your suggestion that we oppose blind testing is a strawman that is often used on usenet to avoid engaging over the real issues raised against ABX and its ilk.

Again, you misconstrue. I merely remark that the only data in evidence here that contravenes the reported ABX data is based on sighted evaluation. A simple observation to which I ascribed no ulterior motive.

snip

Well, you clearly are interested Harry, because the *only* reason to question whether the boundaries of ABX testing can extend beyond where you believe it to be 'validated' is the presence of anecdotal evidence based on sighted evaluation. What other reasons are there (non-phenomenological, that is)?

I'm not saying I'm not interested in the question, I'm saying that for purposes of establishing a control test it is irrelevant.

Far more than that, the only evidence that intimates a control test is called for is from sighted listening. That's the point, and that's why I'm saying that sighted results are the *basis* for your complaint.

The control must be both perceived and "real" in the sense of being able to be measured with statistical significance in monadic testing.
Things that are perceived but are not real are totally irrelevant to the necessary control. The purpose of the control test is to see if ABX testing can pick up real differences that are not volume or frequency response related.

And the monadic test differentiates between perceived and "real" how, exactly? Especially relative to ill-defined criteria such as "musicality", "lifelike", etc.?

I don't want to measure "phantom" differences...first I want to establish that there is in fact a "real" difference blind that can be discriminated by a relatively large group of people. But I want to use a non-intrusive test to do it.

Well, first, you don't know that the test is "non-intrusive" to a greater extent than is ABX. You merely assume that it is. Second, you have *no* validation of your monadic testing for detecting *audible* differences. Further, you assume that audio and organoleptic perceptions are testable in the same fashion, and that the intrusiveness of any particular test constraint applies equally to both (these are implicit in your belief that the monadic test is suitable). Neither of which has been verified. Thus, you are using an unverified method as a reference against which to 'validate' a test that has been at least partially validated, by your own admission, for the sensory mode under test. An untenable approach IMO.

Absolutely, I don't know. But I do know test design, and there has been plenty of discussion about how people listen here in arguing these issues...even Arnie's 10 criteria deal with some of them. I can certainly say that the monadic test as proposed comes a lot closer than does ABX.

I think you'll find the first and last sentences above to be mutually exclusive.

First, it only has to be done once, so it can use full segments of music, establishing musical context and allowing time for differences to surface. Second, it does not require active comparison at all.

Well, again, this is a precept that has not been stipulated. There appear to be a number of us who do not accept that such evaluation differs from any other type of comparison. You must have a mental model against which to compare (for whatever parameter of interest), and to say A is a "5" versus a "3" requires a comparison.

It simply requires normal audiophile-type reactions to the music and the sound. Third, any "rating" is done after the listening is over, not during it, and is based on recall...recall that can take into account perceptions both acute and vague, as well as feeling-states.

Introducing a huge error source: memory. As has been discussed at length.

snip

No, Harry, it does not "separate the test from the individual" at all. That process is the same - you *assume* that multiple presentations are inherently more intrusive, and thus data-correlative, than is a single presentation, something I believe you have no data to support, relative to audio. There's a very valid reason that repetitive trials for organoleptic perception testing are problematic - the senses quickly become habituated, and discrimination ability is reduced. AFAIK, barring fatigue (or high volume related artifacts which should, of course, be controlled for in the test), this has not been shown to be an issue with auditory testing.

You must be kidding. This is one of the things most commented upon by people using the technique...how quickly they lose the ability to discriminate along with growing fatigue.

No, you must be speed-reading.
Did you totally miss "barring fatigue (or high volume related artifacts which should, of course, be controlled for in the test)"?

Even those using and supporting the test often report it as fatiguing and rather grueling, with a sense of great uncertainty developing in the late stages. The ITU guidelines even comment upon this aspect of the test, as one of the reasons for limiting the number of trials.

Hence the need to control for fatigue... Again, Harry, you want to use a test that has not shown *any* utility for audible testing as a reference to 'validate' one that has. Your call to "validate the test" applies in even greater measure to the test you propose as the reference.

Keith, I am trying along with others here to show why ABX should be viewed with some skepticism for the purpose of open-ended evaluation of audio components, until and unless it is validated. It is that simple.

I understand that. I, along with others here, am trying to point out that ABX has been validated for auditory testing (as you have admitted). The boundaries of that validation have been extended to discrimination of 'difference between components'. There have been no data of any sort presented (correct me if I'm wrong) here that would cast any doubt on the efficacy of ABX usage in that venue *except* for 'data' collected from sighted listening evaluations, known to have an immense propensity for error. *And*, using a completely unvalidated method (your monadic test) as a reference against which to validate ABX is unsupportable. Even were we to stipulate that ABX needs to be validated, it would require a reference that had, itself, been validated for the modality of interest.

As you say, it is that simple.

Keith Hughes
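[As a side note on the arithmetic behind the ABX runs discussed above: the test is scored as a discrimination task against 50% guessing, which is why the number of trials (and the fatigue that caps it) matters. A minimal sketch of that binomial calculation follows; the trial counts are illustrative, not taken from any particular guideline.]

# A minimal sketch of the ABX discrimination statistic: on each trial the
# listener matches X to A or B, and the null hypothesis is 50% guessing.
# The (correct, trials) pairs below are illustrative examples only.
from math import comb

def abx_p_value(correct, trials):
    """One-sided probability of scoring at least `correct` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for correct, trials in [(9, 16), (12, 16), (14, 20)]:
    print(f"{correct} of {trials} correct: p = {abx_p_value(correct, trials):.4f}")
# 12 of 16 gives p of roughly 0.04, which is why short runs demand a high hit
# rate before the result is taken as evidence of audibility.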
#163
"Steven Sullivan" wrote in message
...

Harry Lavo wrote:

We were speaking specifically of his latest round of loudspeaker tests, which Sean himself describes as "monadic".

Either provide a quote and citation, or admit you are making this up.

I visited Sean Olive at Harman last March. Yes, he does now do monadic testing, with one speaker presented at a time. The series of presentations does include a hidden reference of known quality. I believe he discussed this in his most recent AES paper, but I need to look it up when I am in the office to confirm.

The most recent paper in the database is the 2003 paper on preference in trained vs untrained listeners, which I have in front of me. The paper can be downloaded from the AES site, for a fee. I haven't seen the newest issue of the JAES, and I don't know how fast things get into the database, but I'll assume you meant the 2003 paper.

The 2003 article uses the Harman lab to do four-way (four-speaker) and three-way (three-speaker) tests. I quote:

//

The four-way test involved multiple comparisons among four speakers rated independently using four different programs. A test comprised four trials in the morning, repeated in the afternoon for a total of eight trials [for the three-way test there was only a morning test]. . . . All tests were double-blind using monophonic (single-speaker) comparisons.

Before each test, listeners were given their instructions and were free to ask questions about the test procedure. In both tests the program order was randomized. For each trial the control computer determined randomly the letter (A-D) assigned to each loudspeaker. Listeners were provided feedback through an LCD monitor that indicated the current loudspeaker being played. Switching between loudspeakers in each trial was performed in a random sequence by the experimenter. The music was paused during the 3-second interval required to substitute the positions of the speakers. [There follows a discussion of the possible effect of the silent interval] . . .

The presentation time for each loudspeaker was typically equal to the length of the program loop (15-30 seconds) and shortened to 10-15 seconds toward the end of each trial. Switching continued until all listeners had entered a rating for each loudspeaker, at which point the next trial would begin. A trial typically lasted 3-5 minutes, with an entire session typically lasting 15-20 minutes. For the four-way test listeners were told not to discuss their responses with one another until the end of the second session. All listeners were shown their results after the completion of the test. . . .

[from the instructions to listeners]:

In these instructions you will be judging the sound quality of different loudspeakers and rating them according to your personal preference. You MUST enter a rating for each loudspeaker in the appropriate box after the program selection has ended. Please enter your ratings using the following preference scale [scale graphic]. Your rating can contain up to one decimal place (e.g. 7.3, 2.5). DO NOT GIVE TIED SCORES IN ANY ROUND. If you do, the computer will ask you to reenter your ratings. You should separate your preference ratings among different speakers to reflect your relative preference between two speakers. Use the following guidelines: [guidelines graphic]

Finally, we encourage you to write comments about what you like and dislike about the sound of the speakers you are comparing: what aspects is it about the speaker that makes you prefer it (or not prefer it) over the other speaker(s)?
//

The four programs, btw, were works by James Taylor, Little Feat, Tracy Chapman, and Jennifer Warnes, "selected on the basis of their ability to reveal spectral and preferential differences between different loudspeakers in over 100 different listening tests and various listener training exercises."

From this I gather that the test proceeded as follows: listeners were given instructions, then a 'lazy susan' (behind an acoustically transparent screen) containing four loudspeakers rotated to bring the first speaker to the playing position. The first program is played until all listeners have rated the speaker on the preference scale. Then a different -- and the listeners know it is a different one, just not *which* one -- loudspeaker is brought into the playing slot (this takes 3 seconds) and it plays the same program. This is done for the remaining speakers. This constitutes one trial. Then the whole process is repeated, using the second musical selection from each of the four speakers. Four musical selections means four trials in total per test. (And repeated for the four-way speaker comparisons.)

I also gather that speaker *difference* is a given for Olive in these trials, particularly as he uses musical materials intended to make the contrast as clear as he can. This does not sound to me like one of Harry's monadic listening tests -- done at home, with long listening intervals, etc.

It is not. It is a preference test, only instead of being AB, it is ABCD. And it is not done under ideal conditions even for a preference test. It would have been better if the samples were longer and the length and rotation were under the control of an individual user/rater. Obviously practical time considerations entered in and Sean made a judgment that for preference he could get by with the techniques used. He may have been right, since differences in speakers are less subtle than in most audio gear...but he'll (and we'll) never know for sure what differences might have occurred with a more relaxed test constraint.

It is this test (I believe) that Sean told John was followed by true monadic descriptive testing, which correlated well. Or at least that is what I believe I remember John saying or writing (I searched Usenet for it to no avail, but it may have been at the Stereo show debate, or in writing in Stereophile). I have the 2003 paper on the way and hope to get copies of the earlier tests as well. My suspicion is that the work that Olive was doing in March 2005 has not been published yet.
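[For readers curious what analyzing the kind of ratings described above might look like, here is a hedged sketch (mine, not from Olive's paper): each listener rates every loudspeaker on the 0-10 preference scale, so one plausible summary is the per-speaker mean plus a repeated-measures comparison such as the Friedman test. The rating matrix is fabricated purely for illustration.]

# Illustrative analysis of monophonic preference ratings (invented numbers).
# Rows = listeners, columns = loudspeakers A-D, values on the 0-10 scale.
import numpy as np
from scipy.stats import friedmanchisquare

ratings = np.array([
    [7.3, 5.1, 6.2, 3.8],
    [6.8, 4.9, 6.5, 4.2],
    [7.0, 5.5, 5.9, 3.5],
    [6.5, 5.0, 6.8, 4.0],
    [7.4, 4.6, 6.1, 3.9],
])

print("mean preference per speaker:", ratings.mean(axis=0).round(2))

# Friedman test: a non-parametric repeated-measures check on whether the
# four speakers received systematically different ratings from the same panel.
stat, p = friedmanchisquare(*(ratings[:, j] for j in range(ratings.shape[1])))
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")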
#164
A model of the brain, & quick-switch
Harry Lavo wrote:

"Steven Sullivan" wrote in message

This does not sound to me like one of Harry's monadic listening tests -- done at home, with long listening intervals, etc.

It is not. It is a preference test, only instead of being AB, it is ABCD. And it is not done under ideal conditions even for a preference test. It would have been better if the samples were longer and the length and rotation were under the control of an individual user/rater. Obviously practical time considerations entered in and Sean made a judgment that for preference he could get by with the techniques used. He may have been right, since differences in speakers are less subtle than in most audio gear...but he'll (and we'll) never know for sure what differences might have occurred with a more relaxed test constraint.

It is this test (I believe) that Sean told John was followed by true monadic descriptive testing, which correlated well. Or at least that is what I believe I remember John saying or writing (I searched Usenet for it to no avail, but it may have been at the Stereo show debate, or in writing in Stereophile). I have the 2003 paper on the way and hope to get copies of the earlier tests as well. My suspicion is that the work that Olive was doing in March 2005 has not been published yet.

Hi Harry,

You know, the objectivists seem to be arguing that "professionals" don't use monadic listening tests. That seems to me just to prove the point that you and I are making: that there are dimensions of listening that have never been explored in an empirical way. Obviously "professionals" base their choices on practicality and what they can get paid to do. Clearly, Olive made decisions about how to run this test which are consistent with his vision of what matters about reproduction. But a different sort of test could easily produce different results.

I read in "Scientific American Mind" about recent research that shows people can make accurate snap judgments about other people's personalities. The article said this was a new idea, contradicting the messages that psychologists sent in the past few decades: "don't judge people at first appearance." So which is it? Can people make accurate snap judgments, or not? Well, if you look at the personality "model" they are using for these tests, it is five numbers. That is a vast simplification and rather superficial representation of the whole of a person's personality. The tests may prove merely that people can make snap judgments about *superficial* aspects of personality. However, it did say that this personality model is successful at gathering reliable and repeatable data.

Sound familiar? The objectivist model of audio doesn't have any unexplained "holes" and demands "reliable, repeatable" observations? Well, the personality model is still superficial, and even though scientists are learning repeatable facts about people, it doesn't mean they are learning anything relevant at all to living wisely, to getting along with your neighbors, etc. Likewise, I think the objectivist audio model is good at explaining how people hear in quick-switching situations, which is a superficial way of using the brain. It can explain some things--but that does