
Harry Lavo wrote:
"Keith Hughes" wrote in message
...


snip

And this is audio, Harry, not social science. When you do taste testing
for foods, you are free to rely on *all* organoleptic perceptual
components as that *is* the context of use. That's why this type of test
is not suitable for verification of *audible* differences - it is not
designed to control the organoleptic components that contribute to a
"preference".

There is a heavy social science side to audio that has been ignored....


*Only* if you accept that "preference" is what you're testing for, not
relative to discrimination.

for
the interpretation of sound, particularly musical value judgements (e.g. is
the bass "right", does the orchestra sound "lifelike") are subjective
judgements that can only be reported by people....no different than people
reporting whether they liked a certain food, or color, or flavor, or thought
a certain imitation sour cream mix tasted "almost like the real thing".
Testing that ignores this aspect of the audiophile experience is
automatically suspect, and that is one of the reasons why the ABX test is
not embraced by most audiophiles.


Harry, this is a totally circular exercise, as I'm sure you know
already. Your precept here is that preferences can be formed without
the ability to discriminate between presentations, which is *not* a
given, nor is it accepted by most objectivists.


No, *you* are talking about a narrow subject. You are also talking about
taking a test developed for use in a narrow way and applying it for use
in a much broader way. That is why the potential test set must also be
broadened.


Now, *We* are...audio.

snip



Your irony escapes me.


Probably because it was a typo, not irony: "No, *We* are [implied verb: 'talking about'] ... audio [the object]."

snip

Thank you for making my point. Your test is a population distribution
test, *not* a discrimination test. In the scenario I presented, the
results were not real just *for me*; they were real. From that point, we can
discuss investigative ways for determining cause and extent. One need
continue with a larger sample size *ONLY* if distribution within the
population is of interest.
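
To make the point concrete, here is a minimal sketch (Python, with purely
hypothetical numbers of my own) of what a discrimination test looks like: one
listener's forced-choice ABX run, scored against chance with an exact binomial
test. No population sample enters into it.

# A minimal sketch (hypothetical numbers, purely illustrative): one
# listener's ABX run, tested for discrimination against chance guessing.
from math import comb

def exact_binomial_p(correct, trials, p_chance=0.5):
    # One-sided exact p-value: the probability of doing this well, or
    # better, by guessing alone.
    return sum(comb(trials, k) * p_chance**k * (1 - p_chance)**(trials - k)
               for k in range(correct, trials + 1))

print(exact_binomial_p(14, 16))  # ~0.002; chance guessing is very unlikely

Whether that ability is common in the population is a separate, distributional
question.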


Sorry, Keith, there are very sophisticated probability measurements to
determine significance between two distributed populations,


Non-sequitur. The point is that your test *IS* a distribution test. I
know there are adequate methods to model test results; that does not,
however, make the test suitable for discrimination.

snip

If so, it would also show up in a monadic test if the test sample is
large enough (it only takes a few percent far off the centerline of the
bell-curve to create significant differences). But...


If it were just a few percent off the centerline, there would be a very
low significance, buried in the noise. Tukey's test would likely identify them
as outliers.
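
For the record, a minimal sketch of what I mean (my assumption: "Tukey's" read
as Tukey's fences, the 1.5 x IQR rule; the ratings below are invented for
illustration):

# A minimal sketch (assumption: Tukey's fences, i.e. the 1.5 x IQR rule;
# the ratings are invented for illustration).
import numpy as np

def tukey_outliers(values, k=1.5):
    # Flag anything outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

ratings = [5, 5, 4, 6, 5, 5, 4, 6, 5, 5, 9, 1]   # most listeners cluster near 5
print(tukey_outliers(ratings))                   # the far-off-center scores: [9, 1]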



In measuring probabilities against a null hypothesis in distributed samples,
if there are outliers, there is a reason for them....


No, if there's an identifiable cause, they are valid data, not outliers.
A matter of definition.

that's one of the
beauties of using a distributed population. There is virtually no chance of
a true outlier screwing up the results.


We're not talking about outliers screwing up the results. We're talking
about the fact that if "only a few percent" of whatever population size detect
a difference (or are "far off the centerline"), the result will *not* be
statistically significant. And if the 'few percent' are not distributed
normally, or homoscedastically, they will likely show up as outliers.
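
A quick, purely hypothetical simulation of that point (Python; the sample size
and the "few percent" figure are mine, not anyone's data):

# A minimal sketch (hypothetical numbers): if only ~4% of listeners in a
# monadic rating test actually perceive a difference, the shift in group
# means is tiny and a standard two-sample test rarely reaches significance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100                                # listeners per cell (assumed)
ratings_a = rng.normal(5.0, 1.0, n)    # ratings for component A
ratings_b = rng.normal(5.0, 1.0, n)    # ratings for component B, mostly identical
ratings_b[:4] += 2.0                   # only 4 of 100 listeners actually rate B higher

t_stat, p_value = stats.ttest_ind(ratings_a, ratings_b)
print(round(p_value, 3))               # typically well above 0.05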

snip

Yes, you're talking about differences that have not been demonstrated
under *any* test scenario other than sighted.



First, how many published component tests are you citing to support this
fact? Cite them please. And of those, how many have *not* been ABX tests?
In other words, how do you determine that the differences not being found
are not the result of the test technique and environment itself?


I truly have no idea what you're talking about here. *I'm* not citing
data, I'm saying that there have been no results discussed here (re
cables, s/s amps etc.) about audible component differences that were
observed *Except* those based on sighted tests.

snip

And thus, the belief in ability to discriminate is based solely on sighted
evaluations, right?



Wrong. ABX tests are only one kind of blind test.


Fine. What types of blind tests have been performed that have positive
results for distinguishing cables? As you said, please cite them.

Otherwise, my statement stands. If there are no blind tests that have
shown utility for discriminating between components (e.g. cables, s/s
amps), then perforce the data can only be based on sighted evaluations.


To the best of my
knowledge nobody in the audio industry has yet had the motivation or
resources to undertake the kind of validation testing I have proposed. That
is another kind. Simple AB preference tests, done blind, are a third. I
can probably name another four or six variations on these.


Irrelevant. That these tests exist says nothing about what tests have
been used, as you are surely aware.

Again, my argument is not with blind, other than its practicality for home
use in the purchase of equipment.


And I am not advocating such. The tests I've done were for my own
edification after "hearing" sighted differences between cables that my
(inexpert) knowledge of solid state physics said were untenable. Yes,
this *is* a hobby, and I like "audio jewelry" as much as the next
person. It never ceases to amaze me that folks here get so defensive
when anyone "accuses" them of having preferences based on personal
biases (many not associated with sonic characteristics), as though that
were inherently wrong.

My problem is with short-snippet,
comparative testing, of which ABX is the leading example. And the same goes
for most other opponents whose position I have run into here on usenet.
Your suggestion that we oppose blind testing is a strawman that is often
used on usenet to avoid engaging over the real issues raised against ABX and
its ilk.


Again, you misconstrue. I merely remark that the only data in evidence
here that contravenes the reported ABX data is based on sighted
evaluation. A simple observation to which I ascribed no ulterior motive.


snip

Well, you clearly are interested Harry, because the *only* reason to
question whether the boundaries of ABX testing can extend beyond where you
believe it to be 'validated' is the presence of anecdotal evidence based
on sighted evaluation. What other reasons are there (non-phenomenological
that is)?



I'm not saying I'm not interested in the question, I'm saying that for
purposes of establishing a control test, it is irrelevant.


Far more than that, the only evidence that intimates a control test is
called for is from sighted listening. That's the point, and that's why
I'm saying that sighted results are the *basis* for your complaint.

The control must
be both perceived and "real" in the sense of being able to be measured with
statistical significance in monadic testing. Things that are perceived but
are not real are totally irrelevant to the necessary control. The purpose
of the control test is to see if ABX testing can pick up real differences
that are not volume or frequency response related.


And the monadic test differentiates between perceived and "real" how
exactly? Especially relative to ill-defined criteria such as
"musicality", "lifelike", etc. ?


I don't want to measure "phantom" differences...first I want to establish
that there is in fact a "real" difference blind that can be discriminated
by a relatively large group of people. But I want to use a non-intrusive
test to do it.


Well, first, you don't know that the test is "non-intrusive" to a greater
extent than is ABX. You merely assume that it is. Second, you have *no*
validation of your monadic testing for detecting *audible* differences.
Further, you assume that audio and organoleptic perceptions are testable
in the same fashion, and that the intrusiveness of any particular test
constraint applies equally to both (these are implicit in your belief that
the monadic test is suitable). Neither assumption has been verified. Thus,
you are using an unverified method as a reference against which to
'validate' a test that has been at least partially validated, by your own
admission, for the sensory mode under test. An untenable approach IMO.



Absolutely, I don't know. But I do know test design, and there has been
plenty of discussion about how people listen here in arguing these
issues...even Arnie's 10 criteria deal with some of them. I can certainly
say that the monadic test as proposed comes a lot closer than does ABX.


I think you'll find the first and last sentences above to be mutually
exclusive.

First, it only has to be done once, so it can use full segments of music,
establishing musical context and allowing time for differences to surface.
Second, it does not require active comparison at all.


Well, again, this is a precept that has not been stipulated. There
appear to be a number of us who do not accept that such evaluation
differs from any other type of comparison. You must have a mental model
against which to compare (for whatever parameter of interest), and to
say A is a "5" versus a "3" requires a comparison.

It simply requires
normal audiophile-type reactions to the music and the sound. Third, any
"rating" is done after the listening is over, not during it, and is based on
recall...recall that can take into account perceptions both acute and vague,
as well as feeling-states.


Introducing a huge error source...memory. As has been discussed at length.

snip

No, Harry, it does not "separate the test from the individual" at all.
That process is the same - you *assume* that multiple presentations are
inherently more intrusive, and thus data-correlative, than is a single
presentation, something I believe you have no data to support, relative to
audio. There's a very valid reason that repetitive trials for
organoleptic perception testing are problematic - the senses quickly
become habituated, and discrimination ability is reduced. AFAIK, barring
fatigue (or high volume related artifacts which should, of course, be
controlled for in the test), this has not been shown to be an issue with
auditory testing.



You must be kidding. This is one of the things most commented upon by
people using the technique...how quickly they lose the ability to
discriminate along with growing fatigue.


No, you must be speed-reading. Did you totally miss "barring
fatigue (or high volume related artifacts which should, of course, be
controlled for in the test)"?

Even those using and supporting
the test often report it as fatiguing and rather grueling with a sense
developing in the late stages of great uncertainty. The ITU guidelines even
comment upon this aspect of the test, as one of the reasons for limiting the
number of trials.


Hence the need to control for fatigue...

Again, Harry, you want to use a test that has not shown *any* utility for
audible testing as a reference to 'validate' one that has. Your call to
"validate the test" applies to an even greater measure to the test you
propose as the reference.



Keith, I am trying along with others here to show why ABX should be viewed
with some skepticism for the purpose of open-ended evaluation of audio
components, until and unless it is validated. It is that simple.


I understand that. I, along with others here, am trying to point out
that ABX has been validated for auditory testing (as you have admitted).
The boundaries of that validation have been extended to discrimination
of 'difference between components'. There have been no data of any sort
presented here (correct me if I'm wrong) that would cast any doubt on
the efficacy of ABX usage in that venue *except* for 'data' collected
from sighted listening evaluations, known to have an immense propensity
for error. *And*, that using a completely unvalidated method (your
monadic test) as a reference against which to validate ABX is
unsupportable. Even were we to stipulate that ABX needs to be validated, it
would require a reference that had, itself, been validated for the
modality of interest. As you say, it is that simple.

Keith Hughes