#161
Harry Lavo wrote:
"Keith Hughes" wrote in message ... snip And this is audio, Harry, not social science. When you do taste testing for foods, you are free to rely on *all* organoleptic perceptual components as that *is* the context of use. That's why this type of test is not suitable for verification of *audible* differences - it is not designed to control the organoleptic components that contribute to a "preference". There is a heavy social science side to audio that has been ignored.... *Only* if you accept that "preference" is what you're testing for, not relative to discrimination. for the interpretation of sound, particularly musical value judgements (i.e. is the bass "right", does the orchestra sound "lifelike") are subjective judgements that can only be reported by people....no different than people reporting whether they liked a certain food, or color, or flavor, or thought a certain imitation sour cream mix tasted "almost like the real thing". Testing that ignores this aspect of the audiophile experience is automatically suspect, and that is one of the reasons why the abx test is not embraced by most audiophiles. Harry, this is a totally circular exercise, as I'm sure you know already. Your precept here is that preferences can be formed without the ability to discriminate between presentations, which is *not* a given, nor is it accepted by most objectivists. No, *you* are talking about a narrow subject. You are also talking about taking a test developed for use in a narrow way and applying it for use in a much broader way. That is why the potential test set must also be broadened. Now, *We* are...audio. snip Your irony escapes me. Probably because it was a typo, not irony. "No, *We* are implied verb 'talk'...audio object". snip Thank you for making my point. Your test is a population distribution test, *not* a discrimination test. In the scenario I presented, the results were not real *for me*, they were real. From that point, we can discuss investigative ways for determining cause and extent. One need continue with a larger sample size *ONLY* if distribution within the population is of interest. Sorry, Keith, there are very sophisticated probability measurements to determine significance between two distributed populations, Non-sequitur. The point is that your test *IS* a distribution test. I know there are adequate methods to model test results, that does not however make the test suitable for discrimination. snip If so, it would also show up in a monadic test if the test sample is large enough (it only takes a few percent far off the centerline of the bell-curve to create significant differences) But.. If it were just a few percent off the centerline, there would be a very low significance, buried in the noise. Tuchy's would likely identify them as outliers. In measuring probabilities against a null hypothesis in distributed samples, if their are outliers there is a reason for them.... No, if there's an identifiable cause, they are valid data, not outliers. A matter of definition. that's one of the beauties of using a distributed population. Their is virtually no chance of a true outlier screwing up the results. We're not talking about outliers screwing up the results. We're talking about if "only a few percent", of whatever population size, detect a difference (or are "far off the centerline"), that will *Not* be statistically significant. And if the 'few percent' are not distributed normally, or homoscedastically, they will likely show up as outliers. 
snip

Yes, you're talking about differences that have not been demonstrated under *any* test scenario other than sighted.

First, how many published component tests are you citing to support this fact? Cite them please. And of those, how many have *not* been ABX tests? In other words, how do you determine that the differences not being found are not the result of the test technique and environment itself?

I truly have no idea what you're talking about here. *I'm* not citing data, I'm saying that there have been no results discussed here (re cables, s/s amps etc.) about audible component differences that were observed *except* those based on sighted tests.

snip

And thus, the belief in ability to discriminate is based solely on sighted evaluations, right?

Wrong. ABX tests are only one kind of blind test.

Fine. What types of blind tests have been performed that have positive results for distinguishing cables? As you said, please cite them. Otherwise, my statement stands. If there are no blind tests that have shown utility for discriminating between components (e.g. cables, s/s amps), then perforce the data can only be based on sighted evaluations.

To the best of my knowledge nobody in the audio industry has yet had the motivation or resources to undertake the kind of validation testing I have proposed. That is another kind. Simple AB preference tests, done blind, are a third. I can probably name another four or six variations on these.

Irrelevant. That these tests exist says nothing about what tests have been used, as you are surely aware.

Again, my argument is not with blind, other than its practicality for home use in the purchase of equipment.

And I am not advocating such. The tests I've done were for my own edification after "hearing" sighted differences between cables that my (inexpert) knowledge of solid state physics said were untenable. Yes, this *is* a hobby, and I like "audio jewelry" as much as the next person. It never ceases to amaze me that folks here get so defensive when anyone "accuses" them of having preferences based on personal biases (many not associated with sonic characteristics); as though that were inherently wrong.

My problem is with short-snippet, comparative testing, of which ABX is the leading example. And the same goes for most other opponents whose position I have run into here on usenet. Your suggestion that we oppose blind testing is a strawman that is often used on usenet to avoid engaging over the real issues raised against ABX and its ilk.

Again, you misconstrue. I merely remark that the only data in evidence here that contravenes the reported ABX data is based on sighted evaluation. A simple observation to which I ascribed no ulterior motive.

snip

Well, you clearly are interested Harry, because the *only* reason to question whether the boundaries of ABX testing can extend beyond where you believe it to be 'validated' is the presence of anecdotal evidence based on sighted evaluation. What other reasons are there (non-phenomenological, that is)?

I'm not saying I'm not interested in the question, I'm saying that for purposes of establishing a control test it is irrelevant.

Far more than that, the only evidence that intimates a control test is called for is from sighted listening. That's the point, and that's why I'm saying that sighted results are the *basis* for your complaint.

The control must be both perceived and "real" in the sense of being able to be measured with statistical significance in monadic testing.
Things that are perceived but are not real are totally irrelevant to the necessary control. The purpose of the control test is to see if ABX testing can pick up real differences that are not volume or frequency response related.

And the monadic test differentiates between perceived and "real" how, exactly? Especially relative to ill-defined criteria such as "musicality", "lifelike", etc.?

I don't want to measure "phantom" differences...first I want to establish that there is in fact a "real" difference blind that can be discriminated by a relatively large group of people. But I want to use a non-intrusive test to do it.

Well, first, you don't know that the test is "non-intrusive" to a greater extent than is ABX. You merely assume that it is. Second, you have *no* validation of your monadic testing for detecting *audible* differences. Further, you assume that audio and organoleptic perceptions are testable in the same fashion, and that the intrusiveness of any particular test constraint applies equally to both (these are implicit in your belief that the monadic test is suitable). Neither of which has been verified. Thus, you are using an unverified method as a reference against which to 'validate' a test that has been at least partially validated, by your own admission, for the sensory mode under test. An untenable approach IMO.

Absolutely, I don't know. But I do know test design, and there has been plenty of discussion about how people listen here in arguing these issues...even Arnie's 10 criteria deal with some of them. I can certainly say that the monadic test as proposed comes a lot closer than does ABX.

I think you'll find the first and last sentences above to be mutually exclusive.

First, it only has to be done once, so it can use full segments of music, establishing musical context and allowing time for differences to surface. Second, it does not require active comparison at all.

Well, again, this is a precept that has not been stipulated. There appear to be a number of us who do not accept that such evaluation differs from any other type of comparison. You must have a mental model against which to compare (for whatever parameter of interest), and to say A is a "5" versus a "3" requires a comparison.

It simply requires normal audiophile-type reactions to the music and the sound. Third, any "rating" is done after the listening is over, not during it, and is based on recall...recall that can take into account perceptions both acute and vague, as well as feeling-states.

Introducing a huge error source: memory. As has been discussed at length.

snip

No, Harry, it does not "separate the test from the individual" at all. That process is the same - you *assume* that multiple presentations are inherently more intrusive, and thus data-correlative, than is a single presentation, something I believe you have no data to support, relative to audio. There's a very valid reason that repetitive trials for organoleptic perception testing are problematic - the senses quickly become habituated, and discrimination ability is reduced. AFAIK, barring fatigue (or high volume related artifacts which should, of course, be controlled for in the test), this has not been shown to be an issue with auditory testing.

You must be kidding. This is one of the things most commented upon by people using the technique...how quickly they lose the ability to discriminate along with growing fatigue.

No, you must be speed-reading.
Did you totally miss "barring fatigue (or high volume related artifacts which should, of course, be controlled for in the test)"?

Even those using and supporting the test often report it as fatiguing and rather grueling, with a sense of great uncertainty developing in the late stages. The ITU guidelines even comment upon this aspect of the test, as one of the reasons for limiting the number of trials.

Hence the need to control for fatigue... Again, Harry, you want to use a test that has not shown *any* utility for audible testing as a reference to 'validate' one that has. Your call to "validate the test" applies in even greater measure to the test you propose as the reference.

Keith, I am trying along with others here to show why ABX should be viewed with some skepticism for the purpose of open-ended evaluation of audio components, until and unless it is validated. It is that simple.

I understand that. I, along with others here, am trying to point out that ABX has been validated for auditory testing (as you have admitted). The boundaries of that validation have been extended to discrimination of 'difference between components'. There have been no data of any sort presented (correct me if I'm wrong) here that would cast any doubt on the efficacy of ABX usage in that venue *except* for 'data' collected from sighted listening evaluations, known to have an immense propensity for error. *And*, using a completely unvalidated method (your monadic test) as a reference against which to validate ABX is unsupportable. Even were we to stipulate that ABX needs to be validated, it would require a reference that had, itself, been validated for the modality of interest.

As you say, it is that simple.

Keith Hughes
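[As a side note on the arithmetic behind the ABX runs discussed above: the test is scored as a discrimination task against 50% guessing, which is why the number of trials (and the fatigue that caps it) matters. A minimal sketch of that binomial calculation follows; the trial counts are illustrative, not taken from any particular guideline.]

# A minimal sketch of the ABX discrimination statistic: on each trial the
# listener matches X to A or B, and the null hypothesis is 50% guessing.
# The (correct, trials) pairs below are illustrative examples only.
from math import comb

def abx_p_value(correct, trials):
    """One-sided probability of scoring at least `correct` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for correct, trials in [(9, 16), (12, 16), (14, 20)]:
    print(f"{correct} of {trials} correct: p = {abx_p_value(correct, trials):.4f}")
# 12 of 16 gives p of roughly 0.04, which is why short runs demand a high hit
# rate before the result is taken as evidence of audibility.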
#163
"Steven Sullivan" wrote in message
...

Harry Lavo wrote:

We were speaking specifically of his latest round of loudspeaker tests, which Sean himself describes as "monadic".

Either provide a quote and citation, or admit you are making this up.

I visited Sean Olive at Harman last March. Yes, he does now do monadic testing, with one speaker presented at a time. The series of presentations does include a hidden reference of known quality. I believe he discussed this in his most recent AES paper, but I need to look it up when I am in the office to confirm.

The most recent paper in the database is the 2003 paper on preference in trained vs untrained listeners, which I have in front of me. The paper can be downloaded from the AES site, for a fee. I haven't seen the newest issue of the JAES, and I don't know how fast things get into the database, but I'll assume you meant the 2003 paper.

The 2003 article uses the Harman lab to do four-way (four-speaker) and three-way (three-speaker) tests. I quote:

//

The four-way test involved multiple comparisons among four speakers rated independently using four different programs. A test comprised four trials in the morning, repeated in the afternoon for a total of eight trials [for the three-way test there was only a morning test]. . . . All tests were double-blind using monophonic (single-speaker) comparisons.

Before each test, listeners were given their instructions and were free to ask questions about the test procedure. In both tests the program order was randomized. For each trial the control computer determined randomly the letter (A-D) assigned to each loudspeaker. Listeners were provided feedback through an LCD monitor that indicated the current loudspeaker being played. Switching between loudspeakers in each trial was performed in a random sequence by the experimenter. The music was paused during the 3-second interval required to substitute the positions of the speakers. [There follows a discussion of the possible effect of the silent interval] . . .

The presentation time for each loudspeaker was typically equal to the length of the program loop (15-30 seconds) and shortened to 10-15 seconds toward the end of each trial. Switching continued until all listeners had entered a rating for each loudspeaker, at which point the next trial would begin. A trial typically lasted 3-5 minutes, with an entire session typically lasting 15-20 minutes. For the four-way test listeners were told not to discuss their responses with one another until the end of the second session. All listeners were shown their results after the completion of the test. . . .

[from the instructions to listeners]:

In these instructions you will be judging the sound quality of different loudspeakers and rating them according to your personal preference. You MUST enter a rating for each loudspeaker in the appropriate box after the program selection has ended. Please enter your ratings using the following preference scale [scale graphic]. Your rating can contain up to one decimal place (e.g. 7.3, 2.5). DO NOT GIVE TIED SCORES IN ANY ROUND. If you do, the computer will ask you to reenter your ratings. You should separate your preference ratings among different speakers to reflect your relative preference between two speakers. Use the following guidelines: [guidelines graphic]

Finally, we encourage you to write comments about what you like and dislike about the sound of the speakers you are comparing: what aspects is it about the speaker that makes you prefer it (or not prefer it) over the other speaker(s)?
//

The four programs, btw, were works by James Taylor, Little Feat, Tracy Chapman, and Jennifer Warnes, "selected on the basis of their ability to reveal spectral and preferential differences between different loudspeakers in over 100 different listening tests and various listener training exercises."

From this I gather that the test proceeded as follows: listeners were given instructions, then a 'lazy susan' (behind an acoustically transparent screen) containing four loudspeakers rotated to bring the first speaker to the playing position. The first program is played until all listeners have rated the speaker on the preference scale. Then a different -- and the listeners know it is a different one, just not *which* one -- loudspeaker is brought into the playing slot (this takes 3 seconds) and it plays the same program. This is done for the remaining speakers. This constitutes one trial. Then the whole process is repeated, using the second musical selection from each of the four speakers. Four musical selections means four trials in total per test. (And repeated for the four-way speaker comparisons.)

I also gather that speaker *difference* is a given for Olive in these trials, particularly as he uses musical materials intended to make the contrast as clear as he can. This does not sound to me like one of Harry's monadic listening tests -- done at home, with long listening intervals, etc.

It is not. It is a preference test, only instead of being AB, it is ABCD. And it is not done under ideal conditions even for a preference test. It would have been better if the samples were longer and the length and rotation were under the control of an individual user/rater. Obviously practical time considerations entered in and Sean made a judgment that for preference he could get by with the techniques used. He may have been right, since differences in speakers are less subtle than in most audio gear...but he'll (and we'll) never know for sure what differences might have occurred with a more relaxed test constraint.

It is this test (I believe) that Sean told John was followed by true monadic descriptive testing, which correlated well. Or at least that is what I believe I remember John saying or writing (I searched Usenet for it to no avail, but it may have been at the Stereo show debate, or in writing in Stereophile). I have the 2003 paper on the way and hope to get copies of the earlier tests as well. My suspicion is that the work that Olive was doing in March 2005 has not been published yet.
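[For readers curious what analyzing the kind of ratings described above might look like, here is a hedged sketch (mine, not from Olive's paper): each listener rates every loudspeaker on the 0-10 preference scale, so one plausible summary is the per-speaker mean plus a repeated-measures comparison such as the Friedman test. The rating matrix is fabricated purely for illustration.]

# Illustrative analysis of monophonic preference ratings (invented numbers).
# Rows = listeners, columns = loudspeakers A-D, values on the 0-10 scale.
import numpy as np
from scipy.stats import friedmanchisquare

ratings = np.array([
    [7.3, 5.1, 6.2, 3.8],
    [6.8, 4.9, 6.5, 4.2],
    [7.0, 5.5, 5.9, 3.5],
    [6.5, 5.0, 6.8, 4.0],
    [7.4, 4.6, 6.1, 3.9],
])

print("mean preference per speaker:", ratings.mean(axis=0).round(2))

# Friedman test: a non-parametric repeated-measures check on whether the
# four speakers received systematically different ratings from the same panel.
stat, p = friedmanchisquare(*(ratings[:, j] for j in range(ratings.shape[1])))
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")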
#164
A model of the brain, & quick-switch
Harry Lavo wrote:

"Steven Sullivan" wrote in message

This does not sound to me like one of Harry's monadic listening tests -- done at home, with long listening intervals, etc.

It is not. It is a preference test, only instead of being AB, it is ABCD. And it is not done under ideal conditions even for a preference test. It would have been better if the samples were longer and the length and rotation were under the control of an individual user/rater. Obviously practical time considerations entered in and Sean made a judgment that for preference he could get by with the techniques used. He may have been right, since differences in speakers are less subtle than in most audio gear...but he'll (and we'll) never know for sure what differences might have occurred with a more relaxed test constraint.

It is this test (I believe) that Sean told John was followed by true monadic descriptive testing, which correlated well. Or at least that is what I believe I remember John saying or writing (I searched Usenet for it to no avail, but it may have been at the Stereo show debate, or in writing in Stereophile). I have the 2003 paper on the way and hope to get copies of the earlier tests as well. My suspicion is that the work that Olive was doing in March 2005 has not been published yet.

Hi Harry,

You know, the objectivists seem to be arguing that "professionals" don't use monadic listening tests. That seems to me just to prove the point that you and I are making: that there are dimensions of listening that have never been explored in an empirical way. Obviously "professionals" base their choices on practicality and what they can get paid to do. Clearly, Olive made decisions about how to run this test which are consistent with his vision of what matters about reproduction. But a different sort of test could easily produce different results.

I read in "Scientific American Mind" about recent research that shows people can make accurate snap judgments about other people's personalities. The article said this was a new idea, contradicting the messages that psychologists sent in the past few decades: "don't judge people at first appearance." So which is it? Can people make accurate snap judgments, or not? Well, if you look at the personality "model" they are using for these tests, it is five numbers. That is a vast simplification and rather superficial representation of the whole of a person's personality. The tests may prove merely that people can make snap judgments about *superficial* aspects of personality. However, it did say that this personality model is successful at gathering reliable and repeatable data.

Sound familiar? The objectivist model of audio doesn't have any unexplained "holes" and demands "reliable, repeatable" observations? Well, the personality model is still superficial, and even though scientists are learning repeatable facts about people, it doesn't mean they are learning anything relevant at all to living wisely, to getting along with your neighbors, etc. Likewise, I think the objectivist audio model is good at explaining how people hear in quick-switching situations, which is a superficial way of using the brain. It can explain some things--but that does