#81
Mark DeBellis wrote:
> The question that interests me now is whether the implications of an identification ("Was that SACD or CD?") test need be the same as those of a discrimination ("Are A and B the same or different?") test. Does the research show, in particular, that an identification test (the kind I undertook) is among the kinds of tests that are reliable for determining whether two sources sound different?

If by "identification test" you mean that you listen to a single signal and decide whether it is CD or SACD, that is extremely difficult, because you must remember what both SACD and CD sound like. (And, as I've said before, our aural memory for such small sonic differences is far too short to do that.) In a proper same-different test (or an ABX test, which is a variant), by contrast, you have both signals available to you at all times and can switch immediately between them, which allows you to compare directly. Unfortunately, most home users cannot do that sort of test, because it requires you not only to level-match the two sources (relatively easy) but also to time-sync them (very hard). It would be too easy to tell that the two were different if one were running even fractionally ahead of the other.

bob
#82
Harry Lavo wrote:
> We aren't looking to determine differences, Bob.

You're the one who started this whole conversation by insisting that an ABX test was inadequate. Well, the ONLY purpose of an ABX test is to determine difference. If your argument is that an ABX test is not adequate for determining something it was not designed to determine, then you've been wasting our time.

> We're looking to evaluate audio components' sonic signatures and subjective shading of musical reproduction. And there has been no confirmation that ABX or a straight AB difference test can show up all the various shadings that show up in longer-term listening evaluations.

There is no evidence that "various shadings" really do show up (rather than simply being imagined by the listener) in longer-term listening evaluations of components that cannot be distinguished in ABX tests. You are once again assuming your conclusion.

bob
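Since "determining difference" with ABX ultimately comes down to statistics, here is a minimal sketch (Python standard library only; the trial counts are illustrative, though a 20-trial run at the 95% level is discussed later in the thread) of how a run of trials is usually scored against chance guessing:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value: the chance of scoring at
    least `correct` out of `trials` by guessing alone (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# With 20 trials, 15 correct clears the usual p < 0.05 criterion
# and 14 correct does not:
print(round(abx_p_value(15, 20), 4))  # 0.0207
print(round(abx_p_value(14, 20), 4))  # 0.0577
```

Note the asymmetry this implies: a passing score is evidence of an audible difference, but a failing score is not by itself proof of inaudibility.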
#83
Harry Lavo wrote:
> "vlad" wrote in message ...
>> Harry, It was a few weeks ago that you described your "monadic" test for the first time in this group. Now you are talking about this test as an established fact. Even if somebody went to the hassle and expense of implementing your suggestion, it is not at all obvious that the test would produce any results. I think the most likely outcome is that it would find your subjective terms like "warmth", "depth", etc. not correlated to the sound of the recording. I would bet that the distribution of a particular term would be completely random for different users. But of course then you would require not 200 participants but 2000, etc., or something else that again would make the proposed test unfeasible. And you would continue to speculate about the validity of your imaginary test. Please, either provide some proof that your so-called "monadic" test works or stop speculating about it. vlad

Harry, you did not address my statement about your "monadic" test procedure. Let me repeat it here:

-- Even if somebody went to the hassle and expense of
-- implementing your suggestion, it is not at all obvious that the test
-- would produce any results. I think the most likely outcome is that it
-- would find your subjective terms like "warmth", "depth", etc. not
-- correlated to the sound of the recording.

Also, you are trying to present your test as a means of "validation" of ABX/DBT tests. ABX/DBT tests do not need validation. They test audibility of differences in physical devices (amps, wires, etc.), and for this purpose they work just fine, according to experts in this field. Your "monadic" testing is designed to measure subjective differences. You probably can measure subjective preferences, I will give you that. For instance, after testing 10000 subjects you can conclude that 52% favor box A, 45% box B, and 3% are undecided. After all the effort and money spent on this test, will these results have any value? Subjective is subjective, and that is all. For instance, many people have a subjective preference for LPs. But does that make LP an accurate reproduction medium? Should we stick to LPs for music listening? We have much better means now to store and transfer an audio signal. It is a matter of preference for some people, that's it. Nobody argues with preferences.

> Well, I guess I can understand why you feel that way. But fact is, Vlad, I postulated such a test as a key part (the "control" part) of a validation test here nearly two years ago.

DBT does not need validation by "monadic" tests.

> I let the matter drop after much controversy, and only recently brought it up again (in another forum, but it has spilled over here). I also realized that perhaps understanding of what I was proposing was buried in the complexity of the overall testing needed to validate quick-switch testing, so I have tried to make my explanations as simple as possible. The reason I say it is a standard test is that it is widely used in the social sciences, psychological and behavioral sciences, and in the medical sciences. Audio is a field where it has not traditionally been used, at least to my knowledge. Partly this may be structural (there are not a lot of large companies worried about the quality of musical reproduction, after all). But more likely it is because the field has been dominated by sound research conducted by physicists, electrical engineers, and audiologists. However, more recently scientists have made rapid progress in brain research, with the growing realization that how we hear is very complex, and how we hear music even more so. There is growing realization that musical evaluation must be treated as a subjective phenomenon, and that means treating its measurement using the tools of the social and psychological scientists, and the medical scientists, not necessarily the physical scientists.

Musical perception is a subjective phenomenon, and always was.

> As difficult as you may find it to believe that ratings of things like "warmth" or "depth" or "dimensional" have meaning, those kinds of subjective yet descriptive phrases are widely used in subjective research. Of course, part of the art of researchers in a given field is determining the best, most precise way of asking the question to minimize confusion. You don't want to say "on a scale of one to five, rate this item on 'warmth'". You doubtless would construct a scale that said "on a scale of one to five, where 'one' is a relatively cool tone and 'five' is a relatively warm tone, where would you place the sound you just heard?". Or something to that effect.

No, I think first of all you will find that if you take two amps or wires that are indistinguishable in DBT, then the results of your subjective evaluation test will be all over the map. I would expect that the subjective feelings of subjects will be very poorly correlated, if correlated at all, with particular pieces of equipment. So before pouring any money or effort into this kind of testing, I would ask first why you think this test will give results at all. My second question would be what you are going to do with the results. Subjective preferences tend to change with time and can easily be influenced by the last review in Stereophile. I personally don't care about the subjective feelings of people that I don't know.

> Part of the research art is developing, and oft-times pretesting, the questions so that you know they are meaningful and with minimum misinterpretation. This is all practical "art", and there are commercial researchers who are quite good at it.

What do you mean by that?

vlad
#84
Mark DeBellis wrote:
> On 24 Jun 2005 01:09:45 GMT, Gary Eickmeier wrote:
>> You can't do a "quick switch" test with two sources that run at different speeds because you can't synchronize them, which would be a dead giveaway in itself, so that is a bad example. If you want to use that example, you will have to listen first to one, then the other, in its entirety, then decide if the speed difference is audible. If so, then do a blind series, listening to a known version, then to a randomly chosen one, and decide whether it is the same or different. In this manner you will eventually arrive at a number for a speed differential that is at the audible threshold. That is the basic idea of how audio research is done. You may find that a speed ratio of 1.01 is inaudible to most, but audible to some with perfect pitch. If this is an interesting enough question for you, then do the research and report it.
>
> p.s. Suppose one carried out research such as this and found, for a given one-minute-long excerpt, the audible threshold. So a given subject could reliably discriminate between the excerpt and a version that is 1.01 times as fast (say). What theoretical reason would we have to think that, if we did a quick switch test (see my previous email for a suggestion about how to do it), the subject would be able to tell the excerpts apart in that test?

Because the difference in pitch would be the way you'd be telling them apart. (You certainly don't think you can tell the difference between a passage that is 60 seconds long and a passage that is 60.6 seconds long, do you?) And we know that differences in pitch are much easier to detect when you can switch directly between the samples.

> I don't understand the point about perfect pitch, because I am supposing that one version is faster than the other, not that the speed and pitch are both higher (as would be the case with analog tape). Maybe I am not seeing your point though.

If one version is faster than the other, then the pitch will be higher, whatever the medium. The only exception would be if you were to use digital signal processing to correct for this. In that case, you probably won't be able to tell them apart without a stopwatch unless the difference is substantial. Our resident conductor would presumably do somewhat better, because she is trained to be sensitive to subtle differences in tempo. But even she would have her limits.

bob
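A quick sanity check on the numbers in this exchange (the 1.01 speed ratio and the 60 s vs 60.6 s excerpt are the thread's own figures; the arithmetic below is just illustration): when speed and pitch move together, a 1% speed-up raises every frequency by about 17 cents, a shift that is usually easy to hear in a direct A/B comparison, whereas the corresponding 0.6 s difference in a one-minute excerpt is hard to judge by ear alone.

```python
from math import log2

speed_ratio = 1.01                 # the thread's example: one version 1% fast

# Pitch shift if speed and pitch move together (analog playback):
cents = 1200 * log2(speed_ratio)   # 100 cents = one equal-tempered semitone
print(round(cents, 1))             # 17.2 cents

# Duration difference over the thread's one-minute excerpt:
slow = 60.0 * speed_ratio          # the 1%-slower rendering of a 60 s excerpt
print(round(slow - 60.0, 2))       # 0.6 s -- hard to judge without a stopwatch
```

This is consistent with bob's point: strip the pitch cue away with signal processing and only the much weaker elapsed-time cue remains.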
#85
Harry Lavo wrote:
> But that is a result of the fact that music itself is subjective, and *cannot* be measured objectively. The closest you can come perhaps is to substitute some kind of psychophysiological measurements.

Do you really believe all that??? Notation? Music theory? Tuning systems? Harmonic series? Compositional devices? Just to name a few of the obvious ones.
#87
On 27 Jun 2005 14:52:21 GMT, Gary Eickmeier wrote:

>> The question that interests me now is whether the implications of an identification ("Was that SACD or CD?") test need be the same as those of a discrimination ("Are A and B the same or different?") test. Does the research show, in particular, that an identification test (the kind I undertook) is among the kinds of tests that are reliable for determining whether two sources sound different?
>
> I'm not sure what you mean by "identification test." There is no such paradigm in what I have read. It is much more difficult to listen to a randomly selected source and try to "identify" it than to compare two sources and decide "same" or "different." In an ABX test, for example, you can listen to the two known sources as long as you want, switch back and forth between them and listen for differences, see if you can get a "fix" on just what each sounds like, then go for a test. In the test, you would select A or B, then let the comparator select X, and decide whether X is A or B. You usually do this by quick switching between A and X, then B and X, and deciding same or different. If X is the same as A, then you put down A as the identification of it, and press on to trial 2. If the differences are really audible, the trials will be child's play. If they sound identical, you will be guessing and probably know it. Anyway, the task is to decide same or different, not to identify the source when presented with a single signal.

Thank you. That confirms my belief, and I appreciate the elegant description of the testing paradigm.

Mark
#90
vlad wrote:
> So before pouring any money or efforts in this kind of testing I would ask first why you think that this test will give results at all.

Because he doesn't like the results we've already got. No other reason.

The problem with using monadic tests to determine whether any difference is discernible between two components is that you will get a large (and incalculable) number of false negatives. You will get negative results: 1) when subjects really can't distinguish between the two, 2) when they could but didn't in this particular test (the standard false negative that all such tests face), and 3) when subjects could distinguish between the two, but their impressions based on whatever criteria you asked them about did not lean consistently in a single direction. For example, if they could all hear a difference between LP and CD, but half of them preferred one and found it more lifelike/musical/etc., and the other half had exactly the opposite reaction, the results would be inconclusive. And what good is a test for difference that can't even distinguish between things that sound as different as LP and CD?

bob
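bob's third failure mode can be made concrete with a toy simulation (all numbers invented for illustration; Python standard library only): every simulated listener hears a real difference between the two components, yet because preferences split evenly, the monadic cell averages come out statistically indistinguishable.

```python
import random

random.seed(1)

def listener_rating(component: str) -> float:
    """One listener's 1-5 'lifelike' rating of a component.
    Every listener hears a real A/B difference, but half the
    population likes A better and half likes B better."""
    likes_a = random.random() < 0.5
    bias = 1.0 if (component == "A") == likes_a else -1.0  # real, heard difference
    noise = random.gauss(0.0, 0.5)                         # rating-scale noise
    return min(5.0, max(1.0, 3.0 + bias + noise))

n = 300  # one monadic cell per component
mean_a = sum(listener_rating("A") for _ in range(n)) / n
mean_b = sum(listener_rating("B") for _ in range(n)) / n
print(round(mean_a, 2), round(mean_b, 2))  # both land near the neutral 3.0
```

The averages hide the fact that no simulated listener actually rated the two components the same, which is exactly the scenario in point 3 above.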
#91
wrote in message ...
Harry Lavo wrote:
>> We aren't looking to determine differences, Bob.
>
> You're the one who started this whole conversation by insisting that an ABX test was inadequate. Well, the ONLY purpose of an ABX test is to determine difference. If your argument is that an ABX test is not adequate for determining something it was not designed to determine, then you've been wasting our time.

It started because an ABX test was proposed as a means of making listening decisions for audio equipment. The fact that *difference* is the wrong measure is just one of the problems with this approach.

>> We're looking to evaluate audio components' sonic signatures and subjective shading of musical reproduction. And there has been no confirmation that ABX or a straight AB difference test can show up all the various shadings that show up in longer-term listening evaluations.
>
> There is no evidence that "various shadings" really do show up (rather than simply being imagined by the listener) in longer-term listening evaluations of components that cannot be distinguished in ABX tests. You are once again assuming your conclusion.

The shadings can be presumed to be there, since they are heard by many people, until proven otherwise. And they can't be proven otherwise except through something like a monadic control test. The "shadings" are subjective; it requires a test that can determine whether the subjective perception is real or not, and that is done by ratings among a large cross-section of audiophiles, with statistical analysis applied.
#93
Mark DeBellis wrote:
> By an identification test I mean one where you can switch back and forth between the signals, but where what you have to decide is not whether they are the same or different, but which one is CD and which is SACD.

That would be slightly easier than just listening to a single one, but it still requires you to remember the criteria by which you had previously distinguished (or *thought* you'd distinguished) them. So it's still harder than a same-different test, or ABX, or similar.

> This, I think, is difficult for some of the same reasons that the test you describe above is difficult, so what should be inferred from a subject's failure to get a high percentage of correct answers on my kind of identification test is not necessarily the same as what should be inferred from a subject's failure to get a high score on a proper "same-different" test.

Agreed. In particular, there's probably some subset of sonic differences which you wouldn't detect in an identification test, but would in a same-different test.

bob
#94
wrote in message ...
Harry Lavo wrote:
>> But that is a result of the fact that music itself is subjective, and *cannot* be measured objectively. The closest you can come perhaps is to substitute some kind of psychophysiological measurements.
>
> Do you really believe all that??? Notation? Music theory? Tuning systems? Harmonic series? Compositional devices? Just to name a few of the obvious ones.

I see your point. Let me correct my statement: the *experiencing* of music itself is subjective, and *cannot* be measured objectively. Now hopefully you can agree to that, which is the part that is relevant to a listening test.
#95
wrote in message ...
vlad wrote:
>> So before pouring any money or efforts in this kind of testing I would ask first why you think that this test will give results at all.
>
> Because he doesn't like the results we've already got. No other reason.

Thanks for the gratuitous insult, Bob.

> The problem with using monadic tests to determine whether any difference is discernible between two components is that you will get a large (and incalculable) number of false negatives. You will get negative results: 1) when subjects really can't distinguish between the two, 2) when they could but didn't in this particular test (the standard false negative that all such tests face), and 3) when subjects could distinguish between the two, but their impressions based on whatever criteria you asked them about did not lean consistently in a single direction. For example, if they could all hear a difference between LP and CD, but half of them preferred one and found it more lifelike/musical/etc., and the other half had exactly the opposite reaction, the results would be inconclusive. And what good is a test for difference that can't even distinguish between things that sound as different as LP and CD?

Basically, Bob, this exposition shows that you have no idea of how scaling works to measure differences. Please read my current posts before you *decide* (based on erroneous beliefs) why it doesn't work. If I am to believe you, I just wasted twenty-five years of work, and my company(s) didn't make the hundreds of millions of dollars based on it that they thought they did.
#96
"vlad" wrote in message
...
> Harry Lavo wrote: [snip]
>
> Harry, you did not address my statement about your "monadic" test procedure. Let me repeat it here:
>
> -- Even if somebody went to the hassle and expense of
> -- implementing your suggestion, it is not at all obvious that the test
> -- would produce any results. I think the most likely outcome is that it
> -- would find your subjective terms like "warmth", "depth", etc. not
> -- correlated to the sound of the recording.

Well, your thoughts are your thoughts. But I have done a lot of research in food, where ratings are subjective, and I simply disagree. If one amp, for example, performs in a way that can be characterized as "cool" and another as "warm", people's ratings will reflect that even if they think they are rating the music rather than the amp. (Although there is probably no reason to deceive them, since the test is monadic.) There will be substantial scatter; they won't march in lockstep. But the averages will reflect the difference, and if the difference in averages is great enough, it will reach the 95% significance level. Then you can conclude that amp "A" is warmer-sounding than amp "B". Likewise, you can ask for overall preference and a whole series of ratings on characteristics. Together they will tell you if and how the two amps differ. Keep in mind that this goes beyond measurement. If the "coolness" is a static frequency response dip, it would also likely be heard in an ABX test. If it is the way the timbre changes dynamically, heard over an extended listen, it might not. In this *subjective* test, it really doesn't matter what is creating the perception; the test simply determines whether the perceived difference is real or not.

> Also, you are trying to present your test as a means of "validation" of ABX/DBT tests. ABX/DBT tests do not need validation. They test audibility of differences in physical devices (amps, wires, etc.), and for this purpose they work just fine, according to experts in this field.

The tests do fine on differences that are volume-related, such as frequency response, loudness, and standard distortions. They don't do so well, many of us believe, on things that are more complex perceptually, such as imaging, transparency, dynamic phase coherence, dimensionality, etc.

> Your "monadic" testing is designed to measure subjective differences.

That is correct.

> You probably can measure subjective preferences, I will give you that.

That's a start. You can also measure differences in perception, believe me. Telling me, with my background, that you can't is like telling an EE that you can't measure harmonic distortion.

> For instance, after testing 10000 subjects you can conclude that 52% favor box A, 45% box B, and 3% are undecided. After all the effort and money spent on this test, will these results have any value? Subjective is subjective, and that is all.

Well, for starters, such a preference among 10000 people would be statistically significant beyond a doubt. So you could say for sure that the two amps sound different, and that "A" is preferred. Now also suppose that those 10000 subjects determine that amp "A" "sounds less constrained" on dynamic peaks than amp "B" (at the 95% confidence level), and that amp "A" sounds "easier to listen 'into' on soft passages" than amp "B", again at the 95% level. That would give a pretty good indication of why amp "A" was preferred. Its value, and what was done with the information, would depend on who ran the test and for what purpose. (Incidentally, a sample size of 200-300 people per cell is usually an acceptable trade-off between test cost and statistical sensitivity.)

Now in my case I proposed such a test as part of an overall series of tests to determine whether the short-form, quick-switch, comparative tests could give the same results. If so, their worth would be proven for open-ended evaluation of audio components. If not, they would be misleading for this use, however valuable they might be for other uses. Or the test technique might have to be altered slightly. For example, let us hypothesize some possible results:

- a standard ABX test of 20 trials, conducted among ten similar people, fails to reveal a statistical difference
- a standard ABX test of 20 trials, conducted among ten similar people, just reaches the 95% difference threshold
- a standard AB preference test of 20 trials shows roughly the same preference and significance among 10 people
- a standard AB preference test of 20 trials shows no statistical preference among 10 people, but shows a statistical preference when 20 people are included

All of these would have major implications for those using comparative tests for the purpose of open-ended evaluation of audio components. But only once the *benchmark* or control had been established.

> For instance, many people have a subjective preference for LPs. But does that make LP an accurate reproduction medium? Should we stick to LPs for music listening? We have much better means now to store and transfer an audio signal. It is a matter of preference for some people, that's it. Nobody argues with preferences.

If I were a Sony executive and my testing among 300 people showed a statistically significant 60-40 preference for vinyl over CD, I might think hard about the product and marketing implications of same. Likewise, if I had hard evidence that SACD was preferred over CD, I'd certainly be thinking hard about how to capitalize on that fact.

>> Well, I guess I can understand why you feel that way. But fact is, Vlad, I postulated such a test as a key part (the "control" part) of a validation test here nearly two years ago.
>
> DBT does not need validation by "monadic" tests.

The double-blind technique as a concept certainly does not. However, quick-switch comparative testing certainly does for the purpose of open-ended evaluation of audio components, since these tests were designed for a whole 'nother purpose.

>> I let the matter drop after much controversy, and only recently brought it up again (in another forum, but it has spilled over here). I also realized that perhaps understanding of what I was proposing was buried in the complexity of the overall testing needed to validate quick-switch testing, so I have tried to make my explanations as simple as possible. The reason I say it is a standard test is that it is widely used in the social sciences, psychological and behavioral sciences, and in the medical sciences. Audio is a field where it has not traditionally been used, at least to my knowledge. Partly this may be structural (there are not a lot of large companies worried about the quality of musical reproduction, after all). But more likely it is because the field has been dominated by sound research conducted by physicists, electrical engineers, and audiologists. However, more recently scientists have made rapid progress in brain research, with the growing realization that how we hear is very complex, and how we hear music even more so. There is growing realization that musical evaluation must be treated as a subjective phenomenon, and that means treating its measurement using the tools of the social and psychological scientists, and the medical scientists, not necessarily the physical scientists.
>
> Musical perception is a subjective phenomenon, and always was.

Then you must use a test that measures this subjective phenomenon in its fullest. That means the test itself has to be designed to interfere in the least possible way with the actual act of listening and evaluation. This is where quick-switch comparative testing has a conceptual weakness, since it completely alters the listening experience and most likely the portions of the brain involved in this activity.

>> As difficult as you may find it to believe that ratings of things like "warmth" or "depth" or "dimensional" have meaning, those kinds of subjective yet descriptive phrases are widely used in subjective research. Of course, part of the art of researchers in a given field is determining the best, most precise way of asking the question to minimize confusion. You don't want to say "on a scale of one to five, rate this item on 'warmth'". You doubtless would construct a scale that said "on a scale of one to five, where 'one' is a relatively cool tone and 'five' is a relatively warm tone, where would you place the sound you just heard?". Or something to that effect.
>
> No, I think first of all you will find that if you take two amps or wires that are indistinguishable in DBT, then the results of your subjective evaluation test will be all over the map. I would expect that the subjective feelings of subjects will be very poorly correlated, if correlated at all, with particular pieces of equipment.

Au contraire. If there truly is no difference, the averages of the two cells evaluating the amps or wires will be identical from a statistical standpoint; that is, they will fail to differ at a statistically significant level. Within each evaluating cell there will be a lot of scatter, but the averages are what are used in such a test.

> So before pouring any money or effort into this kind of testing, I would ask first why you think this test will give results at all. My second question would be what you are going to do with the results. Subjective preferences tend to change with time and can easily be influenced by the last review in Stereophile. I personally don't care about the subjective feelings of people that I don't know.

I think I've answered all of this above. I've proposed it as a control test for the short-form tests. And if I were a marketing or R&D exec at Sony or Harman International, I'd consider using it for other purposes, as apparently Harman has.

>> Part of the research art is developing, and oft-times pretesting, the questions so that you know they are meaningful and with minimum misinterpretation. This is all practical "art", and there are commercial researchers who are quite good at it.
>
> What do you mean by that?

I mean there are firms whose job it is to help companies design, conduct, and evaluate tests. And one of the skills a company that does this has to develop is the ability to design and pretest questions that make sense and increase response coherence. I happened to study under the founder of one such company while obtaining my MBA from Northwestern back in the early '60s. Dr. Sidney Levy was a highly regarded leader in the field of behavioral psychology. And then for twenty-five years as an executive I helped design and make decisions based on such testing for a major consumer packaged goods company, working with many such firms.
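For readers unfamiliar with the cell-average analysis Harry describes, here is a minimal sketch (effect size, rating scale, and sample size are invented for illustration, with the cell size in his suggested 200-300 range; a large-sample z-test stands in for the t-test a real analysis might use): individual ratings are indeed "all over the map", yet the difference between cell means can still clear the 95% significance bar.

```python
import random
from math import erf, sqrt

random.seed(2)

def cell(true_mean: float, n: int = 300) -> list[float]:
    """Simulated 1-5 'warmth' ratings for one monadic cell."""
    return [min(5.0, max(1.0, random.gauss(true_mean, 1.0))) for _ in range(n)]

def compare_cells(a: list[float], b: list[float]) -> tuple[float, float]:
    """Difference of cell means and a two-sided large-sample p-value."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    z = (ma - mb) / sqrt(va / len(a) + vb / len(b))
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))  # normal tail
    return ma - mb, p

# Amp "A" rated 0.4 points warmer on average than amp "B": noisy
# individual ratings, but a clearly significant difference of means.
diff, p = compare_cells(cell(3.4), cell(3.0))
print(round(diff, 2), p < 0.05)
```

With no true difference between the cells, the same analysis would usually fail to reach significance, which is the "control" behavior Harry's argument relies on.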
#97
Jenn wrote:
> Hey, I have MANY limitations! :-) I have found that I have sensitivity in regard to tempi at about 3 beats per min.

3 beats per minute out of how many? Three beats of Largo is a lot longer than 3 beats of Presto.

> That is, I can tell that one performance is slower or faster at about that threshold. I CAN'T pick specific tempi out of the air with that degree of sensitivity. Others can come pretty close to that. That's why I don't care for conducting ballet, for example. Dancers need things REALLY exact in tempo, and I just don't enjoy working that way; it's anti-musical to me. I'm learning a lot through this discussion, btw. Thanks to the participants.

Don't say that. You'll only encourage us.

bob
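bob's question has a concrete arithmetic point behind it: a fixed 3 bpm threshold is a much finer relative discrimination at fast tempi than at slow ones. A quick illustration (the tempo values are rough conventions for the terms, not figures from the thread):

```python
# The same 3 bpm threshold as a fraction of the base tempo:
for name, bpm in (("Largo", 50), ("Allegro", 120), ("Presto", 180)):
    print(name, bpm, f"{3 / bpm:.1%}")
# Largo comes out at 6.0%, Presto at 1.7% -- over three times finer.
```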
#98
Harry Lavo wrote:
wrote in message ...
Harry Lavo wrote:
>>> But that is a result of the fact that music itself is subjective, and *cannot* be measured objectively. The closest you can come perhaps is to substitute some kind of psychophysiological measurements.
>>
>> Do you really believe all that??? Notation? Music theory? Tuning systems? Harmonic series? Compositional devices? Just to name a few of the obvious ones.
>
> I see your point. Let me correct my statement: the *experiencing* of music itself is subjective, and *cannot* be measured objectively. Now hopefully you can agree to that, which is the part that is relevant to a listening test.

I wouldn't disagree, except that soliciting responses under controlled conditions is also relevant, and that is a bogeyman for you for reasons you have yet to adequately explain. The two are not mutually exclusive, despite your assertions otherwise. Saying the experience of music is subjective is sort of belaboring the obvious. If I had to pick a single word to describe it better, I would use "abstraction".
#99
Harry Lavo wrote:
wrote in message ...
vlad wrote:
>>> So before pouring any money or efforts in this kind of testing I would ask first why you think that this test will give results at all.
>>
>> Because he doesn't like the results we've already got. No other reason.
>
> Thanks for the gratuitous insult, Bob.
>
>> The problem with using monadic tests to determine whether any difference is discernible between two components is that you will get a large (and incalculable) number of false negatives. You will get negative results: 1) when subjects really can't distinguish between the two, 2) when they could but didn't in this particular test (the standard false negative that all such tests face), and 3) when subjects could distinguish between the two, but their impressions based on whatever criteria you asked them about did not lean consistently in a single direction. For example, if they could all hear a difference between LP and CD, but half of them preferred one and found it more lifelike/musical/etc., and the other half had exactly the opposite reaction, the results would be inconclusive. And what good is a test for difference that can't even distinguish between things that sound as different as LP and CD?
>
> Basically, Bob, this exposition shows that you have no idea of how scaling works to measure differences. Please read my current posts before you *decide* (based on erroneous beliefs) why it doesn't work. If I am to believe you, I just wasted twenty-five years of work, and my company(s) didn't make the hundreds of millions of dollars based on it that they thought they did.

And those were audio tests. Correct?
#102
|
|||
|
|||
Mark DeBellis wrote:
In the example in question, I am supposing that the speed is higher but not the pitch. If the only way to do this is by digital signal processing, then so be it. There will be an audible threshold No, there won't. You're not measuring audibility here. You're measuring perception of elapsed time. (That is, if you are comparing a one-minute segment to the same segment stretched out to one minute and X seconds.) Now, you may also be measuring perception of tempo, and I would guess that focusing on tempo would be much more effective than trying to judge the relative length of two long musical segments. And you certainly don't need to listen to a full minute to judge the tempo, do you? at which a subject can reliably discriminate between the excerpt and the sped-up excerpt. My question stands: What theoretical reason would we have to think that, if we did a quick switch test (see my previous email for a suggestion about how to do it), the subject would be able to tell the excerpts apart in that test? I've no idea which post you're referring to, but your question displays a misconception about quick-switching tests. They do not *require* switching; they *allow* switching. A subject can, in a quick-switching test, listen to the entire one-minute segment, if he so chooses. Subjects tend not to do so, however, because it tends not to work. Let me pose your question a different way. Which method would work better: 1) The DeBellis Method: Listen to a full one-minute segment, then listen to the same segment (possibly now stretched to 1.X minutes), and determine whether the two are the same. 2) The Marcus Method (if I may be so bold): Listen to sets of two beats, and determine whether the distance between the beats is the same or different. I don't know offhand whether this experiment has been done (though I can't imagine I'm the first to think of it). Absent any data, I see no reason to believe that your method would be more sensitive than mine.
And given the general pattern of findings in psychoacoustics, I'd bet on mine. bob |
#103
|
|||
|
|||
Harry Lavo wrote:
Basically, Bob, this exposition shows that you have no idea of how scaling works to measure differences. Please read my current posts before you *decide* (based on erroneous beliefs) why it doesn't work. If I am to believe you, I just wasted twenty five years of work and my company(s) didn't make the hundreds of millions of dollars based on it that they thought they did. Basically, Harry, this exposition shows that you are not paying attention to what I am saying. Please read my posts more carefully. If, indeed, you spent 25 years trying to determine whether cereal A tastes different than cereal B, then I would question the sanity of your employers. But I suspect that you spent most of that time trying to answer other questions, like which cereal people preferred, and what they preferred about it. Now, again assuming the sanity of your employers, they would not have spent the kind of money that monadic testing costs if they had any doubt about whether cereal A and cereal B tasted different. And that is why all of your experience is completely irrelevant to the basic objectivist-subjectivist divide in audio. In your career, you were dealing with comparisons and evaluations of products that were known to differ. Here, we are talking about components that are not known to differ. That's Question #1. That's why we're talking about difference, Harry. And I think we know why you keep trying to change the subject. bob |
#104
|
|||
|
|||
Harry Lavo wrote:
wrote in message ... Harry Lavo wrote: We aren't looking to determine differences, Bob. You're the one who started this whole conversation by insisting that an ABX test was inadequate. Well, the ONLY purpose of an ABX test is to determine difference. If your argument is that an ABX test is not adequate for determining something it was not designed to determine, then you've been wasting our time. It started because an ABX test was proposed as a means of making listening decisions for audio equipment. So it apparently started because you misread something, and then decided to pick a fight about it. No one's ever suggested using ABX tests to "make listening decisions" here. It's been proposed only as a way to confirm impressions that components sound different. Stop fighting the straw men, Harry. It doesn't help your cause. The fact that *difference* is the wrong measure is just one of the problems with this approach. We're looking to evaluate audio components sonic signatures and subjective shading of musical reproduction. And there has been no confimation that ABX or a straight AB difference test can show up all the various shadings that show up in longer-term listening evaluations. There is no evidence that "various shadings" really do show up (rather than simply being imagined by the listener) in longer-term listening evaluations of components that cannot be distinguished in ABX tests. You are once again assuming your conclusion. The shadings can presume to be there, as they are heard by many people, until proven otherwise. Spoken like a true anti-empiricist. People who don't pick and choose which science they wish to believe in will understand that things don't exist just because people--even "many" people--claim they exist. Human perception is not that simple. And they can't be proven otherwise except through something like a monadic control test. 
The "shadings" are subjective; it requires a test that can determine if subjective perception is real or not and that is by ratings among a large cross-section of audiophiles, with statistical analysis applied. Just to sum up here, it is your position that ABX tests are inadequate because: 1) they do not measure things they are not designed to measure; and, 2) they cannot detect things we do not know exist. Glad we've got that straight. bob |
#106
|
|||
|
|||
wrote in message ...
Harry Lavo wrote: wrote in message ... Harry Lavo wrote: But that is a result of the fact that music itself is subjective, and *cannot* be measured objectively. The closest you can come perhaps is to substitute some kind of psychophysiological measurements. Do you really believe all that??? Notation? Music theory? Tuning systems? Harmonic series? Compositional devices? Just to name a few of the obvious ones. I see your point. Let me correct my statement: the "experiencing" of music itself is subjective, and *cannot* be measured objectively. Now hopefully you can agree to that, which is the part that is relevant to a listening test. I wouldn't disagree, except that soliciting responses under controlled conditions is also relevant, which is a bogeyman for you for reasons you have yet to adequately explain. They are not mutually exclusive, despite your assertions otherwise. We simply don't know that. Knowledge of the brain suggests they may be, or at the very least are different enough demands on the brain that the "controlled conditions" where those conditions impose the need for quick-switching, short-snippet, comparative choices interfere with normal musical perception. The reason for a control test is to determine which assumptions are correct. Saying the experience of music is subjective is sort of belaboring the obvious. If I had to pick a single word to describe it, I would use "abstraction". Well, it may be obvious. But the tests being used to say "no difference" have been shown only to be highly sensitive to more objective volume/partial volume differences...so other aspects of subjectivity may well be blocked. |
#107
|
|||
|
|||
wrote in message ...
Harry Lavo wrote: wrote in message ... vlad wrote: So before pouring any money or efforts in this kind of testing I would ask first why you think that this test will give results at all. Because he doesn't like the results we've already got. No other reason. Thanks for the gratuitous insult, Bob. The problem with using monadic tests for the purpose of determining whether any difference is discernible between two components is that you will get a large (and incalculable) number of false negatives. You will get negative results: 1) when subjects really can't distinguish between the two, 2) when they could but didn't in this particular test (the standard false negative that all such tests face), and 3) when subjects could distinguish between the two, but their impressions based on whatever criteria you asked them about did not lean consistently in a single direction. For example, if they could all hear a difference between LP and CD, but half of them preferred one and found it more lifelike/musical/etc., and the other half had exactly the opposite reaction, the results would be inconclusive. And what good is a test for difference that can't even distinguish between things that sound as different as LP and CD? Basically, Bob, this exposition shows that you have no idea of how scaling works to measure differences. Please read my current posts before you *decide* (based on erroneous beliefs) why it doesn't work. If I am to believe you, I just wasted twenty-five years of work and my company(s) didn't make the hundreds of millions of dollars based on it that they thought they did. And those were audio tests. Correct? Bob's critiques were of test design and use, not audio per se. Test design and use are practices in and of themselves, applicable to testing in any field. Makes no difference in this case whether food, drugs, or audio...scalar ratings work and are evaluated the same way in a monadic test. |
#109
|
|||
|
|||
On 25 Jun 2005 02:28:36 GMT, "Buster Mudd"
wrote: But in order for a psychologist to postulate unconscious representation they need to observe something in a subject's behavior that suggests that Perceived-But-Not-Brought-To-Consciousness thing *was* affecting the subject's cognitive economy. This gets right back to my previous question: How would you go about *proving* (confirming? demonstrating?) that something was in someone's "cognitive economy" if that something could not enable that someone to perform a task? Mark wrote: Well, just as you say, by observing behavior that, together with everything else that is observed, is best explained by that hypothesis, in the context of a larger theory. p.s. If you are thinking: but specifically what behavior or what sort of behavior? DeBellis isn't telling me that! That is because the relevant behavior would vary from one case to another. It would depend on what the mental item was and what role it was playing in somebody's psychology. The relevant behavior would be specified in the psychological theory itself, not by you or me looking at the theory, as it were, "from outside." Mark |
#110
|
|||
|
|||
Harry Lavo wrote:
wrote in message ... Harry Lavo wrote: We aren't looking to determine differences, Bob. You're the one who started this whole conversation by insisting that an ABX test was inadequate. Well, the ONLY purpose of an ABX test is to determine difference. If your argument is that an ABX test is not adequate for determining something it was not designed to determine, then you've been wasting our time. It started because an ABX test was proposed as a means of making listening decisions for audio equipment. The fact that *difference* is the wrong measure is just one of the problems with this approach. Clearly you must be joking. Difference is *the* requisite predicate. If you cannot determine a difference, due to sonic characteristics only, then a preference (as between components) must be based on non-sonic attributes. QED. We're looking to evaluate audio components' sonic signatures and subjective shading of musical reproduction. And there has been no confirmation that ABX or a straight AB difference test can show up all the various shadings that show up in longer-term listening evaluations. There is no evidence that "various shadings" really do show up (rather than simply being imagined by the listener) in longer-term listening evaluations of components that cannot be distinguished in ABX tests. You are once again assuming your conclusion. The shadings can be presumed to be there, as they are heard by many people, until proven otherwise. And they can't be proven otherwise except through something like a monadic control test. The "shadings" are subjective; it requires a test that can determine if subjective perception is real or not and that is by ratings among a large cross-section of audiophiles, with statistical analysis applied. You keep repeating this misguided idea that a "monadic / proto-monadic" test must be applied to some vast population to have any meaning.
As a research method to identify the frequency/distribution of some attribute or parameter, and extrapolate that to the general population, this method has merit. However, relative to the situation being discussed here, it is merely a dodge. Why? Because population distribution is irrelevant within the current context. You're talking about a test for identification of *preference* within the population, where there is a *known* difference in presented stimuli. That's a basic precept in the method. There is no *known* difference in stimuli in the current context - that's the whole argument. Luckily, however, you already have a population subset, yourself included, who claim to possess an attribute (i.e. who can distinguish, sighted, the differences within a myriad of devices believed by many to be indistinguishable, and believe that those differences are *real* and reproducible), and thus the test need only involve that subset. Conduct the test among the identified subset, construct the test to utilize blind controls and level matching, then test in whatever manner, using whatever scoring system, and for whatever period, you wish. Perform sufficient replicates to generate a statistically valid data set, and you're done. Will this be universally transferable to the whole population? No, but again, that's irrelevant. It will, however, identify whether there is such an attribute (the ability to distinguish cable differences, e.g.) within the *ONLY* population subset of interest. There is no utility in testing outside that subset until the existence of the 'perceived' attribute is confirmed, or not. You see, testing only yourself, Mr. Lavo, using proper controls, would be sufficient to confirm the existence of the ability you claim. Your failure to confirm such an ability could not be extrapolated to the population, but that's not the intent. So what keeps you from doing just that? I did, and my observed (and obvious) differences in cables...disappeared. Keith Hughes |
#111
|
|||
|
|||
In the example I posted previously ...
Suppose now that the test consists only in comparing short corresponding snippets of the two signals, two seconds in length ... please change the sample length to *one* second. (If the samples are two seconds long, and if each dot lasts an entire second ("dot" is a bad name for that, I know), and if a sample can start partway through a dot, and if the dot begins with an articulation so you can tell that a dot is beginning, then it would not be impossible to tell that you were hearing three dots in a row, if the sample started partway through a dot. This problem disappears if the sample length is changed to one second.) |
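The claim about the corrected one-second samples can be checked mechanically. The sketch below enumerates every snippet each repeating pattern can produce, under the assumption (consistent with the correction above) that snippets are aligned to tone boundaries; it is an illustration of the forest-for-trees point, not anyone's actual test protocol:

```python
def windows(pattern, length):
    """All distinct `length`-second snippets a repeating pattern of
    1-second tones can produce, sampling at tone-boundary offsets."""
    period = len(pattern)
    return {tuple(pattern[(start + i) % period] for i in range(length))
            for start in range(period)}

a = ["dot", "dot", "dee"]         # dot-dot-dee, repeats every 3 s
b = ["dot", "dot", "dot", "dee"]  # dot-dot-dot-dee, repeats every 4 s

print(windows(a, 1) == windows(b, 1))  # True: 1-s snippets can't identify
print(windows(a, 2) == windows(b, 2))  # True: even 2-s snippets coincide
print(windows(a, 3) == windows(b, 3))  # False: only B yields dot-dot-dot
```

So with boundary-aligned snippets, same-different judgments on corresponding snippets remain easy (the time-aligned signals disagree somewhere), while identification from a single short snippet is impossible until the snippet is long enough to contain a distinguishing subsequence.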
#113
|
|||
|
|||
Jenn wrote:
In article , wrote: Jenn wrote: Hey, I have MANY limitations! :-) I have found that I have sensitivity in regard to tempi at about 3 beats per min. 3 beats per minute out of how many? Three beats of Largo is a lot longer than 3 beats of Presto. Three BPM out of a minute's worth of beats. This is confusing. Let's say I have a piece of music that has a fast tempo: 140 bpm. I cannot tell between 140 bpm and 143 bpm (a 2.1% difference), I don't think. Let's pick another piece of music with a slow tempo, say a Largo, with 30 bpm. I think I can tell 30 bpm from 33 bpm (a 10% difference). |
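The two examples above illustrate why a fixed sensitivity of "about 3 bpm" behaves so differently at different tempi: the same absolute bpm change is a much larger relative change, and a much larger change in the gap between beats, at a slow tempo. A quick calculation (illustrative arithmetic only):

```python
def beat_interval_ms(bpm):
    """Time between successive beats, in milliseconds."""
    return 60_000.0 / bpm

def tempo_change(bpm, new_bpm):
    """Relative change in percent, and change in inter-beat gap in ms."""
    pct = 100.0 * (new_bpm - bpm) / bpm
    delta_ms = beat_interval_ms(bpm) - beat_interval_ms(new_bpm)
    return round(pct, 1), round(delta_ms, 1)

print(tempo_change(140, 143))  # ~2.1% faster, only ~9 ms less per beat
print(tempo_change(30, 33))    # 10% faster, ~182 ms less per beat
```

Which framing (percent vs. milliseconds) better predicts a listener's threshold is an empirical question; the sketch only shows that the two measures diverge sharply between Presto and Largo.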
#114
|
|||
|
|||
Harry Lavo wrote:
We simply don't know that. Knowledge of the brain suggests they may be, or at the very least are different enough demands on the brain that the "controlled conditions" where those conditions impose the need for quick-switching, short-snippet, comparative choices interfere with normal musical perception. Then you would agree that all musicians are unmusical because the effort involved in just playing the right notes at the right time (objective) destroys their emotional perception of music. Playing all those right notes at the right time also involves training (read: rehearsal, where musicians break pieces up into parts, make exercises out of passages, compare snippets of interpretive ideas played back to back, etc., and then have to put it all back together), which is something else that you seem to think destroys music. I think it's absurd. Sorry. |
#115
|
|||
|
|||
Harry Lavo wrote:
wrote in message ... Harry Lavo wrote: wrote in message ... vlad wrote: So before pouring any money or efforts in this kind of testing I would ask first why you think that this test will give results at all. Because he doesn't like the results we've already got. No other reason. Thanks for the gratuitous insult, Bob. The problem with using monadic tests for the purpose of determining whether any difference is discernible between two components is that you will get a large (and incalculable) number of false negatives. You will get negative results: 1) when subjects really can't distinguish between the two, 2) when they could but didn't in this particular test (the standard false negative that all such tests face), and 3) when subjects could distinguish between the two, but their impressions based on whatever criteria you asked them about did not lean consistently in a single direction. For example, if they could all hear a difference between LP and CD, but half of them preferred one and found it more lifelike/musical/etc., and the other half had exactly the opposite reaction, the results would be inconclusive. And what good is a test for difference that can't even distinguish between things that sound as different as LP and CD? Basically, Bob, this exposition shows that you have no idea of how scaling works to measure differences. Please read my current posts before you *decide* (based on erroneous beliefs) why it doesn't work. If I am to believe you, I just wasted twenty-five years of work and my company(s) didn't make the hundreds of millions of dollars based on it that they thought they did. And those were audio tests. Correct? Bob's critiques were of test design and use, not audio per se. Test design and use are practices in and of themselves, applicable to testing in any field. Makes no difference in this case whether food, drugs, or audio...scalar ratings work and are evaluated the same way in a monadic test. Yes, which leads me to my point.
The little details of how a test is implemented are dependent on what you are testing. You keep trying to take your experience with food tasting tests and apply it to audio testing, apparently without reading the scientific literature on hearing perception. I doubt if you really understand the difference between marketing research and basic research. |
#116
|
|||
|
|||
Mark DeBellis wrote:
Yes indeed. Here is another example which may prove useful. Suppose you have two signals. The first consists of the pattern dot-dot-dee (where dot and dee are different pitches, say), repeated over and over, where each dot or dee lasts one second. The second pattern consists of dot-dot-dot-dee, repeated over and over. Say that these signals are synchronized to begin at the start of the patterns, and then each goes on in its own way. Suppose now that the test consists only in comparing short corresponding snippets of the two signals, two seconds in length (so you hear only the snippet, not its surrounding context). If the task is to say whether the two signals are the same, that will be easy, because there will be different sounds on at least some of the samples, assuming enough samples are allowed. If on the other hand the task is to say which signal is which, it will be impossible, because the samples are too short. This is an example of the "forest-for-trees" phenomenon that I worried about in an earlier post. And the difference between the dot-dot-dee pattern and the dot-dot-dot-dee pattern is an example of a difference in properties of temporally extended passages. "Because the samples are too short"??? Unless your dots & dees are plodding along at an excruciatingly lethargic adagio, the two second samples *wouldn't* be shorter than a single complete iteration of this recurring dot-dot-dee or dot-dot-dot-dee pattern...which is all that would be required for most folks to identify which signal is which. |
#117
|
|||
|
|||
In article , Chung
wrote: Jenn wrote: In article , wrote: Jenn wrote: Hey, I have MANY limitations! :-) I have found that I have sensitivity in regard to tempi at about 3 beats per min. 3 beats per minute out of how many? Three beats of Largo is a lot longer than 3 beats of Presto. Three BPM out of a minute's worth of beats. This is confusing. Let's say I have a piece of music that has a fast tempo: 140 bpm. I cannot tell between 140 bpm and 143 bpm (a 2.1% difference), I don't think. Let's pick another piece of music with a slow tempo, say a Largo, with 30 bpm. I think I can tell 30 bpm from 33 bpm (a 10% difference). I was speaking in generalities. I think that I could do that at 140...I'll have someone test me! :-) |
#118
|
|||
|
|||
"Keith Hughes" wrote in message
... Harry Lavo wrote: wrote in message ... Harry Lavo wrote: We aren't looking to determine differences, Bob. You're the one who started this whole conversation by insisting that an ABX test was inadequate. Well, the ONLY purpose of an ABX test is to determine difference. If your argument is that an ABX test is not adequate for determining something it was not designed to determine, then you've been wasting our time. It started because an ABX test was proposed as a means of making listening decisions for audio equipment. The fact that *difference* is the wrong measure is just one of the problems with this approach. Clearly you must be joking. Difference is *the* requisite predicate. If you cannot determine a difference, due to sonic characteristics only, then a preference (as between components) must be based on non-sonic attributes. QED. Difference is a necessary condition to explain differences in sonic perception. The problem is that AB or ABX testing has never been shown decisively to be able to include in its "difference" measurement *all* the things that can lead to a perception difference. Thus the need for a control test. We're looking to evaluate audio components' sonic signatures and subjective shading of musical reproduction. And there has been no confirmation that ABX or a straight AB difference test can show up all the various shadings that show up in longer-term listening evaluations. There is no evidence that "various shadings" really do show up (rather than simply being imagined by the listener) in longer-term listening evaluations of components that cannot be distinguished in ABX tests. You are once again assuming your conclusion. The shadings can be presumed to be there, as they are heard by many people, until proven otherwise. And they can't be proven otherwise except through something like a monadic control test.
The "shadings" are subjective; it requires a test that can determine if subjective perception is real or not and that is by ratings among a large cross-section of audiophiles, with statistical analysis applied. You keep repeating this misguided idea that a "monadic / proto-monadic" test must be applied to some vast population to have any meaning. As a research method to identify the frequency/distribution of some attribute or parameter, and extrapolate that to the general population, this method has merit. However, relative to the situation being discussed here, it is merely a dodge. Why? Because population distribution is irrelevant within the current context. You're talking about a test for identification of *preference* within the population, where there is a *known* difference in presented stimuli. That's a basic precept in the method. There is no *known* difference in stimuli in the current context - that's the whole argument. I have proposed it only as a means of validating ABX and AB testing, to make sure that they can deliver the goods in the more esoteric perceptual areas. It has never been done, and until it is, the use of such tests, while beguiling because of their simplicity, is simply a matter of faith in the test technique. Not science. Luckily, however, you already have a population subset, yourself included, who claim to possess an attribute (i.e. who can distinguish, sighted, the differences within a myriad of devices believed by many to be indistinguishable, and believe that those differences are *real* and reproducible), and thus the test need only involve that subset. Conduct the test among the identified subset, construct the test to utilize blind controls and level matching, then test in whatever manner, using whatever scoring system, and for whatever period, you wish. Perform sufficient replicates to generate a statistically valid data set, and you're done. You can't use the test you believe might be inaccurate to validate itself.
Think about it. Will this be universally transferable to the whole population? No, but again, that's irrelevant. It will, however, identify whether there is such an attribute (the ability to distinguish cable differences, e.g.) within the *ONLY* population subset of interest. There is no utility in testing outside that subset until the existence of the 'perceived' attribute is confirmed, or not. Again you miss the basic point. The test is not *PROVEN* to work for all conditions of perceived sonic difference. You see, testing only yourself, Mr. Lavo, using proper controls, would be sufficient to confirm the existence of the ability you claim. Your failure to confirm such an ability could not be extrapolated to the population, but that's not the intent. So what keeps you from doing just that? I did, and my observed (and obvious) differences in cables...disappeared. Yep, so you bought the argument. Did you ever seriously question the underlying premises of the test itself? Did you ever think about the difference in how you listened during the test, and how you listen when relaxing and enjoying music? Did you pause to consider that the ear/brain function in *listening to music* is very complex and context-derived? If not, then you've bought into a faith. But it is not science. If it were truly science, its advocates (not its skeptics) would be pushing to absolutely, positively verify it. That has not happened. |
#119
|
|||
|
|||
wrote in message ...
Harry Lavo wrote: We simply don't know that. Knowledge of the brain suggests they may be, or at the very least are different enough demands on the brain that the "controlled conditions" where those conditions impose the need for quick-switching, short-snippet, comparative choices interfere with normal musical perception. Then you would agree that all musicians are unmusical because the effort involved in just playing the right notes at the right time (objective) destroys their emotional perception of music. Playing all those right notes at the right time also involves training (read: rehearsal, where musicians break pieces up into parts, make exercises out of passages, compare snippets of interpretive ideas played back to back, etc., and then have to put it all back together), which is something else that you seem to think destroys music. I think it's absurd. Sorry. And I would suggest that a musician performing is more akin to an audiophile taking a test, rather than one kicking back and simply experiencing the music. |
#120
|
|||
|
|||
"Steven Sullivan" wrote in message
... wrote: Harry Lavo wrote: wrote in message ... Harry Lavo wrote: But that is a result of the fact that music itself is subjective, and *cannot* be measured objectively. The closest you can come perhaps is to substitute some kind of psychophysiological measurements. Do you really believe all that??? Notation? Music theory? Tuning systems? Harmonic series? Compositional devices? Just to name a few of the obvious ones. I see your point. Let me correct my statement: the "experiencing" of music itself is subjective, and *cannot* be measured objectively. Now hopefully you can agree to that, which is the part that is relevant to a listening test. I wouldn't disagree, except that soliciting responses under controlled conditions is also relevant, which is a bogeyman for you for reasons you have yet to adequately explain. I would propose that the 'experiencing' of music isn't inherently beyond scientific investigation, as brain scanning technology advances. Certainly the 'experiencing' of music has been the subject of psychological investigation. For what it is worth, Steven, I agree with you on this and hope more of this type of work is done. From some of the articles I scanned briefly while looking for the Oohashi article, it would appear more and more is being done. Of course one can 'experience' things that have no physical existence, making 'experience' alone rather iffy as a basis for objective claims of difference. I am pretty sure that Harry would report a different 'experience' of the *same* musical selection, using the same playback gear, played twice in succession, if Harry was led to believe that he was hearing different gear. Possibly this 'experiential' difference would even have a physical manifestation, visible in a brain scan. Imagination does. Agree with you again. However, I am talking about a test that measures the average response of two similar groups of people to the same musical stimulus, but played through two different pieces of gear.
Therefore any difference can only be ascribed to the equipment. That's what test design is designed to do...control all other variables, either by eliminating them or by normalizing them. snip remainder as not commented upon |
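Comparing the average responses of two matched groups, each of which heard only one piece of gear, typically comes down to a two-sample test on the group means. The sketch below uses Welch's t statistic as one common choice; the 1-7 scale, panel sizes, and ratings are hypothetical illustrations, not anyone's actual protocol or data:

```python
from statistics import mean, stdev
from math import sqrt

def welch_t(a, b):
    """Welch's t statistic for the difference between two independent
    groups' mean ratings (monadic design: each subject hears one unit)."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / sqrt(va + vb)

# Hypothetical 1-7 'naturalness' ratings from two matched panels:
group_a = [5, 6, 5, 7, 6, 5, 6, 6, 5, 7]   # heard component A
group_b = [4, 5, 4, 5, 5, 4, 6, 5, 4, 5]   # heard component B

# ~3.35 for these ratings; a large |t| suggests a shift not due to chance.
print(round(welch_t(group_a, group_b), 2))
```

In practice the t value would be compared against the appropriate t distribution (with Welch-corrected degrees of freedom) to get a significance level, and real panels would be far larger than ten per cell.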