Bob Cain

Ryan wrote:
Bob Cain wrote in message ...


The Ghost could address this in some detail if anyone could
get him to do something besides insult people. When he was
young he published with one of the pioneers in the field of
hearing research, someone who I believe got a Nobel Prize
for it.




The Ghost?


Unimportant. If you don't know of him, you certainly don't
want to.

1. Analyze real instrument sound files. These files should include
every possible way every classical instrument can be played, from the
traditional to the avant-garde. For the viols, for example: from
plain-Jane arco to Bartók's snapping strings to harmonics to different
bow pressures to playing behind the bridge to the tapping of fingers
on the body of the instrument. There should be files that represent
the instruments at all possible dynamic levels. There should be files
that feature the instruments playing in microtones if they can do so.
(Most classical instruments can.) Also, there should be analysis of
the instruments in "static form." By this I mean the part of the
sound after the initial attack, which can be looped over and over
again to give the impression the note is sustaining. This is done in
standard synthesis as well as in good sample libraries. It may take
quite a while to amass all these samples, but once collected, the
analysis of them only has to be done once.


And has yet to be done once. :-)
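The sustain-looping part, at least, is well-trodden ground. A crude sketch, assuming NumPy and omitting the crossfade a real sampler would apply at the loop point (the function name and numbers are mine, purely for illustration):

```python
import numpy as np

def sustain(samples, loop_start, loop_end, total_length):
    """Crude sustain-looping: keep the attack, then repeat the
    steady-state region until the note is total_length samples long.
    No crossfade, so a real sampler built this way would click
    audibly at each loop point."""
    out = list(samples[:loop_end])            # attack plus first pass
    loop = list(samples[loop_start:loop_end]) # steady-state region
    while len(out) < total_length:
        out.extend(loop)
    return np.array(out[:total_length])

# Stretch a short "note" to five times its original length.
note = np.sin(2 * np.pi * 440 * np.arange(2000) / 8000)
held = sustain(note, 500, 1500, 10_000)
```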

You aren't really defining an analysis, or even the features
you would like extracted and cataloged. "Every possible way
an instrument can be played" has no meaning until you very
specifically give it that. It is what my high school
writing teacher called a glittering generality. I'm sorry
if that is a bit brutal but so was she. :-)

How would the subjective characteristics that your brain is
so good at discerning be algorithmically characterized, and
what would be the form of the data the analysis produced?
You can't just describe it in subjective terms, because we
have yet to teach machines this level of subjective
classification and discernment. We are a _long_ way from that.

Don't just offer the term FFT. There is no new information
in an FFT, just a different view of it. What you are
imagining would undoubtedly employ transforms of some kind,
but which ones, and exactly how they could be used to get at
the far more complex information you want, is not even a
well-formulated problem, much less a solved one.
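"No new information" here is meant precisely: the transform is invertible, so the spectrum is just the same samples in a different basis, and nothing is gained or lost by taking it. A sketch, assuming NumPy:

```python
import numpy as np

# Any signal at all -- here, one second of a 440 Hz tone at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)

# Forward FFT: the same information, viewed as frequency coefficients.
spectrum = np.fft.rfft(signal)

# Inverse FFT: the original samples come back, to machine precision.
recovered = np.fft.irfft(spectrum, n=len(signal))
```

Everything interesting (which peaks matter, how they evolve, what they mean perceptually) happens *after* the transform, and that is the part nobody has specified.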

Imagine asking for a machine that could analyze and
categorize smiles. What you are asking for is far more
difficult and open-ended.


2. Deduce from these analyses the prime aspects of these sounds.


First you must very precisely characterize all of these
prime aspects via a, probably long, research program and
then figure out what processes must be applied to the data
to extract and classify them in those terms.

If we only have, say, ten frequencies to represent this sound, which
ones would be the most useful?


That particular "if" has no real connection to reality.

Or would some other type of info
about the file be more important than its frequencies?


Good question. Now you are getting to the heart of the matter.

So now we have
a set of data instead of just a pcm sound file.


Not quite yet we don't.

We can call these
data sets, "fingerprints."


What would be in these data sets?
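For the sake of argument, here is one minimal, hypothetical answer: a "fingerprint" could be the N strongest spectral peaks of a sample. The function name and the choice of N are mine, not anything established; a sketch assuming NumPy:

```python
import numpy as np

def fingerprint(samples, sr, n_peaks=10):
    """Hypothetical 'fingerprint': the n_peaks strongest frequency
    components of a sample, as (frequency_hz, magnitude) pairs,
    strongest first."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sr)
    strongest = np.argsort(spectrum)[-n_peaks:][::-1]
    return [(freqs[i], spectrum[i]) for i in strongest]

# A toy "instrument": a fundamental plus two weaker harmonics.
sr = 8000
t = np.arange(sr) / sr
tone = (np.sin(2 * np.pi * 220 * t)
        + 0.5 * np.sin(2 * np.pi * 440 * t)
        + 0.25 * np.sin(2 * np.pi * 660 * t))

fp = fingerprint(tone, sr)
```

Notice how much this throws away: attack, decay, vibrato, noise components, everything that changes over time. That is the gap between ten numbers and a timbre.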

This is mainly to help speed up the math
performed later during step 4, though it will compromise the accuracy
of the final product.


What math?

Ideally, the user should be able to select the
amount of data to be derived from the samples.


Cool.


3. Analyze any given sound file. These would be the "real world"
sounds. Or anything at all. In fact, I was thinking last night that
the ultimate test for this software would be to feed it, say,
Beethoven's 9th, and see how closely it could approximate it.


Approximate it with what?


4. Run a differential, or coefficient, analysis on the "real world"
sound file compared against all the "sound fingerprints" the program
created in step 2.


Each of your analyzed snippets would be a vector in a very
high dimensional parameter space. Once you defined that
space and a way to deduce all the coordinates in it for a
particular fingerprint, you could then determine the
corresponding vectors for your "real world" sounds. Problem
is that once the dimensions of a space get large enough, any
arbitrary vector in it will almost certainly be orthogonal
to any other. What this means is that they have about as
much in common as "left" and "wrong." Matching is poorly
defined in such situations.
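This concentration effect is easy to demonstrate with random vectors standing in for fingerprints; a sketch, assuming NumPy:

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two vectors:
    1 = same direction, 0 = orthogonal."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)

# In 3 dimensions, two random vectors can easily point in similar
# directions; in 10,000 dimensions they almost never do.
low = [abs(cosine(rng.standard_normal(3), rng.standard_normal(3)))
       for _ in range(200)]
high = [abs(cosine(rng.standard_normal(10_000),
                   rng.standard_normal(10_000)))
        for _ in range(200)]

# Typical |cosine| shrinks roughly like 1/sqrt(dimension), so in a
# high-dimensional parameter space, "closest match" means "least
# orthogonal stranger."
```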


5. Create a MIDI file. After the program has deduced what would be the
best combination of instruments, in which playing styles, at what
pitches and what dynamics, playing what kinds of rhythmic figures,
etc., the program would simply create a multiple-staff MIDI file with
all said info scored on it.


Yeah, simply.
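To be fair, the file-writing is the one genuinely simple step. A minimal sketch of a complete format-0 MIDI file holding a single note, written by hand so nothing depends on a library (the note number, velocity, and tick count are arbitrary choices of mine):

```python
import struct

def varlen(n):
    # MIDI variable-length quantity: 7 bits per byte,
    # high bit set on every byte except the last.
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(out))

def one_note_midi(note=60, velocity=96, ticks=480):
    """A complete format-0 MIDI file containing one quarter note
    (middle C), with `ticks` pulses per quarter note."""
    track = (bytes([0x00, 0x90, note, velocity])       # note on at t=0
             + varlen(ticks) + bytes([0x80, note, 0])  # note off
             + bytes([0x00, 0xFF, 0x2F, 0x00]))        # end of track
    header = b'MThd' + struct.pack('>IHHH', 6, 0, 1, ticks)
    return header + b'MTrk' + struct.pack('>I', len(track)) + track

data = one_note_midi()
```

Thirty-five bytes, and a sequencer will play it. It is everything *before* this step -- deducing which notes, on which instruments, in which styles -- that hides the unsolved problem.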

Viola!


What, you want to do all this synthesis with a single
instrument? :-)


Bob
--

"Things should be described as simply as possible, but no
simpler."

A. Einstein