#1
Fourier Analyses, or, how the Orchestra learned to play "Jet Engine"
I'm looking to find out more about writing some software that will use traditional classical instruments to emulate "natural" or "non-musical" sounds. The software will perform some type of analysis on an audio file. I imagine an FFT would be used at some point, but the problem with the FFT is that it only tells you what "perfect" or pure sine-wave frequencies are present in a sound. Besides the flute, not much else in an orchestra has anything close to a sine-wave output.

After this analysis is done, the software will look through a library of sounds made by traditional instruments. These sounds will include every noise and playing style every traditional instrument can produce. The software will then juggle the sounds around at various dynamic levels, in various rhythms, and so on, until it comes up with the closest combination to the original sound. Perhaps a car-engine sound file would yield three double basses, a flute or two in very quiet irregular rhythms, and maybe a horn during gear changes.

I might not have to tell you that Gyorgy Ligeti's "Atmospheres" and his "mechanical music" served as the chief inspiration for this idea. Has anybody ever heard of anything like this, or know where I might start to look for info on this subject? I'm not looking for programming help, but rather help with setting up the math. Are there any scientific communities online that I could point my questions to? Any books on this type of thing? I've heard Csound might work for this, but I thought Csound was for composing, not for analyzing existing sound files. I can't seem to come up with the right keywords to get anything out of Google, but I hoped someone here might be able to put me on the right path.
#3
Ryan wrote:

> I'm looking to find out more about writing some software that will use
> traditional classical instruments to emulate "natural" or "non-musical"
> sounds. The software will perform some type of analysis on an audio file.
> I imagine an FFT would be used at some point, but the problem with the
> FFT is that it only tells you what "perfect" or pure sine-wave
> frequencies are present in a sound.

No. ANY arbitrary waveform can be decomposed into sine waves. When you put the sines back together, you can reconstitute the original wave. This is the WHOLE POINT of the Fourier series. The time-domain and frequency-domain representations of the waveform are equivalent, and you can convert from one to the other and back with impunity.

> Besides the flute, not much else in an orchestra has anything close to a
> sine-wave output. After this analysis is done, the software will look
> through a library of sounds made by traditional instruments. These
> sounds will include every noise and playing style every traditional
> instrument can produce. The software will then juggle the sounds around
> at various dynamic levels, in various rhythms, and so on, until it comes
> up with the closest combination to the original sound.

Why use a computer for this anyway? George Gershwin did a perfectly good job of this by ear.
--scott
--
"C'est un Nagra. C'est suisse, et tres, tres precis."
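[Editor's note: Scott's claim, that any waveform can be taken apart into sines and reassembled exactly, is easy to verify numerically. A small numpy sketch, not from the thread, purely as illustration:]

```python
import numpy as np

# A deliberately non-sinusoidal test signal: a ramp plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 1024) + 0.1 * rng.standard_normal(1024)

# Forward FFT: decompose the waveform into sinusoid coefficients.
X = np.fft.rfft(x)

# Inverse FFT: sum the sinusoids back up.
x_back = np.fft.irfft(X, n=len(x))

# The round trip is exact to floating-point precision.
print(np.allclose(x, x_back))  # True
```

The decomposition works for any signal at all, which is Scott's point: "only pure sine waves" is not a limitation of the FFT.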
#4
On Tue, 12 Oct 2004 07:09:24 -0700, Karl Winkler wrote:
> [snip] What you are attempting to do with sounds reminds me of those
> posters where one large picture (say, of a person) is made up of hundreds
> of smaller pictures. The movie poster for "The Truman Show" starring Jim
> Carrey comes to mind. Perhaps some of the math or the code from that
> system may work for what you are doing.

There is a free program called "Soundmosaic" that does exactly this. It sorta works.

http://thalassocracy.org/soundmosaic/

(Some people here may appreciate the demo of a George Bush speech combined with a chimp screaming.)

And "Dissociated Studio" does the same kind of thing, but within a single audio file.

http://www.panix.com/~asl2/music/dissoc_studio/
#7
Ryan wrote:

> (Scott Dorsey) wrote in message ...
>
> Hi Scott. How have you been? Heard any more Sonic Youth of late? I'm
> listening to Toots and the Maytals as I type this...
>
>> No. ANY arbitrary waveform can be decomposed into sine waves. When you
>> put the sines back together, you can reconstitute the original wave.
>> This is the WHOLE POINT of the Fourier series. The time-domain and
>> frequency-domain representations of the waveform are equivalent, and
>> you can convert from one to the other and back with impunity.
>
> So what I have to do is perform an FFT on each of my sound "samples" (the
> squeak of a violin played behind the bridge, a viol's "dry string"
> sounds, regular arco, pizzicato, etc., all the other instruments), and
> then perform an FFT on any given sound file I'm interested in emulating.
> After that, what kind of math would be used to sort through all the
> samples and figure what goes best where?

I'm not sure this will really do what you want, but you can try it. You could just do a standard correlation coefficient and see how close they come. Then again, you could probably just do a correlation coefficient on the samples themselves. That might be fun to look at.

> Samplitude features an FFT analysis window. It just looks like a regular
> EQ analysis to me. Is it the case that if I take each frequency as a sine
> wave and apply it to the given amplitude that I will have achieved X's
> sound? Is there any way to simplify that? Even the simplest natural
> sounds have about a 5 kHz range. Do I have to create 5000 individual
> sine waves? The FFT graph only shows frequency over time. How do I find
> out about the relationships between the frequencies as far as timing?
> For example, say I put a sine wave at 2 kHz and 1 kHz. Obviously the
> 2 kHz oscillates twice as fast as the 1 kHz, but beyond that, the
> starting/ending points (where y=0) might not sync up. The 2 kHz sine may
> start, say, 300ths of a second after the 1 kHz. I don't think info like
> this can be found out by the FFT window, can it?

No, you probably want a tool like Matlab. How many terms you want to calculate out to depends on how good an approximation you want. I think that the number of terms that you're going to get is going to be larger than the number of samples in the original file for most arbitrary sounds. You can decide to reduce this by bandlimiting the original signal, though.
--scott
--
"C'est un Nagra. C'est suisse, et tres, tres precis."
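[Editor's note: Scott's correlation-coefficient suggestion can be sketched in a few lines of numpy. This compares loudness-normalized magnitude spectra, which sidesteps the phase-alignment worry Ryan raises above; the function names are mine, purely for illustration:]

```python
import numpy as np

def spectrum(signal):
    """Magnitude spectrum, normalized so the score ignores overall loudness."""
    mag = np.abs(np.fft.rfft(signal))
    return mag / np.linalg.norm(mag)

def spectral_similarity(a, b):
    """Pearson correlation between two magnitude spectra."""
    return float(np.corrcoef(spectrum(a), spectrum(b))[0, 1])

sr = 8000
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 440 * t)            # "sound to emulate"
candidate1 = np.sin(2 * np.pi * 440 * t + 1.0)  # same pitch, shifted phase
candidate2 = np.sin(2 * np.pi * 700 * t)        # wrong pitch

print(spectral_similarity(target, candidate1))  # near 1.0: phase doesn't matter
print(spectral_similarity(target, candidate2))  # much lower
```

Because the comparison is done on magnitude spectra, the phase offset in `candidate1` has no effect on the score, while the wrong pitch in `candidate2` does.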
#8
Symbolic Sound's KYMA does resynthesis in real-time.
Gotta buy the box, though . . .

Kurt Riemann
#10
Ryan wrote:

> Terms? As in how many instruments I want to end up with? Or by what
> specs I will measure the original sound file? If the latter, do you mean
> something like bit rate, sample rate, something else? Why would the
> number of terms be greater than the sample rate? Is Matlab an audio
> tool? Probably just a math program, right? So I would enter in PCM info
> and run the calculations and then use the output to create a PCM file?
> Sorry, so many questions.

Ryan, most of what you are asking about is well beyond the state of the art, the art being DSP. I would suggest that you go to comp.dsp and set forth what it is you want, to get more specific feedback about it.

Bob
--
"Things should be described as simply as possible, but no simpler." A. Einstein
#11
On Wed, 13 Oct 2004 17:01:59 -0700, Bob Cain wrote:

> Ryan wrote:
>
>> Terms? As in how many instruments I want to end up with? Or by what
>> specs I will measure the original sound file? If the latter, do you
>> mean something like bit rate, sample rate, something else? Why would
>> the number of terms be greater than the sample rate? Is Matlab an
>> audio tool? Probably just a math program, right? So I would enter in
>> PCM info and run the calculations and then use the output to create a
>> PCM file? Sorry, so many questions.
>
> Ryan, most of what you are asking about is well beyond the state of the
> art, the art being DSP.

I'm trying to follow the thoughts... it appears what he wants is a computer program that does with an orchestra what one does with a synthesizer to imitate the sound of a musical instrument ("imitative synthesis"). I suppose nowadays you could write a program that scans a digitized audio recording and makes a patch (or orchestral score) that somewhat crudely approximates the sound, but it could surely be tweaked by hand/ear to make it better, or perhaps a synthesist (person making a synth patch) would just start over and make something that sounds better/closer. I doubt that having it do a mathematical operation such as fit a least-squares match of the FFT would get anywhere near the "original sound" as would a person experienced in doing these things.

But to make "arbitrary sounds" with orchestral instruments... the only thing I've heard that's anything like this is on Peter Schickele's "Upper West Side," where he says something about hearing Vivaldi one more time. The strings play through the melody once, then they play the beat of the melody with hip-hop record-scratching sounds. It was hard to believe my ears. Is there a video? I'd like to SEE these string players reproducing this speed-up-and-slow-down record-scratching sound.

> I would suggest that you go to comp.dsp and set forth what it is you
> want, to get more specific feedback about it.

Like MIDI output of polyphonic audio input, this technology is not quite (actually nowhere near) ready for prime time.

-----
http://mindspring.com/~benbradley
#12
Ben Bradley wrote in message ...

> I'm trying to follow the thoughts... it appears what he wants is a
> computer program that does with an orchestra what one does with a
> synthesizer to imitate the sound of a musical instrument ("imitative
> synthesis"). I suppose nowadays you could write a program that scans a
> digitized audio recording and makes a patch (or orchestral score) that
> somewhat crudely approximates the sound, but it could surely be tweaked
> by hand/ear to make it better, or perhaps a synthesist (person making a
> synth patch) would just start over and make something that sounds
> better/closer. I doubt that having it do a mathematical operation such
> as fit a least-squares match of the FFT would get anywhere near the
> "original sound" as would a person experienced in doing these things.

Well, maybe; I don't really know. I'd be surprised if some type of math couldn't be rigged up that would do as good a job as a human. It's all analytical, and actually not too subjective. It will either sound like a jet engine or not, and since the computer will "know" what a jet engine sounds like thanks to the FFT and differential analysis, it seems to me this should be as easy as asking a computer to come up with a number that adds to 7 to make ten. It doesn't matter if it's in real time or not to me. It could take an hour to process a minute-long sound file for all I care. And once I get something together I can tweak it for better results, and it doesn't have to be perfect. Again, this will be mainly a learning tool.

> But to make "arbitrary sounds" with orchestral instruments... the only
> thing I've heard that's anything like this is on Peter Schickele's
> "Upper West Side," where he says something about hearing Vivaldi one
> more time. The strings play through the melody once, then they play the
> beat of the melody with hip-hop record-scratching sounds. It was hard to
> believe my ears. Is there a video? I'd like to SEE these string players
> reproducing this speed-up-and-slow-down record-scratching sound.

Hmmm, I've never heard this before. You're not talking about the musical "West Side Story," I gather. Anyway, my guess is that it's a simple dodecaphonic or maybe microtonal glissando performed with light enough pressure on the bow/strings to emit that rosiny, scratchy sound.

I keep bringing up this guy's name, but if you haven't listened to any Ligeti, you really owe it to yourself to. His "Atmospheres" and his "Harmonies (for organ)" are good starting points. His music often sounds like arbitrary sounds, and it's always produced with traditional instruments. "Harmonies" is especially interesting. The organ has to be rigged up to change the inner air pressure so as to play microtonally. The low-powered organ sounds like a giant whoosh of sound, or the kind of still wonder you might expect an astronaut to hear in his head. It mesmerizes and twinkles like distant stars, or complex microscopic schools of glowing plankton in the ocean at night. In fact, a small bit of "Atmospheres" was used in 2001: A Space Odyssey. A lot of his music takes you into the moment, stops your breath, and makes you question why no one else thought of it first. He does this partially by emulating real-world sound.

> I would suggest that you go to comp.dsp and set forth what it is you
> want, to get more specific feedback about it.

This is good advice.

> Like MIDI output of polyphonic audio input, this technology is not quite
> (actually nowhere near) ready for prime time.

If it was out and available on every supermarket endcap, I probably wouldn't want anything to do with it! ;-) This interests me because as far as I know, it isn't really done that much (the orchestration, not the software), and certainly not to the extent I want to take it to.
#13
Ryan wrote:

> I'd be surprised if some type of math couldn't be rigged up that would
> do as good a job as a human.

If the math is required to make the assumptions you make in the next few sentences, putting the calcs together is going to be tough.

> It's all analytical, and actually not too subjective. It will either
> sound like a jet engine or not, and since the computer will "know" what
> a jet engine sounds like thanks to the FFT and differential analysis, it
> seems to me this should be as easy as asking a computer to come up with
> a number that adds to 7 to make ten.

Do all oboes sound the same? All violins? All trumpets? _All jet engines_? "Not too subjective" goes into the grist mill when a creative mind chooses among available voicings for a given instrument.

--
ha
#14
I think one of the things you'll find, investigating these real-world sounds, is that most of them differ drastically from the sound made by most musical instruments in that they are inharmonic. In other words, musical instruments produce sound consisting mostly of a fundamental and harmonics at integer multiples of the fundamental frequency. Real-world noises, to a great extent, have mixtures of frequencies that aren't integer multiples of one another.

The implication of that, of course, is that in trying to score instruments to sound like real-world noises, you'll have to suppress their natural tendency to play with integer-multiple harmonic series. In other words, you'll need to force them to stop behaving like musical instruments. Thus, for example, the suggestion of the light-pressure bow producing extraneous, "non-musical" sounds in the Schickele recording. Contemporary composers have been doing things like this for a while, with varying degrees of success -- I think back to the string snaps in Bartok's Music for Strings, Percussion and Celesta, in effect making the fiddles into percussion instruments.

Interesting project, and quite a challenge.

Peace,
Paul
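[Editor's note: Paul's harmonic-versus-inharmonic distinction is easy to see numerically. A numpy sketch, not from the thread; the inharmonic partial frequencies below are arbitrary:]

```python
import numpy as np

sr = 16000
t = np.arange(sr) / sr

# A "musical" tone: fundamental at 220 Hz plus exact integer-multiple
# harmonics, each one quieter than the last.
harmonic = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))

# An "inharmonic" noise: partials at unrelated frequencies (bell/engine-like).
inharmonic = sum(np.sin(2 * np.pi * f * t) for f in [220.0, 563.7, 991.3, 1402.9])

def peak_freqs(x, count=5):
    """Frequencies of the strongest spectral peaks, in Hz (1 Hz bins here)."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    return sorted(freqs[np.argsort(mag)[-count:]])

print(peak_freqs(harmonic))       # 220, 440, 660, 880, 1100: integer multiples
print(peak_freqs(inharmonic, 4))  # peaks near 220, 564, 991, 1403: no common series
```

The first spectrum is what an instrument playing a note looks like; the second is what Paul is describing, and there is no way to build it out of one harmonic series.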
#15
Ryan wrote:

> Well, maybe; I don't really know. I'd be surprised if some type of math
> couldn't be rigged up that would do as good a job as a human.

Prepare, then, to be surprised. Our mechanisms for feature extraction and interpretation remain largely a mystery. The process is highly algorithmic, and that is very different from mathematical, although math can be employed in some algorithmic processes.

> It's all analytical, and actually not too subjective. It will either
> sound like a jet engine or not, and since the computer will "know" what
> a jet engine sounds like thanks to the FFT and differential analysis, it
> seems to me this should be as easy as asking a computer to come up with
> a number that adds to 7 to make ten.

An FFT doesn't begin to disclose what you are looking for in and of itself. It's no more than a view of the same data with a different independent axis. It contains no information at all about when things happen.

In any event, the ear-brain does not do a Fourier analysis. There are frequency-dependent mechanisms, but they are totally ad hoc in terms of what nature found most useful for subsequent analysis. In a very real sense you are asking for an artificial ear all the way through to the process of blind separation. That problem remains a curiosity that researchers are merely nibbling the edges of. You might want to Google "blind separation" to see how much your problem involves that and how little progress has been made.

Bob
--
"Things should be described as simply as possible, but no simpler." A. Einstein
#16
Bob Cain wrote:

> An FFT doesn't begin to disclose what you are looking for in and of
> itself. It's no more than a view of the same data with a different
> independent axis. It contains no information at all about when things
> happen.

Or why things happen.

--
ha
#18
Bob Cain wrote in message ...

> Ryan wrote:
>
>> Well, maybe; I don't really know. I'd be surprised if some type of math
>> couldn't be rigged up that would do as good a job as a human.
>
> Prepare, then, to be surprised. Our mechanisms for feature extraction
> and interpretation remain largely a mystery. The process is highly
> algorithmic, and that is very different from mathematical, although math
> can be employed in some algorithmic processes.
>
>> It's all analytical, and actually not too subjective. It will either
>> sound like a jet engine or not, and since the computer will "know" what
>> a jet engine sounds like thanks to the FFT and differential analysis,
>> it seems to me this should be as easy as asking a computer to come up
>> with a number that adds to 7 to make ten.
>
> An FFT doesn't begin to disclose what you are looking for in and of
> itself. It's no more than a view of the same data with a different
> independent axis. It contains no information at all about when things
> happen.

Is there any kind of analysis that does? I used FFT because that's the only one I've really ever heard of. What if I perform a different FFT for every second of the sound file?

> In any event, the ear-brain does not do a Fourier analysis. There are
> frequency-dependent mechanisms, but they are totally ad hoc in terms of
> what nature found most useful for subsequent analysis.

Was this a typo? I hope this doesn't offend, but every site I've looked at about this says that indeed our ears do function as FFT devices. If this is incorrect, I'd very much like to know the truth about the matter.

> In a very real sense you are asking for an artificial ear all the way
> through to the process of blind separation. That problem remains a
> curiosity that researchers are merely nibbling the edges of. You might
> want to Google "blind separation" to see how much your problem involves
> that and how little progress has been made.
>
> Bob

Is this what I'm asking for? I really don't know myself.

It seems to me FFT would work ideally if the only instruments I wanted to score for were flutes. Flutes have an almost perfect sine-wave output. And since the FFT is a breakdown of the sound into sine waves, I'd think this would work quite well, except of course for the limited bass range of the flute family. No? Regardless, thanks for giving me some new info to go on.
#19
"Paul Stamler" wrote in message ...
I think one of the things you'll find, investigating these real-world sounds, is that most of them differ drastically from the sound made by most musical instruments in that they are inharmonic; in other words, musical instruments produce sound consisting mostly of a fundamental and harmonics, at integer multiples of the fundamental frequency. Real-world noises, to a great extent, have mixtures of frequencies that aren't integer multiples of one another. This is something I've always wondered about. I thought everything obeyed the 1st harmonic, 2cnd harmonic, etc., rules. Is it possible for a sound to have no overtones? I thought that even computer generated sounds that have no harmonics on screen, produce them automatically when they come out of the speaker. I thought the harmonic series was just part of the physics of sound. Yes, real world sounds often contain dissonant and un related intervals, but if we broke down the overall sound to a set of sounds, wouldn't these sounds in themselves produce the natural overtones? The implication of that, of course, is that in trying to score instruments to sound like real-world noises, you'll have to suppress their natural tendency to play with integer-multiple harmonic series. In other words, you'll need to force them to stop behaving like musical instruments. How about microtones? I imagine the sound of an F#+ coming out of an oboe would create some funny interactions with the harmonics. But I could be wrong. Thus, for example, the suggestion of the light-pressure bow producing extraneous, "non-musical" sounds in the Schickele recording. Contemporary composers have been doing things like this for a while, with varying degrees of success -- I think back to the string snaps in Bartok's Music for Strings, Percussion and Celesta, in effect making the fiddles into percussion instruments. Interesting project, and quite a challenge. Peace, Paul |
#20
Ryan wrote:

> Well, I'm just starting to get my hands around this. I think I may be
> suffering from "don't know how to ask the right questions" syndrome.
> Just to clarify a bit: it is certainly true that no two oboes sound the
> same; in fact the very same oboe can sound different from day to day or
> from climate to climate. I think we could approximate the sound of a
> bassoon, and since this is only a learning tool, not intended to produce
> a perfect final product, that would be good enough. On the other hand,
> for this problem, there is only one sound of a jet engine, and that
> sound would be whatever sound file I choose to feed to the software.
> Although both sounds will have to be analyzed to produce the desired
> effect, the file I seek to emulate, "the jet engine sound," will never
> have to suffer from approximation. That's what I meant by "the computer
> will know" what a jet engine sounds like.

You've got me confused now. What is it that you are wanting to do that is different from a sampler?

Bob
--
"Things should be described as simply as possible, but no simpler." A. Einstein
#21
Ryan wrote:

>> An FFT doesn't begin to disclose what you are looking for in and of
>> itself. It's no more than a view of the same data with a different
>> independent axis. It contains no information at all about when things
>> happen.
>
> Is there any kind of analysis that does? I used FFT because that's the
> only one I've really ever heard of. What if I perform a different FFT
> for every second of the sound file?

Very good! You've just described the STFT, the short-time Fourier transform. It does give information about when things happen, with no greater resolution than the length of the FFT. The FFTs can be overlapped for better resolution. There is also the variety of wavelet transforms, which allow you to trade off resolution in frequency against resolution in time according to a principle similar to Heisenberg's. They are tricky to use. The question remains to be answered in some detail: what information do you want to obtain?

>> In any event, the ear-brain does not do a Fourier analysis. There are
>> frequency-dependent mechanisms, but they are totally ad hoc in terms of
>> what nature found most useful for subsequent analysis.
>
> Was this a typo? I hope this doesn't offend, but every site I've looked
> at about this says that indeed our ears do function as FFT devices. If
> this is incorrect, I'd very much like to know the truth about the
> matter.

Nope. No offense taken. There is a _big_ difference between an FT and an ad hoc and idiosyncratic feature-extraction mechanism that uses a very complicated organic filter as part of its discrimination. The FT has a precise mathematical formulation involving inner products with sine and cosine signals at a precise set of frequencies. The ear just doesn't do that. There is a gross similarity, but that's about all. The Ghost could address this in some detail if anyone could get him to do something besides insult people. When he was young he published with one of the pioneers in the field of hearing research, someone who I believe got a Nobel Prize for it.
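[Editor's note: the STFT Bob describes is only a few lines of numpy: slide a window along the file and FFT each frame. A sketch, not from the thread:]

```python
import numpy as np

def stft(x, frame=1024, hop=512):
    """Short-time Fourier transform: FFTs of overlapping windowed frames.
    Returns magnitudes with shape (num_frames, frame // 2 + 1)."""
    window = np.hanning(frame)
    frames = [x[i:i + frame] * window
              for i in range(0, len(x) - frame + 1, hop)]
    return np.abs(np.array([np.fft.rfft(f) for f in frames]))

sr = 8000
t = np.arange(sr) / sr
# 1 kHz for the first half-second, then 2 kHz. A single FFT of the whole
# file can't say *when* each tone occurs; the STFT can.
x = np.where(t < 0.5, np.sin(2 * np.pi * 1000 * t), np.sin(2 * np.pi * 2000 * t))

S = stft(x)
freqs = np.fft.rfftfreq(1024, 1 / sr)
print(freqs[S[0].argmax()])   # 1000.0: the early frames see the 1 kHz tone
print(freqs[S[-1].argmax()])  # 2000.0: the late frames see the 2 kHz tone
```

The frame length is the time/frequency trade-off Bob mentions: longer frames give finer frequency bins but blur the timing, and vice versa.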
> Is this what I'm asking for? I really don't know myself.

I'm having trouble figuring that out exactly too. :-) In case you've received any new information that might help you frame it better, would you care to try again? Refinement to specs from vague ideas is not an uncommon process in the user/marketing/engineering cycle.

Bob
--
"Things should be described as simply as possible, but no simpler." A. Einstein
#22
"Ryan" wrote in message om... "Paul Stamler" wrote in message ... I think one of the things you'll find, investigating these real-world sounds, is that most of them differ drastically from the sound made by most musical instruments in that they are inharmonic; in other words, musical instruments produce sound consisting mostly of a fundamental and harmonics, at integer multiples of the fundamental frequency. Real-world noises, to a great extent, have mixtures of frequencies that aren't integer multiples of one another. This is something I've always wondered about. I thought everything obeyed the 1st harmonic, 2cnd harmonic, etc., rules. Is it possible for a sound to have no overtones? I thought that even computer generated sounds that have no harmonics on screen, produce them automatically when they come out of the speaker. I thought the harmonic series was just part of the physics of sound. Yes, real world sounds often contain dissonant and un related intervals, but if we broke down the overall sound to a set of sounds, wouldn't these sounds in themselves produce the natural overtones? Not necessarily. Many noises contain a mixture of frequencies not at all harmonically related. For that matter, sometimes even musical instruments produce a sound that isn't perfectly harmonic -- in other words, the harmonics aren't exact integer multiples. One of my guitars at the moment needs its strings changed; they're no longer perfectly cylindrical (there are dents in the windings where they go over the frets), and the harmonics aren't quite perfect multiples of the fundamental anymore. Which is why it sounds like crap, of course, and will continue to do so until I get off my duff and change the strings. Another example: play a guitar through a fuzzbox or an amplifier craniked up enough to distort. Play two strings and, along with the harmonic series of each individual note, you'll get a whole raft of intermodulation products not part of that harmonic series at all. 
That's the fuzz. Anyway, back to non-musical noises. I remember having to clean up a recording made in a room with a large HVAC blower outside. It had a lot of different frequencies in it, most of them not related to each other by any simple ratios. Along with that was a heap of white noise. No, not everything obeys the harmonic rules. Peace, Paul |
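[Editor's note: Paul's fuzzbox example can be simulated directly: push two tones through a soft clipper and new frequencies appear that belong to neither note's harmonic series. A numpy sketch, with tanh standing in for an overdriven amp:]

```python
import numpy as np

sr = 16000
t = np.arange(sr) / sr

# Two notes roughly a major third apart: 440 Hz and 554 Hz.
clean = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 554 * t)

# Soft clipping, a crude stand-in for a fuzzbox or cranked amplifier.
fuzz = np.tanh(3.0 * clean)

def level_at(x, f):
    """FFT magnitude at frequency f (1 Hz bins, since the file is 1 s long)."""
    return np.abs(np.fft.rfft(x))[int(round(f))]

# Third-order intermodulation product at 2*440 - 554 = 326 Hz:
print(level_at(clean, 326))  # essentially zero before distortion
print(level_at(fuzz, 326))   # clearly nonzero: a tone neither string played
```

326 Hz is not a harmonic of either 440 or 554, yet the distorted signal contains it: that raft of intermodulation products is exactly what Paul calls "the fuzz."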
#23
Bob Cain wrote in message ...
> The Ghost could address this in some detail if anyone could get him to
> do something besides insult people. When he was young he published with
> one of the pioneers in the field of hearing research, someone who I
> believe got a Nobel Prize for it.

The Ghost?

>> Is this what I'm asking for? I really don't know myself.
>
> I'm having trouble figuring that out exactly too. :-) In case you've
> received any new information that might help you frame it better, would
> you care to try again? Refinement to specs from vague ideas is not an
> uncommon process in the user/marketing/engineering cycle.

Well, I think you had the right idea the first time, before I attempted to be more concise and confused you. I will jot out a basic algorithm for the software:

1. Analyze real instrument sound files. These files should include every possible way every classical instrument can be played, from the traditional to the avant-garde. For the viols, for example: from plain-Jane arco to Bartok's snapping strings to harmonics to different bow pressures to playing behind the bridge to the tapping of fingers on the body of the instrument. There should be files that represent the instruments at all possible dynamic levels. There should be files that feature each instrument playing in microtones, if it can do so. (Most classical instruments can.) Also, there should be analysis of the instruments in "static form." By this I mean the part of the sound after the initial attack, which can be looped over and over again to give the impression the note is sustaining. This is done in standard synthesis as well as in good sample libraries. It may take quite a while to amass all these samples, but once they're collected, the analysis of them only has to be done once.

2. Deduce from these analyses the prime aspects of these sounds. If we only have, say, ten frequencies to represent a sound, which ones would be the most useful? Or would some other type of info about the file be more important than its frequencies? So now we have a set of data instead of just a PCM sound file. We can call these data sets "fingerprints." This is mainly to help speed up the math performed later during step 4, though it will compromise the accuracy of the final product. Ideally, the user should be able to select the amount of data to be derived from the samples.

3. Analyze any given sound file. These would be the "real-world" sounds. Or anything at all. In fact, I was thinking last night that the ultimate test for this software would be to feed it, say, Beethoven's 9th, and see how close it could approximate it.

4. Run a differential, or coefficient, analysis on the "real-world" sound file compared to all the "sound fingerprints" the program created in step 2.

5. Create a MIDI file. After the program has deduced what would be the best combination of instruments, in which playing styles, at what pitches and dynamics, playing what kind of rhythmic figures, etc., the program would simply create a multiple-staff MIDI file with all said info scored on it. Viola!
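[Editor's note: steps 2 and 4 of Ryan's outline could be prototyped crudely as below. This is only a sketch under big assumptions: the "fingerprint" here is just a pooled, loudness-normalized magnitude spectrum, the "library" uses sine-wave stand-ins rather than real instrument recordings, and matching is a plain dot product. Every name in it is hypothetical:]

```python
import numpy as np

def fingerprint(signal, size=256):
    """Step 2: reduce a sample to a coarse, loudness-normalized spectral shape."""
    mag = np.abs(np.fft.rfft(signal))
    # Pool the spectrum into `size` bands so the fingerprint stays small.
    bands = np.array_split(mag, size)
    fp = np.array([b.mean() for b in bands])
    return fp / (np.linalg.norm(fp) + 1e-12)

def best_match(target, library):
    """Step 4: rank library entries by similarity to the target sound."""
    tfp = fingerprint(target)
    scores = {name: float(np.dot(tfp, fingerprint(s)))
              for name, s in library.items()}
    return max(scores, key=scores.get), scores

# Toy "instrument library" (sine stand-ins for real recordings).
sr = 8000
t = np.arange(sr) / sr
library = {
    "flute_a4": np.sin(2 * np.pi * 440 * t),
    "bass_e2": np.sin(2 * np.pi * 82 * t) + 0.5 * np.sin(2 * np.pi * 164 * t),
}
target = 0.3 * np.sin(2 * np.pi * 440 * t)  # quiet 440 Hz "real-world" sound

name, scores = best_match(target, library)
print(name)  # flute_a4
```

The real problem is of course vastly harder (timing, combinations of instruments, step 5's score generation), but this is the shape of the fingerprint-and-compare loop the outline describes.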
#24
On Fri, 15 Oct 2004 22:20:06 -0700, Ryan wrote:
Bob Cain wrote in message ... The Ghost could address this in some detail if anyone could get him to do something besides insult people. When he was young he published with one of the pioneers in the field of hearing research, someone who I believe got a Nobel Prize for it. The Ghost? Is this what I'm asking for? I really don't know myself. I'm having trouble figuring that out exactly too. :-) In case you've received any new information that might help you frame it better, would you care to try again? Refinement to specs from vague ideas is not an uncommon process in the user/marketing/engineering cyclic process. Well, I think you had the right idea the first time, before I attempted to be more conscise and confused you. I will jot out a basic algorithm for the softwa 1. Analyze real instrument sound files. These files should inculde every possible way every classical instrument can be played, from the traditional to the avant garde. For the viols for example, from plain jane arco to bartok's snapping strings to harmonics to different bow pressures to playing behind the bridge to the tapping of fingers on the body of the instruments. There should be files that represent the instruments at all possible dynamic levels. There should be files that feature the instruments playing in micotones if it can do so. (Most classical instruments can.) Also, there should be analysis of the instruments in "static form." By this I mean the part of the sound after the intial attack, which can be looped over and over again to give the impression the note is sustaining. This is done in standard synthesis as well as good sample libraries. It may take quite awhile to amass all these samples, but once collected the analysis of them only has to be done once. Why not use mathametical models of the instruments? 
I would imagine the amount of samples required to cover all the sounds a violin could make would be impossible (think of playing a false harmonic on all the strings of a violin at every position, and with every bowing style). With a model, you have defined the 'prime aspects of these sounds' in a very flexible way. The computer could adjust the way the model is 'played' to find the best fit to the sound you wish to analyse. This would perhaps get nearer to fulfilling the interesting idea in your original post: "Perhaps a car engine sound file would yield three Double Basses, a flute or two in very quiet irregular rhythms, and maybe a horn would be involved during gear changes." The computer could go through every single possible sound a violin could make by iterating through all possible bow positions/angles/velocities, finger positions etc. until it found the combination that would most closely approximate the sound you want to analyse.

If I were to pursue this, I would brutally simplify things to start with. For example, make some simple rules for an experiment... All music is played on a single instrument model that creates perfect sine waves. Each note this instrument makes has a fixed decay to silence over a period of one second. The only variables this instrument has are how loud each note is, and its pitch, fixed to a chromatic scale. The only limitations the 'player' of this instrument has are that it can play twenty notes per second, and as many as ten notes at once. Then, take the sound file to be analysed and, every 20th of a second, try each of the limited range of sounds this instrument can create until you find the one that correlates most closely. (Literally by FFT correlation?) Once that is done, you should have a performance on a very simple instrument that has some relation to the file you wish to analyse.

Then, the model could be made slightly more complex, i.e. this instrument is an ideal Karplus-Strong string with a simple frequency-dependent loss filter.
It has the properties of the length of the string, where the string is struck, and the amount of energy imparted. It is monophonic, and can change pitch at a limited rate.

The disadvantages of this way of working would be: Iterating through each sound a model could create would be *very* time consuming once the models became more realistic. It's very hard to create good physical models of real instruments.

The advantages would be: It might actually work. Or at least provide a way to begin attacking this interesting but extraordinarily difficult task. The model does not just define a fixed set of sounds (samples) an instrument can create, but also defines the limitations in how that instrument can be played. I think that you would have to create a model of the limitations of the player as well as the instrument anyway if you were using samples. This would be very difficult if the computer does not 'understand' the instrument like a physical model, as you would have to create a large amount of rules by hand for each sample.

2. Deduce from these analyses the prime aspects of these sounds. If we only have, say, ten frequencies to represent this sound, which ones would be the most useful? Or would some other type of info about the file be more important than its frequencies? So now we have a set of data instead of just a pcm sound file. We can call these data sets "fingerprints." This is mainly to help speed up the math performed later during step 4, though it will compromise the accuracy of the final product. Ideally, the user should be able to select the amount of data to be derived from the samples.

3. Analyze any given sound file. These would be the "real world" sounds. Or anything at all. In fact, I was thinking last night that the ultimate test for this software would be to feed it, say, Beethoven's 9th, and see how close it could approximate it.

4.
Run a differential or correlation-coefficient comparison of the "real world" sound file against all the "sound fingerprints" the program created in step 2.

5. Create MIDI file. After the program has deduced what would be the best combination of instruments, in which playing styles, at what pitches and what dynamics, playing what kind of rhythmic figures, etc., the program would simply create a multiple-staff MIDI file with all said info scored on it. Viola!
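[Editor's note: philicorda's "brutally simplified" experiment above is concrete enough to sketch in code: a sine-wave instrument on a chromatic scale, matched greedily frame by frame against a target signal. Everything here (sample rate, note range, the correlation measure) is an illustrative assumption, not something from the thread.]

```python
import numpy as np

SR = 8000          # sample rate in Hz (illustrative choice)
FRAME = SR // 20   # one 20th of a second per frame, as proposed above

def chromatic_freqs(low_midi=48, high_midi=72):
    """Chromatic-scale frequencies (Hz) over an arbitrary two-octave range."""
    return [440.0 * 2 ** ((m - 69) / 12) for m in range(low_midi, high_midi + 1)]

def note_frame(freq, n=FRAME, sr=SR):
    """One frame of the toy sine 'instrument' playing a given pitch."""
    t = np.arange(n) / sr
    return np.sin(2 * np.pi * freq * t)

def best_note(target, freqs):
    """Pick the pitch whose sine correlates most strongly with the frame."""
    scores = [abs(float(np.dot(target, note_frame(f)))) for f in freqs]
    return freqs[int(np.argmax(scores))]

def transcribe(signal, freqs):
    """Greedy frame-by-frame 'transcription' onto the toy instrument."""
    n_frames = len(signal) // FRAME
    return [best_note(signal[i * FRAME:(i + 1) * FRAME], freqs)
            for i in range(n_frames)]

# A target the toy instrument can actually play: A440, then the C above it.
freqs = chromatic_freqs()
sig = np.concatenate([note_frame(440.0, FRAME * 2),
                      note_frame(523.2511306011972, FRAME * 2)])
score = transcribe(sig, freqs)
```

On a real-world recording the frames would rarely correlate this cleanly, which is where the thread's doubts about the whole scheme come in, but the loop above is the shape of the search.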
#25
Ryan wrote: Bob Cain wrote in message ...

The Ghost could address this in some detail if anyone could get him to do something besides insult people. When he was young he published with one of the pioneers in the field of hearing research, someone who I believe got a Nobel Prize for it.

The Ghost?

Unimportant. If you don't know of him, you certainly don't want to.

1. Analyze real instrument sound files. These files should include every possible way every classical instrument can be played, from the traditional to the avant garde. For the viols, for example: from plain-Jane arco to Bartok's snapping strings to harmonics to different bow pressures to playing behind the bridge to the tapping of fingers on the body of the instruments. There should be files that represent the instruments at all possible dynamic levels. There should be files that feature the instruments playing in microtones if they can do so. (Most classical instruments can.) Also, there should be analysis of the instruments in "static form." By this I mean the part of the sound after the initial attack, which can be looped over and over again to give the impression the note is sustaining. This is done in standard synthesis as well as good sample libraries. It may take quite a while to amass all these samples, but once collected, the analysis of them only has to be done once.

And has yet to be done once. :-) You aren't really defining an analysis, or even the features you would like extracted and cataloged. "Every possible way an instrument can be played" has no meaning until you very specifically give it that. It is what my high school writing teacher called a glittering generality. I'm sorry if that is a bit brutal but so was she. :-) How would the subjective characteristics that your brain is very good at discerning be algorithmically characterized, and what would be the form of the data the analysis produced?
You can't just describe it in subjective terms because we have yet to teach machines this level of subjective classification and discernment. We are a _long_ way from that. Don't just offer the term FFT. There is no new information in an FFT, just a different view of it. What you are imagining would employ transforms of some kind, undoubtedly, but which ones and exactly how they could be used to get at the far more complex information you want is not even a well formulated problem, much less a solved one. Imagine asking for a machine that could analyze and categorize smiles. What you are asking is far more difficult and open ended.

2. Deduce from these analyses the prime aspects of these sounds.

First you must very precisely characterize all of these prime aspects via a (probably long) research program and then figure out what processes must be applied to the data to extract and classify them in those terms.

If we only have, say, ten frequencies to represent this sound, which ones would be the most useful?

That particular "if" has no real connection to reality.

Or would some other type of info about the file be more important than its frequencies?

Good question. Now you are getting to the heart of the matter.

So now we have a set of data instead of just a pcm sound file.

Not quite yet we don't.

We can call these data sets "fingerprints."

What would be in these data sets?

This is mainly to help speed up the math performed later during step 4, though it will compromise the accuracy of the final product.

What math?

Ideally, the user should be able to select the amount of data to be derived from the samples.

Cool.

3. Analyze any given sound file. These would be the "real world" sounds. Or anything at all. In fact, I was thinking last night that the ultimate test for this software would be to feed it, say, Beethoven's 9th, and see how close it could approximate it.

Approximate it with what?

4.
Run a differential or correlation-coefficient comparison of the "real world" sound file against all the "sound fingerprints" the program created in step 2.

Each of your analyzed snippets would be a vector in a very high dimensional parameter space. Once you defined that space and a way to deduce all the coordinates in it for a particular fingerprint, you could then determine the corresponding vectors for your "real world" sounds. Problem is that once the dimensions of a space get large enough, any arbitrary vector in it will almost certainly be orthogonal to any other. What this means is that they have about as much in common as "left" and "wrong." Matching is poorly defined in such situations.

5. Create MIDI file. After the program has deduced what would be the best combination of instruments, in which playing styles, at what pitches and what dynamics, playing what kind of rhythmic figures, etc., the program would simply create a multiple-staff MIDI file with all said info scored on it.

Yeah, simply.

Viola!

What, you want to do all this synthesis with a single instrument? :-)

Bob
--
"Things should be described as simply as possible, but no simpler." A. Einstein
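[Editor's note: Bob's point about near-orthogonality in high dimensions is easy to check numerically: the average |cosine| between two random directions shrinks roughly like 1/sqrt(dimension). A small illustrative sketch; the dimensions and trial count are arbitrary.]

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_cosine(dim, trials=200):
    """Average |cosine| between random pairs of directions in `dim` dimensions."""
    total = 0.0
    for _ in range(trials):
        a = rng.standard_normal(dim)
        b = rng.standard_normal(dim)
        total += abs(float(a @ b)) / (np.linalg.norm(a) * np.linalg.norm(b))
    return total / trials

# As the dimension grows, random vectors crowd toward mutual orthogonality,
# which is why "nearest fingerprint" becomes a poorly conditioned question.
for d in (3, 30, 3000):
    print(d, round(mean_abs_cosine(d), 3))
```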
#26
Well, I don't know Phil. Your idea sounded interesting at first, but
then towards the end you describe how hard it would be to use "realistic" models anyway, so you kind of defeat your own suggestion. Plus, this would be even more comp sci and math I'd have to learn. I do appreciate your ideas, however, and I thank you.

I was thinking maybe when I can afford it I would just spring for the Vienna Symphonic Library Orchestral Cube. It purports to provide samples of everything I want, recorded by world class players in anechoic chambers. It would be ideal if it wasn't for the three thousand dollar price tag.

Anyway, I'm starting to think maybe I should just do the work with my ear instead of my computer. Most of the posters here tend to think a software solution would be next to impossible. Might as well brush up on my ear training and spend the time using the right side of my brain instead of the left. Hell, from the looks of it I could spend three years figuring out this software, when it would probably only take me three days to do a rough-guess transcription. Maybe I'm finally figuring out how much harder it is to find a lazy way of doing things.

philicorda wrote in message .org...

Why not use mathematical models of the instruments? I would imagine the amount of samples required to cover all the sounds a violin could make would be impossible (think of playing a false harmonic on all the strings of a violin at every position, and with every bowing style). With a model, you have defined the 'prime aspects of these sounds' in a very flexible way. The computer could adjust the way the model is 'played' to find the best fit to the sound you wish to analyse. This would perhaps get nearer to fulfilling the interesting idea in your original post: "Perhaps a car engine sound file would yield three Double Basses, a flute or two in very quiet irregular rhythms, and maybe a horn would be involved during gear changes."
The computer could go through every single possible sound a violin could make by iterating through all possible bow positions/angles/velocities, finger positions etc. until it found the combination that would most closely approximate the sound you want to analyse.

If I were to pursue this, I would brutally simplify things to start with. For example, make some simple rules for an experiment... All music is played on a single instrument model that creates perfect sine waves. Each note this instrument makes has a fixed decay to silence over a period of one second. The only variables this instrument has are how loud each note is, and its pitch, fixed to a chromatic scale. The only limitations the 'player' of this instrument has are that it can play twenty notes per second, and as many as ten notes at once. Then, take the sound file to be analysed and, every 20th of a second, try each of the limited range of sounds this instrument can create until you find the one that correlates most closely. (Literally by FFT correlation?) Once that is done, you should have a performance on a very simple instrument that has some relation to the file you wish to analyse.

Then, the model could be made slightly more complex, i.e. this instrument is an ideal Karplus-Strong string with a simple frequency-dependent loss filter. It has the properties of the length of the string, where the string is struck, and the amount of energy imparted. It is monophonic, and can change pitch at a limited rate.

The disadvantages of this way of working would be: Iterating through each sound a model could create would be *very* time consuming once the models became more realistic. It's very hard to create good physical models of real instruments.

The advantages would be: It might actually work. Or at least provide a way to begin attacking this interesting but extraordinarily difficult task.
The model does not just define a fixed set of sounds (samples) an instrument can create, but also defines the limitations in how that instrument can be played. I think that you would have to create a model of the limitations of the player as well as the instrument anyway if you were using samples. This would be very difficult if the computer does not 'understand' the instrument like a physical model, as you would have to create a large amount of rules by hand for each sample.

2. Deduce from these analyses the prime aspects of these sounds. If we only have, say, ten frequencies to represent this sound, which ones would be the most useful? Or would some other type of info about the file be more important than its frequencies? So now we have a set of data instead of just a pcm sound file. We can call these data sets "fingerprints." This is mainly to help speed up the math performed later during step 4, though it will compromise the accuracy of the final product. Ideally, the user should be able to select the amount of data to be derived from the samples.

3. Analyze any given sound file. These would be the "real world" sounds. Or anything at all. In fact, I was thinking last night that the ultimate test for this software would be to feed it, say, Beethoven's 9th, and see how close it could approximate it.

4. Run a differential or correlation-coefficient comparison of the "real world" sound file against all the "sound fingerprints" the program created in step 2.

5. Create MIDI file. After the program has deduced what would be the best combination of instruments, in which playing styles, at what pitches and what dynamics, playing what kind of rhythmic figures, etc., the program would simply create a multiple-staff MIDI file with all said info scored on it. Viola!
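[Editor's note: the Karplus-Strong string mentioned in the quoted post is one of the few pieces of this that is genuinely easy to try. A minimal sketch follows; the sample rate, damping value, and random seed are arbitrary choices, not from the thread.]

```python
import numpy as np

def karplus_strong(freq, duration, sr=8000, damping=0.996):
    """Minimal Karplus-Strong pluck: a noise burst circulating in a delay
    line through a two-point average, so high frequencies decay fastest
    (the 'frequency dependent loss filter' the post mentions)."""
    n = int(sr * duration)
    delay = max(2, int(sr / freq))  # delay-line length sets the pitch
    buf = np.random.default_rng(1).uniform(-1.0, 1.0, delay)  # the 'pluck'
    out = np.empty(n)
    for i in range(n):
        out[i] = buf[i % delay]
        # average adjacent samples and damp: the string rings down over time
        buf[i % delay] = damping * 0.5 * (buf[i % delay] + buf[(i + 1) % delay])
    return out

tone = karplus_strong(220.0, 1.0)  # one second of a decaying 'string' near 220 Hz
```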
#27
On Sat, 16 Oct 2004 23:44:13 -0700, Ryan wrote:
Well, I don't know Phil. Your idea sounded interesting at first, but then towards the end you describe how hard it would be to use "realistic" models anyway, so you kind of defeat your own suggestion.

Absolutely. It would perhaps be a more ideal method, though it's far more complicated and messy. I wonder how well the most simple model would work? A computer's 'interpretation' with a simple string and player model would be interesting to hear, even though it may not bear much relationship to the original music. There are a number of programs out there that purport to do polyphonic pitch detection: http://www.music-notation.info/en/co...udio2midi.html But they rely on differentiating the different instruments by their range, rather than their harmonic content, and I have no idea how well the polyphonic pitch detection works. Perhaps combine the two approaches: the pitch detection they do, and your harmonic 'fingerprints' to identify the instruments?

Plus, this would be even more comp sci and math I'd have to learn. I do appreciate your ideas, however, and I thank you. I was thinking maybe when I can afford it I would just spring for the Vienna Symphonic Library Orchestral Cube. It purports to provide samples of everything I want, recorded by world class players in anechoic chambers. It would be ideal if it wasn't for the three thousand dollar price tag. Anyway, I'm starting to think maybe I should just do the work with my ear instead of my computer. Most of the posters here tend to think a software solution would be next to impossible. Might as well brush up on my ear training and spend the time using the right side of my brain instead of the left. Hell, from the looks of it I could spend three years figuring out this software, when it would probably only take me three days to do a rough-guess transcription. Maybe I'm finally figuring out how much harder it is to find a lazy way of doing things.

Laziness is the mother of invention.
#28
Bob Cain wrote in message ...
The Ghost could address this in some detail if anyone could get him to do something besides insult people.

Speak for yourself, you arrogant asshole. You have no knowledge of or appreciation for what I can address. Furthermore, based on the historical record, you couldn't care less. Four years ago, before I became aware that you were not a decent human being, I answered your questions because I knew something about the subject matter of your inquiry. Rather than being appreciative and thanking me for the information that I provided, you insulted me and started a feud that continues to this day.
#30
Bob Cain wrote in message ...
You aren't really defining an analysis, or even the features you would like extracted and cataloged. "Every possible way an instrument can be played" has no meaning until you very specifically give it that. It is what my high school writing teacher called a glittering generality. I'm sorry if that is a bit brutal but so was she. :-)

Yes, if I was writing for another audience I would have to address this more specifically. But you know what I'm getting at. I don't want to post a billion word technical rubric.

How would the subjective characteristics that your brain is very good at discerning be algorithmically characterized and what would be the form of the data the analysis produced? What would be in these data sets? What math? Approximate it with what?

Hell man, these are the questions I came looking for the answers to. You were supposed to answer these!

Viola! What, you want to do all this synthesis with a single instrument? :-)

lol How is it spelled? Voiola?
#31
Ryan wrote: Bob Cain wrote in message ...

You aren't really defining an analysis, or even the features you would like extracted and cataloged. "Every possible way an instrument can be played" has no meaning until you very specifically give it that. It is what my high school writing teacher called a glittering generality. I'm sorry if that is a bit brutal but so was she. :-)

Yes, if I was writing for another audience I would have to address this more specifically. But you know what I'm getting at. I don't want to post a billion word technical rubric.

:-) Aw, give it a shot.

How would the subjective characteristics that your brain is very good at discerning be algorithmically characterized and what would be the form of the data the analysis produced? What would be in these data sets? What math? Approximate it with what?

Hell man, these are the questions I came looking for the answers to. You were supposed to answer these!

I hope you understand that my intent was to point out that these aren't solved problems. There aren't even glimmers on the horizon. You are defining a musical AI with an awesome intelligence, processing capability and prodigious memory. If you were to take this to a prospective Ph.D. advisor as an area for a thesis, he'd look at you in amazement, shake his head and, if he was kind, try to help you find one little corner of it that might yield productive results if you tugged on it for a few years. There are people thinking and working on these kinds of problems but I don't know where they congregate.

Viola! What, you want to do all this synthesis with a single instrument? :-)

lol How is it spelled? Voiola?

:-) Voila!

Bob
--
"Things should be described as simply as possible, but no simpler." A. Einstein
#32
Ryan wrote:
Hell man, these are the questions I came looking for the answers to. You were supposed to answer these!

He's asking you the questions for which you must provide clear answers in order to approach your goal.

--
ha
#33
A phasmagorical creature posted:
Speak for yourself, you arrogant asshole.

Bob Cain speaks for Bob Cain. A ghost, on the other hand, doesn't dare unveil itself in the light of day, so no one knows for whom it attempts to speak.

--
ha
#34
Bob Cain wrote in message ...
Yes, if I was writing for another audience I would have to address this more specifically. But you know what I'm getting at. I don't want to post a billion word technical rubric.

:-) Aw, give it a shot.

You know, if I could be assured that what I want to do is feasible, I really would write something like this up. Till then, though, I'm a busy man and it seems like a huge waste of time if nothing could ever come from it.

How would the subjective characteristics that your brain is very good at discerning be algorithmically characterized and what would be the form of the data the analysis produced? What would be in these data sets? What math? Approximate it with what?

Hell man, these are the questions I came looking for the answers to. You were supposed to answer these!

I hope you understand that my intent was to point out that these aren't solved problems. There aren't even glimmers on the horizon. You are defining a musical AI with an awesome intelligence, processing capability and prodigious memory.

Yes. You are quite good at the Socratic method. I guess I just thought you knew these answers but wanted to see me "jump through some hoops" first, not maliciously of course. But if what you're saying is that the math, or system of maths, this would require hasn't even been "invented" yet, then that's an altogether different type of thing. Anyway, thanks for your time and input.
#35
Hi Ryan-

The sines and cosines that get used to build up a waveform in Fourier analysis are the "basis functions" of the Fourier transform. It is possible to decompose signals using many different types of bases. The Fourier basis (sines and cosines, harmonically related if the signal is of finite extent) has some nice mathematical properties that make the decomposition (and recomposition) simpler, mathematically, than it is with many other bases. But that simplicity doesn't make the Fourier basis "right" for all applications.

In your case, you want to use as "basis functions" the signals played by standard instruments. These are much more complicated than the sines and cosines in a Fourier basis. Besides the fact that the sustained waveform from an instrument playing a note has a non-sinusoidal shape, notes are transient (they start and stop in time) and also dynamic (their pitch, volume, and timbre vary in time, e.g., due to tremolo, vibrato, etc.). Although it is mathematically possible to represent signals with such dynamic, transient structure via a Fourier transform, I don't think a Fourier decomposition is well-suited to your problem.

One approach is to actually take samples of the instruments you'll use, playing all the notes available, and use them (with various durations) directly as your basis. This would be the most accurate approach, but the calculations you'd need to do to find the expansion coefficients (i.e., the score!) would probably be extremely difficult computationally, and probably not well-defined (the basis is likely neither complete nor orthogonal). You'd be doing something like additive synthesis, but with a much bigger basis than is usually used! Looking up some of the math associated with additive synthesis might provide you with some leads.

A possible option that has the potential to be more computationally tractable would be to use some kind of wavelet or other time-scale or time-frequency transform rather than a Fourier transform.
Very roughly speaking, you can think of such a transform as breaking up a signal into *localized* pulses, i.e., notes! That is, where a Fourier transform represents a signal as a sum of "eternal" sines and cosines of specific frequencies, a time-frequency transform breaks up the signal into separate parts that are localized both in frequency *and* time. You might be able to find some way to project a wavelet or other time-frequency transform of the sound you are interested in onto the transforms of sounds from the instruments you have available; this would give you the notes and volumes needed to most closely match the desired signal. This won't make any fundamental problems with the incompleteness or redundancy of your basis (choice of instruments & notes) go away, but use of such transforms might provide methods of approximation that make the problem more tractable computationally.

A google search on "wavelets" and "music" will probably get you started. This wavelet FAQ might also help: http://www.math.ucdavis.edu/~saito/c...avelet_faq.pdf

Here's a review article on time-frequency analysis of sounds from musical instruments---your basis functions, so to speak: http://epubs.siam.org/sam-bin/getfil...cles/38228.pdf

If you want to learn more about Fourier expansions from a musical point of view, see: http://ccrma.stanford.edu/~jos/mdft/

Here's a reference that turned up in my own quick googling using "time scale transform music" that may provide a starting point for thinking along these lines, if you can find a copy: Kronland-Martinet R., Grossmann A. "Application of time-frequency and time-scale methods to the analysis, synthesis and transformation of natural sounds." in "Representations of Musical Signals", C. Roads, G. De Poli, A. Picciali Eds, MIT Press, October 1990. Interlibrary loan may help you here!
A similar search using "time frequency transform music" turned up "Musical Transformations using the Modification of Time-Frequency Images" in a 1993 issue of *Computer Music Journal*: http://mitpress.mit.edu/catalog/item...d=6768&ttype=6

This is just from some quick googling and these are probably not the best or most recent references that may be relevant. Wavelet and time-frequency analysis is now very mature and there are entire textbooks and monographs on these topics. Good luck with this.

Peace,
Tom Loredo

--
To respond by email, replace "somewhere" with "astro" in the return address.
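[Editor's note: Tom's distinction between a global Fourier transform and a time-frequency transform can be illustrated with the simplest such transform, the short-time Fourier transform. The frame size, hop, and test frequencies below are arbitrary illustrative choices.]

```python
import numpy as np

def stft_mag(signal, win=256, hop=128):
    """Magnitude short-time Fourier transform: a Hann-windowed FFT per
    frame, so frequency content is localized in time, unlike one global FFT."""
    window = np.hanning(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        frames.append(np.abs(np.fft.rfft(signal[start:start + win] * window)))
    return np.array(frames)  # shape: (n_frames, win // 2 + 1)

# Two successive 'notes': 500 Hz then 1500 Hz, at an 8 kHz sample rate.
sr = 8000
t = np.arange(sr) / sr
sig = np.concatenate([np.sin(2 * np.pi * 500 * t),
                      np.sin(2 * np.pi * 1500 * t)])

mag = stft_mag(sig)
first_bin = int(np.argmax(mag[0]))   # dominant bin early in the signal
last_bin = int(np.argmax(mag[-1]))   # dominant bin late in the signal
# Bin spacing is sr/win = 31.25 Hz, so the peak moves from bin 16 to bin 48:
# the transform "sees" the note change, where a single global FFT would
# just report both frequencies as present somewhere.
```

A single global FFT of `sig` would show energy at both 500 Hz and 1500 Hz with no indication of which came first; the frame-by-frame view recovers that ordering, which is the whole point of time-frequency methods for transcription.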
#36
Ryan wrote:

Yes. You are quite good at the Socratic method. I guess I just thought you knew these answers but wanted to see me "jump through some hoops" first, not maliciously of course. But if what you're saying is that the math, or system of maths, this would require hasn't even been "invented" yet, then that's an altogether different type of thing.

It wasn't just to get you to jump through hoops, Ryan. I'm truly interested in how a musically creative mind would specify the problem in some detail. That's good input for the more academically oriented folks who are working and thinking at the computational level. The biggest problem with all of this that I see is how to specify in detail what's in the music that can be considered features worth thinking about extracting algorithmically. If a human can't get real down with that part then there is little hope of implementing anything useful. Granted, for the non-technically but strongly musically inclined it could be a very frustrating experience to see how difficult it is to reduce things that seem obvious to her to terms that have any hope of an implementation, but you gotta start somewhere.

Bob
--
"Things should be described as simply as possible, but no simpler." A. Einstein
#39
I'm looking to find out more about writing some software that will use
traditional classical instruments to emulate "natural" or "non musical sounds." The software will perform some type of analyses on an audio file, I imagine FFT would be used at some point, but the problem with FFT is that it only tells you what "perfect" or pure sine wave based frequencies are present in a sound. Besides the flute, not much else in an orchestra has anything close to a sine wave output. After this analysis is done, the software will look through a library of sounds made by traditional instruments. These sounds will include every noise and playing style every traditional instrument can produce. The software will then juggle the sounds around at various dynamic levels in various rhythms and etc until it comes up with the closest combination to the original sound. Perhaps a car engine sound file would yield three Double Basses, a flute or two in very quiet irregular rhythms, and maybe a horn would be involved during gear changes. (The Ghost) wrote in message . com... (Ryan) wrote in message . com... Well, you and I have no bad blood between us, Ghost. What's your take on this whole idea? I don't have time at this moment to backtrack and read the entire thread. So, if you have a specific question, please (re)state it in as concise terms as possible, and I will answer it if I feel that I am qualified to do so. If not, I will do my best to refer you to someone who can. |
#40
Tom Loredo wrote in message ...
Hi Ryan- The sines and cosines that get used to build up a waveform in Fourier analysis are the "basis functions" of the Fourier transform. It is possible to decompose signals using many different types of bases. The Fourier basis (sines and cosines, harmonically related if the signal is of finite extent) has some nice mathematical properties that make the decomposition (and recomposition) simpler, mathematically, than it is with many other bases. But that simplicity doesn't make the Fourier basis "right" for all applications...

Thank you for this copious amount of unsolicited information. It is already proving useful.