Here's the pitch: With regular ray tracing, rays of light are traced backward from a pixel of the camera, to an object and eventually to a light source (or lack thereof). If you can do that with light, why can't it be done with sound?
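To make the analogy concrete, here is a minimal sketch of what "tracing a ray of sound" could mean: follow a path from a listener to a source (directly, or bounced off a wall), then turn the path length into an arrival delay and a rough inverse-distance attenuation. The geometry, the speed of sound, and the 1/r falloff are the only physics here; a real acoustic tracer would handle many rays, many surfaces, and frequency-dependent absorption.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def direct_path(listener, source):
    """Length of the straight-line path between two 2D points."""
    return math.dist(listener, source)


def reflected_path(listener, source, wall_y):
    """Length of the path bouncing off a horizontal wall at y = wall_y,
    found by mirroring the source across the wall (image-source method)."""
    mirrored = (source[0], 2 * wall_y - source[1])
    return math.dist(listener, mirrored)


def ray_contribution(path_length):
    """Delay (seconds) and amplitude (1/r falloff) for one path."""
    return path_length / SPEED_OF_SOUND, 1.0 / path_length


# A listener 3 m from a source hears the direct ray after ~8.7 ms,
# plus a quieter, later echo off the floor at y = 0.
delay, amp = ray_contribution(direct_path((0.0, 1.0), (3.0, 1.0)))
echo_delay, echo_amp = ray_contribution(
    reflected_path((0.0, 1.0), (3.0, 1.0), 0.0))
```

Summing many such delayed, attenuated copies of a source signal is essentially how ray-based room acoustics builds an impulse response.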
Over ten years ago I was having breakfast with a friend and I sketched out the idea on a napkin. This kind of math is definitely not my strong point. But instead of a camera, have a microphone. Instead of tracing rays, trace vibrations, and instead of light sources, there's air and friction; or so the napkin said.
Instruments aren't the goal, but a simple model to test would be a grade-school recorder. After modeling the materials, blown air would resonate down the tube of the recorder, generating sound.
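As a first approximation of that resonance, a recorder body can be treated as a cylinder open at both ends, whose standing-wave modes are f_n = n * c / (2 * L). The tube length below is an assumed round number, not a measured instrument; a serious model would simulate the air column itself (for example with a digital waveguide) rather than just listing its modes.

```python
SPEED_OF_SOUND = 343.0  # m/s in air


def open_tube_modes(length_m, count=3):
    """First `count` resonant frequencies (Hz) of an open-open tube:
    f_n = n * c / (2 * L)."""
    return [n * SPEED_OF_SOUND / (2 * length_m) for n in range(1, count + 1)]


# An assumed ~0.3 m tube resonates near 572 Hz, 1143 Hz, 1715 Hz.
modes = open_tube_modes(0.3)
```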
As different qualities of air and different materials are modeled, artists could build wireframes of different instruments. The effects of room acoustics could be modeled too.
This would culminate in a completely rendered orchestra performing a wave-traced rendition of Beethoven's Symphony No. 5.
As I mentioned earlier, instruments aren't the goal: Voices are.
After modeling complex instruments, the next step is to model and animate the speech pathway from lungs to lips. The speech animated model (or "Sam" for short) would start with simple vowels like the [a:] sound in "raw", or the [æ] sound in "bad". (Wikipedia's page on the IPA has some good introductory information.) Eventually Sam would be animated to fluidly pronounce most phonemes. Linguistics is also not my strong point.
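A crude stand-in for "lungs to lips" is classic source-filter synthesis: a buzzy pulse train plays the vocal folds, and a couple of two-pole resonators play the vocal tract's formants. The formant frequencies used here for an [a]-like vowel (~700 Hz and ~1200 Hz) are rough textbook values, not measurements, and the whole thing is a sketch, not Sam.

```python
import math

RATE = 16000  # samples per second


def pulse_train(pitch_hz, seconds):
    """Crude glottal source: one unit impulse per pitch period."""
    period = int(RATE / pitch_hz)
    return [1.0 if i % period == 0 else 0.0
            for i in range(int(RATE * seconds))]


def resonator(signal, freq_hz, bandwidth_hz):
    """Two-pole resonant filter emphasizing one formant frequency."""
    r = math.exp(-math.pi * bandwidth_hz / RATE)
    a1 = 2 * r * math.cos(2 * math.pi * freq_hz / RATE)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


# Chain two formant resonators to get something vaguely [a]-like
# at a 120 Hz speaking pitch.
voiced = resonator(resonator(pulse_train(120, 0.5), 700, 90), 1200, 110)
```

Raising the pitch of the pulse train while keeping the formants fixed is exactly the kind of knob that would let one captured performance drive many different-sounding voices.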
I'd imagine that Sam would sound very lifeless and flat. So we would capture data from sensors attached to voice model actors. Voice capture artists would smooth out the data and bring Sam to life. Motion capture is also not my strong point.
With only a handful of voice model actors, spoken phrases could be captured for an entire movie or video game! The captured data could be applied to dozens or hundreds of different-sounding models by adjusting Sam's variables. Accents would be easy to add by changing the way Sam enunciates words. Playing with Sam's breathing rate would simulate a person out of breath or scared. Sam's voice could be lowered or raised, or even modeled after a real person.
Eventually the captured data could be used to animate a 3D model too!
This would be perfect for games like Oblivion or GTA that could benefit from an almost endless supply of different voices.
Let me know if you ever do anything with this, or find something that does. Especially if you're Bethesda!