Chris Grigg, Control-G firstname.lastname@example.org
Part 1 of 2 For IA-SIG Newsletter
In the weeks after the excellent Game Developers Conference 1999, a funny thing happened inside my head. Several separate threads of thought which I had previously regarded as not very related unexpectedly converged into something strangely significant-feeling. Late one night I sat down to write about it, and what came out was a new long-range view of the relationship between traditional software development practices, certain emerging software technologies, and the under-met needs of interactive audio content creators. GDC made it clear that we all agree on what the problems are, and that we for the most part agree on the kinds of things we want the content creators to be able to do. The purpose of this piece is to suggest a conceptual roadmap that may be able to help us all get there much, much faster. The concepts involve reconsidering how units of audio software are organized, and how they communicate with one another.
While there's ample room for disagreement about any of the implementation details I suggest here, I hope this document sparks a public conversation that leads to a shared vision that we can all benefit from. Just think how much progress could be made if all our considerable energies could somehow be coordinated.
This article is an adaptation of that night's work, addressed to IA-SIG's broad audience of audio technologists and content creators. Because the subject matter veers back and forth between software technologies and media content creation, programmers may find it irritatingly imprecise at the same time that soundmakers find it unnecessarily technical; unfortunately, this probably can't be helped given the interdisciplinary nature of the point I'm trying to make.
In any given area of technology development there comes a stage at which the focus shifts away from early struggles (i.e. getting the basic function to work at all), and toward finding the best, generalized way to control the thing over the long term. Think of cars, airplanes, computer OSs, networks, etc.; eventually a stable, standard form of control emerges (wheel & pedals, GUI, WWW). With IA-SIG developments such as the work of the Interactive Composition Working Group and now a DSP plug-in WG in the works, that same transition for game sound now appears to be on the horizon.
If the groundwork we lay in these initiatives is too restrictive, it will take years to extricate ourselves and retool. Doing this really right the first time, on the other hand, could mean rapid movement towards solving some of the biggest and oldest problems that game developers and publishers face: integrated authoring tools, sound designer-controlled audio interactivity, and platform-independent audio content. And audio API and hardware vendors will also benefit from new kinds of business opportunities. The real point, of course, is the leap forward in audio experience quality that should follow.
My thesis is that the reason why we've been so stuck here for so long is that we have a bottleneck. It's called "programming".
Today's reality is that to get a game to make any reasonable sound, a developer has to choose somebody's audio system and then write code that calls its functions.
There's only one thing wrong with that: Everybody's audio system furnishes a completely different set of functions; further, each one requires the developer to call those functions in a completely different way.
There's only one thing wrong with that: Once you've picked an audio system and written your game engine to use it, it's prohibitively expensive to change audio systems. (It happens, but it's rare.)
There are only three things wrong with that: Nobody's audio system does every single part of the job really well; nobody's audio system runs on all the platforms to which the developer wants to port; and everybody's audio system needs its own special auditioning/testbed applications.
In other words, today's large audio systems tend to be all-inclusive solutions that succeed really well on a single platform; but their functionality is hard to customize; they aren't transportable; and they require custom tools (and sometimes custom media formats).
The funny thing is, each of these big audio systems is already built up internally from many separately useful classes and functions that could do almost all the things you'd ever want to do, if only you could somehow rearrange them. A huge number of implementation differences make that observation practically irrelevant given the current way of doing things, but: What if you weren't stuck with having to pick a single-vendor solution? What if (for example) you could use vendor A's great synth with vendor B's great 3D mixer? What if vendor C could write a MIDI-controlled DSP module to plug in between the synth and the mixer? What if you could set up that connection yourself, and then drive all three pieces from a scripting engine?
What if you could design the exact audio system you need, by drawing block diagrams, without a programmer hand-writing code to deal with buffers and property sets?
In this "What if?" world, the units of software organization get smaller and more modular; each software unit takes on a more specific function; and all of the software units have to be able to talk to one another. This way of structuring the audio system is fundamentally different from our present practice of developing and using large-scale, do-everything audio APIs.
To make it possible to organize our runtime software and authoring systems along these lines, we would need three main things: a software component framework with an object model; a standard for audio signal interconnection between components; and a standard for control signal interconnection between components.
Each of these things is good, but it's the flexibility of the combination of the three that really gets us where we want to go.
Unfortunately, this article is too long to fit into a single issue of the Newsletter, so I'm going to have to save the control signal interconnection discussion (which is long) for next time. The remainder of this installment discusses the requirements and benefits of the first two, and winds up by offering a set of suggested design principles to observe in implementing this new world. Along the way I'll point out the synergistic benefits that follow from doing these things in combination.
Note that few (or none!) of these particular ideas are new; the only thing that's new here is their combination and application to game sound, and the recognition of their advantages for our specialized context.
If we're going to be assembling small software components, we'll need to have a way to specify which components we want to use in a given context, and we'll need a way to make sure the right parts (functions, methods) of each component get called at the appropriate times. These sorts of services are usually provided to software components by an underlying software architecture called a framework. You can think of the framework as the thing that all the plug-ins plug into. A game would include a framework to support its audio system.
Although it's conventional to furnish different kinds of frameworks for different kinds of functionality (i.e. you could decide to offer one interface for DSP plug-ins, and a separate interface for, say, music synth plug-ins), I would like to suggest that there are excellent reasons to treat all software components in this world exactly the same, and to plug every component in a given audio system into the same framework. Placing components in a framework would be done in a GUI authoring tool: you'd drag boxes around to form diagrams looking pretty much like the one above.
That's because in this world, everything is a plug-in. All components use the exact same low-level calling interface (set of function/method signatures), and a standard mechanism gives each component a way to express its unique capabilities and its unique set of controls.
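As a concrete (and entirely hypothetical) sketch of what "everything is a plug-in" could mean in C++: every component, whether synth, DSP, or mixer, derives from one shared interface, and reports its own capabilities through a standard query method. None of these names come from any shipping API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A raw block of samples shared along a signal path.
struct AudioBuffer {
    float* samples;
    std::size_t frameCount;
};

// The one low-level calling interface every plug-in would share.
class Component {
public:
    virtual ~Component() {}
    virtual std::string Describe() const = 0;        // capability/control discovery
    virtual void UpdateAudio(AudioBuffer* buf) = 0;  // called once per audio interrupt
};

// A trivial gain plug-in, written against nothing but the shared interface.
class GainDSP : public Component {
public:
    explicit GainDSP(float gain) : gain_(gain) {}
    std::string Describe() const override {
        return "GainDSP: 1 in, 1 out, 1 control";
    }
    void UpdateAudio(AudioBuffer* buf) override {
        for (std::size_t i = 0; i < buf->frameCount; ++i)
            buf->samples[i] *= gain_;
    }
private:
    float gain_;
};
```

Because the framework only ever sees the `Component` interface, a vendor A synth and a vendor B mixer become interchangeable boxes as far as placement and patching are concerned.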
Managing, in a general way, a framework that supports this sort of arbitrary placement of an arbitrary set of software components is a specialized job best addressed by use of an object model. An object model is an abstract description of the various pieces that comprise a given system, usually expressed in a way that harmonizes with object-oriented programming concepts like class, object, and property. For example, browsers use a document object model that describes all the possible entities in a webpage rendering environment. Another example is AppleScript, an object-oriented application scripting language whose object model allows (among other things) a way of reporting what documents are available, and a way of manipulating the properties of any desired document. The point to remember here is that an object model provides a single, standard mechanism for finding out what's going on with the components inside the framework, and for controlling them.
It's possible that our framework might need to be implemented in pretty different ways for different host OS environments, but its object model and the binary form of its expression would remain the same in all cases, for reasons that should become clear soon.
When a framework is able to understand a software component object model, the selection and arrangement of components present within it can be set, queried, and altered by other pieces of software, such as the game code. When our runtime audio system is assembled in a framework with an object model, the audio architecture can be adaptively customized for the purpose at hand. If we were to drive the framework's object model interface with a scripting engine, then sound content creators would be able to configure the audio system on their own, without programmer assistance.
Note that although a native implementation of the framework would need to be built for every platform we want to run on, the object model itself is a purely logical construct. So, given a similar set of available audio software components, any audio system configuration script you might write could conceivably work without modification on any platform where the framework's been implemented.
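To make the idea tangible, here is one guess at the object-model surface a scripting engine might drive. The names (`Framework`, `CreateComponent`, `Connect`) are illustrative, not any real API; the point is that a sound designer's configuration script would reduce to a sequence of calls like these against whatever native framework implementation exists on the target platform.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object-model interface for configuring an audio system.
class Framework {
public:
    // Instantiate a named component of a given type ("WAVPlayer", "EQ", ...).
    void CreateComponent(const std::string& name, const std::string& type) {
        types_[name] = type;
    }
    // Record a signal connection from one component's output to another's input.
    void Connect(const std::string& from, const std::string& to) {
        patches_.push_back(std::make_pair(from, to));
    }
    // Object-model queries: what's going on inside the framework?
    std::size_t ComponentCount() const { return types_.size(); }
    std::size_t ConnectionCount() const { return patches_.size(); }
private:
    std::map<std::string, std::string> types_;
    std::vector<std::pair<std::string, std::string>> patches_;
};
```

A script line like `connect player -> eq` would simply become `fw.Connect("player", "eq")`; because the object model is purely logical, the same script could configure a completely different native implementation on another platform.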
None of the above makes any assumptions about what software component technology would be best to use when implementing the framework. The standard component architectures (COM, SOM, CORBA) may or may not be the best choices; obviously it makes sense to take advantage of any given platform's strong-point technologies.
Note that as long as the semantic structure of the framework's low-level component interface was the same (i.e. used the same method or message names), any available and sufficiently efficient component technology could be used; in fact, a different component technology could be used on every platform, and audio content authored for one platform would still work without modification on any other platform.
Software component frameworks per se are nothing new; for example, ActiveX is a software component framework. But we're talking about a framework intended specifically for the purpose of realtime media work. Making an actual audio system out of the isolated software components we've plugged into our framework requires signal interconnections between the components. For example, a realtime EQ plug-in needs to be fed a stream of digital audio, and the EQ's output needs to be fed to some other component to be heard.
Connecting signals between the software components would also be done in the same GUI tool used to place the components. Each component box would have its own set of input and output ports as required by its purpose, and you'd draw patchcords to connect them. This configuration editor tool at first glance would probably look a lot like Opcode/IRCAM's MAX.
In an ideally simple world, differences among various kinds of signals (audio, MIDI stream, command stream, etc.) would be invisible; but I'll describe audio signals and control signals separately so that in the second installment of this article I can get deeply into how it might be possible to accommodate the unpredictably unique functions of all possible classes of software component in a uniform way, and why that could be a really good idea.
We'll need to define a standard for audio signal interconnection between components, because the framework is going to need to know what we want to happen when we tell it to "Connect the output of the WAV file player to the input of the EQ."
I frequently hear the concern that permitting arbitrary patching between digital audio functions is going to require massive duplication of large sound buffers on a continual basis, leading to catastrophic failure; however, I just don't see it. (I refer here to actual RAM audio buffers, not any particular existing audio API's buffer abstraction objects.) You don't move the data between buffers; instead, at every audio interrupt (or whatever your clocking mechanism is) you control the order in which the various components perform their operations on the buffers. The framework could use just one buffer per signal chain, not one buffer per component output or input.
In the WAV/EQ example, the framework would know from the direction of the patch that at every audio interrupt (or whatever) it needs to first get a bufferful of new audio from the WAV player, and then call the EQ to process the buffer; presumably the EQ's output is feeding a mixer or other component, so that downstream component would be called next. Time-domain multiplexing each buffer, if you will. Buffer duplication only becomes necessary when the audio signal path forks (which is relatively rare).
In other words, intelligence in the framework would translate this signal path configuration:
into one single RAM audio buffer and a function call sequence (at each audio interrupt or whatever) something like this (using C++ syntax):
wavPlayer     ->UpdateAudio( &Path1Buffer );
eqDSP         ->UpdateAudio( &Path1Buffer );
spatializerDSP->UpdateAudio( &Path1Buffer );
mixer         ->UpdateAudio( &Path1Buffer );
Settling on a standard of one audio buffer per unforked signal path would give the framework all it needs to make audio connections between components, and be space- and time-efficient. Note that this places very few constraints on the system: at any given time all buffers would have to be the same size, but different platforms could use different buffer sizes, and in fact the same game could use different buffer sizes at different times if desired.
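The one-buffer-per-unforked-path rule also makes the framework's buffer budget easy to compute. The sketch below (component and connection names are hypothetical) counts buffers from a patch list: a linear chain shares a single buffer end to end, and each extra branch at a fork point costs exactly one duplication.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// How many RAM audio buffers does a given patch configuration need,
// assuming one buffer per unforked signal path?
std::size_t BuffersNeeded(
        const std::vector<std::pair<std::string, std::string>>& patches) {
    std::map<std::string, std::size_t> fanout;      // connections out of each component
    for (const auto& p : patches) ++fanout[p.first];
    std::size_t buffers = 1;                        // the main signal path
    for (const auto& f : fanout)
        if (f.second > 1) buffers += f.second - 1;  // one copy per extra branch at a fork
    return buffers;
}
```

So the WAV-to-EQ-to-mixer chain above needs just one buffer, while splitting the WAV player's output to feed both an EQ and (say) a reverb would need two.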
So how would the framework know about these signal connections between components? It would keep track of them in terms of its object model.
Consider our block diagram again. The idea of the object model is to make every box in this kind of block signal flow diagram a component, and to make every arrow a signal flow connection. The object model is then all about the abstract concepts of components and connections. When the framework is written, it will be in an object-oriented programming language, and it will almost certainly spend a lot of its time manipulating objects of class component, and objects of class connection.
In the largest sense, a component is anything that creates, processes, or receives a signal. Media (the disk-cans in the diagrams) gets turned into signals by components; other components synthesize signals on their own, no media required. A connection is whatever it takes to plug a component's output into another component's input.
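Here is one guess at how a framework might turn its component and connection objects into the per-interrupt call sequence shown earlier: a topological sort of the signal-flow graph, so that no component runs before everything feeding it has run. All identifiers are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Derive the per-interrupt call order from the object model's
// components and connections (a topological sort of the signal graph).
std::vector<std::string> CallOrder(
        const std::vector<std::string>& components,
        const std::vector<std::pair<std::string, std::string>>& connections) {
    std::map<std::string, int> indegree;
    std::map<std::string, std::vector<std::string>> downstream;
    for (const auto& c : components) indegree[c] = 0;
    for (const auto& conn : connections) {
        downstream[conn.first].push_back(conn.second);
        ++indegree[conn.second];
    }
    std::queue<std::string> ready;
    for (const auto& c : components)
        if (indegree[c] == 0) ready.push(c);        // sources: players, synths
    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string c = ready.front();
        ready.pop();
        order.push_back(c);
        for (const auto& d : downstream[c])
            if (--indegree[d] == 0) ready.push(d);  // all of d's inputs are now fresh
    }
    return order;
}
```

Feeding this the wavPlayer/eqDSP/spatializerDSP/mixer patch from the earlier example would reproduce exactly the four-call sequence shown there, with no signal-flow knowledge hand-coded anywhere.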
To support this way of thinking and to get to all the benefits that follow from it (enumerated at length below and in part two of this article, next issue) the framework and object model would need to furnish a way to introduce components (typically a component would be a low-level audio API), a way to construct configurations of component instances, and a way to make connections between those component instances. Ideally, components' maintenance needs would be automatically met completely within this same architecture, but components' unpredictable and unique requirements may make that impractical.
At the cost of implementing a framework with object model for each desired OS/platform, taking a components-and-signal-flow view of the world has several benefits:
In the next issue I'll get into the important area of control signals and their routing, but for now let me close by moving away from details and zooming back out to the long view. The point of this whole media component interconnection idea is to meet all of our needs in the broadest possible way. For successful deployment and adoption of something along these lines, it seems to me that several basic principles ought to be observed in its design, development, and promotion:
For Further Reading: Component Software: Beyond Object-Oriented Programming, Clemens Szyperski; 1997, 1998 Addison-Wesley; ISBN 0-201-17888-5.