A Roadmap

Chris Grigg, Control-G chrisg@control-g.com

Part 1 of 2 — For IA-SIG Newsletter


In the weeks after the excellent Game Developers Conference 1999, a funny thing happened inside my head. Several separate threads of thought which I had previously regarded as not very related unexpectedly converged into something strangely significant-feeling. Late one night I sat down to write about it, and what came out was a new long-range view of the relationship between traditional software development practices, certain emerging software technologies, and the under-met needs of interactive audio content creators. GDC made it clear that we all agree on what the problems are, and that we for the most part agree on the kinds of things we want the content creators to be able to do. The purpose of this piece is to suggest a conceptual ‘roadmap’ that may be able to help us all get there much, much faster. The concepts involve reconsidering how units of audio software are organized, and how they communicate with one another.

While there's ample room for disagreement about any of the implementation details I suggest here, I hope this document sparks a public conversation that leads to a shared vision that we can all benefit from. Just think how much progress could be made if all our considerable energies could somehow be coordinated.


This article is an adaptation of that night’s work, addressed to IA-SIG’s broad audience of audio technologists and content creators. Because the subject matter veers back and forth between software technologies and media content creation, programmers may find it irritatingly imprecise at the same time that soundmakers find it unnecessarily technical; unfortunately, this probably can’t be helped given the interdisciplinary nature of the point I'm trying to make.


In any given area of technology development there comes a stage at which the focus shifts away from early struggles (i.e. getting the basic function to work at all), and toward finding the best, generalized way to control the thing over the long term. Think of cars, airplanes, computer OS’s, networks, etc. — eventually a stable, standard form of control emerges (wheel & pedals, GUI, WWW). With IA-SIG developments such as the work of the Interactive Composition Working Group and now a DSP plug-in WG in the works, that same transition for game sound now appears to be on the horizon.

If the groundwork we lay in these initiatives is too restrictive, it will take years to extricate ourselves and retool. Doing this really right the first time, on the other hand, could mean rapid movement towards solving some of the biggest and oldest problems that game developers and publishers face: integrated authoring tools, sound designer-controlled audio interactivity, and platform-independent audio content. And audio API and hardware vendors will also benefit from new kinds of business opportunities. The real point, of course, is the leap forward in audio experience quality that should follow.


My thesis is that the reason why we’ve been so stuck here for so long is that we have a bottleneck. It’s called "programming".

Today’s reality is that to get a game to make any reasonable sound, a developer has to choose somebody’s audio system and then write code that calls its functions.

There’s only one thing wrong with that: Everybody’s audio system furnishes a completely different set of functions; further, each one requires the developer to call those functions in a completely different way.

There’s only one thing wrong with that: Once you’ve picked an audio system and written your game engine to use it, it’s prohibitively expensive to change audio systems. (It happens, but it’s rare.)

There are only three things wrong with that: Nobody’s audio system does every single part of the job really well; nobody’s audio system runs on all the platforms to which the developer wants to port; and everybody’s audio system needs its own special auditioning/testbed applications.

In other words, today’s large audio systems tend to be all-inclusive solutions that succeed really well on a single platform; but their functionality is hard to customize; they aren’t transportable; and they require custom tools (and sometimes custom media formats).

The funny thing is, each of these big audio systems is already built up internally from many separately useful classes and functions that could do most all the things you’d ever want to do, if only you could somehow rearrange them. A huge number of implementation differences make that observation practically irrelevant given the current way of doing things, but: What if you weren’t stuck with having to pick a single-vendor solution? What if (for example) you could use vendor A’s great synth with vendor B’s great 3D mixer? What if vendor C could write a MIDI-controlled DSP module to plug in between the synth and the mixer? What if you could set up that connection yourself, and then drive all three pieces from a scripting engine?

What if you could design the exact audio system you need by drawing block diagrams, without a programmer hand-writing code to deal with buffers and property sets?

In this ‘What if?’ world, the units of software organization get smaller, more modular; each software unit takes on a more specific function; and all of the software units have to be able to talk to one another. This way of structuring the audio system is fundamentally different from our present practice of developing and using large-scale, do-everything audio API’s.

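To make it possible to organize our runtime software and authoring systems along these lines, we would need three main things:

  1. A standardized software component framework with object model.

  2. A standard for audio signal interconnection between software components.

  3. A standard for control signal interconnection between software components.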

Each of these things is good, but it’s the flexibility of the combination of the three that really gets us where we want to go.

Unfortunately, this article is too long to fit into a single issue of the Newsletter, so I’m going to have to save the control signal interconnection discussion (which is long) for next time. The remainder of this installment discusses the requirements and benefits of the first two, and winds up by offering a set of suggested design principles to observe in implementing this new world. Along the way I’ll point out the synergistic benefits that follow from doing these things in combination.

Note that few (or none!) of these particular ideas are new; the only thing that’s new here is their combination and application to game sound, and the recognition of their advantages for our specialized context.

A Standardized Software Component Framework with Object Model

If we’re going to be assembling small software components, we’ll need to have a way to specify which components we want to use in a given context, and we’ll need a way to make sure the right parts (functions, methods) of each component get called at the appropriate times. These sorts of services are usually provided to software components by an underlying software architecture called a framework. You can think of the framework as the thing that all the plug-ins plug into. A game would include a framework to support its audio system.

Although it’s conventional to furnish different kinds of frameworks for different kinds of functionality (i.e. you could decide to offer one interface for DSP plug-ins, and a separate interface for, say, music synth plug-ins), I would like to suggest that there are excellent reasons to treat all software components in this world exactly the same, and to plug every component in a given audio system into the same framework. Placing components in a framework would be done in a GUI authoring tool — you’d drag boxes around to form diagrams looking pretty much like the one above.

That’s because in this world, everything is a plug-in. All components use the exact same low-level calling interface (set of function/method signatures), and a standard mechanism gives each component a way to express its unique capabilities and unique set of controls.
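As a minimal sketch of what "the exact same low-level calling interface" might look like, here is one possible shape in C++. Every name in it (AudioComponent, controlNames, setControl, process, Gain) is an assumption made up for illustration, not part of any existing framework or proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical uniform interface: every plug-in, whatever its job,
// exposes the same small set of methods to the framework.
class AudioComponent {
public:
    virtual ~AudioComponent() = default;

    // Human-readable kind, e.g. "gain" or "synth" (illustrative only).
    virtual std::string kind() const = 0;

    // Names of the controls this component offers. This is the common
    // call through which a component reports its unique capabilities.
    virtual std::vector<std::string> controlNames() const = 0;

    // Set a named control; returns false if the control is unknown.
    virtual bool setControl(const std::string& name, float value) = 0;

    // Process one buffer of audio in place.
    virtual void process(float* buffer, std::size_t frames) = 0;
};

// One example component: a plain gain stage with a single control.
class Gain : public AudioComponent {
public:
    std::string kind() const override { return "gain"; }

    std::vector<std::string> controlNames() const override {
        return {"level"};
    }

    bool setControl(const std::string& name, float value) override {
        if (name != "level") return false;
        level_ = value;
        return true;
    }

    void process(float* buffer, std::size_t frames) override {
        assert(buffer != nullptr);
        for (std::size_t i = 0; i < frames; ++i) buffer[i] *= level_;
    }

private:
    float level_ = 1.0f;
};
```

A synth, a 3D mixer, or an EQ would implement the same four methods and differ only in what controlNames reports and what process does. That sameness is what would let a single authoring tool place any of them.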

Managing, in a general way, a framework that supports this sort of arbitrary placement of an arbitrary set of software components is a specialized job best addressed by use of an ‘object model’. An ‘object model’ is an abstract description of the various pieces that make up a given system, usually expressed in a way that harmonizes with object-oriented programming concepts like class, object, and property. For example, browsers use a ‘document object model’ that describes all the possible entities in a webpage rendering environment. Another example is AppleScript, an object-oriented application scripting language whose object model allows (among other things) a way of reporting what documents are available, and a way of manipulating the properties of any desired document. The point to remember here is that an object model provides a single, standard mechanism for finding out what’s going on with the components inside the framework, and for controlling them.

It’s possible that our framework might need to be implemented in pretty different ways for different host OS environments, but its object model and the binary form of its expression would remain the same in all cases, for reasons that should become clear soon.

When a framework is able to understand a software component object model, the selection and arrangement of components present within it can be set, queried, and altered by other pieces of software, such as the game code. When our runtime audio system is assembled in a framework with an object model, the audio architecture can be adaptively customized for the purpose at hand. If we were to drive the framework’s object model interface with a scripting engine, then sound content creators would be able to configure the audio system on their own, without programmer assistance.
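To make "set, queried, and altered" concrete, here is a minimal sketch of an object model interface that game code or a scripting engine could drive. The class and method names (Framework, ComponentInfo, add, remove, names, find, setProperty) are hypothetical, and a real design would surely be richer.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical object model: the framework holds named component
// instances, each with a kind and a bag of numeric properties.
struct ComponentInfo {
    std::string kind;                         // e.g. "wavPlayer", "eq"
    std::map<std::string, float> properties;  // e.g. {"gainDb", -3.0}
};

class Framework {
public:
    // Set: place a new component instance; fails if the name is taken.
    bool add(const std::string& name, const std::string& kind) {
        assert(!name.empty());
        return components_.emplace(name, ComponentInfo{kind, {}}).second;
    }

    // Alter: remove an instance; returns false if it was not present.
    bool remove(const std::string& name) {
        return components_.erase(name) > 0;
    }

    // Query: list the names of all instances currently placed.
    std::vector<std::string> names() const {
        std::vector<std::string> out;
        for (const auto& entry : components_) out.push_back(entry.first);
        return out;
    }

    // Query: look up one instance; nullptr if absent.
    const ComponentInfo* find(const std::string& name) const {
        auto it = components_.find(name);
        return it == components_.end() ? nullptr : &it->second;
    }

    // Alter: change a property on an existing instance.
    bool setProperty(const std::string& name, const std::string& prop,
                     float value) {
        auto it = components_.find(name);
        if (it == components_.end()) return false;
        it->second.properties[prop] = value;
        return true;
    }

private:
    std::map<std::string, ComponentInfo> components_;
};
```

Because every operation goes through names and properties rather than through vendor-specific function calls, a script could issue the same requests on any platform where such a framework exists.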

Note that although a native implementation of the framework would need to be built for every platform we want to run on, the object model itself is a purely logical construct. So, given a similar set of available audio software components, any audio system configuration script you might write could conceivably work without modification on any platform where the framework’s been implemented.

Technical Aspects

None of the above makes any assumptions about what software component technology would be best to use when implementing the framework. The standard component architectures (COM, SOM, CORBA) may or may not be the best choices; obviously it makes sense to take advantage of any given platform’s strong-point technologies.

Note that as long as the semantic structure of the framework’s low-level component interface was the same (i.e. used the same method or message names), any available and sufficiently efficient component technology could be used; in fact, a different component technology could be used on every platform, and audio content authored for one platform would still work without modification on any other platform.


Software component frameworks per se are nothing new; for example, ActiveX is a software component framework. But we’re talking about a framework intended specifically for the purpose of realtime media work. Making an actual audio system out of the isolated software components we’ve plugged into our framework requires signal interconnections between the components. For example, a realtime EQ plug-in needs to be fed a stream of digital audio, and the EQ’s output needs to be fed to some other component to be heard.

Connecting signals between the software components would also be done in the same GUI tool used to place the components. Each component box would have its own set of input and output ports as required by its purpose, and you’d draw patchcords to connect them. This configuration editor tool at first glance would probably look a lot like Opcode/IRCAM’s MAX.

In an ideally simple world, differences among various kinds of signals (audio, MIDI stream, command stream, etc.) would be invisible; but I’ll describe audio signals and control signals separately so that in the second installment of this article I can get deeply into how it might be possible to accommodate the unpredictably unique functions of all possible classes of software component in a uniform way, and why that could be a really good idea.

A Standard for Audio Signal Interconnection Between Software Components

We’ll need to define a standard for audio signal interconnection between components, because the framework is going to need to know what we want to happen when we tell it to ‘Connect the output of the WAV file player to the input of the EQ.’

I frequently hear the concern that permitting arbitrary patching between digital audio functions is going to require massive duplication of large sound buffers on a continual basis, leading to catastrophic failure; however, I just don’t see it. (I refer here to actual RAM audio buffers, not any particular existing audio API’s buffer abstraction objects.) You don’t move the data between buffers; instead, at every audio interrupt (or whatever your clocking mechanism is) you control the order in which the various components perform their operations on the buffers. The framework could use just one buffer per signal chain, not one buffer per component output or input.

In the WAV/EQ example, the framework would know from the direction of the patch that at every audio interrupt (or whatever) it needs to first get a bufferful of new audio from the WAV player, and then call the EQ to process the buffer; presumably the EQ’s output is feeding a mixer or other component, so that downstream component would be called next. Time-domain multiplexing each buffer, if you will. Buffer duplication only becomes necessary when the audio signal path forks (which is relatively rare).

In other words, intelligence in the framework would translate this signal path configuration:


into one single RAM audio buffer and a function call sequence (at each audio interrupt or whatever) something like this (using C++ syntax):
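The listing below is a minimal sketch of such a call sequence, not a definitive implementation. The component types (WavPlayer, Eq, Mixer), their method names, and the tiny buffer size are all stand-ins assumed for illustration, and the EQ merely scales samples where a real one would filter.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

const std::size_t kFrames = 4;  // tiny buffer size, for illustration only

// Hypothetical source: writes a rising ramp in place of decoded WAV data.
struct WavPlayer {
    float next = 0.0f;
    void fill(float* buf, std::size_t n) {
        assert(buf != nullptr);
        for (std::size_t i = 0; i < n; ++i) { buf[i] = next; next += 1.0f; }
    }
};

// Hypothetical processor: scales in place (a real EQ would filter).
struct Eq {
    float gain = 0.5f;
    void process(float* buf, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) buf[i] *= gain;
    }
};

// Hypothetical sink: records the peak level instead of driving hardware.
struct Mixer {
    float lastPeak = 0.0f;
    void consume(const float* buf, std::size_t n) {
        lastPeak = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            lastPeak = std::fmax(lastPeak, std::fabs(buf[i]));
    }
};

// One unforked signal path: three components sharing one RAM buffer.
struct Chain {
    WavPlayer player;
    Eq eq;
    Mixer mixer;
    float buffer[kFrames] = {};
};

// Called once per audio interrupt (or whatever the clocking mechanism is).
void onAudioInterrupt(Chain& chain) {
    chain.player.fill(chain.buffer, kFrames);    // 1. get new audio
    chain.eq.process(chain.buffer, kFrames);     // 2. process it in place
    chain.mixer.consume(chain.buffer, kFrames);  // 3. hand it downstream
}
```

Nothing is copied between components; only the order of the three calls expresses the patch.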



Settling on a standard of one audio buffer per unforked signal path would give the framework all it needs to make audio connections between components, and be space- and time-efficient. Note that this places very few constraints on the system: at any given time all buffers would have to be the same size, but different platforms could use different buffer sizes, and in fact the same game could use different buffer sizes at different times if desired.
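The forked case, where duplication does become necessary, can be sketched in the same spirit. This is only an illustration under assumed names (Buffer, Stage, runPath, runFork): the shared part of the path runs on one buffer, the buffer is copied exactly once at the fork, and each branch then continues in place on its own buffer.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using Buffer = std::vector<float>;
using Stage = std::function<void(Buffer&)>;  // processes a buffer in place

// Run the stages of one unforked path, in order, on one shared buffer.
void runPath(Buffer& buffer, const std::vector<Stage>& stages) {
    for (const Stage& stage : stages) {
        assert(stage != nullptr);
        stage(buffer);
    }
}

struct ForkOutput {
    Buffer branchA;
    Buffer branchB;
};

// Trunk first, then one copy at the fork, then each branch on its own buffer.
ForkOutput runFork(Buffer input,
                   const std::vector<Stage>& trunk,
                   const std::vector<Stage>& branchA,
                   const std::vector<Stage>& branchB) {
    runPath(input, trunk);
    Buffer a = std::move(input);  // branch A reuses the trunk's buffer
    Buffer b = a;                 // the single duplication, at the fork
    runPath(a, branchA);
    runPath(b, branchB);
    return ForkOutput{std::move(a), std::move(b)};
}
```

In this sketch the number of buffers grows with the number of forks, not with the number of components.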

Interconnections in the Framework’s Object Model

So how would the framework know about these signal connections between components? It would keep track of them in terms of its object model.

Consider our block diagram again. The idea of the object model is to make every box in this kind of block signal flow diagram a component, and to make every arrow a signal flow connection. The object model is then all about the abstract concepts of components and connections. When the framework is written, it will be in an object-oriented programming language, and it will almost certainly spend a lot of its time manipulating objects of class component, and objects of class connection.

In the largest sense, a component is anything that creates, processes, or receives a signal. Media (the disk-cans in the diagrams) gets turned into signals by components; other components synthesize signals on their own, no media required. A connection is whatever it takes to plug a component output into another component’s input.
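As a minimal sketch of how a framework might turn components and connections into a per-interrupt call order, the function below sorts the boxes so that each one runs after everything feeding it (a standard topological sort). The struct and function names are assumptions for illustration; the article does not prescribe this algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

struct Component { std::string name; };  // a box in the diagram

// An arrow in the diagram; both fields index into the component list.
struct Connection { std::size_t from; std::size_t to; };

// Returns component indices in an order where every component comes
// after all components feeding it. Returns an empty list if the
// connections form a loop, since no valid order exists then.
std::vector<std::size_t> callOrder(const std::vector<Component>& components,
                                   const std::vector<Connection>& connections) {
    const std::size_t n = components.size();
    std::vector<std::size_t> pendingInputs(n, 0);
    std::vector<std::vector<std::size_t>> downstream(n);
    for (const Connection& c : connections) {
        assert(c.from < n && c.to < n);
        downstream[c.from].push_back(c.to);
        ++pendingInputs[c.to];
    }

    std::queue<std::size_t> ready;  // components whose inputs are all done
    for (std::size_t i = 0; i < n; ++i)
        if (pendingInputs[i] == 0) ready.push(i);

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t i = ready.front();
        ready.pop();
        order.push_back(i);
        for (std::size_t next : downstream[i])
            if (--pendingInputs[next] == 0) ready.push(next);
    }
    if (order.size() != n) order.clear();  // a loop was found
    return order;
}
```

For a WAV player patched into an EQ patched into a mixer, the result lists the player first, then the EQ, then the mixer, whatever order the boxes were created in.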

To support this way of thinking— and to get to all the benefits that follow from it (enumerated at length below and in part two of this article, next issue)— the framework and object model would need to furnish a way to introduce components (typically a component would be a low-level audio API), a way to construct configurations of component instances, and a way to make connections between those component instances. Ideally, components’ maintenance needs would be automatically met completely within this same architecture, but components’ unpredictable and unique requirements may make that impractical.
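The first of those requirements, a way to introduce components, could be met by something like a registry of factories keyed by kind. The sketch below is one assumed shape for it; Plugin, Registry, introduce, create, and Reverb are all hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base type that every introduced component derives from.
struct Plugin {
    virtual ~Plugin() = default;
    virtual std::string kind() const = 0;
};

class Registry {
public:
    using Factory = std::function<std::unique_ptr<Plugin>()>;

    // Introduce a component kind; fails if that kind is already known.
    bool introduce(const std::string& kind, Factory factory) {
        assert(factory != nullptr);
        return factories_.emplace(kind, std::move(factory)).second;
    }

    // Construct a fresh instance of a known kind; nullptr if unknown.
    std::unique_ptr<Plugin> create(const std::string& kind) const {
        auto it = factories_.find(kind);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// An example component a vendor might supply.
struct Reverb : Plugin {
    std::string kind() const override { return "reverb"; }
};
```

A configuration would then be built by asking the registry for instances by kind and handing them to whatever part of the framework records the connections.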

Benefits I

At the cost of implementing a framework with object model for each desired OS/platform, taking a components-and-signal-flow view of the world has several benefits:

  1. Promotes Programmer-Sound Creator Understanding. It gets the programmers talking the same language as the sound content creators, which is likely to help increase understanding (not to mention civility), tending to improve delivered game sound quality. This interconnection model already makes sense to audio content creators from their studio experience and, given a good framework implementation, should be preferred by many game engine programmers too.

  2. Increases Clarity. It also gets the programmers thinking in terms of the abstract audio processes operating on audio signals, rather than any particular OS or API vendor’s low-level programming model. With well-factored audio DSP components, this way of thinking will lead to a better understanding of the dynamic CPU/DSP expense of various operations. (Doing the architecture right and in open-source would be a way to minimize reluctance to adopt it.)

  3. Simplifies Writing of Audio Code. With a manager layer in place to handle the ordering of calls on buffers and the relationships between API’s, the coder just doesn’t have to do as much work, which is always welcome.

  4. Modular Interface Promotes Use of More Audio API’s. Once the framework’s been learned, adding further audio functionality becomes easy— for example, use of specialized EQ’s or inline effects processors could become common. As this area grows, vendors of DSP now available primarily as professional plug-ins may start to move into licensing for the game runtime environment with SDK-style versions of their API’s packaged for the framework. One can easily imagine the creation of ‘mini-DSP-slice’ libraries with multiple versions optimized for various purposes. More vendors get to play.

  5. Encourages a View of Audio Libraries as Interchangeable. By making it easier to see the different API’s as the audio components they are, and by making it easier to swap those components, it will make developers feel less locked-in to particular vendors and particular API’s. This should in turn drive competition and improvement in the low-level audio components as developers swap and compare more frequently.

  6. Simplifies Porting of Game Engines and Titles. The same component-and-connection framework could be implemented on any modern OS, using the same high-level interface across platforms. That means that the audio part of the job of porting a game or engine to any platform where the framework has been implemented would be much simpler. (Note that the architecture only has to be implemented once per platform for all developers’ content and game engines to port. Note also that for hardware vendors, seating the low-level API functionality primarily in the card drivers rather than in host CPU code could make deployment to other platforms using the same hardware profile [i.e. PCI/PCMCIA] much easier by reducing the amount of middleware that needs to be written per platform.)

  7. Encourages Porting of Low-Level Audio APIs to New Platforms. Once the framework exists on a platform, then API vendors have the basis for a built-in market on a new platform as developers begin to port titles to that platform within the framework.

Design Principles

In the next issue I’ll get into the important area of control signals and their routing, but for now let me close by moving away from details and zooming back out to the long view. The point of this whole media component interconnection idea is to meet all of our needs in the broadest possible way. For successful deployment and adoption of something along these lines, it seems to me that several basic principles ought to be observed in its design, development, and promotion:


For Further Reading: Component Software: Beyond Object-Oriented Programming, Clemens Szyperski; 1997, 1998 Addison-Wesley; ISBN 0-201-17888-5.