Interactive Composition Column 1.1

A DirectMusic Primer
by Alexander Brandon and Mark Miller

Much has been said in the past few months about DirectMusic and, thereby, about interactive music. In this first column, we will provide a basic outline of DirectMusic and then lay out some real-world and theoretical context in which to examine it. In future columns, we will delve into the details of actually using the system, and others like it, to create interactive soundtracks.


DirectMusic is a system native to the DirectX 6.1 SDK which adds specialized music playback and synthesis to the next generation of the Microsoft Windows operating system. DirectMusic Producer is the editing environment in which the composer can write music and create different "Styles" in which the music can be changed interactively. Rather than simply playing back preset MIDI or digital audio data, the DM system works with a set of parameters that define how music content is generated. In this context, 'generation' refers to the fact that the music actually played back is assembled by the playback engine just before it is needed, from a combination of these parameters, music data created by the composer, and input about the state of the application (a game, for example). This makes the music more "interactive" and responsive.
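
To make the idea of 'generation' concrete, here is a minimal sketch in Python. It is not the DirectMusic API; the names GameState, PATTERNS and next_measure, along with the notes and tempo range, are our own assumptions for illustration. It only shows the general shape of the process: composer-written material, a few parameters, and application state are combined just before the next measure is needed.

```python
import random
from dataclasses import dataclass


@dataclass
class GameState:
    """Hypothetical snapshot of the application taken just before playback."""
    intensity: float  # 0.0 (calm) to 1.0 (frantic)


# Composer-authored material: each pattern is a list of MIDI note numbers.
# The moods and notes are invented for this illustration.
PATTERNS = {
    "calm": [[60, 64, 67, 64], [60, 62, 64, 67]],
    "tense": [[60, 63, 66, 69], [60, 61, 66, 67]],
}


def next_measure(state: GameState, rng: random.Random) -> dict:
    """Assemble the next measure from parameters, composer data and game state."""
    mood = "tense" if state.intensity >= 0.5 else "calm"
    notes = rng.choice(PATTERNS[mood])        # pick one composer-written variation
    tempo = 90 + round(60 * state.intensity)  # tempo parameter driven by the game
    return {"mood": mood, "tempo_bpm": tempo, "notes": notes}


rng = random.Random(1)
calm = next_measure(GameState(intensity=0.1), rng)
tense = next_measure(GameState(intensity=0.9), rng)
```

The point of the sketch is that nothing is fixed in advance: the same composer data yields a different measure depending on what the game reports at that moment.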

The system also improves PC sound playback quality by standardizing a synthesizer in software rather than relying on the varied existing hardware specifications and solutions (although hardware-specific channel playback is supported). It also adds variety and power by supporting Downloadable Sounds (DLS1). Beyond what is shipped on the disk, the most interesting and perhaps most distinctive attribute of DM is its open-ended architecture. This architecture allows developers to create applications such as sequencers, MIDI effects processors and custom sound design tools to further simplify and/or enhance the composition and playback process.

So you can see that DirectMusic is not the end of PC composition; it is merely a more solid foundation for music playback and construction methods going forward.

Design Commentary

The system is quite an interesting and imaginative leap for Microsoft, but not one that comes as a surprise. For the last several years Microsoft has expanded its operations in nearly every field of computing. However, its MS-DOS and Windows offerings have not been revolutionary; rather, they have emulated and built upon existing trends (the Mac OS, for example). Apart from the DirectX SDK, DirectMusic is the first example we have seen from MS of a completely new approach to generating content. It will be met with a great deal of enthusiasm as well as a great deal of criticism. For the purposes of this report, the product will simply be reviewed and thoroughly examined on its own merits.

The only Microsoft-specific consideration to be made here is that DM will have a much wider market distribution than most interactive music products, since it is included free with the DirectX 6.1 SDK. Considering the number of Windows machines out there today, and their use by nearly every developer creating software, this fact must be taken into account. Before we can really judge DM as a product, we must first present an approach to interactive music in games, a rapidly expanding and evolving field, and compare its requirements to the capabilities of DirectMusic as closely and precisely as possible.

At first glance, DirectMusic and DirectMusic Producer comprise an excellent set of tools for developers. That being said, we must ask the question "how excellent?" Considering other, established tools such as Miles, Beatnik, AMStudio, and all of the proprietary ways of approaching interactive music will give us a platform for comparison. (Details on systems such as LucasArts' "iMUSE" and EA's Adaptive Audio engines will follow in future updates for this purpose.) But before we do even that, we need to answer "The Big Question":

Has Microsoft left the station too late?

Most, if not all, industry veterans agree that if DirectMusic had been presented five years ago it would have been the best method for game music development and would have been rapidly adopted. Today, however, new methods of real-time music streaming from CD / DVD are being implemented that make DirectMusic seem obsolete, so is this approach still valid? After serious consideration, the answer must be yes. Real-time multiple-track music streaming can be a great way to do game soundtracks, but it certainly isn't the only way, or even necessarily the best way. One can stream music, give a game a movie-like feel with this technique and come off with a superlative soundtrack, but the music can't be varied responsively each time the user plays. This static quality often detracts from the player's satisfaction. Streamed digital audio is also large (storage-wise) and bandwidth-intensive, which can severely limit the scope of its application.
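
To put rough numbers on the storage and bandwidth cost of streamed digital audio, the arithmetic can be sketched as follows. We assume uncompressed CD-quality PCM (44.1 kHz, 16-bit, stereo); the four-stream case is a hypothetical multi-track setup chosen only for illustration, and the function name is ours.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


# One CD-quality stereo stream: 44,100 samples/s * 2 bytes * 2 channels.
cd_rate = pcm_bytes_per_second(44_100, 16, 2)  # 176,400 bytes per second

# Storage needed for one minute of that stream, in megabytes (10^6 bytes).
per_minute_mb = cd_rate * 60 / 1_000_000       # 10.584 MB per minute

# Hypothetical case: four stereo streams playing at once for layered music.
four_stream_rate = 4 * cd_rate                 # 705,600 bytes per second
```

By comparison, the MIDI data driving a synthesizer is a small stream of note events, so most of the cost of the MIDI-plus-samples approach lies in the instrument samples, which are loaded once rather than streamed continuously.
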
This would lead one to ask:

How important IS actual sound playback quality in the overall picture?

This is a very important question indeed. Today, it could be argued that MIDI-controlled playback on sound card technology still can hardly hold a candle to professionally recorded audio. If this is true, then one could not fault a composer for wanting the highest standards possible and wanting to go pure DA. But the game industry is still a leap from the movie and music industries in terms of playback system technology. Compare, for example, a typical pair of $20 PC speakers to a home or actual theater sound system. Given this and the new capabilities presented by DirectMusic, the gap may well be narrowed even further. With its support of custom DLS-based samples instead of stock GM presets, DM's interactive MIDI playback should finally provide a listening experience that can approach the quality of pre-recorded music. This is, in fact, still an area where an under-funded practitioner can begin with the simplest of tools and create acceptably impressive work. That said, there are still some major discrepancies. What about, for example, real-time DSP effects? Dynamics? EQ? Stereo enhancement? Digital 5.1 surround? The road to outstanding sound quality will not be a short one, but DirectMusic has at least put its foot down in the name of interactive music where others have been loath to tread.

Lastly, this streaming DA vs. interactive MIDI plus DLS question is really comparing apples to oranges. The only problem in this comparison is that people are used to apples, and while the orange may be a better tasting fruit, people won't know until they actually have a taste. Streamed music in the form of pre-composed performances has been the staple of listening and music in general throughout human history. Attempts at randomization (Cage, for example, and Stockhausen) may have been recognized as leaps forward, but have failed to become accepted in popular culture. So one might say that DM, the untested fruit, will really need to prove itself powerful and useful in the hands of those composers who embrace it, or be left behind.

Even so, many game developers have already begun planning to add DirectMusic to their games. Other game audio system projects, both established and in the works, are planning to apply methods of interactivity similar to those used by DM to their offerings. So while DirectMusic may not have the highest standard of sound quality compared to CD / DVD streaming, there seem to be compelling advantages to its approach that are gaining a large foothold in the developer marketplace. Perhaps, then, the appropriate behavior of music in an interactive context may, in fact, be equally or more important than the bottom-line sound quality.

Interactive Audio in Games - An Approach

In games, when we look at how music is to be played, we see several methods. The most conventional is a single looped piece for each "level" (or whatever term the developer uses for a chunk of playtime), played over and over again. This has worked quite well for many years, but the fact that the public seems satisfied with it does not mean it is the most effective way to use music. Music, being so abstract and nebulous, does not lend itself to straightforward paths of evolution in perceived quality the way video technology does in the improvement of image quality. 3D sound, for instance, does not have quite the immediate and awe-inspiring effect of 3D video (genuine 3D video, holographic, etc.).

What we can now do is look at different ways to play audio. Gathering information from years of discussion on the subject, we can see that random music generation in its purest form is just as nonsensical as random visual generation. People need a foothold in reality; things must have consistent shape and color definition to be held as discernible objects. So let us toss out completely random music generation for now. It may be used in the future, but we do not yet know a method to make this practical and useful in today's context. A more fruitful approach would be to begin with music that has recognizable elements of shape and form. This music should then be adapted during playback to fit changes in user action and location. This method has already been used in a variety of contexts, with some notable success. This is what we will pursue for now.

Thematic Development

To begin this discussion, we must talk about thematic development. Thematic development has been vital to successful musical scores, and most recently to game scores, for many hundreds of years. We will take a moment to explain it so that we can fully understand its importance. (NOTE: This is a theory and not a fact about music. Music holds few, if any, absolutes, as is true of any art form, so we are certainly not expecting everyone to agree with the following analysis.)

To identify with something readily, there must be something constant and unchanging in it. There must be at least one identifiable characteristic that remains in the memory from encounter to encounter. Objects existing in reality, such as trees and birds, present little problem with this kind of recognition. Used in virtual environments like games, representations and variations of these real-world shapes and objects give the player a start on a comfortable and recognizable set of surroundings.

Once that is done, the unimaginable and non-repetitive can be ventured into. This is, however, very difficult to do well and is based on relativistic principles that we won't go into here. This is not just something that is done in games, but in movies, books, and nearly every leap of the imagination presented to a wide audience. Science fiction, for instance, presents bold and incredible concepts but, by the rules it is given, must base those concepts on things already discovered. This may sound limiting, but in fact it is essential for the readers' or viewers' enjoyment.

In a novel, the author begins by using something already understood (English, for example) to define things the reader can recognize: the language we know speaks of planets, stars, humans, human behavior, etc. Then the author begins to unravel this comfortable blanket of common knowledge around the reader to expose them to things they have never before considered. Doing so leaves an unforgettable effect if done with care and precision. The best writers of fiction, fantasy, and science fiction have proven the success of this method for many years.

The same set of rules can be applied to interactive game music in a loose way. By giving the environments and characters in a game themes, the player can grasp their identity, and once the themes are established, variation can be introduced. Easy examples are movie soundtracks and symphonic classical and romantic orchestral music, from Mussorgsky's "Pictures at an Exhibition" to John Williams' "Jaws".

Repetition in contrast to theme and variation

In an article by David Yackley and in a summary of DirectMusic elsewhere in Microsoft's documentation, the authors stated that "repetition is boring". In one sense, this is correct. In another, it is not. The authors aren't guilty of stating that DirectMusic should vary everything all the time, but they don't necessarily paint the whole picture. Developing and establishing a theme before introducing variation, while not absolutely necessary, adds a great deal. It familiarizes the player with the game in a more intimate way than constant, pervasive variation can.

On the other hand, repetition in its purest sense will bore the player eventually. The answer to this is DirectMusic's ChordMaps, Styles, and Templates. Using these properly, the composer can take a theme for a character or an environment, specify the harmonies they would like to hear, add rhythm if necessary, specify the variation, and let the system create the desired effect. In this way, the soundtrack can STILL have the themes that a composer / producer wants for the characters, and yet programmatically introduce meaningful variations. Examples of this will be forthcoming.
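
As a rough illustration of keeping a theme constant while varying its harmony, consider the following Python sketch. It is a deliberate simplification and not how DirectMusic's ChordMaps are implemented or accessed: THEME, CHORD_MAP, realize and vary are hypothetical names of our own, and the "chord map" here is only a table of which chord root may follow which.

```python
import random

# Hypothetical theme: intervals in semitones above the current chord root.
THEME = [0, 4, 7, 4]

# Hypothetical chord map: for each chord root (a MIDI note number),
# the roots the composer allows to follow it.
CHORD_MAP = {
    60: [65, 67],  # C may move to F or G
    65: [67, 60],  # F may move to G or C
    67: [60],      # G returns to C
}


def realize(theme, root):
    """Play the theme's shape on top of the given chord root."""
    return [root + step for step in theme]


def vary(theme, start_root, bars, seed):
    """Repeat the theme for several bars, letting the chord map pick the harmony."""
    rng = random.Random(seed)
    root = start_root
    performance = []
    for _ in range(bars):
        performance.append(realize(theme, root))
        root = rng.choice(CHORD_MAP[root])  # only composer-approved moves
    return performance


performance = vary(THEME, start_root=60, bars=4, seed=7)
```

Every bar keeps the recognizable shape of the theme, while the harmony underneath it moves only along paths the composer has approved. A different seed gives a different but still valid performance.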

In conclusion, the idea of thematic development, of buildup and resolution, is very important to the successful realization of real-time, interactive music. Currently, DirectMusic tackles this in a more complicated and comprehensive way than ever before.

In the next installment of this column, we will begin to go into the details of how this is actually done, in DirectMusic and other "adaptive audio" systems.