Interview with Spencer Critchley, Part 1

by Alexander Brandon

Let's get right to it... Spencer Critchley is a developer who's been heavily involved with both interactive audio and linear audio, so he can speak from both sides of the fence. We speak with him on the future of audio on the web and enough general subjects to keep just about anyone in the SIG interested. Read on! Check out his website as well. Alex's questions are in italics and Spencer is in regular type.

What's your background?

You might want to have a look at another website that has just gone up. I’ll end up taking the bio from that site and putting it up on my site. I’ve since shaved off the goatee, but otherwise it’s accurate. <laughs>

As it says there, I’ve got a background as a producer and creative director. I started out in the music industry in the 80s in Canada, where I was a songwriter and producer signed with Warner Chappell Music. I was also a writer / broadcaster with the Canadian Broadcasting Corporation. At the beginning of the 90s I moved to California, and in a nutshell I spent most of the 90s working in interactive media, starting with CD-ROMs. Then I went on to do interactive television, computer games, the web, and most recently telecom consulting with a company called BeVocal.

It says in your bio that at Beatnik, you “directed production of web content for MTV, Yahoo!, David Bowie, Moby, Britney Spears, Thomas Dolby and others.” How did that work, and how did it work out? Was it popular?

"...the adoption of audio on the web has been growing, but it’s been happening a lot more slowly than we thought, which is sort of typical of the web in general..."

Actually, I was at Beatnik in ’99 and 2000, and Thomas (Dolby) hired me to set up their creative production department. We were working on products to demonstrate the potential of Beatnik’s web technology, so we would do things like sonify pages for Yahoo, for example. One of the first big projects we did was Yahoo Digital, which was Yahoo’s first step into a rich media site. We sonified the interface and developed these things called ‘quick clips’, which allowed you to quickly browse music for download: you heard these little 5 second samples when you rolled over a link with your mouse.

For MTV and a bunch of recording artists we developed these web based remixers, in which you could remix their songs. There were contests where you could go see Britney Spears play live if you did the best Britney Spears remix, and similar kinds of things for all the other artists we did. This was based on an idea Thomas came up with. He happened to have the multitrack master for “Fame” by David Bowie, because he did some work with Bowie in the past, and he realized that using Beatnik it’d be possible to make this remix interface on a web page. He did the prototype himself using Dreamweaver, so the first remixer that Beatnik did was the remix of “Fame”, and that proved really popular. The engineers at Beatnik then took that idea and developed a more robust version, and made it so you could remix songs, store the data representing your remix, and email the URL to all your friends, so that when they clicked on the URL they went back to that page and heard the remix that you did.

We also advised some of these people, like MTV and David Bowie, on other applications of Beatnik to sonify the interfaces for their sites. When Bowie relaunched they used Beatnik to sonify the interface, so I advised his company on how to do that.
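
The shareable remix described above (store the data representing a remix, then email a URL that reproduces it) can be illustrated with a small sketch. This is not Beatnik's actual format, which the interview does not describe. The base address, the `song` and `levels` parameter names, and the idea of one volume level per track are all assumptions made for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical base address; not a real Beatnik URL.
BASE_URL = "https://example.com/remix"


def remix_to_url(song, track_levels):
    """Pack a song id and per-track volume levels (0-100) into a URL."""
    levels = ",".join(str(level) for level in track_levels)
    return BASE_URL + "?" + urlencode({"song": song, "levels": levels})


def url_to_remix(url):
    """Recover the song id and track levels from a shared remix URL."""
    query = parse_qs(urlparse(url).query)
    song = query["song"][0]
    levels = [int(level) for level in query["levels"][0].split(",")]
    return song, levels


# Example: four tracks, with the second one muted.
shared_url = remix_to_url("fame", [100, 0, 75, 40])
print(shared_url)
print(url_to_remix(shared_url))
```

Because the whole remix state lives in the link, a friend who opens it can rebuild the same mix without the site keeping anything on a server. A real system might store the state server-side and share only an identifier.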

I haven’t been in close touch with Beatnik since the fall of 2000, so I’m not a really super reliable source on what they’re doing right now, but I do know that they’ve moved their focus away from web sonification towards licensing this technology to be used in things like cell phones and PDAs. So I don’t think they’re doing much of that right now; it seemed to be very popular at the time but is not a core focus at this point. My observation is that it has taken longer for people to adopt interactive audio on the web than any of us thought it would. Thomas and the original members of Beatnik started the company in 1996 and I joined in the spring of ’99, and we all fully expected that by sometime in 2000 it would all be taking off like wildfire <laughs>. The adoption of audio on the web has been growing, but it’s been happening a lot more slowly than we thought, which is sort of typical of the web in general in a lot of ways. Rich media on the web and broadband is just taking off slowly as a whole.

"In a well designed site it will be silent where it needs to be, but where sound is justified it will be woven together with something approaching the sophistication of a film or TV soundtrack, with one important difference: it’s not a linear presentation the way video and film are."

What is your forward thinking on the advanced things that will happen in web audio? Right now we have sites on the web that play background music and make sounds when you click / mouse over links or graphics, and that’s pretty much it for the most part. So what ideas do you have, or did your colleagues have, for a more advanced way to sonify?

I think things are happening now, but they’re happening slowly. This is something I used to tell people when I was working at Beatnik: when you think about movie and TV soundtracks, essentially you’re just playing sounds in sync with the picture, so at the top level that doesn’t seem really complicated... and in the early days it wasn’t. You had a mono soundtrack that lined up with the picture, and any mix would be rudimentary, you know... a few actors talking and background sound effects with maybe some music playing. But in a modern movie you have potentially hundreds of separate tracks of dialogue, music, and sfx, with a great deal of expertise going into creating that soundtrack. Similarly on the web, initially if there was any sound at all on a website there might be an introductory thing or cute sfx when you did a mouse over, and that would be it... the rest of the site would be silent. What you’re seeing now is more layering, and people applying more of the skills that they would put into a movie or TV soundtrack. So if you go to a well developed Flash site there might be background music, ambient sound, dialogue, and rollover sfx, and you won’t get these awkward silences that last through whole areas of the site for no apparent reason.

In a well designed site it will be silent where it needs to be, but where sound is justified it will be woven together with something approaching the sophistication of a film or TV soundtrack, with one important difference: it’s not a linear presentation the way video and film are. You are taking into account the fact that you’re not sure where the user is gonna go at any given moment, and so you are designing a layered interactive soundtrack. I don’t think that’s the only thing that’s gonna happen with audio on the web, but I think it’s a big enough thing that people will be mastering it for years to come, just like it took them years to master doing a well developed film or TV soundtrack.

I think that people who are doing computer games will be familiar with this. As they know, when you sit down to create music and sound for computer games it’s great if you’re a good composer or sound designer from the traditional world, but you then need to develop these extra skills of anticipating what the flow will be like when you can’t be sure what the user is going to do from one moment to the next. You’re designing dynamic edits and crossfades that will work at any moment, so it’s designing for the 4th dimension, and I think that’s an important area in development that will continue to keep people busy for a while. Again, computer game people will know that this is a whole extra way of thinking that gets to be like 3-dimensional chess.

So it’s partly because of the design skills and partly because of the technical skills. People from the linear media world may very well be experts on Pro Tools and recording studios and post facilities and so on, but in order to work on the web they have to get familiar with things like HTML and JavaScript and Flash, and they have to start to absorb programming concepts like objects and instances of objects, so I think there’s still a fair bit of development that can happen there.
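
As a rough illustration of a crossfade that will work at any moment, here is a minimal sketch of an equal-power crossfade whose start time is only known when the user acts. The function names and the idea of driving the fade from a clock value are assumptions for illustration, not anything taken from Beatnik or Flash.

```python
import math


def crossfade_gains(progress):
    """Equal-power gains for the outgoing and incoming layers.

    progress runs from 0.0 (only the old layer) to 1.0 (only the new one).
    Values outside that range are clamped.
    """
    progress = min(1.0, max(0.0, progress))
    angle = progress * math.pi / 2
    return math.cos(angle), math.sin(angle)


def gains_at(now, fade_start, fade_length):
    """Gains at time `now` for a fade triggered at `fade_start` (seconds)."""
    return crossfade_gains((now - fade_start) / fade_length)


# Example: the user leaves an area at 12.3 seconds, triggering a 2 second fade.
for now in (12.0, 12.3, 13.3, 14.3, 15.0):
    old_gain, new_gain = gains_at(now, fade_start=12.3, fade_length=2.0)
    print(f"t={now:5.1f}  old={old_gain:.2f}  new={new_gain:.2f}")
```

The squared gains always sum to one, so the perceived loudness stays roughly constant wherever in the music the fade happens to begin. That is one simple way to make a transition safe to trigger at an unpredictable time.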

Let's take linear music and sound, as it's traditionally created. You have the artist, the engineers, and the composers separate, at least typically in most echelons of film and TV, and that can very well happen with the web. Brian Schmidt (head of Xbox audio at Microsoft) has said many times there will always be a need for people who integrate this stuff, as opposed to linear audio. Actually making games react to something the player does is going to be up to either the people that write the music or someone different, yes?

Exactly. I think that’s quite right. Another way of thinking of this is that media keeps getting more and more structured. Analog media has almost no structure, and digital media can be structured in a very detailed way, because it’s made out of bits, and those bits are information. You can tag the media in all kinds of different ways, whether it’s just markers in the timeline identifying the start of ‘scene 1’ or embedded computer code. As media becomes more and more structured it can be used in different ways, and people need to start to blur the distinction between entertainment programming and computer programming. It’s an interesting pun that we use the word ‘program’ for something on TV and also for an application that runs on your computer, and I think those two things are actually merging.
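
To make the idea of tagged, structured media concrete, here is a minimal sketch of timeline markers like the ‘scene 1’ example above. The marker times and labels are invented for illustration, and the helper names are hypothetical rather than part of any real media format.

```python
from bisect import bisect_right

# Hypothetical markers: (time in seconds, label), sorted by time.
MARKERS = [(0.0, "intro"), (12.5, "scene 1"), (47.0, "scene 2"), (90.0, "credits")]
MARKER_TIMES = [time for time, _ in MARKERS]


def marker_at(position):
    """Return the label of the last marker at or before `position`."""
    index = bisect_right(MARKER_TIMES, position) - 1
    if index < 0:
        return None
    return MARKERS[index][1]


def start_of(label):
    """Return the start time of the marker with this label, or None."""
    for time, name in MARKERS:
        if name == label:
            return time
    return None


print(marker_at(30.0))      # which section is playing 30 seconds in
print(start_of("scene 2"))  # where to jump to start that section
```

Once the media carries this kind of structure, an interactive program can ask where it is in the piece or jump to a named section, instead of treating the recording as one undivided stream.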

In the past, creating broadcast or filmed entertainment required you to master this considerable skill set, but that was involved in doing something that was locked in a linear fashion in time. Now there is the addition of this whole extra skill set that involves interactivity with the user. I think there will be people having to expand their skill set, and in some cases you’ll see specialization as you do now... you pointed out there are people who work only as re-recording mixers in the film industry, and other people specialize in recording music for the soundtrack, and others specialize in dialog editing. So I think at the high end you’ll see specialization like that, but also people who do a little of everything, like in the pop music business... you see people who write the song, perform the song, and produce the song, and they have a combination of artistic and technical skills. One way you’ll see this happen is recording engineers will learn more and more about the programming side, since they’re already technically oriented people, and if you look at the way that music recording and film recording have evolved, the need for that skill set evolution becomes more obvious.

At first there was resistance to adopting Pro Tools, for instance, especially in the film business, because it took a while for Pro Tools to become so rock solid stable that you could trust the sync to be perfect and trust it not to crash in the middle of an extremely expensive session. The cost of having your Pro Tools system crash in the middle of a film session is so high that it just wasn’t worth it. But Pro Tools has become so stable and reliable now that everyone is adopting it, and people who in the past would only work with magnetic tape or film stock have since become Pro Tools experts as a matter of course, and that’s a whole new way of thinking about their work. I’m fairly sure you’ll see the same people starting to absorb Flash and concepts of interactivity. Creating an instance of an embedded player, for example, which is a really common concept in web sound, will become similar to creating a region in a Pro Tools track.

"...those of us who are interested in this stuff for its own sake tend to get excited about stuff that’s advanced and subtle, and gives the user all kinds of abilities, but I think people should realize that most mainstream users don’t want to work that hard and will gravitate towards things that allow them more easy choices..."

Another point I would make is that a lot of us who are interested in this for its own sake get really excited about advanced uses of interactive audio, where you’re doing things like remixing music and changing the effects sends, or linking it to animations and possibly having generated compositions that create themselves in response to sophisticated things that are happening. But in a lot of mainstream uses of this, it won’t be that radical. We’ll find that people just want to listen to dialogue, music, and sound effects, and they don’t want to have to work too hard as they listen to it. It’s just that that stuff will have to be able to flow with them as they browse through the content.

It’s kind of like interactive TV. People have been exploring all kinds of uses of interactive TV. The system I worked on in 1995 at Silicon Graphics was a lot like the present day web, except it was all based on your television, with fiber optic connections to your TV set.

Isn’t that what WebTV was all about?

WebTV was kind of a simplified outgrowth of that kind of work. If you think about the web, it allows you to browse information, news, and entertainment listings, and lets you do email, live chats, etc. The SGI system I worked on was doing all of that before the graphical web really hit. It was doing it with higher resolution, greater speed, and greater fidelity. You were able to watch high resolution video on demand, for example. Unfortunately it cost a FORTUNE to implement, so it was an interesting experiment but it wasn’t commercially viable. My point is that, just like with interactive audio on the web, people tried all sorts of advanced uses of it, and what we’ve been finding over the years is that people like to watch TV. And they like to watch TV because it’s easy. They’ve been working all day and they don’t want to work more, interacting with their TV set and making a bunch of decisions about what should happen next. The average person has the attitude that that’s the job of the writers and producers and actors, who have made these decisions for me and come up with something entertaining that I can relax and watch. The kind of interactivity that the mainstream user likes is something that increases their easy choices. So if they’re watching ITV and… with terrorism, if they want to find out more about this Bin Laden guy, if it’s easy for them to do something that gives them more information, they find that sort of stuff valuable.

Maybe not even click or press a button, maybe just say “computer!”

Yeah, or talk. This touches on what I’m doing with BeVocal, which turns your voice into the whole interface to the world of information and entertainment. The phone is the device, to make it as easy as possible and accessible anywhere you have a cell phone.

As I say, those of us who are interested in this stuff for its own sake tend to get excited about stuff that’s advanced and subtle, and gives the user all kinds of abilities. But I think people should realize that most mainstream users don’t want to work that hard and will gravitate towards things that allow them more easy choices, which amount to clicking on a hotspot on a screen that gives them more information or the opportunity to buy something, or something more straightforward like that.