
A Practical Guide To Diffusion

by Chris Rolfe

Fadein
Speaker Configuration
Automated Matrix Diffusion
AudioBox and ABControl
Composing a Diffusion
Using Static Assigns
Using Dynamic Patterns
Fadeout

This paper is intended to serve as a brief guide to automated electroacoustic music concert diffusion.

Fadein

The performance of electroacoustic music necessarily entails amplifiers and speakers, which isn't an unreasonable definition of diffusion, but actual practice varies widely. For instance, a relatively early tape work, Edgard Varèse's Poème électronique (1958), was realized on an architecturally grand scale using 400 speakers in a pavilion designed by Le Corbusier. In contrast, Alvin Lucier's I am sitting in a room (1969) uses multiple iterations through a single microphone and speaker to exploit the combined effects of room resonance, microphone, and speaker response. 400 speakers, or one speaker 400 times - both compositions are intrinsically dependent upon a particular approach to loudspeaker realization, and could justly be described as composed diffusions, but, otherwise, are apples and oranges.

For simplicity's sake, then, I'll limit myself to practical matters, and, somewhat arbitrarily, define diffusion as the realization of one or more sources over multiple (2-16) loudspeakers. For the same reason, I'm also going to ignore some obvious and useful speaker placements such as overhead speakers, far speakers, etc., in favour of more straightforward arrangements in the horizontal plane. By no means do I want to imply that there is only one way to do things, or intend to dissuade anyone from swinging speakers from large ropes, passing out open-air headphones to the audience, or setting fire to the equipment (if that's what the Muse demands).

On to diffusion, then. Simply put, the number of speakers often determines the best approach to diffusion (and spatialization in general).

Speaker Configuration

Mono

Or to paraphrase the Australians, "one source, one bloody speaker". Not diffusion by the above definition, but in fact many auditory cues like reverberation and Doppler shift are sufficiently persuasive that we can use them to manipulate sound fields even under monophonic reproduction.

Stereo

Moving back and forth from monophonic to stereo mixing is the best introduction and training ground for multi-channel diffusion I could suggest. Most of the basic principles that apply to stereo mixing, concepts like image width, placement and balance, and common practices like A/B'ing the stereo and mono mixes, are useful when applied to larger speaker configurations. Also, stereo utilizes the seemingly passé, but still astounding (to me, anyway) phenomenon of phantom imaging, without which Pink Floyd would have been headphone-only music (not to offend those who think Pink Floyd is headphone-only music). Finally, the use of stereo recording techniques (X-Y pairs or Soundfield microphones) and many sophisticated processing tools for decorrelation (high-end stereo reverb, for example) can be applied and studied in a stereo environment, before trying to translate them into more complex speaker settings.

Quad

Never very popular as a consumer format, quad does, however, add a front-back dimension, and so is useful for concert diffusion. Together with a mixing board it's an economical and simple diffusion setup, and is very handy in predicting more complex speaker diffusions. It's also worth mentioning that quad systems were the platform for much of the research into spatialization techniques and many landmark works in computer music. John Chowning's Turenas (1972), for example, employed computer-generated sources to simulate movement in a quadraphonic field. All of the important techniques in loudspeaker localization and spatialization (amplitude panning, Doppler shift, global vs. local reverb 1[1. Gary Kendall provides a good introduction to loudspeaker localization / spatialization techniques in the Computer Music Journal, 19:4, Winter 1995.]), as well as more compositional ideas like dialogue, counterpoint 2[2. See Emmerson, S. 1994. "Local/Field: Towards a Typology of Live Electroacoustic Music." Proceedings of the 1994 International Computer Music Conference. San Francisco; also Wishart, T. 1985. On Sonic Art. York, UK: Imagineering Press; also Smalley, D. 1986. "Spectro-Morphology and Structuring Processes."; also MacDonald, A. 1995. "Performance Practice in the Presentation of Electroacoustic Music". Computer Music Journal, 19:4, pp. 88-92, Winter 1995.], and spatial polyphony, can be demonstrated and experienced on a quad system along both horizontal axes.

8 Channel

There is a qualitative change in experience as we move from 4 to 8 speakers that I can't entirely explain, but which nonetheless requires explanation. The additional speakers improve fidelity, since speakers work most efficiently when not taxed by overly complex signals. Assigning individual channels to individual speakers for monitoring, therefore, is useful for critical listening and clarity. There is, too, an immersive reality to discrete, eight-channel reproduction that is not easily achievable with mono, stereo, or quad reproduction. To hear an example, if you have access to 8 speakers, take any 4-minute ambient recording (my personal favourite is frogs) and do the following:

Edit the recording into 8 separate 30-second segments (in order to create uncorrelated chunks of audio). Mix all 8 segments into mono, stereo, quad, and 8-channel versions. Now play back the 8 frog channels: first mono to 8 speakers, then stereo to 8 speakers, quad to 8 speakers, and finally 8 to 8 (each frog gets its own speaker). The electronic sum of the different mixes should be identical, but the 8 to 8, discrete version is undeniably more enveloping. One feels slightly "wetter" listening to the 8-channel discrete frogs 3[3. Some users report a corollary effect when hearing a stereo work expanded onto 8 discrete channels for the first time, describing an "openness" or "hollow" quality, possibly because the individual channels, now separated in space, are not masking each other as much.].
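The four playback cases in the frog experiment can be written out as routing tables, which makes it easier to see what is actually being compared. This is only a sketch: the helper name `routing` and the round-robin assignment of segments and speakers to mix channels are assumptions made for illustration, and levels are ignored (every active route is simply marked 1).

```python
def routing(mix_channels, n_segments=8, n_speakers=8):
    """Return table[speaker][segment] = 1 if that segment is heard on that speaker.

    mix_channels is 1 (mono), 2 (stereo), 4 (quad) or 8 (discrete).
    Segments and speakers are dealt out to mix channels round-robin,
    which is an assumption, not something the experiment prescribes.
    """
    table = []
    for speaker in range(n_speakers):
        mix_channel = speaker % mix_channels  # mix channel feeding this speaker
        row = [1 if segment % mix_channels == mix_channel else 0
               for segment in range(n_segments)]
        table.append(row)
    return table


for name, channels in [("mono", 1), ("stereo", 2), ("quad", 4), ("8-channel", 8)]:
    segments_per_speaker = sum(routing(channels)[0])
    print(f"{name:>9}: each speaker carries {segments_per_speaker} segment(s)")
```

In the discrete case the table is the identity: one frog, one speaker. In the other three cases the same material reaches the room, but every speaker shares its segments with at least one other speaker.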

Another benefit of having 8 speakers, assuming they are arranged more or less in a circle, is that the angle between speakers is only 45°. Amplitude panning, which is still the core technique for most music diffusion, is sensitive to listener position: an audience member sitting even a few feet off-center will not perceive a phantom image between the panned pair but will, rather, localize the sound on the nearer of the two speakers 4[4. The precedence or Haas effect is well covered in the psychoacoustic literature. See Kendall, Begault, Emmerson or Bregman.]. With adjacent speakers only 45° apart, the resulting localization error is correspondingly smaller. Phantom imaging is also weaker between a front and back speaker because most of us have ears that face forward. Placing an extra speaker pair immediately left and right of the audience largely solves the problem.
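A rough way to see why the narrower angle helps is to compare how much earlier the nearer speaker's sound arrives once a listener moves off-center. The sketch below is plain geometry under stated assumptions: a 5 m speaker circle, a listener moved 1 m sideways, and a speed of sound of 343 m/s are all illustrative values, and the function name is hypothetical. It says nothing about the perceptual thresholds themselves, which are covered in the literature cited in note 4.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def arrival_difference_ms(spread_deg, radius_m, offset_m):
    """Arrival-time difference (ms) between a speaker pair for a listener
    moved offset_m sideways from the centre of the speaker circle."""
    half = math.radians(spread_deg / 2)
    # The pair is placed symmetrically about straight ahead (the +y axis).
    left = (-radius_m * math.sin(half), radius_m * math.cos(half))
    right = (radius_m * math.sin(half), radius_m * math.cos(half))
    listener = (offset_m, 0.0)
    d_left = math.dist(listener, left)
    d_right = math.dist(listener, right)
    return abs(d_left - d_right) / SPEED_OF_SOUND * 1000.0


for spread in (90, 45):  # quad spacing vs. 8-speaker spacing
    ms = arrival_difference_ms(spread, radius_m=5.0, offset_m=1.0)
    print(f"{spread} degree pair: nearer speaker leads by {ms:.2f} ms")
```

Under these assumptions the 45° pair produces roughly half the arrival-time difference of the 90° pair for the same seat, which is consistent with the 8-speaker ring being more forgiving of off-center listening.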

Automated Matrix Diffusion

The main problem, as we graduate from stereo-to-quad realizations to 8-speaker arrays, is controlling the mix. Given two hands and a conventional mixer, even a quad mix can be difficult to perform. One approach would be to pre-calculate each speaker placement and panning effect, in conjunction with effects and other enhancements. Robert Normandeau, for example, has by his own account 5[5. Bouhalassa, N. 1999. "Robert Normandeau interviewed by Ned Bouhalassa". Canadian Electroacoustic Community - Short Takes.], been working this way for years. The problem with pre-calculated diffusion on an 8-source, 8-speaker system is that any change in any source track level or placement requires a readjustment of all 8 output tracks and therefore, potentially, all 64 interim tracks. This premixing approach is somewhat tedious, and requires a very good spatial imagination. The alternative is a real-time, automated matrix mixer.

My first experience working with automated diffusion systems came about as a result of programming a control interface for an 8-channel digital matrix mixer, the DM-8 6[6. The hardware, the DM-8, was developed by Tim Bartoo of Harmonic Functions, and the design was co-ordinated by Dave Murphy (Simon Fraser University). I wrote the Max-based algorithmic and sequencing program, Matrix, that controls the mixer.

The compositional ideas for the development came from composer Barry Truax, a long-time advocate of multi-channel diffusion. His Powers of Two, which was realized on the DM-8/Matrix system, is a careful study in multi-channel diffusion techniques. It contains many continuous rotations (clockwise, counterclockwise, both contrary and chase movements), stepwise rotations, quick, random assigns, as well as generally judicious placements and re-assignments throughout the work. He has since realized a number of works on the DM-8, notably Pendlerdrøm, Powers of Two, and Sequence of Earlier Heaven.

My own works, Bronze Wound (1996) and The Answer Which the Court Gives (1996), were also originally diffused on the DM-8, as well as a series of works for the World Soundscape Project in 1996 (Westerkamp, et al.), Damián Keller's Toco y me Voy (1999) and another dozen or two pieces by students, visiting composers and faculty of Simon Fraser University.]. A matrix mixer differs from a conventional mixer in that the usual on/off buttons in the assign-to-buss section are replaced by level controls. An 8-input, 8-output mixer has, for example, 64 (8 times 8) independently adjustable levels, commonly called crosspoints and usually conceived of as a square, or matrix.
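In code, the crosspoint idea reduces to a table of gains, one per input/output pair, with each output being the gain-weighted sum of all inputs. The following is a minimal sketch of that arithmetic only; the function name `mix` and the sample values are hypothetical, and it is not meant to describe how the DM-8 hardware is implemented.

```python
def mix(crosspoints, inputs):
    """Apply a crosspoint table to one sample per input.

    crosspoints[i][o] is the linear gain from input i to output o.
    """
    n_outputs = len(crosspoints[0])
    outputs = [0.0] * n_outputs
    for i, sample in enumerate(inputs):
        for o in range(n_outputs):
            outputs[o] += crosspoints[i][o] * sample
    return outputs


N = 8
crosspoints = [[0.0] * N for _ in range(N)]  # 64 independent levels
crosspoints[0][0] = 1.0   # input 1 to output 1 at full level
crosspoints[0][1] = 0.5   # input 1 also to output 2, at half level
crosspoints[2][7] = 1.0   # input 3 to output 8

inputs = [1.0, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
print(mix(crosspoints, inputs))
```

A conventional mixer's assign buttons correspond to the special case where every crosspoint is either 0 or 1.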

The basic diffusion unit in the DM-8 is the crosspoint fade, which is equivalent to the statement:

"Fade input 1 to output 1 at 0.0 dB over 4 seconds".

Adding a sequencer and/or pattern generator capable of chaining several such messages together gives you a very useful method of creating patterns, such as rotations, side-to-side panning, front-to-back gestures, etc. The Matrix software implements two classes of commands: static assigns, which are a one-time crossfade to a new input assign setting, and dynamic generators, which produce assignments algorithmically.
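The crosspoint fade quoted earlier ("Fade input 1 to output 1 at 0.0 dB over 4 seconds") can be sketched as a gain ramp on a single crosspoint. The ramp shape here (linear in amplitude) and the update rate of 10 steps per second are assumptions chosen for illustration; the text does not say how the DM-8 interpolates, and the function names are hypothetical.

```python
def db_to_gain(db):
    """Convert a level in dB to a linear amplitude gain (0 dB -> 1.0)."""
    return 10.0 ** (db / 20.0)


def crosspoint_fade(start_gain, target_db, duration_s, steps_per_second=10):
    """Return a list of (time_s, gain) points for one crosspoint.

    The gain moves in a straight line from start_gain to the target level.
    """
    target = db_to_gain(target_db)
    steps = max(1, int(duration_s * steps_per_second))
    return [(duration_s * k / steps,
             start_gain + (target - start_gain) * k / steps)
            for k in range(steps + 1)]


# "Fade input 1 to output 1 at 0.0 dB over 4 seconds", starting from silence.
ramp = crosspoint_fade(start_gain=0.0, target_db=0.0, duration_s=4.0)
print(ramp[0], ramp[len(ramp) // 2], ramp[-1])
```

Chaining several such ramps on different crosspoints, with one fading in as another fades out, is what produces a pan from one speaker to the next.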

The static assigns are straightforward, like taking a snapshot of the current matrix that you can recall later. Dynamic generators are a bit more complicated.

Figure 1: Dynamic Generator in DM-8 Matrix Controller

The three most important parameters are rate, fade% and speaker list. Rate determines how often to crossfade, fade% determines the duration of the crossfade between speakers, and the speaker list determines the order in which speakers are used. Assuming our 8 speakers are arranged around a circle, then, a dynamic pattern with rate = 1 second, fade% = 100, and speaker list = {1, 2, 3, 4, 5, 6, 7, 8} would create a smooth, clockwise rotation at 45° per second, or one complete turn every 8 seconds. Reversing the speaker list, {8, 7, 6, 5, 4, 3, 2, 1}, causes a rotation in a counterclockwise direction. Setting fade% = 0 on the original clockwise rotation does not affect the pattern, or the overall rotation rate (still 45° per second), but instead of panning continuously, the input steps from one speaker to the next. The crossfade duration is then set to the minimum required (25 milliseconds) for a smooth, artifact-free transition.
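The behaviour of a dynamic generator can be approximated as a function that turns rate, fade% and a speaker list into a schedule of crossfades. This is a sketch of the idea rather than the Matrix program's actual logic: the function name and event layout are hypothetical, and reading fade% as a percentage of the step interval is an assumption (it is consistent with fade% = 100 at a 1 second rate giving a continuous rotation). The 25 millisecond floor is the minimum crossfade mentioned above.

```python
MIN_FADE_S = 0.025  # minimum crossfade duration mentioned in the text


def dynamic_pattern(rate_s, fade_percent, speakers, n_steps):
    """Return a list of crossfade events stepping through the speaker list.

    Each event fades the input out of one speaker and into the next one.
    """
    fade_s = max(rate_s * fade_percent / 100.0, MIN_FADE_S)
    events = []
    for step in range(n_steps):
        current = speakers[step % len(speakers)]
        following = speakers[(step + 1) % len(speakers)]
        events.append({"time_s": step * rate_s,
                       "fade_out": current,
                       "fade_in": following,
                       "fade_s": fade_s})
    return events


clockwise = [1, 2, 3, 4, 5, 6, 7, 8]
smooth = dynamic_pattern(1.0, 100, clockwise, n_steps=8)        # continuous rotation
stepped = dynamic_pattern(1.0, 0, clockwise, n_steps=8)         # same rate, stepwise
reverse = dynamic_pattern(1.0, 100, clockwise[::-1], n_steps=8) # counterclockwise

for event in smooth[:3]:
    print(event)
```

Note that the stepped and smooth versions produce events at identical times; only the fade duration differs, which matches the observation that fade% does not change the rotation rate.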

AudioBox and ABControl

The DM-8 was, really, a prototype system, and only a few models were ever built. The basic principles have been refined and expanded, however, in Harmonic Functions' AudioBox, a 16 x 16 matrix mixer with 8 channels of hard-disk playback and Mac control software by Third Monk Software, where I am the principal programmer. The AudioBox is primarily marketed as a theatrical show-control engine, with cue-based internal playback capability, presets, etc., and additional features like EQ and channel delay. Its roots lie with the original DM-8, however, so the system is quite capable of handling music diffusion, although it differs from the DM-8/Matrix system in several ways.

The DM-8/Matrix system was speaker-centric; that is, the user programmed directly to the outputs/speakers. Thus a clockwise rotation involves stepping through a speaker_list {1, 2, 3, 4, 5, 6, 7, 8}. ABControl, on the other hand, uses vector panning. Thus a clockwise rotation is specified as 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°.

ABControl refers to a speaker map that allows it to translate the vector angles into actual crosspoint levels. A rotation mapped onto vector angles can then be played back on a quad, 8-channel, or other configuration. ABControl also contains its own generators, and other tools to automate diffusion patterns, but aside from the differences required by the vector vs. speaker approach, the core ideas are similar.
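To illustrate how one angle can serve several speaker layouts, the sketch below maps a vector angle onto whatever speakers a given map contains, using equal-power panning between the two adjacent speakers. This is a common textbook approach swapped in for illustration, not ABControl's actual algorithm; the function name and the two example maps are hypothetical.

```python
import math


def pan_to_speakers(angle_deg, speaker_angles):
    """Return {speaker_angle: gain} for a source at angle_deg.

    The source is panned with an equal-power law between the two
    speakers on either side of it; all other speakers get 0.
    """
    angles = sorted(a % 360 for a in speaker_angles)
    angle = angle_deg % 360
    gains = {a: 0.0 for a in angles}
    for idx, lower in enumerate(angles):
        upper = angles[(idx + 1) % len(angles)]
        span = (upper - lower) % 360 or 360
        offset = (angle - lower) % 360
        if offset < span:
            frac = offset / span
            gains[lower] = math.cos(frac * math.pi / 2)
            gains[upper] = math.sin(frac * math.pi / 2)
            break
    return gains


QUAD_MAP = [0, 90, 180, 270]           # hypothetical 4-speaker map
OCTO_MAP = list(range(0, 360, 45))     # hypothetical 8-speaker map

# The same 45 degree vector lands between two speakers in quad,
# but exactly on a speaker in the 8-channel layout.
print(pan_to_speakers(45, QUAD_MAP))
print(pan_to_speakers(45, OCTO_MAP))
```

The rotation itself is stored only as angles; swapping the speaker map is what adapts it to the hall.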

Figure 3: ABControl Diffusion Generator

Figure 3, for example, is set to generate a smoothly rotating, clockwise pattern on input 1 (+45° every 1 second) and a smoothly rotating, counterclockwise pattern on input 2 (-45° every 1 second).

Composing a Diffusion

A good diffusion, like good electroacoustic practice, is sensitive to its material and clear in its aesthetic intent. In fact, a haphazard or inappropriate diffusion can quickly devolve into a jumbled mess that will harm rather than enhance the realization. One of the nice things about designing automated diffusion systems is that I get to hear a wide variety of diffusions. Furthermore, because the diffusions are programmed, I also have access to an explicit score and can study exactly how various effects are being achieved. What follows are a few rules of thumb regarding the use of static assigns and dynamic patterns that have emerged over the years. It's a bit of a laundry list, and there are conflicting guidelines, but these tips might prove helpful to some, especially to composers relatively new to automated diffusion:

Using Static Assigns

A sensible goal in the static assignment of tracks is to maintain channels on discrete speakers (as in the discrete frog example above). However, an intentional departure from this guideline is useful in certain instances, such as at the end of Bronze Wound, where I collapse all 8 channels onto a single front speaker as an exit strategy. The number of active speakers might also be a compositional element, perhaps moving from monophonic, single speaker, to stereo, to quad, to 8 channel and back again as part of the diffusion structure. And somewhere, surely, there is a composer clever enough to realize an 8-channel diffusion that uses only two speakers at any given moment. In manual diffusion, panning stereo sources, this might even be considered the limited norm (to do it by choice, however, would be very bold).

Many composers also prefer the slightly more diffuse sound created by assigning a channel to two or more speakers. This also results in a more predictable mix of levels because the overall amplitude is less dependent upon listener position or individual speaker variations (how robust, for instance, was the stereo mix on the Beatles' album that had all of the singing parts on the left speaker?). The cost of a more stable mix, though, is a loss of control over localization, so there is a tradeoff to consider. You should also take care with simple waveforms to ensure that phase cancellations don't cause the sound to vanish in some listening positions.

A common misconception, by the way, is that a single channel assigned to all speakers will surround the listener. In reality, unless the signals are decorrelated in some way, the listener simply localizes the sound on the nearest speaker (decorrelation, usually achieved through random phase-shifting, seeks to disrupt the tendency of similar signals to fuse in our perception).

For the sake of intelligibility, text or other prominent material is often assigned to the front, or side speakers. This does not mean that all voice is played monophonically - a common diffusion practice, also found in stereo mixing, pans dialogues left and right. At least partly because of intelligibility, the front-back dimension is usually handled differently than the left-right axis. Trevor Wishart makes the point, too, that sounds behind us have a slightly more ominous quality.

Before you rush to place all your tracks on the front speakers, however, be reminded that low frequency material is harder to localize, and can be safely assigned to the (probably) underutilized rear speaker(s). Less prominent, or less startling sounds also do well behind the listener, and, of course, intentionally startling the audience from time to time might be your purpose in life.

Speech need not necessarily be fixed in its assignment, or always come from the front. Darren Copeland, for instance, picks up on the semantic content of the text in Life Unseen on a phrase like, "a sound over here... there... or behind you", panning each word appropriately. Many acousmatic, soundscape, and text diffusions seem to be driven by the material's concrete and/or semantic references.

And still other composers, Damián Keller in Toco y me Voy (Touch and Go), or Kenneth Newby in Seasonal Round, for example, like to work once in a while with a less formal, circular seating (rather than a front-facing shoe-box seating), and therefore intentionally subvert the front-back, left-right distinction.

Another important consideration in static placement is maintaining a balanced, or at least conscious, relationship between levels on the left and right sides, and even, to a degree, between the front and back. Consider, when preparing and assigning tracks, whether the result will be lopsided because the loudest, most frequent sounds are concentrated on one side. And, while it is not unusual for a diffusion to be a bit front-heavy, the reverse (most material on the rear speakers) is rare.

Handling stereo source tracks, either artificial or recorded (mimetic), is mostly a matter of common sense, but here are a few things to keep in mind:

Be careful not to collapse a stereo source unintentionally by placing both tracks onto a shared speaker. This can have unpredictable (source-dependent) results such as lowered volume due to phase cancellations between the two source channels, and, more predictably, the disappearance of any panning gestures.

Be aware that a careful left-right stereo image will be distorted if skewed to a front-back or off-axis position, the front-most channel becoming more dominant. A slight level boost or brightening EQ on the back channel can help to compensate for what amounts to a muffling (because it's behind the head) of the rear channel. This can also be done at the mixer or on the speaker itself (in some cases), rather than in the original source tracks.

Be advised that most stereo recording techniques and artificial processes (like the better stereo reverbs) usually assume a 90° spread between speakers. In an 8-channel circular arrangement of speakers, the angle between speakers is, of course, only 45°. This is not a huge problem at the front and back, where the center speaker can be omitted, but most stereo images are altered or lost when placed on the side speakers (180° apart).

Regarding static transitions, the word static might seem to mean a fixed, inalterable assignment, but the ability to reconfigure assignments is actually one of the handiest features of automated diffusion.

Often in making efficient use of available recording tracks, a composer inserts a short section of unrelated material into any currently silent track. It is therefore useful to be able to reassign that channel as needed.

I mentioned earlier the use of quick successive crossfades in Darren Copeland's Life Unseen, but slower crossfades and reconfigurations can also be gorgeous. The middle section of Barry Truax's Sequence of Earlier Heaven, for example, introduces new stereo material onto a front speaker pair, slowly crossfading, over 10 to 20 seconds, to the four corner speakers. Hans-Ulrich Werner, in the Vancouver Soundscape project, used a front-to-back introduction of stereo-paired speakers to complement a fly-by recording of a seaplane in which the front speakers were faded in over 20 seconds, the side speakers over 40 seconds, and the back speakers over 90 seconds. Because the ear tends to localize a sound on the nearest speaker, this is perceived as a front-to-back movement, becoming slightly more diffuse as the rear speakers reach their maximum output.
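The staggered fade-in described for the seaplane recording can be sketched numerically to show why the front speakers dominate early on. The fade times (20, 40 and 90 seconds) come from the description above; the linear ramp shape and the names used are assumptions for illustration only.

```python
# Fade-in durations in seconds for each stereo speaker pair (from the text).
FADE_IN_S = {"front": 20.0, "side": 40.0, "back": 90.0}


def levels_at(t):
    """Return the level (0.0 to 1.0) of each speaker pair t seconds in,
    assuming every pair ramps up linearly from the same start time."""
    return {pair: min(max(t / duration, 0.0), 1.0)
            for pair, duration in FADE_IN_S.items()}


for t in (10, 20, 40, 90):
    levels = levels_at(t)
    print(t, {pair: round(level, 2) for pair, level in levels.items()})
```

At 20 seconds the front pair is at full level while the back pair is under a quarter of the way up, so the sound is heard at the front first and only gradually spreads toward the rear.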

And while the compositional motivations to reassign any given channel are as varied as compositional approaches, I think the following three principles might be somewhat true:

Any reassignment should be motivated by or related to the underlying musical structure or some aspect of the source material.

Complex static transitions, as opposed to complete, 8-channel, instantaneous reconfigurations, are not only feasible, but are desirable, in that they help to clarify the change. Repositioning or introducing new material in a staggered fashion works well in diffusion. Rather than crossfade 8 channels to a new position all at once, each track (or stereo pair) can move independently. This gives the listener a little time to absorb each new element and location, before introducing the next. For the same reasons that many composers like to use transitional sounds in between disparate material, transitional positions are also useful.

Subliminally slow transitions can also be interesting. A stereo image, for example, can be flipped over the course of a section, extending to several minutes or more, although some thought has to go into avoiding a prolonged collapse of the stereo image.

Using Dynamic Patterns

Static transitions bring us to dynamic or patterned diffusion - that is, higher-level, often algorithmic, generation of assignments. These can include rotations (clockwise, counterclockwise), crisscrossing (front/back, side-side, or meandering), continuous and stepwise motion, and random, irregular and rhythmic patterns, as well as the full range of variations possible with algorithmic control.

A good example of a compositional application of a regular, stepwise rotation occurs in Act I of Barry Truax's Powers of Two in which a gong begins a slow, clockwise procession. The substitution of the word procession for rotation in this case is Barry's, but I find it very apt, given that this is a stepwise movement, and that I can imagine a gong-carrier moving around the audience.

This example also points to the importance of carefully handling stereo material within an algorithmic diffusion. The gong example actually consists of a stereo pair of tracks. At the beginning of the rotation, the stereo pair is assigned front-center and front-right respectively; during the procession the (nominally) left channel lags behind the right channel by 45°, thus preventing the stereo image from collapsing.

Another method of maintaining stereo images is to keep them at complementary angles, that is, on opposite sides of the circle, although, as I mentioned above, not all stereo images survive a 180° placement.

Generally, the use of algorithmically generated crossfades complicates the maintenance of stereo images. A little time spent with pen, paper and score, however, can keep stereo channels off each other's toes. Another method, quite simple when using ABControl, is to restrict each channel of a stereo pair to alternate speakers, effectively limiting each channel to an independent set of quad speakers.

An example of contrary rotation of a stereo pair is contained in a diffusion that Leonard Paul and I recently finished of his electronica/dance piece, Shuff (1999). Shuff contains a stereo drum track, which we set in opposite (clockwise and counterclockwise) step rotations at a rate of 1 bar per step, 8 bars per rotation; at two points (0° and 180°) the rotations cross each other. We took great care in starting and stopping the rotations (essentially, manipulating the phase of the rotations) to ensure that the collapse to a single speaker at 0° and 180° occurred at musically unobtrusive moments.
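The crossing points of two contrary step rotations are easy to check with a little modular arithmetic. The sketch below assumes 8 speakers numbered 0 to 7 at 45° intervals and one step per bar; the function name and the start-position parameters are hypothetical, and the second call simply illustrates what shifting the phase of one rotation does.

```python
N_SPEAKERS = 8


def shared_steps(n_steps, cw_start=0, ccw_start=0):
    """Return (step, angle_deg) for every step at which a clockwise and a
    counterclockwise step rotation land on the same speaker."""
    hits = []
    for step in range(n_steps):
        cw = (cw_start + step) % N_SPEAKERS
        ccw = (ccw_start - step) % N_SPEAKERS
        if cw == ccw:
            hits.append((step, cw * 360 // N_SPEAKERS))
    return hits


# Both channels start on the same speaker: they meet at 0 and 180 degrees.
print(shared_steps(8))                 # [(0, 0), (4, 180)]

# Starting the counterclockwise channel two speakers along moves the
# collapses to different bars and different positions.
print(shared_steps(8, ccw_start=2))    # [(1, 45), (5, 225)]
```

Choosing the start positions (the phase) therefore decides both where in the room and on which bars the stereo pair momentarily collapses to one speaker.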

Shuff also uses two crisscross patterns. The original source contains two effects tracks, each with an independent LFO controlling the filter cutoff. We didn't mickey-mouse the LFO rates, but borrowed a page from electronica, setting FX_1 on a side-to-side, smoothly cycling pattern over 4 beats, and FX_2 on a front-to-back, smoothly cycling pattern over 5 beats. The 5 against 4 pattern repeats, then, every 20 beats, but is a little more subtle than a mutual one-bar cycle.
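The 20-beat figure is just the point at which a 4-beat cycle and a 5-beat cycle both return to their starting positions together. A minimal sketch that counts beats until that happens (the function name is hypothetical):

```python
def beats_until_realigned(cycle_a, cycle_b):
    """Count beats until two cycling patterns are both back at their start."""
    beat = 1
    while beat % cycle_a != 0 or beat % cycle_b != 0:
        beat += 1
    return beat


print(beats_until_realigned(4, 5))  # 20
print(beats_until_realigned(4, 4))  # 4 (a shared cycle repeats immediately)
```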

By the way, the choice to use continuous crisscrossing on the effects, and a stepwise counter rotation on the drums was not arbitrary: percussive patterns can be easily repositioned using a stepwise pattern; continuous material implies a smoother transition.

There is a strongly subjective element of taste involved here, but I personally find excessive use of continuous rotations and unmotivated diffusion patterns tedious. Repetitious diffusion patterns are a bit like LFOs in electronic music: wonderful for the first few moments, but quickly tiresome.

Another very important factor to consider when using rotation, or crisscross patterns, or any amplitude panning, in fact, is that the rate of movement is not arbitrary but bound by what is perceptually reasonable. The auditory system requires up to 1/4 second to integrate a broadband or percussive sound, and longer for mid to low frequencies. This means that when a pattern approaches 180° per second, sounds, especially percussive ones, will tend to divide into separate streams. A rotation, because it is more predictable, is more robust than a random pattern, but, then too, a 180° per second rotation can be a slightly nauseating subjective experience (rapid, random patterns, on the other hand, work quite well on inherently fragmented or stochastic material).
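The link between the 1/4 second figure and the 180° per second limit can be made explicit by working out how long a sound dwells on each speaker at a given rotation rate. The sketch assumes 8 equally spaced speakers; the function name is hypothetical.

```python
def dwell_time_s(rate_deg_per_s, n_speakers=8):
    """Seconds a rotating source spends per speaker at a given rotation rate."""
    degrees_per_speaker = 360.0 / n_speakers
    return degrees_per_speaker / rate_deg_per_s


for rate in (45, 90, 180):
    print(f"{rate} degrees per second: {dwell_time_s(rate):.2f} s per speaker")
```

At 45° per second each speaker has a full second; at 180° per second the dwell time drops to 0.25 seconds, the same order as the integration time mentioned above, which is where the sound starts to break into separate streams.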

This tendency of sound streams to separate can be useful in a deterministic way, though. Again, in Shuff, we ran a pattern on the lead synth that does a quick snap crossfade from the back of the room to center-front every two bars, but with the duration of only a sixteenth note. The effect is not so much to move the lead from back to front as to separate a short glissando from the main track. This is the spatial equivalent of virtual polyphony.

I hope the above comments on diffusion patterns haven't led you to imagine that diffusers are furiously moving point-source sound-objects through space at will. It is emphatically not feasible, despite the marketing hype one sometimes hears, to control image and distance, or even location, for all conceivable source material over loudspeakers. The material has to be considered in any speaker placement or crossfading.

Fadeout

In conclusion, then, it seems useful to adapt diffusion to the source material (environmental, point source, line source, abstract, mimetic, etc.) and vice versa. In other words, to compose material with diffusion in mind, to distinguish between monophonic, stereo (or sound field) recordings, and to recognize the role of musical context and structure in informing the realization (is the work a colour study, pulse-driven, stochastic, soundscape, orchestral, electronica, etc.?).

To summarize my overall attitude to diffusion, it might be going too far to say that less is more, but perhaps, know when enough is enough. Just because you can pan a track 360°, does not mean that you have to pan a track 360° (I, for one, don't want my Hank Williams buzzing around my head). On the other hand, automated diffusion is useful across a wide range of electroacoustic music, and opens up many intriguing compositional possibilities. From a listener's perspective, it is certainly a much richer, livelier experience than conventional mono and stereo sound reinforcement.

http://www.thirdmonk.com

eContact!
