AUI Design Specs

Technical Specifications

I have in mind that each "widget" in an AUI will have a) a 3D location with respect to either the speakers/headphones or the user's anatomy, and b) a sound source, whether a pre-recorded file or a software-generated stream of audio. In addition there will be c) a volume control associated with each one, as well as, to be perfectly general, d) a general-purpose, extensible method of providing additional controls to the sound source. (One might want to tell the widget to add an audio filter, change the voice, speed up or slow down, switch to a different audio file as the data source, or apply any of the entire range of text-to-speech system controls, etc. Alternatively, if an audio source has its own API, as in the case of a TTS system, then it may be best for the application program to control the source directly through that API, so that the AUI API is kept minimal.)
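As a concrete illustration only, the per-widget record might look something like the following C sketch; every type and field name here is hypothetical, not part of any existing API.

/* Illustrative sketch of one possible per-widget record in C, covering
 * the four attributes (a)-(d) above.  All names are hypothetical. */

typedef struct {
    float x, y, z;                /* 3D location relative to head or speakers */
} AuiLocation;

typedef enum {
    AUI_SOURCE_FILE,              /* (b) pre-recorded audio file */
    AUI_SOURCE_STREAM             /* (b) software-generated audio stream */
} AuiSourceKind;

typedef struct AuiControl {       /* (d) open-ended list of extra controls */
    const char *name;             /* e.g. "filter", "voice", "rate" */
    const char *value;
    struct AuiControl *next;
} AuiControl;

typedef struct {
    AuiLocation   location;       /* (a) 3D location */
    AuiSourceKind kind;           /* (b) kind of sound source */
    const char   *source;         /* file name, or a handle for a stream */
    float         volume;         /* (c) per-widget volume, 0.0 .. 1.0 */
    AuiControl   *controls;       /* (d) extensible, source-specific controls */
} AuiWidget;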

System Design Features

An AUI infrastructure should be implemented as an asynchronous programming system which controls the state of the current AUI scene. The scene can be thought of as a set of audio widgets, each with its own location, audio media source, playback state, and parameter set. The API functionality, then, can be thought of as putting a speaker at some location in auditory space, wiring it to some sound source, setting its volume level and other parameters, and, when desired, telling it to start playback, as well as to pause, seek, flush, or stop (where stop = pause, then seek to end).
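Reusing the hypothetical AuiWidget record sketched above, the scene state and the transport operations might be declared roughly as follows; the function bodies are elided, so this is a sketch of the shape of the API, not an implementation.

/* Hypothetical scene state and per-widget transport control. */

typedef enum {
    AUI_STOPPED,
    AUI_PLAYING,
    AUI_PAUSED
} AuiPlayState;

typedef struct {
    AuiWidget *widgets;           /* the widgets currently in the scene */
    int        nwidgets;          /* bounded; see the limits discussed below */
} AuiScene;

/* Transport operations on a single widget (bodies elided in this sketch). */
void aui_start(AuiWidget *w);                  /* begin playback            */
void aui_pause(AuiWidget *w);                  /* hold the current position */
void aui_seek (AuiWidget *w, double seconds);  /* move the playback cursor  */
void aui_flush(AuiWidget *w);                  /* discard pending audio     */
void aui_stop (AuiWidget *w);                  /* = pause, then seek to end */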

An AUI scene is necessarily limited in the number of these widgets it can handle, with the limits being both real (up to the number of 3D sound sources that the sound hardware can generate and mix together) and virtual (additional sources at different locations can be mixed in if they take turns being silent).

The audio output system associated with a single speaker set must be (at least) double-threaded. The low-level output thread, let's call it DAPump(), should be highly real-time and should basically be a copying function; it should have an output buffer size and loop time-scale equivalent to only 30 to 100ms of audio. The higher-level output buffer manager, let's call it DAMixer(), can schedule long sequences into the current audio output flow. With an infrastructure like this, when a generic AUI API program calls a function to D/A a buffer of samples, it effectively calls a DAMixer() method, giving it a pointer to a buffer in memory and instructions on how to mix it in (volume, delay or priority, 3D location, sample rate). The mixer then mixes that signal into its output buffer, which is shared with DAPump(), and the DAPump() thread, because it is real-time, starts sending out the audio with the new material mixed in within 30-100ms.
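The following is a rough C sketch of that two-thread arrangement, using hypothetical names; it shows only the shared buffer and the division of labor, and the actual audio-driver call is left as a comment.

/* Hypothetical sketch of the DAMixer()/DAPump() split.  The shared output
 * buffer holds well under 100ms of audio, so anything the mixer adds
 * becomes audible within roughly one pump cycle. */

#include <pthread.h>
#include <string.h>

#define PUMP_FRAMES 2048          /* about 46ms at 44.1kHz, inside the 30-100ms target */

static short           outbuf[PUMP_FRAMES * 2];   /* stereo, shared with DAPump() */
static pthread_mutex_t outbuf_lock = PTHREAD_MUTEX_INITIALIZER;

/* DAMixer(): called when an AUI program wants a buffer of samples D/A'd.
 * It mixes the caller's samples into the shared buffer at the given gain.
 * 3D placement, resampling, and delay/priority handling are omitted here. */
void DAMixer(const short *samples, int nframes, float gain)
{
    pthread_mutex_lock(&outbuf_lock);
    for (int i = 0; i < nframes * 2 && i < PUMP_FRAMES * 2; i++)
        outbuf[i] += (short)(samples[i] * gain);  /* additive mix */
    pthread_mutex_unlock(&outbuf_lock);
}

/* DAPump(): the near-real-time thread.  It only copies: grab the shared
 * buffer, hand it to the audio driver, clear it for the next mix, loop. */
void *DAPump(void *arg)
{
    short local[PUMP_FRAMES * 2];
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&outbuf_lock);
        memcpy(local, outbuf, sizeof outbuf);
        memset(outbuf, 0, sizeof outbuf);
        pthread_mutex_unlock(&outbuf_lock);
        /* write_to_audio_device(local, PUMP_FRAMES);  -- driver call omitted */
    }
    return NULL;
}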

API

The API should include widget-specific functions (creation, destruction, and the setting and getting of location, volume, state, and input source; see the draft definitions below). It should also include widget-group functions to create a group of widgets which can be manipulated as a set (corresponding in some ways to the concept of container windows in GUIs), along with corresponding joint-manipulation functions.

A natural grouping is the set of widgets which take turns being silent while the others in the group have their audio signal presented. Turn-taking rules for a group should be established, covering both time-slicing and ordering. For ordering, the order in which a widget is added to a group is its playback order within the group. For time-slicing, various policies might be supported; one simple round-robin policy is sketched below.
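For illustration, here is a hypothetical C sketch of such a group, built on the AuiWidget record and transport declarations above: insertion order is playback order, and a fixed time slice rotates audibility round-robin through the members. The names, the fixed-slice policy, and the size limit are all assumptions of this sketch.

/* Hypothetical widget group with round-robin turn-taking. */

#define AUI_GROUP_MAX 16

typedef struct {
    AuiWidget *members[AUI_GROUP_MAX];  /* in the order they were added */
    int        nmembers;
    int        current;                 /* index of the member now audible */
    double     slice_ms;                /* time slice per member (one possible policy) */
} AuiGroup;

/* Adding a widget appends it, so insertion order == playback order. */
int aui_group_add(AuiGroup *g, AuiWidget *w)
{
    if (g->nmembers >= AUI_GROUP_MAX)
        return -1;
    g->members[g->nmembers++] = w;
    return 0;
}

/* Called by a scheduler every slice_ms: silence the current member and
 * un-mute the next one in insertion order. */
void aui_group_take_turn(AuiGroup *g)
{
    if (g->nmembers == 0)
        return;
    aui_pause(g->members[g->current]);
    g->current = (g->current + 1) % g->nmembers;
    aui_start(g->members[g->current]);
}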

The infrastructure underneath the API needs to handle dynamic updates as well. Note that the auditory-space locations of the widgets can be changed in real time (to simulate flying over the auditory landscape, for example); those changes should be instantly reflected in the synthesis process.
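A minimal sketch of how that might look, again with hypothetical names built on the AuiWidget and AuiLocation types above: a trajectory function maps elapsed time to a location, and the synthesis loop re-evaluates it once per pump cycle. This corresponds to the setLocation(LOCATION l, TRAJECTORYFUNCTION *f) entry in the draft API below.

/* Hypothetical real-time location updates via a trajectory function. */

typedef AuiLocation (*AuiTrajectoryFn)(double elapsed_seconds);

/* Example trajectory: fly left-to-right across the auditory landscape,
 * staying somewhat ahead of and above the listener. */
static AuiLocation fly_across(double t)
{
    AuiLocation loc = { -10.0f + (float)(2.0 * t),   /* left to right */
                         5.0f,                       /* ahead         */
                         3.0f };                     /* above         */
    return loc;
}

/* Called once per DAPump() cycle (every 30-100ms), so the widget's
 * position in auditory space tracks the trajectory with no audible lag. */
void aui_update_location(AuiWidget *w, AuiTrajectoryFn f, double elapsed)
{
    if (f != NULL)
        w->location = f(elapsed);
}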

This infrastructure could be implemented in Java or C, on PCs, workstations, or enhanced stereo systems, or eventually even on portable walkman-type stereos.

Reference dimensions:

Let (0,0,0) be located in the middle of the head (at the third eye, behind and between the user's eyebrows, or, more or less equivalently, in the center of stereo audio space halfway between the ears). Let the dimensions be ordered by perceptual importance, so that the first dimension is the stereo dimension, i.e., left-to-right (going from negative to positive as real-number graphs typically do); the second dimension is back-to-front (going from negative to positive, with behind as negative and ahead as positive); and the third dimension is below-to-above (also going from negative to positive; here the orientation is the same as for altitude).
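To make the convention concrete, here are a few illustrative coordinates using the hypothetical AuiLocation type from the sketches above; the units are arbitrary.

/* Illustrative points under the (left-right, back-front, below-above)
 * ordering, with (0,0,0) at the center of the head. */

AuiLocation at_center      = {  0.0f,  0.0f,  0.0f };  /* middle of the head    */
AuiLocation hard_right     = {  1.0f,  0.0f,  0.0f };  /* fully to the right    */
AuiLocation behind_left    = { -0.5f, -1.0f,  0.0f };  /* behind, somewhat left */
AuiLocation overhead_front = {  0.0f,  1.0f,  1.0f };  /* ahead and above       */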

Requirements of the Software Infrastructure/API

An AUI infrastructure ought to cleanly handle the widgets, groups, real-time location updates, and mixing limits described above; the draft definitions below sketch one possible interface.

API definitions (draft, not implemented)

object AUIWidget {
  public method create();
  public method destroy();

  public method setLocation(LOCATION l);
  public method setLocation(LOCATION l, TRAJECTORYFUNCTION *f);
  public method LOCATION getLocation();        /* return NULL on error */

  public method setVolume(VOLUME v);           /* return -1 on error */
  public method VOLUME getVolume();            /* return NULL on error */

  public method setState(AUIWIDGETSTATE s);    /* return -1 on error */
  public method AUIWIDGETSTATE getState();     /* return NULL on error */

  public method setInput(FILE *f, SAMPLEINFO i);                  /* return -1 on error */
  public method setInput(SOCKET *socket, SAMPLEINFO i);           /* return -1 on error */
  public method setInput(AUDIOSTRUCT *audioStruct, SAMPLEINFO i); /* return -1 on error */

  private LOCATION location;
  private VOLUME volume;
  private AUIWIDGETSTATE state;
  private SHORT buffer[BUFSIZE];
  private int index = -1;

  in create() {
        location = NULL;
        volume = NULL;
        state = NULL;
        /* Start a thread which waits for state to be unlocked for playback
           and then spits data from buffer out to the kernel audio buffers,
           up to some limited duration (10ms?), then unlocks them again. */

        /* Don't let state be unlocked without volume and location having
           been set, at least to reasonable defaults. */
  }
}

kernel: wait on the kernel audio buffers to be unlocked, then spit them out
        to the audio drivers until some limit is attained, then relinquish
        the kernel audio buffer lock, and loop.
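By way of illustration, here is how the pieces sketched in this document might be exercised end to end, using the hypothetical C types and functions introduced above (AuiWidget, AuiGroup, and the transport declarations). Since the transport bodies were elided, this is a usage sketch rather than a buildable program, and the file names are placeholders.

/* Hypothetical end-to-end usage: two widgets placed in auditory space,
 * grouped, and played with round-robin turn-taking. */

#include <stdio.h>

int main(void)
{
    AuiWidget news = { { -1.0f, 0.5f, 0.0f },           /* left, slightly ahead  */
                       AUI_SOURCE_FILE, "news.wav", 0.8f, NULL };
    AuiWidget mail = { {  1.0f, 0.5f, 0.0f },           /* right, slightly ahead */
                       AUI_SOURCE_FILE, "mail.wav", 0.6f, NULL };

    AuiGroup g = { { NULL }, 0, 0, 500.0 };             /* 500ms slice per member */
    aui_group_add(&g, &news);
    aui_group_add(&g, &mail);

    aui_start(g.members[g.current]);                    /* first member speaks ...      */
    aui_group_take_turn(&g);                            /* ... then the next takes over */

    printf("now audible: %s\n", g.members[g.current]->source);
    return 0;
}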

GUI vs AUI

All modern GUI software development, including X-Windows, MS-Windows, Apple's UI, tcl-tk, Java AWT and JFC, Berlin, and GGI, has as its motivation providing a layer of functionality that GUI programmers really need, namely a way to get their software entities (data input dialogs, data representations, action choice selection methods) displayed on the screen. That is, each provides an implemented API that includes ways of creating and controlling the various GUI widgets that people like to have: scrollbars, buttons, menus, etc.

Similarly, AUI programmers need a way to get their own software entities (data input dialogs, data representations, action choice selection methods) represented audibly through stereo or 3D audio playback devices, which is to say, an API that includes ways of creating and controlling the various AUI features that people might wish to control.

Thus GUI work and AUI work are related at some highly abstract level, but practically speaking, for most programming purposes, they are distinct. For example, the sound card drivers that need to be written to form the basis for an AUI system are separate from the video card drivers that need to be written to form the basis of a GUI system.

Needs, Limitations

Developing applications that make use of an AUI infrastructure means application developers themselves must think in terms of the audio sources which could correspond to their various interface and logical entities in the application domain.

Everyone thinks they are a graphic designer and can usefully critique anyone's GUI layout; now we all have to become audio environment designers and cultivate our taste regarding AUI features! The closest professional qualification might be that of a musical composer, but the concerns of music are quite different from those of computer software applications in general, so it's up to us lay people and inventors!

And it's up to the users who need or want an AUI to develop it for themselves!

 

Copyright © 1998-2003 Tom Veatch All rights reserved.
Modified: April 8, 2003