AUI Design Specs
Technical Specifications
I have in mind that each "widget" in an AUI will have a) a
3D location with respect to either the speakers/headphones
or the user's anatomy, and b) a sound source, whether a
pre-recorded file or a software-generated stream of audio.
In addition there will be c) a volume control associated
with each one, as well as, to be perfectly general, d) a
general-purpose, extensible method of providing additional
controls to the sound source. (One may imagine wanting to
tell the widget to add in some audio filter, change the
voice, speed up or slow down, switch to a different audio
file as the data source, or even apply any of the full
range of text-to-speech system controls, and so on.
Alternatively, if an audio source has its own API, as in
the case of a TTS system, then it may be best for the
application program to control the source directly through
that API, so that the AUI API stays minimal.)
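To make the widget concept concrete, here is a minimal sketch in C of
such a per-widget record: (a) location, (b) sound source, (c) volume,
(d) a generic control hook. It is illustrative only; every name in it
(AuiWidget, AuiLocation, the control callback, and so on) is an
assumption of this sketch rather than part of any defined API.

/* Hypothetical per-widget record; names are illustrative. */
typedef struct {
    float x, y, z;                   /* 3D location relative to the listener */
} AuiLocation;

typedef enum { AUI_SRC_FILE, AUI_SRC_STREAM } AuiSourceKind;

typedef struct AuiWidget {
    AuiLocation   location;          /* (a) where the sound appears          */
    AuiSourceKind source_kind;       /* (b) pre-recorded file or             */
    union {                          /*     software-generated stream        */
        const char *file_path;
        int         stream_fd;
    } source;
    float         volume;            /* (c) per-widget gain, 0.0 .. 1.0      */
    int (*control)(struct AuiWidget *w,   /* (d) extensible control hook:    */
                   const char *name,      /*     e.g. "filter", "rate",      */
                   const void *value);    /*     "voice"; interpreted by the */
} AuiWidget;                              /*     underlying sound source     */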
System Design Features
An AUI infrastructure should be implemented as an
asynchronous programming system which controls the state of
the current AUI scene. The scene can be thought of as a set
of audio widgets, each with its respective location, audio
media source, playback state, and parameter set. The API
functionality can then be thought of as putting a speaker
at some location in auditory space, wiring it to some sound
source, setting its volume level and other parameters, and,
when desired, telling it to start playback, as well as to
pause, seek, flush, or stop (= pause, then seek to end).
An AUI scene is necessarily limited in the number of these
widgets it can handle, with limitations being both real (up
to the number of 3D sound sources that the sound hardware
can generate and mix together) and virtual (additional
sources at different locations can be accommodated if they
take turns being silent).
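The scene model and its playback verbs might be expressed along these
lines. This is only a sketch; AUI_MAX_WIDGETS stands in for the real
hardware limit on simultaneous 3D voices, and all the names are
hypothetical.

enum AuiPlaybackState { AUI_STOPPED, AUI_PAUSED, AUI_PLAYING };

enum AuiPlaybackCommand {
    AUI_CMD_START,
    AUI_CMD_PAUSE,
    AUI_CMD_SEEK,
    AUI_CMD_FLUSH,
    AUI_CMD_STOP        /* = pause, then seek to end */
};

#define AUI_MAX_WIDGETS 16      /* real limit: hardware 3D voice count    */

struct AuiWidget;               /* per-widget record, as sketched earlier */

typedef struct {
    struct AuiWidget *widgets[AUI_MAX_WIDGETS];
    int               count;    /* virtual sources beyond the limit must  */
} AuiScene;                     /* take turns being silent                */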
The audio output system associated with a single
speaker-set must be (at least) double threaded. The
low-level output thread, let's call it DAPump(), should be
highly real-time and should basically be a copying
function; it should have an output buffer size and loop
time-scale equivalent to only 30 to 100 ms of audio. On
the other hand, the higher-level output buffer manager,
let's call it DAMixer(), can schedule long sequences into
the current audio output flow. With an infrastructure like
this, when a generic AUI API program calls a function to
D/A a buffer of samples, it effectively calls a DAMixer()
method, giving it a pointer to a buffer in memory and
instructions as to how to mix it in (volume, delay or
priority, 3D location, sample rate). The mixer then mixes
that signal into its output buffer, shared with DAPump(),
and the DAPump thread, because it is real-time, starts
sending out the audio with the new material mixed in,
within 30-100 ms.
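Here is a minimal sketch of that two-thread arrangement, assuming
POSIX threads and a hypothetical write_to_dac() call standing in for
the real sound-card driver; it is not a reference implementation.

#include <pthread.h>
#include <string.h>

#define RATE      44100
#define CHANNELS  2
#define PUMP_MS   50                          /* 30-100 ms loop time-scale */
#define FRAMES    (RATE * PUMP_MS / 1000)
#define SAMPLES   (FRAMES * CHANNELS)

static short           out_buf[SAMPLES];      /* shared with DAPump()      */
static pthread_mutex_t out_lock = PTHREAD_MUTEX_INITIALIZER;

extern void write_to_dac(const short *samples, int count);  /* stand-in */

/* DAMixer(): mix a client buffer into the shared output buffer,
   applying the widget's volume (delay, priority, and 3D location are
   omitted from this sketch). */
void DAMixer(const short *src, int count, float volume)
{
    pthread_mutex_lock(&out_lock);
    for (int i = 0; i < count && i < SAMPLES; i++) {
        int s = out_buf[i] + (int)(src[i] * volume);
        if (s >  32767) s =  32767;           /* clip to 16-bit range */
        if (s < -32768) s = -32768;
        out_buf[i] = (short)s;
    }
    pthread_mutex_unlock(&out_lock);
}

/* DAPump(): real-time copying loop; whatever DAMixer() has mixed in
   reaches the hardware within roughly one PUMP_MS period. */
void *DAPump(void *arg)
{
    (void)arg;
    for (;;) {
        short chunk[SAMPLES];
        pthread_mutex_lock(&out_lock);
        memcpy(chunk, out_buf, sizeof chunk);
        memset(out_buf, 0, sizeof out_buf);   /* consumed; start fresh */
        pthread_mutex_unlock(&out_lock);
        write_to_dac(chunk, SAMPLES);         /* blocks for ~PUMP_MS   */
    }
    return 0;
}

The key property is that DAPump() only copies, so its latency stays
near one buffer period, while all scheduling intelligence lives in
DAMixer().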
API
The API should include widget-specific functions (a C-style sketch follows this list):
- to create an aui-widget,
- to associate it with a 3D location,
- to associate it with a sound source whether that is
- a file or
- a stream socket to which audio will be sent
- to change its 3D location, whether
  - in absolute position terms or as relative rotation/translation,
  - instantaneously or gradually along some trajectory
- to control playback:
- start
- stop
- pause
- flush
- seek
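Restated as hypothetical C-style prototypes (the object-style draft
near the end of this document covers the same ground; all names here
are illustrative):

typedef struct AuiWidget AuiWidget;            /* opaque widget handle */
typedef struct { float x, y, z; } AuiLocation;

AuiWidget *aui_widget_create(void);
void       aui_widget_destroy(AuiWidget *w);

int aui_widget_set_location(AuiWidget *w, AuiLocation l);          /* absolute, instantaneous */
int aui_widget_move(AuiWidget *w, AuiLocation delta, double secs); /* relative, over time     */

int aui_widget_set_source_file(AuiWidget *w, const char *path);
int aui_widget_set_source_socket(AuiWidget *w, int socket_fd);

int aui_widget_start(AuiWidget *w);
int aui_widget_stop(AuiWidget *w);
int aui_widget_pause(AuiWidget *w);
int aui_widget_flush(AuiWidget *w);
int aui_widget_seek(AuiWidget *w, double seconds);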
The API should also include widget-group functions to create a group
of widgets which can be manipulated as a set (corresponding in some
ways to the concept of container windows in GUIs), along with
corresponding joint manipulation functions (sketched after this list):
- joint volume control,
- joint rotation or translation in the space
- possibly, manipulation of those controls that are shared
among the member widgets.
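A widget group might be sketched as follows; the group owns no audio
of its own, it only applies shared operations to its members. The
names and the fixed group size are assumptions of this sketch.

#define AUI_GROUP_MAX 16

typedef struct {
    struct AuiWidget *members[AUI_GROUP_MAX];
    int               count;
    float             group_volume;  /* multiplied into each member's own volume */
} AuiGroup;

/* Move every member by the same offset, preserving relative layout. */
void aui_group_translate(AuiGroup *g, float dx, float dy, float dz);

/* Scale the whole group's loudness without touching per-widget settings. */
void aui_group_set_volume(AuiGroup *g, float volume);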
A natural grouping is the set of widgets which take turns being silent
while the others in the group have their audio signal presented.
Turntaking rules for a group should be established, including
time-slicing and ordering. For ordering, the order in which a widget
is added to a group is its playback order in the sequence of widgets
in the group. For time-slicing there are various approaches which might
be supported (the demand-weighted rule is sketched in code after this list):
- "Microsoft multitasking" (MSMT), i.e., play-until-done-then-go-to-next,
- fixed-cycle time-slicing, where N widgets each take C/N seconds of
  a regular cycle of C seconds duration,
- demand-fixed time-slicing, where each widget in the group specifies
  its desired playback duration (like MSMT, except the sound source
  may not entirely complete before its turn ends),
- demand-weighted fixed-cycle time-slicing, where N widgets each
  state their weight Wi, and a fixed cycle of C seconds is divided
  among the widgets in proportion to their stated weights.
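The demand-weighted fixed-cycle rule is simple arithmetic; a sketch in
C with illustrative names:

/* Widget i's share of the cycle, in seconds:
   slice_i = C * W[i] / (W[0] + ... + W[N-1]). */
double aui_weighted_slice(double cycle_seconds,
                          const double *weights, int n, int i)
{
    double total = 0.0;
    for (int k = 0; k < n; k++)
        total += weights[k];
    if (total <= 0.0 || i < 0 || i >= n)
        return 0.0;
    return cycle_seconds * weights[i] / total;
}

Plain fixed-cycle time-slicing is the special case where every weight
is equal, giving each of the N widgets C/N seconds per cycle.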
The infrastructure underneath the API needs
- a) to suck down the audio media stream from the various
sources associated with each widget, subject to the widgets'
playback state,
- b) to synthesize a two-or-more-channel audio signal (for stereo or
  multi-speaker presentation) which "displays" the stream at the
  target location in space, and
- c) to mix these signals together using each widget's volume control feature.
Note that the auditory-space locations of the widgets can be changed
in real time (to simulate, for example, flying over the auditory
landscape); such changes should be picked up immediately by the
synthesis process.
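As an illustration of steps b) and c), here is a simplified stereo
rendering sketch: it "displays" one widget's mono samples at its
current left-right position with a constant-power pan and mixes them
in at the widget's volume. A real implementation would use proper 3D
synthesis (HRTFs, multi-speaker panning); the function name and the
-1..+1 coordinate range are assumptions of this sketch.

#include <math.h>

void aui_render_widget(const short *mono, int frames,
                       float x,           /* -1 = hard left, +1 = hard right */
                       float volume,
                       short *stereo_out  /* frames*2 interleaved samples,
                                             already holding other widgets   */)
{
    /* Constant-power pan: angle 0 = hard left, pi/2 = hard right. */
    float pan   = (x + 1.0f) * 0.5f;       /* 0 .. 1 */
    float angle = pan * 1.5707963f;
    float left  = cosf(angle) * volume;
    float right = sinf(angle) * volume;

    for (int i = 0; i < frames; i++) {
        int l = stereo_out[2*i]     + (int)(mono[i] * left);
        int r = stereo_out[2*i + 1] + (int)(mono[i] * right);
        if (l >  32767) l =  32767;
        if (l < -32768) l = -32768;
        if (r >  32767) r =  32767;
        if (r < -32768) r = -32768;
        stereo_out[2*i]     = (short)l;
        stereo_out[2*i + 1] = (short)r;
    }
    /* Because left/right are recomputed from x on every call, moving
       the widget takes effect on the next rendered block. */
}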
This infrastructure could be implemented in Java, or C, inside PCs,
workstations, or enhanced stereo systems or even portable walkman-type
stereos, eventually.
Reference dimensions:
Let (0,0,0) be located in the middle of the head (at the third eye,
behind and between the user's eyebrows, or, more or less equivalently,
in the center of stereo audio space halfway between the ears). Let the
dimensions be ordered by perceptual importance, so that the first
dimension is the stereo dimension, i.e., left-to-right (going from
negative to positive, as real-number graphs typically do); the second
dimension is back-to-front (going from negative to positive,
considering behind as negative and ahead as positive); and the third
dimension is below-to-above (also going from negative to positive;
here the orientation is the same as for altitude).
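Expressed as a C type (the name and example values are illustrative;
units are left to the implementation):

typedef struct {
    float x;   /* dimension 1: left (-) to right (+), the stereo axis */
    float y;   /* dimension 2: back (-) to front (+)                  */
    float z;   /* dimension 3: below (-) to above (+), like altitude  */
} AuiLocation;

/* Example: a source slightly to the left, ahead of the listener, at
   ear level, might sit at { -0.2f, 1.0f, 0.0f }. */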
Requirements of the Software Infrastructure/API
An AUI ought to cleanly handle
- 1) a single active text stream source
- 2) a number of other tool resources available to be activated:
  - playback and volume controls on each of the entities,
  - various appliances: calculators, CD players, phone dialer,
    MIDI player, etc.
- 3) any signals that need to be monitored, such as alarms
- 4) context switching between one tool or text stream and another
- 5) flying through the space at speed, hearing what's out there
- 6) a feedback stream, which would be important for the blind and also
  for sighted users of speech recognition:
  - do TTS on speech recognition results to monitor accuracy,
  - play letter names on keyboard input so you know what keys you
    typed (sketched after this list).
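A sketch of the keyboard-echo idea in item 6: speak each key's name as
it is typed. Here speak() and read_key() are hypothetical stand-ins
for the AUI's TTS/playback call and the platform's raw key input.

extern void speak(const char *text);   /* hypothetical TTS/playback call       */
extern int  read_key(void);            /* hypothetical raw key read; <0 = quit */

void aui_keyboard_echo_loop(void)
{
    for (;;) {
        int c = read_key();
        if (c < 0)
            break;
        if (c == ' ') {
            speak("space");            /* keys with multi-character names */
        } else {
            char name[2] = { (char)c, '\0' };
            speak(name);               /* letter keys: speak the letter   */
        }
    }
}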
API definitions (draft, not implemented)
The AUI widget ("auidget") object:
object AUIWidget {
    public method create();
    public method destroy();
    public method setLocation(LOCATION l);
    public method setLocation(LOCATION l, TRAJECTORYFUNCTION *f);
    public method LOCATION getLocation();        /* returns NULL on error */
    public method setVolume(VOLUME v);           /* returns -1 on error   */
    public method VOLUME getVolume();            /* returns NULL on error */
    public method setState(AUIWIDGETSTATE s);    /* returns -1 on error   */
    public method AUIWIDGETSTATE getState();     /* returns NULL on error */
    public method setInput(FILE *f, SAMPLEINFO i);                  /* returns -1 on error */
    public method setInput(SOCKET *socket, SAMPLEINFO i);           /* returns -1 on error */
    public method setInput(AUDIOSTRUCT *audioStruct, SAMPLEINFO i); /* returns -1 on error */

    private LOCATION location;
    private VOLUME volume;
    private AUIWIDGETSTATE state;
    private SHORT buffer[BUFSIZE];
    private int index = -1;

    create() {
        location = NULL;
        volume = NULL;
        state = NULL;
        /* Start a thread which waits for state to be unlocked for
           playback and then feeds data from buffer out to the kernel
           audio buffers, up to some limited duration (10 ms?), then
           unlocks them.  Do not let state be unlocked without volume
           and location having been set, at least to reasonable
           defaults. */
    }
}
Kernel side: wait on the kernel audio buffers to be unlocked, then
send them out to the audio drivers until some limit is reached, then
relinquish the kernel audio buffer lock, and loop.
GUI vs AUI
All modern GUI software development, including X-Windows,
MS-Windows, Apple's UI, Tcl/Tk, Java AWT and JFC,
Berlin, and GGI, has as its motivation providing a layer
of stuff that GUI programmers really need, namely a way
to get their software entities (data input dialogs, data
representations, action choice selection methods)
displayed on the screen; which is to say, an implemented
API that includes ways of creating and controlling the
various GUI widgets that people like to have: scrollbars,
buttons, menus, etc.
Similarly, AUI programmers need a way to get their own
software entities (data input dialogs, data
representations, action choice selection methods)
represented audibly through stereo or 3D audio playback
devices, which is to say, an API that includes ways of
creating and controlling the various AUI features that
people might wish to control.
Thus GUI work and AUI work are related at some highly
abstract level, but practically speaking, for most
programming purposes, they are distinct. For example,
the sound card drivers that need to be written to form
the basis for an AUI system are separate from the
video card drivers that need to be written to form the
basis of a GUI system.
Needs, Limitations
Developing applications that make use of an AUI infrastructure means
that application developers themselves must think in terms of the
audio sources that could correspond to the various interface and
logical entities in their application domain.
Everyone thinks they are a graphic designer and can
usefully critique anyone's GUI layout; now we all have to
become audio environment designers and cultivate our
taste regarding AUI features! For professional qualifications
one might try to become a musical composer, but the concerns of music
are quite different from those of computer software applications in
general, so it's up to us lay people and inventors! And it's up to the
users themselves, who need or want this, to develop it for themselves!
Copyright © 1998-2003
Tom Veatch
All rights reserved.
Modified: April 8, 2003