
Audio Playback

Learn how to play audio received from Autessa agents in both real-time voice mode and standard audio responses.

Overview

Autessa delivers audio in two contexts:

  1. Real-time Voice Mode: Receive streaming audio chunks via WebSocket during conversational interactions
  2. Standard Audio Response: Receive audio in HTTP responses from agent executions

Both use the same audio format (16 kHz, base64-encoded 16-bit mono PCM) and the same utility classes for playback.


Audio Response Format

Audio received from Autessa has the following structure:

{
  "streamedOutput": {
    "sampleRate": 16000,
    "base64Audio": "AAAA...",
    "transcription": "Would you like any tips on running routes or safety?",
    "isFinalChunk": false,
    "outputType": "AUDIO"
  }
}

  • sampleRate (number): Always 16000 Hz (16 kHz)
  • base64Audio (string): Base64-encoded PCM audio data (Int16, mono, little-endian)
  • transcription (string): Text transcription of the audio chunk
  • isFinalChunk (boolean): Indicates whether this is the last chunk in the stream
  • outputType (string): "AUDIO" for audio responses, "TEXT" for text-only responses

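The fields above can be captured in a TypeScript type together with a runtime check, so that malformed messages are rejected before any decoding happens. This is a minimal sketch: the names StreamedAudioOutput and isStreamedAudioOutput are illustrative and not part of the Autessa API, and treating transcription as optional is an assumption made to match the optional parameter of processAudioChunk further down.

```typescript
// Illustrative names; not part of the Autessa API.
interface StreamedAudioOutput {
  sampleRate: number
  base64Audio: string
  transcription?: string // assumed optional, see lead-in
  isFinalChunk: boolean
  outputType: 'AUDIO'
}

// Narrow an unknown parsed value to an audio payload.
function isStreamedAudioOutput(value: unknown): value is StreamedAudioOutput {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    v['outputType'] === 'AUDIO' &&
    typeof v['sampleRate'] === 'number' &&
    typeof v['base64Audio'] === 'string' &&
    (v['transcription'] === undefined || typeof v['transcription'] === 'string') &&
    typeof v['isFinalChunk'] === 'boolean'
  )
}
```

A caller would typically apply the guard to the streamedOutput property of the parsed message, not to the whole message.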

Utility Classes

WavUtils - WAV File Creation

Converts PCM data to playable WAV format:

export class WavUtils {
  private static writeString(view: DataView, offset: number, str: string): void {
    for (let i = 0; i < str.length; i++) {
      view.setUint8(offset + i, str.charCodeAt(i))
    }
  }

  /**
   * Create WAV file from PCM data
   */
  static createWavFromPCM(pcmData: Uint8Array, sampleRate: number): Uint8Array {
    const numChannels = 1 // Mono
    const bitsPerSample = 16
    const byteRate = sampleRate * numChannels * bitsPerSample / 8
    const blockAlign = numChannels * bitsPerSample / 8
    const dataSize = pcmData.length
    const fileSize = 44 + dataSize // WAV header is 44 bytes

    const wavBuffer = new ArrayBuffer(fileSize)
    const view = new DataView(wavBuffer)

    // RIFF chunk descriptor
    this.writeString(view, 0, 'RIFF')
    view.setUint32(4, fileSize - 8, true) // Remaining size after this field
    this.writeString(view, 8, 'WAVE')

    // fmt sub-chunk
    this.writeString(view, 12, 'fmt ')
    view.setUint32(16, 16, true) // fmt chunk size (16 for PCM)
    view.setUint16(20, 1, true) // Audio format: 1 = PCM
    view.setUint16(22, numChannels, true)
    view.setUint32(24, sampleRate, true)
    view.setUint32(28, byteRate, true)
    view.setUint16(32, blockAlign, true)
    view.setUint16(34, bitsPerSample, true)

    // data sub-chunk
    this.writeString(view, 36, 'data')
    view.setUint32(40, dataSize, true)

    // Copy PCM data
    const wavData = new Uint8Array(wavBuffer)
    wavData.set(pcmData, 44)

    return wavData
  }

  /**
   * Create WAV from Float32Array
   */
  static createWavFromFloat32(float32Array: Float32Array, sampleRate: number): Uint8Array {
    // Convert Float32 to Int16
    const int16Array = new Int16Array(float32Array.length)
    for (let i = 0; i < float32Array.length; i++) {
      const s = Math.max(-1, Math.min(1, float32Array[i]))
      int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF
    }

    // Reinterprets the Int16 buffer as bytes; assumes a little-endian platform
    const pcmBytes = new Uint8Array(int16Array.buffer)
    return this.createWavFromPCM(pcmBytes, sampleRate)
  }
}
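
To check that a generated file has the header layout written above, the same 44 bytes can be read back. The following is a small sketch for debugging, not part of the utility classes: parseWavHeader and WavHeaderInfo are hypothetical names, and the function assumes the fixed 44-byte layout used by createWavFromPCM (WAV files from other sources may carry extra chunks).

```typescript
// Hypothetical helper for debugging; assumes the fixed 44-byte header layout.
interface WavHeaderInfo {
  numChannels: number
  sampleRate: number
  bitsPerSample: number
  dataSize: number
}

// Read a 4-character ASCII tag such as 'RIFF' at the given offset.
function readTag(view: DataView, offset: number): string {
  let tag = ''
  for (let i = 0; i < 4; i++) {
    tag += String.fromCharCode(view.getUint8(offset + i))
  }
  return tag
}

function parseWavHeader(wav: Uint8Array): WavHeaderInfo {
  if (wav.length < 44) {
    throw new Error('Too short to contain a 44-byte WAV header')
  }
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength)
  if (readTag(view, 0) !== 'RIFF' || readTag(view, 8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file')
  }
  if (readTag(view, 12) !== 'fmt ' || readTag(view, 36) !== 'data') {
    throw new Error('Unexpected chunk layout')
  }
  return {
    numChannels: view.getUint16(22, true), // all multi-byte fields are little-endian
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
    dataSize: view.getUint32(40, true)
  }
}
```

For audio produced by createWavFromPCM with a 16000 Hz sample rate, the parsed values should be 1 channel, 16000 Hz, 16 bits per sample, and a dataSize equal to the PCM byte length.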

DataUtils - Data Conversion

Handles base64 decoding and blob URL creation:

export class DataUtils {
  /**
   * Convert base64 string to Uint8Array
   */
  static base64ToUint8Array(base64: string): Uint8Array {
    if (!base64 || base64.length === 0) {
      throw new Error('Invalid base64 string provided')
    }

    const binaryString = atob(base64)
    const bytes = new Uint8Array(binaryString.length)

    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i)
    }

    return bytes
  }

  /**
   * Create blob URL for audio playback
   */
  static createAudioBlobUrl(audioData: Uint8Array, mimeType: string = 'audio/wav'): string {
    const blob = new Blob([audioData], { type: mimeType })
    return URL.createObjectURL(blob)
  }

  /**
   * Revoke blob URL to free memory
   */
  static revokeAudioBlobUrl(url: string): void {
    if (url.startsWith('blob:')) {
      URL.revokeObjectURL(url)
    }
  }

  /**
   * Convert Float32Array to Int16Array for audio
   */
  static convertFloat32ToInt16(float32Array: Float32Array): Int16Array {
    const int16Array = new Int16Array(float32Array.length)
    for (let i = 0; i < float32Array.length; i++) {
      const s = Math.max(-1, Math.min(1, float32Array[i]))
      int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF
    }
    return int16Array
  }
}
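
When individual samples are needed (for example to draw a waveform or to estimate how long a chunk will play), the decoded bytes can be read as little-endian Int16 values. The sketch below is one possible approach and is not part of DataUtils; decodePcm16 and chunkDurationMs are hypothetical names. It reads through a DataView with an explicit little-endian flag so the result does not depend on the platform byte order.

```typescript
// Hypothetical helpers; not part of DataUtils.

// Decode base64 PCM (Int16, mono, little-endian) into samples.
function decodePcm16(base64Audio: string): Int16Array {
  const binary = atob(base64Audio)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }

  const view = new DataView(bytes.buffer)
  // A trailing odd byte, if any, is ignored.
  const samples = new Int16Array(Math.floor(bytes.length / 2))
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) // true = little-endian
  }
  return samples
}

// Duration of a mono chunk in milliseconds.
function chunkDurationMs(sampleCount: number, sampleRate: number): number {
  return (sampleCount / sampleRate) * 1000
}
```

At 16 kHz, a chunk of 32,000 PCM bytes holds 16,000 samples, which is one second of audio.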

AudioQueue - Queue Management

Manages sequential playback of audio chunks:

export interface AudioQueueItem {
  url: string
  transcription: string
  timestamp: number
  sampleRate: number
}

export interface AudioQueueStatus {
  isPlaying: boolean
  currentIndex: number
  totalItems: number
  currentItem: AudioQueueItem | null
}

export class AudioQueue {
  private queue: AudioQueueItem[] = []
  private currentIndex: number = -1
  private audioElement: HTMLAudioElement | null = null
  private isPlaying: boolean = false
  private isInterrupted: boolean = false

  constructor(audioElement?: HTMLAudioElement) {
    this.audioElement = audioElement || this.createAudioElement()
    this.setupEventListeners()
  }

  private createAudioElement(): HTMLAudioElement {
    return new Audio()
  }

  private setupEventListeners(): void {
    if (!this.audioElement) return

    this.audioElement.addEventListener('ended', () => {
      this.playNext()
    })

    this.audioElement.addEventListener('error', (error) => {
      console.error('Audio playback error:', error)
      this.isPlaying = false
      this.playNext()
    })
  }

  /**
   * Add audio item to queue
   */
  add(item: AudioQueueItem): void {
    this.queue.push(item)
  }

  /**
   * Play next item in queue
   */
  playNext(): void {
    // Don't advance if interrupted
    if (this.isInterrupted) {
      this.isPlaying = false
      return
    }

    const nextIndex = this.currentIndex + 1

    if (nextIndex < this.queue.length && this.audioElement) {
      const nextAudio = this.queue[nextIndex]
      this.audioElement.src = nextAudio.url
      this.currentIndex = nextIndex
      this.isPlaying = true

      this.audioElement.play().catch(error => {
        console.error('Error playing audio chunk:', error)
        this.isPlaying = false
        this.playNext()
      })
    } else {
      this.isPlaying = false
    }
  }

  /**
   * Start playback from beginning
   */
  play(): void {
    if (this.queue.length > 0) {
      this.isInterrupted = false
      this.currentIndex = -1
      this.playNext()
    }
  }

  /**
   * Pause playback
   */
  pause(): void {
    if (this.audioElement) {
      this.audioElement.pause()
    }
    this.isPlaying = false
  }

  /**
   * Stop playback and reset
   */
  stop(): void {
    if (this.audioElement) {
      this.audioElement.pause()
      this.audioElement.currentTime = 0
    }
    this.isPlaying = false
    this.currentIndex = -1
    this.isInterrupted = false
  }

  /**
   * Set interrupted state (user started speaking)
   */
  setInterrupted(interrupted: boolean): void {
    this.isInterrupted = interrupted
    if (interrupted && this.isPlaying) {
      this.pause()
    }
  }

  /**
   * Clear queue and free memory
   */
  clear(): void {
    this.stop()
    this.queue.forEach(item => {
      if (item.url.startsWith('blob:')) {
        DataUtils.revokeAudioBlobUrl(item.url)
      }
    })
    this.queue = []
  }

  /**
   * Discard queued items (called when interruption ends)
   */
  discardQueue(): void {
    // Keep currently playing item, discard rest
    const currentItem = this.currentIndex >= 0 ? this.queue[this.currentIndex] : null

    // Free memory for discarded items
    this.queue.forEach((item, index) => {
      if (index !== this.currentIndex && item.url.startsWith('blob:')) {
        DataUtils.revokeAudioBlobUrl(item.url)
      }
    })

    this.queue = currentItem ? [currentItem] : []
    this.currentIndex = currentItem ? 0 : -1
  }

  /**
   * Get current playback status
   */
  getStatus(): AudioQueueStatus {
    return {
      isPlaying: this.isPlaying,
      currentIndex: this.currentIndex,
      totalItems: this.queue.length,
      currentItem: this.currentIndex >= 0 ? this.queue[this.currentIndex] : null
    }
  }

  /**
   * Check if interrupted
   */
  isCurrentlyInterrupted(): boolean {
    return this.isInterrupted
  }
}

AudioProcessor - Main Playback Class

Combines all utilities for complete audio playback:

export class AudioProcessor {
  private audioQueue: AudioQueue

  constructor(audioElement?: HTMLAudioElement) {
    this.audioQueue = new AudioQueue(audioElement)
  }

  /**
   * Process incoming audio chunk from Autessa
   */
  async processAudioChunk(
    base64Audio: string,
    sampleRate: number,
    transcription?: string
  ): Promise<AudioQueueItem | null> {
    try {
      const pcmBytes = DataUtils.base64ToUint8Array(base64Audio)
      const wavData = WavUtils.createWavFromPCM(pcmBytes, sampleRate)
      const url = DataUtils.createAudioBlobUrl(wavData)

      const audioItem: AudioQueueItem = {
        url,
        transcription: transcription || '',
        timestamp: Date.now(),
        sampleRate
      }

      this.audioQueue.add(audioItem)
      return audioItem
    } catch (error) {
      console.error('Error processing audio chunk:', error)
      return null
    }
  }

  /**
   * Start playback
   */
  play(): void {
    this.audioQueue.play()
  }

  /**
   * Pause playback
   */
  pause(): void {
    this.audioQueue.pause()
  }

  /**
   * Stop playback
   */
  stop(): void {
    this.audioQueue.stop()
  }

  /**
   * Set interrupted state (user is speaking)
   */
  setInterrupted(interrupted: boolean): void {
    this.audioQueue.setInterrupted(interrupted)

    // When interruption ends, discard queued audio
    if (!interrupted) {
      this.audioQueue.discardQueue()
    }
  }

  /**
   * Clear all audio and free memory
   */
  clear(): void {
    this.audioQueue.clear()
  }

  /**
   * Get playback status
   */
  getStatus(): AudioQueueStatus {
    return this.audioQueue.getStatus()
  }

  /**
   * Check if interrupted
   */
  isInterrupted(): boolean {
    return this.audioQueue.isCurrentlyInterrupted()
  }
}

Usage Examples

Real-time Voice Mode Playback

For WebSocket voice mode, process incoming audio chunks and handle interruptions:

// Initialize audio processor
const audioProcessor = new AudioProcessor()

// Set up WebSocket
const agentId = 123
const apiKey = 'your_api_key'
const wsUrl = `wss://api.autessa.com/ws/clients/agents/execute?authorization=${apiKey}&resourceId=${agentId}&voiceModeEnabled=true`
const websocket = new WebSocket(wsUrl)

// Track user speech for interruption handling
let userIsSpeaking = false

websocket.onmessage = async (event) => {
  try {
    const response = JSON.parse(event.data)

    if (response.streamedOutput?.outputType === 'AUDIO') {
      const { base64Audio, sampleRate, transcription, isFinalChunk } = response.streamedOutput

      // Process audio chunk
      const audioItem = await audioProcessor.processAudioChunk(
        base64Audio,
        sampleRate,
        transcription
      )

      if (audioItem) {
        console.log('Processed:', transcription)

        // Start playing if this is the first chunk and user isn't speaking
        const status = audioProcessor.getStatus()
        if (status.totalItems === 1 && !status.isPlaying && !userIsSpeaking) {
          audioProcessor.play()
        }

        if (isFinalChunk) {
          console.log('Audio stream complete')
        }
      }
    }
  } catch (error) {
    console.error('Error processing audio:', error)
  }
}

// Handle user speech detection (from AudioRecorder)
recorder.onSpeechStart = () => {
  userIsSpeaking = true
  audioProcessor.setInterrupted(true)
  console.log('User started speaking - pausing agent audio')
}

recorder.onSpeechEnd = () => {
  userIsSpeaking = false
  audioProcessor.setInterrupted(false)
  console.log('User stopped speaking - discarding queued agent audio')
}

// Cleanup when done
websocket.onclose = () => {
  audioProcessor.clear()
  console.log('Audio processor cleaned up')
}

Standard Audio Response Playback

For HTTP responses with audio:

const audioProcessor = new AudioProcessor()

// Execute agent with audio output
// (agentId and apiKey as defined in the voice mode example)
const response = await fetch(
  `https://api.autessa.com/clients/agents/execute?resourceId=${agentId}`,
  {
    method: 'POST',
    headers: {
      'Authorization': apiKey,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      agentId: agentId,
      input: [{ inputType: 'TEXT', content: 'Hello' }],
      executionOutputMode: 'AUDIO'
    })
  }
)

const result = await response.json()

// Process audio chunks from response
if (result.streamedOutput?.outputType === 'AUDIO') {
  await audioProcessor.processAudioChunk(
    result.streamedOutput.base64Audio,
    result.streamedOutput.sampleRate,
    result.streamedOutput.transcription
  )

  audioProcessor.play()
}

// Cleanup once playback has finished.
// Calling clear() right after play() would stop the audio immediately.
const cleanupTimer = setInterval(() => {
  if (!audioProcessor.getStatus().isPlaying) {
    clearInterval(cleanupTimer)
    audioProcessor.clear()
  }
}, 100)

Processing Multiple Chunks

When receiving multiple chunks in sequence:

async function handleAudioStream(chunks: any[]): Promise<void> {
  const audioProcessor = new AudioProcessor()

  for (const chunk of chunks) {
    if (chunk.streamedOutput?.outputType === 'AUDIO') {
      await audioProcessor.processAudioChunk(
        chunk.streamedOutput.base64Audio,
        chunk.streamedOutput.sampleRate,
        chunk.streamedOutput.transcription
      )
    }
  }

  // Start playback after all chunks are queued
  audioProcessor.play()

  // Wait for playback to complete by polling the queue status
  return new Promise<void>((resolve) => {
    const checkStatus = setInterval(() => {
      const status = audioProcessor.getStatus()
      if (!status.isPlaying && status.currentIndex >= status.totalItems - 1) {
        clearInterval(checkStatus)
        audioProcessor.clear()
        resolve()
      }
    }, 100)
  })
}

Interruption Handling

For conversational AI, handle user interruptions properly:

When User Starts Speaking

// Pause agent audio immediately
audioProcessor.setInterrupted(true)

// Update UI to show user is speaking
updateUI({ userSpeaking: true, agentSpeaking: false })

When User Stops Speaking

// Discard queued agent audio (it's outdated)
audioProcessor.setInterrupted(false)

// Update UI
updateUI({ userSpeaking: false })

Why Discard Queued Audio?

When the user interrupts, any agent audio still in the queue was generated before the interruption and no longer fits the conversation. Calling setInterrupted(true) pauses the current audio. Calling setInterrupted(false) then calls discardQueue(), which:

  • Removes all queued items except the current one
  • Frees memory by revoking the blob URLs of the discarded items
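
The discard step can be illustrated as a pure function, which makes the bookkeeping easier to test without an audio element. This is only a sketch of the same idea, not the AudioQueue implementation: keepOnlyCurrent and QueuedAudio are hypothetical names, and the revoke callback stands in for DataUtils.revokeAudioBlobUrl.

```typescript
// Hypothetical names; a standalone illustration of the discard step.
interface QueuedAudio {
  url: string
}

// Keep only the item at currentIndex and pass every other URL to `revoke`.
function keepOnlyCurrent<T extends QueuedAudio>(
  queue: T[],
  currentIndex: number,
  revoke: (url: string) => void
): { queue: T[]; currentIndex: number } {
  const current =
    currentIndex >= 0 && currentIndex < queue.length ? queue[currentIndex] : undefined

  queue.forEach((item, index) => {
    if (index !== currentIndex) {
      revoke(item.url)
    }
  })

  // The surviving item, if any, becomes index 0 of the new queue.
  return current !== undefined
    ? { queue: [current], currentIndex: 0 }
    : { queue: [], currentIndex: -1 }
}
```

Passing the revoke function as a parameter keeps the logic independent of the browser, so a test can record which URLs would have been released.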

Memory Management

Automatic Cleanup

The AudioProcessor automatically manages memory:

// Blob URLs are created when processing chunks
const audioItem = await audioProcessor.processAudioChunk(base64Audio, sampleRate)
// audioItem.url is a blob URL like "blob:https://example.com/uuid"

// Blob URLs are automatically revoked when clearing
audioProcessor.clear() // Frees all blob URLs

Manual Cleanup

For fine-grained control:

// Clear specific items
DataUtils.revokeAudioBlobUrl(audioItem.url)

// Clear entire queue
audioProcessor.clear()

Best Practices

  1. Call clear() when done: Always clean up after playback completes
  2. Handle interruptions: Use setInterrupted() to manage user speech
  3. Monitor memory: For long sessions, periodically clear old audio
  4. Check playback status: Use getStatus() to know when to clean up

Error Handling

Robust Audio Processing

async function robustAudioProcessing(streamedData: any) {
  try {
    if (streamedData.streamedOutput?.outputType === 'AUDIO') {
      const { base64Audio, sampleRate, transcription, isFinalChunk } = streamedData.streamedOutput

      // Validate required fields
      if (!base64Audio || !sampleRate) {
        throw new Error('Missing required audio data')
      }

      const audioItem = await audioProcessor.processAudioChunk(
        base64Audio,
        sampleRate,
        transcription
      )

      if (!audioItem) {
        throw new Error('Failed to process audio chunk')
      }

      // Auto-play first chunk
      const status = audioProcessor.getStatus()
      if (status.totalItems === 1 && !status.isPlaying) {
        audioProcessor.play()
      }

      if (isFinalChunk) {
        console.log('Audio stream complete')
      }

      return audioItem
    }
  } catch (error) {
    console.error('Error processing audio:', error)

    // Cleanup on error
    audioProcessor.clear()

    return null
  }
}
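
The presence check above can be extended to catch payloads that are present but not playable. The sketch below is one possible stricter check under stated assumptions: validateAudioPayload is a hypothetical name, and the function assumes standard padded base64 (alphabet A-Z, a-z, 0-9, +, /) and 16-bit samples, so the decoded byte count must be even.

```typescript
// Hypothetical helper; assumes standard padded base64 and Int16 samples.
// Returns null when the payload looks playable, otherwise a reason string.
function validateAudioPayload(base64Audio: unknown, sampleRate: unknown): string | null {
  if (typeof base64Audio !== 'string' || base64Audio.length === 0) {
    return 'base64Audio is missing or empty'
  }
  if (typeof sampleRate !== 'number' || !Number.isFinite(sampleRate) || sampleRate <= 0) {
    return 'sampleRate must be a positive number'
  }
  if (base64Audio.length % 4 !== 0 || !/^[A-Za-z0-9+\/]*={0,2}$/.test(base64Audio)) {
    return 'base64Audio is not valid padded base64'
  }

  // Every 4 base64 characters encode 3 bytes, minus one byte per '=' of padding.
  const padding = base64Audio.endsWith('==') ? 2 : base64Audio.endsWith('=') ? 1 : 0
  const byteLength = (base64Audio.length / 4) * 3 - padding
  if (byteLength % 2 !== 0) {
    return 'PCM payload has an odd byte count; Int16 samples need 2 bytes each'
  }
  return null
}
```

Returning a reason string instead of throwing lets the caller decide whether to skip a single bad chunk or clear the whole queue.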

Handle Playback Errors

// The AudioQueue automatically handles playback errors
// by logging and attempting to play the next chunk

// You can also monitor the audio element directly
const audioElement = new Audio()
const audioProcessor = new AudioProcessor(audioElement)

audioElement.addEventListener('error', (e) => {
  console.error('Playback error:', e)
  // Implement retry logic or user notification
})

Advanced Usage

Custom Audio Element

Use a custom audio element for more control:

const audioElement = document.getElementById('my-audio') as HTMLAudioElement
const audioProcessor = new AudioProcessor(audioElement)

// Now you have full control over the audio element
audioElement.volume = 0.8
audioElement.playbackRate = 1.2

Monitor Playback Progress

const status = audioProcessor.getStatus()

console.log(`Playing: ${status.isPlaying}`)
console.log(`Current: ${status.currentIndex + 1} of ${status.totalItems}`)
console.log(`Transcription: ${status.currentItem?.transcription}`)

Pause and Resume

// Pause playback
audioProcessor.pause()

// Restart playback from the first queued chunk
// (play() resets the queue position; it does not resume mid-chunk)
audioProcessor.play()

// Stop and reset
audioProcessor.stop()

Performance Optimization

Chunked Processing

For real-time streams, process chunks as they arrive:

websocket.onmessage = async (event) => {
  const response = JSON.parse(event.data)

  // Process immediately without waiting
  if (response.streamedOutput?.outputType === 'AUDIO') {
    audioProcessor.processAudioChunk(
      response.streamedOutput.base64Audio,
      response.streamedOutput.sampleRate,
      response.streamedOutput.transcription
    ).then(audioItem => {
      // processAudioChunk returns null when decoding fails
      if (!audioItem) return

      // Auto-start on first chunk
      const status = audioProcessor.getStatus()
      if (status.totalItems === 1 && !status.isPlaying) {
        audioProcessor.play()
      }
    })
  }
}

Limit Queue Size

For long-running sessions, limit queue size:

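// Hypothetical subclass name; extends the AudioProcessor defined above
class BoundedAudioProcessor extends AudioProcessor {
  private maxQueueSize: number = 50

  async processAudioChunk(
    base64Audio: string,
    sampleRate: number,
    transcription?: string
  ): Promise<AudioQueueItem | null> {
    const item = await super.processAudioChunk(base64Audio, sampleRate, transcription)

    // Trim old items if queue too large
    const status = this.getStatus()
    if (status.totalItems > this.maxQueueSize) {
      // Implementation would remove old items
      // and revoke their blob URLs
    }

    return item
  }
}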
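
The trimming step left open above could look roughly like the following. This is a sketch under assumptions, not the library's implementation: trimPlayed and QueuedAudio are hypothetical names, the function works on a plain array because the queue inside AudioQueue is private, and only items that have already been played (those before currentIndex) are considered safe to drop.

```typescript
// Hypothetical names; a standalone illustration of queue trimming.
interface QueuedAudio {
  url: string
}

// Drop already-played items from the front until the queue fits maxSize.
// The current item and everything after it are never dropped.
function trimPlayed<T extends QueuedAudio>(
  queue: T[],
  currentIndex: number,
  maxSize: number,
  revoke: (url: string) => void
): { queue: T[]; currentIndex: number } {
  const excess = queue.length - maxSize
  const removable = Math.max(0, currentIndex) // items before the current one
  const dropCount = Math.max(0, Math.min(excess, removable))

  // Release the blob URLs of the dropped items.
  queue.slice(0, dropCount).forEach(item => revoke(item.url))

  return {
    queue: queue.slice(dropCount),
    currentIndex: currentIndex - dropCount // shift the index to match the shorter queue
  }
}
```

If playback has not advanced far enough, the queue may stay above maxSize. That trade-off avoids dropping audio the user has not heard yet.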

Next Steps

  • Learn how to record audio in the Audio Recording guide
  • See complete voice mode examples in the Agent API documentation
  • Explore the full Agent API in the Agent API reference