Audio Playback
Learn how to play audio received from Autessa agents in both real-time voice mode and standard audio responses.
Overview
Autessa delivers audio in two contexts:
- Real-time Voice Mode: Receive streaming audio chunks via WebSocket during conversational interactions
- Standard Audio Response: Receive audio in HTTP responses from agent executions
Both use the same audio format (16 kHz, 16-bit mono PCM, base64-encoded) and the same utility classes for playback.
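Because the format is fixed, a chunk's playback duration follows directly from its size: 16-bit mono PCM uses 2 bytes per sample. The sketch below assumes standard padded base64 with no embedded whitespace. The helper names are illustrative and not part of any Autessa SDK.

```typescript
// Hypothetical helpers (not part of any Autessa SDK).
const BYTES_PER_SAMPLE = 2 // Int16 mono: 2 bytes per sample

/** Number of bytes a padded base64 string decodes to, without decoding it. */
function base64DecodedLength(base64: string): number {
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0
  return Math.floor((base64.length * 3) / 4) - padding
}

/** Playback duration in milliseconds for a given number of PCM bytes. */
function pcmDurationMs(byteLength: number, sampleRate: number = 16000): number {
  return (byteLength / BYTES_PER_SAMPLE / sampleRate) * 1000
}
```

For example, 32000 bytes of PCM at 16 kHz correspond to one second of audio.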
Audio Response Format
Audio received from Autessa has the following structure:
{
  "streamedOutput": {
    "sampleRate": 16000,
    "base64Audio": "AAAA...",
    "transcription": "Would you like any tips on running routes or safety?",
    "isFinalChunk": false,
    "outputType": "AUDIO"
  }
}
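In TypeScript, the streamedOutput object shown here could be modelled with an interface and a runtime type guard, so that parsed JSON is narrowed before it reaches the playback code. This is a sketch: the field names come from the payload, while StreamedAudioOutput and isAudioOutput are hypothetical names, and treating transcription as optional is an assumption.

```typescript
// Field names mirror the payload; the type and function names are illustrative.
interface StreamedAudioOutput {
  sampleRate: number
  base64Audio: string
  transcription?: string // assumed optional in this sketch
  isFinalChunk: boolean
  outputType: 'AUDIO'
}

/** Returns true when the value has the shape of an audio streamedOutput. */
function isAudioOutput(value: unknown): value is StreamedAudioOutput {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    v.outputType === 'AUDIO' &&
    typeof v.base64Audio === 'string' &&
    typeof v.sampleRate === 'number' &&
    typeof v.isFinalChunk === 'boolean' &&
    (v.transcription === undefined || typeof v.transcription === 'string')
  )
}
```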
- sampleRate (number): Always 16000 Hz (16 kHz)
- base64Audio (string): Base64-encoded PCM audio data (Int16, mono, little-endian)
- transcription (string): Text transcription of the audio chunk
- isFinalChunk (boolean): Indicates whether this is the last chunk in the stream
- outputType (string): "AUDIO" for audio responses, "TEXT" for text-only
Utility Classes
WavUtils - WAV File Creation
Converts PCM data to playable WAV format:
export class WavUtils {
private static writeString(view: DataView, offset: number, str: string): void {
for (let i = 0; i < str.length; i++) {
view.setUint8(offset + i, str.charCodeAt(i))
}
}
/**
* Create WAV file from PCM data
*/
static createWavFromPCM(pcmData: Uint8Array, sampleRate: number): Uint8Array {
const numChannels = 1 // Mono
const bitsPerSample = 16
const byteRate = sampleRate * numChannels * bitsPerSample / 8
const blockAlign = numChannels * bitsPerSample / 8
const dataSize = pcmData.length
const fileSize = 44 + dataSize // WAV header is 44 bytes
const wavBuffer = new ArrayBuffer(fileSize)
const view = new DataView(wavBuffer)
// RIFF chunk descriptor
this.writeString(view, 0, 'RIFF')
view.setUint32(4, fileSize - 8, true)
this.writeString(view, 8, 'WAVE')
// fmt sub-chunk
this.writeString(view, 12, 'fmt ')
view.setUint32(16, 16, true)
view.setUint16(20, 1, true)
view.setUint16(22, numChannels, true)
view.setUint32(24, sampleRate, true)
view.setUint32(28, byteRate, true)
view.setUint16(32, blockAlign, true)
view.setUint16(34, bitsPerSample, true)
// data sub-chunk
this.writeString(view, 36, 'data')
view.setUint32(40, dataSize, true)
// Copy PCM data
const wavData = new Uint8Array(wavBuffer)
wavData.set(pcmData, 44)
return wavData
}
/**
* Create WAV from Float32Array
*/
static createWavFromFloat32(float32Array: Float32Array, sampleRate: number): Uint8Array {
// Convert Float32 to Int16
const int16Array = new Int16Array(float32Array.length)
for (let i = 0; i < float32Array.length; i++) {
const s = Math.max(-1, Math.min(1, float32Array[i]))
int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF
}
const pcmBytes = new Uint8Array(int16Array.buffer)
return this.createWavFromPCM(pcmBytes, sampleRate)
}
}
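A quick way to check the header that createWavFromPCM writes is to read the same offsets back. The reader below is a sketch that assumes the canonical 44-byte layout used above (a single fmt chunk followed directly by data). WAV files from other sources may contain extra chunks that it does not handle. readWavHeader and WavHeaderInfo are hypothetical names.

```typescript
interface WavHeaderInfo {
  audioFormat: number // 1 = uncompressed PCM
  numChannels: number
  sampleRate: number
  bitsPerSample: number
  dataSize: number // size of the PCM payload in bytes
}

function readAscii(view: DataView, offset: number, length: number): string {
  let text = ''
  for (let i = 0; i < length; i++) {
    text += String.fromCharCode(view.getUint8(offset + i))
  }
  return text
}

/** Reads the fields of a canonical 44-byte WAV header (all little-endian). */
function readWavHeader(wav: Uint8Array): WavHeaderInfo {
  if (wav.byteLength < 44) {
    throw new Error('WAV data is shorter than the 44-byte header')
  }
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength)
  if (readAscii(view, 0, 4) !== 'RIFF' || readAscii(view, 8, 4) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file')
  }
  return {
    audioFormat: view.getUint16(20, true),
    numChannels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
    dataSize: view.getUint32(40, true)
  }
}
```

For output produced by createWavFromPCM, dataSize should equal the length of the PCM input and sampleRate should match the value passed in.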
DataUtils - Data Conversion
Handles base64 decoding and blob URL creation:
export class DataUtils {
/**
* Convert base64 string to Uint8Array
*/
static base64ToUint8Array(base64: string): Uint8Array {
if (!base64 || base64.length === 0) {
throw new Error('Invalid base64 string provided')
}
const binaryString = atob(base64)
const bytes = new Uint8Array(binaryString.length)
for (let i = 0; i < binaryString.length; i++) {
bytes[i] = binaryString.charCodeAt(i)
}
return bytes
}
/**
* Create blob URL for audio playback
*/
static createAudioBlobUrl(audioData: Uint8Array, mimeType: string = 'audio/wav'): string {
const blob = new Blob([audioData], { type: mimeType })
return URL.createObjectURL(blob)
}
/**
* Revoke blob URL to free memory
*/
static revokeAudioBlobUrl(url: string): void {
if (url.startsWith('blob:')) {
URL.revokeObjectURL(url)
}
}
/**
* Convert Float32Array to Int16Array for audio
*/
static convertFloat32ToInt16(float32Array: Float32Array): Int16Array {
const int16Array = new Int16Array(float32Array.length)
for (let i = 0; i < float32Array.length; i++) {
const s = Math.max(-1, Math.min(1, float32Array[i]))
int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF
}
return int16Array
}
}
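The conversions above go from Float32 to Int16. The reverse direction can be useful if you would rather hand samples to the Web Audio API than play WAV blobs. The helper below is a sketch of that reverse step for the little-endian Int16 format described earlier. pcm16ToFloat32 is a hypothetical name, not one of the utility classes.

```typescript
/** Converts little-endian Int16 PCM bytes to Float32 samples in [-1, 1). */
function pcm16ToFloat32(pcmBytes: Uint8Array): Float32Array {
  if (pcmBytes.byteLength % 2 !== 0) {
    throw new Error('PCM16 data must contain an even number of bytes')
  }
  // DataView reads are safe even when byteOffset is not 2-byte aligned
  const view = new DataView(pcmBytes.buffer, pcmBytes.byteOffset, pcmBytes.byteLength)
  const samples = new Float32Array(pcmBytes.byteLength / 2)
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 0x8000
  }
  return samples
}
```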
AudioQueue - Queue Management
Manages sequential playback of audio chunks:
export interface AudioQueueItem {
url: string
transcription: string
timestamp: number
sampleRate: number
}
export interface AudioQueueStatus {
isPlaying: boolean
currentIndex: number
totalItems: number
currentItem: AudioQueueItem | null
}
export class AudioQueue {
private queue: AudioQueueItem[] = []
private currentIndex: number = -1
private audioElement: HTMLAudioElement | null = null
private isPlaying: boolean = false
private isInterrupted: boolean = false
constructor(audioElement?: HTMLAudioElement) {
this.audioElement = audioElement || this.createAudioElement()
this.setupEventListeners()
}
private createAudioElement(): HTMLAudioElement {
return new Audio()
}
private setupEventListeners(): void {
if (!this.audioElement) return
this.audioElement.addEventListener('ended', () => {
this.playNext()
})
this.audioElement.addEventListener('error', (error) => {
console.error('Audio playback error:', error)
this.isPlaying = false
this.playNext()
})
}
/**
* Add audio item to queue
*/
add(item: AudioQueueItem): void {
this.queue.push(item)
}
/**
* Play next item in queue
*/
playNext(): void {
// Don't advance if interrupted
if (this.isInterrupted) {
this.isPlaying = false
return
}
const nextIndex = this.currentIndex + 1
if (nextIndex < this.queue.length && this.audioElement) {
const nextAudio = this.queue[nextIndex]
this.audioElement.src = nextAudio.url
this.currentIndex = nextIndex
this.isPlaying = true
this.audioElement.play().catch(error => {
console.error('Error playing audio chunk:', error)
this.isPlaying = false
this.playNext()
})
} else {
this.isPlaying = false
}
}
/**
* Start playback from beginning
*/
play(): void {
if (this.queue.length > 0) {
this.isInterrupted = false
this.currentIndex = -1
this.playNext()
}
}
/**
* Pause playback
*/
pause(): void {
if (this.audioElement) {
this.audioElement.pause()
}
this.isPlaying = false
}
/**
* Stop playback and reset
*/
stop(): void {
if (this.audioElement) {
this.audioElement.pause()
this.audioElement.currentTime = 0
}
this.isPlaying = false
this.currentIndex = -1
this.isInterrupted = false
}
/**
* Set interrupted state (user started speaking)
*/
setInterrupted(interrupted: boolean): void {
this.isInterrupted = interrupted
if (interrupted && this.isPlaying) {
this.pause()
}
}
/**
* Clear queue and free memory
*/
clear(): void {
this.stop()
this.queue.forEach(item => {
if (item.url.startsWith('blob:')) {
DataUtils.revokeAudioBlobUrl(item.url)
}
})
this.queue = []
}
/**
* Discard queued items (called when interruption ends)
*/
discardQueue(): void {
// Keep currently playing item, discard rest
const currentItem = this.currentIndex >= 0 ? this.queue[this.currentIndex] : null
// Free memory for discarded items
this.queue.forEach((item, index) => {
if (index !== this.currentIndex && item.url.startsWith('blob:')) {
DataUtils.revokeAudioBlobUrl(item.url)
}
})
this.queue = currentItem ? [currentItem] : []
this.currentIndex = currentItem ? 0 : -1
}
/**
* Get current playback status
*/
getStatus(): AudioQueueStatus {
return {
isPlaying: this.isPlaying,
currentIndex: this.currentIndex,
totalItems: this.queue.length,
currentItem: this.currentIndex >= 0 ? this.queue[this.currentIndex] : null
}
}
/**
* Check if interrupted
*/
isCurrentlyInterrupted(): boolean {
return this.isInterrupted
}
}
AudioProcessor - Main Playback Class
Combines all utilities for complete audio playback:
export class AudioProcessor {
private audioQueue: AudioQueue
constructor(audioElement?: HTMLAudioElement) {
this.audioQueue = new AudioQueue(audioElement)
}
/**
* Process incoming audio chunk from Autessa
*/
async processAudioChunk(
base64Audio: string,
sampleRate: number,
transcription?: string
): Promise<AudioQueueItem | null> {
try {
const pcmBytes = DataUtils.base64ToUint8Array(base64Audio)
const wavData = WavUtils.createWavFromPCM(pcmBytes, sampleRate)
const url = DataUtils.createAudioBlobUrl(wavData)
const audioItem: AudioQueueItem = {
url,
transcription: transcription || '',
timestamp: Date.now(),
sampleRate
}
this.audioQueue.add(audioItem)
return audioItem
} catch (error) {
console.error('Error processing audio chunk:', error)
return null
}
}
/**
* Start playback
*/
play(): void {
this.audioQueue.play()
}
/**
* Pause playback
*/
pause(): void {
this.audioQueue.pause()
}
/**
* Stop playback
*/
stop(): void {
this.audioQueue.stop()
}
/**
* Set interrupted state (user is speaking)
*/
setInterrupted(interrupted: boolean): void {
this.audioQueue.setInterrupted(interrupted)
// When interruption ends, discard queued audio
if (!interrupted) {
this.audioQueue.discardQueue()
}
}
/**
* Clear all audio and free memory
*/
clear(): void {
this.audioQueue.clear()
}
/**
* Get playback status
*/
getStatus(): AudioQueueStatus {
return this.audioQueue.getStatus()
}
/**
* Check if interrupted
*/
isInterrupted(): boolean {
return this.audioQueue.isCurrentlyInterrupted()
}
}
Usage Examples
Real-time Voice Mode Playback
For WebSocket voice mode, process incoming audio chunks and handle interruptions:
// Initialize audio processor
const audioProcessor = new AudioProcessor()
// Set up WebSocket
const agentId = 123
const apiKey = 'your_api_key'
const wsUrl = `wss://api.autessa.com/ws/clients/agents/execute?authorization=${apiKey}&resourceId=${agentId}&voiceModeEnabled=true`
const websocket = new WebSocket(wsUrl)
// Track user speech for interruption handling
let userIsSpeaking = false
websocket.onmessage = async (event) => {
try {
const response = JSON.parse(event.data)
if (response.streamedOutput?.outputType === 'AUDIO') {
const { base64Audio, sampleRate, transcription, isFinalChunk } = response.streamedOutput
// Process audio chunk
const audioItem = await audioProcessor.processAudioChunk(
base64Audio,
sampleRate,
transcription
)
if (audioItem) {
console.log('Processed:', transcription)
// Start playing if this is the first chunk and user isn't speaking
const status = audioProcessor.getStatus()
if (status.totalItems === 1 && !status.isPlaying && !userIsSpeaking) {
audioProcessor.play()
}
if (isFinalChunk) {
console.log('Audio stream complete')
}
}
}
} catch (error) {
console.error('Error processing audio:', error)
}
}
// Handle user speech detection (recorder is assumed to be an AudioRecorder instance created elsewhere)
recorder.onSpeechStart = () => {
userIsSpeaking = true
audioProcessor.setInterrupted(true)
console.log('User started speaking - pausing agent audio')
}
recorder.onSpeechEnd = () => {
userIsSpeaking = false
audioProcessor.setInterrupted(false)
console.log('User stopped speaking - discarding queued agent audio')
}
// Cleanup when done
websocket.onclose = () => {
audioProcessor.clear()
console.log('Audio processor cleaned up')
}
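The URL in the example interpolates the API key straight into the query string. If a key can contain characters such as + or /, it would need encoding first. A minimal sketch using the standard URLSearchParams class, with the endpoint and parameter names copied from the example. buildVoiceModeUrl is a hypothetical helper, and whether the server expects form-style encoding is an assumption.

```typescript
/** Builds the voice mode WebSocket URL with properly encoded query parameters. */
function buildVoiceModeUrl(apiKey: string, agentId: number): string {
  const params = new URLSearchParams({
    authorization: apiKey,
    resourceId: String(agentId),
    voiceModeEnabled: 'true'
  })
  return `wss://api.autessa.com/ws/clients/agents/execute?${params.toString()}`
}
```

The result can be passed directly to the WebSocket constructor in place of the template string.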
Standard Audio Response Playback
For HTTP responses with audio:
// Assumes agentId and apiKey are defined as in the voice mode example above
const audioProcessor = new AudioProcessor()
// Execute agent with audio output
const response = await fetch(
`https://api.autessa.com/clients/agents/execute?resourceId=${agentId}`,
{
method: 'POST',
headers: {
'Authorization': apiKey,
'Content-Type': 'application/json'
},
body: JSON.stringify({
agentId: agentId,
input: [{ inputType: 'TEXT', content: 'Hello' }],
executionOutputMode: 'AUDIO'
})
}
)
const result = await response.json()
// Process audio chunks from response
if (result.streamedOutput?.outputType === 'AUDIO') {
await audioProcessor.processAudioChunk(
result.streamedOutput.base64Audio,
result.streamedOutput.sampleRate,
result.streamedOutput.transcription
)
audioProcessor.play()
}
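// Clean up only after playback has finished; calling clear() right after
// play() would stop the audio and revoke its blob URL immediately
const cleanupTimer = setInterval(() => {
  if (!audioProcessor.getStatus().isPlaying) {
    clearInterval(cleanupTimer)
    audioProcessor.clear()
  }
}, 100)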
Processing Multiple Chunks
When receiving multiple chunks in sequence:
async function handleAudioStream(chunks: any[]) {
const audioProcessor = new AudioProcessor()
for (const chunk of chunks) {
if (chunk.streamedOutput?.outputType === 'AUDIO') {
await audioProcessor.processAudioChunk(
chunk.streamedOutput.base64Audio,
chunk.streamedOutput.sampleRate,
chunk.streamedOutput.transcription
)
}
}
// Start playback after all chunks are queued
audioProcessor.play()
// Wait for playback to complete
return new Promise((resolve) => {
const checkStatus = setInterval(() => {
const status = audioProcessor.getStatus()
if (!status.isPlaying && status.currentIndex >= status.totalItems - 1) {
clearInterval(checkStatus)
audioProcessor.clear()
resolve(undefined)
}
}, 100)
})
}
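When every chunk is available before playback starts, an alternative is to merge the decoded PCM into a single buffer and build one WAV from it, since switching the source of an audio element between chunks may introduce small gaps. The sketch below assumes all chunks share the same sample rate. concatPcmChunks is a hypothetical helper.

```typescript
/** Concatenates decoded PCM chunks into one contiguous byte array. */
function concatPcmChunks(chunks: Uint8Array[]): Uint8Array {
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0)
  const merged = new Uint8Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    merged.set(chunk, offset)
    offset += chunk.byteLength
  }
  return merged
}
```

The merged bytes could then be passed to WavUtils.createWavFromPCM and DataUtils.createAudioBlobUrl to produce a single playable URL.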
Interruption Handling
For conversational AI, handle user interruptions properly:
When User Starts Speaking
// Pause agent audio immediately
audioProcessor.setInterrupted(true)
// Update UI to show user is speaking
updateUI({ userSpeaking: true, agentSpeaking: false })
When User Stops Speaking
// Discard queued agent audio (it's outdated)
audioProcessor.setInterrupted(false)
// Update UI
updateUI({ userSpeaking: false })
Why Discard Queued Audio?
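When the user interrupts, the queued audio becomes contextually irrelevant. setInterrupted(true) pauses the current audio. When the interruption ends, AudioProcessor.setInterrupted(false) calls discardQueue() to:
- Remove all queued items except the current one
- Free memory by revoking the blob URLs of the removed items
Playback does not resume automatically afterwards: the paused item stays in the queue until you call play() or clear(). Because the examples in this guide auto-start only when the queue holds exactly one item, consider calling clear() before the next response arrives if you do not intend to replay the paused item.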
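The bookkeeping that discardQueue() performs can be illustrated as a pure function over a plain array, which makes it easy to test outside a browser. This is a sketch of the same logic, not the class's implementation. discardExceptCurrent and the revoke callback are hypothetical stand-ins for the private queue and DataUtils.revokeAudioBlobUrl.

```typescript
interface QueuedUrlItem {
  url: string
}

/** Keeps only the current item, calling revoke() for every discarded URL. */
function discardExceptCurrent<T extends QueuedUrlItem>(
  queue: T[],
  currentIndex: number,
  revoke: (url: string) => void
): { queue: T[]; currentIndex: number } {
  const current =
    currentIndex >= 0 && currentIndex < queue.length ? queue[currentIndex] : null
  queue.forEach((item, index) => {
    if (index !== currentIndex) revoke(item.url)
  })
  return current
    ? { queue: [current], currentIndex: 0 }
    : { queue: [], currentIndex: -1 }
}
```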
Memory Management
Automatic Cleanup
The AudioProcessor automatically manages memory:
// Blob URLs are created when processing chunks
const audioItem = await audioProcessor.processAudioChunk(base64Audio, sampleRate)
// audioItem.url is a blob URL like "blob:https://example.com/uuid"
// Blob URLs are automatically revoked when clearing
audioProcessor.clear() // Frees all blob URLs
Manual Cleanup
For fine-grained control:
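// Revoke a single item's blob URL (only once it has finished playing;
// the queue still holds the item and cannot replay a revoked URL)
DataUtils.revokeAudioBlobUrl(audioItem.url)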
// Clear entire queue
audioProcessor.clear()
Best Practices
- Call clear() when done: Always clean up after playback completes
- Handle interruptions: Use setInterrupted() to manage user speech
- Monitor memory: For long sessions, periodically clear old audio
- Check playback status: Use getStatus() to know when to clean up
Error Handling
Robust Audio Processing
async function robustAudioProcessing(streamedData: any) {
try {
if (streamedData.streamedOutput?.outputType === 'AUDIO') {
const { base64Audio, sampleRate, transcription, isFinalChunk } = streamedData.streamedOutput
// Validate required fields
if (!base64Audio || !sampleRate) {
throw new Error('Missing required audio data')
}
const audioItem = await audioProcessor.processAudioChunk(
base64Audio,
sampleRate,
transcription
)
if (!audioItem) {
throw new Error('Failed to process audio chunk')
}
// Auto-play first chunk
const status = audioProcessor.getStatus()
if (status.totalItems === 1 && !status.isPlaying) {
audioProcessor.play()
}
if (isFinalChunk) {
console.log('Audio stream complete')
}
return audioItem
}
} catch (error) {
console.error('Error processing audio:', error)
// Cleanup on error
audioProcessor.clear()
return null
}
}
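The field check above only tests for presence. A slightly stricter validation step could also confirm that the base64 string is well formed and decodes to a whole number of Int16 samples. The sketch below assumes the standard base64 alphabet with padding (not the URL-safe variant). validateAudioFields is a hypothetical helper that returns a list of problems instead of throwing.

```typescript
/** Returns a list of problems found in the audio fields (empty when valid). */
function validateAudioFields(base64Audio: unknown, sampleRate: unknown): string[] {
  const problems: string[] = []
  if (typeof base64Audio !== 'string' || base64Audio.length === 0) {
    problems.push('base64Audio is missing or empty')
  } else if (
    base64Audio.length % 4 !== 0 ||
    !/^[A-Za-z0-9+/]*={0,2}$/.test(base64Audio)
  ) {
    problems.push('base64Audio is not well-formed padded base64')
  } else {
    const padding = base64Audio.endsWith('==') ? 2 : base64Audio.endsWith('=') ? 1 : 0
    const decodedLength = (base64Audio.length / 4) * 3 - padding
    // An odd byte count may indicate a truncated chunk
    if (decodedLength % 2 !== 0) {
      problems.push('decoded length is odd, but Int16 samples need 2 bytes each')
    }
  }
  if (typeof sampleRate !== 'number' || !Number.isFinite(sampleRate) || sampleRate <= 0) {
    problems.push('sampleRate must be a positive number')
  }
  return problems
}
```

Returning a list rather than throwing lets the caller decide whether to skip a single chunk or abort the whole stream.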
Handle Playback Errors
// The AudioQueue automatically handles playback errors
// by logging and attempting to play the next chunk
// You can also monitor the audio element directly
const audioElement = new Audio()
const audioProcessor = new AudioProcessor(audioElement)
audioElement.addEventListener('error', (e) => {
console.error('Playback error:', e)
// Implement retry logic or user notification
})
Advanced Usage
Custom Audio Element
Use a custom audio element for more control:
const audioElement = document.getElementById('my-audio') as HTMLAudioElement
const audioProcessor = new AudioProcessor(audioElement)
// Now you have full control over the audio element
audioElement.volume = 0.8
audioElement.playbackRate = 1.2
Monitor Playback Progress
const status = audioProcessor.getStatus()
console.log(`Playing: ${status.isPlaying}`)
console.log(`Current: ${status.currentIndex + 1} of ${status.totalItems}`)
console.log(`Transcription: ${status.currentItem?.transcription}`)
Pause and Resume
// Pause playback
audioProcessor.pause()
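// Start playback again. Note that play() restarts from the first queued item
// rather than resuming where pause() left off
audioProcessor.play()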
// Stop and reset
audioProcessor.stop()
Performance Optimization
Chunked Processing
For real-time streams, process chunks as they arrive:
websocket.onmessage = async (event) => {
const response = JSON.parse(event.data)
// Process immediately without waiting
if (response.streamedOutput?.outputType === 'AUDIO') {
audioProcessor.processAudioChunk(
response.streamedOutput.base64Audio,
response.streamedOutput.sampleRate,
response.streamedOutput.transcription
).then(audioItem => {
// Skip chunks that failed to decode
if (!audioItem) return
// Auto-start on first chunk
const status = audioProcessor.getStatus()
if (status.totalItems === 1 && !status.isPlaying) {
audioProcessor.play()
}
})
}
}
Limit Queue Size
For long-running sessions, limit queue size:
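// Hypothetical subclass name; AudioProcessor and AudioQueueItem are the types defined above
class BoundedAudioProcessor extends AudioProcessor {
  private maxQueueSize: number = 50
  async processAudioChunk(
    base64Audio: string,
    sampleRate: number,
    transcription?: string
  ): Promise<AudioQueueItem | null> {
    const item = await super.processAudioChunk(base64Audio, sampleRate, transcription)
    // Trim old items if queue too large
    const status = this.getStatus()
    if (status.totalItems > this.maxQueueSize) {
      // Implementation would remove old items
      // and revoke their blob URLs
    }
    return item
  }
}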
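The trimming step left as a comment above could look roughly like the following. Since the queue array is private to AudioQueue, this is written as a standalone function over a plain array. In practice the logic would need to live inside AudioQueue or be exposed through a new method. trimPlayedItems and the revoke callback are hypothetical names, and the sketch only removes items that have already finished playing.

```typescript
interface TrimmableItem {
  url: string
}

/** Drops already-played items from the front until the queue fits maxQueueSize. */
function trimPlayedItems<T extends TrimmableItem>(
  queue: T[],
  currentIndex: number,
  maxQueueSize: number,
  revoke: (url: string) => void
): { queue: T[]; currentIndex: number } {
  const excess = queue.length - maxQueueSize
  // Only items strictly before the current one have finished playing
  const removable = Math.max(0, Math.min(excess, currentIndex))
  queue.slice(0, removable).forEach((item) => revoke(item.url))
  return {
    queue: queue.slice(removable),
    currentIndex: currentIndex - removable
  }
}
```

Shifting currentIndex by the number of removed items keeps it pointing at the same audio item after the trim.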
Next Steps
- Learn how to record audio in the Audio Recording guide
- See complete voice mode examples in the Agent API documentation
- Explore the full Agent API in the Agent API reference