Audio Playback

Learn how to play audio received from Autessa agents in both real-time voice mode and standard audio responses.

Overview

Autessa delivers audio in two contexts:

Real-time Voice Mode: Receive streaming audio chunks via WebSocket during conversational interactions
Standard Audio Response: Receive audio in HTTP responses from agent executions

Both use the same audio format (16 kHz, base64-encoded 16-bit mono PCM) and the same utility classes for playback.

Audio Response Format

Audio received from Autessa has the following structure:

{
  "streamedOutput": {
    "sampleRate": 16000,
    "base64Audio": "AAAA...",
    "transcription": "Would you like any tips on running routes or safety?",
    "isFinalChunk": false,
    "outputType": "AUDIO"
  }
}

sampleRate (number): Always 16000 Hz (16 kHz)
base64Audio (string): Base64-encoded PCM audio data (Int16, mono, little-endian)
transcription (string): Text transcription of the audio chunk
isFinalChunk (boolean): Indicates if this is the last chunk in the stream
outputType (string): "AUDIO" for audio responses, "TEXT" for text-only

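Because the format is fixed (2 bytes per sample, one channel), the playback length of a chunk can be estimated from the base64 string alone, without decoding it. The following is a minimal sketch; the function names are illustrative and not part of the Autessa utilities, and it assumes standard padded base64 whose length is a multiple of 4.

```typescript
// Number of bytes a padded base64 string decodes to.
function decodedByteLength(base64: string): number {
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0
  return (base64.length / 4) * 3 - padding
}

// Approximate duration of one chunk in milliseconds,
// assuming Int16 mono PCM as described in the field list above.
function chunkDurationMs(base64Audio: string, sampleRate: number): number {
  const bytesPerSample = 2 // Int16
  const sampleCount = Math.floor(decodedByteLength(base64Audio) / bytesPerSample)
  return (sampleCount / sampleRate) * 1000
}
```

At 16 kHz, one second of audio is 32,000 bytes of PCM, which is useful for sizing buffers or showing a progress indicator.
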
Utility Classes

WavUtils - WAV File Creation

Converts PCM data to playable WAV format:

export class WavUtils {
  private static writeString(view: DataView, offset: number, str: string): void {
    for (let i = 0; i < str.length; i++) {
      view.setUint8(offset + i, str.charCodeAt(i))
    }
  }

  /**
   * Create WAV file from PCM data
   */
  static createWavFromPCM(pcmData: Uint8Array, sampleRate: number): Uint8Array {
    const numChannels = 1 // Mono
    const bitsPerSample = 16
    const byteRate = sampleRate * numChannels * bitsPerSample / 8
    const blockAlign = numChannels * bitsPerSample / 8
    const dataSize = pcmData.length
    const fileSize = 44 + dataSize // WAV header is 44 bytes

    const wavBuffer = new ArrayBuffer(fileSize)
    const view = new DataView(wavBuffer)

    // RIFF chunk descriptor
    this.writeString(view, 0, 'RIFF')
    view.setUint32(4, fileSize - 8, true)
    this.writeString(view, 8, 'WAVE')

    // fmt sub-chunk
    this.writeString(view, 12, 'fmt ')
    view.setUint32(16, 16, true)
    view.setUint16(20, 1, true)
    view.setUint16(22, numChannels, true)
    view.setUint32(24, sampleRate, true)
    view.setUint32(28, byteRate, true)
    view.setUint16(32, blockAlign, true)
    view.setUint16(34, bitsPerSample, true)

    // data sub-chunk
    this.writeString(view, 36, 'data')
    view.setUint32(40, dataSize, true)

    // Copy PCM data
    const wavData = new Uint8Array(wavBuffer)
    wavData.set(pcmData, 44)

    return wavData
  }

  /**
   * Create WAV from Float32Array
   */
  static createWavFromFloat32(float32Array: Float32Array, sampleRate: number): Uint8Array {
    // Convert Float32 to Int16
    const int16Array = new Int16Array(float32Array.length)
    for (let i = 0; i < float32Array.length; i++) {
      const s = Math.max(-1, Math.min(1, float32Array[i]))
      int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF
    }

    const pcmBytes = new Uint8Array(int16Array.buffer)
    return this.createWavFromPCM(pcmBytes, sampleRate)
  }
}
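
To check that a buffer produced this way has the expected layout, the header can be read back at the same byte offsets. This is a sketch only: `parseWavHeader` and `WavHeaderInfo` are hypothetical names, and the function assumes the canonical 44-byte PCM header written above (no extra chunks between `fmt ` and `data`).

```typescript
interface WavHeaderInfo {
  numChannels: number
  sampleRate: number
  bitsPerSample: number
  dataSize: number
}

// Read a 4-character ASCII tag such as 'RIFF' at the given offset.
function readTag(view: DataView, offset: number): string {
  let tag = ''
  for (let i = 0; i < 4; i++) {
    tag += String.fromCharCode(view.getUint8(offset + i))
  }
  return tag
}

// Read back the fields of a canonical 44-byte PCM WAV header.
function parseWavHeader(wav: Uint8Array): WavHeaderInfo {
  if (wav.length < 44) {
    throw new Error('Buffer is shorter than a 44-byte WAV header')
  }
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength)
  if (readTag(view, 0) !== 'RIFF' || readTag(view, 8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file')
  }
  if (readTag(view, 12) !== 'fmt ' || readTag(view, 36) !== 'data') {
    throw new Error('Unexpected chunk layout')
  }
  return {
    numChannels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
    dataSize: view.getUint32(40, true) // PCM payload size in bytes
  }
}
```

All multi-byte fields are little-endian, which is why every read passes `true` as the last argument.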

DataUtils - Data Conversion

Handles base64 decoding and blob URL creation:

export class DataUtils {
  /**
   * Convert base64 string to Uint8Array
   */
  static base64ToUint8Array(base64: string): Uint8Array {
    if (!base64 || base64.length === 0) {
      throw new Error('Invalid base64 string provided')
    }

    const binaryString = atob(base64)
    const bytes = new Uint8Array(binaryString.length)

    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i)
    }

    return bytes
  }

  /**
   * Create blob URL for audio playback
   */
  static createAudioBlobUrl(audioData: Uint8Array, mimeType: string = 'audio/wav'): string {
    const blob = new Blob([audioData], { type: mimeType })
    return URL.createObjectURL(blob)
  }

  /**
   * Revoke blob URL to free memory
   */
  static revokeAudioBlobUrl(url: string): void {
    if (url.startsWith('blob:')) {
      URL.revokeObjectURL(url)
    }
  }

  /**
   * Convert Float32Array to Int16Array for audio
   */
  static convertFloat32ToInt16(float32Array: Float32Array): Int16Array {
    const int16Array = new Int16Array(float32Array.length)
    for (let i = 0; i < float32Array.length; i++) {
      const s = Math.max(-1, Math.min(1, float32Array[i]))
      int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF
    }
    return int16Array
  }
}
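
The reverse direction can be handy when decoded chunks need to be handed to an API that works with floating-point samples, such as a Web Audio buffer. The helper below is a hypothetical addition, not part of DataUtils as listed; it reads little-endian Int16 samples and scales them into the range [-1, 1).

```typescript
// Convert little-endian Int16 PCM bytes to Float32 samples.
// A trailing odd byte, if any, is ignored.
function pcm16ToFloat32(pcmBytes: Uint8Array): Float32Array {
  const view = new DataView(pcmBytes.buffer, pcmBytes.byteOffset, pcmBytes.byteLength)
  const sampleCount = Math.floor(pcmBytes.byteLength / 2)
  const samples = new Float32Array(sampleCount)
  for (let i = 0; i < sampleCount; i++) {
    // true = little-endian, matching the base64Audio format
    samples[i] = view.getInt16(i * 2, true) / 0x8000
  }
  return samples
}
```

Dividing by 0x8000 for every sample keeps the code simple; it is not an exact inverse of convertFloat32ToInt16, which scales positive values by 0x7FFF.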

AudioQueue - Queue Management

Manages sequential playback of audio chunks:

export interface AudioQueueItem {
  url: string
  transcription: string
  timestamp: number
  sampleRate: number
}

export interface AudioQueueStatus {
  isPlaying: boolean
  currentIndex: number
  totalItems: number
  currentItem: AudioQueueItem | null
}

export class AudioQueue {
  private queue: AudioQueueItem[] = []
  private currentIndex: number = -1
  private audioElement: HTMLAudioElement | null = null
  private isPlaying: boolean = false
  private isInterrupted: boolean = false

  constructor(audioElement?: HTMLAudioElement) {
    this.audioElement = audioElement || this.createAudioElement()
    this.setupEventListeners()
  }

  private createAudioElement(): HTMLAudioElement {
    return new Audio()
  }

  private setupEventListeners(): void {
    if (!this.audioElement) return

    this.audioElement.addEventListener('ended', () => {
      this.playNext()
    })

    this.audioElement.addEventListener('error', (error) => {
      console.error('Audio playback error:', error)
      this.isPlaying = false
      this.playNext()
    })
  }

  /**
   * Add audio item to queue
   */
  add(item: AudioQueueItem): void {
    this.queue.push(item)
  }

  /**
   * Play next item in queue
   */
  playNext(): void {
    // Don't advance if interrupted
    if (this.isInterrupted) {
      this.isPlaying = false
      return
    }

    const nextIndex = this.currentIndex + 1

    if (nextIndex < this.queue.length && this.audioElement) {
      const nextAudio = this.queue[nextIndex]
      this.audioElement.src = nextAudio.url
      this.currentIndex = nextIndex
      this.isPlaying = true

      this.audioElement.play().catch(error => {
        console.error('Error playing audio chunk:', error)
        this.isPlaying = false
        this.playNext()
      })
    } else {
      this.isPlaying = false
    }
  }

  /**
   * Start playback from beginning
   */
  play(): void {
    if (this.queue.length > 0) {
      this.isInterrupted = false
      this.currentIndex = -1
      this.playNext()
    }
  }

  /**
   * Pause playback
   */
  pause(): void {
    if (this.audioElement) {
      this.audioElement.pause()
    }
    this.isPlaying = false
  }

  /**
   * Stop playback and reset
   */
  stop(): void {
    if (this.audioElement) {
      this.audioElement.pause()
      this.audioElement.currentTime = 0
    }
    this.isPlaying = false
    this.currentIndex = -1
    this.isInterrupted = false
  }

  /**
   * Set interrupted state (user started speaking)
   */
  setInterrupted(interrupted: boolean): void {
    this.isInterrupted = interrupted
    if (interrupted && this.isPlaying) {
      this.pause()
    }
  }

  /**
   * Clear queue and free memory
   */
  clear(): void {
    this.stop()
    this.queue.forEach(item => {
      if (item.url.startsWith('blob:')) {
        DataUtils.revokeAudioBlobUrl(item.url)
      }
    })
    this.queue = []
  }

  /**
   * Discard queued items (called when interruption ends)
   */
  discardQueue(): void {
    // Keep currently playing item, discard rest
    const currentItem = this.currentIndex >= 0 ? this.queue[this.currentIndex] : null

    // Free memory for discarded items
    this.queue.forEach((item, index) => {
      if (index !== this.currentIndex && item.url.startsWith('blob:')) {
        DataUtils.revokeAudioBlobUrl(item.url)
      }
    })

    this.queue = currentItem ? [currentItem] : []
    this.currentIndex = currentItem ? 0 : -1
  }

  /**
   * Get current playback status
   */
  getStatus(): AudioQueueStatus {
    return {
      isPlaying: this.isPlaying,
      currentIndex: this.currentIndex,
      totalItems: this.queue.length,
      currentItem: this.currentIndex >= 0 ? this.queue[this.currentIndex] : null
    }
  }

  /**
   * Check if interrupted
   */
  isCurrentlyInterrupted(): boolean {
    return this.isInterrupted
  }
}

AudioProcessor - Main Playback Class

Combines all utilities for complete audio playback:

export class AudioProcessor {
  private audioQueue: AudioQueue

  constructor(audioElement?: HTMLAudioElement) {
    this.audioQueue = new AudioQueue(audioElement)
  }

  /**
   * Process incoming audio chunk from Autessa
   */
  async processAudioChunk(
    base64Audio: string,
    sampleRate: number,
    transcription?: string
  ): Promise<AudioQueueItem | null> {
    try {
      const pcmBytes = DataUtils.base64ToUint8Array(base64Audio)
      const wavData = WavUtils.createWavFromPCM(pcmBytes, sampleRate)
      const url = DataUtils.createAudioBlobUrl(wavData)

      const audioItem: AudioQueueItem = {
        url,
        transcription: transcription || '',
        timestamp: Date.now(),
        sampleRate
      }

      this.audioQueue.add(audioItem)
      return audioItem

    } catch (error) {
      console.error('Error processing audio chunk:', error)
      return null
    }
  }

  /**
   * Start playback from the first queued item
   */
  play(): void {
    this.audioQueue.play()
  }

  /**
   * Pause playback
   */
  pause(): void {
    this.audioQueue.pause()
  }

  /**
   * Stop playback
   */
  stop(): void {
    this.audioQueue.stop()
  }

  /**
   * Set interrupted state (user is speaking)
   */
  setInterrupted(interrupted: boolean): void {
    this.audioQueue.setInterrupted(interrupted)

    // When interruption ends, discard queued audio
    if (!interrupted) {
      this.audioQueue.discardQueue()
    }
  }

  /**
   * Clear all audio and free memory
   */
  clear(): void {
    this.audioQueue.clear()
  }

  /**
   * Get playback status
   */
  getStatus(): AudioQueueStatus {
    return this.audioQueue.getStatus()
  }

  /**
   * Check if interrupted
   */
  isInterrupted(): boolean {
    return this.audioQueue.isCurrentlyInterrupted()
  }
}

Usage Examples

Real-time Voice Mode Playback

For WebSocket voice mode, process incoming audio chunks and handle interruptions:

// Initialize audio processor
const audioProcessor = new AudioProcessor()

// Set up WebSocket
const agentId = 123
const apiKey = 'your_api_key'
const wsUrl = `wss://api.autessa.com/ws/clients/agents/execute?authorization=${apiKey}&resourceId=${agentId}&voiceModeEnabled=true`
const websocket = new WebSocket(wsUrl)

// Track user speech for interruption handling
let userIsSpeaking = false

websocket.onmessage = async (event) => {
  try {
    const response = JSON.parse(event.data)

    if (response.streamedOutput?.outputType === 'AUDIO') {
      const { base64Audio, sampleRate, transcription, isFinalChunk } = response.streamedOutput

      // Process audio chunk
      const audioItem = await audioProcessor.processAudioChunk(
        base64Audio,
        sampleRate,
        transcription
      )

      if (audioItem) {
        console.log('Processed:', transcription)

        // Start playing if this is the first chunk and user isn't speaking
        const status = audioProcessor.getStatus()
        if (status.totalItems === 1 && !status.isPlaying && !userIsSpeaking) {
          audioProcessor.play()
        }

        if (isFinalChunk) {
          console.log('Audio stream complete')
        }
      }
    }
  } catch (error) {
    console.error('Error processing audio:', error)
  }
}

// Handle user speech detection (from AudioRecorder)
recorder.onSpeechStart = () => {
  userIsSpeaking = true
  audioProcessor.setInterrupted(true)
  console.log('User started speaking - pausing agent audio')
}

recorder.onSpeechEnd = () => {
  userIsSpeaking = false
  audioProcessor.setInterrupted(false)
  console.log('User stopped speaking - discarding queued agent audio')
}

// Cleanup when done
websocket.onclose = () => {
  audioProcessor.clear()
  console.log('Audio processor cleaned up')
}
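
One caveat with the `totalItems === 1` check in the handler above: it is true only for the first chunk ever queued. If the queue drains before the next chunk arrives, or discardQueue() leaves the current item at index 0 after an interruption, later chunks are added but nothing starts them. A more general idle test might look like the sketch below. `hasUnplayedItems` and `QueueStatusLike` are hypothetical names, and acting on the result would need a way to continue from the current index (for example a pass-through to AudioQueue.playNext()), which AudioProcessor as listed does not expose.

```typescript
// Mirrors the AudioQueueStatus fields this check relies on
interface QueueStatusLike {
  isPlaying: boolean
  currentIndex: number
  totalItems: number
}

// True when playback is idle but items after currentIndex have not been played yet.
function hasUnplayedItems(status: QueueStatusLike): boolean {
  return !status.isPlaying && status.currentIndex < status.totalItems - 1
}
```

Calling play() in that situation would also work, but it restarts from the first queued item rather than continuing with the new chunk.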

Standard Audio Response Playback

For HTTP responses with audio:

const audioProcessor = new AudioProcessor()

// Execute agent with audio output
const response = await fetch(
  `https://api.autessa.com/clients/agents/execute?resourceId=${agentId}`,
  {
    method: 'POST',
    headers: {
      'Authorization': apiKey,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      agentId: agentId,
      input: [{ inputType: 'TEXT', content: 'Hello' }],
      executionOutputMode: 'AUDIO'
    })
  }
)

const result = await response.json()

// Process audio chunks from response
if (result.streamedOutput?.outputType === 'AUDIO') {
  await audioProcessor.processAudioChunk(
    result.streamedOutput.base64Audio,
    result.streamedOutput.sampleRate,
    result.streamedOutput.transcription
  )

  audioProcessor.play()
}

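// Clean up once playback has finished. Calling clear() right after play()
// would stop the audio and revoke its blob URL before anything is heard.
const cleanupTimer = setInterval(() => {
  if (!audioProcessor.getStatus().isPlaying) {
    clearInterval(cleanupTimer)
    audioProcessor.clear()
  }
}, 100)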

Processing Multiple Chunks

When receiving multiple chunks in sequence:

async function handleAudioStream(chunks: any[]) {
  const audioProcessor = new AudioProcessor()

  for (const chunk of chunks) {
    if (chunk.streamedOutput?.outputType === 'AUDIO') {
      await audioProcessor.processAudioChunk(
        chunk.streamedOutput.base64Audio,
        chunk.streamedOutput.sampleRate,
        chunk.streamedOutput.transcription
      )
    }
  }

  // Start playback after all chunks are queued
  audioProcessor.play()

  // Wait for playback to complete
  return new Promise<void>((resolve) => {
    const checkStatus = setInterval(() => {
      const status = audioProcessor.getStatus()
      if (!status.isPlaying && status.currentIndex >= status.totalItems - 1) {
        clearInterval(checkStatus)
        audioProcessor.clear()
        resolve()
      }
      }
    }, 100)
  })
}
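
When every chunk is already available, as in the function above, another option is to merge the decoded PCM bytes and build a single WAV, so the audio element never has to switch sources between chunks. The helper below is a sketch of that merge step only; `concatPcmChunks` is a hypothetical name, and it assumes all chunks share the same sample rate and format.

```typescript
// Join several PCM byte arrays into one contiguous buffer, preserving order.
function concatPcmChunks(chunks: Uint8Array[]): Uint8Array {
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0)
  const merged = new Uint8Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    merged.set(chunk, offset)
    offset += chunk.length
  }
  return merged
}
```

The merged bytes could then be passed to WavUtils.createWavFromPCM() once, instead of once per chunk.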

Interruption Handling

For conversational AI, handle user interruptions properly:

When User Starts Speaking

// Pause agent audio immediately
audioProcessor.setInterrupted(true)

// Update UI to show user is speaking
updateUI({ userSpeaking: true, agentSpeaking: false })

When User Stops Speaking

// Discard queued agent audio (it's outdated)
audioProcessor.setInterrupted(false)

// Update UI
updateUI({ userSpeaking: false })

Why Discard Queued Audio?

When the user interrupts, the audio still waiting in the queue answers something the user has already moved past. Calling setInterrupted(true) pauses the current audio. Calling setInterrupted(false) then calls discardQueue(), which:

Removes all queued items except the current one
Frees memory by revoking the blob URLs of the removed items
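
The discard step can be pictured as a pure function over the queue array and the current index. This is a sketch for illustration, not the AudioQueue implementation; `partitionForDiscard` and `DiscardResult` are hypothetical names.

```typescript
interface DiscardResult<T> {
  kept: T[]
  discarded: T[]
  currentIndex: number
}

// Keep only the item at currentIndex and report the rest,
// so the caller can revoke the blob URLs of the discarded items.
function partitionForDiscard<T>(queue: T[], currentIndex: number): DiscardResult<T> {
  const hasCurrent = currentIndex >= 0 && currentIndex < queue.length
  const kept = hasCurrent ? [queue[currentIndex]] : []
  const discarded = queue.filter((_, index) => index !== currentIndex)
  // The kept item, if any, becomes index 0 of the new queue
  return { kept, discarded, currentIndex: hasCurrent ? 0 : -1 }
}
```

Separating the bookkeeping from the URL revocation makes the behavior easy to test without a browser.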

Memory Management

Automatic Cleanup

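The AudioProcessor revokes blob URLs when you call clear() and when discardQueue() runs after an interruption. Items that have already played otherwise stay in memory until then: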

// Blob URLs are created when processing chunks
const audioItem = await audioProcessor.processAudioChunk(base64Audio, sampleRate)
// audioItem.url is a blob URL like "blob:https://example.com/uuid"

// Blob URLs are automatically revoked when clearing
audioProcessor.clear() // Frees all blob URLs

Manual Cleanup

For fine-grained control:

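// Revoke a single item's blob URL. Do this only after the item has played:
// the item itself stays in the queue, and a revoked URL can no longer be loaded.
DataUtils.revokeAudioBlobUrl(audioItem.url)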

// Clear entire queue
audioProcessor.clear()

Best Practices

Call clear() when done: Always clean up after playback completes
Handle interruptions: Use setInterrupted() to manage user speech
Monitor memory: For long sessions, periodically clear old audio
Check playback status: Use getStatus() to know when to clean up

Error Handling

Robust Audio Processing

async function robustAudioProcessing(streamedData: any) {
  try {
    if (streamedData.streamedOutput?.outputType === 'AUDIO') {
      const { base64Audio, sampleRate, transcription, isFinalChunk } = streamedData.streamedOutput

      // Validate required fields
      if (!base64Audio || !sampleRate) {
        throw new Error('Missing required audio data')
      }

      const audioItem = await audioProcessor.processAudioChunk(
        base64Audio,
        sampleRate,
        transcription
      )

      if (!audioItem) {
        throw new Error('Failed to process audio chunk')
      }

      // Auto-play first chunk
      const status = audioProcessor.getStatus()
      if (status.totalItems === 1 && !status.isPlaying) {
        audioProcessor.play()
      }

      if (isFinalChunk) {
        console.log('Audio stream complete')
      }

      return audioItem
    }
  } catch (error) {
    console.error('Error processing audio:', error)

    // Cleanup on error
    audioProcessor.clear()

    return null
  }
}

Handle Playback Errors

// The AudioQueue automatically handles playback errors
// by logging and attempting to play the next chunk

// You can also monitor the audio element directly
const audioElement = new Audio()
const audioProcessor = new AudioProcessor(audioElement)

audioElement.addEventListener('error', (e) => {
  console.error('Playback error:', e)
  // Implement retry logic or user notification
})
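
The `error` event itself carries little detail; the reason is available on the element's `error` property, whose `code` is one of four standard MediaError values. The helper below is a small sketch for turning that code into a log message. `describeMediaErrorCode` is a hypothetical name.

```typescript
// Map a MediaError.code value from an audio element to a readable label.
function describeMediaErrorCode(code: number): string {
  switch (code) {
    case 1:
      return 'MEDIA_ERR_ABORTED: playback was aborted'
    case 2:
      return 'MEDIA_ERR_NETWORK: a network error interrupted loading'
    case 3:
      return 'MEDIA_ERR_DECODE: the audio data could not be decoded'
    case 4:
      return 'MEDIA_ERR_SRC_NOT_SUPPORTED: the source or format is not supported'
    default:
      return `Unknown media error code ${code}`
  }
}
```

Inside the listener, it could be called with `audioElement.error?.code` when that value is defined. A decode error on a chunk usually points at malformed PCM or a wrong WAV header rather than a transient failure, so retrying the same chunk is unlikely to help.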

Advanced Usage

Custom Audio Element

Use a custom audio element for more control:

const audioElement = document.getElementById('my-audio') as HTMLAudioElement
const audioProcessor = new AudioProcessor(audioElement)

// Now you have full control over the audio element
audioElement.volume = 0.8
audioElement.playbackRate = 1.2

Monitor Playback Progress

const status = audioProcessor.getStatus()

console.log(`Playing: ${status.isPlaying}`)
console.log(`Current: ${status.currentIndex + 1} of ${status.totalItems}`)
console.log(`Transcription: ${status.currentItem?.transcription}`)

Pause and Resume

// Pause playback
audioProcessor.pause()

// Start again from the first queued item
// (play() resets the queue position, so it does not resume mid-chunk)
audioProcessor.play()

// Stop and reset
audioProcessor.stop()

Performance Optimization

Chunked Processing

For real-time streams, process chunks as they arrive:

websocket.onmessage = async (event) => {
  const response = JSON.parse(event.data)

  // Process immediately without waiting
  if (response.streamedOutput?.outputType === 'AUDIO') {
    audioProcessor.processAudioChunk(
      response.streamedOutput.base64Audio,
      response.streamedOutput.sampleRate,
      response.streamedOutput.transcription
    ).then(audioItem => {
      // Skip chunks that failed to decode
      if (!audioItem) return

      // Auto-start on first chunk
      const status = audioProcessor.getStatus()
      if (status.totalItems === 1 && !status.isPlaying) {
        audioProcessor.play()
      }
    })
  }
}

Limit Queue Size

For long-running sessions, limit queue size:

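// BoundedAudioProcessor is an illustrative name, not one of the classes above
class BoundedAudioProcessor extends AudioProcessor {
  private maxQueueSize: number = 50

  async processAudioChunk(
    ...args: Parameters<AudioProcessor['processAudioChunk']>
  ): Promise<AudioQueueItem | null> {
    const item = await super.processAudioChunk(...args)

    // Trim old items if queue too large
    const status = this.getStatus()
    if (status.totalItems > this.maxQueueSize) {
      // Implementation would remove old items
      // and revoke their blob URLs.
      // Note: audioQueue is private in AudioProcessor, so this needs
      // a trimming method added to AudioQueue and exposed to subclasses.
    }

    return item
  }
}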
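
The trimming step left open above could be sketched as a pure function: drop items that have already played from the front of the queue until it fits, and shift the current index accordingly. This is one possible approach under stated assumptions, not the library's behavior; `trimPlayedItems` and `TrimResult` are hypothetical names.

```typescript
interface TrimResult<T> {
  queue: T[]
  currentIndex: number
  removed: T[]
}

// Drop already-played items from the front until the queue fits maxSize.
// The current item and everything after it are never removed, so the
// result can still exceed maxSize if too few items have been played.
function trimPlayedItems<T>(queue: T[], currentIndex: number, maxSize: number): TrimResult<T> {
  const overflow = Math.max(0, queue.length - maxSize)
  const playedCount = Math.max(0, currentIndex) // items strictly before the current one
  const removeCount = Math.min(overflow, playedCount)
  return {
    queue: queue.slice(removeCount),
    currentIndex: currentIndex - removeCount,
    removed: queue.slice(0, removeCount)
  }
}
```

The caller would then revoke the blob URL of each entry in `removed`, for example with DataUtils.revokeAudioBlobUrl().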

Next Steps

Learn how to record audio in the Audio Recording guide
See complete voice mode examples in the Agent API documentation
Explore the full Agent API in the Agent API reference
