Agent API

The Agent API allows you to create and manage conversations with AI agents, execute them with text or audio input, and receive real-time streaming responses.

All Agent API requests require the resourceId query parameter set to your agent ID, plus your API key: REST requests send it in an Authorization header, and WebSocket connections send it as the authorization query parameter.


Base URL

REST API: https://api.autessa.com/clients/agents
WebSocket: wss://api.autessa.com/ws/clients/agents/execute

Authentication

HTTP REST APIs

Pass your API key in the Authorization header (no "Bearer" prefix):

Authorization: your_api_key_here

WebSocket APIs

Pass your API key as a query parameter:

authorization=your_api_key_here

cURL
curl --request POST \
--url 'https://api.autessa.com/clients/agents/execute-create?resourceId=123' \
--header 'Authorization: your_api_key' \
--header 'Content-Type: application/json' \
--data '{"agentId": 123}'
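The REST rules above (base URL, resourceId query parameter, raw API key in the Authorization header) can be wrapped in one small helper. This is a minimal sketch rather than part of any official SDK: `buildRestRequest` and `REST_BASE` are hypothetical names, and the function only assembles the arguments for `fetch` without sending anything.

```javascript
// Base URL taken from the "Base URL" section above.
const REST_BASE = 'https://api.autessa.com/clients/agents';

// Hypothetical helper: returns the URL and options to pass to fetch().
function buildRestRequest(path, agentId, apiKey, body = {}) {
  const url = `${REST_BASE}${path}?resourceId=${encodeURIComponent(agentId)}`;
  return {
    url,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Raw key, no "Bearer" prefix.
        'Authorization': apiKey,
      },
      body: JSON.stringify(body),
    },
  };
}

const { url, options } = buildRestRequest(
  '/execute-create', 123, 'your_api_key', { agentId: 123 }
);
console.log(url);
console.log(options.headers);
```

The result can be passed straight to `fetch(url, options)`.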

REST API Endpoints

Create Conversation POST/execute-create

Creates a new conversation for multi-turn agent interactions. A conversation maintains context across multiple agent executions. You only need to create a conversation for multi-turn interactions; single-shot requests don't require one.

Note: Once created, a conversation is tied to a specific agent version. Even if you publish a new version, existing conversations will continue using the original version.

Query Parameters

  • Name
    resourceId
    Type
    integer
    Description

    The ID of the agent.

Request Body

  • Name
    agentId
    Type
    integer
    Description

    The ID of the agent to create a conversation for.

  • Name
    version
    Type
    CustomVersion
    Description

    Optional version override. If not specified, uses the PUBLISHED version.

Response

  • Name
    conversationId
    Type
    string
    Description

    The unique ID for the created conversation. Use this in subsequent execute requests.

  • Name
    errors
    Type
    array<string>
    Description

    Array of error messages if any occurred.

curl --request POST \
--url 'https://api.autessa.com/clients/agents/execute-create?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123
}'
JSON
{
"conversationId": "conv_a1b2c3d4e5",
"errors": []
}
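Every response in this reference carries an `errors` array, so a client may want one place to check it. The sketch below assumes that a non-empty array means the call failed; the documentation does not say whether errors also come with a non-2xx HTTP status. `unwrap` is a hypothetical helper name.

```javascript
// Hypothetical helper: treat a non-empty `errors` array as a failure.
function unwrap(payload) {
  const errors = Array.isArray(payload.errors) ? payload.errors : [];
  if (errors.length > 0) {
    throw new Error(`Agent API error: ${errors.join('; ')}`);
  }
  return payload;
}

// Response shape taken from the example above.
const created = unwrap({ conversationId: 'conv_a1b2c3d4e5', errors: [] });
console.log(created.conversationId);
```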

Execute Agent POST/execute

Executes an agent synchronously with multimodal input (text or audio). This endpoint returns the complete response once processing is done.

Note: Synchronous execution only supports TEXT output. For AUDIO output or real-time streaming, use the WebSocket endpoint.

Query Parameters

  • Name
    resourceId
    Type
    integer
    Description

    The ID of the agent.

Request Body

  • Name
    agentId
    Type
    integer
    Description

    The ID of the agent to execute.

  • Name
    conversationId
    Type
    string
    Description

    The conversation ID from /execute-create. Required for multi-turn conversations.

  • Name
    input
    Type
    array<MultimodalInput>
    Description

    Array of multimodal inputs. See Multimodal Input.

  • Name
    environmentVariables
    Type
    object
    Description

    Key-value pairs for environment variables that won't be stored on the server.

  • Name
    promptTemplateVariables
    Type
    object
    Description

    Key-value pairs to inject into the agent's prompt template. Variables are referenced in the agent instructions using @@{variableName}@@ syntax.

  • Name
    version
    Type
    CustomVersion
    Description

    Version override. Ignored if using an existing conversationId.

Response

  • Name
    output
    Type
    array<MultimodalOutput>
    Description

    Array of output responses from the agent.

  • Name
    errors
    Type
    array<string>
    Description

    Array of error messages if any occurred.

curl --request POST \
--url 'https://api.autessa.com/clients/agents/execute?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123,
"conversationId": "conv_a1b2c3d4e5",
"input": [
{
"inputType": "TEXT",
"content": "What is the weather today?"
}
],
"environmentVariables": {
"USER_LOCATION": "Seattle"
},
"promptTemplateVariables": {
"customerName": "John Doe",
"accountType": "Premium"
}
}'
JSON
{
"output": [
{
"outputType": "TEXT",
"content": "The weather in Seattle today is 65°F with partly cloudy skies."
}
],
"errors": []
}
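To see what promptTemplateVariables do, it can help to preview the @@{variableName}@@ substitution locally. The function below is an illustration only: the real substitution happens on the server, and details such as how unknown variables are treated are assumptions here. `renderTemplate` is a hypothetical name.

```javascript
// Illustration only: mimics @@{variableName}@@ substitution on the client.
// Unknown variables are left untouched; the server's behavior may differ.
function renderTemplate(instructions, variables) {
  return instructions.replace(/@@\{(\w+)\}@@/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name)
      ? String(variables[name])
      : match
  );
}

const preview = renderTemplate(
  'You are helping @@{customerName}@@, a @@{accountType}@@ customer.',
  { customerName: 'John Doe', accountType: 'Premium' }
);
console.log(preview); // You are helping John Doe, a Premium customer.
```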

Close Conversation POST/execute-close

Closes an existing conversation. This triggers logging and any configured auto-evaluation for the conversation. It's safe to call this even for expired conversations.

Important: Conversations automatically close after 12 hours of inactivity, but it's best practice to explicitly close them when done.

Query Parameters

  • Name
    resourceId
    Type
    integer
    Description

    The ID of the agent.

Request Body

  • Name
    agentId
    Type
    integer
    Description

    The ID of the agent.

  • Name
    conversationId
    Type
    string
    Description

    The conversation ID to close.

Response

  • Name
    errors
    Type
    array<string>
    Description

    Array of error messages if any occurred.

curl --request POST \
--url 'https://api.autessa.com/clients/agents/execute-close?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123,
"conversationId": "conv_a1b2c3d4e5"
}'
JSON
{
"errors": []
}
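Since explicit closing is the recommended practice, one option is to tie the close call to a `try`/`finally` block so it runs even when an execution fails. This is a sketch under assumptions: `api` is a hypothetical client object whose `create` and `close` methods wrap /execute-create and /execute-close, and a stub stands in for it here so the example runs without network access.

```javascript
// `api` is a hypothetical client exposing create() and close(),
// each wrapping the corresponding REST endpoint.
async function withConversation(api, agentId, run) {
  const { conversationId } = await api.create({ agentId });
  try {
    return await run(conversationId);
  } finally {
    // Runs on success and on failure, so logging and auto-evaluation fire.
    await api.close({ agentId, conversationId });
  }
}

// Stub client used only to demonstrate the call order.
const calls = [];
const stubApi = {
  create: async () => { calls.push('create'); return { conversationId: 'conv_demo', errors: [] }; },
  close: async () => { calls.push('close'); return { errors: [] }; },
};

withConversation(stubApi, 123, async (id) => { calls.push(`run ${id}`); })
  .then(() => console.log(calls.join(' -> ')));
```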

Get Conversation POST/conversation

Retrieves the complete conversation history including all messages and evaluation context.

Query Parameters

  • Name
    resourceId
    Type
    integer
    Description

    The ID of the agent.

Request Body

  • Name
    agentId
    Type
    integer
    Description

    The ID of the agent.

  • Name
    conversationId
    Type
    string
    Description

    The conversation ID to retrieve.

Response

  • Name
    conversation
    Type
    AgentConversationDto
    Description

    The complete conversation data including logs and evaluation context.

  • Name
    errors
    Type
    array<string>
    Description

    Array of error messages if any occurred.

cURL
curl --request POST \
--url 'https://api.autessa.com/clients/agents/conversation?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123,
"conversationId": "conv_a1b2c3d4e5"
}'
JSON
{
"conversation": {
"resourceId": 123,
"version": "1",
"conversationId": "conv_a1b2c3d4e5",
"apiKeyId": "key_xyz",
"closeDate": null,
"logs": [
{
"logType": "USER",
"userMultimodalInputs": [
{
"inputType": "TEXT",
"content": "What is the weather in Seattle?"
}
]
},
{
"logType": "TOOL_REQUEST",
"executeToolRequest": {
"toolCallId": "call_abc123",
"name": "GetWeather",
"variables": {
"location": "Seattle"
}
}
},
{
"logType": "TOOL_RESPONSE",
"executeToolResponse": {
"toolCallId": "call_abc123",
"toolName": "GetWeather",
"result": {
"temperature": 65,
"condition": "Partly Cloudy"
}
}
},
{
"logType": "MESSAGE_TO_USER",
"multimodalOutput": {
"outputType": "TEXT",
"content": "The weather in Seattle is currently 65°F with partly cloudy skies."
}
}
]
},
"errors": []
}

Conversation Log Structure

Conversation logs returned by the /conversation endpoint contain detailed information about all interactions in a conversation. Each log entry has a logType field that determines which other fields are present.

Log Types

  • Name
    USER
    Description

    User input to the agent. Contains userMultimodalInputs array with text or audio inputs.

  • Name
    MESSAGE_TO_USER
    Description

    Agent's response to the user. Contains multimodalOutput with text or audio output.

  • Name
    TOOL_REQUEST
    Description

    Request to execute a tool. Contains executeToolRequest with tool name and variables.

  • Name
    TOOL_RESPONSE
    Description

    Result from tool execution. Contains executeToolResponse with the tool's result.

AgentLog Structure

Each log entry has one of these structures based on its logType:

USER Log:

  • Name
    logType
    Type
    string
    Description

    "USER"

  • Name
    userMultimodalInputs
    Type
    array<MultimodalInput>
    Description

    Array of user inputs (text or audio). See Multimodal Input.

MESSAGE_TO_USER Log:

  • Name
    logType
    Type
    string
    Description

    "MESSAGE_TO_USER"

  • Name
    multimodalOutput
    Type
    MultimodalOutput
    Description

    The agent's response (text or audio).

TOOL_REQUEST Log:

  • Name
    logType
    Type
    string
    Description

    "TOOL_REQUEST"

  • Name
    executeToolRequest
    Type
    object
    Description

    Tool execution request details.

  • Name
    executeToolRequest.toolCallId
    Type
    string
    Description

    Unique identifier for this tool call.

  • Name
    executeToolRequest.name
    Type
    string
    Description

    Name of the tool being invoked.

  • Name
    executeToolRequest.variables
    Type
    object
    Description

    Key-value pairs of arguments passed to the tool.

TOOL_RESPONSE Log:

  • Name
    logType
    Type
    string
    Description

    "TOOL_RESPONSE"

  • Name
    executeToolResponse
    Type
    object
    Description

    Tool execution result.

  • Name
    executeToolResponse.toolCallId
    Type
    string
    Description

    Matches the toolCallId from the corresponding TOOL_REQUEST.

  • Name
    executeToolResponse.toolName
    Type
    string
    Description

    Name of the tool that was executed.

  • Name
    executeToolResponse.result
    Type
    any
    Description

    The result returned by the tool.

{
"logType": "USER",
"userMultimodalInputs": [
{
"inputType": "TEXT",
"content": "What is the weather?"
}
]
}
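Because each TOOL_RESPONSE repeats the toolCallId of its TOOL_REQUEST, the two log entries can be joined into one record per tool call. The following sketch uses only the fields documented above; `pairToolCalls` is a hypothetical helper name.

```javascript
// Pairs each TOOL_REQUEST with its TOOL_RESPONSE via toolCallId.
function pairToolCalls(logs) {
  const calls = new Map();
  for (const log of logs) {
    if (log.logType === 'TOOL_REQUEST') {
      const { toolCallId, name, variables } = log.executeToolRequest;
      calls.set(toolCallId, { toolCallId, name, variables, result: undefined });
    } else if (log.logType === 'TOOL_RESPONSE') {
      const { toolCallId, result } = log.executeToolResponse;
      const call = calls.get(toolCallId);
      if (call) call.result = result;
    }
  }
  return [...calls.values()];
}

// Trimmed-down logs from the /conversation example above.
const logs = [
  { logType: 'USER', userMultimodalInputs: [{ inputType: 'TEXT', content: 'What is the weather in Seattle?' }] },
  { logType: 'TOOL_REQUEST', executeToolRequest: { toolCallId: 'call_abc123', name: 'GetWeather', variables: { location: 'Seattle' } } },
  { logType: 'TOOL_RESPONSE', executeToolResponse: { toolCallId: 'call_abc123', toolName: 'GetWeather', result: { temperature: 65, condition: 'Partly Cloudy' } } },
];
console.log(pairToolCalls(logs));
```

A request with no matching response keeps `result: undefined`, which makes unfinished tool calls easy to spot.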

Get Conversation Status POST/conversation-status

Checks the current status of a conversation (IN_PROGRESS, CLOSED, or NOT_FOUND).

Query Parameters

  • Name
    resourceId
    Type
    integer
    Description

    The ID of the agent.

Request Body

  • Name
    agentId
    Type
    integer
    Description

    The ID of the agent.

  • Name
    conversationId
    Type
    string
    Description

    The conversation ID to check.

Response

  • Name
    conversationId
    Type
    string
    Description

    The conversation ID that was checked.

  • Name
    status
    Type
    string
    Description

    One of: IN_PROGRESS, CLOSED, NOT_FOUND

  • Name
    errors
    Type
    array<string>
    Description

    Array of error messages if any occurred.

cURL
curl --request POST \
--url 'https://api.autessa.com/clients/agents/conversation-status?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123,
"conversationId": "conv_a1b2c3d4e5"
}'
JSON
{
"conversationId": "conv_a1b2c3d4e5",
"status": "IN_PROGRESS",
"errors": []
}

Revive Conversation POST/revive-conversation

Revives a closed conversation, allowing you to continue the interaction where it left off.

Query Parameters

  • Name
    resourceId
    Type
    integer
    Description

    The ID of the agent.

Request Body

  • Name
    agentId
    Type
    integer
    Description

    The ID of the agent.

  • Name
    conversationId
    Type
    string
    Description

    The conversation ID to revive.

Response

  • Name
    conversationId
    Type
    string
    Description

    The revived conversation ID.

  • Name
    status
    Type
    string
    Description

    Status message about the revival operation.

  • Name
    errors
    Type
    array<string>
    Description

    Array of error messages if any occurred.

cURL
curl --request POST \
--url 'https://api.autessa.com/clients/agents/revive-conversation?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123,
"conversationId": "conv_a1b2c3d4e5"
}'
JSON
{
"conversationId": "conv_a1b2c3d4e5",
"status": "REVIVED",
"errors": []
}
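The status and revive endpoints combine naturally when resuming a stored conversationId. The mapping below is one possible client-side policy, not something the API prescribes: continue if the conversation is active, revive it if closed, and create a new one if it no longer exists. `nextEndpoint` is a hypothetical name.

```javascript
// Maps a /conversation-status result to the next REST endpoint to call.
// The choices are a suggested policy, not something the API enforces.
function nextEndpoint(status) {
  switch (status) {
    case 'IN_PROGRESS': return '/execute';
    case 'CLOSED': return '/revive-conversation';
    case 'NOT_FOUND': return '/execute-create';
    default: throw new Error(`Unexpected conversation status: ${status}`);
  }
}

console.log(nextEndpoint('CLOSED')); // /revive-conversation
```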

Generate Audio Upload Link POST/generate-audio-upload-link

Generates a pre-signed S3 upload URL for audio files. Use this when you want to upload audio files separately rather than sending them as base64 in the request.

Note: Currently supports WAV format only.

Query Parameters

  • Name
    resourceId
    Type
    integer
    Description

    The ID of the agent.

Request Body

Currently empty (WAV format is hardcoded).

Response

  • Name
    uploadUrl
    Type
    string
    Description

    Pre-signed URL for uploading the audio file via PUT request.

  • Name
    s3Uri
    Type
    string
    Description

    S3 URI to use in AudioInput when referencing this file.

  • Name
    expiresInSeconds
    Type
    integer
    Description

    Time in seconds until the upload URL expires.

  • Name
    errors
    Type
    array<string>
    Description

    Array of error messages if any occurred.

cURL
curl --request POST \
--url 'https://api.autessa.com/clients/agents/generate-audio-upload-link?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{}'
JSON
{
"uploadUrl": "https://s3.amazonaws.com/autessa-audio/...",
"s3Uri": "s3://autessa-audio/user-123/audio-xyz.wav",
"expiresInSeconds": 3600,
"errors": []
}
Upload the file
# Step 1: Get upload link (the POST request above returns uploadUrl and s3Uri)
# Step 2: Upload audio file
curl --request PUT \
--url 'https://s3.amazonaws.com/autessa-audio/...' \
--header 'Content-Type: audio/wav' \
--data-binary '@audio.wav'

# Step 3: Use s3Uri in AudioInput
# {
# "inputType": "AUDIO",
# "s3Uri": "s3://autessa-audio/user-123/audio-xyz.wav"
# }
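The same three steps can be expressed in JavaScript. The sketch below only builds the request arguments and the resulting AudioInput, so it runs without network access; `buildUploadRequest` and `audioInputFromUpload` are hypothetical helper names, and the link object reuses the sample response above.

```javascript
// Step 2: arguments for fetch() that PUT the WAV bytes to the pre-signed URL.
function buildUploadRequest(link, wavBytes) {
  return {
    url: link.uploadUrl,
    options: {
      method: 'PUT',
      headers: { 'Content-Type': 'audio/wav' },
      body: wavBytes,
    },
  };
}

// Step 3: the AudioInput that references the uploaded file.
function audioInputFromUpload(link, audioFormat) {
  return { inputType: 'AUDIO', s3Uri: link.s3Uri, audioFormat };
}

// Step 1 result, copied from the sample response above.
const link = {
  uploadUrl: 'https://s3.amazonaws.com/autessa-audio/...',
  s3Uri: 's3://autessa-audio/user-123/audio-xyz.wav',
  expiresInSeconds: 3600,
  errors: [],
};

const upload = buildUploadRequest(link, new Uint8Array(0));
console.log(upload.options.method, upload.url);
console.log(audioInputFromUpload(link, { sampleRate: 16000, sampleSizeInBits: 16, channels: 1, signed: true, bigEndian: false }));
```

The upload has to finish within expiresInSeconds of requesting the link.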

WebSocket API

Multimodal Agent Execution WSS/ws/clients/agents/execute

Real-time agent execution with streaming responses. Supports both text and audio input/output with streaming capabilities.

Connection URL

wss://api.autessa.com/ws/clients/agents/execute?authorization={key}&resourceId={agentId}

Query Parameters

  • Name
    authorization
    Type
    string
    Description

    Your API key.

  • Name
    resourceId
    Type
    integer
    Description

    The agent ID.

  • Name
    conversationId
    Type
    string
    Description

    Optional conversation ID to revive and continue.

  • Name
    voiceModeEnabled
    Type
    boolean
    Description

    Set to "true" for voice mode with continuous audio streaming.

Connection Lifecycle

  1. Connect - WebSocket connection established
  2. Status Message - Server sends conversation status
  3. Send Messages - Client sends MULTIMODAL or VOICE messages
  4. Receive Streams - Server streams responses in real-time
  5. Disconnect - Connection closes, conversation auto-closes

websocat "wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123"
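Since the API key travels in the query string, it is safer to build the connection URL with URLSearchParams than by string concatenation, so that special characters are encoded. This is a minimal sketch using the four documented query parameters; `buildWsUrl` is a hypothetical helper name.

```javascript
const WS_ENDPOINT = 'wss://api.autessa.com/ws/clients/agents/execute';

// Builds the connection URL; conversationId and voiceModeEnabled are optional.
function buildWsUrl({ apiKey, agentId, conversationId, voiceModeEnabled = false }) {
  const params = new URLSearchParams({
    authorization: apiKey,
    resourceId: String(agentId),
  });
  if (conversationId) params.set('conversationId', conversationId);
  if (voiceModeEnabled) params.set('voiceModeEnabled', 'true');
  return `${WS_ENDPOINT}?${params.toString()}`;
}

console.log(buildWsUrl({ apiKey: 'your_api_key', agentId: 123 }));
console.log(buildWsUrl({ apiKey: 'your_api_key', agentId: 123, conversationId: 'conv_abc123', voiceModeEnabled: true }));
```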

Server-to-Client Messages

When you connect, the server immediately sends a status message:

{
"conversationId": "conv_abc123",
"status": "CREATED",
"errors": []
}

Client-to-Server Messages

MULTIMODAL Message

Send a JSON message to execute the agent with multimodal input:

  • Name
    messageType
    Type
    string
    Description

    Must be "MULTIMODAL"

  • Name
    agentId
    Type
    integer
    Description

    The agent ID

  • Name
    input
    Type
    array<MultimodalInput>
    Description

    Array of multimodal inputs (text or audio)

  • Name
    executionOutputMode
    Type
    string
    Description

    "TEXT" or "AUDIO" - desired output format. Defaults to "TEXT".

  • Name
    environmentVariables
    Type
    object
    Description

    Key-value pairs for environment variables that won't be stored on the server.

  • Name
    promptTemplateVariables
    Type
    object
    Description

    Key-value pairs to inject into the agent's prompt template. Variables are referenced in the agent instructions using @@{variableName}@@ syntax.

JSON
{
"messageType": "MULTIMODAL",
"agentId": 123,
"executionOutputMode": "TEXT",
"input": [
{
"inputType": "TEXT",
"content": "What's the weather in Seattle?"
}
],
"environmentVariables": {
"USER_LOCATION": "Seattle"
},
"promptTemplateVariables": {
"customerName": "John Doe",
"accountType": "Premium"
}
}

VOICE Message

Establish voice mode for continuous audio streaming:

  • Name
    messageType
    Type
    string
    Description

    Must be "VOICE"

  • Name
    agentId
    Type
    integer
    Description

    The agent ID

  • Name
    audioFormat
    Type
    AudioFormat
    Description

    Audio format specification for the audio bytes you'll send

  • Name
    outputType
    Type
    string
    Description

    "TEXT" or "AUDIO" - desired output format

  • Name
    environmentVariables
    Type
    object
    Description

    Environment variables

JSON
{
"messageType": "VOICE",
"agentId": 123,
"audioFormat": {
"sampleRate": 16000,
"sampleSizeInBits": 16,
"channels": 1,
"signed": true,
"bigEndian": false
},
"outputType": "AUDIO",
"environmentVariables": {}
}

After establishing voice mode, send raw audio bytes as binary WebSocket messages.

Streaming Response Messages

The server streams responses in real-time. There are four types of messages:

1. Text Stream Messages

JSON
{
"streamedOutput": {
"outputType": "TEXT",
"content": " Seattle"
}
}

Text is streamed token-by-token. Concatenate the content fields to build the full response.

2. Tool Stream Messages

{
"streamedToolMessages": {
"responseType": "EXECUTE_TOOL",
"toolName": "GetWeather",
"invocationId": "inv_123",
"publicVars": {
"city": "Seattle",
"state": "WA"
},
"streamMessage": "EXECUTION START: GetWeather"
}
}

3. Audio Stream Messages

When executionOutputMode: "AUDIO" is requested:

JSON
{
"streamedOutput": {
"outputType": "AUDIO",
"sampleRate": 16000,
"base64Audio": "//8AAAMAAAD...",
"transcription": "The weather in Seattle is 65 degrees",
"isFinalChunk": false
}
}

Audio is streamed in chunks. Decode the base64Audio and play it sequentially.
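Decoding a chunk could look like the sketch below. It rests on an assumption: that base64Audio holds raw signed 16-bit little-endian PCM, as the Production Considerations note about Int16 PCM playback suggests. If chunks carry a container header such as WAV, that header would need to be stripped first. `decodePcm16` and `base64ToBytes` are hypothetical helper names.

```javascript
// Decodes base64 into bytes in both browsers (atob) and Node.js (Buffer).
function base64ToBytes(base64) {
  if (typeof atob === 'function') {
    return Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
  }
  return Uint8Array.from(Buffer.from(base64, 'base64'));
}

// ASSUMPTION: the payload is raw signed 16-bit little-endian PCM.
// Returns Float32 samples in the range -1..1, the format Web Audio expects.
function decodePcm16(base64Audio) {
  const bytes = base64ToBytes(base64Audio);
  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// "AAAAQACA" encodes the Int16 samples 0, 16384, -32768.
console.log(Array.from(decodePcm16('AAAAQACA'))); // samples: 0, 0.5, -1
```

Queueing the decoded chunks in arrival order and stopping at isFinalChunk gives sequential playback.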

4. Final Response

After streaming completes, the server sends a final message with the complete output:

JSON
{
"conversationId": "conv_abc123",
"output": [
{
"outputType": "TEXT",
"content": "The weather in Seattle is currently 65°F with partly cloudy skies."
}
],
"errors": []
}
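The four message shapes above can be told apart by their top-level keys, which makes a single reducer enough to track a streamed turn. This is a sketch of one way to organize a client; `createStreamState` and `applyStreamMessage` are hypothetical names, and the state layout is an illustration.

```javascript
function createStreamState() {
  return { text: '', audioChunks: [], toolEvents: [], final: null };
}

// Folds one parsed server message into the running state.
function applyStreamMessage(state, msg) {
  if (msg.streamedOutput?.outputType === 'TEXT') {
    // Text arrives token by token; concatenate to rebuild the reply.
    state.text += msg.streamedOutput.content;
  } else if (msg.streamedOutput?.outputType === 'AUDIO') {
    state.audioChunks.push(msg.streamedOutput.base64Audio);
  } else if (msg.streamedToolMessages) {
    state.toolEvents.push(msg.streamedToolMessages.streamMessage);
  } else if (Array.isArray(msg.output)) {
    // Final message: the complete output for this turn.
    state.final = msg.output;
  }
  return state;
}

const state = createStreamState();
[
  { streamedToolMessages: { responseType: 'EXECUTE_TOOL', toolName: 'GetWeather', streamMessage: 'EXECUTION START: GetWeather' } },
  { streamedOutput: { outputType: 'TEXT', content: 'The weather in' } },
  { streamedOutput: { outputType: 'TEXT', content: ' Seattle' } },
  { conversationId: 'conv_abc123', output: [{ outputType: 'TEXT', content: 'The weather in Seattle' }], errors: [] },
].forEach((msg) => applyStreamMessage(state, msg));
console.log(state.text, state.toolEvents, state.final !== null);
```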

Multimodal Input

Autessa supports multimodal input: you can send text, audio, or a combination of both. Each input is distinguished by its inputType field, which can be either "TEXT" or "AUDIO".

Text Input

  • Name
    inputType
    Type
    string
    Description

    Must be "TEXT"

  • Name
    content
    Type
    string
    Description

    The text message from the user

JSON
{
"inputType": "TEXT",
"content": "What's the weather today?"
}

Audio Input

Audio can be provided in two ways: base64 encoded data or S3 URI.

  • Name
    inputType
    Type
    string
    Description

    Must be "AUDIO"

  • Name
    base64EncodedData
    Type
    string
    Description

    Base64 encoded audio data (option 1)

  • Name
    s3Uri
    Type
    string
    Description

    S3 URI from /generate-audio-upload-link (option 2)

  • Name
    audioFormat
    Type
    AudioFormat
    Description

    Audio format metadata

  • Name
    transcription
    Type
    string
    Description

    Optional pre-transcribed text

{
"inputType": "AUDIO",
"base64EncodedData": "UklGRiQAAABXQVZFZm10...",
"audioFormat": {
"sampleRate": 16000,
"sampleSizeInBits": 16,
"channels": 1,
"signed": true,
"bigEndian": false
}
}
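For the base64 option, the audio bytes have to be encoded before they go into the request body. The sketch below shows one way to do that in both Node.js and browsers; `bytesToBase64` and `audioInputFromBytes` are hypothetical helper names, and the four sample bytes are just the "RIFF" tag that starts a WAV file.

```javascript
// Encodes bytes as base64 in Node.js (Buffer) or browsers (btoa).
function bytesToBase64(bytes) {
  if (typeof Buffer !== 'undefined') {
    return Buffer.from(bytes).toString('base64');
  }
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Builds an AudioInput using the base64EncodedData option.
function audioInputFromBytes(bytes, audioFormat) {
  return {
    inputType: 'AUDIO',
    base64EncodedData: bytesToBase64(bytes),
    audioFormat,
  };
}

const riffTag = new Uint8Array([0x52, 0x49, 0x46, 0x46]); // "RIFF"
const audioInput = audioInputFromBytes(riffTag, {
  sampleRate: 16000, sampleSizeInBits: 16, channels: 1, signed: true, bigEndian: false,
});
console.log(audioInput.base64EncodedData); // UklGRg==
```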

AudioFormat

  • Name
    sampleRate
    Type
    integer
    Description

    Samples per second in Hz. Common values: 16000 (wideband speech), 44100 (CD quality), 48000 (professional).

  • Name
    sampleSizeInBits
    Type
    integer
    Description

    Bits per sample. Typical values: 8, 16, 24, 32.

  • Name
    channels
    Type
    integer
    Description

    Number of audio channels. 1 = mono, 2 = stereo.

  • Name
    signed
    Type
    boolean
    Description

    Whether samples are signed (true) or unsigned (false).

  • Name
    bigEndian
    Type
    boolean
    Description

    Byte order: true = big-endian, false = little-endian.
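The AudioFormat fields determine how many bytes one second of uncompressed audio occupies: sampleRate × (sampleSizeInBits / 8) × channels. The sketch below applies that arithmetic to estimate payload size and clip duration; the helper names are hypothetical.

```javascript
// Bytes of raw PCM per second for a given AudioFormat.
function bytesPerSecond({ sampleRate, sampleSizeInBits, channels }) {
  return sampleRate * (sampleSizeInBits / 8) * channels;
}

// Duration in seconds of a raw PCM buffer of the given size.
function durationSeconds(byteLength, format) {
  return byteLength / bytesPerSecond(format);
}

const format = { sampleRate: 16000, sampleSizeInBits: 16, channels: 1, signed: true, bigEndian: false };
console.log(bytesPerSecond(format));         // 32000
console.log(durationSeconds(64000, format)); // 2
```

Base64 encoding adds roughly one third on top of the raw size, which is worth keeping in mind when choosing between base64EncodedData and s3Uri.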


Multimodal Output

Agent responses can be in text or audio format. The output type is distinguished by the outputType field, which can be either "TEXT" or "AUDIO".

Text Output

  • Name
    outputType
    Type
    string
    Description

    Must be "TEXT"

  • Name
    content
    Type
    string
    Description

    The text response from the agent

JSON
{
"outputType": "TEXT",
"content": "The weather in Seattle is 65°F with partly cloudy skies."
}

Audio Output

Audio output is used when you request executionOutputMode: "AUDIO" via WebSocket.

  • Name
    outputType
    Type
    string
    Description

    Must be "AUDIO"

  • Name
    sampleRate
    Type
    integer
    Description

    Samples per second in Hz (e.g., 16000)

  • Name
    base64Audio
    Type
    string
    Description

    Base64 encoded audio data

  • Name
    transcription
    Type
    string
    Description

    Text transcription of the audio

  • Name
    isFinalChunk
    Type
    boolean
    Description

    Whether this is the final chunk of audio in the stream

JSON
{
"outputType": "AUDIO",
"sampleRate": 16000,
"base64Audio": "UklGRiQAAABXQVZFZm10...",
"transcription": "The weather in Seattle is 65 degrees with partly cloudy skies.",
"isFinalChunk": true
}

Custom Version

Override the default PUBLISHED version of an agent:

  • Name
    versionState
    Type
    string
    Description

    Version state: "PUBLISHED", "DRAFT", or "ARCHIVED". Defaults to "PUBLISHED".

  • Name
    versionNumber
    Type
    integer
    Description

    Specific version number to use (optional).

{
"agentId": 123,
"version": {
"versionState": "DRAFT"
},
"input": [...]
}

Enumerations

EConversationStatus

  • IN_PROGRESS - Conversation is active
  • CLOSED - Conversation has been closed
  • NOT_FOUND - Conversation doesn't exist

ExecutionOutputMode

  • TEXT - Text output (default)
  • AUDIO - Audio output (base64 encoded)

EInputType

  • TEXT - Text input
  • AUDIO - Audio input

EOutputType

  • TEXT - Text output
  • AUDIO - Audio output

EVersionState

  • PUBLISHED - Published version (default)
  • DRAFT - Draft/sandbox version
  • ARCHIVED - Archived version

Rate Limiting

Agent execution endpoints are rate limited. Rate limit information is included in response headers:

  • X-RateLimit-Limit-Minute - Requests allowed per minute
  • X-RateLimit-Remaining-Minute - Remaining requests this minute
  • X-RateLimit-Limit-Day - Requests allowed per day
  • X-RateLimit-Remaining-Day - Remaining requests today

When rate limited, you'll receive a 429 status code with a Retry-After header.
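A client can use the Retry-After header to decide how long to wait after a 429. In HTTP this header may hold either a number of seconds or an HTTP date; the documentation does not say which form this API sends, so the sketch below handles both and falls back to a default. `retryDelayMs` is a hypothetical helper name.

```javascript
// `headers` is anything with a get(name) method, such as response.headers from fetch.
function retryDelayMs(headers, nowMs = Date.now(), fallbackMs = 1000) {
  const value = headers.get('Retry-After');
  if (value == null) return fallbackMs;

  // Form 1: delay in seconds.
  const seconds = Number(value);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);

  // Form 2: HTTP date.
  const dateMs = Date.parse(value);
  return Number.isNaN(dateMs) ? fallbackMs : Math.max(0, dateMs - nowMs);
}

// A Map stands in for real response headers in this demo.
console.log(retryDelayMs(new Map([['Retry-After', '2']]))); // 2000
```

With `fetch`, the call would be `retryDelayMs(response.headers)` when `response.status === 429`.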


Complete Examples

Single-Turn Conversation

// Simple one-shot agent execution
const response = await fetch(
'https://api.autessa.com/clients/agents/execute?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
input: [
{
inputType: 'TEXT',
content: 'Tell me a joke'
}
]
})
}
);

const data = await response.json();
console.log(data.output[0].content);

Multi-Turn Conversation

JavaScript
// Create conversation
const createResp = await fetch(
'https://api.autessa.com/clients/agents/execute-create?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({ agentId: 123 })
}
);

const { conversationId } = await createResp.json();

// First message
const resp1 = await fetch(
'https://api.autessa.com/clients/agents/execute?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
conversationId,
input: [{ inputType: 'TEXT', content: 'Hi, I need help booking a flight' }]
})
}
);

// Second message (continues conversation)
const resp2 = await fetch(
'https://api.autessa.com/clients/agents/execute?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
conversationId,
input: [{ inputType: 'TEXT', content: 'From Seattle to San Francisco' }]
})
}
);

// Close conversation
await fetch(
'https://api.autessa.com/clients/agents/execute-close?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
conversationId
})
}
);

Mixed Text and Audio Input

JavaScript
// Example showing both text and audio inputs in one request
const response = await fetch(
'https://api.autessa.com/clients/agents/execute?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
conversationId: 'conv_a1b2c3d4e5',
input: [
// Text input
{
inputType: 'TEXT',
content: 'Here is my question:'
},
// Audio input
{
inputType: 'AUDIO',
s3Uri: 's3://autessa-audio/user-123/question.wav',
audioFormat: {
sampleRate: 16000,
sampleSizeInBits: 16,
channels: 1,
signed: true,
bigEndian: false
}
}
]
})
}
);

const data = await response.json();
// Each output carries an outputType field; synchronous execution returns TEXT only
console.log('Output type:', data.output[0].outputType); // "TEXT"

WebSocket Streaming

JavaScript
const ws = new WebSocket(
'wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123'
);

let conversationId;
let fullResponse = '';

ws.onopen = () => {
console.log('Connected');
};

ws.onmessage = (event) => {
const data = JSON.parse(event.data);

// Connection status
if (data.status) {
conversationId = data.conversationId;
console.log('Conversation created:', conversationId);

// Send first message
ws.send(JSON.stringify({
messageType: 'MULTIMODAL',
agentId: 123,
executionOutputMode: 'TEXT',
input: [
{
inputType: 'TEXT',
content: 'What is the weather?'
}
]
}));
return;
}

// Streaming text
if (data.streamedOutput) {
fullResponse += data.streamedOutput.content;
console.log('Stream:', data.streamedOutput.content);
}

// Tool messages
if (data.streamedToolMessages) {
console.log('Tool:', data.streamedToolMessages.streamMessage);
}

// Final output
if (data.output) {
console.log('Final response:', data.output[0].content);
console.log('Full streamed response:', fullResponse);
}
};

ws.onerror = (error) => {
console.error('WebSocket error:', error);
};

ws.onclose = () => {
console.log('Disconnected');
};

Real-time Voice Mode

For conversational AI with continuous audio streaming, use the voiceModeEnabled=true parameter. This enables bidirectional audio streaming for natural voice conversations.

const agentId = 123
const apiKey = 'your_api_key'

// Connect with voice mode enabled
const ws = new WebSocket(
`wss://api.autessa.com/ws/clients/agents/execute?authorization=${apiKey}&resourceId=${agentId}&voiceModeEnabled=true`
)

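let conversationId

// AudioProcessor is the playback utility class from the Audio Playback guide
const audioProcessor = new AudioProcessor()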

ws.onopen = () => {
console.log('Connected in voice mode')
}

ws.onmessage = async (event) => {
// Handle connection status
if (typeof event.data === 'string') {
const data = JSON.parse(event.data)

if (data.status) {
conversationId = data.conversationId
console.log('Conversation created:', conversationId)

// Send voice establishment message
ws.send(JSON.stringify({
messageType: 'VOICE',
agentId: agentId,
audioFormat: {
sampleRate: 16000,
sampleSizeInBits: 32,
channels: 1,
signed: true,
bigEndian: false
},
outputType: 'AUDIO',
environmentVariables: {},
promptTemplateVariables: {}
}))

console.log('Voice mode established')
return
}

// Handle audio output chunks
if (data.streamedOutput?.outputType === 'AUDIO') {
const { base64Audio, sampleRate, transcription, isFinalChunk } = data.streamedOutput

console.log('Agent:', transcription)

// Process audio chunk for playback
// See: https://docs.autessa.com/api/audio-playback
await audioProcessor.processAudioChunk(base64Audio, sampleRate, transcription)

// Auto-start playback
const status = audioProcessor.getStatus()
if (status.totalItems === 1 && !status.isPlaying) {
audioProcessor.play()
}

if (isFinalChunk) {
console.log('Agent finished speaking')
}
}
}
}

// Start recording and stream audio
// See complete implementation at: https://docs.autessa.com/api/audio-record
const recorder = new AudioRecorder(16000)

recorder.onSpeechStart = () => {
console.log('User started speaking')
audioProcessor.setInterrupted(true)
}

recorder.onSpeechEnd = () => {
console.log('User stopped speaking')
audioProcessor.setInterrupted(false)
}

// Start continuous recording with WebSocket
await recorder.startContinuousRecording(
`wss://api.autessa.com/ws/clients/agents/execute?authorization=${apiKey}&resourceId=${agentId}&voiceModeEnabled=true`,
{
outputType: 'AUDIO',
agentId: agentId,
environmentVariables: {},
promptTemplateVariables: {}
}
)

// Later: cleanup
recorder.stopRecording()
audioProcessor.clear()
ws.close()

Voice Mode Protocol:

  1. Connect with voiceModeEnabled=true
  2. Wait for connection status message
  3. Send VOICE establishment message with audio format
  4. Stream raw audio bytes as binary WebSocket messages
  5. Receive streamed audio chunks in real-time

For complete implementation with Voice Activity Detection and audio utilities, see the Audio Recording and Audio Playback guides.

Voice Establishment Message

After connecting, send this message to establish voice mode:

VOICE Message
{
"messageType": "VOICE",
"agentId": 123,
"audioFormat": {
"sampleRate": 16000,
"sampleSizeInBits": 32,
"channels": 1,
"signed": true,
"bigEndian": false
},
"outputType": "AUDIO",
"environmentVariables": {},
"promptTemplateVariables": {}
}

After sending the establishment message:

  • Send raw audio bytes as binary WebSocket messages (ArrayBuffer/bytes)
  • Receive JSON messages with audio chunks and transcriptions
  • Audio format: Float32Array converted to ArrayBuffer (32-bit float, mono, 16kHz)

Audio Streaming Format

Input Audio (Client → Server):

  • Send raw ArrayBuffer from Float32Array
  • 16kHz sample rate, 32-bit float, mono
  • Stream continuously; the recorder uses Voice Activity Detection to tell speech from silence
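Turning recorded samples into the binary frame described above is a small conversion. The sketch below assumes the samples are already mono Float32 at 16 kHz; it relies on typed arrays using the platform byte order, which is little-endian on practically all current hardware and matches bigEndian: false. `toBinaryFrame` is a hypothetical helper name.

```javascript
// Converts Float32 samples (range -1..1) into the raw bytes sent as a binary frame.
function toBinaryFrame(samples) {
  const floats = samples instanceof Float32Array ? samples : Float32Array.from(samples);
  // slice() copies exactly the viewed region, even if `floats` is a subarray.
  return floats.buffer.slice(floats.byteOffset, floats.byteOffset + floats.byteLength);
}

// 100 ms of silence at 16 kHz: 1600 samples, 4 bytes each.
const frame = toBinaryFrame(new Float32Array(1600));
console.log(frame.byteLength); // 6400
```

The resulting ArrayBuffer can be passed directly to `ws.send(frame)` once voice mode is established.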

Output Audio (Server → Client):

{
"streamedOutput": {
"outputType": "AUDIO",
"sampleRate": 16000,
"base64Audio": "UklGRiQAAABXQVZF...",
"transcription": "Hello! How can I help you today?",
"isFinalChunk": false
}
}

Complete Voice Mode Example

This example combines everything for a working voice conversation:

Complete TypeScript Example
import { AudioRecorder } from './audio-recorder'
import { AudioProcessor } from './audio-processor'

class VoiceConversation {
private agentId: number
private apiKey: string
private ws: WebSocket | null = null
private recorder: AudioRecorder
private audioProcessor: AudioProcessor
private conversationId: string | null = null

constructor(agentId: number, apiKey: string) {
this.agentId = agentId
this.apiKey = apiKey
this.recorder = new AudioRecorder(16000)
this.audioProcessor = new AudioProcessor()
this.setupCallbacks()
}

private setupCallbacks() {
// Handle user speech detection
this.recorder.onSpeechStart = () => {
console.log('👤 User speaking...')
this.audioProcessor.setInterrupted(true)
}

this.recorder.onSpeechEnd = () => {
console.log('👤 User stopped')
this.audioProcessor.setInterrupted(false)
}
}

async start() {
const wsUrl = `wss://api.autessa.com/ws/clients/agents/execute?authorization=${this.apiKey}&resourceId=${this.agentId}&voiceModeEnabled=true`

this.ws = new WebSocket(wsUrl)

this.ws.onopen = () => {
console.log('✅ Connected')
}

this.ws.onmessage = async (event) => {
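// Ignore binary frames; status and audio chunks arrive as JSON text
if (typeof event.data !== 'string') return
const data = JSON.parse(event.data)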

// Connection status
if (data.status) {
this.conversationId = data.conversationId
console.log('💬 Conversation:', this.conversationId)

// Establish voice mode
await this.recorder.startContinuousRecording(wsUrl, {
outputType: 'AUDIO',
agentId: this.agentId,
environmentVariables: {},
promptTemplateVariables: {}
})

console.log('🎤 Voice mode active')
return
}

// Audio responses
if (data.streamedOutput?.outputType === 'AUDIO') {
const { base64Audio, sampleRate, transcription, isFinalChunk } = data.streamedOutput

console.log('🤖 Agent:', transcription)

// Play audio
await this.audioProcessor.processAudioChunk(base64Audio, sampleRate, transcription)

const status = this.audioProcessor.getStatus()
if (status.totalItems === 1 && !status.isPlaying) {
this.audioProcessor.play()
}

if (isFinalChunk) {
console.log('🤖 Agent finished')
}
}
}

this.ws.onerror = (error) => {
console.error('❌ Error:', error)
}

this.ws.onclose = () => {
console.log('👋 Disconnected')
this.cleanup()
}
}

cleanup() {
this.recorder.stopRecording()
this.audioProcessor.clear()
if (this.ws) {
this.ws.close()
this.ws = null
}
}
}

// Usage
const conversation = new VoiceConversation(123, 'your_api_key')
await conversation.start()

// Later: cleanup
conversation.cleanup()

Production Considerations:

  1. Interruption Handling: The example above handles user interruptions by pausing agent audio when the user starts speaking
  2. Voice Activity Detection: The AudioRecorder includes VAD to detect speech vs silence
  3. Memory Management: Always call cleanup methods to free blob URLs and close connections
  4. Error Handling: Implement reconnection logic for production use
  5. Audio Format: Use 16kHz, mono, Float32 for recording; receive Int16 PCM for playback
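For the reconnection logic mentioned in item 4, exponential backoff with jitter is a common choice. The sketch below is one possible policy, not something the API prescribes: the base delay, cap, and jitter range are assumptions, and `backoffDelayMs` is a hypothetical helper name. The random source is injectable so the function can be tested deterministically.

```javascript
// Delay before reconnect attempt `attempt` (0-based): doubles each time up to
// maxMs, then is scaled by a random factor between 0.5 and 1 to spread out retries.
function backoffDelayMs(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const capped = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.round(capped * (0.5 + random() / 2));
}

for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`attempt ${attempt}: wait up to ${backoffDelayMs(attempt, { random: () => 1 })} ms`);
}
```

Because a disconnect auto-closes the conversation, a reconnecting client would pass the previous conversationId as a query parameter so the server revives it.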

For complete utility class implementations, see the Audio Recording and Audio Playback guides.