Agent API

The Agent API allows you to create and manage conversations with AI agents, execute them with text or audio input, and receive real-time streaming responses.

All REST requests require the resourceId query parameter set to your agent ID and an Authorization header with your API key. WebSocket connections pass both the agent ID and the API key as query parameters.

Base URL

REST API:   https://api.autessa.com/clients/agents
WebSocket:  wss://api.autessa.com/ws/clients/agents/execute

Authentication

HTTP REST APIs

Pass your API key in the Authorization header (no "Bearer" prefix):

Authorization: your_api_key_here

WebSocket APIs

Pass your API key as a query parameter:

authorization=your_api_key_here

Examples (REST, then WebSocket):

curl --request POST \
  --url 'https://api.autessa.com/clients/agents/execute-create?resourceId=123' \
  --header 'Authorization: your_api_key' \
  --header 'Content-Type: application/json' \
  --data '{"agentId": 123}'

websocat "wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123"
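
The REST authentication rules above can be wrapped in a small helper. This is a minimal Python sketch, not an official client: the function name `rest_request_parts` and the `REST_BASE` constant are illustrative assumptions, and only the URL shape and header format come from this page.

```python
from urllib.parse import urlencode

# Base URL as documented above.
REST_BASE = "https://api.autessa.com/clients/agents"


def rest_request_parts(path: str, agent_id: int, api_key: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for a REST call. Hypothetical helper, not part of the API."""
    query = urlencode({"resourceId": agent_id})
    url = f"{REST_BASE}/{path.lstrip('/')}?{query}"
    headers = {
        "Authorization": api_key,  # raw key, no "Bearer " prefix
        "Content-Type": "application/json",
    }
    return url, headers


url, headers = rest_request_parts("execute-create", 123, "your_api_key")
print(url)
print(headers["Authorization"])
```

The returned pair can be handed to any HTTP client together with a JSON body.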

REST API Endpoints

Create Conversation: POST /execute-create

Creates a new conversation for multi-turn agent interactions. A conversation maintains context across multiple agent executions. Single-shot requests do not need one; create a conversation only when you want multi-turn interactions.

Note: Once created, a conversation is tied to a specific agent version. Even if you publish a new version, existing conversations will continue using the original version.

Query Parameters

resourceId (integer) - The ID of the agent.

Request Body

agentId (integer) - The ID of the agent to create a conversation for.
version (CustomVersion) - Optional version override. If not specified, the PUBLISHED version is used.

Response

conversationId (string) - The unique ID for the created conversation. Use this in subsequent execute requests.
errors (array<string>) - Array of error messages if any occurred.

Request examples (cURL, then JavaScript):

  curl --request POST \
    --url 'https://api.autessa.com/clients/agents/execute-create?resourceId=123' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: your_api_key' \
    --data '{
      "agentId": 123
    }'

const response = await fetch(
  'https://api.autessa.com/clients/agents/execute-create?resourceId=123',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'your_api_key'
    },
    body: JSON.stringify({
      agentId: 123
    })
  }
);

const data = await response.json();
console.log(data.conversationId);

JSON
{
  "conversationId": "conv_a1b2c3d4e5",
  "errors": []
}

Execute Agent: POST /execute

Executes an agent synchronously with multimodal input (text or audio). This endpoint returns the complete response once processing is done.

Note: Synchronous execution only supports TEXT output. For AUDIO output or real-time streaming, use the WebSocket endpoint.

Query Parameters

resourceId (integer) - The ID of the agent.

Request Body

agentId (integer) - The ID of the agent to execute.
conversationId (string) - The conversation ID from /execute-create. Required for multi-turn conversations.
input (array<MultimodalInput>) - Array of multimodal inputs. See Multimodal Input.
environmentVariables (object) - Key-value pairs for environment variables that won't be stored on the server.
promptTemplateVariables (object) - Key-value pairs to inject into the agent's prompt template. Variables are referenced in the agent instructions using @@{variableName}@@ syntax.
version (CustomVersion) - Version override. Ignored if using an existing conversationId.
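
To make the @@{variableName}@@ placeholder syntax concrete, the following Python sketch previews a template locally. It is only an illustration of the syntax: the substitution itself happens on the server, and how the server treats missing or extra variables is not documented here. The helper name `preview_prompt` and the sample template are assumptions.

```python
import re

# Matches placeholders of the form @@{variableName}@@.
PLACEHOLDER = re.compile(r"@@\{(\w+)\}@@")


def preview_prompt(template: str, variables: dict) -> str:
    """Substitute placeholders locally; unknown names are left untouched (an assumption)."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return PLACEHOLDER.sub(replace, template)


# Hypothetical agent instructions using two template variables.
template = "You are assisting @@{customerName}@@, a @@{accountType}@@ customer."
preview = preview_prompt(template, {"customerName": "John Doe", "accountType": "Premium"})
print(preview)
```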

Response

output (array<MultimodalOutput>) - Array of output responses from the agent.
errors (array<string>) - Array of error messages if any occurred.

Request examples (cURL, then JavaScript):

  curl --request POST \
    --url 'https://api.autessa.com/clients/agents/execute?resourceId=123' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: your_api_key' \
    --data '{
      "agentId": 123,
      "conversationId": "conv_a1b2c3d4e5",
      "input": [
        {
          "inputType": "TEXT",
          "content": "What is the weather today?"
        }
      ],
      "environmentVariables": {
        "USER_LOCATION": "Seattle"
      },
      "promptTemplateVariables": {
        "customerName": "John Doe",
        "accountType": "Premium"
      }
    }'

const response = await fetch(
  'https://api.autessa.com/clients/agents/execute?resourceId=123',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'your_api_key'
    },
    body: JSON.stringify({
      agentId: 123,
      conversationId: 'conv_a1b2c3d4e5',
      input: [
        {
          inputType: 'TEXT',
          content: 'What is the weather today?'
        }
      ],
      environmentVariables: {
        USER_LOCATION: 'Seattle'
      },
      promptTemplateVariables: {
        customerName: 'John Doe',
        accountType: 'Premium'
      }
    })
  }
);

const data = await response.json();

JSON
{
  "output": [
    {
      "outputType": "TEXT",
      "content": "The weather in Seattle today is 65°F with partly cloudy skies."
    }
  ],
  "errors": []
}
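
A caller typically needs to check errors before reading output. The sketch below shows one way to do that in Python against the response shape above; the function name `extract_text` and the choice to raise on a non-empty errors array are assumptions, not documented behavior.

```python
def extract_text(response: dict) -> str:
    """Join all TEXT outputs of an /execute response, raising if errors are present."""
    errors = response.get("errors") or []
    if errors:
        raise RuntimeError("; ".join(errors))
    return "".join(
        item["content"]
        for item in response.get("output", [])
        if item.get("outputType") == "TEXT"
    )


# Sample response copied from the example above.
sample = {
    "output": [
        {
            "outputType": "TEXT",
            "content": "The weather in Seattle today is 65°F with partly cloudy skies.",
        }
    ],
    "errors": [],
}
text = extract_text(sample)
print(text)
```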

Close Conversation: POST /execute-close

Closes an existing conversation. This triggers logging and any configured auto-evaluation for the conversation. It's safe to call this even for expired conversations.

Important: Conversations automatically close after 12 hours of inactivity, but it's best practice to explicitly close them when done.

Query Parameters

resourceId (integer) - The ID of the agent.

Request Body

agentId (integer) - The ID of the agent.
conversationId (string) - The conversation ID to close.

Response

errors (array<string>) - Array of error messages if any occurred.

Request examples (cURL, then JavaScript):

  curl --request POST \
    --url 'https://api.autessa.com/clients/agents/execute-close?resourceId=123' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: your_api_key' \
    --data '{
      "agentId": 123,
      "conversationId": "conv_a1b2c3d4e5"
    }'

await fetch(
  'https://api.autessa.com/clients/agents/execute-close?resourceId=123',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'your_api_key'
    },
    body: JSON.stringify({
      agentId: 123,
      conversationId: 'conv_a1b2c3d4e5'
    })
  }
);

JSON
{
  "errors": []
}

Get Conversation: POST /conversation

Retrieves the complete conversation history including all messages and evaluation context.

Query Parameters

resourceId (integer) - The ID of the agent.

Request Body

agentId (integer) - The ID of the agent.
conversationId (string) - The conversation ID to retrieve.

Response

conversation (AgentConversationDto) - The complete conversation data including logs and evaluation context.
errors (array<string>) - Array of error messages if any occurred.

cURL
curl --request POST \
  --url 'https://api.autessa.com/clients/agents/conversation?resourceId=123' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: your_api_key' \
  --data '{
    "agentId": 123,
    "conversationId": "conv_a1b2c3d4e5"
  }'

JSON
{
  "conversation": {
    "resourceId": 123,
    "version": "1",
    "conversationId": "conv_a1b2c3d4e5",
    "apiKeyId": "key_xyz",
    "closeDate": null,
    "logs": [
      {
        "logType": "USER",
        "userMultimodalInputs": [
          {
            "inputType": "TEXT",
            "content": "What is the weather in Seattle?"
          }
        ]
      },
      {
        "logType": "TOOL_REQUEST",
        "executeToolRequest": {
          "toolCallId": "call_abc123",
          "name": "GetWeather",
          "variables": {
            "location": "Seattle"
          }
        }
      },
      {
        "logType": "TOOL_RESPONSE",
        "executeToolResponse": {
          "toolCallId": "call_abc123",
          "toolName": "GetWeather",
          "result": {
            "temperature": 65,
            "condition": "Partly Cloudy"
          }
        }
      },
      {
        "logType": "MESSAGE_TO_USER",
        "multimodalOutput": {
          "outputType": "TEXT",
          "content": "The weather in Seattle is currently 65°F with partly cloudy skies."
        }
      }
    ]
  },
  "errors": []
}

Conversation Log Structure

Conversation logs returned by the /conversation endpoint contain detailed information about all interactions in a conversation. Each log entry has a logType field that determines which other fields are present.

Log Types

USER - User input to the agent. Contains userMultimodalInputs array with text or audio inputs.
MESSAGE_TO_USER - Agent's response to the user. Contains multimodalOutput with text or audio output.
TOOL_REQUEST - Request to execute a tool. Contains executeToolRequest with tool name and variables.
TOOL_RESPONSE - Result from tool execution. Contains executeToolResponse with the tool's result.

AgentLog Structure

Each log entry has one of these structures based on its logType:

USER Log:

logType (string) - "USER"
userMultimodalInputs (array<MultimodalInput>) - Array of user inputs (text or audio). See Multimodal Input.

MESSAGE_TO_USER Log:

logType (string) - "MESSAGE_TO_USER"
multimodalOutput (MultimodalOutput) - The agent's response (text or audio).

TOOL_REQUEST Log:

logType (string) - "TOOL_REQUEST"
executeToolRequest (object) - Tool execution request details.
executeToolRequest.toolCallId (string) - Unique identifier for this tool call.
executeToolRequest.name (string) - Name of the tool being invoked.
executeToolRequest.variables (object) - Key-value pairs of arguments passed to the tool.

TOOL_RESPONSE Log:

logType (string) - "TOOL_RESPONSE"
executeToolResponse (object) - Tool execution result.
executeToolResponse.toolCallId (string) - Matches the toolCallId from the corresponding TOOL_REQUEST.
executeToolResponse.toolName (string) - Name of the tool that was executed.
executeToolResponse.result (any) - The result returned by the tool.

Example log entries:

USER log (text input):

{
  "logType": "USER",
  "userMultimodalInputs": [
    {
      "inputType": "TEXT",
      "content": "What is the weather?"
    }
  ]
}

USER log (audio input):

{
  "logType": "USER",
  "userMultimodalInputs": [
    {
      "inputType": "AUDIO",
      "s3Uri": "s3://autessa-audio/...",
      "audioFormat": {
        "sampleRate": 16000,
        "sampleSizeInBits": 16,
        "channels": 1,
        "signed": true,
        "bigEndian": false
      },
      "transcription": "What is the weather?"
    }
  ]
}

MESSAGE_TO_USER log (text output):

{
  "logType": "MESSAGE_TO_USER",
  "multimodalOutput": {
    "outputType": "TEXT",
    "content": "It's 65°F and sunny."
  }
}

MESSAGE_TO_USER log (audio output):

{
  "logType": "MESSAGE_TO_USER",
  "multimodalOutput": {
    "outputType": "AUDIO",
    "sampleRate": 16000,
    "base64Audio": "UklGRiQAAABXQVZF...",
    "transcription": "It's 65 degrees and sunny.",
    "isFinalChunk": true
  }
}

TOOL_REQUEST log:

{
  "logType": "TOOL_REQUEST",
  "executeToolRequest": {
    "toolCallId": "call_123",
    "name": "GetWeather",
    "variables": {
      "location": "Seattle",
      "units": "fahrenheit"
    }
  }
}

TOOL_RESPONSE log:

{
  "logType": "TOOL_RESPONSE",
  "executeToolResponse": {
    "toolCallId": "call_123",
    "toolName": "GetWeather",
    "result": {
      "temperature": 65,
      "condition": "Sunny"
    }
  }
}
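
Because each log entry carries different fields depending on logType, client code usually dispatches on that field. The following Python sketch turns a logs array into a readable transcript and pairs each TOOL_RESPONSE with its TOOL_REQUEST via toolCallId. The function name `summarize_logs` and the output line format are illustrative assumptions.

```python
def summarize_logs(logs: list[dict]) -> list[str]:
    """Render conversation logs as transcript lines, matching tool calls by toolCallId."""
    pending: dict[str, dict] = {}  # toolCallId -> executeToolRequest awaiting a response
    lines: list[str] = []
    for entry in logs:
        kind = entry["logType"]
        if kind == "USER":
            for item in entry["userMultimodalInputs"]:
                # Text inputs carry content; audio inputs may carry a transcription.
                text = item.get("content") if item["inputType"] == "TEXT" else item.get("transcription")
                lines.append(f"user: {text}")
        elif kind == "TOOL_REQUEST":
            request = entry["executeToolRequest"]
            pending[request["toolCallId"]] = request
        elif kind == "TOOL_RESPONSE":
            response = entry["executeToolResponse"]
            request = pending.pop(response["toolCallId"], None)
            args = request["variables"] if request else "?"
            lines.append(f"tool {response['toolName']}({args}) -> {response['result']}")
        elif kind == "MESSAGE_TO_USER":
            output = entry["multimodalOutput"]
            text = output.get("content") if output["outputType"] == "TEXT" else output.get("transcription")
            lines.append(f"agent: {text}")
    return lines


logs = [
    {"logType": "USER", "userMultimodalInputs": [{"inputType": "TEXT", "content": "What is the weather?"}]},
    {"logType": "TOOL_REQUEST", "executeToolRequest": {"toolCallId": "call_123", "name": "GetWeather", "variables": {"location": "Seattle"}}},
    {"logType": "TOOL_RESPONSE", "executeToolResponse": {"toolCallId": "call_123", "toolName": "GetWeather", "result": {"temperature": 65}}},
    {"logType": "MESSAGE_TO_USER", "multimodalOutput": {"outputType": "TEXT", "content": "It's 65°F and sunny."}},
]
lines = summarize_logs(logs)
print("\n".join(lines))
```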

Get Conversation Status: POST /conversation-status

Checks the current status of a conversation (IN_PROGRESS, CLOSED, or NOT_FOUND).

Query Parameters

resourceId (integer) - The ID of the agent.

Request Body

agentId (integer) - The ID of the agent.
conversationId (string) - The conversation ID to check.

Response

conversationId (string) - The conversation ID that was checked.
status (string) - One of: IN_PROGRESS, CLOSED, NOT_FOUND.
errors (array<string>) - Array of error messages if any occurred.

cURL
curl --request POST \
  --url 'https://api.autessa.com/clients/agents/conversation-status?resourceId=123' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: your_api_key' \
  --data '{
    "agentId": 123,
    "conversationId": "conv_a1b2c3d4e5"
  }'

JSON
{
  "conversationId": "conv_a1b2c3d4e5",
  "status": "IN_PROGRESS",
  "errors": []
}

Revive Conversation: POST /revive-conversation

Revives a closed conversation, allowing you to continue the interaction where it left off.

Query Parameters

resourceId (integer) - The ID of the agent.

Request Body

agentId (integer) - The ID of the agent.
conversationId (string) - The conversation ID to revive.

Response

conversationId (string) - The revived conversation ID.
status (string) - Status message about the revival operation.
errors (array<string>) - Array of error messages if any occurred.

cURL
curl --request POST \
  --url 'https://api.autessa.com/clients/agents/revive-conversation?resourceId=123' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: your_api_key' \
  --data '{
    "agentId": 123,
    "conversationId": "conv_a1b2c3d4e5"
  }'

JSON
{
  "conversationId": "conv_a1b2c3d4e5",
  "status": "REVIVED",
  "errors": []
}
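
The status and revive endpoints are often used together when resuming a stored conversation. The Python sketch below shows one plausible policy for choosing the next endpoint from a /conversation-status result. The mapping is an assumption about client design, not documented server behavior, and `next_step` is a hypothetical name.

```python
def next_step(status: str) -> str:
    """Map a conversation status to the endpoint path a client might call next."""
    if status == "IN_PROGRESS":
        return "execute"  # conversation is active, keep sending messages
    if status == "CLOSED":
        return "revive-conversation"  # reopen before executing again
    if status == "NOT_FOUND":
        return "execute-create"  # start a fresh conversation
    raise ValueError(f"unexpected conversation status: {status!r}")


for status in ("IN_PROGRESS", "CLOSED", "NOT_FOUND"):
    print(status, "->", next_step(status))
```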

Generate Audio Upload Link: POST /generate-audio-upload-link

Generates a pre-signed S3 upload URL for audio files. Use this when you want to upload audio files separately rather than sending them as base64 in the request.

Note: Currently supports WAV format only.

Query Parameters

resourceId (integer) - The ID of the agent.

Request Body

No fields are currently required; send an empty JSON object. The WAV format is hardcoded.

Response

uploadUrl (string) - Pre-signed URL for uploading the audio file via PUT request.
s3Uri (string) - S3 URI to use in AudioInput when referencing this file.
expiresInSeconds (integer) - Time in seconds until the upload URL expires.
errors (array<string>) - Array of error messages if any occurred.

cURL
curl --request POST \
  --url 'https://api.autessa.com/clients/agents/generate-audio-upload-link?resourceId=123' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: your_api_key' \
  --data '{}'

JSON
{
  "uploadUrl": "https://s3.amazonaws.com/autessa-audio/...",
  "s3Uri": "s3://autessa-audio/user-123/audio-xyz.wav",
  "expiresInSeconds": 3600,
  "errors": []
}

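Upload the file
# Step 1: Get an upload link with the request above; the response contains uploadUrl and s3Uri
# Step 2: Upload the audio file to uploadUrl with a PUT request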
curl --request PUT \
  --url 'https://s3.amazonaws.com/autessa-audio/...' \
  --header 'Content-Type: audio/wav' \
  --data-binary '@audio.wav'

# Step 3: Use s3Uri in AudioInput
# {
#   "inputType": "AUDIO",
#   "s3Uri": "s3://autessa-audio/user-123/audio-xyz.wav"
# }
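
The three steps above can be sketched in Python using only the standard library. This is a hedged outline rather than a reference implementation: the helper names `upload_wav` and `audio_input_from_link` are assumptions, and the upload function is defined but not called here because it needs a live pre-signed URL.

```python
import urllib.request


def upload_wav(upload_url: str, wav_bytes: bytes) -> int:
    """PUT the WAV bytes to the pre-signed URL and return the HTTP status code."""
    request = urllib.request.Request(
        upload_url,
        data=wav_bytes,
        method="PUT",
        headers={"Content-Type": "audio/wav"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def audio_input_from_link(link_response: dict) -> dict:
    """Build the AudioInput object that references the uploaded file by s3Uri."""
    errors = link_response.get("errors") or []
    if errors:
        raise RuntimeError("; ".join(errors))
    return {"inputType": "AUDIO", "s3Uri": link_response["s3Uri"]}


# Response shape copied from the example above.
link = {
    "uploadUrl": "https://s3.amazonaws.com/autessa-audio/...",
    "s3Uri": "s3://autessa-audio/user-123/audio-xyz.wav",
    "expiresInSeconds": 3600,
    "errors": [],
}
audio_input = audio_input_from_link(link)
print(audio_input)
```

The resulting object goes into the input array of an execute request.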

WebSocket API

Multimodal Agent Execution: WSS /ws/clients/agents/execute

Real-time agent execution with streaming responses. Supports both text and audio input/output with streaming capabilities.

Connection URL

wss://api.autessa.com/ws/clients/agents/execute?authorization={key}&resourceId={agentId}

Query Parameters

authorization (string) - Your API key.
resourceId (integer) - The agent ID.
conversationId (string) - Optional conversation ID to revive and continue.
voiceModeEnabled (boolean) - Set to "true" for voice mode with continuous audio streaming.
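
Since the API key and other values travel in the query string, it is safer to URL-encode them than to concatenate strings by hand. A minimal Python sketch follows; `build_ws_url` is a hypothetical helper, and only the parameter names come from the list above.

```python
from typing import Optional
from urllib.parse import urlencode

WS_ENDPOINT = "wss://api.autessa.com/ws/clients/agents/execute"


def build_ws_url(
    api_key: str,
    agent_id: int,
    conversation_id: Optional[str] = None,
    voice_mode: bool = False,
) -> str:
    """Assemble the WebSocket connection URL with percent-encoded query parameters."""
    params = {"authorization": api_key, "resourceId": agent_id}
    if conversation_id is not None:
        params["conversationId"] = conversation_id  # revive and continue this conversation
    if voice_mode:
        params["voiceModeEnabled"] = "true"
    return f"{WS_ENDPOINT}?{urlencode(params)}"


ws_url = build_ws_url("your_api_key", 123, voice_mode=True)
print(ws_url)
```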

Connection Lifecycle

1. Connect - WebSocket connection established
2. Status Message - Server sends conversation status
3. Send Messages - Client sends MULTIMODAL or VOICE messages
4. Receive Streams - Server streams responses in real-time
5. Disconnect - Connection closes, conversation auto-closes

Connection examples (websocat, JavaScript, Python):

websocat "wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123"

const ws = new WebSocket(
  'wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123'
);

ws.onopen = () => {
  console.log('Connected to agent');
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Received:', data);
};

import websocket
import json

ws = websocket.WebSocket()
ws.connect(
  "wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123"
)

# Receive status
status = json.loads(ws.recv())
print(status)

Server-to-Client Messages

When you connect, the server immediately sends a status message:

The status is CREATED for a new conversation, REVIVED for a revived conversation, or ERROR on failure:

{
  "conversationId": "conv_abc123",
  "status": "CREATED",
  "errors": []
}

{
  "conversationId": "conv_abc123",
  "status": "REVIVED",
  "errors": []
}

{
  "conversationId": null,
  "status": "ERROR",
  "errors": ["Invalid API key"]
}

Client-to-Server Messages

MULTIMODAL Message

Send a JSON message to execute the agent with multimodal input:

messageType (string) - Must be "MULTIMODAL".
agentId (integer) - The agent ID.
input (array<MultimodalInput>) - Array of multimodal inputs (text or audio).
executionOutputMode (string) - Desired output format: "TEXT" or "AUDIO". Defaults to "TEXT".
environmentVariables (object) - Key-value pairs for environment variables that won't be stored on the server.
promptTemplateVariables (object) - Key-value pairs to inject into the agent's prompt template. Variables are referenced in the agent instructions using @@{variableName}@@ syntax.

JSON
{
  "messageType": "MULTIMODAL",
  "agentId": 123,
  "executionOutputMode": "TEXT",
  "input": [
    {
      "inputType": "TEXT",
      "content": "What's the weather in Seattle?"
    }
  ],
  "environmentVariables": {
    "USER_LOCATION": "Seattle"
  },
  "promptTemplateVariables": {
    "customerName": "John Doe",
    "accountType": "Premium"
  }
}

VOICE Message

Establish voice mode for continuous audio streaming:

messageType (string) - Must be "VOICE".
agentId (integer) - The agent ID.
audioFormat (AudioFormat) - Audio format specification for the audio bytes you'll send.
outputType (string) - Desired output format: "TEXT" or "AUDIO".
environmentVariables (object) - Key-value pairs for environment variables that won't be stored on the server.

JSON
{
  "messageType": "VOICE",
  "agentId": 123,
  "audioFormat": {
    "sampleRate": 16000,
    "sampleSizeInBits": 16,
    "channels": 1,
    "signed": true,
    "bigEndian": false
  },
  "outputType": "AUDIO",
  "environmentVariables": {}
}

After establishing voice mode, send raw audio bytes as binary WebSocket messages.

Streaming Response Messages

The server streams responses in real-time. There are three types of streamed messages, followed by a final response:

1. Text Stream Messages

JSON
{
  "streamedOutput": {
    "outputType": "TEXT",
    "content": " Seattle"
  }
}

Text is streamed token-by-token. Concatenate the content fields to build the full response.
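
The concatenation step can be sketched as follows in Python. The function name `accumulate_text` is illustrative; the message shape is the one shown above.

```python
from typing import Iterable


def accumulate_text(messages: Iterable[dict]) -> str:
    """Concatenate the content of every TEXT streamedOutput message, in arrival order."""
    parts = []
    for message in messages:
        output = message.get("streamedOutput")
        if output and output.get("outputType") == "TEXT":
            parts.append(output["content"])
    return "".join(parts)


# Hypothetical token stream; leading spaces belong to the tokens themselves.
stream = [
    {"streamedOutput": {"outputType": "TEXT", "content": "The"}},
    {"streamedOutput": {"outputType": "TEXT", "content": " weather"}},
    {"streamedOutput": {"outputType": "TEXT", "content": " in"}},
    {"streamedOutput": {"outputType": "TEXT", "content": " Seattle"}},
]
full_text = accumulate_text(stream)
print(full_text)
```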

2. Tool Stream Messages

The examples below show, in order, the tool start, tool variables, tool end, and tool result messages:

{
  "streamedToolMessages": {
    "responseType": "EXECUTE_TOOL",
    "toolName": "GetWeather",
    "invocationId": "inv_123",
    "publicVars": {
      "city": "Seattle",
      "state": "WA"
    },
    "streamMessage": "EXECUTION START: GetWeather"
  }
}

{
  "streamedToolMessages": {
    "responseType": "EXECUTE_TOOL",
    "toolName": "GetWeather",
    "invocationId": "inv_123",
    "streamMessage": "EXECUTION VARIABLES: {\"city\":\"Seattle\",\"state\":\"WA\"}"
  }
}

{
  "streamedToolMessages": {
    "responseType": "EXECUTE_TOOL",
    "toolName": "GetWeather",
    "invocationId": "inv_123",
    "streamMessage": "EXECUTION END: GetWeather"
  }
}

{
  "streamedToolMessages": {
    "responseType": "EXECUTE_TOOL",
    "toolName": "GetWeather",
    "invocationId": "inv_123",
    "streamMessage": "EXECUTION RESULT: \"It is 65°F and partly cloudy\""
  }
}
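
If you want structured data rather than display strings, the streamMessage field can be split on its prefix. The Python sketch below assumes the prefixes are exactly those shown in the examples above and that the VARIABLES and RESULT payloads are JSON, which the examples suggest but this page does not guarantee. `parse_tool_message` is a hypothetical helper.

```python
import json

# Prefixes as they appear in the example messages above (assumed to be stable).
PHASES = {
    "EXECUTION START": "start",
    "EXECUTION VARIABLES": "variables",
    "EXECUTION END": "end",
    "EXECUTION RESULT": "result",
}


def parse_tool_message(message: dict) -> tuple[str, object]:
    """Return (phase, payload) for a streamedToolMessages message."""
    tool = message["streamedToolMessages"]
    label, _, payload = tool["streamMessage"].partition(": ")
    phase = PHASES.get(label, "unknown")
    if phase in ("variables", "result"):
        try:
            return phase, json.loads(payload)
        except json.JSONDecodeError:
            pass  # fall back to the raw string if the payload is not JSON
    return phase, payload


variables_message = {
    "streamedToolMessages": {
        "responseType": "EXECUTE_TOOL",
        "toolName": "GetWeather",
        "invocationId": "inv_123",
        "streamMessage": "EXECUTION VARIABLES: {\"city\":\"Seattle\",\"state\":\"WA\"}",
    }
}
parsed = parse_tool_message(variables_message)
print(parsed)
```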

3. Audio Stream Messages

When executionOutputMode: "AUDIO" is requested:

JSON
{
  "streamedOutput": {
    "outputType": "AUDIO",
    "sampleRate": 16000,
    "base64Audio": "//8AAAMAAAD...",
    "transcription": "The weather in Seattle is 65 degrees",
    "isFinalChunk": false
  }
}

Audio is streamed in chunks. Decode the base64Audio and play it sequentially.

4. Final Response

After streaming completes, the server sends a final message with the complete output:

JSON
{
  "conversationId": "conv_abc123",
  "output": [
    {
      "outputType": "TEXT",
      "content": "The weather in Seattle is currently 65°F with partly cloudy skies."
    }
  ],
  "errors": []
}
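
With several message shapes arriving on one socket, a receive loop usually starts by classifying each message. The Python sketch below distinguishes them by their top-level keys as shown in the examples on this page; the labels returned by `classify` are illustrative.

```python
def classify(message: dict) -> str:
    """Label a server message by its top-level keys."""
    if "streamedOutput" in message:
        output_type = message["streamedOutput"].get("outputType")
        return "audio_chunk" if output_type == "AUDIO" else "text_chunk"
    if "streamedToolMessages" in message:
        return "tool"
    if "output" in message:
        return "final"  # complete output, sent after streaming ends
    if "status" in message:
        return "status"  # sent once, right after connecting
    return "unknown"


examples = [
    {"conversationId": "conv_abc123", "status": "CREATED", "errors": []},
    {"streamedOutput": {"outputType": "TEXT", "content": " Seattle"}},
    {"streamedToolMessages": {"streamMessage": "EXECUTION START: GetWeather"}},
    {"conversationId": "conv_abc123", "output": [], "errors": []},
]
labels = [classify(message) for message in examples]
print(labels)
```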

Multimodal Input

Autessa supports multimodal input: you can send text, audio, or a combination of both. Each input is distinguished by its inputType field, which can be either "TEXT" or "AUDIO".

Text Input

inputType (string) - Must be "TEXT".
content (string) - The text message from the user.

JSON
{
  "inputType": "TEXT",
  "content": "What's the weather today?"
}

Audio Input

Audio can be provided in two ways: base64 encoded data or S3 URI.

inputType (string) - Must be "AUDIO".
base64EncodedData (string) - Base64 encoded audio data (option 1).
s3Uri (string) - S3 URI from /generate-audio-upload-link (option 2).
audioFormat (AudioFormat) - Audio format metadata.
transcription (string) - Optional pre-transcribed text.

Examples (base64, then S3 URI):

{
  "inputType": "AUDIO",
  "base64EncodedData": "UklGRiQAAABXQVZFZm10...",
  "audioFormat": {
    "sampleRate": 16000,
    "sampleSizeInBits": 16,
    "channels": 1,
    "signed": true,
    "bigEndian": false
  }
}

{
  "inputType": "AUDIO",
  "s3Uri": "s3://autessa-audio/user-123/audio.wav",
  "audioFormat": {
    "sampleRate": 16000,
    "sampleSizeInBits": 16,
    "channels": 1,
    "signed": true,
    "bigEndian": false
  }
}

AudioFormat

sampleRate (integer) - Samples per second in Hz. Common values: 16000 (wideband speech), 44100 (CD quality), 48000 (professional).
sampleSizeInBits (integer) - Bits per sample. Typical values: 8, 16, 24, 32.
channels (integer) - Number of audio channels. 1 = mono, 2 = stereo.
signed (boolean) - Whether samples are signed (true) or unsigned (false).
bigEndian (boolean) - Byte order: true = big-endian, false = little-endian.
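
These fields together determine the data rate of uncompressed audio, which is useful for sizing buffers or estimating clip length. The Python sketch below models AudioFormat as a dataclass and assumes uncompressed PCM samples; the class and its helper methods are illustrative, not part of the API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AudioFormat:
    """Mirror of the AudioFormat fields; field names match the JSON keys."""
    sampleRate: int
    sampleSizeInBits: int
    channels: int
    signed: bool
    bigEndian: bool

    @property
    def bytes_per_second(self) -> int:
        # samples/second * bytes/sample * channels, assuming uncompressed PCM
        return self.sampleRate * (self.sampleSizeInBits // 8) * self.channels

    def duration_seconds(self, num_bytes: int) -> float:
        """Estimate how long num_bytes of audio in this format lasts."""
        return num_bytes / self.bytes_per_second


fmt = AudioFormat(sampleRate=16000, sampleSizeInBits=16, channels=1, signed=True, bigEndian=False)
print(fmt.bytes_per_second)
print(fmt.duration_seconds(64000))
print(asdict(fmt))  # ready to embed as "audioFormat" in a request
```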

Multimodal Output

Agent responses can be in text or audio format. The output type is distinguished by the outputType field, which can be either "TEXT" or "AUDIO".

Text Output

outputType (string) - Must be "TEXT".
content (string) - The text response from the agent.

JSON
{
  "outputType": "TEXT",
  "content": "The weather in Seattle is 65°F with partly cloudy skies."
}

Audio Output

Audio output is used when you request executionOutputMode: "AUDIO" via WebSocket.

outputType (string) - Must be "AUDIO".
sampleRate (integer) - Samples per second in Hz (e.g., 16000).
base64Audio (string) - Base64 encoded audio data.
transcription (string) - Text transcription of the audio.
isFinalChunk (boolean) - Whether this is the final chunk of audio in the stream.

JSON
{
  "outputType": "AUDIO",
  "sampleRate": 16000,
  "base64Audio": "UklGRiQAAABXQVZFZm10...",
  "transcription": "The weather in Seattle is 65 degrees with partly cloudy skies.",
  "isFinalChunk": true
}

Custom Version

Override the default PUBLISHED version of an agent:

versionState (string) - Version state: "PUBLISHED", "DRAFT", or "ARCHIVED". Defaults to "PUBLISHED".
versionNumber (integer) - Specific version number to use (optional).

Examples (REST API request body, then WebSocket connection URL):

{
  "agentId": 123,
  "version": {
    "versionState": "DRAFT"
  },
  "input": [...]
}

wss://api.autessa.com/ws/clients/agents/execute?authorization=key&resourceId=123&versionState=DRAFT

Enumerations

EConversationStatus

IN_PROGRESS - Conversation is active
CLOSED - Conversation has been closed
NOT_FOUND - Conversation doesn't exist

ExecutionOutputMode

TEXT - Text output (default)
AUDIO - Audio output (base64 encoded)

EInputType

TEXT - Text input
AUDIO - Audio input

EOutputType

TEXT - Text output
AUDIO - Audio output

EVersionState

PUBLISHED - Published version (default)
DRAFT - Draft/sandbox version
ARCHIVED - Archived version

Rate Limiting

Agent execution endpoints are rate limited. Rate limit information is included in response headers:

X-RateLimit-Limit-Minute - Requests allowed per minute
X-RateLimit-Remaining-Minute - Remaining requests this minute
X-RateLimit-Limit-Day - Requests allowed per day
X-RateLimit-Remaining-Day - Remaining requests today

When rate limited, you'll receive a 429 status code with a Retry-After header.
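
A client can use these headers to decide how long to wait before retrying. The Python sketch below assumes Retry-After carries a number of seconds; the header can also be an HTTP date in general, in which case this sketch falls back to a default delay. The function names and the default value are assumptions.

```python
from typing import Mapping, Optional


def retry_delay(status_code: int, headers: Mapping[str, str], default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the request was not rate limited."""
    if status_code != 429:
        return None
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default  # header missing or not a plain number of seconds


def remaining_requests(headers: Mapping[str, str]) -> dict:
    """Read the remaining per-minute and per-day quotas, if the headers are present."""
    minute = headers.get("X-RateLimit-Remaining-Minute")
    day = headers.get("X-RateLimit-Remaining-Day")
    return {
        "minute": int(minute) if minute is not None else None,
        "day": int(day) if day is not None else None,
    }


# Hypothetical header values. A plain dict is case-sensitive, unlike most HTTP clients' header objects.
sample_headers = {"Retry-After": "12", "X-RateLimit-Remaining-Minute": "0", "X-RateLimit-Remaining-Day": "431"}
print(retry_delay(429, sample_headers))
print(remaining_requests(sample_headers))
```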

Complete Examples

Single-Turn Conversation

Examples (JavaScript, then Python):

// Simple one-shot agent execution
const response = await fetch(
  'https://api.autessa.com/clients/agents/execute?resourceId=123',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'your_api_key'
    },
    body: JSON.stringify({
      agentId: 123,
      input: [
        {
          inputType: 'TEXT',
          content: 'Tell me a joke'
        }
      ]
    })
  }
);

const data = await response.json();
console.log(data.output[0].content);

import requests

response = requests.post(
    'https://api.autessa.com/clients/agents/execute?resourceId=123',
    headers={
        'Content-Type': 'application/json',
        'Authorization': 'your_api_key'
    },
    json={
        'agentId': 123,
        'input': [
            {
                'inputType': 'TEXT',
                'content': 'Tell me a joke'
            }
        ]
    }
)

data = response.json()
print(data['output'][0]['content'])

Multi-Turn Conversation

JavaScript
// Create conversation
const createResp = await fetch(
  'https://api.autessa.com/clients/agents/execute-create?resourceId=123',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'your_api_key'
    },
    body: JSON.stringify({ agentId: 123 })
  }
);

const { conversationId } = await createResp.json();

// First message
const resp1 = await fetch(
  'https://api.autessa.com/clients/agents/execute?resourceId=123',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'your_api_key'
    },
    body: JSON.stringify({
      agentId: 123,
      conversationId,
      input: [{ inputType: 'TEXT', content: 'Hi, I need help booking a flight' }]
    })
  }
);

// Second message (continues conversation)
const resp2 = await fetch(
  'https://api.autessa.com/clients/agents/execute?resourceId=123',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'your_api_key'
    },
    body: JSON.stringify({
      agentId: 123,
      conversationId,
      input: [{ inputType: 'TEXT', content: 'From Seattle to San Francisco' }]
    })
  }
);

// Close conversation
await fetch(
  'https://api.autessa.com/clients/agents/execute-close?resourceId=123',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'your_api_key'
    },
    body: JSON.stringify({
      agentId: 123,
      conversationId
    })
  }
);

Mixed Text and Audio Input

JavaScript
// Example showing both text and audio inputs in one request
const response = await fetch(
  'https://api.autessa.com/clients/agents/execute?resourceId=123',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'your_api_key'
    },
    body: JSON.stringify({
      agentId: 123,
      conversationId: 'conv_a1b2c3d4e5',
      input: [
        // Text input
        {
          inputType: 'TEXT',
          content: 'Here is my question:'
        },
        // Audio input
        {
          inputType: 'AUDIO',
          s3Uri: 's3://autessa-audio/user-123/question.wav',
          audioFormat: {
            sampleRate: 16000,
            sampleSizeInBits: 16,
            channels: 1,
            signed: true,
            bigEndian: false
          }
        }
      ]
    })
  }
);

const data = await response.json();
// The outputType field identifies the format; synchronous execution returns TEXT only
console.log('Output type:', data.output[0].outputType); // "TEXT"

WebSocket Streaming

JavaScript
const ws = new WebSocket(
  'wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123'
);

let conversationId;
let fullResponse = '';

ws.onopen = () => {
  console.log('Connected');
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

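  // Connection status (CREATED, REVIVED, or ERROR)
  if (data.status) {
    if (data.status === 'ERROR') {
      console.error('Connection failed:', data.errors);
      ws.close();
      return;
    }
    conversationId = data.conversationId;
    console.log('Conversation ready:', conversationId);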

    // Send first message
    ws.send(JSON.stringify({
      messageType: 'MULTIMODAL',
      agentId: 123,
      executionOutputMode: 'TEXT',
      input: [
        {
          inputType: 'TEXT',
          content: 'What is the weather?'
        }
      ]
    }));
    return;
  }

  // Streaming text (audio chunks carry base64Audio instead of content)
  if (data.streamedOutput?.outputType === 'TEXT') {
    fullResponse += data.streamedOutput.content;
    console.log('Stream:', data.streamedOutput.content);
  }

  // Tool messages
  if (data.streamedToolMessages) {
    console.log('Tool:', data.streamedToolMessages.streamMessage);
  }

  // Final output
  if (data.output) {
    console.log('Final response:', data.output[0].content);
    console.log('Full streamed response:', fullResponse);
  }
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('Disconnected');
};

Real-time Voice Mode

For conversational AI with continuous audio streaming, use the voiceModeEnabled=true parameter. This enables bidirectional audio streaming for natural voice conversations.

Examples (JavaScript, then Python):

const agentId = 123
const apiKey = 'your_api_key'

// Connect with voice mode enabled
const ws = new WebSocket(
  `wss://api.autessa.com/ws/clients/agents/execute?authorization=${apiKey}&resourceId=${agentId}&voiceModeEnabled=true`
)

let conversationId

ws.onopen = () => {
  console.log('Connected in voice mode')
}

ws.onmessage = async (event) => {
  // Handle connection status
  if (typeof event.data === 'string') {
    const data = JSON.parse(event.data)

    if (data.status) {
      conversationId = data.conversationId
      console.log('Conversation created:', conversationId)

      // Send voice establishment message
      ws.send(JSON.stringify({
        messageType: 'VOICE',
        agentId: agentId,
        audioFormat: {
          sampleRate: 16000,
          sampleSizeInBits: 32,
          channels: 1,
          signed: true,
          bigEndian: false
        },
        outputType: 'AUDIO',
        environmentVariables: {},
        promptTemplateVariables: {}
      }))

      console.log('Voice mode established')
      return
    }

    // Handle audio output chunks
    if (data.streamedOutput?.outputType === 'AUDIO') {
      const { base64Audio, sampleRate, transcription, isFinalChunk } = data.streamedOutput

      console.log('Agent:', transcription)

      // Process audio chunk for playback. audioProcessor is a playback helper that is
      // not defined in this snippet; see: https://docs.autessa.com/api/audio-playback
      await audioProcessor.processAudioChunk(base64Audio, sampleRate, transcription)

      // Auto-start playback
      const status = audioProcessor.getStatus()
      if (status.totalItems === 1 && !status.isPlaying) {
        audioProcessor.play()
      }

      if (isFinalChunk) {
        console.log('Agent finished speaking')
      }
    }
  }
}

// Start recording and stream audio
// See complete implementation at: https://docs.autessa.com/api/audio-record
const recorder = new AudioRecorder(16000)

recorder.onSpeechStart = () => {
  console.log('User started speaking')
  audioProcessor.setInterrupted(true)
}

recorder.onSpeechEnd = () => {
  console.log('User stopped speaking')
  audioProcessor.setInterrupted(false)
}

// Start continuous recording with WebSocket
await recorder.startContinuousRecording(
  `wss://api.autessa.com/ws/clients/agents/execute?authorization=${apiKey}&resourceId=${agentId}&voiceModeEnabled=true`,
  {
    outputType: 'AUDIO',
    agentId: agentId,
    environmentVariables: {},
    promptTemplateVariables: {}
  }
)

// Later: cleanup
recorder.stopRecording()
audioProcessor.clear()
ws.close()

import websocket  # from the websocket-client package
import json

agent_id = 123
api_key = 'your_api_key'

# Connect with voice mode
ws_url = f"wss://api.autessa.com/ws/clients/agents/execute?authorization={api_key}&resourceId={agent_id}&voiceModeEnabled=true"
ws = websocket.create_connection(ws_url)

# Receive connection status
status = json.loads(ws.recv())
conversation_id = status['conversationId']
print(f"Connected: {conversation_id}")

# Send voice establishment message
ws.send(json.dumps({
    "messageType": "VOICE",
    "agentId": agent_id,
    "audioFormat": {
        "sampleRate": 16000,
        "sampleSizeInBits": 32,
        "channels": 1,
        "signed": True,
        "bigEndian": False
    },
    "outputType": "AUDIO",
    "environmentVariables": {},
    "promptTemplateVariables": {}
}))

print("Voice mode established")

# Now stream audio bytes
# See Python audio recording implementation in docs
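try:
    while True:
        # Receive audio or text responses
        response = json.loads(ws.recv())

        output = response.get('streamedOutput')
        if output and output['outputType'] == 'AUDIO':
            print(f"Agent: {output['transcription']}")
            # Process audio chunk
            # See: https://docs.autessa.com/api/audio-playback
except (KeyboardInterrupt, websocket.WebSocketConnectionClosedException):
    # Stop on Ctrl+C or when the server closes the connection
    pass
finally:
    # Always release the connection, even if the loop exits with an error
    ws.close()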

Voice Mode Protocol:

Connect with voiceModeEnabled=true
Wait for connection status message
Send VOICE establishment message with audio format
Stream raw audio bytes as binary WebSocket messages
Receive streamed audio chunks in real-time

For complete implementation with Voice Activity Detection and audio utilities, see the Audio Recording and Audio Playback guides.

Voice Establishment Message

After connecting, send this message to establish voice mode:

VOICE Message
{
  "messageType": "VOICE",
  "agentId": 123,
  "audioFormat": {
    "sampleRate": 16000,
    "sampleSizeInBits": 32,
    "channels": 1,
    "signed": true,
    "bigEndian": false
  },
  "outputType": "AUDIO",
  "environmentVariables": {},
  "promptTemplateVariables": {}
}

After sending the establishment message:

Send raw audio bytes as binary WebSocket messages (ArrayBuffer/bytes)
Receive JSON messages with audio chunks and transcriptions
Audio format: Float32Array converted to ArrayBuffer (32-bit float, mono, 16kHz)

Audio Streaming Format

Input Audio (Client → Server):

Send raw ArrayBuffer from Float32Array
16kHz sample rate, 32-bit float, mono
Stream continuously; speech is detected via Voice Activity Detection
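
As a rough illustration of the input format above, the following Python sketch packs mono samples into little-endian 32-bit floats and splits them into fixed-size binary frames. The helper names (`pack_float32`, `iter_frames`) and the 100 ms frame length are assumptions made for this example, not part of the API. The byte order follows the `bigEndian: false` setting in the establishment message.

```python
import struct
from typing import Iterator, Sequence

SAMPLE_RATE = 16000      # Hz, as declared in audioFormat.sampleRate
BYTES_PER_SAMPLE = 4     # 32-bit float


def pack_float32(samples: Sequence[float]) -> bytes:
    """Pack mono samples (nominally in [-1.0, 1.0]) as little-endian float32."""
    return struct.pack(f"<{len(samples)}f", *samples)


def iter_frames(pcm: bytes, frame_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive chunks of about frame_ms milliseconds each.

    The frame length is an arbitrary choice for this sketch.
    """
    frame_bytes = SAMPLE_RATE * frame_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


# One second of silence is 64000 bytes, split here into ten 100 ms frames
pcm = pack_float32([0.0] * SAMPLE_RATE)
frames = list(iter_frames(pcm))
```

With the websocket-client connection from the Python example, each frame could then be sent as a binary message, for instance with `ws.send_binary(frame)`.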

Output Audio (Server → Client):

{
  "streamedOutput": {
    "outputType": "AUDIO",
    "sampleRate": 16000,
    "base64Audio": "UklGRiQAAABXQVZF...",
    "transcription": "Hello! How can I help you today?",
    "isFinalChunk": false
  }
}
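
A receiver might decode such a message as sketched below. The function name `decode_audio_chunk` and the `AudioChunk` container are hypothetical. The sketch also records whether the payload starts with a RIFF prefix, because the sample `base64Audio` value above decodes to bytes beginning with `RIFF`; whether every chunk carries a WAV header is an assumption to verify against the Audio Playback guide.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioChunk:
    audio: bytes           # decoded payload, exactly as sent by the server
    sample_rate: int
    transcription: str
    is_final: bool
    has_riff_header: bool  # True if the payload starts with b"RIFF"


def decode_audio_chunk(raw_message: str) -> Optional[AudioChunk]:
    """Return an AudioChunk for AUDIO stream messages, otherwise None."""
    message = json.loads(raw_message)
    output = message.get("streamedOutput")
    if not output or output.get("outputType") != "AUDIO":
        return None
    audio = base64.b64decode(output["base64Audio"])
    return AudioChunk(
        audio=audio,
        sample_rate=output["sampleRate"],
        transcription=output.get("transcription", ""),
        is_final=bool(output.get("isFinalChunk", False)),
        has_riff_header=audio[:4] == b"RIFF",
    )


# Message shaped like the example above (base64 value shortened to its prefix)
example = json.dumps({
    "streamedOutput": {
        "outputType": "AUDIO",
        "sampleRate": 16000,
        "base64Audio": "UklGRiQAAABXQVZF",
        "transcription": "Hello! How can I help you today?",
        "isFinalChunk": False,
    }
})
chunk = decode_audio_chunk(example)
```

Returning `None` for non-audio messages lets the same receive loop pass connection status and text messages on to other handlers.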

Complete Voice Mode Example

This example combines everything for a working voice conversation:

Complete TypeScript Example
import { AudioRecorder } from './audio-recorder'
import { AudioProcessor } from './audio-processor'

class VoiceConversation {
  private agentId: number
  private apiKey: string
  private ws: WebSocket | null = null
  private recorder: AudioRecorder
  private audioProcessor: AudioProcessor
  private conversationId: string | null = null

  constructor(agentId: number, apiKey: string) {
    this.agentId = agentId
    this.apiKey = apiKey
    this.recorder = new AudioRecorder(16000)
    this.audioProcessor = new AudioProcessor()
    this.setupCallbacks()
  }

  private setupCallbacks() {
    // Handle user speech detection
    this.recorder.onSpeechStart = () => {
      console.log('👤 User speaking...')
      this.audioProcessor.setInterrupted(true)
    }

    this.recorder.onSpeechEnd = () => {
      console.log('👤 User stopped')
      this.audioProcessor.setInterrupted(false)
    }
  }

  async start() {
    const wsUrl = `wss://api.autessa.com/ws/clients/agents/execute?authorization=${this.apiKey}&resourceId=${this.agentId}&voiceModeEnabled=true`

    this.ws = new WebSocket(wsUrl)

    this.ws.onopen = () => {
      console.log('✅ Connected')
    }

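    this.ws.onmessage = async (event) => {
      // Server messages are JSON text frames; ignore anything else
      if (typeof event.data !== 'string') return
      const data = JSON.parse(event.data)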

      // Connection status
      if (data.status) {
        this.conversationId = data.conversationId
        console.log('💬 Conversation:', this.conversationId)

        // Establish voice mode
        await this.recorder.startContinuousRecording(wsUrl, {
          outputType: 'AUDIO',
          agentId: this.agentId,
          environmentVariables: {},
          promptTemplateVariables: {}
        })

        console.log('🎤 Voice mode active')
        return
      }

      // Audio responses
      if (data.streamedOutput?.outputType === 'AUDIO') {
        const { base64Audio, sampleRate, transcription, isFinalChunk } = data.streamedOutput

        console.log('🤖 Agent:', transcription)

        // Play audio
        await this.audioProcessor.processAudioChunk(base64Audio, sampleRate, transcription)

        const status = this.audioProcessor.getStatus()
        if (status.totalItems === 1 && !status.isPlaying) {
          this.audioProcessor.play()
        }

        if (isFinalChunk) {
          console.log('🤖 Agent finished')
        }
      }
    }

    this.ws.onerror = (error) => {
      console.error('❌ Error:', error)
    }

    this.ws.onclose = () => {
      console.log('👋 Disconnected')
      this.cleanup()
    }
  }

  cleanup() {
    this.recorder.stopRecording()
    this.audioProcessor.clear()
    if (this.ws) {
      this.ws.close()
      this.ws = null
    }
  }
}

// Usage
const conversation = new VoiceConversation(123, 'your_api_key')
await conversation.start()

// Later: cleanup
conversation.cleanup()

Production Considerations:

Interruption Handling: The example above handles user interruptions by pausing agent audio when the user starts speaking
Voice Activity Detection: The AudioRecorder includes VAD to detect speech vs silence
Memory Management: Always call cleanup methods to free blob URLs and close connections
Error Handling: Implement reconnection logic for production use
Audio Format: Use 16kHz, mono, Float32 for recording; receive Int16 PCM for playback
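
The reconnection advice above can be implemented in many ways; one common pattern is exponential backoff with jitter. The sketch below is a generic illustration, not an Autessa-provided utility: `backoff_delays`, `connect_with_retry`, and the retry limits are all assumptions, and `connect` stands for whatever function opens your WebSocket.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 0.5, cap: float = 30.0,
                   attempts: int = 6) -> Iterator[float]:
    """Yield capped exponential delays with full jitter (in seconds)."""
    for attempt in range(attempts):
        yield random.uniform(0, min(cap, base * 2 ** attempt))


def connect_with_retry(connect: Callable[[], T],
                       sleep: Callable[[float], None] = time.sleep,
                       attempts: int = 6) -> T:
    """Call connect() until it succeeds or the attempts are used up."""
    last_error: Optional[Exception] = None
    for delay in backoff_delays(attempts=attempts):
        try:
            return connect()
        except OSError as error:
            # Assumption: only network-level failures are worth retrying
            last_error = error
            sleep(delay)
    raise ConnectionError("could not reconnect") from last_error
```

With the websocket-client package used in the Python example, a call might look like `connect_with_retry(lambda: websocket.create_connection(ws_url))`. Which exception types to retry on depends on the client library and should be adjusted accordingly.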

For complete utility class implementations, see the Audio Recording and Audio Playback guides.
