Agent API
The Agent API allows you to create and manage conversations with AI agents, execute them with text or audio input, and receive real-time streaming responses.
All Agent API requests require the resourceId query parameter set to your agent ID, together with your API key: in the Authorization header for REST requests, or in the authorization query parameter for WebSocket connections.
Base URL
REST API: https://api.autessa.com/clients/agents
WebSocket: wss://api.autessa.com/ws/clients/agents/execute
Authentication
HTTP REST APIs
Pass your API key in the Authorization header (no "Bearer" prefix):
Authorization: your_api_key_here
WebSocket APIs
Pass your API key as a query parameter:
authorization=your_api_key_here
- REST
- WebSocket
curl -X POST "https://api.autessa.com/clients/agents/execute-create?resourceId=123" \
-H "Authorization: your_api_key" \
-H "Content-Type: application/json" \
-d '{"agentId": 123}'
websocat "wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123"
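The two authentication styles can be wrapped in a few small helpers. The following is a minimal Python sketch rather than an official client: the function names (rest_headers, rest_url, websocket_url) are hypothetical, and only the header name, the query parameter names, and the base URLs are taken from this page.

```python
from urllib.parse import urlencode

REST_BASE = "https://api.autessa.com/clients/agents"
WS_URL = "wss://api.autessa.com/ws/clients/agents/execute"


def rest_headers(api_key: str) -> dict:
    """Headers for REST calls: the raw key, with no "Bearer" prefix."""
    return {"Authorization": api_key, "Content-Type": "application/json"}


def rest_url(endpoint: str, agent_id: int) -> str:
    """REST URL for an endpoint such as "execute", with resourceId attached."""
    return f"{REST_BASE}/{endpoint}?{urlencode({'resourceId': agent_id})}"


def websocket_url(api_key: str, agent_id: int, **extra: str) -> str:
    """WebSocket URL: the key travels as the `authorization` query parameter."""
    params = {"authorization": api_key, "resourceId": agent_id, **extra}
    return f"{WS_URL}?{urlencode(params)}"
```

Because urlencode escapes its values, a key containing reserved characters should still produce a valid WebSocket URL.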
REST API Endpoints
Create Conversation POST/execute-create
Creates a new conversation for multi-turn agent interactions. A conversation maintains context across multiple agent executions. Create one only when you need multi-turn interactions; single-shot requests do not require a conversation.
Note: Once created, a conversation is tied to a specific agent version. Even if you publish a new version, existing conversations will continue using the original version.
Query Parameters
- Name
resourceId- Type
- integer
- Description
The ID of the agent.
Request Body
- Name
agentId- Type
- integer
- Description
The ID of the agent to create a conversation for.
- Name
version- Type
- CustomVersion
- Description
Optional version override. If not specified, uses the PUBLISHED version.
Response
- Name
conversationId- Type
- string
- Description
The unique ID for the created conversation. Use this in subsequent execute requests.
- Name
errors- Type
- array<string>
- Description
Array of error messages if any occurred.
- cURL
- JavaScript
curl --request POST \
--url 'https://api.autessa.com/clients/agents/execute-create?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123
}'
const response = await fetch(
'https://api.autessa.com/clients/agents/execute-create?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123
})
}
);
const data = await response.json();
console.log(data.conversationId);
{
"conversationId": "conv_a1b2c3d4e5",
"errors": []
}
Execute Agent POST/execute
Executes an agent synchronously with multimodal input (text or audio). This endpoint returns the complete response once processing is done.
Note: Synchronous execution only supports TEXT output. For AUDIO output or real-time streaming, use the WebSocket endpoint.
Query Parameters
- Name
resourceId- Type
- integer
- Description
The ID of the agent.
Request Body
- Name
agentId- Type
- integer
- Description
The ID of the agent to execute.
- Name
conversationId- Type
- string
- Description
The conversation ID from /execute-create. Required for multi-turn conversations.
- Name
input- Type
- array<MultimodalInput>
- Description
Array of multimodal inputs. See Multimodal Input.
- Name
environmentVariables- Type
- object
- Description
Key-value pairs for environment variables that won't be stored on the server.
- Name
promptTemplateVariables- Type
- object
- Description
Key-value pairs to inject into the agent's prompt template. Variables are referenced in the agent instructions using @@{variableName}@@ syntax.
- Name
version- Type
- CustomVersion
- Description
Version override. Ignored if using an existing conversationId.
Response
- Name
output- Type
- array<MultimodalOutput>
- Description
Array of output responses from the agent.
- Name
errors- Type
- array<string>
- Description
Array of error messages if any occurred.
- cURL
- JavaScript
curl --request POST \
--url 'https://api.autessa.com/clients/agents/execute?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123,
"conversationId": "conv_a1b2c3d4e5",
"input": [
{
"inputType": "TEXT",
"content": "What is the weather today?"
}
],
"environmentVariables": {
"USER_LOCATION": "Seattle"
},
"promptTemplateVariables": {
"customerName": "John Doe",
"accountType": "Premium"
}
}'
const response = await fetch(
'https://api.autessa.com/clients/agents/execute?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
conversationId: 'conv_a1b2c3d4e5',
input: [
{
inputType: 'TEXT',
content: 'What is the weather today?'
}
],
environmentVariables: {
USER_LOCATION: 'Seattle'
},
promptTemplateVariables: {
customerName: 'John Doe',
accountType: 'Premium'
}
})
}
);
const data = await response.json();
{
"output": [
{
"outputType": "TEXT",
"content": "The weather in Seattle today is 65°F with partly cloudy skies."
}
],
"errors": []
}
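A caller usually wants the text of the reply or a clear failure. The sketch below shows one way to unpack the response shape above in Python. The names extract_text and AgentError are hypothetical, and two behaviours are assumptions because this page does not state them: a non-empty errors array is treated as a failure, and multiple TEXT outputs are joined with newlines.

```python
class AgentError(Exception):
    """Raised when the response carries a non-empty `errors` array."""


def extract_text(response: dict) -> str:
    """Return the TEXT content of an /execute response."""
    errors = response.get("errors") or []
    if errors:
        # Assumption: any reported error means the output should not be used.
        raise AgentError("; ".join(errors))
    parts = [
        item["content"]
        for item in response.get("output", [])
        if item.get("outputType") == "TEXT"
    ]
    return "\n".join(parts)
```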
Close Conversation POST/execute-close
Closes an existing conversation. This triggers logging and any configured auto-evaluation for the conversation. It's safe to call this even for expired conversations.
Important: Conversations automatically close after 12 hours of inactivity, but it's best practice to explicitly close them when done.
Query Parameters
- Name
resourceId- Type
- integer
- Description
The ID of the agent.
Request Body
- Name
agentId- Type
- integer
- Description
The ID of the agent.
- Name
conversationId- Type
- string
- Description
The conversation ID to close.
Response
- Name
errors- Type
- array<string>
- Description
Array of error messages if any occurred.
- cURL
- JavaScript
curl --request POST \
--url 'https://api.autessa.com/clients/agents/execute-close?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123,
"conversationId": "conv_a1b2c3d4e5"
}'
await fetch(
'https://api.autessa.com/clients/agents/execute-close?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
conversationId: 'conv_a1b2c3d4e5'
})
}
);
{
"errors": []
}
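Since closing explicitly is the recommended practice, it can help to tie the create and close calls together so the close is not forgotten when an error occurs. The following Python sketch does this with a context manager. The `post` argument is a hypothetical transport supplied by the caller (it should POST the body as JSON to the named endpoint and return the decoded response); it is not part of the API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical transport signature: post(endpoint, body) -> decoded JSON dict.
Post = Callable[[str, dict], dict]


@contextmanager
def conversation(post: Post, agent_id: int) -> Iterator[str]:
    """Create a conversation, yield its ID, and always close it afterwards."""
    created = post("execute-create", {"agentId": agent_id})
    conversation_id = created["conversationId"]
    try:
        yield conversation_id
    finally:
        # Runs even if the body raises, so logging and auto-evaluation are
        # triggered without waiting for the inactivity timeout.
        post("execute-close", {"agentId": agent_id, "conversationId": conversation_id})
```

Usage would look like `with conversation(post, 123) as cid:` followed by one or more execute calls that pass `cid` as conversationId.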
Get Conversation POST/conversation
Retrieves the complete conversation history including all messages and evaluation context.
Query Parameters
- Name
resourceId- Type
- integer
- Description
The ID of the agent.
Request Body
- Name
agentId- Type
- integer
- Description
The ID of the agent.
- Name
conversationId- Type
- string
- Description
The conversation ID to retrieve.
Response
- Name
conversation- Type
- AgentConversationDto
- Description
The complete conversation data including logs and evaluation context.
- Name
errors- Type
- array<string>
- Description
Array of error messages if any occurred.
curl --request POST \
--url 'https://api.autessa.com/clients/agents/conversation?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123,
"conversationId": "conv_a1b2c3d4e5"
}'
{
"conversation": {
"resourceId": 123,
"version": "1",
"conversationId": "conv_a1b2c3d4e5",
"apiKeyId": "key_xyz",
"closeDate": null,
"logs": [
{
"logType": "USER",
"userMultimodalInputs": [
{
"inputType": "TEXT",
"content": "What is the weather in Seattle?"
}
]
},
{
"logType": "TOOL_REQUEST",
"executeToolRequest": {
"toolCallId": "call_abc123",
"name": "GetWeather",
"variables": {
"location": "Seattle"
}
}
},
{
"logType": "TOOL_RESPONSE",
"executeToolResponse": {
"toolCallId": "call_abc123",
"toolName": "GetWeather",
"result": {
"temperature": 65,
"condition": "Partly Cloudy"
}
}
},
{
"logType": "MESSAGE_TO_USER",
"multimodalOutput": {
"outputType": "TEXT",
"content": "The weather in Seattle is currently 65°F with partly cloudy skies."
}
}
]
},
"errors": []
}
Conversation Log Structure
Conversation logs returned by the /conversation endpoint contain detailed information about all interactions in a conversation. Each log entry has a logType field that determines which other fields are present.
Log Types
- Name
USER- Description
User input to the agent. Contains a userMultimodalInputs array with text or audio inputs.
- Name
MESSAGE_TO_USER- Description
Agent's response to the user. Contains multimodalOutput with text or audio output.
- Name
TOOL_REQUEST- Description
Request to execute a tool. Contains executeToolRequest with the tool name and variables.
- Name
TOOL_RESPONSE- Description
Result from tool execution. Contains executeToolResponse with the tool's result.
AgentLog Structure
Each log entry has one of these structures based on its logType:
USER Log:
- Name
logType- Type
- string
- Description
"USER"
- Name
userMultimodalInputs- Type
- array<MultimodalInput>
- Description
Array of user inputs (text or audio). See Multimodal Input.
MESSAGE_TO_USER Log:
- Name
logType- Type
- string
- Description
"MESSAGE_TO_USER"
- Name
multimodalOutput- Type
- MultimodalOutput
- Description
The agent's response (text or audio).
TOOL_REQUEST Log:
- Name
logType- Type
- string
- Description
"TOOL_REQUEST"
- Name
executeToolRequest- Type
- object
- Description
Tool execution request details.
- Name
executeToolRequest.toolCallId- Type
- string
- Description
Unique identifier for this tool call.
- Name
executeToolRequest.name- Type
- string
- Description
Name of the tool being invoked.
- Name
executeToolRequest.variables- Type
- object
- Description
Key-value pairs of arguments passed to the tool.
TOOL_RESPONSE Log:
- Name
logType- Type
- string
- Description
"TOOL_RESPONSE"
- Name
executeToolResponse- Type
- object
- Description
Tool execution result.
- Name
executeToolResponse.toolCallId- Type
- string
- Description
Matches the toolCallId from the corresponding TOOL_REQUEST.
- Name
executeToolResponse.toolName- Type
- string
- Description
Name of the tool that was executed.
- Name
executeToolResponse.result- Type
- any
- Description
The result returned by the tool.
- USER (Text)
- USER (Audio)
- MESSAGE_TO_USER (Text)
- MESSAGE_TO_USER (Audio)
- TOOL_REQUEST
- TOOL_RESPONSE
{
"logType": "USER",
"userMultimodalInputs": [
{
"inputType": "TEXT",
"content": "What is the weather?"
}
]
}
{
"logType": "USER",
"userMultimodalInputs": [
{
"inputType": "AUDIO",
"s3Uri": "s3://autessa-audio/...",
"audioFormat": {
"sampleRate": 16000,
"sampleSizeInBits": 16,
"channels": 1,
"signed": true,
"bigEndian": false
},
"transcription": "What is the weather?"
}
]
}
{
"logType": "MESSAGE_TO_USER",
"multimodalOutput": {
"outputType": "TEXT",
"content": "It's 65°F and sunny."
}
}
{
"logType": "MESSAGE_TO_USER",
"multimodalOutput": {
"outputType": "AUDIO",
"sampleRate": 16000,
"base64Audio": "UklGRiQAAABXQVZF...",
"transcription": "It's 65 degrees and sunny.",
"isFinalChunk": true
}
}
{
"logType": "TOOL_REQUEST",
"executeToolRequest": {
"toolCallId": "call_123",
"name": "GetWeather",
"variables": {
"location": "Seattle",
"units": "fahrenheit"
}
}
}
{
"logType": "TOOL_RESPONSE",
"executeToolResponse": {
"toolCallId": "call_123",
"toolName": "GetWeather",
"result": {
"temperature": 65,
"condition": "Sunny"
}
}
}
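Because a TOOL_RESPONSE carries the same toolCallId as its TOOL_REQUEST, the two can be matched when analysing a conversation. The Python sketch below pairs them from a logs array; the function name pair_tool_calls is hypothetical, and it assumes each toolCallId appears in at most one response.

```python
def pair_tool_calls(logs: list) -> list:
    """Return (request, response_or_None) tuples in request order."""
    responses = {
        log["executeToolResponse"]["toolCallId"]: log["executeToolResponse"]
        for log in logs
        if log.get("logType") == "TOOL_RESPONSE"
    }
    return [
        (
            log["executeToolRequest"],
            responses.get(log["executeToolRequest"]["toolCallId"]),
        )
        for log in logs
        if log.get("logType") == "TOOL_REQUEST"
    ]
```

A request paired with None indicates a tool call whose result was not logged, for example because the conversation ended first.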
Get Conversation Status POST/conversation-status
Checks the current status of a conversation (IN_PROGRESS, CLOSED, or NOT_FOUND).
Query Parameters
- Name
resourceId- Type
- integer
- Description
The ID of the agent.
Request Body
- Name
agentId- Type
- integer
- Description
The ID of the agent.
- Name
conversationId- Type
- string
- Description
The conversation ID to check.
Response
- Name
conversationId- Type
- string
- Description
The conversation ID that was checked.
- Name
status- Type
- string
- Description
One of: IN_PROGRESS, CLOSED, NOT_FOUND
- Name
errors- Type
- array<string>
- Description
Array of error messages if any occurred.
curl --request POST \
--url 'https://api.autessa.com/clients/agents/conversation-status?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123,
"conversationId": "conv_a1b2c3d4e5"
}'
{
"conversationId": "conv_a1b2c3d4e5",
"status": "IN_PROGRESS",
"errors": []
}
Revive Conversation POST/revive-conversation
Revives a closed conversation, allowing you to continue the interaction where it left off.
Query Parameters
- Name
resourceId- Type
- integer
- Description
The ID of the agent.
Request Body
- Name
agentId- Type
- integer
- Description
The ID of the agent.
- Name
conversationId- Type
- string
- Description
The conversation ID to revive.
Response
- Name
conversationId- Type
- string
- Description
The revived conversation ID.
- Name
status- Type
- string
- Description
Status message about the revival operation.
- Name
errors- Type
- array<string>
- Description
Array of error messages if any occurred.
curl --request POST \
--url 'https://api.autessa.com/clients/agents/revive-conversation?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{
"agentId": 123,
"conversationId": "conv_a1b2c3d4e5"
}'
{
"conversationId": "conv_a1b2c3d4e5",
"status": "REVIVED",
"errors": []
}
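The status and revive endpoints are often used together when resuming a session. The Python sketch below maps a status value from /conversation-status to the endpoint a client might call next. The mapping is an assumed client-side policy, not behaviour defined by the API, and next_action is a hypothetical name.

```python
def next_action(status: str) -> str:
    """Pick the endpoint to call next for a given conversation status."""
    actions = {
        "IN_PROGRESS": "execute",         # conversation is usable as-is
        "CLOSED": "revive-conversation",  # reopen before executing again
        "NOT_FOUND": "execute-create",    # start a fresh conversation
    }
    try:
        return actions[status]
    except KeyError:
        raise ValueError(f"unexpected conversation status: {status!r}") from None
```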
Generate Audio Upload Link POST/generate-audio-upload-link
Generates a pre-signed S3 upload URL for audio files. Use this when you want to upload audio files separately rather than sending them as base64 in the request.
Note: Currently supports WAV format only.
Query Parameters
- Name
resourceId- Type
- integer
- Description
The ID of the agent.
Request Body
Currently empty (WAV format is hardcoded).
Response
- Name
uploadUrl- Type
- string
- Description
Pre-signed URL for uploading the audio file via PUT request.
- Name
s3Uri- Type
- string
- Description
S3 URI to use in AudioInput when referencing this file.
- Name
expiresInSeconds- Type
- integer
- Description
Time in seconds until the upload URL expires.
- Name
errors- Type
- array<string>
- Description
Array of error messages if any occurred.
curl --request POST \
--url 'https://api.autessa.com/clients/agents/generate-audio-upload-link?resourceId=123' \
--header 'Content-Type: application/json' \
--header 'Authorization: your_api_key' \
--data '{}'
{
"uploadUrl": "https://s3.amazonaws.com/autessa-audio/...",
"s3Uri": "s3://autessa-audio/user-123/audio-xyz.wav",
"expiresInSeconds": 3600,
"errors": []
}
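# Step 1: Get upload link with POST /generate-audio-upload-link and note uploadUrl and s3Uri
# Step 2: Upload the audio file to uploadUrl with PUT before the link expires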
curl --request PUT \
--url 'https://s3.amazonaws.com/autessa-audio/...' \
--header 'Content-Type: audio/wav' \
--data-binary '@audio.wav'
# Step 3: Use s3Uri in AudioInput
# {
# "inputType": "AUDIO",
# "s3Uri": "s3://autessa-audio/user-123/audio-xyz.wav"
# }
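The same upload steps can be sketched in Python with the standard library. The helpers below only build the PUT request and the resulting AudioInput; the names build_upload_request and audio_input_for are hypothetical, and `link` stands for the decoded response of /generate-audio-upload-link.

```python
import urllib.request


def build_upload_request(upload_url: str, wav_bytes: bytes) -> urllib.request.Request:
    """Step 2: a PUT of the raw WAV bytes to the pre-signed URL."""
    return urllib.request.Request(
        upload_url,
        data=wav_bytes,
        method="PUT",
        headers={"Content-Type": "audio/wav"},
    )


def audio_input_for(link: dict, audio_format: dict) -> dict:
    """Step 3: an AudioInput that references the uploaded file by s3Uri."""
    return {
        "inputType": "AUDIO",
        "s3Uri": link["s3Uri"],
        "audioFormat": audio_format,
    }
```

To perform the upload, pass the request to `urllib.request.urlopen` before expiresInSeconds has elapsed.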
WebSocket API
Multimodal Agent Execution WSS/ws/clients/agents/execute
Real-time agent execution with streaming responses. Supports both text and audio input/output with streaming capabilities.
Connection URL
wss://api.autessa.com/ws/clients/agents/execute?authorization={key}&resourceId={agentId}
Query Parameters
- Name
authorization- Type
- string
- Description
Your API key.
- Name
resourceId- Type
- integer
- Description
The agent ID.
- Name
conversationId- Type
- string
- Description
Optional conversation ID to revive and continue.
- Name
voiceModeEnabled- Type
- boolean
- Description
Set to "true" for voice mode with continuous audio streaming.
Connection Lifecycle
- Connect - WebSocket connection established
- Status Message - Server sends conversation status
- Send Messages - Client sends MULTIMODAL or VOICE messages
- Receive Streams - Server streams responses in real-time
- Disconnect - Connection closes, conversation auto-closes
- websocat
- JavaScript
- Python
websocat "wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123"
const ws = new WebSocket(
'wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123'
);
ws.onopen = () => {
console.log('Connected to agent');
};
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log('Received:', data);
};
import websocket
import json
ws = websocket.WebSocket()
ws.connect(
"wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123"
)
# Receive status
status = json.loads(ws.recv())
print(status)
Server-to-Client Messages
When you connect, the server immediately sends a status message:
- New Conversation
- Revived Conversation
- Error
{
"conversationId": "conv_abc123",
"status": "CREATED",
"errors": []
}
{
"conversationId": "conv_abc123",
"status": "REVIVED",
"errors": []
}
{
"conversationId": null,
"status": "ERROR",
"errors": ["Invalid API key"]
}
Client-to-Server Messages
MULTIMODAL Message
Send a JSON message to execute the agent with multimodal input:
- Name
messageType- Type
- string
- Description
Must be "MULTIMODAL"
- Name
agentId- Type
- integer
- Description
The agent ID
- Name
input- Type
- array<MultimodalInput>
- Description
Array of multimodal inputs (text or audio)
- Name
executionOutputMode- Type
- string
- Description
"TEXT" or "AUDIO" - desired output format. Defaults to "TEXT".
- Name
environmentVariables- Type
- object
- Description
Key-value pairs for environment variables that won't be stored on the server.
- Name
promptTemplateVariables- Type
- object
- Description
Key-value pairs to inject into the agent's prompt template. Variables are referenced in the agent instructions using @@{variableName}@@ syntax.
{
"messageType": "MULTIMODAL",
"agentId": 123,
"executionOutputMode": "TEXT",
"input": [
{
"inputType": "TEXT",
"content": "What's the weather in Seattle?"
}
],
"environmentVariables": {
"USER_LOCATION": "Seattle"
},
"promptTemplateVariables": {
"customerName": "John Doe",
"accountType": "Premium"
}
}
VOICE Message
Establish voice mode for continuous audio streaming:
- Name
messageType- Type
- string
- Description
Must be "VOICE"
- Name
agentId- Type
- integer
- Description
The agent ID
- Name
audioFormat- Type
- AudioFormat
- Description
Audio format specification for the audio bytes you'll send
- Name
outputType- Type
- string
- Description
"TEXT" or "AUDIO" - desired output format
- Name
environmentVariables- Type
- object
- Description
Environment variables
{
"messageType": "VOICE",
"agentId": 123,
"audioFormat": {
"sampleRate": 16000,
"sampleSizeInBits": 16,
"channels": 1,
"signed": true,
"bigEndian": false
},
"outputType": "AUDIO",
"environmentVariables": {}
}
After establishing voice mode, send raw audio bytes as binary WebSocket messages.
Streaming Response Messages
The server streams responses in real-time. There are four types of messages:
1. Text Stream Messages
{
"streamedOutput": {
"outputType": "TEXT",
"content": " Seattle"
}
}
Text is streamed token-by-token. Concatenate the content fields to build the full response.
2. Tool Stream Messages
- Tool Start
- Tool Variables
- Tool End
- Tool Result
{
"streamedToolMessages": {
"responseType": "EXECUTE_TOOL",
"toolName": "GetWeather",
"invocationId": "inv_123",
"publicVars": {
"city": "Seattle",
"state": "WA"
},
"streamMessage": "EXECUTION START: GetWeather"
}
}
{
"streamedToolMessages": {
"responseType": "EXECUTE_TOOL",
"toolName": "GetWeather",
"invocationId": "inv_123",
"streamMessage": "EXECUTION VARIABLES: {\"city\":\"Seattle\",\"state\":\"WA\"}"
}
}
{
"streamedToolMessages": {
"responseType": "EXECUTE_TOOL",
"toolName": "GetWeather",
"invocationId": "inv_123",
"streamMessage": "EXECUTION END: GetWeather"
}
}
{
"streamedToolMessages": {
"responseType": "EXECUTE_TOOL",
"toolName": "GetWeather",
"invocationId": "inv_123",
"streamMessage": "EXECUTION RESULT: \"It is 65°F and partly cloudy\""
}
}
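The streamMessage strings in the samples above share an "EXECUTION <PHASE>: " prefix, which makes them straightforward to split. The Python sketch below does so under that assumption. The prefix format and the JSON payloads are inferred from the samples only, and parse_tool_message is a hypothetical name.

```python
import json

PHASES = ("START", "VARIABLES", "END", "RESULT")


def parse_tool_message(message: dict):
    """Return (tool_name, phase, payload) for a streamedToolMessages frame."""
    tool = message["streamedToolMessages"]
    text = tool["streamMessage"]
    for phase in PHASES:
        prefix = f"EXECUTION {phase}: "
        if text.startswith(prefix):
            payload = text[len(prefix):]
            if phase in ("VARIABLES", "RESULT"):
                try:
                    # Assumed to be JSON, as in the sample frames.
                    payload = json.loads(payload)
                except json.JSONDecodeError:
                    pass  # keep the raw string if it is not valid JSON
            return tool["toolName"], phase, payload
    return tool["toolName"], "UNKNOWN", text
```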
3. Audio Stream Messages
When executionOutputMode: "AUDIO" is requested:
{
"streamedOutput": {
"outputType": "AUDIO",
"sampleRate": 16000,
"base64Audio": "//8AAAMAAAD...",
"transcription": "The weather in Seattle is 65 degrees",
"isFinalChunk": false
}
}
Audio is streamed in chunks. Decode the base64Audio and play it sequentially.
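Decoding and ordering the chunks could look like the Python sketch below. It only decodes each base64Audio value and keeps the chunks in arrival order. This page does not specify what the decoded bytes contain (raw samples or a container), so playback is left to the caller. AudioCollector is a hypothetical name.

```python
import base64


class AudioCollector:
    """Collects decoded audio chunks from streamedOutput messages in order."""

    def __init__(self) -> None:
        self.chunks: list = []
        self.done = False

    def feed(self, message: dict) -> None:
        """Decode one streamed message; non-audio messages are ignored."""
        out = message.get("streamedOutput")
        if not out or out.get("outputType") != "AUDIO":
            return
        self.chunks.append(base64.b64decode(out["base64Audio"]))
        self.done = bool(out.get("isFinalChunk"))
```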
4. Final Response
After streaming completes, the server sends a final message with the complete output:
{
"conversationId": "conv_abc123",
"output": [
{
"outputType": "TEXT",
"content": "The weather in Seattle is currently 65°F with partly cloudy skies."
}
],
"errors": []
}
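A client needs to tell these message types apart as they arrive. The Python sketch below classifies a decoded message by its top-level keys and concatenates streamed text until the final response. Key-based detection is an assumption drawn from the sample payloads on this page, and the names classify and collect_text are hypothetical.

```python
def classify(message: dict) -> str:
    """Label a decoded server message by the keys it carries."""
    if "streamedToolMessages" in message:
        return "tool"
    if "streamedOutput" in message:
        return message["streamedOutput"]["outputType"].lower()  # "text" or "audio"
    if "output" in message:
        return "final"
    if "status" in message:
        return "status"
    return "unknown"


def collect_text(messages) -> str:
    """Concatenate streamed TEXT tokens, stopping at the final response."""
    parts = []
    for message in messages:
        kind = classify(message)
        if kind == "text":
            parts.append(message["streamedOutput"]["content"])
        elif kind == "final":
            break
    return "".join(parts)
```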
Multimodal Input
Autessa supports multimodal input - you can send text, audio, or a combination of both. Each input is distinguished by its inputType field, which can be either "TEXT" or "AUDIO".
Text Input
- Name
inputType- Type
- string
- Description
Must be "TEXT"
- Name
content- Type
- string
- Description
The text message from the user
{
"inputType": "TEXT",
"content": "What's the weather today?"
}
Audio Input
Audio can be provided in two ways: base64 encoded data or S3 URI.
- Name
inputType- Type
- string
- Description
Must be "AUDIO"
- Name
base64EncodedData- Type
- string
- Description
Base64 encoded audio data (option 1)
- Name
s3Uri- Type
- string
- Description
S3 URI from /generate-audio-upload-link (option 2)
- Name
audioFormat- Type
- AudioFormat
- Description
Audio format metadata
- Name
transcription- Type
- string
- Description
Optional pre-transcribed text
- Base64
- S3 URI
{
"inputType": "AUDIO",
"base64EncodedData": "UklGRiQAAABXQVZFZm10...",
"audioFormat": {
"sampleRate": 16000,
"sampleSizeInBits": 16,
"channels": 1,
"signed": true,
"bigEndian": false
}
}
{
"inputType": "AUDIO",
"s3Uri": "s3://autessa-audio/user-123/audio.wav",
"audioFormat": {
"sampleRate": 16000,
"sampleSizeInBits": 16,
"channels": 1,
"signed": true,
"bigEndian": false
}
}
AudioFormat
- Name
sampleRate- Type
- integer
- Description
Samples per second in Hz. Common values: 16000 (wideband speech), 44100 (CD quality), 48000 (professional).
- Name
sampleSizeInBits- Type
- integer
- Description
Bits per sample. Typical values: 8, 16, 24, 32.
- Name
channels- Type
- integer
- Description
Number of audio channels. 1 = mono, 2 = stereo.
- Name
signed- Type
- boolean
- Description
Whether samples are signed (true) or unsigned (false).
- Name
bigEndian- Type
- boolean
- Description
Byte order: true = big-endian, false = little-endian.
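For WAV files, the AudioFormat fields can be read from the file header rather than typed by hand. The Python sketch below uses the standard wave module to build a base64 AudioInput. It assumes an uncompressed PCM WAV file and that base64EncodedData should hold the whole file, which the RIFF-style sample value above suggests but this page does not state. audio_input_from_wav is a hypothetical name.

```python
import base64
import io
import wave


def audio_input_from_wav(wav_bytes: bytes) -> dict:
    """Build a base64 AudioInput whose audioFormat mirrors the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        sample_width = wav.getsampwidth()  # bytes per sample
        audio_format = {
            "sampleRate": wav.getframerate(),
            "sampleSizeInBits": sample_width * 8,
            "channels": wav.getnchannels(),
            # PCM WAV stores 8-bit samples unsigned and wider samples signed.
            "signed": sample_width > 1,
            # RIFF/WAV sample data is little-endian.
            "bigEndian": False,
        }
    return {
        "inputType": "AUDIO",
        "base64EncodedData": base64.b64encode(wav_bytes).decode("ascii"),
        "audioFormat": audio_format,
    }
```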
Multimodal Output
Agent responses can be in text or audio format. The output type is distinguished by the outputType field, which can be either "TEXT" or "AUDIO".
Text Output
- Name
outputType- Type
- string
- Description
Must be "TEXT"
- Name
content- Type
- string
- Description
The text response from the agent
{
"outputType": "TEXT",
"content": "The weather in Seattle is 65°F with partly cloudy skies."
}
Audio Output
Audio output is used when you request executionOutputMode: "AUDIO" via WebSocket.
- Name
outputType- Type
- string
- Description
Must be "AUDIO"
- Name
sampleRate- Type
- integer
- Description
Samples per second in Hz (e.g., 16000)
- Name
base64Audio- Type
- string
- Description
Base64 encoded audio data
- Name
transcription- Type
- string
- Description
Text transcription of the audio
- Name
isFinalChunk- Type
- boolean
- Description
Whether this is the final chunk of audio in the stream
{
"outputType": "AUDIO",
"sampleRate": 16000,
"base64Audio": "UklGRiQAAABXQVZFZm10...",
"transcription": "The weather in Seattle is 65 degrees with partly cloudy skies.",
"isFinalChunk": true
}
Custom Version
Override the default PUBLISHED version of an agent:
- Name
versionState- Type
- string
- Description
Version state: "PUBLISHED", "DRAFT", or "ARCHIVED". Defaults to "PUBLISHED".
- Name
versionNumber- Type
- integer
- Description
Specific version number to use (optional).
- REST API
- WebSocket
{
"agentId": 123,
"version": {
"versionState": "DRAFT"
},
"input": [...]
}
wss://api.autessa.com/ws/clients/agents/execute?authorization=key&resourceId=123&versionState=DRAFT
Enumerations
EConversationStatus
- IN_PROGRESS - Conversation is active
- CLOSED - Conversation has been closed
- NOT_FOUND - Conversation doesn't exist
ExecutionOutputMode
- TEXT - Text output (default)
- AUDIO - Audio output (base64 encoded)
EInputType
- TEXT - Text input
- AUDIO - Audio input
EOutputType
- TEXT - Text output
- AUDIO - Audio output
EVersionState
- PUBLISHED - Published version (default)
- DRAFT - Draft/sandbox version
- ARCHIVED - Archived version
Rate Limiting
Agent execution endpoints are rate limited. Rate limit information is included in response headers:
- X-RateLimit-Limit-Minute - Requests allowed per minute
- X-RateLimit-Remaining-Minute - Remaining requests this minute
- X-RateLimit-Limit-Day - Requests allowed per day
- X-RateLimit-Remaining-Day - Remaining requests today
When rate limited, you'll receive a 429 status code with a Retry-After header.
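A client can use these headers to decide when to retry. The Python sketch below is one possible approach, with hypothetical function names. It assumes Retry-After is given as a number of seconds (HTTP also allows a date form, which this sketch does not parse) and that `headers` is a plain mapping with the exact header casing shown above.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], default: float = 1.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if not rate limited."""
    if status != 429:
        return None
    value = headers.get("Retry-After", "")
    try:
        return max(0.0, float(value))
    except ValueError:
        return default  # header missing or not a number of seconds


def remaining_this_minute(headers: Mapping[str, str]) -> Optional[int]:
    """Remaining per-minute quota, or None if the header is absent or malformed."""
    value = headers.get("X-RateLimit-Remaining-Minute")
    return int(value) if value is not None and value.isdigit() else None
```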
Complete Examples
Single-Turn Conversation
- JavaScript
- Python
// Simple one-shot agent execution
const response = await fetch(
'https://api.autessa.com/clients/agents/execute?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
input: [
{
inputType: 'TEXT',
content: 'Tell me a joke'
}
]
})
}
);
const data = await response.json();
console.log(data.output[0].content);
import requests
response = requests.post(
'https://api.autessa.com/clients/agents/execute?resourceId=123',
headers={
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
json={
'agentId': 123,
'input': [
{
'inputType': 'TEXT',
'content': 'Tell me a joke'
}
]
}
)
data = response.json()
print(data['output'][0]['content'])
Multi-Turn Conversation
// Create conversation
const createResp = await fetch(
'https://api.autessa.com/clients/agents/execute-create?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({ agentId: 123 })
}
);
const { conversationId } = await createResp.json();
// First message
const resp1 = await fetch(
'https://api.autessa.com/clients/agents/execute?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
conversationId,
input: [{ inputType: 'TEXT', content: 'Hi, I need help booking a flight' }]
})
}
);
// Second message (continues conversation)
const resp2 = await fetch(
'https://api.autessa.com/clients/agents/execute?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
conversationId,
input: [{ inputType: 'TEXT', content: 'From Seattle to San Francisco' }]
})
}
);
// Close conversation
await fetch(
'https://api.autessa.com/clients/agents/execute-close?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
conversationId
})
}
);
Mixed Text and Audio Input
// Example showing both text and audio inputs in one request
const response = await fetch(
'https://api.autessa.com/clients/agents/execute?resourceId=123',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'your_api_key'
},
body: JSON.stringify({
agentId: 123,
conversationId: 'conv_a1b2c3d4e5',
input: [
// Text input
{
inputType: 'TEXT',
content: 'Here is my question:'
},
// Audio input
{
inputType: 'AUDIO',
s3Uri: 's3://autessa-audio/user-123/question.wav',
audioFormat: {
sampleRate: 16000,
sampleSizeInBits: 16,
channels: 1,
signed: true,
bigEndian: false
}
}
]
})
}
);
const data = await response.json();
// The synchronous endpoint only returns TEXT output, even when the input includes audio
console.log('Output type:', data.output[0].outputType); // "TEXT"
WebSocket Streaming
const ws = new WebSocket(
'wss://api.autessa.com/ws/clients/agents/execute?authorization=your_api_key&resourceId=123'
);
let conversationId;
let fullResponse = '';
ws.onopen = () => {
console.log('Connected');
};
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
// Connection status
if (data.status) {
conversationId = data.conversationId;
console.log('Conversation created:', conversationId);
// Send first message
ws.send(JSON.stringify({
messageType: 'MULTIMODAL',
agentId: 123,
executionOutputMode: 'TEXT',
input: [
{
inputType: 'TEXT',
content: 'What is the weather?'
}
]
}));
return;
}
// Streaming text (skip audio chunks, which carry no content field)
if (data.streamedOutput?.outputType === 'TEXT') {
fullResponse += data.streamedOutput.content;
console.log('Stream:', data.streamedOutput.content);
}
// Tool messages
if (data.streamedToolMessages) {
console.log('Tool:', data.streamedToolMessages.streamMessage);
}
// Final output
if (data.output) {
console.log('Final response:', data.output[0].content);
console.log('Full streamed response:', fullResponse);
}
};
ws.onerror = (error) => {
console.error('WebSocket error:', error);
};
ws.onclose = () => {
console.log('Disconnected');
};
Real-time Voice Mode
For conversational AI with continuous audio streaming, use the voiceModeEnabled=true parameter. This enables bidirectional audio streaming for natural voice conversations.
- JavaScript
- Python
const agentId = 123
const apiKey = 'your_api_key'
// Connect with voice mode enabled
const ws = new WebSocket(
`wss://api.autessa.com/ws/clients/agents/execute?authorization=${apiKey}&resourceId=${agentId}&voiceModeEnabled=true`
)
let conversationId
ws.onopen = () => {
console.log('Connected in voice mode')
}
ws.onmessage = async (event) => {
// Handle connection status
if (typeof event.data === 'string') {
const data = JSON.parse(event.data)
if (data.status) {
conversationId = data.conversationId
console.log('Conversation created:', conversationId)
// Send voice establishment message
ws.send(JSON.stringify({
messageType: 'VOICE',
agentId: agentId,
audioFormat: {
sampleRate: 16000,
sampleSizeInBits: 32,
channels: 1,
signed: true,
bigEndian: false
},
outputType: 'AUDIO',
environmentVariables: {},
promptTemplateVariables: {}
}))
console.log('Voice mode established')
return
}
// Handle audio output chunks
if (data.streamedOutput?.outputType === 'AUDIO') {
const { base64Audio, sampleRate, transcription, isFinalChunk } = data.streamedOutput
console.log('Agent:', transcription)
// Process audio chunk for playback
// See: https://docs.autessa.com/api/audio-playback
await audioProcessor.processAudioChunk(base64Audio, sampleRate, transcription)
// Auto-start playback
const status = audioProcessor.getStatus()
if (status.totalItems === 1 && !status.isPlaying) {
audioProcessor.play()
}
if (isFinalChunk) {
console.log('Agent finished speaking')
}
}
}
}
// Start recording and stream audio
// See complete implementation at: https://docs.autessa.com/api/audio-record
const recorder = new AudioRecorder(16000)
recorder.onSpeechStart = () => {
console.log('User started speaking')
audioProcessor.setInterrupted(true)
}
recorder.onSpeechEnd = () => {
console.log('User stopped speaking')
audioProcessor.setInterrupted(false)
}
// Start continuous recording with WebSocket
await recorder.startContinuousRecording(
`wss://api.autessa.com/ws/clients/agents/execute?authorization=${apiKey}&resourceId=${agentId}&voiceModeEnabled=true`,
{
outputType: 'AUDIO',
agentId: agentId,
environmentVariables: {},
promptTemplateVariables: {}
}
)
// Later: cleanup
recorder.stopRecording()
audioProcessor.clear()
ws.close()
# Requires the websocket-client package
import websocket
import json
agent_id = 123
api_key = 'your_api_key'
# Connect with voice mode
ws_url = f"wss://api.autessa.com/ws/clients/agents/execute?authorization={api_key}&resourceId={agent_id}&voiceModeEnabled=true"
ws = websocket.create_connection(ws_url)
# Receive connection status
status = json.loads(ws.recv())
conversation_id = status['conversationId']
print(f"Connected: {conversation_id}")
# Send voice establishment message
ws.send(json.dumps({
"messageType": "VOICE",
"agentId": agent_id,
"audioFormat": {
"sampleRate": 16000,
"sampleSizeInBits": 32,
"channels": 1,
"signed": True,
"bigEndian": False
},
"outputType": "AUDIO",
"environmentVariables": {},
"promptTemplateVariables": {}
}))
print("Voice mode established")
# Now stream audio bytes
# See Python audio recording implementation in docs
try:
    while True:
        # Receive audio or text responses
        response = json.loads(ws.recv())
        if 'streamedOutput' in response:
            output = response['streamedOutput']
            if output['outputType'] == 'AUDIO':
                print(f"Agent: {output['transcription']}")
                # Process audio chunk
                # See: https://docs.autessa.com/api/audio-playback
finally:
    # Runs when the loop is interrupted or the server closes the connection
    ws.close()
Voice Mode Protocol:
- Connect with voiceModeEnabled=true
- Wait for connection status message
- Send VOICE establishment message with audio format
- Stream raw audio bytes as binary WebSocket messages
- Receive streamed audio chunks in real-time
For a complete implementation with Voice Activity Detection and audio utilities, see the Audio Recording and Audio Playback guides.
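The first protocol step depends on a correctly formed connection URL. A minimal sketch of building it in Python follows, assuming API keys may contain characters that need percent-encoding; `build_voice_url` is a hypothetical helper name, not part of the Autessa API.

```python
from urllib.parse import urlencode

# WebSocket endpoint from the Base URL section
WS_BASE = "wss://api.autessa.com/ws/clients/agents/execute"

def build_voice_url(api_key: str, agent_id: int) -> str:
    """Build the voice-mode WebSocket URL with percent-encoded query values.

    Hypothetical helper for illustration only.
    """
    query = urlencode({
        "authorization": api_key,
        "resourceId": agent_id,
        "voiceModeEnabled": "true",
    })
    return f"{WS_BASE}?{query}"

url = build_voice_url("your_api_key", 123)
print(url)
```

Using `urlencode` instead of string interpolation keeps characters such as `&` or `/` inside a key from being read as part of the query structure.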
Voice Establishment Message
After connecting, send this message to establish voice mode:
{
"messageType": "VOICE",
"agentId": 123,
"audioFormat": {
"sampleRate": 16000,
"sampleSizeInBits": 32,
"channels": 1,
"signed": true,
"bigEndian": false
},
"outputType": "AUDIO",
"environmentVariables": {},
"promptTemplateVariables": {}
}
After sending the establishment message:
- Send raw audio bytes as binary WebSocket messages (ArrayBuffer/bytes)
- Receive JSON messages with audio chunks and transcriptions
- Audio format: Float32Array converted to ArrayBuffer (32-bit float, mono, 16kHz)
Audio Streaming Format
Input Audio (Client → Server):
- Send raw ArrayBuffer from Float32Array
- 16kHz sample rate, 32-bit float, mono
- Stream continuously; the recorder's Voice Activity Detection marks where speech starts and ends
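For clients that do not run in a browser, the input format above can be produced with the standard library. The sketch below packs samples as little-endian 32-bit floats and splits them into frames. It assumes samples are normalized to the range -1.0 to 1.0 and uses an arbitrary 100 ms frame size; neither value is specified by the API, and `pack_float32_le` and `iter_frames` are hypothetical helper names.

```python
import struct

SAMPLE_RATE = 16000  # Hz, matches the audioFormat in the establishment message

def pack_float32_le(samples):
    """Pack samples as little-endian 32-bit floats, clipping to [-1.0, 1.0].

    The [-1.0, 1.0] range is an assumption, not a documented requirement.
    """
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}f", *clipped)

def iter_frames(samples, frame_ms=100):
    """Yield binary frames covering frame_ms milliseconds of audio each."""
    step = SAMPLE_RATE * frame_ms // 1000
    for start in range(0, len(samples), step):
        yield pack_float32_le(samples[start:start + step])

silence = [0.0] * SAMPLE_RATE  # one second of mono silence
frames = list(iter_frames(silence))
print(len(frames), len(frames[0]))  # 10 6400
```

With the websocket-client package used in the Python example, each frame could then be sent as a binary message with `ws.send_binary(frame)`.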
Output Audio (Server → Client):
{
"streamedOutput": {
"outputType": "AUDIO",
"sampleRate": 16000,
"base64Audio": "UklGRiQAAABXQVZF...",
"transcription": "Hello! How can I help you today?",
"isFinalChunk": false
}
}
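A rough sketch of decoding one such message in Python follows. The sample `base64Audio` value above begins with `UklGR`, which is the base64 encoding of a `RIFF` header, so the sketch assumes each chunk may be a complete WAV file and falls back to treating the payload as headerless PCM otherwise. That fallback, and the helper name `decode_audio_chunk`, are assumptions for illustration.

```python
import base64
import io
import wave

def decode_audio_chunk(streamed_output):
    """Return (pcm_bytes, sample_rate) for a streamedOutput AUDIO payload."""
    raw = base64.b64decode(streamed_output["base64Audio"])
    if raw[:4] == b"RIFF" and raw[8:12] == b"WAVE":
        # Assumption: the chunk is a self-contained WAV file
        with wave.open(io.BytesIO(raw), "rb") as wav:
            return wav.readframes(wav.getnframes()), wav.getframerate()
    # Assumption: otherwise the payload is raw PCM at the advertised rate
    return raw, streamed_output["sampleRate"]

# Build a tiny in-memory WAV so the function can be exercised offline
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)                 # 16-bit samples
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 160)  # 10 ms of silence

message = {
    "outputType": "AUDIO",
    "sampleRate": 16000,
    "base64Audio": base64.b64encode(buf.getvalue()).decode("ascii"),
    "transcription": "",
    "isFinalChunk": False,
}
pcm, rate = decode_audio_chunk(message)
print(len(pcm), rate)  # 320 16000
```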
Complete Voice Mode Example
This example combines everything for a working voice conversation:
import { AudioRecorder } from './audio-recorder'
import { AudioProcessor } from './audio-processor'
class VoiceConversation {
private agentId: number
private apiKey: string
private ws: WebSocket | null = null
private recorder: AudioRecorder
private audioProcessor: AudioProcessor
private conversationId: string | null = null
constructor(agentId: number, apiKey: string) {
this.agentId = agentId
this.apiKey = apiKey
this.recorder = new AudioRecorder(16000)
this.audioProcessor = new AudioProcessor()
this.setupCallbacks()
}
private setupCallbacks() {
// Handle user speech detection
this.recorder.onSpeechStart = () => {
console.log('👤 User speaking...')
this.audioProcessor.setInterrupted(true)
}
this.recorder.onSpeechEnd = () => {
console.log('👤 User stopped')
this.audioProcessor.setInterrupted(false)
}
}
async start() {
const wsUrl = `wss://api.autessa.com/ws/clients/agents/execute?authorization=${this.apiKey}&resourceId=${this.agentId}&voiceModeEnabled=true`
this.ws = new WebSocket(wsUrl)
this.ws.onopen = () => {
console.log('✅ Connected')
}
this.ws.onmessage = async (event) => {
const data = JSON.parse(event.data)
// Connection status
if (data.status) {
this.conversationId = data.conversationId
console.log('💬 Conversation:', this.conversationId)
// Establish voice mode
await this.recorder.startContinuousRecording(wsUrl, {
outputType: 'AUDIO',
agentId: this.agentId,
environmentVariables: {},
promptTemplateVariables: {}
})
console.log('🎤 Voice mode active')
return
}
// Audio responses
if (data.streamedOutput?.outputType === 'AUDIO') {
const { base64Audio, sampleRate, transcription, isFinalChunk } = data.streamedOutput
console.log('🤖 Agent:', transcription)
// Play audio
await this.audioProcessor.processAudioChunk(base64Audio, sampleRate, transcription)
const status = this.audioProcessor.getStatus()
if (status.totalItems === 1 && !status.isPlaying) {
this.audioProcessor.play()
}
if (isFinalChunk) {
console.log('🤖 Agent finished')
}
}
}
this.ws.onerror = (error) => {
console.error('❌ Error:', error)
}
this.ws.onclose = () => {
console.log('👋 Disconnected')
this.cleanup()
}
}
cleanup() {
this.recorder.stopRecording()
this.audioProcessor.clear()
if (this.ws) {
this.ws.close()
this.ws = null
}
}
}
// Usage
const conversation = new VoiceConversation(123, 'your_api_key')
await conversation.start()
// Later: cleanup
conversation.cleanup()
Production Considerations:
- Interruption Handling: The example above handles user interruptions by pausing agent audio when the user starts speaking
- Voice Activity Detection: The AudioRecorder includes VAD to detect speech vs silence
- Memory Management: Always call cleanup methods to free blob URLs and close connections
- Error Handling: Implement reconnection logic for production use
- Audio Format: Use 16kHz, mono, Float32 for recording; receive Int16 PCM for playback
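The reconnection logic mentioned above is not prescribed by the API. One common approach is capped exponential backoff, sketched below in Python. The delay values, retry count, and the names `backoff_delays` and `connect_with_retry` are illustrative assumptions.

```python
import random
import time

def backoff_delays(base=0.5, cap=30.0, attempts=6):
    """Yield capped exponential delays: base, 2*base, 4*base, ... up to cap."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))

def connect_with_retry(connect, retry_on=(OSError,), sleep=time.sleep,
                       base=0.5, cap=30.0, attempts=6):
    """Call connect() until it succeeds or the retry budget is spent."""
    last_error = None
    for delay in backoff_delays(base, cap, attempts):
        try:
            return connect()
        except retry_on as error:
            last_error = error
            # Small random jitter so many clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay / 10))
    raise ConnectionError("could not reconnect") from last_error

# Demo with a fake connect function that fails twice, then succeeds
calls = {"n": 0}

def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("network down")
    return "connected"

result = connect_with_retry(flaky_connect, sleep=lambda seconds: None)
print(result, calls["n"])  # connected 3
```

In a real client, `connect` would open the WebSocket (for example `lambda: websocket.create_connection(ws_url)`), and the VOICE establishment message would need to be sent again after each successful reconnect. Whether the previous conversation can be resumed on a new connection is not covered here.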
For complete utility class implementations, see: