Batch Persistent Worker
Available from version 15.5.0
A batch persistent worker (also called an HTTP batch worker) is a long-running transcription service that accepts jobs over HTTP. Unlike standard batch workers, it stays alive between jobs — meaning you pay the startup cost once, not on every request.
Why use a persistent worker?
The persistent worker is especially beneficial for smaller audio files, where startup overhead would otherwise dominate total turnaround time.
Starting the worker
docker run -it \
-e LICENSE_TOKEN=$TOKEN_VALUE \
-p PORT:18000 \
batch-asr-transcriber-en:15.5.0 \
--run-mode http \
--parallel=4 \
--all-formats /output_dir_name
Parameters
To use a different internal port, set the SM_BATCH_WORKER_LISTEN_PORT environment variable.
Submitting a job
curl -X POST address.of.container:PORT/v2/jobs \
-H 'X-SM-Processing-Data: {"parallel_engines": 2, "user_id": "MY_USER_ID"}' \
-F 'config={
"type": "transcription",
"transcription_config": {
"language": "en",
"diarization": "speaker",
"operating_point": "enhanced"
}
}' \
-F 'data_file=@/path/to/audio_file.mp3'
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import AsyncClient

load_dotenv()


async def main():
    client = AsyncClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        url="http://address.of.container:PORT/v2",
    )
    result = await client.transcribe(
        "audio.wav",
        parallel_engines=2,
        user_id="MY_USER_ID",
    )
    print(result.transcript_text)
    await client.close()

asyncio.run(main())
Response codes
Managing capacity
The worker processes multiple jobs concurrently, up to the --parallel limit you set at startup.
Each job can request multiple engines using the parallel_engines value in the X-SM-Processing-Data header. More engines per job means faster turnaround for that job, at the cost of reduced concurrency for others.
To check available capacity before submitting, query the /jobs health endpoint. The unused_engines field tells you how many engines are free.
If a job requests more engines than are currently available, it will be rejected:
HTTP 503: {"detail": "Server busy: 8 engines not available (2 engines in use, 5 parallel allowed)"}
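To avoid 503 rejections, you can check capacity first and only submit when enough engines are free. The sketch below is illustrative, not part of the product: the helper names (`engines_available`, `check_capacity`) and the base URL are placeholders, and only the payload shape from the GET /jobs health endpoint is assumed.

```python
import json
from urllib.request import urlopen


def engines_available(health: dict, requested: int) -> bool:
    """True when the /jobs health payload reports enough free engines."""
    return health.get("unused_engines", 0) >= requested


def check_capacity(base_url: str) -> dict:
    """Fetch the /jobs health endpoint (base_url is a placeholder)."""
    with urlopen(f"{base_url}/jobs") as resp:
        return json.load(resp)


# Usage sketch with a canned payload standing in for a live worker:
sample = {"max_engines": 8, "unused_engines": 6}
print(engines_available(sample, 2))  # True: enough capacity for 2 engines
```

Note that capacity can change between the check and the submission, so callers should still handle a 503 response.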
Requesting parallel engines
curl -X POST address.of.container:PORT/v2/jobs \
-H 'X-SM-Processing-Data: {"parallel_engines": 2}' \
-F 'config={"type": "transcription", "transcription_config": {"language": "en"}}' \
-F 'data_file=@~/audio_file.mp3'
Speaker identification
Speaker identification is enabled with the same configuration used for the one-shot batch container.
To enable per-customer encrypted identifiers (as used in our SaaS offering), pass a user_id in the X-SM-Processing-Data header.
curl -X POST address.of.container:PORT/v2/jobs \
-H 'X-SM-Processing-Data: {"user_id": "MY_USER_ID"}' \
-F 'config={
"type": "transcription",
"transcription_config": {
"language": "en",
"diarization": "speaker",
"operating_point": "enhanced"
}
}' \
-F 'data_file=@/path/to/audio_file.mp3'
For details on secrets management, refer to the Speaker identification documentation.
Job API reference
GET /v2/jobs
Returns a list of jobs.
Query parameters:
Example response:
{
"jobs": [
{
"id": "191f47e4a4204fa4ac2b",
"created_at": "2026-03-18T19:27:42.436Z",
"data_name": "5_min",
"text_name": null,
"duration": 300,
"status": "RUNNING",
"config": {
"type": "transcription",
"transcription_config": {
"language": "en",
"diarization": "speaker",
"operating_point": "enhanced"
}
}
},
{
"id": "6dcb02e0dc5943e2b643",
"created_at": "2026-03-18T19:27:47.550Z",
"data_name": "5_min",
"text_name": null,
"duration": 300,
"status": "RUNNING",
"config": {
"type": "transcription",
"transcription_config": {
"language": "en",
"diarization": "speaker",
"operating_point": "enhanced"
}
}
}
]
}
GET /v2/jobs/{job_id}
Returns the status of a specific job.
Example response:
{
"job": {
"id": "191f47e4a4204fa4ac2b",
"created_at": "2026-03-18T19:27:42.436Z",
"data_name": "5_min",
"duration": 300,
"status": "DONE",
"config": {
"type": "transcription",
"transcription_config": {
"language": "en",
"diarization": "speaker",
"operating_point": "enhanced"
}
},
"request_id": "191f47e4a4204fa4ac2b"
}
}
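A common pattern is to poll this endpoint until the job leaves the RUNNING state. The sketch below is an assumption-laden illustration, not part of the SDK: `wait_for_job` is a hypothetical helper, and the fetcher is injected as a callable so the polling logic stays independent of any particular HTTP client.

```python
import time


def wait_for_job(fetch, job_id, poll_interval=2.0, max_attempts=150):
    """Poll a status fetcher until the job leaves the RUNNING state.

    `fetch` is any callable returning the parsed body of
    GET /v2/jobs/{job_id}; injecting it keeps this sketch
    transport-agnostic and easy to test.
    """
    for _ in range(max_attempts):
        status = fetch(job_id)["job"]["status"]
        if status != "RUNNING":
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still running after {max_attempts} polls")


# Usage with canned responses standing in for real HTTP calls:
responses = iter(["RUNNING", "RUNNING", "DONE"])
fake_fetch = lambda job_id: {"job": {"id": job_id, "status": next(responses)}}
print(wait_for_job(fake_fetch, "191f47e4a4204fa4ac2b", poll_interval=0))  # DONE
```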
GET /v2/jobs/{job_id}/transcript
Returns the transcript for a completed job.
Query parameters:
Error responses:
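Fetching the transcript can be sketched with the standard library alone. This is illustrative, not an official client: the helper names and base URL are placeholders, and only the endpoint path from this section is assumed.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def transcript_url(base_url: str, job_id: str) -> str:
    """Build the transcript URL; base_url is a placeholder such as
    "http://address.of.container:PORT/v2"."""
    return f"{base_url}/jobs/{job_id}/transcript"


def get_transcript(base_url: str, job_id: str) -> dict:
    """Fetch the transcript for a completed job, surfacing HTTP errors."""
    try:
        with urlopen(transcript_url(base_url, job_id)) as resp:
            return json.load(resp)
    except HTTPError as err:
        # e.g. an unknown job id, or the job has not finished yet
        raise RuntimeError(f"transcript not available: HTTP {err.code}") from err
```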
GET /v2/jobs/{job_id}/log
Returns the processing logs for a specific job.
Health endpoints
The worker exposes three health endpoints on the same port as job submission.
These endpoints are designed to work as liveness and readiness probes in a Kubernetes cluster.
GET /jobs
Returns current engine usage and a list of active jobs. Use unused_engines to determine how many engines you can request for the next job.
Example response:
{
"active_jobs": [
{ "job_id": "f8a564954b334eecb823", "parallel_engines": 1 },
{ "job_id": "29351ae8cf2c4e8694f0", "parallel_engines": 1 }
],
"max_engines": 8,
"unused_engines": 6
}
GET /live
Liveness probe. Returns 200 when all container services are running and healthy.
{ "live": true }
GET /ready
Readiness probe. Returns 200 when at least one engine slot is free, 503 when all engines are occupied.
{
"ready": true,
"engines_used": 2
}