Agent Integration

Run speech recognition as local infrastructure for agents, products, and desktop workflows: OpenAI-compatible API, MCP server, voice input, and subtitle generation.


OpenAI-Compatible API Server

funasr-server exposes /v1/audio/transcriptions, /v1/models, and /health, plus Swagger docs at /docs. Any client or framework that already speaks the OpenAI audio transcription API can use it by changing only the base URL.

pip install funasr fastapi uvicorn python-multipart
funasr-server --device cuda --port 8000

# CPU fallback
funasr-server --device cpu --model sensevoice --port 8000

Model alias	Backend	Best for
`sensevoice`	`iic/SenseVoiceSmall`	Fast multilingual ASR with language/emotion/event tags.
`paraformer`	`paraformer-zh` + VAD + punctuation	Chinese production transcription.
`paraformer-en`	`paraformer-en` + VAD	English transcription.
`fun-asr-nano`	`FunAudioLLM/Fun-ASR-Nano-2512`	31-language LLM-based ASR with timestamps.
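
Before sending audio, a client can confirm that the server is up and see which models it reports. The following is a minimal sketch using only the standard library. It assumes the server listens on http://localhost:8000 as in the commands above and that /health returns HTTP 200 when ready. The helper names (`endpoint`, `is_ready`, `fetch_models`) are illustrative, and the JSON shape of /v1/models is not assumed, so the body is returned as decoded.

```python
import json
import urllib.error
import urllib.request


def endpoint(base_url: str, path: str) -> str:
    """Join the server root and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(endpoint(base_url, "/health"), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def fetch_models(base_url: str, timeout: float = 5.0):
    """Return the decoded JSON body of GET /v1/models (response shape not assumed)."""
    with urllib.request.urlopen(endpoint(base_url, "/v1/models"), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    base = "http://localhost:8000"  # assumption: default host and port from the commands above
    if is_ready(base):
        print(fetch_models(base))
    else:
        print("funasr-server is not reachable at", base)
```

A check like this is useful in a startup script or container health probe, since model loading can delay the first successful response.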

OpenAI SDK and curl

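from openai import OpenAI

# api_key is a placeholder; the SDK requires a non-empty value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Plain transcription: the response carries the recognized text.
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="sensevoice",
        file=audio,
    )
print(result.text)

# verbose_json additionally returns per-segment details.
with open("meeting.wav", "rb") as audio:
    verbose = client.audio.transcriptions.create(
        model="sensevoice",
        file=audio,
        response_format="verbose_json",
    )
print(verbose.segments)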

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice \
  -F response_format=verbose_json

Dify, n8n, and workflow engines

For low-code tools that call HTTP nodes or webhook workers, use the workflow recipes. They cover multipart upload settings, Dify custom tools, n8n binary file fields, audio URL workers, timeouts, and production guardrails. For Node.js, TypeScript, or Next.js services, start from the JavaScript/TypeScript recipes.

Several companion resources help with testing and rollout. Import the Postman collection for GUI smoke tests. Run the Python smoke test for CLI checks on Windows, Linux, or macOS. Launch the Gradio demo for browser upload and microphone checks. Use the OpenAPI spec for schema-driven imports. Before sharing the service, review the security and gateway guide.


MCP Server

The MCP server gives Claude Code, Claude Desktop, Cursor, Windsurf, and other MCP clients a local transcribe_audio tool.

pip install funasr
python examples/mcp_server/funasr_mcp.py

{
  "mcpServers": {
    "funasr": {
      "command": "python",
      "args": ["/path/to/examples/mcp_server/funasr_mcp.py"],
      "env": {"FUNASR_DEVICE": "cuda"}
    }
  }
}

Variable	Default	Description
`FUNASR_DEVICE`	`cpu`	`cuda`, `cpu`, or `mps`.
`FUNASR_MODEL`	`iic/SenseVoiceSmall`	ASR model used by the MCP tool.
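
The client configuration above can also be generated rather than edited by hand, which helps when the script path or device differs per machine. This is a minimal sketch: `build_mcp_config` is a hypothetical helper, not part of FunASR, and it only emits the `mcpServers` structure and the two environment variables documented in the table. Defaulting `command` to `sys.executable` is an assumption made here so that the client launches the same interpreter in which funasr is installed.

```python
import json
import sys

VALID_DEVICES = ("cuda", "cpu", "mps")  # values listed for FUNASR_DEVICE


def build_mcp_config(script_path, device="cpu", model=None, python=None):
    """Build an mcpServers entry for the FunASR MCP script (hypothetical helper)."""
    if device not in VALID_DEVICES:
        raise ValueError(f"device must be one of {VALID_DEVICES}, got {device!r}")
    env = {"FUNASR_DEVICE": device}
    if model is not None:
        # Only set FUNASR_MODEL when overriding the documented default.
        env["FUNASR_MODEL"] = model
    return {
        "mcpServers": {
            "funasr": {
                "command": python or sys.executable,
                "args": [script_path],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    config = build_mcp_config("/path/to/examples/mcp_server/funasr_mcp.py", device="cuda")
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the MCP client's configuration file. Where that file lives depends on the client.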

Desktop Voice Input

The voice input example records from your microphone, sends the audio to funasr-server, and pastes the recognized text at the current cursor position.

pip install funasr sounddevice numpy pyperclip openai pynput
funasr-server --device cuda

cd examples/voice_input
python funasr_input.py --server http://localhost:8000/v1 --model sensevoice

Platform	Recording	Paste
macOS	Yes	AppleScript
Linux	Yes	xdotool
Windows	Yes	Manual Ctrl+V if needed
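
The platform table can be read as a small dispatch: pick a paste mechanism by operating system and fall back to a manual paste when none is available. The sketch below illustrates that idea under stated assumptions. The exact commands are common ways to send a paste keystroke with AppleScript and xdotool, not necessarily what funasr_input.py runs, and `paste_command` and `paste_clipboard` are hypothetical names. It also assumes the recognized text is already on the clipboard.

```python
import subprocess
import sys


def paste_command(platform: str):
    """Return the argv that triggers a paste on this platform, or None if manual."""
    if platform == "darwin":
        # AppleScript: send Cmd+V to the frontmost application.
        script = 'tell application "System Events" to keystroke "v" using command down'
        return ["osascript", "-e", script]
    if platform.startswith("linux"):
        # xdotool: send Ctrl+V to the focused window (X11 sessions).
        return ["xdotool", "key", "ctrl+v"]
    # Windows and anything else: the user presses Ctrl+V.
    return None


def paste_clipboard(platform: str = sys.platform) -> bool:
    """Try to paste automatically; return False if the user must paste by hand."""
    cmd = paste_command(platform)
    if cmd is None:
        return False
    try:
        subprocess.run(cmd, check=True, timeout=5)
    except (OSError, subprocess.SubprocessError):
        # Tool missing, permission denied, non-zero exit, or timeout.
        return False
    return True


if __name__ == "__main__":
    if not paste_clipboard():
        print("Text is on the clipboard; press Ctrl+V to paste.")
```

Returning a boolean instead of raising keeps the recording loop running when the paste tool is missing, which matches the manual fallback listed for Windows.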

Subtitle Generator

The subtitle example turns audio or video files into SRT or VTT subtitle files, optionally with speaker labels.

cd examples/subtitle
# Default SRT output
python generate_subtitle.py video.mp4
# Add speaker labels
python generate_subtitle.py meeting.wav --spk
# VTT instead of SRT
python generate_subtitle.py podcast.mp3 --format vtt
# Run on CPU
python generate_subtitle.py audio.wav --device cpu

Option	Default	Description
`--format`	`srt`	`srt` or `vtt`.
`--model`	`iic/SenseVoiceSmall`	ASR model.
`--spk`	off	Add speaker labels.
`--lang`	`auto`	Language hint.
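
The two `--format` values differ mainly in their timestamp separator and header: SRT numbers each cue and writes `HH:MM:SS,mmm`, while VTT starts with a `WEBVTT` line and writes `HH:MM:SS.mmm`. The sketch below shows that difference. It is not the implementation in generate_subtitle.py, and the `(start_seconds, end_seconds, text)` segment tuples are an assumed input shape chosen for illustration.

```python
def format_timestamp(seconds: float, fmt: str = "srt") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def render_subtitles(segments, fmt: str = "srt") -> str:
    """Render (start, end, text) tuples as an SRT or VTT document."""
    if fmt not in ("srt", "vtt"):
        raise ValueError("fmt must be 'srt' or 'vtt'")
    # VTT files begin with a header line followed by a blank line.
    lines = ["WEBVTT", ""] if fmt == "vtt" else []
    for index, (start, end, text) in enumerate(segments, start=1):
        if fmt == "srt":
            lines.append(str(index))  # SRT cues carry a sequence number
        lines.append(f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [(0.0, 1.5, "Hello"), (1.5, 3.25, "Welcome to the meeting")]
    print(render_subtitles(demo, "srt"))
    print(render_subtitles(demo, "vtt"))
```

Working in integer milliseconds before splitting into fields avoids rounding artifacts such as a millisecond field of 1000.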