
Agent Integration

Run speech recognition as local infrastructure for agents, products, and desktop workflows: OpenAI-compatible API, MCP server, voice input, and subtitle generation.

OpenAI-Compatible API Server

funasr-server exposes /v1/audio/transcriptions, /v1/models, and /health, plus Swagger docs at /docs. It works with clients and frameworks that already speak the OpenAI audio transcription API.

pip install funasr fastapi uvicorn python-multipart
funasr-server --device cuda --port 8000

# CPU fallback
funasr-server --device cpu --model sensevoice --port 8000

Model alias | Backend | Best for
sensevoice | iic/SenseVoiceSmall | Fast multilingual ASR with language/emotion/event tags.
paraformer | paraformer-zh + VAD + punctuation | Chinese production transcription.
paraformer-en | paraformer-en + VAD | English transcription.
fun-asr-nano | FunAudioLLM/Fun-ASR-Nano-2512 | 31-language LLM-based ASR with timestamps.
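
Before sending audio, a client can confirm that the server is up and see which aliases it advertises. The following is a minimal sketch using only the Python standard library. It assumes that /health answers with HTTP 200 once the server is ready and that /v1/models returns the OpenAI list shape ({"data": [{"id": ...}]}); neither detail is confirmed by the text above. The helper names endpoint_url, model_ids, and wait_until_healthy are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def endpoint_url(base_url: str, path: str) -> str:
    """Join a server root and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def model_ids(payload: dict) -> list[str]:
    """Extract model names from a /v1/models response.

    Assumption: the payload follows the OpenAI list shape,
    {"data": [{"id": "sensevoice"}, ...]}.
    """
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def wait_until_healthy(base_url: str, timeout_s: float = 60.0,
                       interval_s: float = 1.0) -> bool:
    """Poll /health until it answers with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            url = endpoint_url(base_url, "/health")
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet, for example while the model loads
        time.sleep(interval_s)
    return False


# Usage (requires a running funasr-server):
#   if wait_until_healthy("http://localhost:8000"):
#       url = endpoint_url("http://localhost:8000", "/v1/models")
#       with urllib.request.urlopen(url, timeout=5) as resp:
#           print(model_ids(json.load(resp)))
```

Polling /health before the first request avoids reporting a transcription failure when the server is simply still starting.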

OpenAI SDK and curl

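from openai import OpenAI

# api_key is a placeholder: the SDK requires a non-empty value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Default response: the transcript is available as result.text.
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="sensevoice",
        file=audio,
    )
print(result.text)

# verbose_json additionally returns per-segment detail.
with open("meeting.wav", "rb") as audio:
    verbose = client.audio.transcriptions.create(
        model="sensevoice",
        file=audio,
        response_format="verbose_json",
    )
print(verbose.segments)

# The same verbose_json request with curl
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice \
  -F response_format=verbose_json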
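
The segments returned with verbose_json can be turned into a readable timeline. This sketch assumes each segment carries start, end, and text fields, as in the OpenAI verbose_json format; whether funasr-server fills all three for every model is not stated above. The helper timeline is hypothetical and accepts both plain dicts (parsed curl output) and SDK objects.

```python
from typing import Any, Iterable, Optional


def _field(segment: Any, name: str) -> Any:
    """Read a field from either a dict or an SDK response object."""
    if isinstance(segment, dict):
        return segment[name]
    return getattr(segment, name)


def timeline(segments: Optional[Iterable[Any]]) -> list[str]:
    """Format segments as '[start - end] text' lines.

    Assumption: every segment exposes 'start' and 'end' in seconds
    and a 'text' string.
    """
    lines = []
    for seg in segments or []:
        start = float(_field(seg, "start"))
        end = float(_field(seg, "end"))
        text = str(_field(seg, "text")).strip()
        lines.append(f"[{start:.2f}s - {end:.2f}s] {text}")
    return lines


# Usage with the SDK response from the example above:
#   for line in timeline(verbose.segments):
#       print(line)
```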

Dify, n8n, and workflow engines

For low-code tools that call HTTP nodes or webhook workers, use the workflow recipes. They cover multipart upload settings, Dify custom tools, n8n binary file fields, audio URL workers, timeouts, and production guardrails. For Node.js, TypeScript, or Next.js services, start from the JavaScript/TypeScript recipes.

For GUI smoke tests, import the Postman collection. For CLI checks on Windows, Linux, or macOS, run the Python smoke test. For browser upload and microphone checks, launch the Gradio demo. For schema-driven imports, use the OpenAPI spec. Before sharing the service, review the security and gateway guide.
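
When a workflow engine's HTTP node needs its multipart settings filled in by hand, it helps to see what the request body looks like on the wire. The sketch below builds the same multipart/form-data payload as the curl example (fields file, model, and response_format) with the standard library only. It is an illustration of the encoding, not a copy of any recipe; build_multipart and post_audio are hypothetical helpers, and the 120-second timeout is an arbitrary choice.

```python
import json
import os
import urllib.request
import uuid
from typing import Optional


def build_multipart(fields: dict[str, str], file_name: str, file_bytes: bytes,
                    boundary: Optional[str] = None) -> tuple[bytes, str]:
    """Return (body, content_type) for a form with text fields and one file."""
    boundary = boundary or uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The file part uses the field name "file", matching `-F file=@audio.wav`.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8"))
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_audio(url: str, audio_path: str, model: str = "sensevoice",
               response_format: str = "verbose_json",
               timeout_s: float = 120.0) -> dict:
    """POST an audio file and return the parsed JSON response."""
    with open(audio_path, "rb") as fh:
        audio = fh.read()
    body, content_type = build_multipart(
        {"model": model, "response_format": response_format},
        os.path.basename(audio_path),
        audio,
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)


# Usage (requires a running funasr-server):
#   post_audio("http://localhost:8000/v1/audio/transcriptions", "audio.wav")
```

In an HTTP node the same three settings apply: a POST with a multipart/form-data body, the audio attached under the field name file, and model and response_format sent as plain text fields.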


MCP Server

The MCP server gives Claude Code, Claude Desktop, Cursor, Windsurf, and other MCP clients a local transcribe_audio tool.

pip install funasr
python examples/mcp_server/funasr_mcp.py

Then register the server in your MCP client's configuration:

{
  "mcpServers": {
    "funasr": {
      "command": "python",
      "args": ["/path/to/examples/mcp_server/funasr_mcp.py"],
      "env": {"FUNASR_DEVICE": "cuda"}
    }
  }
}

Variable | Default | Description
FUNASR_DEVICE | cpu | cuda, cpu, or mps.
FUNASR_MODEL | iic/SenseVoiceSmall | ASR model used by the MCP tool.
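
If the client configuration is generated by a setup script, the entry can be built and validated in code. This is a small sketch that mirrors the JSON shown above and the two variables in the table. mcp_server_entry is a hypothetical helper, and rejecting any device other than cuda, cpu, or mps is an assumption based on the table, not on the behavior of funasr_mcp.py.

```python
import json

# Values listed for FUNASR_DEVICE in the table above.
ALLOWED_DEVICES = {"cuda", "cpu", "mps"}


def mcp_server_entry(script_path: str, device: str = "cpu",
                     model: str = "iic/SenseVoiceSmall") -> dict:
    """Build an mcpServers entry that launches the FunASR MCP script."""
    if device not in ALLOWED_DEVICES:
        raise ValueError(
            f"unsupported device {device!r}; expected one of {sorted(ALLOWED_DEVICES)}"
        )
    return {
        "mcpServers": {
            "funasr": {
                "command": "python",
                "args": [script_path],
                "env": {"FUNASR_DEVICE": device, "FUNASR_MODEL": model},
            }
        }
    }


# Usage: print the entry so it can be merged into the client's config file.
#   entry = mcp_server_entry("/path/to/examples/mcp_server/funasr_mcp.py", device="cuda")
#   print(json.dumps(entry, indent=2))
```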

Desktop Voice Input

The voice input example records from your microphone, sends audio to funasr-server, and pastes recognized text into the current cursor position.

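pip install funasr sounddevice numpy pyperclip openai pynput

# Terminal 1: start the server (it keeps running in the foreground)
funasr-server --device cuda

# Terminal 2: start the voice input client
cd examples/voice_input
python funasr_input.py --server http://localhost:8000/v1 --model sensevoice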

Platform | Recording | Paste
macOS | Yes | AppleScript
Linux | Yes | xdotool
Windows | Yes | Manual Ctrl+V if needed
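
The per-platform paste step in the table can be pictured as a small dispatch on sys.platform. The sketch below is one possible way to do it and is not taken from funasr_input.py: the exact osascript and xdotool invocations are assumptions, and paste_command and paste_clipboard are hypothetical names. It expects the recognized text to be on the clipboard already (for example via pyperclip).

```python
import subprocess
import sys
from typing import Optional


def paste_command(platform: str = sys.platform) -> Optional[list[str]]:
    """Return the command that triggers a paste keystroke, or None if manual."""
    if platform == "darwin":
        # AppleScript: send Cmd+V to the frontmost application.
        script = 'tell application "System Events" to keystroke "v" using command down'
        return ["osascript", "-e", script]
    if platform.startswith("linux"):
        # xdotool: send Ctrl+V to the focused window (X11).
        return ["xdotool", "key", "ctrl+v"]
    # Windows and anything else: leave the text on the clipboard.
    return None


def paste_clipboard(platform: str = sys.platform) -> bool:
    """Try to paste automatically; return False when the user must press Ctrl+V."""
    command = paste_command(platform)
    if command is None:
        return False
    try:
        subprocess.run(command, check=True)
    except (OSError, subprocess.CalledProcessError):
        return False  # tool missing or not permitted; fall back to manual paste
    return True
```

Returning False instead of raising keeps the fallback simple: the text stays on the clipboard and the user pastes it by hand.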

Subtitle Generator

The subtitle example turns audio or video files into SRT or VTT subtitles, optionally with speaker labels.

cd examples/subtitle
python generate_subtitle.py video.mp4                  # SRT (default format) from a video file
python generate_subtitle.py meeting.wav --spk          # add speaker labels
python generate_subtitle.py podcast.mp3 --format vtt   # VTT output
python generate_subtitle.py audio.wav --device cpu     # run on CPU

Option | Default | Description
--format | srt | srt or vtt.
--model | iic/SenseVoiceSmall | ASR model.
--spk | off | Add speaker labels.
--lang | auto | Language hint.
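
The two output formats differ mainly in the timestamp separator and the file header. The sketch below shows how timed segments could be rendered as SRT or VTT cues. It is an illustration of the formats, not the code in generate_subtitle.py; the segment keys start, end, text, and the optional speaker are assumed names, as is the "[speaker] text" label style.

```python
def format_timestamp(seconds: float, fmt: str = "srt") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    separator = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def render_subtitles(segments: list[dict], fmt: str = "srt") -> str:
    """Render segments as SRT or VTT text.

    Assumption: each segment is a dict with 'start' and 'end' in seconds,
    a 'text' string, and optionally a 'speaker' label.
    """
    if fmt not in ("srt", "vtt"):
        raise ValueError("fmt must be 'srt' or 'vtt'")
    blocks = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        if seg.get("speaker") is not None:
            text = f"[{seg['speaker']}] {text}"
        start = format_timestamp(seg["start"], fmt)
        end = format_timestamp(seg["end"], fmt)
        cue = f"{start} --> {end}\n{text}"
        # SRT cues are numbered; VTT cue identifiers are optional and omitted here.
        blocks.append(f"{index}\n{cue}" if fmt == "srt" else cue)
    body = "\n\n".join(blocks) + "\n"
    return body if fmt == "srt" else "WEBVTT\n\n" + body
```

Converting to integer milliseconds before splitting into hours, minutes, and seconds avoids rounding a value such as 59.9996 up to an invalid "60" in the seconds field.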