VoiceMode MCP

mbailey/voicemode

VoiceMode MCP brings natural conversations to Claude Code. It works with any OpenAI API-compatible voice service and can install free, open source local services (Whisper.cpp and Kokoro-FastAPI).

CLAUDE.md


This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Voice Interaction

Load the voicemode skill for voice conversation support: /voicemode:voicemode

Project Overview

VoiceMode is a Python package that provides voice interaction capabilities for AI assistants through the Model Context Protocol (MCP). It enables natural voice conversations with Claude Code and other AI coding assistants by integrating speech-to-text (STT) and text-to-speech (TTS) services.

Key Commands

Development & Testing

# Install in development mode with dependencies
make dev-install

# Run all unit tests
make test
# Or directly: uv run pytest tests/ -v --tb=short

# Run specific test
uv run pytest tests/test_voice_mode.py -v

# Clean build artifacts
make clean

Building & Publishing

# Build Python package
make build-package

# Build development version (auto-versioned)
make build-dev  

# Test package installation
make test-package

# Release workflow (bumps version, tags, pushes)
make release

Documentation

# Serve docs locally at http://localhost:8000
make docs-serve

# Build documentation site
make docs-build

# Check docs for errors (strict mode)
make docs-check

Architecture Overview

Core Components

  1. MCP Server (voice_mode/server.py)

    • FastMCP-based server providing voice tools via stdio transport
    • Auto-imports all tools, prompts, and resources
    • Handles FFmpeg availability checks and logging setup
  2. Tool System (voice_mode/tools/)

    • converse.py: Primary voice conversation tool with TTS/STT integration
    • service.py: Unified service management for Whisper/Kokoro
    • providers.py: Provider discovery and registry management
    • devices.py: Audio device detection and management
    • Services subdirectory contains install/uninstall tools for Whisper and Kokoro
    • See Tool Loading Architecture for internal details
  3. Provider System (voice_mode/providers.py)

    • Dynamic discovery of OpenAI-compatible TTS/STT endpoints
    • Health checking and failover support
    • Maintains registry of available voice services
  4. Configuration (voice_mode/config.py)

    • Environment-based configuration with sensible defaults
    • Support for voice preference files (project/user level)
    • Audio format configuration (PCM, MP3, WAV, FLAC, AAC, Opus)
  5. Resources (voice_mode/resources/)

    • MCP resources exposed for client access
    • Statistics, configuration, changelog, and version information
    • Whisper model management
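
The server's auto-import behavior (component 1) can be sketched with standard-library machinery. This is an illustrative sketch, not the actual server code; it is demonstrated on a stdlib package since voice_mode may not be installed:

```python
import importlib
import pkgutil

def auto_import(package):
    """Import every submodule of a package, the way the server pulls in
    everything under voice_mode/tools/ (illustrative, not the real code)."""
    modules = {}
    for info in pkgutil.iter_modules(package.__path__):
        name = f"{package.__name__}.{info.name}"
        modules[info.name] = importlib.import_module(name)
    return modules

import json
print(sorted(auto_import(json)))  # e.g. ['decoder', 'encoder', 'scanner', 'tool']
```

Registering tools by scanning a directory means adding a new tool file requires no changes to the server module itself.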

Service Architecture

The project supports multiple voice service backends:

  • OpenAI API: Cloud-based TTS/STT (requires API key)
  • Whisper.cpp: Local speech-to-text service
  • Kokoro: Local text-to-speech with multiple voices

Services can be installed and managed through MCP tools, with automatic service discovery and health checking.
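
Because every backend speaks the OpenAI API, switching providers only changes the base URL, never the request shape. A minimal sketch of building an `/audio/speech` request (the port, model, and voice names here are assumptions, not VoiceMode configuration; Kokoro-FastAPI commonly serves on 8880, but your install may differ):

```python
import json

# Hypothetical endpoint table; only the base URL varies per provider.
ENDPOINTS = {
    "openai": "https://api.openai.com/v1",
    "kokoro": "http://127.0.0.1:8880/v1",
}

def speech_request(base_url, text, voice="alloy"):
    """Build an OpenAI-style /audio/speech request payload."""
    return {
        "url": f"{base_url}/audio/speech",
        "body": json.dumps({"model": "tts-1", "input": text, "voice": voice}),
    }

for name, base in ENDPOINTS.items():
    print(name, speech_request(base, "Hello from VoiceMode")["url"])
```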

Key Design Patterns

  1. OpenAI API Compatibility: All voice services expose OpenAI-compatible endpoints, enabling transparent switching between providers
  2. Dynamic Tool Discovery: Tools are auto-imported from the tools directory structure
  3. Failover Support: Automatic fallback between services based on availability
  4. Local Microphone Transport: Direct audio capture via PyAudio for voice interactions
  5. Audio Format Negotiation: Automatic format validation against provider capabilities
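
Pattern 3 (failover) amounts to trying providers in preference order and taking the first healthy one. A minimal sketch, with hypothetical names and URLs rather than VoiceMode's internal registry:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    """Illustrative provider entry; field names are assumptions."""
    name: str
    base_url: str
    healthy: bool

def select_provider(providers):
    """Return the first healthy provider, or fail if none respond."""
    for p in providers:
        if p.healthy:
            return p
    raise RuntimeError("no healthy TTS/STT provider available")

providers = [
    Provider("kokoro", "http://127.0.0.1:8880/v1", healthy=False),  # local TTS down
    Provider("openai", "https://api.openai.com/v1", healthy=True),  # cloud fallback
]
print(select_provider(providers).name)  # falls back to "openai"
```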

Development Notes

  • The project uses uv for package management (not pip directly)
  • Python 3.10+ is required
  • FFmpeg is required for audio processing
  • The project follows a modular architecture with FastMCP patterns
  • Service installation tools handle platform-specific setup (launchd on macOS, systemd on Linux)
  • Event logging and conversation logging are available for debugging
  • WebRTC VAD is used for silence detection when available
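
The silence-detection idea can be illustrated with a simple energy threshold over 16-bit PCM frames. WebRTC VAD is considerably more sophisticated; this sketch is not VoiceMode's code, and the threshold value is arbitrary:

```python
import math
import struct

def rms(frame: bytes) -> float:
    """Root-mean-square energy of a frame of 16-bit little-endian PCM."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_silence(frame: bytes, threshold: float = 500.0) -> bool:
    """Crude stand-in for VAD: below-threshold energy counts as silence."""
    return rms(frame) < threshold

# A 10 ms frame of digital silence at 16 kHz (160 samples):
print(is_silence(b"\x00\x00" * 160))  # True
```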

Testing

  • Unit tests: tests/ - run with make test
  • Manual tests: tests/manual/ - require user interaction

Logging

Logs are stored in ~/.voicemode/:

  • logs/conversations/ - Voice exchange history (JSONL)
  • logs/events/ - Operational events and errors
  • audio/ - Saved TTS/STT audio files
  • voicemode.env - User configuration
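
Since the conversation logs are JSONL (one JSON object per line), they are easy to inspect programmatically. A small reader sketch; the field names inside each entry are VoiceMode-specific, so inspect a real file under ~/.voicemode/logs/conversations/ rather than relying on guesses:

```python
import json
from pathlib import Path

def read_jsonl(path):
    """Yield one dict per non-blank line of a JSONL log file."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

log_dir = Path.home() / ".voicemode" / "logs" / "conversations"
if log_dir.exists():
    for f in sorted(log_dir.glob("*.jsonl")):
        for entry in read_jsonl(f):
            print(entry)
```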

VoiceMode Suite

This is the core Python package. VoiceMode is a suite of related projects:

Quick reference:

  • voicemode (this repo) - Python MCP server for local voice mode
  • voicemode-dev - Cloudflare Workers backend for voicemode.dev
  • voicemode-ios - Native iOS app
  • voicemode-macos - Native macOS app
  • voicemode-meta - Project coordination and operations

README.md

VoiceMode

Natural voice conversations with Claude Code (and other MCP-capable agents)


VoiceMode enables natural voice conversations with Claude Code. Voice isn't about replacing typing - it's about being available when typing isn't.

Perfect for:

  • Walking to your next meeting
  • Cooking while debugging
  • Giving your eyes a break after hours of screen time
  • Holding a coffee (or a dog)
  • Any moment when your hands or eyes are busy

See It In Action

VoiceMode Demo

Quick Start

Requirements: Computer with microphone and speakers

Option 1: Claude Code Plugin (Recommended)

The fastest way for Claude Code users to get started:

# Add the VoiceMode marketplace
claude plugin marketplace add mbailey/voicemode

# Install VoiceMode plugin
claude plugin install voicemode@voicemode

# Install dependencies (CLI, local voice services)

/voicemode:install

# Start talking!
/voicemode:converse

Option 2: Python installer package

Installs dependencies and the VoiceMode Python package.

# Install UV package manager (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Run the installer (sets up dependencies and local voice services)
uvx voice-mode-install

# Add to Claude Code
claude mcp add --scope user voicemode -- uvx --refresh voice-mode

# Optional: Add OpenAI API key as fallback for local services
export OPENAI_API_KEY=your-openai-key

# Start a conversation
claude converse

For manual setup, see the Getting Started Guide.

Features

  • Natural conversations - speak naturally, hear responses immediately
  • Works offline - optional local voice services (Whisper STT, Kokoro TTS)
  • Low latency - fast enough to feel like a real conversation
  • Smart silence detection - stops recording when you stop speaking
  • Privacy options - run entirely locally or use cloud services

Compatibility

Platforms: Linux, macOS, Windows (WSL), NixOS
Python: 3.10-3.14

Configuration

VoiceMode works out of the box. For customization:

# Set OpenAI API key (if using cloud services)
export OPENAI_API_KEY="your-key"

# Or configure via file
voicemode config edit

See the Configuration Guide for all options.

Permissions Setup (Optional)

To use VoiceMode without permission prompts, add to ~/.claude/settings.json:

{
  "permissions": {
    "allow": [
      "mcp__voicemode__converse",
      "mcp__voicemode__service"
    ]
  }
}

See the Permissions Guide for more options.

Local Voice Services

For privacy or offline use, install local speech services:

  • Whisper.cpp - Local speech-to-text
  • Kokoro - Local text-to-speech with multiple voices

These provide the same API as OpenAI, so VoiceMode switches seamlessly between them.

Installation Details

Ubuntu/Debian

sudo apt update
sudo apt install -y ffmpeg gcc libasound2-dev libasound2-plugins libportaudio2 portaudio19-dev pulseaudio pulseaudio-utils python3-dev

WSL2 users: The pulseaudio packages above are required for microphone access.

Fedora/RHEL

sudo dnf install alsa-lib-devel ffmpeg gcc portaudio portaudio-devel python3-devel

macOS

brew install ffmpeg node portaudio

NixOS

# Use development shell
nix develop github:mbailey/voicemode

# Or install system-wide
nix profile install github:mbailey/voicemode

From source

git clone https://github.com/mbailey/voicemode.git
cd voicemode
uv tool install -e .

NixOS system-wide

# In /etc/nixos/configuration.nix
environment.systemPackages = [
  (builtins.getFlake "github:mbailey/voicemode").packages.${pkgs.system}.default
];

Troubleshooting

  • No microphone access: check terminal/app permissions. WSL2 needs the pulseaudio packages.
  • UV not found: run curl -LsSf https://astral.sh/uv/install.sh | sh
  • OpenAI API error: verify OPENAI_API_KEY is set correctly.
  • No audio output: check system audio settings and available devices.

Save Audio for Debugging

export VOICEMODE_SAVE_AUDIO=true
# Files saved to ~/.voicemode/audio/YYYY/MM/

Documentation

Full documentation: voice-mode.readthedocs.io

License

MIT - A Failmode Project


mcp-name: com.failmode/voicemode