ruslan/subtitles

Fork 0

Live RU→UZ subtitle overlay for Mars IT School — streaming Whisper + Gemini translation, always-on-top overlay (macOS). Fork-origin: Vanyoo/realtime-subtitle (MIT).

HTML 62.2%
Python 35.4%
Shell 1.7%
JavaScript 0.7%

Find a file

Ruslan faf1b56654 playbook: add 'App roadmap' section (quality/latency + packaging plan) New #app section with two cards — docs/ROADMAP.md priorities (P0/P1) and docs/PACKAGING.md phases (0–4, incl. the config-in-bundle blocker and the Gemini-key/signing open decisions). Wired into pnav + uz/ru/en i18n.		2026-05-12 18:52:30 +05:00
assets	Add first-run model download + macOS packaging scaffolding	2026-05-12 18:46:23 +05:00
demo	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
docs	Add first-run model download + macOS packaging scaffolding	2026-05-12 18:46:23 +05:00
packaging	Add first-run model download + macOS packaging scaffolding	2026-05-12 18:46:23 +05:00
web	playbook: add 'App roadmap' section (quality/latency + packaging plan)	2026-05-12 18:52:30 +05:00
.env.example	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
.gitignore	Add first-run model download + macOS packaging scaffolding	2026-05-12 18:46:23 +05:00
agreement.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
audio_capture.py	feat	2025-12-11 14:27:57 +08:00
CLAUDE.md	Add first-run model download + macOS packaging scaffolding	2026-05-12 18:46:23 +05:00
config.ini.example	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
config.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
gemini_reconcile.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
glossary.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
hotkeys.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
install_mac.sh	Add first-run model download + macOS packaging scaffolding	2026-05-12 18:46:23 +05:00
LICENSE	feat	2025-12-10 18:10:13 +08:00
main.py	Add first-run model download + macOS packaging scaffolding	2026-05-12 18:46:23 +05:00
menubar.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
model_setup.py	Add first-run model download + macOS packaging scaffolding	2026-05-12 18:46:23 +05:00
overlay_window.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
README.md	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
recorder.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
reloader.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
requirements.txt	Add first-run model download + macOS packaging scaffolding	2026-05-12 18:46:23 +05:00
settings_dialog.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
start_mac.sh	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
transcriber.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00
translator.py	Reorganize: drop upstream legacy, Mars RU→UZ cleanup	2026-05-12 13:52:33 +05:00

README.md

Subtitles — live RU→UZ subtitle overlay

Real-time subtitles for spoken Russian, translated to Uzbek (or any other language), shown in an always-on-top, click-through overlay window. macOS / Apple Silicon.

Originally a fork of Vanyoo/realtime-subtitle (MIT); now developed independently for Mars IT School (live-translate Russian lessons → Uzbek for students).

mic / system audio (BlackHole)
   → AudioCapture (sounddevice, 16 kHz mono, chunks + VAD)
   → Whisper (MLX whisper-large-v3-turbo) + domain glossary prompt
   → LocalAgreement-2 (confirmed prefix vs. hypothesis tail)
   → Translator (Gemini 2.5 Flash, RU→UZ)
   → OverlayWindow (compact rolling, bilingual, bottom-center, click-through)

End-to-end latency ≈ 2–3 s on Apple Silicon. Two-tier UX: instant hypothesis text (grey italic) → confirmed text (solid) once two ASR passes agree.

Install (macOS)

brew install uv ffmpeg          # uv is required; ffmpeg is used by the ASR backends
brew install blackhole-2ch      # optional — needed to caption system audio (Zoom/video), not just the mic
./install_mac.sh                # creates .venv, installs deps, seeds .env

Put API keys in .env (gitignored — install_mac.sh seeds it from ~/voice/.env if present):

GEMINI_API_KEY=...              # Gemini, the translation backend
OPENAI_API_KEY=...              # only if [translation] provider = openai
# GEMINI_MODEL=gemini-2.5-flash # optional override

First run downloads the Whisper model (~1.5 GB for large-v3-turbo) to ~/.cache/huggingface/hub/.

Run

uv run python main.py     # overlay + menu-bar item + global hotkeys
./start_mac.sh            # same, with hot reload on .py/.ini changes (dev)

Control via the menu-bar item (🎙 / 🟢 / 🟡), the overlay hover-bar, or hotkeys: ⌘⇧S pause · ⌘⇧H hide overlay · ⌘⇧C toggle click-through · ⌘⇧Q quit. Drag the overlay to reposition it. Global hotkeys need macOS Accessibility permission.

Configuration

Settings live in config.ini (see config.ini.example for the annotated reference) and can also be edited from the in-app Settings window. Most are read at startup, so changing the model / language / audio params needs a restart.

Key knobs: [translation] provider/model/target_lang, [transcription] backend/whisper_model/source_language, [audio] silence_threshold/silence_duration/max_phrase_duration/update_interval, [overlay] width/max_blocks/show_original.

To capture system audio instead of the mic: install BlackHole, create a Multi-Output Device in Audio MIDI Setup (BlackHole 2ch + your speakers), set the app's output to it. The app auto-detects the BlackHole input.

Layout

main.py            entry point (pipeline + overlay + menu bar + hotkeys)
config.py          config.ini / .env loader
audio_capture.py   sounddevice capture, VAD, streaming chunks
transcriber.py     ASR: MLX Whisper (default) or faster-whisper; hallucination/confidence filters
agreement.py       LocalAgreement-2 (confirmed prefix vs. hypothesis)
glossary.py        Mars terms/names + sliding context window → Whisper initial_prompt
translator.py      Gemini (default) / OpenAI translation, RU→UZ prompt, reasoning-dump sanitiser
overlay_window.py  compact rolling overlay, hover control-bar, click-through
menubar.py         NSStatusItem menu-bar control
hotkeys.py         global hotkeys (pynput)
recorder.py        per-phrase audio + transcript dump (debug/archive → recordings/)
settings_dialog.py in-app settings window
gemini_reconcile.py  Stage-3 stub (Gemini audio second-pass + reconcile) — not wired in yet
reloader.py        dev hot-reload (used by start_mac.sh)

docs/              ROADMAP.md (improvement plan) · STARTUP_SPEC.md · PITCH.md
web/               landing / investor pages
demo/              setup screenshots (Audio MIDI / BlackHole)

Status & roadmap

Working prototype, validated on real Russian webinar speech → Uzbek. See docs/ROADMAP.md for the prioritised plan (config hot-reload, recordings cap, cloud streaming ASR, sentence-boundary finalisation, term-maps, Stage-3 reconcile, app bundle, …) and docs/STARTUP_SPEC.md / docs/PITCH.md for the product/startup framing.

README.md Unescape Escape