- HTML 62.2%
- Python 35.4%
- Shell 1.7%
- JavaScript 0.7%
New #app section with two cards — docs/ROADMAP.md priorities (P0/P1) and docs/PACKAGING.md phases (0–4, incl. the config-in-bundle blocker and the Gemini-key/signing open decisions). Wired into pnav + uz/ru/en i18n. |
||
|---|---|---|
| assets | ||
| demo | ||
| docs | ||
| packaging | ||
| web | ||
| .env.example | ||
| .gitignore | ||
| agreement.py | ||
| audio_capture.py | ||
| CLAUDE.md | ||
| config.ini.example | ||
| config.py | ||
| gemini_reconcile.py | ||
| glossary.py | ||
| hotkeys.py | ||
| install_mac.sh | ||
| LICENSE | ||
| main.py | ||
| menubar.py | ||
| model_setup.py | ||
| overlay_window.py | ||
| README.md | ||
| recorder.py | ||
| reloader.py | ||
| requirements.txt | ||
| settings_dialog.py | ||
| start_mac.sh | ||
| transcriber.py | ||
| translator.py | ||
Subtitles — live RU→UZ subtitle overlay
Real-time subtitles for spoken Russian, translated to Uzbek (or any other language), shown in an always-on-top, click-through overlay window. macOS / Apple Silicon.
Originally a fork of Vanyoo/realtime-subtitle (MIT); now developed independently for Mars IT School (live-translate Russian lessons → Uzbek for students).
mic / system audio (BlackHole)
→ AudioCapture (sounddevice, 16 kHz mono, chunks + VAD)
→ Whisper (MLX whisper-large-v3-turbo) + domain glossary prompt
→ LocalAgreement-2 (confirmed prefix vs. hypothesis tail)
→ Translator (Gemini 2.5 Flash, RU→UZ)
→ OverlayWindow (compact rolling, bilingual, bottom-center, click-through)
End-to-end latency ≈ 2–3 s on Apple Silicon. Two-tier UX: instant hypothesis text (grey italic) → confirmed text (solid) once two ASR passes agree.
Install (macOS)
brew install uv ffmpeg # uv is required; ffmpeg is used by the ASR backends
brew install blackhole-2ch # optional — needed to caption system audio (Zoom/video), not just the mic
./install_mac.sh # creates .venv, installs deps, seeds .env
Put API keys in .env (gitignored — install_mac.sh seeds it from ~/voice/.env if present):
GEMINI_API_KEY=... # Gemini, the translation backend
OPENAI_API_KEY=... # only if [translation] provider = openai
# GEMINI_MODEL=gemini-2.5-flash # optional override
First run downloads the Whisper model (~1.5 GB for large-v3-turbo) to ~/.cache/huggingface/hub/.
Run
uv run python main.py # overlay + menu-bar item + global hotkeys
./start_mac.sh # same, with hot reload on .py/.ini changes (dev)
Control via the menu-bar item (🎙 / 🟢 / 🟡), the overlay hover-bar, or hotkeys:
⌘⇧S pause · ⌘⇧H hide overlay · ⌘⇧C toggle click-through · ⌘⇧Q quit.
Drag the overlay to reposition it. Global hotkeys need macOS Accessibility permission.
Configuration
Settings live in config.ini (see config.ini.example for the annotated reference) and can
also be edited from the in-app Settings window. Most are read at startup, so changing the
model / language / audio params needs a restart.
Key knobs: [translation] provider/model/target_lang, [transcription] backend/whisper_model/source_language,
[audio] silence_threshold/silence_duration/max_phrase_duration/update_interval, [overlay] width/max_blocks/show_original.
To capture system audio instead of the mic: install BlackHole, create a Multi-Output Device in Audio MIDI Setup (BlackHole 2ch + your speakers), set the app's output to it. The app auto-detects the BlackHole input.
Layout
main.py entry point (pipeline + overlay + menu bar + hotkeys)
config.py config.ini / .env loader
audio_capture.py sounddevice capture, VAD, streaming chunks
transcriber.py ASR: MLX Whisper (default) or faster-whisper; hallucination/confidence filters
agreement.py LocalAgreement-2 (confirmed prefix vs. hypothesis)
glossary.py Mars terms/names + sliding context window → Whisper initial_prompt
translator.py Gemini (default) / OpenAI translation, RU→UZ prompt, reasoning-dump sanitiser
overlay_window.py compact rolling overlay, hover control-bar, click-through
menubar.py NSStatusItem menu-bar control
hotkeys.py global hotkeys (pynput)
recorder.py per-phrase audio + transcript dump (debug/archive → recordings/)
settings_dialog.py in-app settings window
gemini_reconcile.py Stage-3 stub (Gemini audio second-pass + reconcile) — not wired in yet
reloader.py dev hot-reload (used by start_mac.sh)
docs/ ROADMAP.md (improvement plan) · STARTUP_SPEC.md · PITCH.md
web/ landing / investor pages
demo/ setup screenshots (Audio MIDI / BlackHole)
Status & roadmap
Working prototype, validated on real Russian webinar speech → Uzbek. See docs/ROADMAP.md for
the prioritised plan (config hot-reload, recordings cap, cloud streaming ASR, sentence-boundary
finalisation, term-maps, Stage-3 reconcile, app bundle, …) and docs/STARTUP_SPEC.md /
docs/PITCH.md for the product/startup framing.
License
MIT. Original work © 2025 Van; subsequent changes © Mars IT School. See LICENSE.