Standalone Android ASR · May 2026

Transcriber

On-device speech intelligence for Android. Record, transcribe, translate, and clean up without leaving the phone.

Gemma 4 · Whisper.cpp · Apache-2.0 · Sideload APK
100% Local inference — Whisper.cpp and Gemma 4 run entirely on the device's GPU and CPU.
0 Network calls in a run. Audio, transcripts, and sidecars never leave the phone. Sharing is opt-in.
2.6 GB Default model: Gemma 4 E2B weights, fetched once. Bigger Whisper and embedding models are optional.
2 ASR engines side by side — Gemma 4 E2B and Whisper.cpp. Pick one per recording.
4 Languages anchored against drift — Arabic, Ukrainian, English, Dutch. Multi-select, constrained auto-detect.
84 ms Library search across every segment of ~1000 recordings. Sub-100 ms, case-insensitive.

From mic to clean transcript, in one pass.

Capture
16 kHz mono PCM
Mic or imported WAV / MP3 / M4A / AAC / OGG / FLAC
Segment
VAD silence scan
Cut points snap to pauses ≥ 250 ms inside ±2 s
Recognize
Gemma 4 E2B or Whisper.cpp
Streamed token by token, single long-lived engine
Diarize
sherpa-onnx clusters
Reconciled with Gemma's inline Speaker N: labels
Post-process
Summary, rewrite, clean, translate
Same Gemma engine, four prompt presets
Persist
Room DB + sidecars
.txt · .srt · .json · .speakers.json beside the audio file
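
The Segment step can be illustrated with a short sketch. It is not the app's code: the function name, the pause representation, cutting at the pause midpoint, and the hard-cut fallback are all assumptions made for illustration. Only the 250 ms minimum pause and the ±2 s window come from the description above.

```python
from typing import List, Optional, Tuple

Pause = Tuple[int, int]  # (start_ms, end_ms) of one detected silence (assumed shape)


def snap_cut_point(
    target_ms: int,
    pauses: List[Pause],
    min_pause_ms: int = 250,   # from the page: pauses >= 250 ms
    window_ms: int = 2000,     # from the page: inside +/- 2 s
) -> int:
    """Return the cut position closest to target_ms that falls in a long-enough pause.

    Assumption: the cut is placed at the midpoint of the chosen pause, and the
    target itself is used (a hard cut) when no pause qualifies.
    """
    best: Optional[int] = None
    for start, end in pauses:
        if end - start < min_pause_ms:
            continue  # too short to count as a pause
        mid = (start + end) // 2
        if abs(mid - target_ms) > window_ms:
            continue  # outside the snapping window
        if best is None or abs(mid - target_ms) < abs(best - target_ms):
            best = mid
    return target_ms if best is None else best


if __name__ == "__main__":
    pauses = [(26000, 26100), (27200, 27600), (29500, 30100)]
    print(snap_cut_point(28000, pauses))
```

With these sample pauses, the 100 ms gap is ignored and the cut moves from 28 000 ms to the middle of the 400 ms pause, the closest qualifying one.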

Two engines, one app. Switch per recording.

Gemma 4 E2B · LiteRT-LM 0.11 · Default
Transcribe, translate, and diarize from one weight set.
Size
2.59 GB on disk · 676 MB GPU footprint
Best for
Dialectal Arabic, code-switching, Ukrainian
Strengths
Holds Gulf Arabic without collapsing to Modern Standard Arabic (MSA). Streaming decode keeps peak heap at ~6 MB regardless of file length.
Whisper.cpp · ggml
The classic open-source baseline.
Models
tiny (75 MB) · small (466 MB) · large-v3-turbo q5_0 (574 MB)
Best for
Predictable latency · live transcription on weak devices
Strengths
Faster cold start. Pairs with sherpa-onnx for stable speaker IDs across hour-long meetings.

Including the ones that usually drift to English.

Arabic
العربية
Gulf / Qatari dialect anchored in the prompt — won't collapse to MSA at temperature 0.1.
Ukrainian
Українська
Held against Russian by the same dialect-name-twice template Google's audio cookbook recommends.
English
English
US, UK, and accented variants. The baseline that everything is checked against.
Dutch
Nederlands
NL and BE. Domain vocabulary packs propagate as Gemma spelling hints.

Multi-select picker. Empty = full auto-detect. One = forced. Multiple = constrained auto — faster and more accurate than letting Gemma consider 100+ languages.
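
The three picker states map onto a small decision function. The sketch below is illustrative only: the function name, the mode strings, and the use of language codes are assumptions, and it deliberately stops short of building any prompt, since the page does not describe the prompt text.

```python
from typing import Iterable, List, Tuple


def detection_mode(selected: Iterable[str]) -> Tuple[str, List[str]]:
    """Map the picker selection to a detection mode.

    Empty    -> "auto"        (full auto-detect)
    One      -> "forced"      (that language only)
    Multiple -> "constrained" (auto-detect among the selected languages)
    """
    langs = list(dict.fromkeys(selected))  # drop duplicates, keep pick order
    if not langs:
        return ("auto", [])
    if len(langs) == 1:
        return ("forced", langs)
    return ("constrained", langs)


if __name__ == "__main__":
    print(detection_mode([]))
    print(detection_mode(["ar"]))
    print(detection_mode(["uk", "en"]))
```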

Three ways to figure out who said what.

01
Gemma-only
Prompt-based Speaker N: labels emitted inline by Gemma 4. No extra download.
Zero install
02
Hybrid
A sherpa-onnx pre-pass clusters speakers globally. Cluster IDs become per-chunk hints for Gemma and are used to reconcile its inline labels at write time.
Recommended
03
Whisper + sherpa
Run Whisper.cpp end-to-end, then assign speakers from sherpa's cluster output. Stable IDs across hour-long files.
Globally consistent

Bonus: when someone says "Hi, I'm Ahmed" in a segment, the detected name propagates to every segment with the same speaker key.
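
Name propagation can be sketched as a two-pass walk over the segments. Everything here is an assumption for illustration: the segment shape, the `speaker_key` and `speaker` field names, and above all the regular expression, which only catches a few English self-introductions. How the app actually detects names is not described on this page.

```python
import re
from typing import Dict, List

# Hypothetical pattern: "I'm X", "I am X", "my name is X" with a capitalised name.
NAME_RE = re.compile(r"\b(?:I['’]m|I am|[Mm]y name is)\s+([A-Z][a-z]+)")


def propagate_names(segments: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Attach a display name to every segment that shares a speaker key."""
    names: Dict[str, str] = {}
    # Pass 1: remember the first self-introduction seen for each speaker key.
    for seg in segments:
        match = NAME_RE.search(seg["text"])
        if match:
            names.setdefault(seg["speaker_key"], match.group(1))
    # Pass 2: label every segment, falling back to the raw key when no name is known.
    return [
        {**seg, "speaker": names.get(seg["speaker_key"], seg["speaker_key"])}
        for seg in segments
    ]


if __name__ == "__main__":
    demo = [
        {"speaker_key": "S1", "text": "Shall we start?"},
        {"speaker_key": "S2", "text": "Sure."},
        {"speaker_key": "S1", "text": "Hi, I'm Ahmed."},
    ]
    for seg in propagate_names(demo):
        print(seg["speaker"], ":", seg["text"])
```

Because names are collected before any segment is labelled, an introduction late in the recording still renames that speaker's earlier segments.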

Four presets that read everything first.

A
Summary
TL;DR plus bullet key points and a decisions / action items list. Opens as a new tab next to Transcript.
B
Context-aware rewrite
Reads the entire transcript first, then rewrites each line using the conversation as context. Back-propagates name spellings; fixes homophones; doesn't paraphrase.
C
Clean
Line by line — fix STT errors, add punctuation, preserve language. Honors verbatim, filler, and tone settings.
D
Translate & polish
Idiomatic English prose, not segment-by-segment. Same Gemma engine, prompted to read for meaning before rendering.

Output is rendered Markdown · Share + Delete per output · all four prompts editable in Settings.

Teach it your jargon. Then forget you did.

Medical
amoxicillin
atrial fibrillation
ICD-10
MeSH · RxNorm
IT / Software
Kubernetes
OpenTelemetry
CNCF
ACM CCS
Construction
CSI MasterFormat
ASTM · NEN
QCS · ДБН
materials, codes
Legal
res ipsa loquitur
certiorari
Cornell LII Wex
rechtspraak.nl
Finance
CDS · IRS · MiFID
SEC · BIS · AFM
NBU · QFMA
instruments, regs
Plus: Custom vocabulary · Tone & verbatim · Filler removal · {snippet:signoff}

Designed for the moment the phone tries to kill the job.

Foreground service
Mic FGS during capture, mediaProcessing FGS for batch jobs. Partial wake lock keeps long jobs alive.
VAD-aligned chunks
28-second targets land on the nearest pause within ±2 s. Words are no longer cut mid-syllable at chunk boundaries.
Silent-wedge watchdog
60-second no-delta cancel. Engine is released, reloaded, and the chunk retried once with trimmed input.
Silence pre-skip
RMS below −54 dBFS bypasses Gemma entirely. Eliminates the prefill wedge before it can start.
FIFO job queue
Bulk-enqueue and walk away. Tasks survive process death via Room; charger-parked jobs replay on launch.
Run uninterrupted
First-class flow for the Samsung battery-exemption prompt — the one that keeps long jobs from getting reaped.
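
The silence pre-skip check comes down to one RMS measurement per chunk. The sketch below assumes signed 16-bit PCM with 32768 as full scale and invents the function names; only the −54 dBFS threshold is taken from the text above.

```python
import math
from typing import Sequence

FULL_SCALE = 32768.0  # assumption: signed 16-bit PCM samples


def rms_dbfs(samples: Sequence[int]) -> float:
    """RMS level of a chunk in dB relative to full scale (0 dBFS = full scale)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    # 10*log10 of the power ratio equals 20*log10 of the amplitude ratio.
    return 10.0 * math.log10(mean_square / (FULL_SCALE * FULL_SCALE))


def should_skip(samples: Sequence[int], threshold_dbfs: float = -54.0) -> bool:
    """True when the chunk is quiet enough to bypass the recognizer."""
    return rms_dbfs(samples) < threshold_dbfs


if __name__ == "__main__":
    quiet = [50, -50] * 800      # roughly -56 dBFS
    speech = [1000, -1000] * 800  # roughly -30 dBFS
    print(should_skip(quiet), should_skip(speech))
```

At 16-bit depth, −54 dBFS corresponds to an RMS amplitude of about 65 sample units, so only near-silent chunks are skipped.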

The compute knobs that actually matter.

Backend
Auto · GPU only · CPU only
Auto falls back GPU → CPU on init failure. Pin GPU for max throughput, CPU for predictable latency.
Context window
4K · 8K · 16K · 32K
8K is the sweet spot. Bump higher when Context-aware rewrite truncates on hour-long meetings.
CPU threads
Auto · 2 · 4 · 6 · 8
Auto picks ~half the cores. Going above the physical-core count consistently hurts performance; don't.
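
The Auto settings for backend and threads amount to two small rules, sketched here. The `init` callback, the backend strings, and the choice of `RuntimeError` as the failure signal are hypothetical stand-ins, not the LiteRT-LM or Whisper.cpp API. "Half the cores" is implemented as integer division with a floor of one thread, which is one reading of "~half".

```python
from typing import Any, Callable, Optional, Tuple


def init_with_fallback(
    init: Callable[[str], Any],  # hypothetical engine factory: "gpu" or "cpu" -> engine
    backend: str = "auto",
) -> Tuple[str, Any]:
    """Try backends in order; Auto tries GPU first, then CPU. Pinned modes never fall back."""
    order = {"auto": ["gpu", "cpu"], "gpu": ["gpu"], "cpu": ["cpu"]}[backend]
    last_error: Optional[Exception] = None
    for name in order:
        try:
            return name, init(name)
        except RuntimeError as err:  # assumption: init failure surfaces as RuntimeError
            last_error = err
    raise RuntimeError(f"no backend could be initialised: {last_error}")


def auto_threads(cores: int) -> int:
    """Auto thread count: about half the cores, never fewer than one."""
    return max(1, cores // 2)


if __name__ == "__main__":
    def fake_init(name: str) -> str:
        if name == "gpu":
            raise RuntimeError("GPU delegate unavailable")
        return f"engine-{name}"

    print(init_with_fallback(fake_init, "auto"))
    print(auto_threads(8))
```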

Find recordings by what was said in them.

"retention cohort" · 3 results · 84 ms
01
Q3 Planning with Engineering Team
…shifted the retention cohort definition to 28-day…
14 May · 42 min
02
1:1 — Sara
…what does the retention cohort look like for…
12 May · 28 min
03
Voice memo · 11:42
…revisit the retention cohort spreadsheet…
08 May · 02 min

Case-insensitive. Auto-titled recordings get real names — Q3 Planning with Engineering Team instead of Recording_2026-05-17_14-23-30.
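
A case-insensitive segment search with result snippets can be sketched in a few lines. This is a plain linear scan over in-memory data, written only to show the matching and snippet logic. The app stores its data in Room, and its actual query strategy is not described here. The record shape and function name are assumptions.

```python
import re
from typing import Dict, List


def search(recordings: List[Dict], query: str, context: int = 20) -> List[Dict[str, str]]:
    """Return one hit per recording whose segments contain the query, ignoring case."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits: List[Dict[str, str]] = []
    for rec in recordings:
        for segment in rec["segments"]:
            match = pattern.search(segment)
            if match is None:
                continue
            # Keep a little text on both sides of the match for the result row.
            start = max(0, match.start() - context)
            end = min(len(segment), match.end() + context)
            hits.append({"title": rec["title"], "snippet": "…" + segment[start:end] + "…"})
            break  # first matching segment is enough for this recording
    return hits


if __name__ == "__main__":
    library = [
        {"title": "Q3 Planning with Engineering Team",
         "segments": ["We shifted the retention cohort definition to 28-day."]},
        {"title": "Standup", "segments": ["Nothing blocked today."]},
    ]
    for hit in search(library, "Retention Cohort"):
        print(hit["title"], hit["snippet"])
```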

A complete speech stack — yours, on the phone, under Apache-2.0.

Stack
Gemma 4 · LiteRT-LM 0.11 · Whisper.cpp · sherpa-onnx · Compose Material 3 · Room · ExoPlayer
Distribution
APK sideload. Curated model downloads inside the app. Custom *.bin / *.litertlm import.
Promise
Audio never leaves the phone. The code is readable. Nothing phones home.
Created by Yuri Ihnatov · ihnatov.nl · nl.ihnatov.transcriber · v1 · 2026