A growing number of developers are prioritizing privacy in their workflows, and tools that process data locally are gaining traction. A recent project called Mnemonic takes this trend further by offering a streamlined way to capture voice notes, transcribe them, and drop them directly into daily journal files—all without sending data to the cloud.
## A minimalist approach to voice journaling
Mnemonic is a lightweight macOS application designed to live in the menu bar. It allows users to press a hotkey, speak, and immediately see their transcribed thoughts as bullet points in a Markdown file labeled with the current date. For example, a session might produce entries like:
- 14:35 This is a new node. Let me try to see if it'll work.
- 15:12 I want to email Sarah tomorrow about the migration plan.
- 16:08 The bug is in how we handle the empty array case in `merge_chunks`.

The tool avoids over-engineering by stripping away features like auto-generated summaries, titles, or task lists. Instead, it focuses solely on transcription and light cleanup of the output. Early versions attempted to structure every 30-second thought with a title and action items, but the latest iteration drops that structuring entirely.
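The daily-file naming and bullet format above can be sketched as a small helper that appends timestamped entries to the day's Markdown file (an illustrative sketch, not Mnemonic's actual code; in the real app the date and time would come from the system clock):

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::Path;

/// Append a timestamped bullet to the daily note named after the date,
/// creating the file on first use. Illustrative sketch only.
fn append_entry(notes_dir: &Path, date: &str, time: &str, text: &str) -> std::io::Result<()> {
    let path = notes_dir.join(format!("{date}.md"));
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "- {time} {text}")
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    let _ = fs::remove_file(dir.join("2025-01-15.md")); // fresh file for the demo
    append_entry(&dir, "2025-01-15", "14:35", "This is a new note.")?;
    append_entry(&dir, "2025-01-15", "15:12", "Email Sarah about the migration plan.")?;
    let body = fs::read_to_string(dir.join("2025-01-15.md"))?;
    print!("{body}"); // two bullet lines, in recording order
    Ok(())
}
```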
## Key features that enhance usability
Mnemonic has evolved beyond basic transcription with three notable additions, each designed to be either optional or invisible to the user:
- Image attachment support: Users can take a screenshot while recording by pressing a keyboard shortcut, or drag a region to capture and narrate in one motion. Gemma 4 processes both the audio and image together, embedding a reference to both in the journal entry. The image is saved alongside the audio file.
- Recording queue system: Recording and processing are decoupled. Users can release the hotkey after speaking, and Mnemonic will continue processing the note in the background. A queue system ensures notes are handled serially, and quitting the app mid-process won’t disrupt the workflow. The menu bar icon changes color to indicate status—gray when idle and red while recording.
- Intent routing (opt-in): A secondary, narrow call to Gemma 4 determines whether a voice note contains a request for the operating system to perform an action, such as "remind me to call Sarah at 3 PM." If detected, the app triggers a whitelisted macOS Shortcut. Users have five seconds to undo the action. This functionality avoids complex scripting or external integrations.
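The serial queue behavior described above can be sketched with a single worker thread draining a channel: recordings are enqueued as soon as the hotkey is released, and one worker processes them strictly in order. This is an illustrative sketch under assumed type names, not Mnemonic's actual code:

```rust
use std::sync::mpsc;
use std::thread;

/// One queued recording awaiting transcription (illustrative).
struct Job {
    id: u32,
    // in the real app: path to the captured audio file, timestamp, etc.
}

fn main() {
    let (tx, rx) = mpsc::channel::<Job>();

    // Single worker thread: jobs are handled strictly in arrival order,
    // so notes land in the journal serially even if the user keeps
    // recording while earlier notes are still being processed.
    let worker = thread::spawn(move || {
        let mut done = Vec::new();
        for job in rx {
            // placeholder for the transcription + append step
            done.push(job.id);
        }
        done
    });

    for id in 1..=3 {
        tx.send(Job { id }).unwrap();
    }
    drop(tx); // closing the channel lets the worker drain and exit

    let order = worker.join().unwrap();
    println!("{order:?}"); // [1, 2, 3]
}
```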
## Seamless integration with existing workflows
The tool’s file format follows the `YYYY-MM-DD.md` convention used by Obsidian’s Daily Notes plugin. By pointing the `notes_dir` setting at an Obsidian vault, users can automatically send their voice notes to today’s daily note. This integration unlocks Obsidian’s graph view, backlinks, and search features without additional setup.
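In practice the configuration might look something like this (a hypothetical snippet: only the `notes_dir` key is documented; the config file's location, format, and any other keys are assumptions):

```toml
# Hypothetical Mnemonic config; only notes_dir is documented in this post.
notes_dir = "/Users/you/ObsidianVault/Daily Notes"
```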
## Privacy-first design with local processing
Mnemonic operates entirely offline. No network traffic leaves the local machine, and the binary is signed and notarized for security. The developer has explicitly avoided linking telemetry or analytics code into the application.
## The technical backbone: How Gemma 4 powers Mnemonic
Mnemonic leverages three key capabilities of Gemma 4 through a single local model: native audio processing, vision input handling, and lightweight reasoning. All operations are performed against a locally running llama-server instance.
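A request against that local llama-server instance might carry the audio like this. This is a sketch, not a guaranteed wire format: the field names follow the OpenAI-style multimodal chat schema that llama-server's `/v1/chat/completions` endpoint mirrors, but whether a given build accepts `input_audio` content depends on its version and the multimodal projector it was launched with:

```rust
/// Build a chat-completion body carrying base64-encoded audio plus a
/// text instruction. Field names follow the OpenAI-style multimodal
/// schema; treat this as an illustrative sketch, not the exact payload
/// Mnemonic sends.
fn transcription_request(audio_b64: &str) -> String {
    format!(
        r#"{{"messages":[{{"role":"user","content":[{{"type":"text","text":"Transcribe this note as one clean bullet point."}},{{"type":"input_audio","input_audio":{{"data":"{audio_b64}","format":"wav"}}}}]}}]}}"#
    )
}

fn main() {
    // "UklGRg==" is just the base64 of a WAV header prefix, as a stand-in.
    let body = transcription_request("UklGRg==");
    println!("{body}");
}
```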
### Why the E4B model was the right choice
Gemma 4 is available in four sizes, but only two support audio processing: E2B and E4B. For a 16 GB laptop, the E4B model proved to be the best fit. According to the official model card, the differences between E2B and E4B are significant:
- MMLU Pro performance improved from 60.0% to 69.4% — crucial for accurately transcribing technical vocabulary in voice notes.
- BBEH reasoning quality jumped from 21.9% to 33.1% — essential for intent routing and correcting misstatements.
- Minor improvements in audio recognition (CoVoST from 33.47 to 35.54) and multilingual support (FLEURS from 0.09 to 0.08).
The E4B model at Q4_K_M quantization occupies 4.98 GB in GGUF format, with audio and vision encoders adding roughly 1 GB. This configuration runs smoothly alongside an IDE and browser on 16 GB of RAM.
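The memory budget works out as a simple back-of-the-envelope calculation using the figures above (ignoring the KV cache and other runtime overhead, which consume more in practice):

```rust
fn main() {
    let weights_gb = 4.98_f64; // E4B weights at Q4_K_M (GGUF)
    let encoders_gb = 1.0_f64; // audio + vision encoders, approximate
    let model_gb = weights_gb + encoders_gb;
    println!("model footprint ~{model_gb:.1} GB"); // ~6.0 GB

    // Rough headroom left for the OS, IDE, and browser on a 16 GB machine.
    let headroom_gb = 16.0 - model_gb;
    println!("headroom ~{headroom_gb:.1} GB"); // ~10.0 GB
}
```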
### A single-model, single-pass architecture
Traditional voice note tools often use a two-stage pipeline: an automatic speech recognition (ASR) model followed by a text-based large language model for cleaning and structuring. Mnemonic consolidates this into a single pass with Gemma 4.
The benefits of this approach include:
- Reduced resource usage: Only one model runs in memory, and there’s a single HTTP request per recording. A two-stage system would require two model downloads, two processes, and more failure points.
- Better context retention: Processing audio directly lets the model interpret pauses, hesitations, and restarts. For example, it can tell whether a phrase like "I think" is filler or a meaningful hedge.
- Lower latency: Combining transcription and cleaning into one step minimizes end-to-end delay.
The same principle applies to vision. Instead of using OCR to extract text from screenshots and then merging it with the transcript, Mnemonic sends both the audio and image to Gemma 4. The model produces a cohesive bullet point that references both inputs, such as "the panic mentioned in line 42" rather than a generic description.
### Intent routing with minimal overhead
Determining whether a voice note should trigger a macOS Shortcut required careful design to avoid bloating the application with unnecessary frameworks. The solution uses a second, narrow call to Gemma 4 to classify the intent of the note. If the note requests an action, Mnemonic triggers the corresponding Shortcut—provided it’s whitelisted. Users retain a five-second window to undo the action, ensuring safety.
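The whitelist gate might be sketched as follows. The `shortcuts run <name>` CLI is the standard macOS way to trigger a Shortcut from a process, but the whitelist shape, the shortcut names, and the function signatures here are illustrative assumptions, not Mnemonic's actual code:

```rust
use std::process::Command;

/// Shortcuts the user has explicitly allowed. These names and the
/// whitelist shape are illustrative, not Mnemonic's configuration.
const WHITELIST: &[&str] = &["Add Reminder", "Start Timer"];

fn allowed(name: &str) -> bool {
    WHITELIST.contains(&name)
}

/// Trigger a whitelisted macOS Shortcut via the system `shortcuts` CLI.
/// In the app this would fire only after the five-second undo window.
#[allow(dead_code)]
fn run_shortcut(name: &str) -> std::io::Result<bool> {
    if !allowed(name) {
        return Ok(false); // unknown shortcut: refuse rather than execute
    }
    let status = Command::new("shortcuts").args(["run", name]).status()?;
    Ok(status.success())
}

fn main() {
    // Only the gate is exercised here; spawning `shortcuts` needs macOS.
    println!("{}", allowed("Add Reminder")); // true
    println!("{}", allowed("Erase Disk"));   // false
}
```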
## Open-source foundation and easy installation
Mnemonic is built as a Rust workspace with Tauri 2 for the menu-bar app, `clap` for the CLI, and a shared `mnemonic-core` crate for audio handling, Markdown generation, and llama-server integration. The application is distributed as a single Apple Silicon DMG, code-signed, notarized, and stapled. It’s licensed under MIT and available on GitHub.
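The workspace layout described above might look roughly like this (member names other than `mnemonic-core` are guesses for illustration):

```toml
# Root Cargo.toml (illustrative; only mnemonic-core is named in this post)
[workspace]
members = [
    "mnemonic-core", # audio handling, Markdown generation, llama-server client
    "mnemonic-app",  # Tauri 2 menu-bar app (hypothetical crate name)
    "mnemonic-cli",  # clap-based CLI (hypothetical crate name)
]
```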
Installation is straightforward via Homebrew:

```shell
brew tap EduardMaghakyan/tap
brew install --cask mnemonic
```

## The future of local-first productivity tools
As developers increasingly seek tools that respect their privacy and integrate seamlessly into existing workflows, Mnemonic stands out as a practical example of what’s possible with local AI models. By leveraging Gemma 4’s audio and vision capabilities, the tool delivers a compelling balance of simplicity, functionality, and control—all while keeping data on the user’s device.
Expect to see more projects like Mnemonic emerge as the ecosystem around local AI models matures, offering users alternatives to cloud-dependent solutions without sacrificing convenience or power.
## AI summary
Mnemonic uses the locally running AI model Gemma 4 E4B to transfer your voice notes into your Markdown journal automatically. Capture your raw thoughts with no cloud, no telemetry, and no unnecessary summarization.