Author: constantinm

  • Local Voice Assistant: Whisper + Ollama + Piper on the Raspberry Pi


    Microphone, a symbol of local voice processing (Photo: Holger.Ellgaard, Wikimedia Commons, CC BY-SA 3.0)

    Voice Control Without the Cloud: Is That Really Possible?

    Alexa, Google Home, Siri — they all work well, but they have one thing in common: your voice commands end up on someone else’s servers. In the smart home context, where commands like “turn off the bedroom light” or “unlock the front door” are the norm, that’s unnecessary data sharing.

    The good news: in 2026, a fully local voice assistant is no longer a DIY project requiring hours of configuration. With Home Assistant’s Wyoming protocol, Whisper for speech recognition, Ollama as the LLM backend, and Piper for text-to-speech, you have all the ingredients — and the Raspberry Pi 5 is powerful enough to process everything locally.


    The Architecture: Four Building Blocks, One System

    Microphone → [STT] Whisper → [LLM] Ollama → [TTS] Piper → Speaker
                     ↕                ↕               ↕
                 Wyoming Protocol ← Home Assistant Assist

    The Wyoming protocol is the glue: it defines how Home Assistant communicates with external STT, TTS, and wake word services. All components run as Docker containers on the home server and are automatically discovered by HA.
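    Conceptually, a Wyoming event is a line of JSON (the header) optionally followed by binary payload such as audio. A simplified sketch of that framing in Python; this mirrors the idea, not the exact wire format of the wyoming library:

```python
import io
import json

def write_event(stream, event_type, data=None, payload=b""):
    """Serialize one event: a JSON header line, then optional raw payload bytes."""
    header = {"type": event_type, "data": data or {}}
    if payload:
        header["payload_length"] = len(payload)
    stream.write(json.dumps(header).encode("utf-8") + b"\n")
    stream.write(payload)

def read_event(stream):
    """Parse one event back: header line first, then exactly payload_length bytes."""
    header = json.loads(stream.readline())
    payload = stream.read(header.get("payload_length", 0))
    return header["type"], header.get("data", {}), payload

# Round-trip an audio chunk the way a satellite might stream it to the server
buf = io.BytesIO()
write_event(buf, "audio-chunk",
            {"rate": 16000, "width": 2, "channels": 1},
            b"\x00\x01" * 160)  # fake 16 kHz mono PCM
buf.seek(0)
etype, data, payload = read_event(buf)
```

    Because the header is plain JSON, any language with a JSON parser can implement a Wyoming client, which is why satellites can be so lightweight.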

    Illustration of the speech-to-text pipeline (Wikimedia Commons, LGPL)

    Block 1: Whisper (Speech-to-Text)

    wyoming-faster-whisper is the recommended STT component for HA. It’s built on faster-whisper — a CTranslate2 reimplementation that runs up to 4x faster than the original PyTorch model on CPU.

    Model recommendations for Pi 5:

    Model            RAM Required   Quality
    tiny             ~1 GB          for testing
    small            ~2 GB          good balance
    large-v3-turbo   ~3 GB          recommended (almost as good as large-v3)

    The large-v3-turbo model is OpenAI’s smart reduction of large-v3 from 32 to 4 decoder layers — nearly the same recognition accuracy, significantly less RAM. For non-English languages it’s clearly the first choice.

    Block 2: Ollama (Language Model)

    Ollama runs as a Docker container and exposes an OpenAI-compatible API. Home Assistant has had a native Ollama integration since 2024, which lets the LLM act directly as the conversation agent in the Assist pipeline.

    Model recommendations for voice assistants:

    • Pi 5, 8 GB RAM: llama3.2:3b (~2 GB) or gemma3:1b (~800 MB)
    • Pi 5, 16 GB RAM: llama3.1:8b (~5 GB) for better language comprehension quality
    • Specifically for HA control: fixt/home-3b-v3 — a model fine-tuned for home automation commands that returns device states as function calls

    Response times, realistically: llama3.2:3b takes 5–15 seconds on the Pi 5 for a short reply. That’s borderline for a voice assistant, but acceptable, especially when the server is dedicated and has no other load spikes.
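    Home Assistant consumes the model’s reply internally; if you script against the same endpoint yourself, Ollama streams its answer as newline-delimited JSON chunks. A minimal sketch of reassembling such a stream (the chunk data below is canned; a live call would be a POST to the server’s /api/chat endpoint):

```python
import json

def collect_stream(ndjson_lines):
    """Reassemble the assistant text from Ollama's streamed chat chunks."""
    parts = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):   # final chunk carries done=true plus timing stats
            break
    return "".join(parts)

# Canned chunks in the shape /api/chat streams them (abbreviated)
stream = [
    '{"message": {"role": "assistant", "content": "Bedroom "}, "done": false}',
    '{"message": {"role": "assistant", "content": "light is off."}, "done": true}',
]
reply = collect_stream(stream)  # "Bedroom light is off."
```

    Streaming matters for perceived latency: the TTS stage can start speaking the first sentence while the model is still generating the rest.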

    Block 3: Piper (Text-to-Speech)

    Piper is the Open Home Foundation’s neural TTS system, optimized for embedded hardware. A typical sentence is synthesized on the Pi 5 in under one second. Many languages are supported with multiple voices at different quality levels:

    • en_US-lessac-high — high quality American English male voice
    • en_US-amy-medium — medium quality female voice
    • en_GB-alan-medium — British English male voice

    Block 4: wyoming-satellite (optional, but useful)

    Don’t want to plug a microphone directly into the home server, but want voice input and output in multiple rooms? wyoming-satellite turns an inexpensive Raspberry Pi Zero 2 W with a USB microphone into a distributed voice satellite. Audio streams go over the LAN to the server; processing stays centralized.


    Setup: Docker Compose in 30 Minutes

    All components can be set up as a Docker stack. Here’s a complete Compose file for the home server:

    
    services:
      wyoming-whisper:
        image: rhasspy/wyoming-faster-whisper
        ports:
          - "10300:10300"
        volumes:
          - whisper-data:/data
        environment:
          - WHISPER_MODEL=large-v3-turbo
          - WHISPER_LANGUAGE=en
        restart: unless-stopped
    
      ollama:
        image: ollama/ollama
        ports:
          - "11434:11434"
        volumes:
          - ollama-data:/root/.ollama
        restart: unless-stopped
    
      wyoming-piper:
        image: rhasspy/wyoming-piper
        ports:
          - "10200:10200"
        volumes:
          - piper-data:/data
        environment:
          - PIPER_VOICE=en_US-lessac-high
        restart: unless-stopped
    
    volumes:
      whisper-data:
      ollama-data:
      piper-data:

    After startup:

    # Pull the Ollama model
    docker exec ollama ollama pull llama3.2:3b
    # or for HA-specific control:
    docker exec ollama ollama pull fixt/home-3b-v3

    Configuring Home Assistant

    1. Add Wyoming integration: Settings → Integrations → Wyoming Protocol

    – STT: <server-ip>:10300

    – TTS: <server-ip>:10200

    2. Add Ollama integration: Settings → Integrations → Ollama

    – URL: http://<server-ip>:11434

    – Model: fixt/home-3b-v3 or llama3.2:3b

    3. Create Assist pipeline: Settings → Voice → Assist

    – Speech recognition: Wyoming (wyoming-faster-whisper)

    – Conversation agent: Ollama

    – Text-to-speech: Wyoming (wyoming-piper)

    After that, any HA device with a microphone — or a satellite Pi in the hallway — can accept voice commands.


    Realistic Performance Numbers

    On a Raspberry Pi 5 with 16 GB RAM and SSD:

    Phase                                  Duration
    Wake word detection                    <100 ms (openwakeword)
    STT (large-v3-turbo, short sentence)   1–3 s
    LLM response (llama3.2:3b)             5–15 s
    TTS (Piper, one sentence)              <1 s
    Total                                  ~7–20 s

    That’s not Alexa-fast. Users wanting shorter latency can drop to smaller models (gemma3:1b + small Whisper) and accept slightly lower quality — or use Speaches as a combined STT/TTS server that mimics the OpenAI API and integrates with n8n and other tools.


    When Is It Worth It?

    A local voice assistant is worth it primarily when:

    • Privacy is a priority — not a single voice fragment leaves the home network
    • Reliability without internet matters — no outage during cloud disruptions
    • HA integration is the focus — no other assistant knows your HA entities as well as a directly integrated local LLM
    • You already run a home server — the additional load is manageable

    Those who primarily want to play music, set timers, or check the weather will find a cloud solution more convenient day-to-day. The local assistant shines with complex HA commands: “Turn all lights on the ground floor to 30 percent and close the blinds” — a sentence that locally trained models like home-3b-v3 can directly translate into HA actions.



  • AI Agents in 2026: From Chatbots to Autonomous Systems at Home


    From Concept to Reality: AI Agents in Everyday Life

    Just two years ago, autonomous AI agents were mostly a topic for research labs and tech giants. In 2026, that has fundamentally changed. Models run locally on a Raspberry Pi 5, Home Assistant talks directly to a self-hosted LLM, and n8n workflows use agents that make decisions independently.

    But what exactly is an AI agent, and why is now the right time to start exploring them?


    What Makes an AI Agent?

    A classic language model answers a question — and that’s it. An AI agent, on the other hand, can:

    • Pursue goals, not just respond to prompts
    • Use tools (call APIs, read files, execute code)
    • Plan multiple steps and pass results between them
    • Collaborate with other agents

    The key paradigm shift: instead of specifying *how* to do something, you tell the agent *what* the goal is. Intent-based computing instead of instruction-based computing.


    The Most Important Trends in 2026

    1. Multi-Agent Systems Become the Standard

    Single agents hit limits quickly. The answer: teams of specialized agents that solve complex tasks together. Frameworks like CrewAI (44,000+ GitHub stars) and Microsoft Research’s AutoGen (54,000+ GitHub stars) make it possible to coordinate agents with clearly defined roles — researcher, writer, reviewer — into a coherent workflow.

    For home users, this is especially interesting: these systems can run entirely locally, with no dependency on cloud APIs.
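    The researcher → writer → reviewer handoff can be sketched without any framework. A toy sequential “crew” in plain Python (roles and behaviors invented for illustration; CrewAI and AutoGen add LLM calls, memory, and tool use on top of this pattern):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    role: str                   # e.g. "researcher"
    act: Callable[[str], str]   # the agent's single capability

def run_crew(agents, task):
    """Pass the task through each agent in turn, like a sequential crew."""
    result = task
    for agent in agents:
        result = agent.act(result)
    return result

# Stand-in behaviors; in a real crew each `act` would prompt a (local) LLM
crew = [
    Agent("researcher", lambda t: f"facts about {t}"),
    Agent("writer",     lambda t: f"draft based on {t}"),
    Agent("reviewer",   lambda t: f"approved: {t}"),
]
report = run_crew(crew, "energy usage")
# report == "approved: draft based on facts about energy usage"
```

    Real frameworks generalize this to branching, parallel, and feedback topologies, but the core idea stays the same: each role transforms the shared state and hands it on.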

    2. Local LLMs Have Reached Maturity

    Ollama has established itself as the de facto standard for local model management. A single command is enough to start models like Llama 3.2, Mistral 7B, or DeepSeek-R1 — with an OpenAI-compatible API that works with virtually every tool.

    Hardware requirements in 2026 are manageable:

    Model            RAM Required   Best For
    Llama 3.2 3B     4 GB           Simple tasks, fast responses
    Mistral 7B       8 GB           Good all-round model
    Llama 3.1 8B     8–10 GB        More complex reasoning
    Qwen 2.5 Coder   8 GB           Code generation

    A Raspberry Pi 5 with 8 GB RAM can run Llama 3.2 3B without issue — not lightning fast, but completely adequate for many home automation tasks.

    3. Home Assistant Becomes the AI Control Center

    Home Assistant has evolved into the natural integration platform for local AI agents. Since the blog post from September 2025, HA has shipped full AI Task entities, tool calling, and agentic loop support.
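    Tool calling means the model answers with a structured function call instead of prose. A hedged sketch of what a tool definition looks like in the OpenAI-style schema that Ollama accepts; the service name and parameters here are invented for illustration, HA derives the real ones from the entities you expose to Assist:

```python
# Hypothetical tool definition; field names follow the OpenAI-compatible schema
turn_on_light = {
    "type": "function",
    "function": {
        "name": "light_turn_on",
        "description": "Turn on a light, optionally at a given brightness.",
        "parameters": {
            "type": "object",
            "properties": {
                "entity_id": {"type": "string", "description": "e.g. light.bedroom"},
                "brightness_pct": {"type": "integer", "minimum": 0, "maximum": 100},
            },
            "required": ["entity_id"],
        },
    },
}

# Instead of prose, a tool-capable model replies with a structured call:
example_call = {
    "name": "light_turn_on",
    "arguments": {"entity_id": "light.bedroom", "brightness_pct": 30},
}
```

    The agentic loop then executes the call, feeds the result back to the model, and repeats until the model produces a final answer.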

    The home-llm integration goes even further: a local model gets access to all HA entities and can control devices autonomously, without having to explicitly program every command. The model understands context — “it’s getting cold” can result in the heat being turned up and the blinds closing.

    Practical Example: Local Voice Assistant with Whisper + Ollama

    Microphone → Whisper (Speech-to-Text, local)
               → Ollama Llama 3.2 (Intent + Tool-Calling)
               → Home Assistant REST API
               → Device is controlled

    Latency: under 2 seconds on a Raspberry Pi 5 with 16 GB RAM. Completely offline — no data leaves the home network.
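    The pipeline above reduces to three stages ending in a service call. A minimal sketch with injectable stages (the stubs stand in for Whisper, Ollama, and the Home Assistant REST API; the function names are my own, not an existing API):

```python
def assist_pipeline(audio, stt, llm, call_service):
    """Each stage is injectable, so it can be backed by Whisper, Ollama, and
    the Home Assistant REST API - or by stubs, as below."""
    text = stt(audio)     # speech -> text
    intent = llm(text)    # text   -> {"service": ..., "data": ...}
    return call_service(intent["service"], intent["data"])

# Stubs standing in for the real components
result = assist_pipeline(
    b"<wav bytes>",
    stt=lambda audio: "turn off the bedroom light",
    llm=lambda text: {"service": "light/turn_off",
                      "data": {"entity_id": "light.bedroom"}},
    call_service=lambda service, data: f"called {service} for {data['entity_id']}",
)
# result == "called light/turn_off for light.bedroom"
```

    Keeping the stages decoupled is what makes component swaps (a smaller Whisper model, a different LLM) painless later on.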

    4. Raspberry Pi AI HAT+ 2: Dedicated AI Hardware for the Pi

    Since January 2026, the Raspberry Pi AI HAT+ 2 has been available. The Hailo-10H accelerator brings up to 40 TOPS (INT4) and 8 GB of its own LPDDR4X memory. This significantly offloads the main processor and enables faster inference at noticeably lower power consumption.

    For home automation, this means: a Pi 5 with AI HAT+ 2 can continuously evaluate sensor data, detect anomalies, and act proactively — without noticeable performance overhead for other tasks.

    5. n8n + Ollama: Visual Agent Workflows Without Coding

    n8n has established itself as the ideal platform for agentic workflows that don’t require programming. Combined with a local Ollama server, powerful automations emerge:

    • Energy reporting: Sensor data from Home Assistant → Ollama analyzes → WhatsApp summary
    • Smart alerts: Anomaly in consumption data → Agent evaluates context → Push notification only when truly relevant
    • Shopping assistant: Inventory sensor drops below threshold → Agent checks calendar and prices → Shopping list in Notion

    A simple n8n setup for local AI:

    {
      "nodes": [
        { "type": "n8n-nodes-base.scheduleTrigger" },
        { "type": "@n8n/n8n-nodes-langchain.lmOllama",
          "parameters": { "model": "llama3.2", "baseUrl": "http://localhost:11434" }
        },
        { "type": "@n8n/n8n-nodes-langchain.agent" }
      ]
    }

    Getting Started: Practical Recommendations

    Level 1 — Experiment locally (doable right now):

    1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
    2. Pull a model: ollama pull llama3.2
    3. Start Open WebUI as a chat interface via Docker

    Level 2 — Connect to Home Assistant:

    1. Install home-llm Custom Integration via HACS
    2. Configure the Ollama endpoint in HA
    3. Set Assist to use your local LLM as the conversation agent

    Level 3 — Build your own agents:

    1. Connect n8n with the Ollama node
    2. Build first agentic workflows for energy reporting or notifications
    3. Optional: CrewAI or LangGraph for more complex multi-agent scenarios

    What’s Coming Next?

    Development continues to accelerate. A few trends taking shape in 2026:

    • Embedded agents: Smaller, specialized models running directly on microcontrollers — first experiments with ESP32 and Cortex-M are underway
    • Persistent memory: Agents that learn across sessions and permanently store personal preferences
    • Local computer use: Agents that operate the desktop — currently cloud-only, but first local implementations are on the horizon

    Conclusion

    2026 is the year AI agents found their way from data centers into the living room. The combination of powerful local hardware (Raspberry Pi 5, AI HAT+), mature frameworks (Ollama, Home Assistant, n8n), and improved models makes it possible to run real agent systems without any cloud dependency.

    The barrier to entry has never been lower — and full control over your own data stays entirely with you.



  • Paperclip AI: The Open-Source Platform for Autonomous AI Companies

    Paperclip AI — Where artificial intelligence meets corporate structure

    What if an entire company were run by AI agents — complete with budgets, hierarchies, governance, and audit trails? That’s exactly what Paperclip AI makes possible.

    What is Paperclip AI?

    Paperclip is an open-source orchestration platform for so-called “Zero-Human Companies” — organizations run entirely by AI agents. It’s neither a chatbot nor a traditional agent framework. Instead, it’s a management layer that sits on top of existing AI agents, coordinating, monitoring, and budgeting them.

    The project is available under the MIT license on GitHub with over 21,000 stars and is actively maintained. The current version is v0.3.1 (March 2026).

    The Core Concept: Bring Your Own Agent

    Unlike platforms such as CrewAI or AutoGen that come with their own agent definitions, Paperclip follows a “Bring Your Own Agent” approach. This means you can plug in any AI runtime — from Claude Code and OpenClaw to Codex, Cursor, or even simple Bash scripts.

    Agents communicate through an intelligent heartbeat system: they wake up, receive tasks, complete them, and report back. Through Runtime Skill Injection, agents can learn new workflows on the fly — without any retraining.
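    A single heartbeat cycle, as described, can be sketched in a few lines (the function names and task queue below are invented for illustration; Paperclip’s actual wire protocol is not shown here):

```python
def heartbeat(agent_name, fetch_tasks, run_task, report):
    """One cycle: wake up, pull open tasks, work through them, report back."""
    done = [run_task(task) for task in fetch_tasks(agent_name)]
    report(agent_name, done)
    return len(done)

# A dict stands in for the Paperclip server's task queue
queue = {"python-expert": ["write backup script", "lint repo"]}
results = []
handled = heartbeat(
    "python-expert",
    fetch_tasks=lambda name: queue.pop(name, []),
    run_task=lambda task: f"done: {task}",
    report=lambda name, done: results.extend(done),
)
# handled == 2; the queue is now empty until new tasks arrive
```

    The pull-based design is why idle agents cost nothing: an agent with no queued tasks wakes, finds an empty list, and goes straight back to standby.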

    Paperclip supports any AI runtime — from Claude Code to custom scripts

    Organizational Structure Like a Real Company

    What makes Paperclip unique is how consistently it maps corporate structures onto AI agents:

    • Org charts with hierarchies, roles, titles, and reporting lines
    • Departments like Engineering, Finance, and Operations
    • Projects and issues — every task is traceable back to the company mission
    • Multi-company support — a single deployment can run multiple isolated “companies” simultaneously

    In practice, this means you can define a CEO agent that makes strategic decisions and delegates tasks to a CTO agent, who in turn coordinates engineering agents. Each agent has a clearly defined area of responsibility.

    AI agents organized in corporate hierarchies — from CEO to individual contributors

    Budget Control: No Agent Runs Unchecked

    A common problem with autonomous AI systems is spiraling costs. Paperclip solves this with built-in budget management:

    • Monthly budgets per agent with hard enforcement
    • Automatic pause at 100% utilization, warning at 80%
    • Cost tracking at task, project, and company level
    • Transparent billing: every API call is logged

    If you’re running 20 Claude Code sessions in parallel, you’ll always know exactly what each agent costs — and can throttle or pause individual agents as needed.
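    The enforcement policy in the bullets above is simple to state in code (thresholds taken from the list; the function itself is an illustrative sketch, not Paperclip’s implementation):

```python
def budget_status(spent, monthly_budget):
    """Warn at 80% utilization, hard-pause at 100%, per the policy described."""
    utilization = spent / monthly_budget
    if utilization >= 1.0:
        return "paused"      # hard enforcement: the agent stops running
    if utilization >= 0.8:
        return "warning"     # agent keeps working, operator is notified
    return "ok"

assert budget_status(50.0, 100.0) == "ok"
assert budget_status(85.0, 100.0) == "warning"
assert budget_status(100.0, 100.0) == "paused"
```

    Because tracking happens per agent, one runaway coding session pauses itself without touching the budgets of its siblings.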

    Per-agent budget tracking with automatic enforcement and cost transparency

    Governance and Control

    Paperclip treats human users as “Board Members” with full control over the AI organization:

    • Approval gates for strategic decisions — critical actions require human sign-off
    • Approve, pause, terminate, or override agents at any time
    • Versioned configurations with rollback capability
    • Immutable audit log (append-only): every tool call, every instruction, every decision is recorded

    This makes Paperclip particularly attractive for scenarios where traceability and compliance matter.

    Board-level governance with approval gates and immutable audit trails

    Technical Architecture

    Paperclip is written in TypeScript and uses the following technologies:

    • Backend: Node.js 20+ with Express
    • Frontend: React-based dashboard
    • Database: PostgreSQL (embedded or external)
    • API: REST on port 3100
    • Deployment: Docker, Docker Compose, or natively via npx

    The setup is surprisingly simple:

    # Quick start with npx (embedded PostgreSQL)
    npx paperclipai onboard --yes
    
    # Or via Docker Compose
    docker compose -f docker-compose.quickstart.yml up -d

    Use Cases

    1. Autonomous Software Company

    Organize multiple coding agents as a team: a CTO agent plans the architecture, engineering agents implement features, and a test engineer validates the results. All coordinated through Paperclip’s issue system.

    2. Multi-Business Management

    Anyone running multiple projects or business models in parallel can create a separate “company” in Paperclip for each — with isolated budgets, agents, and governance rules.

    3. Smart Home and Infrastructure

    Specialized agents monitor and optimize home infrastructure: a Smart Home agent manages Home Assistant automations, while a DevOps agent handles Docker containers and backups.

    4. Content and Reporting

    Agents that regularly generate reports, create social media posts, or analyze data — with clear budget limits and human approval before publication.

    5. 24/7 Operations

    Thanks to the heartbeat system, agents work autonomously around the clock. They wake up, check for new tasks, complete them, and go back to standby — without human intervention.

    How Does Paperclip Compare to Other Frameworks?

    Paperclip operates one layer above traditional agent frameworks
    Aspect          Paperclip                      CrewAI / AutoGen
    Focus           Organization & governance      Workflow execution
    Agents          Bring your own (any runtime)   Built-in definitions
    Budget          Built-in, per agent            Not available
    Multi-Company   Yes, isolated                  No
    Audit           Append-only log                Minimal
    Metaphor        “Found a company”              “Define a workflow”

    Paperclip doesn’t compete directly with agent frameworks — it sits one layer above. You use Paperclip to coordinate and monitor agents running on any runtime of your choice.

    Real-World Example: Paperclip on a Raspberry Pi

    Running an entire AI company on a Raspberry Pi 5 — Paperclip makes it possible

    Paperclip runs perfectly as a Docker container on a Raspberry Pi 5 — including its PostgreSQL database. In my setup, Paperclip manages eight specialized agents:

    • CEO — strategic coordination and prioritization
    • CTO — technical architecture decisions
    • CFO — budget monitoring and cost optimization
    • Python Expert — scripting and automation
    • Java Expert — backend development
    • Test Engineer — quality assurance and testing
    • Smart Home Expert — Home Assistant and IoT integrations
    • Financial Expert — analytics and reporting

    All agents use Claude Sonnet as their backend and have access to the local network, Docker containers, and SSH connections. The heartbeat system ensures agents only become active when there are actual tasks to complete — saving both resources and costs.

    Getting Started

    Ready to try Paperclip? Here’s how to get started in under five minutes:

    # Option 1: Quick start (includes embedded PostgreSQL)
    npx paperclipai onboard --yes
    
    # Option 2: Docker (recommended for servers)
    git clone https://github.com/paperclipai/paperclip.git
    cd paperclip
    docker compose -f docker-compose.quickstart.yml up -d
    
    # Then open http://localhost:3100 in your browser

    From there, you can create your first company, define agents, assign roles, and start delegating tasks.

    Conclusion

    Paperclip AI fills a gap in the AI ecosystem: while other tools focus on executing individual agents, Paperclip provides the organizational infrastructure for entire AI teams. With budget management, governance, audit logs, and the flexible “Bring Your Own Agent” approach, it’s the ideal platform for anyone looking to move beyond single chatbot interactions.

    Whether as an experiment on a Raspberry Pi or as a production system in the cloud — Paperclip makes the leap from “I use AI” to “AI works for me” tangible.


    Links: GitHub · Website · License: MIT

© 2026 Constantin’s Tech Lab. Powered by WordPress.
