Author: constantinm

  • Local Voice Assistant: Whisper + Ollama + Piper on the Raspberry Pi


    Microphone, a symbol of local voice processing (Photo: Holger.Ellgaard, Wikimedia Commons, CC BY-SA 3.0)

    Voice Control Without the Cloud: Is That Really Possible?

    Alexa, Google Home, Siri — they all work well, but they have one thing in common: your voice commands end up on someone else’s servers. In the smart home context, where commands like “turn off the bedroom light” or “unlock the front door” are the norm, that’s unnecessary data sharing.

    The good news: in 2026, a fully local voice assistant is no longer a DIY project requiring hours of configuration. With Home Assistant’s Wyoming protocol, Whisper for speech recognition, Ollama as the LLM backend, and Piper for text-to-speech, you have all the ingredients — and the Raspberry Pi 5 is powerful enough to process everything locally.


    The Architecture: Four Building Blocks, One System

    Microphone → [STT] Whisper → [LLM] Ollama → [TTS] Piper → Speaker
                     ↕                ↕               ↕
                 Wyoming Protocol ← Home Assistant Assist

    The Wyoming protocol is the glue: it defines how Home Assistant communicates with external STT, TTS, and wake word services. All components run as Docker containers on the home server and are automatically discovered by HA.
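    Conceptually, a Wyoming event is a line of JSON (the header) optionally followed by binary payload such as audio. A simplified sketch of that framing in Python; this mirrors the idea, not the exact wire format of the wyoming library:

```python
import io
import json

def write_event(stream, event_type, data=None, payload=b""):
    """Serialize one event: a JSON header line, then optional raw payload bytes."""
    header = {"type": event_type, "data": data or {}}
    if payload:
        header["payload_length"] = len(payload)
    stream.write(json.dumps(header).encode("utf-8") + b"\n")
    stream.write(payload)

def read_event(stream):
    """Parse one event back: header line first, then exactly payload_length bytes."""
    header = json.loads(stream.readline())
    payload = stream.read(header.get("payload_length", 0))
    return header["type"], header.get("data", {}), payload

# Round-trip an audio chunk the way a satellite might stream it to the server
buf = io.BytesIO()
write_event(buf, "audio-chunk",
            {"rate": 16000, "width": 2, "channels": 1},
            b"\x00\x01" * 160)  # fake 16 kHz mono PCM
buf.seek(0)
etype, data, payload = read_event(buf)
```

    Because the header is plain JSON, any language with a JSON parser can implement a Wyoming client, which is why satellites can be so lightweight.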

    Illustration of the speech-to-text pipeline (Wikimedia Commons, LGPL)

    Block 1: Whisper (Speech-to-Text)

    wyoming-faster-whisper is the recommended STT component for HA. It’s built on faster-whisper — a CTranslate2 reimplementation that runs up to 4x faster than the original PyTorch model on CPU.

    Model recommendations for Pi 5:

    Model            RAM Required   Quality
    tiny             ~1 GB          for testing
    small            ~2 GB          good balance
    large-v3-turbo   ~3 GB          recommended (almost as good as large-v3)

    The large-v3-turbo model is OpenAI’s smart reduction of large-v3 from 32 to 4 decoder layers — nearly the same recognition accuracy, significantly less RAM. For non-English languages it’s clearly the first choice.

    Block 2: Ollama (Language Model)

    Ollama runs as a Docker container and exposes an OpenAI-compatible API. Home Assistant has had a native Ollama integration since 2024, which lets the LLM act directly as the conversation agent in the Assist pipeline.

    Model recommendations for voice assistants:

    • Pi 5, 8 GB RAM: llama3.2:3b (~2 GB) or gemma3:1b (~800 MB)
    • Pi 5, 16 GB RAM: llama3.1:8b (~5 GB) for better language comprehension quality
    • Specifically for HA control: fixt/home-3b-v3 — a model fine-tuned for home automation commands that returns device states as function calls

    Response times, realistically: llama3.2:3b takes 5–15 seconds on the Pi 5 for a short reply. That’s borderline for a voice assistant, but acceptable, especially when the server is dedicated and has no other load spikes.
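    Home Assistant consumes the model’s reply internally; if you script against the same endpoint yourself, Ollama streams its answer as newline-delimited JSON chunks. A minimal sketch of reassembling such a stream (the chunk data below is canned; a live call would be a POST to the server’s /api/chat endpoint):

```python
import json

def collect_stream(ndjson_lines):
    """Reassemble the assistant text from Ollama's streamed chat chunks."""
    parts = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):   # final chunk carries done=true plus timing stats
            break
    return "".join(parts)

# Canned chunks in the shape /api/chat streams them (abbreviated)
stream = [
    '{"message": {"role": "assistant", "content": "Bedroom "}, "done": false}',
    '{"message": {"role": "assistant", "content": "light is off."}, "done": true}',
]
reply = collect_stream(stream)  # "Bedroom light is off."
```

    Streaming matters for perceived latency: the TTS stage can start speaking the first sentence while the model is still generating the rest.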

    Block 3: Piper (Text-to-Speech)

    Piper is the Open Home Foundation’s neural TTS system, optimized for embedded hardware. A typical sentence is synthesized on the Pi 5 in under one second. Many languages are supported with multiple voices at different quality levels:

    • en_US-lessac-high — high quality American English male voice
    • en_US-amy-medium — medium quality female voice
    • en_GB-alan-medium — British English male voice

    Block 4: wyoming-satellite (optional, but useful)

    Don’t want to plug a microphone directly into the home server, but want voice input and output in multiple rooms? wyoming-satellite turns an inexpensive Raspberry Pi Zero 2 W with a USB microphone into a distributed voice satellite. Audio streams go over the LAN to the server; processing stays centralized.


    Setup: Docker Compose in 30 Minutes

    All components can be set up as a Docker stack. Here’s a complete Compose file for the home server:

    
    services:
      wyoming-whisper:
        image: rhasspy/wyoming-faster-whisper
        ports:
          - "10300:10300"
        volumes:
          - whisper-data:/data
        environment:
          - WHISPER_MODEL=large-v3-turbo
          - WHISPER_LANGUAGE=en
        restart: unless-stopped
    
      ollama:
        image: ollama/ollama
        ports:
          - "11434:11434"
        volumes:
          - ollama-data:/root/.ollama
        restart: unless-stopped
    
      wyoming-piper:
        image: rhasspy/wyoming-piper
        ports:
          - "10200:10200"
        volumes:
          - piper-data:/data
        environment:
          - PIPER_VOICE=en_US-lessac-high
        restart: unless-stopped
    
    volumes:
      whisper-data:
      ollama-data:
      piper-data:

    After startup:

    # Pull the Ollama model
    docker exec ollama ollama pull llama3.2:3b
    # or for HA-specific control:
    docker exec ollama ollama pull fixt/home-3b-v3

    Configuring Home Assistant

    1. Add Wyoming integration: Settings → Integrations → Wyoming Protocol

    – STT: <server-ip>:10300

    – TTS: <server-ip>:10200

    2. Add Ollama integration: Settings → Integrations → Ollama

    – URL: http://<server-ip>:11434

    – Model: fixt/home-3b-v3 or llama3.2:3b

    3. Create Assist pipeline: Settings → Voice → Assist

    – Speech recognition: Wyoming (wyoming-faster-whisper)

    – Conversation agent: Ollama

    – Text-to-speech: Wyoming (wyoming-piper)

    After that, any HA device with a microphone — or a satellite Pi in the hallway — can accept voice commands.


    Realistic Performance Numbers

    On a Raspberry Pi 5 with 16 GB RAM and SSD:

    Phase                                  Duration
    Wake word detection                    <100 ms (openwakeword)
    STT (large-v3-turbo, short sentence)   1–3 s
    LLM response (llama3.2:3b)             5–15 s
    TTS (Piper, one sentence)              <1 s
    Total                                  ~7–20 s

    That’s not Alexa-fast. Users wanting shorter latency can drop to smaller models (gemma3:1b + small Whisper) and accept slightly lower quality — or use Speaches as a combined STT/TTS server that mimics the OpenAI API and integrates with n8n and other tools.


    When Is It Worth It?

    A local voice assistant is worth it primarily when:

    • Privacy is a priority — not a single voice fragment leaves the home network
    • Reliability without internet matters — no outage during cloud disruptions
    • HA integration is the focus — no other assistant knows your HA entities as well as a directly integrated local LLM
    • You already run a home server — the additional load is manageable

    Those who primarily want to play music, set timers, or check the weather will find a cloud solution more convenient day-to-day. The local assistant shines with complex HA commands: “Turn all lights on the ground floor to 30 percent and close the blinds” — a sentence that locally trained models like home-3b-v3 can directly translate into HA actions.



  • AI Agents in 2026: From Chatbots to Autonomous Systems at Home


    From Concept to Reality: AI Agents in Everyday Life

    Just two years ago, autonomous AI agents were mostly a topic for research labs and tech giants. In 2026, that has fundamentally changed. Models run locally on a Raspberry Pi 5, Home Assistant talks directly to a self-hosted LLM, and n8n workflows use agents that make decisions independently.

    But what exactly is an AI agent, and why is now the right time to start exploring them?


    What Makes an AI Agent?

    A classic language model answers a question — and that’s it. An AI agent, on the other hand, can:

    • Pursue goals, not just respond to prompts
    • Use tools (call APIs, read files, execute code)
    • Plan multiple steps and pass results between them
    • Collaborate with other agents

    The key paradigm shift: instead of specifying *how* to do something, you tell the agent *what* the goal is. Intent-based computing instead of instruction-based computing.


    The Most Important Trends in 2026

    1. Multi-Agent Systems Become the Standard

    Single agents hit limits quickly. The answer: teams of specialized agents that solve complex tasks together. Frameworks like CrewAI (44,000+ GitHub stars) and Microsoft Research’s AutoGen (54,000+ GitHub stars) make it possible to coordinate agents with clearly defined roles — researcher, writer, reviewer — into a coherent workflow.

    For home users, this is especially interesting: these systems can run entirely locally, with no dependency on cloud APIs.
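    The researcher → writer → reviewer handoff can be sketched without any framework. A toy sequential “crew” in plain Python (roles and behaviors invented for illustration; CrewAI and AutoGen add LLM calls, memory, and tool use on top of this pattern):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    role: str                   # e.g. "researcher"
    act: Callable[[str], str]   # the agent's single capability

def run_crew(agents, task):
    """Pass the task through each agent in turn, like a sequential crew."""
    result = task
    for agent in agents:
        result = agent.act(result)
    return result

# Stand-in behaviors; in a real crew each `act` would prompt a (local) LLM
crew = [
    Agent("researcher", lambda t: f"facts about {t}"),
    Agent("writer",     lambda t: f"draft based on {t}"),
    Agent("reviewer",   lambda t: f"approved: {t}"),
]
report = run_crew(crew, "energy usage")
# report == "approved: draft based on facts about energy usage"
```

    Real frameworks generalize this to branching, parallel, and feedback topologies, but the core idea stays the same: each role transforms the shared state and hands it on.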

    2. Local LLMs Have Reached Maturity

    Ollama has established itself as the de facto standard for local model management. A single command is enough to start models like Llama 3.2, Mistral 7B, or DeepSeek-R1 — with an OpenAI-compatible API that works with virtually every tool.

    Hardware requirements in 2026 are manageable:

    Model            RAM Required   Best For
    Llama 3.2 3B     4 GB           Simple tasks, fast responses
    Mistral 7B       8 GB           Good all-round model
    Llama 3.1 8B     8–10 GB        More complex reasoning
    Qwen 2.5 Coder   8 GB           Code generation

    A Raspberry Pi 5 with 8 GB RAM can run Llama 3.2 3B without issue — not lightning fast, but completely adequate for many home automation tasks.

    3. Home Assistant Becomes the AI Control Center

    Home Assistant has evolved into the natural integration platform for local AI agents. Since the blog post from September 2025, HA has shipped full AI Task entities, tool calling, and agentic loop support.
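    Tool calling means the model answers with a structured function call instead of prose. A hedged sketch of what a tool definition looks like in the OpenAI-style schema that Ollama accepts; the service name and parameters here are invented for illustration, HA derives the real ones from the entities you expose to Assist:

```python
# Hypothetical tool definition; field names follow the OpenAI-compatible schema
turn_on_light = {
    "type": "function",
    "function": {
        "name": "light_turn_on",
        "description": "Turn on a light, optionally at a given brightness.",
        "parameters": {
            "type": "object",
            "properties": {
                "entity_id": {"type": "string", "description": "e.g. light.bedroom"},
                "brightness_pct": {"type": "integer", "minimum": 0, "maximum": 100},
            },
            "required": ["entity_id"],
        },
    },
}

# Instead of prose, a tool-capable model replies with a structured call:
example_call = {
    "name": "light_turn_on",
    "arguments": {"entity_id": "light.bedroom", "brightness_pct": 30},
}
```

    The agentic loop then executes the call, feeds the result back to the model, and repeats until the model produces a final answer.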

    The home-llm integration goes even further: a local model gets access to all HA entities and can control devices autonomously, without having to explicitly program every command. The model understands context — “it’s getting cold” can result in the heat being turned up and the blinds closing.

    Practical Example: Local Voice Assistant with Whisper + Ollama

    Microphone → Whisper (Speech-to-Text, local)
               → Ollama Llama 3.2 (Intent + Tool-Calling)
               → Home Assistant REST API
               → Device is controlled

    Latency: under 2 seconds on a Raspberry Pi 5 with 16 GB RAM. Completely offline — no data leaves the home network.
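    The pipeline above reduces to three stages ending in a service call. A minimal sketch with injectable stages (the stubs stand in for Whisper, Ollama, and the Home Assistant REST API; the function names are my own, not an existing API):

```python
def assist_pipeline(audio, stt, llm, call_service):
    """Each stage is injectable, so it can be backed by Whisper, Ollama, and
    the Home Assistant REST API - or by stubs, as below."""
    text = stt(audio)     # speech -> text
    intent = llm(text)    # text   -> {"service": ..., "data": ...}
    return call_service(intent["service"], intent["data"])

# Stubs standing in for the real components
result = assist_pipeline(
    b"<wav bytes>",
    stt=lambda audio: "turn off the bedroom light",
    llm=lambda text: {"service": "light/turn_off",
                      "data": {"entity_id": "light.bedroom"}},
    call_service=lambda service, data: f"called {service} for {data['entity_id']}",
)
# result == "called light/turn_off for light.bedroom"
```

    Keeping the stages decoupled is what makes component swaps (a smaller Whisper model, a different LLM) painless later on.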

    4. Raspberry Pi AI HAT+ 2: Dedicated AI Hardware for the Pi

    Since January 2026, the Raspberry Pi AI HAT+ 2 has been available. The Hailo-10H accelerator brings up to 40 TOPS (INT4) and 8 GB of its own LPDDR4X memory. This significantly offloads the main processor and enables faster inference at noticeably lower power consumption.

    For home automation, this means: a Pi 5 with AI HAT+ 2 can continuously evaluate sensor data, detect anomalies, and act proactively — without noticeable performance overhead for other tasks.

    5. n8n + Ollama: Visual Agent Workflows Without Coding

    n8n has established itself as the ideal platform for agentic workflows that don’t require programming. Combined with a local Ollama server, powerful automations emerge:

    • Energy reporting: Sensor data from Home Assistant → Ollama analyzes → WhatsApp summary
    • Smart alerts: Anomaly in consumption data → Agent evaluates context → Push notification only when truly relevant
    • Shopping assistant: Inventory sensor drops below threshold → Agent checks calendar and prices → Shopping list in Notion

    A simple n8n setup for local AI:

    {
      "nodes": [
        { "type": "n8n-nodes-base.scheduleTrigger" },
        { "type": "@n8n/n8n-nodes-langchain.lmOllama",
          "parameters": { "model": "llama3.2", "baseUrl": "http://localhost:11434" }
        },
        { "type": "@n8n/n8n-nodes-langchain.agent" }
      ]
    }

    Getting Started: Practical Recommendations

    Level 1 — Experiment locally (doable right now):

    1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
    2. Pull a model: ollama pull llama3.2
    3. Start Open WebUI as a chat interface via Docker

    Level 2 — Connect to Home Assistant:

    1. Install home-llm Custom Integration via HACS
    2. Configure the Ollama endpoint in HA
    3. Set Assist to use your local LLM as the conversation agent

    Level 3 — Build your own agents:

    1. Connect n8n with the Ollama node
    2. Build first agentic workflows for energy reporting or notifications
    3. Optional: CrewAI or LangGraph for more complex multi-agent scenarios

    What’s Coming Next?

    Development continues to accelerate. A few trends taking shape in 2026:

    • Embedded agents: Smaller, specialized models running directly on microcontrollers — first experiments with ESP32 and Cortex-M are underway
    • Persistent memory: Agents that learn across sessions and permanently store personal preferences
    • Local computer use: Agents that operate the desktop — currently cloud-only, but first local implementations are on the horizon

    Conclusion

    2026 is the year AI agents found their way from data centers into the living room. The combination of powerful local hardware (Raspberry Pi 5, AI HAT+), mature frameworks (Ollama, Home Assistant, n8n), and improved models makes it possible to run real agent systems without any cloud dependency.

    The barrier to entry has never been lower — and full control over your own data stays entirely with you.



  • Paperclip AI: The Open-Source Platform for Autonomous AI Companies

    Paperclip AI — Where artificial intelligence meets corporate structure

    What if an entire company were run by AI agents — complete with budgets, hierarchies, governance, and audit trails? That’s exactly what Paperclip AI makes possible.

    What is Paperclip AI?

    Paperclip is an open-source orchestration platform for so-called “Zero-Human Companies” — organizations run entirely by AI agents. It’s neither a chatbot nor a traditional agent framework. Instead, it’s a management layer that sits on top of existing AI agents, coordinating, monitoring, and budgeting them.

    The project is available under the MIT license on GitHub with over 21,000 stars and is actively maintained. The current version is v0.3.1 (March 2026).

    The Core Concept: Bring Your Own Agent

    Unlike platforms such as CrewAI or AutoGen that come with their own agent definitions, Paperclip follows a “Bring Your Own Agent” approach. This means you can plug in any AI runtime — from Claude Code and OpenClaw to Codex, Cursor, or even simple Bash scripts.

    Agents communicate through an intelligent heartbeat system: they wake up, receive tasks, complete them, and report back. Through Runtime Skill Injection, agents can learn new workflows on the fly — without any retraining.
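    A single heartbeat cycle, as described, can be sketched in a few lines (the function names and task queue below are invented for illustration; Paperclip’s actual wire protocol is not shown here):

```python
def heartbeat(agent_name, fetch_tasks, run_task, report):
    """One cycle: wake up, pull open tasks, work through them, report back."""
    done = [run_task(task) for task in fetch_tasks(agent_name)]
    report(agent_name, done)
    return len(done)

# A dict stands in for the Paperclip server's task queue
queue = {"python-expert": ["write backup script", "lint repo"]}
results = []
handled = heartbeat(
    "python-expert",
    fetch_tasks=lambda name: queue.pop(name, []),
    run_task=lambda task: f"done: {task}",
    report=lambda name, done: results.extend(done),
)
# handled == 2; the queue is now empty until new tasks arrive
```

    The pull-based design is why idle agents cost nothing: an agent with no queued tasks wakes, finds an empty list, and goes straight back to standby.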

    Paperclip supports any AI runtime — from Claude Code to custom scripts

    Organizational Structure Like a Real Company

    What makes Paperclip unique is how consistently it maps corporate structures onto AI agents:

    • Org charts with hierarchies, roles, titles, and reporting lines
    • Departments like Engineering, Finance, and Operations
    • Projects and issues — every task is traceable back to the company mission
    • Multi-company support — a single deployment can run multiple isolated “companies” simultaneously

    In practice, this means you can define a CEO agent that makes strategic decisions and delegates tasks to a CTO agent, who in turn coordinates engineering agents. Each agent has a clearly defined area of responsibility.

    AI agents organized in corporate hierarchies — from CEO to individual contributors

    Budget Control: No Agent Runs Unchecked

    A common problem with autonomous AI systems is spiraling costs. Paperclip solves this with built-in budget management:

    • Monthly budgets per agent with hard enforcement
    • Automatic pause at 100% utilization, warning at 80%
    • Cost tracking at task, project, and company level
    • Transparent billing: every API call is logged

    If you’re running 20 Claude Code sessions in parallel, you’ll always know exactly what each agent costs — and can throttle or pause individual agents as needed.
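    The enforcement policy in the bullets above is simple to state in code (thresholds taken from the list; the function itself is an illustrative sketch, not Paperclip’s implementation):

```python
def budget_status(spent, monthly_budget):
    """Warn at 80% utilization, hard-pause at 100%, per the policy described."""
    utilization = spent / monthly_budget
    if utilization >= 1.0:
        return "paused"      # hard enforcement: the agent stops running
    if utilization >= 0.8:
        return "warning"     # agent keeps working, operator is notified
    return "ok"

assert budget_status(50.0, 100.0) == "ok"
assert budget_status(85.0, 100.0) == "warning"
assert budget_status(100.0, 100.0) == "paused"
```

    Because tracking happens per agent, one runaway coding session pauses itself without touching the budgets of its siblings.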

    Per-agent budget tracking with automatic enforcement and cost transparency

    Governance and Control

    Paperclip treats human users as “Board Members” with full control over the AI organization:

    • Approval gates for strategic decisions — critical actions require human sign-off
    • Approve, pause, terminate, or override agents at any time
    • Versioned configurations with rollback capability
    • Immutable audit log (append-only): every tool call, every instruction, every decision is recorded

    This makes Paperclip particularly attractive for scenarios where traceability and compliance matter.

    Board-level governance with approval gates and immutable audit trails

    Technical Architecture

    Paperclip is written in TypeScript and uses the following technologies:

    • Backend: Node.js 20+ with Express
    • Frontend: React-based dashboard
    • Database: PostgreSQL (embedded or external)
    • API: REST on port 3100
    • Deployment: Docker, Docker Compose, or natively via npx

    The setup is surprisingly simple:

    # Quick start with npx (embedded PostgreSQL)
    npx paperclipai onboard --yes
    
    # Or via Docker Compose
    docker compose -f docker-compose.quickstart.yml up -d

    Use Cases

    1. Autonomous Software Company

    Organize multiple coding agents as a team: a CTO agent plans the architecture, engineering agents implement features, and a test engineer validates the results. All coordinated through Paperclip’s issue system.

    2. Multi-Business Management

    Anyone running multiple projects or business models in parallel can create a separate “company” in Paperclip for each — with isolated budgets, agents, and governance rules.

    3. Smart Home and Infrastructure

    Specialized agents monitor and optimize home infrastructure: a Smart Home agent manages Home Assistant automations, while a DevOps agent handles Docker containers and backups.

    4. Content and Reporting

    Agents that regularly generate reports, create social media posts, or analyze data — with clear budget limits and human approval before publication.

    5. 24/7 Operations

    Thanks to the heartbeat system, agents work autonomously around the clock. They wake up, check for new tasks, complete them, and go back to standby — without human intervention.

    How Does Paperclip Compare to Other Frameworks?

    Paperclip operates one layer above traditional agent frameworks
    Aspect          Paperclip                      CrewAI / AutoGen
    Focus           Organization & governance      Workflow execution
    Agents          Bring your own (any runtime)   Built-in definitions
    Budget          Built-in, per agent            Not available
    Multi-Company   Yes, isolated                  No
    Audit           Append-only log                Minimal
    Metaphor        “Found a company”              “Define a workflow”

    Paperclip doesn’t compete directly with agent frameworks — it sits one layer above. You use Paperclip to coordinate and monitor agents running on any runtime of your choice.

    Real-World Example: Paperclip on a Raspberry Pi

    Running an entire AI company on a Raspberry Pi 5 — Paperclip makes it possible

    Paperclip runs perfectly as a Docker container on a Raspberry Pi 5 — including its PostgreSQL database. In my setup, Paperclip manages eight specialized agents:

    • CEO — strategic coordination and prioritization
    • CTO — technical architecture decisions
    • CFO — budget monitoring and cost optimization
    • Python Expert — scripting and automation
    • Java Expert — backend development
    • Test Engineer — quality assurance and testing
    • Smart Home Expert — Home Assistant and IoT integrations
    • Financial Expert — analytics and reporting

    All agents use Claude Sonnet as their backend and have access to the local network, Docker containers, and SSH connections. The heartbeat system ensures agents only become active when there are actual tasks to complete — saving both resources and costs.

    Getting Started

    Ready to try Paperclip? Here’s how to get started in under five minutes:

    # Option 1: Quick start (includes embedded PostgreSQL)
    npx paperclipai onboard --yes
    
    # Option 2: Docker (recommended for servers)
    git clone https://github.com/paperclipai/paperclip.git
    cd paperclip
    docker compose -f docker-compose.quickstart.yml up -d
    
    # Then open http://localhost:3100 in your browser

    From there, you can create your first company, define agents, assign roles, and start delegating tasks.

    Conclusion

    Paperclip AI fills a gap in the AI ecosystem: while other tools focus on executing individual agents, Paperclip provides the organizational infrastructure for entire AI teams. With budget management, governance, audit logs, and the flexible “Bring Your Own Agent” approach, it’s the ideal platform for anyone looking to move beyond single chatbot interactions.

    Whether as an experiment on a Raspberry Pi or as a production system in the cloud — Paperclip makes the leap from “I use AI” to “AI works for me” tangible.


    Links: GitHub · Website · License: MIT

© 2026 Constantin’s Tech Lab. Powered by WordPress.
