Personal / Family · 2025 · Ongoing

Rosie — a private family AI that runs entirely on local hardware

A long-running family AI assistant on a single Mac Mini — persistent memory, document recall, 142 tools, FHIR-integrated health data — never touching a third-party cloud LLM.

Role: Architect & sole engineer
Python 3.12 · Ollama · LanceDB · SQLite · asyncio · MCP · launchd

The privacy ask, restated

The pitch for cloud AI assistants is real: they're convenient, they're capable, and they keep getting better. The cost is also real: every interaction is a data exchange with a third party, and for the categories of data that matter most to a family — health, finance, schedules, kids — that's the wrong default.

Problem

The useful version of a personal AI is the one that knows your calendar, your medical history, your family's documents, and your day-to-day routines. That's also exactly the data you don't want sitting in someone else's training corpus, or sitting in their breach notification queue next year.

Rosie is my answer. It's a long-running daemon on a Mac Mini in my closet, and it doesn't talk to OpenAI, Anthropic, or anyone else for inference.

How it works

A short tour:

79K

Lines of code

Python 3.12 across one daemon process, one watchdog, and a small CLI. asyncio for the event loop, structured logging throughout.

142

Tools

Each tool is a typed Python function the agent can call — calendar, mail, health, files, shell, FHIR queries, web, transit, weather, photos. Type signatures are the contract.

3,241

Tests

Pytest suite covering the tool surface, the memory layer, the scheduling system, and end-to-end conversational flows. Every PR runs the lot.

The LLM is whichever Ollama model is loaded at the moment — typically a Qwen or Llama variant tuned for tool use. The memory layer is LanceDB for vector recall and SQLite for the structured side (events, reminders, family graph). The whole thing runs as a launchd service with a separate watchdog process that restarts the daemon if it wedges.
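The watchdog idea can be sketched in a few lines. This is a minimal illustration, not Rosie's actual watchdog: it assumes the daemon touches a heartbeat file on each loop iteration, and the file path, service target, and staleness threshold are all hypothetical. The only real external call is `launchctl kickstart -k`, which kills and restarts a launchd service.

```python
import subprocess
import time
from pathlib import Path

HEARTBEAT = Path("/tmp/rosie.heartbeat")      # hypothetical heartbeat path
SERVICE_TARGET = "gui/501/com.example.rosie"  # hypothetical launchd service target
STALE_AFTER_S = 120.0                         # assumed staleness threshold


def is_wedged(heartbeat: Path, now: float, stale_after_s: float = STALE_AFTER_S) -> bool:
    """True if the heartbeat file is missing or older than the threshold."""
    try:
        age = now - heartbeat.stat().st_mtime
    except FileNotFoundError:
        return True
    return age > stale_after_s


def restart_command(service_target: str = SERVICE_TARGET) -> list:
    """Build the launchctl call; -k kills the running instance before restarting it."""
    return ["launchctl", "kickstart", "-k", service_target]


def check_once(heartbeat: Path = HEARTBEAT) -> bool:
    """Restart the daemon if it looks wedged. Returns True if a restart was issued."""
    if is_wedged(heartbeat, time.time()):
        subprocess.run(restart_command(), check=False)
        return True
    return False
```

Keeping the staleness check a pure function of the file and the clock makes it testable without launchd; only `check_once` has side effects.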

Decisions worth naming

One process, not a service mesh

Agent projects tempt you to split things up: separate processes for memory, tool dispatch, scheduling, web. I started there and walked it back. The daemon is one Python process with explicit asyncio task boundaries, and recovery is a single launchctl kickstart away.

Decision

Boring infrastructure in service of an ambitious surface. A single supervised process is easier to reason about, easier to debug, easier to recover. The complexity goes into the tool layer and the memory layer where it earns its keep.
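One way to picture "one process, explicit task boundaries" is a single event loop running one named task per subsystem. The sketch below is illustrative only: the subsystem names and the `subsystem` stand-in are assumptions, not Rosie's real modules.

```python
import asyncio


async def subsystem(name: str, ticks: int, log: list) -> None:
    """Stand-in for a long-running subsystem loop (memory, scheduler, dispatch)."""
    for i in range(ticks):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        log.append(f"{name}:{i}")


async def run_daemon(log: list) -> None:
    # One process, one event loop; each subsystem is its own named task,
    # so a stack trace or task dump says which boundary misbehaved.
    tasks = [
        asyncio.create_task(subsystem(name, 2, log), name=name)
        for name in ("memory", "scheduler", "dispatch")
    ]
    # gather re-raises the first exception in the caller. When run_daemon
    # exits, asyncio.run cancels any tasks still pending, so an unhandled
    # failure ends the process and leaves the restart to the supervisor.
    await asyncio.gather(*tasks)
```

Running it is `asyncio.run(run_daemon(events))` with an empty list; the tasks interleave on the one loop, and there is no inter-process protocol to debug.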

Tool contracts before agent prompts

Every tool is a Python function with a typed signature, a docstring describing its semantics, and a Pytest test exercising its happy path and at least two failure modes. The agent prompt is small; the tool surface is the API.

tools/health/get_lab_results.py

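from datetime import date, timedelta

# tool, fhir_client, resolve_loinc, clamp, and LabResult are project-internal;
# their imports are omitted here.

@tool(category="health", requires_auth=True)
async def get_lab_results(
    *,
    test_name: str,
    after: date | None = None,
    limit: int = 10,
) -> list[LabResult]:
    """
    Fetch lab results from the FHIR server for the current user.

    test_name: LOINC code or human name ("hemoglobin A1c").
    after: only results dated after this. Defaults to 1 year ago.
    limit: max rows to return (1-50); out-of-range values are clamped.
    """
    # Default the window to the past year when the caller gives no date.
    after = after or (date.today() - timedelta(days=365))
    async with fhir_client() as fhir:
        rows = await fhir.observations(
            code=resolve_loinc(test_name),
            date_after=after,
            # Clamp rather than reject, so a bad limit still returns rows.
            limit=clamp(limit, 1, 50),
        )
        return [LabResult.from_fhir(r) for r in rows]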
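The `@tool` decorator itself is not shown in this write-up. As a rough sketch of what such a decorator could do, the hypothetical version below records each function's category, auth flag, docstring, and parameter annotations in a registry, so the signature really is the contract the agent sees. The `TOOLS` registry and the `echo` tool are invented for illustration.

```python
import inspect
from typing import Any, Awaitable, Callable

# Hypothetical registry; Rosie's real @tool implementation is not shown in the article.
TOOLS: dict = {}


def tool(*, category: str, requires_auth: bool = False):
    """Register an async function as a tool, capturing its signature as the contract."""
    def decorate(fn: Callable[..., Awaitable[Any]]):
        sig = inspect.signature(fn)
        TOOLS[fn.__name__] = {
            "fn": fn,
            "category": category,
            "requires_auth": requires_auth,
            "doc": inspect.getdoc(fn) or "",
            # Parameter name -> annotation; this is what a prompt builder
            # or argument validator would read.
            "params": {name: p.annotation for name, p in sig.parameters.items()},
        }
        return fn  # the function stays directly callable and testable
    return decorate


@tool(category="demo")
async def echo(*, text: str, times: int = 1) -> str:
    """Repeat text a number of times."""
    return text * times
```

Because the decorator returns the original function unchanged, a Pytest test can call the tool directly without going through the agent.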

The result: model swaps are trivial. When a new Ollama model lands, the regression check is "does the new model still pick the right tool 95% of the time?", measured by a fixed eval suite, not by feel.
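A tool-selection gate of that kind reduces to a small amount of code. The sketch below assumes the eval suite is a fixed list of (prompt, expected tool) pairs and that `pick_tool` wraps whatever model is under test; both names, and the `EvalCase` type, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected_tool: str


def tool_selection_accuracy(cases: list, pick_tool: Callable[[str], str]) -> float:
    """Fraction of prompts for which the model picks the expected tool."""
    if not cases:
        raise ValueError("eval suite is empty")
    hits = sum(pick_tool(c.prompt) == c.expected_tool for c in cases)
    return hits / len(cases)


def passes_gate(accuracy: float, threshold: float = 0.95) -> bool:
    """The 95% bar from the text, expressed as a fraction."""
    return accuracy >= threshold
```

In practice `pick_tool` would send the prompt to the candidate model and parse the tool name from its response; the gate only cares about the name that comes back.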

FHIR R4 against the real provider, not a mock

The health surface is the load-bearing one. Rosie connects to Epic MyChart via OAuth2 + FHIR R4, fetches observations, conditions, medications, and immunizations against real APIs, and stores them in an encrypted SQLite cache so a typical "what was my last lipid panel" query doesn't hit the network.
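The cache-first read path can be sketched with the standard library. This is an assumption-laden illustration: the table name, key scheme, and max-age policy are invented, and encryption is deliberately left out because stdlib `sqlite3` does not provide it (the real cache is encrypted by means not described here).

```python
import json
import sqlite3
import time
from typing import Callable, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS fhir_cache (
    key        TEXT PRIMARY KEY,
    payload    TEXT NOT NULL,
    fetched_at REAL NOT NULL
)
"""


def cached_fetch(
    db: sqlite3.Connection,
    key: str,
    fetch: Callable[[], list],
    max_age_s: float,
    now: Optional[float] = None,
) -> list:
    """Return cached rows if still fresh; otherwise call fetch() and store the result."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT payload, fetched_at FROM fhir_cache WHERE key = ?", (key,)
    ).fetchone()
    if row is not None and now - row[1] <= max_age_s:
        return json.loads(row[0])  # cache hit: no network call
    rows = fetch()  # the only place the network would be touched
    db.execute(
        "INSERT OR REPLACE INTO fhir_cache (key, payload, fetched_at) VALUES (?, ?, ?)",
        (key, json.dumps(rows), now),
    )
    db.commit()
    return rows
```

Passing `fetch` as a callable keeps the FHIR client out of the cache logic, so the cache can be tested against an in-memory database with a fake fetcher.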

Persistent memory as a separate concern

Conversations don't live in the LLM's context window; they live in LanceDB and SQLite. On each turn the agent retrieves relevant memories via vector search and ranks them by recency × similarity. Prompt assembly is its own module. New conversations start fast; long-running threads still feel coherent.
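The recency × similarity score can be made concrete with a small re-ranking sketch. Everything here is an assumption for illustration: the exponential half-life decay, the 30-day default, and the in-memory `Memory` type are not from the article, and in a real system the vector search itself would run inside LanceDB rather than in pure Python.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    text: str
    embedding: tuple
    age_days: float


def cosine(a, b) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recency(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def rank(query, memories, k: int = 3, half_life_days: float = 30.0) -> list:
    """Return the top-k memories by recency * similarity, best first."""
    scored = [
        (recency(m.age_days, half_life_days) * cosine(query, m.embedding), m)
        for m in memories
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]
```

With a multiplicative score, a fresh but only moderately similar memory can outrank an exact match from two months ago, which is the recency bias that shows up in long-term recall.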

Where it is now

In production at home for over a year. It powers the morning brief, schedules family logistics, runs lab-result lookups, drafts emails, and keeps a memory of who's allergic to what. The codebase has been forked into a sister project, Aileen, which applies the same architecture to a business context (review monitoring, financial analysis, the steward role on Chain Seeker).

Result

A 79K-LOC private AI assistant that runs entirely on a single Mac Mini, integrates real-world health data via FHIR, and has been the daily-driver family AI for over a year — with zero data leakage to third-party model providers. The fork to Aileen proved the architecture generalizes; the test suite proves it's maintainable.

What's next

The roadmap is honest: better long-term memory consolidation (today's recall is recency-biased), proactive surfacing of weekly summaries, and a small UI shell so non-CLI users in the family can talk to Rosie without using my terminal.