This document describes a proposed expansion of the Envoy processing model from its current binary gathering/complete model into a richer, phase-aware finite state machine. The goals are:
• Allow emails to be sent at any phase, not just at completion
• Match the context provided to the LLM to the type of cognitive work it is currently doing (RAG principle)
• Support summarisation and short-term memory for managing large context
• Make all behaviour advisory and notes-driven, so that Envoy can update its own instructions
Retrieval-Augmented Generation (RAG) is at its core about matching context to cognitive task. When you are orienting yourself, you want broad shallow context. When you are writing code, you want focused technical context. When you are composing a reply, you want communication preferences and prior correspondence.

Loading everything all the time defeats this: it pollutes the context window with irrelevant material and gives the LLM no structural signal about what kind of thinking it should be doing right now.

The state machine enables RAG naturally: each state signals the current cognitive task, and the orchestrator (and the LLM itself) can load state-appropriate notes. The LLM still requests specific notes explicitly via add_notes; the state provides the default context layer on top of that.
All non-terminal states are fully connected — the LLM may transition to any other state on each iteration. There is no enforced sequence. Transitions are advisory, determined by the LLM's understanding of what it needs to do next.
triage: Initial read. Categorise the email, understand the sender, and decide what type of task this is and what approach it needs. Emails: no.
Auto-loads: envoy/states/triage, user context notes, CONTENTS

gathering: Load context. Fetch notes, search emails, retrieve reference material. Emails: advisory. May send to external agents or ask the user clarifying questions, but final replies belong in composing or complete.
Auto-loads: envoy/states/gathering, thread bundle (if one exists)

summarising: Condense loaded context. Read fetched material, extract relevant parts into short-term scratch notes, and drop the originals from context. Emails: no.
Auto-loads: envoy/states/summarising

working: General processing: analysis, research, writing, decision-making. Use write_notes heavily. Emails: progress updates are acceptable.
Auto-loads: envoy/states/working, thread bundle

coding: Code-writing. Load existing code, plan the change, and implement incrementally, one routine per iteration where possible. Emails: none unless a question is needed.
Auto-loads: envoy/states/coding, thread bundle

composing: Prepare outbound communication. Draft emails and check tone and content against user preferences. Emails: yes; this is the primary email-sending state.
Auto-loads: envoy/states/composing, user preferences notes

waiting: Voluntarily park and wait for external input. Dispatches send_emails immediately, sends a continuation email to preserve state, then stops the run. Used when a question has been sent to the user, a request has been sent to an external agent, or confirmation is needed before proceeding. Processing resumes when the reply arrives.
Auto-loads: nothing (the job of this state is to STOP)

complete: All work done. Dispatch all remaining actions (send_emails, move_emails, delete_emails, write_notes, delete_notes). Stop.
Auto-loads: nothing

escalate: Cannot handle this. Flag for human. Stop.
Auto-loads: nothing
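The state list above can be held as plain data rather than branching logic. The following is a minimal sketch, not the orchestrator's actual representation: the StateSpec type, the coarse "no" / "advisory" / "yes" email labels, and the terminal flag are all assumptions introduced for illustration. The email policy for escalate is not stated above, so it is left as None.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StateSpec:
    # Coarse email policy label (hypothetical): "no", "advisory", "yes", or None if unspecified.
    emails: Optional[str]
    # Default context the orchestrator loads for this state.
    auto_loads: Tuple[str, ...]
    # True when the run ends here and is not expected to resume (assumption).
    terminal: bool = False


STATES = {
    "triage": StateSpec("no", ("envoy/states/triage", "user context notes", "CONTENTS")),
    "gathering": StateSpec("advisory", ("envoy/states/gathering", "thread bundle")),
    "summarising": StateSpec("no", ("envoy/states/summarising",)),
    "working": StateSpec("advisory", ("envoy/states/working", "thread bundle")),
    "coding": StateSpec("advisory", ("envoy/states/coding", "thread bundle")),
    "composing": StateSpec("yes", ("envoy/states/composing", "user preferences notes")),
    "waiting": StateSpec("yes", ()),
    "complete": StateSpec("yes", (), terminal=True),
    "escalate": StateSpec(None, (), terminal=True),
}


def default_context(status: str) -> Tuple[str, ...]:
    """Return the default auto-load list for a state name."""
    return STATES[status].auto_loads
```

Because the guidance itself lives in notes, a table like this would only need to carry the mechanics (what to load, whether the run stops), not the instructions for each phase.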
• Simple question: triage → composing → complete
• Research task: triage → gathering → gathering → summarising → working → composing → complete
• Coding task with question: triage → gathering → coding → coding → coding → waiting (reply arrives) → coding → coding → composing → complete
• External agent call (websearch@): triage → gathering → composing [sends search request] → waiting (agent replies) → gathering [loads result] → working → composing → complete
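The connectivity rule and the example paths above can be checked mechanically. The sketch below assumes that complete and escalate are the only terminal states and that waiting is resumable (as the coding example implies); is_valid_path is a hypothetical helper, not part of Envoy.

```python
TERMINAL = {"complete", "escalate"}
NON_TERMINAL = {"triage", "gathering", "summarising", "working",
                "coding", "composing", "waiting"}
ALL_STATES = NON_TERMINAL | TERMINAL


def is_valid_path(path):
    """True if every state is known and nothing follows a terminal state.

    Non-terminal states are fully connected, so any transition between them
    (including repeating the same state) is allowed.
    """
    if not path or any(state not in ALL_STATES for state in path):
        return False
    return all(state not in TERMINAL for state in path[:-1])


research = ["triage", "gathering", "gathering", "summarising",
            "working", "composing", "complete"]
coding_with_question = ["triage", "gathering", "coding", "coding", "coding",
                        "waiting", "coding", "coding", "composing", "complete"]
```

A check like this is only useful for tests or logging; the transitions themselves remain advisory and are chosen by the LLM.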
Each state name maps to a note at envoy/states/{name}. When the LLM sets a status, the orchestrator auto-loads the corresponding state note for the NEXT iteration. These notes contain the specific instructions, mindset, and guidance for that phase.

For example, envoy/states/triage might say: 'You are a secretary deciding how an incoming request should be handled. Read the email carefully. Ask: who is this person, what are they asking, is this a simple question or a multi-step task, is there an ongoing project this relates to, does it need specialist treatment (coding, research, scheduling)? Set your next status to signal what kind of work is needed.'

Similarly, envoy/states/coding might say: 'You are writing code. Load the existing code for this project before changing anything. Implement one routine per iteration, writing it to the appropriate note key. Before writing, state your plan in working_note. After writing, verify the logic is sound. Aim for working, testable code, not stubs or outlines.'

State notes are just notes. Envoy can update them. They are not hardcoded anywhere in the orchestrator.
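The name-to-note mapping described above is small enough to sketch directly. This assumes a hypothetical in-memory dict standing in for the notes system; the real store and its API are not described in this document. An unknown status simply yields no state note, which matches the "accept any string" lean.

```python
def state_note_key(status: str) -> str:
    """Map a status value to its state note key."""
    return f"envoy/states/{status}"


def notes_for_next_iteration(status: str, notes: dict) -> dict:
    """Return the state note to auto-load for the NEXT iteration, if one exists.

    `notes` is a stand-in for the notes system: key -> note text.
    """
    key = state_note_key(status)
    return {key: notes[key]} if key in notes else {}


# Hypothetical store containing a single state note.
store = {"envoy/states/triage": "You are a secretary deciding how a request should be handled."}
```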
A context bundle is a note that lists other notes to auto-load whenever this thread is active. Each ongoing task or project creates and maintains its own bundle. The bundle is the LLM's way of saying 'these are the notes that are always relevant to this subject'.

Bundle note key pattern: bundles/{topic-slug}

Example: bundles/z80-sudoku might contain:
notes: [projects/Z80-sudoku-challenge/index, Z80/instruction-set, projects/Z80-sudoku-challenge/plan]

When the LLM identifies that an incoming email relates to an existing bundle (by checking CONTENTS or searching for prior threads), it loads the bundle note and the orchestrator auto-loads all listed notes at the start of each gathering iteration.

The LLM creates the bundle note during the first triage iteration for a new topic, and adds to it during gathering as new relevant notes are discovered.
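As a rough illustration of how the orchestrator might expand a bundle, the sketch below parses the "notes: [...]" line from the example above into a list of keys. The exact bundle syntax is an assumption taken from that one example, and parse_bundle and bundle_key are hypothetical names.

```python
import re


def bundle_key(topic_slug: str) -> str:
    """Build the bundle note key for a topic slug."""
    return f"bundles/{topic_slug}"


def parse_bundle(text: str) -> list:
    """Extract note keys from a bundle note of the form 'notes: [a, b, c]'.

    Returns an empty list if no such line is found.
    """
    match = re.search(r"notes:\s*\[(.*?)\]", text, re.DOTALL)
    if not match:
        return []
    return [key.strip() for key in match.group(1).split(",") if key.strip()]


example = ("notes: [projects/Z80-sudoku-challenge/index, Z80/instruction-set, "
           "projects/Z80-sudoku-challenge/plan]")
```

Keeping the bundle as a flat list of keys means the LLM can extend it with an ordinary note write during gathering.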
When should a bundle be deleted?

Bundles must survive beyond status=complete because the thread may resume: a user may come back with follow-up requests on the same topic. Deleting on complete would destroy useful context.

Options under consideration:
• Explicit closure: Envoy deletes the bundle only when it detects the thread is truly closed (email moved to Done or Archive, and the user said something like 'thanks, all done').
• Bundle flagging: Bundles carry a 'last_active' date. A periodic housekeeping run could list old bundles and prompt for deletion.
• Manual: Bundles persist until explicitly deleted by the user or by Envoy on instruction.

For now: bundles persist until Envoy explicitly deletes them. The LLM should include bundle cleanup in its completion checklist when a topic is definitively closed.
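The bundle flagging option could look something like the following housekeeping sketch. It only lists candidates and never deletes anything, in keeping with the "prompt for deletion" wording. The 90-day threshold and the mapping of bundle key to last_active date are assumptions for illustration.

```python
from datetime import date, timedelta


def stale_bundles(last_active: dict, today: date, max_idle_days: int = 90) -> list:
    """List bundle keys whose last_active date is older than the idle threshold.

    `last_active` maps bundle key -> date; the threshold is a hypothetical default.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(key for key, active in last_active.items() if active < cutoff)


# Hypothetical data for illustration only.
bundles = {
    "bundles/z80-sudoku": date(2024, 1, 10),
    "bundles/holiday-plans": date(2024, 5, 20),
}
candidates = stale_bundles(bundles, today=date(2024, 6, 1))
```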
Some loaded documents (notes, emails) contain relevant information mixed with irrelevant content. Loading the full document wastes context. Two approaches:
Pattern: read a large document → extract relevant parts → write to a scratch note → drop the original from context.

The 'summarising' state is designed for this. The LLM:
1. Has fetched one or more large documents
2. Transitions to status='summarising'
3. Writes concise summaries to scratch/{topic}-{aspect} notes
4. Uses the 'drop' field to remove the originals from the active document set
5. On subsequent iterations, has only the compact scratch notes in context
6. At completion (or when the scratch material is no longer needed), deletes the scratch notes

Scratch notes use the scratch/ key prefix by convention, making them easy to identify and clean up.

Note: the 'drop' field already exists in the EnvoyResponse schema. Short-term summarisation is largely implementable with current tools; it is a workflow convention, not a new feature.
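The numbered workflow above can be sketched as a response plus the effect it has on the document set. The field names status, write_notes, and drop come from this document, but the shape of each write_notes entry (a dict with key and content) is an assumption, as are the helper names.

```python
def summarising_response(topic: str, summaries: dict, originals: list) -> dict:
    """Build a response that writes scratch summaries and drops the originals.

    `summaries` maps an aspect name to its condensed text.
    """
    write_notes = [
        {"key": f"scratch/{topic}-{aspect}", "content": text}
        for aspect, text in summaries.items()
    ]
    return {"status": "summarising", "write_notes": write_notes, "drop": list(originals)}


def next_document_set(document_set: list, response: dict) -> list:
    """Apply 'drop', then add the newly written scratch notes to the context."""
    dropped = set(response["drop"])
    kept = [key for key in document_set if key not in dropped]
    written = [note["key"] for note in response["write_notes"]]
    return kept + [key for key in written if key not in kept]


def scratch_keys(keys: list) -> list:
    """Find scratch notes by prefix so they can be deleted at completion."""
    return [key for key in keys if key.startswith("scratch/")]
```

For example, summarising a large instruction-set note into one scratch note leaves only the project plan and the compact summary in context for later iterations.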
A future enhancement to the notes system would allow retrieving specific sections of a note by heading anchor, rather than loading the full document. This is analogous to chunk-based retrieval in traditional RAG pipelines.

Example: instead of loading the full Z80/instruction-set note (thousands of lines), retrieve only the section under the heading 'Bit Manipulation Instructions'.

This would require:
• Notes system support for sub-document addressing (e.g. key#heading-slug)
• The LLM knowing what headings exist (possibly from a note's table of contents)
• Orchestrator support for partial note fetch

Until this is built, the workaround is the summarisation pattern above: load the full note, extract what is needed into a scratch note, drop the original.
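To make the key#heading-slug idea concrete, here is one possible sketch of sub-document addressing. It assumes notes use '#'-style headings and a simple lowercase-and-hyphen slug rule; neither is specified in this document, and all function names are hypothetical.

```python
import re

HEADING = re.compile(r"^(#+)\s+(.*)$")


def slugify(heading: str) -> str:
    """Lowercase a heading and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")


def split_address(address: str):
    """Split 'key#heading-slug' into (key, slug); slug is None if absent."""
    key, _, slug = address.partition("#")
    return key, slug or None


def list_headings(text: str) -> list:
    """Return heading slugs, a stand-in for a note's table of contents."""
    matches = (HEADING.match(line) for line in text.splitlines())
    return [slugify(m.group(2)) for m in matches if m]


def extract_section(text: str, slug: str):
    """Return the section under the matching heading, up to the next heading
    of the same or a higher level. Returns None if the slug is not found."""
    out, level = [], None
    for line in text.splitlines():
        m = HEADING.match(line)
        if m:
            depth = len(m.group(1))
            if level is None:
                if slugify(m.group(2)) == slug:
                    level = depth
                    out.append(line)
                continue
            if depth <= level:
                break
        if level is not None:
            out.append(line)
    return "\n".join(out) if out else None
```

With something like this in place, a request for Z80/instruction-set#bit-manipulation-instructions would pull in only that section instead of the whole note.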
Not every loaded document needs summarisation. Some gathering steps simply accumulate note keys in context. These notes remain available throughout the run and are pulled back into focus naturally when the LLM needs them for working or composing.

The LLM does not need to re-fetch a note that is already in context. The document set persists across iterations within a run. The key discipline: know what is already in context (listed at the top of each iteration prompt) and do not re-request it.

This is already supported: the iteration prompt shows the full current document set. The LLM can read this list and request only new documents.
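The "do not re-request" discipline amounts to a set difference that preserves request order. A minimal sketch, where new_requests is a hypothetical helper and the lists stand in for an add_notes request and the current document set:

```python
def new_requests(requested: list, in_context: list) -> list:
    """Return requested keys that are not already in context, without duplicates."""
    seen = set(in_context)
    fresh = []
    for key in requested:
        if key not in seen:
            seen.add(key)
            fresh.append(key)
    return fresh


in_context = ["Z80/instruction-set", "projects/Z80-sudoku-challenge/plan"]
wanted = ["Z80/instruction-set", "projects/Z80-sudoku-challenge/index",
          "projects/Z80-sudoku-challenge/index"]
```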
No behaviour described in this document should be hardcoded in the orchestrator beyond the minimal mechanics (detect state, dispatch actions, send continuation for waiting). All guidance, instructions, context bundles, and conventions live in notes.

This means:
• Envoy can update its own state instructions (envoy/states/*) if it discovers a better approach
• State instructions can differ per mailbox or per deployment
• New states can be introduced by creating a new envoy/states/{name} note, with no code change required
• Context bundles are self-managing: the LLM creates, updates, and eventually deletes them

The orchestrator's job is mechanics. The notes system's job is intelligence.
The schema change required is minimal:
• Expand the 'status' field to accept the new state names
• Allow send_emails to dispatch on any non-terminal state (remove the complete-only restriction)
• For 'waiting': dispatch send_emails immediately, then trigger auto-continuation and stop the run
• Auto-load envoy/states/{status} note at the start of each iteration
• Auto-load bundle notes when a bundle key is known (carried in continuation attachment or working_note)

The existing model tier (next_model) hint mechanism applies naturally: state notes can recommend a model tier. For example, envoy/states/triage might advise 'mini' (orientation is cheap), while envoy/states/coding might advise 'full' (code quality needs the best model).

All current behaviour for complete and escalate remains unchanged.
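The orchestrator mechanics listed above might reduce to a small planning step per response. This is a sketch under assumptions: the plan dict and its keys are invented for illustration, the response is treated as a plain dict, and complete and escalate are handed off to the existing handling rather than re-implemented, since their current behaviour is not described here.

```python
TERMINAL = {"complete", "escalate"}


def plan_iteration(response: dict, bundle_notes=()) -> dict:
    """Decide what the orchestrator does with one response (hypothetical shape)."""
    status = response["status"]
    emails = list(response.get("send_emails", []))
    plan = {
        "dispatch_emails": [],
        "send_continuation": False,
        "stop": False,
        "auto_load": [],
        "use_existing_handling": False,
    }
    if status in TERMINAL:
        # Behaviour for complete and escalate is unchanged: defer to existing code.
        plan["use_existing_handling"] = True
        plan["stop"] = True
    elif status == "waiting":
        # Dispatch immediately, preserve state via continuation, then stop the run.
        plan["dispatch_emails"] = emails
        plan["send_continuation"] = True
        plan["stop"] = True
    else:
        # Any other state may send emails; load its state note and bundle notes next.
        plan["dispatch_emails"] = emails
        plan["auto_load"] = [f"envoy/states/{status}", *bundle_notes]
    return plan
```

Note that unknown status strings fall through to the final branch, which is consistent with accepting any string and looking for a matching state note.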
1. Should 'triage' be a mandatory first iteration, or optional and only used when the LLM judges it necessary? Current lean: optional. Simple emails can skip straight to composing → complete.
2. Do we need distinct 'coding' and 'working' states, or can coding be handled by working with a specialised state note? Current lean: keep 'coding' as a distinct state since it benefits from notably different auto-loaded context (existing code, test notes, reference docs).
3. Should there be a 'planning' state before 'coding' for non-trivial changes? Current lean: document this in envoy/states/coding rather than add a state, which keeps the FSM lean.
4. Bundle cleanup: see Bundle Cleanup section above. Decision deferred.
5. Should the orchestrator validate state names against a known list, or accept any string and look for a matching envoy/states/{name} note? Current lean: accept any string, which enables new states without code changes.
Related notes: Agent Instructions, TODO, Design Specification, Design Supplement