Concrete Improvements: Code Retrieval Case (2026-02-22)

Derived from the Code Delivery and Retrieval Failure case. The improvements fall into two categories: code changes to the orchestrator and schema, and changes to the state instruction notes.

Code Changes

1. Attachment Parsing (Input)

Files: orchestrator.py (parse_email_message), envoy_schema.py (EnvoyResponse)

In parse_email_message(): scan all MIME parts. For non-text/plain, non-text/html parts, extract metadata and optionally content. Build an available_attachments list in the parsed email dict:

# In parsed email dict:
"available_attachments": [
    {
        "filename": "jsonhtl_to_html.py",
        "content_type": "text/x-python",
        "size_bytes": 5798,
        "readable": True   # text type, can be loaded on request
    },
    {
        "filename": "diagram.png",
        "content_type": "image/png",
        "size_bytes": 42000,
        "readable": False  # binary, not yet supported
    }
]

Text-readable types: .py .txt .js .ts .json .yaml .yml .md .html .css .sh .rb .go .rs .java .c .h .cpp. Store attachment bytes in a temp dict keyed by (message_id, filename) so they can be retrieved on request.
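The scan described above could look roughly like the following. This is a minimal sketch, not the actual parse_email_message() code: the names collect_attachments, parse_attachments, TEXT_EXTENSIONS and ATTACHMENT_STORE are hypothetical, and it assumes that a text/plain or text/html part counts as the email body only when it carries no filename (so that .txt and .html attachments are still listed).

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from pathlib import Path

# Extensions treated as text-readable (the list given above).
TEXT_EXTENSIONS = {
    ".py", ".txt", ".js", ".ts", ".json", ".yaml", ".yml", ".md", ".html",
    ".css", ".sh", ".rb", ".go", ".rs", ".java", ".c", ".h", ".cpp",
}

# Hypothetical temp store: (message_id, filename) -> raw attachment bytes.
ATTACHMENT_STORE: dict[tuple[str, str], bytes] = {}


def collect_attachments(msg: EmailMessage) -> list[dict]:
    """Build the available_attachments list and stash bytes for later loading."""
    message_id = str(msg.get("Message-ID", "")).strip()
    available = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        filename = part.get_filename()
        if not filename:
            continue  # assumption: parts without a filename are body parts
        payload = part.get_payload(decode=True) or b""
        ATTACHMENT_STORE[(message_id, filename)] = payload
        available.append({
            "filename": filename,
            "content_type": part.get_content_type(),
            "size_bytes": len(payload),
            "readable": Path(filename).suffix.lower() in TEXT_EXTENSIONS,
        })
    return available


def parse_attachments(raw: bytes) -> list[dict]:
    """Convenience wrapper: parse raw email bytes, then collect attachments."""
    return collect_attachments(message_from_bytes(raw, policy=policy.default))
```

Deciding "readable" from the filename extension rather than the declared content type follows the extension list above; mail clients often label source files as application/octet-stream, so the extension is likely the more reliable signal.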

Add to EnvoyResponse schema:

class AttachmentFetch(BaseModel):
    message_id: str   # Message-ID of the email
    filename: str     # Attachment filename to load

class EnvoyResponse(BaseModel):
    # ... existing fields ...
    # Defaults to empty so responses that omit the field still validate
    read_attachments: List[AttachmentFetch] = []  # Request attachment content

In the gathering loop: if read_attachments is non-empty, load each readable attachment and add to context as:

=== ATTACHMENT: jsonhtl_to_html.py (from <msg-id>) ===
#!/usr/bin/env python3
... content ...

Non-readable types: log a warning, add "ATTACHMENT filename (binary, not readable)" to context, add to gather_results as UNSUPPORTED.
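One way the gathering-loop step might be written is sketched below. The function name load_requested_attachments and its arguments are hypothetical, as are the "OK" and "UNAVAILABLE" status strings; only UNSUPPORTED and the two context formats come from the description above.

```python
import logging

logger = logging.getLogger(__name__)


def load_requested_attachments(requests, available, store):
    """Turn read_attachments requests into context text plus gather results.

    requests:  iterable of (message_id, filename) pairs
    available: dict (message_id, filename) -> metadata dict with a "readable" flag
    store:     dict (message_id, filename) -> raw attachment bytes
    """
    context_parts = []
    gather_results = {}
    for message_id, filename in requests:
        key = (message_id, filename)
        meta = available.get(key)
        if meta is None or key not in store:
            # Assumption: unknown attachments are reported, not silently dropped.
            gather_results[key] = "UNAVAILABLE"
            continue
        if not meta["readable"]:
            logger.warning("Attachment %s is binary and cannot be read", filename)
            context_parts.append(f"ATTACHMENT {filename} (binary, not readable)")
            gather_results[key] = "UNSUPPORTED"
            continue
        # Undecodable bytes are replaced rather than raising mid-run.
        text = store[key].decode("utf-8", errors="replace")
        context_parts.append(
            f"=== ATTACHMENT: {filename} (from {message_id}) ===\n{text}"
        )
        gather_results[key] = "OK"
    return "\n\n".join(context_parts), gather_results
```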

2. Attachment Sending (Output)

Files: orchestrator.py (send_email / deliver_via_mail), envoy_schema.py (EmailOut)

Add to EmailOut:

class EmailAttachment(BaseModel):
    note_key: str    # Load content from this note
    filename: str    # Attachment filename in the email

class EmailOut(BaseModel):
    # ... existing fields ...
    # Defaults to empty so emails without attachments still validate
    attachments: List[EmailAttachment] = []  # Files to attach

In send_email(): if attachments are present, build a MIMEMultipart email, fetch each note's content, and attach as text/plain (or appropriate MIME type based on filename extension). The email body goes in the first text/plain part as usual.

This enables the correct code-delivery pattern: body = "Code is in notes at X. See attached.", attachment = file from note.
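A minimal sketch of the multipart construction follows, using the standard library's MIMEMultipart and MIMEText. The function build_email_with_attachments and the fetch_note callable are hypothetical stand-ins for whatever send_email() and the notes client actually provide, and the sketch assumes every attached note is text.

```python
import mimetypes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_email_with_attachments(sender, to, subject, body, attachments, fetch_note):
    """Build a multipart email whose attachments are loaded from notes.

    attachments: list of (note_key, filename) pairs
    fetch_note:  callable mapping a note key to its text content (hypothetical)
    """
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    # The body goes in the first text/plain part, as usual.
    msg.attach(MIMEText(body, "plain", "utf-8"))
    for note_key, filename in attachments:
        content = fetch_note(note_key)
        # Use a text/* subtype guessed from the extension; otherwise text/plain.
        guessed, _ = mimetypes.guess_type(filename)
        subtype = "plain"
        if guessed and guessed.startswith("text/"):
            subtype = guessed.split("/", 1)[1]
        part = MIMEText(content, subtype, "utf-8")
        part.add_header("Content-Disposition", "attachment", filename=filename)
        msg.attach(part)
    return msg
```

The result of mimetypes.guess_type() depends on the host's MIME tables, so the exact subtype for a given extension may differ between machines; the fallback to text/plain keeps the attachment deliverable either way.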

3. Outbound Email Length Guard

File: orchestrator.py (dispatch_immediate_actions or send_email)

Before sending any email, check the body length. If the body exceeds 4000 characters (configurable constant EMAIL_BODY_WARN_CHARS), log a warning:

EMAIL_BODY_WARN_CHARS = 4000

if len(email_out.body) > EMAIL_BODY_WARN_CHARS:
    logger.warning(
        f"Email to {email_out.to!r} body is {len(email_out.body)} chars "
        f"(>{EMAIL_BODY_WARN_CHARS}). Consider using an attachment instead."
    )

Log only for now — do not silently truncate. The LLM needs to learn to use attachments instead. In future this could auto-convert the body tail to an attachment.

4. Approach Tracking to Prevent Loops

File: orchestrator.py (main loop)

Track which note keys and search queries have been attempted this run. Surface them in the LLM context alongside the UNAVAILABLE list from failed_fetches:

# In main loop, alongside failed_fetches:
attempted_searches: list[str] = []  # search queries tried this run

# When building LLM context, add:
if attempted_searches:
    context += "\n=== SEARCHES TRIED THIS RUN ===\n"
    for s in attempted_searches:
        context += f"  {s}\n"
    context += "Try a different approach if these have already failed.\n"

The existing failed_fetches dict already tracks note fetch failures (after 2 attempts, marks as UNAVAILABLE). Extend it to also track search query failures, and surface both to the LLM explicitly.
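The extension could be sketched as a single tracker that counts failures for both kinds of attempt. This is illustrative only: the AttemptTracker class, its method names, and the two section headings it emits are hypothetical, and the real failed_fetches dict may be shaped differently. The threshold of 2 mirrors the existing rule for note fetches.

```python
MAX_ATTEMPTS = 2  # mirrors the "after 2 attempts" rule for note fetches


class AttemptTracker:
    """Counts failures per (kind, target); kind is "fetch" or "search"."""

    def __init__(self):
        self.failures = {}

    def record_failure(self, kind, target):
        key = (kind, target)
        self.failures[key] = self.failures.get(key, 0) + 1

    def exhausted(self, kind):
        """Targets of the given kind that have failed MAX_ATTEMPTS times or more."""
        return sorted(
            target
            for (k, target), count in self.failures.items()
            if k == kind and count >= MAX_ATTEMPTS
        )

    def render_context(self):
        """Text block to append to the LLM context; empty if nothing is exhausted."""
        lines = []
        fetches = self.exhausted("fetch")
        if fetches:
            lines.append("=== UNAVAILABLE NOTES ===")
            lines += [f"  {t}" for t in fetches]
        searches = self.exhausted("search")
        if searches:
            lines.append("=== SEARCHES THAT FAILED THIS RUN ===")
            lines += [f"  {t}" for t in searches]
        if lines:
            lines.append("Try a different approach if these have already failed.")
        return "\n".join(lines)
```

Keying on (kind, target) keeps a note key and a search query with the same text from being counted together.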

Notes Changes (State Instruction Updates)

5. envoy/states/triage — Add: self-retrieval and work index

Add two rules to the triage instructions:

Self-retrieval rule: If the user is asking about something you previously wrote, stored, or created — a note, code, a document — your first gathering action is to load that note by key. Do not search emails. Check CONTENTS or the email thread for the note key you used.

Work index rule: For any task that will take more than one iteration, create a work index note immediately (e.g. scratch/{topic}-index) and set bundle_key to it. The index should list: the task description, all note keys you will use, current status. This ensures context is available on resumption without re-discovering it.

6. envoy/states/gathering — Add: notes-first, vary approach

Add two rules:

Notes-first: If you are looking for something you created (code, a document, data), check notes before searching emails. The note key is usually visible in the email thread or in CONTENTS. If not, search CONTENTS before searching emails.

Vary approach: Track what you have tried in working_note. If the same fetch or search has failed twice, you must try a different approach: different note key, different search terms, different folder, or escalate to ask the user. Do not repeat a failed strategy a third time.

7. envoy/states/composing — Add: no inline code, use note references and attachments

Add a rule:

Code and documents: Never copy code or a large document verbatim into an email body. Instead:

• Reference the note key: "The code is stored in notes at projects/X/code"

• Attach the file using the attachments field: set note_key to the note containing the code, filename to the intended filename

• Summarise what changed, but do not forward the full content

When reporting an update to existing code: say what changed, reference the note key, attach if the user needs the file. Do not resend the entire code block in the email body.

8. envoy/start — Add: notes are the file system

Add a section to the start instructions:

Notes are your file system. Email is the channel.

• Code, documents, and data belong in notes, not in email bodies

• To send a file: reference its note key in the email body and attach it using the attachments field

• If something was stored in notes, you can always retrieve it: search notes before searching emails

• For multi-step tasks: create a work index note (scratch/{topic}-index) and set bundle_key to it immediately in triage. List all your working note keys in the index.

• The note is the source of truth. The email is a notification.

Implementation Priority

Item	Impact	Effort	Do first?
Notes changes (5-8)	High — prevents the loop from recurring	Low — edit 4 notes	Yes
Email length guard (3)	Medium — visible warning without breaking anything	Low — ~10 lines	Yes
Approach tracking (4)	High — breaks the repetition loop	Low — ~20 lines + context string	Yes
Attachment output (2)	High — correct code delivery pattern	Medium — MIME multipart + schema	Next
Attachment input (1)	High — enables source file workflow	Medium — MIME parsing + schema	Next

Implementation Status (updated 2026-02-22)

All 8 items above are now complete. Implementation notes on specific items, plus additional improvements made in the same session:

• Email length guard (EMAIL_BODY_WARN_CHARS = 4000): implemented in dispatch_immediate_actions(). Logs WARNING before sending oversized body.

• Approach tracking (attempted_searches): implemented. Persisted in continuation.json. Surfaced to LLM as '=== SEARCHES TRIED THIS RUN ==='.

• Note-write repair (_parse_note_value()): implemented. Fixes LLM codeblock encoding bug. notes_client.write_doc() silent fallback removed. See note-encoding issue.

• Dead link recovery: documented in README/links with two-approach procedure.

• CONTENTS stale link: root cause of 2026-02-22 JSONHTL run failure. Rule: always update CONTENTS when renaming or deleting any note that appears there.

version	`2`
created	`2026-02-22`