Envoy Testing Guide

How to test the orchestrator manually, and what was found in testing.

Active Test Plans

Attachment Testing Checklist — 5 tests for attachment parsing, continuation round-trip, filter_text, scan smoke, and E2E (branch: thread-chain)

Attachment Email Testing Checklist — 4 tests sending emails with attachments to envoy; establishes pre-refactor baseline

Test Harness: test_orchestrator.py

Located at ~/py/envoy/test_orchestrator.py. Provides inject/resume/list/run commands.

Common invocations:

    # Inject a test email and immediately run the orchestrator
    python3 test_orchestrator.py --inject 'Subject line' --body 'Body text' --run

    # Force continuation after 2 iterations, then resume from Sent folder
    python3 test_orchestrator.py --inject 'Complex task' --body '...' --run --max-iter 2
    python3 test_orchestrator.py --resume --run

    # List INBOX (shows seq IDs and SEEN/UNSEEN)
    python3 test_orchestrator.py --list

    # Mark a specific INBOX message as UNSEEN so orchestrator picks it up
    python3 test_orchestrator.py --mark-unseen 5

Key Testing Notes

• IMAP APPEND marks messages as SEEN by default. Always use --mark-unseen or test_orchestrator.py (which marks UNSEEN automatically after APPEND).

• External mail (sendmail → zoneedit MX → Outlook → popit3 → noodle IMAP) takes ~5-20 minutes. For testing, always inject via IMAP APPEND.

• Continuation emails sent by the orchestrator go through the external loop. For testing, use --resume, which finds the latest continuation in the Sent IMAP folder (stored there automatically since 2026-02-22) and injects it.
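The append-then-unmark step from the first note can be sketched with the standard imaplib module. This is a minimal sketch, not the harness's actual code: the function name append_unseen is hypothetical, and it assumes nothing else is delivered to the mailbox between APPEND and SELECT, so the highest sequence number is the message just appended.

```python
import imaplib
import time


def append_unseen(conn, mailbox, raw_message):
    """APPEND raw_message (bytes) to mailbox, then clear \\Seen on it.

    conn is an authenticated imaplib.IMAP4 or IMAP4_SSL connection.
    Returns the sequence number of the appended message.
    Hypothetical helper; the real harness may do this differently.
    """
    # Append with no flags; the server may still report the message as SEEN.
    date_time = imaplib.Time2Internaldate(time.time())
    status, _ = conn.append(mailbox, "", date_time, raw_message)
    if status != "OK":
        raise RuntimeError(f"APPEND to {mailbox} failed: {status}")

    # SELECT returns the message count, which is also the newest seq number
    # (assumption: no concurrent delivery to this mailbox).
    status, data = conn.select(mailbox)
    if status != "OK":
        raise RuntimeError(f"SELECT {mailbox} failed: {status}")
    seq = data[0].decode()

    # Remove the \Seen flag so the orchestrator treats the message as new.
    conn.store(seq, "-FLAGS", "\\Seen")
    return int(seq)
```

Taking the connection as a parameter keeps the helper testable with a stand-in object and leaves login and logout to the caller.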

Phase-Aware Processing: Test Results (2026-02-22)

The phase state machine was tested with two scenarios.

Scenario 1: simple phase test

    triage (iter 1) → composing (iter 2) → complete
    State notes auto-loaded: ✓
    Send from composing (dispatch_immediate): ✓
    No double-send on complete: ✓ (prior_sent_to fix)

Scenario 2: multi-step with continuation (--max-iter 3)

    Run 1: triage → gathering × 3 → continuation sent (iter 3)
      Notes written: scratch/z80-test-summary, scratch/z80-test-recs
      Bundle key set: bundles/z80-sudoku-project
    Run 2 (resumed): triage → gathering × 2 → composing → complete
      Intermediate email sent in composing: ✓
      Final email sent in complete: ✓
      Scratch notes deleted (cleanup): ✓
      Original email moved to Done: ✓
    Total iterations across both runs: 7

Bugs Found and Fixed

1. Double-send on complete (fixed 2026-02-22): When the LLM replied in a non-terminal phase, execute_actions would also send a confirmation reply, because it only checked response.send_emails (the final response), not emails dispatched in earlier iterations. Fix: dispatch_immediate_actions accumulates a sent_to_addresses set; execute_actions checks prior_sent_to before sending confirmation.

2. Duplicate notes in context (fixed 2026-02-22): If the LLM requested a note already in gathered_notes, it would be fetched and added again, inflating context with duplicate content. Fix: add_notes fetching now checks already_loaded_keys first.

3. move_emails INBOX-only search (fixed 2026-02-22): execute_actions only searched INBOX for emails to move. On continuation resume, the email could be elsewhere. Fix: search INBOX, then the Active folder, before giving up.

4. Continuation email not in Sent folder (fixed 2026-02-22): Continuation emails sent via smtplib were not stored in IMAP Sent, making them hard to retrieve for testing. Fix: send_continuation_email now appends a copy to IMAP Sent.
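The logic behind fixes 1 and 2 can be illustrated roughly as follows. This is a sketch under assumptions, not the orchestrator's code: the names dispatch_immediate_actions, sent_to_addresses, prior_sent_to, and already_loaded_keys come from the notes above, but the signatures, the email dict shape ({"to": ...}), and the helpers needs_confirmation and notes_to_fetch are hypothetical.

```python
def dispatch_immediate_actions(send_emails, sent_to_addresses, send):
    """Send emails requested mid-run and record every recipient.

    send is any callable that delivers one email dict (assumed shape).
    """
    for email in send_emails:
        send(email)
        sent_to_addresses.add(email["to"].lower())


def needs_confirmation(sender, final_send_emails, prior_sent_to):
    """True only if the sender got no reply in this or any earlier iteration."""
    sender = sender.lower()
    replied_now = any(e["to"].lower() == sender for e in final_send_emails)
    return not replied_now and sender not in prior_sent_to


def notes_to_fetch(requested_keys, already_loaded_keys):
    """Drop keys already in context, preserving request order."""
    fresh = []
    for key in requested_keys:
        if key not in already_loaded_keys and key not in fresh:
            fresh.append(key)
    return fresh
```

The point in both cases is the same: state accumulated in earlier iterations has to be consulted, not just the final response.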

Known Limitations Observed

• Bundle note (bundles/z80-sudoku-project) generates a warning each iteration because the LLM sets bundle_key but never creates the actual bundle note. The LLM needs to CREATE the bundle note (write_notes) if it sets bundle_key. Document this in envoy/states/triage: set bundle_key AND write the note.

• On continuation resume, phase defaults back to triage even if continuation data has current_phase=gathering. The triage state note fires an extra LLM call. Consider: restore current_phase and skip triage if resuming.

• Notes are de-duplicated within a run. On resume, continuation restores gathered_notes, and the resumed run may still re-request the same notes. This costs extra API calls but does not duplicate context (the dedup fix prevents that).
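The suggestion in the second limitation (restore current_phase on resume) might look something like this. It is only a sketch of the idea: the function initial_phase and the assumption that continuation data arrives as a dict are hypothetical; the phase names and the current_phase key are taken from the notes above.

```python
VALID_PHASES = ("triage", "gathering", "composing", "complete")


def initial_phase(continuation=None):
    """Pick the starting phase for a run.

    continuation is the (assumed) dict decoded from a continuation email;
    a fresh run passes None.
    """
    if not continuation:
        return "triage"
    phase = continuation.get("current_phase")
    # Fall back to triage on missing or unknown values,
    # and never resume directly into the terminal phase.
    if phase in VALID_PHASES and phase != "complete":
        return phase
    return "triage"
```

Falling back to triage for anything unexpected keeps the current behavior as the safe default, so only a well-formed continuation skips the extra LLM call.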

version 3
updated 2026-03-16