For a Python call the flow is simple: one iteration to decide and make the call, with the result in context on the next iteration. HTTP calls against an uncached or stale spec require more steps.
Uncached or stale spec — three effective LLM calls:
Iteration N — Decide: LLM reads CONTENTS, sees the service listed, reads its stub note. Decides to make a call. Emits an http_calls entry. Since the spec is absent or stale, the orchestrator fetches the spec first.
Iteration N+1 — Schema generation: LLM receives the raw OpenAPI spec. Produces a compact call/response schema (YAML). Orchestrator writes this to the cache note. LLM also constructs the actual API call from the schema and its intent.
Iteration N+2 — Use result: LLM receives the HTTP response and interprets it using the response schema from the previous step. Proceeds with the task.
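The cold-path routing above can be sketched as a small dispatch function. This is a minimal illustration, not the orchestrator's actual code: `cache`, `fetch_spec`, and `run_llm` are hypothetical names standing in for the real components.

```python
def handle_http_call(entry, cache, fetch_spec, run_llm):
    """Route an http_calls entry emitted by the LLM (iteration N).

    Hypothetical sketch: cache maps service name -> {"schema", "stale"},
    fetch_spec retrieves the raw OpenAPI spec, run_llm performs one
    effective LLM call (here, compacting the spec into a schema).
    """
    service = entry["service"]
    cached = cache.get(service)
    if cached is None or cached["stale"]:
        # Spec absent or stale: fetch it, then have the LLM produce the
        # compact call/response schema (iteration N+1) and cache it.
        raw_spec = fetch_spec(service)
        schema = run_llm("compress_spec", raw_spec)
        cache[service] = {"schema": schema, "stale": False}
        return {"action": "schema_generated", "schema": schema}
    # Warm path: schema already cached, the call can execute immediately.
    return {"action": "execute", "schema": cached["schema"]}
```

The first call for a service pays the extra iteration; every later call takes the warm branch.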
Cached spec — one effective LLM call (same as a Python call):
Iteration N — Decide and call: LLM reads the cached schema from the service note. Emits http_calls with full parameters. Orchestrator executes immediately; the result is in context on the next iteration.
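On the warm path the http_calls entry carries everything needed to execute. As a sketch, assuming a hypothetical entry shape (`base_url`, `path`, `method`, optional `body` — none of these field names are from the original text), the orchestrator's translation into a concrete request might look like:

```python
import json
import urllib.request

def build_request(entry):
    """Turn a fully-parameterized http_calls entry into a urllib Request.

    Hypothetical entry shape for illustration only; the real orchestrator's
    format may differ.
    """
    url = entry["base_url"] + entry["path"]
    body = entry.get("body")
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        url,
        data=data,
        method=entry["method"],
        headers={"Content-Type": "application/json"},
    )
```

Because the schema is already in context, no extra LLM round-trip is needed between deciding and executing.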
Non-2xx responses are passed to the LLM with the error schema from the cached spec. The LLM can: retry with corrected arguments, escalate to the user, or note the failure.
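The error-handling policy above can be expressed as a tiny branch. A minimal sketch, with hypothetical function and field names; the choice among retry, escalate, or note-the-failure stays with the LLM, so the orchestrator only packages the failure:

```python
def on_http_response(status, body, error_schema):
    """Hypothetical policy: decide what the LLM sees next.

    2xx responses flow straight back as results; anything else is handed
    to the LLM together with the error schema from the cached spec, so it
    can retry with corrected arguments, escalate, or note the failure.
    """
    if 200 <= status < 300:
        return {"next": "use_result", "body": body}
    return {"next": "llm_decides", "status": status,
            "body": body, "error_schema": error_schema}
```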
If the LLM determines the cached schema was wrong or stale, it flags this. The orchestrator invalidates the cache and re-runs schema generation.
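Invalidation itself is trivial under the cache shape sketched earlier (a hypothetical mapping of service name to a record with a `stale` flag): marking the entry stale is enough, because the next http_calls entry for that service then takes the cold path and re-runs schema generation.

```python
def flag_stale(cache, service):
    """Mark a cached schema stale (hypothetical cache shape:
    service name -> {"schema": ..., "stale": bool}).

    The next call for this service will refetch the spec and
    regenerate the schema; flagging an unknown service is a no-op.
    """
    if service in cache:
        cache[service]["stale"] = True
```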