Problem: Raw LLM responses are unstructured and hard to parse. Solution: The OpenAI schema parameter enforces a strict JSON structure. Benefit: document_generator.py can reliably extract CV and cover letter data. Trade-off: The schema must be carefully designed to constrain the LLM output.
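
To make the trade-off above concrete, here is a minimal sketch of what a constraining schema and the matching parse step might look like. The field names (`summary`, `skills`, `cover_letter`) and the helper `parse_response` are hypothetical, chosen only for illustration; the project's real schema is not shown here.

```python
import json

# Hypothetical schema: field names are illustrative, not the project's real ones.
CV_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
        "cover_letter": {"type": "string"},
    },
    "required": ["summary", "skills", "cover_letter"],
    "additionalProperties": False,
}


def parse_response(raw: str, schema: dict = CV_SCHEMA) -> dict:
    """Parse the model's JSON reply and check its top-level keys against the schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in schema["required"] if key not in data]
    extra = [key for key in data if key not in schema["properties"]]
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    return data
```

The check only looks at top-level keys. It is a cheap guard before document generation, not a full JSON Schema validator.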
PDFs are complex, and text extraction varies with how the PDF was encoded. Priority order: pdfplumber (most reliable) → pypdf → PyPDF2. Fallback: Try each in turn until one succeeds. Benefit: Handles PDFs created by different tools without user intervention.
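
The fallback order above could be sketched roughly as follows. The function names are hypothetical, and the sketch assumes that "succeeds" means "returns non-empty text"; the project's actual criteria may differ. Each library is imported lazily, so a missing package simply counts as a failed attempt.

```python
from typing import Callable, Sequence


def _with_pdfplumber(path: str) -> str:
    import pdfplumber  # third-party
    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def _with_pypdf(path: str) -> str:
    from pypdf import PdfReader  # third-party
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)


def _with_pypdf2(path: str) -> str:
    from PyPDF2 import PdfReader  # third-party; PdfReader needs PyPDF2 2.0 or later
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)


# Priority order from the notes: pdfplumber, then pypdf, then PyPDF2.
DEFAULT_EXTRACTORS = (_with_pdfplumber, _with_pypdf, _with_pypdf2)


def extract_text(
    path: str,
    extractors: Sequence[Callable[[str], str]] = DEFAULT_EXTRACTORS,
) -> str:
    """Return text from the first extractor that yields non-empty output."""
    errors = []
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # includes ImportError for missing libraries
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text.strip():
            return text
        errors.append(f"{extractor.__name__}: empty result")
    raise RuntimeError("all PDF extractors failed: " + "; ".join(errors))
```

Passing the extractors as a parameter keeps the priority order explicit and makes it easy to substitute stand-ins in tests.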
By default: The LLM sees filename headers and is told to reference documents. Useful for: Document comparison and Q&A tasks where context matters. For jobs: The CV and job description are inputs, not documents to reference. Flag removes headers: Cleaner JSON without the instruction to cite sources. Benefit: Structured output is cleaner and easier to parse.
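
As an illustration of the behavior described above, the following sketch shows how a prompt builder might switch the headers and the citation instruction on and off together. The function `build_prompt`, the `include_headers` parameter, the `=== name ===` header format, and the instruction wording are all assumptions made for this example, not the tool's actual interface.

```python
def build_prompt(files: dict[str, str], question: str, include_headers: bool = True) -> str:
    """Assemble a prompt from file contents, optionally labelling each file."""
    parts = []
    for name, content in files.items():
        if include_headers:
            # Default mode: label each file so the model can cite it by name.
            parts.append(f"=== {name} ===\n{content}")
        else:
            # Job-application mode: the files are raw inputs, not sources to cite.
            parts.append(content)
    if include_headers:
        parts.append("When answering, reference the documents by filename.")
    parts.append(question)
    return "\n\n".join(parts)
```

With `include_headers=False`, neither the filenames nor the citation instruction reach the model, so nothing prompts it to mention sources in the JSON fields.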
The schema returns plain text fields with embedded bold markers. Full Markdown support (headers, lists) conflicts with the structured schema. Limited parsing: bold only. Benefit: Simple formatting in JSON without complex Markdown rendering.
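
A bold-only parser can be very small. The sketch below assumes the markers are Markdown-style `**...**` pairs, which the notes do not state explicitly; `split_bold` is a hypothetical helper name.

```python
import re

# Assumption: bold spans are written as **text**, with no nesting.
BOLD = re.compile(r"\*\*(.+?)\*\*")


def split_bold(text: str) -> list[tuple[str, bool]]:
    """Split text into (fragment, is_bold) runs, recognising only bold markers."""
    runs = []
    pos = 0
    for match in BOLD.finditer(text):
        if match.start() > pos:
            runs.append((text[pos:match.start()], False))
        runs.append((match.group(1), True))
        pos = match.end()
    if pos < len(text):
        runs.append((text[pos:], False))
    return runs


print(split_bold("Led **Python** migration"))
# [('Led ', False), ('Python', True), (' migration', False)]
```

Each `(fragment, is_bold)` pair could then map onto one run of a DOCX paragraph, with the bold flag set on the run.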
The LLM outputs JSON (a structured format). document_generator uses a YAML parser (more flexible). YAML (as of version 1.2) is designed as a superset of JSON, so valid JSON is, in practice, valid YAML. Benefit: Flexibility if the JSON needs to be hand-edited as YAML later.
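
The point above can be checked with a few lines, assuming the YAML parser is PyYAML (the notes do not name the library). `load_document` is a hypothetical helper; the fallback to `json` exists only so the sketch still runs where PyYAML is not installed.

```python
import json

try:
    import yaml  # PyYAML, third-party; assumed here, not confirmed by the notes
except ImportError:
    yaml = None


def load_document(text: str) -> dict:
    """Load LLM output that may be strict JSON or a hand-edited YAML variant."""
    if yaml is not None:
        return yaml.safe_load(text)
    return json.loads(text)  # fallback: JSON input only


llm_output = '{"name": "Ada", "skills": ["Python", "SQL"]}'
data = load_document(llm_output)
```

With PyYAML available, the same function also accepts a hand-edited block-style version of the file, such as `name: Ada` followed by a `skills:` list.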
CV space is at a premium (the content must fit on 1-2 pages). Narrow margins: 0.5 inches (vs. the standard 1 inch). Small font: 10.5pt (vs. the standard 11pt). Minimal spacing: Pt(2) between elements. Benefit: Maximizes content without looking cramped.
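
With python-docx, the layout values above might be applied along these lines. This is a sketch under assumptions: the helper name `new_compact_document` is hypothetical, the margins are applied to all four sides, and "Pt(2) between elements" is interpreted as paragraph spacing after each paragraph on the Normal style.

```python
MARGIN_INCHES = 0.5   # vs. the standard 1 inch
BASE_FONT_PT = 10.5   # vs. the standard 11pt
SPACING_PT = 2        # assumed to mean space after each paragraph


def new_compact_document():
    """Create a DOCX document with narrow margins, small font, and tight spacing."""
    # Imported lazily so the constants above are usable without python-docx.
    from docx import Document
    from docx.shared import Inches, Pt

    doc = Document()
    for section in doc.sections:
        section.top_margin = Inches(MARGIN_INCHES)
        section.bottom_margin = Inches(MARGIN_INCHES)
        section.left_margin = Inches(MARGIN_INCHES)
        section.right_margin = Inches(MARGIN_INCHES)

    normal = doc.styles["Normal"]
    normal.font.size = Pt(BASE_FONT_PT)
    normal.paragraph_format.space_before = Pt(0)
    normal.paragraph_format.space_after = Pt(SPACING_PT)
    return doc
```

Setting the values on the Normal style rather than on each paragraph keeps the formatting in one place, since paragraphs without an explicit style inherit from it.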
Problem: Generic LLM responses exaggerate skills. Constraint: Detailed rules prevent hallucination (inventing skills not in the CV). Emphasis: NEVER fabricate, NEVER add version numbers, NEVER exaggerate. Consequence: A longer prompt yields better output quality for job applications.
ask is a general-purpose tool (any files, any question). apply.sh specializes ask for job applications. Benefit: ask remains flexible for other uses. apply.sh encapsulates: system prompt, schema path, CV source file.
Simple pipeline: Ask → JSON → DOCX. Each tool does one thing well. No magic: explicit schema, explicit formatting. Human-readable at each stage (the JSON can be inspected before DOCX generation).