Database Schema & Storage

PopIt3 uses GNU Database Manager (GDBM) for all persistent storage. Each database is a key-value store where values are JSON documents.

Databases

~/.email3.mail.gdbm

Raw email storage from POP3 sync.
Key: UIDL (POP3 unique ID per email)
Value: {size: bytes, mail: str}, where mail is the raw email decoded as latin-1
Purpose: Source of truth for all emails; enables reprocessing
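Decoding as latin-1 is lossless because latin-1 maps every byte value 0-255 to exactly one code point, so the original bytes can always be recovered for reprocessing. A minimal sketch of that round trip follows. The helper names pack_mail and unpack_mail are hypothetical, and treating size as the byte length of the raw message is an assumption, not something the storage code is known to do.

```python
import json
from email import message_from_bytes


def pack_mail(raw: bytes) -> str:
    """Build a JSON value shaped like {size, mail} from raw message bytes."""
    # Assumption: size is the length of the raw message in bytes.
    # latin-1 decoding cannot fail: each byte maps to one code point.
    return json.dumps({"size": len(raw), "mail": raw.decode("latin-1")})


def unpack_mail(value: str) -> bytes:
    """Recover the original message bytes from a stored JSON value."""
    record = json.loads(value)
    return record["mail"].encode("latin-1")


# Bytes that are not valid UTF-8 still survive the round trip.
raw = b"Message-ID: <abc@example.com>\r\nSubject: test\r\n\r\nbody \xff\xfe\r\n"
stored = pack_mail(raw)
restored = unpack_mail(stored)
message = message_from_bytes(restored)
```

The restored bytes can be handed to the standard email parser, which is what makes the raw store usable as a source for reprocessing.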

~/.jobserve.gdbm

Analyzed job opportunities and related emails.
Key: Message-ID (email header)
Value: JSON document with classification
Contains: scored alerts, scored suggestions, unclassified emails

~/.jobserve_applications.gdbm

Application confirmation records. INTENTIONALLY SEPARATE from the main jobs database.
Key: Message-ID (email header)
Value: JSON document with parsed application details
Rationale: Applications are displayed in a separate table at the bottom of the web report and have a different retention period (28 days vs 14 days for scored jobs). Keeping them in a separate DB simplifies iteration and avoids mixing concerns.
Used by: newparser_jobserve.py (writes), job_analysis_report.py (reads for report), job_api.py (reads for API output)

GDBM Wrapper: gdata.py

Three-class hierarchy:
• gdata_raw: raw bytes access to GDBM
• gdata_simple(gdata_raw): adds UTF-8 encoding/decoding
• gdata(gdata_simple): adds JSON serialization/deserialization
Typical usage: gdata(filename, mode). Values are Python dicts, automatically JSON-encoded on write and decoded on read.
GDataLockedError: wraps EAGAIN when another process holds the lock. job_api.py catches this and returns HTTP 503 with a Retry-After header.
Context manager: with gdata(file) as db:
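The layering can be illustrated with a small sketch. This is not the actual gdata.py: the class names RawStore, TextStore, JsonStore and LockedError are hypothetical stand-ins, a plain dict replaces the GDBM handle so the example runs anywhere, and encoding keys as UTF-8 alongside values is an assumption.

```python
import errno
import json


class LockedError(Exception):
    """Stand-in for GDataLockedError: another process holds the lock."""


class RawStore:
    """Bytes in, bytes out. A dict stands in for the GDBM handle."""

    def __init__(self, backend=None):
        self._db = {} if backend is None else backend

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # A real wrapper would close the GDBM handle here.
        return False

    def __getitem__(self, key):
        return self._db[key]

    def __setitem__(self, key, value):
        self._db[key] = value


class TextStore(RawStore):
    """Adds UTF-8 encoding and decoding on top of the raw layer."""

    def __getitem__(self, key):
        return super().__getitem__(key.encode("utf-8")).decode("utf-8")

    def __setitem__(self, key, value):
        super().__setitem__(key.encode("utf-8"), value.encode("utf-8"))


class JsonStore(TextStore):
    """Adds JSON serialization so callers read and write dicts."""

    def __getitem__(self, key):
        return json.loads(super().__getitem__(key))

    def __setitem__(self, key, value):
        super().__setitem__(key, json.dumps(value))


def open_store(opener):
    """Call opener(); translate an EAGAIN OSError into LockedError."""
    try:
        return opener()
    except OSError as exc:
        if exc.errno == errno.EAGAIN:
            raise LockedError("database is locked by another process") from exc
        raise


backend = {}
with JsonStore(backend) as db:
    db["<id-1@example.com>"] = {"subject": "Senior Python Engineer", "score": 8}
    record = db["<id-1@example.com>"]
```

Each layer only handles one conversion and delegates the rest to its parent, so the caller of the top layer works with dicts while the bottom layer only ever sees bytes.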

Job Records Structure

All scored jobs are stored in ~/.jobserve.gdbm with the following JSON structure.

Scored Job Alert/Suggestion

{
    'subject': 'Senior Python Engineer (London)',
    'date': '2026-01-22T12:30:00+00:00',
    'job_type': 'alert',  # or 'suggestion'
    'parsed_job': {
        'job_title': 'Senior Python Engineer',
        'company': 'TechCorp Inc',
        'location': 'London, UK',
        'salary': '£100-130k + benefits',
        'work_type': 'permanent',
        'description': '<html>...full job details...</html>',
        'job_url': 'https://...',
        'ref': 'TC-12345',
        'posted': '22/01/2026 18:24:45'
    },
    'scored_job': '{"score": 8, "reason": "Strong technical match..."}',  # string, parseable JSON
    'score': 8,  # integer 0-10
    'score_reason': 'Strong technical match with Python, good location...'
}
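Because scored_job is stored as a string rather than a nested object, it needs a second JSON decode before its fields can be used. A minimal sketch of that step, assuming the field names shown in the record above; extract_score is a hypothetical helper, not a function from the codebase.

```python
import json


def extract_score(record: dict) -> tuple[int, str]:
    """Return (score, reason) parsed from the record's 'scored_job' string."""
    scored = json.loads(record["scored_job"])  # a JSON string, not a dict
    score = int(scored["score"])
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    return score, scored.get("reason", "")


record = {
    "subject": "Senior Python Engineer (London)",
    "job_type": "alert",
    "scored_job": '{"score": 8, "reason": "Strong technical match..."}',
}
score, reason = extract_score(record)
```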

Job Application Record (in applications DB)

{
    'subject': 'JobServe Job Application Confirmation',
    'date': '2026-01-22T14:00:00+00:00',
    'job_type': 'application',
    'parsed_application': {
        'application_id': 'APP-98765',
        'job_title': 'Senior Engineer',
        'company': 'TechCorp'
    }
}

Unclassified Email Record

{
    'subject': 'Random email from JobServe',
    'date': '2026-01-20T10:00:00+00:00',
    'unclassified': {}  # NOTE: the empty 'unclassified' marker distinguishes this from other record types
}
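The three record shapes can be told apart by which keys are present. A minimal sketch, assuming records look like the examples in this section; record_kind is a hypothetical helper. Note that the unclassified marker is an empty dict, which is falsy in Python, so the check has to test for key presence rather than truthiness.

```python
def record_kind(record: dict) -> str:
    """Classify a record by the marker keys it carries."""
    # Use 'in', not record.get('unclassified'): the marker value {} is falsy.
    if "unclassified" in record:
        return "unclassified"
    if "parsed_application" in record:
        return "application"
    if "parsed_job" in record:
        return record.get("job_type", "unknown")  # 'alert' or 'suggestion'
    return "unknown"


alert = {"job_type": "alert", "parsed_job": {"job_title": "Senior Python Engineer"}}
application = {"job_type": "application", "parsed_application": {"application_id": "APP-98765"}}
unclassified = {"subject": "Random email from JobServe", "unclassified": {}}

kinds = [record_kind(r) for r in (alert, application, unclassified)]
```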

Indexing Strategy

Primary key: Message-ID (email header)
Allows:
• Deduplication (same email not processed twice)
• Efficient lookup of a specific job
• Email-to-record tracing
Note: No secondary indices; filtering by date or score requires a linear scan
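With no secondary indices, a query by score or date visits every record. A minimal sketch of such a scan, using a plain dict as a stand-in for the opened database; find_jobs and its parameters are hypothetical, and how the real wrapper exposes iteration is not shown here.

```python
from datetime import datetime


def find_jobs(db, min_score: int, since: datetime) -> list:
    """Linear scan: return (message_id, record) pairs, best score first."""
    matches = []
    for message_id, record in db.items():
        if "score" not in record:
            continue  # unclassified or unscored record
        if record["score"] < min_score:
            continue
        if datetime.fromisoformat(record["date"]) < since:
            continue
        matches.append((message_id, record))
    matches.sort(key=lambda item: item[1]["score"], reverse=True)
    return matches


db = {
    "<a@example.com>": {"date": "2026-01-22T12:30:00+00:00", "score": 8},
    "<b@example.com>": {"date": "2026-01-22T09:00:00+00:00", "score": 3},
    "<c@example.com>": {"date": "2026-01-10T09:00:00+00:00", "score": 9},
    "<d@example.com>": {"date": "2026-01-20T10:00:00+00:00", "unclassified": {}},
}
since = datetime.fromisoformat("2026-01-15T00:00:00+00:00")
hits = find_jobs(db, min_score=5, since=since)
```

The cost grows with the number of stored records, which stays manageable as long as old records are removed regularly.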

Cleanup Process

Triggered by job_analysis_report.py::process_job_analysis():
Scored jobs >14 days old:
• Deleted from ~/.jobserve.gdbm
• UIDL returned for email deletion from POP3 mailbox
Applications >28 days old:
• Deleted from ~/.jobserve_applications.gdbm (the separate applications DB)
• UIDL returned for email deletion from POP3 mailbox
Unclassified emails:
• Never auto-deleted (kept for debugging)
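The retention rules above can be sketched as a single scan per database. This is an illustration under assumptions, not the code in process_job_analysis(): expired_keys and the constant names are hypothetical, plain dicts stand in for the two databases, and the step that maps a deleted record back to its UIDL is left out because the source does not describe it.

```python
from datetime import datetime, timedelta, timezone

JOB_RETENTION = timedelta(days=14)          # scored jobs
APPLICATION_RETENTION = timedelta(days=28)  # application confirmations


def expired_keys(db, retention: timedelta, now: datetime) -> list:
    """Return keys of records older than the retention period."""
    cutoff = now - retention
    expired = []
    for message_id, record in db.items():
        if "unclassified" in record:
            continue  # unclassified emails are never auto-deleted
        if datetime.fromisoformat(record["date"]) < cutoff:
            expired.append(message_id)
    return expired


now = datetime(2026, 2, 10, tzinfo=timezone.utc)
jobs = {
    "<old@example.com>": {"date": "2026-01-22T12:30:00+00:00", "score": 8},
    "<new@example.com>": {"date": "2026-02-01T12:30:00+00:00", "score": 6},
    "<misc@example.com>": {"date": "2026-01-01T10:00:00+00:00", "unclassified": {}},
}
applications = {
    "<app-recent@example.com>": {"date": "2026-01-22T14:00:00+00:00"},
    "<app-old@example.com>": {"date": "2026-01-05T14:00:00+00:00"},
}

old_jobs = expired_keys(jobs, JOB_RETENTION, now)
old_apps = expired_keys(applications, APPLICATION_RETENTION, now)

# Collect keys first, then delete, to avoid mutating while iterating.
for key in old_jobs:
    del jobs[key]
for key in old_apps:
    del applications[key]
```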

Backup & Recovery

GDBM files are flat files in ~/. Backup by copying:
    cp ~/.jobserve.gdbm ~/.jobserve.gdbm.backup
    cp ~/.jobserve_applications.gdbm ~/.jobserve_applications.gdbm.backup
Recovery from raw emails:
If ~/.jobserve.gdbm is corrupted, its records can be reprocessed from ~/.email3.mail.gdbm using reprocess_output.txt or analyze_all.json
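The same copy can be scripted when backups need to run from Python. A minimal sketch; backup_db is a hypothetical helper, and the example runs against a temporary directory rather than the real files. Copying while another process is writing may capture an inconsistent file, so it is safer to run this when no sync or report job is active.

```python
import shutil
import tempfile
from pathlib import Path


def backup_db(path: Path) -> Path:
    """Copy a database file to '<name>.backup' alongside the original."""
    target = path.with_name(path.name + ".backup")
    shutil.copy2(path, target)  # copy2 also preserves timestamps
    return target


# Demonstration against a throwaway file instead of ~/.jobserve.gdbm.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / ".jobserve.gdbm"
    source.write_bytes(b"example contents")
    copy = backup_db(source)
    copied = copy.read_bytes()
    copy_name = copy.name
```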

version 2