EnrichCRM
CSV in, enriched contacts out, at a fraction of the cost.
A contact-enrichment pipeline that turns a CSV of business emails into enriched records, built to do it at a fraction of managed-service cost.
- language
- TypeScript (Node CLI + Next.js 15)
- CLI stack
- Brave Search, Trafilatura, Gemini Flash
- UI stack
- OpenAI agents, Firecrawl, SSE streaming
- storage
- SQLite, per-contact cost tracking
- ownership
- Sole engineer — built end to end
- history
- built Jun 2025 to Apr 2026
- repo
- Private
The problem
Researching each contact company, industry, and tech stack by hand is slow, and managed enrichment services charge by the record. EnrichCRM treats that cost as an engineering problem: a local scraper and a cheaper model in place of the paid stack.
Who this serves
Marketing and research operators enriching outbound lists — downstream outreach runs on researched company, industry, and stack data without per-record managed-service fees.
How it is built
-
Cost treated as an engineering problem
The CLI replaces a paid scraping layer with Trafilatura (local, no per-page network round-trip) and swaps GPT-4o for Gemini Flash. Measured spend lands near $0.003 per contact in the --stats output. The broader "4.6x faster, 77% cheaper" comparison is self-reported, with no benchmark harness in the repo.
-
Resume-safe batch processing
Status is written to SQLite before and after every contact, so an interrupted run restarts at the exact failure point with --resume, and --retry-errors reprocesses only the failures rather than the whole list.
-
Multi-agent field routing on the UI path
The web surface categorizes each requested field into one of six domains and dispatches to a purpose-built agent, with cross-source confidence scoring and structured output enforced through Zod schemas.
By the numbers
self-reported marks figures stated in docs or commit history that the source brief could not reproduce from the repository alone. Everything else is traceable to code.
- 4.6x: Basis: README comparison of developer-observed runs: Firecrawl + OpenAI at 47.6s and $0.013/contact versus Trafilatura + Gemini at 10.4s and about $0.003/contact. The repo tracks CLI cost via --stats, but does not include a benchmark harness for the timing multiple.
Where this honestly stands
Shipped. Two enrichment surfaces (a Brave + Gemini CLI and an OpenAI + Firecrawl web UI) share types but not providers, and the README is candid about the seam. There is no spreadsheet UI and no Salesforce integration.
Want the parts that are not in a public repo? I will walk you through the architecture and the decisions on a call.