The Four-Method Connector Contract — and Knowing When to Stop

The problem

We ran a PR and podcast production company on four disconnected SaaS tools: QuickBooks (billing), Copper (CRM), Basecamp (projects), and PandaDoc (contracts). Reporting meant manual exports. Adding a new data source required keeping authentication, pagination, rate-limit handling, field mapping, upsert logic, and retries consistent across the platform.

After the third integration, the pattern was hard to miss: every connector solved the same problems, but each solved them differently. Different auth patterns. Different pagination state machines. Different error-handling philosophies. Different retry strategies that were either too aggressive (rate-limit death spirals) or too passive (data gaps that surfaced in the QBR).

The real cost was subtle divergence. What I wanted was the opposite: each connector stays small and understandable while the engine carries the generic reliability behavior.

The contract

I reduced the surface area to four methods:

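from __future__ import annotations

from datetime import datetime
from typing import Iterable, Protocol, Sequence

# Annotations are deferred, so AuthContext, RawEntity, and NormalizedRecord need not be imported here.
class Connector(Protocol):
    name: str

    def authenticate(self) -> AuthContext: ...
    def get_sync_order(self) -> Sequence[str]: ...
    def fetch_entities(self, entity_type: str, since: datetime | None) -> Iterable[RawEntity]: ...
    def transform_entity(self, entity_type: str, raw: RawEntity) -> NormalizedRecord: ...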

That’s it. No generic run(), upsert(), or health_check(). Those live in the engine, not the connector.
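To make the contract concrete, here is a minimal sketch of a connector that satisfies it. Everything in it is hypothetical: InMemoryCrmConnector, the entity names, and the stand-in definitions of AuthContext, RawEntity, and NormalizedRecord are assumptions for illustration, not the production types or the contents of the public kit.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Iterable, Sequence

# Illustrative stand-ins for the shared types (assumed shapes, not the real ones).
RawEntity = dict[str, Any]


@dataclass(frozen=True)
class AuthContext:
    token: str


@dataclass(frozen=True)
class NormalizedRecord:
    source_system: str
    entity_type: str
    source_id: str
    fields: dict[str, Any]


class InMemoryCrmConnector:
    """Hypothetical connector backed by a dict instead of a vendor API."""

    name = "fake_crm"

    def __init__(self, rows: dict[str, list[RawEntity]]) -> None:
        self._rows = rows

    def authenticate(self) -> AuthContext:
        # A real connector would refresh OAuth tokens here.
        return AuthContext(token="not-a-real-token")

    def get_sync_order(self) -> Sequence[str]:
        # Parents first, so contacts can reference companies.
        return ["companies", "contacts"]

    def fetch_entities(self, entity_type: str, since: datetime | None) -> Iterable[RawEntity]:
        for row in self._rows.get(entity_type, []):
            # since=None means a full sync; otherwise only rows updated after it.
            if since is None or row["updated_at"] > since:
                yield row

    def transform_entity(self, entity_type: str, raw: RawEntity) -> NormalizedRecord:
        # Source-specific mapping only: no writes, no retries, no checkpoints.
        fields = {k: v for k, v in raw.items() if k not in ("id", "updated_at")}
        return NormalizedRecord(self.name, entity_type, str(raw["id"]), fields)
```

Note what is absent: the class never touches a destination, a retry loop, or a checkpoint. That is the separation the contract is meant to enforce.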

Why four methods, not three or five?

Three methods (auth, fetch, transform) would have tempted connector authors to embed destination-specific write logic inside transform. I’ve seen this mistake before: a “generic” ETL where the Salesforce connector returns a dict and the PostgreSQL connector returns a SQLAlchemy model, and the sync engine has to branch on type. That is not abstraction; it is indirection.

Five methods (adding health_check or validate) would have encouraged connector authors to build mini-services. A health check that only pings the API is useless; a health check that validates schema drift against the live API is valuable but expensive to maintain across every connector. I moved health validation into the engine’s pre-flight runner, which runs a lightweight fetch-and-discard against each connector on startup.

Four methods force a clean separation: the connector knows the source schema and auth. The engine knows sync semantics, retry policy, checkpointing, and dual-destination routing.
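The pre-flight runner described above can be sketched in a few lines. This is an assumption about its shape, not the production code: the function name preflight, the sample_size parameter, and the returned dict of problems are all hypothetical.

```python
from __future__ import annotations

from itertools import islice
from typing import Any


def preflight(connector: Any, sample_size: int = 1) -> dict[str, str]:
    """Fetch a few rows per entity type and discard them; return failures by step."""
    try:
        connector.authenticate()
    except Exception as exc:
        # Without auth nothing else can be checked, so stop here.
        return {"authenticate": repr(exc)}

    problems: dict[str, str] = {}
    for entity_type in connector.get_sync_order():
        try:
            # islice stops after sample_size rows, so this stays cheap.
            for _ in islice(connector.fetch_entities(entity_type, None), sample_size):
                pass  # fetched and discarded on purpose
        except Exception as exc:
            problems[entity_type] = repr(exc)
    return problems
```

An empty dict means every entity type could be reached. Because the check uses only the four contract methods, no connector has to implement anything extra for it.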

The engine

The sync engine handles the generic loop:

Authenticate — refresh OAuth tokens, validate scopes.
Resolve dependency order — if invoices depends on customers, fetch customers first.
Fetch — incremental mode passes the last checkpoint timestamp; full mode passes None.
Transform — connector-specific normalization to a shared NormalizedRecord schema.
Upsert — engine handles INSERT ... ON CONFLICT for PostgreSQL or batch Airtable API calls.
Write checkpoint — per-connector, per-entity-type metadata in _sync_metadata.
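The six steps above map onto a short loop. The sketch below is a simplified illustration under stated assumptions: run_sync, the Checkpoints dict, and the upsert callable are hypothetical names, and the real engine stores checkpoints in _sync_metadata rather than in memory.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, Callable

# Hypothetical shape: (connector name, entity type) -> time of last good sync.
Checkpoints = dict[tuple[str, str], datetime]


def run_sync(
    connector: Any,
    upsert: Callable[[str, list[Any]], None],
    checkpoints: Checkpoints,
    full: bool = False,
) -> None:
    connector.authenticate()                            # 1. authenticate
    for entity_type in connector.get_sync_order():      # 2. dependency order
        key = (connector.name, entity_type)
        since = None if full else checkpoints.get(key)  # 3. full vs incremental
        # Take the timestamp before fetching so rows that change during the
        # fetch are picked up again on the next run instead of being skipped.
        started_at = datetime.now(timezone.utc)
        raw_rows = connector.fetch_entities(entity_type, since)
        records = [                                     # 4. transform
            connector.transform_entity(entity_type, raw) for raw in raw_rows
        ]
        upsert(entity_type, records)                    # 5. engine-owned write
        checkpoints[key] = started_at                   # 6. checkpoint after write
```

The checkpoint is written only after the upsert returns, so a failed write leaves the previous checkpoint in place and the next run retries the same window.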

Retry policy

The engine owns retry behavior and keeps it outside the four-method connector contract. Connectors provide source-specific operations; they do not implement retry loops or duplicate reliability policy. This boundary keeps backoff and failure handling consistent across integrations without expanding the connector API.
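One way an engine can own retries is to wrap every connector call in a single helper. The sketch below uses capped exponential backoff with full jitter; that specific policy, the RetryableError class, and the call_with_retry name are assumptions for illustration, since the post does not spell out the production policy.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (HTTP 429, timeouts)."""


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the engine
            # Backoff ceiling doubles each attempt, capped at max_delay;
            # the actual wait is a random value below that ceiling (full jitter).
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The engine would call something like call_with_retry(lambda: list(connector.fetch_entities("contacts", since))), so the connector itself contains no loop and no sleep.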

Dual-destination routing

One requirement I refused to compromise: the ops team wanted Airtable (familiar UI, fast filtering) and the analytics pipeline wanted PostgreSQL (real SQL, BI tooling). Most ETL tools force you to pick one primary destination and treat the other as a slow replica.

I split the write path:

PostgreSQL: Full schema, foreign keys, JSONB for extensibility. Used by the Next.js dashboard and Looker Studio.
Airtable: Flattened schema, human-readable field names, link fields for relational views. Used by ops for daily triage.

The engine writes to both destinations in parallel. If Airtable rate-limits, PostgreSQL still commits and the checkpoint advances. Airtable catches up on the next sync interval. Ops may see a 30-second delay, but analytics does not stall because of a UI tool’s rate limit.
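The failure isolation described above can be sketched with a thread pool: each destination gets its own writer, and one writer raising does not stop the other. The names write_to_all and Writer are hypothetical, and how the production engine tracks what Airtable still has to catch up on is not shown here.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Writer = Callable[[list[Any]], None]


def write_to_all(
    writers: dict[str, Writer], records: list[Any]
) -> dict[str, BaseException | None]:
    """Run every destination writer concurrently; report the outcome per destination."""
    with ThreadPoolExecutor(max_workers=max(1, len(writers))) as pool:
        futures = {name: pool.submit(write, records) for name, write in writers.items()}
    # Leaving the `with` block waits for every writer to finish.
    # exception() returns None for a writer that succeeded.
    return {name: future.exception() for name, future in futures.items()}


# Usage with two stand-in writers (assumed, not real clients):
committed: list[Any] = []


def write_postgres(records: list[Any]) -> None:
    committed.extend(records)


def write_airtable(records: list[Any]) -> None:
    raise RuntimeError("429 Too Many Requests")


results = write_to_all(
    {"postgres": write_postgres, "airtable": write_airtable}, [{"id": 1}, {"id": 2}]
)
```

After this runs, results["postgres"] is None and results["airtable"] holds the rate-limit error, so the engine can advance the PostgreSQL side and schedule Airtable for the next interval.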

What I cut

1. Schema migration in connectors

Connectors do not manage their own schema. The engine owns the NormalizedRecord schema and the migration path. When a connector author adds a new field, they update the transform method and submit a PR. The engine’s migration runner (Alembic) handles the rest. This prevents the “each connector has its own migration tool” nightmare I’ve seen in larger ETL platforms.

2. Real-time streaming

The engine is batch-oriented with configurable intervals (15 min, 1 hr, daily). Real-time streaming would have required persistent connections, webhook validation, and operational complexity that a team this size could not support. The 15-minute sync interval was a business-acceptable tradeoff. Real-time would have been an engineering vanity project.

3. Bidirectional sync

Write-back to source systems is intentionally unsupported. The contract has no push_entities method. Bidirectional sync introduces conflict resolution, field-level locking, and merge semantics that explode complexity. If ops needs to update a Copper contact, they use Copper’s UI. The sync engine reads the change on the next interval.

Failure modes I designed for

API schema drift

Copper changed their company_id field from integer to string in 2024. The connector’s transform_entity method caught the type mismatch during local testing (the engine validates transform output against the NormalizedRecord schema). The fix stayed inside the connector; the engine and its contract required no changes.
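A validation step of this kind can be very small. The sketch below is an assumption about how such a check might look: CONTACT_SCHEMA, its field names, and the choice of int as the normalized type for company_id are all hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical slice of a normalized schema: field name -> required Python type.
CONTACT_SCHEMA: dict[str, type] = {
    "source_id": str,
    "company_id": int,
    "email": str,
}


def validate_record(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return one message per missing or mistyped field; empty list means valid."""
    errors: list[str] = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


# The vendor starts sending company_id as a string:
drifted = {"source_id": "c-1", "company_id": "12345", "email": "a@example.com"}
print(validate_record(drifted, CONTACT_SCHEMA))
# prints ['company_id: expected int, got str']
```

With a check like this in the engine, the fix is a one-line coercion inside the connector’s transform_entity, and the shared schema stays untouched.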

Auth token expiration mid-sync

The engine checkpoints after every entity batch. If a 10,000-record sync fails at record 8,247 because the OAuth token expired, the next run refreshes the token and resumes from the last checkpointed batch instead of from record 1. Because writes are upserts, replaying the records in the interrupted batch is harmless. That checkpoint granularity avoids the “re-sync everything” death spiral.
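Batch-level checkpointing can be illustrated with a counter of committed batches. This is a simplified sketch under two loud assumptions: the source returns rows in a stable order between runs, and the checkpoint is a batch count in a dict called state. The production engine may well use timestamps or API cursors instead.

```python
from __future__ import annotations

from itertools import islice
from typing import Any, Callable, Iterable, Iterator


def batched(rows: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield consecutive lists of at most `size` rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def sync_in_batches(
    rows: Iterable[Any],
    write_batch: Callable[[list[Any]], None],
    state: dict[str, int],
    batch_size: int = 500,
) -> None:
    """Write rows batch by batch, recording progress in state["batches_done"]."""
    done = state.get("batches_done", 0)
    for index, batch in enumerate(batched(rows, batch_size)):
        if index < done:
            continue  # committed by an earlier run; skip without rewriting
        write_batch(batch)  # may raise, e.g. on an expired token
        state["batches_done"] = index + 1  # checkpoint only after a good write
```

If write_batch raises partway through, state still points at the last batch that committed, so calling sync_in_batches again with the same state picks up at the failed batch.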

Duplicate detection across partial runs

The upsert logic uses a composite key of (source_id, source_system, entity_type). This means a record from Copper’s contacts table and a record from QuickBooks’ customers table can both exist without collision, even if they represent the same person. Identity resolution (“this Copper contact is that QuickBooks customer”) is a separate, explicit pipeline step — not an implicit assumption baked into the sync engine.
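The composite key is easy to demonstrate with an in-memory database. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL (its ON CONFLICT upsert syntax needs SQLite 3.24 or newer); the records table and its columns are hypothetical, not the production schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE records (
        source_system TEXT NOT NULL,
        entity_type   TEXT NOT NULL,
        source_id     TEXT NOT NULL,
        payload       TEXT NOT NULL,
        PRIMARY KEY (source_id, source_system, entity_type)
    )
    """
)

UPSERT = """
    INSERT INTO records (source_system, entity_type, source_id, payload)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (source_id, source_system, entity_type)
    DO UPDATE SET payload = excluded.payload
"""

rows = [
    ("copper", "contacts", "42", '{"name": "Ada"}'),
    # Same source_id, different system and entity type: no collision.
    ("quickbooks", "customers", "42", '{"name": "Ada L."}'),
    # Replay of the first key after a partial run: updates in place.
    ("copper", "contacts", "42", '{"name": "Ada Lovelace"}'),
]
conn.executemany(UPSERT, rows)

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
copper_payload = conn.execute(
    "SELECT payload FROM records WHERE source_system = 'copper'"
).fetchone()[0]
```

Three writes leave two rows: the Copper contact is updated rather than duplicated, and the QuickBooks customer with the same id sits beside it untouched, waiting for the separate identity-resolution step.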

Honest limits

The propagation engine hardcodes platforms. core/propagation_engine.py knows about QuickBooks, Copper, and PandaDoc explicitly. This violates the plugin architecture and is the most important technical-debt item.
Observability is weak. No Prometheus, no structured logging pipeline, no alert on sync failure. Failures are visible in the dashboard but not paged.
Deployment is manual. The deploy process takes ~20 minutes and rollback coverage is limited.

The numbers

Fact	Value	Basis
Connector boundary	Four required methods	Public connector kit
Vendor integrations	Six behind one contract	Production architecture and public kit
Daily API operations	~11,548 documented baseline → ~3,395 documented incremental	Doc-stated 69% reduction (the two figures shown imply about 71%); not instrumented

The API-operation comparison is documented, not exported as raw telemetry. The public connector kit and six vendor integrations are directly verifiable against the exact four-method contract above.

Public code extract

I published a sanitized companion repo for the connector contract: throughline-connector-kit. It includes the four-method connector protocol, a small in-memory sync engine, a runnable synthetic CRM example, tests, and no private production data, vendor credentials, client records, or production schemas.

The takeaway

This isn’t a “look how clever the abstraction is” post; the four-method contract is fairly obvious in hindsight. What I’m more proud of is shipping it, maintaining it across six live integrations, and resisting the temptation to add a fifth method every time a new edge case showed up.

The interesting part wasn’t the design. It was the restraint.