
CSV Contact Import: Async Processing for Large Lists Without Timeouts

PushMail Team · 4 min read

You have a CSV with 80,000 contacts. You need to get them into your email platform. You upload the file, hit submit, and watch a spinner for 45 seconds before the request times out. Half your contacts were imported. The other half weren't. You have no idea which half.

This is what happens when CSV import is implemented as a synchronous HTTP request. Most email platforms do exactly this, and most developers have been burned by it at least once.

PushMail handles it differently. The upload returns immediately. Processing happens in the background. You poll for progress. Here's how.

Why synchronous imports fail

A typical CSV import endpoint reads the file, parses every row, validates every email, and upserts every contact -- all inside a single HTTP request handler. The problems compound with scale:

  • Cloudflare Workers have a 30-second CPU time limit on the paid plan
  • AWS Lambda times out at 15 minutes, but API Gateway cuts you off at 29 seconds
  • Vercel Functions max out at 60 seconds on Pro

Even if your serverless function has a generous timeout, your HTTP client probably doesn't. Browsers close connections. Load balancers terminate idle requests. Proxies give up.

A 50,000-row CSV with email validation and deduplication takes well over 30 seconds to process. At 100,000 rows, you're looking at minutes. No synchronous HTTP request survives that.

Upload, enqueue, return

PushMail splits the import into three stages. The HTTP request only handles the first two.

Stage 1: Upload the CSV to R2. The file is sent as multipart form data and stored in Cloudflare R2 at a unique key (imports/{orgId}/{timestamp}.csv). This takes a few hundred milliseconds regardless of file size -- you're writing bytes to object storage, not parsing them.

Stage 2: Create an import record and enqueue. A row is inserted into the imports table in D1 with status: "pending", and a message containing the import ID is sent to Cloudflare Queues. This takes single-digit milliseconds.

Stage 3: Return immediately. The API responds with the import record, including its ID and status. Total request time: under a second, regardless of whether the CSV has 500 rows or 500,000.
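
The three stages can be sketched as a single handler. Note this is an illustrative sketch, not PushMail's actual code: putObject stands in for an R2 bucket binding, insertImport for a D1 insert, and enqueue for a Cloudflare Queues send.

```typescript
// Stand-ins for the real bindings (R2, D1, Queues) -- assumed shapes.
interface Env {
  putObject(key: string, body: ArrayBuffer): Promise<void>;
  insertImport(row: { siteId: number; source: string; status: string }): Promise<number>;
  enqueue(msg: { importId: number }): Promise<void>;
}

// Stage 1 writes to a unique key; no parsing happens here.
function importKey(orgId: number, timestamp: number): string {
  return `imports/${orgId}/${timestamp}.csv`;
}

async function handleUpload(env: Env, orgId: number, siteId: number, file: ArrayBuffer) {
  await env.putObject(importKey(orgId, Date.now()), file); // Stage 1: store raw bytes
  const id = await env.insertImport({ siteId, source: "csv", status: "pending" }); // Stage 2: import record
  await env.enqueue({ importId: id }); // Stage 2: hand off to the queue consumer
  // Stage 3: respond immediately; totalRows stays null until the consumer parses the file.
  return { id, status: "pending", siteId, source: "csv", totalRows: null, processedRows: null };
}
```

Nothing here scales with file size -- the only per-byte work is the object-storage write.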

The upload endpoint

The import starts with a multipart form upload to POST /v1/imports/upload:

curl -X POST https://pushmail.dev/api/v1/imports/upload \
  -H "Authorization: Bearer pm_live_abc123..." \
  -F "file=@contacts.csv" \
  -F "siteId=1" \
  -F "listId=5" \
  -F 'columnMapping={"Email Address":"email","First":"first_name","Last":"last_name"}'

The response comes back immediately with a 201:

{
  "data": {
    "import": {
      "id": 42,
      "status": "pending",
      "siteId": 1,
      "source": "csv",
      "totalRows": null,
      "processedRows": null
    }
  }
}

The totalRows field is null because the file hasn't been parsed yet. The queue consumer populates it once processing begins.

Column mapping

CSV files from different sources use different headers. Your Mailchimp export has "Email Address". Your CRM export has "email". Your sales team's spreadsheet has "E-mail".

The columnMapping parameter maps your CSV headers to PushMail's contact fields:

{
  "Email Address": "email",
  "First Name": "first_name",
  "Last Name": "last_name"
}

If you don't provide a mapping, PushMail falls back to common column names automatically -- it checks for email, email_address, Email, EMAIL, firstName, first_name, First Name, FNAME, and similar variations. For most standard exports, you can skip the mapping entirely.
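
The fallback matching could look something like this (a sketch -- the alias lists here are illustrative and shorter than whatever PushMail actually checks; explicit mappings win, then normalized headers are matched against known aliases):

```typescript
// Illustrative alias lists -- not PushMail's full set.
const ALIASES: Record<string, string[]> = {
  email: ["email", "email_address", "email address", "e-mail"],
  first_name: ["first_name", "firstname", "first name", "fname"],
  last_name: ["last_name", "lastname", "last name", "lname"],
};

function resolveColumns(
  headers: string[],
  mapping: Record<string, string> = {},
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const header of headers) {
    if (mapping[header]) {
      resolved[header] = mapping[header]; // explicit mapping takes precedence
      continue;
    }
    const normalized = header.trim().toLowerCase();
    for (const [field, aliases] of Object.entries(ALIASES)) {
      if (aliases.includes(normalized)) {
        resolved[header] = field;
        break;
      }
    }
  }
  return resolved; // unmatched headers are simply left unmapped
}
```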

Polling for progress

Once the import is enqueued, poll GET /v1/imports/:id to track it:

{
  "data": {
    "id": 42,
    "status": "processing",
    "totalRows": 82450,
    "processedRows": 34200,
    "importedRows": 33890,
    "skippedRows": 310,
    "errorRows": 0,
    "startedAt": "2025-11-21T14:30:01Z",
    "completedAt": null
  }
}

The queue consumer updates progress after every batch of 100 rows. You see exactly how many rows have been processed, imported, skipped (invalid emails), and errored. When processing finishes, status changes to "completed" and completedAt is set.
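
A minimal client-side polling loop might look like this (a sketch: the fetchStatus callback stands in for a GET /v1/imports/:id request, and "completed" and "failed" are assumed to be the terminal statuses):

```typescript
type ImportStatus = { status: string; processedRows: number | null; totalRows: number | null };

async function pollUntilDone(
  fetchStatus: () => Promise<ImportStatus>,
  intervalMs = 2000,
): Promise<ImportStatus> {
  for (;;) {
    const s = await fetchStatus();
    if (s.status === "completed" || s.status === "failed") return s;
    // Wait before the next poll -- a couple of seconds is plenty,
    // since progress only updates once per batch anyway.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```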

Deduplication via upsert

Duplicate emails are handled at the database level. The contacts table has a unique constraint on (siteId, email). When a CSV row matches an existing contact, the import updates their firstName and lastName instead of creating a duplicate. This uses Drizzle ORM's onConflictDoUpdate.

You can re-import the same CSV without worrying about duplicates. If you import a file twice, the second run updates existing records and reports them as successful imports. Rows with missing or malformed email addresses are counted as skippedRows.
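
The upsert semantics can be illustrated with an in-memory stand-in for the (siteId, email) unique constraint -- the real code goes through Drizzle and SQLite's INSERT ... ON CONFLICT, but the observable behavior is the same:

```typescript
type ContactRow = { siteId: number; email: string; firstName?: string; lastName?: string };

function upsertContact(store: Map<string, ContactRow>, row: ContactRow): "inserted" | "updated" {
  // Keyed by the same pair the unique constraint covers.
  const key = `${row.siteId}:${row.email}`;
  const existing = store.get(key);
  if (existing) {
    // Conflict: update the names instead of creating a duplicate.
    existing.firstName = row.firstName ?? existing.firstName;
    existing.lastName = row.lastName ?? existing.lastName;
    return "updated";
  }
  store.set(key, { ...row });
  return "inserted";
}
```

Because the constraint is scoped to siteId, the same email can exist as separate contacts on different sites.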

List assignment and sequence triggers

If you include a listId in the upload, every imported contact is added to that list. The consumer inserts into the contact_lists junction table with onConflictDoNothing, so contacts already on the list aren't duplicated.

This also fires sequence triggers. If you have a drip sequence configured to trigger on "added to list", contacts imported into that list are automatically enrolled. A welcome series that fires when someone joins your newsletter list will trigger for imported contacts, not just contacts added through your signup form. The same applies to tags.
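
Sketched out, the list-assignment step looks like this (assumptions: the Set stands in for the contact_lists junction table, the enroll callback for the sequence-trigger check, and triggers are assumed to fire only when a membership row is actually created):

```typescript
function addToList(
  memberships: Set<string>, // stands in for the contact_lists junction table
  contactId: number,
  listId: number,
  enroll: (contactId: number, listId: number) => void,
): boolean {
  const key = `${contactId}:${listId}`;
  if (memberships.has(key)) return false; // onConflictDoNothing: already on the list
  memberships.add(key);
  enroll(contactId, listId); // "added to list" sequence trigger fires here
  return true;
}
```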

Processing speed on D1

D1 is SQLite on Cloudflare's infrastructure. Single-row writes are fast, but there's no concurrent write scaling. The queue consumer processes rows in batches of 100. For each row: extract the email, validate the format, upsert the contact, look up the contact ID, assign to a list, and check for sequence triggers.
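
The consumer loop above can be sketched like so (validateEmail, upsert, and reportProgress are stand-ins for the real steps; only the batch-of-100 shape and the once-per-batch progress write follow the description):

```typescript
type Counts = { processed: number; imported: number; skipped: number };

function processRows(
  rows: { email: string }[],
  upsert: (row: { email: string }) => void,
  reportProgress: (c: Counts) => void,
  batchSize = 100,
): Counts {
  const counts: Counts = { processed: 0, imported: 0, skipped: 0 };
  const valid = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // simplistic format check, for illustration
  for (let i = 0; i < rows.length; i += batchSize) {
    for (const row of rows.slice(i, i + batchSize)) {
      counts.processed++;
      if (!valid.test(row.email)) { counts.skipped++; continue; } // malformed -> skippedRows
      upsert(row);
      counts.imported++;
    }
    reportProgress({ ...counts }); // one D1 progress write per batch, not per row
  }
  return counts;
}
```

Batching the progress write is what keeps D1 write volume proportional to row count divided by 100 rather than per-row.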

Practical processing times:

  • 10,000 rows: under 30 seconds
  • 50,000 rows: 1.5 to 3 minutes
  • 100,000 rows: 3 to 6 minutes
  • 500,000 rows: 15 to 30 minutes

These numbers include validation, deduplication, list assignment, and trigger checks -- not just raw inserts. The key point is that none of this runs inside your HTTP request. Whether the import takes 30 seconds or 30 minutes, the experience is the same: upload, poll, done.