How Identity Resolution Works

How It Works

TrailSpark creates or matches leads using persistent identifiers: email, productUserId, crmId, or mapId. Signals that arrive without any identifying information are considered anonymous and are stored in S3 cold storage (not in the database) until the visitor can be identified.

A signal counts as "identifiable" if it contains any of: email, userId, productUserId, crmId, marketing automation IDs, phone number, or product organization context. Only signals that lack all of these go to cold storage.

When a signal arrives with both an anonymousId and an identifying field (e.g., email from a Segment identify call), TrailSpark creates an identity_resolution record linking that anonymousId to the identifier. It then searches cold storage for previous anonymous signals with the same anonymousId, rehydrates them into signal_staging, and processes them against mapping rules.

Resolution Flow

Anonymous signal arrives (anonymousId only, no identifying info)
  → Stored in S3 cold storage (org/{orgId}/anonymous/{date}/)
  → No database record created

Identification signal arrives (anonymousId + email or userId)
  → Lead record created/updated
  → identity_resolution record created: anonymousId → email/userId
  → anonymousId cached in Redis (known identifier optimization)
  → Cold storage searched for matching anonymousId (across all known anonymousIds for this email)
  → Matching signals rehydrated into signal_staging
  → Rehydrated signals processed against mapping rules
  → Future signals with same anonymousId skip cold storage via Redis cache
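
The flow above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not TrailSpark's implementation: the IdentityResolver class, its method names, and the return values are hypothetical, and plain dicts, sets, and lists stand in for S3 cold storage, the Redis cache, and the signal_staging table. Only email and userId are treated as identifiers here to keep the sketch short.

```python
from collections import defaultdict


class IdentityResolver:
    """Hypothetical in-memory stand-in for the resolution flow."""

    def __init__(self):
        self.cold_storage = defaultdict(list)  # anonymousId -> signals (S3 stand-in)
        self.identity_resolution = {}          # anonymousId -> email or userId
        self.known_anonymous_ids = set()       # Redis cache stand-in
        self.signal_staging = []               # signals awaiting mapping rules

    def ingest(self, signal: dict) -> str:
        anon_id = signal.get("anonymousId")
        identifier = signal.get("email") or signal.get("userId")

        if identifier is None:
            if anon_id in self.known_anonymous_ids:
                # anonymousId already resolved: skip cold storage.
                self.signal_staging.append(signal)
                return "staged"
            # Fully anonymous: park the signal, create no database record.
            self.cold_storage[anon_id].append(signal)
            return "cold_storage"

        # Identification signal: link anonymousId -> identifier and cache it.
        if anon_id is not None:
            self.identity_resolution[anon_id] = identifier
            self.known_anonymous_ids.add(anon_id)

        # Rehydrate parked signals for every anonymousId known for this identifier.
        for known_id, known_identifier in list(self.identity_resolution.items()):
            if known_identifier == identifier:
                self.signal_staging.extend(self.cold_storage.pop(known_id, []))

        self.signal_staging.append(signal)
        return "identified"
```

Calling ingest with an anonymous page view, then with an identify event carrying the same anonymousId plus an email, moves the parked page view into signal_staging alongside the identify event.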

Identity Resolution Table

Field        Purpose
anonymousId  Primary anonymous tracking ID (from Segment, RudderStack, etc.)
email        Lead's email address (encrypted at rest)
emailHash    SHA256 hash for deterministic lookups
userId       Product user ID from identify/track calls
deviceId     Device-based tracking identifier
source       How the identity was established: identify, inferred, or manual
traits       Additional traits from the identify event (stored as JSON)
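
As an illustration of how a deterministic emailHash lookup key could be derived, the sketch below hashes an email with SHA-256. The email_hash function is hypothetical, and the normalization step (trimming whitespace and lowercasing) is an assumption; the exact normalization TrailSpark applies is not documented here.

```python
import hashlib


def email_hash(email: str) -> str:
    """Return a SHA-256 hex digest of the email for deterministic lookups."""
    # Assumption: trim and lowercase so that case or spacing variants
    # of the same address produce the same hash.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Because the hash is deterministic, two records with the same normalized email can be matched without decrypting the stored email field.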

Lead Identifiers

Leads require at least one persistent identifier to be created. cdpId (anonymousId) alone is not sufficient.

Identifier           Persistent      Can Create Lead
email                Yes             Yes
productUserId        Yes             Yes
crmId                Yes             Yes
mapId                Yes             Yes
cdpId / anonymousId  No (ephemeral)  No
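
The rule in this table reduces to a simple check. The sketch below is illustrative only: can_create_lead and the shape of the identifiers dict are hypothetical, while the identifier names come from the table.

```python
# Persistent identifiers, as listed in the table above.
PERSISTENT_IDENTIFIERS = ("email", "productUserId", "crmId", "mapId")


def can_create_lead(identifiers: dict) -> bool:
    """True if at least one persistent identifier is present and non-empty."""
    # cdpId / anonymousId is deliberately not checked: it is ephemeral.
    return any(identifiers.get(name) for name in PERSISTENT_IDENTIFIERS)
```

For example, a payload carrying only cdpId fails the check, while one carrying cdpId plus crmId passes.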

Identifying Information for Cold Storage Routing

A signal is routed to the database (not cold storage) if it contains any of:

  • Email address (in any payload location)
  • userId (non-email)
  • productUserId
  • crmId
  • Marketing automation IDs (mktLeadId, mktEventId)
  • Phone number
  • Product organization context (productOrgId / workspace ID)
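
The routing rule above could be sketched as a recursive payload scan. Everything here is an assumption made for illustration: the function names, the "phone" key name, the simplified email pattern, and the choice to search nested dicts and lists as an interpretation of "in any payload location".

```python
import re

# Simplified email pattern for illustration; real validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Key names follow the list above; "phone" is a hypothetical key name.
IDENTIFYING_KEYS = {
    "userId", "productUserId", "crmId",
    "mktLeadId", "mktEventId", "phone", "productOrgId",
}


def has_identifying_info(payload) -> bool:
    """Walk the payload looking for identifying keys or email-like strings."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key in IDENTIFYING_KEYS and value not in (None, ""):
                return True
            if has_identifying_info(value):
                return True
        return False
    if isinstance(payload, list):
        return any(has_identifying_info(item) for item in payload)
    if isinstance(payload, str):
        return EMAIL_RE.match(payload.strip()) is not None
    return False


def route_signal(signal: dict) -> str:
    """Return "database" for identifiable signals, else "cold_storage"."""
    return "database" if has_identifying_info(signal) else "cold_storage"
```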

Lead Merging

When multiple leads match the same set of identifiers (e.g., one lead found by email and another by productUserId), TrailSpark merges them automatically. The oldest lead becomes the primary record. Signals from duplicate leads are reassigned, and duplicates are marked with status merged.
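
A minimal sketch of this merge behavior, assuming leads carry a creation timestamp and a list of signals. The Lead dataclass, its field names, and the "active" default status are hypothetical; only the "merged" status and the oldest-wins rule come from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Lead:
    id: str
    created_at: datetime
    status: str = "active"  # hypothetical default status
    signals: list = field(default_factory=list)


def merge_leads(leads: list) -> Lead:
    """Merge matching leads into the oldest one and return it."""
    ordered = sorted(leads, key=lambda lead: lead.created_at)
    primary, duplicates = ordered[0], ordered[1:]
    for duplicate in duplicates:
        # Reassign the duplicate's signals to the primary record.
        primary.signals.extend(duplicate.signals)
        duplicate.signals = []
        duplicate.status = "merged"
    return primary
```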

Limitations

  • Cross-device: anonymousId does not persist across different devices or browsers. However, cross-device stitching works when the same email resolves multiple anonymousId values via the identity resolution table.
  • Cleared cookies: A new anonymousId is assigned after cookies are cleared.
  • Cold storage lookback: Rehydration searches cold storage for the last 90 days by default (configurable via COLD_STORAGE_LOOKBACK_DAYS).
  • Retention: Anonymous signals in cold storage follow your plan's signal retention period.
  • Deduplication: Rehydrated signals are deduplicated by content hash to prevent duplicate processing.
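
The lookback and deduplication limits can be illustrated together. This sketch is an assumption-laden example rather than TrailSpark's code: the function names, the (received_at, payload) tuple shape, and hashing the key-sorted JSON form of the payload as the "content hash" are all hypothetical. Only the 90-day default and the COLD_STORAGE_LOOKBACK_DAYS name come from the list above.

```python
import hashlib
import json
from datetime import datetime, timedelta

COLD_STORAGE_LOOKBACK_DAYS = 90  # default from the limitation above


def content_hash(signal: dict) -> str:
    """Hash the payload's canonical JSON form (an assumed hashing scheme)."""
    canonical = json.dumps(signal, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_for_rehydration(signals, now: datetime,
                           lookback_days: int = COLD_STORAGE_LOOKBACK_DAYS):
    """Keep signals inside the lookback window, dropping duplicate payloads.

    `signals` is an iterable of (received_at, payload) tuples.
    """
    cutoff = now - timedelta(days=lookback_days)
    seen = set()
    selected = []
    for received_at, payload in signals:
        if received_at < cutoff:
            continue  # older than the lookback window
        digest = content_hash(payload)
        if digest in seen:
            continue  # same content already selected
        seen.add(digest)
        selected.append(payload)
    return selected
```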

Troubleshooting

Anonymous signals not resolving -- Verify the identification event includes both the anonymousId and an identifying field (email, userId, etc.). Confirm both events belong to the same organization.

Lead missing historical activity -- The anonymous signals may have come from a different device/browser (different anonymousId). If the same email was used on both devices, cross-device stitching should link them automatically. Check your CDP configuration for anonymousId persistence.
