What happens to anonymous signals before a visitor is identified?

Anonymous signals are stored in S3 cold storage. When the visitor is later identified via an email or user ID, TrailSpark searches cold storage for matching anonymousId values and rehydrates those signals into the processing pipeline.

Which identifiers can create a new lead in TrailSpark?

Email, productUserId, crmId, and mapId are persistent identifiers that can create leads. An anonymousId (cdpId) alone is not sufficient and must be paired with another identifier.

How does cross-device identity stitching work?

When the same email resolves to multiple anonymousId values from different devices or browsers, TrailSpark links them in the identity resolution table, so activity across devices is attributed to the same lead.

How Identity Resolution Works

How It Works

TrailSpark creates or matches leads using persistent identifiers: email, productUserId, crmId, or mapId. Signals that arrive without any identifying information are considered anonymous and are stored in S3 cold storage (not in the database) until the visitor can be identified.

A signal counts as "identifiable" if it contains any of: email, userId, productUserId, crmId, marketing automation IDs, phone number, or product organization context. Only signals that lack all of these go to cold storage.

When a signal arrives with both an anonymousId and an identifying field (e.g., email from a Segment identify call), TrailSpark creates an identity_resolution record linking that anonymousId to the identifier. It then searches cold storage for previous anonymous signals with the same anonymousId, rehydrates them into signal_staging, and processes them against mapping rules.

Resolution Flow

Anonymous signal arrives (anonymousId only, no identifying info)
  → Stored in S3 cold storage (org/{orgId}/anonymous/{date}/)
  → No database record created

Identification signal arrives (anonymousId + email or userId)
  → Lead record created/updated
  → identity_resolution record created: anonymousId → email/userId
  → anonymousId cached in Redis (known identifier optimization)
  → Cold storage searched for matching anonymousId (across all known anonymousIds for this email)
  → Matching signals rehydrated into signal_staging
  → Rehydrated signals processed against mapping rules
  → Future signals with same anonymousId skip cold storage via Redis cache
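
The two branches of the flow above can be sketched with in-memory stand-ins. This is a minimal illustration, not TrailSpark's implementation: the function name `handle_signal`, the dict standing in for S3, and the set standing in for the Redis cache are all assumptions made for the example.

```python
from collections import defaultdict

# In-memory stand-ins (assumptions): a dict for S3 cold storage, a set for the
# Redis known-identifier cache, and lists for the database tables.
cold_storage = defaultdict(list)   # anonymousId -> parked anonymous signals
known_anonymous_ids = set()        # stand-in for the Redis cache
identity_resolution = []           # (anonymousId, identifier) links
signal_staging = []                # signals ready for mapping rules


def handle_signal(signal: dict) -> str:
    """Route one signal and report where it went."""
    anon_id = signal.get("anonymousId")
    identifier = signal.get("email") or signal.get("userId")

    if identifier is None:
        if anon_id in known_anonymous_ids:
            # Visitor already resolved: skip cold storage entirely.
            signal_staging.append(signal)
            return "staged"
        cold_storage[anon_id].append(signal)
        return "cold_storage"

    # Identification signal: link, cache, then rehydrate earlier activity.
    if anon_id is not None:
        identity_resolution.append((anon_id, identifier))
        known_anonymous_ids.add(anon_id)
    # Search every anonymousId already linked to this identifier.
    linked_ids = {a for a, i in identity_resolution if i == identifier}
    for linked in linked_ids:
        signal_staging.extend(cold_storage.pop(linked, []))
    signal_staging.append(signal)
    return "identified"
```

Calling `handle_signal` with an anonymous page view, then an identify call carrying the same anonymousId, moves the parked page view into `signal_staging` ahead of the identify signal.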

Identity Resolution Table

Field	Purpose
`anonymousId`	Primary anonymous tracking ID (from Segment, RudderStack, etc.)
`email`	Lead's email address (encrypted at rest)
`emailHash`	SHA-256 hash of the email, used for deterministic lookups
`userId`	Product user ID from identify/track calls
`deviceId`	Device-based tracking identifier
`source`	How the identity was established: `identify`, `inferred`, or `manual`
`traits`	Additional traits from the identify event (stored as JSON)
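
As an illustration of why a hash column allows deterministic lookups while the email itself stays encrypted, here is a minimal sketch. The function name `email_hash` is hypothetical, and the trim-and-lowercase normalization is an assumption; the documentation above does not say how TrailSpark normalizes an email before hashing it.

```python
import hashlib


def email_hash(email: str) -> str:
    """Return a hex SHA-256 digest of the email for equality lookups."""
    # Assumption: trim and lowercase first, so that differently cased
    # spellings of one address produce the same digest.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Because the same input always yields the same digest, a lookup can compare `emailHash` values directly without decrypting the stored `email` field.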

Lead Identifiers

Leads require at least one persistent identifier to be created. cdpId (anonymousId) alone is not sufficient.

Identifier	Persistent	Can Create Lead
`email`	Yes	Yes
`productUserId`	Yes	Yes
`crmId`	Yes	Yes
`mapId`	Yes	Yes
`cdpId` / `anonymousId`	No (ephemeral)	No
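
The table above reduces to a simple check. The sketch below assumes identifiers arrive as a flat dict keyed by the names in the table; the function name `can_create_lead` is hypothetical.

```python
# Persistent identifiers from the table above.
PERSISTENT_IDENTIFIERS = ("email", "productUserId", "crmId", "mapId")


def can_create_lead(identifiers: dict) -> bool:
    """True if at least one persistent identifier has a non-empty value."""
    # cdpId / anonymousId is deliberately absent from the tuple: it is
    # ephemeral and cannot create a lead on its own.
    return any(identifiers.get(key) for key in PERSISTENT_IDENTIFIERS)
```

For example, `can_create_lead({"cdpId": "a1"})` is false, while adding an `email` key to the same dict makes it true.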

Identifying Information for Cold Storage Routing

A signal is routed to the database (not cold storage) if it contains any of:

Email address (in any payload location)
userId (when the value is not an email address)
productUserId
crmId
Marketing automation IDs (mktLeadId, mktEventId)
Phone number
Product organization context (productOrgId / workspace ID)
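
The routing rule above can be sketched as a predicate. This is an illustrative assumption, not TrailSpark's code: the field name `phone`, the top-level lookup of ID fields, and the simple email pattern are all placeholders chosen for the example. Only the email check walks the whole payload, reflecting "in any payload location".

```python
import re

# Deliberately loose email pattern, for illustration only.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Assumed field names; "phone" in particular is a placeholder.
ID_FIELDS = ("userId", "productUserId", "crmId",
             "mktLeadId", "mktEventId", "phone", "productOrgId")


def contains_email(value) -> bool:
    """Recursively look for an email-shaped string anywhere in the payload."""
    if isinstance(value, str):
        return EMAIL_RE.fullmatch(value) is not None
    if isinstance(value, dict):
        return any(contains_email(v) for v in value.values())
    if isinstance(value, list):
        return any(contains_email(v) for v in value)
    return False


def route(signal: dict) -> str:
    """Return 'database' for identifiable signals, else 'cold_storage'."""
    if contains_email(signal) or any(signal.get(f) for f in ID_FIELDS):
        return "database"
    return "cold_storage"
```

A signal whose only identifier is an anonymousId falls through to `cold_storage`; a nested email or a `productOrgId` is enough to route it to the database.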

Lead Merging

When multiple leads match the same set of identifiers (e.g., one lead found by email and another by productUserId), TrailSpark merges them automatically. The oldest lead becomes the primary record. Signals from duplicate leads are reassigned, and duplicates are marked with status merged.
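
A minimal sketch of the merge rule described above, assuming a hypothetical `Lead` record with a creation timestamp and a list of signals. The field names and the `active` default status are illustrative; only the `merged` status and the oldest-wins rule come from the description.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Lead:
    id: str
    created_at: datetime
    status: str = "active"            # assumed default status name
    signals: list = field(default_factory=list)


def merge_leads(matches: list) -> Lead:
    """Merge leads that matched the same identifiers; the oldest wins."""
    primary = min(matches, key=lambda lead: lead.created_at)
    for lead in matches:
        if lead is primary:
            continue
        # Reassign the duplicate's signals, then mark it as merged.
        primary.signals.extend(lead.signals)
        lead.signals = []
        lead.status = "merged"
    return primary
```

Choosing the oldest lead as primary keeps the earliest record's identity stable, so references to that lead created before the merge remain valid.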

Limitations

Cross-device: An anonymousId does not persist across devices or browsers. Cross-device stitching still works when the same email resolves to multiple anonymousId values via the identity resolution table.
Cleared cookies: A new anonymousId is assigned after cookies are cleared.
Cold storage lookback: Rehydration searches the last 90 days of cold storage by default (configurable via COLD_STORAGE_LOOKBACK_DAYS).
Retention: Anonymous signals in cold storage follow your plan's signal retention period.
Deduplication: Rehydrated signals are deduplicated by content hash to prevent duplicate processing.
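
The lookback and deduplication limits above can be illustrated as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the window is assumed to include today, and SHA-256 over canonical JSON is only one possible content hash, since the documentation does not say which hash TrailSpark uses.

```python
import hashlib
import json
import os
from datetime import date, timedelta
from typing import Optional


def lookback_dates(today: date, days: Optional[int] = None) -> list:
    """Dates to search, newest first; defaults to COLD_STORAGE_LOOKBACK_DAYS or 90."""
    if days is None:
        days = int(os.environ.get("COLD_STORAGE_LOOKBACK_DAYS", "90"))
    return [today - timedelta(days=n) for n in range(days)]


def content_hash(signal: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(signal, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(signals: list) -> list:
    """Keep the first occurrence of each distinct signal body."""
    seen = set()
    unique = []
    for signal in signals:
        digest = content_hash(signal)
        if digest not in seen:
            seen.add(digest)
            unique.append(signal)
    return unique
```

With a date-partitioned layout like the one shown in the resolution flow, a bounded list of dates keeps each rehydration search to a fixed number of prefixes instead of a scan of the whole bucket.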

Troubleshooting

Anonymous signals not resolving -- Verify the identification event includes both the anonymousId and an identifying field (email, userId, etc.). Confirm both events belong to the same organization.

Lead missing historical activity -- The anonymous signals may have come from a different device/browser (different anonymousId). If the same email was used on both devices, cross-device stitching should link them automatically. Check your CDP configuration for anonymousId persistence.