Deduplication

Reworkd automatically handles deduplicating data whenever your scrapers re-run.

How It Works

When saving data, Reworkd uses a unique key (or composite key) based on the record’s fields to determine if the data is new or if it is a duplicate of data that has already been saved.

Scenario	Action Taken by Reworkd
New row of data saved	Inserts data and marks as a `CREATE` change.
Duplicate row of data saved	Skips insertion; no duplicate is created.
Updating data that has been seen before (existing key)	Updates existing record without duplication and marks as an `UPDATE` change

Defining your Deduplication Key

When you are creating your schema, you must also select which of the fields you want to use as part of your primary/deduplication key. This deduplication key is critical to ensure you avoid duplicated data. It must:

✅ Be unique for every output row.
✅ Remain stable over time (avoid frequently changing fields).
✅ Be consistent. Regardless of what website you are on, this key must be the same for the same item.

If there is no one obvious key field, use multiple attributes to create a reliable composite key.

Good vs. Poor Key Examples

Good key choices

Unique ID like a SKU or UPC
Combination of unique attributes like Brand + Model + Color

Poor key choices

Price (frequently changes)
Availability status (frequently fluctuating)
Timestamp of last update

Get Started

Features

Developers

How It Works

Defining your Deduplication Key

Good vs. Poor Key Examples

Good key choices

Poor key choices

Get Started

Features

Developers

​How It Works

​Defining your Deduplication Key

​Good vs. Poor Key Examples

​Good key choices

​Poor key choices

How It Works

Defining your Deduplication Key

Good vs. Poor Key Examples

Good key choices

Poor key choices