Stop paying for the same person twice
Design a deduplication ruleset Breeze can run without breaking things
This one is for the RevOps leads, the CRM admins, and anyone who has opened Monday’s pipeline report to find the same prospect listed three times, each with a different owner and half the story.
What: Using Breeze Assistant to design the deduplication model for your portal. That means the matching rules that define what counts as a duplicate, the survivorship logic that decides which record lives and which values win, and the prevention layer that stops new ones forming. With that model in place, HubSpot’s AI duplicate tool executes a considered plan rather than guessing when you point it at your database, and a permanent merge never quietly destroys the wrong data.
Prompt of the week:
Duplicates are the most ordinary problem in any CRM and the one that does the most quiet damage. They arrive from everywhere at once: a contact fills in a form with their work email, a rep creates them by hand with their personal one, a trade-show list gets imported, an integration syncs a third version, and each copy splits the truth. The rep cannot see that the lead already downloaded the case study and sat through a webinar, because that history is logged against the other record. Marketing emails the same person twice. The forecast counts them as two. And every duplicate set as a marketing contact is a line on your HubSpot bill, because Marketing Hub is priced by the number of marketing contacts. On a portal of, say, 50,000 marketing contacts, a ten to twenty per cent duplicate rate means paying, every year, for 5,000 to 10,000 people who do not exist.
HubSpot has answered the finding half of this well. The duplicate management tool uses AI to surface likely matches with a confidence score, and Breeze can sit over the top of your data quality signals. But the tool has firm edges worth knowing before you lean on it: it only covers contacts and companies, it scans once a day, it caps how many duplicates it will show you, it lets you reject a pair but not define your own matching logic, and (the part that catches people out) every merge is permanent. There is no undo. The primary record’s values win, blanks get filled from the secondary, the lifecycle stage jumps to whichever record sat furthest down the funnel, and a brand-new Record ID is minted for the survivor, which can quietly break any integration keyed to the old one.
Which is exactly why the dangerous half of deduplication is not the finding. It is the deciding. Which record should survive? When two records disagree on a value, which one is right? What must never be auto-merged because the cost of getting it wrong is too high? And how do you stop the database silently refilling with duplicates the day after you finish? Those are judgement calls, and a confidence score does not make them for you. Run a bulk merge without having made them deliberately, and you are not cleaning your data; you are permanently overwriting it at speed.
The honest framing, and one that recurs across careful writing on the subject in 2026, is that Breeze is a CRM amplifier, not a CRM cleanup miracle. It will happily execute a deduplication at scale, but whether that execution heals the database or harms it depends entirely on the rules you hand it. This prompt produces those rules: a matching model, a survivorship model, and a prevention model, written down and agreed before a single record is merged.
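The default merge behaviour is easier to reason about once it is written down as logic. What follows is a minimal Python sketch for thinking through outcomes, not HubSpot’s implementation or API: records are plain dictionaries, `merge_records` is a hypothetical helper, the new Record ID is simulated with a counter, and the stage order assumes HubSpot’s default lifecycle stages.

```python
from itertools import count

# Illustrative funnel order using HubSpot's default lifecycle stage internal
# values; a portal with custom stages would need its own list.
LIFECYCLE_ORDER = [
    "subscriber", "lead", "marketingqualifiedlead", "salesqualifiedlead",
    "opportunity", "customer", "evangelist",
]

# Simulated ID source: in a real portal, HubSpot assigns the new Record ID.
_next_id = count(900_000)


def is_blank(value) -> bool:
    return value is None or value == ""


def merge_records(primary: dict, secondary: dict) -> dict:
    """Model the default merge: primary values win, blanks are filled from
    the secondary, lifecycle stage takes the furthest-down-funnel value,
    and the survivor is given a brand-new ID."""
    merged = {}
    for key in set(primary) | set(secondary):
        if key in ("id", "lifecyclestage"):
            continue
        value = primary.get(key)
        merged[key] = secondary.get(key) if is_blank(value) else value

    stages = [r.get("lifecyclestage") for r in (primary, secondary)]
    known = [s for s in stages if s in LIFECYCLE_ORDER]
    if known:
        merged["lifecyclestage"] = max(known, key=LIFECYCLE_ORDER.index)

    merged["id"] = next(_next_id)
    merged["previous_ids"] = [primary["id"], secondary["id"]]
    return merged


# Example: the survivor keeps the primary's owner, gains the secondary's
# phone number, jumps to "opportunity", and no longer carries ID 101 or 202.
primary = {"id": 101, "email": "sam@acme.com", "phone": "",
           "hubspot_owner_id": "rep-a", "lifecyclestage": "lead"}
secondary = {"id": 202, "email": "sam@acme.com", "phone": "+44 20 7946 0000",
             "hubspot_owner_id": "rep-b", "lifecyclestage": "opportunity"}
survivor = merge_records(primary, secondary)
```

Run against a real pair from your portal, the sketch makes the silent loss visible: the secondary record’s owner is simply discarded, which is exactly the kind of outcome the survivorship model has to catch before the merge, not after it.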
Prompt structure
Paste this into Breeze Assistant and make sure CRM data access is enabled in your AI settings so Breeze can reference your contact and company records, properties, fill rates, import history, and connected integrations:
Role: You are a HubSpot data architect who specialises in
deduplication. You know HubSpot's merge behaviour cold: that merges
are permanent, that the primary record's values win with blanks
filled from the secondary, that lifecycle stage takes the furthest-
down-funnel value, and that a new Record ID is created on merge. You
design dedup rulesets that clean a database without destroying
the data or breaking the systems that depend on it.
Task: Design the deduplication model for our portal. Produce three
things: a MATCHING ruleset (what counts as a duplicate), a
SURVIVORSHIP model (which record lives and which values win), and a
PREVENTION layer (how new duplicates are stopped at the source).
Then give me a safe, batched execution plan and an ongoing review
cadence. The aim is a database HubSpot's AI duplicate tool can clean
against a considered plan, not a guess.
Context:
- Company: [COMPANY NAME]
- Industry: [INDUSTRY]
- HubSpot tier: [Marketing/Sales tier; note Operations or Data Hub
if present, since it changes dedup limits and automation options]
- Approximate record counts: [contacts; companies]
- Suspected duplicate rate, if known: [e.g. "~15%" / UNKNOWN]
- Main sources of new records: [forms / imports / manual entry /
integrations (name them)]
- Systems keyed to the HubSpot Record ID: [e.g. "a billing
integration, a data warehouse" / NONE KNOWN]
- Salesforce or other CRM sync in place: [yes, which / no]
- Unique identifier you trust most: [email / domain / a custom
unique property / none agreed]
- Known pain points: [e.g. "company name variants like Acme Ltd /
Acme Limited / ACME", "reps create instead of search"]
Design the following:
1. MATCHING RULESET
Define what counts as a duplicate, precisely enough to act on:
- The primary unique identifier for contacts (normally email) and
for companies (normally domain), and the fallback signals when
it is missing
- Where exact matching is safe and where fuzzy matching is needed,
such as company name variants (Acme Ltd / Acme Limited / ACME on the
same domain), nicknames, and formatting differences
- Normalisation to apply before matching: phone formats, casing,
trailing spaces, domain stripping (www, http)
- The cross-object case: the same organisation existing as both a
company and, wrongly, a contact
- A confidence tiering: which match patterns are safe to merge in
bulk versus which must always go to a human for review
2. SURVIVORSHIP MODEL
Decide, in advance, how every merge resolves:
- Which record becomes primary: the rule, not a case-by-case
feeling (e.g. most recent engagement, most complete, oldest
create date), and why
- Per-property conflict rules: for the properties that matter
(lifecycle stage, lead source, owner, consent/subscription
status, key dates), state which value should win and flag where
HubSpot's default (primary wins, blanks filled from secondary)
would produce the wrong answer
- The properties where a wrong merge is expensive: owner,
consent, contractual or billing fields, and how to protect
them
- Record ID handling: flag every system keyed to the Record ID,
since a merge mints a new one, and state what must be updated or
re-pointed after merging
3. PREVENTION LAYER
Stop the database refilling. Recommend:
- Form configuration: email as the unique identifier, update-
existing rather than create-new
- Import discipline: the required pre-import checklist (email or
Record ID present, formatting normalised, the "don't create
duplicates" option used)
- A search-before-create habit for reps, and how to make it the
path of least resistance rather than a policy nobody follows
- Integration audits: which connected systems create records, and
the dedup setting each one needs
4. EXECUTION PLAN
A safe sequence for the first cleanup:
- Export the current state first (and use Merge History Export as
the 90-day audit trail and recovery path, since merges cannot be
undone)
- Work high-confidence matches in batches of 20 to 30 to avoid
decision fatigue and merge errors; bulk-reject the low-
confidence noise
- The order of objects (companies or contacts first, and why for
our setup)
- For a Salesforce sync, the correct side to deduplicate on and
the sequence that keeps the sync intact
5. ONGOING REVIEW
- A weekly or monthly review cadence sized to our duplicate-
creation rate
- A data quality view that surfaces new duplicate pairs and any
spike in creation rate
- Named ownership for the cadence
Constraints:
- Never recommend a bulk auto-merge for any match pattern below high
confidence, or for any property class where a wrong merge is
expensive (owner, consent, billing, contractual). Those go to human
review
- Treat every merge as permanent and irreversible. Every destructive
step must be preceded by an export, and the plan must name the
recovery path if a merge is wrong
- Flag every system keyed to the HubSpot Record ID before
recommending any merge, because the merged record gets a new ID
- Recommendations must work with native HubSpot tools (duplicate
management, import dedup, forms, workflows) unless you explicitly
flag where a third-party dedup tool is genuinely needed (e.g. bulk
fuzzy matching at a scale the native tool caps out on)
- Do not invent a duplicate rate or a record count. If a number you
need is not visible from the current context, state:
"SIGNAL MISSING: [what needs checking manually]"
- Design for shared accountability and reversibility, not speed. A
slower, auditable cleanup beats a fast, irreversible one
Output format:
### I. DEDUP HEALTH SUMMARY
{3-sentence overview: the likely scale of the problem, the single
biggest merge risk in our setup, and an overall readiness rating:
DO NOT BULK MERGE YET / READY WITH GUARDRAILS / READY}
### II. MATCHING RULESET
| Object | Identifier | Match Type | Normalisation | Confidence Tier |
### III. SURVIVORSHIP MODEL
| Property | Conflict Rule | Default Risk | Protection |
{plus the primary-record selection rule and Record ID handling}
### IV. PREVENTION LAYER
| Source | Duplicate Risk | Setting / Habit | Owner |
### V. EXECUTION PLAN
{Ordered, batched sequence with the export/recovery step first}
### VI. ONGOING REVIEW
{Cadence, the data quality view to build, named owner}
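Section 1 of the prompt asks for normalisation before matching, and it helps to see how little it takes to turn formatting noise into exact matches. A minimal Python sketch over exported contact rows: the column names `email` and `phone` are assumptions about your export, `exact_email_groups` is a hypothetical helper, and the phone rule just keeps digits, so it will not reconcile a number written with and without its country code.

```python
from __future__ import annotations

import re
from collections import defaultdict


def normalise_email(raw: str | None) -> str:
    """Trim surrounding whitespace and lowercase; missing stays empty."""
    return (raw or "").strip().lower()


def normalise_phone(raw: str | None) -> str:
    """Keep digits only. Crude: it does not reconcile country codes."""
    return re.sub(r"\D", "", raw or "")


def exact_email_groups(rows: list[dict]) -> dict[str, list[dict]]:
    """Group rows whose normalised emails match; keep only groups of 2+."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        key = normalise_email(row.get("email"))
        if key:  # rows without an email need the fallback signals instead
            groups[key].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}


rows = [
    {"id": 1, "email": " Sam@Acme.com ", "phone": "(555) 010-0199"},
    {"id": 2, "email": "sam@acme.com", "phone": "555.010.0199"},
    {"id": 3, "email": "jo@beta.io", "phone": ""},
]
groups = exact_email_groups(rows)
# groups has one key, "sam@acme.com", holding rows 1 and 2
```

Rows without an email drop out of the grouping on purpose; they are the ones that need the fallback signals and, usually, a human.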
Why this prompt works, and how to adapt it
Most deduplication advice stops at “open Manage Duplicates and merge the high-confidence pairs”. That is the easy ten per cent. The hard, valuable part (the part that decides whether a cleanup helps or harms) is everything the merge button assumes you have already thought about: which record deserves to survive, which conflicting value is the true one, what breaks downstream when the ID changes, and how you stop the whole problem coming back. This prompt is built to force those decisions onto the page before anything irreversible happens.
A few things to note about how it is constructed:
Finding is solved; deciding is the job. HubSpot’s AI already surfaces likely duplicates with a confidence score, so the prompt spends almost no effort on detection. It spends it all on the three decisions a confidence score cannot make for you: matching, survivorship, prevention. That reframing is the whole point: you are not asking Breeze to find duplicates; you are asking it to help you design the rules by which they are resolved.
Survivorship is where merges go wrong. HubSpot’s default is that the primary record’s values win and blanks are filled from the secondary, with lifecycle stage jumping to the furthest-down-funnel value. That default is fine for most properties and quietly wrong for a few important ones: a stale owner overwriting the real one, an out-of-date consent status surviving over a fresher opt-out. The prompt makes you name the properties where the default produces the wrong answer, which is exactly where an unreviewed bulk merge does its damage.
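Writing the per-property rules down as code is one way to make sure they are applied identically to every pair. This is a minimal sketch under stated assumptions: `email_consent`, `owner`, `lead_source` and `createdate` are placeholder property names to map onto your own internal names, `resolve_conflicts` is a hypothetical helper, and the three rules are examples of the decisions a survivorship table records, not a recommendation for every portal.

```python
from __future__ import annotations


def resolve_conflicts(primary: dict, secondary: dict) -> tuple[dict, list[str]]:
    """Apply agreed per-property rules on top of HubSpot's default.

    Returns values to set on the survivor after merging, plus the names of
    properties that must go to a human instead of being auto-resolved.
    """
    resolved: dict = {}
    needs_review: list[str] = []

    # Consent: if either copy has opted out, the survivor stays opted out,
    # whichever record happens to be primary.
    consents = (primary.get("email_consent"), secondary.get("email_consent"))
    if "opted_out" in consents:
        resolved["email_consent"] = "opted_out"

    # Owner: two different real owners is a judgement call, never a default.
    p_owner, s_owner = primary.get("owner"), secondary.get("owner")
    if p_owner and s_owner and p_owner != s_owner:
        needs_review.append("owner")

    # Lead source: take it from the older record, which saw the first touch.
    # ISO-8601 date strings compare correctly as plain strings.
    older = min(primary, secondary, key=lambda r: r["createdate"])
    if older.get("lead_source"):
        resolved["lead_source"] = older["lead_source"]

    return resolved, needs_review


primary = {"owner": "rep-a", "email_consent": "subscribed",
           "lead_source": "Paid search", "createdate": "2025-03-01"}
secondary = {"owner": "rep-b", "email_consent": "opted_out",
             "lead_source": "Webinar", "createdate": "2024-11-12"}
resolved, needs_review = resolve_conflicts(primary, secondary)
# resolved == {"email_consent": "opted_out", "lead_source": "Webinar"}
# needs_review == ["owner"]
```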
The Record ID warning is not pedantry. Every merge mints a new Record ID for the survivor and retires the old ones. If a billing system, a data warehouse, or a custom integration is keyed to that ID, the merge silently breaks the link, a real and recurring complaint in the HubSpot Community. Forcing a list of every ID-keyed system before any merge turns a future incident into a pre-merge checklist item.
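Once you know which systems are keyed to the Record ID, re-pointing them is mostly bookkeeping, with one trap: a record can be merged more than once, so an old ID may need to be followed through several hops. A minimal Python sketch, assuming you have built an old-ID-to-survivor mapping from your pre- and post-merge exports or the Merge History Export (its actual column layout is not assumed here); `repoint` and `resolve_current_id` are hypothetical helpers, and the downstream table stands in for your billing system or warehouse.

```python
from __future__ import annotations


def resolve_current_id(record_id: int, merged_into: dict[int, int]) -> int:
    """Follow merge chains (A -> B -> C) to the ID that exists today."""
    seen: set[int] = set()
    while record_id in merged_into:
        if record_id in seen:
            raise ValueError(f"cycle in merge mapping at {record_id}")
        seen.add(record_id)
        record_id = merged_into[record_id]
    return record_id


def repoint(rows: list[dict], key: str, merged_into: dict[int, int]) -> list[str]:
    """Rewrite the ID column `key` in place and return a change log for audit."""
    log = []
    for row in rows:
        old = row[key]
        new = resolve_current_id(old, merged_into)
        if new != old:
            row[key] = new
            log.append(f"{key}: {old} -> {new}")
    return log


# 101 and 202 were merged into 900000, which was later merged into 900050.
merged_into = {101: 900000, 202: 900000, 900000: 900050}
billing = [{"invoice": "INV-1", "hubspot_id": 101},
           {"invoice": "INV-2", "hubspot_id": 303}]
print(repoint(billing, "hubspot_id", merged_into))  # ['hubspot_id: 101 -> 900050']
```

Keeping the change log is deliberate: it is the downstream half of the audit trail the Merge History Export gives you inside HubSpot.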
Permanence sets the whole risk posture. There is no unmerge. That single fact is why the prompt insists on an export before every destructive step, leans on the 90-day Merge History Export as the recovery trail, and routes anything below high confidence to a human. A reversible mistake is a lesson; an irreversible one is a data-loss incident, and the constraints are written to keep every mistake in the first category.
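Because the constraints require an export before every destructive step, it is worth enforcing that in whatever tooling runs your batches rather than relying on memory. A minimal Python sketch, assuming each batch is held as dictionaries with an `id` key (for example, rows from a HubSpot export); `write_snapshot` and `require_snapshot` are hypothetical helpers, and the merge call itself is deliberately left out.

```python
from __future__ import annotations

import csv
from datetime import datetime, timezone
from pathlib import Path


def write_snapshot(records: list[dict], out_dir: Path) -> Path:
    """Save every property of every record in a batch to a timestamped CSV."""
    if not records:
        raise ValueError("nothing to snapshot")
    fields = sorted({key for record in records for key in record})
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = out_dir / f"pre_merge_{stamp}.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)  # missing keys become ""
        writer.writeheader()
        writer.writerows(records)
    return path


def require_snapshot(path: Path, record_ids: list[int]) -> None:
    """Raise unless the snapshot exists and contains every ID about to be merged."""
    if not path.exists():
        raise RuntimeError(f"no snapshot at {path}")
    with path.open(newline="", encoding="utf-8") as fh:
        saved = {row.get("id") for row in csv.DictReader(fh)}
    missing = sorted({str(i) for i in record_ids} - saved)
    if missing:
        raise RuntimeError(f"refusing to merge, not snapshotted: {missing}")


# Usage: snapshot first, check coverage, and only then run the merge step.
# snap = write_snapshot(batch, Path("exports"))
# require_snapshot(snap, [r["id"] for r in batch])
```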
Prevention is half the deliverable, not an afterthought. A cleanup with no prevention layer is a treadmill: you merge a thousand pairs and the forms, imports, and integrations refill the database while you work. By making prevention one of the three required models, the prompt treats “stop new ones forming” as equal in weight to “clear the existing ones”, which is the only way the win actually lasts.
“SIGNAL MISSING” stops invented numbers. Breeze can see your records and your properties, but it cannot always see your true duplicate rate, the integrations running quietly in the background, or which custom property your team treats as the real unique identifier. Where a recommendation would otherwise rest on a guessed figure, the flag forces a manual check, because a dedup plan built on an invented duplicate rate is worse than no plan: it is a confident one pointed in the wrong direction.
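One way to replace a guessed figure with a measured one is to count exact email duplicates in a contact export. A minimal Python sketch with hypothetical helper names: it defines the rate as surplus copies divided by records that have an email, which makes it a lower bound, because the work-email and personal-email pair from the opening paragraph will not show up at all.

```python
from __future__ import annotations


def exact_email_duplicate_stats(emails: list[str | None]) -> dict:
    """Lower-bound duplicate rate from an export's email column."""
    present = [e.strip().lower() for e in emails if e and e.strip()]
    surplus = len(present) - len(set(present))  # records beyond one per person
    return {
        "records_with_email": len(present),
        "missing_email": len(emails) - len(present),
        "surplus_records": surplus,
        "rate": surplus / len(present) if present else None,
    }


def describe(stats: dict) -> str:
    if stats["rate"] is None:
        return ("SIGNAL MISSING: no usable emails in the export; "
                "check identifier coverage manually")
    return (f"Exact-email duplicate rate (lower bound): {stats['rate']:.1%}; "
            f"{stats['missing_email']} records have no email and are not counted")


stats = exact_email_duplicate_stats(
    ["a@x.com", "A@x.com ", "b@x.com", None, "", "c@x.com", "c@x.com"])
print(describe(stats))
# Exact-email duplicate rate (lower bound): 40.0%; 2 records have no email and are not counted
```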
Adapting it for your portal:
Running a Salesforce sync? Deduplication with an active sync has its own rules, and getting the order wrong creates duplicates faster than you clear them. Add: “We sync with Salesforce via [integration version]. Tell me which side to deduplicate on, the sequence that keeps the sync intact, and where the native HubSpot tool cannot merge while the sync is active so I know if a third-party tool is required.” The plan will respect the sync rather than fight it, and tell you plainly at which point the native tool runs out of road.
Post-migration or post-acquisition? If a big import or a merged database has just landed, you have a one-off spike rather than steady drift. Add: “We have just completed a [migration / acquisition / large import]. Treat this as a one-time bulk cleanup at scale, tell me where the native duplicate tool’s limits will bite, and design the matching tiers so the safe majority can be cleared quickly while the risky minority is quarantined for review.” The execution plan reshapes around volume.
Heavy on company records? If account-based motion means companies matter more than contacts, add: “Weight the model towards company deduplication. Focus the matching on domain and name variants, and account for companies created automatically from contact email domains.” The output leans into the company-merge rules and the auto-creation setting that drives most company duplicates.
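Company matching usually comes down to two signals, a normalised domain and a normalised name, and how far you trust each. A minimal Python sketch, assuming company rows with `name` and `domain` fields; the suffix list is illustrative and the helpers are hypothetical. It only awards the high tier when both signals agree, because a shared parent domain or two unrelated firms called Acme can each fool one signal on its own.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Illustrative legal-form suffixes; extend for the markets you sell into.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "incorporated", "llc", "plc",
                  "gmbh", "corp", "corporation", "co"}


def normalise_domain(raw: str) -> str:
    """'https://www.Acme.com/about' -> 'acme.com'."""
    raw = (raw or "").strip().lower()
    if not raw:
        return ""
    if "://" not in raw:
        raw = "//" + raw  # lets urlparse treat the text as a host
    host = urlparse(raw).hostname or ""
    return host[4:] if host.startswith("www.") else host


def company_name_key(name: str) -> str:
    """'Acme Co., Ltd.' -> 'acme'; always keeps at least one token."""
    tokens = re.findall(r"[a-z0-9]+", (name or "").lower())
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def company_match_tier(a: dict, b: dict) -> str:
    """'high' only when domain and name both agree; one signal alone is 'review'."""
    dom_a, dom_b = normalise_domain(a.get("domain", "")), normalise_domain(b.get("domain", ""))
    name_a, name_b = company_name_key(a.get("name", "")), company_name_key(b.get("name", ""))
    same_domain = bool(dom_a) and dom_a == dom_b
    same_name = bool(name_a) and name_a == name_b
    if same_domain and same_name:
        return "high"
    if same_domain or same_name:
        return "review"
    return "none"


print(company_match_tier({"name": "Acme Ltd", "domain": "https://www.acme.com/"},
                         {"name": "ACME", "domain": "acme.com"}))  # high
```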
Worried about destructive merges? If the team has been burned before, or you are in a regulated industry, add: “We need maximum caution. Recommend a quarantine-and-review approach over auto-merge wherever defensible, and build the plan around the Merge History Export as a compliance audit trail.” The plan tilts hard towards reversibility and documentation.
Want a quarterly cadence? Save the output and re-run it 90 days later with: “Compare against the output from [DATE] and report on how the duplicate rate has moved, which sources are still creating duplicates, whether the prevention settings held, and which matching rules need tightening.” That turns a one-off cleanup into a maintained discipline.
Beyond the prompt:
The model the prompt produces is the plan. Executing it safely follows an order, and with permanent merges the order is not optional.
Start with prevention, before you merge a single pair. This is counterintuitive (the duplicates are staring at you and the instinct is to start clearing them), but if the forms, imports, and integrations are still minting new ones, you are bailing a boat with a hole in it. Set email as the unique identifier on forms, switch them to update-existing, fix the import checklist, and audit the integrations first. Then the cleanup is a finite job rather than a permanent one.
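The import discipline from the prevention layer is easy to check mechanically before a file ever reaches the importer. A minimal Python sketch, assuming a CSV already loaded as dictionaries with hypothetical `email` and `record_id` columns and a set of existing emails taken from a recent export; `preflight` is a hypothetical helper that complements, rather than replaces, the importer’s own option not to create duplicates.

```python
from __future__ import annotations


def preflight(rows: list[dict], existing_emails: set[str]) -> dict[str, list[int]]:
    """Sort import rows (by index) by what they would do to the portal.

    `existing_emails` should already be trimmed and lowercased, for example
    from a fresh contact export.
    """
    result: dict[str, list[int]] = {
        "blocked_no_identifier": [], "duplicate_in_file": [],
        "updates_existing": [], "creates_new": [],
    }
    seen: set[str] = set()
    for i, row in enumerate(rows):
        email = (row.get("email") or "").strip().lower()
        record_id = str(row.get("record_id") or "").strip()
        if not email and not record_id:
            result["blocked_no_identifier"].append(i)  # nothing to dedupe against
            continue
        if email and email in seen:
            result["duplicate_in_file"].append(i)  # the file duplicates itself
            continue
        if email:
            seen.add(email)
            row["email"] = email  # write the normalised value back
        if record_id or email in existing_emails:
            result["updates_existing"].append(i)
        else:
            result["creates_new"].append(i)
    return result


rows = [{"email": " New@x.com"}, {"email": "new@x.com"},
        {"email": "", "record_id": "123"}, {"email": ""},
        {"email": "old@x.com"}]
report = preflight(rows, existing_emails={"old@x.com"})
# row 3 is blocked, row 1 repeats row 0, rows 2 and 4 update, row 0 creates
```

A non-empty `blocked_no_identifier` or `duplicate_in_file` list is a reason to fix the file, not to import it and clean up afterwards.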
Then export everything, and make exporting before every batch a habit. Because there is no unmerge, the pre-merge export and the Merge History Export together form your only recovery path. Treat the export as a seatbelt: the one time you skip it is the time you need it.
Then merge the high-confidence matches in small batches, twenty to thirty at a sitting, applying the survivorship rules you agreed rather than re-deciding each pair on the spot. Decision fatigue is how good rules produce bad merges; small batches against a written rule keep the judgement consistent. Bulk-reject the low-confidence noise so it stops cluttering the view, and quarantine the risky-but-plausible pairs for a slower, deliberate human pass.
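The triage and batching described here can be written down so the thresholds do not drift over a long session. A minimal Python sketch, assuming candidate pairs that carry a confidence score on a 0 to 1 scale (rescale if your view shows percentages); the thresholds, the expensive property names and the `triage` helper are hypothetical stand-ins for whatever your team agrees.

```python
from __future__ import annotations

# Assumptions: scores on a 0-1 scale and thresholds your team has agreed.
REJECT_BELOW = 0.60
MERGE_AT_OR_ABOVE = 0.95
BATCH_SIZE = 25  # inside the 20-30 range
# Hypothetical internal names for the expensive property classes.
EXPENSIVE = ("owner", "email_consent", "billing_account")


def expensive_conflict(a: dict, b: dict) -> bool:
    """True when both records hold different non-blank values for any expensive property."""
    return any(a.get(p) not in (None, "") and b.get(p) not in (None, "")
               and a.get(p) != b.get(p) for p in EXPENSIVE)


def triage(pairs: list[dict]) -> dict[str, list]:
    """Split pairs into reject, quarantine, and merge batches of BATCH_SIZE."""
    buckets: dict[str, list] = {"reject": [], "quarantine": [], "merge_batches": []}
    to_merge = []
    for pair in pairs:
        score = pair["confidence"]
        if score < REJECT_BELOW:
            buckets["reject"].append(pair)
        elif score < MERGE_AT_OR_ABOVE or expensive_conflict(pair["primary"], pair["secondary"]):
            buckets["quarantine"].append(pair)  # risky but plausible: slow human pass
        else:
            to_merge.append(pair)
    buckets["merge_batches"] = [to_merge[i:i + BATCH_SIZE]
                                for i in range(0, len(to_merge), BATCH_SIZE)]
    return buckets


pairs = [{"confidence": 0.99, "primary": {"owner": "a"}, "secondary": {"owner": "b"}}]
print(len(triage(pairs)["quarantine"]))  # 1
```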
Then stand up the review cadence and the data quality view, and give them an owner. A duplicate rate is not a number you fix once; it is a number you hold down. Fifteen minutes against a duplicates view each week, or a slightly longer monthly pass, keeps the database from drifting back, and the moment the creation rate spikes, the view tells you which source sprang a leak.
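The data quality view can only tell you which source sprang a leak if new duplicate pairs are attributed to a source and compared against a baseline. A minimal Python sketch, assuming you already have weekly counts of new duplicate pairs per source (for example, by each record’s original source or the import it arrived in); `creation_spikes`, the four-week window and the doubling threshold are illustrative choices, not HubSpot settings.

```python
from __future__ import annotations

from statistics import mean


def creation_spikes(weekly_new_pairs: dict[str, list[int]],
                    window: int = 4, factor: float = 2.0) -> list[str]:
    """Return sources whose latest week exceeds `factor` times the average
    of the `window` weeks before it."""
    flagged = []
    for source, counts in weekly_new_pairs.items():
        if len(counts) <= window:
            continue  # not enough history for a baseline yet
        baseline = mean(counts[-window - 1:-1])
        # Floor the baseline at 1 so a single stray pair never counts as a spike.
        if counts[-1] > factor * max(baseline, 1):
            flagged.append(source)
    return sorted(flagged)


weekly = {
    "webinar_import": [2, 3, 2, 3, 12],
    "website_forms": [5, 6, 5, 6, 7],
    "new_integration": [1, 2],
}
print(creation_spikes(weekly))  # ['webinar_import']
```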
A clean, single view of each customer is the thing every other use of your CRM quietly depends on: the report you trust, the workflow that fires once, the agent that reasons over one history instead of three. Deduplication is not glamorous work, and nobody will thank you for the duplicate that never got created. But it is what separates a CRM your team believes from one they have quietly stopped trusting. And on the invoice, it is the gap between paying for your customers and paying for your copies of them.
