<img height="1" width="1" style="display:none;" alt="" src="https://px.ads.linkedin.com/collect/?pid=108825&amp;fmt=gif">
Skip to content
English
  • There are no suggestions because the search field is empty.

Audit, tune, and prove your Customer Agent’s knowledge base

Stop your HubSpot Customer Agent from delivering confidently wrong answers and burning credits due to unoptimised help content. Use this Breeze Assistant prompt to audit your knowledge base and build a tuning plan that makes your AI demonstrably reliable.

This one is for the Service Hub admins, the support leads, and anyone who has ever watched Customer Agent confidently hand a customer the wrong answer and quietly wondered whether anyone on the team would notice.

What: Using Breeze Assistant to audit your Customer Agent end-to-end - content sources, coverage gaps, handoff patterns, tone, and credit efficiency - and produce a Knowledge Vault Tuning Plan that takes the agent from “ingested everything and hoped for the best” to demonstrably reliable, in a rollout sequence that protects customer experience while reducing escalations.

Prompt of the week:

Customer Agent is one of three named AI products HubSpot’s leadership singled out in last week’s Q1 2026 earnings call as central to the agentic customer platform narrative.

Translation: the executive sponsor who approved the AI budget is going to ask, fairly soon, how well it is working. The honest answer for most portals would be “we don’t actually know,” because nobody has yet sat down and audited the system end-to-end. This prompt does that audit.

Two things make Customer Agent unusual compared with the earlier prompts in this series. The first is that the variable controlling output quality is not really CRM data - it is content. The agent’s answers come from whatever sits in the Knowledge Vault: KB articles, blog posts, landing pages, file uploads, public URLs. Whether those sources were written with an AI agent in mind is a question almost no support team has revisited since they connected them. The second is that the failure mode is not invisible drift in a report - it is a paying customer reading a confidently wrong answer in real time, on chat or email, with the brand’s name on it.

The April product updates raised the stakes further. The Knowledge Vault can now ingest an entire HubSpot Knowledge Base as a source, and inline images on KB articles, blogs, and landing pages are extracted, described, and indexed so the agent can reason over the screenshots that often contain the actual instructions. More power, more surface area, more places for a misconfiguration to quietly produce a wrong answer.

Underneath that, the community boards have been cycling the same handful of questions for months. “Why is the agent giving the wrong answer when the right one is clearly in the KB?”, “How do we know which content sources it’s actually using?”, “It keeps escalating things it should handle, and handling things it shouldn’t”, and the financially significant “we’re burning credits and the deflection rate is barely above where it was with the rule-based bot.” None of these are bugs. They are configuration outcomes - the natural result of switching the agent on, pointing it at the existing help centre, and assuming that content written for a human browsing on a Tuesday morning will hold up under an AI’s probabilistic reasoning.

The prompt forces the audit nobody has time for. Specifically: an inventory of every source the agent reads, a remediation register for the content that is wrong or written for the wrong audience, a coverage-gap map drawn from the unanswered-questions log, a redesigned handoff matrix, a credit-efficiency view, and a 30/60/90 rollout plan that lets you actually fix the thing without breaking it in production.

Prompt structure

Paste this into Breeze Assistant and make sure CRM data access is enabled in your AI settings so Breeze can reference your Customer Agent configuration, content sources, conversation history, and unanswered-question log:

Role: You are a HubSpot Customer Agent Specialist with deep experience tuning Breeze-powered support agents. You understand the difference between content written for a human browsing a help centre and content written for an AI agent reasoning under uncertainty, and you know how to bridge the gap without rewriting an entire knowledge base.

Task: Audit our Customer Agent configuration end-to-end - content sources, coverage gaps, handoff triggers, tone, and credit efficiency - and produce a Knowledge Vault Tuning Plan that takes the agent from generic competence to demonstrable reliability, in a rollout sequence that protects customer experience while reducing escalations.

Context:
- Company: [COMPANY NAME]
- Industry: [INDUSTRY]
- HubSpot tier: [Service Hub Pro/Enterprise minimum, AI settings enabled]
- Product or service complexity: [SIMPLE / MODERATE / TECHNICAL / REGULATED]
- Typical customer technical level: [CONSUMER / SMB / TECHNICAL B2B]
- Channels Customer Agent is deployed to: [LIVE CHAT / EMAIL / WHATSAPP / MESSENGER / CALLING / ALL]
- Approximate monthly conversation volume: [NUMBER]
- Current deflection rate (% resolved without human): [NUMBER or "UNKNOWN"]
- Connected content sources today: [HubSpot KB articles / blogs / pages / files / external URLs - approximate counts]
- Languages supported: [LIST]
- Sensitive topics that must always escalate to a human: [LIST or NONE]
- Monthly HubSpot Credit budget for Customer Agent: [NUMBER or "tight / moderate / generous"]

Audit the following areas:

1. CONTENT SOURCE HEALTH
   For each connected source (KB article, blog, landing page, uploaded file, public URL), assess:
   - Is the content current? Flag anything older than 12 months without an explicit review date
   - Is it written in language an agent can confidently quote - direct answers, definite instructions, named features - or is it hedged prose (“depending on your situation”, “usually”, “in most cases”) that will produce hallucinated specificity?
   - Are there contradictions between two sources: different answers to the same question, different version numbers, conflicting refund or policy statements?
   - Are any sources high-volume in citations but low-quality in outcome - the agent uses them, but those conversations escalate or get poor satisfaction scores?
   Flag each source as: KEEP / REWRITE / RETIRE / SPLIT.

2. COVERAGE GAPS
   Pull patterns from the agent’s unanswered-question log over the last 90 days. Group questions into themes and classify each theme as:
   - MISSING CONTENT: no source exists for this question
   - UNREACHABLE CONTENT: a source exists, but the agent isn’t finding it (often a phrasing mismatch between how customers ask and how the article is titled)
   - MISFRAMED CONTENT: a source exists and is reachable, but it buries the answer behind context the agent doesn’t cut through
   For each theme with five or more occurrences in 30 days, propose: TITLE, AUDIENCE, structure outline, and the single sentence that must appear in the article verbatim because it is the answer customers actually need.

3. HANDOFF QUALITY
   Review handoff and escalation patterns from recent conversations:
   - OVER-ESCALATION: conversations that escalated where the agent had a usable answer but chose not to use it
   - UNDER-ESCALATION: conversations the agent handled where it should have routed to a human - refund requests, account changes, complaints, anything regulated
   - Trigger phrases firing too aggressively (e.g., any mention of the word “billing” escalates regardless of context)
   - Topics where no handoff trigger exists and probably should (legal, medical, safety, anything brand-sensitive)
   Output a HANDOFF MATRIX with specific trigger phrases, topic classifications, and the agent’s permitted scope per topic.

4. AGENT TONE & RESPONSE STRUCTURE
   Assess whether the agent’s current guidelines reflect how our brand actually communicates:
   - Tone calibration: formal vs casual, response length, use of lists vs prose
   - Default structure: direct answer first, or context first?
   - How the agent introduces itself (or doesn’t)
   - How the agent handles “I don’t know” - confident handoff, hedging, or apologetic stalling?
   - Recovery language when the customer is visibly frustrated
   Propose a guidelines update with concrete before/after examples, not abstract style notes.

5. CREDIT EFFICIENCY
   Customer Agent consumes 100 HubSpot Credits per separate conversation. Identify:
   - Channels with poor deflection rates (high credit spend, low resolution) - candidates for tighter scoping
   - Conversation patterns where the agent loops without resolving (multi-turn dead ends consuming the conversation budget without progress)
   - Use cases where a rule-based chatbot or workflow routing would produce the same outcome at zero credit cost
   - Whether the cost-per-resolved-conversation sits below or above the equivalent human-support cost benchmark
   Recommend channel-level scoping rules and a target deflection rate per channel.

6. KNOWLEDGE VAULT ARCHITECTURE
   With the recent ability to connect the entire HubSpot Knowledge Base as a Vault source, plus inline image indexing (up to 10 per article or page), assess:
   - Whether the agent should consume the whole KB or a curated subset
   - Which file uploads are static and ageing - replace with KB articles that update centrally
   - Which screenshots and diagrams on existing articles carry the actual instructional content and should be confirmed indexed properly
   - Where a dedicated Vault for a specific use case (product onboarding, refund policy, technical troubleshooting) would outperform a single general Vault
   Recommend a Vault structure with named scopes, owners, and review cadences.

Constraints:
- Do NOT recommend connecting new content sources without a review step - every new source must pass a “test 10 representative questions before going live” gate
- Flag any content source touching regulated topics (medical, legal, financial advice, refunds, safety) as REQUIRES LEGAL/COMPLIANCE REVIEW before agent exposure
- Every content-rewrite recommendation must include a concrete before/after example of one paragraph - not just “rewrite this article”
- Recommend the agent runs in PREVIEW MODE on any newly connected source for at least 100 representative test conversations before live deployment to that source
- If conversation volume, deflection rate, unanswered-question patterns, or handoff history is not visible from the current context, state: "SIGNAL MISSING: [what needs checking manually]"
- Do NOT recommend retiring content sources without first confirming the agent has an equivalent source or that the questions covered by the source no longer appear in the conversation log

Output format:

### I. AGENT HEALTH SUMMARY
{3-sentence overview of current Customer Agent reliability, the single biggest gap, and an overall reliability rating: NOT READY / READY WITH RESERVATIONS / READY FOR SCALE}

### II. CONTENT SOURCE INVENTORY & ACTIONS
| Source | Type | Age | Citation Volume | Citation Quality | Action |

### III. COVERAGE GAP REGISTER
| Theme | Volume (30d) | Gap Type | Proposed Article | Owner |

### IV. HANDOFF MATRIX
| Topic | Current Trigger | Recommended Trigger | Permitted Scope |

### V. AGENT GUIDELINES UPDATE
{Tone calibration notes, structural recommendations, sample introductory and recovery responses with concrete before/after}

### VI. CREDIT EFFICIENCY FINDINGS
| Channel | Volume | Deflection % | Credit Spend/Month | Recommendation |

### VII. KNOWLEDGE VAULT STRUCTURE
{Proposed Vault scopes, ownership, review cadence, and indexing confirmations needed}

### VIII. 30 / 60 / 90 DAY TUNING ROADMAP
{Prioritised actions with owner roles: Service Hub Admin / Support Lead / Content Author / Compliance / Finance}

Why this prompt works - and how to adapt it

Most Customer Agent deployments follow a depressingly consistent path. Someone gets excited at a webinar, switches it on, points it at the existing help centre, watches a few demo conversations go well, and declares victory. Two months later the deflection rate has plateaued at something underwhelming, the satisfaction scores are slightly worse than the human-only baseline, the credit invoice is bigger than expected, and nobody can quite explain why. The reason is almost never the agent. It is the configuration the agent was asked to work within - and configuration is what this prompt audits.

A few things to note about how it is constructed:

Knowledge base content is the variable, not the agent. The Customer Agent uses standard LLM reasoning over whatever sources it can access. The only lever a support team actually controls is the quality, currency, and reachability of those sources. Treating the audit as a content review rather than an AI review is what changes the conversation - it puts ownership back where it always belonged.
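The currency part of that review is worth pre-baking before Breeze ever sees the portal. A minimal Python sketch of the prompt’s 12-month rule, assuming you can export your connected sources to a CSV - the file name and column names here are illustrative, not a HubSpot export format:

```python
"""Flag stale content sources, mirroring the prompt's 12-month rule.

Assumes a CSV export of connected sources with hypothetical columns:
title, url, last_updated (ISO date). Adjust names to your own export.
"""
import csv
from datetime import date, timedelta

STALE_AFTER = timedelta(days=365)  # the prompt's 12-month currency threshold

def stale_sources(path: str, today: date | None = None) -> list[dict]:
    today = today or date.today()
    flagged = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            age = today - date.fromisoformat(row["last_updated"])
            if age > STALE_AFTER:
                row["age_days"] = age.days
                flagged.append(row)
    # Oldest first: these are the strongest REWRITE/RETIRE candidates
    return sorted(flagged, key=lambda r: r["age_days"], reverse=True)

if __name__ == "__main__":
    for row in stale_sources("content_sources.csv"):
        print(f'{row["age_days"]:>5}d  {row["title"]}  {row["url"]}')
```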

The unanswered-question log is the goldmine. HubSpot surfaces a list of questions the agent could not confidently answer, and most teams glance at it occasionally and move on. Properly mined, that list is a perfect roadmap of where the content gaps actually are, in language customers actually use. The prompt instructs Breeze to cluster the questions into themes and classify each theme as missing, unreachable, or misframed content - three very different problems with three very different fixes.
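If you want to eyeball the themes yourself before handing the log to Breeze, a rough clustering pass is twenty lines of scikit-learn. The one-question-per-line export file is hypothetical, and note this only groups questions - the missing/unreachable/misframed call still needs the prompt, or a human:

```python
"""Cluster an unanswered-question log into rough themes.

Assumes a plain text export with one customer question per line
(file name hypothetical). Requires scikit-learn.
"""
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_questions(path: str, n_themes: int = 8) -> None:
    questions = [q.strip() for q in open(path, encoding="utf-8") if q.strip()]
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(questions)
    km = KMeans(n_clusters=n_themes, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    terms = vec.get_feature_names_out()

    for theme in range(n_themes):
        members = [q for q, lab in zip(questions, labels) if lab == theme]
        top = km.cluster_centers_[theme].argsort()[-5:][::-1]
        # Five or more occurrences in 30 days is the prompt's bar for a theme
        flag = "PROPOSE ARTICLE" if len(members) >= 5 else "watch"
        print(f"\nTheme {theme}: {len(members)} questions ({flag})")
        print("  keywords:", ", ".join(terms[i] for i in top))
        for q in members[:3]:
            print("  -", q)

cluster_questions("unanswered_questions.txt")
```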

Knowing when to stop is half the job. A Customer Agent that confidently resolves a refund dispute it had no business touching, or a complaint that needed a human voice, will destroy a support team’s faith in the feature faster than any number of correct answers will rebuild it. The handoff matrix exists to make every sensitive topic an explicit decision: where the agent’s permission ends, what trigger phrases must route to a human, and which topics fire too eagerly today and shut down conversations the agent should have handled. It is not glamorous work, but it is the work that keeps the rest of the agent honest.

Credits turn discipline into a number. A hundred credits per conversation is a useful forcing function. Once the cost-per-resolved-conversation is calculated, decisions about which channels to scope tightly and which use cases to leave with a rule-based bot make themselves. The prompt asks for the maths explicitly, because the maths is what makes the rest of the recommendations defensible to a finance team.
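The arithmetic fits in a scratch script. In the sketch below, only the 100-credits-per-conversation figure comes from above; the credit price, the human-cost benchmark, and the channel numbers are invented placeholders to swap for your own:

```python
"""Cost-per-resolved-conversation, per channel.

Everything except CREDITS_PER_CONVERSATION is an illustrative
assumption - substitute real figures before showing this to finance.
"""
CREDITS_PER_CONVERSATION = 100
PRICE_PER_CREDIT = 0.01        # assumption: check your HubSpot contract
HUMAN_COST_PER_TICKET = 6.50   # assumption: loaded cost of a human touch

channels = {
    # channel: (monthly conversations, deflection rate)
    "live_chat": (2_400, 0.58),
    "email":     (1_100, 0.12),
    "whatsapp":  (600, 0.44),
}

for name, (volume, deflection) in channels.items():
    spend = volume * CREDITS_PER_CONVERSATION * PRICE_PER_CREDIT
    cost_per_resolution = spend / (volume * deflection)
    verdict = "keep" if cost_per_resolution < HUMAN_COST_PER_TICKET else "rescope"
    print(f"{name:9} ${spend:7,.0f}/mo  "
          f"${cost_per_resolution:5.2f}/resolution  -> {verdict}")
```

On these invented numbers, email lands at $8.33 per resolved conversation - above the human benchmark, which is exactly the kind of channel the prompt flags for tighter scoping or a rule-based fallback.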

“SIGNAL MISSING” is the anti-confabulation clause. Breeze can read the agent’s configuration and recent conversations, but it cannot see things like which articles your compliance team has flagged off-limits, or the brand voice document that lives in a Google Drive nobody has shared with the agent. When the audit hits a wall, the prompt forces Breeze to say so rather than guess plausibly. A flagged unknown beats a confident wrong answer - which, conveniently, is exactly the lesson the Customer Agent itself needs to learn.

Preview mode before live deployment is the safety net. The constraint that any newly connected content source runs in preview mode for 100 representative test conversations before going live to customers is the single rule that separates a careful rollout from a public failure. It costs a few days of effort, prevents the kind of incident that ends up in a board meeting, and forces a habit of testing that will save the team many times over the lifetime of the agent.
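Scoring that gate is mechanical once the preview answers are captured somewhere inspectable. A minimal sketch that checks each answer for the verbatim phrase a correct answer must contain - the CSV columns and the 95% pass bar are assumptions to adapt, and nothing here calls HubSpot:

```python
"""Score a preview-mode run against a representative question set.

You collect each preview answer into a CSV alongside the question and
the phrase a correct answer must contain (the 'single sentence that
must appear verbatim' from the audit).
"""
import csv

PASS_RATE = 0.95  # assumption: pick your own go-live bar

def gate(path: str) -> bool:
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    misses = [
        r for r in rows
        if r["must_contain"].lower() not in r["agent_answer"].lower()
    ]
    rate = 1 - len(misses) / len(rows)
    print(f"{len(rows) - len(misses)}/{len(rows)} answers passed ({rate:.0%}) "
          f"-> gate {'PASSED' if rate >= PASS_RATE else 'FAILED'}")
    for r in misses:
        print("  MISS:", r["question"])
    return rate >= PASS_RATE

gate("preview_run.csv")  # columns: question, must_contain, agent_answer
```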

Adapting it for your portal:

Regulated industry? Add this to the Context:

“Our industry is regulated by [body]. The following topics must always escalate to a human and the agent must never offer guidance on them: [list]. Content touching these topics must be flagged for compliance review before any agent exposure.”

Breeze will then weigh the handoff matrix and content audit heavily towards safety, and produce a compliance review pass before any go-live recommendation.

Operating in multiple languages? Specify the languages explicitly in Context, and add:

“Audit content source coverage per language, and flag any language where the agent is currently answering from translated content of mixed quality.”

Multi-lingual Customer Agent rollouts almost always have a hidden quality cliff between the primary language and everything else - the audit will surface it.

Most of your real content sits outside HubSpot? If your source of truth is Notion, Confluence, Zendesk, or SharePoint, name those systems in Context and add:

“Our authoritative content lives in [system]. Recommend whether to mirror critical articles into the HubSpot Knowledge Base for agent access, or to maintain them via public URLs with a refresh cadence, including the trade-offs of each approach.”

The agent only natively consumes HubSpot-hosted content, file uploads, and public URLs, so the trade-off is real and worth being explicit about.

Just switched the agent on? If you are less than 30 days into deployment, you won’t have enough conversation history for a meaningful gap analysis. Tell Breeze that, and add:

“Treat this as a pre-flight audit. Focus on content source health, handoff design, and tone calibration, and propose the conversation-monitoring rhythm needed to make the next quarterly run of this audit data-rich.”

The output shifts from remediation to readiness.

Low conversation volume? If you are running fewer than 100 conversations a month, the credit-efficiency section and the gap-mining will both struggle for signal. Add:

“Conversation volume is low. Weight the audit towards content source quality and tone calibration, and propose a synthetic-question test set we can use to probe agent quality in the absence of statistical volume.”

The synthetic test set is genuinely useful - small support teams should be running one anyway.
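One cheap way to seed that set is to push KB article titles through a few customer-voice templates. A sketch, assuming titles are stored as verb phrases (“reset your password”), one per line in a plain text file - the templates and file name are illustrative:

```python
"""Seed a synthetic probe set from KB article titles.

Assumes titles are stored as verb phrases ("reset your password"),
one per line. Extend the templates with your customers' phrasing.
"""
TEMPLATES = [
    "how do i {t}?",
    "i can't {t}, what am i doing wrong?",
    "is it possible to {t} on the mobile app?",
    "why isn't {t} working for me?",
]

def synthetic_questions(titles_path: str) -> list[str]:
    with open(titles_path, encoding="utf-8") as f:
        titles = [line.strip().lower() for line in f if line.strip()]
    return [tpl.format(t=title) for title in titles for tpl in TEMPLATES]

for q in synthetic_questions("article_titles.txt")[:8]:
    print(q)
```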

Quarterly cadence? Save the output and re-run the prompt 90 days later with this addition:

“Compare against the output from [DATE] and report which content sources have been remediated, which new gaps have emerged in the unanswered-question log, how the deflection rate has shifted, and which handoff trigger changes had the largest impact on satisfaction scores.”

That turns the prompt into a rolling agent-tuning loop - and a tidy quarterly board update for the executive who is going to ask, sooner than you think, how the AI investment is performing.

Beyond the prompt:

The Knowledge Vault Tuning Plan tells you what to fix. The sequence in which you fix it is where most teams come unstuck.

Start with the handoff matrix. Before any content gets rewritten, the agent needs explicit, narrow permission. Any sensitive topic that did not previously have a handoff trigger gets one immediately. Any over-firing trigger gets tightened. This is the cheapest, fastest, lowest-risk improvement available, and it earns the trust to do the bigger work that follows.
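It helps to hold the matrix as data rather than prose, so “where the agent’s permission ends” stays inspectable. A sketch of the shape - the topics, patterns, and scope notes are illustrative, and this runs nowhere near HubSpot:

```python
"""A handoff matrix as data, with context-aware triggers.

The point: a trigger is a pattern plus an explicit scope statement,
not a bare keyword. All entries below are illustrative.
"""
import re

HANDOFF_MATRIX = [
    # (topic, trigger pattern, agent's permitted scope)
    ("refunds", re.compile(r"\b(refunds?|money back|chargeback)\b", re.I),
     "explain the policy; never approve or promise a refund"),
    ("billing", re.compile(r"\b(wrong|double|over)[- ]?(charged?|billed)\b", re.I),
     "invoice-location questions only; disputes go to a human"),
    ("legal", re.compile(r"\b(lawyer|legal action|gdpr request)\b", re.I),
     "none - escalate immediately"),
]

def route(message: str) -> str:
    for topic, pattern, scope in HANDOFF_MATRIX:
        if pattern.search(message):
            return f"ESCALATE [{topic}] - agent scope: {scope}"
    return "AGENT HANDLES"

print(route("I was double charged this month"))      # escalates: billing
print(route("Where can I see my billing history?"))  # stays with the agent
```

The second message contains the word “billing” yet stays with the agent, because the trigger demands an actual dispute - exactly the tightening the audit recommends.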

Then move to high-cited, low-quality content. Find the articles the agent leans on most often whose conversations end badly, and rewrite those first. A ten per cent improvement in the answers the agent gives most often beats a fifty per cent improvement in something it never references. Rewrite for the agent: direct answers in the first sentence, named features rather than vague descriptions, no hedging language that gives the model permission to invent specificity.
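Hedging language is also greppable. A small linter to point at exported article text before the rewrite pass - the phrase list is a starting assumption; extend it with your own weasel words. Run it as, say, `python hedge_lint.py exported_article.txt`:

```python
"""Flag hedged prose the agent will convert into invented specificity.

Pass one or more plain-text article exports on the command line.
"""
import re
import sys

HEDGES = [
    "depending on your situation", "usually", "in most cases",
    "it may be possible", "generally", "might be able to",
]
PATTERN = re.compile("|".join(re.escape(h) for h in HEDGES), re.I)

def lint(path: str) -> int:
    hits = 0
    for n, line in enumerate(open(path, encoding="utf-8"), start=1):
        for m in PATTERN.finditer(line):
            hits += 1
            print(f"{path}:{n}: hedge '{m.group(0)}' -> state the actual rule")
    return hits

if __name__ == "__main__":
    total = sum(lint(p) for p in sys.argv[1:])
    print(f"{total} hedges found")
```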

Next, work the coverage gap register. Take the top five themes from the unanswered-question log and commission articles for them, written to the structure Breeze proposed. Connect each new article in preview mode, run the 100-conversation test, then promote to live. Resist the temptation to write everything at once - a steady cadence of two or three new articles per fortnight beats a quarterly content drop that nobody reviews.

Finally, set the review cadence. Vault scopes get reviewed quarterly. Top-cited articles get reviewed every six months. Unanswered-question themes are scanned monthly. The agent runs in preview mode for any new content source until it has passed its hundred-conversation test. Boring rhythms produce reliable agents.

And if you are heading towards a broader Breeze deployment - Prospecting Agent, Customer Agent on more channels, Breeze Studio agents for niche use cases - the discipline this prompt builds carries forward. A team that has audited its Customer Agent properly knows how to audit any agent. The skill is portable; the work is not glamorous; the customers, and the credit invoice, will notice.

A reliable Customer Agent is not a product feature. It is a content programme with an AI front end.
