Summary
Between February 12 and February 19, 2026, some customers experienced delays in the enrichment of conversation insights, including category assignment, sentiment analysis, and conversation summaries. Core platform functionality — conversation handling and agent responses — was not affected at any point. This document describes what happened, how it was resolved, and the steps we are taking to prevent recurrence.
What Happened
Maven AGI processes conversation insights asynchronously in the background, enriching conversations with categories, sentiment scores, quality assessments, and summaries. This enrichment is separate from and independent of real-time conversation handling.
On February 12, a spike in conversation volume created unusually high demand on our background processing infrastructure. Under these conditions, one account's category assignment jobs began consistently failing — that account had accumulated an exceptionally large historical category set, larger than any we had previously encountered in production, and our processing logic was not designed to handle data at that scale efficiently. The resulting failures consumed a disproportionate share of processing capacity, causing delays to spread to other accounts as well.
How It Was Resolved
Resolution proceeded in phases as the root cause was progressively isolated.
Immediate mitigation (Day 1): Category assignment was selectively paused for the affected account, which stopped the backlog from growing and allowed insights processing to resume normally for other accounts. We also increased processing capacity to help clear the accumulated backlog.
Stabilization (Days 1–3): We deployed a series of processing-efficiency and infrastructure-utilization changes that meaningfully improved pipeline stability and throughput, and confirmed stable operation heading into the weekend.
Root cause fix (Days 5–6): We identified and resolved the underlying scaling limitation in the category assignment process, ensuring that the scope of work now scales predictably regardless of an account's historical data volume. Category assignment for the affected account was re-enabled and confirmed working correctly, with processing times returning to normal levels.
The incident was formally resolved on February 19, following confirmation of stable operation through a full business-day traffic cycle.
Impact
Conversation insights — including categories, sentiment, and summaries — were delayed for some accounts during the incident period. For one account, category assignment was paused for the majority of the incident window as a mitigation measure; a backfill is in progress to restore complete historical coverage. Core platform functionality, including conversation handling and agent responses, was unaffected throughout.
Preventive Actions
Algorithmic improvements. The category assignment process has been updated to ensure work scales predictably with data volume, regardless of account size. We are auditing other areas of the insights pipeline to identify and address similar patterns proactively.
Workload isolation. We are introducing controls to better isolate different categories of background processing, so that unusually high-volume activity for one account cannot degrade the experience for others.
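As a hedged sketch of what such isolation can look like (the names and limits here are assumptions, not the actual design), a per-account cap on in-flight background jobs prevents any single tenant from monopolizing shared workers:

```python
from collections import defaultdict

class TenantConcurrencyLimiter:
    """Caps in-flight background jobs per account so one tenant's backlog
    cannot starve the shared worker pool. Illustrative sketch only."""

    def __init__(self, per_account_limit: int):
        self.limit = per_account_limit
        self.in_flight = defaultdict(int)

    def try_acquire(self, account_id: str) -> bool:
        # Over-limit jobs simply stay queued; other accounts keep processing.
        if self.in_flight[account_id] >= self.limit:
            return False
        self.in_flight[account_id] += 1
        return True

    def release(self, account_id: str) -> None:
        self.in_flight[account_id] -= 1
```

Under this design, a flood of failing jobs from one account occupies at most its own quota of workers, and the rest of the pool continues serving other tenants.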
Improved monitoring and alerting. We are adding automated alerting on processing queue depth and job age so that backlog buildup is detected promptly. We are also expanding infrastructure health monitoring to cover signals this incident identified as gaps.
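A minimal version of such a backlog check, with thresholds that are placeholders rather than production values, watches both queue depth and the age of the oldest unprocessed job, since either signal alone can miss a stall:

```python
def backlog_alerts(depth: int, oldest_enqueued_at: float, now: float,
                   max_depth: int = 10_000, max_age_s: float = 900.0) -> list[str]:
    """Return alert names for a backlog snapshot (placeholder thresholds).

    Depth catches fast buildup; oldest-job age catches a shallow queue
    that has stopped draining.
    """
    alerts = []
    if depth > max_depth:
        alerts.append("queue_depth_high")
    if now - oldest_enqueued_at > max_age_s:
        alerts.append("oldest_job_age_high")
    return alerts
```

Pairing the two signals matters: a retry loop on a small set of stuck jobs can keep depth low while the oldest job ages indefinitely, which a depth-only alert would never fire on.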
Proactive data scale management. We are implementing monitoring of per-account data characteristics so that accounts approaching volumes that warrant configuration adjustments are flagged before system performance is affected.
Faster recovery tooling. We are building automated tooling to restore insights data for any account affected by a temporary processing pause, reducing recovery time in future incidents.
Closing Note
This incident surfaced a scaling limitation that only became apparent at a data volume not previously seen in production. Mitigations were in place within hours of detection, and sustained investigation over the following days led to a complete root cause fix. The changes described above address both the specific issue and the broader systemic gaps this incident identified.