You are a Senior Software Engineer on DAT's Broker Tech team, which powers the Convoy Platform — the integration layer between DAT's freight matching network and the Transportation Management Systems (TMS) that brokers use every day.
The Convoy Platform lets brokers post loads, receive carrier matches, and execute shipments without leaving their TMS. Brokers connect via API and webhook — when a load changes status (matched, accepted, in-transit, delivered), their TMS receives a webhook event and updates accordingly.
This morning you inherit a production incident that was partially resolved overnight by the on-call engineer but has left the system in a degraded state. Here is the incident summary in your queue:
Incident Report | INC-4471 | SEV-2 — Partially Mitigated
Reported: 02:14 AM | Owner: Broker Tech | Status: Degraded — monitoring
What happened:
A surge in load activity (~3x normal volume) caused our shipment-events Kafka consumer group to fall behind. Consumer lag hit 42,000 messages at peak. Three downstream effects occurred:
On-call mitigation: scaled consumer instances from 3 to 12 to drain the backlog. Consumer lag is now at 800 and falling. Webhook delivery has stabilized, but the duplicate data is still present in brokers' TMSs. Root cause has not yet been identified.
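For readers less familiar with the metric, consumer lag is the gap between the newest offset written to each partition and the offset the consumer group has committed, summed over partitions. The sketch below is illustrative only: the `PartitionOffsets` shape, both function names, and all sample numbers are assumptions, not taken from the incident data or from any Kafka client API.

```typescript
// Hypothetical per-partition snapshot, e.g. as read from a monitoring tool.
interface PartitionOffsets {
  partition: number;
  committedOffset: number; // last offset the consumer group has committed
  logEndOffset: number;    // newest offset written to the partition
}

// Total lag for a consumer group: sum of (log end - committed) per partition.
function totalLag(partitions: PartitionOffsets[]): number {
  return partitions.reduce(
    (sum, p) => sum + Math.max(0, p.logEndOffset - p.committedOffset),
    0,
  );
}

// Rough time to drain a backlog. Returns null when consumers are not
// outpacing producers, in which case the backlog never drains.
function estimateDrainSeconds(
  lag: number,
  consumeRatePerSec: number,
  produceRatePerSec: number,
): number | null {
  const netRate = consumeRatePerSec - produceRatePerSec;
  return netRate > 0 ? lag / netRate : null;
}

// Illustrative numbers only.
const snapshot: PartitionOffsets[] = [
  { partition: 0, committedOffset: 100, logEndOffset: 500 },
  { partition: 1, committedOffset: 200, logEndOffset: 600 },
];
const lag = totalLag(snapshot);
const drain = estimateDrainSeconds(lag, 500, 100);
```

A falling lag figure only says that consumers are currently faster than producers; it says nothing about whether the messages being drained are processed correctly.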
What is NOT resolved:
You also have access to the relevant section of the webhook delivery service. Read the code carefully — it may contain issues beyond the primary incident.
// webhook-delivery.service.ts
// Shipment status change consumer — processes Kafka events and delivers webhooks
export class WebhookDeliveryService {
  constructor(
    private readonly http: HttpClient,
    private readonly db: DatabaseService,
    private readonly logger: Logger,
  ) {}

  async processShipmentEvent(event: KafkaMessage): Promise<void> {
    const payload = JSON.parse(event.value.toString());
    const brokers = await this.db.query(
      `SELECT * FROM broker_subscriptions WHERE load_id = ${payload.loadId}`
    );
    for (const broker of brokers) {
      try {
        await this.http.post(broker.webhookUrl, payload, { timeout: 5000 });
        this.logger.log(`Webhook delivered to broker ${broker.id}`);
      } catch (err) {
        this.logger.log(`Webhook failed, retrying: ${err}`);
        await this.processShipmentEvent(event); // retry
      }
    }
  }
}
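As one possible illustration of the patterns this kind of listing usually calls for, the sketch below delivers a single event to a single broker with a per-delivery idempotency key and a bounded retry loop, instead of re-running the whole event on failure. It is a minimal sketch under stated assumptions, not a reference solution: `ShipmentEvent.eventId`, `DeliveryStore`, `InMemoryDeliveryStore`, `PostFn`, `deliverToBroker`, and the `Idempotency-Key` header are all hypothetical names introduced here. Only `loadId`, `webhookUrl`, and `broker.id` come from the listing above.

```typescript
// Hypothetical shapes for illustration.
interface ShipmentEvent { eventId: string; loadId: string; status: string; }
interface Broker { id: string; webhookUrl: string; }

// Hypothetical HTTP function so the sketch does not depend on a real client.
type PostFn = (url: string, body: unknown, headers: Record<string, string>) => Promise<void>;

// Records which (event, broker) deliveries have been claimed.
interface DeliveryStore {
  claim(key: string): Promise<boolean>; // true only for the first caller
  release(key: string): Promise<void>;  // allow a later retry after failure
}

// In-memory stand-in. With several consumer instances the store would have
// to be shared, e.g. a table with a unique constraint on the key.
class InMemoryDeliveryStore implements DeliveryStore {
  private readonly keys = new Set<string>();
  async claim(key: string): Promise<boolean> {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
  async release(key: string): Promise<void> {
    this.keys.delete(key);
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

type DeliveryResult = "delivered" | "skipped" | "failed";

async function deliverToBroker(
  event: ShipmentEvent,
  broker: Broker,
  post: PostFn,
  store: DeliveryStore,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<DeliveryResult> {
  const key = `${event.eventId}:${broker.id}`;
  // Skip if this event was already claimed for this broker.
  if (!(await store.claim(key))) return "skipped";

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // The receiver can also use the key to drop duplicates on its side.
      await post(broker.webhookUrl, event, { "Idempotency-Key": key });
      return "delivered";
    } catch {
      // Exponential backoff between attempts; no delay after the last one.
      if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  // Give up for now and release the claim so the event can be retried later.
  await store.release(key);
  return "failed";
}
```

Claiming the key before sending trades duplicates for a different risk: if the process dies between the claim and the POST, that delivery is lost unless something else re-drives it. Whether that trade is acceptable depends on what the receiving TMS does with repeated events.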
Your Task — Three Deliverables
Produce a revised version of the webhook delivery service that addresses the core production issues identified in the incident. Your implementation should be a working TypeScript/Node.js file — not pseudocode, not a diagram.
Your implementation must address:
Scope note: This is a proof-of-concept implementation, not a full production rewrite. A focused, working solution that demonstrates the right patterns is more valuable than a comprehensive but skeletal one.
Section A — Written Analysis (300–500 words)
Address all four of the following in your written analysis:
Section B — Production Runbook + Reasoning Question
Section B has two parts. Complete both.
Part B1 — Incident Runbook
Write a runbook for the next on-call engineer who encounters consumer lag on the shipment-events consumer group. The runbook should cover:
Part B2 — Required Reasoning Question (answer without AI assistance)
Describe a scenario where an AI coding assistant would give you a plausible but incorrect answer for this type of problem — specifically, idempotency in a message-driven webhook delivery system. What would the incorrect output look like, and what would you check to identify the error before acting on it?
Answer this question in your own words without using an AI tool. We want to understand how you reason about AI failure modes — not how AI describes them.
Section C — AI Usage Log (Mandatory)
This is not a trick. We want to see how you work with AI — not whether you used it.
In a short section of your README, document your AI collaboration process. For each significant interaction with an AI tool, briefly note:
Three interactions documented is sufficient. The log does not need to be exhaustive.
Record your walkthrough as an MP4 or MOV file and upload it directly on the Provn platform as a separate file.
Structure your video to cover:
Speak naturally. Communication is assessed on clarity of technical reasoning and logical structure — not verbal polish, accent, or filler words.
Honor all four. AI tools will typically ignore them. Evaluators will check each one.
Your submission is evaluated across five dimensions. Weights reflect what DAT's Broker Tech team cares most about.
We expect you to use AI tools. We evaluate how you use them — not whether you use them. Evidence of iteration, redirection, and critical evaluation scores higher than a polished output with no process documentation.
The single highest-signal indicator: your video answer to the mandatory AI question. If you cannot name a specific moment where you redirected AI output, evaluators will assume you did not.
Mandatory AI question (include in your video):
"Walk me through one moment where you disagreed with, pushed back on, or redirected what the AI gave you — and what you did instead. Name the specific moment. Explain what the AI produced that didn't meet the bar, what you did differently, and why."
Note: Part B2 of your README must be completed without AI assistance. This is not about AI detection — it is about understanding how you reason through AI failure modes independently.
Before you submit, confirm:
Upload each deliverable as a separate file directly on the Provn platform: your implementation file(s), your README document, and your video walkthrough. Do not bundle files into a ZIP. Do not link to external repositories or video platforms.
Diagnose a production incident involving Kafka consumer lag and webhook delivery failures in a distributed system
Implement idempotency controls and structured error handling in a message-driven integration architecture
Write a production runbook that enables effective incident response under real on-call conditions
Reason independently about AI coding assistant failure modes in distributed systems contexts
Communicate technical trade-offs and architecture decisions clearly to both engineering and non-technical stakeholders