DAT Freight & Analytics

DAT Freight & Analytics operates North America's largest on-demand freight marketplace, specializing in digital freight matching, rate analytics, and supply chain management for brokers, carriers, and shippers. It facilitates over 700,000 daily load posts and powers the DAT One platform for booking loads and analyzing trends.

@dat

Open Opportunities

2 opportunities available

Product Manager

DAT Freight & Analytics

Full-time

Hybrid

Seattle, WA

$135,000 – $188,000 + target bonus

Apr 6

Senior Software Engineer – Broker Tech

DAT Freight & Analytics

Full-time

Hybrid

Seattle, WA

$155,000 – $211,000 base + 10% target bonus

Apr 6

Available Challenges

2 challenges available

Senior Software Engineer — DAT Freight & Analytics, Broker Tech

@dat•Software Engineer

The Scenario

You are a Senior Software Engineer on DAT's Broker Tech team, which powers the Convoy Platform: the integration layer between DAT's freight matching network and the Transportation Management Systems (TMS) that brokers use every day. The Convoy Platform lets brokers post loads, receive carrier matches, and execute shipments without leaving their TMS. Brokers connect via API and webhook: when a load changes status (matched, accepted, in-transit, delivered), their TMS receives a webhook event and updates accordingly.

This morning you inherit a production incident that was partially resolved overnight by the on-call engineer but has left the system in a degraded state. Here is the incident summary in your queue:

Incident Report | INC-4471 | SEV-2, Partially Mitigated
Reported: 02:14 AM | Owner: Broker Tech | Status: Degraded, monitoring

What happened: A surge in load activity (~3x normal volume) caused our shipment-events Kafka consumer group to fall behind. Consumer lag hit 42,000 messages at peak. Three downstream effects occurred:

1. Webhook delivery to broker TMS endpoints began timing out and retrying; 847 webhook events were delivered more than once to 34 different brokers.
2. Because our webhook delivery service retried without idempotency checking, some brokers' TMS systems processed the same status-change event multiple times, resulting in duplicate shipment records in at least 12 confirmed broker accounts.
3. Two brokers called in to report that carrier match notifications showed conflicting statuses: a load marked 'matched' in Convoy and 'available' in their TMS simultaneously.

What on-call did: Scaled up consumer instances from 3 to 12 to drain the backlog. Consumer lag is now at 800 and falling. Webhook delivery has stabilized but the duplicate data is still in broker TMS systems. Root cause not yet identified.

What is NOT resolved:

- Duplicate shipment records in 12+ broker accounts. We do not know the full scope.
- Root cause of the consumer lag spike is unconfirmed.
- No idempotency controls exist on the webhook delivery path; this will happen again.

Existing Code

You also have access to the relevant section of the webhook delivery service. Read the code carefully; it may contain issues beyond the primary incident.

```typescript
// webhook-delivery.service.ts
// Shipment status change consumer: processes Kafka events and delivers webhooks
export class WebhookDeliveryService {
  constructor(
    private readonly http: HttpClient,
    private readonly db: DatabaseService,
    private readonly logger: Logger,
  ) {}

  async processShipmentEvent(event: KafkaMessage): Promise<void> {
    const payload = JSON.parse(event.value.toString());
    const brokers = await this.db.query(
      `SELECT * FROM brokersubscriptions WHERE loadid = ${payload.loadId}`
    );
    for (const broker of brokers) {
      try {
        await this.http.post(broker.webhookUrl, payload, { timeout: 5000 });
        this.logger.log(`Webhook delivered to broker ${broker.id}`);
      } catch (err) {
        this.logger.log(`Webhook failed, retrying: ${err}`);
        await this.processShipmentEvent(event); // retry
      }
    }
  }
}
```

Your Task: Three Deliverables

Deliverable 1: Revised Implementation

Produce a revised version of the webhook delivery service that addresses the core production issues identified in the incident. Your implementation should be a working TypeScript/Node.js file (not pseudocode, not a diagram).

Your implementation must address:

- Idempotency: webhook events should not produce duplicate side effects when delivered more than once
- Error handling: distinguish between retryable and non-retryable failures; do not retry infinitely
- Observability: structured logging with enough context that an on-call engineer could diagnose a repeat incident from logs alone
- At least one additional issue you identify in the existing code beyond the primary incident

Scope note: This is a proof-of-concept implementation, not a full production rewrite. A focused, working solution that demonstrates the right patterns is more valuable than a comprehensive but skeletal one.

Deliverable 2: README.md (Sections A, B, and C)

Section A: Written Analysis (300–500 words)

Address all four of the following in your written analysis:

- Root cause: what caused the consumer lag spike, and what conditions allowed the duplicate-delivery problem to propagate as far as it did?
- Design gap: why does the existing webhook delivery architecture not protect against this class of failure? What specific change closes the gap?
- Consistency tradeoff: the duplicate shipment records are now in broker TMS systems. Describe the trade-off between (a) attempting an automated cleanup and (b) leaving cleanup to brokers, and which you would recommend given DAT's position in the broker/carrier relationship.
- Scope decision: name one thing you explicitly chose NOT to include in your implementation and explain why. What would you tackle in a follow-up PR?

Section B: Production Runbook + Reasoning Question

Section B has two parts. Complete both.

Part B1: Incident Runbook

Write a runbook for the next on-call engineer who encounters consumer lag on the shipment-events consumer group. The runbook should cover:

- How to confirm the issue (which metrics or logs to check first)
- Immediate mitigation steps (in the order they should be executed)
- How to confirm the incident is resolved, not just mitigated
- One question the on-call engineer should answer before closing the incident to prevent recurrence

Part B2: Required Reasoning Question (answer without AI assistance)

Describe a scenario where an AI coding assistant would give you a plausible but incorrect answer for this type of problem, specifically idempotency in a message-driven webhook delivery system. What would the incorrect output look like, and what would you check to identify the error before acting on it?

Answer this question in your own words without using an AI tool. We want to understand how you reason about AI failure modes, not how AI describes them.

Section C: AI Usage Log (Mandatory)

This is not a trick. We want to see how you work with AI, not whether you used it. In a short section of your README, document your AI collaboration process. For each significant interaction with an AI tool, briefly note:

- What you asked the AI to help with
- What it gave you
- What you kept, changed, or rejected, and why

Three interactions documented is sufficient. The log does not need to be exhaustive.

Deliverable 3: Video Walkthrough (8–10 minutes)

Record your walkthrough as an MP4 or MOV file and upload it directly on the Provn platform as a separate file. Structure your video to cover:

- Summary (60 seconds): the incident, your diagnosis, and your recommended fix
- Code walkthrough (3–4 minutes): walk through your revised implementation; explain the key decisions, not just what the code does
- Runbook walkthrough (1–2 minutes): walk through your Part B1 runbook. How would you actually use this at 2am?
- Mandatory AI question (1–2 minutes): see the AI Usage Guidance section below
- Reflection (30–60 seconds): what would you tackle next, and what trade-off are you least confident in?

Speak naturally. Communication is assessed on clarity of technical reasoning and logical structure, not verbal polish, accent, or filler words.

Constraints

Honor all four. AI tools will typically ignore them. Evaluators will check each one.

- Stack constraint: your implementation must use TypeScript and Node.js, the existing DAT Broker Tech stack. Do not introduce a different runtime, language, or framework.
- Infrastructure constraint: idempotency must be implemented using a persistent store (e.g. a database or Redis-style cache), not in-memory state that would not survive a pod restart in Kubernetes.
- Organizational constraint: you cannot change the Kafka topic structure or partition schema. The consumer group configuration is managed by a separate platform team and any changes require a two-week change request. Your fix must work within the current consumer group design.
- Ownership constraint: your on-call team will inherit whatever you build. Write your runbook and code comments for the engineer who gets paged at 2am, not the one who built it.

Evaluation Criteria

Your submission is evaluated across five dimensions. Weights reflect what DAT's Broker Tech team cares most about.

- Systems Design & Technical Judgment (30%): Identifies the root cause correctly, explains the design gap that allowed the incident to propagate, and proposes an architecture fix with explicit trade-off reasoning, not just a patch.
- Production Code Quality & Engineering Craft (25%): TypeScript implementation is production-intentioned: meaningful types, structured logging with context, explicit error handling that distinguishes retryable from non-retryable failures, and at least one meaningful test.
- Message-Driven & Integration Architecture (20%): Demonstrates working knowledge of Kafka delivery semantics and idempotency design, not just that they've heard of the concepts. Webhook contract design reflects the reality of broker TMS integration failures.
- Communication & Technical Leadership (10%): Written analysis and runbook are structured for handoff: a new team member could act on them without asking clarifying questions. Trade-off reasoning is legible to a non-engineer.
- AI Fluency (15%): Evidence of directing AI with domain-specific constraints, critical evaluation of AI output, and iteration. The AI Usage Log and video answer to the mandatory question are the primary evidence sources.

AI Usage Guidance

We expect you to use AI tools. We evaluate how you use them, not whether you use them. Evidence of iteration, redirection, and critical evaluation scores higher than a polished output with no process documentation.

The single highest-signal indicator: your video answer to the mandatory AI question. If you cannot name a specific moment where you redirected AI output, evaluators will assume you did not.

Mandatory AI question (include in your video): "Walk me through one moment where you disagreed with, pushed back on, or redirected what the AI gave you, and what you did instead. Name the specific moment. Explain what the AI produced that didn't meet the bar, what you did differently, and why."

Note: Part B2 of your README must be completed without AI assistance. This is not about AI detection; it is about understanding how you reason through AI failure modes independently.

Submission Checklist

Before you submit, confirm:

- Implementation file(s): TypeScript/Node.js, uploaded as a separate file
- README.md: includes Section A (written analysis), Section B (Part B1 runbook + Part B2 reasoning question), and Section C (AI Usage Log)
- Video walkthrough: 8–10 minutes, MP4 or MOV, uploaded as a separate file
- All four constraints honored: check each one before submitting
- Part B2 answered in your own words, without AI assistance

Upload each deliverable as a separate file directly on the Provn platform: your implementation file(s), your README document, and your video walkthrough. Do not bundle files into a ZIP. Do not link to external repositories or video platforms.
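
For readers unfamiliar with the patterns Deliverable 1 asks for, here is a rough sketch of one way idempotent delivery and bounded, classified retries might fit together. It is an illustration under assumptions, not DAT's implementation and not a reference answer. Every name in it (`IdempotencyStore`, `ShipmentEvent`, `eventId`, `deliverOnce`, `InMemoryStore`) is hypothetical, as is the assumption that each event carries a unique `eventId`. The in-memory store exists only so the example runs; it would not satisfy the infrastructure constraint, which calls for a persistent store.

```typescript
// Hypothetical event shape; the brief does not specify the payload schema.
interface ShipmentEvent {
  eventId: string; // assumed unique per status change
  loadId: string;
  status: string;
}

// Hypothetical persistence boundary. A Redis-backed version could implement
// claim() with SET key value NX EX ttlSeconds, which only sets a missing key.
interface IdempotencyStore {
  claim(key: string, ttlSeconds: number): Promise<boolean>; // true for the first claimant only
  release(key: string): Promise<void>;
}

type Sender = (url: string, body: ShipmentEvent) => Promise<{ status: number }>;
type Outcome = "delivered" | "duplicate" | "rejected" | "exhausted";

// 408, 429 and 5xx are treated as transient; every other non-2xx status is
// treated as permanent, because resending the same request will not help.
function isRetryable(status: number): boolean {
  return status === 408 || status === 429 || status >= 500;
}

// Exponential backoff with a cap: 200, 400, 800, ... up to capMs.
function backoffMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}

async function deliverOnce(
  store: IdempotencyStore,
  send: Sender,
  brokerId: string,
  url: string,
  event: ShipmentEvent,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<Outcome> {
  // One key per (event, broker) pair, so a redelivered Kafka message is a no-op.
  const key = `webhook:${event.eventId}:${brokerId}`;
  if (!(await store.claim(key, 86400))) return "duplicate";

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let status = 0; // 0 stands for a network error or timeout
    try {
      status = (await send(url, event)).status;
    } catch {
      status = 0;
    }
    if (status >= 200 && status < 300) return "delivered";
    // Permanent failure: keep the claim so redelivery does not resend it.
    if (status !== 0 && !isRetryable(status)) return "rejected";
    if (attempt < maxAttempts) await sleep(backoffMs(attempt));
  }

  // Retries exhausted: release the claim so a later replay can try again.
  await store.release(key);
  return "exhausted";
}

// Demo-only store. NOT persistent; it would be lost on a pod restart.
class InMemoryStore implements IdempotencyStore {
  private keys = new Set<string>();
  async claim(key: string): Promise<boolean> {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
  async release(key: string): Promise<void> {
    this.keys.delete(key);
  }
}
```

Claiming the key before sending favors avoiding duplicates over guaranteeing delivery: a crash between the claim and the send would drop that webhook until the key expires. Releasing the key after exhausted retries has the opposite risk, since a request that timed out may still have been processed by the broker. Which side to favor is exactly the kind of trade-off the written analysis asks you to reason about.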

Product Management - Outgo Growth Brief

@dat•Product Manager

The Scenario

You are three months into the Product Manager role at Outgo, DAT's fintech product unit serving independent carriers across North America. Outgo offers invoicing, factoring, and a banking product. It operates as a startup (your team moves fast, decisions are yours to make), but you have a significant structural advantage: DAT's carrier network, which processed 400 million freight postings last year and holds $150 billion in annual shipment transaction data.

Your first 90 days have surfaced three things:

1. Invoicing adoption among newly onboarded carriers drops sharply after the first invoice. Carriers get set up, submit one invoice, and then go quiet. Your ops team calls it 'one-and-done' and it's been a known problem for eight months with no clear owner.
2. The factoring product has a meaningful approval rate gap: roughly 30% of factoring applications from smaller carriers are declined by the underwriting model, and a significant portion of those declines are for carriers who are, by all observable signals from DAT's freight data, creditworthy and active. The risk team believes the model is conservative by design. You're not sure that's the right call.
3. A competitor fintech targeting carriers has started running ads directly against Outgo's factoring product, positioning themselves as the 'carrier-first' option with faster approvals and no hidden fees. Your GTM team flagged this two weeks ago. No one has responded yet.

Your VP of Product has asked you to come to next week's quarterly planning meeting with a clear recommendation: which one of these problems should Outgo prioritize first, and what should we actually do about it? She has also asked you to be explicit about what you are not doing and why: the team has a history of trying to run three initiatives at once and landing none of them.

You have access to the following data:

- Invoicing activation data: first-invoice-to-second-invoice conversion rates by carrier segment, cohort, and onboarding channel
- Factoring application data: approval rates, decline reason codes, and carrier freight activity signals from the DAT network (load volume, on-time delivery rate, lane consistency)
- Support ticket data: top 10 support categories for invoicing and factoring by volume and resolution time
- Competitive intelligence: the competitor's public pricing page and three carrier forum threads discussing Outgo vs. the competitor

You do not have access to a full data analyst. You have two hours of engineering time available before the meeting. You are the only PM on Outgo right now.

YOUR TASK: THREE DELIVERABLES

Deliverable 1: Outgo Growth Brief

Produce a short, decision-ready brief (no longer than two pages) that your VP of Product can read in five minutes and use to run the quarterly planning conversation. It must include:

- Your prioritization decision: which problem you are recommending Outgo address first, and a clear rationale for why this one and not the others
- Your proposed intervention: what specifically you would ship or change, what metric moves as a result, and how you would know it worked within 60 days
- Your attribution approach: how you would isolate the impact of your initiative from other variables so the team can evaluate whether it actually worked
- What you are explicitly not doing: one or two sentences on each deprioritised problem, explaining why it is not the right first move (not that it doesn't matter, but why it is not the right first move now)

Deliverable 2: README (Sections A, B, and C)

Section A: Discovery & Prioritization Rationale

Walk through how you would use the available data sources to make your prioritization decision. Which data matters most? What would you look for in each source? What is the key question each data source answers for you? Identify the most important assumption in your recommendation: the thing that, if wrong, would change your answer. How would you test it?

Section B: Stakeholder & Execution Plan

Name the internal stakeholders whose buy-in you need before this initiative can move (engineering, risk/underwriting, ops, GTM; be specific about what you need from each). How do you get alignment with two hours of engineering time available? What is the single highest-risk thing that could go wrong in the first 30 days of execution? How would you get ahead of it?

Section C: AI Usage Log

List each AI tool you used during this challenge. For each tool: what did you ask it to do, what did it produce, and what did you change, reject, or redirect before including it in your submission? If you used no AI tools, state that explicitly.

Deliverable 3: Video Walkthrough (8–10 minutes)

Record your walkthrough as an MP4 or MOV file and upload it directly on the Provn platform as a separate file. Cover these four points in your video:

- Walk us through your prioritization decision. What was the moment you committed to this choice over the others: what data or reasoning closed it for you?
- Explain your attribution approach out loud. How would you actually measure whether your initiative worked, and how would you handle the 'but maybe it was something else' objection from your VP?
- Tell us how you would handle the competitive threat in the background: not as a separate initiative, but in the context of whatever you're already doing. What's your posture on it?
- Mandatory AI question: Walk me through one moment in this challenge where you redirected the AI. What did you ask it to do, what did it give you, what was wrong or incomplete about that, and what did you do next?

CONSTRAINTS

- You have two hours of engineering time available before the meeting. Your brief and plan must work within this constraint; proposals that require significant eng scoping before the meeting are not actionable.
- Outgo's underwriting model is maintained by the risk team, not engineering. You cannot change the model directly. Any initiative that touches underwriting approval rates must go through risk team alignment first.
- DAT's carrier network data is available to you as signal, but it is not yet integrated into Outgo's product infrastructure. Using it in your initiative requires a data pipeline that does not exist yet; scoping that pipeline is part of what you're working with.
- The brief must be decision-ready for a non-technical VP. It cannot contain unresolved ambiguity, options without a recommendation, or metric definitions that require explanation.
- You are the only PM. There is no one to delegate the research or writing to. Your process choices, including how you use AI, are visible in your submission.

Growth & Monetization
