You are a Forward Deployed Engineer at an operations intelligence software company. You have just been assigned to Vertex Manufacturing, a mid-market industrial manufacturer with ~600 employees across two production facilities. Vertex licensed your platform three weeks ago. This is your first on-site visit.
The VP of Operations, Dana Reyes, has one sentence for you when you arrive: "Our production metrics dashboard takes 20-plus minutes to load and nobody trusts the numbers anymore." Dana has a board presentation in 5 days where she will present production KPIs to the CEO and two board members. She needs the dashboard working and trustworthy before then.
During your first hour on-site, you uncover the following:
**What you find during site discovery**
- The dashboard queries Vertex's on-prem SQL Server database directly in real time, with no caching layer. The same SQL Server also runs their ERP, so every dashboard query locks production tables during peak shift hours (7–9am, 3–5pm).
- Three core metrics are defined differently between the dashboard and the manual spreadsheets the floor supervisors use: OEE (Overall Equipment Effectiveness), Scrap Rate, and Throughput. The dashboard values and the floor values diverge by 8–23% depending on the metric.
- The IT team is two people. The senior IT lead, Marcus, is on a scheduled vacation and won't return until Day 4. The remaining IT generalist, Jamie, has basic SQL skills and can follow documented procedures but cannot debug application code independently.
- Vertex's infrastructure: on-prem SQL Server (cannot be moved or replaced per IT policy), AWS S3 for data archiving (already in use), BI Reporting Tool (already licensed). No Kafka, no streaming infrastructure.
- Dana does not know about the metric definition discrepancy. She believes the dashboard numbers are accurate — just slow.
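To make the metric discrepancy concrete, here is a minimal sketch of how one definitional choice can move OEE. OEE is conventionally the product of availability, performance, and quality. The assumption here (not stated in the findings) is that the floor spreadsheets exclude planned stops from the availability denominator while the dashboard does not. The `ShiftRecord` fields and the sample numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ShiftRecord:
    """One shift of production data. All fields and values are hypothetical."""
    scheduled_minutes: float     # total shift length
    planned_stop_minutes: float  # breaks, planned maintenance
    run_minutes: float           # time the line actually ran
    ideal_cycle_minutes: float   # ideal time to produce one unit
    total_units: int             # all units produced, good or not
    good_units: int              # units that passed quality checks


def oee(rec: ShiftRecord, exclude_planned_stops: bool) -> float:
    """Return OEE = availability * performance * quality.

    The flag switches between two availability denominators: the full
    scheduled time, or scheduled time minus planned stops.
    """
    planned = rec.scheduled_minutes
    if exclude_planned_stops:
        planned -= rec.planned_stop_minutes
    availability = rec.run_minutes / planned
    performance = (rec.ideal_cycle_minutes * rec.total_units) / rec.run_minutes
    quality = rec.good_units / rec.total_units
    return availability * performance * quality


def relative_gap(a: float, b: float) -> float:
    """Gap between two values as a fraction of the larger one."""
    return abs(a - b) / max(a, b)


# Hypothetical shift: 480 min scheduled, 60 min planned stops, 380 min running.
shift = ShiftRecord(480, 60, 380, 1.0, 350, 330)
floor_oee = oee(shift, exclude_planned_stops=True)       # assumed floor definition
dashboard_oee = oee(shift, exclude_planned_stops=False)  # assumed dashboard definition

print(f"Floor OEE:     {floor_oee:.4f}")
print(f"Dashboard OEE: {dashboard_oee:.4f}")
print(f"Relative gap:  {relative_gap(floor_oee, dashboard_oee):.1%}")
```

With these made-up inputs the two definitions differ by 12.5%, which shows that a single denominator choice is enough to produce a gap of the size observed. The actual cause at Vertex would have to be confirmed by comparing the dashboard query against the supervisors' spreadsheet formulas.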
Your job is to diagnose the real problem, build a working solution for the highest-priority issue, and brief both Dana and Jamie before you leave the site. You have 60–70 minutes.
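As an illustration of the missing caching layer from the first finding, the sketch below shows one possible shape for a time-based snapshot cache: the dashboard reads a stored result, and the database is only queried when that result is older than a chosen interval. `SnapshotCache`, `fake_fetch`, the 15-minute interval, and the returned value are all hypothetical. In practice the fetch function would run the real KPI query, and the refresh would likely be scheduled outside the peak shift windows.

```python
import time
from typing import Any, Callable, Optional


class SnapshotCache:
    """Serve a stored query result until it is older than ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # function that runs the expensive query
        self._ttl = ttl_seconds      # maximum age of a snapshot, in seconds
        self._clock = clock          # injectable so the cache can be tested
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Any:
        """Return the cached snapshot, refreshing it first if it is stale."""
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Stand-in for the real database query (hypothetical data).
fetch_calls = []


def fake_fetch() -> dict:
    fetch_calls.append(1)
    return {"throughput_units_per_hour": 42}


# A fake clock lets the example advance time without sleeping.
fake_now = [0.0]
cache = SnapshotCache(fake_fetch, ttl_seconds=900, clock=lambda: fake_now[0])

cache.get()          # first read: queries the "database"
cache.get()          # second read: served from the snapshot
fake_now[0] = 901.0  # more than 15 minutes later
cache.get()          # snapshot is stale: queries again

print(f"Reads: 3, database queries: {len(fetch_calls)}")
```

The design choice here is to trade freshness for load: board-level KPIs rarely need second-by-second data, so a snapshot that is a few minutes old keeps dashboard reads off the ERP tables. This sketch is single-process and in-memory; a shared store such as the existing S3 bucket would be a separate design decision.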