You are an AI engineer at a mid-sized financial services firm. The Compliance team manually reviews hundreds of regulatory documents each week — policy updates, audit reports, and external regulatory filings — to identify action items, flag risk areas, and route tasks to the right internal owners.
The process is slow, inconsistent, and increasingly unsustainable. A junior analyst currently spends 12–15 hours per week on initial triage alone. The Head of Compliance has asked your team to explore whether an AI agent could automate the first-pass review and routing — while keeping a human in the loop for final decisions.
You have been asked to design and prototype a Document Intelligence Agent that can:
You will not be evaluated on whether your agent is production-complete. You will be evaluated on the quality of your design decisions, your reasoning about trade-offs, your approach to reliability and evaluation, and how you communicate the system to a non-technical stakeholder (the Head of Compliance) who will decide whether to fund the next phase.
Honor these constraints in your design. Strong candidates explicitly acknowledge them. AI tools will typically ignore them — this is intentional.
Human-in-the-loop is non-negotiable: the agent may not autonomously route or act on a document. Every output must be a proposed action for a human reviewer to approve. Design for this from the start — not as an afterthought.
The documents are unstructured: you cannot assume clean headers, consistent terminology, or structured fields. Your design must handle documents where key information is buried in paragraph 14 of a 40-page PDF.
No greenfield infrastructure: the firm runs on standard cloud infrastructure (AWS or Azure) with an existing document storage system (S3 or Blob Storage). You cannot propose a new data warehouse or real-time streaming pipeline as a prerequisite.
Audit trail required: compliance workflows require a complete record of what the agent extracted, what it proposed, and why. Every agent decision must be logged and explainable to a regulator.
Scope for first phase: the first phase must be demonstrably useful within four weeks of build time with a team of two engineers. If your design requires six months to show value, it will not get funded.
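One way to picture the human-in-the-loop and audit-trail constraints together is the minimal sketch below. It is an illustration under stated assumptions, not a prescribed design: the names `ProposedAction`, `ReviewStatus`, `audit_record`, and `approve`, the field layout, and the in-memory list standing in for an append-only log are all hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum


class ReviewStatus(str, Enum):
    PENDING = "pending"    # waiting for a human reviewer
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ProposedAction:
    """One agent proposal. Nothing is routed until a human approves it."""
    document_id: str
    proposed_owner: str    # hypothetical team name, e.g. "AML"
    action_item: str
    evidence_quote: str    # verbatim text the proposal is based on
    evidence_page: int     # page of the source document holding the quote
    rationale: str         # plain-language "why" shown to the reviewer
    status: ReviewStatus = ReviewStatus.PENDING
    reviewer: str | None = None


def audit_record(event: str, proposal: ProposedAction) -> str:
    """Serialize one audit event as a JSON line (append-only storage assumed)."""
    data = asdict(proposal)
    data["status"] = proposal.status.value  # store the plain string value
    return json.dumps(
        {
            "event": event,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "proposal": data,
        },
        sort_keys=True,
    )


def approve(proposal: ProposedAction, reviewer: str, log: list[str]) -> ProposedAction:
    """Only a named human reviewer can move a proposal out of PENDING."""
    if proposal.status is not ReviewStatus.PENDING:
        raise ValueError("proposal has already been reviewed")
    if not reviewer:
        raise ValueError("a named reviewer is required")
    proposal.status = ReviewStatus.APPROVED
    proposal.reviewer = reviewer
    log.append(audit_record("approved", proposal))
    return proposal
```

In this sketch the agent can only create `ProposedAction` objects in the `PENDING` state; the status change and the log entry happen in the same function, so an approval cannot exist without a matching audit record.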
We expect you to use AI tools. We evaluate how you use them — not whether you use them. Evidence of iteration, redirection, and critical evaluation scores higher than a polished output with no process documentation.
The single highest-signal indicator is your video answer to the mandatory AI question. If you cannot name a specific moment where you redirected AI output, evaluators will assume there was none.
What strong AI usage looks like:
You used AI to draft tool schemas, then caught a type error or missing field and corrected it — and you can describe what you caught
You prompted AI for an architecture, it gave you a generic LangChain tutorial structure, and you redirected it to handle the unstructured document constraint specifically
You used AI to draft the stakeholder brief, then rewrote the jargon-heavy parts because a compliance executive would not understand them
You independently designed the failure mode analysis because AI output was too generic ('hallucination is a risk') and you knew the specific failure modes from the scenario
What weak AI usage looks like:
A polished submission with no Section C, or a Section C that says 'I used ChatGPT to help me write'
Tool schemas with string types on every field because the AI defaulted to strings and you did not review them
A stakeholder brief that reads like a technical spec written for engineers
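To make the point about all-string tool schemas concrete, here is a small sketch of a reviewed, typed schema with a matching check. Everything in it is an assumption for illustration: the tool name `propose_routing`, its fields, and the hand-written `validate_args` helper, which covers only the few JSON Schema keywords used here and is not a full validator.

```python
# Hypothetical tool schema in JSON Schema style; field names are illustrative.
PROPOSE_ROUTING_SCHEMA = {
    "type": "object",
    "properties": {
        "document_id": {"type": "string"},
        "risk_level": {"type": "string", "enum": ["low", "medium", "high"]},
        "evidence_page": {"type": "integer", "minimum": 1},
        "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
        "requires_legal_review": {"type": "boolean"},
    },
    "required": ["document_id", "risk_level", "evidence_page", "confidence"],
    "additionalProperties": False,
}

_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments pass."""
    errors = []
    props = schema["properties"]
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"unexpected field: {name}")
            continue
        type_name = spec["type"]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if is_stray_bool or not isinstance(value, _PY_TYPES[type_name]):
            errors.append(f"{name}: expected {type_name}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{name}: below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            errors.append(f"{name}: above maximum {spec['maximum']}")
    return errors
```

With every field typed as a string, a call that passes `"14"` for the page or `"very high"` for the risk level would go through unnoticed; with the typed version, `validate_args` reports both.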