The Scenario
You are a full-stack developer on the Digital Lending team at a large financial services firm. Your team owns the retail mortgage loan origination platform — the system that processes applications from initial intake through to underwriting decision.
The Platform
The platform currently consists of three microservices:
- Application Service — handles loan application intake, applicant data, and status tracking. Exposes a RESTful API consumed by the Angular dashboard.
- Document Service — manages document upload, storage, and metadata. Loan officers upload applicant documents (pay stubs, W-2s, bank statements), which are stored in S3 and indexed in PostgreSQL.
- Underwriting Service — owned by a separate team. Receives a structured underwriting packet and returns a decision. You cannot modify this service's API — you can only consume it.
The frontend is an Angular loan officer dashboard where staff review applications, upload documents, and track pipeline status.
The Initiative
The firm is piloting a GenAI-assisted document review feature. Today, loan officers manually read each uploaded document, extract key data points (employer name, income figures, pay period, YTD totals), and enter them into the application record by hand. This is slow, error-prone, and creates a bottleneck in the pipeline.
Your task is to build a proof-of-concept that uses an LLM API to automatically extract structured data from uploaded pay stubs and flag discrepancies (e.g., income on the pay stub doesn't match what the applicant self-reported). The extracted data and any flagged discrepancies should surface in the Angular dashboard for the loan officer to review and approve before the data flows downstream to underwriting.
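To make the "flag discrepancies" requirement concrete, the following is a minimal Java 17 sketch of one possible check, not a prescribed design. It assumes the extraction step has already produced an employer name, a gross pay amount per period, and a pay frequency. Every name here (PayStubDiscrepancyCheck, ExtractedPayStub, Discrepancy, the tolerance parameter) is hypothetical, as is the choice to compare annualized gross pay against self-reported annual income. The comparison runs inside the firm's own service, so the self-reported figures never need to reach the LLM.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: compares extracted pay stub data with self-reported application data. */
final class PayStubDiscrepancyCheck {

    /** Pay periods per year for common pay frequencies. */
    enum PayFrequency {
        WEEKLY(52), BIWEEKLY(26), SEMIMONTHLY(24), MONTHLY(12);

        final int periodsPerYear;

        PayFrequency(int periodsPerYear) {
            this.periodsPerYear = periodsPerYear;
        }
    }

    /** Fields extracted from a single pay stub (field names are assumptions). */
    record ExtractedPayStub(String employerName, BigDecimal grossPayPerPeriod, PayFrequency frequency) {}

    /** One flag for the loan officer to review in the dashboard. */
    record Discrepancy(String field, String message) {}

    /** Projects per-period gross pay to an annual figure. */
    static BigDecimal annualize(ExtractedPayStub stub) {
        return stub.grossPayPerPeriod().multiply(BigDecimal.valueOf(stub.frequency().periodsPerYear));
    }

    /**
     * Returns a flag for each mismatch found. The tolerance is a relative
     * difference, e.g. 0.05 allows the two income figures to differ by 5%.
     */
    static List<Discrepancy> check(ExtractedPayStub stub,
                                   BigDecimal selfReportedAnnualIncome,
                                   String selfReportedEmployer,
                                   BigDecimal tolerance) {
        List<Discrepancy> flags = new ArrayList<>();
        BigDecimal annualized = annualize(stub);

        if (selfReportedAnnualIncome.signum() <= 0) {
            flags.add(new Discrepancy("annualIncome", "No self-reported income to compare against"));
        } else {
            // Relative difference = |annualized - reported| / reported
            BigDecimal relativeDiff = annualized.subtract(selfReportedAnnualIncome).abs()
                    .divide(selfReportedAnnualIncome, 4, RoundingMode.HALF_UP);
            if (relativeDiff.compareTo(tolerance) > 0) {
                flags.add(new Discrepancy("annualIncome",
                        "Pay stub annualizes to " + annualized
                                + " but applicant reported " + selfReportedAnnualIncome));
            }
        }

        if (!stub.employerName().trim().equalsIgnoreCase(selfReportedEmployer.trim())) {
            flags.add(new Discrepancy("employerName",
                    "Pay stub employer does not match the employer on the application"));
        }
        return flags;
    }
}
```

An empty list means nothing was flagged. A real solution would probably also cross-check YTD totals against the pay period, and would need to decide how the flags are persisted and exposed to the Angular dashboard; both are left open here.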
What Exists Today
- Backend: Java 17, Spring Boot 3.x, Spring Security with role-based access (loan officers, underwriters, admins)
- Frontend: Angular 15, TypeScript, NgRx for state management
- Database: PostgreSQL (application and document metadata), S3 (raw document files)
- Infrastructure: AWS — ECS for services, RDS for PostgreSQL, S3 for storage. CI/CD via GitHub Actions.
- The Underwriting Service API is documented: it accepts a structured JSON packet with validated financial data and returns an underwriting decision. You cannot modify this contract.
Constraints
Honor these constraints in your solution. They reflect the real operating environment.
- Infrastructure: Work within the existing stack — Java 17, Spring Boot 3.x, Angular 15, PostgreSQL, AWS (ECS, RDS, S3). Do not introduce new infrastructure components (e.g., no new message brokers, no switching to a different database). You may add new services within the existing technology choices.
- PII / Regulatory: No raw PII (Social Security numbers, account numbers, or unredacted income figures tied to an identified individual) may be sent to any external LLM API. The compliance team has mandated this. Your solution must demonstrate how PII is removed or masked before any document content reaches the LLM.
- Scope: The proof-of-concept must focus on one document type: pay stubs. Do not attempt to solve for all document types (W-2s, bank statements) in this iteration. Identify what you would extend in a second sprint, but deliver a working POC for pay stubs only.
- Service boundaries: The Underwriting Service is owned by another team. You cannot modify its API contract. Your GenAI extraction service must produce output compatible with the existing underwriting packet format. Design your service boundaries accordingly.
Produce three deliverables. This is a proof-of-concept, not production-ready code — prioritize design clarity and architectural reasoning over polish.
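As an illustration of the PII constraint above, here is a minimal Java 17 sketch of a redaction step that could run before any pay stub text is sent to an external LLM API. It is one possible approach under stated assumptions, not a complete or compliance-approved solution. It assumes the pay stub has already been converted to plain text, that SSNs appear as nine digits with optional dash or space separators, that any standalone run of 8 to 17 digits should be treated as an account number, and that the applicant's name is known from the application record. The class, record, and token names (PayStubRedactor, RedactionResult, [SSN_1] and so on) are hypothetical. Replacing the name is what keeps the remaining income figures from being tied to an identified individual; whether that satisfies the compliance team is a question to confirm with them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: replaces PII with placeholder tokens before text leaves the service. */
final class PayStubRedactor {

    // Assumption: SSNs look like 123-45-6789, 123 45 6789, or 123456789.
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}[- ]?\\d{2}[- ]?\\d{4}\\b");

    // Assumption: any standalone run of 8 to 17 digits is treated as an account number.
    private static final Pattern ACCOUNT = Pattern.compile("\\b\\d{8,17}\\b");

    /** Redacted text plus a token-to-original map that must stay inside the firm's systems. */
    record RedactionResult(String redactedText, Map<String, String> tokenToOriginal) {}

    static RedactionResult redact(String text, String applicantName) {
        Map<String, String> vault = new LinkedHashMap<>();
        String out = replaceMatches(text, SSN, "SSN", vault);
        out = replaceMatches(out, ACCOUNT, "ACCT", vault);
        if (applicantName != null && !applicantName.isBlank()) {
            Pattern name = Pattern.compile(Pattern.quote(applicantName), Pattern.CASE_INSENSITIVE);
            out = replaceMatches(out, name, "NAME", vault);
        }
        return new RedactionResult(out, vault);
    }

    /** Swaps every match for a numbered token and records the original value in the vault. */
    private static String replaceMatches(String text, Pattern pattern, String label,
                                         Map<String, String> vault) {
        Matcher matcher = pattern.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (matcher.find()) {
            String token = "[" + label + "_" + (vault.size() + 1) + "]";
            vault.put(token, matcher.group());
            matcher.appendReplacement(sb, Matcher.quoteReplacement(token));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }
}
```

Only redactedText would be placed in the LLM prompt. The token map stays local so that extracted fields can be re-associated with the applicant after the response comes back. Regex matching alone will miss PII in unexpected formats (addresses, employee IDs, names spelled differently on the stub), so a real solution would need to state how such cases are detected or why they are out of scope for the POC.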