Generative AI Guardrails in Banking
Introduction
Generative AI is transforming financial services, and with that comes the need for clear, enforceable guardrails. Unique AI is aligned with the Handbook on Generative AI Guardrails in Banking, published in January 2024 by the ABS Standing Committee on Data Management. The committee's members, such as MAS, HSBC, and JP Morgan, drew on experience across more than thirty enterprise Gen AI use cases to identify the sector's most critical risks and the controls needed to manage them.
Read the Handbook on Generative AI Guardrails in Banking here
The Top 10 AI Risks according to the Handbook
Below, we address the Top 10 AI Risks that could significantly impact our success. For each risk, we outline what it is, why it matters, and how we mitigate it. This structure supports clarity, preparedness, and effective risk management.
Unrepresentative or biased data inputs
AI can produce unfair or inaccurate outcomes when data is skewed, low‑quality, or unjustified. This leads to customer harm, compliance breaches, and reputational damage.
Mitigations:
Human oversight and literacy: We run HITL reviews and role‑based AI training (AI Academy) to validate sensitive decisions and raise bias awareness.
Model selection and safety: We benchmark models against provider model cards and enforce Azure content filters to reduce biased or unsafe outputs.
Ingestion quality: We improved ingestion fidelity with MDI enabled by default and are rolling out agentic ingestion to fix poor chunk quality, that is, poorly split documents that break sentences, separate tables from their headers, or drop references, so that the retriever pulls fragments without the right context.
Governance and tracking: Issues and mitigations are monitored and recorded through our risk management and governance process.
More about Benchmarking can be found here:
Benchmarking
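To make the chunk-quality problem described under "Ingestion quality" concrete, here is a minimal sketch of sentence-aware chunking. It is not the MDI or agentic ingestion pipeline, whose internals are not described here. The function name `chunk_sentences`, the `max_chars` parameter, and the regex-based sentence split are all assumptions chosen for illustration. It only shows the basic idea of never cutting a chunk in the middle of a sentence.

```python
import re


def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of roughly max_chars characters.

    Hypothetical helper: a sentence is never split, so a single sentence
    longer than max_chars becomes its own (oversized) chunk.
    """
    # Naive split after ., ! or ? followed by whitespace. This is an
    # assumption for the sketch; real documents need a proper segmenter.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A production pipeline would also need to keep tables attached to their headers and preserve references, which this sketch does not attempt.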
Toxic and offensive outputs
GenAI can generate harmful, abusive, or non‑compliant content, risking customers, brand, and regulatory scrutiny.
Mitigations:
Content filtering: For LLMs hosted on Azure, content filtering is activated and enforced, with defined risk categories and thresholds to block unsafe prompts and outputs.
Risk management and governance: We maintain an operational incident path through our risk management and governance process to triage and remediate risky generations, with learnings fed back into prompts and guardrails.
Safeguards and benchmarking: We review provider safeguards alongside our benchmarking to ensure models and configurations remain fit‑for‑purpose and aligned with policy.
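The idea of "defined risk categories and thresholds" can be sketched as a simple policy check. This is an illustrative sketch only and does not call any Azure API. The category names, the four-level `Severity` scale, the `BLOCK_AT` policy values, and the function `blocked_categories` are assumptions made for the example, not the configuration actually in use.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordered severity scale returned by a content classifier."""
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical policy: block a category at or above this severity.
BLOCK_AT = {
    "hate": Severity.LOW,
    "sexual": Severity.LOW,
    "self_harm": Severity.LOW,
    "violence": Severity.MEDIUM,
}


def blocked_categories(scores: dict[str, Severity]) -> list[str]:
    """Return the categories whose severity meets or exceeds its threshold.

    Categories missing from the policy fall back to the strictest
    threshold (LOW), so an unknown category is never silently allowed.
    """
    return sorted(
        category
        for category, severity in scores.items()
        if severity >= BLOCK_AT.get(category, Severity.LOW)
    )
```

A prompt or output would be blocked whenever the returned list is non-empty. Defaulting unknown categories to the strictest threshold is a deliberate fail-closed choice in this sketch.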
Lack of AI risk awareness
Low AI literacy and weak culture cause misuse, control gaps, and oversight failures.
Mitigations:
Employees
Training and enablement: Role-based security/IT security training, annual mandatory compliance training with quiz, and the Unique AI Academy for practical AI literacy and responsible-use skills.
Always-on awareness: Yearly AI awareness month plus targeted, subject-specific campaigns throughout the year to reinforce do’s/don’ts, escalation paths, and recourse options.
Culture and accountability: We reinforce responsible AI behavior through Secure Coding workshops and hands-on training such as outage simulations, ensuring consistent adoption of standards and oversight.
Customers
Governance touchpoint: Each customer use case is registered via our risk management process (AI Risk Register entry) to establish ownership, scope, tiering, and review hooks.
Training and awareness: Customer-facing AI Academy sessions to explain capabilities, limitations, oversight expectations (HITL), and how to raise enquiries or appeals.
Lack of use case, data and model governance
Missing inventory, tiering, and reviews create blind spots and inconsistent controls.
Mitigations:
AI Registry/Inventory with risk tiering: We maintain a managed AI Registry/Inventory that records purpose/scope, autonomy, jurisdictions, data/model details, ownership, status, and risk tiering. This enables proportionate controls, pre-/post‑deployment reviews, and change tracking across all use cases.
AI Governance Framework: Our AI Governance Framework standardizes policies, roles, guardrails, and review expectations across the lifecycle (design → data → build/review → deployment → monitoring/change). It ensures consistent approvals, documentation, and oversight for use cases, data handling, and model choices.
The full AI Governance Framework can be found here:
AI Governance Framework
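As a rough illustration of what a registry entry with risk tiering and change tracking might look like, consider the sketch below. The field names follow the attributes listed above (purpose, autonomy, jurisdictions, model, ownership, status, tier). The class names, the three-level tier scale, and the review intervals are hypothetical and are not taken from the AI Governance Framework.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical review cadence in months per tier (illustrative values only).
REVIEW_INTERVAL_MONTHS = {RiskTier.LOW: 12, RiskTier.MEDIUM: 6, RiskTier.HIGH: 3}


@dataclass
class RegistryEntry:
    """One use case in a hypothetical AI registry."""
    use_case: str
    purpose: str
    autonomy: str  # e.g. "human-in-the-loop"
    jurisdictions: list[str]
    model: str
    owner: str
    status: str
    risk_tier: RiskTier
    change_log: list[str] = field(default_factory=list)

    def retier(self, new_tier: RiskTier, reason: str) -> None:
        """Change the tier and record why, so the change stays traceable."""
        self.change_log.append(
            f"{self.risk_tier.value} -> {new_tier.value}: {reason}"
        )
        self.risk_tier = new_tier

    def review_interval_months(self) -> int:
        """Higher tiers are reviewed more often (proportionate controls)."""
        return REVIEW_INTERVAL_MONTHS[self.risk_tier]
```

Keeping the reason for each tier change in the entry itself is one simple way to make re-tiering decisions auditable.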
Inadequate human oversight
Human oversight must be proportionate and clear; if in/by/over‑the‑loop controls are misaligned, errors slip through and automation bias increases.
Mitigations:
Oversight design and testing: We define the oversight pattern per use case and risk tier (human‑in/by/over‑the‑loop) and validate it through pre‑/post‑deployment testing with benchmarks and HITL, reinforced, for example, by our AI Verify pilot, to ensure it is effective in practice.
Governance, reviews, and traceability: We register oversight decisions and risks in our central Risk Register, conduct periodic reviews (proportionate to materiality), escalate via our AIMS risk management, and incorporate findings from the AI Verify pilot to keep oversight fit‑for‑purpose over time.
LatticeFlow evaluation (see link at the end)
More about the AI Verify Project can be found here:
AI Verify Foundation
Inadequate feedback and recourse mechanisms
Missing or weak channels for enquiries, appeals, or corrections erode trust, leave harm unresolved, and create conduct/fairness exposure.
Mitigations:
Intake, triage, ownership, and communication: We operate a structured Customer Support Process with clear intake paths (email/self‑service), daily triage, assigned ownership, and proactive status updates to ensure every enquiry is tracked through to closure.
Response and resolution targets: We apply defined response and resolution time targets, severity handling, and availability reporting according to the SLA in each customer's contract.
Hallucination / fabrication / confabulation
GenAI can produce confident but false outputs, leading to wrong decisions and potential legal exposure.
Mitigations:
Evaluation and monitoring: We run a dedicated Hallucination Evaluation methodology with grounding checks and maintain a hallucination level metric to track trends and trigger corrective actions over time.
User-facing safeguards: We display clear warning messages in the UI when uncertainty is likely and guide users to verify or escalate; this transparency supports informed use and effective human-in-the-loop review.
Continuous improvement: Findings feed into prompt/retrieval adjustments and model selection, keeping outputs grounded in source context and aligned with domain expectations.
LatticeFlow evaluation (see link at the end)
More about the Hallucination Evaluation Process can be found here:
Hallucination Evaluation
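To illustrate how a grounding check can feed a hallucination level metric, here is a deliberately naive sketch. It is not the Hallucination Evaluation methodology referenced above. The token-overlap heuristic, the `min_overlap` threshold, and the function names are assumptions standing in for whatever grounding check is actually used.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_grounded(sentence: str, context: str, min_overlap: float = 0.6) -> bool:
    """Naive check: share of the sentence's tokens that also occur in the context."""
    sentence_tokens = _tokens(sentence)
    if not sentence_tokens:
        return True  # nothing to verify
    overlap = len(sentence_tokens & _tokens(context)) / len(sentence_tokens)
    return overlap >= min_overlap


def hallucination_level(sentences: list[str], context: str) -> float:
    """Fraction of answer sentences that fail the grounding check."""
    if not sentences:
        return 0.0
    ungrounded = sum(not is_grounded(s, context) for s in sentences)
    return ungrounded / len(sentences)
```

Tracked per evaluation run, a value like this gives a trend line that can trigger corrective action. Token overlap misses paraphrases and contradictions, so a real evaluation would likely rely on a stronger check.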
Overconfidence
Systems or users may over‑trust model certainty, especially when uncertainty is poorly communicated.
Mitigations:
Transparency and guardrails: We provide explainability through RAG citations and document highlighting, and apply safety guardrails and proportionate transparency design so users can judge when to trust, verify, or escalate.
Testing, calibration, and training: We run continuous benchmarking, hallucination checks, and human‑in‑the‑loop validation to align displayed confidence with realized accuracy and to correct miscalibration over time. Furthermore, we reinforce this with role‑based enablement through the Unique AI Academy so users know when and how to cross‑check or escalate.
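One common way to quantify the gap between displayed confidence and realized accuracy is the expected calibration error (ECE). The sketch below shows the standard binned computation. Whether this particular metric is used here is an assumption, and the function name and the default of ten bins are illustrative.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: model confidence in [0, 1] for each answer.
    correct: whether each answer was judged correct (e.g. by a reviewer).
    """
    total = len(confidences)
    if total == 0:
        return 0.0

    # Assign each prediction to an equal-width confidence bin.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for confidence, ok in zip(confidences, correct):
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_confidence = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_confidence - accuracy)
    return ece
```

A value near zero means stated confidence matches how often answers are actually right. A larger value signals over- or under-confidence that needs correcting.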
Insufficient model accuracy / soundness
Models not fit‑for‑purpose or inadequately validated cause operational loss and compliance breaches.
Mitigations:
Testing and validation: We run Continuous Testing & Validation with benchmarks and factuality/faithfulness evaluations, apply human‑in‑the‑loop checks, and leverage insights from the AI Verify pilot to confirm fitness before and after deployment.
Standards and guardrails: Under our AI Governance Framework, we apply reliability/robustness standards, explainability via citations, and security guardrails so models behave predictably and are auditable.
Governance and reviews: We conduct risk assessment and treatment, register risks centrally in the Risk Register, and run periodic reviews through AIMS risk management to detect issues and mandate remediation.
Fit‑for‑purpose model selection: We use our LLM Availability Overview to select validated models, enforce configuration constraints, align regional data‑zones to jurisdictional needs, and plan migrations via retirement tracking to keep performance stable over time.
LatticeFlow evaluation (see link at the end)
Model degradation from unexpected use
Performance worsens when usage drifts beyond intended scope, creating hidden failures and fairness risks.
Mitigations:
Governance and scope enforcement: We apply principles, access controls, and audit logs under our AI Governance Framework to enforce intended use and maintain traceability. Robustness practices and security guardrails help prevent off‑label usage and surface anomalous behavior.
Monitoring, reviews, and re‑tiering: Through our AIMS risk management, we register risks centrally, run periodic reviews proportionate to materiality, and use clear treatment options to mandate re‑tiering or re‑approval when usage, autonomy, or context changes.
Testing and drift detection: Continuous Testing & Validation uses benchmarks, hallucination/faithfulness checks, and human‑in‑the‑loop validation to catch degradation from new inputs, domains, or user patterns, feeding fixes into retrieval, prompts, and model selection.
Inventory and approved scope: Our AI Registry/Inventory records approved scope, autonomy level, risk tiering, ownership, and status for each use case, creating the authoritative reference for scope changes and the trigger point for change control.
Fit‑for‑purpose model choices: Our LLM Availability Overview guides validated model selection, enforces configuration constraints, aligns regional data‑zones with deployment requirements, and tracks retirements so teams can migrate proactively without unintended behavior shifts.
Together with LatticeFlow AI, we developed the first FINMA-aligned technical blueprint for assessing an agentic AI system already in production. The evaluation focused on:
System reliability and robustness
Explainability of outputs
Human oversight and intervention mechanisms
Ongoing risk monitoring over time
See the full report here: