AI Security Testing for Financial Services

Threat-Led Testing Is the New Standard for AI in Financial Services

May 29, 2026
Nabeesha Javed

When Commonwealth Bank published its organization-wide AI report last week, most of the coverage focused on governance frameworks and board accountability.

Fair enough…..

But buried in the detail was something more operationally interesting:

The bank has been running adversarial testing against its AI systems before release, simulating both real-world and attack-condition scenarios, and it has pushed validation controls into runtime.

This is not something compliance can help with its a category that falls under QA.

And for CTOs at fintechs and financial institutions still treating AI security testing like traditional software QA, it is a gap that is getting harder to ignore.

The attack surface has changed

Traditional software has a relatively fixed threat surface. You test the inputs, the outputs, the edge cases, and you ship. AI models do not work that way.

Large language models generate responses based on probability, not rules. That means the same model can behave differently on Tuesday than it did on Friday, under the same inputs, if the underlying data distribution has shifted. It also means the model can be manipulated in ways that traditional code cannot. Prompt injection, jailbreaking, data poisoning, adversarial inputs designed to trigger incorrect outputs. These are not theoretical concerns in financial services. They are live attack vectors.

CBA acknowledged this directly in its report: malicious actors are using AI to scale and refine their attacks. The bank is using AI to detect phishing domains, flag unusual transaction patterns, and run scam prevention. Its adversaries are using AI to make those attacks harder to detect.

If your QA process was designed before generative AI was in your stack, it was not designed for this.

What threat-led testing actually means

Threat-led testing is not new in financial services. Regulators in the UK, EU, and increasingly in the US have pushed banks toward TLPT frameworks, where security testing is structured around realistic threat scenarios rather than checkbox audits.

Applying that same logic to AI systems means a few specific things:

Your penetration testing scope has to include the model, not just the infrastructure around it. Most fintech security testing still focuses on APIs, authentication layers, and network configuration. The model itself is often out of scope. That needs to change.

Simulated fraud behavior has to evolve with the actual fraud patterns. CBA runs AI against tens of thousands of transaction anomaly alerts daily. The fraud patterns that system was trained on six months ago are not the fraud patterns it is facing today. Testing scenarios have to reflect that.

Adversarial inputs need to be part of your pre-release validation. Synthetic adversarial examples, prompt injection attempts, and edge-case inputs designed to expose model failure modes should be standard before any customer-facing AI goes live.

Groundedness checks need to run at runtime, not just at build time. CBA implemented what it called groundedness guardrails inside its chatbot environment, blocking or flagging responses that could not be validated against verified data. That is a runtime testing layer, not a pre-release one.

The operational drift problem

Here is the part that does not get enough attention in AI testing discussions: models do not stay static after deployment.

Continuous learning models can drift. Data distributions shift. Fraud patterns evolve. A model that passed validation in January may be behaving differently by Q3, not because anyone changed the code, but because the world changed around it.

CBA built ongoing monitoring into its AI lifecycle explicitly because of this. Performance drift, data pattern changes, and emerging risks are assessed through both quantitative and qualitative reviews on a continuous basis.

For fintech CTOs, this creates a new category of testing responsibility. It is not just “did this pass QA before launch.” It is “do we have a surveillance layer running alongside every model in production, and are we acting on what it tells us.”

Most teams do not have that infrastructure in place yet.

Why this matters now

AI is moving from experimental to operational in financial services fast. Payment fraud detection, customer-facing chatbots, credit decisioning support, and document processing. These are not pilots anymore. They are live systems making decisions that affect customers.

Regulators are watching. Australia’s APRA, the UK’s FCA, and the EU AI Act are all moving toward frameworks that will require documented evidence of AI testing, validation, and monitoring. The institutions that can produce that evidence will be in a better position than those scrambling to retrofit governance after the fact.

CBA’s decision to publish its AI report was deliberate. It was not just transparency for its own sake. It was signaling to regulators, customers, and the market that it has control over its AI systems, not just access to them.

That distinction, control versus access, is where the competitive gap is opening up in financial services AI right now.

The practical question

If you are a CTO at a fintech or financial institution, the honest question to ask is: What would happen if a regulator asked you to produce evidence of adversarial testing against your AI systems today?

Kualitatem works with financial institutions building AI validation programs that go beyond pre-deployment checklists, including adversarial testing, runtime monitoring, and audit-ready documentation. If your current QA framework wasn’t designed for AI, it’s worth a conversation.

Book a Call

Threat-Led Testing Is the New Standard for AI in Financial Services

The attack surface has changed

What threat-led testing actually means

Other News

Let’s Build Your Success Story