Diligence · Overview

Put AI investment through the same diligence as every other capital decision

Parallax Intel scores banking AI use cases on common terms — the value peers have evidenced, set against the cost to build, run, and govern — so a bank can compare unlike use cases, its first or its next, the way an investment committee weighs any other claim on capital.

The question facing most regional banks isn't whether to invest in AI — it's where to start, and where not to. Whether a bank is weighing its first use cases or sequencing a portfolio already underway, the functions on the table — from back-office efficiency tools to customer-facing decision systems — carry different payoffs and different costs to run safely, and there is rarely a consistent basis for setting one against another. Without common criteria, capital tends to follow the most visible use case rather than the most valuable.

The framework supplies that basis. It asks four questions of every use case and answers each on the same scale:

01 Value evidenced: Where has value actually been disclosed in comparable deployments, and how does it compare to the alternatives?
02 Regulatory intensity: Which regimes apply to this use case, and how heavily do they bear on it?
03 Operating intensity: What is the ongoing burden to run and govern it, and where in the organization does that burden fall?
04 Technical complexity: Which AI components does it rely on, and how difficult are they to build and integrate?

[Figure: The four-dimensional read. Return: Value evidenced (what peers have proven). Cost & risk · operational burden: Regulatory intensity, Operating intensity, Technical complexity. Together they produce a single read: Fund first · Fund with controls · Low-stakes pilot · Defer.]

The four scores place a use case on a single funding spectrum, from fund-first to defer.

One of the four questions measures return; the other three measure cost and risk. Read together, they separate use cases that are well-evidenced and light to run from those whose value is thin, or whose ongoing burden quietly erodes it — the distinction that should drive where AI capital goes first, and where it does not go yet.

Two commitments hold the read together:

Evidence over assertion. Value, operating, and technology characteristics come from what peers have publicly disclosed — not from vendor claims or conjecture. Where the public record is silent, every gap is filled with a labeled inference, never presented as fact, and the strongest source available anchors each conclusion.
Comparative, not absolute. A score places a use case against its peers in the analyzed set. It is a decision-support benchmark for prioritization — not an institutional performance measure, and not legal advice.

Coverage

28 banking functions
31 legal instruments (14 U.S. · 17 Canada, UK & EU)
14 operating-model categories
164 AI technologies

A continuously maintained body of analysis, current to June 2026.

Behind those four questions is a continuously maintained body of analysis. Coverage runs to 28 banking functions across the whole bank, 14 operating-model categories spanning how a system is designed, fed with data, run, and governed, and 164 mapped AI technologies. The regulatory layer tracks 31 legal instruments across four jurisdictions — 14 U.S. instruments anchoring the analysis for U.S. institutions, plus 17 across Canada, the UK, and the EU — and is kept current as the rules move, from emerging state AI statutes (Texas's TRAIGA, Utah's AIPA, Colorado's ADMT Act) to the 2026 supersession of federal model-risk supervision (SR 11-7 → SR 26-2).

The four sections below take each question in turn — what it measures, what evidence feeds it, and what a finished read looks like — using AI in treasury forecasting as the worked example throughout.

01 Value evidenced

AI value is easy to claim and hard to evidence

The first question the framework asks of any use case is whether its value is real — measured not by what a vendor promises, but by what peer banks have actually put on the public record.

Every vendor deck and internal champion has reason to round up: pilots get reported as wins, and “AI-powered” gets attached to outcomes it didn't drive. The anchor that holds under scrutiny is the public record — what a peer bank, or the vendor it named, has stated openly, with a number where one exists. The read starts there, and treats a benefit no one has disclosed as missing evidence, not as proof the value isn't real.

The read has two parts: how widely a use case has evidenced value, and how deep its single best result runs.

Breadth

Breadth asks whether benefit has been disclosed across the kinds of value a bank actually weighs, rather than in one place only. It covers nine benefit dimensions, grouped into four categories:

Commercial impact

Financial impact (P&L, cost or loss avoidance), customer impact, and adoption and reach.

Operational efficiency

Labor and effort saved, and processing speed.

Model & process quality

Accuracy and error rates, and analytical depth — how much more the system can see, through longer horizons, wider scenario coverage, or broader signals.

Risk & compliance

Measurable risk reduction, and improvement in the institution's regulatory standing.

A use case with disclosed benefit across many of these reads as more comprehensively proven than one resting on a single claim.

Evidence breadth · six sample use cases

	Commercial impact			Operational eff.		Model & process qual.		Risk & compliance
	Financial	Customer	Adoption	Labor	Speed	Quality	Depth	Risk	Compliance
Treasury Forecasting		—						—	—
AML		—			—		—		—
Credit Underwriting	—							—
Fraud Detection	—	—	—				—		—
Algorithmic Trading	—	—		—	—			—	—
Pricing Optimization		—	—	—	—	—		—	—

Legend (cell shading): Significant · Moderate · Limited · None disclosed

Cell shading is strength of evidence disclosed, not value generated, read across all six sample use cases from the benefit-claims register. Treasury Forecasting — the row carried through this page — is the disclosed / synthesized read.

Depth

Depth is the single strongest result on the record. Breadth can flatter a use case with many small, well-reported claims; depth surfaces the one most consequential outcome a peer has actually disclosed. The two are read together, because they tell different stories: a broad-but-shallow case, with benefit shown widely but modestly, is a different proposition from a narrow-but-exceptional one, where coverage is thin but a single result stands out.

The standard

Three things govern what counts:

Disclosed, and weighted by how well. A realized benefit weighs more than a planned one; a quantified figure more than a directional phrase; a named primary source more than a passing mention; a bank-wide or full-client-base outcome more than a single client anecdote. The factors that move a claim's weight are visible — the case is only ever as strong as its evidence.
Comparative, not absolute. A score places a use case against its peers in the analyzed set. It measures weight of evidence, not value generated: a high score means a value case is well substantiated, not that the use case is the most valuable bet in absolute terms.
Silence is not absence. Where a peer hasn't disclosed a benefit, that is recorded as no evidence — never as evidence of none. Lower disclosure often reflects competitive sensitivity rather than absent value.
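
The weighting rules above can be sketched in code. This is a minimal illustration, not the framework's actual scoring: the four factors come from the text, but the multiplier values, the `Claim` type, the function names, and the sample claims are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical multipliers: the four factors come from the text, the values do not.
FACTOR_MULTIPLIERS = {
    "realized": 2.0,        # realized benefit rather than a planned one
    "quantified": 2.0,      # quantified figure rather than a directional phrase
    "primary_source": 1.5,  # named primary source rather than a passing mention
    "broad_scope": 1.5,     # bank-wide outcome rather than a single-client anecdote
}


@dataclass(frozen=True)
class Claim:
    dimension: str
    realized: bool
    quantified: bool
    primary_source: bool
    broad_scope: bool


def claim_weight(claim: Claim) -> float:
    """Multiply a base weight of 1.0 by every factor the claim satisfies."""
    weight = 1.0
    for factor, multiplier in FACTOR_MULTIPLIERS.items():
        if getattr(claim, factor):
            weight *= multiplier
    return weight


def dimension_evidence(claims: Iterable[Claim], dimension: str) -> Optional[float]:
    """Strongest claim weight for a dimension; None means no evidence, not zero value."""
    weights = [claim_weight(c) for c in claims if c.dimension == dimension]
    return max(weights) if weights else None


# Made-up claims for illustration only.
claims = [
    Claim("labor", realized=True, quantified=True, primary_source=True, broad_scope=True),
    Claim("speed", realized=False, quantified=False, primary_source=False, broad_scope=False),
]
```

Returning `None` rather than `0.0` for an undisclosed dimension keeps "no evidence" distinct from "evidence of none", which is the point of the third rule.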

The worked read: Treasury Forecasting

Across the six sample use cases, drawing on 111 publicly disclosed benefit claims, treasury forecasting is the most broadly evidenced value case, scoring 85. Its disclosed benefits concentrate in operational efficiency and analytical depth: across four major US-bank deployments — Bank of America's CashPro, JPMorgan's Cash Flow Intelligence, PNC's PINACLE, and Citi's Cashforce / TIS — the consistent public claims are about labor saved and forecasts that reach further and see more. What the 85 says is narrow, and worth stating plainly: of these six functions, treasury forecasting's value is the best documented — not that it is the most valuable bet a given bank could make.

Value evidenced · Treasury Forecasting

85 / 100

Value Evidenced Score

Most broadly evidenced of the six sample use cases.

Higher means more comprehensively evidenced across peers — not more valuable in absolute terms.

Strongest disclosed result

~250,000 client hours saved in 2025

across 3,000+ corporate clients · Bank of America CashPro

A well-evidenced use case can still be costly to run safely. The next three sections read those costs — beginning with the one that most often separates a back-office tool from a customer-facing system: regulatory intensity.

02 Regulatory intensity

What drives regulatory burden isn't AI sophistication — it's whether the system decides about a person

The framework's second question — which regimes apply, and how heavily — turns less on the technology than on what the use case does with its outputs, and on whether its exposure runs through supervisory examination or statutory penalty.

The instinct is to count the laws on the books, or to assume the most advanced AI draws the most scrutiny. Both mislead. A use case's regulatory weight is set by what it does — and the sharpest dividing line is whether the system reaches a decision about a person (their credit, their eligibility, their treatment) or only produces analysis a human acts on. Cross that line and consumer-protection, fair-lending, and the new state AI statutes engage; stay on the analytical side and most of them fall away.

A second distinction matters as much. Not every applicable rule bites the same way. Some carry statutory penalties, attorney-general actions, and private rights of action; others run entirely through supervisory examination — matters requiring attention, consent orders, heightened oversight — with no direct fine attached. A use case can be subject to many instruments and still carry light intensity if its exposure is supervisory and its workflow analytical.

What the read builds

For each use case, the framework constructs a regulatory surface by asking three things of every instrument:

Does it apply to this workflow? Applicability follows what the use case does — not where the bank operates, and not the mere presence of AI. A law the workflow never triggers is not an obligation.
How directly does it bear? Each applicable instrument is rated High, Medium, Low, or Not relevant by how central its obligations are to the use case's typical workflow.
How hard would it bite? Enforcement character — supervisory or statutory, penalty type and severity — and jurisdictional reach, from subnational to multi-country, including whether obligations extend across borders.

This runs across 31 instruments in four jurisdictions, anchored on the 14 U.S. instruments that bear on U.S. institutions. The instruments span nine regulatory domains: AI-specific, consumer protection and fair lending, data privacy and security, financial crime and payments, market conduct and trading, model risk, operational resilience, tech governance, and third-party and outsourcing.

One distinction does quiet but important work: how directly an instrument applies to a workflow is scored separately from how hard it would bite if violated. Keeping the two apart stops a heavy-penalty law that a use case barely touches from masquerading as a central obligation — a common way regulatory heat-maps mislead.
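
One way to picture that separation is a record that stores relevance and enforcement character as independent fields, so ranking by one never borrows weight from the other. The sketch below is an assumption-laden illustration: the type names, the numeric ordering of the relevance levels, and the enforcement labels attached to each instrument are hypothetical, and only the relevance ratings are taken from the treasury forecasting read on this page.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import List


class Relevance(IntEnum):
    NOT_RELEVANT = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Enforcement(Enum):
    SUPERVISORY = "supervisory"
    STATUTORY = "statutory"


@dataclass(frozen=True)
class InstrumentRead:
    name: str
    relevance: Relevance      # how directly the instrument bears on the workflow
    enforcement: Enforcement  # how it would bite; stored separately, never blended in


def central_obligations(reads: List[InstrumentRead]) -> List[str]:
    """Rank applicable instruments by relevance alone, ignoring enforcement severity."""
    applicable = [r for r in reads if r.relevance != Relevance.NOT_RELEVANT]
    applicable.sort(key=lambda r: r.relevance, reverse=True)
    return [r.name for r in applicable]


# Relevance ratings follow the treasury forecasting read; enforcement labels are illustrative.
reads = [
    InstrumentRead("ECOA/Reg B", Relevance.NOT_RELEVANT, Enforcement.STATUTORY),
    InstrumentRead("SR 26-2", Relevance.MEDIUM, Enforcement.SUPERVISORY),
    InstrumentRead("Interagency TPRM guidance", Relevance.HIGH, Enforcement.SUPERVISORY),
]
```

Because the ranking reads only the `relevance` field, a statutory instrument the workflow never triggers cannot surface as a central obligation.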

U.S. regulatory relevance · Treasury Forecasting

Regime family	Instrument(s)	Relevance
Third-party & outsourcing risk	Interagency TPRM guidance · primary driver	High
Model risk	SR 26-2	Medium
Tech governance	FFIEC IT Examination Handbook · OCC Heightened Standards	Low
Financial crime & payment systems	BSA / AML	Not relevant
AI-specific state laws	Colorado ADMT Act (SB 26-189) · Texas TRAIGA · Utah AIPA	Not relevant
Consumer protection & fair lending	FCRA/Reg V · ECOA/Reg B · UDAAP	Not relevant
Data privacy & security	GLBA · CCPA/CPRA (+ CPPA ADMT) · Illinois BIPA	Not relevant

Relevance to the typical workflow of AI in treasury forecasting. Coverage spans 31 instruments across four jurisdictions; the 14 U.S. instruments are shown. Decision-support — counsel required for definitive applicability and compliance determinations.

Relevance is rated separately from severity — how directly each instrument bears on the workflow, ordered most-relevant first. Oxblood marks the single High row.

The worked read: Treasury Forecasting

At a regulatory intensity of 50, treasury forecasting carries the lightest profile of the six sample use cases — and lighter in kind, not only in degree. Because it produces analysis rather than decisions about people, the consumer-facing regimes that drive intensity elsewhere simply don't engage: fair-lending and consumer-protection law (ECOA/Reg B, FCRA/Reg V, UDAAP) and the new state AI statutes (the Colorado ADMT Act, Texas's TRAIGA, Utah's AIPA) all read as Not relevant to its workflow. What remains concentrates in three supervisory regimes — third-party and outsourcing risk (Interagency TPRM, the primary driver), model risk (SR 26-2), and tech governance (the FFIEC IT Handbook and OCC Heightened Standards). Notably, SR 26-2 — which replaced SR 11-7 in 2026 and expressly excludes generative and agentic AI — sits at Medium for a conventional-ML forecasting tool, not High. And all of this exposure runs through examination, not statute: matters requiring attention and supervisory remediation, not direct fines, attorney-general actions, or private litigation. The result is a profile qualitatively different from a customer-facing system like credit underwriting, where exposure compounds across consumer-protection, fair-lending, privacy, and model-risk regimes at once.

Regulatory intensity · Treasury Forecasting

50 / 100

Regulatory Intensity Score

The lightest profile of the six sample use cases.

Higher means heavier regulatory intensity on the typical workflow — not a measure of a bank's compliance, and not legal advice.

Character of exposure

Supervisory examination, not statutory penalty

Consumer-protection and fair-lending law does not engage

Knowing which rules bite is one cost; running and governing the system that satisfies them is another. The next section reads the second of those costs — operating intensity.

03 Operating model intensity

The build is a one-time cost; running and governing an AI system is permanent — and it's where realized value quietly leaks away

The question most business cases under-price: what does it cost to run and govern this system, day after day — and where in the bank does that cost land?

The temptation is to treat an AI use case as a build — scope it, ship it, book the benefit. But the build is the cheap part. Running and governing the system is permanent, and cross-functional in a way a one-time project is not: it draws continuously on model risk, data operations, and governance. It is also where evidenced value quietly erodes — a use case that looked attractive on its business case can underdeliver because the operating intensity was never priced in. And that burden is unevenly distributed: it concentrates in particular categories, often shaped by what regulation and the technology demand rather than freely chosen. Knowing where it concentrates is what lets a bank budget for it, rather than discover it after the system is live.

What the read builds

The framework characterizes how a system is actually built, run, and governed — across 14 operating-model categories, grouped into four layers:

Algorithmic design — model architecture, and the learning paradigm.
Data and feature lifecycle — feature engineering, training-data scope, and retraining cadence.
Operational behavior — inference mode, human-in-the-loop controls, exception handling, and how outputs feed downstream decisions.
Governance and trust — explainability, fairness controls, model validation, data-quality controls, and data privacy.

Operating-model intensity · six sample use cases

	Algorithmic Trading	Treasury Forecasting
Algorithmic design
Model architecture
Paradigm type
Data & feature lifecycle
Feature engineering
Training data
Model refresh
Operational behavior
Inference mode		—
Human-in-the-loop
Exception handling
Output integration		—
Governance & trust
Explainability
Fairness & discrimination	—	—
Model validation
Data controls
Data privacy	—

Legend (cell shading): High · Moderate · Limited · Low / N/A

Intensity of the operating-model choice for each category, read across all six sample use cases from the Functional Operating Model table. Treasury Forecasting — the column carried through this page — is the disclosed / synthesized read.

The distinctive part is the method. Banks rarely publish their full operating model, so each category is characterized the way the rest of the framework works: by reading the public record first, and where the record stops, applying a single labeled inference — never presented as fact. Three inference types do that work: a technical implication (an architecture or control the system must have to work as described), a regulatory expectation (a control a regulated bank would be expected to operate), or a domain-typical practice (what banks running this kind of AI conventionally do). Every characterization carries its tag, so a reader can see exactly where disclosure ends and inference begins.

Reading the operating model · JPM Cash Flow Intelligence

Category	Characterization · basis	Determined by
Human-in-the-loop	Pre-decision review · Liquidity decisions remain with human teams	Disclosed
Output integration	Informational / analytical · Dashboards and forecasts, not automated execution	Disclosed
Model validation	Independent formal validation · SR 26-2 expects it for a material model	Regulatory expectation
Retraining	Calendar-scheduled · Enterprise treasury forecasting retrains on cycle	Domain-typical
Data controls	Automated validation · Automated reconciliation implies validation at ingestion	Technical implication

How disclosure ends and inference begins, shown per category. Inference is one of three labeled types — never presented as fact. Five categories shown; chosen characterizations only, not full option sets or scores.

Oxblood marks what the bank has disclosed; the muted tags mark where a single, labeled inference fills what the public record leaves open.

The standard

Four things govern the read:

Disclosure first. A category is characterized from the public record wherever the record speaks; inference fills only what disclosure leaves open.
Inference is always labeled. Every inferred characterization is tagged by type — technical implication, regulatory expectation, or domain-typical — and never presented as established fact.
Weighted by what matters. Categories that drive more governance burden count for more than those that don't — a materiality judgment applied consistently, so the score reflects where the real operating weight sits.
Comparative, decision-support. The score positions a use case against its peers; it measures operating intensity, not a bank's operating maturity.
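
The labeling rule lends itself to a simple data structure in which a characterization cannot exist without its basis. The sketch below is illustrative only: the `Basis` and `Characterization` names and the `render` format are assumptions, while the five sample rows repeat the JPM Cash Flow Intelligence read shown earlier in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    DISCLOSED = "Disclosed"
    TECHNICAL_IMPLICATION = "Technical implication"
    REGULATORY_EXPECTATION = "Regulatory expectation"
    DOMAIN_TYPICAL = "Domain-typical"


@dataclass(frozen=True)
class Characterization:
    category: str
    value: str
    basis: Basis  # required field: no characterization without a tag

    @property
    def is_inferred(self) -> bool:
        return self.basis is not Basis.DISCLOSED

    def render(self) -> str:
        """Show the tag next to the value so inference is never read as fact."""
        return f"{self.category}: {self.value} [{self.basis.value}]"


read = [
    Characterization("Human-in-the-loop", "Pre-decision review", Basis.DISCLOSED),
    Characterization("Output integration", "Informational / analytical", Basis.DISCLOSED),
    Characterization("Model validation", "Independent formal validation", Basis.REGULATORY_EXPECTATION),
    Characterization("Retraining", "Calendar-scheduled", Basis.DOMAIN_TYPICAL),
    Characterization("Data controls", "Automated validation", Basis.TECHNICAL_IMPLICATION),
]

inferred_categories = [c.category for c in read if c.is_inferred]
```

Making `basis` a required field means the point where disclosure ends and inference begins is recoverable from the data itself, not from a footnote.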

The worked read: Treasury Forecasting

At an operating intensity of 56, treasury forecasting carries the lightest profile of the six sample use cases — most of its run-and-govern choices sit at the cohort floor: ensemble models on scheduled-batch inference, pre-decision human review, and informational output that doesn't trigger automated action. But "light overall" hides where the weight sits. Its operating intensity concentrates in one category — data sourcing: the multi-source external integration the forecasts depend on, pulling from client ERP systems, external bank accounts, and internal systems, each a system of record that tolerates no latency or reconciliation gaps. And a few governance categories are shaped by regulation rather than choice — model validation, where SR 26-2 expects independent validation for a material model, and data privacy, because personal data flows through the pipeline. The lesson generalizes: even a light use case carries a specific, locatable operating intensity, and the value of the read is knowing exactly where it sits before the system is live.

Operating model intensity · Treasury Forecasting

56 / 100

Operating Model Intensity Score

The lightest profile of the six sample use cases.

Higher means a heavier ongoing burden to run and govern — not a measure of a bank's operating maturity, and not legal advice.

Where the weight sits

Multi-source data integration

Light to run in most categories; the burden concentrates in data sourcing

Running the system is one cost; the difficulty of building it in the first place is another. The final section reads technical complexity.

04 Technical complexity

Build difficulty isn't one thing — frontier AI is often cheaper to build than conventional models wired into hard systems

The framework's final question — which AI components does this rely on, and how hard are they to build and integrate — decomposes build difficulty across the model, the data, and the integration, because the same score can come from very different places.

The temptation is to read “AI complexity” as a single dial: the more advanced the model, the harder the build. It doesn't hold. Build difficulty has at least three separable sources, and for a great deal of banking AI the model is the cheapest of them — a conventional, well-understood algorithm can be far less work to stand up than the data pipeline that feeds it or the systems of record it has to integrate with. The framework decomposes build difficulty into its sources so a bank can see where the cost actually sits, and so a use case isn't waved through on the strength of “frontier AI” that is, in build terms, lighter than a conventional model wired into a treasury management system.

What the read builds

The framework reads three components:

AI sophistication — which technologies the system uses, mapped against a taxonomy of 164 AI technologies organized in three layers: core AI (the algorithms themselves), applied AI (forecasting, anomaly detection, language, vision), and lifecycle and infrastructure (model development, monitoring, and deployment).
Data complexity — how many and which data modalities the system works across, from structured tables and time series through graph relationships, text, and beyond.
Integration depth — how many systems the model must connect to, and how deep those connections run.

Two moves keep the read honest. It records not only what a system uses but what it notably does not — the absence of large language models, graph networks, or unstructured data is part of the profile, and recording it stops a conventional system from being scored as if it were frontier. And where disclosure stops, the same three labeled inference types fill the gap — a technical implication, a regulatory expectation, or a domain-typical practice — each tagged, never presented as fact.

AI components · Treasury Forecasting

AI components
Disclosed: Ensemble methods · Time-series forecasting · Scenario simulation · AutoML · Drift monitoring
Inferred: Neural networks (domain-typical) · Anomaly detection (technical implication)
Notably absent: Large language models · Generative AI · Graph neural networks · Federated learning · Computer vision · NLP

Data modalities
Disclosed: Structured / tabular · Time-series / temporal
Absent: Graph / network · Text · Image · Video · Voice · Code · Geospatial

Integration
Disclosed: Multi-bank ingestion · ERP (SAP) · TMS

Components from public disclosure; inference labeled by type; absences recorded. Treasury Forecasting; chosen components shown, not the per-component complexity weights or score.

Recording what a system notably lacks — no large language models, generative AI, or graph networks — is what stops a conventional build from being scored as if it were frontier.

The standard

Four things govern the read:

Disclosure first, absence recorded. Components come from the public record; what a system notably lacks is recorded alongside what it has.
Inference always labeled. Each inferred component is tagged by type and never presented as established fact.
Weighted by complexity. Each technology, modality, and integration carries a complexity weight, so the score reflects genuine build difficulty rather than a count of components — a principle applied consistently, with no values shown.
Comparative, decision-support. The score positions a use case against its peers; it measures build and integration difficulty, not a bank's engineering capability.
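
The difference between counting components and weighting them can be shown with a small sketch. Everything numeric here is an assumption: the framework applies per-component weights but shows no values, so the weights below, the function name, and the two-component "frontier" profile used for contrast are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical weights: the framework applies per-component weights but shows no values.
COMPLEXITY_WEIGHTS: Dict[str, float] = {
    "ensemble methods": 1.0,
    "time-series forecasting": 1.0,
    "scenario simulation": 1.0,
    "automl": 1.0,
    "drift monitoring": 1.0,
    "large language models": 4.0,
    "graph neural networks": 4.0,
}


def weighted_complexity(components: Iterable[str], default: float = 1.0) -> float:
    """Sum per-component weights; components without a listed weight get the default."""
    return sum(COMPLEXITY_WEIGHTS.get(name, default) for name in components)


# The five AI components disclosed for treasury forecasting.
conventional = [
    "ensemble methods",
    "time-series forecasting",
    "scenario simulation",
    "automl",
    "drift monitoring",
]

# A made-up contrasting profile, not one of the six sample use cases.
frontier = ["large language models", "graph neural networks"]
```

Under these assumed weights the conventional profile has more components but a lower weighted total, which is the distortion a plain component count would introduce.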

The worked read: Treasury Forecasting

At a technical complexity of 40, treasury forecasting is the lightest build of the six sample use cases — and the decomposition shows why. Its disclosed AI is conventional and mature: ensemble methods, time-series forecasting, scenario simulation, AutoML, and drift monitoring, with neural networks a domain-typical inference and anomaly detection a technical implication. The frontier categories that lift other functions are notably absent — no large language models, no generative AI, no graph neural networks, no federated or edge learning. The data layer is narrow: two modalities, structured transaction records and time series, with none of the graph relationships or text feeds that make compliance functions data-heavy. What complexity exists lives almost entirely in the integration — multi-bank ingestion, ERP (SAP), and treasury management systems: few system classes, but each a system of record that tolerates no latency or error. The lesson is the one the score is built to surface: the build is light precisely because the model is conventional and the data narrow, and the engineering effort that remains is robust integration, not the AI. It is the same picture the operating read showed — the weight sits in the data plumbing, not the model.

Technical complexity · Treasury Forecasting

40 / 100

Tech Complexity Score

The lightest build of the six sample use cases.

Higher means greater build and integration difficulty — not a measure of a bank's engineering capability, and not legal advice.

Where the cost sits

Integration, not the model

Conventional AI on a narrow data layer; few system classes, each a system of record

That completes the read. With value set against the three costs of realizing it (regulatory, operating, and technical), a use case can be placed the way any other claim on capital is, separating the well-evidenced, light-to-build cases that earn capital first from the ones whose value is outweighed by what it takes to build, govern, and sustain. Producing that placement, on common terms, is what the framework is for.

Deciding where to start with AI — or where to take it next?

A structured, evidence-based read can inform how those decisions get made.
