Put AI investment through the same diligence as every other capital decision
Parallax Intel scores banking AI use cases on common terms — the value peers have evidenced, set against the cost to build, run, and govern — so a bank can compare unlike use cases, its first or its next, the way an investment committee weighs any other claim on capital.
The question facing most regional banks isn't whether to invest in AI — it's where to start, and where not to. Whether a bank is weighing its first use cases or sequencing a portfolio already underway, the functions on the table — from back-office efficiency tools to customer-facing decision systems — carry different payoffs and different costs to run safely, and there is rarely a consistent basis for setting one against another. Without common criteria, capital tends to follow the most visible use case rather than the most valuable.
The framework supplies that basis. It asks four questions of every use case and answers each on the same scale:
- 01 Value evidenced: Where has value actually been disclosed in comparable deployments, and how does it compare to the alternatives?
- 02 Regulatory intensity: Which regimes apply to this use case, and how heavily do they bear on it?
- 03 Operating intensity: What is the ongoing burden to run and govern it, and where in the organization does that burden fall?
- 04 Technical complexity: Which AI components does it rely on, and how difficult are they to build and integrate?
[Figure: The four-dimensional read. Return, set against cost & risk and operational burden, produces a single read.]
One of the four questions measures return; the other three measure cost and risk. Read together, they separate use cases that are well-evidenced and light to run from those whose value is thin, or whose ongoing burden quietly erodes it — the distinction that should drive where AI capital goes first, and where it does not go yet.
Two commitments hold the read together:
- Evidence over assertion. Value, operating, and technology characteristics come from what peers have publicly disclosed — not from vendor claims or conjecture. Where the public record is silent, every gap is filled with a labeled inference, never presented as fact, and the strongest source available anchors each conclusion.
- Comparative, not absolute. A score places a use case against its peers in the analyzed set. It is a decision-support benchmark for prioritization — not an institutional performance measure, and not legal advice.
Coverage: 28 functions · 31 instruments · 14 categories · 164 technologies
Behind those four questions is a continuously maintained body of analysis. Coverage runs to 28 banking functions across the whole bank, 14 operating-model categories spanning how a system is designed, fed with data, run, and governed, and 164 mapped AI technologies. The regulatory layer tracks 31 legal instruments across four jurisdictions — 14 U.S. instruments anchoring the analysis for U.S. institutions, plus 17 across Canada, the UK, and the EU — and is kept current as the rules move, from emerging state AI statutes (Texas's TRAIGA, Utah's AIPA, Colorado's ADMT Act) to the 2026 supersession of federal model-risk supervision (SR 11-7 → SR 26-2).
The four sections below take each question in turn — what it measures, what evidence feeds it, and what a finished read looks like — using AI in treasury forecasting as the worked example throughout.
AI value is easy to claim and hard to evidence
The first question the framework asks of any use case is whether its value is real — measured not by what a vendor promises, but by what peer banks have actually put on the public record.
Every vendor deck and internal champion has reason to round up: pilots get reported as wins, and “AI-powered” gets attached to outcomes it didn't drive. The anchor that holds under scrutiny is the public record — what a peer bank, or the vendor it named, has stated openly, with a number where one exists. The read starts there, and treats a benefit no one has disclosed as missing evidence, not as proof the value isn't real.
The read has two parts: how widely a use case has evidenced value, and how deep its single best result runs.
Breadth
Breadth asks whether benefit has been disclosed across the kinds of value a bank actually weighs, rather than in one place only. It spans nine benefit dimensions, grouped into four categories:
Commercial impact
Financial impact (P&L, cost or loss avoidance), customer impact, and adoption and reach.
Operational efficiency
Labor and effort saved, and processing speed.
Model & process quality
Accuracy and error rates, and analytical depth — how much more the system can see, through longer horizons, wider scenario coverage, or broader signals.
Risk & compliance
Measurable risk reduction, and improvement in the institution's regulatory standing.
A use case with disclosed benefit across many of these reads as more comprehensively proven than one resting on a single claim.
Disclosed benefit by dimension across the six sample use cases. Dimensions: Financial, Customer, Adoption (commercial impact); Labor, Speed (operational efficiency); Quality, Depth (model & process quality); Risk, Compliance (risk & compliance). A dash marks a dimension with no disclosed benefit.

| Use case | Dimensions marked with a dash (of nine) |
|---|---|
| Treasury Forecasting | 3 |
| AML | 4 |
| Credit Underwriting | 2 |
| Fraud Detection | 5 |
| Algorithmic Trading | 6 |
| Pricing Optimization | 7 |
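As a rough illustration of the breadth read, the sketch below counts how many of the nine dimensions carry at least one disclosed benefit, overall and per category. The dimension labels follow the list above; the function name `breadth` and the example set of disclosed dimensions are hypothetical and are not the framework's actual method or data.

```python
# Nine benefit dimensions grouped into four categories, as listed above.
DIMENSIONS = {
    "Commercial impact": ["Financial", "Customer", "Adoption"],
    "Operational efficiency": ["Labor", "Speed"],
    "Model & process quality": ["Quality", "Depth"],
    "Risk & compliance": ["Risk", "Compliance"],
}


def breadth(disclosed):
    """Count dimensions with at least one disclosed benefit.

    `disclosed` is a set of dimension names. A dimension missing from the
    set means "no evidence", not evidence that the value is absent.
    """
    known = {d for dims in DIMENSIONS.values() for d in dims}
    unknown = set(disclosed) - known
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    per_category = {
        category: sum(1 for d in dims if d in disclosed)
        for category, dims in DIMENSIONS.items()
    }
    return sum(per_category.values()), per_category


# Hypothetical use case with disclosed benefit in four dimensions.
example = {"Labor", "Speed", "Depth", "Financial"}
total, by_category = breadth(example)
print(f"{total} of 9 dimensions evidenced")
print(by_category)
```

Keeping the per-category counts alongside the total makes it visible when a broad-looking case is in fact concentrated in one category.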
Depth
Depth is the single strongest result on the record. Breadth can flatter a use case with many small, well-reported claims; depth surfaces the one most consequential outcome a peer has actually disclosed. The two are read together because they tell different stories: a broad-but-shallow case, where benefit is shown widely but modestly, is a different proposition from a narrow-but-exceptional one, where coverage is thin but a single result stands out.
The standard
Three things govern what counts:
- Disclosed, and weighted by how well. A realized benefit weighs more than a planned one; a quantified figure more than a directional phrase; a named primary source more than a passing mention; a bank-wide or full-client-base outcome more than a single client anecdote. The factors that move a claim's weight are visible — the case is only ever as strong as its evidence.
- Comparative, not absolute. A score places a use case against its peers in the analyzed set. It measures weight of evidence, not value generated: a high score means a value case is well substantiated, not that the use case is the most valuable bet in absolute terms.
- Silence is not absence. Where a peer hasn't disclosed a benefit, that is recorded as no evidence — never as evidence of none. Lower disclosure often reflects competitive sensitivity rather than absent value.
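The three rules above can be sketched in a few lines. The source gives only the ordering of the weighting factors (realized over planned, quantified over directional, and so on), so the multipliers in `FACTORS`, the `Claim` type, and the example claims below are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical multipliers as (stronger form, weaker form). The source states
# the ordering of each factor but not any numeric values.
FACTORS = {
    "realized": (1.0, 0.5),        # realized vs. planned benefit
    "quantified": (1.0, 0.6),      # quantified figure vs. directional phrase
    "primary_source": (1.0, 0.7),  # named primary source vs. passing mention
    "bank_wide": (1.0, 0.6),       # bank-wide outcome vs. single-client anecdote
}


@dataclass(frozen=True)
class Claim:
    dimension: str
    realized: bool
    quantified: bool
    primary_source: bool
    bank_wide: bool

    def weight(self) -> float:
        """Multiply the factor for each attribute into a single weight."""
        w = 1.0
        for name, (strong, weak) in FACTORS.items():
            w *= strong if getattr(self, name) else weak
        return w


def strongest_by_dimension(claims, dimensions):
    """Best claim weight per dimension; None records "no evidence"."""
    best = {d: None for d in dimensions}
    for c in claims:
        if best[c.dimension] is None or c.weight() > best[c.dimension]:
            best[c.dimension] = c.weight()
    return best


claims = [
    Claim("Labor", realized=True, quantified=True, primary_source=True, bank_wide=True),
    Claim("Labor", realized=False, quantified=False, primary_source=True, bank_wide=True),
    Claim("Speed", realized=True, quantified=False, primary_source=False, bank_wide=False),
]
result = strongest_by_dimension(claims, ["Labor", "Speed", "Risk"])
print(result)
```

The `None` for a dimension with no claims is deliberate: it keeps "no evidence" distinct from a weight of zero, which would read as evidence of no value.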
The worked read: Treasury Forecasting
Across the six sample use cases, drawing on 111 publicly disclosed benefit claims, treasury forecasting is the most broadly evidenced value case, scoring 85. Its disclosed benefits concentrate in operational efficiency and analytical depth: across four major US-bank deployments — Bank of America's CashPro, JPMorgan's Cash Flow Intelligence, PNC's PINACLE, and Citi's Cashforce / TIS — the consistent public claims are about labor saved and forecasts that reach further and see more. What the 85 says is narrow, and worth stating plainly: of these six functions, treasury forecasting's value is the best documented — not that it is the most valuable bet a given bank could make.
Value evidenced · Treasury Forecasting
Most broadly evidenced of the six sample use cases.
Higher means more comprehensively evidenced across peers — not more valuable in absolute terms.
Strongest disclosed result: ~250,000 client hours saved in 2025
A well-evidenced use case can still be costly to run safely. The next three sections read those costs — beginning with the one that most often separates a back-office tool from a customer-facing system: regulatory intensity.
What drives regulatory burden isn't AI sophistication — it's whether the system decides about a person
The framework's second question — which regimes apply, and how heavily — turns less on the technology than on what the use case does with its outputs, and on whether its exposure runs through supervisory examination or statutory penalty.
The instinct is to count the laws on the books, or to assume the most advanced AI draws the most scrutiny. Both mislead. A use case's regulatory weight is set by what it does — and the sharpest dividing line is whether the system reaches a decision about a person (their credit, their eligibility, their treatment) or only produces analysis a human acts on. Cross that line and consumer-protection, fair-lending, and the new state AI statutes engage; stay on the analytical side and most of them fall away.
A second distinction matters as much. Not every applicable rule bites the same way. Some carry statutory penalties, attorney-general actions, and private rights of action; others run entirely through supervisory examination — matters requiring attention, consent orders, heightened oversight — with no direct fine attached. A use case can be subject to many instruments and still carry light intensity if its exposure is supervisory and its workflow analytical.
What the read builds
For each use case, the framework constructs a regulatory surface by asking three things of every instrument:
- Does it apply to this workflow? Applicability follows what the use case does — not where the bank operates, and not the mere presence of AI. A law the workflow never triggers is not an obligation.
- How directly does it bear? Each applicable instrument is rated High, Medium, Low, or Not relevant by how central its obligations are to the use case's typical workflow.
- How hard would it bite? Enforcement character — supervisory or statutory, penalty type and severity — and jurisdictional reach, from subnational to multi-country, including whether obligations extend across borders.
This runs across 31 instruments spanning nine regulatory domains and four jurisdictions, anchored on the 14 U.S. instruments that bear on U.S. institutions: AI-specific, consumer protection and fair lending, data privacy and security, financial crime and payments, market conduct and trading, model risk, operational resilience, tech governance, and third-party and outsourcing.
One distinction does quiet but important work: how directly an instrument applies to a workflow is scored separately from how hard it would bite if violated. Keeping the two apart stops a heavy-penalty law that a use case barely touches from masquerading as a central obligation — a common way regulatory heat-maps mislead.
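A minimal sketch of that separation, assuming a simple record per instrument: relevance and enforcement character are stored as independent fields, so neither can stand in for the other. The SR 26-2 and ECOA/Reg B relevance ratings are the ones given in the treasury forecasting read; the class names, the helper functions, and "Hypothetical statute X" are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Relevance(Enum):
    NOT_RELEVANT = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Enforcement(Enum):
    SUPERVISORY = "supervisory"  # examination findings, consent orders
    STATUTORY = "statutory"      # penalties, attorney-general or private actions


@dataclass(frozen=True)
class InstrumentRead:
    instrument: str
    relevance: Relevance      # how directly it bears on the workflow
    enforcement: Enforcement  # how it would bite if violated


def central_obligations(reads):
    """Instruments that bear directly on the workflow, whatever their penalty."""
    return [r.instrument for r in reads if r.relevance.value >= Relevance.MEDIUM.value]


def statutory_exposure(reads):
    """Applicable instruments whose enforcement is statutory."""
    return [
        r.instrument
        for r in reads
        if r.relevance is not Relevance.NOT_RELEVANT
        and r.enforcement is Enforcement.STATUTORY
    ]


reads = [
    # Relevance ratings as stated in the treasury forecasting read.
    InstrumentRead("SR 26-2", Relevance.MEDIUM, Enforcement.SUPERVISORY),
    InstrumentRead("ECOA/Reg B", Relevance.NOT_RELEVANT, Enforcement.STATUTORY),
    # Hypothetical instrument: heavy penalty, barely touched by the workflow.
    InstrumentRead("Hypothetical statute X", Relevance.LOW, Enforcement.STATUTORY),
]
print(central_obligations(reads))
print(statutory_exposure(reads))
```

The hypothetical statute shows up as statutory exposure but not as a central obligation, which is the distinction a single blended heat-map score would lose.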
U.S. regulatory relevance · Treasury Forecasting
Relevance to the typical workflow of AI in treasury forecasting. Coverage spans 31 instruments across four jurisdictions; the 14 U.S. instruments are shown. Decision-support — counsel required for definitive applicability and compliance determinations.
The worked read: Treasury Forecasting
At a regulatory intensity of 50, treasury forecasting carries the lightest profile of the six sample use cases — and lighter in kind, not only in degree. Because it produces analysis rather than decisions about people, the consumer-facing regimes that drive intensity elsewhere simply don't engage: fair-lending and consumer-protection law (ECOA/Reg B, FCRA/Reg V, UDAAP) and the new state AI statutes (the Colorado ADMT Act, Texas's TRAIGA, Utah's AIPA) all read as Not relevant to its workflow. What remains concentrates in three supervisory regimes — third-party and outsourcing risk (Interagency TPRM, the primary driver), model risk (SR 26-2), and tech governance (the FFIEC IT Handbook and OCC Heightened Standards). Notably, SR 26-2 — which replaced SR 11-7 in 2026 and expressly excludes generative and agentic AI — sits at Medium for a conventional-ML forecasting tool, not High. And all of this exposure runs through examination, not statute: matters requiring attention and supervisory remediation, not direct fines, attorney-general actions, or private litigation. The result is a profile qualitatively different from a customer-facing system like credit underwriting, where exposure compounds across consumer-protection, fair-lending, privacy, and model-risk regimes at once.
Regulatory intensity · Treasury Forecasting
The lightest profile of the six sample use cases.
Higher means heavier regulatory intensity on the typical workflow — not a measure of a bank's compliance, and not legal advice.
Character of exposure: Supervisory examination, not statutory penalty
Knowing which rules bite is one cost; running and governing the system that satisfies them is another. The next section reads the second of those costs — operating intensity.
The build is a one-time cost; running and governing an AI system is permanent — and it's where realized value quietly leaks away
The question most business cases under-price: what does it cost to run and govern this system, day after day — and where in the bank does that cost land?
The temptation is to treat an AI use case as a build — scope it, ship it, book the benefit. But the build is the cheap part. Running and governing the system is permanent, and cross-functional in a way a one-time project is not: it draws continuously on model risk, data operations, and governance. It is also where evidenced value quietly erodes — a use case that looked attractive on its business case can underdeliver because the operating intensity was never priced in. And that burden is unevenly distributed: it concentrates in particular categories, often shaped by what regulation and the technology demand rather than freely chosen. Knowing where it concentrates is what lets a bank budget for it, rather than discover it after the system is live.
What the read builds
The framework characterizes how a system is actually built, run, and governed — across 14 operating-model categories, grouped into four layers:
- Algorithmic design — model architecture, and the learning paradigm.
- Data and feature lifecycle — feature engineering, training-data scope, and retraining cadence.
- Operational behavior — inference mode, human-in-the-loop controls, exception handling, and how outputs feed downstream decisions.
- Governance and trust — explainability, fairness controls, model validation, data-quality controls, and data privacy.
Operating-model comparison across the six sample use cases: Algorithmic Trading, AML, Credit Underwriting, Fraud Detection, Pricing Optimization, and Treasury Forecasting.

| Layer | Categories compared |
|---|---|
| Algorithmic design | Model architecture; Paradigm type |
| Data & feature lifecycle | Feature engineering; Training data; Model refresh |
| Operational behavior | Inference mode; Human-in-the-loop; Exception handling; Output integration |
| Governance & trust | Explainability; Fairness & discrimination; Model validation; Data controls; Data privacy |
The distinctive part is the method. Banks rarely publish their full operating model, so each category is characterized the way the rest of the framework works: by reading the public record first, and where the record stops, applying a single labeled inference — never presented as fact. Three inference types do that work: a technical implication (an architecture or control the system must have to work as described), a regulatory expectation (a control a regulated bank would be expected to operate), or a domain-typical practice (what banks running this kind of AI conventionally do). Every characterization carries its tag, so a reader can see exactly where disclosure ends and inference begins.
Reading the operating model · JPM Cash Flow Intelligence
- Liquidity decisions remain with human teams
- Dashboards and forecasts, not automated execution
- SR 26-2 expects it for a material model
- Enterprise treasury forecasting retrains on cycle
- Automated reconciliation implies validation at ingestion
How disclosure ends and inference begins, shown per category. Inference is one of three labeled types — never presented as fact. Five categories shown; chosen characterizations only, not full option sets or scores.
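One way to make the labeling rule concrete is a record type in which an inferred characterization cannot exist without its tag. The sketch below is an assumption about how such a record might look, not the framework's implementation. The pairing of the example statements with category names and tags is likewise an illustrative reading; only the model-validation pairing is spelled out in the worked read.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InferenceType(Enum):
    TECHNICAL_IMPLICATION = "technical implication"
    REGULATORY_EXPECTATION = "regulatory expectation"
    DOMAIN_TYPICAL = "domain-typical practice"


@dataclass(frozen=True)
class Characterization:
    category: str
    statement: str
    inference: Optional[InferenceType] = None  # None means publicly disclosed

    def label(self) -> str:
        """Render the statement with its disclosure or inference tag."""
        if self.inference is None:
            tag = "disclosed"
        else:
            tag = f"inferred: {self.inference.value}"
        return f"{self.category}: {self.statement} [{tag}]"


# Category and tag assignments below are assumptions made for illustration.
records = [
    Characterization(
        "Model validation",
        "SR 26-2 expects it for a material model",
        InferenceType.REGULATORY_EXPECTATION,
    ),
    Characterization(
        "Data controls",
        "Automated reconciliation implies validation at ingestion",
        InferenceType.TECHNICAL_IMPLICATION,
    ),
    Characterization("Human-in-the-loop", "Liquidity decisions remain with human teams"),
]

inferred = [r for r in records if r.inference is not None]
for r in records:
    print(r.label())
print(f"{len(inferred)} of {len(records)} characterizations are inferred")
```

Because the tag travels with the record, any report built from these records can show where disclosure ends without extra bookkeeping.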
The standard
Four things govern the read:
- Disclosure first. A category is characterized from the public record wherever the record speaks; inference fills only what disclosure leaves open.
- Inference is always labeled. Every inferred characterization is tagged by type — technical implication, regulatory expectation, or domain-typical — and never presented as established fact.
- Weighted by what matters. Categories that drive more governance burden count for more than those that don't — a materiality judgment applied consistently, so the score reflects where the real operating weight sits.
- Comparative, decision-support. The score positions a use case against its peers; it measures operating intensity, not a bank's operating maturity.
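The effect of materiality weighting can be shown with a small weighted mean. The source does not publish its weights, intensity levels, or aggregation formula, so everything numeric below is hypothetical, and a weighted mean is only one plausible way to apply the principle.

```python
def weighted_intensity(levels, weights):
    """Weighted mean of per-category intensity levels, scaled to 0-100.

    `levels` maps category -> intensity level in [0, 1].
    `weights` maps category -> materiality weight (greater than zero).
    """
    if set(levels) != set(weights):
        raise ValueError("levels and weights must cover the same categories")
    total_weight = sum(weights.values())
    score = sum(levels[c] * weights[c] for c in levels) / total_weight
    return round(100 * score, 1)


# Hypothetical levels and weights for three of the 14 categories.
levels = {"Model validation": 0.8, "Inference mode": 0.2, "Explainability": 0.5}
weights = {"Model validation": 3.0, "Inference mode": 1.0, "Explainability": 2.0}

weighted = weighted_intensity(levels, weights)
unweighted = weighted_intensity(levels, {c: 1.0 for c in levels})
print(weighted, unweighted)
```

With these illustrative numbers the weighted score comes out higher than the plain average, because the heaviest category is also the one given the most weight.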
The worked read: Treasury Forecasting
At an operating intensity of 56, treasury forecasting carries the lightest profile of the six sample use cases. Most of its run-and-govern choices sit at the cohort floor: ensemble models on scheduled-batch inference, pre-decision human review, and informational output that doesn't trigger automated action. But "light overall" hides where the weight sits. Its operating intensity concentrates in one category, data sourcing: the multi-source external integration the forecasts depend on, pulling from client ERP systems, external bank accounts, and internal systems, each a system of record that tolerates no latency or reconciliation gaps. Two governance categories are shaped by regulation rather than choice: model validation, where SR 26-2 expects independent validation for a material model, and data privacy, because personal data flows through the pipeline. The lesson generalizes: even a light use case carries a specific, locatable operating intensity, and the value of the read is knowing exactly where it sits before the system is live.
Operating model intensity · Treasury Forecasting
The lightest profile of the six sample use cases.
Higher means a heavier ongoing burden to run and govern — not a measure of a bank's operating maturity, and not legal advice.
Where the weight sits: Multi-source data integration
Running the system is one cost; the difficulty of building it in the first place is another. The final section reads technical complexity.
Build difficulty isn't one thing — frontier AI is often cheaper to build than conventional models wired into hard systems
The framework's final question — which AI components does this rely on, and how hard are they to build and integrate — decomposes build difficulty across the model, the data, and the integration, because the same score can come from very different places.
The temptation is to read “AI complexity” as a single dial: the more advanced the model, the harder the build. It doesn't hold. Build difficulty has at least three separable sources, and for a great deal of banking AI the model is the cheapest of them — a conventional, well-understood algorithm can be far less work to stand up than the data pipeline that feeds it or the systems of record it has to integrate with. The framework decomposes build difficulty into its sources so a bank can see where the cost actually sits, and so a use case isn't waved through on the strength of “frontier AI” that is, in build terms, lighter than a conventional model wired into a treasury management system.
What the read builds
The framework reads three components:
- AI sophistication — which technologies the system uses, mapped against a taxonomy of 164 AI technologies organized in three layers: core AI (the algorithms themselves), applied AI (forecasting, anomaly detection, language, vision), and lifecycle and infrastructure (model development, monitoring, and deployment).
- Data complexity — how many and which data modalities the system works across, from structured tables and time series through graph relationships, text, and beyond.
- Integration depth — how many systems the model must connect to, and how deep those connections run.
Two moves keep the read honest. It records not only what a system uses but what it notably does not — the absence of large language models, graph networks, or unstructured data is part of the profile, and recording it stops a conventional system from being scored as if it were frontier. And where disclosure stops, the same three labeled inference types fill the gap — a technical implication, a regulatory expectation, or a domain-typical practice — each tagged, never presented as fact.
AI components · Treasury Forecasting
AI components
- Disclosed: Ensemble methods, Time-series forecasting, Scenario simulation, AutoML, Drift monitoring
- Inferred: Neural networks (domain-typical), Anomaly detection (technical implication)
- Notably absent: Large language models, Generative AI, Graph neural networks, Federated learning, Computer vision, NLP
Data modalities
- Disclosed: Structured / tabular, Time-series / temporal
- Absent: Graph / network, Text, Image, Video, Voice, Code, Geospatial
Integration
- Disclosed: Multi-bank ingestion, ERP (SAP), TMS
Components from public disclosure; inference labeled by type; absences recorded. Treasury Forecasting; chosen components shown, not the per-component complexity weights or score.
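The profile above can be held as a three-state record, so that an absence is stored explicitly instead of being implied by a missing entry. In the sketch below the component statuses come from the treasury forecasting profile; the `Status` enum, the `complexity` function, and every value in `WEIGHTS` are hypothetical, since the source shows no complexity weights.

```python
from enum import Enum


class Status(Enum):
    DISCLOSED = "disclosed"
    INFERRED = "inferred"
    ABSENT = "notably absent"


# AI components from the treasury forecasting profile above.
PROFILE = {
    "Ensemble methods": Status.DISCLOSED,
    "Time-series forecasting": Status.DISCLOSED,
    "Scenario simulation": Status.DISCLOSED,
    "AutoML": Status.DISCLOSED,
    "Drift monitoring": Status.DISCLOSED,
    "Neural networks": Status.INFERRED,
    "Anomaly detection": Status.INFERRED,
    "Large language models": Status.ABSENT,
    "Generative AI": Status.ABSENT,
    "Graph neural networks": Status.ABSENT,
    "Federated learning": Status.ABSENT,
    "Computer vision": Status.ABSENT,
    "NLP": Status.ABSENT,
}

# Hypothetical complexity weights, for illustration only.
WEIGHTS = {
    "Ensemble methods": 1.0, "Time-series forecasting": 1.0,
    "Scenario simulation": 1.0, "AutoML": 1.0, "Drift monitoring": 1.0,
    "Neural networks": 2.0, "Anomaly detection": 1.0,
    "Large language models": 4.0, "Generative AI": 4.0,
    "Graph neural networks": 3.0, "Federated learning": 3.0,
    "Computer vision": 2.0, "NLP": 2.0,
}


def complexity(profile, weights):
    """Sum the weights of components in use; recorded absences add nothing."""
    missing = [c for c in profile if c not in weights]
    if missing:
        raise KeyError(f"no weight for: {missing}")
    return sum(weights[c] for c, s in profile.items() if s is not Status.ABSENT)


in_use = [c for c, s in PROFILE.items() if s is not Status.ABSENT]
print(len(in_use), "components in use, weighted complexity", complexity(PROFILE, WEIGHTS))
```

Recording the absent components as entries in their own right means a reviewer can tell "checked and not used" apart from "never assessed".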
The standard
Four things govern the read:
- Disclosure first, absence recorded. Components come from the public record; what a system notably lacks is recorded alongside what it has.
- Inference always labeled. Each inferred component is tagged by type and never presented as established fact.
- Weighted by complexity. Each technology, modality, and integration carries a complexity weight, so the score reflects genuine build difficulty rather than a count of components — a principle applied consistently, with no values shown.
- Comparative, decision-support. The score positions a use case against its peers; it measures build and integration difficulty, not a bank's engineering capability.
The worked read: Treasury Forecasting
At a technical complexity of 40, treasury forecasting is the lightest build of the six sample use cases, and the decomposition shows why. Its disclosed AI is conventional and mature: ensemble methods, time-series forecasting, scenario simulation, AutoML, and drift monitoring, with neural networks a domain-typical inference and anomaly detection a technical implication. The frontier categories that lift other functions are notably absent: no large language models, no generative AI, no graph neural networks, no federated learning. The data layer is narrow: two modalities, structured transaction records and time series, with none of the graph relationships or text feeds that make compliance functions data-heavy. What complexity exists lives almost entirely in the integration, across multi-bank ingestion, ERP (SAP), and treasury management systems: few system classes, but each a system of record that tolerates no latency or error. The lesson is the one the score is built to surface: the build is light precisely because the model is conventional and the data narrow, and the engineering effort that remains is robust integration, not the AI. It is the same picture the operating read showed: the weight sits in the data plumbing, not the model.
Technical complexity · Treasury Forecasting
The lightest build of the six sample use cases.
Higher means greater build and integration difficulty — not a measure of a bank's engineering capability, and not legal advice.
Where the cost sits: Integration, not the model
That completes the read. With value set against the three costs of realizing it (regulatory, operating, and technical), a use case can be placed the way any other claim on capital is: among the well-evidenced, light-to-build cases that earn capital first, or among those whose value is outweighed by what it takes to build, govern, and sustain. Producing that placement, on common terms, is what the framework is for.
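The source does not say how the four scores are combined into a placement, so the sketch below uses a simple dominance check as a stand-in: one use case is preferred to another when it is at least as well evidenced, no heavier on any of the three costs, and strictly better on at least one score. The treasury forecasting scores are those from the worked reads; the `Read` type, the `dominates` function, and the comparison case are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    use_case: str
    value_evidenced: int       # return: higher is better evidenced
    regulatory_intensity: int  # cost and risk: higher is heavier
    operating_intensity: int
    technical_complexity: int

    def costs(self):
        return (
            self.regulatory_intensity,
            self.operating_intensity,
            self.technical_complexity,
        )


def dominates(a: Read, b: Read) -> bool:
    """True if `a` is no worse than `b` on all four scores and better on one."""
    no_worse = a.value_evidenced >= b.value_evidenced and all(
        x <= y for x, y in zip(a.costs(), b.costs())
    )
    better = a.value_evidenced > b.value_evidenced or any(
        x < y for x, y in zip(a.costs(), b.costs())
    )
    return no_worse and better


treasury = Read("Treasury Forecasting", 85, 50, 56, 40)  # scores from the worked reads
other = Read("Hypothetical use case", 70, 80, 75, 65)    # illustrative values only

print(dominates(treasury, other))
print(dominates(other, treasury))
```

A dominance check leaves many pairs unranked, for example a better-evidenced case that is also heavier to run. Those are the cases where an investment committee's judgment, not a formula, has to do the work.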
Deciding where to start with AI — or where to take it next?
A structured, evidence-based read can inform how those decisions get made.