textarcana · September 12, 2025 10:53
diff --git a/fintech-evals.md b/fintech-evals.md
@@ -0,0 +1,44 @@
+# Pre-Release and Training-Stage Model Evaluations in Finance
+
+This document summarizes validation practices and evaluation approaches applied to AI models in the financial sector **before deployment**. The focus is on validation during the training, testing, and pre-release stages.
+
+---
+
+## Key Practices & Metrics
+
+| Validation / Pre-release Activity | What’s Done / Measured | Why It Matters in Finance Context |
+|---|---|---|
+| **Data quality & integrity checks** | Detect missing values, outliers, and feature inconsistencies; stress-test individual data slices. | Financial models are highly sensitive to distribution shifts, and bad data leads to wrong risk, credit, and fraud predictions [milliman]. |
+| **Back-testing / Out-of-sample performance** | Run historical simulations; compare predictions against actual outcomes on data held out from training. | Checks that the model generalizes rather than overfitting to history; essential for risk and portfolio models [cfa]. |
+| **Cross-validation / Time-aware splits** | Use purged cross-validation or walk-forward testing so that training data never includes information from the test period (look-ahead bias). | Prevents overly optimistic results on time-series financial data [wiki-pcv]. |
+| **Hyperparameter tuning & model specification reviews** | Explore architectures, parameters, feature sets. | Balances bias/variance, stability, and risk of extreme errors [google-ml]. |
+| **Stress testing / Scenario analysis** | Evaluate under adverse conditions (e.g., downturns, shocks). | Core requirement for credit and market risk models [milliman]. |
+| **Fairness, bias, regulatory compliance checks** | Check group fairness and adherence to applicable regulations. | Reduces legal and regulatory exposure in lending, underwriting, etc. [empowered]. |
+| **Model explainability / interpretability** | Use explainability tools (feature attribution, local explanations). | Required for auditability and trust in regulated financial contexts [fiddler]. |
+| **Offline metrics linked to business KPIs** | Validate that accuracy, AUC, precision, etc. correlate with expected business outcomes. | Avoids models that look “good” technically but fail financially [google-ml]. |
+| **Gate / Release sign-off criteria** | Require the model to meet predefined thresholds in every category (edge cases, rare events, slices). | Provides a governance checkpoint before production [indium]. |
+| **Synthetic / simulation data evaluation** | Generate artificial or simulated market/fraud data to test rare events. | Helps evaluate resilience under tail risks [jpm-synth]. |
+
+---
+
+## Case Studies & Sector Examples
+
+- **AI Validator roles in banks**: Dedicated teams perform adversarial testing, subgroup fairness checks, and interpretability validation before release [fiddler].
+- **Finance Agent Benchmark**: Evaluates foundation models on finance-analyst tasks (retrieval, Q&A) as a *pre-use benchmark* [vals].
+- **CFA Investment Model Validation**: Professional guidance for validation in investment management (back-testing, hold-out analysis, governance) [cfa].
+- **Empowered Systems**: Outlines validation strategies tailored to financial services, including documentation, input validation, and governance [empowered].
+- **JP Morgan synthetic data**: Uses synthetic equity market data for safe, repeatable pre-deployment model testing [jpm-synth].
+
+---
+
+## References
+
+- [milliman]: https://www.milliman.com/insight/insurance/Effective-model-validation
+- [cfa]: https://rpc.cfainstitute.org/sites/default/files/-/media/documents/article/rf-brief/investment-model-validation.pdf
+- [wiki-pcv]: https://en.wikipedia.org/wiki/Purged_cross-validation
+- [google-ml]: https://research.google.com/pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf
+- [empowered]: https://empoweredsystems.com/blog/bigger-data-bigger-problems-ai-ml-model-validation-for-financial-firms
+- [fiddler]: https://www.fiddler.ai/blog/rise-of-banking-ai-validator
+- [vals]: https://www.vals.ai/benchmarks/finance_agent-04-22-2025
+- [indium]: https://www.indium.tech/blog/ai-enabled-metrics-for-release-decision
+- [jpm-synth]: https://www.jpmorganchase.com/about/technology/research/ai/synthetic-data
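
As a companion to the "Cross-validation / Time-aware splits" row in `fintech-evals.md`, here is a minimal sketch of walk-forward splitting with a gap between training and test windows. It is not taken from any of the cited sources: the function name `walk_forward_splits` and its parameters (`n_folds`, `test_size`, `embargo`) are hypothetical, and the fixed-size gap is a simplification. Purged cross-validation as described in [wiki-pcv] removes training samples whose label intervals overlap the test period, which this sketch only approximates by dropping a fixed number of samples before each test window.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int,
    n_folds: int,
    test_size: int,
    embargo: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over time-ordered samples.

    Every test window comes strictly after its training window, and the
    `embargo` samples immediately before the test window are left out of
    training so that overlapping labels cannot leak into the fit.
    """
    if n_folds < 1 or test_size < 1 or embargo < 0:
        raise ValueError("n_folds and test_size must be >= 1, embargo >= 0")

    # The test windows tile the tail of the series, oldest first.
    first_test_start = n_samples - n_folds * test_size
    if first_test_start - embargo < 1:
        raise ValueError("not enough samples for the requested folds")

    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        train_end = test_start - embargo  # exclusive upper bound
        train_idx = list(range(train_end))  # expanding training window
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


if __name__ == "__main__":
    # Ten time-ordered observations, two folds, one-sample embargo.
    for train_idx, test_idx in walk_forward_splits(10, 2, 2, embargo=1):
        print("train:", train_idx, "test:", test_idx)
```

The expanding training window is one possible design choice; a rolling (fixed-length) window is a common alternative when older market regimes are considered less relevant.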