Skip to content

Instantly share code, notes, and snippets.

@shouse-lab
Created September 5, 2025 14:13
Show Gist options
  • Select an option

  • Save shouse-lab/3daa466c1acfa1b78491c1f94f4e0d1f to your computer and use it in GitHub Desktop.

Select an option

Save shouse-lab/3daa466c1acfa1b78491c1f94f4e0d1f to your computer and use it in GitHub Desktop.
fraud graph

To store graph insights from banking fraud detection (covering IDT, SID, ATO) for downstream investigation and mitigation, design a table schema that captures algorithm output, network features, entity context, and traceability links. The schema should support storage of graph metrics, community and cluster data, and allow for extensibility (such as JSON fields) to accommodate evolving analytics.

Recommended Table Schema

Column Name Data Type Description
insight_id UUID / PK Unique identifier for each insight
entity_id VARCHAR Primary node/entity reference (user/account/device/etc.)
entity_type VARCHAR Type of entity (user, account, device, etc.)
insight_type VARCHAR Algorithm/category: e.g., 'IDT', 'SID', 'ATO', 'centrality', 'path', 'community', 'anomaly'
algorithm_used VARCHAR Algorithm name: e.g., 'pagerank', 'shortest_path', 'louvain', 'similarity'
score FLOAT Numeric metric: centrality score, path score, similarity, cluster density, etc.
related_entities JSON List of referenced node/entity IDs involved in the detected structure or algorithm result
cluster_id VARCHAR Community or cluster label for the entity (if relevant, else NULL)
path_details JSON Traversal path data: ordered entity IDs, hops, and relationships (optional)
features JSON Computed features: e.g., {in_degree, out_degree, num_neighbors, avg_path_length}
insight_timestamp TIMESTAMP When the algorithm detected/stored this insight
raw_algorithm_output JSON Raw/full output dump of algorithm if complex or needed for audit (optional, for traceability)
investigation_status VARCHAR Enum: 'open', 'reviewed', 'escalated', 'resolved'
notes TEXT Analyst comments or contextual notes on the insight

Important Design Details

  • related_entities and path_details fields allow efficient representation of non-tabular graph structures for path finding, ring detection, or fraud rings as input for further queries or visualization.12
  • features field is extensible for per-entity graph features (centrality, anomaly score, local subgraph stats, etc.) and allows downstream teams to adapt schema for new metrics.34
  • raw_algorithm_output captures full logs or intermediate results if future transparency, explainability, or deep-dive investigation is necessary.4
  • entity_type and insight_type fields let investigators quickly filter and pivot between types of entities (accounts, devices, IPs) and insight categories (fraud ring, outlier, high-risk link, etc.).5
  • cluster_id and score fields facilitate clustering/community analysis, reflecting algorithm output for fraud rings and collusive networks.34
  • Use JSON fields generously for algorithm details, path data, and computed features to ensure schema flexibility and future-proofing, while keeping core indexes and keys as first-class columns for query/index performance and reporting.64

Example Record (in JSON)

{
  "insight_id": "d2f7e86c-259c-4f90-905c-1fae9db1df47",
  "entity_id": "acct_12345678",
  "entity_type": "account",
  "insight_type": "ATO_centrality",
  "algorithm_used": "pagerank",
  "score": 0.864,
  "related_entities": ["user_221", "device_54a", "acct_8765123"],
  "cluster_id": "community_45",
  "path_details": {
      "path": ["acct_12345678", "device_54a", "acct_8765123"],
      "relations": ["accessed_by", "accessed_by"]
  },
  "features": {
      "in_degree": 5,
      "out_degree": 2,
      "local_cluster_density": 0.78
  },
  "insight_timestamp": "2025-09-05T10:05:11Z",
  "raw_algorithm_output": {"scores": {"acct_12345678": 0.864, ...}},
  "investigation_status": "open",
  "notes": "Flagged for further review: unusually high centrality and connection to known fraud ring."
}

This schema, mixing indexed and JSON fields, gives downstream teams structured, detailed, and context-rich graph insights for investigation and mitigation.643 7891011121314151617181920

Footnotes

  1. https://go.neo4j.com/rs/710-RRC-335/images/Neo4j_WP-Fraud-Detection-with-Graph-Databases.pdf

  2. https://www.nebula-graph.io/posts/fraud-detection-with-graph-analytics

  3. https://www.tigergraph.com/blog/uncovering-financial-fraud-how-graph-technology-transforms-detection-in-banking/ 2 3

  4. https://go.neo4j.com/rs/710-RRC-335/images/Neo4j-Financial-Fraud-Detection-GDS-white-paper-EN-A4.pdf 2 3 4 5

  5. https://www.tigergraph.com/glossary/fraud-detection-with-graph/

  6. https://www.puppygraph.com/blog/fraud-graph 2

  7. https://www.tigergraph.com.cn/wp-content/uploads/2021/07/TigerGraph-Fraud-Solution-EN-Version.pdf

  8. https://neo4j.com/blog/developer/exploring-fraud-detection-neo4j-graph-data-science-part-1/

  9. https://aws.amazon.com/blogs/database/use-graph-machine-learning-to-detect-fraud-with-amazon-neptune-analytics-and-graphstorm/

  10. https://www.sciencedirect.com/science/article/pii/S0167923620300580

  11. https://www.dataversity.net/the-power-of-graph-databases-to-detect-fraud/

  12. https://www.formica.ai/blog/detecting-anomalies-with-fraud-graphs

  13. https://www.minfytech.com/blogs/graph-neural-networks-for-financial-fraud-detection

  14. https://blogs.oracle.com/database/post/graph-database-use-cases-for-financial-services-companies

  15. https://cambridge-intelligence.com/use-cases/fraud/

  16. https://resources.nvidia.com/en-us-financial-services-industry/optimizing-fraud-det

  17. https://neo4j.com/use-cases/fraud-detection/

  18. https://www.graphable.ai/blog/graph-database-fraud-detection/

  19. https://cacm.acm.org/blogcacm/leveraging-graph-databases-for-fraud-detection-in-financial-systems/

  20. https://www.yfiles.com/resources/how-to/visualizing-first-party-bank-fraud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment