5 Billion to 1: Data Engineering for Fraud Analysis

5 Billion to 1: Data Engineering for Fraud Analysis

In this case study, the SME Team addresses a bank's fraud analysis need by utilizing Azure Data Factory and Snowflake to engineer a solution that improved their time to insight for potential fraud. 

Fraud analytics is one of the 3 top big data use cases in financial services.

XYZ Bank processes about 60 million transactions a day and they need to continuously integrate the new data with historical data, consisting of about 5 billion rows of transactions every 90 days. In order to better predict credit/debit card fraud, analysis must be done on this massive amount of data. The below contains the challenges, goals, initiatives, and impacts for the fraud analytics case study. 
Fraud analytics in action.


  • The bank is facing extreme difficulty with effectively accessing and leveraging over 5 billion rows of credit and debit card transactions in their transactional database.
  • They need this transactional data to be integrated into the data warehouse while also being archived in the data lake.
  • They want to use Power BI for analysis but they cannot connect Power BI directly to entire data set because...
    • If they try to load the data into the report, the memory limit is reached and the report fails to reload.
    • Using direct query requires the largest warehouse that would increase their costs drastically if used throughout the day.


The bank partnered with SME to architect a solution that would:

  • Leverage 3 months of data to more accurately capture attempted fraudulent transactions.
  • Set the bank up for improved accuracy when they shift to the predictive use case: fraud detection.
  • Improve analytical report performance, speed, and reliability by connecting to curated data marts.


In order to provide a solution that met the bank's goals, SME assessed their current data ecosystem. From the assessment, a plan was constructed and implemented. These were the key steps our technical team performed for XYZ Bank:

  • Use Azure Data Factory to move data from the transactional database, Azure SQL, to Azure Data Lake Storage
  • Use Azure Data Factory to partition the data, store in the data lake, and load into Snowflake
  • Perform complex data transformations within Snowflake to prepare a data mart for analysis
  • Direct query the transformed data mart inside Snowflake with Power BI


Following the deployment of the solution, it was determined that not only were the goals met but exceeded them by providing real business impacts:
  • Actionable insights unearthed after visualizing outliers in the data
  • Drilled down from 5 billion records to one record, identifying which accounts have unusual spend activity
  • Significantly reduced analytical warehousing costs
  • Improved speed to insight



SME's George Barrett was the lead Solutions Engineer on the project. After scrubbing and anonymizing the data, he led a webinar that went into detail on this case study and includes an end-to-end demo of “5 Billion to 1: Data Engineering for Fraud Analysis”. 

Watch Now.



Related Articles

REPORT: Simplify Data Architecture for Faster Insights

December 17, 2021
Organizations face increasing competition and compressed time frames that require intelligent use of all available...

Journey to the Cloud: 7 Tips for Selecting the Right Strategy & Tools

January 9, 2023
Moving your data to the cloud can help manage costs and increase agility. In the cloud, you can scale up or down as...

Inventory Analytics for Data-Driven Inventory Management

August 2, 2021
With vast amounts of data being constantly pulled from multiple data sources, it can be hard to accurately predict and...

Get Started Today