Business Need
Our client, a renowned financial services provider, faced the challenge of managing disparate data silos (customer, credit application, account, marketing, dealer, etc.) spread across on-premises databases.
They sought to consolidate this data into a single platform for reporting and analytics. Furthermore, they required advanced data tracking capabilities throughout the data’s lifecycle.
Business Challenges
The challenges our client faced were multifaceted. First and foremost, their data was fragmented across various on-premises databases, making data access and integration cumbersome. The situation was exacerbated by outdated data pipelines, originally built in VBScript more than two decades earlier. These legacy pipelines caused significant delays in generating critical business reports, impairing timely decision-making. The client also grappled with data quality issues that forced manual corrections before reports could be published. Finally, the absence of interactive loan dashboards and the limited ability to analyze data from multiple perspectives added complexity to their data management.
Business Solution
To address the above challenges comprehensively, NuSummit embarked on a strategic data modernization initiative. The core components of our solution included:
- AWS Cloud Migration: We migrated the client’s data from on-premises servers to the AWS cloud, leveraging the Databricks environment. This transition enabled a data refresh every two hours, ensuring the delivery of timely and accurate business reports (a minimal sketch of such a scheduled job appears after this list).
- Delta Lake Implementation: To streamline data processing and storage, we built a Delta Lake, providing a solid foundation for data management (see the upsert sketch after this list).
- Orchestration Scheduler: A scheduler was designed and implemented to orchestrate jobs, enabling seamless data flow and execution.
- Robust Data Ingestion Pipeline: NuSummit created a robust data ingestion pipeline that moved data from source systems to the presentation/business layers, with row-level validation to maintain data integrity.
- Data Validation Pipeline: A dedicated data validation pipeline verified data against business aggregations before report generation (a validation sketch follows this list).
- CI/CD Pipeline Development: Our team built a CI/CD pipeline from the ground up, streamlining infrastructure deployment and code management.
- GitHub Template Creation: We introduced a GitHub template to consolidate all infrastructure and ETL code into a single repository, simplifying code management and collaboration.
- Business Alerts and Enhanced Analytics: The solution incorporated intelligent business alerts that flag potentially problematic scenarios in real time (an alerting sketch follows this list). We also implemented associative analytics, giving decision-makers valuable insights for informed choices.
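
The two-hour refresh cadence noted above maps naturally onto a scheduled Databricks job. Below is a minimal sketch using the Databricks Jobs REST API (2.1) from Python; the workspace URL, token, job name, notebook path, and cluster ID are placeholders, not values from the project.

```python
import os
import requests

# Placeholders -- the real workspace URL, token, notebook path, and
# cluster ID are project-specific and not taken from the case study.
WORKSPACE_URL = os.environ["DATABRICKS_HOST"]  # e.g. https://<workspace>.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]

job_spec = {
    "name": "edl-refresh",
    # Quartz cron: fire at minute 0 of every second hour.
    "schedule": {
        "quartz_cron_expression": "0 0 0/2 * * ?",
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "tasks": [
        {
            "task_key": "ingest_and_refresh",
            "existing_cluster_id": "<cluster-id>",  # placeholder
            "notebook_task": {"notebook_path": "/pipelines/edl_refresh"},
        }
    ],
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```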
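
For the Delta Lake layer, a common pattern is an idempotent upsert (MERGE) from a landing area into a curated Delta table. The following PySpark sketch assumes a hypothetical customer table keyed on `customer_id`; the paths and schema are illustrative, as the case study does not describe them.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical paths and key column; the real layout was not described.
updates = spark.read.format("parquet").load("s3://landing-bucket/customer/")
target = DeltaTable.forPath(spark, "s3://lake-bucket/delta/customer")

# Upsert: update rows whose key already exists, insert the rest.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```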
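
Both validation stages can be sketched in PySpark: row-level checks in the ingestion pipeline, and aggregation checks in the validation pipeline before reports are published. The rules below (non-null loan ID, positive amount, per-product balance reconciliation) are invented for illustration and are not the client's actual business rules.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Row-level validation in the ingestion pipeline (illustrative rules).
raw = spark.read.format("delta").load("s3://lake-bucket/delta/loan_raw")  # placeholder path
is_valid = F.col("loan_id").isNotNull() & (F.col("loan_amount") > 0)

raw.filter(is_valid).write.format("delta").mode("append").save("s3://lake-bucket/delta/loan_clean")
raw.filter(~is_valid).write.format("delta").mode("append").save("s3://lake-bucket/delta/loan_quarantine")

# Aggregation check in the validation pipeline: reconcile a business
# aggregate (total balance per product) between staging and presentation.
staging = spark.read.format("delta").load("s3://lake-bucket/delta/loan_clean")
presentation = spark.read.format("delta").load("s3://lake-bucket/delta/presentation/loan")

s = staging.groupBy("product").agg(F.sum("loan_amount").alias("staging_total"))
p = presentation.groupBy("product").agg(F.sum("loan_amount").alias("presentation_total"))

mismatches = (
    s.join(p, "product", "full_outer")
    .filter(
        F.coalesce(F.col("staging_total"), F.lit(0))
        != F.coalesce(F.col("presentation_total"), F.lit(0))
    )
)
if mismatches.count() > 0:
    raise RuntimeError("Aggregation mismatch: hold report generation")
```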
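
Business alerts of the kind described in the last bullet are often fanned out through Amazon SNS, which appears in the tech stack below. A small boto3 sketch follows; the topic ARN, metric name, and threshold rule are assumptions made for illustration.

```python
import boto3

sns = boto3.client("sns", region_name="us-east-1")  # region is a placeholder

def alert_on_anomaly(metric_name: str, value: float, threshold: float) -> None:
    """Publish a business alert when a monitored metric breaches its threshold.

    The metric/threshold rule is illustrative; the case study does not
    describe the client's actual alert conditions.
    """
    if value > threshold:
        sns.publish(
            TopicArn="arn:aws:sns:us-east-1:123456789012:edl-business-alerts",  # placeholder
            Subject=f"EDL alert: {metric_name} breached threshold",
            Message=f"{metric_name}={value} exceeded threshold {threshold}",
        )

# Example: flag an unusually high rejection rate from the validation pipeline.
alert_on_anomaly("row_rejection_rate", 0.07, threshold=0.05)
```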
Project Differentiator
The Enterprise Data Lake, serving as a unified data platform, successfully transformed siloed data into a valuable resource for analytics and reporting. The project introduced innovative data tracking technologies across the data lifecycle, ensuring data integrity and usability.
Tech Stack
- AWS (DMS, Kinesis, SNS, SQS, Lambda, DynamoDB, S3, SSM, KMS)
- Databricks
- Delta Lake
- QuerySurge
- Attunity
- Jenkins
- Terraform
- GitHub
- Tableau