BDNav
At CloudZA, we specialize in crafting tailored cloud solutions that drive operational efficiency and innovation. Our collaboration with BDNav, a leading provider of analytics for retail stores, exemplifies our commitment to delivering scalable, cost-effective, and real-time data platforms.
Understanding BDNav's Challenges
BDNav's mission is to empower retail businesses with actionable insights derived from vast amounts of data. However, their existing data infrastructure was predominantly batch-oriented, leading to several challenges:
- Delayed Data Availability:
Batch processing introduced significant lags, delaying critical business insights.
- High Operational Costs:
Their reliance on scripted ETL processes drove up compute expenses.
- Data Integrity Concerns:
The absence of primary keys in their SQL Server databases complicated upsert operations, risking data duplication and inconsistency.
- Scalability Limitations:
As data volumes grew, the existing architecture struggled to scale efficiently.
Crafting the Solution
To address these challenges, CloudZA designed and implemented a real-time, cost-efficient data ingestion pipeline leveraging AWS services and open-source technologies:
- Real-Time Data Streaming:
We deployed Debezium Server Iceberg to stream change data capture (CDC) events from BDNav's SQL Server databases. This setup enabled real-time data changes to flow directly into Iceberg tables on Amazon S3.
- Handling Absence of Primary Keys:
For tables lacking natural primary keys, we implemented a surrogate key strategy. This approach facilitated efficient upsert operations, ensuring data consistency and integrity.
- Scalable and Self-Healing Infrastructure:
By utilising DMS Managed Service, we enabled a fault-tolerant, highly available environment, reducing the need for manual intervention and maintenance.
- Performance Optimisation:
Our team fine-tuned the Debezium Server Iceberg configuration, achieving an ingestion rate of approximately 1 million rows per batch every 3 minutes.
- Monitoring and Observability:
We integrated Amazon CloudWatch for comprehensive monitoring, setting up metrics and alerts to track ingestion performance, resource utilisation and system health.
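The surrogate key strategy above can be illustrated with a minimal Python sketch. It makes several assumptions. It expects Debezium-style change events carrying `op`, `before` and `after` fields. It uses a plain dictionary as a stand-in for an Iceberg table. It hashes a hypothetical set of columns (`KEY_COLUMNS`) that are assumed to be unique in practice. It is not BDNav's actual key definition or CloudZA's production code.

```python
import hashlib
import json
from typing import Any, Optional

# Hypothetical: columns assumed to identify a row uniquely in practice for a
# keyless source table. These are illustrative, not BDNav's real schema.
KEY_COLUMNS = ("store_id", "sku", "sale_ts")


def surrogate_key(row: dict, key_columns=KEY_COLUMNS) -> str:
    """Derive a deterministic surrogate key by hashing the chosen columns."""
    # sort_keys gives a stable encoding, so equal values always hash equally.
    payload = json.dumps({c: row.get(c) for c in key_columns},
                         sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def apply_change(table: dict, event: dict) -> None:
    """Apply one Debezium-style change event as an upsert or a delete."""
    op = event["op"]
    before: Optional[dict] = event.get("before")
    after: Optional[dict] = event.get("after")
    if op in ("c", "r"):  # create, or read during an initial snapshot
        table[surrogate_key(after)] = after
    elif op == "u":
        # If key columns changed, drop the row stored under the old key first.
        if before is not None:
            table.pop(surrogate_key(before), None)
        table[surrogate_key(after)] = after
    elif op == "d":
        table.pop(surrogate_key(before), None)
    else:
        raise ValueError(f"unsupported op: {op!r}")
```

A replayed event maps to the same surrogate key, so it overwrites the row rather than duplicating it. This keeps upserts idempotent under at-least-once delivery. The trade-off is that source rows identical in every key column collapse into one, so the key columns would need to be chosen carefully per table.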
Lifecycle Stages of Implementation
- 1. Ingestion and Streaming:
Real-time data capture from SQL Server using Debezium Server Iceberg, with offset management to maintain data consistency.
- 2. Transformation and Upserting:
Application of surrogate keys and direct writing into Iceberg tables on S3, eliminating the need for intermediate ETL layers.
- 3. Analysis and Visualisation:
Utilisation of Amazon Athena to query Iceberg tables, providing BDNav's analysts with timely and actionable insights.
- 4. Monitoring and Performance Tuning:
Continuous monitoring via CloudWatch, with automated scaling policies to maintain optimal performance and resource utilisation.
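The commit-after-write principle behind offset management can be sketched as follows. `MicroBatcher` and `write_batch` are hypothetical names. The `max_rows` and `max_wait_s` defaults simply mirror the ingestion figures quoted earlier. Debezium Server handles offsets and batching through its own configuration, so this sketch illustrates the idea, not its internals.

```python
import time
from typing import Any, Callable


class MicroBatcher:
    """Buffer CDC events and flush them as one batch. The source offset is
    committed only after the batch has been written successfully."""

    def __init__(self, write_batch: Callable[[list], None],
                 max_rows: int = 1_000_000, max_wait_s: float = 180.0,
                 clock: Callable[[], float] = time.monotonic):
        self.write_batch = write_batch  # hypothetical sink, e.g. an Iceberg commit
        self.max_rows = max_rows
        self.max_wait_s = max_wait_s
        self.clock = clock
        self.buffer: list = []
        self.pending_offset: Any = None    # offset of the newest buffered event
        self.committed_offset: Any = None  # offset safe to resume from
        self.batch_started = clock()

    def add(self, offset: Any, event: Any) -> None:
        if not self.buffer:
            self.batch_started = self.clock()
        self.buffer.append(event)
        self.pending_offset = offset
        # Simplification: the time limit is only checked when an event
        # arrives; a real pipeline would also flush on a timer.
        if self._should_flush():
            self.flush()

    def _should_flush(self) -> bool:
        return (len(self.buffer) >= self.max_rows
                or self.clock() - self.batch_started >= self.max_wait_s)

    def flush(self) -> None:
        if not self.buffer:
            return
        # If write_batch raises, the offset is not committed and the buffer
        # is kept, so the same events are retried or re-read after restart.
        self.write_batch(self.buffer)
        self.committed_offset = self.pending_offset
        self.buffer = []
```

Because the offset advances only after a successful write, a failed write leaves events to be re-read rather than lost. Combined with idempotent upserts, those replays do not create duplicates. For scale, the reported rate of roughly 1 million rows every 3 minutes works out to about 5,556 rows per second (1,000,000 / 180 s).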
Tools and Services Utilised
AWS Services:
- Amazon S3: Scalable storage for Iceberg tables.
- Amazon Athena: Serverless querying over Iceberg tables.
- Amazon CloudWatch: Monitoring and logging.
Open Source Tools:
- Debezium Server Iceberg: Open-source CDC tool for real-time data streaming into Iceberg tables.
- Apache Iceberg: High-performance table format for large datasets.
Achieved Outcomes
- Cost Efficiency: Eliminated the need for AWS Glue ETL jobs, significantly reducing compute costs.
- Real-Time Insights: Enabled data availability within minutes of changes occurring in the source database, empowering BDNav to provide timely analytics to retail clients.
- Operational Simplicity: Simplified architecture with reduced maintenance overhead, allowing BDNav's team to focus on delivering value rather than managing infrastructure.
- Scalability: The solution seamlessly scales with data growth, ensuring consistent performance even as data volumes increase.
Conclusion
Through close collaboration with BDNav, CloudZA delivered a robust, real-time data ingestion and analytics platform tailored to the unique challenges of the retail analytics sector. By harnessing the power of AWS and open-source technologies, we transformed BDNav's data operations, enabling them to offer enhanced insights to their retail clients while optimising costs and operational efficiency.