This article discusses the challenges and solutions involved in optimizing a microservice designed to fetch data from a Data Lake. The microservice runs complex SQL queries that interact with numerous tables, some containing billions of records, and these queries currently take 50 to 300 seconds to execute. This execution time is problematic because the API that consumes this data must adhere to a strict service level agreement (SLA) of only 200 to 1000 milliseconds.
The article highlights that this issue is not unique to the author’s situation but is common among companies that need to integrate large-scale data access with microservices. The main point is that microservices, lean and efficient by design, struggle when they must access massive datasets directly, because that access introduces complexity they were never meant to absorb.
The author discusses implementing a well-considered microservice data platform architecture to address this. This solution aims to allow microservices to meet their stringent SLAs without compromising performance due to the complexities associated with accessing large-scale data.
The article intends to share insights and learnings from the approach and implementation of this architecture, providing a potential roadmap for others facing similar challenges.
Refining the Design: A Simplified Approach to Data Handling
When we started addressing this problem, it became clear that the solution lay somewhere in a microservice data platform architecture. This approach is specifically designed to bridge the gap between lean microservices and large-scale data lakes, ensuring efficient data retrieval while meeting strict SLA requirements.
However, given the vast scope of such an architecture, we wanted to implement a solution tailored to our immediate problem.

The Approach
- Data Ingestion Service
The ingestion service is responsible for pulling data from the Data Lake on a scheduled basis. It writes the raw data directly to a Redis cache, which serves as a staging area for further processing and ensures fast data retrieval for downstream services.
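As a rough illustration, the scheduled ingestion step described above can be sketched as follows. This is a minimal sketch under assumptions: the key naming scheme and the `fetch_from_data_lake` function are hypothetical stand-ins for the article's actual pipeline, and a plain dict stands in for the Redis client so the example is self-contained.

```python
import json
import time

# Stand-in for a Redis client; in production this would be e.g. redis.Redis(...).
cache = {}

def fetch_from_data_lake():
    # Hypothetical placeholder for the long-running Data Lake query
    # (the 50-300 second SQL described above).
    return [{"customer_id": 1, "balance": 120.0},
            {"customer_id": 2, "balance": 75.5}]

def ingest():
    """Scheduled job: pull raw data from the Data Lake and stage it in the cache."""
    rows = fetch_from_data_lake()
    for row in rows:
        key = f"raw:customer:{row['customer_id']}"
        # SETEX-style write: store the value with a TTL so stale data expires
        # before the next scheduled refresh.
        cache[key] = {"value": json.dumps(row),
                      "expires_at": time.time() + 7 * 24 * 3600}

ingest()
```

In a real deployment the `ingest` function would run under a scheduler (cron, Airflow, or similar) rather than at import time.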
- Aggregation Service (Optional)
The aggregation service reads the raw staged data and applies the necessary transformations, aggregations, and other business logic. Once the data is processed, the results are pushed to a Redis cache so that the processed data is quickly and efficiently available to the UI systems.
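The aggregation step can be sketched in the same spirit. Again this is illustrative only: the `raw:` / `agg:` key prefixes and the total-balance computation are invented for the example, and a dict stands in for the Redis staging area populated by the ingestion service.

```python
import json

# Stand-in for the Redis staging area written by the ingestion service;
# keys and record shapes are hypothetical.
cache = {
    "raw:customer:1": json.dumps({"customer_id": 1, "balance": 120.0}),
    "raw:customer:2": json.dumps({"customer_id": 2, "balance": 75.5}),
}

def aggregate():
    """Read the raw staged rows, apply business logic, write the processed view."""
    raws = [json.loads(v) for k, v in cache.items() if k.startswith("raw:")]
    total = sum(r["balance"] for r in raws)
    # Processed result that the UI-facing API reads directly.
    cache["agg:total_balance"] = json.dumps(
        {"total_balance": total, "count": len(raws)})

aggregate()
```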
- Iterative Refinement: Learning from Access Patterns
Initially, the data ingestion and aggregation layers were designed to fetch only the most frequently used data based on initial assumptions about customer needs.
As the system went live, we started monitoring access patterns. This gave us valuable insights into which data was being accessed more often and the specific business requirements driving those requests.
Using this information, we continuously refined the ingestion and aggregation layers. For instance, we began pre-loading additional datasets into the ingestion service and applying the same aggregation logic to them.
Over time, this approach significantly reduced reliance on fallback mechanisms, leading to improved system performance and a better user experience.
- Fallback Mechanism (Cache Manager Service)
The fallback mechanism ensured that the system remained functional even if the cache didn’t contain the required data. In such cases, the service reverted to the old logic, querying the staging table or Data Lake directly. However, this fallback came at the cost of slower response times.
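The read-through-with-fallback behaviour of the cache manager can be sketched as follows. This is a simplified sketch, not the article's actual service: the key scheme and `slow_query` are hypothetical, and a dict stands in for Redis.

```python
import json

# Stand-in for the Redis cache populated by the aggregation service.
cache = {"agg:customer:1": json.dumps({"customer_id": 1, "total": 120.0})}

def slow_query(customer_id):
    # Hypothetical stand-in for the original 50-300 s Data Lake / staging-table
    # query that the old logic used.
    return {"customer_id": customer_id, "total": 0.0, "source": "data_lake"}

def get_customer(customer_id):
    """Serve from the cache when possible; otherwise fall back to the old path."""
    hit = cache.get(f"agg:customer:{customer_id}")
    if hit is not None:
        return json.loads(hit)       # fast path, within the 200-1000 ms SLA
    return slow_query(customer_id)   # slow fallback, outside the SLA

fast = get_customer(1)
slow = get_customer(2)
```

The design choice here is deliberate: a cache miss degrades latency but never availability, which is what allowed the system to stay functional while the ingestion layer was still being refined.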

In this case, data synchronization was not an issue, since most changes occurred on a weekly basis. Otherwise, this would have been a more complex microservice data platform use case, requiring a CDC (change data capture) microservice as well.
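For context, a change data capture component in the Observer style could, in its simplest form, look like the following. This is an illustrative sketch of the general Observer pattern only, not the author's CDC implementation; the class and method names are invented for the example.

```python
class ChangeFeed:
    """Observer-pattern sketch: a subject that notifies subscribers of row changes."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # Register a downstream consumer (e.g. the cache refresher).
        self._subscribers.append(callback)

    def publish(self, change):
        # Fan a change event out to every registered subscriber.
        for cb in self._subscribers:
            cb(change)

received = []
feed = ChangeFeed()
feed.subscribe(received.append)
feed.publish({"table": "customers", "op": "UPDATE", "id": 1})
```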
The author is currently writing a Change Data Capture microservice based on the Observer pattern; it is not yet clear whether this is feasible or whether it will have to rely on change data logs instead. Also read about the Zero Trust Security core principle, “never trust, always verify,” at
https://journals-times.com/2024/01/30/why-trust-only-zero-trust-in-2024/
Read another article by the author, “Integrating Unity Catalog with real-time, event-driven platforms like Kafka to enable seamless data governance and support real-time use cases,” at https://medium.com/@animesh1997/integrating-unity-catalog-with-real-time-event-driven-platforms-like-kafka-to-enable-seamless-data-4694a3e75305
