What Carousell learned about scaling BI in the cloud

As businesses like Carousell shift their reporting to cloud data platforms, a bottleneck is emerging in their business intelligence setups. Dashboards that ran efficiently at smaller scale are slowing down, queries take longer to execute, and minor schema errors disrupt reports. Teams find themselves caught between the need for stable executive metrics and the flexibility analysts require to explore data.

This tension is common in cloud analytics environments, where business intelligence tools are expected to serve both operational reporting and in-depth analysis. The result is often a single environment playing multiple roles at once: presentation layer, modeling engine, and ad-hoc compute system.

A recent architectural shift at Southeast Asian marketplace Carousell sheds light on how some analytics teams are addressing this challenge. Insights from the company’s analytics engineers reveal a move to a split design that separates performance-critical reporting from exploratory workloads. While the example is specific to Carousell, it reflects a broader pattern seen across cloud data stacks.

The issue arises when modern BI tools let teams define logic in the reporting layer, shifting compute pressure away from optimized database engines and into the visualization tier. At Carousell, engineers hit this limit with the large datasets behind their analytical “Explores,” which sometimes spanned hundreds of terabytes. Because joins were executed dynamically in the BI layer, query times slowed and compute load climbed.
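The cost difference is easiest to see side by side. The sketch below uses hypothetical toy tables (the names `listings` and `sellers` are illustrative, not Carousell's schema): one path re-runs the join on every dashboard refresh, as a BI-layer join effectively does, while the other reads a join that was materialized once upstream.

```python
from collections import defaultdict

# Hypothetical rows standing in for warehouse tables.
listings = [  # (listing_id, seller_id, price)
    (1, 10, 5.0), (2, 10, 15.0), (3, 20, 8.0), (4, 30, 12.0),
]
sellers = {10: "SG", 20: "MY", 30: "SG"}  # seller_id -> country

def metric_join_at_query_time():
    """What a BI-layer join effectively does on every dashboard refresh:
    join and aggregate the raw rows from scratch."""
    totals = defaultdict(float)
    for _, seller_id, price in listings:
        totals[sellers[seller_id]] += price
    return dict(totals)

# Pushed upstream: the pipeline materializes the join once, and every
# dashboard query reads the already-joined rows.
materialized = [(sellers[sid], price) for _, sid, price in listings]

def metric_from_materialized():
    totals = defaultdict(float)
    for country, price in materialized:
        totals[country] += price
    return dict(totals)

assert metric_join_at_query_time() == metric_from_materialized()
```

At toy scale the two paths are indistinguishable; the point is that the first one repeats the join work per query, which is exactly what becomes prohibitive when the underlying tables run to terabytes.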

To address this, Carousell engineers moved heavy transformations upstream into BigQuery pipelines, where the database engine is better equipped to handle large joins. They also split their BI deployment into two instances: one dedicated to pre-aggregated executive dashboards and weekly reporting, the other to exploratory analysis. This separation of responsibilities keeps performance predictable and shields critical workflows from degradation.
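The kind of pre-aggregation that moves upstream can be sketched as a pipeline step that rolls raw rows into a compact weekly summary, so the executive dashboard reads a handful of summary rows instead of scanning raw data. This is a minimal illustration with a made-up schema, not Carousell's actual pipeline code:

```python
import datetime as dt
from collections import defaultdict

# Hypothetical raw rows: (event_date, category, amount).
raw_events = [
    (dt.date(2024, 1, 1), "electronics", 20.0),
    (dt.date(2024, 1, 2), "electronics", 30.0),
    (dt.date(2024, 1, 3), "fashion", 10.0),
    (dt.date(2024, 1, 9), "fashion", 25.0),
]

def build_weekly_summary(events):
    """Upstream pipeline step: aggregate raw rows into
    (iso_week, category) -> total, materialized once per run."""
    summary = defaultdict(float)
    for day, category, amount in events:
        iso_week = day.isocalendar()[1]
        summary[(iso_week, category)] += amount
    return dict(summary)

weekly_summary = build_weekly_summary(raw_events)
# A dashboard query is now a lookup over a few summary rows,
# not a scan and aggregation of the raw event table.
```

In the warehouse this would be a scheduled query writing to a summary table; keeping the executive instance pointed only at such pre-aggregated tables is what makes its performance predictable.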

Beyond performance, the new environment introduced stronger release controls and governance rules to keep data accurate and consistent. Automated checks and validation tools now catch errors before they reach production models, improving stability and reducing time spent on firefighting.
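One common form such a pre-release check takes is a schema contract: assert that a model still exposes the columns and types downstream reports depend on before it ships. The sketch below is a generic illustration of that idea; the contract, column names, and helper are hypothetical, not Carousell's tooling:

```python
# Hypothetical contract: the columns and types reports rely on.
EXPECTED_SCHEMA = {
    "listing_id": int,
    "country": str,
    "weekly_gmv": float,
}

def validate_schema(rows, expected=EXPECTED_SCHEMA):
    """Raise before deploy if a column is missing or its type drifted."""
    errors = set()
    for row in rows[:100]:  # a sample is enough to catch schema drift
        for column, expected_type in expected.items():
            if column not in row:
                errors.add(f"missing column: {column}")
            elif not isinstance(row[column], expected_type):
                errors.add(f"{column}: expected {expected_type.__name__}, "
                           f"got {type(row[column]).__name__}")
    if errors:
        raise ValueError("schema check failed: " + "; ".join(sorted(errors)))
    return True

# A row that passes, and one with a renamed column that would break reports.
good = [{"listing_id": 1, "country": "SG", "weekly_gmv": 42.0}]
bad = [{"listing_id": 1, "region": "SG", "weekly_gmv": 42.0}]
```

Wired into a CI job, a check like this turns a silent report breakage into a failed build, which is the practical effect of the release controls described above.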

Overall, the redesign at Carousell resulted in significant performance gains, with query times dropping from over 40 seconds to under 10 seconds. By separating presentation, transformation, and experimentation workloads, the analytics team was able to create a more reliable and efficient reporting environment.

For teams scaling their analytics stacks, the lesson is to define clear architectural boundaries: decide which workloads belong in the warehouse and which in the BI layer. Combined with governance controls, those boundaries keep cloud analytics environments performant and scalable as usage grows.