Setting Up Cloud Native

A "warehouse" bucket was created for the Lakehouse

Spark was used to consume data from Redpanda and write it as an Iceberg table
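
As a rough illustration of this step, the sketch below reads a Redpanda topic through its Kafka-compatible API with Spark Structured Streaming and appends the records to an Iceberg table. The catalog name, the Hadoop-type catalog, the s3a warehouse path, the endpoint, and the topic and table names are all assumptions made for illustration; the project's actual configuration is not described here. It also assumes the Iceberg Spark runtime, the Spark Kafka connector, and the Hadoop S3A libraries are on the classpath.

```python
from typing import Dict


def iceberg_catalog_conf(catalog: str, warehouse: str, s3_endpoint: str) -> Dict[str, str]:
    """Build Spark settings for an Iceberg catalog stored in an S3-compatible bucket.

    Assumption: a Hadoop-type catalog reached over s3a. A Hive Metastore or
    REST catalog would need different keys.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
        "spark.hadoop.fs.s3a.endpoint": s3_endpoint,
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def build_session(conf: Dict[str, str]):
    """Create a SparkSession with the given settings (requires pyspark)."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("redpanda-to-iceberg")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def stream_topic_to_iceberg(spark, brokers: str, topic: str, table: str, checkpoint: str):
    """Start a streaming query that appends topic records to an existing Iceberg table."""
    from pyspark.sql.functions import col

    events = (
        spark.readStream.format("kafka")  # Redpanda speaks the Kafka protocol
        .option("kafka.bootstrap.servers", brokers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
        .select(
            col("key").cast("string").alias("key"),
            col("value").cast("string").alias("value"),
            col("timestamp"),
        )
    )
    return (
        events.writeStream.format("iceberg")
        .outputMode("append")
        .option("checkpointLocation", checkpoint)
        .toTable(table)  # the target table must already exist
    )
```

A call such as `stream_topic_to_iceberg(spark, "redpanda:9092", "events", "lakehouse.demo.events", "s3a://warehouse/checkpoints/events")` would start the query; every value in that call is a placeholder.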

A sample dataset was produced for Redpanda
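
Producing a sample dataset might look roughly like the following sketch, which generates reproducible JSON events and sends them to a topic over Redpanda's Kafka-compatible API. The event fields (`event_id`, `user_id`, `amount`), the broker address, and the choice of the `kafka-python` client are hypothetical; the actual dataset and producer used in the project are not specified.

```python
import json
import random
from typing import Dict, List


def sample_events(count: int, seed: int = 0) -> List[Dict[str, object]]:
    """Generate a reproducible list of fake events (hypothetical schema)."""
    rng = random.Random(seed)  # a fixed seed gives the same dataset on every run
    return [
        {
            "event_id": i,
            "user_id": rng.randint(1, 100),
            "amount": round(rng.uniform(1.0, 500.0), 2),
        }
        for i in range(count)
    ]


def produce(brokers: str, topic: str, events: List[Dict[str, object]]) -> None:
    """Send each event as a JSON message (requires the kafka-python package)."""
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=brokers,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    try:
        for event in events:
            producer.send(topic, value=event)
        producer.flush()  # block until all buffered messages are delivered
    finally:
        producer.close()
```

For example, `produce("redpanda:9092", "events", sample_events(1000))` would publish one thousand messages, with the broker address and topic name being placeholders.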

Trino was used to query the Iceberg table and Redpanda topic

PostgreSQL was also queried using Trino

Sample dashboards were created on Superset using Trino queries
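
To make the Trino steps above more concrete, here is a minimal sketch that runs the same row-count query against tables in several catalogs through the `trino` Python client. The catalog names (`iceberg`, `kafka`, `postgresql`), the schema and table names, and the connection details are assumptions; they depend on how the Trino catalogs were actually configured.

```python
from typing import List, Tuple


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def row_count_sql(catalog: str, schema: str, table: str) -> str:
    """Build a fully qualified count query for one table."""
    qualified = ".".join(quote_ident(part) for part in (catalog, schema, table))
    return f"SELECT count(*) FROM {qualified}"


def run_counts(host: str, port: int, user: str,
               tables: List[Tuple[str, str, str]]) -> List[int]:
    """Return the row count of each (catalog, schema, table) via Trino.

    Requires the trino package and a reachable coordinator.
    """
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    counts = []
    for catalog, schema, table in tables:
        cur.execute(row_count_sql(catalog, schema, table))
        counts.append(cur.fetchone()[0])
    return counts
```

Because every source is exposed as a Trino catalog, one connection can cover the Iceberg table, the Redpanda topic, and PostgreSQL, for instance `run_counts("trino", 8080, "analyst", [("iceberg", "demo", "events"), ("kafka", "default", "events")])` with placeholder names throughout.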

Prometheus and Grafana were configured to monitor the Kubernetes cluster



  • High costs associated with traditional data lake solutions
  • Vendor lock-in and limited flexibility
  • Migrating existing big data workloads to a new platform


We successfully built a Lakehouse using 100% open-source tools, achieving the following:

  • Kubernetes and Ceph Storage Cluster: Deployed and validated a Kubernetes and Ceph Storage Cluster using open-source tools.
  • Hadoop Cluster Replacement: Verified the functionality of the Lakehouse by running the big data workloads from the existing Hadoop cluster on it.
  • Open-Source Lakehouse: Demonstrated the feasibility of creating a Lakehouse with 100% open-source software, eliminating license costs.


Cost savings:

No upfront license fees or ongoing subscription costs

Vendor neutrality:

Avoid vendor lock-in and maintain control over your data

Open-source flexibility:

Freedom to choose the best tools for your specific needs


Future-proofing:

Easily adapt to new technologies and changing requirements


This project demonstrates the viability of building a powerful and cost-effective Lakehouse using 100% open-source tools.

By leveraging open-source technologies, organizations can achieve greater flexibility, control, and cost savings while future-proofing their data infrastructure.


Partner with Us

Contact us today to discuss your specific needs and explore how we can empower your business with insightful, cost-effective data solutions.

Unleash the power of your data with a custom-built, open-source big data platform.