Open Data Lakehouse

Companies building an open data lakehouse on Starburst

Taking a new approach with a data lakehouse

Data lakes promised a cost-effective, scalable storage solution but lacked critical features for data reliability, governance, and performance. Legacy lakes also required data to be landed in their proprietary systems before you could extract value from it.

Enter the open data lakehouse.

Anatomy of an open data lakehouse

The open data lakehouse is a cost-effective, performant, future-proof data architecture built on an open foundation:

A single point of access and governance for all data in and around the data lake
Modern table formats provide advanced warehouse-like capabilities directly on the lake
Built on commodity storage and compute, so you can scale up and down cost-effectively

Comparing a Data Lake and a Data Lakehouse

The open data lakehouse overcomes the limitations of legacy lakes because it is built on the understanding that the lake being the center of gravity for your data does not make it the single source of truth. It works with your other data sources in an open, scalable manner, creating a single, open system to access and govern the data in and around your lake.

| | Legacy Data Lake | Open Data Lakehouse |
|---|---|---|
| Access | Limited to the data lake | Universal access to data in and around the lake |
| Table Formats | Limited to a single format (e.g., file formats in Hadoop) | Support for all modern table formats: Iceberg, Delta Lake, Hudi |
| Scalability | Medium | High |
| Performance | Low | High |
| Cost | $ (can be expensive with proprietary vendors) | |
| Use Cases | Raw data storage, ML | BI, SQL, ML, real-time apps |
| Reliability | Low-quality data; prone to becoming a data swamp | High-quality, reliable data with ACID transactions |
| Governance | Poor governance, because security must be applied at the file level | Fine-grained, row- and column-level security and governance for tables |

Real-World Data Lakehouse Success Stories

Hundreds of the most data-driven companies on the planet, including Grubhub, Verizon, and Lucid, chose Starburst to break down data silos and accelerate time-to-insight.

Accelerating data discovery

CHALLENGE

With a multitude of databases and data platforms, Genus’ data engineers were burdened by complex ETL pipelines that took weeks to run.

SOLUTION

After Genus turned to Starburst to query data directly in its data lakes (in Amazon S3 and ADLS), time-to-insight accelerated by 75%.

“With Starburst, we have accelerated data discovery, simplified data pipelines, and have a unified query layer across all data sources. These three points are critical to what we do.”

Patrice Linel

Senior Manager of Data Science & Data Engineering, Genus

Upgrading to Amazon S3

CHALLENGE

Zalando's transition from a legacy data warehouse to an AWS cloud data lake proved challenging without a fast, reliable way to query its distributed data.

SOLUTION

With a powerful data lake analytics engine, Zalando can deliver its Customer 360 program, which increases wallet share and improves buyer recommendations.

“The decision to deploy Starburst Enterprise was made simpler because it has proven to be a reliable, fast, and stable query engine for S3 data lakes.”

Alberto Miorin

Engineering Lead, Zalando

Democratizing data lake access

CHALLENGE

Requests for data sets took hours, and sometimes days, to fulfill and required moving data between zones in the data lake.

SOLUTION

Using Starburst to explore near real-time data on and around its data lake, Banco Inter reduced time-to-insight from days to seconds.

"Starburst gives us a single platform to explore more data, maintain data quality and governance, and provide data to our employees using their visualization tools of choice."

André Gortari

Data Engineering Manager, Banco Inter

Start Free with Starburst Galaxy

Up to $500 in usage credits included

Query your data lake fast with Starburst's best-in-class MPP SQL query engine
Get up and running in less than 5 minutes
Easily deploy clusters in AWS, Azure, and Google Cloud

For more deployment options:

Download Starburst Enterprise
