Why AWS is investing in a zero-ETL future

Data is at the center of every application, process, and business decision. When data is used to improve customer experiences and drive innovation, it can lead to business growth. According to Forrester, advanced insights-driven businesses are 8.5 times more likely than beginners to report at least 20% revenue growth. However, to realize this growth, managing and preparing the data for analysis has to get easier.

That’s why AWS is investing in a zero-ETL future so that builders can focus more on creating value from data, instead of preparing data for analysis.

Challenges with ETL

What is ETL? Extract, transform, load (ETL) is the process data engineers use to combine data from different sources. ETL can be challenging, time-consuming, and costly.

  • It requires data engineers to create custom code.
  • DevOps engineers have to deploy and manage the infrastructure to make sure the pipelines scale with the workload. When data sources change, data engineers have to manually update their code and redeploy it.
  • While this is happening, data analysts can’t run interactive analysis or build dashboards, data scientists can’t build machine learning (ML) models or run predictions, and end users can’t make data-driven decisions.

The time required to build or change pipelines makes the data unfit for near-real-time use cases such as detecting fraudulent transactions, placing online ads, and real-time supply chain analysis. In these scenarios, the opportunity to improve customer experiences, address new business opportunities, or lower business risks can simply be lost.

AWS is bringing its zero-ETL vision to life

Zero-ETL makes data available to data engineers at the point of use through direct integrations between services and direct querying across a variety of data stores. This frees the data engineers to focus on creating value from the data, instead of spending time and resources building pipelines.

We have been making steady progress towards bringing our zero-ETL vision to life so organizations can quickly and easily connect to and act on their data. Here are just two examples:

  • With Amazon Redshift Streaming Ingestion, organizations can configure Amazon Redshift to directly ingest high-throughput streaming data from Amazon Managed Streaming for Apache Kafka (Amazon MSK) or Amazon Kinesis Data Streams and make it available for near-real-time analytics in just a few seconds. Customers can connect to multiple data streams and pull data directly into Amazon Redshift without staging it in Amazon Simple Storage Service (Amazon S3).
  • Using federated query in Amazon Redshift and Amazon Athena, organizations can run queries across data stored in their operational databases, data warehouses, and data lakes so that they can create insights from across multiple data sources with no data movement. Data analysts and data engineers can use familiar SQL commands to join data across several data sources for quick analysis, and store the results in Amazon S3 for subsequent use.
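To make the two integrations above concrete, here is a minimal sketch of the SQL they boil down to, wrapped in small Python helpers so the statements can be generated and inspected. The schema, view, stream, endpoint, and role names are all illustrative assumptions, not values from this article; the DDL shapes follow the documented Redshift syntax for streaming ingestion (an external schema over Kinesis plus a materialized view) and for federated queries over Aurora PostgreSQL.

```python
def streaming_ingestion_ddl(schema: str, view: str, stream: str, iam_role: str) -> str:
    """Build the Redshift DDL that maps a Kinesis data stream into a
    materialized view for near-real-time querying (no staging in S3).
    JSON_PARSE converts the raw record payload into a queryable SUPER value."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema} FROM KINESIS "
        f"IAM_ROLE '{iam_role}';\n"
        f"CREATE MATERIALIZED VIEW {view} AUTO REFRESH YES AS "
        f"SELECT approximate_arrival_timestamp, "
        f"JSON_PARSE(kinesis_data) AS payload "
        f'FROM {schema}."{stream}";'
    )


def federated_schema_ddl(schema: str, database: str, uri: str,
                         iam_role: str, secret_arn: str) -> str:
    """Build the Redshift DDL that exposes a live Aurora PostgreSQL database
    as an external schema, so SQL joins can span the warehouse and the
    operational database with no data movement."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema} FROM POSTGRES "
        f"DATABASE '{database}' URI '{uri}' "
        f"IAM_ROLE '{iam_role}' SECRET_ARN '{secret_arn}';"
    )


if __name__ == "__main__":
    # Illustrative names only -- replace with your own resources.
    print(streaming_ingestion_ddl(
        "kinesis_ext", "orders_stream_mv", "orders-stream",
        "arn:aws:iam::123456789012:role/redshift-kinesis-role"))
    print(federated_schema_ddl(
        "apg_ext", "orders",
        "apg-cluster.example.us-east-1.rds.amazonaws.com",
        "arn:aws:iam::123456789012:role/redshift-federated-role",
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:apg-creds"))
```

Once the materialized view or external schema exists, analysts query it with ordinary SELECT statements; there is no pipeline code to deploy or scale.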

And just last week we announced the public preview of Aurora zero-ETL integration with Amazon Redshift, which enables near-real-time analytics and machine learning (ML) in Amazon Redshift on petabytes of transactional data from Aurora. With this launch, customers can ingest hundreds of thousands of transactions per minute into Aurora and analyze them in near real time in Amazon Redshift, without having to build labor-intensive and expensive ETL pipelines. To learn more, read my full blog post.
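Once a zero-ETL integration is active, the replicated Aurora tables can be queried in Redshift like any local table. The sketch below submits one such near-real-time aggregate through a Redshift Data API–style client (e.g. boto3's "redshift-data" client, whose execute_statement call returns a statement id to poll). The client is injected so the logic can be exercised without AWS credentials, and the cluster, database, user, and table names are illustrative assumptions.

```python
# Hypothetical query over a table replicated from Aurora by a zero-ETL
# integration: count orders from the last five minutes.
RECENT_ORDERS_SQL = (
    "SELECT COUNT(*) AS n FROM orders "
    "WHERE order_ts > DATEADD(minute, -5, GETDATE());"
)


def submit_recent_orders_count(client, cluster_id: str,
                               database: str, db_user: str) -> str:
    """Submit the aggregate query via a redshift-data style client and
    return the statement id, which the caller can poll for results."""
    resp = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=RECENT_ORDERS_SQL,
    )
    return resp["Id"]


if __name__ == "__main__":
    import boto3  # real client only needed when actually calling AWS

    rsd = boto3.client("redshift-data")
    print(submit_recent_orders_count(rsd, "my-cluster", "zeroetl_db", "awsuser"))
```

The point of the sketch is what is absent: there is no extract step, no transform job, and no load schedule between the transactional write in Aurora and the analytical read in Redshift.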

When organizations can quickly and seamlessly integrate data that is stored and analyzed in different tools and systems, they can make data-driven predictions with more confidence, improve customer experiences, and promote data-driven insights across the business.
