This week, Amazon Web Services introduced new integrations that allow its Amazon Aurora PostgreSQL and Amazon DynamoDB database services to share data with the Amazon Redshift data warehouse service, without the need to set up ETL (Extract, Transform and Load) workflows between them.
AWS calls this no-code capability “zero-ETL.” It shortcuts a typically onerous process for the admin: building ETL pipelines between the database that records transactions and the data warehouse that keeps the data for later analysis by the user’s own applications and queries.
Once data hits either database, it shows up in Redshift seconds later, according to the cloud giant, giving customers the potential for near real-time data analysis.
The AWS service copies data from the source database, either for specified tables or for the whole database, to the Redshift warehouse, and monitors the health of the pipeline itself.
Redshift can then query the replicated data and create materialized views across multiple tables.
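As a rough sketch of what that looks like in practice, a materialized view joining two replicated tables could be created through the Redshift Data API with boto3. All identifiers here (the workgroup, database, and table names) are hypothetical, not from AWS's announcement:

```python
# Sketch: creating a materialized view over tables replicated by zero-ETL.
# The workgroup, database, and table names below are made-up examples.

CREATE_VIEW_SQL = """
CREATE MATERIALIZED VIEW order_totals AS
SELECT o.customer_id, c.region, SUM(o.amount) AS total_spent
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY o.customer_id, c.region;
"""

def create_view(workgroup_name: str, database: str) -> str:
    """Run the DDL via the Redshift Data API; requires AWS credentials."""
    import boto3  # deferred import: only needed when the call is made
    client = boto3.client("redshift-data")
    resp = client.execute_statement(
        WorkgroupName=workgroup_name,  # or ClusterIdentifier= for provisioned
        Database=database,
        Sql=CREATE_VIEW_SQL,
    )
    return resp["Id"]  # statement ID, pollable with describe_statement()
```

Because the Data API is asynchronous, the returned statement ID would be polled for completion rather than blocking on the query.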
“Using these new zero-ETL integrations, you can run unified analytics on your data from different applications without having to build and manage different data pipelines to write data from multiple relational and non-relational data sources into a single data warehouse,” wrote AWS Senior Solutions Architect Esra Kayabali, in a blog post describing the integrations.

Screenshot: AWS zero-ETL integration with Aurora.
Database Integrations
Amazon Aurora is a managed database service compatible with the PostgreSQL database system, offering high performance and reliability guarantees (the zero-ETL integration also supports the MySQL flavor of Aurora).
The customer’s Aurora cluster must be in the same availability zone as the Redshift instance, and customers must use Aurora PostgreSQL 16.4 or above. Two-phase commit transactions are not supported.
Data filtering defines the scope of replication. You can copy the whole database or only certain tables (you can’t filter by columns or rows). AWS processes the data filters in the order in which they are entered.
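Set up programmatically, the filtering might look roughly like this boto3 sketch. The ARNs, integration name, and filter patterns are all placeholders, and the exact filter syntax is an assumption here; the current `CreateIntegration` API documentation is the authority:

```python
# Sketch: creating an Aurora-to-Redshift zero-ETL integration with data
# filters. ARNs and names are hypothetical placeholders, not real resources.

# Filters are processed in the order entered; this includes one schema,
# then excludes a single table from it (pattern syntax is an assumption).
DATA_FILTER = "include: mydb.sales.*, exclude: mydb.sales.audit_log"

def create_integration(source_arn: str, target_arn: str) -> dict:
    """Requires AWS credentials and permission to call rds:CreateIntegration."""
    import boto3  # deferred import: only needed when the call is made
    rds = boto3.client("rds")
    return rds.create_integration(
        IntegrationName="aurora-to-redshift-demo",
        SourceArn=source_arn,   # Aurora DB cluster ARN
        TargetArn=target_arn,   # Redshift namespace ARN
        DataFilter=DATA_FILTER,
    )
```

Because filters are order-sensitive, an `exclude` entry placed after a broad `include` carves that table back out of the replication scope.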
All the data goes to the Redshift cloud data warehouse, which offers the ability to run SQL queries across structured and unstructured data; the results can be used to create reports and forecasts, or ingested or studied by AI-based processes. The Redshift cluster being used must be patched to version 180 or above.

Screenshot: AWS Aurora data in Redshift.
Amazon DynamoDB is a low-latency, serverless NoSQL data store service. It is often used with a caching service for an even greater performance boost, as well as with AWS Lambda triggers, which kick off serverless functions when new data is entered. DynamoDB also has a zero-ETL integration with the Amazon OpenSearch Service.
With this integration, data from DynamoDB is replicated into a Redshift SUPER datatype column and accessed via PartiQL SQL.
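A query against the replicated items could then use PartiQL-style dot navigation into the SUPER column, again via the Redshift Data API. The table and attribute names below are hypothetical, chosen only to illustrate the navigation syntax:

```python
# Sketch: querying DynamoDB items replicated into a Redshift SUPER column
# using PartiQL-style dot notation. Table and attribute names are made up.

PARTIQL_SQL = """
SELECT t.item.customer_id, t.item.order_status
FROM ddb_orders AS t
WHERE t.item.order_status = 'SHIPPED';
"""

def run_query(workgroup_name: str, database: str) -> str:
    """Submit the query asynchronously; requires AWS credentials."""
    import boto3  # deferred import: only needed when the call is made
    client = boto3.client("redshift-data")
    resp = client.execute_statement(
        WorkgroupName=workgroup_name,
        Database=database,
        Sql=PARTIQL_SQL,
    )
    return resp["Id"]  # fetch rows later with get_statement_result()
```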

Screenshot: DynamoDB data in Redshift.
Cost and Availability
AWS does not charge for the workflow processing of zero-ETL integrations. Customers, however, are still responsible for the costs of storing replicated data in the data warehouse (or RPUs, Redshift Processing Units, on Amazon Redshift Serverless) and any cross-zone data transfers. Visit the Amazon Aurora and Amazon DynamoDB pricing pages for the costs of the database services themselves.
Each AWS account is limited to 100 integrations and 50 integrations per data warehouse.
Aurora PostgreSQL zero-ETL integration with Amazon Redshift is available in a subset of AWS Regions across the U.S., Asia and Europe. Amazon DynamoDB zero-ETL integration with Amazon Redshift is now available in all commercial, China and GovCloud AWS Regions.
The post AWS Makes ETL Disappear for Aurora PostgreSQL, DynamoDB appeared first on The New Stack.