Insights from paper: Amazon Redshift Re-invented

Hemant Gupta
11 min read · Aug 6, 2024


1. Abstract

Amazon Redshift was launched in 2013 by Amazon Web Services (AWS).

It revolutionized the data warehousing industry by offering a fully managed, petabyte-scale, enterprise-grade, and cost-effective cloud data warehouse.

Today, tens of thousands of customers use Redshift in AWS’s global infrastructure to process exabytes of data daily.

Amazon Redshift was created to analyze large volumes of data efficiently. Its use cases have evolved over the last few years, and Redshift has evolved with them, improving both storage and compute scalability.

Many innovations have been made, such as tiered storage, multicluster auto-scaling, cross-cluster data sharing, and the AQUA query acceleration layer.

As of the paper's writing (2022), Redshift integrates with other AWS services. It can:

  1. Query data from the data lake
  2. Query with PartiQL
  3. Support federated queries to Amazon Aurora and RDS, as well as materialized views
  4. Ingest semi-structured data
  5. Ingest streaming data

2. Introduction

Redshift’s development has focused on meeting the following four primary customer needs:

  1. High-performance execution of increasingly complex analytical queries
  2. Processing more data and scaling to more users
  3. Ease of use
  4. Seamless integration with the AWS ecosystem

3. Performance that matters

3.1 Overview

Let’s start with the high-level architecture given in the diagram below.

Amazon Redshift Architecture

A Redshift cluster has one coordinator (leader) node and multiple worker (compute) nodes. The data is stored on Redshift Managed Storage (RMS), which is backed by Amazon S3.

Compute nodes cache data on locally attached SSDs in a compressed, column-oriented format.

Tables are either replicated on every compute node or partitioned into multiple buckets distributed among all compute nodes.

There are many features present in Redshift. A few important ones are:

  • Concurrency Scaling: To dynamically scale out
  • Data Sharing: To securely and easily share data across Redshift clusters
  • AQUA layer: For query acceleration using FPGAs
  • Compilation-As-A-Service: Optimized code generation for query fragments

The data inside the cluster can be accessed by:

  • JDBC/ODBC connection
  • Data API

Let’s walk through the flow of a query:

  1. The leader node receives the query.
  2. The query is parsed, rewritten, and optimized.
  3. Optional local joins are performed if required.
  4. The plan is compiled into optimized C++ code, and the binaries are sent to the compute nodes.
  5. Each compute node scans data from its locally attached SSDs or hydrates it from Redshift Managed Storage.

Redshift’s execution model is optimized for the underlying Amazon EC2 hardware. Out of the box, Redshift delivers up to 3x better price-performance than other cloud data warehouses on the untuned 3TB TPC-DS benchmark. When tuned, it has 1.5x better price-performance than the second-best cloud data warehouse offering, as shown in the diagram below.

Price-Performance and Scalability

3.2 Introduction to Redshift Code Generation

As the query flow above showed, Redshift generates C++ code specific to the query plan. The generated code is then compiled, and the binaries are shipped to the compute nodes for execution.

Each compiled file is called a segment. It consists of a pipeline of operators called steps. Each segment (and each step within it) is part of the physical query plan.

Example of generated code:

// Loop over the tuples of R.
while (scan_step->has_next()) {
    // Get next value for R.key.
    auto field1 = fetcher1.get_next();
    // Get next value for R.val.
    auto field2 = fetcher2.get_next();

    // Apply predicate R.val < 50.
    if (field2 < constant1) {
        // Hash R.key and probe the hash table.
        size_t h1 = hash(field1) & (hashtable1_size - 1);
        for (auto* p1 = hashtable1[h1]; p1 != nullptr; p1 = p1->next) {
            // Evaluate the join condition R.key = S.key.
            if (field1 == p1->field1) {
                sum1 += field2;
            }
        }
    }
}

The above code is generated for the following query:

SELECT sum(R.val) FROM R, S WHERE R.key = S.key AND R.val < 50

The generated code keeps the working set as close to the CPU as possible to maximize performance.

3.3 Vectorized Scans

Redshift added a SIMD-vectorized scan layer to the generated code that accesses the data blocks and evaluates predicates as function calls.

The vectorized scan functions are precompiled, covering all data types and their supported encoding and compression schemes.
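
To make this concrete, here is a minimal, illustrative C++ sketch (not Redshift's actual code) of such a precompiled scan kernel: it evaluates a predicate over a chunk of a column in a tight, branch-free loop that a compiler can turn into SIMD compare instructions, producing a match vector for the generated per-tuple code to consume.

#include <cstddef>
#include <cstdint>

// Illustrative only: a precompiled, batch-at-a-time scan kernel that
// evaluates the predicate "value < limit" over a column chunk and writes
// one 0/1 flag per row. The straight-line, branch-free loop is easy for
// the compiler to vectorize with SIMD compare instructions.
static void scan_less_than(const int32_t* column, size_t count,
                           int32_t limit, uint8_t* matches_out) {
    for (size_t i = 0; i < count; ++i) {
        matches_out[i] = (column[i] < limit) ? 1 : 0;
    }
}

One such function would exist per data type and encoding, which is why the scan layer is precompiled rather than generated per query.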

3.4 Reducing Memory Stalls with Prefetching

Redshift’s pipelined execution avoids the materialization of intermediate results for the outer stream of joins and aggregates by keeping the intermediate column values in CPU registers.

Redshift keeps a circular buffer that fits in the fastest (L1) CPU cache. For each tuple that arrives, it prefetches the memory the tuple is about to access (for example, a hash table bucket) and pushes the tuple into the buffer.

An earlier tuple is popped and pushed downstream to the rest of the steps.

Once the buffer is filled, individual tuples are processed by pushing and popping one at a time from the buffer.
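
The following C++ sketch illustrates the idea under stated assumptions: Tuple, HashTable, bucket_ptr, and probe_and_emit are hypothetical placeholders, and the ring size is arbitrary. Prefetches are issued when tuples enter the ring, so by the time a tuple is popped and probed, its hash-table bucket is likely already in cache.

#include <cstddef>
#include <cstdint>

// Illustrative only: interleave prefetching with hash-table probes by
// buffering tuples in a small ring that fits in the L1 cache.
struct Tuple { int64_t key; int64_t val; };

template <typename HashTable, typename Probe>
void probe_with_prefetch(const Tuple* input, size_t n,
                         const HashTable& ht, Probe probe_and_emit) {
    constexpr size_t kRing = 64;           // small enough to stay L1-resident
    Tuple ring[kRing];
    size_t head = 0, tail = 0;

    for (size_t i = 0; i < n; ++i) {
        // Issue a prefetch for the bucket this tuple will touch later.
        __builtin_prefetch(ht.bucket_ptr(input[i].key));
        ring[head % kRing] = input[i];
        ++head;
        if (head - tail == kRing) {
            // Ring is full: the oldest tuple's bucket should be in cache now.
            probe_and_emit(ring[tail % kRing], ht);
            ++tail;
        }
    }
    // Drain the remaining buffered tuples.
    while (tail < head) {
        probe_and_emit(ring[tail % kRing], ht);
        ++tail;
    }
}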

3.5 Inline Expression Functions

In Redshift, most string functions are vectorized with SIMD code tailored to that particular function.

Let’s take the LIKE predicates as an example. They use the pcmpestri instruction in Intel CPUs, which allows sub-string matching of up to 16-byte patterns in a single instruction.
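
As a rough illustration (not Redshift's implementation), the sketch below uses the SSE4.2 intrinsic _mm_cmpestri, which compiles to pcmpestri, to find a pattern of at most 16 bytes in a string. It assumes both buffers can be safely read in 16-byte chunks past their logical end (for example, padded allocations), and it must be built with SSE4.2 enabled (e.g., -msse4.2).

#include <nmmintrin.h>  // SSE4.2: _mm_cmpestri
#include <cstring>

// Illustrative substring search for patterns of at most 16 bytes.
// Assumes `text` and `pat` remain readable for 16 bytes past their
// logical lengths (padded buffers).
static int find_substring(const char* text, int text_len,
                          const char* pat, int pat_len) {
    const __m128i needle =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(pat));
    int pos = 0;
    while (pos < text_len) {
        const int chunk = text_len - pos < 16 ? text_len - pos : 16;
        const __m128i hay =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(text + pos));
        // Lowest offset in the chunk where the pattern starts to match
        // (16 if none). A candidate may only be a prefix match at the
        // chunk boundary, so it is verified below.
        const int idx = _mm_cmpestri(
            needle, pat_len, hay, chunk,
            _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ORDERED | _SIDD_LEAST_SIGNIFICANT);
        if (idx >= chunk) { pos += 16; continue; }
        const int cand = pos + idx;
        if (cand + pat_len <= text_len &&
            std::memcmp(text + cand, pat, pat_len) == 0)
            return cand;          // full match confirmed at this offset
        pos = cand + 1;           // resume right after the false candidate
    }
    return -1;                    // no match
}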

3.6 Compilation Service

The query processing engine compiles each query's segments into optimized object files. When the same or similar queries are executed, the compiled segments are reused from the cluster's code compilation cache.

The compilation service uses compute and memory resources beyond the Redshift cluster to accelerate query compilation through a scalable and secure architecture.

The compilation service caches the compiled objects off-cluster in an external code cache to serve multiple compute clusters that may need the same query segment.
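
Conceptually, segment lookup works like a two-level cache. The sketch below is a hypothetical illustration only: the maps and compile_segment stand in for the real cluster cache, external cache, and compilation service.

#include <string>
#include <unordered_map>

// Hypothetical sketch of the lookup order described above: cluster-local
// cache first, then the external off-cluster cache, then compilation.
using Cache = std::unordered_map<std::string, std::string>;  // key -> object code

std::string compile_segment(const std::string& key) {
    return "object-code-for-" + key;   // stand-in for the compilation service
}

std::string get_or_compile(const std::string& key,
                           Cache& local_cache, Cache& external_cache) {
    if (auto it = local_cache.find(key); it != local_cache.end())
        return it->second;                       // hit in the cluster cache
    if (auto it = external_cache.find(key); it != external_cache.end())
        return local_cache[key] = it->second;    // hit off-cluster; copy locally
    std::string obj = compile_segment(key);      // miss: compile the segment
    external_cache[key] = obj;                   // share with other clusters
    return local_cache[key] = obj;
}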

3.7 CPU-Friendly Encoding

Redshift supports generic byte-oriented compression algorithms such as LZO and ZSTD, as well as CPU-friendly, type-specific encodings such as AZ64 for numeric and date/time data.
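
The sketch below is not AZ64; it is just a simple delta encoding that illustrates why type-specific schemes are CPU-friendly: each value decodes with a couple of arithmetic instructions instead of a general-purpose decompression routine.

#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: delta-encode a non-empty sequence of integers.
// Assumes consecutive differences fit in 32 bits.
struct DeltaBlock {
    int64_t base;                 // first value of the block
    std::vector<int32_t> deltas;  // value[i] - value[i-1]
};

DeltaBlock encode(const std::vector<int64_t>& values) {
    DeltaBlock block{values.front(), {}};
    for (size_t i = 1; i < values.size(); ++i)
        block.deltas.push_back(static_cast<int32_t>(values[i] - values[i - 1]));
    return block;
}

std::vector<int64_t> decode(const DeltaBlock& block) {
    std::vector<int64_t> out{block.base};
    for (int32_t d : block.deltas)
        out.push_back(out.back() + d);    // one add per decoded value
    return out;
}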

3.8 Adaptive Execution

Redshift’s execution engine makes runtime decisions to boost performance: based on execution statistics, it can change the generated code or its runtime properties on the fly.
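
As a minimal, hypothetical example of such a runtime decision, the sketch below samples execution statistics and picks between two strategies; the threshold and strategy names are illustrative, not Redshift's.

#include <cstddef>

// Illustrative only: after sampling the first batch, the operator picks the
// cheaper of two strategies based on the observed predicate selectivity.
struct RuntimeStats { size_t rows_seen = 0; size_t rows_passed = 0; };

enum class Strategy { FilterFirst, JoinFirst };

Strategy choose_strategy(const RuntimeStats& stats) {
    // If the predicate filters out most rows, apply it before the join;
    // otherwise skip the extra pass. The threshold here is arbitrary.
    double selectivity = stats.rows_seen
        ? static_cast<double>(stats.rows_passed) / stats.rows_seen
        : 1.0;
    return selectivity < 0.1 ? Strategy::FilterFirst : Strategy::JoinFirst;
}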

3.9 AQUA for Amazon Redshift

Advanced Query Accelerator (AQUA) is a multi-tenant service that acts as an off-cluster caching layer for RMS. It also acts as a push-down accelerator for complex scans and aggregations.

AQUA caches hot data for clusters on local SSDs.

Redshift identifies applicable scan and aggregation operations and pushes them to AQUA, which processes them against the cached data and returns the results.

3.10 Query Rewriting Framework

Redshift features a novel DSL-based Query Rewriting Framework (QRF), which provides two main benefits:

  1. It enables the rapid introduction of novel rewritings and optimizations.
  2. It is used for creating scripts for incremental materialized view maintenance.

4. Scaling Storage

4.1 Redshift Managed Storage (RMS)

The RMS is designed for durability of 99.999999999% and availability of 99.99% over a given year across multiple availability zones (AZs).

RMS manages both user data and transaction metadata.

The diagram below shows the key components of RMS extending from in-memory caches to committed data on Amazon S3.

Redshift Managed Storage

Snapshots of data on Amazon S3 act as logical restore points for the customer.

Redshift supports the restoration of both a complete cluster and specific tables.

RMS accelerates data accesses from S3 by using a prefetching scheme that pulls data blocks into memory and caches them to local SSDs.

RMS-supported Redshift RA3 instances provide up to 16 PB of capacity today.

A table’s data is partitioned into data slices and stored as logical chains of blocks.

Each data block is described by its block header in the superblock, an in-memory indexing structure. Queries reach the relevant data blocks by scanning the superblock using zone maps.
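
A minimal sketch of zone-map pruning (illustrative, not Redshift's actual data structures): each block records the min and max of a column, and a scan skips any block whose range cannot satisfy the predicate.

#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: return the blocks that could contain rows matching
// the predicate "value < limit"; all other blocks are skipped entirely.
struct ZoneMap { int64_t min; int64_t max; };

std::vector<size_t> blocks_to_scan(const std::vector<ZoneMap>& zones,
                                   int64_t limit) {
    std::vector<size_t> candidates;
    for (size_t i = 0; i < zones.size(); ++i) {
        // Only blocks whose minimum is below the limit can contain matches.
        if (zones[i].min < limit) candidates.push_back(i);
    }
    return candidates;
}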

Transactions are synchronously committed to Amazon S3 by RMS.

The concurrent clusters spun up on demand rely on snapshot isolation.

4.2 Decoupling Metadata from Data

Since metadata is decoupled from data, Elastic Resize and Cross-Instance Restore are possible.

Elastic Resize allows customers to reconfigure their cluster in minutes, either adding nodes to improve performance and provide more storage for demanding workloads or removing nodes to save money.

Cross-Instance Restore allows users to restore snapshots taken from a cluster of one instance type to a cluster of different instance types or different numbers of nodes.

4.3 Expand Beyond Local Capacity

Redshift enhances its scalability by expanding beyond local storage capacities using Amazon S3.

The system utilizes local memory and SSDs as caches, which are dynamically adjusted based on current workload demands. This includes using a tiered storage cache, which tracks data access patterns to maintain a working data set locally.

A dynamic buffer cache also keeps the hottest data blocks in memory, significantly boosting performance.

4.4 Incremental Commits

RMS employs an incremental commit protocol that only captures changes made since the last commit.

It reduces both the data footprint and associated costs. This log-based commit approach separates the in-memory structure from the persisted structure.
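
The sketch below illustrates the flavor of a log-based incremental commit using assumed, hypothetical structures: only the block identifiers dirtied since the last commit are appended to a persisted log, rather than rewriting the full metadata.

#include <cstdint>
#include <unordered_set>
#include <vector>

// Illustrative only: capture the delta since the previous commit.
struct CommitRecord {
    uint64_t commit_id;
    std::vector<uint64_t> added_blocks;    // blocks created by this transaction
    std::vector<uint64_t> deleted_blocks;  // blocks superseded by this transaction
};

class CommitLog {
public:
    CommitRecord commit(const std::unordered_set<uint64_t>& dirty_added,
                        const std::unordered_set<uint64_t>& dirty_deleted) {
        CommitRecord rec;
        rec.commit_id = next_commit_id_++;
        rec.added_blocks.assign(dirty_added.begin(), dirty_added.end());
        rec.deleted_blocks.assign(dirty_deleted.begin(), dirty_deleted.end());
        log_.push_back(rec);   // in RMS this append would be persisted to S3
        return rec;
    }
private:
    uint64_t next_commit_id_ = 1;
    std::vector<CommitRecord> log_;
};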

4.5 Concurrency Control

Redshift implements Multi-version Concurrency Control (MVCC).

Readers neither block writers nor are blocked by them.

Writers may only be blocked by other concurrent writers.

Each transaction sees a consistent snapshot of the database established by all committed transactions before its start.

Redshift enforces serializable isolation, which prevents data anomalies such as lost updates and read-write skew.
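
A minimal MVCC visibility check looks roughly like the sketch below (illustrative only; the paper does not describe Redshift at this level of detail): a row version is visible to a snapshot if its creating transaction committed before the snapshot and its deleting transaction, if any, did not.

#include <cstdint>
#include <unordered_set>

// Illustrative only: xmin/xmax-style row versioning.
struct RowVersion { uint64_t xmin; uint64_t xmax; };  // creator / deleter txn ids (0 = none)

struct Snapshot {
    uint64_t snapshot_ts;                        // txn ids assigned before the snapshot
    std::unordered_set<uint64_t> in_progress;    // uncommitted at snapshot time
    bool sees(uint64_t txn_id) const {
        return txn_id != 0 && txn_id < snapshot_ts && !in_progress.count(txn_id);
    }
};

bool is_visible(const RowVersion& v, const Snapshot& snap) {
    // Visible if its creator committed before the snapshot and its deleter
    // (if any) did not.
    return snap.sees(v.xmin) && !snap.sees(v.xmax);
}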

5. Scaling Compute

5.1 Cluster Size Scaling

Amazon Redshift supports Elastic Resize, a feature allowing customers to add or remove compute nodes dynamically.

This lightweight operation primarily involves reassigning data partitions across the nodes without reshuffling the underlying data distribution. This ensures that compute resources are efficiently utilized, decoupling parallelism from data partitions.

Post-resize, data partitions moved to different nodes are rehydrated from Amazon S3, prioritizing the most frequently accessed or ‘hot’ data to maintain consistent performance.
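
A toy illustration of why Elastic Resize is lightweight (assuming a simple round-robin assignment, not Redshift's actual placement logic): only the partition-to-node ownership map changes, and the moved partitions are then rehydrated from S3, hottest blocks first.

#include <cstddef>
#include <vector>

// Illustrative only: spread partitions over however many nodes exist.
std::vector<int> assign_partitions(size_t num_partitions, int num_nodes) {
    std::vector<int> owner(num_partitions);
    for (size_t p = 0; p < num_partitions; ++p)
        owner[p] = static_cast<int>(p % num_nodes);  // partition p -> node
    return owner;
}

// After a resize from 4 to 6 nodes, only the ownership map changes:
//   auto before = assign_partitions(64, 4);
//   auto after  = assign_partitions(64, 6);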

5.2 Concurrency Scaling

To address the need to handle high concurrency workloads, Redshift has introduced Concurrency Scaling.

This feature automatically scales out by adding additional compute clusters when the workload exceeds the capacity of the primary cluster.

Users interact with a single cluster endpoint, and as demand increases, Redshift seamlessly attaches additional clusters to handle the queued queries. These clusters rehydrate data from Amazon S3, ensuring that performance scales linearly with the number of concurrent clients.

The diagram below shows the concurrency improvements Redshift achieved over one year.

Concurrency improvements

5.3 Compute Isolation

Compute Isolation in Redshift allows clusters to securely share live data with other Redshift clusters and AWS accounts.

This feature enables separate clusters to operate on a single data source, eliminating the complexity of maintaining multiple copies of the data.

The data sharing is managed through a system of metadata requests, ensuring that shared data is accessible even if the producer cluster is paused. Redshift manages all the necessary metadata and IAM policies to facilitate this secure sharing.

6. Automated Tuning

6.1 Automatic Table Optimizations (ATO)

ATO automatically optimizes table properties such as distribution and sort keys based on the workload patterns. These keys are crucial for improving query performance by minimizing data movement and enhancing retrieval efficiency.

ATO continuously analyzes query execution metadata to recommend and apply optimizations. It builds a weighted join graph from all joins in the workload and selects distribution keys that minimize network costs.

Similarly, it recommends sort keys that reduce the volume of data retrieved from disk.
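
The sketch below gives a simplified, greedy version of the distribution-key idea (the real optimizer reasons over the whole weighted join graph): for each table, pick the join column with the largest accumulated join weight.

#include <map>
#include <string>
#include <utility>

// Illustrative only: join_weights maps (table, column) to the accumulated
// weight of joins on that column; heavier columns avoid more network cost.
using TableColumn = std::pair<std::string, std::string>;   // (table, column)

std::map<std::string, std::string> choose_distribution_keys(
        const std::map<TableColumn, double>& join_weights) {
    std::map<std::string, std::pair<std::string, double>> best;  // table -> (column, weight)
    for (const auto& [tc, weight] : join_weights) {
        auto& cur = best[tc.first];
        if (weight > cur.second) cur = {tc.second, weight};
    }
    std::map<std::string, std::string> keys;
    for (const auto& [table, col_weight] : best)
        keys[table] = col_weight.first;
    return keys;
}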

6.2 Automatic Workload Management

AutoWLM dynamically manages query concurrency and memory allocation based on real-time workload characteristics. It uses machine learning models to predict query resource requirements and queuing theory to adjust the number of concurrently executing queries.

This system helps maintain optimal resource utilization, balancing query latency and throughput by throttling concurrency when necessary to prevent over-saturation.
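
A minimal sketch of this kind of admission control, with hypothetical names and a stand-in for the ML predictor: a query runs immediately if its predicted memory fits in the remaining budget; otherwise it waits in the queue and is admitted as memory frees up.

#include <cstdint>
#include <deque>

// Illustrative only: predicted_memory_bytes would come from the predictor.
struct QueryRequest { uint64_t id; uint64_t predicted_memory_bytes; };

class AdmissionController {
public:
    explicit AdmissionController(uint64_t memory_budget)
        : free_memory_(memory_budget) {}

    // Returns true if the query starts now; otherwise it is queued.
    bool submit(const QueryRequest& q) {
        if (q.predicted_memory_bytes <= free_memory_) {
            free_memory_ -= q.predicted_memory_bytes;
            return true;
        }
        waiting_.push_back(q);
        return false;
    }

    // Called when a query finishes: release memory and admit waiters in order.
    void finish(const QueryRequest& q) {
        free_memory_ += q.predicted_memory_bytes;
        while (!waiting_.empty() &&
               waiting_.front().predicted_memory_bytes <= free_memory_) {
            free_memory_ -= waiting_.front().predicted_memory_bytes;
            waiting_.pop_front();
        }
    }
private:
    uint64_t free_memory_;
    std::deque<QueryRequest> waiting_;
};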

6.3 Query Predictor Framework

This framework uses machine learning models to predict query memory consumption and execution time. It trains models on cluster-specific data to adapt to changing workloads quickly.

The predictions help the system allocate resources more effectively, prioritizing short queries and minimizing resource contention.

6.4 Materialized Views

Applications often need to perform complex queries on large tables.

Materialized views (MVs) are helpful in speeding up predictable and repeated queries. Redshift automates the efficient maintenance and use of MVs in three ways:

  1. It incrementally maintains filters, projections, grouping, and joins in materialized views to reflect changes on base tables.
  2. It can automate the timing of the maintenance.
  3. Redshift users can directly query an MV or rely on Redshift’s sophisticated MV-based auto rewriting to rewrite queries over base tables.

6.5 Smart Warmpools, Gray Failure Detection, and Auto-Remediation

Redshift employs smart warmpools, pre-configured EC2 instances ready to replace failed nodes quickly, minimizing downtime. It also uses machine learning models to forecast the required number of instances in the warmpool, optimizing cost and latency.

For hardware failures, especially gray failures that are hard to detect, Redshift uses outlier detection algorithms to identify sub-performing components and automatically triggers remediation actions.

6.6 Serverless Compute Experience

The team has introduced Redshift Serverless. It relies on algorithms for automated provisioning, sizing, and scaling of Redshift compute resources.

Serverless offers a near-zero touch interface. Customers pay only for the seconds they have queries running.

7. Using the best tool for the job

7.1 Data in Open File Formats in Amazon S3

Redshift can also access data in open file formats in Amazon S3 via a feature called Spectrum.

Redshift Spectrum facilitates exabyte scale analytics of data lakes and is extremely cost-effective.

Spectrum provides massive scale-out processing, performing scans and aggregations of data in Parquet, Text, ORC, and AVRO formats.

To use Spectrum, users must register their external tables in either the Hive Metastore, AWS Glue, or AWS Lake Formation catalog.

7.2 Redshift ML with Amazon SageMaker

Redshift ML integrates with Amazon SageMaker, enabling data analysts to train and deploy machine learning models using SQL commands directly within Redshift.

The integration simplifies the machine learning workflow by bringing the model to the data, avoiding the need to move data between services.

7.3 OLTP Sources with Federated Query and Glue Elastic Views

AWS offers purpose-built OLTP-oriented databases such as Aurora PostgreSQL/MySQL and DynamoDB.

Users often need to analyze this OLTP data with Redshift. Redshift facilitates both

  1. The in-place querying of data found in the OLTP services using Redshift’s Federated Query
  2. The seamless copying and synchronization of data to Redshift, using Glue Elastic Views.

7.4 Redshift’s SUPER Schemaless Processing

Redshift introduces the SUPER data type to handle semistructured data, allowing for efficient JSON data ingestion, storage, and querying.

This feature is particularly useful for handling dynamic and nested data structures without requiring predefined schemas.

7.5 Redshift with Lambda

Redshift supports user-defined functions (UDFs) that leverage AWS Lambda, enabling integration with external services and custom processing.

This feature is helpful for tasks like data enrichment, masking, or migrating legacy code, providing flexibility to extend Redshift’s capabilities.

References

PartiQL: https://partiql.org/

POLARIS: The Distributed SQL Engine in Azure Synapse

Delta Lake: High-performance ACID table storage over cloud object stores

The Snowflake Elastic Data Warehouse

Amazon Redshift and the case for simpler data warehouses

Amazon Aurora: Design considerations for high throughput cloud-native relational databases

Spectrum: Integrated Querying of SQL database data and S3 data in Amazon Redshift
