Getting Started with Amazon Redshift
Maor Kleider, Sr. Product Manager, Amazon Redshift
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Agenda
• Introduction
• Benefits
• Use cases
• Getting started
• Q&A
What is Big Data?
When your data sets become so large and diverse
that you have to start innovating around how to
collect, store, process, analyze and share them
It’s never been easier to generate vast amounts of data
Generate → Collect & Store → Analyze → Collaborate & Act
Individual AWS customers generate over a PB/day
Amazon S3 lets you collect and store all this data
Generate → Collect & Store → Analyze → Collaborate & Act
Individual AWS customers generating over a PB/day can store exabytes of data in S3
But how do you analyze it?
Generate → Collect & Store → Analyze (highly constrained) → Collaborate & Act
Individual AWS customers generating over a PB/day store exabytes of data in S3
The Dark Data Problem
Most generated data is unavailable for analysis
[Chart: data volume by year, 1990–2020 — generated data grows far faster than the data available for analysis]
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
AWS Big Data Portfolio
Collect: Amazon Kinesis Firehose, Amazon Kinesis Analytics, Amazon Kinesis Streams, AWS Direct Connect, AWS Snowball, AWS Database Migration Service
Store: Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Aurora, Amazon CloudSearch, Amazon Elasticsearch Service
Analyze: Amazon EMR, Amazon EC2, Amazon Athena, Amazon Redshift, Amazon QuickSight, Amazon Machine Learning, AWS Glue
Amazon Redshift
Fast, simple, petabyte-scale data warehousing for $1,000/TB/year
150+ features: a lot faster, a lot simpler, a lot cheaper
• Relational data warehouse
• Massively parallel; petabyte scale
• Fully managed
• HDD and SSD platforms
• $1,000/TB/year; starts at $0.25/hour
Selected Amazon Redshift customers
Use Case: Traditional Data Warehousing
Business reporting • Advanced pipelines and queries • Secure and compliant • Bulk loads and updates
Easy Migration – Point & Click using AWS Database Migration Service
Secure & Compliant – End-to-End Encryption. SOC 1/2/3, PCI-DSS, HIPAA and FedRAMP compliant
Large Ecosystem – Variety of cloud and on-premises BI and ETL tools
Japanese mobile phone provider • World's largest children's book publisher • Powering 100 marketplaces in 50 countries
Use Case: Log Analysis
Log & machine data • IoT data • Clickstream events • Time-series data
Cheap – Analyze large volumes of data cost-effectively
Fast – Massively Parallel Processing (MPP) and columnar architecture for fast queries and parallel loads
Near real-time – Micro-batch loading and Amazon Kinesis Firehose for near-real-time analytics
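The micro-batch loading pattern mentioned above can be sketched in a few lines. This is an illustration of the buffering idea only, not Kinesis Firehose's actual implementation: records accumulate until either a count or an age threshold is reached, then the whole batch is handed to a sink (in practice, staged to S3 and bulk-loaded with COPY). The `MicroBatcher` name and thresholds are hypothetical.

```python
import time

class MicroBatcher:
    """Buffer records; flush when a size or age threshold is hit.

    Illustrative sketch of size/interval buffering, the same idea
    Kinesis Firehose uses before bulk-loading into Redshift.
    """

    def __init__(self, max_records=500, max_age_s=60.0, sink=None):
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.sink = sink or (lambda batch: None)  # e.g. stage to S3, then COPY
        self.buffer = []
        self.first_arrival = None
        self.flushed = []  # kept for inspection in this sketch

    def add(self, record, now=None):
        now = time.monotonic() if now is None else now
        if not self.buffer:
            self.first_arrival = now
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_records
                or now - self.first_arrival >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)
            self.flushed.append(self.buffer)
            self.buffer = []
            self.first_arrival = None
```

Batching like this trades a bounded delay (the age threshold) for far fewer, larger loads, which suits Redshift's parallel bulk-load path much better than row-at-a-time inserts.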
Interactive data analysis and recommendation engine • Ride analytics for pricing and product development • Ad prediction and on-demand analytics
Use Case: Business Applications
Multi-tenant BI applications • Back-end services • Analytics as a service
Fully Managed – Provisioning, backups, upgrades, security, compression all come built-in so you can
focus on your business applications
Ease of Chargeback – Pay as you go and add clusters as needed: a few big common clusters, several data marts
Service Oriented Architecture – Integrated with other AWS services. Easy to plug into your pipeline
Infosys Information Platform (IIP) • Analytics-as-a-Service • Product and consumer analytics
Amazon Redshift architecture
Leader node
• Simple SQL endpoint (JDBC/ODBC) for BI tools, analytics tools, and SQL clients
• Stores metadata
• Optimizes query plan
• Coordinates query execution
Compute nodes
• Local columnar storage
• Parallel/distributed execution of all queries, loads, backups, restores, and resizes
• Interconnected over 10 GigE (HPC) networking
• Start at just $0.25/hour, grow to 2 PB (compressed)
• DC1: SSD; scale from 160 GB to 326 TB
• DS2: HDD; scale from 2 TB to 2 PB
Ingestion, backup, and restore integrate with Amazon S3, Amazon EMR, Amazon DynamoDB, and SSH
Benefit #1: Amazon Redshift is fast
Dramatically less I/O:
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes

analyze compression listing;

 Table   | Column         | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw

[Zone map figure: each block records the min and max values it contains (e.g. 10–324, 375–623, 637–959), so blocks whose range cannot match a query's filter are skipped entirely]
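Two of the I/O savers above can be sketched in a few lines of Python. This is an illustration of the ideas only, not Redshift's on-disk format: `delta` encoding (as chosen for `listid` in the ANALYZE COMPRESSION output) stores differences instead of full values, and zone maps skip blocks whose min/max range cannot match a filter.

```python
def delta_encode(values):
    """Store the first value, then successive differences.

    A sorted ID column like listid yields tiny deltas that compress
    far better than raw 8-byte integers.
    """
    if not values:
        return []
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out

def delta_decode(deltas):
    """Rebuild the original values by running a cumulative sum."""
    values, running = [], 0
    for d in deltas:
        running += d
        values.append(running)
    return values

def blocks_to_scan(zone_map, lo, hi):
    """Zone-map pruning: keep only blocks whose [min, max] range can
    overlap the query's filter range [lo, hi]."""
    return [i for i, (bmin, bmax) in enumerate(zone_map)
            if bmax >= lo and bmin <= hi]
```

For example, `delta_encode([100, 101, 103, 106])` returns `[100, 1, 2, 3]`, and with the block ranges from the figure, `blocks_to_scan([(10, 324), (375, 623), (637, 959)], 300, 400)` returns `[0, 1]`: the third block is never read.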
Benefit #1: Amazon Redshift is fast
Parallel and distributed
Query
Load
Export
Backup
Restore
Resize
Benefit #1: Amazon Redshift is fast
Hardware optimized for I/O intensive workloads, 4 GB/sec/node
Enhanced networking, over 1 million packets/sec/node
Choice of storage type, instance size
Regular cadence of auto-patched improvements
Benefit #1: Amazon Redshift is fast
"Did I mention that it's ridiculously fast? We're using it to provide our analysts with an alternative to Hadoop"
"After investigating Redshift, Snowflake, and BigQuery, we found that Redshift offers top-of-the-line performance at best-in-market price points"
"On our previous big data warehouse system, it took around 45 minutes to run a query against a year of data, but that number went down to just 25 seconds using Amazon Redshift"
"…[Redshift] performance has blown away everyone here. We generally see 50-100X speedup over Hive"
"We regularly process multibillion row datasets and we do that in a matter of hours. We are heading to up to 10 times more data volumes in the next couple of years, easily"
"We saw a 2X performance improvement on a wide variety of workloads. The more complex the queries, the higher the performance improvement"
And has gotten faster...
5X query throughput improvement over the past year:
• Memory allocation (launched)
• Improved commit and I/O logic (launched)
• Queue hopping (launched)
• Query monitoring rules (launched)
10X vacuuming performance improvement:
• Ensures data is sorted for efficient and fast I/O
• Reclaims space from deleted rows
• Enhanced vacuum performance leads to better system throughput
The life of a query
1. A client (BI tool, analytics tool, or SQL client) submits a query to the leader node
2. The leader node plans the query and routes it to a WLM queue (e.g. Queue 1 or Queue 2)
3. The compute nodes execute the query in parallel
Query monitoring rules
• Allows automatic handling of runaway (poorly written) queries
• Metrics with operators and values (e.g. query_cpu_time > 1000) create a predicate
• Multiple predicates can be AND-ed together to create a rule
• Multiple rules can be defined for a queue in WLM. These rules are OR-ed together
If {rule} then [action]
{rule: metric operator value}, e.g. rows_scanned > 100000
• Metric: cpu_time, query_blocks_read, rows_scanned, query_execution_time, CPU & I/O skew per slice, join_row_count, etc.
• Operator: <, >, ==
• Value: integer
• [action]: hop, log, abort
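The rule semantics above (predicates AND-ed within a rule, rules OR-ed within a queue) can be sketched as a small evaluator. This is my illustration of the stated logic, not Redshift's WLM implementation; the metric names mirror the slide, and the data structures are hypothetical.

```python
import operator

# The three operators the slide lists for QMR predicates.
OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}

def predicate_holds(metrics, metric, op, value):
    """True if a single predicate (metric operator value) matches."""
    return OPS[op](metrics[metric], value)

def matched_actions(metrics, rules):
    """Evaluate QMR-style rules against a query's metrics.

    rules: list of (name, [(metric, op, value), ...], action).
    Predicates within a rule are AND-ed; rules are OR-ed, so every
    rule whose predicates all hold contributes its action.
    """
    actions = []
    for name, predicates, action in rules:
        if all(predicate_holds(metrics, m, o, v) for m, o, v in predicates):
            actions.append((name, action))
    return actions
```

With a rule such as `("INTERACTIVE", [("query_execution_time", ">", 15)], "hop")`, a query reporting `query_execution_time = 20` would be hopped to the next queue, matching the "protect interactive queues" use case on the next slide.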
Query monitoring rules
• Monitor and control cluster resources consumed by a query
• Get notified; abort and reprioritize long-running / bad queries
• Pre-defined templates for common use cases
Query monitoring rules
Common use cases:
• Protect interactive queues
INTERACTIVE = { "query_execution_time > 15 sec" or
                "query_cpu_time > 1500 uSec" or
                "query_blocks_read > 18000 blocks" } [HOP]
• Monitor ad-hoc queues for heavy queries
AD-HOC = { "query_execution_time > 120" or
           "query_cpu_time > 3000" or
           "query_blocks_read > 180000" or
           "memory_to_disk > 400000000000" } [LOG]
• Limit the number of rows returned to a client
MAXLINES = { "RETURN_ROWS > 50000" } [ABORT]
Benefit #2: Amazon Redshift is inexpensive
DS2 (HDD)          | Price per hour for DS2.XL single node | Effective annual price per TB compressed
On-demand          | $0.850                                | $3,725
1-year reservation | $0.500                                | $2,190
3-year reservation | $0.228                                | $999

DC1 (SSD)          | Price per hour for DC1.L single node  | Effective annual price per TB compressed
On-demand          | $0.250                                | $13,690
1-year reservation | $0.161                                | $8,795
3-year reservation | $0.100                                | $5,500

Pricing is simple:
• Number of nodes x price/hour
• No charge for leader node
• No upfront costs
• Pay as you go
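The "effective annual price per TB" column follows directly from node price × hours per year ÷ compressed storage per node. A quick check, assuming (per the scale figures earlier in the deck) roughly 2 TB of compressed storage per DS2.XL node and 0.16 TB per DC1.L node:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_price_per_tb(price_per_hour, tb_per_node):
    """Effective annual price per compressed TB for one node.

    Cluster price is simply number_of_nodes * price_per_hour;
    there is no charge for the leader node.
    """
    return price_per_hour * HOURS_PER_YEAR / tb_per_node
```

For example, `annual_price_per_tb(0.228, 2.0)` gives about $999/TB/year, matching the 3-year reserved DS2 row, and `annual_price_per_tb(0.250, 0.16)` gives about $13,690/TB/year for on-demand DC1.L.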
Benefit #3: Amazon Redshift is fully managed
Continuous/incremental backups
• Multiple copies within the cluster
• Continuous and incremental backups to Amazon S3
• Continuous and incremental backups across regions (Region 1 → Region 2)
• Streaming restore
Benefit #3: Amazon Redshift is fully managed
Fault tolerance
• Disk failures
• Node failures
• Network failures
• Availability Zone/region level disasters (backups in Amazon S3 across regions)
Node fault tolerance
Data-path monitoring agents provide node-level monitoring that can detect SW/HW issues and take action:
1. A failure is detected at one of the compute nodes
2. Redshift parks the client connections
3. The failed node is replaced
4. Queries are re-submitted
Cluster-level monitoring agents add a further layer for the leader node and the network.
Benefit #4: Security is built-in
• Load encrypted from S3
• SSL to secure data in transit; ECDHE perfect forward secrecy
• Amazon VPC for network isolation
• Encryption to secure data at rest
  • All blocks on disks and in S3 encrypted
  • Block key, cluster key, master key (AES-256)
  • On-premises HSM & AWS CloudHSM support
• Audit logging and AWS CloudTrail integration
• SOC 1/2/3, PCI-DSS, FedRAMP, BAA
Benefit #5: Amazon Redshift is powerful
• Approximate functions
• User defined functions
• Machine learning
• Data science
Benefit #6: Amazon Redshift has a large ecosystem
Data integration Business intelligence Systems integrators
Benefit #7: Service-oriented architecture
Amazon Redshift integrates with EC2/SSH, DynamoDB, RDS/Aurora, Amazon ML, EMR, CloudSearch, AWS Data Pipeline, Amazon Kinesis, Amazon Mobile Analytics, and S3
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
• Fast @ exabyte scale
• Elastic & highly available
• On-demand, pay-per-query
• High concurrency: multiple clusters access the same data in S3
• No ETL: query data in-place using open file formats
• Full Amazon Redshift SQL support
Life of a query:

SELECT COUNT(*)
FROM S3.EXT_TABLE
GROUP BY…

The query arrives over JDBC/ODBC at the Amazon Redshift cluster, which pushes S3 scans down to a fleet of Redshift Spectrum nodes (1 … N) for fast execution at exabyte scale. Data lives in Amazon S3 (exabyte-scale object storage); table metadata comes from the Data Catalog (Apache Hive Metastore).
Amazon Redshift Spectrum – Current support
File formats: Parquet, CSV, Sequence, RCFile, ORC (coming soon), RegExSerDe (coming soon)
Compression: Gzip, Snappy, Lzo (coming soon), Bz2
Encryption: SSE with AES256, SSE-KMS with default key
Column types:
• Numeric: bigint, int, smallint, float, double and decimal
• Char/varchar/string
• Timestamp
• Boolean
• DATE type can be used only as a partitioning key
Table types:
• Non-partitioned table (s3://mybucket/orders/..)
• Partitioned table (s3://mybucket/orders/date=YYYY-MM-DD/..)
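The payoff of the partitioned layout above is that a query filtering on the partition column can skip whole S3 prefixes without reading them. A minimal sketch of that pruning, assuming the Hive-style `date=YYYY-MM-DD` key layout shown above (the parsing code is my illustration, not Spectrum's implementation):

```python
def partition_value(key, column="date"):
    """Extract a partition value from a Hive-style S3 key, e.g.
    's3://mybucket/orders/date=2017-06-01/part-0000' -> '2017-06-01'."""
    marker = column + "="
    for part in key.split("/"):
        if part.startswith(marker):
            return part[len(marker):]
    return None  # key is not under a partition for this column

def prune(keys, lo, hi, column="date"):
    """Keep only keys whose partition value falls in [lo, hi].

    ISO YYYY-MM-DD dates compare correctly as plain strings, which is
    one reason that layout is a good choice for a partition key.
    """
    return [k for k in keys
            if lo <= (partition_value(k, column) or "") <= hi]
```

With objects for May, June, and July 2017, `prune(keys, "2017-06-01", "2017-06-30")` keeps only the June prefix; the other months are never scanned (or billed, under pay-per-query).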
The Emerging Analytics Architecture
Storage: Amazon S3 (exabyte-scale object storage), AWS Glue Data Catalog (Hive-compatible metastore)
Serverless compute: Amazon Kinesis Firehose (real-time data streaming), AWS Glue (ETL & data catalog), Amazon Redshift Spectrum (fast @ exabyte scale), AWS Lambda (trigger-based code execution)
Data processing: Amazon EMR (managed Hadoop applications), Amazon Redshift (petabyte-scale data warehousing), Amazon Athena (interactive query)
Over 20 customers helped preview Amazon Redshift Spectrum
Use cases
NTT Docomo: Japan’s largest mobile service provider
Environment:
• 68 million customers
• Tens of TBs per day of data across a mobile network
• 6 PB of total data (uncompressed)
• Data science for marketing operations, logistics, and so on
• Greenplum on-premises
Challenges:
• Scaling challenges
• Performance issues
• Need same level of security
• Need for a hybrid environment
NTT Docomo: Japan’s largest mobile service provider
• 125-node DS2.8XL cluster
• 4,500 vCPUs, 30 TB RAM
• 2 PB compressed
Pipeline: data sources feed an ET forwarder/loader with state management into S3, which loads Amazon Redshift (with a sandbox); clients connect over AWS Direct Connect
Results:
• 10x faster analytic queries
• 50% reduction in time for new BI application deployment
• Significantly less operations overhead
Nasdaq: powering 100 marketplaces in 50 countries
Environment:
• Orders, quotes, trade executions, market "tick" data from 7 exchanges
• 7 billion rows/day
• Analyze market share, client activity, surveillance, billing, and so on
• Microsoft SQL Server on-premises
Challenges:
• Expensive legacy DW ($1.16M/yr.)
• Limited capacity (1 yr. of data online)
• Needed lower TCO
• Must satisfy multiple security and regulatory requirements
• Required similar performance
Nasdaq: powering 100 marketplaces in 50 countries
• 23-node DS2.8XL cluster
• 828 vCPUs, 5 TB RAM
• 368 TB compressed
• 2.7 T rows, 900 B derived
• 8 tables with 100 B rows
• 7 man-month migration
• ¼ the cost, 2x storage, room to grow
• Faster performance, very secure
Amazon.com clickstream analytics
Web log analysis for Amazon.com
• PBs workload, 2 TB/day @ 67% YoY growth
• Largest table: 400 TB
Goal: understand customer behavior
Previous solution:
• Legacy DW (Oracle) – query across 1 week of data per hour
• Hadoop – query across 1 month of data per hour
Results with Amazon Redshift:
• Query 15 months in 14 min
• Load 5B rows in 10 min
• 21B w/ 10B rows: 3 days to 2 hrs (Hive → Redshift)
• Load pipeline: 90 hrs to 8 hrs (Oracle → Redshift)
• 100-node DS2.8XL clusters
• Easy resizing
• Managed backups and restore
• Failure tolerance and recovery
• 20% time of one DBA
• Increased productivity
Resources
Detail Pages
• http://aws.amazon.com/redshift
• https://aws.amazon.com/marketplace/redshift/
• https://aws.amazon.com/redshift/developer-resources/
• Amazon Redshift Utilities - GitHub
Best Practices
• http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html
Thank you!