Amazon Redshift: Getting Started Guide
Amazon's trademarks and trade dress may not be used in connection with any product or service that is not
Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or
discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may
or may not be affiliated with, connected to, or sponsored by Amazon.
Table of Contents
Getting started with Amazon Redshift .................................................................................................. 1
Prerequisites .............................................................................................................................. 1
Sign up for AWS ................................................................................................................ 2
Determine firewall rules ...................................................................................................... 2
Amazon Redshift concepts and data processing flow ...................................................................... 2
Amazon Redshift concepts .................................................................................................. 2
Typical data processing flow for Amazon Redshift .................................................................. 3
Getting started with Amazon Redshift basics ......................................................................................... 5
Getting started with the Amazon Redshift console ......................................................................... 5
Connecting to Amazon Redshift ................................................................................................... 6
Getting started with clusters and data loading .............................................................................. 7
Using a sample dataset ....................................................................................................... 7
Bringing your own data to Amazon Redshift ........................................................................ 10
Getting started with common database tasks .............................................................................. 19
Task 1: Create a database .................................................................................................. 19
Task 2: Create a user ........................................................................................................ 19
Task 3: Create a schema .................................................................................................... 20
Task 4: Create a table ....................................................................................................... 21
Task 5: Load sample data .................................................................................................. 22
Task 6: Query the system tables ........................................................................................ 22
Task 7: Cancel a query ...................................................................................................... 25
Task 8: Clean up your resources ......................................................................................... 27
Getting started with querying outside data sources .............................................................................. 28
Getting started querying data lakes ............................................................................................ 28
Getting started querying remote data sources ............................................................................. 28
Getting started accessing data in other clusters ........................................................................... 29
Getting started training ML models with Redshift data ................................................................. 29
Additional resources ......................................................................................................................... 30
Document history ............................................................................................................................. 31
Prerequisites
If you are a first-time user of Amazon Redshift, we recommend that you begin by reading the following
sections:
• Amazon Redshift management overview – In this topic, you can find an overview of Amazon Redshift.
• Service highlights and pricing – On this product detail page, you can find details about Amazon
Redshift service highlights and pricing.
• Amazon Redshift Getting Started Guide (this guide) – In this guide, you can find tutorials that use
Amazon Redshift to create a sample cluster and work with sample data.
If your organization is eligible, you might be able to create a cluster under the Amazon Redshift free trial
program. To do this, choose Free trial to create a configuration with the dc2.large node type. For more
information about choosing a free trial, see Amazon Redshift free trial.
Topics
• Prerequisites (p. 1)
• Amazon Redshift concepts and data processing flow (p. 2)
Prerequisites
Before you begin setting up an Amazon Redshift cluster, make sure that you complete the following
prerequisites:
Sign up for AWS
1. Open https://portal.aws.amazon.com/billing/signup.
2. Follow the online instructions.
Part of the sign-up procedure involves receiving a phone call and entering a verification code on the
phone keypad.
Determine firewall rules
If your client computer is behind a firewall, make sure that you know an open port that you can use.
Using this open port, you can connect to the cluster from a SQL client tool and run queries. If you don't
know an open port, work with someone who understands your network firewall rules to determine an
open port in your firewall.
Though Amazon Redshift uses port 5439 by default, the connection doesn't work if that port isn't open
in your firewall. You can't change the port number for your Amazon Redshift cluster after it's created.
Thus, make sure that you specify an open port that works in your environment during the launch
process.
This prerequisite applies only when you bring your own data to Amazon Redshift. For more information,
see Bringing your own data to Amazon Redshift (p. 10).
• Cluster – The core infrastructure component of an Amazon Redshift data warehouse is a cluster.
A cluster is composed of one or more compute nodes. The compute nodes run the compiled code.
If a cluster is provisioned with two or more compute nodes, an additional leader node coordinates the
compute nodes. The leader node handles external communication with applications, such as business
intelligence tools and query editors. Your client application interacts directly only with the leader
node. The compute nodes are transparent to external applications.
User data is stored in one or more databases on the compute nodes. Your SQL client communicates
with the leader node, which in turn coordinates running queries with the compute nodes. For details
about compute nodes and leader nodes, see Data warehouse system architecture. Within a database,
user data is organized into one or more schemas.
Amazon Redshift is a relational database management system (RDBMS) and is compatible with
other RDBMS applications. It provides the same functionality as a typical RDBMS, including online
transaction processing (OLTP) functions such as inserting and deleting data. Amazon Redshift is also
optimized for high-performance batch analysis and reporting of datasets.
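As a minimal sketch of that compatibility, standard SQL DML statements run unchanged. The event_log table below is a hypothetical example for illustration only; it is not part of the sample dataset in this guide:

```sql
-- Hypothetical table for illustration; not part of the Tickit sample dataset.
CREATE TABLE event_log (
    event_id   integer,
    event_name varchar(50));

-- Standard OLTP-style DML works as in a typical RDBMS.
INSERT INTO event_log (event_id, event_name) VALUES (101, 'Signup');
UPDATE event_log SET event_name = 'Sign-up' WHERE event_id = 101;
DELETE FROM event_log WHERE event_id = 101;
```

That said, Amazon Redshift performs best on batch analytic workloads rather than high-frequency single-row writes.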
Following, you can find a description of the typical data processing flow in Amazon Redshift, along with
descriptions of different parts of the flow. For further information about Amazon Redshift system
architecture, see Data warehouse system architecture.
An Amazon Redshift data warehouse is an enterprise-class relational database query and management
system. Amazon Redshift supports client connections with many types of applications, including business
intelligence (BI), reporting, data, and analytics tools. When you run analytic queries, you are retrieving,
comparing, and evaluating large amounts of data in multiple-stage operations to produce a final result.
At the data ingestion layer, different types of data sources continuously upload structured,
semistructured, or unstructured data to the data storage layer. This data storage area serves as a staging
area that stores data in different states of consumption readiness. An example of storage might be an
Amazon Simple Storage Service (Amazon S3) bucket.
At the optional data processing layer, the source data goes through preprocessing, validation, and
transformation using extract, transform, load (ETL) or extract, load, transform (ELT) pipelines. These raw
datasets are then refined by using ETL operations. An example of an ETL engine is AWS Glue.
At the data consumption layer, data is loaded into your Amazon Redshift cluster, where you can run
analytical workloads.
• Use datashares to share live data across Amazon Redshift clusters for read purposes with relative
security and ease. You can share data at different levels, such as databases, schemas, tables, views
(including regular, late-binding, and materialized views), and SQL user-defined functions (UDFs).
For more information about data sharing, see Getting started accessing data in other Amazon Redshift
clusters (p. 29).
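For a sense of the SQL involved, a minimal datashare setup might look like the following sketch. The share name, account ID, and namespace GUID are placeholders, not values from this guide:

```sql
-- On the producer cluster: create a datashare and add objects to it.
CREATE DATASHARE salesshare;
ALTER DATASHARE salesshare ADD SCHEMA public;
ALTER DATASHARE salesshare ADD TABLE public.sales;

-- Grant usage to a consumer AWS account (account ID is a placeholder).
GRANT USAGE ON DATASHARE salesshare TO ACCOUNT '123456789012';

-- On the consumer cluster: create a database from the datashare
-- (the producer namespace GUID is a placeholder).
CREATE DATABASE sales_db FROM DATASHARE salesshare
OF NAMESPACE 'dd8772e1-0000-0000-0000-000000000000';
```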
• Use Amazon Redshift Spectrum to query data in Amazon S3 files without having to load the data into
Amazon Redshift tables. Amazon Redshift provides SQL capability designed for fast online analytical
processing (OLAP) of very large datasets that are stored in both Amazon Redshift clusters and Amazon
S3 data lakes.
For more information about Redshift Spectrum, see Getting started querying your data lake (p. 28).
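A minimal Redshift Spectrum setup might be sketched as follows. The schema, database, table, and bucket names here are illustrative placeholders, not values from this guide:

```sql
-- Register an external (data lake) schema through the AWS Glue Data Catalog.
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'spectrum_db'
IAM_ROLE default
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Define an external table over delimited files in Amazon S3
-- (bucket and prefix are placeholders).
CREATE EXTERNAL TABLE spectrum_schema.sales_s3(
    salesid   integer,
    pricepaid decimal(8,2))
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://my-example-bucket/sales/';

-- Query the S3 data in place, without loading it into Redshift tables.
SELECT count(*) FROM spectrum_schema.sales_s3;
```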
• Join data from relational databases, such as Amazon Relational Database Service (Amazon RDS)
and Amazon Aurora, or Amazon S3, with data in your Amazon Redshift database using a federated
query. You can use Amazon Redshift to query operational data directly (without moving it), apply
transformations, and insert data into your Amazon Redshift tables.
For more information about federated queries, see Getting started querying data on remote data
sources (p. 28).
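A federated query setup might look like the following sketch. The hostname, IAM role ARN, Secrets Manager ARN, and table names are illustrative placeholders:

```sql
-- Create an external schema that maps to a schema in an Aurora PostgreSQL
-- database (URI, IAM role, and secret ARN are placeholders).
CREATE EXTERNAL SCHEMA apg
FROM POSTGRES
DATABASE 'ops' SCHEMA 'public'
URI 'aurora-cluster.example.us-west-2.rds.amazonaws.com'
IAM_ROLE 'arn:aws:iam::123456789012:role/myFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-west-2:123456789012:secret:myPgSecret';

-- Join live operational data with a local Redshift table.
SELECT o.orderid, s.qtysold
FROM apg.orders o
JOIN public.sales s ON s.salesid = o.orderid;
```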
• Amazon Redshift machine learning (ML) creates models using data that you provide and metadata
associated with the data inputs. These models capture patterns in the input data, and you can use
them to generate predictions for new input data. Amazon Redshift works with Amazon SageMaker
Autopilot to automatically choose the best model and make the prediction function available in
Amazon Redshift.
For more information about Amazon Redshift ML, see Getting started training machine learning
models with Amazon Redshift data (p. 29).
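The Amazon Redshift ML workflow can be sketched as follows. The table, columns, function name, and S3 bucket are hypothetical placeholders, not objects from this guide:

```sql
-- Train a model from query results; SageMaker Autopilot chooses the algorithm.
-- The table, target column, and S3 bucket are placeholders.
CREATE MODEL customer_churn
FROM (SELECT age, tenure, monthly_spend, churned FROM customer_activity)
TARGET churned
FUNCTION predict_churn
IAM_ROLE default
SETTINGS (S3_BUCKET 'my-example-bucket');

-- After training completes, call the generated function in ordinary SQL.
SELECT age, predict_churn(age, tenure, monthly_spend) AS will_churn
FROM customer_activity;
```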
Getting started with the Amazon Redshift console
Topics
• Getting started with the Amazon Redshift console (p. 5)
• Connecting to Amazon Redshift (p. 6)
• Getting started with Amazon Redshift clusters and data loading (p. 7)
• Getting started with common database tasks (p. 19)
If you are a first-time user of Amazon Redshift, we recommend that you begin by going to the
Dashboard, Clusters, and query editor v2 pages to get started using the console.
To get started with the Amazon Redshift console, watch the following video: Getting started with
Amazon Redshift.
Following, you can find descriptions of the navigation pane items of the Amazon Redshift console:
• Dashboard – Check Cluster metrics and Query overview for insights to metrics data (such as CPU
utilization) and query information. Using these can help you determine if your performance data is
abnormal over a specified time range.
• Clusters – View a list of clusters in your AWS account, choose a cluster to start querying, or perform
cluster-related actions. You can also create a new cluster from this page.
• Queries – Get information for reference or troubleshooting, such as a list of recent queries and the
SQL text for each query.
• Editor – Run queries on databases hosted on your Amazon Redshift cluster, save queries for reuse, or
schedule them to run at a future time (in the query editor only).
• Datashares – As a producer account administrator, either authorize consumer accounts to access
datashares or choose not to authorize access. To use an authorized datashare, a consumer account
administrator can associate the datashare with either an entire AWS account or specific cluster
namespaces in an account. An administrator can also decline a datashare.
• Config – Connect to Amazon Redshift clusters from SQL client tools over Java Database Connectivity
(JDBC) and Open Database Connectivity (ODBC) connections. You can also set up an Amazon Redshift–
managed virtual private cloud (VPC) endpoint. Doing this provides a private connection between a VPC
based on the Amazon VPC service that contains a cluster and another VPC that is running a client tool.
• Marketplace – Get information on other tools or AWS services that work with Amazon Redshift.
• Advisor – Get specific recommendations about changes you can make to your Amazon Redshift cluster
to prioritize your optimizations.
• Alarms – Create alarms on cluster metrics to view performance data and track metrics over a time
period that you specify.
• Events – Track events and get reports on information such as the date the event occurred, a
description, or the event source.
• What's new – View new Amazon Redshift features and product updates.
• Use the query editor v2 to run queries on databases hosted by your Amazon Redshift cluster. After
creating your cluster, you can immediately run queries by using the query editor v2.
For more information, see Querying a database using the Amazon Redshift query editor v2.
• Connect to Amazon Redshift from your client tools using JDBC or ODBC drivers by copying the JDBC or
ODBC driver URL.
To work with data in your cluster, you need JDBC or ODBC drivers for connectivity from your client
computer or instance. Code your applications to use JDBC or ODBC data access API operations, or use
SQL client tools that support either JDBC or ODBC.
For more information on how to find your cluster connection string, see Finding your cluster
connection string.
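For illustration, a JDBC connection URL generally takes the following form, where the cluster endpoint is a placeholder, 5439 is the default port, and dev is the default database name:

```
jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev
```

You can copy the actual URL for your cluster from the cluster details page in the console rather than constructing it by hand.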
• If your SQL client tool requires a driver, you can download an operating system-specific driver to
connect to Amazon Redshift from your client tools.
For more information on how to install the appropriate driver for your SQL client, see Configuring a
JDBC driver version 2.0 connection.
For more information on how to configure an ODBC connection, see Configuring an ODBC connection.
Getting started with clusters and data loading
Make sure that you have the prerequisites before getting started. For more information, see
Prerequisites (p. 1).
Topics
• Using a sample dataset (p. 7)
• Bringing your own data to Amazon Redshift (p. 10)
Before you begin setting up an Amazon Redshift cluster, make sure that you complete the Sign up for
AWS (p. 2) and Determine firewall rules (p. 2).
Topics
• Step 1: Create a sample Amazon Redshift cluster (p. 7)
• Step 2: Try example queries using the query editors (p. 9)
Important
The sample cluster that you create runs in a live environment. The on-demand rate is $0.25 per
hour for the sample cluster described in this tutorial until you delete it. For more
pricing information, see Amazon Redshift pricing. If you have questions or get stuck, you can
contact the Amazon Redshift team by posting on our Discussion forum.
This tutorial isn't meant for production environments and doesn't discuss options in depth. After you
complete the steps in this tutorial, you can use Additional resources (p. 30) to find more in-depth
information. This information can help you plan, deploy, and maintain your clusters, and work with the
data in your data warehouse.
1. Sign in to the AWS Management Console and open the Amazon Redshift console at https://
console.aws.amazon.com/redshift/.
2. To create a cluster, do one of the following:
• On the Amazon Redshift service page, choose Create cluster. The Create cluster page appears.
If you later choose another node type, your organization is no longer eligible for the free trial.
• In Sample data, choose Load sample data to load the sample dataset into your Amazon Redshift
cluster. Amazon Redshift loads the sample dataset Tickit into the default dev database and
public schema. You can start using the query editor v2 to query data.
• To bring your own data to your Amazon Redshift cluster, choose Production. Then, in Sample
data, choose Load sample data. For information about bringing your own data, see Bringing your
own data to Amazon Redshift (p. 10).
Amazon Redshift automatically loads the sample dataset into your sample Amazon Redshift cluster.
5. In the Database configuration section, specify values for Admin user name and Admin user
password. Or choose Generate password to use a password generated by Amazon Redshift.
This tutorial uses the Amazon Redshift query editor v2. You can use this editor to query data
immediately after Amazon Redshift finishes creating the cluster.
You can also choose other SQL client tools that support JDBC or ODBC drivers to work with data in your
cluster. For more information, see Connecting to an Amazon Redshift cluster using SQL client tools in the
Amazon Redshift Cluster Management Guide.
After the Amazon Redshift cluster is created, in Connect to Amazon Redshift clusters, choose Query
data.
From the query editor v2, connect to a database, and choose the cluster name in the tree-view panel. If
prompted, enter the connection parameters.
When you connect to a cluster and its databases, you provide a Database name and User name. You also
provide parameters required for one of the following authentication methods:
Database user name and password
With this method, also provide a Password for the database that you are connecting to.
Temporary credentials
With this method, the query editor v2 generates a temporary password to connect to the database.
When you select a cluster in the query editor v2, you can create, edit, and delete connections using
the context (right-click) menu.
By default, Amazon Redshift creates a default database named dev and a default schema named
public. To view the individual data files of the sample dataset, choose a cluster, go to the query editor
v2, and choose the dev database, public schema, then Tables.
Alternatively, in the navigation pane, choose CLUSTERS and the cluster that you want to query. Then
under Query data, choose either Query in query editor or Query in query editor v2 to query data in
your specified query editor.
SELECT sum(qtysold)
FROM sales, date
WHERE sales.dateid = date.dateid
AND caldate = '2008-01-05';

-- Find events in the 99.9 percentile in terms of all time gross sales.
SELECT eventname, total_price
FROM (SELECT eventid, total_price, ntile(1000) over(order by total_price desc) as percentile
      FROM (SELECT eventid, sum(pricepaid) total_price
            FROM sales
            GROUP BY eventid)) Q, event E
WHERE Q.eventid = E.eventid
AND percentile = 1
ORDER BY total_price desc;
After you complete the steps in this tutorial, you can use Additional resources (p. 30) to find more
in-depth information. This information can help you plan, deploy, and maintain your clusters, and work
with the data in your data warehouse.
You can also try the Bringing your own data to Amazon Redshift (p. 10) tutorial to create a cluster
with your own dataset.
Before you begin setting up an Amazon Redshift cluster, make sure that you complete the Sign up for
AWS (p. 2) and Determine firewall rules (p. 2).
Important
The sample cluster that you create runs in a live environment. The on-demand rate is $0.25 per
hour for the sample cluster described in this tutorial until you delete it. For more
pricing information, go to the Amazon Redshift pricing page. If you have questions or get stuck,
you can contact the Amazon Redshift team by posting on our Discussion forum.
This tutorial isn't meant for production environments and doesn't discuss options in depth. After you
complete the steps in this tutorial, you can use Additional resources (p. 30) to find more in-depth
information. This information can help you plan, deploy, and maintain your clusters, and work with the
data in your data warehouse.
To best protect your sensitive data and safeguard your AWS access credentials, we recommend creating
an IAM role and attaching it to your cluster. For more information about providing access permissions,
see Permissions to access other AWS resources.
The cluster that you are about to create is live, not running in a sandbox. You incur the standard Amazon
Redshift usage fees for the cluster until you delete it. If you complete the tutorial described here in one
sitting and delete the cluster when you are finished, the total charges are minimal.
1. Sign in to the AWS Management Console and open the Amazon Redshift console at https://
console.aws.amazon.com/redshift/.
Important
If you use IAM user credentials, ensure that you have the necessary permissions to perform
the cluster operations. For more information, see Controlling access to IAM users in the
Amazon Redshift Cluster Management Guide.
2. At upper right, choose the AWS Region where you want to create the cluster.
3. On the navigation menu, choose CLUSTERS, then choose Create cluster. The Create cluster page
appears.
4. In the Cluster configuration section, specify values for Cluster identifier, Node type, and Nodes:
• Cluster identifier: Enter examplecluster for this tutorial. This identifier must be unique. The
identifier must be 1–63 characters, and valid characters are a–z (lowercase only) and - (hyphen).
• Choose one of the following methods to size your cluster:
Note
The following step assumes an AWS Region that supports RA3 node types. For a list
of AWS Regions that support RA3 node types, see Overview of RA3 node types in the
Amazon Redshift Cluster Management Guide.
• If your AWS Region supports RA3 node types, choose either Production or Free trial to answer
the question What are you planning to use this cluster for?
If your organization is eligible, you might be able to create a cluster under the Amazon Redshift
free trial program. For information about creating clusters using the free trial program, see
Using a sample dataset (p. 7). To do this, choose Free trial to create a configuration with the
dc2.large node type. For more information about choosing a free trial, see Amazon Redshift free
trial.
• If you don't know how large to size your cluster, choose Help me choose. Doing this starts a
sizing calculator that asks you questions about the size and query characteristics of the data
that you plan to store in your data warehouse.
If you know the required size of your cluster (that is, the node type and number of nodes),
choose I'll choose. Then choose the Node type and number of Nodes to size your cluster for the
proof of concept.
• Choose dc2.large for Node type and 2 for Nodes for this tutorial.
• If you have chosen Production for your cluster, then do one of the following:
• To use the sample dataset Amazon Redshift provides, in Sample data, choose Load sample
data. Amazon Redshift loads the sample dataset Tickit to the default dev database and public
schema.
• To bring your own data to Amazon Redshift, continue with the rest of the tutorial.
5. In the Database configuration section, specify values for Database name (optional), Database port
(optional), Admin user name, and Admin user password. Or choose Generate password to use a
password generated by Amazon Redshift.
a. Under Cluster permissions, for Manage IAM roles, choose Create IAM role.
b. Specify an Amazon S3 bucket for the IAM role to access by one of the following methods:
• Choose No additional Amazon S3 bucket to allow the created IAM role to access only the
Amazon S3 buckets that are named redshift.
• Choose Any Amazon S3 bucket to allow the created IAM role to access all Amazon S3
buckets.
• Choose Specific Amazon S3 buckets to specify one or more Amazon S3 buckets for the
created IAM role to access. Then choose one or more Amazon S3 buckets from the table.
c. Choose Create IAM role as default. Amazon Redshift automatically creates and sets the IAM
role as the default for your cluster.
Because you created your IAM role from the console, it has the
AmazonRedshiftAllCommandsFullAccess policy attached. This allows Amazon Redshift to
copy, load, query, and analyze data from AWS resources in your account.
For information on how to manage the default IAM role for a cluster, see Creating an IAM role as
default for Amazon Redshift.
7. (Optional) In the Additional configurations section, turn off Use defaults to modify Network and
security, Database configuration, Maintenance, Monitoring, and Backup settings.
In some cases, you might create your cluster with the Load sample data option and want to turn on
enhanced Amazon VPC routing. If so, the cluster in your virtual private cloud (VPC) requires access to
the Amazon S3 endpoint for data to be loaded.
To make the cluster publicly accessible, you can do one of two things. You can configure a network
address translation (NAT) address in your VPC for the cluster to access the internet. Or you
can configure an Amazon S3 VPC endpoint in your VPC. For more information about enhanced
Amazon VPC routing, see Enabling enhanced Amazon VPC routing in the Amazon Redshift Cluster
Management Guide.
8. Choose Create cluster.
You can skip this step if you plan to access the cluster with the Amazon Redshift query editor from within
your VPC.
1. Check your firewall rules if your cluster needs to be accessed from outside a firewall. For example,
your client might be an Amazon Elastic Compute Cloud (Amazon EC2) instance or an external
computer.
2. To access the cluster from an external client, such as an Amazon EC2 instance, add an ingress rule
to the security group attached to your cluster that allows inbound traffic. You add Amazon EC2
security group rules in the Amazon EC2 console. For example, a CIDR/IP of 192.0.2.0/24 allows
clients in that IP address range to connect to your cluster. Find out the correct CIDR/IP for your
environment.
Step 3: Grant access to one of the query editors and run queries
To query databases hosted by your Amazon Redshift cluster, you have two options:
1. Connect to your cluster and run queries on the AWS Management Console with one of the query
editors.
If you use one of the query editors, you don't have to download and set up an SQL client application.
2. Connect to your cluster through an SQL client tool, such as SQL Workbench/J. For more information
about using SQL Workbench/J, see Connect to your cluster by using SQL Workbench/J in the Amazon
Redshift Cluster Management Guide.
Using one of the Amazon Redshift query editors is the easiest way to run queries on databases hosted
by your Amazon Redshift cluster. After creating your cluster, you can immediately run queries using
the Amazon Redshift console. For details about considerations when using the Amazon Redshift query
editor, see Querying a database using the query editor in the Amazon Redshift Cluster Management
Guide.
To access the query editor v2, you need permission. An administrator can attach one of the following
AWS-managed policies to the IAM user or role to grant permissions. These AWS-managed policies are
written with different options that control how tagging resources allows sharing of queries. You can use
the IAM console (https://console.aws.amazon.com/iam/) to attach IAM policies.
You can also create your own policy based on the permissions allowed and denied in the provided
managed policies. If you use the IAM console policy editor to create your own policy, choose SQL
Workbench as the service for which you create the policy in the visual editor. The query editor v2 uses
the service name AWS SQL Workbench in the visual editor and IAM Policy Simulator.
If you have already created an IAM user to access Amazon Redshift, you can attach the
AmazonRedshiftQueryEditor and AmazonRedshiftReadOnlyAccess policies to that user. If you
haven't created an IAM user yet, create one and attach the policies to the IAM user.
1. Sign in to the AWS Management Console and open the IAM console at https://
console.aws.amazon.com/iam/.
2. Choose Users.
3. Choose the user that needs access to the query editor.
4. Choose Add permissions.
5. Choose Attach existing policies directly.
6. For Policy names, choose AmazonRedshiftQueryEditor and AmazonRedshiftReadOnlyAccess.
7. Choose Next: Review.
8. Choose Add permissions.
Use the query editor v2, which simplifies loading data with the Load data wizard. For more
information, see Loading your own data from Amazon S3 to Amazon Redshift using the query editor
v2 (p. 14). Or use the query editor to create tables and load your data. For more information, see
Loading sample data from Amazon S3 using the query editor (p. 15).
Loading your own data from Amazon S3 to Amazon Redshift using the query editor v2
To load your own data from Amazon S3 to Amazon Redshift, Amazon Redshift requires an IAM role that
has the required privileges to load data from the specified Amazon S3 bucket.
First, connect to a database. Next, create some tables in the database. Then load your own data from
Amazon S3 to Amazon Redshift. For more information on how to work with the query editor v2, see
Working with query editor v2 in the Amazon Redshift Cluster Management Guide.
The COPY command generated and used in the query editor v2 Load data wizard supports all the
parameters available to the COPY command syntax to load data from Amazon S3. For information about
the COPY command and its options used to load data from Amazon S3, see COPY from Amazon Simple
Storage Service in the Amazon Redshift Database Developer Guide.
At this point, you have a database called dev and you are connected to it. Next, you create some tables
in the database, upload data to the tables, and try a query. For your convenience, the sample data that
you load is available in an Amazon S3 bucket.
Note
If you're using a SQL client tool, ensure that your SQL client is connected to the cluster.
• Try example queries at Step 5: Try example queries using the query editor (p. 18).
• Reset your environment at Step 6: Reset your environment (p. 18).
• Find more information about Amazon Redshift at Additional resources (p. 30).
Note
To try querying data in the query editor without loading your own data, choose Load sample
data in Sample data. If you do, Amazon Redshift loads its sample dataset to your Amazon
Redshift cluster automatically during cluster creation.
1. Create tables.
If you are using the Amazon Redshift query editor, individually copy and run the following create
table statements to create tables in the dev database. For more information about the syntax, see
CREATE TABLE in the Amazon Redshift Database Developer Guide.
create table category(
	catid smallint not null distkey sortkey,
	catgroup varchar(10),
	catname varchar(10),
	catdesc varchar(50));
a. Download the file tickitdb.zip, which contains individual sample data files.
b. Unzip and load the individual files to a tickit folder in your Amazon S3 bucket in your AWS
Region.
c. Edit the COPY commands in this tutorial to point to the files in your Amazon S3 bucket. For
information about how to manage files with Amazon S3, see Creating and configuring an S3
Bucket in the Amazon Simple Storage Service User Guide.
d. Provide authentication for your cluster to access Amazon S3 on your behalf to load the sample
data. You provide authentication by referencing the IAM role that you created and set as the
default for your cluster in previous steps.
The COPY commands include a placeholder for the Amazon Resource Name (ARN) for the IAM
role, your bucket name, and an AWS Region, as shown in the following example.
To load the sample data, replace <myBucket> and <aws-region> in the following COPY
commands with your values. If you are using the Amazon Redshift query editor, individually run
the following commands.
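Putting the pieces together, a COPY command for the CATEGORY table takes the following general form. The file name category_pipe.txt is assumed from the tickitdb.zip sample files; substitute your own file names if they differ.

copy category from 's3://<myBucket>/tickit/category_pipe.txt'
iam_role 'arn:aws:iam::<account-id>:role/<role-name>'
delimiter '|'
region '<aws-region>';

Run one COPY command per table, changing the table name and file name each time.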
-- Find events in the 99.9 percentile in terms of all time gross sales.
SELECT eventname, total_price
FROM (SELECT eventid, total_price, ntile(1000) over(order by total_price desc) as percentile
FROM (SELECT eventid, sum(pricepaid) total_price
FROM sales
GROUP BY eventid)) Q, event E
WHERE Q.eventid = E.eventid
AND percentile = 1
ORDER BY total_price desc;
You have successfully created an Amazon Redshift cluster and queried data from your own dataset using
the Amazon Redshift query editor.
If you're done with the cluster, delete it so that you stop accruing charges. However, you might want to
keep the sample cluster running if you intend to try tasks in other Amazon Redshift guides.
To delete a cluster
1. Sign in to the AWS Management Console and open the Amazon Redshift console at https://
console.aws.amazon.com/redshift/.
2. On the navigation menu, choose Clusters to display your list of clusters.
3. Choose the examplecluster cluster. For Actions, choose Delete. The Delete cluster page appears.
4. Confirm the cluster to be deleted, then choose Delete cluster.
On the cluster list page, the cluster status is updated as the cluster is deleted.
After you complete this tutorial, you can find more information about Amazon Redshift and next steps in
Additional resources (p. 30).
Getting started with common database tasks
Regardless of whether you chose to use the sample dataset or bring your own data to Amazon Redshift
while creating a cluster, Amazon Redshift creates the dev database. After you connect to the initial
cluster dev database, you can create a new database.
This section assumes that you have done the following:
• You have signed up for Amazon Redshift. For more information, see Prerequisites (p. 1).
• You have created an Amazon Redshift cluster. For more information, see Getting started with Amazon
Redshift clusters and data loading (p. 7).
• You have established a connection to the cluster from your SQL client tool, such as the Amazon
Redshift console query editor. For more information, see Step 3: Grant access to one of the query
editors and run queries (p. 13).
Important
The cluster that you deployed for this exercise runs in a live environment. As long as it's running,
it accrues charges to your AWS account. For pricing information, see the Amazon Redshift
pricing page.
To avoid unnecessary charges, delete your cluster when you are done with it. The final step of
the exercise explains how to do so.
For example, to create a database named SALESDB, run the following command in your SQL client tool.
For this exercise, accept the defaults. For information about more command options, see CREATE
DATABASE in the Amazon Redshift Database Developer Guide.
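Accepting the defaults, the command is a single statement:

CREATE DATABASE SALESDB;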
After you have created the SALESDB database, you can connect to the new database from your SQL
client. Use the same connection parameters as you used for your current connection, but change the
database name to SALESDB.
Use the CREATE USER command to create a new user. When you create a new user, you specify the name
of the new user and a password. A password for the user is required. It must have
8–64 characters, and it must include at least one uppercase letter, one lowercase letter, and one numeral.
For example, to create a user named GUEST with password ABCd4321, run the following command.
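That command is:

CREATE USER GUEST PASSWORD 'ABCd4321';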
Task 3: Create a schema
To connect to the SALESDB database as the GUEST user, use the same password that you specified when
you created the user, such as ABCd4321.
For information about other command options, see CREATE USER in the Amazon Redshift Database
Developer Guide.
You can create multiple schemas in the same database to organize data the way that you want or to
group your data functionally. For example, you can create a schema to store all your staging data and
another schema to store all the reporting tables. You can also create different schemas to store data
relevant to different business groups that are in the same database. Each schema can store different
database objects, such as tables, views, and user-defined functions (UDFs). In addition, you can create
schemas with the AUTHORIZATION clause. This clause gives ownership to a specified user or sets a quota
on the maximum amount of disk space that the specified schema can use.
Amazon Redshift automatically creates a schema called public for every new database. When you don't
specify the schema name while creating database objects, the objects go into the public schema.
To access an object in a schema, qualify the object by using the schema_name.table_name notation.
The qualified name consists of the schema name and the table name, separated by a dot.
For example, you might have a sales schema that has a price table and an inventory schema that
also has a price table. When you refer to the price table, you must qualify it as sales.price or
inventory.price.
The following example creates a schema named SALES for the user GUEST.
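Using the AUTHORIZATION clause described earlier, the command is:

CREATE SCHEMA SALES AUTHORIZATION GUEST;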
For information about more command options, see CREATE SCHEMA in the Amazon Redshift Database
Developer Guide.
To view the list of schemas in your database, run the following command.
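One way is to query the PG_NAMESPACE catalog table:

select * from pg_namespace;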
Task 4: Create a table
For more information on how to query catalog tables, see Querying the catalog tables in the Amazon
Redshift Database Developer Guide.
Use the GRANT statement to give permissions to users for the schemas.
The following example grants the GUEST user the privilege to select data from all tables or views in the
SALES schema using a SELECT statement.
The following example grants all available privileges at once to the GUEST user.
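In standard Amazon Redshift GRANT syntax, those two statements can be written as follows; the exact privilege set you grant is up to you.

GRANT SELECT ON ALL TABLES IN SCHEMA SALES TO GUEST;
GRANT ALL ON SCHEMA SALES TO GUEST;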
In the following example, the GUEST user logs on to the SALESDB and creates a new table.
For example, to create a table named DEMO, run the following command.
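A minimal DEMO table that matches the data inserted later in this exercise looks like the following; the varchar length is illustrative.

CREATE TABLE DEMO (
  PersonID int,
  City varchar(255));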
You can also use the schema_name.object_name notation to create the table in the SALES schema.
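For example, with the same illustrative column definitions as the previous table:

CREATE TABLE SALES.DEMO (
  PersonID int,
  City varchar(255));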
To view and inspect schemas and their tables, you can use the Amazon Redshift query editor. Or you can
see the list of tables in schemas using system views. For more information, see Task 6: Query the system
tables (p. 22).
By default, new database objects, such as tables, are created in the default schema named public
created during cluster creation. You can use another schema to create database objects. For more
information about schemas, see Managing database security in the Amazon Redshift Database Developer
Guide.
The encoding, distkey, and sortkey columns are used by Amazon Redshift for parallel processing.
For more information about designing tables that incorporate these elements, see Amazon Redshift best
practices for designing tables.
Task 5: Load sample data
For example, to insert values into the DEMO table, run the following command.
INSERT INTO DEMO VALUES (781, 'San Jose'), (990, 'Palo Alto');
To view the data that you entered in the DEMO table, run the following command.
SELECT * FROM DEMO;
personid | city
----------+-----------
781 | San Jose
990 | Palo Alto
(2 rows)
For more information about using the SELECT statement to query tables, see SELECT.
You can load the sample data to your own Amazon S3 bucket.
To load the sample data for your database, first create the tables. Then use the COPY command to load
the tables with sample data that is stored in an Amazon S3 bucket. For steps to create tables and load
sample data, see Step 4: Load data from Amazon S3 to Amazon Redshift (p. 14).
• STL tables
These system tables are generated from Amazon Redshift log files to provide a history of the system.
Logging tables have an STL prefix.
• STV tables
Task 6: Query the system tables
These tables are virtual system tables that contain snapshots of the current system data. Snapshot
tables have an STV prefix.
• System views
System views contain a subset of data found in several of the STL and STV system tables. System
views have an SVV or SVL prefix.
• System catalog tables
The system catalog tables store schema metadata, such as information about tables and columns.
System catalog tables have a PG prefix.
To retrieve system table information about a query, you might need to specify the process ID associated
with that query. For more information, see Determine the process ID of a running query (p. 25).
SHOW search_path;
search_path
---------------
$user, public
(1 row)
The following example adds the SALES schema to the search path and shows all the tables in the SALES
schema.
set search_path to '$user', 'public', 'sales';
SHOW search_path;
search_path
------------------------
"$user", public, sales
(1 row)
The following example shows a list of all tables called DEMO in all schemas on the current database.
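Assuming the relevant schemas are on your search path, you can query the PG_TABLE_DEF system catalog view:

select * from pg_table_def
where tablename = 'demo';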
You can also use the Amazon Redshift query editor to view all the tables in a specified schema by first
choosing a database that you want to connect to.
View users
You can query the PG_USER catalog to view a list of all users, along with the user ID (USESYSID) and user
privileges.
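For example:

select * from pg_user;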
The user name rdsdb is used internally by Amazon Redshift to perform routine administrative and
maintenance tasks. You can filter your query to show only user-defined user names by adding where
usesysid > 1 to your SELECT statement.
The SVL_QLOG view is a friendlier subset of information from the STL_QUERY table. You can use this
view to find the query ID (QUERY) or process ID (PID) for a recently run query. You can also use this view
to check how long it took a query to complete. SVL_QLOG includes the first 60 characters of the query
string (SUBSTRING) to help you locate a specific query. Use the LIMIT clause with your SELECT statement
to limit the results to five rows.
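A query such as the following returns the five most recently run queries:

select query, pid, elapsed, substring
from svl_qlog
order by starttime desc
limit 5;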
Task 7: Cancel a query
To determine the PID of a running query, run the following SELECT statement.
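The basic form of that statement is:

select pid, user_name, starttime, query
from stv_recents
where status = 'Running';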
Note
This is a complex query. For this tutorial, you don't need to worry about how this query is
constructed.
The result set includes all of the rows in the SALES table multiplied by all the rows in the USERS table
(49989*3766). This is called a Cartesian join, and it isn't recommended. The result is over 188 million
rows and takes a long time to run.
To cancel a running query, use the CANCEL command with the query's PID.
To find the process ID, start a new session and query the STV_RECENTS table, as shown in the previous
step. The following example shows how you can make the results more readable. To do this, use the
TRIM function to trim trailing spaces and show only the first 20 characters of the query string.
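For example:

select pid, trim(user_name), starttime, substring(query, 1, 20)
from stv_recents
where status = 'Running';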
To cancel the query with PID 610, run the following command.
CANCEL 610;
Note
The CANCEL command doesn't stop a transaction. To stop or roll back a transaction, use the
ABORT or ROLLBACK command. To cancel a query associated with a transaction, first cancel the
query then stop the transaction.
If the query that you canceled is associated with a transaction, use the ABORT or ROLLBACK command to
cancel the transaction and discard any changes made to the data:
ABORT;
Unless you are signed on as a superuser, you can cancel only your own queries. A superuser can cancel all
queries.
Task 8: Clean up your resources
By using workload management, you can run queries in different query queues so that you don't need
to wait for another query to complete. The workload manager creates a separate queue, called the
Superuser queue, that you can use for troubleshooting. To use the Superuser queue, log on as a superuser
and set the query group to 'superuser' using the SET command. After running your commands, reset the
query group using the RESET command.
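In practice, that sequence looks like the following, using the example PID from the previous task:

set query_group to 'superuser';
cancel 610;
reset query_group;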
To delete the cluster, follow the steps in Deleting a cluster in the Amazon Redshift Cluster Management
Guide.
If you want to keep the cluster, keep the sample data for reference. Most of the examples in this guide
use the tables that you create in this exercise. The size of the data won't have any significant effect on
your available storage.
If you want to keep the cluster, but want to clean up the sample data, run the following command to
drop the SALESDB database.
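That command is:

DROP DATABASE SALESDB;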
If you didn't create a SALESDB database, or if you don't want to drop the database, run the following
commands to drop just the tables.
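For example, to drop the tables created in this exercise (the table names assume you followed the earlier tasks):

DROP TABLE DEMO;
DROP TABLE SALES.DEMO;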
Getting started querying data lakes
Topics
• Getting started querying your data lake (p. 28)
• Getting started querying data on remote data sources (p. 28)
• Getting started accessing data in other Amazon Redshift clusters (p. 29)
• Getting started training machine learning models with Amazon Redshift data (p. 29)
After your data is registered with an AWS Glue Data Catalog and enabled with AWS Lake Formation, you
can query it by using Redshift Spectrum.
Redshift Spectrum resides on dedicated Amazon Redshift servers that are independent of your cluster.
Redshift Spectrum pushes many compute-intensive tasks, such as predicate filtering and aggregation, to
the Redshift Spectrum layer. Redshift Spectrum also scales intelligently to take advantage of massively
parallel processing.
You can partition the external tables on one or more columns to optimize query performance through
partition elimination. You can query and join the external tables with Amazon Redshift tables. You can
access external tables from multiple Amazon Redshift clusters and query the Amazon S3 data from
any cluster in the same AWS Region. When you update Amazon S3 data files, the data is immediately
available for queries from any of your Amazon Redshift clusters.
For more information about Redshift Spectrum, including how to work with Redshift Spectrum and data
lakes, see Getting started with Amazon Redshift Spectrum in the Amazon Redshift Database Developer Guide.
Getting started accessing data in other clusters
With federated queries, you can query operational data directly (without moving it), apply
transformations, and insert data into your Redshift tables. Some of the computation for federated
queries is distributed to the remote data sources.
To run federated queries, Amazon Redshift first makes a connection to the remote data source. Amazon
Redshift then retrieves metadata about the tables in the remote data source, issues queries, and then
retrieves the result rows. Amazon Redshift then distributes the result rows to Amazon Redshift compute
nodes for further processing.
For information about setting up your environment for federated queries, see one of the following topics
in the Amazon Redshift Database Developer Guide:
Amazon Redshift data sharing is especially useful for these use cases:
• Centralizing business-critical workloads – Use a central extract, transform, and load (ETL) cluster that
shares data with multiple business intelligence (BI) or analytic clusters. This approach provides read
workload isolation and chargeback for individual workloads.
• Sharing data between environments – Share data among development, test, and production
environments. You can improve team agility by sharing data at different levels of granularity.
For more information about data sharing, see Getting started data sharing in the Amazon Redshift
Database Developer Guide.
Amazon Redshift ML makes it easier for SQL users to create, train, and deploy machine learning models
using familiar SQL commands. By using Amazon Redshift ML, you can use your data in Amazon Redshift
clusters to train models with Amazon SageMaker. You can then localize the models and make predictions
within an Amazon Redshift database.
For more information about Amazon Redshift ML, see Getting started with Amazon Redshift ML in the
Amazon Redshift Database Developer Guide.
Additional resources
When you have completed these tutorials, we recommend that you continue to learn more about the
concepts introduced in this guide by using the following Amazon Redshift resources:
• Amazon Redshift Cluster Management Guide: This guide builds upon this Amazon Redshift Getting
Started Guide. It provides in-depth information about the concepts and tasks for creating, managing,
and monitoring clusters.
• Amazon Redshift Database Developer Guide: This guide also builds upon this Amazon Redshift Getting
Started Guide. It provides in-depth information for database developers about designing, building,
querying, and maintaining the databases that make up your data warehouse.
• SQL reference: This topic describes SQL commands and function references for Amazon Redshift.
• System tables and views: This topic describes system tables and views for Amazon Redshift.
• Tutorials for Amazon Redshift: This topic shows tutorials to learn about Amazon Redshift features.
• Using spatial SQL functions with Amazon Redshift: This tutorial demonstrates how to use some of
the spatial SQL functions with Amazon Redshift.
• Loading data from Amazon S3: This tutorial describes how to load data into your Amazon Redshift
database tables from data files in an Amazon S3 bucket.
• Querying nested data with Amazon Redshift Spectrum: This tutorial describes how to use Redshift
Spectrum to query nested data in Parquet, ORC, JSON, and Ion file formats using external tables.
• Configuring manual workload management (WLM) queues: This tutorial describes how to configure
manual workload management (WLM) in Amazon Redshift.
• Feature videos: These videos help you learn about Amazon Redshift features.
• To learn how to get started with Amazon Redshift, watch the following video: Getting started with
Amazon Redshift.
• To learn how Amazon Redshift data sharing works, watch the following video: Amazon Redshift data
sharing workflow.
• To learn how Amazon Redshift Machine Learning (ML) works, watch the following video: Amazon
Redshift ML.
• To learn how to monitor, isolate, and optimize your queries using the query monitoring features on
the Amazon Redshift console, watch the following video: Query Monitoring with Amazon Redshift.
• What's new: This webpage lists Amazon Redshift new features and product updates.
Document history
The following list describes the important changes for the Amazon Redshift Getting Started Guide.
• June 30, 2021 (documentation update): Updated the guide to include new sections about getting started with common database tasks, querying your data lake, querying data on remote sources, sharing data, and training machine learning models with Amazon Redshift data.
• June 4, 2021 (new feature): Updated the guide to describe the new sample load procedure.
• August 14, 2020 (documentation update): Updated the guide to remove the original Amazon Redshift console and improve step flow.
• November 11, 2019 (new console): Updated the guide to describe the new Amazon Redshift console.
• August 10, 2018 (new feature): Updated the guide to describe the quick-launch cluster procedure.
• July 28, 2015 (new feature): Updated the guide to launch clusters from the Amazon Redshift dashboard.
• June 9, 2015 (new feature): Updated the guide to use new node type names.
• April 30, 2015 (documentation update): Updated screenshots and procedure for configuring VPC security groups.
• November 12, 2014 (documentation update): Updated screenshots and procedures to match the current console.
• May 13, 2014 (documentation update): Moved loading data from Amazon S3 information into its own section and moved the next steps section into the final step for better discoverability.
• March 14, 2014 (documentation update): Removed the Welcome page and incorporated the content into the main Getting Started page.
• March 14, 2014 (documentation update): A new release of the Amazon Redshift Getting Started Guide that addresses customer feedback and service updates.
• February 14, 2013 (new guide): The first release of the Amazon Redshift Getting Started Guide.