Databases
Storing data in a systematic way
Traditional System
Earlier, many tasks were handled using a traditional paper-based approach.
For example:
● Sending Messages (postal cards)
● Tickets Booking
● Government Application Forms
knowledge portal
Newer Approaches
Instead of a paper-based approach, the data is now stored in computer systems.
A database is an organized collection of data, generally stored and accessed electronically from a
computer system.
Name Mobile Number
Mr A 9XXXXX
Mr B 022XXXX
Database Types
There are various types of databases; let’s explore a few of them.
1. Flat File
A flat-file database is a database that stores data in a plain text file.
Example: a CSV file (a single Excel sheet is often used the same way)
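As a minimal sketch of the flat-file idea, the snippet below writes the Name / Mobile Number records to CSV text and reads them back. The CSV layout is an assumption made for illustration, and `io.StringIO` stands in for a file on disk.

```python
import csv
import io

# Hypothetical contact records, mirroring the Name / Mobile Number example.
contacts = [
    {"Name": "Mr A", "Mobile Number": "9XXXXX"},
    {"Name": "Mr B", "Mobile Number": "022XXXX"},
]

# Write the records as CSV text. io.StringIO stands in for a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Name", "Mobile Number"])
writer.writeheader()
writer.writerows(contacts)

# Read the flat file back: a lookup has to scan the rows from the top.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
numbers_for_mr_b = [r["Mobile Number"] for r in rows if r["Name"] == "Mr B"]
print(numbers_for_mr_b)
```

Because there is no index or query engine, every lookup is a scan of the whole file, which is one reason flat files suit only small data sets.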
Relational Databases
2. Relational Database
Data is organized into tables of columns and rows representing a specific entity type.
Generally uses SQL (Structured Query Language) to manage databases.
Example: MySQL
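A small sketch of the table-and-SQL model follows. It uses SQLite, which ships with Python, as a stand-in for MySQL; the `contacts` table and its column names are assumptions made for illustration.

```python
import sqlite3

# SQLite (bundled with Python) stands in for MySQL here; the basic SQL is similar.
conn = sqlite3.connect(":memory:")

# A table represents one entity type; columns are attributes, rows are records.
conn.execute("CREATE TABLE contacts (name TEXT, mobile_number TEXT)")
conn.executemany(
    "INSERT INTO contacts (name, mobile_number) VALUES (?, ?)",
    [("Mr A", "9XXXXX"), ("Mr B", "022XXXX")],
)

# SQL is used to query the data instead of scanning a file by hand.
rows = conn.execute(
    "SELECT mobile_number FROM contacts WHERE name = ?", ("Mr B",)
).fetchall()
print(rows)
conn.close()
```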
NoSQL Databases
3. NoSQL Databases
NoSQL databases store and manage data in a way that allows very high speeds and great operational
flexibility.
Example: MongoDB, DynamoDB
Installing & Managing Databases
Depending on its needs, an organization can either manage a database on its own
servers or use a managed database offering like RDS.
1. Manage Database on Own Servers:
● Provisioning Database.
● Host Security (Patching, Hardening and others)
● Configure Replicas, High-Availability, Upgrading, Monitoring and others
Install Database
Installing & Managing Databases
2. Managed Database Offering
● Provisioning Database via simple GUI steps.
● Host Security (Patching, Hardening and others) taken care of by the provider.
● Configure Replicas and High-Availability with a single click.
Introduction to RDS
Managed Database Service
Challenges with EC2 DB
Maintaining a database on an EC2 instance leads to multiple challenges, including:
1. Provisioning Database.
2. Host Security (Patching, Hardening and others)
3. Configure Replicas, High-Availability, Upgrading and others
Most organizations simply offload these tasks to 3rd party vendors or hire database administrators.
Intro to RDS
Amazon RDS is a managed service that makes it easy to set up, operate, and scale a relational
database in the cloud.
AWS manages the underlying hardware, OS, security and software patching, automated failure
detection and recovery for you.
With a few clicks, we can:
● Provision / Resize hardware on-demand.
● Multi-AZ deployments.
● Create read replicas.
RDS Read Replicas
Use Case : Bank
In a bank, you approach different people for different kinds of work.
For example:
- Cash Collector
- Cheque Counter
- Enquiry Counter
Database Way
Using a single database for all kinds of activity increases the database load
and slows down operations.
[Figure: clients call connect() against a single database]
Improved Architecture - Read Replica
A read replica allows customers to offload read requests or analytics traffic from
the primary instance.
[Figure: all WRITE operations go to the Primary DB; replication feeds the Read Replica, which serves all READ operations]
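The read/write split can be sketched in application code as below. This is a minimal illustration, not an RDS API: the hostnames are hypothetical, and classifying a statement as a read by checking for a leading `SELECT` is a simplifying assumption. Replication to a read replica is asynchronous, so a read sent to a replica may briefly lag behind the primary.

```python
from itertools import cycle

# Hypothetical hostnames; real endpoints come from the AWS console or API.
PRIMARY = "primary-db.example.internal"
READ_REPLICAS = ["replica-1.example.internal", "replica-2.example.internal"]

# Rotate through the replicas so reads are spread across them.
_replica_cycle = cycle(READ_REPLICAS)


def choose_host(sql: str) -> str:
    """Send SELECT statements to a replica and everything else to the primary."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return next(_replica_cycle) if is_read else PRIMARY


print(choose_host("INSERT INTO accounts VALUES (1, 100)"))
print(choose_host("SELECT * FROM accounts"))
print(choose_host("SELECT balance FROM accounts WHERE id = 1"))
```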
RDS Read Replica
The RDS Read Replica feature allows customers to implement “Database Read
Replica” functionality for RDS databases.
[Figure: Primary Database replicating to a Read Replica]
Pointers to Note - 1
You can create one or more replicas of a given source DB Instance and serve
high-volume application read traffic.
[Figure: Primary Database replicating to Read Replica 1 and Read Replica 2]
Pointers to Note - 2
With Amazon RDS, you can create a read replica in a different AWS Region
from the source DB instance.
[Figure: replication between the ap-south-1 and us-east-1 Regions]
Amazon RDS Multi AZ Deployments
Understanding the Challenge
If a database runs in a single Availability Zone and that AZ is down or
unreachable, your entire application can be impacted.
[Figure: clients call connect() against a database in Availability Zone 1]
Multi-AZ Architecture
In this approach, Amazon RDS creates a standby DB instance in a different
Availability Zone and synchronously replicates data to it from the primary DB instance.
[Figure: clients connect to the Primary in Availability Zone 1, which replicates to the Standby in Availability Zone 2]
Failover Condition
If a planned or unplanned outage of your DB instance results from an
infrastructure defect, Amazon RDS automatically switches to a standby replica
in another Availability Zone if you have turned on Multi-AZ.
● Loss of availability in primary Availability Zone
● Loss of network connectivity to primary
● Compute unit failure on primary
● Storage failure on primary
Failover times are typically 60–120 seconds.
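Because a failover takes some time, applications generally need to tolerate a short period of failed connections. The sketch below shows one common approach, retrying with a growing delay. The `connect_with_retry` helper, the attempt count, and the delay are hypothetical and are not tuned to the failover window; the failing database is simulated.

```python
import time


def connect_with_retry(connect, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds * attempt)  # wait a little longer each time
    raise last_error


# Simulated database that is unreachable for the first two attempts,
# standing in for the window while the standby takes over.
calls = {"count": 0}


def fake_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("primary unavailable")
    return "connected"


# The sleep function is replaced so the demo does not actually wait.
result = connect_with_retry(fake_connect, sleep=lambda seconds: None)
print(result, calls["count"])
```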
Amazon Aurora
Closed Source Database
Overview of Database Offerings
Databases are generally divided into two types:
● Open Source Databases
● Commercial Databases
Commercial offerings come with various capabilities that are not found in open
source databases.
Open Source Databases Commercial Databases
Introducing Aurora
Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for
the cloud, that combines the performance and availability of traditional enterprise
databases with the simplicity and cost-effectiveness of open source databases.
Amazon Aurora is up to five times faster than standard MySQL databases and three
times faster than standard PostgreSQL databases.
It provides the security, availability, and reliability of commercial databases at 1/10th the
cost.
[Figure: Performance / Availability of Enterprise Databases + Simplicity and Cost Effectiveness of Open Source Databases = Amazon Aurora]
RDS - Multi-AZ & Read Replica Architecture
In a typical setup, the primary, standby, and read replica are three different instances
spread across multiple Availability Zones. The underlying storage is an EBS volume.
[Figure: Primary Instance (AZ 1) with synchronous replication to the Standby Instance (AZ 2) and asynchronous replication to the Read Replica (AZ 3)]
RDS with EBS Storage
RDS with MySQL
EBS Volume
Aurora Architecture
Two Primary Components: DB Instances + Storage Cluster Volume
Since the Aurora DB instances and the storage layer are independent, we can scale the storage easily.
Availability Zone
Availability Zone
Availability Zone
Write Write
Overview of Storage Volume
Availability Zone 1 Availability Zone 2 Availability Zone 3
Cluster Storage Volume
Scalability Aspect in Aurora
With this architecture, you can add a DB instance quickly because Aurora doesn't make
a new copy of the table data. Instead, the DB instance connects to the shared volume
that already contains all your data.
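The idea of attaching to a shared volume instead of copying data can be pictured with a toy model. This is only a conceptual sketch, not how Aurora is implemented: the `ClusterVolume` and `DBInstance` classes are hypothetical stand-ins for the storage layer and the compute instances.

```python
class ClusterVolume:
    """Toy stand-in for the shared cluster storage volume."""

    def __init__(self, rows):
        self.rows = rows


class DBInstance:
    """Toy compute node that reads the shared volume instead of a private copy."""

    def __init__(self, name, volume):
        self.name = name
        self.volume = volume  # a reference to the shared volume, not a copy

    def count_rows(self):
        return len(self.volume.rows)


volume = ClusterVolume(rows=[{"id": 1}, {"id": 2}])
primary = DBInstance("primary", volume)
replica = DBInstance("replica-1", volume)  # added later; no table data is copied

volume.rows.append({"id": 3})  # a write lands in the shared volume
print(replica.count_rows())
print(replica.volume is primary.volume)
```

In this model, adding an instance costs the same regardless of how much data the volume holds, which is the property the text describes.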
Availability Zone
Availability Zone
Availability Zone
Write Write
Scale at a Faster Pace
Availability Zone 1 Availability Zone 2 Availability Zone 3
Cluster Storage Volume
Aurora Architecture
Two Primary Components: DB Instances + Storage Cluster Volume
Since the Aurora DB instances and the storage layer are independent, we can scale the storage easily.
Write Write Write
Read Read Read
Aurora Read Replica Primary Instance Aurora Read Replica
Aurora Architecture
Aurora Endpoints
You can connect to an Aurora cluster through endpoints.
An endpoint is an Aurora-specific URL consisting of a host address and a port.
There are three primary types of endpoints available:
● Cluster Level Endpoints
● Reader Level Endpoints
● Instance Level Endpoints
Application
Cluster Endpoint Reader Endpoint
Aurora Endpoints
Endpoint Type: Description
● Cluster Level Endpoints: Connect to the current primary DB instance in the cluster. Used for performing write operations.
● Reader Level Endpoints: Built-in endpoints for read replicas. With multiple read replicas, this endpoint balances load among all of them.
● Instance Endpoints: Allow a connection directly to a specific instance.
● Custom Endpoints: Provide the ability to create custom endpoints for our own requirements.
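An application might choose between the cluster and reader endpoints along these lines. This is a minimal sketch: the endpoint hostnames, the port, and the `endpoint_for` helper are all hypothetical, and real endpoint values are shown for each cluster in the AWS console.

```python
# Hypothetical endpoint URLs in "host:port" form, used only for illustration.
ENDPOINTS = {
    "cluster": "my-cluster-writer.example.com:3306",
    "reader": "my-cluster-reader.example.com:3306",
}


def endpoint_for(operation: str) -> tuple[str, int]:
    """Return (host, port): reader endpoint for reads, cluster endpoint otherwise."""
    kind = "reader" if operation == "read" else "cluster"
    host, port = ENDPOINTS[kind].rsplit(":", 1)
    return host, int(port)


print(endpoint_for("write"))
print(endpoint_for("read"))
```

Sending anything that is not explicitly a read to the cluster endpoint is a deliberately conservative default in this sketch, since only the primary instance accepts writes.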
Aurora Features
Aurora provides a wide variety of interesting features. Some of these include:
● Global Databases
● Serverless
● Cross Region Replication
● Auto-Scaling
● Backtrack
● IAM DB Authentication
● Sharing DB Clusters
● RDS Proxy
Aurora Architecture
Two Primary Components: DB Instances + Cluster Volume
Since the Aurora DB instances and the storage layer are independent, we can scale the storage easily.
Primary Instance Aurora Read Replica Aurora Read Replica
Amazon Neptune
Monitor Everything
Different Types of Database Technologies
There are multiple different types of database technologies available and each has its own set of
benefits.
Some of the popular ones include:
1. Relational Database [MySQL]
2. NoSQL Databases [key/value] [DynamoDB]
3. Graph Databases [Neptune]
Revising Relational Databases
Data is organized into tables of columns and rows representing a specific entity type.
Generally uses SQL (Structured Query Language) to manage databases.
Example: MySQL
Graph Database
A graph database stores nodes and relationships instead of tables.
Whenever connections or relationships between entities are at the core of the data that you're
trying to model, a graph database is your natural choice.
You can easily find out who the "friends of friends" of a particular person are—for example, the
friends of Howard's friends.
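The "friends of friends" question can be sketched in plain Python with nodes and relationships held in an adjacency mapping. The names other than Howard and the `friends_of_friends` helper are made up for illustration; a graph database would answer the same question with its own query language.

```python
# Hypothetical friendship graph: each person maps to their direct friends.
friends = {
    "Howard": {"Asha", "Ben"},
    "Asha": {"Howard", "Chen"},
    "Ben": {"Howard", "Chen", "Dina"},
    "Chen": {"Asha", "Ben"},
    "Dina": {"Ben"},
}


def friends_of_friends(graph, person):
    """People two hops away, excluding the person and their direct friends."""
    direct = graph.get(person, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    return result - direct - {person}


print(sorted(friends_of_friends(friends, "Howard")))
```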
Amazon Neptune
Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to
build and run applications that work with highly connected datasets.
Graph Visualization
In many cases the Neptune workbench can create a visual diagram of your query results.
NoSQL Databases
Different Type of Database
Importance of Schema Free Structure
In a traditional schema-based database like MySQL, you must define the structure of the
records before you start adding data.
Example: User Data Information
userID Name Age Interests AWS A/C Number
Schema Free Database
In schema-free databases like MongoDB (NoSQL), you can simply add records without defining
a structure first.
We can easily group records that do not share the same structure.
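As a rough illustration, the records below sit in one collection even though each has a different set of fields. Plain Python dictionaries stand in for documents here; the field names follow the user data example and the values are made up.

```python
# Hypothetical records with differing fields, kept in one collection.
records = [
    {"userID": "01", "Name": "User One", "Age": 26, "Interests": "Astronomy"},
    {"userID": "02", "Name": "User Two", "Interests": "Teaching"},
    {"userID": "03", "Name": "User Three", "AWS A/C Number": "XXXX"},
]

# Queries have to allow for fields that some records do not have.
names_with_age = [r["Name"] for r in records if "Age" in r]
all_fields = sorted({field for r in records for field in r})

print(names_with_age)
print(all_fields)
```

The flexibility comes with a trade-off: since nothing enforces a structure, the reading code has to check which fields are present.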
Basics of NoSQL Database
NoSQL databases ("not only SQL") are non-tabular databases and store data differently than
relational tables.
NoSQL databases have gained huge popularity because they are simpler to use, flexible, and can
achieve performance that is very difficult to reach with traditional relational databases.
userID Name Age Interests
01 Harsh 26 Astronomy
02 zeal 26 Teaching
Advantages of NoSQL Database
NoSQL databases offer several advantages over standard relational databases:
● Schema Free
● Horizontal Scaling
● Easy Replication
● Can manage huge amounts of data.
Basics of DynamoDB
Storing data the NoSQL Way
Introduction
DynamoDB is a fully managed NoSQL database service by AWS.
Being a managed service, it simplifies many operations for users, such as hardware provisioning,
setup and configuration, patching, replication, and clustering.
DynamoDB
ElastiCache
Let’s Cache
Simple Analogy
● There is a new vegetable shop in the locality which has become very popular.
● Every day 300-500 people visit and buy vegetables.
● Each visitor asks the price of at least two or three veggies before making a purchase.
Imagine the condition of the employee inside that shop after a few days.
Simple Analogy - Smart Approach
The vegetable shop owner decided to create a dashboard listing the prices of the common
vegetables that buyers ask about.
Price List
Veggie A - $1
Veggie B - $2
Veggie C - $C
Simple Analogy - Learning
1. Since the price list of common items was listed, users no longer need to ask the employees
about it. This reduces the overall load on the employee.
2. Visitors can quickly go through the price list, which improves efficiency.
Challenges with Database Workloads
There can be certain common queries within the database that hundreds of users might
request.
This would increase the load on the database and can lead to performance degradation.
SELECT * from user_data;
Caching Solutions
With caching solutions, you can cache the response associated with frequent queries.
This allows better response time and decreases the load on the database servers.
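Caching the response to a frequent query can be sketched as follows. This is a minimal cache-aside illustration with an in-process dictionary standing in for a cache server; `run_query_on_database`, `cached_query`, the returned rows, and the 60-second expiry are all assumptions made for the example.

```python
import time

cache = {}                # query text -> (expiry time, rows)
db_calls = {"count": 0}   # how many times the database was actually hit


def run_query_on_database(sql):
    """Stand-in for a real (slow) database call; returns made-up rows."""
    db_calls["count"] += 1
    return [("01", "User One"), ("02", "User Two")]


def cached_query(sql, ttl_seconds=60):
    """Serve from the cache when possible; otherwise query and store the result."""
    now = time.monotonic()
    entry = cache.get(sql)
    if entry is not None and entry[0] > now:
        return entry[1]                      # cache hit
    rows = run_query_on_database(sql)        # cache miss
    cache[sql] = (now + ttl_seconds, rows)
    return rows


# A hundred identical requests reach the database only once.
for _ in range(100):
    cached_query("SELECT * from user_data;")
print(db_calls["count"])
```

The expiry time bounds how long a stale answer can be served after the underlying data changes; choosing it is a trade-off between freshness and database load.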
Popular Caching Solutions
Two of the most popular caching solutions used for databases are:
1. Memcached
2. Redis
To use them, you will have to install, configure, optimize and secure the EC2 instances where
these engines would be running.
Introducing AWS ElastiCache
ElastiCache is a fully managed AWS service that makes it easier to deploy, operate and scale an
in-memory data-store or cache in the cloud.
Because it is a managed service, we can add an in-memory layer to our infrastructure within a few clicks.
ElastiCache can also detect and replace failed nodes, thus reducing the operational overhead.
Local Zones
Let’s Extend Regions
Understanding the Challenge
Each AWS Region has multiple, isolated locations known as Availability Zones.
It takes time to build a new AWS Region.
Users can see a certain amount of latency depending on how far they are from the Region.
Introducing Local Zones
A Local Zone is an extension of an AWS Region in geographic proximity to your users.
You should use AWS Local Zones to deploy workloads closer to your end-users for low-latency
requirements.
Important Pointers
Only core AWS services are supported; not all AWS services are available in a Local Zone.
Instance type support is limited.
To get started, you first need to enable the AWS Local Zones for your AWS account before you
can deploy resources to them.