GCP Storage Compute
Overview
Distributed frameworks such as Hadoop/MapReduce help process very large datasets across clusters of machines
Required Context
- Hive, HBase, Pig
Monolithic vs. Distributed
- Distributed systems use lots of cheap hardware (“nodes”)
A single coordinating software layer must:
- Partition data
- Co-ordinate computing tasks
- Handle fault tolerance and recovery
- Allocate capacity to processes
[Figure: raw data blocks partitioned across many nodes under a single coordinating software layer]
To solve distributed storage: Google File System
To solve distributed computing: MapReduce
Hadoop = HDFS + YARN + MapReduce
- HDFS: distributed storage
- YARN: resource management
- MapReduce: user defines map and reduce tasks using the MapReduce API
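The map and reduce tasks a user writes can be sketched in plain Python. This is a toy, single-process illustration of the programming model only, not the Hadoop API:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # Map: emit (key, value) pairs - here (word, 1) for every word
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle: group pairs by key (Hadoop does this between map and reduce)
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        # Reduce: combine all values emitted for one key
        yield (word, sum(v for _, v in group))

docs = ["big data on gcp", "gcp storage and gcp compute"]
counts = dict(reduce_phase(map_phase(docs)))
print(counts["gcp"])  # 3
```

In a real cluster the map calls run in parallel on the nodes holding each data block, and the framework handles the shuffle and fault tolerance.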
Co-ordination Between Hadoop Blocks
- A job is triggered on the cluster (MapReduce)
- YARN figures out where and how to run the job, and allocates the required resources
- HDFS holds the job’s input and output data
Distributed Computing Infrastructure - Operational Implications
- On-premise: you own the machines, you deploy the software, you handle scaling
- Colocation: you still deploy the software and handle scaling
- Either way, over-provisioning risks ending up with a white elephant data centre
- Cloud Services: little chance of a white elephant data center
GCP Overview
Google Cloud Platform = Resources + Billing

Resources
Location Hierarchy
- “Regions”: e.g. Central US, Western Europe, and East Asia
- “Zones”: basically, data centres within regions
- Within a region, locations usually have network latencies of less than 5 ms
- A zone is a “single failure domain” within a region
Resources
- Regional resources: AppEngine instances
- Zonal resources: VM (Compute Engine) instances, disks
Resources
- Hardware: computers, hard disks
- Software: virtual machines (VMs), software services
- Offerings span compute choices, storage technologies, big data, machine learning…
Recall
- Big Data
- Storage Technologies: Cloud Storage, Cloud SQL, BigTable, Datastore
- Machine Learning: concepts, TensorFlow, Cloud ML
- Compute choices: AppEngine, Compute Engine, Containers
Minor Topics
- Logging and monitoring: Stackdriver
- Security, networking: API keys, load balancing…
Consumption Mechanisms
- GCP Console
- Command-line Interface
- Client Libraries
Billing
- Project ~ Namespace
- A project has a name, ID, and number
Getting Feet Wet
Summary
- Cloud services have important advantages over on-premise or colocated data centers
- GCP, Google’s Cloud Platform, offers a suite of storage and compute solutions
Overview
Google AppEngine is the PaaS option -
serverless and ops-free
- “Google Cloud Storage”: just buy storage
- “Google Compute Engine”: get VMs, manage yourself; need HTTPS serving, release management etc
- “Google App Engine”: just focus on code, forget the rest
- “Google Container Engine”: create containers, manage clusters
Hosting a Website - Static, No SSL
- Plain HTML files
- Hosting on Cloud Storage:
- store on GitHub, then use a WebHook to run an update script
- or use a CI/CD tool like Jenkins, with the Cloud Storage plug-in for a post-build step
Hosting a Website - Load balancing, scaling; currently on VMs or servers
- “Google Compute Engine” (cf. Heroku, Engine Yard): get VMs, manage yourself
- Need HTTPS serving, release management etc
Hosting a Website - SSL so that HTTPS serving is possible
- “Firebase Hosting + Storage”
Hosting with Compute Engine
- Auto-scaled
- DevOps: a laundry list
- See: “Automated Image Builds with Jenkins, Packer, and Kubernetes”; “Distributed Load Testing with Kubernetes”
Hosting a Website - Microservices
- Lots of dependencies
- Deployment is becoming painful
[Figures: Container, Container Cluster, Kubernetes Master with Pods, Containers vs. VMs (www.docker.com)]
Hosting with Container Engine
- DevOps need largely mitigated
- Can use Jenkins for CI/CD
- StackDriver for logging and monitoring
Hosting a Website - Microservices
- You run separate web server, database
- Lots of dependencies; deployment is becoming painful
- Use Container Engine for a rendering microservice that uses Compute Engine VMs running Windows to do the actual frame rendering.
- Use App Engine for your web front end, Cloud SQL as your database, and Container Engine for your big data processing.
App Engine Environments

Standard
- Pre-configured with: Java 7, Python 2.7, Go, PHP
- Serverless!
- Instance classes determine price, billing
- Laundry list of services - pay for what you use
- Based on container instances running on Google's infrastructure

Flexible
- More choices: Java 8, Python 3.x, .NET
- Allows you to customize your runtime and even the operating system of your virtual machine, using Dockerfiles
- Under the hood, merely instances of Google Compute Engine VMs
Cloud Functions
- Serverless execution environment for building and connecting cloud services
Compute Engine
- Machine types: High-memory, High-CPU
- Built-in redundancy

Local SSD
- The data that you store on a local SSD persists only until you stop or delete the instance
- Small: each local SSD is 375 GB in size, but you can attach up to eight local SSD devices for 3 TB of total local SSD storage space per instance
- Very high IOPS and low latency

Buckets
- Use when you must share data easily between multiple instances or zones
- Flexible, scalable, durable
Hosting a Website - the options compared
- “Google Cloud Storage”: just buy storage
- “Google Compute Engine”: get VMs, manage yourself (cf. Heroku, Engine Yard)
- “Google App Engine”: lots of code, languages; just focus on code, forget the rest
- “Firebase Hosting + Storage”: static sites with SSL
- “Google Container Engine”: for lots of dependencies, when deployment is becoming painful
Container
- A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings (www.docker.com)
[Figure: Containers vs. VMs (www.docker.com)]
Advantages
- Portability
- Rapid deployment
- Orchestration: Kubernetes clusters
Kubernetes
[Figure: a Container Cluster - a Kubernetes Master managing Pods]
Container Cluster
- Group of Compute Engine instances running Kubernetes.
- It consists of one or more node instances, and a managed Kubernetes master endpoint.

Node Instances
- Managed from the master
- Each node runs the Docker runtime and hosts a Kubelet agent, which manages the Docker containers scheduled on the host

Master Endpoint
- The managed master also runs the Kubernetes API server, which
- services REST requests
- schedules pod creation and deletion on worker nodes
- synchronizes pod information (such as open ports and location)

Node Pool
- Subset of machines within a cluster that all have the same configuration.

Container Builder
- Working: import source code from a variety of repositories or cloud storage spaces, execute a build to your specifications, and produce artifacts such as Docker containers or Java archives.

Container Registry
- Private registry for Docker images
Autoscaling
- Automatic resizing of clusters with Cluster Autoscaler
[Figure: Kubernetes Master scaling the number of Pods up as load grows]
Summary
Google AppEngine is the PaaS option -
serverless and ops-free
SQL interface atop file data: Hive (SQL-like, but MapReduce on HDFS)
Storage options overview
- Cloud Storage: choose by location and storage class (multi-regional for frequent access from anywhere in the world)
- BigQuery: SQL-like abstraction for non-relational data; superficially similar in use-case to Hive (recall that OLTP needs strict write consistency, OLAP does not)
- Cloud SQL, Cloud Spanner: Cloud Spanner is Google proprietary, more advanced than Cloud SQL; offers “horizontal scaling” - i.e. bigger data, more instances, replication etc; under the hood, Cloud Spanner has a surprising design - more later
- DataStore: query time depends only on result size - so returning 10 rows will take the same length of time whether the dataset is 10 rows or 10 billion rows
- Bigtable: similar to HBase
Cloud Storage Use-Cases
- Choose by Location and Storage Class

Multi-regional: frequent access from anywhere in the world
- Frequently accessed ("hot" objects), such as serving website content, interactive workloads, or mobile and gaming applications.
- Geo-redundant: Cloud Storage stores your data redundantly in at least two regions separated by at least 100 miles within the multi-regional location of the bucket.
Bucket Storage Classes

Coldline
- Infrequently accessed data, such as data stored for legal or regulatory reasons
- Unlike other "cold" storage services, same throughput and latency (i.e. not slower to access)
- 90-day minimum storage duration, costs for data access, and higher per-operation costs
- XML and JSON APIs
- Client SDK
- Cloud Storage considers bucket names that contain dots to be domain names
Relational Data

Students:
StudentID  Student Name
1          Jane Doe
2          John Walsh

Grades:
StudentID  CourseID  Term         Grade
2          CS294     Spring 2016  B+

Courses:
CourseID  Course Name
Cloud SQL
- MySQL - fast and the usual
- Not serverless

High Availability Configuration
- The failover replica must be in a different zone than the original instance, also called the master
- All changes made to the data on the master, including to user tables, are replicated to the failover replica using semisynchronous replication.
Cloud SQL Proxy
- Provides secure access to your Cloud SQL Second Generation instances without having to whitelist IP addresses or configure SSL.
- Secure connections: the proxy automatically encrypts traffic to and from the database; SSL certificates are used to verify client and server identities.
- The Cloud SQL Proxy works by having a local client, called the proxy, running in the local environment.
- Your application communicates with the proxy with the standard database protocol used by your database.
- The proxy uses a secure tunnel to communicate with its companion process running on the server.
- You can install the proxy anywhere in your local environment. The location of the proxy binaries does not impact where it listens for data from your application.
Cloud Spanner Use-Cases
- Recall that OLTP needs strict write consistency, OLAP does not
- Cloud Spanner is Google proprietary, more advanced than Cloud SQL
- Offers “horizontal scaling” - i.e. bigger data, more instances, replication etc
- Under the hood, Cloud Spanner has a surprising design - more later
- Use when you need strong consistency
- Don’t use if …

Spanner Data Model
- Tables ‘look’ relational - rows, columns, strongly typed schemas
- But…
- Usually query student and course grades together (e.g. John Walsh together with his Spring 2016 CS294 grade of B+)
- If you query Students and Grades together, make Grades child of Students

Child Interleaving
- “Interleaving”: child rows are inserted between parent rows with that key prefix
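The effect of interleaving can be sketched with lexicographically sorted keys; this is a schematic illustration of the key ordering, not Spanner's actual storage format:

```python
# Parent rows keyed by (StudentID,); child rows keyed by (StudentID, CourseID).
rows = {
    ("1",): "Jane Doe",
    ("2",): "John Walsh",
    ("2", "CS294"): "Spring 2016, B+",   # child row interleaved under student 2
    ("3",): "Raymond Wu",
}

# Sorting by key tuple places each child row right after its parent:
for key in sorted(rows):
    print(key, "->", rows[key])
# ("2", "CS294") sorts between ("2",) and ("3",), so reading student 2
# together with their grades is one contiguous key-range scan.
```

This is why interleaving helps exactly when parent and child are usually queried together.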
Hotspotting
- Use a hash of the key value if you have naturally monotonically ordered keys
- Under the hood, Cloud Spanner divides data among servers across key ranges
Hotspotting
- Change the key so that it is not monotonically increasing
- e.g. hash StudentID values

Students:
StudentID  Student Name
1          Jane Doe
2          John Walsh
3          Raymond Wu

Grades:
StudentID  CourseID  Term         Grade
2          CS294     Spring 2016  B+
3          ME101     Winter 2015  C+
1          CS183     Summer 2012  A+
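Key hashing can be sketched in a few lines; the `hashed_key` helper below is hypothetical, purely for illustration of the idea:

```python
import hashlib

def hashed_key(student_id: int) -> str:
    # Prefix the sequential ID with a short hash so that consecutive
    # inserts land in different key ranges (and hence different servers).
    prefix = hashlib.md5(str(student_id).encode()).hexdigest()[:4]
    return f"{prefix}_{student_id}"

keys = [hashed_key(i) for i in (1, 2, 3, 4)]
print(keys)
# The hash prefixes scatter the sequential IDs across the key space,
# so the keys are no longer in monotonically increasing order.
```

The original ID is kept as a suffix so the row is still addressable when you know the StudentID.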
Interleaved Representation (with hashed StudentIDs)
StudentID  Student Name
B99        Raymond Wu
Splits
- A split is a range of rows that can be moved around independent of others
- Splits are added to distribute high read-write data (to break up hotspots)
Data Types
- STRUCTs are not OK in tables, but can be returned by queries (e.g. if a query returns an ARRAY of ARRAYs)

Transactions
- Commit timestamps are "real time", so you can compare them to your watch
- Two transaction modes
- Cloud Spanner has a version-gc that reclaims versions older than 1 hour
Bigtable Use-Cases
- Similar to HBase
- BigTable is basically GCP’s managed HBase
- This is a much stronger link than between, say, Hive and BigQuery!
Row Store (notification data)
Id  To     Type   Content
1   mike   offer  Offer on mobiles
2   john   sale   Redmi sale

A cell is addressed by its row and column, e.g. Row = 3, Column = To.

Columnar Store
Id  Column   Value
1   To       mike
1   Type     offer
1   Content  Offer on mobiles
2   To       john
4   To       megan
4   Type     sale

This is the general structure of how columnar stores are constructed.
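Constructing the columnar form from row-oriented records is mechanical; a minimal sketch:

```python
# Row-oriented records, as in the notification table above
rows = {
    1: {"To": "mike", "Type": "offer", "Content": "Offer on mobiles"},
    2: {"To": "john"},
    4: {"To": "megan", "Type": "sale"},
}

# Columnar form: one (Id, Column, Value) triple per populated cell.
# Missing cells simply produce no triple - sparse rows cost nothing.
triples = [(rid, col, val)
           for rid, row in rows.items()
           for col, val in row.items()]

for t in triples:
    print(t)
```

Note how row 2, which has only a To value, contributes a single triple; this sparseness is one reason wide, mostly empty tables are cheap in columnar stores.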
Properties of HBase

Denormalized storage
- Instead of separate Employee Details, Employee Subordinates, and Employee Address tables, store everything together

Normalization optimizes storage - but storage is cheap in a distributed system!
Denormalized storage optimizes the number of disk seeks instead.

Employees:
Id  Name   Function  Grade
1   Emily  Finance   6
2   John   Finance   3
3   Ben    Finance   4

Subordinates:
Id  Subordinate Id
1   2
1   3
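The trade-off can be sketched in a few lines: a normalized layout needs a join (two lookups), while a denormalized row needs one. This is an in-memory analogy, not HBase code:

```python
# Normalized: "everything about employee 1" needs two lookups (a join)
employees = {1: {"Name": "Emily", "Function": "Finance", "Grade": 6}}
subordinates = {1: [2, 3]}

def profile_normalized(emp_id):
    rec = dict(employees[emp_id])                        # lookup 1
    rec["Subordinates"] = subordinates.get(emp_id, [])   # lookup 2 (the "join")
    return rec

# Denormalized: everything lives in one wide row -
# a single disk seek in HBase terms, at the cost of redundant storage.
employees_wide = {
    1: {"Name": "Emily", "Function": "Finance", "Grade": 6,
        "Subordinates": [2, 3]},
}

assert profile_normalized(1) == employees_wide[1]
```

On a distributed cluster each extra lookup can mean another network hop and disk seek, which is why denormalization wins when storage is cheap.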
NoSQL
- Only a limited set of operations are allowed in HBase
- Only CRUD operations: Create, Read, Update, Delete
- No operations involving multiple tables
- No indexes on tables
- No constraints
A denormalized row holds everything: Id, Name, Function, Grade, Subordinates, Address

RDBMS vs. HBase
- Complex queries such as grouping, aggregates, joins etc vs. only basic operations such as create, read, update and delete
- Normalized storage to minimize redundancy and optimize space vs. denormalized storage to minimize disk seeks
4-dimensional Data Model
- Row Key
- Column Family
- Column
- Timestamp (each cell can hold several timestamped values)
A Table for Employee Data
- Column families: Work, Personal
- Each row holds the data for a single employee
- e.g. the value “VP” stored with timestamp 24490982
Notification Data

Id  To     Type   Content
1   mike   offer  Offer on mobiles
2   john
4   megan  sale

In Bigtable/HBase terms:
- Row Key: the Id
- Column Family: a group of related columns
- Columns: To, Type, Content
- Value + Timestamp: what each cell stores
- Row Key: uniquely identifies a row
- Column: addressed as ColumnFamily:ColumnName, e.g. Work:Department
- Timestamp: used as the version number for the values stored in a column

Example: a table keyed by some ID, with column families Personal (name, gender, marital_status) and Professional (employed, field)
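The four-dimensional cell addressing maps naturally onto nested dictionaries. This is a schematic sketch, not the Bigtable or HBase client API; the `latest` helper mimics how reads default to the newest version:

```python
# table[row_key]["Family:Column"][timestamp] = value
table = {
    "emp#24490982": {
        "Work:Department": {1400000000: "Finance", 1500000000: "Strategy"},
        "Personal:marital_status": {1400000000: "single"},
    }
}

def latest(row_key, column):
    # Reads return the newest version unless an explicit timestamp is given
    versions = table[row_key][column]
    return versions[max(versions)]

print(latest("emp#24490982", "Work:Department"))  # Strategy
```

Older versions stay addressable by timestamp until garbage collection removes them, which is what makes the timestamp a genuine fourth dimension rather than just metadata.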
Filtering Rows Based on Conditions (SQL vs. HBase Shell Commands)
- Conditions on columns
- Timestamp range
BigTable Performance

Use BigTable When
- Don’t use if you need transaction support (OLTP) - use Cloud SQL or Cloud Spanner
- Don’t use for immutable blobs like movies each > 10 MB - use Cloud Storage instead
- Use for very fast scanning and high throughput
- Where each data item < 10 MB and total data > 1 TB
- Use where writes are infrequent/unimportant (no ACID) but fast scans are crucial
- Use Device ID # Time as row key if common query = “All data for a device”
- Use Time # Device ID as row key if common query = “All data for a period for all devices”

Hotspotting
- Like Cloud Spanner, data is stored in sorted lexicographic order of keys
- Salting breaks up hotspots

“Warming the Cache”
- BigTable will observe read and write patterns and redistribute data so that shards are evenly hit
- It will try to store roughly the same amount of data in different nodes
- This is why testing over hours is important to get a true sense of performance

SSD or HDD Disks
- Use SSD unless skimping on cost
- More predictable throughput too (no disk seek variance)
- Don’t even think about HDD unless storing > 10 TB and all batch queries
- The more random access, the stronger the case for SSD
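Why the row-key layout matters can be sketched with a toy prefix scan over sorted keys: since rows are stored in sorted order, the common query should map to one contiguous key range. The key format here is illustrative only:

```python
from bisect import bisect_left

# Device ID # Time layout: all readings for one device are adjacent
rows = sorted([
    "dev42#2016-01-01", "dev42#2016-01-02",
    "dev7#2016-01-01", "dev7#2016-01-02",
])

def prefix_scan(sorted_keys, prefix):
    # A contiguous slice of the sorted key space - the cheap access pattern
    start = bisect_left(sorted_keys, prefix)
    return [k for k in sorted_keys[start:] if k.startswith(prefix)]

print(prefix_scan(rows, "dev42#"))  # all data for device 42, one range read
```

With the reversed Time # Device ID layout, the same contiguity instead benefits "all devices for a period", which is the whole design choice in a nutshell.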
Common causes of poor performance
- Poor schema design (e.g. sequential keys)
- Inappropriate workload
DataStore
- Query time depends only on the size of the result set - so returning 10 rows will take the same length of time whether the dataset is 10 rows or 10 billion rows

Traditional RDBMS vs. DataStore
- Some queries use indices - not all vs. all queries use indices!
- Query time depends on both size of data set and size of result set vs. query time independent of data set, depends on result set alone
- Types of all values in a column are the same vs. types of different properties with the same name in an entity can be different
Use DataStore When
- Don’t use if the application has lots of writes and updates on key columns
- Use for crazy scaling of read performance - to virtually any size
- Use for hierarchical documents with key/value data

Full Indexing
- “Built-in” indices on each property (~field) of each entity kind (~table)
- If you are certain a property will never be queried, can explicitly exclude it from indexing

Implications of Full Indexing
- No joins possible
- From where?
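Full indexing is why query time tracks the result set rather than the dataset; a hypothetical in-memory sketch of per-property posting lists:

```python
from collections import defaultdict

# 10,000 entities; only every 1000th has grade "B+"
entities = {
    i: {"kind": "Student", "grade": "B+" if i % 1000 == 0 else "A"}
    for i in range(10_000)
}

# "Built-in" index: one posting list per (property, value) pair,
# maintained on every write - reads then never scan the dataset.
index = defaultdict(list)
for key, props in entities.items():
    for prop, value in props.items():
        index[(prop, value)].append(key)

matches = index[("grade", "B+")]
print(len(matches))  # 10 - found without scanning the other 9,990 entities
```

The flip side shown here is the write cost: every property of every entity must be indexed on insert, which is exactly why write-heavy workloads on key columns are a poor fit.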