SOFTWARE ARCHITECTURE IN THE AGE OF CLOUD COMPUTING
JAROSLAV GERGIC
Industrial Keynote
16th European Conference on Software
Architecture (ECSA), Prague,
19 – 23 September 2022
AGENDA
INTRODUCTION
CLOUD SCALE COMPUTING
ARCHITECTING CLOUD SCALE SAAS
CLOSING THOUGHTS
SUMMARY
JAROSLAV GERGIC
Always busy building the next big thing,
now living in the confluence of
cybersecurity, machine learning,
and cloud computing.
2022 ← 1995:
Cisco, GoodData, Ariba, IBM Research, Reuters, Mobil
Server, LCS International
Mentoring: StartupYard, JIC, MSIC
CLOUD COMPUTING
LET’S DEFINE THE TERM
PUBLIC CLOUDS IN
CLOUD
COMPUTING
ENTERPRISE SCALE
is no longer the
summit of software
architecture
LET’S TALK CLOUD SCALE
“Cloud Computing ≠ Public Cloud”
LET’S TALK CLOUD SCALE
“Software as a Service (SaaS)”
B2C
Serve millions or billions of users
• Facebook
• YouTube
• TikTok
• Seznam.cz
B2B
(tens of) thousands of businesses
• Salesforce
• Dropbox*)
• WorkDay
• GoodData*)
HOW BIG IS CLOUD SCALE?
B2B SAAS: Cloud scale is at least three orders of magnitude bigger than
enterprise scale, because you need to serve thousands of enterprises.
Reverse Migration
• Both Dropbox and GoodData started
originally on AWS
• As they grew, they sought to reduce costs
• GoodData migrated to Rackspace
managed hosting in 2014
• Dropbox migrated to their own datacenters
in 2016
PUBLIC CLOUD VS. PRIVATE DC
Public Cloud
• developer productivity
• time to market
• smaller scale
• high-margin product
Private Datacenter
• operational costs
• steady state product
• extreme scale
• margins under pressure
PUBLIC CLOUD VS. PRIVATE DC
CLOUD SCALE SAAS ARCHITECTURE
WHAT DOES IT TAKE TO ARCHITECT CLOUD
SCALE SOFTWARE AS A SERVICE?
• Scalability
• Costs
• SLAs
• Security & Compliance
• Productivity
ARCHITECTING CLOUD SAAS
SCALABILITY
• Horizontal scaling
• Distributed computing
• Redundancy and Fault Tolerance
• Elastic workloads
SINGLE CAUSE OF FAILURE
(vs. Single Point of Failure)
SINGLE CAUSE OF FAILURE
• DNS issue
• credentials rotation
• kernel update
• networking issue
• Infrastructure-level configuration change
Beware of ubiquitous things, which seemingly always work fine!
AVOIDING THE PITFALLS
• avoid singletons at any cost*)
• always think of blast radius when any
component, service or piece of underlying
infrastructure fails
• pro tip: check out a service mesh such as Istio (https://istio.io/)
  • allows us to operate multiple interconnected K8s clusters
*) there can be only one!
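One way to reason about blast radius is to partition tenants across independent cells and ask what fraction of them is affected when a single cell fails. The following is a minimal sketch under that assumption; the cell-based layout and the names `assign_cell` and `blast_radius` are hypothetical illustrations, not something described in the talk.

```python
import hashlib
from collections import Counter
from typing import Sequence


def assign_cell(tenant_id: str, num_cells: int) -> int:
    """Map a tenant to a cell with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


def blast_radius(tenants: Sequence[str], num_cells: int, failed_cell: int) -> float:
    """Fraction of tenants affected when exactly one cell fails."""
    if not tenants:
        return 0.0
    cells = Counter(assign_cell(t, num_cells) for t in tenants)
    return cells[failed_cell] / len(tenants)


# Hypothetical tenant population, for illustration only.
tenants = [f"tenant-{i}" for i in range(10_000)]
singleton = blast_radius(tenants, num_cells=1, failed_cell=0)
celled = blast_radius(tenants, num_cells=10, failed_cell=3)
print(f"singleton: {singleton:.0%}, one of 10 cells: {celled:.1%}")
```

With a singleton, every tenant is inside the blast radius; with ten cells, a single failure touches roughly a tenth of them, provided the cells do not share a common cause of failure.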
CAPACITY PLANNING
Why would I need to
do capacity planning in
a public cloud?
Isn't it elastic by design?
COSTS
• Gross Margin in SaaS
• Gross Margin = (Revenue – COGS)/Revenue
• COGS – Cost of Goods Sold
• What is COGS in SaaS?
• All costs needed to operate your SaaS offering.
• HW, SW, operations, support
• What is the benchmark Gross Margin in SaaS?
80%
Example: with $10 of revenue and $2 of COGS, gross margin is ($10 – $2)/$10 = 80%.
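The gross margin formula can be checked against the slide's figures, reading them as $10 of revenue and $2 of COGS (an interpretation of the chart, not a statement from the talk). The helper names below are illustrative.

```python
def gross_margin(revenue: float, cogs: float) -> float:
    """Gross Margin = (Revenue - COGS) / Revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cogs) / revenue


def max_cogs(revenue: float, target_margin: float) -> float:
    """Largest COGS that still meets the target gross margin."""
    return revenue * (1.0 - target_margin)


print(f"margin: {gross_margin(10.0, 2.0):.0%}")
print(f"COGS budget at 80% on $10: ${max_cogs(10.0, 0.80):.2f}")
```

Inverting the formula this way turns the 80% benchmark into a cost budget: every dollar of revenue leaves about twenty cents for hardware, software, operations, and support.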
COST SAVINGS
STORIES FROM THE TRENCHES
LINUX KERNEL TUNING
Low-level Linux kernel settings, such as huge pages and NUMA options,
led to a 35–40% performance boost for the prevailing workloads.
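Before tuning, it helps to know what a host is currently configured with. The sketch below only inspects the huge page counters that Linux exposes in `/proc/meminfo`; it does not change any setting and does not reproduce the tuning from the talk. The `SAMPLE` text is made-up data used as a fallback on non-Linux hosts, and the function names are illustrative.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:  value [kB]' lines into a dict of ints."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def huge_page_summary(info: dict) -> dict:
    """Summarize the explicit (hugetlbfs) huge page pool."""
    total = info.get("HugePages_Total", 0)
    free = info.get("HugePages_Free", 0)
    size_kb = info.get("Hugepagesize", 0)
    return {
        "total": total,
        "in_use": total - free,
        "reserved_mib": total * size_kb // 1024,
    }


# Made-up sample, used only when /proc/meminfo is not available.
SAMPLE = """MemTotal:       16384000 kB
HugePages_Total:     512
HugePages_Free:      128
Hugepagesize:       2048 kB
"""

meminfo = Path("/proc/meminfo")
text = meminfo.read_text() if meminfo.exists() else SAMPLE
print(huge_page_summary(parse_meminfo(text)))
```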
REGULAR EXPRESSIONS 101
Parsing input data at cloud
scale…
On multiple occasions we hit performance issues with third-party
regex libraries in different programming languages.
The improvement was > 10x.
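The slide does not say what the fix was, so the following is only a sketch of how one might measure a regex-based parser against a hand-written alternative before deciding. The `key=value` log format, the sample line, and both parser functions are hypothetical; actual timings depend on the input and the regex engine.

```python
import re
import timeit

# Hypothetical input line; real telemetry formats will differ.
LINE = "ts=1663574400 host=edge-17 status=200 bytes=5120"
PAIR_RE = re.compile(r"(\w+)=(\S+)")


def parse_with_regex(line: str) -> dict:
    """Extract key=value pairs with a precompiled regular expression."""
    return dict(PAIR_RE.findall(line))


def parse_with_split(line: str) -> dict:
    """Extract the same pairs with plain string splitting."""
    return dict(field.split("=", 1) for field in line.split())


# Both parsers must agree before their speed is worth comparing.
assert parse_with_regex(LINE) == parse_with_split(LINE)

for fn in (parse_with_regex, parse_with_split):
    seconds = timeit.timeit(lambda: fn(LINE), number=100_000)
    print(f"{fn.__name__}: {seconds:.3f}s per 100k lines")
```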
ELASTIC SCALING WITH SPOT INSTANCES
Use case:
• A stateful, compute- and memory-intensive
workload driven by incoming telemetry flow.
Solution:
• Fleet of inexpensive spot instances
coupled with ML-based capacity
predictor.
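The talk does not describe the predictor itself. As a stand-in for the ML-based model, the sketch below uses simple exponential smoothing to forecast the telemetry rate and converts the forecast into a fleet size with some headroom for spot interruptions. All names, rates, and the headroom value are hypothetical.

```python
import math
from typing import Sequence


def forecast_rate(history: Sequence[float], alpha: float = 0.5) -> float:
    """Exponentially smoothed estimate of the next interval's events/sec."""
    if not history:
        raise ValueError("history must not be empty")
    estimate = history[0]
    for observed in history[1:]:
        estimate = alpha * observed + (1 - alpha) * estimate
    return estimate


def instances_needed(rate: float, per_instance_rate: float, headroom: float = 0.25) -> int:
    """Instances required for the forecast rate plus a safety headroom."""
    return max(1, math.ceil(rate * (1 + headroom) / per_instance_rate))


# Hypothetical telemetry rates (events/sec) for the last four intervals.
history = [1000.0, 1200.0, 1600.0, 2000.0]
rate = forecast_rate(history)
fleet = instances_needed(rate, per_instance_rate=500.0)
print(f"forecast: {rate:.0f} events/sec, fleet size: {fleet}")
```

Because the workload is stateful, scaling ahead of demand matters more than reacting to it: a replacement instance needs time to rebuild state after a spot interruption, which is what the headroom is meant to absorb.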
SLAs
• SLAs – Service Level Agreements
  • Uptime, Latency, Throughput
  • Recovery Time/Point Objectives (RTO/RPO)
• Requires supporting infrastructure
  • Monitoring – metrics, dashboards
  • Logging – instrumentation, troubleshooting, auditing
  • Alerting – 24/7 reliable notification with duty rotation and escalation paths
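An uptime target is easier to discuss once it is converted into an allowed-downtime budget. A minimal sketch, assuming a 30-day measurement period (contracts define their own periods and exclusions); the function name is illustrative.

```python
def downtime_budget_minutes(uptime_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime target over the period."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - uptime_pct) / 100


for target in (99.0, 99.9, 99.99):
    budget = downtime_budget_minutes(target)
    print(f"{target}% uptime -> {budget:.1f} min of downtime per 30 days")
```

At 99.9% the budget is about 43 minutes per 30 days, which is why alerting with duty rotation and escalation paths is listed as required infrastructure rather than a nice-to-have.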
SECURITY & COMPLIANCE
Security
• Threat Modeling
• Vulnerability Management
• Access Controls
• Supply chain attack prevention
• Security Monitoring
Compliance
• SOC 2, ISO 27001, HIPAA, GDPR, Accessibility
• SOC – System and Organization Controls
• SOC 2 – Security, Availability, Processing Integrity, Confidentiality, or Privacy
• Objectives -> Controls -> Assessments
• PII protection
~30% of R&D effort
(DEVELOPER) PRODUCTIVITY
• Continuous Integration / Continuous
Delivery pipelines (CI/CD)
• Development, Testing and Release
Processes
• Quality Assurance, Cycle Time
• Making sure the above scale to many
R&D teams – avoiding bottlenecks.
Scalability
• Horizontal scaling
• Distributed computing
• Redundancy and Fault Tolerance
• Elastic workloads
Costs
• Gross Margin
• Profiling
• Performance Tuning
• Cost Optimization
• Capacity Management
SLAs
• Uptime
• Latency
• Throughput
• RTO / RPO
• Monitoring
• Logging
• Alerting
Security & Compliance
• Threat Modeling
• Vulnerability Management
• Supply Chain Management
• Security Monitoring
• SOC 2, ISO 27001
• GDPR, PII
Productivity
• CI / CD
• Development
• Testing
• Release
• DevOps
ARCHITECTING CLOUD SAAS
WHAT NEXT?
NOW, WHEN I AM DONE ARCHITECTING AND
BUILDING MY CLOUD SCALE SAAS OFFERING?
EVOLVING: PUBLIC CLOUD
• Periodically review and benchmark new
instance types.
• Review, evaluate and benchmark new
services provided by the vendor.
• Issue recommendations and develop
blueprints for R&D teams.
• Plan migration.
• Rinse & Repeat.
EVOLVING: PRIVATE DC
• Periodically perform capacity planning and
maintain HW order book based on up-to-date
predictions.
• Periodically review and benchmark new HW
generations. Negotiate prices with the
vendor(s).
• Issue recommendations and develop
blueprints for R&D teams.
• Plan migration.
• Rinse & Repeat.
START ME UP!
PRODUCTION WORKLOADS IN PUBLIC CLOUD
NEWCOMER GUIDE 2022 EDITION
WHERE TO START?
https://googlecloudcheatsheet.withgoogle.com/
PUBLIC CLOUDS IN 2022
• All three leading public cloud providers
(AWS, Azure, GCP) exhibit increasing
complexity.
• It is relatively easy to spin up proof of
concepts or play with technologies.
• But launching production workloads is a
whole different story. It is not just about
getting started, but also about doing
things right.
SOFTWARE ARCHITECTURE IN THE AGE OF CLOUD COMPUTING
The cloud-era software architect needs to accommodate not only
functional requirements and customer-defined throughput and
performance requirements at cloud scale, but also a large set of
non-functional requirements related to cybersecurity, compliance,
developer productivity, and, most notably, the financial/cost
characteristics, which at cloud scale can make or break a
software-as-a-service company. The role of a software architect has
thus become interdisciplinary by nature: the architect's mental model
needs to accommodate all of the above non-functional aspects while
maintaining a full picture across all layers of the software stack,
from user-facing features all the way down to the operating system
and the underlying hardware platform.
Interdisciplinary “decathlon”
OUTLOOK
SIMPLIFICATION
There are vast opportunities for simplification
and codification of best practices as the cloud
computing industry matures.
HYBRID CLOUD
Hybrid cloud deployments will become more
prevalent to protect margins.
QUESTIONS?
THANK YOU
Jaroslav Gergic
@jgergic jaroslavgergic
https://cognitive.cisco.com/
https://conf.researchr.org/home/ecsa-2022
