Unit 1
Cloud Computing: An Overview
Cloud computing is the delivery of computing services over the internet (the "cloud"). These
services include servers, storage, databases, networking, software, analytics, and intelligence,
designed to offer faster innovation, flexible resources, and economies of scale. It allows users
to access and store data or applications remotely without needing physical infrastructure.
Cloud Deployment Models
1. Public Cloud:
o Operated by third-party cloud service providers (e.g., Amazon Web Services,
Microsoft Azure, Google Cloud).
o All hardware, software, and infrastructure are managed by the provider.
o Ideal for small to medium enterprises that need scalability and cost-effective
solutions.
2. Private Cloud:
o Exclusively used by a single organization.
o Can be hosted on-premise or by a third-party provider.
o Offers more control and security, making it ideal for businesses with strict
regulatory requirements.
3. Hybrid Cloud:
o A combination of public and private clouds that work together.
o Allows data and applications to be shared between them.
o Ideal for businesses that want to keep some sensitive workloads in a private
cloud while leveraging the public cloud for scalability.
4. Community Cloud:
o Shared infrastructure for a specific community (e.g., government
organizations or industries with similar security requirements).
o Managed either internally or by a third party.
Benefits of Cloud Computing
1. Cost Savings: No need for capital expenditure on infrastructure; pay only for the
resources you use.
2. Scalability and Flexibility: Scale resources up or down based on business needs.
3. Accessibility: Access data and applications from anywhere with an internet
connection.
4. Disaster Recovery and Business Continuity: Built-in backup and recovery features
ensure quick recovery of lost data.
5. Automatic Updates: Cloud service providers regularly update services with the latest
technology and security patches.
6. Collaboration: Cloud services make it easy to collaborate with team members, as
data can be accessed from any device.
Challenges of Cloud Computing
1. Security and Privacy: Storing data in the cloud may expose businesses to risks if
security measures are insufficient.
2. Downtime: Service outages can occur, though top cloud providers have redundancy
and recovery systems.
3. Limited Control: With public cloud models, businesses may have less control over
the underlying infrastructure.
4. Compliance: Depending on the industry, companies need to comply with various
regulations (GDPR, HIPAA), and using the cloud may require additional governance.
Common Applications of Cloud Computing
1. Data Storage and Backup: Safeguard data without the need for physical storage
devices.
2. Disaster Recovery: Cloud-based disaster recovery (DRaaS) allows for quick
recovery from system failures.
3. Application Development and Testing: Cloud services provide a flexible, scalable
environment for developers to build and test applications.
4. Big Data Analytics: Cloud computing allows organizations to process vast amounts
of data and extract insights without needing to build their own infrastructure.
5. Artificial Intelligence and Machine Learning: Cloud-based AI services allow
businesses to implement AI models and predictive analytics without needing in-house
expertise.
Key Characteristics of Cloud Computing
1. On-Demand Self-Service
Users can access cloud services whenever they need them, without human intervention from
service providers. This allows businesses and individuals to scale up or down resources as
required, paying only for what they use.
2. Broad Network Access
Cloud services are available over the internet and accessible through a variety of devices—
smartphones, laptops, tablets—allowing users to work from virtually anywhere with a
connection.
3. Resource Pooling
Cloud providers pool their computing resources to serve multiple users through multi-
tenancy, dynamically allocating and reallocating resources based on demand. This leads to
more efficient resource use and cost-effectiveness.
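Resource pooling can be sketched as a first-fit placement of tenant requests onto a shared set of hosts. This is a toy illustration, not any provider's real scheduler; the names (`Host`, `place`) are invented for the example.

```python
# Toy sketch of multi-tenant resource pooling: requests from different
# tenants are placed on a shared pool of hosts using first-fit.

class Host:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity   # free CPU units remaining
        self.tenants = []          # (tenant, cpus) placed on this host

def place(hosts, tenant, cpus):
    """Assign a tenant's request to the first host with spare capacity."""
    for host in hosts:
        if host.capacity >= cpus:
            host.capacity -= cpus
            host.tenants.append((tenant, cpus))
            return host.name
    return None  # pool exhausted

pool = [Host("h1", 8), Host("h2", 8)]
print(place(pool, "tenant-a", 6))  # h1
print(place(pool, "tenant-b", 4))  # h2 (h1 has only 2 units left)
print(place(pool, "tenant-c", 2))  # h1 (reuses leftover capacity)
```

The point of the sketch is that tenants share physical capacity: tenant-c lands on a host already serving tenant-a, which is exactly the multi-tenancy the text describes.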
4. Rapid Elasticity
One of the hallmarks of cloud computing is the ability to scale resources rapidly and
elastically. This means resources can be quickly scaled up during peak demand periods and
scaled back down when not needed, promoting flexibility.
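The scale-up/scale-down behavior described above can be expressed as a simple autoscaling rule. The formula below is a toy illustration, not any provider's actual autoscaling API: choose the replica count that keeps the average load per replica under a target.

```python
# Illustrative autoscaling rule: scale out during peaks, back in when
# demand drops, within a configured replica ceiling.
import math

def desired_replicas(total_load, target_per_replica, max_replicas=10):
    """Smallest replica count keeping load-per-replica under target."""
    needed = max(1, math.ceil(total_load / target_per_replica))
    return min(needed, max_replicas)

print(desired_replicas(900, 100))  # 9 -> scale up for peak demand
print(desired_replicas(150, 100))  # 2 -> scale back down afterwards
```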
5. Measured Service
Cloud systems automatically control and optimize resource usage by leveraging metering
capabilities. Users can monitor and track their usage, which enables transparent billing and
management.
6. Cost Efficiency
Instead of investing in expensive infrastructure and maintenance, businesses can use cloud
services on a pay-as-you-go model. This significantly reduces capital expenditure and
operational costs.
7. Global Accessibility and Collaboration
Cloud computing enables real-time collaboration across the globe, allowing teams to access
the same tools and data, collaborate on projects, and work together from different locations
seamlessly.
8. Security and Reliability
Cloud providers invest heavily in the security of their platforms, often more than individual
organizations could afford on their own. Redundancies, backup systems, and distributed
architectures enhance the reliability and security of data in the cloud.
9. Innovation Enablement
With cloud services, organizations can focus on their core business processes and innovation
rather than managing complex IT infrastructure. Outsourcing IT functions to cloud providers
allows businesses to remain agile and competitive.
Defining a cloud
A cloud in the context of computing refers to a network of remote servers hosted on the
internet that store, manage, and process data, rather than using local servers or personal
computers. The cloud offers various services like computing power, storage, databases, and
networking on a pay-as-you-go or subscription model.
2. Deployment Models
Public Cloud:
o Cloud services are offered over the public internet and shared by multiple
customers (multi-tenancy).
o Usually provided by third-party vendors like AWS, Google Cloud, and
Microsoft Azure.
o Cost-effective for many enterprises as infrastructure is managed by the
provider.
Private Cloud:
o Cloud infrastructure is used exclusively by a single organization.
o May be hosted on-premises or by a third-party provider but is isolated from
other users.
o Provides more control over data, security, and compliance but at a higher cost.
Hybrid Cloud:
o A combination of public and private cloud infrastructures.
o Allows data and applications to be shared between both environments.
o Provides flexibility in balancing workloads and managing sensitive data.
Community Cloud:
o A cloud infrastructure shared by a specific community of users with common
interests, such as regulatory requirements or security concerns.
o Can be managed by the organizations themselves or by third parties.
3. Essential Characteristics
On-demand self-service:
o Users can provision computing resources like storage or computing power
automatically as needed without requiring human interaction.
Broad network access:
o Resources are available over the network and can be accessed by diverse client
platforms such as mobile devices, laptops, and workstations.
Resource pooling:
o Cloud resources are pooled to serve multiple consumers using a multi-tenant
model. Physical and virtual resources are dynamically assigned according to
demand.
Rapid elasticity:
o Resources can be elastically provisioned and scaled to meet the demand
dynamically. To consumers, it may appear as if there is unlimited capacity.
Measured service:
o Cloud systems automatically control and optimize resource usage by
leveraging a metering capability. This is typically a pay-per-use model.
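Metered, pay-per-use billing can be illustrated with a small sketch. The rates and usage units below are made up for the example; real providers meter many more dimensions.

```python
# Sketch of measured service: usage events are metered per unit and
# billed at a per-unit rate, giving a transparent, itemized bill.

RATES = {"vm_hours": 0.05, "gb_stored": 0.02}  # hypothetical $/unit

def bill(usage):
    """Compute an itemized bill and total from metered usage."""
    items = {k: round(v * RATES[k], 2) for k, v in usage.items()}
    return items, round(sum(items.values()), 2)

items, total = bill({"vm_hours": 720, "gb_stored": 100})
print(items)  # {'vm_hours': 36.0, 'gb_stored': 2.0}
print(total)  # 38.0
```

Because every unit is metered, the consumer pays 38 dollars for exactly what was used, which is the pay-per-use model the text describes.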
4. Service Delivery and Consumption
This layer focuses on how services are consumed and delivered:
Service catalog: A listing of available services for users to choose from, with details
on pricing, SLAs, and capabilities.
Service-level agreements (SLAs): Agreements that define the performance,
availability, and other expectations of the services being delivered.
5. Security and Management Layers
Security and management are cross-cutting concerns that affect all layers of the cloud stack.
Cloud Management Platforms (CMPs) are systems that provide centralized management of
cloud environments and resources.
Cloud Service Models (Delivery Models)
1. Infrastructure as a Service (IaaS)
Characteristics:
o Provides virtualized computing resources over the internet (such as virtual
machines, storage, and networks).
o Allows users to rent computing infrastructure without managing the physical
hardware.
o Offers flexibility and scalability for computing resources as demand
fluctuates.
o Resources are typically provided on a pay-as-you-go basis.
Benefits:
o Reduces the need for large capital expenditures on hardware.
o Offers scalability to meet changing business needs (upscaling or downscaling
resources).
o Provides full control over computing infrastructure, such as OS, storage, and
security settings.
o Supports disaster recovery and backup solutions.
2. Platform as a Service (PaaS)
Characteristics:
o Provides a platform allowing developers to build, test, and deploy applications
without managing underlying infrastructure.
o Includes tools like databases, development frameworks, and operating
systems.
o Developers focus on writing code and application logic, while the cloud
provider manages infrastructure.
Benefits:
o Simplifies application development by abstracting underlying infrastructure
management.
o Enhances development speed due to pre-built components and services.
o Enables collaboration by offering a consistent environment across the
development lifecycle (coding, testing, deployment).
o Supports multi-language development environments.
3. Software as a Service (SaaS)
Characteristics:
o Delivers fully functional software applications over the internet.
o Users access the software via a web browser without needing to install or
maintain it.
o The cloud provider manages the entire infrastructure, platform, and software.
o Applications can range from email services, CRM tools, to office suites.
Benefits:
o Reduces the cost and complexity of software installation, maintenance, and
management.
o Offers anywhere, anytime access to applications with just an internet
connection.
o Facilitates rapid deployment of applications without the need for local
installation.
o Ensures automatic updates and maintenance, reducing downtime.
Distributed Systems
Distributed systems have evolved significantly over time, with key historical developments
shaping the way we design and implement these systems today. Here's a brief overview of the
major milestones:
World Wide Web (1991): Tim Berners-Lee's development of the World Wide Web
made the internet more accessible and user-friendly, greatly expanding the
possibilities for distributed systems.
Peer-to-Peer Networks (1990s): Peer-to-peer (P2P) systems like Napster (1999)
emerged, decentralizing the network further by allowing computers to share resources
without a central server.
Cluster Computing: Systems like Beowulf clusters demonstrated the power of using
multiple machines (nodes) to work together on a single task. This approach improved
performance and fault tolerance.
Grid Computing: Projects like SETI@Home (1999) and BOINC (2002) popularized
grid computing, where dispersed computers collaborated to tackle massive
computational tasks.
Hadoop (2006) and Apache Spark (2014): These frameworks became critical for
processing large datasets in distributed environments. Hadoop’s HDFS (Hadoop
Distributed File System) and MapReduce allowed for scalable, fault-tolerant data
storage and computation.
NoSQL Databases (e.g., Cassandra, MongoDB): Designed for distributed
environments, NoSQL databases provide high availability and partition tolerance
across distributed systems, supporting large-scale applications.
Edge Computing: As IoT devices proliferate, edge computing has become important.
It involves processing data closer to where it is generated (e.g., sensors), reducing
latency and bandwidth usage.
Fog Computing: Complementing cloud and edge computing, fog computing extends
cloud-like services to the edge of the network, creating a distributed system that spans
cloud to local devices.
Virtualization
Virtualization, a foundational technology in modern computing, has evolved significantly
over decades. Below is an overview of its historical development:
IBM VM/370 (1972): IBM introduced the VM/370, a full-featured virtual machine
operating system for its System/370 mainframes. This allowed users to run multiple
virtual instances of an OS, revolutionizing the way computing resources were used.
VMM (Virtual Machine Monitor): The VM/370 included the Virtual Machine
Monitor (VMM), which could create multiple virtual systems on a single physical
machine, leading to higher efficiency and resource utilization.
During the 1980s, with the advent of minicomputers and later personal computers
(PCs), the emphasis shifted away from virtualization as the hardware costs were
decreasing, and dedicated machines were more affordable.
Limited Use in PCs: Virtualization was less prevalent in the desktop computing era,
as the resource constraints of early personal computers made it impractical.
Xen and VMware: The late 1990s saw a revival of virtualization technologies,
primarily driven by the need for efficient server utilization and the rise of data
centers.
o VMware (1999): VMware's release of their virtualization platform for x86
architecture enabled multiple operating systems to run on a single x86
machine. This marked the beginning of mainstream virtualization on
commodity hardware.
o Xen (2003): Xen, an open-source hypervisor, introduced para-virtualization,
in which guest OSes are modified to run more efficiently on virtual machines.
This further advanced server-side virtualization.
Full Virtualization vs. Para-Virtualization: The 2000s saw the rise of two main
approaches to virtualization—full virtualization (used by VMware) and para-
virtualization (used by Xen). Both approaches allowed better resource utilization,
disaster recovery, and isolated environments for testing and development.
Intel and AMD Support: In the mid-2000s, Intel (VT-x) and AMD (AMD-V)
introduced hardware-level support for virtualization, making it more efficient and
practical for everyday use.
Server Consolidation: Virtualization became a key technology for server
consolidation, allowing multiple servers to run on a single physical machine,
reducing hardware costs, power consumption, and data center footprint.
Cloud Computing: The rise of cloud platforms like Amazon Web Services
(AWS), Google Cloud, and Microsoft Azure relied heavily on virtualization
technologies to provide scalable, on-demand infrastructure. Virtualization is a core
building block for cloud infrastructure.
Virtual Machine Sprawl: By the 2010s, virtual machine sprawl became an issue. As
VMs proliferated, managing large numbers of virtual machines became complex and
inefficient.
Containerization (Docker): Containers, a lighter-weight form of virtualization, gained
popularity. Container technologies like Docker, together with orchestrators such as
Kubernetes, provided isolated environments without the overhead of full virtual
machines, making it easier to deploy, scale, and manage applications across environments.
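The container workflow above can be illustrated with a minimal Dockerfile. The base image and file names here are generic placeholders; a real application would pin exact dependency versions.

```dockerfile
# Build a small, self-contained image for a hypothetical Python app.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]
```

Unlike a full virtual machine, this image carries only the application and its dependencies on top of a shared OS kernel, which is why containers start faster and use fewer resources.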
Web 2.0
Web 2.0 represents a significant evolution of the World Wide Web, marking a shift from the
static, content-focused web (often referred to as Web 1.0) to a more dynamic, interactive, and
user-centric environment. Building on Web 2.0, two trends point to what comes next:
1. Web 3.0: Often discussed as the next phase of the web, Web 3.0 envisions a
decentralized web based on blockchain technology, aiming for greater user control
and privacy. This transition builds on the user-centric principles of Web 2.0 while
addressing its limitations.
2. Integration of AI and Machine Learning: Advanced algorithms and AI are being
used to further personalize user experiences, automate processes, and analyze large
datasets.
Web 2.0 represents a pivotal shift in the web’s evolution, making the internet a more
interactive, participatory, and community-oriented space. Its impact continues to influence
how we interact with technology and each other online.
Building a Cloud Computing Environment
Business Goals: Understand what you want to achieve with cloud computing (e.g.,
cost reduction, scalability, flexibility).
Workload Assessment: Identify the types of applications and data that will be hosted
(e.g., databases, web applications, big data analytics).
Compliance and Security: Determine regulatory requirements and security needs.
Compute Resources: Define the virtual machines or container services you’ll need
(e.g., EC2 instances, Kubernetes clusters).
Storage: Choose appropriate storage solutions (e.g., block storage, object storage, file
storage).
Networking: Design virtual networks, including VPCs (Virtual Private Clouds),
subnets, and load balancers.
Database Services: Select database solutions (e.g., managed SQL databases, NoSQL
databases).
Access Control: Use Identity and Access Management (IAM) to control user access
and permissions.
Encryption: Implement data encryption at rest and in transit.
Monitoring and Logging: Set up tools for monitoring, logging, and alerting (e.g.,
AWS CloudWatch, Google Cloud Monitoring).
Backup and Disaster Recovery: Plan for data backups and disaster recovery
strategies.
Cost Management: Monitor and optimize cloud spending using cost management
tools (e.g., AWS Cost Explorer, Azure Cost Management).
Performance Tuning: Continuously monitor performance and optimize resource
usage.
Updates and Patches: Regularly update and patch software and systems to maintain
security and functionality.
Training: Provide training for your team on cloud technologies and best practices.
Support: Establish support mechanisms, whether through cloud provider support or
internal IT teams.
Cloud Providers: AWS, Microsoft Azure, Google Cloud Platform (GCP), IBM
Cloud, Oracle Cloud.
Management Tools: AWS Management Console, Azure Portal, Google Cloud
Console, Cloud Management Platforms (CMPs).
CI/CD: Tools like Jenkins, GitLab CI/CD, and AWS CodePipeline for continuous
integration and deployment.
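The IAM-based access control mentioned above can be illustrated with a toy policy evaluator. The policy format here is invented for the sketch (real IAM policies are far richer); the key behaviors shown, an explicit deny overriding any allow and default-deny for unmatched requests, follow common cloud IAM semantics.

```python
# Toy IAM-style policy check: explicit "deny" wins over "allow",
# and anything not matched by a policy is denied by default.

def is_allowed(policies, user, action):
    decision = False
    for p in policies:
        if user in p["users"] and action in p["actions"]:
            if p["effect"] == "deny":
                return False      # explicit deny always wins
            decision = True       # at least one allow matched
    return decision

policies = [
    {"effect": "allow", "users": ["alice"], "actions": ["s3:GetObject", "s3:PutObject"]},
    {"effect": "deny",  "users": ["alice"], "actions": ["s3:DeleteObject"]},
]
print(is_allowed(policies, "alice", "s3:GetObject"))     # True
print(is_allowed(policies, "alice", "s3:DeleteObject"))  # False
print(is_allowed(policies, "bob",   "s3:GetObject"))     # False (default deny)
```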
Computing Platforms and Technologies
Computing platforms and technologies encompass a broad range of systems and innovations
that support the development, deployment, and management of software applications. Here’s
a high-level overview of some key areas within this field:
1. Hardware Platforms
Desktop and Laptop Computers: Personal computing devices for general use.
Servers: High-performance machines used to provide services or resources to other
computers over a network.
Mobile Devices: Smartphones and tablets, which often require different
considerations for application development and performance.
Embedded Systems: Specialized computing systems within devices like automotive
controls, appliances, and industrial machines.
3. Cloud Computing
Infrastructure as a Service (IaaS): Provides virtualized computing resources over
the internet (e.g., AWS EC2, Google Compute Engine).
Platform as a Service (PaaS): Offers a platform allowing customers to develop, run,
and manage applications (e.g., Google App Engine, Heroku).
Software as a Service (SaaS): Delivers software applications over the internet (e.g.,
Salesforce, Microsoft 365).
4. Virtualization
5. Networking Technologies
6. Development Platforms
7. Databases
Relational Databases: Use structured query language (SQL) for data management
(e.g., MySQL, PostgreSQL).
NoSQL Databases: Designed for unstructured data and flexible schemas (e.g.,
MongoDB, Cassandra).
Data Warehousing: Systems used for reporting and data analysis (e.g., Amazon
Redshift, Google BigQuery).
Data Lakes: Centralized repositories for storing large volumes of raw data (e.g.,
Hadoop, AWS S3).
8. Internet of Things (IoT)
Sensors and Actuators: Devices that collect data and perform actions.
IoT Platforms: Frameworks to connect, manage, and analyze IoT devices (e.g., AWS
IoT, Google Cloud IoT).
Amazon Web Services (AWS):
Amazon Web Services (AWS) is a comprehensive and widely adopted cloud computing
platform offered by Amazon. It provides a range of cloud services, including computing
power, storage, and databases, as well as machine learning, analytics, and more. AWS
enables businesses to scale and grow by using cloud resources instead of maintaining
physical servers and infrastructure. Key AWS services include:
1. Compute: Services like Amazon EC2 (Elastic Compute Cloud) allow you to run
virtual servers in the cloud. AWS Lambda lets you execute code in response to events
without provisioning or managing servers.
2. Storage: Amazon S3 (Simple Storage Service) provides scalable object storage for a
variety of data types. Amazon EBS (Elastic Block Store) offers persistent block
storage for use with EC2 instances.
3. Databases: AWS offers managed database services such as Amazon RDS (Relational
Database Service) for SQL databases, Amazon DynamoDB for NoSQL databases,
and Amazon Aurora for high-performance relational databases.
4. Networking: Services like Amazon VPC (Virtual Private Cloud) allow you to create
isolated networks within the AWS cloud. AWS Direct Connect provides a dedicated
network connection from your premises to AWS.
5. Security: AWS provides tools like AWS Identity and Access Management (IAM) for
controlling access to your resources and AWS Shield for protection against DDoS
attacks.
6. Analytics: Services such as Amazon Redshift for data warehousing, Amazon Athena
for querying data in S3, and AWS Glue for data integration help you analyze and
process data.
7. Machine Learning: AWS offers machine learning services like Amazon SageMaker
for building, training, and deploying machine learning models, and AWS Lex for
creating conversational interfaces.
8. Developer Tools: AWS provides a range of tools for developers, including AWS
CodeCommit for source control, AWS CodeBuild for building code, and AWS
CodeDeploy for deployment.
9. Management and Monitoring: Services such as Amazon CloudWatch for
monitoring resources and AWS CloudTrail for tracking API calls help you manage
and oversee your AWS environment.
Google App Engine:
Google App Engine is a fully managed platform-as-a-service (PaaS) offering from Google
Cloud that allows developers to build, deploy, and scale web applications and services
without having to manage the underlying infrastructure. It abstracts away the infrastructure
management tasks, letting you focus on writing code and developing features.
Microsoft Azure:
Microsoft Azure is Microsoft's cloud computing platform. Here are some of its key services
and components:
1. Compute Services:
o Azure Virtual Machines (VMs): Provides scalable virtualized computing
resources on-demand.
o Azure App Services: A fully managed platform for building, deploying, and
scaling web apps and APIs.
o Azure Functions: Serverless compute service that lets you run code in
response to events without managing infrastructure.
o Azure Kubernetes Service (AKS): Managed Kubernetes container
orchestration service.
2. Storage:
o Azure Blob Storage: Object storage for unstructured data such as documents,
images, and videos.
o Azure Files: Managed file shares that use the standard SMB protocol.
o Azure Disk Storage: High-performance, durable block storage for VMs.
3. Databases:
o Azure SQL Database: Fully managed relational database with built-in
intelligence and scaling.
o Azure Cosmos DB: Globally distributed, multi-model database service
designed for high performance and scalability.
o Azure Database for MySQL/PostgreSQL: Managed database services for
MySQL and PostgreSQL.
4. Networking:
o Azure Virtual Network: Provides a secure and isolated network within
Azure.
o Azure Load Balancer: Distributes incoming network traffic across multiple
VMs.
o Azure Application Gateway: A web traffic load balancer with built-in
application firewall capabilities.
5. Security and Identity:
o Azure Active Directory (AD): Cloud-based identity and access management
service.
o Azure Key Vault: Securely manages keys, secrets, and certificates.
o Azure Security Center: Provides unified security management and advanced
threat protection.
6. Analytics:
o Azure Synapse Analytics: Integrates big data and data warehousing for
advanced analytics.
o Azure Data Lake Storage: Scalable data lake for big data analytics.
o Azure Stream Analytics: Real-time data stream processing.
7. Machine Learning and AI:
o Azure Machine Learning: Provides tools and services for building, training,
and deploying machine learning models.
o Cognitive Services: Pre-built AI services for vision, speech, language, and
decision-making.
8. Developer Tools:
o Azure DevOps: A suite of development tools including source control, build
and release pipelines, and project management.
o Visual Studio Code Spaces: Cloud-based development environments.
9. Management and Monitoring:
o Azure Monitor: Provides comprehensive monitoring and diagnostics for
applications and resources.
o Azure Resource Manager: Manages resources through templates and
automation.
10. IoT:
o Azure IoT Hub: Connects, monitors, and manages IoT devices.
o Azure Digital Twins: Creates digital models of physical environments.
Hadoop:
Apache Hadoop is an open-source framework designed for distributed storage and processing
of large data sets across clusters of computers. It provides a scalable, fault-tolerant system for
handling vast amounts of data, making it a popular choice for big data applications. Here’s a
detailed overview of Hadoop and its components:
Hadoop's ecosystem includes a range of tools and frameworks that extend its capabilities:
1. Apache HBase:
o Purpose: A distributed, scalable, NoSQL database that runs on top of HDFS.
It provides real-time read/write access to large datasets.
o Features: It is designed for random, real-time read/write access to large
amounts of data.
2. Apache Hive:
o Purpose: A data warehousing and SQL-like query language system that
enables users to query and analyze large datasets stored in HDFS.
o Features: Provides a high-level query language called HiveQL that is similar
to SQL.
3. Apache Pig:
o Purpose: A scripting platform that provides a high-level language (Pig Latin)
for processing and analyzing large datasets.
o Features: It simplifies the process of writing complex MapReduce programs
with its high-level data flow language.
4. Apache Spark:
o Purpose: A unified analytics engine for large-scale data processing that
provides in-memory processing capabilities, which can be more efficient than
MapReduce for certain tasks.
o Features: It supports batch processing, interactive queries, real-time
streaming, and machine learning.
5. Apache Flume:
o Purpose: A distributed service for collecting, aggregating, and transporting
large amounts of log data.
o Features: It is often used to ingest log data into HDFS.
6. Apache Sqoop:
o Purpose: A tool for efficiently transferring bulk data between Hadoop and
relational databases.
o Features: It supports importing data from SQL databases into HDFS and
exporting data from HDFS to SQL databases.
7. Apache ZooKeeper:
o Purpose: A service for coordinating and managing distributed applications.
o Features: It provides a centralized service for maintaining configuration
information, naming, and providing distributed synchronization.
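The MapReduce model at the heart of Hadoop can be sketched in plain, single-process Python: map emits (word, 1) pairs, shuffle groups the pairs by key, and reduce sums the counts per key. This illustrates the programming model only, not Hadoop's distributed, fault-tolerant runtime.

```python
# Word count in the MapReduce style: map -> shuffle -> reduce.
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big clusters", "big data"]
print(reduce_phase(shuffle(map_phase(lines))))
# {'big': 3, 'data': 2, 'clusters': 1}
```

In Hadoop, the map and reduce functions run in parallel across cluster nodes and the shuffle moves data between them over the network; the logic per record, however, is exactly this simple.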