Module-4-Introduction to
Python Programming
Dr. Ch. Praveen Kumar
Assistant Professor
Dept. of EECE
Raspberry Pi
The Raspberry Pi is a low cost, credit-card sized
computer that plugs into a computer monitor or TV,
and uses a standard keyboard and mouse.
The Raspberry Pi is a small computer that runs a Linux
operating system. Despite its size, it can run fairly
large and sophisticated programs and produce results
quickly.
It is a capable little device that enables people of all
ages to explore computing, and to learn how to
program in languages like Scratch and Python.
The vision of Eben Upton, trustee and
cofounder of the Raspberry Pi Foundation,
was to build a computer that was small and
inexpensive and designed to be programmed
and experimented with, like the ones he had
used as a child, rather than to passively
consume games on.
The Foundation gathered a group of
teachers, programmers, and hardware
experts to thrash out these ideas from 2006.
Raspberry Pi 4
There are actually two labels for all of the pins: Broadcom (BCM)
and board (BOARD). The board option lets you refer to a pin by its
physical number on the header, while the Broadcom number is the
GPIO number that the Broadcom chip uses for that pin.
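As a minimal sketch of the two numbering schemes, the lookup below maps a few physical (BOARD) pin numbers to their Broadcom (BCM) GPIO numbers, using pairs taken from the pin lists in this module. The names BOARD_TO_BCM, board_to_bcm and bcm_to_board are made up for this illustration and are not part of the RPi.GPIO library; with that library you choose a scheme by calling GPIO.setmode(GPIO.BOARD) or GPIO.setmode(GPIO.BCM).

```python
# Illustrative, partial lookup: BOARD (physical) pin -> BCM GPIO number.
BOARD_TO_BCM = {
    8: 14,   # Pin8  -> GPIO14
    10: 15,  # Pin10 -> GPIO15
    12: 18,  # Pin12 -> GPIO18
    32: 12,  # Pin32 -> GPIO12
    33: 13,  # Pin33 -> GPIO13
    35: 19,  # Pin35 -> GPIO19
}


def board_to_bcm(board_pin):
    """Return the BCM GPIO number for a physical header pin."""
    if board_pin not in BOARD_TO_BCM:
        raise ValueError(f"Pin {board_pin} is not in this partial table")
    return BOARD_TO_BCM[board_pin]


def bcm_to_board(bcm_gpio):
    """Return the physical header pin for a BCM GPIO number."""
    for board_pin, gpio in BOARD_TO_BCM.items():
        if gpio == bcm_gpio:
            return board_pin
    raise ValueError(f"GPIO{bcm_gpio} is not in this partial table")


print(board_to_bcm(8))    # physical pin 8 is GPIO14
print(bcm_to_board(18))   # GPIO18 sits on physical pin 12
```

The same wire can therefore be called "pin 8" in BOARD mode or "GPIO14" in BCM mode; a program only needs to be consistent about which scheme it selected.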
Raspberry Pi UART Pins
UART is one of several kinds of serial communication. It is popular because
it needs only two data lines (transmit and receive) and is widely supported
in software. The Raspberry Pi 4 exposes several UARTs on its GPIO header;
their pins are listed below:
TXD1 – GPIO14 – Pin8
RXD1 – GPIO15 – Pin10
TXD2 – GPIO0 – Pin27
RXD2 – GPIO1 – Pin28
TXD3 – GPIO4 – Pin7
RXD3 – GPIO5 – Pin29
TXD4 – GPIO8 – Pin24
RXD4 – GPIO9 – Pin21
TXD5 – GPIO12 – Pin32
RXD5 – GPIO13 – Pin33
SPI Communication Pins
Some devices use the SPI protocol, which lets one controlling device talk to several devices over a shared set of
data lines, selecting each device with its own chip-enable (CE) line. The Raspberry Pi 4 has several SPI buses.
The SPI pins of the Raspberry Pi 4 are given below:
SPI3 CE0 N – GPIO0 – Pin27
SPI3 MISO – GPIO1 – Pin28
SPI3 MOSI – GPIO2 – Pin3
SPI3 SCLK – GPIO3 – Pin5
SPI4 CE0 N – GPIO4 – Pin7
SPI4 MISO – GPIO5 – Pin29
SPI4 MOSI – GPIO6 – Pin31
SPI4 SCLK / SPI0 CE1 N – GPIO7 – Pin26
SPI0 CE0 N – GPIO8 – Pin24
SPI0 MISO – GPIO9 – Pin21
SPI0 MOSI – GPIO10 – Pin19
SPI0 SCLK – GPIO11 – Pin23
SPI5 CE0 N – GPIO12 – Pin32
SPI5 MISO – GPIO13 – Pin33
SPI5 MOSI – GPIO14 – Pin8
I2C Communication Pins
The Raspberry Pi 4 also supports the I2C protocol, a type of serial communication used by
some sensors and motor drivers. Several of the GPIO pins provide I2C support. These
pins are given below:
SDA0/SDA6 – GPIO0 – Pin27
SCL0/SCL6 – GPIO1 – Pin28
SDA1/SDA3 – GPIO2 – Pin3
SCL1/SCL3 – GPIO3 – Pin5
SDA3 – GPIO4 – Pin7
SCL3 – GPIO5 – Pin29
SDA4 – GPIO6 – Pin31
SCL4 – GPIO7 – Pin26
SDA4 – GPIO8 – Pin24
SCL4 – GPIO9 – Pin21
SDA5 – GPIO10 – Pin19
SCL5 – GPIO11 – Pin23
SDA5 – GPIO12 – Pin32
SCL5 – GPIO13 – Pin33
SDA6 – GPIO22 – Pin15
SCL6 – GPIO23 – Pin16
RPi PWM GPIO Pins
The Raspberry Pi 4 has some PWM pins for generating
a pulse output signal. These pins can drive a
low-voltage external device directly, but they must
first be configured in software before they produce
a signal. The PWM pins are given below:
PWM0 – GPIO12 – Pin32
PWM1 – GPIO13 – Pin33
PWM0 – GPIO18 – Pin12
PWM1 – GPIO19 – Pin35
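A PWM signal is usually described by its frequency and its duty cycle (the percentage of each period for which the pin is high). The sketch below only does the arithmetic, so it runs without any hardware; the function name pwm_timing and the example values are chosen for illustration.

```python
def pwm_timing(frequency_hz, duty_cycle_percent):
    """Return (high_time, low_time) in seconds for one PWM period."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    if not 0 <= duty_cycle_percent <= 100:
        raise ValueError("duty cycle must be between 0 and 100")
    period = 1.0 / frequency_hz                        # length of one cycle
    high_time = period * duty_cycle_percent / 100.0    # pin is high
    low_time = period - high_time                      # pin is low
    return high_time, low_time


# Example values (assumed): a 50 Hz signal with a 25% duty cycle.
high, low = pwm_timing(50, 25)
print(f"high for {high * 1000:.1f} ms, low for {low * 1000:.1f} ms")
```

On the Pi itself, the RPi.GPIO library provides a GPIO.PWM(pin, frequency) object whose start() and ChangeDutyCycle() methods take the duty cycle as a percentage in the same way.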
Raspberry Pi SDIO Pins
The Raspberry Pi 4 has a slot for an SD card, but the
GPIO pins also support SD card connections.
If required, the SDIO pins on the device can be used
for an SD card:
SD0 CLK/SD1 CLK – GPIO22 – Pin15
SD0 CMD/SD1 CMD – GPIO23 – Pin16
SD0 DAT0/SD1 DAT0 – GPIO24 – Pin18
SD0 DAT1/SD1 DAT1 – GPIO25 – Pin22
SD0 DAT2/SD1 DAT2 – GPIO26 – Pin37
SD0 DAT3/SD1 DAT3 – GPIO27 – Pin13
Raspberry Pi 4 Features and Specifications
Differences between Arduino and Raspberry Pi:
You should choose Arduino if:
You are from an electronics background, or you are
a beginner and really want to learn about
electronics and its components.
Your project is simple, especially if networking
is not involved.
Your project is more like an electronics project
where software applications are not involved,
like a burglar alarm or a voice-controlled light.
You are not much interested in software and Linux.
You should choose Raspberry Pi if:
Your project is complex and networking is
involved.
Your project is more like a software
application, like a VPN server or web server.
You have good knowledge of Linux and
software.
DEVELOPING ON THE RASPBERRY PI
If you want to seriously explore the
Raspberry Pi, you would be well advised to
pick up a copy of the Raspberry Pi User
Guide, by Eben Upton and Gareth Halfacree
(Wiley, 2012).
Getting started with your Raspberry Pi
Connecting the microSD card:
Connecting a keyboard and mouse:
Connecting a display:
Connecting a network cable (optional):
Connecting a power supply:
Raspberry Pi together!:
Operating System:
Although many operating systems can run on the
Pi, we recommend using a popular Linux
distribution, such as:
◾ Raspbian: Released by the Raspberry Pi
Foundation, Raspbian is a distro based on Debian.
This is the default “official” distribution and is
certainly a good choice for general work with a Pi.
◾ Occidentalis: This is Adafruit’s customised
Raspbian. Unlike Raspbian, the distribution
assumes that you will use it “headless”—not
connected to keyboard and monitor—so you can
connect to it remotely by default. (Raspbian
requires a brief configuration stage first.)
Installing the Raspbian operating system:
The Raspbian desktop:
The Welcome Wizard
Navigating the desktop
The Chromium web browser
Loading the Raspberry Pi website in Chromium
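import time
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BOARD)    # use physical (BOARD) pin numbering
GPIO.setup(3, GPIO.IN)      # pin 3 is the input (e.g. a sensor)
GPIO.setup(5, GPIO.OUT)     # pin 5 is the output
try:
    while True:
        val = GPIO.input(3)     # read the input pin (0 or 1)
        print(val)
        if val == 1:
            GPIO.output(5, GPIO.LOW)
        else:
            GPIO.output(5, GPIO.HIGH)
        time.sleep(0.1)         # short pause so the loop does not hog the CPU
finally:
    GPIO.cleanup()              # release the pins when the program stops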
PIR sensor
LDR Sensor
Introduction to Python Programming
Python is a high-level, versatile programming language known
for its simplicity and readability, making it an excellent choice
for beginners and experienced developers alike.
Created by Guido van Rossum and first released in 1991,
Python emphasizes code readability and allows developers to
express concepts with fewer lines of code than many other
programming languages.
Why Learn Python?
Easy to Learn: Python has simple syntax (rules of
writing code), which makes it less complex than other
programming languages.
Versatile: You can use Python for almost anything –
from building websites to analyzing data, creating video
games, or even controlling robots.
Large Community Support: Since Python is so
popular, there’s a huge community of programmers
ready to help with resources, tutorials, and forums.
Key Features of Python:
1. Simple and Readable Syntax: Python’s syntax is clean and
easy to understand, which makes it ideal for beginners. The
use of indentation instead of braces makes code look neat
and improves readability.
2. Interpreted Language: Python is an interpreted language:
code is run directly by the interpreter without a separate
compilation step, which allows for easier debugging and
faster development cycles.
3. Dynamically Typed: Variables in Python do not need
explicit declaration, and the type of a variable can change
during runtime, making coding flexible.
4. Extensive Standard Library: Python comes with a rich
standard library, and a large ecosystem of third-party
libraries and frameworks supports a wide range of
programming tasks, from web development to scientific
computing, machine learning, and automation.
5. Cross-platform Compatibility: Python can be run on
various operating systems such as Windows, macOS, and
Linux, making it a portable language.
6. Open Source and Community Support: Python is free
and open-source, supported by a large and active community
that contributes to its vast repository of packages and tools.
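To see dynamic typing (feature 3 above) in practice, the short sketch below rebinds one variable to values of different types and reports the type each time. The helper name describe is made up for this illustration.

```python
def describe(item):
    """Return a short description of a value and its current type."""
    return f"{item!r} has type {type(item).__name__}"


value = 42              # no type declaration is needed
print(describe(value))

value = "forty-two"     # the same name can now refer to a string
print(describe(value))

value = 3.14            # ...and later to a float
print(describe(value))
```

The type belongs to the value, not to the variable name, which is why the same name can hold an integer, a string, and a float in turn.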
Key benefits of Python
•Easy to learn, read, and maintain – Simple syntax with
fewer keywords.
•Object and procedure-oriented – Supports code reuse
through functions and procedures.
•Extendable – Allows integration with low-level
languages like C/C++.
•Scalable – Provides a structured approach for large
programs.
•Portable – Python programs can run on different systems
without modification.
•Broad library support – Includes libraries for machine
learning, image processing, cryptography, networking, etc.
Python Syntax and Basics
Let’s start with the basics of Python. Here are some important concepts:
1. Variables: In Python, you don’t need to declare a variable's type. Python
automatically understands whether it’s a number, string, or something else.
age = 18 # Integer
name = "Alice" # String
2. Printing: To output something in Python, we use the print() function.
print("Hello, World!")
3. Data Types:
o Integers: Whole numbers like 5, 10, -3.
o Floats: Decimal numbers like 3.14, 0.5.
o Strings: Text inside quotes, like "Python", "Coding".
o Booleans: True or False values.
4. Arithmetic: Python can handle basic math operations:
x = 5 + 3 # Addition
y = 10 - 2 # Subtraction
z = 4 * 2 # Multiplication
a = 10 / 2 # Division
5. Conditional Statements: Python allows you to make
decisions in your program using if, else, and elif
(else if) statements.
age = 18
if age >= 18:
    print("You are an adult.")
else:
    print("You are a minor.")
6. Loops: Loops let you repeat code multiple times. The two main types are for
loops and while loops.
o For Loop: Useful for iterating over a range of numbers or a list of items.
for i in range(5):
    print(i) # This will print numbers from 0 to 4
o While Loop: Repeats as long as a condition is true.
count = 0
while count < 3:
    print("Counting:", count)
    count += 1 # Increment count
7. Functions: Functions allow you to write code once and reuse it whenever you
need. It helps to organize your code better.
def greet(name):
    print(f"Hello, {name}!")
greet("Alice")
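A few details the basics above leave implicit can be shown in one short sketch: the difference between true division and floor division, an elif branch, and a function that returns a value instead of printing it. The function name age_group and its age thresholds are chosen only for illustration.

```python
# "/" always produces a float; "//" and "%" give the whole-number
# quotient and the remainder; "**" raises to a power.
print(10 / 2)     # 5.0
print(7 // 2)     # 3
print(7 % 2)      # 1
print(2 ** 3)     # 8


def age_group(age):
    """Classify an age using if / elif / else and return the result."""
    if age < 13:
        return "child"
    elif age < 18:
        return "teenager"
    else:
        return "adult"


for sample_age in (8, 15, 18):
    print(sample_age, age_group(sample_age))
```

Returning a value, rather than printing inside the function, lets the caller decide what to do with the result.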
Basic Python Concepts:
Variables: Used to store data values. Python does not require declaring a
variable’s type.
x = 5
name = "John"
Data Types: Python supports several data types like integers, floats, strings,
lists, tuples, sets, and dictionaries.
number = 10 # Integer
pi = 3.14 # Float
fruits = ['apple', 'banana', 'cherry'] # List
Conditional Statements: Python supports if-else statements to perform
different actions based on conditions.
if x > 0:
    print("Positive")
else:
    print("Non-positive")
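The data types listed above also include tuples, sets, and dictionaries, which the examples do not show. The sketch below uses made-up sample values to illustrate how each one behaves.

```python
point = (3, 4)                           # tuple: ordered, cannot be changed
unique_ids = {101, 102, 102, 103}        # set: duplicate values are dropped
student = {"name": "John", "age": 20}    # dictionary: key -> value pairs
fruits = ["apple", "banana", "cherry"]   # list: ordered, can be changed

fruits.append("mango")                   # lists can grow
student["age"] = 21                      # dictionary values can be updated

print(point[0])          # first element of the tuple
print(len(unique_ids))   # number of distinct ids
print(student["name"])   # look up a value by its key
print(fruits[-1])        # last element of the list
```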
“blinking lights” example:
import RPi.GPIO as GPIO    # importing GPIO library
import time                # importing time library for delay

GPIO.setmode(GPIO.BOARD)   # enable BOARD pin numbering
GPIO.setup(8, GPIO.OUT)    # set pin 8 to output mode
try:
    while True:
        GPIO.output(8, True)    # drive the pin high (light on)
        time.sleep(1)
        GPIO.output(8, False)   # drive the pin low (light off)
        time.sleep(1)
finally:
    GPIO.cleanup()              # reset the pins when the program stops
Applications of Python:
Web Development: Using frameworks like Django and
Flask.
Data Science and Machine Learning: Libraries like
NumPy, pandas, TensorFlow, and scikit-learn are widely
used for data analysis and AI tasks.
Automation and Scripting: Python is frequently used for
automating repetitive tasks, making it popular in the
DevOps world.
Game Development: Libraries like Pygame make it
possible to develop simple games.
Software Defined Networking
Software Defined Networking (SDN) is a modern way to
manage and control computer networks. Instead of controlling
each networking device (like switches and routers) individually,
SDN centralizes control, making networks easier to manage
and more flexible.
SDN is an approach to networking that separates the control
plane (decision-making about where traffic goes) from the data
plane (actually moving the data). Instead of using traditional
hardware-based network management, SDN uses software
controllers to manage network behavior dynamically.
In traditional networks, the hardware (like routers and
switches) decides how data moves through the network, but
SDN changes this by moving the decision-making to a central
software system.
What is a Data Plane?
All the activities involving and resulting from data
packets sent by the end-user belong to this plane.
Data Plane includes:
•Forwarding of packets.
•Segmentation and reassembly of data.
•Replication of packets for multicasting.
What is a Control Plane?
All activities that are necessary to perform data plane
activities, but do not involve end-user data packets,
belong to this plane. In other words, this is the brain
of the network. The activities of the control plane
include:
What are the Components of Software Defined
Networking (SDN)?
The three main components that make up SDN are:
•SDN Applications: SDN applications send requests for
network behavior to the SDN controller through an API.
•SDN Controller: The SDN controller collects network
information from the hardware and sends this information
to the applications.
•SDN Networking Devices: SDN networking devices carry
out the forwarding and data processing tasks.
SDN Architecture
In a traditional network, each switch has its own data
plane as well as the control plane.
The control planes of the various switches exchange
topology information and use it to construct
forwarding tables that decide where an incoming
data packet must be forwarded via the data plane.
Software-defined networking (SDN) is an approach
via which we take the control plane away from the
switch and assign it to a centralized unit called the
SDN controller.
Hence, a network administrator can shape traffic via
a centralized console without having to touch the
individual switches.
The data plane still resides in the switch and when a
packet enters a switch, its forwarding activity is
decided based on the entries of flow tables, which are
pre-assigned by the controller.
A flow table consists of match fields (like input port
number and packet header) and instructions.
The packet is first matched against the match fields of
the flow table entries. Then the instructions of the
corresponding flow entry are executed.
The instructions can be forwarding the packet via one
or multiple ports, dropping the packet, or adding
headers to the packet.
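The match-then-execute behavior described above can be sketched in a few lines. This is a simplified model, not a real switch or controller API: the class names FlowEntry and Switch, the action strings such as "output:2", and the "send_to_controller" fallback for packets that match no entry are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    match: dict     # match fields, e.g. {"in_port": 1, "dst_ip": "10.0.0.2"}
    actions: list   # instructions, e.g. ["output:2"] or ["drop"]


@dataclass
class Switch:
    flow_table: list = field(default_factory=list)

    def install(self, entry):
        """Stand-in for the controller pre-assigning a flow entry."""
        self.flow_table.append(entry)

    def handle(self, packet):
        """Return the instructions of the first entry whose fields all match."""
        for entry in self.flow_table:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.actions
        return ["send_to_controller"]   # assumed behavior when nothing matches


switch = Switch()
switch.install(FlowEntry({"in_port": 1, "dst_ip": "10.0.0.2"}, ["output:2"]))
switch.install(FlowEntry({"src_ip": "10.0.0.66"}, ["drop"]))

print(switch.handle({"in_port": 1, "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"}))
print(switch.handle({"in_port": 3, "src_ip": "10.0.0.66", "dst_ip": "10.0.0.2"}))
print(switch.handle({"in_port": 4, "src_ip": "10.0.0.9", "dst_ip": "10.0.0.7"}))
```

The switch itself makes no routing decisions here: it only looks up entries that were installed from outside, which is the separation of data plane and control plane in miniature.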
SDN Architecture
A typical SDN architecture consists of three layers.
•Application Layer: It contains the typical network
applications like intrusion detection, firewall, and
load balancing.
•Control Layer: It consists of the SDN controller which
acts as the brain of the network. It also allows hardware
abstraction to the applications written on top of it.
•Infrastructure Layer: This consists of physical
switches, which form the data plane and carry out the
actual movement of data packets.
Advantages of SDN
Advantage | Explanation
Centralized Control | One controller manages the entire network, simplifying administration and decision-making.
Flexibility & Programmability | Network behavior can be changed easily through software without touching hardware.
Scalability | Easily adds or removes devices (especially helpful for growing IoT environments).
Enhanced Security | Centralized policies make it easier to enforce security rules and detect/respond to threats.
Efficient Resource Utilization | Traffic can be optimized, balancing the load and improving bandwidth use.
Faster Innovation | New services or applications can be deployed quickly without changing the physical infrastructure.
Cost Savings | Reduces the need for expensive, proprietary hardware. Standard hardware can be used instead.
Disadvantages of SDN
Disadvantage | Explanation
Single Point of Failure | If the SDN controller fails, the whole network can be affected unless redundancy is built in.
Security Risks | The centralized controller is a critical target for cyber-attacks; if compromised, it can control the whole network.
Complex Deployment | Transitioning from traditional networks to SDN can be complex, especially in large enterprises.
Interoperability Issues | Integrating SDN with existing (legacy) hardware and protocols can be challenging.
Requires Skilled Personnel | Need for staff trained in SDN technologies.
Latency Issues | Delays if controller is overloaded.
High Initial Investment | Upfront costs can be significant.
SDN | TRADITIONAL NETWORK
SDN is a virtual networking approach. | The traditional network is the old, conventional networking approach.
SDN uses centralized control. | A traditional network uses distributed control.
The network is programmable. | The network is not programmable.
SDN has an open interface. | A traditional network has a closed interface.
The data plane and control plane are decoupled by software. | The data plane and control plane are combined in the same device.
It supports automatic configuration, so it takes less time. | It supports static/manual configuration, so it takes more time.
It can prioritize and block specific network packets. | It handles all packets in the same way, with no prioritization support.
It is easy to program as per need. | It is difficult to reprogram or to replace an existing program as per use.
The cost of SDN is low. | The cost of a traditional network is high.
Structural complexity is low. | Structural complexity is high.
Extensibility is high. | Extensibility is low.
Troubleshooting and reporting are easy because control is centralized. | Troubleshooting and reporting are difficult because control is distributed.
Its maintenance cost is lower than that of a traditional network. | Its maintenance cost is higher than that of SDN.
Software Defined Networking in IoT
The Internet of Things (IoT) involves a massive
number of devices (sensors, actuators, etc.) that
generate huge amounts of data. Managing such
large, complex networks with traditional methods is
hard.
SDN makes it easier by centralizing control and
making the network programmable.
Significance of Software Defined Networking in IoT
Software-Defined Networking (SDN) in the Internet of
Things (IoT) is a considerable improvement over
traditional networking, delivering a range of essential
benefits:
1. Centralized Control
•A controller (software) manages all network devices
(switches, routers).
•It makes intelligent decisions about routing, security,
and resource allocation.
2. Customizable Network Infrastructure:
With SDN, network admins can easily
change how the network works from one
place. They can quickly adjust things to
make sure important apps run smoothly and
data moves faster where it's needed most.
3. Robust Security:
SDN helps you see the whole network clearly,
making it easier to spot security problems.
As more smart devices connect to the
Internet, SDN offers better protection than
old-style networks.
It lets operators separate devices into
different safety zones and quickly block any
device that gets infected, stopping it from
spreading problems to others.
4. Programmability
•Network behavior can be changed quickly
through software instead of reconfiguring
hardware.
•This flexibility is super useful for IoT devices
that may frequently join or leave the network.
5. Automation & Efficiency
•SDN allows automatic configuration, traffic
engineering, and load balancing.
•It can prioritize important IoT data (like
emergency alerts) over regular data.
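Prioritizing important IoT data over regular data can be pictured as a priority queue. The sketch below is a toy model of that idea, not an SDN controller feature: the class name PriorityScheduler, the two traffic classes in PRIORITY, and the sample messages are all assumptions made for the illustration.

```python
import heapq
import itertools

# Assumed traffic classes: a lower number is served first.
PRIORITY = {"emergency": 0, "regular": 1}


class PriorityScheduler:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # keeps arrival order within a class

    def enqueue(self, kind, payload):
        """Queue a message under its traffic class."""
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), payload))

    def dequeue(self):
        """Return the most urgent message that has waited the longest."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


scheduler = PriorityScheduler()
scheduler.enqueue("regular", "temperature=22")
scheduler.enqueue("emergency", "smoke alarm")
scheduler.enqueue("regular", "humidity=40")

order = []
while len(scheduler) > 0:
    order.append(scheduler.dequeue())
print(order)
```

Although the alarm arrived second, it is served first; the two regular readings keep their arrival order.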
Benefits of SDN in IoT
Feature | Why It Helps IoT
Scalability | Supports the growing number of IoT devices.
Flexibility | Easy to adapt to changing IoT needs.
Centralized Management | Simplifies managing diverse IoT devices.
Enhanced Security | Quick detection and isolation of threats.
Optimized Performance | Efficient use of bandwidth and energy.
Risks of Software Defined Networking in IoT
The biggest concern is the central controller.
If it gets hacked or stops working, the whole
network could be affected because
everything depends on it.
Data Handling and Analytics
What is Data Handling?
Data handling is about:
•Storing, saving, or safely getting rid of research data during and
after a project.
•Making rules and steps to manage data, whether it’s electronic or
paper-based.
Recent Data Concerns:
•Big Data
• Caused by lots of traffic from IoT (Internet of Things) devices.
• Large amounts of data are produced by sensors used in these
devices.
What is Big Data?
•Big Data refers to very large amounts of data that come in fast and
from many different sources.
•Special technologies and systems are used to collect, find, and
analyze this data quickly and efficiently.
1. Big Data helps get valuable insights from huge amounts of different
types of data.
2. Traditional methods can't handle Big Data well because:
• The data is too big.
• It comes in too fast.
• It’s complicated to organize.
3. New tools and technologies are needed to process and analyze it
effectively.
Types of Data
Structured Data
Structured data is highly organized and easy to search, manage, and analyze. It
follows a clear format or model.
• Storage:
Typically stored in Relational Databases (RDBMS), like MySQL, Oracle, or
SQL Server.
• Access & Management:
Managed using Structured Query Language (SQL), which allows users to
query and manipulate data efficiently.
• Examples:
• Customer information (Name, Age, Address)
• Banking transactions
• Inventory data
• Though easy to manage, structured data is commonly estimated to make up
only about 20% of all available data today.
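As a small illustration of structured data managed with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name customers, its columns, and the sample rows are made up for the example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER, address TEXT)")

# Every row follows the same fixed format: name, age, address.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Alice", 18, "12 Park Road"),
        ("John", 35, "4 Lake View"),
        ("Meera", 42, "9 Hill Street"),
    ],
)

# Because the structure is known in advance, querying is straightforward.
rows = conn.execute(
    "SELECT name, age FROM customers WHERE age >= ? ORDER BY age", (30,)
).fetchall()
print(rows)
conn.close()
```

The fixed columns are what make the data "structured": every record can be searched and filtered with the same query.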
Unstructured Data
Unstructured data doesn’t follow a specific format or predefined
model. It's often text-heavy but can also include images, videos, and
audio.
• Storage & Processing Challenges:
Traditional relational databases (RDBMS) struggle to process
unstructured data because it doesn't fit neatly into tables and rows.
• Examples:
• Emails
• Social media posts
• Images, videos, and audio files
• Sensor data from IoT devices
Unstructured data is commonly estimated to account for about 80% of the
total data available today.
Advanced technologies like Big Data tools (Hadoop, NoSQL
databases) and AI/ML are often used to analyze this data and
Characteristics of Big Data (7 Vs)
•Volume
•Refers to the huge amount of data generated every second.
•Data comes from various sources like social media, IoT devices,
transactions, sensors, etc.
•Example: Facebook generates terabytes of data daily.
•Velocity
•The speed at which data is generated, collected, and processed.
•Modern applications need to handle real-time or near-real-time
data streams.
•Example: Stock market data, online transactions.
• Variety
• Refers to the different types of data formats.
• Can be structured, semi-structured, or unstructured.
• Example: Text, images, videos, sensor data, logs.
•Variability
•Deals with the inconsistency and unpredictability of
data.
•Sometimes the meaning or context of data changes,
making it complex to manage and analyze.
•Example: Social media sentiment changes frequently and
varies by region and language.
•Veracity
•Refers to the quality and trustworthiness of data.
•Big Data often contains inconsistencies, inaccuracies, or is
incomplete.
•Example: Incorrect sensor readings, fake news, or
misleading social media posts.
•Value
• The usefulness or benefit derived from analyzing data.
• Extracting valuable insights leads to better decisions and
business outcomes.
• Example: Analyzing customer purchase history to offer
personalized discounts.
•Visualization
• The process of representing data insights visually (charts,
graphs, dashboards) to make complex data understandable
and actionable.
Summary
Big Data isn't just about large amounts of data. It's about
managing how fast it comes, where it comes from, what
form it's in, how trustworthy it is, how to visualize it,
and most importantly, how to extract value from it.
Data Handling Technologies
Cloud Computing:
Cloud computing is one of the primary technologies used
for managing, storing, and processing large volumes of
data efficiently.
It offers scalable resources and services over the internet,
enabling organizations to handle massive datasets (big
data) without investing in physical infrastructure.
Essential Characteristics of Cloud Computing
1. On-Demand Self-Service
• You can get computer resources (like storage or servers) whenever you
need them, without asking anyone.
2. Broad Network Access
• You can use cloud services from anywhere, on any device (laptop,
phone, tablet).
3. Resource Pooling
• Resources (storage, memory, processing) are pooled together and
shared among multiple users.
• These resources are dynamically assigned and reassigned according to
demand.
4. Rapid Elasticity
• You can easily add more resources or reduce them, depending
on how much you need.
5. Measured Service
• Resource usage is monitored, controlled, and reported for transparency,
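Rapid elasticity can be pictured as a rule that recomputes how many servers are needed as the load changes. The sketch below is a deliberately simple, hypothetical rule and not any cloud provider's autoscaling API; the function name servers_needed, the capacity of 100 requests per second per server, and the sample loads are all assumptions.

```python
import math


def servers_needed(requests_per_second, capacity_per_server, min_servers=1):
    """Return how many servers cover the load, never fewer than min_servers."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    required = math.ceil(requests_per_second / capacity_per_server)
    return max(min_servers, required)


# The load rises and then falls again; the server count follows it.
for load in (50, 450, 1200, 80):
    print(load, "requests/s ->", servers_needed(load, capacity_per_server=100), "server(s)")
```

Scaling back down when the load drops is as much a part of elasticity as scaling up, since with measured service you pay for what you use.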
Basic Service Models in Cloud Computing
1. Infrastructure-as-a-Service (IaaS)
1. Provides virtualized hardware resources like servers, storage, and
networking over the internet.
2. Example: Amazon Web Services (AWS), Microsoft Azure (IaaS layer).
3. Users manage the operating systems, storage, and deployed applications, but
not the physical infrastructure.
2. Platform-as-a-Service (PaaS)
1. Provides a platform allowing users to develop, run, and manage applications
without worrying about infrastructure.
2. Example: Google App Engine, Heroku.
3. PaaS offers tools and libraries to support the complete lifecycle of building
and deploying apps.
3. Software-as-a-Service (SaaS)
1. Provides ready-to-use software applications over the internet.
2. Example: Gmail, Google Workspace (Docs, Sheets), Microsoft 365, Dropbox.
3. Users access the software via a browser, while the service provider manages
the infrastructure and platforms behind the scenes.
Cloud Deployment models
Cloud deployment models define how cloud
services are made available to users. They
specify the environment, control, and
ownership of the infrastructure.
1. Public Cloud
•Ownership: Managed by third-party cloud providers
(like AWS, Microsoft Azure, Google Cloud).
•Access: Services are available to anyone who wants to
purchase or use them.
•Infrastructure: Shared with other users (multi-tenant
environment).
•Examples: Hosting websites, SaaS applications,
storage services.
•Pros:
• Cost-effective (pay-as-you-go)
• No need for physical hardware
• Scalable and reliable
•Cons:
• Less control over data security and compliance
• Shared resources might pose security risks for
sensitive data
2. Private Cloud
• Ownership: Exclusive to a single organization,
either managed internally or by a third party.
• Access: Only accessible to a specific organization.
• Infrastructure: Can be on-premises or hosted
externally, but dedicated.
• Examples: Government agencies, banking systems
with sensitive data.
• Pros:
• Enhanced security and control
• Customizable to meet specific compliance
requirements
• Cons:
• Higher cost (infrastructure and maintenance)
• Requires skilled IT staff for management
3. Hybrid Cloud
• Combination of public and private cloud models.
• Allows data and applications to be shared between
them.
• Example: Running sensitive data on a private cloud
while leveraging the public cloud for less critical
workloads.
• Pros:
• Flexibility and scalability
• Balances security (private) and cost efficiency
(public)
• Cons:
• Complex to manage and integrate
• Requires strong network and security management
4. Community Cloud
• Shared by several organizations with similar
requirements (like regulatory, compliance, or
industry needs).
• Can be managed internally or by a third party.
• Examples: Healthcare organizations sharing
infrastructure to manage patient data.
• Pros:
• Cost-effective for communities with common
concerns
• Encourages collaboration and shared responsibility
• Cons:
• Less customizable compared to a private cloud
• Potential for data privacy concerns among members
Internet of Things (IoT)
The Internet of Things (IoT) refers to a system of
interrelated physical devices that can collect and exchange
data over the internet. These devices are equipped with
sensors, software, and other technologies to connect and
exchange data with other devices and systems.
Definition (According to Techopedia):
• IoT "describes a future where everyday physical objects
will be connected to the internet and will be able to
identify themselves to other devices."
• In simpler terms, devices (like appliances, vehicles, or
machinery) can communicate with each other and share
information without human intervention.
1. Sensors Embedded in Devices and Machines:
1. IoT devices have sensors that can detect and measure things like
temperature, motion, light, humidity, etc.
2. These sensors are placed in machines and gadgets used in various
fields such as healthcare, agriculture, manufacturing, and more.
2. Data Transmission to Remote Servers via the Internet:
1. Once data is sensed, IoT devices transmit this information to
remote servers (often cloud servers) using the internet.
2. This enables real-time data collection and remote monitoring.
3. Continuous Data Acquisition:
1. IoT ensures continuous data collection from:
1. Mobile equipment (e.g., vehicles with GPS and diagnostics)
2. Transportation facilities (e.g., traffic lights, smart toll systems)
3. Public facilities (e.g., street lights, public utilities)
4. Home appliances (e.g., smart refrigerators, thermostats)
2. This constant data flow helps in monitoring, predictive
maintenance, and automation.
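The sense, transmit, and repeat cycle described above can be sketched without any hardware or network. In the code below, read_sensor returns simulated values in place of a real sensor driver, and the send argument is any function that accepts a message; here it simply appends to a list instead of contacting a remote server. The device name "greenhouse-01" and the field names are made up for the example.

```python
import json
import random
import time


def read_sensor():
    """Stand-in for a real sensor driver: returns simulated readings."""
    return {
        "temperature_c": round(random.uniform(20.0, 30.0), 1),
        "humidity_pct": round(random.uniform(40.0, 60.0), 1),
    }


def build_message(device_id, reading):
    """Package one reading as a JSON string ready for transmission."""
    return json.dumps(
        {"device_id": device_id, "timestamp": time.time(), "reading": reading}
    )


def collect(device_id, samples, send):
    """Acquire readings repeatedly and hand each message to send()."""
    for _ in range(samples):
        send(build_message(device_id, read_sensor()))


outbox = []                                   # plays the role of the remote server
collect("greenhouse-01", 3, outbox.append)
print(len(outbox), "messages queued")
print(outbox[0])
```

In a real deployment the send function would post each message to a server over the internet, and the loop would run continuously rather than for three samples.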
Data Handling at Data Centers:
Data centers play a critical role in managing and handling
the vast amounts of data generated by businesses and users.
They act as centralized repositories where data is stored,
processed, and analyzed. Efficient data handling at data
centers ensures smooth business operations and better
decision-making.
Storing, Managing, and Organizing Data:
• Data centers securely store huge volumes of data.
• They provide structured storage systems that ensure data is
easily accessible and well-organized.
• Storage solutions include relational databases, NoSQL
databases, and data warehouses.
•Estimates and Provides Necessary Processing Capacity:
•They figure out how much computer power is needed and adjust
resources (CPU, memory, storage) to handle the job smoothly.
•Provides Sufficient Network Infrastructure:
•They make sure the internet and network speed are fast and
reliable, so data moves quickly and without delays.
•Effectively Manages Energy Consumption:
•Since data centers use a lot of electricity, they work to save
energy—keeping things cool and using power efficiently to help
the environment.
•Replicates Data to Keep Backup:
•They make extra copies of data to protect it. If something goes wrong,
the data is safe and easy to get back.
•Develops Business-Oriented Strategic Solutions from Big Data:
•They look at lots of data to find useful information. This helps
companies plan better and make smarter choices.
•Helps Business Personnel to Analyze Existing Data:
•They give tools like dashboards and reports, so people can see and
understand data without needing to be experts.
•Discovers Problems in Business Operations:
•By checking how things are working, they can spot problems early.
This helps businesses fix issues and work better.
Flow of Data:
Data Sources
Data is generated and collected from various sources.
These sources vary depending on the domain, type of
industry, and the nature of data.
•Enterprise Data:
Data produced by business organizations as part of their day-to-day operations.
•Online trading & analysis
•Production & inventory data
•Sales & financial records
•IoT Data
•Data generated by interconnected devices equipped with sensors that
communicate over the internet.
•Industry, agriculture, traffic, transportation
•Medical-care data
•Data from public departments & families
•Bio-medical Data
•Data related to healthcare and biological sciences, typically vast and complex.
•Gene sequencing data
•Data from medical clinics & R&D
•Other Fields
•Specialized domains that produce huge volumes of scientific and technical data.
•Computational biology
•Astronomy
•Nuclear research
Data Acquisition
Data acquisition refers to the process of gathering and
transporting data from various sources to a centralized
system for storage, processing, and analysis. It involves
two main stages:
Data Collection
This is the first step where data is collected from different sources, including sensors,
devices, and log files.
• Log Files / Record Files:
These files are automatically generated by data sources. They record various activities
and events which can be analyzed later for insights and decision-making.
• Sensory Data:
Data collected from different types of sensors such as:
• Sound wave sensors
• Voice recognition sensors
• Vibration sensors
• Automobile sensors
• Chemical sensors
• Weather sensors (temperature, pressure, etc.)
• Complex and Varied Data Collection from Mobile Devices:
Mobile devices collect diverse types of data, such as:
• Geographical location (GPS data)
• 2D barcodes (e.g., QR codes)
• Pictures and videos
These data are often unstructured and come from multiple sources, making the collection
process complex.
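As a minimal sketch of the log-file collection described above, the following Python snippet parses a few log lines and counts the recorded events by severity level. The log format (date, time, level, message) and the sample lines are assumptions made for illustration; real data sources define their own formats.

```python
from collections import Counter

# Hypothetical log lines; real log formats vary by data source.
SAMPLE_LOG = """\
2024-01-05 10:00:01 INFO sensor-3 temperature=21.5
2024-01-05 10:00:02 WARNING sensor-7 vibration=0.92
2024-01-05 10:00:03 INFO sensor-3 temperature=21.7
2024-01-05 10:00:04 ERROR sensor-9 no-response
"""

def parse_line(line):
    """Split one log line into date, time, level, and message."""
    date, time, level, message = line.split(" ", 3)
    return {"date": date, "time": time, "level": level, "message": message}

def count_levels(log_text):
    """Count how many records appear at each severity level."""
    records = [parse_line(l) for l in log_text.splitlines() if l.strip()]
    return Counter(r["level"] for r in records)

level_counts = count_levels(SAMPLE_LOG)
print(dict(level_counts))
```

Once the records are parsed into dictionaries, the same structure can be filtered or aggregated in other ways, for example per sensor or per time window.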
Data Transmission
Once the data is collected, it needs to be transferred to
storage or processing systems for further analysis.
• Transfer to Storage System:
After collecting the data, it is sent to data centers or
storage infrastructures where it can be processed and
analyzed.
• Transmission Types:
• Inter-DCN Transmission: Transmission of data between
different data center networks (DCNs).
• Intra-DCN Transmission: Transmission of data within a
single data center network.
Data Pre-processing
Data pre-processing is a critical step in the data acquisition process.
The raw data collected from different sources is often messy,
incomplete, redundant, or noisy.
Pre-processing ensures that the data is clean, reliable, and ready for
further analysis.
Why Pre-processing is Necessary
• Collected datasets may suffer from noise (irrelevant or meaningless
data), redundancy (duplicate records), and inconsistencies
(conflicting information).
• Without pre-processing, the analysis could lead to incorrect
insights.
• Pre-processing ensures higher data quality for accurate decision-making.
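The three problems named above (noise, redundancy, and inconsistency) can be illustrated with a small Python sketch. The sample readings, the valid temperature range, and the policy of discarding conflicting records are all assumptions chosen for illustration; a real pipeline might instead correct or merge such records.

```python
# Hypothetical raw sensor readings: (sensor_id, timestamp, temperature_celsius)
raw_readings = [
    ("s1", "10:00", 21.5),
    ("s1", "10:00", 21.5),   # redundancy: exact duplicate
    ("s2", "10:00", None),   # incomplete: missing value
    ("s3", "10:00", 999.0),  # noise: outside the plausible range
    ("s4", "10:00", 22.1),
    ("s4", "10:00", 25.0),   # inconsistency: conflicts with the previous record
]

def preprocess(readings, low=-50.0, high=60.0):
    """Return cleaned readings; the valid range is an assumed threshold."""
    # 1. Drop incomplete and out-of-range (noisy) records.
    valid = [r for r in readings if r[2] is not None and low <= r[2] <= high]
    # 2. Remove exact duplicates while keeping the original order.
    deduped = list(dict.fromkeys(valid))
    # 3. Collect the distinct values seen for each (sensor, time) key and
    #    drop every record whose key has conflicting values.
    values_by_key = {}
    for sensor, ts, value in deduped:
        values_by_key.setdefault((sensor, ts), set()).add(value)
    return [r for r in deduped if len(values_by_key[(r[0], r[1])]) == 1]

clean = preprocess(raw_readings)
print(clean)
```

Only the reading from sensor `s1` survives here, because every other record is missing, implausible, or contradicted by another record for the same sensor and time.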
Data Storage
Data storage in the context of Big Data involves techniques and
technologies that enable the management of massive amounts of data
with high availability, consistency, and fault tolerance.
1. File System
Distributed file systems are essential for handling Big Data because
they:
• Store massive data across multiple machines.
• Ensure consistency, availability, and fault tolerance, meaning they
protect against data loss and ensure that data is always accessible.
Examples:
GFS (Google File System):
• A distributed file system designed by Google.
• Supports large-scale file systems efficiently.
• Limitation: Its performance isn't as effective when dealing with small
files.
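To make the idea of storing data across multiple machines with fault tolerance concrete, here is a toy Python model. It is not the GFS or HDFS API: the chunk size, the replica count of 3, the node names, and the round-robin placement are assumptions made purely for illustration.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def place_replicas(num_chunks: int, nodes, replicas: int = 3):
    """Assign each chunk to `replicas` distinct nodes in round-robin order."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for chunk_id in range(num_chunks):
        placement[chunk_id] = [
            nodes[(chunk_id + k) % len(nodes)] for k in range(replicas)
        ]
    return placement

def readable_chunks(placement, failed_nodes):
    """Return the chunk ids that still have at least one live replica."""
    return [
        c for c, holders in placement.items()
        if any(n not in failed_nodes for n in holders)
    ]

nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical machines
chunks = split_into_chunks(b"x" * 1000, chunk_size=256)
placement = place_replicas(len(chunks), nodes, replicas=3)
survivors = readable_chunks(placement, failed_nodes={"node-a", "node-b"})
print(len(chunks), placement[0], len(survivors))
```

Because every chunk lives on three different nodes, the file stays fully readable in this model even after two nodes fail.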
2. Databases
Big Data demands new types of databases that can handle its volume, variety, and velocity.
Traditional relational databases (RDBMS) often fall short, which has led to the
emergence of NoSQL databases.
NoSQL Databases:
• Non-relational databases designed to handle unstructured, semi-structured, and
structured data.
• Scalable and flexible to manage Big Data's unpredictable data types and formats.
Three Main Types of NoSQL Databases:
1. Key-Value Databases
   • Store data as key-value pairs.
   • Simple and efficient for quick retrieval.
   • Examples: Redis, DynamoDB.
2. Column-Oriented Databases
   • Store data in columns instead of rows.
   • Optimized for queries on large datasets.
   • Examples: Apache Cassandra, HBase.
3. Document-Oriented Databases
   • Store data as documents (usually JSON, XML, BSON).
   • Flexible and suitable for handling complex data.
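The difference between the three data models can be sketched with plain Python data structures. The snippet below does not use the API of Redis, Cassandra, or any other real database; the dictionaries, field names, and sample values are hypothetical stand-ins that only show how the same kind of data is shaped in each model.

```python
import json

# Key-value model: an opaque value looked up by a single key.
key_value_store = {"user:42": "Asha"}

# Column-oriented model: each column is stored together, so a query that
# needs one column does not have to read the others.
column_store = {
    "user_id": [42, 43, 44],
    "city": ["Hyderabad", "Pune", "Delhi"],
    "orders": [5, 2, 9],
}

# Document model: each record is a self-describing, possibly nested document.
document_store = [
    {"_id": 42, "name": "Asha", "address": {"city": "Hyderabad"}, "tags": ["prime"]},
    {"_id": 43, "name": "Ravi"},  # documents need not share the same fields
]

name = key_value_store["user:42"]            # direct lookup by key
total_orders = sum(column_store["orders"])   # scan a single column
cities = [d["address"]["city"] for d in document_store if "address" in d]
as_json = json.dumps(document_store[0])      # documents serialize naturally to JSON

print(name, total_orders, cities)
```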
Data handling using Hadoop
Hadoop is a software framework for distributed
processing of large datasets across large clusters of
computers.
Hadoop is an open-source implementation of Google’s
GFS (Google File System) and MapReduce.
Apache Hadoop’s two key components are:
MapReduce (for processing large data sets)
HDFS (Hadoop Distributed File System) (for distributed
storage)
These were originally derived from Google’s MapReduce
and Google File System (GFS).
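The MapReduce processing model can be sketched in a few lines of plain Python using the classic word-count example. This runs in a single process and does not use the Hadoop API; the function names and the two sample input splits are assumptions for illustration. In a real cluster, the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

# Two hypothetical input splits, as if stored on two different nodes.
splits = ["big data needs big storage", "hadoop stores big data"]
mapped = [pair for split in splits for pair in map_phase(split)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Each map call only sees its own split, and each reduce step only sees the values for one key, which is what allows the work to be distributed.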
Building Blocks of Hadoop
Hadoop Common
A module that contains the utilities and libraries supporting
the other Hadoop components.
Hadoop Distributed File System (HDFS)
Provides reliable data storage and access across nodes in
the Hadoop cluster.
MapReduce
A framework that processes large amounts of data in
parallel, across different nodes in the cluster.
Yet Another Resource Negotiator (YARN)
Sometimes called next-generation MapReduce.
Manages cluster resources, chiefly CPU and memory, and assigns them to
applications running on a Hadoop cluster.
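The idea of a resource manager assigning CPU and memory to applications can be illustrated with a toy first-come, first-served allocator in Python. This is not YARN's actual scheduling logic or API, and real YARN schedulers are considerably more elaborate; the application names, request sizes, and cluster capacity are hypothetical.

```python
def allocate(requests, total_memory_mb, total_vcores):
    """Grant requests in arrival order while cluster capacity remains."""
    free_mem, free_cores = total_memory_mb, total_vcores
    granted, waiting = [], []
    for app, mem, cores in requests:
        if mem <= free_mem and cores <= free_cores:
            # Enough capacity: reserve it for this application.
            free_mem -= mem
            free_cores -= cores
            granted.append(app)
        else:
            # Not enough capacity: the application has to wait.
            waiting.append(app)
    return granted, waiting, (free_mem, free_cores)

# Hypothetical requests: (application name, memory in MB, virtual cores)
requests = [("wordcount", 4096, 2), ("etl", 8192, 4), ("report", 6144, 2)]
granted, waiting, free = allocate(requests, total_memory_mb=16384, total_vcores=8)
print(granted, waiting, free)
```

In this run the first two applications fit within the cluster, while the third must wait because only 4096 MB of memory is left.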