Module 3: Data Analytics and
Supporting Services
Prepared by
Mr. Manish Bhelande
IT Department
SAKEC
What is Data Analytics?
• Data analytics is defined as a set of processes, tools, and
technologies that help manage qualitative and quantitative data
to enable discovery, simplify organization, support governance,
and generate insights for a business.
Steps in Data Analysis
Data analysis involves several steps:
Determine the data requirements or how the data is grouped. Data may
be separated by age, demographic, income, or gender. Data values may be
numerical or divided by category.
Collect the data. This can be done through a variety of sources such as
computers, online sources, cameras, environmental sources, or through
personnel.
Organize the data after it's collected so it can be analyzed. This may take
place on a spreadsheet or other form of software that can take statistical data.
Clean up the data before it is analyzed. This is done by scrubbing it and
ensuring there's no duplication or error and that it is not incomplete. This step
helps correct any errors before the data goes on to a data analyst to be
analyzed.
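As a rough sketch, the organize-and-clean steps above can be expressed in plain Python; the sample records and field names below are illustrative, not from any real dataset.

```python
# A minimal sketch of the organize-and-clean steps: deduplicate records,
# then drop incomplete ones before analysis. Data is illustrative.
records = [
    {"age": 34, "income": 52000, "gender": "F"},
    {"age": 34, "income": 52000, "gender": "F"},   # duplicate
    {"age": 41, "income": None,  "gender": "M"},   # incomplete
    {"age": 29, "income": 61000, "gender": "M"},
]

# Remove exact duplicates while preserving order.
seen, deduped = set(), []
for r in records:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Drop incomplete records (any missing value).
clean = [r for r in deduped if all(v is not None for v in r.values())]

print(len(clean))   # 2 records survive cleaning
```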
Types of Data Analytics
There are 6 types of data analytics:
Descriptive analytics: This describes what has happened over a given
period of time. Have the number of views gone up? Are sales stronger this
month than last?
Diagnostic analytics: This focuses more on why something happened. It
involves more diverse data inputs and a bit of hypothesizing. Did the
weather affect beer sales? Did that latest marketing campaign impact sales?
Predictive analytics: This moves to what is likely going to happen in the
near term. What happened to sales the last time we had a hot summer? How
many weather models predict a hot summer this year?
Prescriptive analytics: This suggests a course of action. For example, if the
likelihood of a hot summer, measured as the average of five weather models, is
above 58%, we should add an evening shift to the brewery and rent an additional
tank to increase output.
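The prescriptive rule in the brewery example above can be written as a tiny decision rule; the five model probabilities below are made-up illustrative values.

```python
# Prescriptive rule from the example: act when the average of five
# weather-model probabilities of a hot summer exceeds 58%.
# The probabilities are illustrative, not real forecasts.
model_probs = [0.62, 0.55, 0.71, 0.60, 0.57]

avg = sum(model_probs) / len(model_probs)
if avg > 0.58:
    action = "add evening shift and rent an extra tank"
else:
    action = "keep current capacity"

print(avg, action)   # avg is 0.61, so the rule recommends expanding output
```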
Real-time data analytics
Real-time data analytics involves using data immediately when entered into
the database.
Unlike other types of data analytics that use data from past events
(historical data), this type analyses new data from customers or external
sources on the go.
Augmented data analytics
Augmented analytics uses machine learning (ML) and natural language
processing (NLP) to analyze data.
Incorporating machine learning into analytics helps automate the tedious
task of code-based data exploration and make it available to business users.
Data Analytics Techniques
Data analysts can use several analytical methods and techniques to process
data and extract information. Some of the most popular methods include
Regression Analysis:
This entails analyzing the relationship between one or more independent
variables and a dependent variable.
The independent variables are used to explain the dependent variable,
showing how changes in the independent variables influence the dependent
variable.
Factor Analysis:
This entails taking a complex dataset with many variables and reducing
the variables to a small number.
The goal of this maneuver is to attempt to discover hidden trends that
would otherwise have been more difficult to see.
Cohort Analysis:
This is the process of breaking a data set into groups of similar data, often
into a customer demographic.
This allows data analysts and other users of data analytics to further dive
into the numbers relating to a specific subset of data.
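Cohort analysis as described can be sketched by grouping records on a demographic field and comparing a metric per group; the customer data below is illustrative.

```python
from collections import defaultdict

# Cohort analysis sketch: break customer records into demographic groups
# and compare average spend per group. Data is illustrative.
customers = [
    {"age_band": "18-29", "spend": 120},
    {"age_band": "30-44", "spend": 340},
    {"age_band": "18-29", "spend": 180},
    {"age_band": "30-44", "spend": 260},
]

cohorts = defaultdict(list)
for c in customers:
    cohorts[c["age_band"]].append(c["spend"])

avg_spend = {band: sum(v) / len(v) for band, v in cohorts.items()}
print(avg_spend)   # {'18-29': 150.0, '30-44': 300.0}
```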
Monte Carlo Simulations:
Models the probability of different outcomes happening. They're often used
for risk mitigation and loss prevention.
These simulations incorporate multiple values and variables and often have
greater forecasting capabilities than other data analytics approaches.
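A minimal Monte Carlo simulation in the spirit of the description above: estimating the probability that demand exceeds capacity under an assumed (illustrative) demand model.

```python
import random

# Monte Carlo sketch: estimate the probability that monthly demand exceeds
# capacity. The normal demand model (mean 950, sd 80) is an assumption.
random.seed(42)                     # reproducible runs

CAPACITY = 1000
TRIALS = 100_000

exceed = sum(1 for _ in range(TRIALS)
             if random.gauss(950, 80) > CAPACITY)
p_exceed = exceed / TRIALS
print(p_exceed)   # close to the analytic value P(Z > 0.625), about 0.266
```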
Time Series Analysis: Tracks data over time, establishing the relationship
between the value of a data point and when it occurs.
This data analysis technique is usually used to spot cyclical trends or to
project financial forecasts.
Importance of Data Analytics
Structured Versus Unstructured Data
What Is Structured Data?
Structured data embodies organization and orderliness, residing within
databases in defined formats.
Think of it as data arranged neatly in rows and columns, facilitating easy
processing and analysis.
This structured format allows for swift retrieval and utilization,
contributing significantly to the efficiency of data-driven operations.
Main Characteristics of Structured Data
Organization and Format
Structured data follows a defined and organized format, often residing in fixed
fields within a record or database.
It is organized in a way that makes it easily searchable, categorized, and stored
within specific data structures.
Relational Integrity
Structured data often maintains relationships between different data points or
entities.
This relational integrity ensures consistency and coherence within the dataset,
preventing inconsistencies or conflicting information.
Ease of Querying and Analysis
Due to its organized nature, structured data is easily accessible for querying and
analysis. This characteristic allows for quick retrieval of specific information,
facilitating efficient data-driven decision-making processes.
Simplified Processing
Processing structured data is comparatively straightforward because of its
predefined format and organization.
Examples of Structured Data
Structured data is used in multiple consumer-oriented business databases or
ERPs, such as:
E-commerce: Review data, pricing data, and SKU number of commodities
Healthcare: hospital administration, pharmacy, and patient data and
medical history of patients.
Banking: Financial transaction details like name of beneficiary, account
details, sender or receiver information and bank details
Customer relationship management (CRM) software: lead acquisition
data, source, activity and so on of leads in the CRM database.
Travel industry: Passenger data, flight information, and travel transactions.
Challenges of Structured Data
Most structured data issues highlight its inflexibility and rigidity in scaling larger
database schemas.
Structured data is "schema on write" or "heavily dependent on schema" for operations.
As structured data is schema dependent, it is a little difficult to scale it for large
databases.
The time needed to load structured data is sometimes underestimated. Identifying
hidden problems in the source system and updating, retrieving, and restoring it can eat
into your cloud storage.
Doesn't cope well with the changing business scenario. It is hard to determine which
query would result in a specific business outcome.
The nature of queries and transactions change as a business shifts its consumer focus.
Structured data is manually entered into the database management system.
The user has to type in SQL commands, such as CREATE (a DDL command) or INSERT
and SELECT (DML commands), to define, manage, and retrieve data from the system.
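The schema-on-write behaviour described above can be sketched with Python's built-in sqlite3 module: the schema must be created before rows can be inserted and queried. Table and column names are illustrative.

```python
import sqlite3

# Schema-on-write in miniature: structured data must fit a predefined
# schema (CREATE) before rows are written (INSERT) and queried (SELECT).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        beneficiary TEXT NOT NULL,
        amount REAL NOT NULL
    )
""")
conn.execute("INSERT INTO transactions (beneficiary, amount) VALUES (?, ?)",
             ("Acme Corp", 250.0))
conn.execute("INSERT INTO transactions (beneficiary, amount) VALUES (?, ?)",
             ("Widget Ltd", 99.5))

total = conn.execute("SELECT SUM(amount) FROM transactions").fetchone()[0]
print(total)   # 349.5
conn.close()
```

A row that does not match the declared columns and types is rejected, which is exactly the rigidity (and the integrity guarantee) discussed above.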
Tools of Structured Data
Unstructured Data
Unstructured data is often categorized as qualitative and cannot be
processed and analyzed using conventional data tools and methods.
It is also known as "schema independent" or "schema on read" data.
Examples of unstructured data include text, video files, audio files, mobile
activity, social media posts, satellite imagery, surveillance imagery, etc.
Unstructured data is difficult to deconstruct because it has no predefined
data model, meaning it cannot be organized in relational databases.
Instead, non-relational or NoSQL databases are the best fit for
managing unstructured data.
Another way to manage unstructured data is to have it flow into a data lake
or pool, allowing it to be in its raw, unstructured format.
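The schema-on-read idea above can be sketched in Python: raw, heterogeneous documents are kept as-is (as they would be in a data lake or document store), and structure is imposed only when they are read. The documents below are illustrative.

```python
import json

# Schema-on-read sketch: heterogeneous raw documents stored unchanged;
# a schema is imposed only at read time. Documents are illustrative.
raw_lake = [
    '{"user": "a1", "text": "great product", "likes": 12}',
    '{"user": "b2", "video_id": "v99", "duration_s": 133}',
    '{"user": "a1", "text": "arrived late"}',
]

# Impose a schema at read time: keep only documents that carry text.
posts = [json.loads(doc) for doc in raw_lake]
texts = [p["text"] for p in posts if "text" in p]
print(texts)   # ['great product', 'arrived late']
```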
Example of Unstructured Data
For example, data mining techniques applied to unstructured data from a
retail website can help companies learn customer buying habits and timing,
purchase patterns, sentiment toward a specific product, and much more.
Unstructured data is also key for predictive analytics software.
For example, sensor data attached to industrial machinery can alert
manufacturers of strange activity ahead of time.
Rich media: social media, entertainment, surveillance, satellite and geospatial
data, weather forecasting, audio, podcasts
Documents: invoices, records, web history, emails, productivity applications
Internet of Things: sensor data, ticker data
Analytics: machine learning, artificial intelligence (AI)
Challenges of unstructured data
Unstructured data is not the easiest to understand.
Users require a proficient background in data science and machine learning
to prepare, analyze and integrate it with machine learning algorithms.
Unstructured data often resides on less secure, unencrypted shared servers,
which are more prone to ransomware and cyber attacks.
Currently, there aren't many tools that can manipulate unstructured data
apart from cloud commodity servers and open-source NoSQL DBMS.
Tools of unstructured data
Comparing Structured Vs. Unstructured
Data
Three states of data
When it comes to protecting confidential information, we find that clients require
different approaches or pose different protection needs.
Some clients need to protect the information on their mobile computers or laptops in
case they are lost.
Others want to keep their documentation protected on file servers so that it is
protected even from improper access by IT staff.
Protecting the three states of data
The three states of data
We can consider three states for information or data:
Data at rest: By this term we mean data that is not being accessed and is stored on
a physical or logical medium.
Examples may be files stored on file servers, records in databases, documents on
flash drives, hard disks etc.
Protecting the three states of data
Data in transit: Data that travels through an email, web, collaborative work
applications such as Slack or Microsoft Teams, instant messaging, or any type of
private or public communication channel.
It’s information that is traveling from one point to another.
Data in use: Data that is opened by one or more applications for processing, or
is being consumed or accessed by users.
Protection of Data at Rest
Documentation is considered secure at rest when it is encrypted (so that
decrypting it by brute force would take an unworkable amount of time), the
encryption key is not present on the same storage medium, and the key is of
sufficient length and randomness to be immune to a dictionary attack.
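A back-of-the-envelope check of the "unworkable brute-force time" claim above, under an assumed (deliberately generous) attacker guess rate:

```python
# Expected years to search half a keyspace at an assumed guess rate.
GUESSES_PER_SECOND = 1e12          # assumed, very generous attacker
SECONDS_PER_YEAR = 3600 * 24 * 365

def years_to_break(key_bits: int) -> float:
    keyspace = 2 ** key_bits
    return (keyspace / 2) / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(years_to_break(56))    # a 56-bit (DES-era) key falls in about 10 hours
print(years_to_break(128))   # a 128-bit key takes on the order of 5e18 years
```

This is why key length matters: each extra bit doubles the attacker's work, so modern 128-bit and 256-bit keys are out of brute-force reach.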
In this area we find different data protection technologies.
For example:
Full disk or device encryption
File-level encryption
Database encryption
Protection through Information Rights Management (IRM)
MDM (Mobile Device Management)
DLP (Data Leak Prevention)
CASB (Cloud Access Security Brokers)
https://www.sealpath.com/blog/protecting-the-three-states-of-data/
Challenges of Data at Rest Protection
Today’s IT departments are faced with numerous challenges when it comes to protecting idle
documentation:
Data can be stored on many different media and devices
Data scattered across mobile devices
Inability to control cloud storage
Need to comply with different data protection regulations
To overcome these challenges, IT Departments must analyze the main risks they face
regarding the management of their data at rest and select the technology or technologies
prioritizing those that will eliminate or mitigate those most likely and/or of greatest impact to
their organization.
Protection of Data in Transit
We are in the age of digital collaboration and there are now plenty of ways to share
our data with others.
One of the most widely used has traditionally been email. Email had over 3.9
billion users in 2020 (Statista), a figure expected to grow to 4.3 billion by
2023.
However, we also move data through collaborative work platforms such as Slack
or Microsoft Teams, and through cloud storage applications such as Box,
OneDrive, Dropbox, etc.
Among the different technologies to protect data in transit are the following:
Email encryption
Managed File Transfer (MFT)
DLP (Data Leak Prevention)
CASB (Cloud Access Security Brokers)
In-transit protection with digital rights
Challenges of Data Protection in Transit
There are countless means and channels of communication:
A vast number of cloud applications to protect
Impossibility of maintaining control at the receiving end
Difficulty determining what should and should not be protected
Protection of Data in Use
Data is in use when it is accessed by an application for processing. Normally,
behind the application there is a user who wants to access the data to view it,
change it, and so on.
In this state, the data is more vulnerable, in the sense that in order to see it, the
user must have been able to access the content decrypted (in the case that it was
encrypted).
To protect the data in use, controls should normally be put in place “before”
accessing the content.
For example:
Identity management tools
Conditional Access or Role Based Access Control (RBAC) tools
Through digital rights protection or IRM
Protection of Data in Use
Challenges of Protecting Data in Use
Most of the tools that control access to data do so before allowing access, but
once validated, as we said above, it is more complex to control what can be done
with the data.
Even if we limit permissions on the documentation, once it is displayed to the
user in an application or viewer, they can always take a picture of it,
although we can mitigate this through dynamic watermarks on the open document.
Challenges to IoT Analytics
IoT is bringing more and more devices (things) into the digital fold every day,
which will likely make IoT a multi-trillion-dollar industry in the near future.
The rapid evolution of the IoT market has caused an explosion in the number and
variety of IoT solutions, and there are many hurdles and challenges facing a
truly reliable IoT model.
The main challenges facing IoT analytics include: data structures, combining
multiple data formats, the need to balance scale and speed, analytics at the
edge, and IoT analytics and AI.
Challenges to IoT Analytics
Data structures
Most sensors send out data with a time stamp.
Static alerts based on thresholds are a good starting point for analyzing this
data, but they cannot help us advance to the diagnostic, predictive, or
prescriptive phases.
There may be relationships between data pieces collected at specific intervals
of time. In other words, these are classic time series challenges.
Combining Multiple Data Formats
While there are established techniques and processes for handling time series
data, the insights that really matter cannot come from sensor data alone.
There are usually strong correlations between sensor data and other unstructured
data.
For example, a series of control unit fault codes may result in a specific service
action that is recorded by a mechanic
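The correlation described above, between control-unit fault codes and the service actions logged shortly afterwards, can be sketched as a windowed join; the timestamps, codes, and notes below are illustrative.

```python
# Combining formats sketch: join time-stamped sensor fault codes with
# the mechanic's free-text service records. Data is illustrative.
fault_codes = [
    {"t": 100, "code": "P0301"},
    {"t": 250, "code": "P0420"},
]
service_notes = [
    {"t": 130, "note": "replaced ignition coil"},
    {"t": 300, "note": "fitted new catalytic converter"},
]

# Correlate each fault with service actions within 60 time units after it.
WINDOW = 60
pairs = [(f["code"], s["note"])
         for f in fault_codes
         for s in service_notes
         if 0 <= s["t"] - f["t"] <= WINDOW]
print(pairs)
```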
Challenges to IoT Analytics
The Need to Balance Scale and Speed
Most of the serious analysis for IoT will happen in the cloud, a data center, or
more likely a hybrid cloud and server-based environment.
That is because, despite the elasticity and scalability of the cloud, it may not be
suited for scenarios requiring large amounts of data to be processed in real time.
For example, moving 1 terabyte over a 10 Gbps network takes about 13 minutes,
which is fine for batch processing and management of historical data but is not
practical for analyzing real-time event streams. A recent example is data
transmitted by autonomous cars, especially in critical situations that require
a split-second decision.
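The 13-minute figure above checks out arithmetically:

```python
# Verifying the transfer-time figure: 1 terabyte over a 10 Gbps link.
TB_BITS = 1e12 * 8           # 1 decimal terabyte = 8e12 bits
LINK_BPS = 10e9              # 10 gigabits per second

seconds = TB_BITS / LINK_BPS
minutes = seconds / 60
print(minutes)               # about 13.3 minutes
```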
Challenges to IoT Analytics
IoT Analytics at the Edge
IoT sensors, devices and gateways are distributed across different manufacturing
floors, homes, retail stores, and farm fields, to name just a few locations.
The main difference is that edge analytics applications need to run on edge devices
that may have storage, processing power, or communication limitations.
These applications are optimized to work within these limits.
Challenges to IoT Analytics
IoT Analytics at the Edge
This is particularly true for large IoT deployments where billions of events may
stream through each second, but systems only need to know an average over time or
be alerted when trends fall outside established parameters.
The answer is to conduct some analytics on IoT devices or gateways at the edge
and send aggregated results to the central system.
Through such edge analytics, organizations can ensure the timely detection of
important trends or aberrations while significantly reducing network traffic to improve
performance.
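The edge-aggregation pattern above can be sketched as follows; the thresholds, batch size, and readings are illustrative. Only aggregates and out-of-range alerts cross the network, not every reading.

```python
# Edge-analytics sketch: the edge node forwards (a) periodic averages and
# (b) alerts for readings outside established parameters, instead of
# forwarding every raw reading. Values are illustrative.
LOW, HIGH = 10.0, 30.0      # established parameters
BATCH = 5                   # readings per aggregate

readings = [21.0, 22.5, 19.8, 35.2, 20.1, 21.7, 22.0, 20.9, 21.3, 21.5]

messages = []               # what actually crosses the network
window = []
for r in readings:
    if not (LOW <= r <= HIGH):
        messages.append(("ALERT", r))
    window.append(r)
    if len(window) == BATCH:
        messages.append(("AVG", sum(window) / len(window)))
        window = []

print(len(readings), "readings ->", len(messages), "messages")
print(messages)             # one alert (35.2) plus two batch averages
```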
Challenges to IoT Analytics
IoT Analytics and AI
The greatest—and as yet largely untapped—power of IoT analysis is to go beyond
reacting to issues and opportunities in real time and instead prepare for them
beforehand.
That is why prediction is central to many IoT analytics strategies, whether to
project demand, anticipate maintenance, detect fraud, predict churn, or segment
customers.
Artificial intelligence builds on and improves current statistical models for
handling prediction.
Introduction to IoT Cloud Platforms
IoT cloud platforms do the task of bringing together the capabilities of IoT
devices and cloud platforms to perform end-to-end service.
An IoT device has multiple sensors and is connected to the cloud via gateways.
Many devices are connected to the internet, and the big data they generate is
processed and delivered to multiple applications.
IoT cloud services are delivered in three models: SaaS (software as a
service), PaaS (platform as a service), and IaaS (infrastructure as a service).
It is built on top of the other generic clouds such as Microsoft, Amazon, Google,
etc.
IoT cloud platforms store and analyze data, processing it across the cloud and
connected devices.
Introduction to IoT Cloud Platforms
1. AWS IoT
2. Microsoft Azure IoT Hub
3. Salesforce IoT
4. Google Cloud IoT
5. IBM Watson IoT Platform
6. Oracle Integrated Cloud for IoT
Why Cloud Computing is Crucial for Large Scale IoT Solutions
Over the last few years, we have witnessed a tremendous increase in IoT products
worldwide. IoT involves many devices that are connected over the internet and
are used to carry out processes and services that enhance human living.
Integration of Cloud Computing into IoT
In reality, cloud computing and IoT are tightly coupled. The growth and
development of IoT and related technologies are mainly dependent on the
availability of cloud services.
Roles of Cloud computing in IoT
1. Provides Remote Services
Cloud computing provides IoT devices with services such as processing power,
applications, and data storage.
The IoT devices can access these services remotely from any place on the planet as
long as there is internet access.
This relieves the IoT devices from having to depend on on-premise infrastructure.
The cloud offers services in three delivery models, that is, infrastructure as a
service (IaaS), platform as a service (PaaS), and software as a service (SaaS).
Roles of Cloud computing in IoT
2. Allow Scalability
Hosting your application on the cloud gives an unlimited room for scalability,
which cannot be provided by the on-premise infrastructure.
Scaling on the on-premise infrastructure may be very expensive as it would require
buying more hardware, increased configurations, and more deployment time.
Scaling on the cloud, by contrast, is less expensive, as it just involves
leasing more storage space.
The cloud also offers flexibility, enabling you to scale up or down the number of
IoT devices and applications that you can use.
Roles of Cloud computing in IoT
3. Provide Security
IoT devices collect all types of data, including sensitive data such as
health, financial and personally identifiable information (PII).
This data requires protection from privacy and integrity breaches by
malicious actors.
Cloud computing provides a secure storage environment for this data
which is monitored all the time.
The cloud also ensures regular updates to their platforms, firmware, and
applications to eliminate known vulnerabilities.
Roles of Cloud computing in IoT
4. Allow Collaboration
Cloud computing has broad network access and thus connects many IoT
developers, enabling them to collaborate with ease.
The developers use the cloud IoT platforms in building IoT apps and can work
with others in remote locations on one project.
Collaboration ensures timely project delivery and quality products. Additionally,
applications can share data over cloud platforms
Cloud computing in IoT
The Relationship between Internet of Things and Big
Data
The Internet of Things is an opportunity to streamline operations in many sectors
to enable interaction between machines and humans (M2H), and devices and
machines (M2M).
In most cases, sensor-generated data is fed to the big data system for
analysis, and final reports are generated from it.
Hence, this is the main point of inter-relation between the two technologies.
Convergence of Internet of Things, Big Data and Cloud
Computing
Here Cloud computing plays the role of a common workplace for IoT and big data
where IoT is the source of data and big data as a technology is the analytic platform
of the data.
IoT generates an enormous amount of data, which feeds big data systems.
Reducing the complexity of data blending in IoT is one of the criteria for
maximizing its benefits.
The concept behind this is that if IoT applications and data operate in silos,
we will not get the full potential out of them.
To get better insights and to make decisions, blending information (data) from
various sources is the best way.
Hence, for the two points mentioned above, we see a clear need to embrace
cloud-based systems for both IoT and big data.
This shifts the focus from product orientation to information-based outcome
orientation.
THANK YOU
References
https://www.bbvaopenmind.com/en/technology/digital-world/five-challenges-to-iot-analytics-success/
https://www.slideshare.net/slideshow/internet-of-things-with-cloud-computing-and-m2m-communication/66410016
https://www.scribd.com/document/557370040/unit2-Data-acq-store