Chapter 4 Big Data
Characteristics
1. Radio Frequency (RF) transmission: This is the most widely used wireless transmission technology and is used in applications such as Wi-Fi networks, Bluetooth devices, and mobile phones. RF transmission uses radio waves in the range of 3 kHz to 300 GHz to transmit data over short or long distances.
2. Infrared (IR) transmission: This technology uses infrared radiation to transmit signals between devices. IR transmission is commonly used in remote controls for televisions, DVD players, and other devices.
3. Bluetooth: Bluetooth is a short-range wireless transmission technology that uses low-power radio waves to connect devices such as smartphones, laptops, and speakers. Bluetooth operates in the 2.4 GHz frequency band.
4. Near Field Communication (NFC): NFC is a short-range wireless transmission technology that allows two devices to communicate with each other when they are in close proximity. NFC is commonly used in contactless payment systems and access control systems.
5. Satellite communication: This technology uses satellites in orbit around the Earth to transmit signals over long distances. Satellite communication is commonly used for television and radio broadcasting, GPS navigation, and military communications.
Data Science
Data science is the process of examining data sets to draw conclusions from the information they contain, increasingly with the aid of specialized systems and software, using scientific techniques, models, theories, and hypotheses.
Data science is now a widely accepted idea in both academia and industry. It is an intersection of programming, analytical, and business skills that allows meaningful insights to be extracted from data to benefit business growth. These three pillars have been the mainstay of data science ever since businesses started embracing it over the past two decades, and they should remain so in the future. Beyond business, data science is also used in social research, scientific and space programs, government planning, and so on.
Business acumen in its purest form means understanding how a business enterprise is run. Any business exists to sell its products or services for a profit, incurs costs in doing so, and generally has functions such as HR, supply chain, finance, and sales and marketing to support it.
Data Analytics
The Analytics Advancement Model helps define, identify, and illustrate what these types of analysis mean. The model arranges four possible types of analysis along two dimensions: the complexity of the analysis and the volume of analysis, where volume means how often the analysis is performed. There is no apparent relationship between volume and complexity.
+ Descriptive analysis is the first step in any analytical problem-solving project and is the simplest to perform on the analysis ladder of knowledge. As a foundational analysis, it aims to answer the question “What happened?”
+ Diagnostic analysis delves a little deeper to answer the question “Why did it happen?” and helps discover historical context through data. Continuing with the previous context, an example is the question “How effective was a promotional campaign, based on the response in different geographies?” This type of analysis can help identify causal relationships and anomalies in the data.
+ Predictive analysis is a little more complicated than the previous two and answers “What can happen?”, meaning it looks into the future. The results of a predictive analysis should be treated as an estimate of the chance, or probability, of an event occurring. It is widely used; a few examples: What will the sales volume be for the next time period? What is the propensity to buy for a new product release? Should I offer a loan to a particular applicant or not? This form of analysis uses knowledge and patterns from historical data to predict the future, and in the uncertain world that businesses operate in, it is a very powerful tool for planning ahead (a minimal sketch follows this list).
+ Prescriptive analysis sits almost at the other end of the ladder, answering the question “How can we make it happen?” Businesses need advice on which future course of action to take from all the available alternatives based on potential return, and prescriptive analysis provides it. For example, to achieve a specific sales outcome, it can suggest an alternative mix of investments in various types of promotions or advertising media. This will be discussed in more depth later, with applications in supply chain, sales and marketing, and HR functions.
Process of Descriptive Analytics
• Define business metrics: Determine which metrics are important for evaluating performance against business goals. Goals include increasing revenue, reducing costs, improving operational efficiency, and measuring productivity. Each goal must have associated key performance indicators (KPIs) to help monitor achievement.
• Identify data required: Data are located in many different sources within the enterprise, including systems of
record, databases, desktops and shadow IT repositories. To measure data accurately against KPIs, companies must
catalog and prepare the correct data sources to extract the needed data and calculate metrics based on the current
state of the business.
• Extract and prepare data: Data must be prepared for analysis. Deduplication, transformation and cleansing are a
few examples of the data preparation steps that need to occur before analysis. This is often the most time-
consuming and labor-intensive step, requiring up to 80% of an analyst’s time, but it is critical for ensuring accuracy.
• Analyze data: Data analysts can create models and run analyses such as summary statistics, clustering and
regression analysis on the data to determine patterns and measure performance. Key metrics are calculated and
compared with stated business goals to evaluate performance based on historical results. Data scientists often use
open source tools such as R and Python to programmatically analyze and visualize data (a minimal sketch follows this list).
• Present data: Results of the analytics are usually presented to stakeholders in the form of charts and graphs. This
is where data visualization comes into play. Business intelligence tools give users the ability to present data visually
in a way that non-data analysts can understand. Many self-service data visualization tools also enable business
users to create their own visualizations and manipulate the output.
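As a minimal descriptive-analytics sketch in Python, the snippet below computes summary statistics and a simple KPI from a small table of invented sales records; the column names and figures are hypothetical.

```python
# Descriptive-analytics sketch: summary statistics and a KPI with pandas.
# The sales records below are hypothetical, for illustration only.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "South", "North", "South"],
    "revenue": [12000, 9500, 14300, 8700],
    "cost":    [8000, 7100, 9400, 6900],
})

# Summary statistics answer "What happened?".
print(sales[["revenue", "cost"]].describe())

# Example KPI: average gross margin per region, which can then be
# compared with stated business goals.
sales["margin"] = (sales["revenue"] - sales["cost"]) / sales["revenue"]
print(sales.groupby("region")["margin"].mean())
```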
Diagnostic Analytics
At this stage, historical data can be measured against other data to answer the question of why something happened. Diagnostic analytics makes it possible to drill down into the data, find dependencies, and identify patterns. Companies opt for diagnostic analytics because it gives deep insight into a particular problem. At the same time, a company should have detailed information at its disposal; otherwise, data collection may turn out to be time-consuming and specific to every single issue.
Let’s take another look at examples from different industries: a healthcare provider compares patients’ responses to a promotional campaign in different regions; a retailer drills sales down into subcategories. Another flashback to our BI projects: in the healthcare industry, customer segmentation coupled with several filters (such as diagnoses and prescribed medications) allowed measuring the risk of hospitalization.
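A drill-down of the retailer example might look like the following minimal pandas sketch; the regions, subcategories, and revenue figures are invented for illustration.

```python
# Diagnostic-analytics sketch: drilling revenue down by region and
# subcategory to ask "Why did it happen?". All data is hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region":      ["North", "North", "South", "South"],
    "subcategory": ["A", "B", "A", "B"],
    "revenue":     [5000, 9300, 2100, 6600],
})

# The pivot exposes which subcategory drives the gap between regions.
drill = sales.pivot_table(index="region", columns="subcategory",
                          values="revenue", aggfunc="sum")
print(drill)
```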
Process of Predictive Analysis
4. Statistics: Statistical analysis validates the
assumptions and hypotheses and tests them using
standard statistical models.
5. Modeling: Predictive modeling provides the
ability to automatically create accurate predictive
models about the future. There are also options to
choose the best solution with multi-model
evaluation.
6. Deployment: Predictive model deployment
provides the option to deploy the analytical results
in the everyday decision-making process to obtain
results, reports and output by automating the
decisions based on the modeling.
7. Model monitoring: Models are managed and
monitored to review the model performance to
ensure they are providing the results expected.
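Step 5’s multi-model evaluation can be sketched briefly: the snippet below compares two candidate models with cross-validation on a synthetic data set and keeps the better one. The models, data, and scoring choice are illustrative assumptions, not a prescription.

```python
# Multi-model evaluation sketch: score candidate predictive models with
# cross-validation and pick the best. The data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
}

# The winning model would then move on to deployment (step 6) and be
# watched through model monitoring (step 7).
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)
```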
Process of Prescriptive Analysis
• Build a business case: Prescriptive analytics are best used when data-driven decision-making goes
beyond human capabilities, such as when there are too many input variables, or data volumes are
high. A business case will help identify whether machine-generated recommendations are
appropriate and trustworthy.
• Define rules: Prescriptive analytics require rules to be codified that can be applied to generate
recommendations. Business rules thus need to be identified and actions defined for each possible
outcome. Rules are decisions that are programmatically implemented in software (a minimal sketch follows this list). The system
receives and analyzes data, then prescribes the next best course of action based on predetermined
parameters. Prescriptive models can be very complex to implement. Appropriate analytic techniques
need to be applied to ensure that all possible outcomes are considered to prevent missteps. This
includes the application of optimization and other analytic techniques in conjunction with rules
management.
• Test, Test, Test: As the intent of prescriptive analytics is to automate the decision-making process,
testing the models to ensure that they are providing meaningful recommendations is imperative to
prevent costly mistakes.
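A minimal sketch of the “define rules” step above: codified business rules, expressed as a plain function, that map analyzed data to a recommended next action. The thresholds, inputs, and action names are hypothetical.

```python
# Prescriptive-analytics sketch: business rules codified in software.
# All thresholds and action names are hypothetical, for illustration.
def next_best_action(churn_probability: float, customer_value: float) -> str:
    # The system receives analyzed data, then prescribes the next best
    # course of action based on predetermined parameters.
    if churn_probability > 0.7 and customer_value > 1000:
        return "offer retention discount"
    if churn_probability > 0.7:
        return "send re-engagement email"
    return "no action"

print(next_best_action(churn_probability=0.85, customer_value=2500))
```

In practice, such rules are combined with optimization and other analytic techniques, and, as the last step stresses, tested heavily before any decision is automated.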
Definition
Big Data is defined as a set of tools and platforms used to store, process, and analyze data to identify business insights that were not attainable with traditional data processing and management technologies, owing to their limitations. Big Data is also viewed as a technology for processing huge datasets on distributed, scalable platforms.
Big Data Applications in Industry 4.0
Data that cannot be stored and processed on commodity hardware, i.e., data larger than about one terabyte, is called Big Data; the storage and processing capacity of commodity computing hardware is limited to roughly one terabyte.
The 7 Vs of Big Data:
- Volume: Volume represents the amount of data, which is growing at an exponential rate, i.e. into petabytes and exabytes.
- Velocity: Velocity refers to the speed at which data grows, which is very fast: today, yesterday’s data is considered stale. Social media is currently a big contributor to this growth.
- Variety: Variety refers to the heterogeneity of data types. In other words, the data collected comes in many formats, such as video, audio, CSV, etc., and these different formats represent many types of data.
- Veracity: Veracity refers to the doubtfulness or uncertainty of the available data, caused by inconsistency and incompleteness. Available data can sometimes be messy and hard to trust, and with many forms of big data, quality and accuracy are difficult to control. Volume is often the reason behind this lack of quality and accuracy.
- Validity: The fifth V denotes the validity of data, which is essential in business for identifying valid data patterns when planning business strategies.
- Virality: The sixth V denotes the virality of data, generally used to measure the reach of data.
- Value: It is all well and good to have access to big data, but it is of little use unless we can turn it into value.
Example:
The characteristics of Big Data for the coronavirus pandemic are mapped below.
+ Volume: A huge volume of data is generated every hour relating to affected patients, illness conditions, precautionary measures, diagnoses, and hospital facilities.
+ Velocity: Information about affected people and the ill effects of COVID-19 is streaming in nature and evolves dynamically.
+ Variety: A huge volume of COVID-19 data is accumulated as structured data in patient databases, citizen demographics, clinical diagnoses, travel data, genomic studies, and drug targets. Unstructured COVID-19 data is voluminous on social media platforms such as Twitter, Facebook, and WhatsApp, where preventive measures are shared as text, audio, video, and related chats.
+ Veracity and Virality: The information on preventive cure mechanisms mentioned on social media platforms is inconsistent and viral, leading to uncertainty among people.
+ Validity and Value: Measuring the validity and the value of the content available across the digital globe about the pandemic has become a challenge.
To build a big data pipeline for a manufacturing process, you can follow these steps:
1. Data Collection: Collect data from various sources such as sensors, machines,
production systems, and databases. This data can include production data,
machine performance data, quality control data, and supply chain data. When
using big data in production, the data integration process is critical to ensure that
the data is correctly formatted and usable by big data technologies.
2. Data Integration: Integrate the data from different sources into a centralized
repository such as a Hadoop cluster or a data lake.
Step 1: Collect data from different sources
Step 2: Standardize data
Step 3: Integrate data
Step 4: Ensure data consistency and format
Step 5: Store data
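The integration steps above might look like the following minimal pandas sketch, which merges machine logs with quality-control checks, standardizes timestamps, and stores the result in a columnar format for a data lake. The file names and column names are hypothetical.

```python
# Data collection and integration sketch: combine two hypothetical
# sources into one standardized table and store it for the data lake.
import pandas as pd

machine = pd.read_csv("machine_log.csv")     # e.g. timestamp, machine_id, temp
quality = pd.read_csv("quality_checks.csv")  # e.g. timestamp, machine_id, passed

# Step 2: standardize -- consistent timestamp format across sources.
for df in (machine, quality):
    df["timestamp"] = pd.to_datetime(df["timestamp"])

# Steps 3-4: integrate on shared keys, in one consistent format.
merged = machine.merge(quality, on=["timestamp", "machine_id"], how="left")

# Step 5: store in a columnar format commonly used in data lakes.
merged.to_parquet("data_lake/production.parquet")
```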
3. Data Processing:
- Data processing converts source data into useful and reliable information. It includes activities such as collecting, storing, organizing, classifying, calculating, analyzing, and presenting information in reports, charts, or other formats.
- Data processing can be performed using various means and tools, including data processing software, database systems, data query tools, algorithms, and techniques for processing different kinds of data. The purpose of data processing is to help managers, researchers, and other organizations find useful information in the data and make the right decisions.
- Use big data technologies such as Apache Spark or Apache Flink to process and clean the data and to identify patterns and anomalies.
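As a minimal processing sketch with Apache Spark (via PySpark), the snippet below cleans the integrated table and flags simple outliers. The path, column names, and the three-sigma anomaly rule are illustrative assumptions.

```python
# Data-processing sketch with PySpark: clean sensor data and flag
# anomalies. Paths, columns, and the 3-sigma rule are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("production-processing").getOrCreate()

df = spark.read.parquet("data_lake/production.parquet")

# Clean: drop duplicates and rows with missing sensor readings.
clean = df.dropDuplicates().na.drop(subset=["temp"])

# Identify anomalies: readings more than 3 standard deviations from the mean.
stats = clean.agg(F.mean("temp").alias("mu"),
                  F.stddev("temp").alias("sd")).first()
flagged = clean.withColumn(
    "anomaly", F.abs(F.col("temp") - stats["mu"]) > 3 * stats["sd"])

flagged.write.mode("overwrite").parquet("data_lake/production_clean.parquet")
```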
4. Data Analysis:
Data analysis can be applied to many different fields, such as business, science, health, education, and politics. Each field has its own goals and methods of data analysis, but in general, data analysis aims to solve specific problems using existing or newly collected data.
Analyze the data using tools such as Apache Hive, Apache Impala, or Apache
Drill to gain insights into the manufacturing process and make data-driven
decisions.
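The kind of SQL-style analysis that Hive, Impala, or Drill would run can be sketched through Spark SQL so that the example stays self-contained; the table, columns, and defect-rate query are hypothetical.

```python
# Data-analysis sketch: a SQL query of the kind Hive, Impala, or Drill
# would run, expressed here via Spark SQL. Columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("production-analysis").getOrCreate()
spark.read.parquet("data_lake/production_clean.parquet") \
     .createOrReplaceTempView("production")

# Insight for data-driven decisions: defect rate per machine.
result = spark.sql("""
    SELECT machine_id,
           AVG(CASE WHEN passed THEN 0 ELSE 1 END) AS defect_rate
    FROM production
    GROUP BY machine_id
    ORDER BY defect_rate DESC
""")
result.show()
```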
5. Data Visualization:
Common types of data visualization formats: column chart, bar chart, line graph, dual-axis chart, Mekko chart, pie chart, bubble chart, scatter plot, heat map, area chart, and so on.
Visualize the results of the data analysis using tools such as Apache Zeppelin, Tableau, or Power BI.
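As a small code-based counterpart to the BI tools named above, the snippet below draws one of the listed chart types (a column chart) with matplotlib; the machines and defect rates are invented numbers.

```python
# Visualization sketch: a column chart of hypothetical per-machine
# defect rates, using matplotlib instead of a BI tool.
import matplotlib.pyplot as plt

machines = ["M1", "M2", "M3", "M4"]
defect_rate = [0.02, 0.07, 0.01, 0.04]

plt.bar(machines, defect_rate)
plt.xlabel("Machine")
plt.ylabel("Defect rate")
plt.title("Defect rate per machine")
plt.savefig("defect_rate.png")
```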
6. Data Management:
- Manage the data using tools such as Apache HBase or Apache Cassandra to ensure data consistency, reliability, and security.
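A minimal management sketch with Apache Cassandra through its Python driver (cassandra-driver); the host, keyspace, table, and columns are hypothetical and would need to exist beforehand.

```python
# Data-management sketch: write a processed reading into Cassandra.
# Host, keyspace, table, and columns are hypothetical.
from datetime import datetime
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("manufacturing")  # hypothetical keyspace

# Cassandra replicates the row across nodes for reliability.
session.execute(
    "INSERT INTO readings (machine_id, ts, temp) VALUES (%s, %s, %s)",
    ("M1", datetime(2024, 1, 1), 71.5),
)
cluster.shutdown()
```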