
Introduction – Importance of Data

“Data is the new oil.” Today, data is everywhere, in every field. Whether you are a data
scientist, marketer, businessman, data analyst, or researcher, or work in any other
profession, you need to work with raw or structured data. This data is so important that
it must be handled and stored properly, without error. When working with data, it is
important to know its type in order to process it and get the right results. Data falls into
two broad types, Qualitative and Quantitative, which are further classified as follows.

The data is classified into four categories:

● Nominal data
● Ordinal data
● Discrete data
● Continuous data

Nominal and Ordinal data are Qualitative; Discrete and Continuous data are Quantitative.
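These four categories can be illustrated in code. The sketch below uses pandas (a common tool choice, not one mandated by the text), with made-up column names and values; an ordered categorical models ordinal data, an unordered categorical models nominal data:

```python
import pandas as pd

# Hypothetical records covering all four categories of data.
df = pd.DataFrame({
    "hair_color":   ["Blonde", "Brown", "Black"],   # nominal
    "satisfaction": ["Low", "High", "Medium"],      # ordinal
    "num_children": [0, 2, 1],                      # discrete
    "height_cm":    [172.5, 160.2, 181.0],          # continuous
})

# Nominal: labels with no order.
df["hair_color"] = pd.Categorical(df["hair_color"])

# Ordinal: labels with a meaningful order, so comparisons are valid.
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=["Low", "Medium", "High"], ordered=True
)

# Comparing ordinal values works; doing so on nominal values would not.
print(list(df["satisfaction"] > "Low"))  # [False, True, True]
```

The discrete and continuous columns stay numeric (integer and float dtypes), which is what makes arithmetic and statistical operations valid on them.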

Business now runs on data. Most companies use data insights to create and launch
campaigns, design strategies, launch products and services, and try out new ideas.
According to one report, at least 2.5 quintillion bytes of data are produced every day.

Types of Data

Qualitative or Categorical Data

Qualitative or Categorical Data is data that cannot be measured or counted in the form
of numbers. It is sorted by category, not by number, which is why it is also known as
Categorical Data. Such data consists of audio, images, symbols, or text. The gender of a
person (male, female, or other) is an example of qualitative data.

Qualitative data captures people's perceptions. It helps market researchers understand
customers' tastes and design their ideas and strategies accordingly.

Other examples of qualitative data are:

● What language you speak
● Favorite holiday destination
● Opinion on something (agree, disagree, or neutral)
● Colors
Qualitative data is further classified into two parts:

Nominal Data

Nominal Data is used to label variables without any order or quantitative value. The
color of hair can be considered nominal data, as one color can’t be compared with
another color.

The name “nominal” comes from the Latin word “nomen,” meaning “name.” With
nominal data we cannot perform numerical operations or sort the values into any
meaningful order; the values simply fall into distinct categories.

Examples of Nominal Data:

● Colour of hair (Blonde, Red, Brown, Black, etc.)
● Marital status (Single, Widowed, Married)
● Nationality (Indian, German, American)
● Gender (Male, Female, Others)
● Eye Color (Black, Brown, etc.)
Ordinal Data

Ordinal data have a natural ordering, in which values fall in some kind of order by their
position on a scale. Such data are used for observations like customer satisfaction,
happiness, etc., but we cannot do arithmetic on them.

Ordinal data is qualitative data whose values have some kind of relative position. It can
be considered “in-between” qualitative and quantitative data: it shows a sequence but
cannot be used for arithmetic-based statistical analysis. Compared to nominal data,
ordinal data has an order that nominal data lacks.

Examples of Ordinal Data:

● Feedback, experience, or satisfaction ratings on a scale of 1 to 10
● Letter grades in an exam (A, B, C, D, etc.)
● Ranking of people in a competition (First, Second, Third, etc.)
● Economic status (High, Medium, and Low)
● Education level (Higher, Secondary, Primary)

Difference between Nominal and Ordinal Data

Nominal Data | Ordinal Data
Nominal data can't be quantified and has no intrinsic ordering | Ordinal data gives a sequential order by position on a scale
Nominal data is qualitative (categorical) data | Ordinal data sits "in-between" qualitative and quantitative data
Provides no quantitative value; no arithmetical operations can be performed | Provides a sequence; numbers can be assigned, but arithmetical operations still cannot be performed
Nominal values cannot be compared with one another | Ordinal values can be compared by ranking or ordering
Examples: eye color, housing style, gender, hair color, religion, marital status, ethnicity, etc. | Examples: economic status, customer satisfaction, education level, letter grades, etc.


Quantitative Data

Quantitative data can be expressed in numerical values, which makes it countable and
suitable for statistical analysis. This kind of data is also known as Numerical data. It
answers questions like “how much,” “how many,” and “how often.” For example, the
price of a phone, a computer's RAM, and the height or weight of a person all fall under
quantitative data.

Quantitative data can be manipulated statistically and represented on a wide variety of
graphs and charts, such as bar graphs, histograms, scatter plots, box plots, pie charts,
and line graphs.

Examples of Quantitative Data:

● Height or weight of a person or object
● Room temperature
● Scores and marks (e.g., 59, 80, 60)
● Time

Quantitative data is further classified into two parts:

Discrete Data

The term discrete means distinct or separate. Discrete data contains values that are
integers or whole numbers; the total number of students in a class is an example. Such
data cannot be broken into decimal or fractional values.

Discrete data is countable and takes finite values; subdivision is not possible. It is
represented mainly by bar graphs, number lines, or frequency tables.

Examples of Discrete Data:

● Total number of students present in a class
● Cost of a cell phone
● Number of employees in a company
● Total number of players who participated in a competition
● Days in a week
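As a small illustration of the frequency-table representation mentioned for discrete data, Python's standard `collections.Counter` can tally a discrete variable (the shoe-size values below are invented):

```python
from collections import Counter

# Discrete data: shoe sizes of a small, made-up group.
shoe_sizes = [7, 8, 8, 9, 7, 8, 10]

# A frequency table maps each distinct value to how often it occurs.
freq = Counter(shoe_sizes)
print(freq)  # Counter({8: 3, 7: 2, 9: 1, 10: 1})
```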

Continuous Data

Continuous data can take fractional values. The version of an Android phone, the height
of a person, and the length of an object are examples. Continuous data represents
information that can be divided into ever smaller levels: a continuous variable can take
any value within a range.

The key difference is that discrete data contains integers or whole numbers, while
continuous data stores fractional numbers to record measurements such as
temperature, height, width, time, and speed.

Examples of Continuous Data :

● Height of a person
● Speed of a vehicle
● “Time-taken” to finish the work
● Wi-Fi Frequency
● Market share price

Difference between Discrete and Continuous Data

Discrete Data | Continuous Data
Discrete data is countable and finite; values are whole numbers or integers | Continuous data is measurable; values are fractions or decimals
Represented mainly by bar graphs | Represented in the form of a histogram
Values cannot be divided into smaller subdivisions | Values can be divided into smaller subdivisions
Discrete data has spaces between values | Continuous data forms a continuous sequence
Examples: total students in a class, number of days in a week, shoe size, etc. | Examples: temperature of a room, weight of a person, length of an object, etc.

Conclusion

In this article, we discussed the types of data and their differences. Knowing what kind
of data you have, and how to use it, is crucial for extracting valuable output from it. It is
also important to know which kind of plot suits which data category, as this helps in
data analysis and visualization. Working with data requires good data science skills and
a deep understanding of the different types of data and how to handle them.

Different types of data are used in research, analysis, statistics, data visualization, and
data science. They help a company analyze its business, design its strategies, and build
a successful data-driven decision-making process.

Roles in Data Science
Here's a breakdown of some key data science roles:

Data Scientist:
Designs and implements data modeling processes, develops algorithms, and
performs custom analysis to solve business problems.
Data Analyst:
Focuses on collecting, cleaning, and analyzing data to identify trends and inform
business decisions.
Data Engineer:
Builds and manages the infrastructure for data storage, processing, and access,
ensuring data quality and availability.
Machine Learning Engineer:
Develops and deploys machine learning models, optimizing them for
performance and scalability.
Data Architect:
Designs and manages the overall data architecture of an organization, ensuring it
aligns with business requirements.
Business Intelligence (BI) Analyst:
Focuses on reporting, dashboard creation, and data visualization to support
business intelligence efforts.
Data Strategist:
Defines how data can serve business goals and develops data strategies to
improve business performance.
Data Product Manager:
Oversees the success of data-driven products, defining product vision and
strategy.
Database Administrator:
Manages and maintains the systems where data is stored, ensuring security,
accessibility, and data integrity, according to Syracuse University.
Statistician:
Uses statistical methods to analyze data, develop models, and interpret results.
AI Data Scientist:
Specializes in applying AI and machine learning techniques to data, including
natural language processing and generative AI.
Data munging
Data munging is the general procedure for transforming data from
erroneous or unusable forms into useful, use-case-specific ones.
Without some degree of munging, whether performed by automated
systems or specialized users, data cannot be ready for any kind of
downstream consumption.

Data Wrangling
Data wrangling, also referred to as data munging, is the process of converting
and mapping data from one raw format into another. The purpose is to
prepare the data in a way that makes it accessible for effective use further
down the line. Not all data is created equal; therefore, it's important to organize
and transform your data in a way that can be easily accessed by others.

While an activity such as data wrangling might sound like a job for someone in
the Wild West, it's an integral part of the classic data pipeline and of ensuring
data is prepared for future use. A data wrangler is the person responsible for
performing this process.
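As a minimal sketch of what "converting and mapping data from one raw format into another" can look like, the snippet below parses rows of a hypothetical pipe-delimited export into structured records (the format and field names are assumptions for illustration):

```python
# Raw rows in a hypothetical pipe-delimited export format.
raw_rows = [
    "2024-01-15|alice|42.50",
    "2024-01-16|bob|17.00",
]

def wrangle(row: str) -> dict:
    """Map one raw row into a structured, typed record."""
    date, user, amount = row.split("|")
    return {"date": date, "user": user, "amount": float(amount)}

records = [wrangle(r) for r in raw_rows]
print(records[0])  # {'date': '2024-01-15', 'user': 'alice', 'amount': 42.5}
```

Once every row shares the same structure and types, downstream consumers can work with the data uniformly.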

Benefits of Data Wrangling

Data wrangling is an essential part of preparing your data for use, and the
process yields many benefits, including:

● Enhanced ease of access to data
● Faster time to insights
● Improved efficiency in data-driven decision making
Data Cleaning
Data cleaning, also referred to as data cleansing, is the process of finding and
correcting inaccurate data in a particular data set or data source. The
primary goal is to identify and remove inconsistencies, without deleting
necessary data, in order to increase the validity of the data set.

Cleaning encompasses a multitude of activities, such as identifying duplicate
records, filling empty fields, and fixing structural errors. These tasks are crucial
for ensuring data is accurate, complete, and consistent, and they lead to fewer
errors and complications further downstream.
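The three activities named above (identifying duplicates, filling empty fields, and fixing structural errors) can be sketched with pandas; the tool choice and the toy customer records are assumptions for illustration:

```python
import pandas as pd

# Toy records exhibiting a duplicate row, an empty field,
# and a structural error (inconsistent casing of "alice").
raw = pd.DataFrame({
    "name": ["Alice", "alice", "Bob", "Carol"],
    "city": ["Delhi", "Delhi", None, "Mumbai"],
})

clean = raw.copy()
clean["name"] = clean["name"].str.title()        # fix structural errors
clean["city"] = clean["city"].fillna("Unknown")  # fill empty fields
clean = clean.drop_duplicates()                  # remove duplicate records

print(len(clean))  # 3
```

Note that fixing the casing first is what exposes "Alice"/"alice" as a true duplicate; ordering the cleaning steps matters.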

Benefits of Data Cleaning

Properly cleansing your data before use can increase operational efficiency
and leads to benefits such as:

● Elimination of errors
● Reduced costs associated with errors
● Improved integrity of data
● The highest quality of information for decision making

When comparing the benefits of each, it's clear that the goals behind data
wrangling and data cleaning are consistent with one another. Each aims to
improve the ease of working with data, making data-driven decision making
faster and more effective as a result.

What’s the Difference Between Data Wrangling and Data Cleaning?
While the methods might be similar in nature, data wrangling and data
cleaning remain very different processes. Data cleaning focuses on removing
inaccurate data from your data set whereas data wrangling focuses on
transforming the data’s format, typically by converting “raw” data into another
format more suitable for use. Data cleaning enhances the data’s accuracy and
integrity while wrangling prepares the data structurally for modeling.
Data Security Issues

Ever-increasing data presents both opportunities and challenges. While the
prospect of better analysis allows companies to make better decisions, it also brings
security issues that could land companies in trouble when working with sensitive
information. Here are some of the Big Data security challenges that companies
should mitigate:

● Big Data Security Issues: Data Storage
● Big Data Security Issues: Fake Data
● Big Data Security Issues: Data Privacy
● Big Data Security Issues: Data Management
● Big Data Security Issues: Data Access Control
● Big Data Security Issues: Data Poisoning
● Big Data Security Issues: Employee Theft

Big Data Security Issues: Data Storage

Businesses are adopting cloud data storage to move their data easily and expedite
business operations. However, the security risks involved grow just as fast: even the
slightest mistake in controlling access can expose a host of sensitive data. As a
result, big tech companies embrace both on-premise and cloud data storage to
obtain security as well as flexibility.

While mission-critical information can be stored in on-premise databases, less
sensitive data is kept in the cloud for ease of use. However, to implement security
policies in on-premise databases, companies require cybersecurity experts.
Although this increases the cost of managing data on-premise, companies must
not take security risks for granted by storing all data in the cloud.

Big Data Security Issues: Fake Data

Fake data generation poses a severe threat to businesses, as it consumes time
that could otherwise be spent identifying or solving other pressing issues. There is
ample scope for leveraging inaccurate information at a very large scale, because
assessing individual data points is a daunting task for companies.

False flags raised by fake data can also drive unnecessary actions that lower
production or disrupt other critical business processes. To avoid this, companies
should be critical of the data they use to enhance business processes. An ideal
approach is to validate data sources through periodic assessments and to evaluate
machine learning models with diverse test datasets to find anomalies.

Big Data Security Issues: Data Privacy

Data privacy is a big challenge in the digital world. It aims to safeguard personal or
sensitive information from cyberattacks, breaches, and intentional or unintentional
data loss. Businesses must follow strict data privacy principles, with the help of
access management services in the cloud and rigid privacy compliance, to
strengthen data protection. It is best to follow a few general rules alongside
implementing one or more data security technologies: know your data, keep a firm
grip on your data stores and backups, safeguard your network against unauthorized
access, conduct regular risk assessments, and train users regularly about data
privacy and security.

Big Data Security Issues: Data Management

A security breach can have crushing consequences for a business, from exposure
of critical business information to a completely compromised database. Deploying
highly secured databases is vital to ensure data security at all levels, and a superior
database management system comes with various access controls. While it is
advisable to follow rigorous physical security practices, it is even more essential to
apply extensive software-based security measures to safeguard data storage.
Effective methods include data encryption, data segmenting and partitioning,
securing data on the move, and using a trusted server. In addition, security tools
can integrate with databases to automatically monitor data sharing and notify
businesses when data has been compromised.
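One lightweight form of the automatic monitoring described above is integrity checking with cryptographic hashes: store a digest of each record and raise an alert when the stored data no longer matches it. A hedged sketch using Python's standard `hashlib` (the record format is invented):

```python
import hashlib

# Baseline: digest of the record as originally stored.
record = b"account=1234;balance=500"
baseline = hashlib.sha256(record).hexdigest()

# Later, re-hash what is actually in storage and compare.
stored_now = b"account=1234;balance=9999"  # tampered copy
if hashlib.sha256(stored_now).hexdigest() != baseline:
    print("alert: record does not match its baseline digest")
```

Any single-byte change produces a completely different digest, so this detects tampering even when the change is small.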

Big Data Security Issues: Data Access Control

Controlling which data users can view or edit lets companies ensure not only data
integrity but also privacy. Managing access control is not straightforward, especially
in larger companies with thousands of employees. However, the shift from
on-premise solutions to cloud-based services has simplified working with Identity
and Access Management (IAM). IAM controls data flow via identification,
authentication, and authorization. Following the relevant ISO standards is a good
starting place for meeting IAM best practices.
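The authorization step that IAM handles can be illustrated with a toy role-based check (this is a sketch, not a real IAM product; the roles, users, and permissions are invented):

```python
# Each role maps to the set of actions it may perform.
ROLE_PERMISSIONS = {
    "analyst":  {"read"},
    "engineer": {"read", "write"},
    "admin":    {"read", "write", "delete"},
}

# Identification: which role each known user holds.
USER_ROLES = {"asha": "analyst", "dev1": "engineer"}

def is_allowed(user: str, action: str) -> bool:
    """Authorization: may this user's role perform the action?"""
    role = USER_ROLES.get(user)          # unknown users have no role
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("asha", "read"))   # True
print(is_allowed("asha", "write"))  # False
```

Grouping permissions by role is what keeps access manageable at the scale of thousands of employees: adding a user means assigning a role, not enumerating individual permissions.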
Big Data Security Issues: Data Poisoning

Today, many machine learning solutions, such as chatbots, are trained on colossal
amounts of data, and one of their advantages is that they keep improving as users
interact with them. However, this opens the door to data poisoning, a technique for
attacking a machine learning model's training data. It can be considered an integrity
attack, because tampered training data affects the model's ability to make correct
predictions. The results can be catastrophic, ranging from logic corruption to data
manipulation and data injection. The best defense is outlier detection, in which
elements injected into the training pool are separated from the existing data
distribution.
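A very basic form of the outlier detection mentioned above is a z-score rule: flag training points that lie far from the rest of the distribution. The sketch below uses Python's standard `statistics` module on made-up values (real poisoning defenses are considerably more sophisticated):

```python
import statistics

# Made-up training values; 42.0 plays the role of an injected point.
training_values = [10.1, 9.8, 10.3, 9.9, 10.0, 42.0]

mean = statistics.mean(training_values)
stdev = statistics.stdev(training_values)

# Flag anything more than 2 standard deviations from the mean.
outliers = [x for x in training_values if abs(x - mean) / stdev > 2]
print(outliers)  # [42.0]
```

Points that survive the filter stay in the training pool; flagged points are held back for review before they can corrupt the model.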

Big Data Security Issues: Employee Theft

An advanced data culture allows every employee to hold a certain amount of critical
business information. While this boosts data democratization, it also raises the risk
of an employee leaking sensitive information, intentionally or unintentionally.
Employee theft is prevalent not only in big tech companies but also in startups. To
avoid it, companies have to implement legal policies and secure the network with a
virtual private network. In addition, companies can use Desktop as a Service (DaaS)
to restrict what can be done with data stored on local drives.

Conclusion

Based on the concerns listed above, it is apparent why enterprises see Big Data
security as a major concern. The good news is that with the right information,
resources, skilled manpower, a detailed coping strategy, and a commitment to data
integrity and privacy, many of these challenges can be addressed. Removing threats
to Big Data lets businesses achieve their ultimate goal of harnessing data for a
better customer experience and enhanced customer retention.

Key Data Security Issues in Data Science:

● Confidentiality: Protecting sensitive information from unauthorized access and
disclosure.
● Integrity: Ensuring data remains accurate and unaltered.
● Availability: Guaranteeing that data is accessible when needed.
● Data Breaches: Unauthorized access and disclosure of sensitive data,
potentially leading to financial losses, reputational damage, and legal penalties.
● Data Loss: Accidental deletion, hardware malfunctions, or software errors
leading to data loss and operational disruptions.
● Non-compliance: Failure to adhere to data protection regulations, resulting in
fines and legal issues.
● Access Control: Ensuring only authorized users can access specific data, which
can be challenging with large datasets.
● Data Storage and Management: Securing data at rest, in transit, and in use, as
well as managing data copies and lifecycle management.
● Data Poisoning: Tampering with data used for training machine learning
models, leading to inaccurate or biased results.
● Insecure APIs: Weaknesses in APIs can expose data to unauthorized access
and manipulation.
● Insider Threats: Malicious or negligent actions by individuals within an
organization.
● Advanced Persistent Threats (APTs): Sophisticated and persistent attacks that
can bypass traditional security measures.
● Human Error: Mistakes such as sending sensitive data to the wrong recipients
or accidentally deleting data.
● Evolving Threats: The constant emergence of new cyber threats requires
continuous adaptation of security measures.
● Resource Intensity: Implementing robust data security measures can be
resource-intensive, requiring specialized skills and infrastructure.
● False Positives: Security systems may generate a high number of false
positives, overwhelming security teams and potentially hindering legitimate
activity.

Applications of Data Science

The role of data science applications hasn't evolved overnight. Thanks to faster
computing and cheaper storage, we can now predict in minutes outcomes that would
take many human hours to process.

A data scientist takes home a whopping $124,000 a year, owing to the shortage of
skilled professionals in this field. Applications built upon the concepts of data science
span domains such as the following:

● Fraud and Risk Detection
● Healthcare
● Internet Search
● Targeted Advertising
● Website Recommendations
● Advanced Image Recognition
● Speech Recognition
● Airline Route Planning
● Gaming
● Augmented Reality

Fraud and Risk Detection

The earliest applications of data science were in finance. Companies were fed up with
bad debts and losses every year, but they had a lot of data that was collected during the
initial paperwork for sanctioning loans. They decided to bring in data scientists to
rescue them from those losses.

Over the years, banking companies learned to divide and conquer data via customer
profiling, past expenditures, and other essential variables to analyze the probabilities of
risk and default. This also helped them push their banking products based on
customers' purchasing power.

Healthcare
The healthcare sector, in particular, benefits greatly from data science applications.

1. Medical Image Analysis

Procedures such as tumor detection, artery stenosis detection, and organ delineation
employ a variety of methods and frameworks, such as MapReduce, to find optimal
parameters for tasks like lung texture classification. They apply machine learning
methods such as support vector machines (SVM), content-based medical image
indexing, and wavelet analysis for solid texture classification.
2. Genetics & Genomics
Data science applications also enable an advanced level of treatment personalization
through research in genetics and genomics. The goal is to understand the impact of
DNA on our health and find individual biological connections between genetics,
diseases, and drug response. Data science techniques allow the integration of different
kinds of data with genomic data in disease research, which provides a deeper
understanding of genetic issues in reactions to particular drugs and diseases. As soon
as we acquire reliable personal genome data, we will achieve a deeper understanding
of human DNA, and advanced genetic risk prediction will be a major step towards more
individualized care.

3. Drug Development
The drug discovery process is highly complicated and involves many disciplines. The
greatest ideas are often constrained by billions of tests and huge expenditures of
money and time. On average, it takes twelve years to make an official submission.

Data science applications and machine learning algorithms simplify and shorten this
process, adding perspective to each step from the initial screening of drug compounds
to the prediction of success rates based on biological factors. Such algorithms can
forecast how a compound will act in the body using advanced mathematical modeling
and simulations instead of “lab experiments.” The idea behind computational drug
discovery is to create computer-simulated models of a biologically relevant network,
simplifying the prediction of future outcomes with high accuracy.

4. Virtual Assistance for Patients and Customer Support

Optimization of the clinical process builds upon the insight that in many cases it is not
actually necessary for patients to visit doctors in person. A mobile application can offer
a more effective solution by bringing the doctor to the patient instead.

AI-powered mobile apps, usually chatbots, can provide basic healthcare support. You
simply describe your symptoms or ask questions, and then receive key information
about your medical condition derived from a wide network linking symptoms to causes.
Apps can remind you to take your medicine on time and, if necessary, book an
appointment with a doctor.

This approach promotes a healthy lifestyle by encouraging patients to make healthy
decisions, saves their time waiting in line for an appointment, and allows doctors to
focus on more critical cases.

Internet Search
This is probably the first thing that strikes your mind when you think of data science
applications.

When we speak of search, we think “Google.” Right? But there are many other search
engines, like Yahoo, Bing, Ask, and AOL. All of them (including Google) use data
science algorithms to deliver the best results for a searched query in a fraction of a
second. Consider that Google processes more than 20 petabytes of data every day.
Had there been no data science, Google wouldn't be the “Google” we know today.

Targeted Advertising
If you thought search was the biggest of all data science applications, here is a
challenger: the entire digital marketing spectrum. From the display banners on various
websites to the digital billboards at airports, almost all of them are placed using data
science algorithms.

This is why digital ads have achieved a much higher CTR (click-through rate) than
traditional advertisements: they can be targeted based on a user's past behavior.

It is also why you might see ads for data science training programs while someone
else sees an ad for apparel in the same place at the same time.

Website Recommendations
Aren't we all used to the suggestions of similar products on Amazon? They not only
help you find relevant products among the billions available but also greatly improve
the user experience.

Many companies have fervently used recommendation engines to promote their
products according to users' interests and the relevance of information. Internet giants
like Amazon, Twitter, Google Play, Netflix, LinkedIn, and IMDb use this system to
improve the user experience. The recommendations are made based on a user's
previous search results.
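A recommendation engine of the kind described above can be caricatured in a few lines: suggest items that co-occur, in other users' baskets, with an item the user already viewed. Real systems are far more elaborate; the baskets below are invented:

```python
from collections import Counter

# Made-up purchase baskets from other users.
purchase_history = [
    {"phone", "case"},
    {"phone", "charger"},
    {"laptop", "mouse"},
    {"phone", "case", "charger"},
]

def recommend(item: str, history: list) -> list:
    """Rank items by how often they co-occur with `item`."""
    co_counts = Counter()
    for basket in history:
        if item in basket:
            co_counts.update(basket - {item})
    return [candidate for candidate, _ in co_counts.most_common()]

print(recommend("phone", purchase_history))
```

Counting co-occurrences is the simplest instance of the "customers who bought this also bought" idea; production engines add weighting, user profiles, and much larger histories.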

Advanced Image Recognition

When you upload a photo with friends on Facebook, you start getting suggestions to
tag your friends. This automatic tag suggestion feature uses a face recognition
algorithm. In a recent update, Facebook outlined the additional progress it has made in
this area, noting advances in image recognition accuracy and capacity:

“We've witnessed massive advances in image classification (what is in the image?) as
well as object detection (where are the objects?), but this is just the beginning of
understanding the most relevant visual content of any image or video. Recently we've
been designing techniques that identify and segment each and every object in an
image, a key capability that will enable entirely new applications.”

In addition, Google lets you search for images by uploading them. It uses image
recognition and provides related search results.
Speech Recognition
Some of the best examples of speech recognition products are Google Voice, Siri, and
Cortana. With speech recognition, even if you aren't in a position to type a message,
your life doesn't stop: simply speak the message and it will be converted to text. At
times, though, you will notice that speech recognition doesn't perform accurately.

Airline Route Planning

The airline industry across the world is known to bear heavy losses. Except for a few
airline service providers, companies are struggling to maintain their occupancy ratios
and operating profits. The steep rise in air-fuel prices and the need to offer heavy
discounts to customers have made the situation worse. Before long, airline companies
started using data science to identify strategic areas of improvement. Using data
science, airline companies can now:

1. Predict flight delays
2. Decide which class of airplanes to buy
3. Decide whether to fly directly to the destination or take a halt in between (for
example, a flight can take a direct route from New Delhi to New York, or it can
choose to halt in another country along the way)
4. Effectively drive customer loyalty programs

Southwest Airlines and Alaska Airlines are among the top companies that have
embraced data science to change their way of working.


Gaming
Games are now designed using machine learning algorithms that improve and upgrade
themselves as the player moves up to higher levels. In motion gaming, your opponent
(the computer) also analyzes your previous moves and shapes its game accordingly.
EA Sports, Zynga, Sony, Nintendo, and Activision Blizzard have taken the gaming
experience to the next level using data science.

Augmented Reality
This is the final of the data science applications which seem most exciting in the future.
Augmented reality.

Data Science and Virtual Reality do have a relationship, considering a VR headset


contains computing knowledge, algorithms and data to provide you with the best
viewing experience. A very small step towards this is the high-trending game of
Pokemon GO. The ability to walk around things and look at Pokemon on walls, streets,
things that aren’t really there. The creators of this game used the data from Ingress, the
last app from the same company, to choose the locations of the Pokemon and gyms.

However, Data Science makes more sense once VR economy becomes accessible in
terms of pricing, and consumers use it often like other apps.

So far, not much has been revealed beyond prototypes, and we don't yet know when
these technologies will be available to the average consumer. Let's see what
amazing data science applications the future holds for us!

Data Science Applications link - (good one) -

https://builtin.com/data-science/data-science-applications-examples

DATA SCIENCE APPLICATIONS AND EXAMPLES

● Healthcare: Data science can identify and predict disease, and personalize
healthcare recommendations.
● Transportation: Data science can optimize shipping routes in real-time.
● Sports: Data science can accurately evaluate athletes’ performance.
● Government: Data science can prevent tax evasion and predict incarceration rates.
● E-commerce: Data science can automate digital ad placement.
● Gaming: Data science can improve online gaming experiences.
● Social media: Data science can create algorithms to pinpoint compatible partners.
● Fintech: Data science can help create credit reports and financial profiles, run
accelerated underwriting and create predictive models based on historical payroll
data.

Healthcare Data Science Applications


Back in 2008, data science made its first major mark on the healthcare
industry. Google staffers discovered they could map flu outbreaks in real time
by tracking location data on flu-related searches. The CDC’s existing map of
documented flu cases, FluView, was updated only once a week. Google
quickly rolled out a competing tool with more frequent updates: Google Flu
Trends.

But it didn’t work. In 2013, Google estimated about twice the flu cases that
were actually observed. The tool’s secret methodology seemed to involve
finding correlations between search term volume and flu cases. That meant
the Flu Trends algorithm sometimes put too much stock in seasonal search
terms like “high school basketball.”
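The correlation hunt behind a tool like Flu Trends can be sketched in a few lines. The weekly counts below are invented, and the real methodology was never published; the point is that an unrelated seasonal term can correlate with reported flu cases almost as strongly as a genuine flu query does.

```python
# Sketch of naive correlation mining between search volume and flu cases.
# All numbers are hypothetical.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Weekly counts, peaking in winter and fading toward summer.
flu_cases = [900, 850, 700, 400, 150, 80, 60, 70]
flu_searches = [880, 830, 690, 390, 160, 90, 70, 80]       # "flu symptoms"
seasonal_searches = [800, 820, 650, 300, 100, 50, 40, 60]  # "high school basketball"

print(round(pearson(flu_cases, flu_searches), 3))
print(round(pearson(flu_cases, seasonal_searches), 3))
```

Both correlations come out high, which is exactly the trap: a purely seasonal term looks as "predictive" as a real flu query until the seasons shift.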

Even so, it demonstrated the serious potential of data science in healthcare.


Here are some examples of more powerful and precise healthcare tools
developed in the years after Google’s initial attempt. All of them are powered
by data science.

1. IDENTIFYING CANCER TUMORS


Google hasn’t abandoned applying data science to healthcare. In fact, the
company developed a tool, LYNA, for identifying breast cancer tumors that
metastasize to nearby lymph nodes. That can be difficult for the human eye to
see, especially when the new cancer growth is small. In one trial, LYNA — short for
Lymph Node Assistant — accurately identified metastatic cancer 99
percent of the time using its machine-learning algorithm. More testing is
required, however, before doctors can use it in hospitals.
2. TRACKING MENSTRUAL CYCLES
The popular Clue app employs data science to forecast users’ menstrual
cycles and reproductive health by tracking cycle start dates, moods, stool
type, hair condition and many other metrics. Behind the scenes, data
scientists mine this wealth of anonymized data with tools like Python and
Jupyter Notebook. Users are then algorithmically notified when they’re
fertile, on the cusp of a period or at an elevated risk for conditions like an
ectopic pregnancy.

3. PERSONALIZING TREATMENT PLANS


Oncora’s software uses machine learning to create personalized
recommendations for current cancer patients based on data from past ones.
Healthcare facilities using the company’s platform include UT Health San
Antonio and Scripps Health. Their radiology team collaborated with Oncora
data scientists to mine 15 years’ worth of data on diagnoses, treatment plans,
outcomes and side effects from more than 50,000 cancer records. Based on
this data, Oncora’s algorithm learned to suggest personalized chemotherapy
and radiation regimens.

4. CLEANING CLINICAL TRIAL DATA


Veeva is a cloud software company that provides data and software solutions
for the healthcare industry. The company’s reach extends through clinical,
regulatory and commercial medical fields. Veeva’s Vault EDC uses data
science to clean clinical trial findings and help medical professionals make
adjustments mid-study.

Transportation and Logistics Data Science Examples
Driving plays a central role in American life. The Supreme Court has called it
“a virtual necessity,” and the vast majority of Americans — about 132 million
households — own or lease cars. In 2021, American automobiles burned
about 135 billion gallons of gasoline. Unfortunately, this habit contributes to
climate change, but data science is here to help.

While both biking and public transit can curb driving-related emissions, data
science can do the same by optimizing road routes. And though data-driven
route adjustments are often small, they can help save thousands of gallons of
gas when spread across hundreds of trips and vehicles — even among
companies that aren’t explicitly eco-focused. Here are some examples of data
science hitting the road.

5. MODELING TRAFFIC PATTERNS


StreetLight uses data science to model traffic patterns for cars, bikes and
pedestrians on North American streets. Based on a monthly influx of trillions
of data points from smartphones, in-vehicle navigation devices and more,
StreetLight’s traffic maps stay up-to-date. They’re more granular than
mainstream map apps, too: they can identify groups of commuters that use
multiple transit modes to get to work, like a train followed by a scooter. The
company’s maps inform various city planning enterprises, including commuter
transit design.

6. OPTIMIZING FOOD DELIVERY


The data scientists at UberEats have a fairly simple goal: getting hot food
delivered quickly. Making that happen across the country, though, takes
machine learning, advanced statistical modeling and staff meteorologists. In
order to optimize the full delivery process, the team has to predict how every
possible variable — from storms to holiday rushes — will impact traffic and
cooking time.

7. IMPROVING PACKAGE DELIVERY


UPS uses data science to optimize package transport from drop-off to
delivery. The company’s integrated navigation system ORION helps drivers
choose over 66,000 fuel-efficient routes. ORION has saved UPS
approximately 100 million miles and 10 million gallons of fuel per year with the
use of advanced algorithms, AI and machine learning. The company plans to
continue to update its ORION system, with the last version having been rolled
out in 2021. The latest update allowed drivers to reduce their routes by two to
four miles.

Sports Data Science Applications


In the early 2000s, the Oakland Athletics’ recruitment budget was so small the
team couldn’t recruit quality players. At least, they couldn’t recruit players any
other teams considered quality. So the general manager redefined quality,
using in-game statistics other teams ignored to predict player potential and
assemble a strong team despite their budget.

His strategy helped the A’s make the playoffs, and it snowballed from there.
Author Michael Lewis wrote a book about the phenomenon, Moneyball. Since
then, the global market for sports analytics has grown significantly and is
expected to reach $8.4 billion by 2026. Here are some examples of how data
science is transforming sports.

8. MAKING PREDICTIVE INSIGHTS IN BASKETBALL


RSPCT’s shooting analysis system, adopted by NBA and college teams,
relies on a sensor on a basketball hoop’s rim, whose tiny camera tracks
exactly when and where the ball strikes on each basket attempt. It funnels that
data to a device that displays shot details in real time and generates predictive
insights.

“Based on our data… We can tell [a shooter], ‘If you are about to take the last
shot to win the game, don’t take it from the top of the key, because your best
location is actually the right corner,’” RSPCT COO Leo Moravtchik told SVG
News.

9. TRACKING PHYSICAL DATA FOR ATHLETES


WHOOP makes wearable devices that track athletes’ physical data like
resting heart rate, sleep cycle and respiratory rate. The goal is to help athletes
understand when to push their training and when to rest — and to make sure
they’re taking the necessary steps to get the most out of their body.
Professional athletes like Olympic sprinter Gabby Thomas, Olympic golfer
Nelly Korda and PGA golfer Nick Watney are among WHOOP’s users,
according to the company’s website.

10. GATHERING PERFORMANCE METRICS FOR SOCCER PLAYERS
Trace provides soccer coaches with recording gear and an AI system that
analyzes game film. Players wear a tracking device, called a Tracer, while its
specially designed camera records the game. The AI bot then takes that
footage and stitches together all of the most important moments in a game —
from shots on goal to defensive lapses and more. This technology allows
coaches and players to have more detailed insights from game film. Beyond
stitching together clips, the software also provides performance metrics and a
field heat map.

Government Data Science Applications


Though few think of the U.S. government as “extremely online,” its agencies
can access heaps of data. Not only do its agencies maintain their own
databases of ID photos, fingerprints and phone activity, government agents
can get warrants to obtain data from any American data warehouse.
Investigators often reach out to Google’s warehouse, for instance, to get a list
of the devices that were active at the scene of a crime.

Though many view such activity as an invasion of privacy, the United States
has minimal privacy regulations, and the government’s data well won’t run dry
anytime soon. Here are some of the ways government agencies apply data
science to vast stores of data.

11. PREDICTING RECIDIVISM WITHIN INCARCERATED POPULATIONS
Widely used by the American judicial system and law enforcement, Equivant’s
Northpointe software suite attempts to gauge an incarcerated person’s risk of
reoffending. Its algorithms predict that risk based on a questionnaire that
covers the person’s employment status, education level and more. No
questionnaire items explicitly address race, but according to a ProPublica
analysis that was disputed by Northpointe, the Equivant algorithm pegs Black
people as higher recidivism risks than white people 77 percent of the time —
even when they’re the same age and gender, with similar criminal records.
ProPublica also found that Equivant's predictions were 71 percent accurate.

12. MINING DATABASES WITH FACIAL RECOGNITION SOFTWARE
The U.S. Immigrations and Customs Enforcement has used facial recognition
technology to mine driver’s license photo databases, with the goal of
deporting undocumented immigrants. The practice — which has sparked
criticism from both an ethical and technological standpoint (facial recognition
technology remains shaky) — falls under the umbrella of data science. Facial
recognition builds on photos of faces, a.k.a raw data, with AI and machine
learning capabilities.

13. DETECTING TAX FRAUD


Tax evasion costs the U.S. government $1 trillion a year, according to one
estimate, so it’s no wonder the IRS has modernized its fraud-detection
protocols in the digital age. To the dismay of privacy advocates, the agency
has improved efficiency by constructing multidimensional taxpayer profiles
from public social media data, assorted metadata, email analysis,
electronic payment patterns and more. Based on those profiles, the agency
forecasts individual tax returns; anyone with wildly different real and
forecasted returns gets flagged for auditing.
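The flagging step described above can be sketched as a simple deviation check. The figures and the 50 percent threshold below are hypothetical; the IRS's actual models are not public.

```python
# Toy audit-flagging sketch: compare each filer's reported figure against
# a model forecast and flag large relative deviations.

def flag_outliers(reported, forecast, threshold=0.5):
    """Return indices where |reported - forecast| / forecast > threshold."""
    flagged = []
    for i, (r, f) in enumerate(zip(reported, forecast)):
        if f > 0 and abs(r - f) / f > threshold:
            flagged.append(i)
    return flagged

reported = [52_000, 48_000, 12_000, 51_000]
forecast = [50_000, 50_000, 49_000, 50_000]
print(flag_outliers(reported, forecast))  # [2] — filer 2 deviates wildly
```

In practice the forecast itself would come from a predictive model built on the taxpayer profiles described above; here it is just given.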

Gaming Data Science Examples

The gaming industry is growing, and it’s using data science to help expand.
The global video game market was valued at $195.65 billion in 2021 and is
expected to grow by nearly 13 percent by 2030.

Data science and AI have been used in video games since as early as the
1950s with the creation of Nim — a mathematical strategy game in which two
players take turns to remove objects from piles. The innovation continued with
Pac-Man where AI and data science were used in the game’s mazes and to
give the ghosts distinct personalities.

The video game industry continues to find creative ways to implement data
science and AI to improve game play and entertain millions of people across
the globe. Here are just a few examples of how data science is used in video
games.

14. IMPROVING ONLINE GAMING


Known for being the company behind games with cult followings like Call of
Duty, World of Warcraft, Candy Crush and Overwatch, Activision Blizzard
uses big data to improve its online gaming experiences. One example of this is the
company’s game science division analyzing gaming data to prevent boosting (the
attempt to inflate a player’s scores through illegitimate means) among Call of Duty
players. The company also
uses machine learning to detect power boosting and identify and track key
indicators for increasing quality of game time.

15. MAKING SUGGESTIONS TO GAMERS TO IMPROVE PLAY
2K Games is a video game studio that has created popular titles like BioShock
and Borderlands, as well as both WWE and PGA games series. The
company’s growing game science team focuses on extracting gaming data
and building models in order to improve its sports games like NBA2K. Data
scientists at 2K Games analyze player gameplay and economy telemetry data
to understand player behavior and suggest actions to improve the player
experience.

16. MONITORING BUSINESS METRICS IN THE VIDEO GAME INDUSTRY
Unity is a platform for creating and operating interactive, real-time 3D content,
including games. The platform is used by gaming companies like Riot Games,
Atari and Respawn Entertainment, according to its website. Unity uses
gaming data to make data-driven decision making within its product
development team and to monitor business metrics.
E-Commerce Data Science Applications
Once upon a time, everyone in a given town shopped at the same mall: a
physical place with some indoor fountains, a jewelry kiosk and probably a
Body Shop. Today, citizens of that same town can each shop in their own
personalized digital mall — also known as the internet. Online retailers often
automatically tailor their web storefronts based on viewers’ data profiles. That
can mean tweaking page layouts and customizing spotlighted products,
among other things. Some stores may also adjust prices based on what
consumers seem able to pay, a practice called personalized pricing. Even
websites that sell nothing feature targeted ads. Here are some examples of
companies using data science to automatically personalize the online
shopping experience.

17. CREATING TARGETED ADS


Sovrn brokers deals between advertisers and outlets like Bustle, ESPN and
Encyclopedia Britannica. Since these deals happen millions of times a day,
Sovrn has mined a lot of data for insights, which manifest in its intelligent
advertising technology. Compatible with Google and Amazon’s server-to-
server bidding platforms, its interface can monetize media with minimal
human oversight — or, on the advertiser end, target campaigns to customers
with specific intentions.

18. CURATING VACATION RENTALS


Data science helped Airbnb totally revamp its search function. Once upon a
time, it prioritized top-rated vacation rentals that were located a certain
distance from a city’s center. That meant users could always find beautiful
rentals, but not always in cool neighborhoods. Engineers solved that issue by
prioritizing the search rankings of a rental if it’s in an area that has a high
density of Airbnb bookings. There’s still breathing room for quirkiness in the
algorithm, too, so cities don’t dominate towns and users can stumble on the
occasional rental treehouse.
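A toy version of that ranking idea, with a made-up `area_booking_density` field and weight; Airbnb's real search model is far more involved.

```python
# Rank listings by rating plus a boost for areas with many past bookings,
# rather than by raw distance to the city center. Weights are illustrative.

def rank_listings(listings, density_weight=0.5):
    def score(listing):
        return listing["rating"] + density_weight * listing["area_booking_density"]
    return sorted(listings, key=score, reverse=True)

listings = [
    {"name": "Downtown loft",  "rating": 4.9, "area_booking_density": 0.2},
    {"name": "Treehouse",      "rating": 4.6, "area_booking_density": 0.9},
    {"name": "Suburban condo", "rating": 4.7, "area_booking_density": 0.3},
]
print([l["name"] for l in rank_listings(listings)])
# ['Treehouse', 'Downtown loft', 'Suburban condo']
```

With the density boost, the quirky treehouse in a heavily booked area outranks a slightly higher-rated listing in a quiet one.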

19. PREDICTING CONSUMERS’ PRODUCT INTERESTS


Instagram uses data science to target its sponsored posts, which hawk
everything from trendy sneakers to influencer-promoted products. The
company’s data scientists pull data from Instagram as well as its owner, Meta,
which has exhaustive web-tracking infrastructure and detailed information on
many users, including age and education. From there, the team crafts
algorithms that convert users’ likes and comments, their usage of other apps
and their web history into predictions about the products they might buy.

Though Instagram’s advertising algorithms remain shrouded in mystery, they
work impressively well, according to The Atlantic’s Amanda Mull: “I often feel
like Instagram isn’t pushing products, but acting as a digital personal shopper
I’m free to command.”

20. CREATING DIGITAL AD OPPORTUNITIES


Taboola uses deep learning, AI and large datasets to create engagement
opportunities for advertisers and digital properties. Its discovery platform
creates new monetization, audience and engagement by placing
advertisements throughout a variety of online publishers and sites. Its
discovery platform can expose readers to news, entertainment, topical
information or advice as well as a new product or service. The company
partners with outlets like USA Today, Bloomberg, Insider and MSN, according
to its website.

Social Platform Data Science Examples


The rise of social networks has completely altered how people socialize.
Romantic relationships unfold publicly on Venmo. Meta engineers can rifle
through users’ birthday party invite lists. Friendship, acquaintanceship and
coworker-ship all leave extensive online data trails.

Some argue that these trails — Facebook friend lists or LinkedIn connections
— don’t mean much. Anthropologist Robin Dunbar, for instance, has found
that people can maintain only about 150 casual connections at a time;
cognitively, humans can’t handle much more than that. In Dunbar’s view,
racking up more than 150 digital connections says little about a person’s day-
to-day social life.
Catalogs of social network users’ most glancing acquaintances hold another
kind of significance though. Now that many relationships begin online, data
about your social world impacts who you get to know next. Here are some
examples of data science fostering human connection.

21. CURATING MATCHES ON DATING APPS


When singles match on Tinder, they can thank the company’s data scientists.
A carefully-crafted algorithm works behind the scenes, boosting the probability
of matches. Once upon a time, this algorithm relied on users’ Elo scores,
essentially an attractiveness ranking. Now, it prioritizes matches between
active users, users near each other and users who seem like each other’s
“types” based on their swiping history.
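For reference, the standard Elo update that such a score would be based on. K = 32 is the classic chess value; Tinder's actual parameters were never public.

```python
# Standard Elo rating update: the winner gains points in proportion to how
# unexpected the win was, and the loser loses the same amount.

def elo_update(rating_a, rating_b, a_won, k=32):
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return round(new_a), round(new_b)

print(elo_update(1400, 1400, a_won=True))  # (1416, 1384)
```

Between equal ratings the expected score is 0.5, so each result moves both players by exactly half of K.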

22. SUGGESTING FRIENDS ON FACEBOOK



Meta’s Facebook platform, of course, uses data science in various ways, but
one of its buzzier data-driven features is the “People You May Know” sidebar,
which appears on the social network’s home screen. Often creepily prescient,
it’s based on a user’s friend list, the people they’ve been tagged with in photos
and where they’ve worked and gone to school. It’s also based on “really good
math,” according to the Washington Post — specifically, a type of data
science known as network science, which essentially forecasts the growth of a
user’s social network based on the growth of similar users’ networks.

Fintech Data Science Applications


Fintech and data science go hand in hand, as financial companies typically
use insights drawn from raw data to make lending decisions and create credit
reports. Data science is also used to predict consumer behavior, run risk
evaluations and optimize financial portfolios and assets. Here are some of the
companies using data science in fintech applications.

23. ACCELERATING UNDERWRITING FOR LIFE INSURANCE
Bestow offers life insurance solutions for both individuals and enterprises. The
company’s goal is to make life insurance accessible and affordable for
everyone. It uses data science to power its accelerated underwriting process,
which pulls data from external sources like credit reports, motor vehicle
records or the Medical Information Bureau. Accelerated underwriting is helped
by data science’s predictive algorithms to determine an applicant’s risk
factors.

24. CREATING CREDIT REPORTS


TransUnion is a credit reporting agency known for providing credit reports,
fraud monitoring services and financial loans. The company’s data science
team is responsible for creating predictive models based on data reporting
from auto dealers to retailers to mortgage companies. The company uses
data science to extract insights from both an individual’s credit data and public
record data. These insights are used by financial institutions and lenders to
make informed decisions about extending credit offers and loan opportunities.

25. GATHERING PAYROLL DATA


Pinwheel uses data science to provide payroll solutions in the banking and
lending industries. Pinwheel’s Earning Stream gives financial institutions time-
and-attendance data on their customers, as well as historical payroll data,
accrued earnings and projected earnings. The system bases projections on
compiled historical data and allows finance companies to stay up to date on
their customer’s income and employment history.
Data Collection and Data Collection Strategies

The process of gathering and analyzing accurate data from various sources to find
answers to research problems, identify trends and probabilities, and evaluate possible
outcomes is known as data collection. Knowledge is power, information is knowledge,
and data is information in digitized form, at least as defined in IT. Hence, data is
power. But before you can leverage that data into a successful strategy for your
organization or business, you need to gather it. That’s your first step.

Example: A company collects customer feedback through online surveys and social
media monitoring to improve their products and services.

Data collection is the process of collecting, measuring and analyzing different types of
information using a set of standard validated techniques. The main objective of data
collection is to gather information-rich and reliable data, and analyze them to make
critical business decisions. Once the data is collected, it goes through a rigorous
process of data cleaning and data processing to make this data truly useful for
businesses. There are two main methods of data collection in research based on the
information that is required, namely:

● Primary Data Collection


● Secondary Data Collection

Primary Data Collection Methods


Primary data refers to data collected from first-hand experience directly from the main
source. It refers to data that has never been used in the past. The data gathered by
primary data collection methods are generally regarded as the best kind of data in
research.

The methods of collecting primary data can be further divided into quantitative data
collection methods (deals with factors that can be counted) and qualitative data
collection methods (deals with factors that are not necessarily numerical in nature).

Here are some of the most common primary data collection methods:

1. Interviews

Interviews are a direct method of data collection. It is simply a process in which the
interviewer asks questions and the interviewee responds to them. It provides a high
degree of flexibility because questions can be adjusted and changed anytime
according to the situation.

2. Observations

In this method, researchers observe a situation around them and record the findings. It
can be used to evaluate the behaviour of different people in controlled (everyone
knows they are being observed) and uncontrolled (no one knows they are being
observed) situations. This method is highly effective because it is straightforward and
not directly dependent on other participants.

For example, a person watches random people who walk their pets on a busy street,
and then uses this data to decide whether or not to open a pet food store in that area.
3. Surveys and Questionnaires

Surveys and questionnaires provide a broad perspective from large groups of people.
They can be conducted face-to-face, mailed, or even posted on the Internet to get
respondents from anywhere in the world. The answers can be yes or no, true or false,
multiple choice, and even open-ended questions. However, a drawback of surveys and
questionnaires is delayed response and the possibility of ambiguous answers.

4. Focus Groups

A focus group is similar to an interview, but it is conducted with a group of people who
all have something in common. The data collected is similar to in-person interviews,
but they offer a better understanding of why a certain group of people thinks in a
particular way. However, some drawbacks of this method are lack of privacy and
domination of the interview by one or two participants. Focus groups can also be time-
consuming and challenging, but they help reveal some of the best information for
complex situations.

5. Oral Histories

Oral histories also involve asking questions like interviews and focus groups.
However, it is defined more precisely and the data collected is linked to a single
phenomenon. It involves collecting the opinions and personal experiences of people in
a particular event that they were involved in. For example, it can help in studying the
effect of a new product in a particular community.

Secondary Data Collection Methods


Secondary data refers to data that has already been collected by someone else. It is
much less expensive and easier to collect than primary data. While primary data
collection provides more authentic and original data, there are numerous instances
where secondary data collection provides great value to organizations.

Here are some of the most common secondary data collection methods:

1. Internet

The use of the Internet has become one of the most popular secondary data collection
methods in recent times. There is a large pool of free and paid research resources that
can be easily accessed on the Internet. While this method is a fast and easy way of
data collection, you should only source from authentic sites while collecting
information.

2. Government Archives

There is lots of data available from government archives that you can make use of.
The most important advantage is that the data in government archives are authentic
and verifiable. The challenge, however, is that data is not always readily available due
to a number of factors. For example, criminal records can fall under classified
information and are difficult for anyone to access.

3. Libraries

Most researchers donate several copies of their academic research to libraries. You
can collect important and authentic information based on different research contexts.
Libraries also serve as a storehouse for business directories, annual reports and other
similar documents that help businesses in their research.
Comparison Chart: Primary Data vs. Secondary Data

Basis for Comparison | Primary Data | Secondary Data
Meaning | Primary data refers to the first-hand data gathered by the researcher himself. | Secondary data means data collected by someone else earlier.
Data | Real-time data | Past data
Process | Very involved | Quick and easy
Source | Surveys, observations, experiments, questionnaires, personal interviews, etc. | Government publications, websites, books, journal articles, internal records, etc.
Cost effectiveness | Expensive | Economical
Collection time | Long | Short
Specific | Always specific to the researcher's needs. | May or may not be specific to the researcher's needs.
Available in | Crude form | Refined form
Accuracy and reliability | More | Relatively less

Section - A Part- II

In ppt - Data preprocessing


Section - A Part- III

Data Science Similarity Algorithms -

Proximity Measure in Data Science / Mining Application

1. https://www.youtube.com/watch?v=dQNO2VnMdtk (video 1 Only)

Euclidean Distance & Manhattan Distance

2. https://www.youtube.com/watch?v=N-a1pom7J9M
3. https://www.youtube.com/watch?v=gwtTM9M6ugU
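As a quick companion to the linked videos, both distances for two n-dimensional points:

```python
# Euclidean distance: straight-line distance between two points.
# Manhattan distance: sum of absolute coordinate differences (city-block).

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (1, 2), (4, 6)
print(euclidean(p, q))  # 5.0 (a 3-4-5 right triangle)
print(manhattan(p, q))  # 7
```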

Nearest Neighbor Algo -

KNN- K Nearest Neighbor Algo

https://www.youtube.com/watch?v=abnL_GUGub4
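A minimal KNN classifier in the spirit of the linked video: majority vote over the k nearest training points by Euclidean distance. The training points and labels are made up for illustration.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of ((features...), label) pairs; returns predicted label."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]  # majority vote

train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B"), ((2, 1), "A")]
print(knn_predict(train, (1.5, 1.5)))  # "A"
print(knn_predict(train, (5.5, 5.0)))  # "B"
```

KNN is "lazy": there is no training step, and every prediction scans the whole training set, which is why approximate nearest neighbor (ANN) methods exist for large datasets.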

ANN - Approximate Nearest Neighbor algo

https://www.youtube.com/watch?v=DRbjpuqOsjk

Shingling -

https://www.youtube.com/watch?v=zd64IyQ9caw
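A small sketch of k-shingling, paired with Jaccard similarity as it usually is; the strings and k = 3 are illustrative.

```python
# k-shingling: represent a document as the set of its overlapping
# k-character substrings, then compare sets with Jaccard similarity.

def shingles(text, k=3):
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

s1 = shingles("data science")
s2 = shingles("data sciences")
print(round(jaccard(s1, s2), 3))  # high similarity: the texts differ by one character
```

Near-duplicate documents share most of their shingles, so their Jaccard score is close to 1 even when the texts are not identical.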
Section - 2

Part1 - EDA

1. https://www.youtube.com/watch?v=JG8GRlMjp3c