ML Module1 (Ch-02)

The document discusses various data systems and analytics, emphasizing the importance of structured and unstructured data in business decision-making. It outlines different data types, storage methods, and the role of analytics in understanding and predicting trends. Additionally, it covers data collection techniques and the significance of data quality and preprocessing in analytics.

Uploaded by

Rohit Kumar
Copyright
© © All Rights Reserved

PAGE EDGa

DATE: /

Recommendation Systems

These are systems that make personalized purchases possible. For example, Amazon recommends that users look at related books, or at books bought by people who have the same taste as you.
Ex: Netflix

3) Voice assistants
Products like Amazon Alexa, Microsoft Cortana, Apple Siri and Google Assistant are examples of voice assistants.

[...] to locate and navigate the shortest path to reduce time.

Chapter 2
Understanding Data

All facts are data. In computer systems, bits encode facts present in numbers, text, images, audio and video. Data can be directly human interpretable, or it can be diffused data such as images or video that can be interpreted only by a computer.

Vstand
Buisnes Organi 2ahbns ae acCuulahng

eiher o
hyte (MB) à
kilabyte
A
appYoxMakeby looke, ohe qigatby
in
acts, tentph
omVeuos
dataas data
deals
q. ce acuna
cansclessited
hoted Con-den Conrersahonsh
Catibs B04res
the emss,
tem eayte s hepstoneerstad
&pead thT s
indaookme
ad
to
abigdata trorm +rasachon
vecerds
andstd
in appli anc
ezl mãny
Cn4oxoi 4echnical data lha
honaldate am veracik es_ard
The Rnge
hmesesos
one1oillion
era
bsytesJ lievbili
n Tbescaa AouTee
too' and
Mulimedal
date
cEa).st be
QataypEshuman relicdi
Volume1radi increaseton ke may daa
roah be as data
eaaktes
The Sys aspeets the
Such
dataThee ike
truthulness Scla the
seLS uschon
CampaSones
yelocl Yanied
Forno data Yaidik
anth and Loith em
Ve by
biqs t
PAGE
EDGa s o
Loa data each
ohee
numencathabutes
pxecison, ihan collechon.
DATE
budes
ath
qnume9ic
andhe gamam
ustuctac
Btorcd
datbae
data datset
meAsunemnents
ClasD
Tho s
table.
a
The lhke is
data oxM
entrackdrom data
data set a
valate actstothe as
q
Buch in
Stuchunee data orm da objechs La
the 4es Shuchuned
the manner g in
etruchredA set
hy hatne datai the Baka- aenged
Shuchuaod
indicates in gq
Actexmined data
q
2.11TPes
Toedaaquali
accUrac
and has
collechon
ae seni SaamahoL
Acutay available
DxgAnized
e s
"Tthey Tn Recod Ojecks
be
that
Ahat and Can
a
Ordered data
For example, customer purchasing patterns during festival time.

Sequence data: It is like sequential data, but it does not have time stamps.

Spatial data: It has attributes such as positions or areas.

Unstructured data: Unstructured data includes video, image and audio. It also includes textual documents, programs and blog data.

Semi-structured data: Semi-structured data are partially structured and partially unstructured. These include data like XML/JSON data, RSS feeds and hierarchical data.
2.1.2 Data Storage and Representation
Once the dataset is assembled, it must be stored in a structure that is suitable for data analysis. The goal of data storage management is to make data available for analysis.
There are different approaches to organize and manage data in storage files and systems, from flat files to data warehouses.
Flat files
Flat files are files in which data is stored in plain ASCII or EBCDIC format.
CSV files: CSV stands for comma-separated values. In these files the values are separated by commas. They are used by spreadsheet and database applications.
TSV files: TSV stands for tab-separated values. In these files the values are separated by tabs.
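The difference between CSV and TSV files can be seen in a short sketch. It uses Python's standard `csv` module; the sample names and marks are made up for illustration, and the helper `read_delimited` is a hypothetical name, not something from the notes.

```python
import csv
import io

# Hypothetical sample data: the same two records, once comma-separated
# and once tab-separated.
csv_text = "name,marks\nAsha,88\nRavi,92\n"
tsv_text = "name\tmarks\nAsha\t88\nRavi\t92\n"


def read_delimited(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


csv_rows = read_delimited(csv_text, ",")
tsv_rows = read_delimited(tsv_text, "\t")

print(csv_rows)
print(csv_rows == tsv_rows)  # same records, only the separator differs
```

Note that every field comes back as a string; a flat file carries no type information, so numeric columns have to be converted by the reader.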
Database System
It consists of database files and a database management system (DBMS). Database files contain original data and metadata. A DBMS aims to manage data and improve operator performance by including various tools like a database administrator, query processing and a transaction manager.
A relational database consists of sets of tables. The tables have rows and columns. The columns represent the attributes and the rows represent tuples. A tuple corresponds to either an object or a relationship between objects.

Different types of databases are listed below:

Transactional database: It is a collection of transactional records. Each record is a transaction. A transaction may have a time stamp, an identifier and a set of items, which may have links to other tables.

Time-series database: It stores time-related information like log files, where data is associated with a time stamp. This data represents sequences of values or events obtained over a period or a repeated time span.
ov a

Spatial databases contain spatial information in a raster or vector format. Raster formats are either bitmaps or pixel maps.

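World Wide Web (WWW): It provides a diverse, worldwide online information source.

XML (eXtensible Markup Language): It is a data format, both human and machine interpretable, that can be used to represent data.

Data stream: It is dynamic data that flows in and out of the observing environment.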


JSON (JavaScript Object Notation): It is another useful data interchange format that is often used for many machine learning algorithms.
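As a small illustration of JSON as an interchange format, the sketch below encodes a record to a JSON string and decodes it again with Python's standard `json` module. The record and its field names are hypothetical.

```python
import json

# Hypothetical record; the field names are illustrative only.
record = {"patient_id": 101, "age": 42, "symptoms": ["fever", "cough"]}

encoded = json.dumps(record)   # Python object -> JSON text
decoded = json.loads(encoded)  # JSON text -> Python object

print(encoded)
print(decoded == record)
```

Unlike a CSV file, the JSON text keeps the nesting (the list of symptoms) and the distinction between numbers and strings.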
2.2 BIG DATA ANALYTICS AND TYPES OF ANALYTICS
The primary aim of data analysis is to assist business organizations to take decisions.
Data analytics refers to the process of data collection, preprocessing and analysis.
There are four types of data analytics:
1. Descriptive analytics
2. Diagnostic analytics
3. Predictive analytics
4. Prescriptive analytics
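1) Descriptive Analytics: It is about describing the main features of the data. After the data collection is done, descriptive analytics deals with the collected data and quantifies it.
2) Diagnostic Analytics: It deals with the question "Why?". It aims to find the cause and effect of the events.
3) Predictive Analytics: It deals with the question "What will happen in future?". It uses the patterns in the data to predict the future.
4) Prescriptive Analytics: It helps in decision making by giving a set of actions.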
2.3 Big Data Analysis Framework
A big data framework is a layered architecture.
A 4-layer architecture has the following layers:
1. Data Connection Layer
2. Data Management Layer
3. Data Analytics Layer
4. Presentation Layer

Data Connection Layer: It has data ingestion mechanisms and data connectors. Data ingestion means taking raw data and importing it into appropriate data structures. It performs the [...] execution of queries, and read, write and data management tasks.
2-3-1Data
lolechbn
1ata
Otheirst
3)
beHLesult!
bias
eheuldbe Collechbna Aata Bgdaa
alganthms.
Machita"
earning YmachineSacha BaBa
preankhbn
eguireA
eahibn
othe and
knaalzdse
Thtexmãnagemeot algrihm
dasblay &uch as
machine
in in petahbn ollec
pcacosine
Analyhic
pretable. Ko tasK nmsdes
thee&ulk Leaaning
procesiYg to
data.
arailasle q Leanihg io machinoearnine
ashlboads
laeaiStahitical
tests,
laye
1hedata dataA a gTesubs n.
that uodenatand
ga
hnd (henng ovasks Cay I-
and Agon
LearXWalzahon
qocd Tt
Shutd thee clecpplicahbns
analytia thatand Th
Rauld datses inyolwes has and has.
qualig hnand
the
se e mechensms
Rald gonkm. onshnichbn" mana
andorskasla
aufioent ollaoihedaka eagineS
deta ae
be
ho yelds the
usckenly
steps and
R3.2ata
poocossinepre
3.

iboxmahoM data
ypos.
md
Mulimoal inosmahon dataSocialmediaL.catRnd
doctors
and
patient
Healthcane
Social databases,
biolegical mulimoal the
oinetechniques
datathe May Sluoittes,
raioys Aike2aperimeo
Scleoh Hent any
daa Qata
atacuYate
Outsieadata data deasuLeCan
Tncomplola&aa ata &hins
modes ic
wih
wih pepoco AacebcDKyaukubeand Media dataolo tal
incons data- kocal Systems domaibs as ant
missi auchas datatikeqeroic
i 0elasdoument
Copight
stent Ssine mediaplattors Its that
Th beClssnied
clos ifba
Yrlaes tent
iDcydes the
impDYCS use
biointormahcs.health
de cxkesire hugeqe mules
video,
audio kagenerabed
ka datathat ihsuraoCe
data colechanimages
and.rcula
ha ata 9s
Tstaqam /:
DATEPAGE
Aike open EDGa
ualiy databases and
inyolves tios
and /publie
b
butlies.ae Nevse Sata
imppe The
Igne MSsingdara
Ahe Anaysis
tupe hadakrhing Sata uNLSUal
thateddtesert
vauos S0units Astrtien
dplkcahang
Converson ebyecks dappcosseb eNNS
BaacoYngin
attrihatesandomasQachas
he eaultshe
aadatn
odteescleansn
saeros in
tes vauts, the
Aanden
data
incosstot
and
oakines.atknpt Can procaK5must
data
or
&meothen that
Corect aso Machine mean bet
ron comnponent
&
eaht aMeoo preprocose
detechbn
Called
olhedaka makinp
dhe ha umats ledtoinp
incobslstenCiCg novse to the and dokcleaDing
ives the
l chaiactC
ad aAd
cahile adattex omion
up algonthoma data o
the hare ieMeyel
YE
iderh
s act
o
lyin Ve t
n a
Binning
Binning is a technique for smoothing noisy numerical data. The sorted data values are distributed into equal-frequency bins, also called buckets, and the values in each bin are then smoothed using the other values of the same bin. The commonly used techniques are:
1) Smoothing by bin means: each value in a bin is replaced by the mean of the bin.
2) Smoothing by bin medians: each value in a bin is replaced by the median of the bin.
3) Smoothing by bin boundaries: each value in a bin is replaced by the closest bin boundary. The minimum and maximum values of a bin are its boundaries.

Example: Consider the set S = {12, 14, 19, 22, 24, 26, 28, 31, 34} with a bin size of 3.
Bin 1: (12, 14, 19)
Bin 2: (22, 24, 26)
Bin 3: (28, 31, 34)

Smoothing by means:
Mean of Bin 1 = (12 + 14 + 19)/3 = 15
Mean of Bin 2 = (22 + 24 + 26)/3 = 24
Mean of Bin 3 = (28 + 31 + 34)/3 = 31
Smoothed Bin 1: (15, 15, 15)
Smoothed Bin 2: (24, 24, 24)
Smoothed Bin 3: (31, 31, 31)

Smoothing by medians:
Median of Bin 1 = 14, Median of Bin 2 = 24, Median of Bin 3 = 31
Smoothed Bin 1: (14, 14, 14)
Smoothed Bin 2: (24, 24, 24)
Smoothed Bin 3: (31, 31, 31)

Smoothing by bin boundaries:
In Bin 1 the boundaries are 12 and 19. The value 14 is closer to 12, so the smoothed bin is (12, 12, 19).
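The binning example on the set S = {12, 14, 19, 22, 24, 26, 28, 31, 34} can be checked with a short sketch. The function names are hypothetical, and the tie rule in boundary smoothing (a value exactly halfway between the two boundaries goes to the lower one) is an assumption made here, not something the notes specify.

```python
from statistics import mean, median


def equal_frequency_bins(values, bin_size):
    """Sort the values and cut them into consecutive bins of bin_size items."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]


def smooth_by_means(bins):
    """Replace every value in a bin by the mean of that bin."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin by the median of that bin."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of the bin's min and max.

    Assumption: on a tie the lower boundary is chosen.
    """
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are sorted, so these are min and max
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


S = [12, 14, 19, 22, 24, 26, 28, 31, 34]
bins = equal_frequency_bins(S, 3)

print(bins)
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```

Smoothing by means or medians collapses each bin to a single repeated value, while boundary smoothing keeps the two extreme values of every bin.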
Data Integration and Data Transformation
Data integration involves routines that merge data from multiple sources into a single data source. This may lead to redundant data. The main goal of data integration is to detect and remove the redundancies that arise from integration.
Data transformation routines perform operations like normalization to improve the performance of the data mining algorithms. In normalization, the attribute values are scaled to fit in a range (say 0 to 1). Some of the normalization procedures are:
1) Min-max
2) z-Score

Min-Max Procedure
It is a normalization technique where each value V is normalized by taking its difference from the minimum value, dividing it by the range, and mapping it to a new range:

$$ V' = \frac{V - \min}{\max - \min} \times (\text{new max} - \text{new min}) + \text{new min} $$

Here max - min is the range, min and max are the minimum and maximum of the given data, and new min and new max are the limits of the new range.

Example: Consider the set V = {88, 90, 92, 94}. Apply the min-max procedure and map the values to the new range 0 to 1.
Here min = 88, max = 94, new min = 0 and new max = 1.
For 88: (88 - 88)/(94 - 88) = 0
For 90: (90 - 88)/(94 - 88) = 2/6 = 0.33
For 92: (92 - 88)/(94 - 88) = 4/6 = 0.67
For 94: (94 - 88)/(94 - 88) = 1
The resulting values are 0, 0.33, 0.67 and 1, so each value lies between 0 and 1.
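The min-max procedure on V = {88, 90, 92, 94} can be reproduced with the sketch below. The function name and the guard against a constant list are illustrative additions.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so that min maps to new_min and max to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so the range is zero and the formula is undefined.
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


V = [88, 90, 92, 94]
normalized = min_max_normalize(V)

print([round(x, 2) for x in normalized])
```

Because the mapping depends only on the minimum and maximum, a single extreme value in the data compresses all the other values into a narrow part of the new range.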

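z-Score Normalization
This procedure works by taking the difference between the value and the mean, and scaling this difference by the standard deviation:

$$ V' = \frac{V - \mu}{\sigma} $$

Here μ is the mean and σ is the standard deviation of the list V.

Example: Consider the list V = {10, 20, 30}. Convert the values to z-scores.
Mean μ = (10 + 20 + 30)/3 = 20
Sample standard deviation σ = 10
z-score of 10 = (10 - 20)/10 = -1
z-score of 20 = (20 - 20)/10 = 0
z-score of 30 = (30 - 20)/10 = 1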
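A minimal sketch of z-score normalization for the list V = {10, 20, 30} follows. It assumes the sample standard deviation (divisor N - 1), which gives 10 for this list; with the population standard deviation the scores would differ. The function name is hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (v - mean) / sample standard deviation for every value."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation (N - 1)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


V = [10, 20, 30]
scores = z_scores(V)

print(scores)
```

A z-score states how many standard deviations a value lies above or below the mean, which is why it is often used to flag outliers.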
Convet DATE: 1
andukV PAGE
EDGa
to
3 oth
Aealeg
dassets
the
-22
eann
eilher cdala the thotbint
Raho
machiae
data, dala
Zatien to
date
beaggregahbneale
% editos appied
rels
unchon Can be BataYV`uel
rha Nurnei
cal
data
desei o
dateaccbon henae q kands 3ntervca
9cduchsh NnddunansibnalK be
Gee educhon
S_ and can
-tha
dakset
r,o, z-Soxx STIC the
tLa 22 trc8ad i
that types
todetmine Qahyaedaka
dataordinal
Nonal
ta STATI
outie
Pessiblan
pocos
Reducbn-Ra Burnmawndesand
tesks
ralue data Aney data
alCategoi
usay& adta
4DESCRIpriVE
data as to or
Aatamihing
ard c
bD4dernt
t Buh DasCsphre
belos
used
o Aataset
he 2e Beips
Sata tobich
ut k
I4 S It
Bumpos
PAGE
EGa
a valuo Aerez
Dsease satstical aaa
e
olhechon
necoscstho Data
many data
DATE: as be a loo, alues
numeic
CeseS, a
pakx~,eYents, ye
csilh
cbect.
defined
Contain andodinel canot
nominal
ypes averas2 enough
be
beq ahb eon
may ever
ted kato
daka 3posihke any
o beas Cia t o
72c0ds dat a Negahy nel
objects k and
baides Rate be
daka
be.asko
squalitaiYe
quantitahre moke
the
assunoec caoq into Qualitaire hentID
bos
chaackeshc daa ode dad4eH
tyee -7hdesv
athobute ex,
sm Qualitiye
daka hese claSbihed Age not Tt ToBeYVal
e hould nel pq unol
Aodoe8
Cashe Noroi darare
Nectos o John - Satai-
An & CategoriGal
cts. obsexvahohS
bula
s
Nuogical Name
data numte na the
Data
popeks
be Catesorical paientI) mean
Adataset ypes has o ryees uhich
obie astnbukescan
athi Nominal, Nunmeoc
Nominal OTdina
a Sense Cnd
tspoin 3nteal
Pa
tient
date Aata.
Each Tus 1ike 1u0 or
2- 2.
0
1ecoded ange s onthe
Convesion
Fabeahes
and
oierenaa
nurnber aSe Ca4goy
one
a
ba ibtoem- basadotaset on nembey
da Hiahbn has
as
hoth ta iHed
o s
the data dara vaasleasocalled
the
da the dataet
classine point that
dotdu iclon be
kind theih
Ttcan used
homeaning inocatcg
Empleyeo decmaltying 3iVaiaa
daka,the
dataFoxma 2.Canti
nuaLs
dea
IACentigda
Batatb los Data
cessi
t- Conhnto
daka. Yorras
g dara
a e wey
incudos dara
g
teqes Coptinwaus A
Uniyrate
atio hung
1birwy
13pIVamta
varabe
Biraniala
Anothe ke Akcoe
Rahio in Cund
Qund
as
aun ining
tte he
hegpha
a e
paygs Canasohelp
explain
Caunsg
the
PAGE
EDGE 4 ad
salizahot duaplay
DATE:
/ categonythe
YoLiaslo tendenuy
cenha anesá
dae an
Varahion.
measunesduapesLon
Veables. ada
nomihal
ioaes 9 to
descmbe drsguenay
one daka.Gome used dsreke
data chait
data
and ONy asa uiYate
k The
to descophon
Kas
Analysk s hatoas,cRart
aalysscaaibuhons
Caed requen lndestand duahobuhbn
dakaset Ba9Arequeas
bate
JsRatavsualizakibn
Aata se data in
Banchaki-A
thecanuoiraLialR wsedchas 58
SuD[yaxala uniyaiate 4Le ho
daka piechasts Height
and
Weight 65 Heght
(In
cs)Cass
Students
tes
pa
Vesuablo holas
thatae aresed
aralysiá 52 160
Tt ba
A 5714 5429 5143 4657 4571 4286
lechaat

[Figure: pie chart with five slices, Category 1 to Category 5; one slice is labelled 20%]
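[Figure: histogram "Results of Maths Test"; x-axis: Mark (out of 50); y-axis: Frequency]

Histogram: It plays an important role in data mining for showing frequency distributions. The histogram conveys useful information like the nature of the data and its mode. Histograms can be used as charts to show the frequency and the skewness present in the data.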

[Figure: "Dotplot of Random Values"; x-axis: Random Values, 0 to 9]

Dot plots are less clustered as compared to bar charts.
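Central Tendency
The measures of central tendency are the mean, median and mode.

1. Mean: The arithmetic average (or mean) is a measure of central tendency that represents the centre of the dataset. Let $x_1, x_2, \dots, x_N$ be a set of N values or observations. Then the arithmetic mean is given by

$$ \bar{x} = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i $$

Other variants are the weighted mean and the geometric mean. The geometric mean of N values is the N-th root of their product:

$$ \text{Geometric mean} = \left( \prod_{i=1}^{N} x_i \right)^{1/N} $$

2. Median: The middle value in the distribution is called the median. If the number of items is odd, the median is the middle item. If it is even, the median is the average of the two items in the centre.

In the continuous case, the median is given by the formula

$$ \text{Median} = L_1 + \frac{\frac{N}{2} - F}{f} \times i $$

The median class is the class in which the (N/2)-th item is present. Here $i$ is the class interval of the median class, $L_1$ is the lower limit of the median class, $f$ is the frequency of the median class, and $F$ is the cumulative frequency of all the classes preceding the median class.

3. Mode: The mode is the value that occurs most frequently in the dataset. In other words, the value that has the highest frequency is called the mode. The mode is applicable only to discrete data. It is not applicable to continuous data, as there are no repeated values in continuous data.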
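Dispersion
The spread of a set of data around the central tendency is called dispersion. Dispersion is represented in various ways, such as range, variance and standard deviation.

Range: It is the difference between the maximum and the minimum of the values.

Standard deviation: It is the square root of the variance and measures how far the values are spread around the mean.

Quartiles and Inter Quartile Range: The inter quartile range (IQR) is the difference between the third quartile Q3 and the first quartile Q1.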
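The dispersion measures named above can be computed with Python's standard `statistics` module, as in this sketch. The data are hypothetical, the population forms of variance and standard deviation are used, and the quartiles follow the module's default ("exclusive") method; other textbooks and tools use slightly different quartile conventions.

```python
from statistics import pstdev, pvariance, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

value_range = max(data) - min(data)  # range = maximum - minimum
variance = pvariance(data)           # population variance
std_dev = pstdev(data)               # population standard deviation

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                        # inter quartile range

print(value_range, variance, std_dev, iqr)
```

The range reacts to a single extreme value, whereas the IQR only looks at the middle half of the data.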
Skewness
The measures of direction and degree of symmetry are called measures of third order.
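Skewness should be zero, as in an ideal normal distribution. Data can be positively skewed or negatively skewed.

Pearson 2 skewness coefficient:

$$ \text{Skewness} = \frac{3 \, (\bar{x} - \text{median})}{\sigma} $$

Kurtosis
Kurtosis also indicates the peaks of the data. The normal distribution has a bell-shaped curve with no long tails. Kurtosis is measured using the formula

$$ \text{Kurtosis} = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^4}{\sigma^4} $$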
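Both shape measures can be sketched in a few lines. The code assumes the Pearson second skewness coefficient 3 × (mean - median)/σ and the moment-based kurtosis (fourth central moment divided by σ to the fourth power), with σ taken as the population standard deviation; the two sample lists are hypothetical.

```python
from statistics import mean, median, pstdev


def pearson_second_skewness(values):
    """3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / pstdev(values)


def kurtosis(values):
    """Fourth central moment divided by the fourth power of the std deviation."""
    m = mean(values)
    sigma = pstdev(values)
    fourth_moment = sum((x - m) ** 4 for x in values) / len(values)
    return fourth_moment / sigma ** 4


symmetric = [1, 2, 3, 4, 5]      # mean equals median, so skewness is 0
right_skewed = [1, 2, 2, 3, 10]  # long right tail, so skewness is positive

print(pearson_second_skewness(symmetric))
print(pearson_second_skewness(right_skewed))
print(kurtosis(symmetric))
```

A positive coefficient means the mean is pulled above the median by a long right tail; a negative one means a long left tail.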

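Mean Absolute Deviation:

MAD is another dispersion measure and is robust to outliers. It takes the absolute deviation between each value and the mean:

$$ \text{MAD} = \frac{1}{N} \sum_{i=1}^{N} |x_i - \bar{x}| $$

Coefficient of variation (CV) is used to compare datasets with different units.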
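A minimal sketch of both measures follows. It assumes MAD is the average absolute deviation from the mean and uses the standard definition of the coefficient of variation, standard deviation divided by mean (often reported as a percentage); the data are hypothetical.

```python
from statistics import mean, pstdev


def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def coefficient_of_variation(values):
    """Standard deviation relative to the mean; it has no unit."""
    return pstdev(values) / mean(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

print(mean_absolute_deviation(data))
print(coefficient_of_variation(data))
```

Because the CV is a ratio, the unit of measurement cancels out, which is what allows a comparison between, say, heights in centimetres and weights in kilograms.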

The stem and leaf plot for the [...]

Stem   Leaf
5

data gaimsE thecYeHal hYmel
