Big Data..Unit-1 Notes
Uploaded by

Akash
UNIT 1
Introduction to Big Data

Syllabus
Introduction to Big Data: Types of digital data, history of Big Data innovation, introduction to Big Data platform, drivers for Big Data, Big Data architecture and characteristics, 5 Vs of Big Data, Big Data technology components, Big Data importance and applications, Big Data features - security, compliance, auditing and protection, Big Data privacy and ethics, Big Data analytics, challenges of conventional systems, intelligent data analysis, nature of data, analytic processes and tools, analysis vs reporting, modern data analytic tools.

Types of digital data

DIGITAL DATA
Digital data is information stored on a computer system as a series of 0's and 1's in a binary language.
Digital data jumps from one value to the next in a step-by-step sequence.
Example: Whenever we send an email, read a social media post, or take pictures with our digital camera, we are working with digital data.

Digital data can be classified into three forms:
a. Unstructured Data: Data which does not conform to a data model, or is not in a form that can be used easily by a computer program, is categorized as unstructured data. About 80-90% of an organization's data is in this format.
Example: Memos, chat rooms, PowerPoint presentations, images, videos, letters, research, white papers, the body of an email, etc.
b. Semi-Structured Data: Data which does not conform to a data model but has some structure is categorized as semi-structured data. However, it is not in a form that can be used easily by a computer program.
Example: Emails, XML, markup languages like HTML, etc. Metadata for this data is available but is not sufficient.
c. Structured Data: Data which is in an organized form (i.e. in rows and columns) and can be easily used by a computer program is categorized as structured data. Relationships exist between entities of data, such as classes and their objects.
Example: Data stored in databases.
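The three forms of digital data can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the sample records, the e-mail address, and the helper names (`read_structured`, `read_semi_structured`, `search_unstructured`) are hypothetical and do not come from the notes. It shows that structured data can be read directly by column name, semi-structured data can be navigated through its tags, and unstructured data offers nothing better than a text search.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: rows and columns with a fixed schema (hypothetical sample).
STRUCTURED = "id,name,city\n1,Asha,Delhi\n2,Ravi,Pune\n"

# Semi-structured: tags give some structure, but there is no fixed schema.
SEMI_STRUCTURED = (
    "<email><from>asha@example.com</from>"
    "<body>Meeting at 5 pm</body></email>"
)

# Unstructured: free text with no data model at all.
UNSTRUCTURED = "Hi team, please find the minutes of yesterday's meeting attached."


def read_structured(text):
    """Return the rows as a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text):
    """Return a dict of tag -> text for the children of the root element."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def search_unstructured(text, keyword):
    """With no schema to rely on, fall back to a case-insensitive keyword search."""
    return keyword.lower() in text.lower()


rows = read_structured(STRUCTURED)
fields = read_semi_structured(SEMI_STRUCTURED)
print(rows[0]["city"])
print(fields["from"])
print(search_unstructured(UNSTRUCTURED, "minutes"))
```

The amount of code needed to get at a single value grows as the structure weakens, which is one reason unstructured data is harder for a program to use.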
Introduction to Big Data platform

A big data platform is a type of IT solution that combines the features and capabilities of several big data applications and utilities in a single solution, which is then used for managing as well as analyzing Big Data.
It focuses on providing its users with efficient analytics tools for massive datasets.
The users of such platforms can custom build applications according to their use case, such as calculating customer loyalty (an e-commerce use case), and so on.
Goal: The main goal of a Big Data Platform is to achieve Scalability, Availability, Performance, and Security.
Example: Some of the most commonly used Big Data Platforms are:
• Hadoop Delta Lake Migration Platform
• Data Catalog Platform
• Data Ingestion Platform
• IoT Analytics Platform

Big Data Architecture:
Big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems.
[Figure: big data architecture diagram; recoverable labels: Analytical Data Store, Real-time Message Ingestion]
Big data architectures include the following components:
Data sources: All big data solutions start with one or more data sources.
Example:
• Application data stores, such as relational databases.
• Static files produced by applications, such as web server log files.
• Real-time data sources, such as IoT devices.
Data storage: Data for batch processing operations is stored in a distributed file store that can hold high volumes of large files in various formats (also called a data lake).
Example: Azure Data Lake Store or blob containers in Azure Storage.
Batch processing: Because the data sets are so large, a big data solution must process data files using long-running batch jobs to filter, aggregate, and prepare the data for analysis.
Real-time message ingestion: If a solution includes real-time sources, the architecture must include a way to capture and store real-time messages for stream processing.
Stream processing: After capturing real-time messages, the solution must process them by filtering, aggregating, and preparing the data for analysis. The processed stream data is then written to an output sink. We can use open-source Apache streaming technologies like Storm and Spark Streaming for this.
Analytical data store: Many big data solutions prepare data for analysis and then serve the processed data in a structured format that can be queried using analytical tools.
Example: Azure Synapse Analytics provides a managed service for large-scale, cloud-based data warehousing.
Analysis and reporting: The goal of most big data solutions is to provide insights into the data through analysis and reporting. To empower users to analyze the data, the architecture may include a data modelling layer. Analysis and reporting can also take the form of interactive data exploration by data scientists or data analysts.
Orchestration: Most big data solutions consist of repeated data processing operations that transform source data, move data between multiple sources and sinks, load the processed data into an analytical data store, or push the results straight to a report. To automate these workflows, we can use an orchestration technology such as Azure Data Factory.

Drivers for Big Data

Big Data has quickly risen to become one of the most desired topics in the industry.
The main business drivers for such rising demand for Big Data Analytics are:
1. The digitization of society
2. The drop in technology costs
3. Connectivity through cloud computing
4. Increased knowledge about data science
5. Social media applications
6. The rise of the Internet of Things (IoT)
Example: A number of companies that have Big Data at the core of their strategy, like Apple, Amazon, Facebook and Netflix, have become very successful at the beginning of the 21st century.

Big Data Characteristics:
Big data can be described by the following characteristics:
• Volume
• Variety
• Velocity

5 Vs of Big Data, Big Data technology components

5 Vs of Big Data:
1. Volume:
Big Data is a vast "volume" of data generated daily from many sources, such as business processes, machines, social media platforms, networks, human interactions, and so on.
Example: Facebook generates approximately a billion messages, records 4.5 billion clicks of the "Like" button, and receives more than 350 million new posts each day.
Big data technologies can handle large amounts of data.
2. Variety:
Big Data can be structured, unstructured, or semi-structured, and is collected from different sources. In the past, data was only collected from databases and sheets, but these days data comes in an array of forms, i.e. PDFs, emails, audio, social media posts, photos, videos, etc.
3. Velocity:
Velocity refers to the speed with which data is generated in real time. Velocity plays an important role compared to the others. It covers the speed of incoming data sets, the rate of change, and activity bursts. The primary aspect of Big Data is to provide demanded data rapidly.
Example of data that is generated with high velocity: Twitter messages or Facebook posts.
4. Veracity:
Veracity refers to the quality of the data that is being analyzed. It is the ability to handle and manage data efficiently.
Example: Facebook posts with hashtags.
5. Value:
Value is an essential characteristic of big data. It is not just any data that we process or store that matters; it is valuable and reliable data that we store, process and analyse.
Big Data importance and applications

Big Data Importance:
The importance of Big Data doesn't revolve around the amount of data a company has, but lies in how the company utilizes the gathered data.
Every company uses its collected data in its own way. The more effectively a company uses its data, the more rapidly it grows.
By analysing big data pools effectively, companies can achieve the following:
Cost Savings:
• Some Big Data tools like Hadoop can bring cost advantages to a business when large amounts of data are to be stored.
• These tools help in identifying more efficient ways of doing business.
Time Reductions:
• The high speed of tools like Hadoop and in-memory analytics can easily identify new sources of data, which helps businesses analyze data immediately.
• This helps us to make quick decisions based on the learnings.
Understand the market conditions:
• By analyzing big data we can get a better understanding of current market conditions.
• For example: By analyzing customers' purchasing behaviours, a company can find out the products that are sold the most and produce products according to this trend. By this, it can get ahead of its competitors.
Control online reputation:
• Big data tools can do sentiment analysis.
• Therefore, you can get feedback about who is saying what about your company.
• If you want to monitor and improve the online presence of your business, then big data tools can help in all this.
Using Big Data Analytics to Boost Customer Acquisition and Retention:
• The customer is the most important asset any business depends on.
• No single business can claim success without first having to establish a solid customer base.
• If a business is slow to learn what customers are looking for, then it is very likely to deliver poor quality products.
• The use of big data allows businesses to observe various customer-related patterns and trends.
Using Big Data Analytics to Solve Advertisers' Problems and Offer Marketing Insights:
• Big data analytics can help change all business operations.
• This includes the ability to match customer expectations, change the company's product line, and ensure that marketing campaigns are powerful.

Big Data Applications:
In today's world big data has several applications; some of them are listed below:
Tracking Customer Spending Habit, Shopping Behavior:
In big retail stores, the management team has to keep data on customers' spending habits, shopping behaviour, most liked products, and which products are being searched for or sold the most. Based on that data, the production/collection rate of each product gets fixed.
Recommendation:
By tracking customer spending habits and shopping behaviour, big retail stores provide recommendations to the customers.
Smart Traffic System:
Data about the traffic conditions on different roads is collected through cameras and GPS devices placed in vehicles. All such data is analyzed, and jam-free or less congested routes are recommended. One more benefit is that fuel consumption can be reduced.
Secure Air Traffic System:
Sensors are present at various places on an aircraft. These sensors capture data like the speed of the flight, moisture, temperature, and other environmental conditions. Based on analysis of such data, environmental parameters within the flight are set up and varied. By analyzing the flight's machine-generated data, it can be estimated how long the machine can operate flawlessly and when it should be replaced or repaired.
Auto Driving Car:
Cameras and sensors are placed at various spots of the car to gather data like the size of surrounding cars, obstacles, the distance from those, etc. This data is analyzed, then various calculations are carried out. These calculations help the car to take action automatically.
Virtual Personal Assistant Tool:
Big data analysis helps virtual personal assistant tools like Siri, Cortana and Google Assistant to provide answers to the various questions asked by users. The tool tracks the location of the user, their local time, the season, other data related to the question asked, etc. Analyzing all such data provides an answer.
Example: Suppose a user asks "Do I need to take an umbrella?" The tool collects data like the location of the user and the season and weather conditions at that location, then analyzes this data to conclude whether there is a chance of rain, and provides the answer.
IoT:
Manufacturing companies install IoT sensors into machines to collect operational data. By analyzing such data, it can be predicted how long a machine will work without any problem and when it will require repair. Thus, the cost of replacing the whole machine can be saved.
Energy Sector:
Education Sector:
Organizations that conduct online educational courses utilize big data to search for candidates interested in a course. If someone searches for a YouTube tutorial video on a subject, then an online or offline course provider for that subject sends that person an online ad about their course.
Media and Entertainment Sector:
Media and entertainment service providers like Netflix, Amazon Prime and Spotify analyze data collected from their users. Data such as what type of video or music users are watching or listening to most, how long users are spending on the site, etc. is collected and analyzed to set the next business strategy.

Big Data Technology Components:
Components of a Big Data Ecosystem
1. Ingestion:
The ingestion layer is the very first step of pulling in raw data.
It comes from internal sources, relational databases, non-relational databases, social media, emails, phone calls, etc.
There are two kinds of ingestion:
• Batch, in which large groups of data are gathered and delivered together.
• Streaming, which is a continuous flow of data. This is necessary for real-time data analytics.
2. Storage:
Storage is where the converted data is stored in a data lake or warehouse and eventually processed.
The data lake/warehouse is the most essential component of a big data ecosystem.
It needs to contain only thorough, relevant data to make insights as valuable as possible.
It must be efficient, with as little redundancy as possible, to allow for quicker processing.
3. Analysis:
In the analysis layer, data gets passed through several tools, shaping it into actionable insights.
There are four types of analytics on big data:
• Diagnostic: Explains why a problem is happening.
• Descriptive: Describes the current state of a business through historical data.
• Predictive: Projects future results based on historical data.
• Prescriptive: Takes predictive analytics a step further by projecting best future efforts.
4. Consumption:
The final big data component is presenting the information in a format digestible to the end user.
This can be in the form of tables, advanced visualizations and even single numbers if requested.
The most important thing in this layer is making sure the intent and meaning of the output is understandable.

Big Data features - security, compliance, auditing and protection

BIG DATA SECURITY:
1. Big data security is the collective term for all the measures and tools used to guard both the data and analytics processes from attacks, theft, or other malicious activities that could harm or negatively affect them.
2. For companies that operate on the cloud, big data security challenges are multi-faceted.
3. When customers give their personal information to companies, they trust them with personal data which can be used against them if it falls into the wrong hands.

BIG DATA COMPLIANCE:
1. Data compliance is the practice of ensuring that sensitive data is organized and managed in such a way as to enable organizations to meet enterprise business rules along with legal and governmental regulations.
2. Organizations that don't implement these regulations can be fined up to tens of millions of dollars and even receive a 20-year penalty.

BIG DATA AUDITING:
1. Auditors can use big data to expand the scope of their projects and draw comparisons over larger populations of data.
2. Big data also helps financial auditors to streamline the reporting process and detect fraud.
3. These professionals can identify business risks in time and conduct more relevant and accurate audits.

BIG DATA PROTECTION:
Big data security is the collective term for all the measures and tools used to guard both the data and analytics processes from attacks, theft, or other malicious activities that could harm or negatively affect them.
1. Data privacy is there to protect customers, but also companies and their employees, from security breaches.
2. When customers give their personal information to companies, they trust them with personal data which can be used against them if it falls into the wrong hands.
3. Organizations that don't implement these regulations can be fined up to tens of millions of dollars and even receive a 20-year penalty.

Big Data privacy and ethics

• Most data is collected through surveys, interviews, or observation.
• When customers give their personal information to companies, they trust them with personal data which can be used against them if it falls into the wrong hands.
• That's why data privacy is there to protect those customers, but also companies and their employees, from security breaches.
• One of the main reasons why companies comply with data privacy regulations is to avoid fines.
• Organizations that don't implement these regulations can be fined up to tens of millions of dollars and even receive a 20-year penalty.
• Reasons why we need to take data privacy seriously are:
  • Data breaches could hurt your business.
  • It protects your customers' privacy.
  • It maintains and improves brand value.
  • It gives you a competitive advantage.
  • It supports the code of ethics.

Challenges of conventional systems

• Big data is the storage and analysis of large data sets.
• These are complex data sets that can be either structured or unstructured.
• They are so large that it is not possible to work on them with traditional analytical tools.
• One of the major challenges of conventional systems was the uncertainty of the data management landscape.
• Big data is continuously expanding, and new companies and technologies keep appearing. The challenge for an organization is to find what works best for it without the introduction of new risks.
• These days, organizations are realising the value they get out of big data analytics, and hence they are deploying big data tools and processes to bring more efficiency to their work environment.

Big Data Analytics:
Big data analytics is a complex process of examining big data to uncover information such as hidden patterns, correlations, market trends and customer preferences.
This can help organizations make informed business decisions.
Data analytics technologies and techniques give organizations a way to analyze data sets and gather new information.
Big Data Analytics enables enterprises to analyze their data in full context quickly, and some tools also offer real-time analysis.

Importance of Big Data Analytics:
• Organizations use big data analytics systems and software to make data-driven decisions that can improve business-related outcomes.
• The benefits include more effective marketing, new revenue opportunities, customer personalization and improved operational efficiency.
• With an effective strategy, these benefits can provide competitive advantages over rivals.
• Big Data Analytics tools also help businesses save time and money and aid in gaining insights to inform data-driven decisions.
• Big Data Analytics enables enterprises to narrow their Big Data to the most relevant information and analyze it to inform critical business decisions.

Intelligent data analysis, nature of data

Intelligent Data Analysis (IDA) is one of the most important approaches in the field of data mining.
Based on the basic principles of IDA and the features of the datasets that IDA handles, the development of IDA is briefly summarized from three aspects:
• Algorithm principle
• The scale
• Type of the dataset
Intelligent Data Analysis (IDA) is one of the major issues in artificial intelligence and information.
Intelligent data analysis discloses hidden facts that were not previously known and provides potentially important information or facts from large quantities of data.
It also helps in making decisions.
Based on machine learning, artificial intelligence, recognition of patterns and records, and visualization technology, IDA helps to obtain useful information, necessary data and interesting models from the large amount of data available online in order to make the right choices.
IDA includes three stages:
(1) Preparation of data
(2) Data mining
(3) Data validation and explanation

Modern data analytic tools

• These days, organizations are realising the value they get out of big data analytics, and hence they are deploying big data tools and processes to bring more efficiency to their work environment.
• Many big data tools and processes are being utilised by companies these days in the process of discovering insights and supporting decision making.
Data analytics tools are types of application software that retrieve data from one or more systems and combine it in a repository, such as a data warehouse, to be reviewed and analysed.
• Most organizations use more than one analytics tool, including spreadsheets with statistical functions, statistical software packages, data mining tools, and predictive modelling tools.
• Together, these data analytics tools give the organization a complete overview of the company, providing key insights and understanding of the market/business so smarter decisions may be made.
• Data analytics tools not only report the results of the data but also explain why the results occurred, to help identify weaknesses, fix potential problem areas, alert decision-makers to unforeseen events and even forecast future results based on decisions the company might make.
Below is a list of some data analytics tools:
• R Programming (leading analytics tool in the industry)
• Python
• Excel
• SAS
• Apache Spark
• Splunk
• RapidMiner
• Tableau Public
• KNIME

Analysis vs reporting

Reporting:
• Once data is collected, it is organized using tools such as graphs and tables.
• The process of organizing this data is called reporting.
• Reporting translates raw data into information.
• Reporting helps companies to monitor their online business and be alerted when data falls outside of expected ranges.
• Good reporting should raise questions about the business from its end users.
Analysis:
• Analytics is the process of taking the organized data and analyzing it.
• This helps users to gain valuable insights on how businesses can improve their performance.
• Analysis transforms data and information into insights.
• The goal of the analysis is to answer questions by interpreting the data at a deeper level and providing actionable recommendations.
Conclusion:
Reporting shows us "what is happening".
Analysis focuses on explaining "why it is happening" and "what we can do about it".
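The difference between reporting and analysis can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the order records, the expected range of 150 to 250 orders per day, and the function names `report`, `alerts` and `explain` are hypothetical. The reporting part organizes raw records into a per-day table and flags days outside the expected range ("what is happening"); the analysis part compares channels against a baseline day to suggest where the change came from ("why it is happening").

```python
from collections import defaultdict

# Hypothetical raw records: (day, sales channel, number of orders).
ORDERS = [
    ("Mon", "web", 120), ("Mon", "app", 80),
    ("Tue", "web", 118), ("Tue", "app", 82),
    ("Wed", "web", 121), ("Wed", "app", 20),
]

# Assumed acceptable range for total orders per day.
EXPECTED_RANGE = (150, 250)


def report(records):
    """Reporting: organize raw records into total orders per day."""
    totals = defaultdict(int)
    for day, _channel, orders in records:
        totals[day] += orders
    return dict(totals)


def alerts(totals, expected=EXPECTED_RANGE):
    """Reporting: list the days whose total falls outside the expected range."""
    low, high = expected
    return [day for day, total in totals.items() if not low <= total <= high]


def explain(records, day, baseline_day):
    """Analysis: per-channel change between a flagged day and a baseline day."""
    def by_channel(target):
        return {ch: n for d, ch, n in records if d == target}

    current = by_channel(day)
    baseline = by_channel(baseline_day)
    return {ch: current.get(ch, 0) - baseline.get(ch, 0) for ch in baseline}


totals = report(ORDERS)
flagged = alerts(totals)
changes = explain(ORDERS, "Wed", "Tue")
print(totals)
print(flagged)
print(changes)
```

In this sample the report flags Wednesday, and the per-channel comparison points at the app channel as the source of the drop, which is the kind of question good reporting is meant to raise and analysis is meant to answer.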
