Big Data
Author-Aditi Pawar
FYCO-1, CO Department
Bhausaheb Vartak Polytechnic
Abstract-Big data is a broad term for data sets so large or complex
that traditional data processing applications are inadequate.
Challenges include analysis, capture, data curation, search, sharing,
storage, transfer, visualization, and information privacy. The term
often refers simply to the use of predictive analytics or certain
other advanced methods to extract value from data, and seldom to a
particular size of data set. Accuracy in big data may lead to more
confident decision making, and better decisions can mean greater
operational efficiency, cost reductions and reduced risk.

Keywords-Big data, storage, sharing.

I. INTRODUCTION
Big Data is a collection of data that is huge in volume, yet growing
exponentially with time. Its size and complexity are so great that no
traditional data management tool can store or process it efficiently.
In other words, big data is still data, only at a far larger scale.

Big data analysis helps businesses make better decisions while
improving operational efficiency and reducing risk. By using big data
analytics tools, businesses worldwide are improving their digital
marketing strategies by leveraging data from social platforms while
reducing risk.

ARCHITECTURE:
In 2004, Google published a paper on a process called MapReduce, which
is built on a parallel processing architecture. The MapReduce
framework provides a parallel processing model and an associated
implementation to process huge amounts of data. With MapReduce,
queries are split and distributed across parallel nodes and processed
in parallel (the Map step). The results are then gathered and
delivered (the Reduce step). The framework was very successful, so
others wanted to replicate the algorithm. Therefore, an implementation
of the MapReduce framework was adopted by an Apache open source
project named Hadoop.

CHARACTERISTICS
Big data can be described by the following characteristics:

Volume - The quantity of data that is generated is very important in
this context. It is the size of the data which determines the value
and potential of the data under consideration and whether it can
actually be considered Big Data or not. The name ‘Big Data’ itself
contains a term related to size, hence this characteristic.

Variety - The next aspect of Big Data is its variety. Data analysts
need to know which category the data belongs to. Knowing this helps
the people who work closely with the data to use it effectively to
their advantage, which upholds the importance of Big Data.

Velocity - In this context, ‘velocity’ refers to how fast data is
generated and processed to meet the demands and challenges that lie
ahead on the path of growth and development.

Variability - This factor can be a problem for those who analyse the
data. It refers to the inconsistency the data can show at times, which
hampers the ability to handle and manage the data effectively.

Veracity - The quality of the data being captured can vary greatly.
Accuracy of analysis depends on the veracity of the source data.

Complexity - Data management can become a very complex process,
especially when large volumes of data come from multiple sources.
These data need to be linked, connected and correlated in order to
grasp the information they are supposed to convey. This situation is
therefore termed the ‘complexity’ of Big Data.
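
To make the Map and Reduce steps described under ARCHITECTURE more concrete, the following is a minimal single-machine sketch of the idea, using word counting as the task. It is not Google's or Hadoop's implementation: the function names (map_step, shuffle, reduce_step, word_count) and the sample input are assumptions chosen for illustration, and the "nodes" are simulated by ordinary function calls.

```python
from collections import defaultdict


def map_step(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group intermediate values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different node.
    # Here the map calls simply run one after another (an assumption
    # made to keep the sketch self-contained).
    mapped = [map_step(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_step(key, values) for key, values in grouped.items())


# Hypothetical input, already split into two chunks.
chunks = ["big data is big", "data is growing"]
counts = word_count(chunks)
print(counts)
```

Because each call to map_step depends only on its own chunk, the map calls could run on separate machines without coordination. This independence is what lets the approach scale to very large inputs.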
APPLICATIONS
MANUFACTURING
Based on the TCS 2013 Global Trend Study, improvements in supply
planning and product quality provide the greatest benefit of big data
for manufacturing. Big data provides an infrastructure for
transparency in the manufacturing industry, which is the ability to
unravel uncertainties such as inconsistent component performance and
availability. Predictive manufacturing, as an applicable approach
toward near-zero downtime and transparency, requires vast amounts of
data and advanced prediction tools to systematically process data into
useful information. A conceptual framework of predictive manufacturing
begins with data acquisition, where different types of sensory data
are available to acquire, such as acoustics, vibration, pressure,
current, voltage and controller data. This vast amount of sensory
data, together with historical data, constitutes the big data in
manufacturing. The generated big data acts as the input to predictive
tools and preventive strategies such as Prognostics and Health
Management (PHM).

Big data has increased the demand for information management
specialists to the point that Software AG, Oracle Corporation, IBM,
Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on
software firms specializing in data management and analytics. In 2010,
this industry was worth more than $100 billion and was growing at
almost 10 percent a year, about twice as fast as the software business
as a whole.

Developed economies make increasing use of data-intensive
technologies. There are 4.6 billion mobile-phone subscriptions
worldwide and between 1 billion and 2 billion people accessing the
internet. Between 1990 and 2005, more than 1 billion people worldwide
entered the middle class, which means that more people with rising
incomes will become more literate, which in turn leads to information
growth. The world's effective capacity to exchange information through
telecommunication networks was 281 petabytes in 1986, 471 petabytes in
1993, 2.2 exabytes in 2000 and 65 exabytes in 2007, and it was
predicted that the amount of traffic flowing over the internet would
reach 667 exabytes annually by 2014. It is estimated that one third of
the globally stored information is in the form of alphanumeric text
and still image data, which is the format most useful for most big
data applications. This also shows the potential of yet unused data
(i.e. in the form of video and audio content).
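
The telecommunication capacity figures quoted above can be turned into an average yearly growth rate with a short calculation. The sketch below is illustrative only: the function name compound_annual_growth_rate and the conversion of 1 exabyte to 1000 petabytes (decimal units) are assumptions, and the input values are the ones stated in the text.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Average yearly growth rate that turns start_value into end_value."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Assumption: decimal units, so 1 exabyte = 1000 petabytes.
PETABYTES_PER_EXABYTE = 1000

# Capacity figures from the text, all expressed in petabytes.
capacity_pb = {
    1986: 281,
    1993: 471,
    2000: 2.2 * PETABYTES_PER_EXABYTE,
    2007: 65 * PETABYTES_PER_EXABYTE,
}

rate = compound_annual_growth_rate(
    capacity_pb[1986], capacity_pb[2007], 2007 - 1986
)
print(f"Average growth per year, 1986-2007: {rate:.1%}")
```

Under these assumptions the capacity grew by just under 30 percent a year on average between 1986 and 2007.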
ACKNOWLEDGEMENT
A. References

1. IBM Big Data & Analytics Hub: https://www.ibmbigdatahub.com/
2. Hortonworks Data Platform:
   https://hortonworks.com/products/data-platforms/hdp/
3. Cloudera Data Platform:
   https://www.cloudera.com/products/cloudera-data-platform.html
4. MapR Data Platform: https://mapr.com/products/mapr-platform/
5. Apache Hadoop: https://hadoop.apache.org/
6. Data Science Central: https://www.datasciencecentral.com/
7. KDnuggets: https://www.kdnuggets.com/
8. Big Data Made Simple: https://bigdata-madesimple.com/
9. Datafloq: https://datafloq.com/
10. Big Data University: https://bigdatauniversity.com/

I. CONCLUSIONS
Big Data is a term that refers to a massive amount of structured
and unstructured data that is generated at an unprecedented scale.
This data is so vast that traditional methods of data processing and
analysis are no longer sufficient. Big Data has become increasingly
important for businesses looking to improve their operations,
reduce risk, and make informed decisions. It provides valuable
insights that can help companies identify trends and patterns that
were previously hidden.

The characteristics of Big Data can be summarized as follows:
Volume, Variety, Velocity, Variability, Veracity, and Complexity.
Volume refers to the vast amount of data that is generated and
collected every day. Variety refers to the different types of data
that are available, including structured, unstructured, and
semi-structured data. Velocity refers to the speed at which data is
generated and processed. Variability refers to the inconsistencies
in the data that can be challenging to manage. Veracity refers to
the accuracy and trustworthiness of the data. Complexity refers to
the challenge of managing and processing large volumes of data
that come from multiple sources.

In summary, Big Data is a crucial aspect of modern businesses. It
provides valuable insights and opportunities for companies to
improve their operations, reduce risk, and make informed
decisions. Understanding the characteristics of Big Data is
essential for data analysts and businesses that want to leverage the
power of data.

The availability of Big Data, low-cost commodity hardware,
and new information management and analytic software have
produced a unique moment in the history of data analysis. The
convergence of these trends means that we have the capabilities
required to analyze astonishing data sets quickly and cost-
effectively for the first time in history.
These capabilities are neither theoretical nor trivial. They
represent a genuine leap forward and a clear opportunity to
realize enormous gains in terms of efficiency, productivity,
revenue, and profitability. The Age of Big Data is here, and these
are truly revolutionary times if both business and technology
professionals continue to work together and deliver on the
promise.