5.1 DATA AND INFORMATION
CREATED BY: MOHAMMAD NABEEL ARSHAD
5.1 DATA AND INFORMATION
Data:
• Data is a collection of raw facts and figures. It is called raw because it has not yet been
processed.
• We collect data from different sources. After collection, the data is entered into a computer for
processing. Data may be a collection of words, numbers, pictures, sounds, etc.
• Data is unprocessed facts and figures without any added interpretation or analysis.
"The price of crude oil is $80"
• For example: Survey Data
• Companies collect data through surveys to learn what people think of their products.
The company's survey staff go from house to house and interview people about whether they use,
like, or dislike its products. They also collect data about competing companies in a particular
area.
Information:
• Information is organized or classified data, which has some meaningful values for the receiver.
Information is the processed data on which decisions and actions are based.
• For a decision to be meaningful, the processed data must have the following
characteristics:
• Timeliness: Information should be available when required.
• Accuracy: Information should be accurate.
• Completeness: Information should be complete.
• Information is data that has been interpreted so that it has meaning for the user.
"The price of crude oil has risen from $70 to $80 per barrel" gives meaning to the
data and so is said to be information to someone who tracks oil prices.
• For example: Student Address Labels
• Stored student data can be used to print address labels. These labels are used to send
notices and other information to students at their home addresses.
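The step from data to information can be sketched in a few lines of Python. This is only an illustration built on the crude oil example above: the list `raw_prices` and the function `summarize_price_change` are hypothetical names chosen for this sketch, and the two prices are the ones quoted in the text.

```python
# Raw data: crude oil prices in dollars per barrel, oldest first.
# On their own, these numbers carry no interpretation.
raw_prices = [70, 80]


def summarize_price_change(prices):
    """Process raw price readings into a statement a reader can act on."""
    first, last = prices[0], prices[-1]
    if last > first:
        direction = "risen"
    elif last < first:
        direction = "fallen"
    else:
        return f"The price of crude oil is unchanged at ${last} per barrel"
    return f"The price of crude oil has {direction} from ${first} to ${last} per barrel"


# Information: the same data after processing, now meaningful to the user.
information = summarize_price_change(raw_prices)
print(information)
```

The numbers in `raw_prices` are the data; the sentence produced by the function is the information, because it tells someone who tracks oil prices what has changed.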
5.1.1 DIFFERENCE BETWEEN DATA AND INFORMATION
Differences between data and information:
• Data is used as input for the computer system. Information is the output of processing data.
• Data is unprocessed facts and figures. Information is processed data.
• Data doesn’t depend on Information. Information depends on data.
• Data is not specific. Information is specific.
• Data is a single unit. A group of data which carries news and meaning is called Information.
• Data doesn’t carry a meaning. Information must carry a logical meaning.
• Data is the raw material. Information is the product.
5.1.2 STRUCTURED AND UNSTRUCTURED DATA
• Structured data is most often categorized as quantitative data, and it's the type of data most
of us are used to working with. Think of data that fits neatly within fixed fields and columns
in relational databases and spreadsheets.
• Examples of structured data include names, dates, addresses, credit card numbers, stock
information, geolocation, and more.
• Structured data has the advantage of being easily entered, stored, queried and analyzed. At one
time, because of the high cost and performance limitations of storage, memory and processing,
relational databases and spreadsheets using structured data were the only way to effectively
manage data.
• Structured data is highly organized and easily processed by machines. Those working
within relational databases can input, search, and manipulate structured data relatively quickly.
This is the most attractive feature of structured data.
• The language used for managing structured data is Structured Query Language, also known
as SQL. It was developed at IBM in the early 1970s and is particularly useful for handling
relationships in databases.
• If it sounds confusing, the picture below should help visualize how structured data relates to
each other within a database.
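As a rough illustration of structured data and SQL, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The tables `students` and `marks`, their columns, and the sample rows are all invented for this example; they only show how data in fixed fields can be related and queried.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: every row must fit these fixed fields and columns.
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE marks (
        student_id INTEGER REFERENCES students(id),
        subject TEXT,
        score INTEGER
    );
    INSERT INTO students VALUES (1, 'Ali', 'Lahore'), (2, 'Sara', 'Karachi');
    INSERT INTO marks VALUES (1, 'Math', 85), (1, 'Physics', 78), (2, 'Math', 92);
""")

# An SQL query that relates the two tables through the student id
# and computes each student's average score.
rows = conn.execute("""
    SELECT s.name, AVG(m.score)
    FROM students AS s
    JOIN marks AS m ON m.student_id = s.id
    GROUP BY s.id
    ORDER BY s.name
""").fetchall()

print(rows)
conn.close()
```

Because each value sits in a known column, the query can join and aggregate the data without having to interpret free text.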
UNSTRUCTURED DATA
• The phrase unstructured data usually refers to information that doesn't reside in a traditional
row-column database. As you might expect, it's the opposite of structured data — the data
stored in fields in a database.
• Unstructured data is the most abundant kind of data. It is so plentiful because it can be
almost anything: media, imaging, audio, sensor data, text data, and much more. Unstructured simply
means that the datasets (typically large collections of files) are not stored in a structured
database format. Unstructured data has an internal structure, but it is not predefined through
data models. It may be human generated or machine generated, in a textual or a non-textual
format.
Examples of Unstructured Data
• Unstructured data files often include text and multimedia content. Examples include e-mail
messages, word processing documents, videos, photos, audio files, presentations, webpages and
many other kinds of business documents. Note that while these sorts of files may have an
internal structure, they are still considered "unstructured" because the data they contain
doesn't fit neatly in a database.
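The point about internal structure can be illustrated with an e-mail message. The sketch below uses Python's standard `email` module; the message text, the addresses, and the keyword check are made up for this example. The headers can be read like fields, but the body is free text that does not fit a database column without further processing.

```python
from email import message_from_string

# Hypothetical e-mail: headers follow a known layout, the body is free text.
raw_email = """\
From: customer@example.com
To: support@example.com
Subject: Late delivery

Hi, my order arrived three days late and the box was damaged.
"""

msg = message_from_string(raw_email)

# The internal structure (headers) is easy to read out.
sender = msg["From"]
subject = msg["Subject"]

# The body has no predefined fields; a simple keyword search is
# about all that can be done without further analysis.
body = msg.get_payload()
mentions_damage = "damaged" in body.lower()

print(sender, "|", subject, "|", mentions_damage)
```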
Here are some examples of machine-generated unstructured data:
• Satellite images: This includes weather data or the data that the government captures in its
satellite surveillance imagery. Just think about Google Earth, and you get the picture.
• Scientific data: This includes seismic imagery, atmospheric data, and high energy physics.
• Photographs and video: This includes security, surveillance, and traffic video.
• Radar or sonar data: This includes vehicular, meteorological, and oceanographic seismic profiles.
The following list shows a few examples of human-generated unstructured data:
• Text internal to your company: Think of all the text within documents, logs, survey results, and
e-mails. Enterprise information represents a large percentage of the text information in the
world today.
• Social media data: This data is generated from the social media platforms such as YouTube,
Facebook, Twitter, LinkedIn, and Flickr.
• Mobile data: This includes data such as text messages and location information.
• Website content: This comes from any site delivering unstructured content, such as YouTube or Flickr.
Structured vs. Unstructured Data
PROPERTIES | STRUCTURED DATA | UNSTRUCTURED DATA
Technology | Based on relational database tables | Based on character and binary data
Transaction management | Mature transaction management and various concurrency techniques | No transaction management and no concurrency
Version management | Versioning over tuples, rows, and tables | Versioned as a whole
Flexibility | Schema dependent and less flexible | Very flexible, with no schema
Scalability | Very difficult to scale the database schema | Very scalable
Robustness | Very robust | -
Query performance | Structured queries allow complex joins | Only textual queries are possible
5.1.3 EXTRACTING MEANINGFUL INFORMATION FROM DATA
Fundamental data extraction is the process of retrieving data from an unstructured source in order to
process it further and/or store it.
However, the techniques of extraction have changed over time, and the process is more
automated now.
Automated fundamental data extraction is very useful for large organizations that deal with data
on a large scale to generate meaningful information. It is a way to extract structured data from
unstructured and semi-structured documents found on the web and in various data warehouses.
Automated fundamental data extraction is part of the broader field of business intelligence, which also
includes relational database management systems and report writing. There are several tools to
analyze the more complicated multidimensional operating data. An emphasis on fundamental
operating data allows companies to monitor performance and data extraction techniques and to
generate the desired results. Data extraction is a very important source of information in today's
fast-changing business world.
• The data and information extracted can help companies make business decisions, thereby
adding a new dimension to their business goals. Competitors these days do not spare an inch
when it comes to maximizing their business performance, and the technique of automated
fundamental data extraction has become critical to success. There are several
applications where extraction of operating data can be of great use.
• The list includes extracting data for preparing earnings models and financial reporting, business
reporting for sales, marketing, management reporting, business process management (BPM),
budgeting, and forecasting.
• Automated fundamental data extraction is important because it allows the use of various metrics
based on the available operating data to simplify the measurement of past performance and
guide business planning.
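A very small sketch of automated extraction is shown below, assuming the unstructured source is a list of free-text report lines. The lines, the figures in them, and the names `extract_records`, `QUARTER` and `AMOUNT` are all invented for illustration. Real extraction tools are far more elaborate, but the idea is the same: pull structured records out of text, then compute a metric from them.

```python
import re

# Hypothetical unstructured source: free-text lines from a business report.
report_lines = [
    "Q1 sales reached $120,000 after the spring campaign.",
    "Sales in Q2 were slightly lower at $95,500.",
    "No figures were available for the regional office.",
]

QUARTER = re.compile(r"\bQ[1-4]\b")    # a quarter label such as Q1
AMOUNT = re.compile(r"\$(\d[\d,]*)")   # a dollar amount such as $120,000


def extract_records(lines):
    """Return one structured record per line that names a quarter and an amount."""
    records = []
    for line in lines:
        quarter = QUARTER.search(line)
        amount = AMOUNT.search(line)
        if quarter and amount:
            sales = int(amount.group(1).replace(",", ""))
            records.append({"quarter": quarter.group(), "sales": sales})
    return records


# Structured data extracted from the text.
records = extract_records(report_lines)

# A simple metric based on the extracted operating data.
total_sales = sum(record["sales"] for record in records)

print(records)
print(total_sales)
```

Lines that contain no quarter or no amount are skipped, so the third line produces no record. Once the figures are in structured form, they can be stored in a database or used for reporting, budgeting, and forecasting.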