Unstructured data
Hi, great to see you again! Earlier, we compared some data formats, including structured and
unstructured data. Most of the data being generated right now is actually unstructured. Audio
files, video files, emails, photos, and social media are all examples of unstructured data. These
can be harder to analyze in their unstructured format. But here's the good news: you'll be
working with structured data most of the time. For example, if you need to analyze data that
comes from emails, photos, and social media sites, it'll most likely be structured for
analysis before you even get to it. Because of that, I want to explore structured data a bit more.
As a quick refresher, structured data is data organized in a format like rows and columns. But
there's definitely more to it than that. Structured data works nicely within a data model, which
organizes data elements and defines how they relate to one another. What are
data elements? They're pieces of information, such as people's names, account numbers, and
addresses. Data models help to keep data consistent and provide a map of how data is
organized. This makes it easier for analysts and other stakeholders to make sense of their data
and use it for business purposes. In addition to working well within data models, structured data
is also a natural fit for databases, which makes it easy for analysts to enter, query, and analyze
the data whenever they need to. It also helps make data visualization pretty easy because
structured data can be applied directly to charts, graphs, heat maps, dashboards, and most
other visual representations of data. Alright, so now we know that spreadsheets and databases
that store data sets are widely used sources of structured data. After you explore some other
data structures, you'll check out more data types using a spreadsheet. The adventure
continues!
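To make the idea of a data model a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It stores the data elements mentioned above (names, account numbers, and addresses) in two related tables, then enters and queries a few rows. The table names, column names, and sample values are all made up for illustration; a real data model would depend on the business.

```python
import sqlite3

# Hypothetical data model: customers and the accounts that belong to them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL
);
CREATE TABLE accounts (
    account_number TEXT PRIMARY KEY,
    customer_id    INTEGER NOT NULL REFERENCES customers(customer_id)
);
""")

# Enter data: one customer (made-up values) with two accounts.
conn.execute("INSERT INTO customers VALUES (1, 'Ada Lopez', '12 Elm St')")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("AC-1001", 1), ("AC-1002", 1)],
)

# Query data: follow the relationship from accounts back to customers.
rows = conn.execute("""
SELECT c.name, a.account_number
FROM customers AS c
JOIN accounts AS a ON a.customer_id = c.customer_id
ORDER BY a.account_number
""").fetchall()

for name, account_number in rows:
    print(name, account_number)
```

Because every row follows the same columns, the query can rely on the relationship between the two tables instead of searching through free-form text.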
The structure of data
Data is everywhere and it can be stored in lots of ways. Two general categories of data are:
Structured data: Organized in a certain format, such as rows and columns.
Unstructured data: Not organized in any easy-to-identify way.
For example, when you rate your favorite restaurant online, you're creating structured data. But
when you use Google Earth to check out a satellite image of a restaurant location, you're using
unstructured data.
Here's a refresher on the characteristics of structured and unstructured data:
Structured data
As we described earlier, structured data is organized in a certain format. This makes it
easier to store and query for business needs. If the data is exported, the structure goes
along with the data.
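As a rough illustration of structure traveling with exported data, the sketch below writes a few restaurant ratings to CSV text and reads them back using Python's standard csv module. The restaurant names, star values, and column names are invented for the example.

```python
import csv
import io

# Hypothetical structured data: each rating has the same two fields.
ratings = [
    {"restaurant": "Cafe Uno", "stars": 5},
    {"restaurant": "Noodle Bar", "stars": 4},
]

# Export: the header row carries the column structure along with the data.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["restaurant", "stars"])
writer.writeheader()
writer.writerows(ratings)
exported = buffer.getvalue()

# Re-import: the header lets any tool rebuild the same rows and columns.
buffer.seek(0)
reloaded = list(csv.DictReader(buffer))

for row in reloaded:
    print(row["restaurant"], row["stars"])
```

One caveat: CSV keeps the rows and columns but not the data types, so the star values come back as text and would need converting to numbers before analysis.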
Unstructured data
Unstructured data isn't organized in any easily identifiable manner. And there is
much more unstructured data than structured data in the world. Video and audio files, text
files, social media content, satellite imagery, presentations, PDF files, open-ended
survey responses, and websites all qualify as types of unstructured data.
The fairness issue
The lack of structure makes unstructured data difficult to search, manage, and analyze.
But recent advancements in artificial intelligence and machine learning algorithms are
beginning to change that. Now, the new challenge facing data scientists is making sure
these tools are inclusive and unbiased. Otherwise, certain elements of a dataset will be
weighted or represented more heavily than others. And as you're learning, an unfair
dataset does not accurately represent the population, causing skewed outcomes, low
accuracy levels, and unreliable analysis.
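One simple way to spot this kind of imbalance is to compare how often each group appears in a dataset with its share of the population. The sketch below is only an illustration: the group names, population shares, and sample counts are invented, and a real fairness review would involve much more than this single check.

```python
from collections import Counter

# Hypothetical population shares (assumed known from an outside source).
population_share = {"urban": 0.50, "suburban": 0.30, "rural": 0.20}

# Hypothetical dataset: one group label per record.
sample = ["urban"] * 70 + ["suburban"] * 25 + ["rural"] * 5

counts = Counter(sample)
total = len(sample)

# Gap = share in the dataset minus share in the population.
# Positive means overrepresented; negative means underrepresented.
gaps = {
    group: counts[group] / total - share
    for group, share in population_share.items()
}

for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

In this made-up sample, urban records are overrepresented and rural records are underrepresented, so any analysis built on it would lean toward urban outcomes.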