Data Collection & Analysis Methods

This document discusses approaches to collecting and analyzing large datasets, known as "data at scale" or "big data". It describes scraping data from the web, collecting personal data via apps and devices, and crowdsourcing data from many contributors. Methods of analyzing data at scale include sentiment analysis, social network analysis, and combining multiple sources. Visualizing and exploring large datasets helps make sense of the information. The document stresses the importance of ethical design when working with personal or sensitive data.

Uploaded by

Joyce Ann Rufino

HUMAN-COMPUTER INTERACTION
CCHUCOIL

Data at scale

Week 11
2021
Table of Contents
Introduction
Intended Learning Outcome
Learning Materials
Data at scale
Approaches to collecting and analyzing data
    Scraping and “second source” data
    Collecting personal data
    Crowdsourcing data
    Sentiment analysis
    Social network analysis
    Combining multiple sources of data
Visualizing and exploring the data
Ethical design concerns
References

Introduction
How do you start your day? How much data do you encounter when first looking at your
smartphone, switching on your laptop, or turning on another device? How much do you
knowingly create and how much do you create unknowingly? Upon waking up, many people
routinely ask their personal assistant something like, “Alexa, what is the weather today?” or
“Alexa, what is the news?” or “Alexa, is the S-Bahn train to Schönefeld Airport running on
time?” Or, they will ask Siri, “What is my first meeting?” or “Where is the meeting?”

What happens to all the data collected about us? How does it improve the services provided by
society? Does it make traveling more efficient? Does it reduce traffic congestion? Does it make
the streets safer? Moreover, how much of the data collected from our smartcards, smartphone
Wi-Fi signals, and CCTV footage can be tracked back to us and pieced together to reveal a
bigger picture of who we are and where we go? What might that data reveal about us?

In this course material, we will explore how to present this massive amount of data on our small screens.

Intended Learning Outcome


1. Provide an overview of some of the potential impacts of data at scale on society.
2. Introduce key methods of collecting data at scale.
3. Discuss how data at scale becomes meaningful.
4. Introduce design principles for making data at scale ethical.

Learning Materials
• Course Material copy
• Assessment tasks copy (individual)

Data at scale
Data at scale, often also called big data, describes all kinds of data, including databases of
numbers, images of people, things, and places, footage of recorded conversations, videos, text,
and environmentally sensed data (such as air quality).

It is also being collected at an exponential rate; for example, 400 new YouTube videos are
uploaded every minute, while millions of messages circulate through social media. Furthermore,
sensors collect billions of bytes of scientific data.

Approaches to collecting and analyzing data


Collecting data has never been easier. What is challenging is knowing how best to analyze,
collate, and act upon the data in ways that are socially acceptable, beneficial to society, and
ethically sound. Are there certain rules or policies in place on what to reveal about people or
when certain patterns, anomalies, or thresholds are reached in a data stream? For example, if
people-tracking technology is used at an airport, how is that revealed to those at the airport? Is it
enough only to show data that can help manage people flows and bottlenecks? For example, when
a public display in an airport terminal shows that one section of the terminal is much busier
than another, do travelers ever stop and wonder how this data is being collected?
What else is being collected about them? Do they care?

Figure 1. Heathrow Airport Terminal 5 public display (top corner of the image) showing the relative level of activity using an
infographic of North vs. South security.

Scraping and “second source” data


One way to extract data is by “scraping” it from the web (assuming that the website or
application allows this). Once the data is scraped, it can be entered into a spreadsheet and
analyzed using data science tools. From an interaction design perspective, the focus is less on
the scraping process itself than on how one can interact with the data and how it is displayed,
so that it can be analyzed and made sense of.
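
As a rough illustration of the scrape-then-spreadsheet step described above, the following Python sketch pulls the cells out of an HTML table and turns them into CSV text that a spreadsheet can open. The sample page content, the class name TableScraper, and the function names are all made up for this example; a real page would first have to be downloaded, and only if the site permits scraping.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, or None
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Format rows as CSV text, ready to be saved and opened in a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical page content, invented for this sketch.
SAMPLE_HTML = """
<table>
  <tr><th>Terminal area</th><th>People counted</th></tr>
  <tr><td>North security</td><td>120</td></tr>
  <tr><td>South security</td><td>45</td></tr>
</table>
"""

if __name__ == "__main__":
    print(rows_to_csv(scrape_table(SAMPLE_HTML)))
```

The sketch uses only the Python standard library so that it runs without extra installation; dedicated scraping libraries exist, but which one suits a given site is outside the scope of this material.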

Activity 1
What do Google Trends searches tell us about ourselves? Go to Google Trends
(https://trends.google.com). Then try typing statements such as “I hate my boss,” “I feel sad,”
or “I eat too much” into the search box. See how many people have typed these into Google
over the last month and year, and in different countries. Then type in the opposite statements:
“I love my boss,” “I am happy,” or “I never eat enough.” How do the results compare? Which is
searched for more often? Then type in your name and see what Google returns.

Collecting personal data


Many apps and wearable devices that people can buy off the shelf collect all sorts of personal
data and visualize it. The results can be matched against targets, and recommendations, hints,
or tips can also be provided about how to act upon them. Many apps now come preinstalled on a
smartphone or smartwatch, including those that quantify health, screen time, and sleep. Some
also allow multiple activities to be tracked, aggregated, and correlated.
The most common types of apps are for physical and behavioral tracking, including mood
changes, sleep patterns, sedentary behavior, time management, energy levels, mental health,
exercise taken, weight, and diet.
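
To make "tracked, aggregated, and correlated" more concrete, here is a minimal Python sketch that averages a few days of tracked values and computes the Pearson correlation between two of them. The daily log, its field names (sleep_hours, steps), and the function names are invented for illustration; real apps may aggregate and correlate their data quite differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def summarize(log):
    """Aggregate a list of daily records into one average per metric."""
    metrics = log[0].keys()
    return {m: sum(day[m] for day in log) / len(log) for m in metrics}


# Hypothetical four-day log from a wearable (made-up numbers).
LOG = [
    {"sleep_hours": 6.0, "steps": 4200},
    {"sleep_hours": 7.0, "steps": 6100},
    {"sleep_hours": 8.0, "steps": 8000},
    {"sleep_hours": 5.0, "steps": 2500},
]

if __name__ == "__main__":
    print(summarize(LOG))
    sleep = [day["sleep_hours"] for day in LOG]
    steps = [day["steps"] for day in LOG]
    print(pearson(sleep, steps))
```

A correlation close to +1 or -1 only says that two tracked values move together in this log; it does not show that one causes the other.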

Activity 2
Provide more examples of devices that collect personal data. For each device, include the brand,
an image, the price, the source, and what it can do (its features).

Crowdsourcing data
The idea of a crowd working together has been taken one step further in Crowd Research, where
many researchers from all over the world come together to work on large problems such as
climate science. The goal of this approach is to enable hundreds of people to contribute by
collecting data, ideating, and critiquing each other’s designs and research projects.

Crowdsourcing amasses billions of data points, such as photos, sensor readings, comments, and
discussions. Most of this data is stored in the cloud as well as on local machines. Examples of
large science projects include iSpotNature, eBird, iNaturalist, and Zooniverse.

Figure 2. Abundance map for the common raven. The darkest areas indicate where ravens are most abundant.

Sentiment analysis
Sentiment analysis is a technique used to infer what a group of people or a crowd is feeling
from what they say. Examples include the following:

1. Scoring the phrases that people use when offering opinions or views on a scale, for
example one where -10 is the most negative, 0 is neutral, and +10 is the most positive.
2. Classifying feelings as negative (anger, sadness, fear) or positive (happiness, joy, enthusiasm).
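
One simple way to picture scoring on a -10 to +10 scale is a word list (lexicon) in which each word carries a score. The Python sketch below is only that: the lexicon, its scores, and the function names are invented for illustration, and it ignores negation, sarcasm, and context, which practical sentiment analysis has to deal with.

```python
import re

# Hypothetical mini-lexicon: word -> score on the -10..+10 scale.
LEXICON = {
    "hate": -8, "sad": -6, "angry": -7, "afraid": -5,
    "happy": 7, "love": 9, "joy": 8, "excited": 6,
}


def score_text(text, lexicon=LEXICON):
    """Average score of the lexicon words found in a text (0.0 if none)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def label(score):
    """Turn a numeric score into a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def crowd_sentiment(messages, lexicon=LEXICON):
    """Average the per-message scores to estimate what a crowd is feeling."""
    if not messages:
        return 0.0
    return sum(score_text(m, lexicon) for m in messages) / len(messages)


if __name__ == "__main__":
    for message in ["I hate my boss", "I love my boss", "The train is on time"]:
        score = score_text(message)
        print(message, score, label(score))
```

Averaging over many messages, as crowd_sentiment does, is what moves the technique from one person's opinion to the mood of a group.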

Social network analysis


Social network analysis (SNA) is a method based on social network theory for analyzing and
evaluating the strength of social ties within a network. Trillions of messages, tweets, pictures,
and videos are posted and responded to every second of every day via Weibo, Tencent, Baidu,
Facebook, Twitter, Instagram, and YouTube.
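
The idea of tie strength can be sketched with a few lines of Python. Here the strength of a tie is assumed to be the number of interactions (for example, replies) between two people, which is a simplification chosen for this example rather than a definition taken from social network theory. The names and the interaction list are made up.

```python
from collections import Counter, defaultdict


def tie_strengths(interactions):
    """Count interactions per unordered pair of people."""
    ties = Counter()
    for a, b in interactions:
        if a != b:  # ignore self-replies
            ties[frozenset((a, b))] += 1
    return ties


def degree(ties):
    """Number of distinct people each person is tied to."""
    neighbours = defaultdict(set)
    for pair in ties:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {person: len(others) for person, others in neighbours.items()}


def strongest_tie(ties):
    """Return ((person, person), count) for the strongest tie, or None."""
    if not ties:
        return None
    pair, count = ties.most_common(1)[0]
    return tuple(sorted(pair)), count


# Hypothetical (sender, receiver) pairs, e.g. who replied to whom.
INTERACTIONS = [
    ("ana", "ben"), ("ben", "ana"), ("ana", "cy"),
    ("ben", "cy"), ("ana", "ben"), ("dee", "ana"),
]

if __name__ == "__main__":
    ties = tie_strengths(INTERACTIONS)
    print(degree(ties))
    print(strongest_tie(ties))
```

In this toy network, the person with the highest degree is the best connected, and the pair with the most interactions has the strongest tie.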

Figure 3. US Senator voting relationships (1989, 1997, and 2013 respectively). Red represents Republicans, and blue represents
Democrats.

Combining multiple sources of data
Several researchers have started collecting data from multiple sources by combining automatic
sensing and subjective reporting. The goal is to obtain a more comprehensive picture about a
domain, such as a population’s mental health, than if only one or two sources of data were used
(for instance, interviews or surveys).
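
Combining automatic sensing with subjective reporting usually comes down to matching records that describe the same person on the same day. The Python sketch below shows one possible way to do that; the field names (person, day, sleep_hours, steps, stress) and the sample values are assumptions made for this example and do not come from any particular study.

```python
def combine_sources(sensed, reported):
    """Attach each self-report to the sensed record for the same person and day."""
    reports_by_key = {(r["person"], r["day"]): r for r in reported}
    combined = []
    for record in sensed:
        report = reports_by_key.get((record["person"], record["day"]))
        merged = dict(record)
        # None marks days on which the person did not submit a report.
        merged["stress"] = report["stress"] if report else None
        combined.append(merged)
    return combined


# Hypothetical automatically sensed data (e.g. from a phone).
SENSED = [
    {"person": "p1", "day": 1, "sleep_hours": 5.5, "steps": 9000},
    {"person": "p1", "day": 2, "sleep_hours": 7.0, "steps": 4000},
]

# Hypothetical self-reported stress on a 1 to 5 scale.
REPORTED = [
    {"person": "p1", "day": 1, "stress": 4},
]

if __name__ == "__main__":
    for row in combine_sources(SENSED, REPORTED):
        print(row)
```

Keeping the days with a missing report, rather than dropping them, makes the gaps in the subjective data visible when the combined data is later analyzed.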

An example is the StudentLife study, which was concerned with learning more about students’
mental health. The researchers wanted to know why some students do better than others under
stress, why some students burn out, and why still others drop out. The graph below shows the
relationship between activity, deadlines, attendance, and sleep. It shows that students are very
active at the beginning of the term and get very little sleep. This suggests that they are out
partying a lot. They also have high attendance rates at the beginning of term. As the term
progresses, however, their behavior changes and these rates drop dramatically.

Figure 4. Students' activity, sleep, and attendance levels against deadlines during a term.

Activity 3
From the two graphs, what can you say about the students’ activity, their stress level, and their
level of socializing in relation to deadlines over the course of the term?

Visualizing and exploring the data

Figure 5. A market map of the S&P 500, which is a financial index for stocks. Green indicates stocks that increased in value, and
red indicates stocks that decreased in value that day.

Figure 6. Visualization of different sounds, including birds, owls, and insects, from three areas of Australia that are displayed so
they can be interpreted and compared.

Figure 7. A dashboard that was created to show changes in sales information.

Figure 8. An interactive graphic produced using D3 for the New York Times. It shows the tax rate paid by the different kinds of
companies that form the S&P 500 financial index.

Figure 9. Dashboard 1 and Dashboard 5 specifically target decision-making, while Dashboard 3 and Dashboard 4 target
consumer awareness. Dashboard 2 represents the quantified self (such as a smart home), while Dashboard 6 represents.

Activity 4
Study Figure A from the weather site https://www.wunderground.com. It shows weather data
for December 19, 2018, in Washington, D.C., in the United States. Take particular note of the
temperature, precipitation, and wind data. What information do they provide? Now compare
this visualization with the one depicted in the “wundermap” (see Figure B). How do the two
displays differ, and which do you prefer?
Figure A

Figure B

Ethical design concerns


We mentioned how masses of data are now regularly being collected from people for a variety of
reasons, including improving public services, reducing congestion, and enhancing security
measures. This data is usually anonymized and sometimes aggregated to make it publicly
available. When deciding how to analyze and act upon data that has been automatically collected
from different sensors, it is important to consider how ethical the data collection and storage
processes are and how the data analysis will be used.
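
As one small illustration of aggregating data before making it public, the Python sketch below counts records per group, leaves out the personal identifier, and hides any group that is too small. The record fields (card_id, station), the function name, and the minimum group size are assumptions chosen for this example, and this step on its own does not guarantee that people cannot be re-identified.

```python
from collections import Counter


def publishable_counts(records, group_field, min_group_size=5):
    """Count records per group, hiding groups smaller than min_group_size.

    Only the group value and its count are returned, so identifiers such
    as card numbers never appear in the output. Hidden groups map to None.
    """
    counts = Counter(record[group_field] for record in records)
    return {
        group: (count if count >= min_group_size else None)
        for group, count in counts.items()
    }


# Hypothetical smartcard taps (made-up identifiers and stations).
TAPS = [
    {"card_id": "c1", "station": "North"},
    {"card_id": "c2", "station": "North"},
    {"card_id": "c3", "station": "North"},
    {"card_id": "c4", "station": "South"},
]

if __name__ == "__main__":
    print(publishable_counts(TAPS, "station", min_group_size=3))
```

Hiding small groups matters because a count of one at a quiet station at a known time can point to a single traveler even when no name or card number is shown.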

• ACM Code of Ethics (https://ethics.acm.org/2018-code-draft-1/)


• IEEE Code of Ethics (https://www.computer.org/education/code-of-ethics)

They point out that central to any ethical discussion is the importance of protecting fundamental
human rights and respecting the diversity of all cultures. They also state the need to be fair,
honest, trustworthy, and respectful of privacy.

Activity 5
Shoplifting cost U.S. retailers $44 billion in 2014. To help combat shoplifting, DeepCam
developed an intelligent system that passively monitors people coming into a store by using
CCTV video footage that identifies potential suspects. To do this, it uses AI algorithms and
facial recognition software. Do you think this practice is socially acceptable?
What might be the privacy concerns? To find out more about their system, check out their
website at https://deepcamai.com/.

References
Rosala, M. (2020, January 26). The critical incident technique in UX. Retrieved from Nielsen
Norman Group: https://www.nngroup.com/articles/critical-incident-technique/

Sharp, H., Rogers, Y., & Preece, J. (2019). Interaction Design: beyond human-computer
interaction, Fifth Edition. Indianapolis: John Wiley & Sons, Inc.

Weprin, M. (2016, November 13). Design thinking methods: affinity mapping. Retrieved from
UXDICT.IO: https://uxdict.io/design-thinking-methods-affinity-diagrams-357bd8671ad4
