Data Visualization
Network Data Visualization
As you proceed with the assignment, follow the written instructions. Screenshots
are provided ONLY as a reference.
Make sure you submit all screenshots with a clearly visible menu bar including the
date and timestamp.
Objective:
The objective of this exercise is to develop skills in visualizing and analyzing large
networks using Gephi. It focuses on building a social network visualization from a
large Facebook dataset.
Prerequisites:
1. Install Java SE Development Kit 8
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Create an Oracle account if needed to sign in
Judd D. Bradbury Page 1
2. Install Gephi
http://gephi.github.io/users/download/
3. If Gephi does not work after completing steps 1 & 2, follow the steps below:
a. Open your Gephi installation folder (probably C:\Program Files (x86)\Gephi-0.8.2)
and locate the etc folder.
b. Within the etc folder, you will find a file named Gephi.conf. Open the file
with Notepad.
c. Search for “#jdkhome="/path/to/jdk"”.
d. Remove the #; otherwise the line is treated as a comment and will not be
executed.
e. Replace the text /path/to/jdk inside the double quotation marks with the directory
path of your Java folder. To find it, go to the Windows drive (probably C:), open
Program Files, and find the Java folder. Inside it you will find a folder whose name
starts with “jdk1.8”. Copy the path of this folder and paste it in place of
/path/to/jdk in the Gephi.conf file.
f. Save the file. If you are not allowed to save it in place, save it somewhere
else and then replace the original file with your saved copy.
4. Open Gephi. If the Welcome pop-up screen appears, click the Cancel button.
5. Download the JavaScript GEXF viewer to view the Gephi file in a web browser. After
downloading, place the extracted folder on the desktop or in any other known
location; your exported Gephi file will later need to be saved in this folder.
https://github.com/raphv/gexf-js
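The configuration edit described in prerequisite 3 (uncommenting the jdkhome line and pointing it at the JDK folder) can also be sketched in Python. This is only an illustration under stated assumptions: the function name set_jdkhome and the sample JDK path are hypothetical, and the function works on the text of the file rather than touching your installation.

```python
import re

def set_jdkhome(conf_text: str, jdk_path: str) -> str:
    """Return conf_text with the jdkhome line uncommented and set to jdk_path."""
    # Match a jdkhome line whether or not it still starts with '#'.
    pattern = re.compile(r'^[ \t]*#?[ \t]*jdkhome[ \t]*=.*$', re.MULTILINE)
    if not pattern.search(conf_text):
        raise ValueError("no jdkhome line found in the configuration text")
    new_line = f'jdkhome="{jdk_path}"'
    # A function replacement keeps backslashes in Windows paths literal.
    return pattern.sub(lambda match: new_line, conf_text, count=1)

# Hypothetical example content and JDK folder name.
sample_conf = '# default settings\n#jdkhome="/path/to/jdk"\n'
updated_conf = set_jdkhome(sample_conf, r"C:\Program Files\Java\jdk1.8.0_example")
```

To apply it to the real file, you would read Gephi.conf into a string, call the function, and write the result back, subject to the same file-permission caveat as step f.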
Step 1: Download the provided CSV files from eLearning
About Gephi
Concept: Gephi is open-source software for network visualization. It can read many
file formats, including Gephi, GEXF, GML, GDF, CSV and others. In this exercise,
we will use CSV files.
Before starting with Gephi, we should familiarize ourselves with 2 terms – Nodes and
Edges.
Node: A node represents an object within a data set and is identified by a unique ID.
Edge: An edge is a line, or relationship, that connects two nodes.
To import data from an Excel or CSV file into Gephi, you will usually need to prepare
the following 2 files:
1. Node File – Contains the nodes and their attributes
a. The node file must include a column named ‘ID’
b. The ID column should contain unique entries
2. Edge File – Contains the edges and their attributes
a. The edge file must include columns named ‘Source’ & ‘Target’, which
contain the start and the end nodes for each edge
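The file requirements above can be checked before importing. The sketch below is a minimal illustration, assuming the CSV text has already been read into strings; the function name check_tables and the sample rows are hypothetical and are not taken from the assignment data.

```python
import csv
import io

def check_tables(nodes_csv: str, edges_csv: str) -> list:
    """Return a list of problems found in the node and edge CSV text."""
    problems = []

    nodes = csv.DictReader(io.StringIO(nodes_csv))
    if "ID" not in (nodes.fieldnames or []):
        problems.append("node file has no 'ID' column")
    else:
        ids = [row["ID"] for row in nodes]
        if len(ids) != len(set(ids)):
            problems.append("node file has duplicate IDs")

    edges = csv.DictReader(io.StringIO(edges_csv))
    for column in ("Source", "Target"):
        if column not in (edges.fieldnames or []):
            problems.append(f"edge file has no '{column}' column")

    return problems

# Hypothetical sample data for illustration only.
sample_nodes = "ID,Label\n1,Alice\n2,Bob\n"
sample_edges = "Source,Target\n1,2\n"
print(check_tables(sample_nodes, sample_edges))
```

An empty list means both files satisfy the column requirements listed above.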
Step 2: Import data into Gephi
Once Gephi is installed properly, the following screen should appear:
Click on New Project in the welcome screen as highlighted in the screen shot below or
alternatively follow the menu path File -> New project
There are 3 main sections within Gephi:
1. Overview Tab : Setup the network visualization
2. Data Laboratory Tab : Import, Examine & Edit network data (Nodes & Edges
table)
3. Preview Tab : Configure the rendering settings, for instance color, label
sizes etc. and preview the visualization.
To upload the data, click on the data laboratory tab as highlighted below
Import Nodes File:
1. Click on Import Spreadsheet under Data Laboratory tab
2. Select the csv file downloaded earlier called “facebook_accounts”.
3. A popup will appear. Make sure that the file is being imported as “Nodes table”.
Confirm that the file selected correctly displays all the data and click “Next”.
4. Configure the import settings on the second page as shown below.
Gephi automatically tries to match the imported columns with suitable data types. You
can configure it according to the required usage. Here, we change the facebook_id from
Long to String.
5. Click on “Finish” to complete the import.
6. A new popup called Import report is displayed. Change the Graph type from Mixed
to Undirected and click on “OK”.
You will notice there are multiple graph types supported by Gephi. Since we are using a
Facebook network for this assignment, we will choose undirected i.e., both accounts
(nodes) are friends with each other. Undirected relationships indicate a connection that
flows in both directions, while a directed relationship moves in a single direction that
always originates in one node and moves toward another node.
Edges may also be weighted to show the strength of the connection between nodes.
Consider a directed supply chain network where vendor A supplies both companies
B and C: the strength (width) of each edge shows where the stronger relationship
exists. If A supplies B with 3 products and C with 9 products, we should weight
the A to C connection three times that of the A to B connection.
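The supply chain example can be worked through in a few lines. This is a sketch only: the dictionary layout and the function name relative_weights are illustrative choices, and scaling by the smallest count is just one simple way to express relative strength.

```python
def relative_weights(product_counts: dict) -> dict:
    """Scale each directed edge's count relative to the smallest count."""
    smallest = min(product_counts.values())
    return {edge: count / smallest for edge, count in product_counts.items()}

# Directed edges (source, target) with the number of products supplied.
supplies = {("A", "B"): 3, ("A", "C"): 9}
weights = relative_weights(supplies)
print(weights)  # {('A', 'B'): 1.0, ('A', 'C'): 3.0}
```

The A to C edge comes out three times as heavy as the A to B edge, matching the description above.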
Question 1: Identify whether the following networks are directed or undirected:
A. Twitter Network – directed; you can follow people without them following you.
B. Railway Network – undirected; a train can travel east along a railroad and
later travel west, so traffic goes in both directions.
C. Pedestrian Pathway Network – undirected; while people often keep to the right
side of a pathway (in the US), the pathway itself goes in both directions. This
allows pedestrians to come and go on the same path.
You should now be able to view the contents of the node file by clicking on the nodes tab
under Data Laboratory tab.
Question 2: Paste a screenshot of your nodes table.
Import Edges File:
Follow the same steps as followed while uploading the nodes file.
1. Click on Import spreadsheet icon under Data Laboratory tab.
2. Select the csv file downloaded earlier called “facebook_friends”.
3. A popup will appear. Make sure that the file is being imported as “Edges table”.
Confirm that the file selected correctly displays all the data and click “Next”.
4. Click on “Finish” to complete the import.
5. A new popup called Import report is displayed. Make sure to select Graph type as
Undirected and check “Append to existing workspace” option. Click on “OK”.
An Issues after import process window will be displayed. A successful import should not
have any errors. Go ahead and close the report.
You can view the contents of the edges file by clicking on edges tab, under data
laboratory.
Question 3: Paste a screenshot of your edges table.
Step 3: Prepare Visualization
To create visualizations, click on the Overview tab.
Notice that Gephi has already generated a default visualization from our data set,
as highlighted below.
Question 4: Paste a screenshot of the visualization in the overview tab.
Step 4: Filtering in Gephi
This data set has a huge number of nodes and edges. Our first step will be to determine
the average degree of the nodes using the Average Degree parameter, found in the
right panel under Network Overview within the Statistics tab.
Note: If the Statistics tab is not visible on the right side of the window, go to the
menu Window -> Statistics. The Statistics tab should then be visible in the right-side
panel.
Click on Run beside the Average Degree parameter. This assigns each node ID a degree
based on the number of edges (relationships) attached to it. Every entry for that node
ID in the edges table is counted, whether it is a duplicate or distinct.
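The degree counting described above can be sketched as follows. This is a simplified illustration for an undirected edge list, not Gephi's own implementation; the function name and the sample edges are hypothetical. Every row of the edge list adds one to the degree of both of its endpoints, so a duplicate row is counted again.

```python
from collections import Counter

def average_degree(node_ids, edges):
    """Return (degree per node, average degree) for an undirected edge list."""
    degree = Counter({node_id: 0 for node_id in node_ids})
    for source, target in edges:
        # Each edge row counts once for each endpoint.
        degree[source] += 1
        degree[target] += 1
    average = sum(degree.values()) / len(degree)
    return dict(degree), average

# Hypothetical sample: the row (1, 2) appears twice and is counted twice.
degrees, avg = average_degree([1, 2, 3], [(1, 2), (2, 3), (1, 2)])
print(degrees, avg)  # {1: 2, 2: 3, 3: 1} 2.0
```

For an undirected graph this average equals twice the number of edge rows divided by the number of nodes.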
Once you click on run, you will get the degree report.
Question 5: What is the average degree of the nodes in this dataset?
Average Degree: 15.220
After this step click on the nodes tab under data laboratory tab. Notice that new
columns have been added to the nodes table with weights assigned to different records
as highlighted below.
Observe that the accounts have a wide range of degrees (friends). We want to
filter out the nodes with a degree of less than 100. Go ahead and close the Degree Report.
After this step, go to the Filters tab, next to the Statistics tab and select Topology. Drag
the Degree Range filter to the Drag filter here under the Queries section below.
Increase the lower bound from 1 to 100 as shown below. Click on Filter.
Notice that the number of nodes and edges have decreased significantly.
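The effect of a degree range filter with a raised lower bound can be approximated in Python. This sketch assumes a single pass: degrees are computed on the full edge list, nodes below the bound are dropped, and only edges between surviving nodes are kept. It is an illustration of the idea, not a description of Gephi's internals; the function name and sample data are hypothetical.

```python
from collections import Counter

def filter_by_degree(edges, lower_bound=100):
    """Keep nodes whose degree is at least lower_bound, plus edges among them."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    kept_nodes = {node for node, d in degree.items() if d >= lower_bound}
    kept_edges = [(s, t) for s, t in edges if s in kept_nodes and t in kept_nodes]
    return kept_nodes, kept_edges

# Small hypothetical graph, filtered with a lower bound of 2 instead of 100.
sample_edges = [("h", "a"), ("h", "b"), ("h", "c"), ("a", "b")]
nodes, edges = filter_by_degree(sample_edges, lower_bound=2)
print(sorted(nodes), edges)  # ['a', 'b', 'h'] [('h', 'a'), ('h', 'b'), ('a', 'b')]
```

Node c has degree 1, so it and its edge are removed, which is why both the node count and the edge count drop after filtering.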
Question 6: Paste a screenshot of the filtered network visualization in the overview
tab.
Step 5.1: Visualization based on page_type
Go to the Appearance section on the left and under the Nodes tab select Partition.
Make sure you have selected the Color icon. Select page_type from the dropdown. Click
on Apply.
Question 7: How many categories are there under page_type? Rank them in correct
order.
4 categories
Rank:
1. Government
2. Company
3. Politician
4. Tv show
Similarly, go to the Appearance section again and this time select the Size icon. Under
the Nodes tab select Ranking. Select degree from the dropdown. Set the Min size and
Max size to 20 and 200 respectively. Click on Apply. After this step you will notice that
the node size changes.
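Ranking node size by degree amounts to mapping each degree onto the range between Min size and Max size. The sketch below assumes a plain linear interpolation, which is an assumption about the mapping and not a statement of how Gephi computes it; the function name and the sample degrees are hypothetical.

```python
def size_for_degree(degree, min_degree, max_degree, min_size=20.0, max_size=200.0):
    """Linearly map a degree in [min_degree, max_degree] to a node size."""
    if max_degree == min_degree:
        # All nodes share one degree, so give them all the smallest size.
        return min_size
    fraction = (degree - min_degree) / (max_degree - min_degree)
    return min_size + fraction * (max_size - min_size)

# Hypothetical degrees after filtering: smallest 100, largest 300.
for d in (100, 200, 300):
    print(d, size_for_degree(d, 100, 300))
# 100 20.0
# 200 110.0
# 300 200.0
```

Under this assumption the lowest-degree node gets size 20, the highest gets size 200, and everything else falls proportionally in between.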
Choose the Fruchterman Reingold algorithm under the Layout section on the left panel.
Increase the Area to 50000.0 and Speed to 2.0, and then click on Run. Let it run for
1 minute. Once the visualization is expanded and clearly visible, stop the algorithm
by clicking on the stop button.
Notice that the visualization is changed now. Clustering has now become more
prominent, and nodes seem to have re-organized based on page type (color).
Question 8: Paste a screenshot of the updated network visualization in the
overview tab.
Question 9: Which accounts have the highest degree for each page_type?
Page_type        Highest degree
1. Government    US Army
2. Company       Facebook
3. Politician    Barack Obama
4. Tv show       today Show
Hint: Find the node with the biggest circle, right-click on it, and then click on Select in
data laboratory. Go to the Data Laboratory and give the selected label in your answer.
Step 5.2: Visualization based on modularity
Coloring the nodes by modularity.
Click on Run beside the Modularity parameter under the Community Detection section
of the Statistics Tab. Modularity is a measure of the strength of the network graph's
division into clusters. High modularity means there are dense connections between nodes
in the same cluster and few connections to nodes in different clusters. Gephi uses the
Louvain method for modularity. This method is used to uncover communities in large
networks (millions of nodes) quickly.
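To make the modularity measure concrete, the sketch below computes the standard modularity score Q for a given assignment of nodes to communities in an undirected, unweighted graph: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. It only evaluates a partition you supply; it does not perform the Louvain search itself. The function name and the sample graph are hypothetical.

```python
from collections import Counter

def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal_edges = Counter()   # edges with both endpoints in the community
    degree_sum = Counter()       # total degree of the community's nodes
    for source, target in edges:
        c_source, c_target = community_of[source], community_of[target]
        degree_sum[c_source] += 1
        degree_sum[c_target] += 1
        if c_source == c_target:
            internal_edges[c_source] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Two triangles joined by a single edge, split into their natural clusters.
sample_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
sample_partition = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
print(round(modularity(sample_edges, sample_partition), 3))  # 0.357
```

Putting every node in one community gives Q = 0, so a positive score reflects denser connections within clusters than between them.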
Once you click on run, you will get the modularity report.
Question 10: How many communities are created in your dataset?
Results:
Modularity: 0.387
Modularity with resolution: 0.387
Number of Communities: 8
Go back to the Appearance section and under the Nodes tab select Partition. Make
sure you have selected the Color icon. Select Modularity Class from the dropdown this
time. Click on Apply.
YIFAN HU: Choose the Yifan Hu algorithm under the Layout section on the left panel. Change
the Optimal Distance to 1000.0 and Relative Strength to 1.0, and then click on Run.
Question 11: Paste a screenshot of the network visualization in the overview tab.
FORCEATLAS2: Choose the ForceAtlas 2 algorithm under the Layout section on the left panel.
Scroll down to the Behavior Alternatives and check LinLog mode and Prevent Overlap,
and then click on Run. Let it run for 1 minute. Once the visualization is
expanded and clearly visible, stop the algorithm by clicking on the stop button.
Question 12: Paste a screenshot of the network visualization in the overview tab.
Question 13: Name the account (label) that is part of a modularity class with the
lowest number of members.
Home & Family
Step 5.3: Save Gephi File
Go to context menu File -> Save and then save the file as
StudentName_Website.Gephi
Also export the file as a Graph file ( *.GEXF file), in order to view it in the web browser.
Follow the below context menu path.
A popup window appears. Keep the filename as StudentName_Website, select
.GEXF as the file type, and then click Save.
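A GEXF export is an XML file, so you can sanity-check it by counting its node and edge elements. The sketch below is an optional illustration, not part of the assignment; the function name is hypothetical, and the namespace wildcard {*} in the search path needs Python 3.8 or newer.

```python
import xml.etree.ElementTree as ET

def count_gexf_elements(gexf_data):
    """Return (number of node elements, number of edge elements) in GEXF data."""
    root = ET.fromstring(gexf_data)
    # {*} matches any XML namespace, so the GEXF version does not matter here.
    nodes = root.findall(".//{*}node")
    edges = root.findall(".//{*}edge")
    return len(nodes), len(edges)

# Minimal hypothetical GEXF document for illustration.
sample_gexf = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="0" label="A"/>
      <node id="1" label="B"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1"/>
    </edges>
  </graph>
</gexf>"""
print(count_gexf_elements(sample_gexf))  # (2, 1)
```

For a real export you could pass in the file contents read as bytes, for example with pathlib.Path("StudentName_Website.gexf").read_bytes(), and compare the counts with those shown in Gephi.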
Step 6: Display the Gephi file in the browser
Open the gexf_js_master folder created in step 5 of the prerequisites section,
and then open the config.js file in a text editor (Sublime, Notepad++,
Brackets, etc.).
Copy all the .gexf files created in the steps above into the gexf_js_master
folder.
Maintain your exported .gexf file as highlighted below
Note: You will have to do this step for each of the .gexf files
Right-click on Index.html in the JS folder and open it with any web browser of
your choice. You should be able to view the Gephi file in the browser.
Note: Use Mozilla Firefox if it doesn’t open in your regular browser
If you cannot view your visualization in a Firefox browser, you will need to go into the
Terminal Program for Apple, or the Command Prompt for Windows.
If it still does not work, then you might have some security issues in your system. To
overcome those, please download “Web Server for Chrome extension” :
https://chrome.google.com/webstore/detail/web-server-for-chrome/ofhbbkphhbklhfoeikjpcbhemlocgigb?hl=en
After you install it, launch the app. You will see this window:
Go ahead and click on ‘Choose Folder’ and direct it to the directory where you have
stored your gexf master file. You can then click on the web server URL or go to
http://127.0.0.1:8887 to access your files locally.
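If the browser extension is not an option, a local web server can also be started with Python's standard library. This is an alternative sketch and not part of the original instructions; the function name make_server is hypothetical, the folder name is taken from the steps above, and port 8887 simply mirrors the URL mentioned there.

```python
import functools
import http.server

def make_server(folder, port=8887):
    """Create a local HTTP server that serves the files in folder."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=folder
    )
    # Bind to 127.0.0.1 so the files are reachable from this machine only.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)

# Typical use (blocks until interrupted with Ctrl+C):
#   server = make_server("gexf_js_master")
#   server.serve_forever()
```

With the server running, opening http://127.0.0.1:8887 in the browser should list or display the contents of the folder, including the viewer page and your .gexf file.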
Question 14: Display the exported .gexf file in the browser and paste the screen
shot.
Step 7: Attach assignments in eLearning
1. Attach the assignment document (only with the answers) in Microsoft Word
2. Attach the .gephi file
3. Attach the .gexf file