Nodes and Edges: Visualizing Network Data 101

Google Knowledge Graph, the Facebook Graph, and the Social Graph — today’s digital marketers and brands love to throw these terms around like they are ancient, tested, and proven concepts that everybody grasps and understands. However, the truth is often very different.

Recently, one of our clients asked me to show him his brand’s “Social Graph.” Not being a social expert, I had to do a lot of research and ask around internally to find ways to actually “show” him. It turns out this is a little more complex than people often expect. There is no simple Excel sheet you can download that shows your “graph.” Instead, social graphs are basically networks of nodes and edges: entities and the connections between them.

Nodes and edges? Confused? I was, too, which is why I want to share a simple way to analyze your social graphs or networks in order to better understand and visualize them. Today we will focus on analyzing social network graphs, but this approach can be used for any type of network data, such as link networks and website structures.

Data Concepts

Before we dive head-first into one of those “fascinating” screenshot-powered, step-by-step guides, I want to quickly address the data concepts behind graph visualizations. Most graphs are powered by a simple data model consisting of two core items: nodes and edges.

Nodes are the entities we are evaluating (People, Pages, Handles, Groups, etc.) and edges are the connections between them (Likes, Following, Friendships, etc.). Most of the network data today is handled via GraphML files or .gdf (graph data format) files. Basically, these are simple text files that contain a list of all the nodes and the relationships/edges between them.
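To make the “simple text file” idea concrete, here is a minimal sketch of what a tiny GDF file looks like and how you could read it with plain Python. The node IDs and the “Example Page” label are made up for illustration, and real exports usually carry more columns than the two I assume here.

```python
import csv
import io

# A tiny, hypothetical GDF file: a node section followed by an edge section.
SAMPLE_GDF = """\
nodedef>name VARCHAR,label VARCHAR
a,ClickZ Live
b,comScore
c,Example Page
edgedef>node1 VARCHAR,node2 VARCHAR
a,b
a,c
c,b
"""

def parse_gdf(text):
    """Split GDF text into a {node_id: label} dict and a list of (source, target) edges."""
    nodes, edges = {}, []
    section = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if row[0].startswith("nodedef>"):
            section = "nodes"   # the following rows describe nodes
        elif row[0].startswith("edgedef>"):
            section = "edges"   # the following rows describe edges
        elif section == "nodes":
            nodes[row[0]] = row[1]
        elif section == "edges":
            edges.append((row[0], row[1]))
    return nodes, edges

nodes, edges = parse_gdf(SAMPLE_GDF)
print(nodes)
print(edges)
```

The two header lines are the whole trick: everything after `nodedef>` is a node, and everything after `edgedef>` is a connection between two node IDs.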

Tools

Several tools can visualize network data and the most exhaustive list I know of can be found here. In today’s examples we will use Gephi to visualize our social data. Why Gephi? It’s free, open-source, cross-platform, and easy to use. It also has one of the most appealing visual outputs compared to some of the other tools out there.

Data Sources

There are many ways to extract social network data, including native APIs, custom applications, and Excel tools. One of the easiest for data extraction AND visualization is NodeXL. It allows for fairly effortless extraction from multiple networks (YouTube, Twitter, etc.) straight out of Excel. It also allows you to visualize and customize them directly inside Excel.

For Facebook data (and for today’s example) I’ll use an actual Facebook app called NetVizz. NetVizz allows you to export your own “Friend” and “Like” networks as well as “Page Like” networks. Examining Page Like networks is a great way to analyze audience affinities and learn more about target audiences by understanding their common interests and connections.

Getting Started

OK, first we have to determine what we want to analyze. Let’s say we are thinking about having a booth, sponsoring, or speaking at the next ClickZ Live conference and we are trying to determine some of the common interests among attendees. What else do they care about, what do they read, or who are they connected with? Having these insights will help us to better understand the audience and determine how and where to communicate with them.

Getting the Data

The first thing we need to do is get the Like Network for ClickZ Live. To do that, we need to find the numeric Facebook ID for the conference. The easiest place to get it is http://lookup-id.com, a free (ad-supported) site that lets you enter a Facebook URL and get the ID in return. Once you have the ID (in our case, 82891330657), go to the NetVizz Facebook page and select “Page Like Network.”

nodes-and-graphs-netvizz-image

Now simply enter the numeric Facebook page ID of your choosing and select a depth of 2. This will take some extra time, but it gives you a broader set that goes to second-level likes.
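If “depth” sounds abstract, the sketch below shows what it means in practice. It is not how NetVizz is implemented (I have not seen its code); it is just a breadth-first walk over a made-up set of page likes that stops after a given number of hops. The page names other than ClickZ Live are placeholders.

```python
from collections import deque

# Hypothetical "page likes page" data: each key is a page, each value the pages it likes.
LIKES = {
    "ClickZ Live": ["Page A", "Page B"],
    "Page A": ["Page C"],
    "Page B": ["Page C", "Page D"],
    "Page C": ["Page E"],
}

def crawl(start, max_depth):
    """Breadth-first walk that stops expanding once max_depth hops are reached."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow likes beyond the requested depth
        for liked in LIKES.get(page, []):
            if liked not in depth:
                depth[liked] = depth[page] + 1
                queue.append(liked)
    return depth

depth_one = crawl("ClickZ Live", 1)
depth_two = crawl("ClickZ Live", 2)
print(sorted(depth_one))
print(sorted(depth_two))
```

In this toy data, depth 1 only returns the pages ClickZ Live likes directly, while depth 2 also pulls in the pages those pages like. That is why the bigger crawl takes longer.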

After a few minutes of crawling you will see a link that allows you to download a GDF network file.

nodes-and-graphs-netvizz-image-2

Note: If you are downloading data for pages with millions of likes, this can take a few hours. But since this is a server-side crawl, you can run multiple crawls simultaneously.

Importing the Data

Once you have downloaded your .gdf file, start up Gephi and import it via File->Open. On the Import report, just leave the default options and click “OK.” You will be presented with a somewhat odd-looking bunch of lines and dots.

nodes-and-graphs-gephi-image-1 

Enhancing the Data

One advantage of Gephi is the easy-to-use implementation of mathematical operations. For our data, we want to do two things.

  1. Click on “average path length” on the right-hand side. This will calculate the distance and betweenness centrality of our nodes (their centrality within our chosen network). It will allow us to understand their importance relative to the other nodes. Once the calculation completes, just click “close.”
  2. On the right-hand side, run Modularity. Modularity uses a community detection algorithm that allows us to group related nodes together (we will color code them). Click “close” once the calculation is completed.

After you run both of these, nothing will change visually, but we can now perform operations against these calculations.
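For the curious, here is roughly what the first calculation does under the hood. This is a minimal sketch on a made-up, undirected five-node network, using breadth-first search for distances and Brandes’ algorithm for centrality. Gephi’s own implementation may differ in details such as normalization and the handling of directed edges, so treat the numbers as illustrative only.

```python
from collections import deque

# Hypothetical undirected toy network: B and D sit on the paths between the others.
GRAPH = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B"],
    "D": ["B", "E"],
    "E": ["D"],
}

def bfs_distances(graph, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_path_length(graph):
    """Mean shortest-path length over all reachable pairs of distinct nodes."""
    total = pairs = 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = {node: 0.0 for node in graph}
    for s in graph:
        order = []                              # nodes in the order BFS reaches them
        preds = {node: [] for node in graph}    # predecessors on shortest paths
        sigma = {node: 0 for node in graph}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):               # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}

apl = average_path_length(GRAPH)
centrality = betweenness(GRAPH)
print(apl)
print(centrality)
```

The nodes that sit on the most shortest paths between other nodes get the highest scores, which is exactly the “importance” we size by later.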

Visualizing the Results

This is the fun part. Now that we have run our calculations, let’s start by sizing the nodes. On the top left side select the Nodes tab, then select the diamond icon (size) and choose “betweenness centrality.” The minimum and maximum sizes depend on the size of your set; for this small example I would recommend a minimum of 10 and a maximum of 50. Choose “apply” and you should see that the nodes have adjusted their sizes.
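Conceptually, the sizing step just stretches the centrality values onto the range you picked. The sketch below assumes a straight linear mapping, which may not match Gephi’s interpolation exactly, and the centrality values are invented for the example.

```python
def scale_sizes(values, min_size=10, max_size=50):
    """Linearly map each metric value onto the [min_size, max_size] range."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    if span == 0:
        # Every node has the same score, so give them all the minimum size.
        return {node: min_size for node in values}
    return {
        node: min_size + (value - low) / span * (max_size - min_size)
        for node, value in values.items()
    }

# Hypothetical betweenness scores for five nodes.
scores = {"A": 0.0, "B": 5.0, "C": 0.0, "D": 3.0, "E": 0.0}
sizes = scale_sizes(scores)
print(sizes)
```

The least central nodes land at the minimum, the most central one at the maximum, and everything else falls in between.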

Next, choose the Partition tab in the top left corner. Then select “nodes” and hit the green arrows in order to refresh the options. You should see the Modularity class option. This is the data we got from our community detection algorithm. Once you select this and hit “apply,” the nodes will be colored based on the results of our community detection algorithm, according to their common attributes and relation to each other.

nodes-and-graphs-gephi-image-2
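If you wonder what “modularity” actually measures, this minimal sketch scores a given grouping of nodes: the more edges fall inside groups compared with what you would expect by chance, the higher the score. Note that it only evaluates a grouping I supply by hand; Gephi’s community detection goes further and searches for a good grouping itself. The six-node network is made up.

```python
from collections import Counter

# Hypothetical undirected network: two triangles joined by a single edge (c-d).
EDGES = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

def modularity(edges, community):
    """Modularity Q of a partition: sum over groups of (internal share - expected share)."""
    m = len(edges)
    internal = Counter()  # edges with both ends in the same group
    degree = Counter()    # total degree of the nodes in each group
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(EDGES, two_groups)
q_one = modularity(EDGES, one_group)
print(q_two, q_one)
```

Splitting the two triangles into separate groups scores higher than lumping everything together, which is the kind of split the color coding makes visible.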

Now let’s give our results that awesome look. Underneath the Partitions and Ranking window on the left is a Layout option. This allows you to use different algorithms to lay out the nodes and edges. The best one for this type of data is Force Atlas. Simply select it, check “prevent overlap,” and press “apply.” You should be left with a view similar to mine below, which clearly displays the major and minor nodes as well as the connections between them:

nodes-and-graphs-gephi-image-3

But what are they? Use the three little icons highlighted above in yellow to reveal your metrics: The first one will show the labels; in the second one use the dropdown and choose node size; then use the slider (third one) to find a fitting size.

At this point it should look like this:

nodes-and-graphs-gephi-image-4

There are a ton of adjustments you can make to sizes, colors, etc. to graph your data and see what’s really happening in a brand’s social network, but this is not bad for five minutes of work. Now you can start to zoom in, move, and highlight nodes. As an example, when I hover over the ClickZ Live node, I can clearly see the biggest affinities:

nodes-and-graphs-gephi-image-5-large-text

Playing around with the data a bit reveals some interesting connections. During this exercise, for instance, I discovered some patterns from pages that indicate they either paid for their likes or made all their employees like their clients’ pages (but I won’t call them out publicly; can you find them?).

Another insight from my ClickZ Live example is that comScore is the biggest common denominator outside of ClickZ’s own properties.

nodes-and-graphs-gephi-image-6-comscore
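I spotted this by eye in Gephi, but you could get a similar answer from the raw edge list by counting how often each page is liked by the others. The sketch below assumes that “common denominator” simply means the most-liked page once the brand’s own properties are filtered out. The edges are invented and are not my actual export.

```python
from collections import Counter

# Hypothetical directed edges: (page that likes, page being liked).
EDGES = [
    ("ClickZ Live", "ClickZ"),
    ("Page A", "ClickZ"),
    ("Page B", "ClickZ"),
    ("ClickZ Live", "comScore"),
    ("Page A", "comScore"),
    ("Page B", "Page C"),
]

def most_liked(edges, exclude=()):
    """Rank pages by incoming likes, skipping any page named in exclude."""
    counts = Counter(target for _, target in edges if target not in exclude)
    return counts.most_common()

ranking = most_liked(EDGES, exclude={"ClickZ", "ClickZ Live"})
print(ranking)
```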

There are countless deeper analysis models in Gephi, such as PageRank and clustering, that you can apply to build on the existing data.
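As a taste of what PageRank does, here is a minimal power-iteration sketch. It is not Gephi’s implementation; the damping factor of 0.85 is a commonly used value that I am assuming, and the link data is made up. The idea is that a page is important if important pages like it.

```python
# Hypothetical "page likes page" links: each key likes the pages in its list.
LINKS = {
    "Page A": ["Hub"],
    "Page B": ["Hub"],
    "Page C": ["Hub", "Page A"],
    "Hub": [],
}

def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict of outgoing links."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            for target in targets:
                # Each page splits its rank equally among the pages it likes.
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy set the page everyone likes comes out on top, and the ranks always add up to 1, so you can read them as shares of overall importance.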

The visualization below is what I eventually sent to my client to show him his brand’s social network. I generated it using Force Atlas, PageRank, and Modularity and then added some transparency in the Preview dialog.

nodes-and-graphs-final-gephi-image-black-bkgrd

I hope this inspires you to perform this type of visualization and get some great insights into your brand’s graph data. Questions? Feel free to message me at @nxfxcom.
