Social Network Analysis (Part 2 of 2)

Ziad Matni
PART 2: How Social Network Analysis is Done

Types of Social Network Analysis

SNA employs many characteristics of social network structure. We can analyze a network according to its individual actors, or nodes (we call that nodal analysis), or according to its structure as a whole (we call that whole-network analysis).

1. Nodal Analysis

Centrality is a property of a node’s position in a network. The centrality of a node is, loosely speaking, about the contribution the node makes to the structure of the network. It is a common way to find the “most important” or “most influential” actors in a network. The way social resources, such as information, social help, or money, flow through a network is typically very closely tied to the concept of centrality in network analysis.

There are several types of centrality measures, such as degree centrality, eigenvector centrality, betweenness centrality, and several others (we’ll only take a look at the first 3 mentioned here). Which one to use in a particular analysis depends a lot on the types of flow processes we want to study.

The simplest measure of centrality is degree centrality, which is merely the number of ties of a given type that a node has. For example, if you examine the simple network shown in Figure 2, you can see that the degree centrality for the green node is 5 since there are just 5 ties that connect to that node. Likewise, the degree centrality for any of the blue nodes is 1.
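To make the counting concrete, here is a minimal Python sketch of degree centrality. The edge list and node names are assumptions made up for illustration (a hub tied to five other nodes); they are not a transcription of Figure 2.

```python
from collections import Counter

# Hypothetical non-directed ties (an assumption, not copied from Figure 2).
edges = [
    ("green", "red"),
    ("green", "blue1"),
    ("green", "blue2"),
    ("green", "blue3"),
    ("green", "blue4"),
]

def degree_centrality(edge_list):
    """Count the ties attached to each node in a non-directed network."""
    degree = Counter()
    for a, b in edge_list:
        degree[a] += 1  # a non-directed tie counts once for each endpoint
        degree[b] += 1
    return dict(degree)

degrees = degree_centrality(edges)
print(degrees["green"], degrees["blue1"])  # 5 1
```

Storing the network as a plain list of ties is only one possible choice; an adjacency matrix or a dedicated network library would give the same counts.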

 

Figure 2. Example network

Some networks are constructed with non-directed relationships, like the example in Figure 2, while others are made up of directed relationships. Non-directed relationships don’t have any specific “orientation” and are fully reciprocal. In Figure 2, we see that the green node and the red node are connected to one another: the relationship is green-to-red and red-to-green. By contrast, a directed relationship is one where there is an asymmetry in the relationship between 2 nodes. An example of this is if “person A” likes “person B”, but “person B” does not like “person A” (i.e. the relationship is not reciprocated). We would diagram this with a one-way arrow pointing from “A” to “B”. Nodes, in this case, can have arrows going towards them (incoming) and/or arrows going away from them (outgoing). Figure 3 illustrates what a network showing directed links might look like.

 

Figure 3. Network showing directed links

When using the measure of degree centrality in networks that have directed links, it is useful to break it down into in-degree and out-degree centrality measures. These count the incoming and outgoing links of a node, respectively. For example, the in-degree centrality of node “C” (in Fig. 3) is 2 and the out-degree of that same node is 1. Likewise, the in-degree of node “D” is 1 and the out-degree of “D” is zero.
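The split into in-degree and out-degree can be sketched in a few lines of Python. The directed edge list below is an assumption: it was chosen only so that nodes “C” and “D” get the values quoted above, and it may not match the rest of Figure 3.

```python
from collections import Counter

# Hypothetical directed ties written as (source, target); assumed for illustration.
directed_edges = [("A", "C"), ("B", "C"), ("C", "D")]

def in_out_degree(edge_list):
    """Return (in-degree, out-degree) counters for a directed network."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edge_list:
        out_deg[source] += 1  # arrow leaving the source node
        in_deg[target] += 1   # arrow arriving at the target node
    return in_deg, out_deg

in_deg, out_deg = in_out_degree(directed_edges)
print(in_deg["C"], out_deg["C"])  # 2 1
print(in_deg["D"], out_deg["D"])  # 1 0
```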

A node that has a high degree centrality value in a network is a “highly connected” one, which means it has the potential to be influential and/or disruptive in the network, although more information (beyond just degree centrality) is usually needed before those assessments can appropriately be made. Degree centrality has been criticized as being “too simple” because it considers nothing about the network beyond a node’s adjacent nodes, but it is nevertheless easy to calculate and quite popular.

Another measure of node centrality is eigenvector centrality, which attempts to answer that criticism. It is a more sophisticated version of degree centrality: it still counts the nodes adjacent to (i.e. linked to) a given node, but each adjacent node is, in turn, weighted by its own centrality.

A node that has a high eigenvector centrality value is not only “highly connected”: it is highly connected with other nodes who are also “highly connected.” Hence, it can be argued that high eigenvector centrality nodes are highly influential ones in a network.
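One common way to approximate eigenvector centrality is power iteration: start every node with the same score, repeatedly replace each score with the sum of its neighbors’ scores, and rescale. The Python sketch below takes that approach under a few assumptions of my own: the network is non-directed and connected, each node’s own score is added in at every step to help the iteration settle, and the five-node network is hypothetical.

```python
import math

def eigenvector_centrality(neighbors, iterations=200, tol=1e-10):
    """Approximate eigenvector centrality by power iteration.

    `neighbors` maps each node to the set of nodes it is tied to
    (non-directed, so every tie is listed in both directions).
    """
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # New score = own score (a shift that aids convergence)
        # plus the current scores of all adjacent nodes.
        updated = {
            node: scores[node] + sum(scores[n] for n in neighbors[node])
            for node in neighbors
        }
        # Rescale so the scores do not grow without bound.
        norm = math.sqrt(sum(v * v for v in updated.values()))
        updated = {node: v / norm for node, v in updated.items()}
        change = max(abs(updated[n] - scores[n]) for n in neighbors)
        scores = updated
        if change < tol:
            break
    return scores

# Hypothetical network: "b" and "d" both have 2 ties,
# but "b" is tied to better-connected nodes than "d" is.
network = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
scores = eigenvector_centrality(network)
print(max(scores, key=scores.get))  # a
print(scores["b"] > scores["d"])    # True
```

Nodes “b” and “d” have the same degree centrality here, yet “b” scores higher because its neighbors are themselves better connected, which is the distinction the measure is meant to capture.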

Examples of using eigenvector centrality measures in research include studies that aim to uncover influential authors of emerging messages on Twitter. Interestingly, the original algorithm used by the Google search engine (known as PageRank) is a version of eigenvector centrality: it scores a web document by the links pointing to it, with each link weighted by the score of the document it comes from, and this is how the search engine ranks some web documents as more “important” than others.

The third measure of node centrality that I want to tell you about is betweenness centrality. This is a measure of how often a given node falls along the shortest path between two other nodes. It is calculated for a particular node (called the focal node) by looking at every pair of other nodes, counting the shortest paths from one node of the pair to the other, and then gauging the proportion of those shortest paths that pass through the focal node.

This elaborate algorithm means that a node with a high betweenness centrality value has a large potential for controlling flows through the network and can be interpreted as not just being influential, but also being in a position to threaten the network with flow disruptions, or act as a filter of resources, or make other nodes less efficient.

Betweenness centrality is an effective way to identify highly strategic people in a social network of business organizations.
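The pair-by-pair calculation described above can be sketched directly in Python. This is a plain, unoptimized version for small non-directed networks, not a production algorithm; the seven-tie network (two tight groups joined by a single tie between “c” and “d”) is hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_path_counts(neighbors, source):
    """Breadth-first search: distance and number of shortest paths from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:  # first time this node is reached
                dist[nxt] = dist[node] + 1
                count[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:  # node lies on a shortest path to nxt
                count[nxt] += count[node]
    return dist, count

def betweenness_centrality(neighbors):
    """Sum, over every pair of other nodes, the share of shortest paths via each node."""
    info = {node: shortest_path_counts(neighbors, node) for node in neighbors}
    scores = {node: 0.0 for node in neighbors}
    for s, t in combinations(neighbors, 2):
        dist_s, count_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected at all
        for focal in neighbors:
            if focal in (s, t):
                continue
            dist_f, count_f = info[focal]
            on_shortest_path = (
                focal in dist_s
                and t in dist_f
                and dist_s[focal] + dist_f[t] == dist_s[t]
            )
            if on_shortest_path:
                scores[focal] += count_s[focal] * count_f[t] / count_s[t]
    return scores

# Hypothetical network: two triangles joined by the single tie c-d.
network = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
scores = betweenness_centrality(network)
print(scores["c"], scores["a"])  # 6.0 0.0
```

Nodes “c” and “d” score highest because every shortest path between the two groups has to pass through them, which is exactly the “control over flows” reading of the measure.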

These 3 types of node centrality measures are popularly used in SNA research and can tell us different things about the “importance” of a node, because “importance” can be defined in different ways. The network graph in Figure 4 shows an example where 3 different nodes each have the highest value on a different centrality measure, because each of these nodes is “important” to the network in a different way.

 

Figure 4. Example of different node centralities in a network structure

2. Whole-Network Analysis

In addition to characterizing the nodes and edges of a network, one can characterize the whole network as well. The “cohesion” of a network can give a useful general picture of the entire network and can be expressed in terms of network density, which measures how many of the network’s possible ties are actually present. In other words, it is a single number that is calculated as a ratio: the number of all ties in the network divided by the total number of possible ties.

This last number (which ends up in the denominator) is also known as “Metcalfe’s number.” If the network is made up of non-directed links, it is calculated as ½ × n × (n − 1), where n is the number of nodes in the network. Otherwise (in a network with directed links), Metcalfe’s number is calculated as n × (n − 1) (i.e. without the ½ factor).

For example, if a network has 10 nodes (that’s n) joined by a total of 40 non-directed links (that’s L), then its network density is d = L / ( ½ × n × (n − 1) ), therefore:

d = 40 / ( ½ × 10 × 9 )  =  40 / 45  ≈  0.89.

Network density, d, is necessarily, by mathematical definition, always a number between 0 and 1. If d = 0, then the network has no links (i.e. because L = 0). If d = 1, then the network has all the possible links it can physically have (i.e. because L = Metcalfe’s number).

Although the interpretation of d is very much tied to the context of the network itself, we can generally say that if d is a number between 0 and 0.3, then the network has a low density, and if d is a number between 0.7 and 1, then the network has a high density.

Network density is almost always used as a comparative measure between multiple networks, but if the relative sizes of the compared networks are too far apart, some researchers prefer to use the average degree of the network instead, which is merely the mean (i.e. arithmetic average) of all the nodes’ degree centrality.
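Both whole-network numbers discussed so far, density and average degree, are simple enough to compute by hand, but a short Python sketch makes the directed versus non-directed distinction explicit. The function names are my own, and the inputs reuse the 10-node, 40-link example from above.

```python
def max_possible_ties(n, directed=False):
    """Largest number of ties a network of n nodes can hold."""
    return n * (n - 1) if directed else n * (n - 1) // 2

def network_density(n, ties, directed=False):
    """Ties present divided by ties possible (assumes n is at least 2)."""
    return ties / max_possible_ties(n, directed)

def average_degree(n, ties):
    """Mean degree in a non-directed network: each tie adds 1 to two nodes."""
    return 2 * ties / n

d = network_density(10, 40)
print(round(d, 2))             # 0.89
print(average_degree(10, 40))  # 8.0
```

Treating the same 40 links as directed would halve the density (40 / 90), since a directed network of 10 nodes has twice as many possible ties.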

Another whole-network measure is network diversity, which is popularly used in studies of people in organizations (for example, corporations, volunteer groups, or social change movements). Having many weak ties in a social network yields several benefits, including access to diverse resources (such as new information). So, network diversity is, by definition, directly calculated using the number of weak ties in a network. Indeed, studies have shown a positive relationship between network diversity measures and actor performance in an organization. Tie-building strategies help an organization to increase both its network size and its network diversity. Abundant weak ties often embed individuals and organizations in diverse networks, allowing them to take advantage of opportunities or preempt threats. Network diversity allows an actor in a network to quickly reach out to other valuable actors and respond effectively, especially if new technology is used (for example, cell phones, social media, etc.).

Like network density, diversity is used as a comparative measure between multiple networks. Its most basic form of calculation is as a ratio of number of weak ties to the total number of ties in the network.
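The basic ratio can be sketched as follows. How a tie gets classified as “weak” or “strong” depends on the study, so the sketch simply assumes each tie already carries a label; the labels, names, and data are hypothetical.

```python
def network_diversity(ties):
    """Share of ties labeled "weak" among all ties in the network.

    `ties` is a list of (node_a, node_b, strength) tuples, where strength
    is an assumed label of either "weak" or "strong".
    """
    if not ties:
        return 0.0  # no ties at all, so there is nothing to take a ratio of
    weak = sum(1 for _, _, strength in ties if strength == "weak")
    return weak / len(ties)

# Hypothetical ties among members of an organization.
ties = [
    ("ana", "ben", "strong"),
    ("ana", "cho", "weak"),
    ("ben", "dee", "weak"),
    ("cho", "eli", "weak"),
]
diversity = network_diversity(ties)
print(diversity)  # 0.75
```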

Network Data Analysis Software

So much of SNA is computationally heavy, especially if the networks we are studying are large (large networks are pretty commonly studied). Therefore, much of what is done for SNA is done on computers with specialized software.

Some basic SNA can be done using basic spreadsheet programs like Microsoft Excel. However, more sophisticated network data analysis software packages such as UCINET are also readily available (and free to use in their basic versions), as are network data visualization programs including NetDraw and KrackPlot. For analyzing social media network data from sources such as Twitter, NodeXL is a popular general-purpose open-source network analysis application. More sophisticated programs that allow longitudinal network analysis (i.e. looking at several “snapshots” of a network over time), like SIENA from Oxford University, enable researchers to explore network development and effects over time.

In addition, as is the case in much computational science research, popular computer programming languages, such as R and Python, are used extensively for customized network data analysis and visualization tasks. These provide a very wide range of options for analysis and visualization of network data, but they require a real effort to learn basic computer programming before they can be used.

 

Figure 5. Example of analysis and visualization done with NodeXL
(source: https://www.smrfoundation.org)

 

Figure 6. Example of analysis and visualization done with R
(source: https://bharatendrarai.sites.umassd.edu/)

Author

  • Ziad Matni (he/him/his) is a member of the faculty at the University of California, Santa Barbara and researches and teaches courses in Computer Science, Communication, and Data Science. He obtained his BS/MS in Electrical Engineering (U. of Southern California) and Ph.D. in Information Science (Rutgers U.). You can reach him via email at ziad.matni (at) ucsb.edu.
