# Social Network Analysis, Part 2

##### Ziad Matni

**PART 2: How Social Network Analysis is Done**

**Types of Social Network Analysis**

Many structural characteristics of social networks are used in SNA. We can analyze them at the level of individual actors, or nodes (we call that **nodal analysis**), or at the level of the structure as a whole (we call that **whole-network analysis**).

**1. Nodal Analysis**

**Centrality** is a property of a *node’s position in a network*. The centrality of a node is, loosely speaking, about the contribution the node makes to the structure of the network. It is a common way to find the “most important” or “most influential” actors in a network. How *social resources*, like information, social help, or money, flow through a network is typically very closely tied to the concept of centrality in network analysis.

There are several types of centrality measures, such as *degree centrality*, *eigenvector centrality,* *betweenness centrality*, and several others (we’ll only take a look at the first 3 mentioned here). Which one to use in a particular analysis depends a lot on the types of flow processes we want to study.

The simplest measure of centrality is **degree centrality**, which is merely the number of ties of a given type that a node has. For example, if you examine the simple network shown in Figure 2, you can see that the degree centrality for the green node is 5 since there are just 5 ties that connect to that node. Likewise, the degree centrality for any of the blue nodes is 1.
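As a minimal sketch of the counting described above, the following Python snippet computes degree centrality for a small non-directed network. The edge list is an assumption made purely for illustration (a hub called `green` tied to one `red` node and four `blue` nodes); it echoes the description of Figure 2 but is not taken from the figure itself.

```python
from collections import Counter

# Hypothetical non-directed edge list (an assumption, not the data behind Figure 2).
edges = [
    ("green", "red"),
    ("green", "blue1"),
    ("green", "blue2"),
    ("green", "blue3"),
    ("green", "blue4"),
]

def degree_centrality(edge_list):
    """Count how many ties touch each node in a non-directed network."""
    degree = Counter()
    for a, b in edge_list:
        degree[a] += 1  # the tie counts once for each of its two endpoints
        degree[b] += 1
    return dict(degree)

degrees = degree_centrality(edges)
print(degrees["green"])  # 5
print(degrees["blue1"])  # 1
```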

Sometimes, networks are constructed with *non-directed* relationships, like the example from Figure 2, while others are made up of *directed* relationships. *Non-directed* relationships don’t have any specific “orientation” and they are fully reciprocal. In Figure 2, we see that the green node and the red node are connected to one another: the relationship is green-to-red *and* red-to-green. By contrast, a *directed relationship* is one where there is an *asymmetry* to the relationship between 2 nodes. An example of this is if “person A” likes “person B”, but “person B” does not like “person A” (i.e. the relationship is not reciprocated). We would diagram this with a *one-way arrow* pointing from “A” to “B”. Nodes, in this case, can have an arrow *going towards them* (*incoming*) and/or another arrow *going away from them* (*outgoing*). Figure 3 illustrates what a network showing directed links might look like.

When using the measure of degree centrality in networks that have *directed* links, it is useful to break it down to *in-degree* and *out-degree* centrality measures. These classify *incoming* versus *outgoing* links to and from a node, respectively. For example, the in-degree centrality of node “C” (in Fig. 3) is 2 and the out-degree of that same node is 1. Likewise, the in-degree of node “D” is 1 and the out-degree of “D” is zero.
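The in-degree and out-degree split can be sketched in a few lines of Python. The directed edge list below is hypothetical: it was chosen only so that the counts for nodes `C` and `D` agree with the values quoted above, and it should not be read as the actual contents of Figure 3.

```python
from collections import Counter

# Hypothetical directed ties as (source, target) pairs; an assumption, not Figure 3's data.
directed_edges = [("A", "C"), ("B", "C"), ("C", "D")]

def in_out_degree(edge_list):
    """Return (in_degree, out_degree) counters for a directed network."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edge_list:
        out_deg[source] += 1  # arrow leaves the source
        in_deg[target] += 1   # arrow arrives at the target
    return in_deg, out_deg

in_deg, out_deg = in_out_degree(directed_edges)
print(in_deg["C"], out_deg["C"])  # 2 1
print(in_deg["D"], out_deg["D"])  # 1 0
```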

A node that has a high degree centrality value in a network is a “highly connected” one, which means it has the potential of being influential and/or disruptive in a network, although more information is usually needed (beyond just degree centrality) before those assessments can appropriately be made. Degree centrality has been criticized as being “too simple” because it does not consider any measures of the whole network beyond the adjacent nodes, but it is nevertheless easy to calculate and quite popularly used.

Another measure of node centrality is **eigenvector centrality**, which attempts to answer that criticism. It is a more sophisticated version of degree centrality: it still counts the nodes *adjacent to* (i.e. linked to) a given node, but each adjacent node is, in turn, *weighted by its own centrality*.

A node that has a high eigenvector centrality value is not only “highly connected”: it is highly connected with *other nodes who are also “highly connected.”* Hence, it can be argued that high eigenvector centrality nodes are *highly influential* ones in a network.
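One common way to compute scores of this kind is power iteration: start every node with the same score, repeatedly replace each score with the sum of its neighbours' scores, and rescale until the values settle. The sketch below is only an illustration under stated assumptions. The network is hypothetical, and each node also keeps its own previous score at every step, a standard trick that stops the iteration from oscillating on some networks without changing the final ranking.

```python
import math

# Hypothetical non-directed network as an adjacency mapping (an assumption for illustration).
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["A", "C", "E"],
    "E": ["D"],
}

def eigenvector_centrality(adj, iterations=200, tol=1e-9):
    """Approximate eigenvector centrality by power iteration."""
    scores = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # New score = own previous score + sum of the neighbours' previous scores.
        new = {n: scores[n] + sum(scores[nb] for nb in adj[n]) for n in adj}
        # Rescale so the vector of scores has length 1.
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - scores[n]) for n in adj) < tol:
            return new
        scores = new
    return scores

scores = eigenvector_centrality(graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

In this toy network, `A`, `C`, and `D` all have a degree centrality of 3, yet `D` ends up with a lower eigenvector score than the other two because one of its ties leads to the poorly connected `E`.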

Examples of using eigenvector centrality measures in research include studies wanting to uncover influential authors of emerging messages on Twitter. Interestingly, the original algorithm used by the Google search engine (known as *PageRank*) is a version of eigenvector centrality, as this is how the search engine determines which web documents are more “relevant” than others to the search query.

The third measure of node centrality that I want to tell you about is **betweenness centrality**. This is a measure of *how often* a given node falls along the *shortest path* between two other nodes. It is calculated for a particular node (called the *focal node*) by looking at every pair of nodes *except* the focal node, counting the shortest paths from one node of the pair to the other, and then gauging the proportion of those paths that also pass through the focal node.
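The procedure just described can be sketched directly in Python. The network below is hypothetical (two triangles joined through a bridge node `X`), and the function returns the raw sum of proportions over all pairs, without the normalization that many software packages apply by default.

```python
from collections import deque
from itertools import combinations

# Hypothetical non-directed network: two triangles joined through the bridge node "X".
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "X"],
    "X": ["C", "D"],
    "D": ["X", "E", "F"],
    "E": ["D", "F"],
    "F": ["D", "E"],
}

def shortest_path_counts(adj, source):
    """Breadth-first search: distance to, and number of shortest paths to, each node."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:              # first time we reach this neighbour
                dist[nb] = dist[node] + 1
                paths[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:  # node lies on a shortest path to nb
                paths[nb] += paths[node]
    return dist, paths

def betweenness(adj, focal):
    """Sum, over every pair excluding the focal node, of the share of
    shortest paths between the pair that pass through the focal node."""
    info = {node: shortest_path_counts(adj, node) for node in adj}
    dist_f, paths_f = info[focal]
    total = 0.0
    others = [n for n in adj if n != focal]
    for s, t in combinations(others, 2):
        dist_s, paths_s = info[s]
        if t not in dist_s or focal not in dist_s:
            continue  # the pair (or the focal node) is unreachable from s
        if dist_s[focal] + dist_f[t] == dist_s[t]:
            total += paths_s[focal] * paths_f[t] / paths_s[t]
    return total

print(betweenness(graph, "X"))  # 9.0
print(betweenness(graph, "C"))  # 8.0
print(betweenness(graph, "A"))  # 0.0
```

Note that `X` has only two ties, so its degree centrality is low, yet every shortest path between the two triangles must pass through it.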

The upshot of this calculation is that a node with a high betweenness centrality value has a *large potential for controlling flows through the network* and can be interpreted as not just being influential, but also being in a position to *threaten* the network with flow *disruptions*, or act as a filter of resources, or make other nodes less efficient.

Betweenness centrality is an effective way to identify highly strategic people in a social network of business organizations.

These 3 types of node centrality measures are popularly used in SNA research and can tell us different things about the “importance” of a node, because “importance” can have differing definitions. The network graph in Figure 4 shows an example where 3 different nodes each have a different type of “highest centrality” measure, because each of these nodes is “important” to the network in a different way.

**2. Whole-Network Analysis**

In addition to characterizing the nodes and edges of a network, one can characterize the *whole network* as well. The “cohesion” of a network can give a useful general picture of the entire network and can be expressed in terms of **network density**, which typically measures some total centralization value *in proportion to* a total network term. In other words, it is a *single number* that is calculated as a ratio: simply by dividing *the number of all ties in the network* by *the total number of possible ties*.

This last number (which ends up in the denominator) is also known as “Metcalfe’s number” and is calculated as **½.n.(n-1)**, where *n* is the *number of nodes in the network*, but **only if** the network is made up of non-directed links. Otherwise (in a network with directed links), Metcalfe’s number is calculated as *n.(n-1)* (i.e. without the **½** factor).

For example, if a network has 10 nodes (that’s **n**) and a total of 40 non-directed links (that’s **L**), then its network density **d** = *L / ( ½.n.(n-1) )*, therefore: **d** = 40 / ( ½ (10) (9) ) = 40 / 45 = 0.89.
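The density arithmetic can be checked with a short Python sketch. The function and parameter names here are illustrative choices, not part of any particular SNA package.

```python
def max_possible_ties(n, directed=False):
    """Largest number of ties a network of n nodes can hold."""
    if n < 2:
        raise ValueError("density needs at least 2 nodes")
    # Directed networks count A->B and B->A separately, so the 1/2 factor is dropped.
    return n * (n - 1) if directed else n * (n - 1) / 2

def network_density(n, num_ties, directed=False):
    """Ratio of the ties present to the ties possible."""
    return num_ties / max_possible_ties(n, directed)

d = network_density(10, 40)
print(round(d, 2))  # 0.89
```

The same 10 nodes and 40 links treated as a *directed* network would give 40 / 90, or about 0.44, since twice as many ties are possible.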

Network density, **d**, is necessarily, by mathematical definition, always a number between 0 and 1. If **d** = 0, then the network has no links (i.e. because **L** = 0). If **d** = 1, then the network has all the possible links it can physically have (i.e. because **L** = Metcalfe’s number).

Although the interpretation of **d** is very much tied to the *context* of the network itself, we can generally say that if **d** is a number between 0 and 0.3, then the network has a *low density*, and if **d** is a number between 0.7 and 1, then the network has a *high density*.

Network density is almost always used as a *comparative measure* between multiple networks, but if the relative sizes of the compared networks are too far apart, some researchers prefer to use the *average degree of the network* instead, which is merely the *mean* (i.e. arithmetic average) of all the nodes’ degree centrality.
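As a rough sketch of that alternative, the average degree of a non-directed network is just the mean of the per-node tie counts. The adjacency mapping below is a hypothetical example.

```python
# Hypothetical non-directed network as an adjacency mapping (an assumption for illustration).
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A"],
}

def average_degree(adj):
    """Arithmetic mean of the degree centrality of all nodes."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)

print(average_degree(graph))  # 2.0
```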

Another whole-network measure is **network diversity**, which is popularly used in studies of people in organizations (for example, corporations, volunteer groups, or social change movements). Having many weak ties in a social network yields several benefits, including creating access to diverse resources (such as new information). So, network diversity is, by definition, directly calculated using the number of *weak ties* in a network. Indeed, studies have shown a positive relationship between network diversity measures and actor performance in an organization. Tie-building strategies help an organization to increase both its network size and network diversity.

*Abundant weak ties* often embed individuals and organizations in diverse networks, allowing them to take advantage of opportunities or preempt threats. Network diversity allows an actor in a network to quickly reach out to other valuable actors and respond effectively, especially if new technology is used (for example, cell phones, social media, etc.).

Like network density, diversity is used as a *comparative measure* between multiple networks. Its most basic form of calculation is as a *ratio* of the number of weak ties to the total number of ties in the network.
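A minimal sketch of that basic ratio follows. How a tie gets classified as “weak” depends on the study, so the strength scores and the 0.5 cut-off used here are assumptions made only to have something to compute.

```python
# Hypothetical ties as (actor, actor, strength) with strength between 0 and 1.
ties = [
    ("Ana", "Ben", 0.9),
    ("Ana", "Cal", 0.2),
    ("Ben", "Dee", 0.3),
    ("Cal", "Dee", 0.8),
    ("Dee", "Eli", 0.1),
]

WEAK_THRESHOLD = 0.5  # assumed cut-off: ties below this strength count as weak

def network_diversity(tie_list, threshold=WEAK_THRESHOLD):
    """Share of all ties in the network that are weak ties."""
    if not tie_list:
        return 0.0
    weak = sum(1 for _, _, strength in tie_list if strength < threshold)
    return weak / len(tie_list)

print(network_diversity(ties))  # 0.6
```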

**Network Data Analysis Software**

Much of SNA is computationally heavy, especially if the networks we are studying are large (and large networks are quite commonly studied). Therefore, much of the work in SNA is done on computers with specialized software.

Some basic SNA can be done using basic spreadsheet programs like Microsoft **Excel**. However, more sophisticated *network data analysis* software packages such as **UCINET** are also readily available (and free to use in their basic versions), as are *network data visualization* programs including **NetDraw** and **KrackPlot**. For analyzing social media network data from sources such as Twitter, **NodeXL** is a popular general-purpose open source network analysis application. More sophisticated programs that allow longitudinal network analysis (i.e. looking at several “snapshots” of a network over time), like **SIENA** from Oxford University, enable researchers to explore network development and effects over time.

In addition, as is the case with much computational science research, popular computer programming languages, such as **R** and **Python**, are used extensively for customized network data analysis and visualization tasks. These provide a very wide range of options for analysis and visualization of network data, but they require a substantial effort to learn basic computer programming before they can be used.

Cite this article in APA as: Matni, Z. (2021, August 6). *Social network analysis, part 2*. Information Matters. Vol.1, Issue 8. https://r7q.22f.myftpupload.com/2021/08/social-network-analysis-part-2/