Online Hide-and-Seek: Hiding Your Influence Online
Influence analysis is a sub-branch of data science. It plays a vital role in marketing as it allows us to find the more influential nodes in the graphs enabling marketing teams to discover those "influencers" who can have the best reach to the audience for marketing their product.
While this is one of the use cases of influence analysis, it is also used by security services and organizations to track harmful groups and individuals like terrorist group leaders online. However, these individuals and groups have also devised methods to hide themselves from being detected. Activists also use such methods as well to hide from their hostile governments.
Here we will discuss techniques and measures used for influence analysis and some methods used by different groups and individuals to hide their influence.
<b><h1><center>What is Influence Analysis?</b></h1></center>
Influence Analysis is a graph problem. On a social network site, the users can be seen as nodes, and the people they follow and who follow them form the edges of the graph. In a naive sense, people with many followers (incoming edges) can be seen as more influential than others. But in reality, we need to consider many other things for influence analysis.
<b><h1><center>Influence Measuring Algorithms and Metrics</b></h1></center>
The clustering coefficient of a node measures how complete its neighborhood is. It is given by the proportion of the links between vertices within the area of a node divided by the number of links that could exist between them. The clustering coefficient of the graph is the average clustering coefficient of all the nodes in the network. The clustering coefficient of the network can help you <b>identify the set of nodes belonging to the same group.</b>
Degree centrality is basically the naive method of measuring the influence. It is calculated by dividing the in-degree of a node by the number of nodes in the graph, where the in-degree is the number of incoming edges. This gives the popularity of a node. Generally, in a group, <b>people belonging to the group will follow the leader of the group, giving them a relatively higher Degree Centrality measure.</b>.
The betweenness centrality of a node 'v' is calculated as the shortest paths between all other node pairs in the network that pass through 'v' divided by all the shortest paths between the corresponding node pairs. It depicts which nodes are more likely to act as a bridge in communication paths between other nodes in the network and is also helpful in determining network brokerage points. <b>Removing the node with high betweenness centrality will impact the network as propagating information or a message will require more time</b>.
The closeness centrality of a node is calculated as the reciprocal of the sum of the length of the shortest path between the node and all other nodes in the network. The higher the value higher is the closeness. It is the measure of reach, i.e., the speed with which information can get transferred to other nodes from the given starting node.
Google introduced the <a href="https://www.s-tronomic.in/post/73">PageRank algorithm</a> for ranking web pages. The web pages are ranked based on the in-degree and out-degree of a node (web page). Here, if an edge is coming into the node, that would mean the website with the link to the webpage is giving it some credibility. And if a web page has an outgoing edge, it distributes its credibility equally to all the websites it links to.
In the case of groups and their leaders, the leader might have a higher number of incoming edges, <b>giving the leader node a higher PageRank</b>.
The HITS algorithm (Hyperlink-Induced Topic Search) is a link analysis algorithm that rates web pages. It is used to discover and rank the web pages relevant to a particular search. The algorithm uses hubs and authorities to define a recursive relationship between web pages. Given a query to a search engine, the set of highly relevant web pages is called roots. They are potential authorities. Pages that are irrelevant but point to pages in the root are called hubs.
Here is the comparison of different scores for a given graph:
<img src="https://i.ibb.co/WfxxTS4/Screenshot-2023-05-12-155439.png" alt="score comparison" border="0">
Thus, an authority is a page that many hubs link to, whereas a hub is a page that links to many authorities. Hence in the case of a group of people, <b>authority nodes are generally the leader nodes</b>, and hubs are the followers.
<b><h1><center>How To Fool These Metrics and Algorithms</b></h1></center>
Now we will look at how people fool or misguide these metrics and algorithms so that it might be difficult for others to find the members of a group or the leader. One naive method would be to attach all the nodes in the group. The problem with this method is that the clustering coefficient of the group will increase (=1), which would give away all the group members. But the centrality measures of the nodes in the group will also become equal (=1), which would ensure it's impossible to find the group leader.
ROAM stands for Remove, Add Many. As the name suggests, in this technique, the influential node removes one of the incoming nodes (one of the followers), and for every removed follower, add 4-5 outgoing edges (follow others in the group). This will <b>reduce the centrality values as well as the PageRank and Hub values of the node</b>.
Though this method fails to hide the community as the clustering coefficients might increase in this process. In the <b>extended ROAM technique</b>, the node removes one edge from its own community, but the new edges are added to the nodes outside the community. This helps in <b>reducing the clustering coefficient inside the community and works to hide the group's leader</b>.
This image below from a <a href="https://arxiv.org/abs/1608.00375">research paper</a> shows how just two iterations of ROAM technique reduced the influence ranking of the terrorist Mohammed Atta (who was one of the masterminds of 9/11) reduced from 1 to some lower rank.
<img src="https://i.ibb.co/KcPpVvF/1.png" alt="1" border="0">
DICE stands for Disconnect Internally, Connect Externally. It is very similar to the extended ROAM technique. The ROAM technique is a particular case of the DICE technique. In the DICE method, we start with budget 'b'. We remove d (< b) incoming edges from a node and add 'b-d' outgoing nodes outside the actual community.
This method shares the advantages of ROAM technique, but it also helps by <b>keeping control of the clustering coefficient</b>. In the extended ROAM method, even though we add edges to the nodes outside of the community/group, it still increases the clustering coefficient of the extended group. However, this can be controlled with the DICE method.
- Ojas Srivastava, 05:23 PM, 12 May, 2023