class: center, middle, inverse, title-slide #
Change Detection in Social Networks
##
Social Networks Analysis and Statistical Process Control
###
Ian McCulloh, Matthew Webb, John Graham
U.S. Military Academy
Kathleen Carley
Carnegie Mellon University
Daniel B. Horn
U.S. Army Research Institute ### 05 / 17 / 2018 --- class: center, middle ![](http://www.bienestar.unal.edu.co/wp-content/uploads/2017/04/unal-700x300.png) ## **Submitted by:** ### Hernán David Torres Cardona ## **Professor:** ### Rubén Darío Guevara --- class: inverse, center, middle # Introduction --- # Social Networks Analysis ## Research Requirement ### **Social networks analysis** may signal **change** within an organization and **predict** significant events or behaviors. .footnote[ [1] McCulloh, I., Webb, M., Graham, J., Carley, K., & Horn, D. B. (2008). Change detection in social networks. Military Academy Dept Mathematical sciences. [PDF](http://www.dtic.mil/get-tr-doc/pdf?AD=ADA484611) ] -- ### Detect these changes enables -- - ### the **anticipation** and early **warning** of change and -- - ### **faster response** to change --- background-image: url(http://www.casos.cs.cmu.edu/projects/nukes/nukes.png) background-size: 650px 590px class: center, bottom, inverse # A Twitter retweet! --- # Organizations are not static ## The challenge - ### Their structure, composition, and patterns of communication may change **over time**. -- - ### A certain degree of change is expected in the **normal course** of an unchanging organization. -- - ### Develope metrics to detect **signals of meaningful change** in a background of normal variability. --- # Social Network Change Detection ## Overview ### Previous methods may be effective at quantifying a difference in static networks, **but** they lack an underlying **statistical distribution**. ### **Examples** - ### Hamming distance: Error detector and corrector codes - ### Euclidean distance: Weighted networks - ### Exponential Random Graph Models: Metrics and statistical models describe structural changes --- # Social Network Change Detection ## Now and in what way? -- ### Improving significantly on previous attempts to detect organizational change **over time**. -- ### Introducing a statistically sound **probability space** and uniformly more powerful **detection methods¨** -- ### Techniques from _*SNA*_, combined with those from _*SPC*_. .footnote[ SNA: Social Networks Analysis SPC: Statistical Process Control ] --- class: inverse, middle, center # Social Network Analysis --- # Graphs -- ### SNA provides the basis for how social networks are modeled, measured, compared and visualized. <center><iframe src="https://giphy.com/embed/b8032T9vBcIak" width="330" height="230" frameBorder="0" class="giphy-embed" allowFullScreen></iframe></center> -- + ### An **observed social network** can be modeled on a graph with **nodes** and links between them as **edges**. [1] .footnote[[1] (Scott, 2002; Wasserman & Faust, 1994) ] -- --- # Network Measures -- ### Network measures can be calculated -- - ### **from the entire graph (_extrinsic attributes_) or ** -- - ### for each individual node (_intrinsic attributes_). --- # Network Measures ## Extrinsic Attributes ### **Density** ### How many links exist in the graph divided by the total number of possible links. $$ d = \frac{\text{# edges}}{n(n-1)}$$ --- background-image: url(https://community.toadworld.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-00-70/Comunidades.JPG) background-size: 650px 600px class: center, bottom, inverse # Density --- # Centrality Network Measures ## Extrinsic Attributes ###**Closeness centrality** ### How a node is connected beyond its immediate neighbors. `$$c_k = \frac{\min_k\{\sum_{i=1}^n g_{ki}\}}{\sum_{i=1}^n g_{ki}}$$` + `\(g_{ik}\)`: the number of geodesic paths between nodes i and j. --- background-image: url(untitled.png) background-size: 650px 600px class: center, bottom, inverse # Closeness centrality --- # Centrality Network Measures ## Extrinsic Attributes ### **Betweenness centrality** ### How often a node lies along the shortest path, or geodesic, between two other nodes for all nodes in a graph. `$$b_k = \sum _{i,j} \frac{g_{ikj}}{g_{ij}}$$` + `\(g_{ikj}\)`: the number of geodesic paths between nodes i and j crossing node k. + `\(g_{ij}\)`: total number of geodesic paths between nodes i and j. --- background-image: url(untitled2.png) background-size: 650px 600px class: center, bottom, inverse # Betweenness centrality --- # What is a Meta-Network? + ### **Meta-Network** – A representation of a Group of Networks. -- + ### **Node** – A representation of a real-world item (a who, what, where, how, why item). -- + ### **Node Class** – A set of nodes of one type. -- + ### **Link** – A representation of a tie, edge, connection, or relation link between any two nodes. -- + ### **Network** – A representation of a set of nodes of one type and the links of one type between them. -- + ### **Attribute** – Additional information about a node. --- class: inverse, middle, center # Statistical Process Control --- # Control charts ### SPC is a technique used mostly to monitor industrial processes. -- ### These detect changes in the **mean** of the process by taking periodic samples of the product and tracking the results against a **control limit**. -- ### Control charts are usually optimized for their processes to increase their sensitivity for detecting changes, while minimizing the number of **false alarms**. --- # CUSUM* Control Chart ### The **decision rule** runs off the cumulative statistic `$$C_t = \sum_{j=1}^t (Z_i - k)$$` .footnote[[*] Cumulative Sum] -- #### where `\(Z_i\)` is the standardized normal of each observation and the common choice for `\(k\)` is 0.5 -- ### This chart is well suited: -- + ### To detect small changes in the mean of a process over time. -- + ### To found its built-in change point detection --- class: inverse, middle, center # Methods and Result --- # Graph mesures ### The average graph measures for **density**, **closeness**, and **betweenness centrality** are calculated for several consecutive time periods of the social network. -- + ### The **“in-control” mean and variance** for the measures of the network are calculated by taking a sample average and sample variance of the **stabilized measures**. -- + ### The subsequent, successive social network measures are then used to calculate the CUSUM’s statistics. -- + ### Upon receiving a signal, the **change point** is calculated by tracing the signaling statistic back to the last time period it was zero. --- # Network mesures + ### Network measures of interest should follow or approximate a **normal distribution** due to the central limit theorem. -- + ### Each of the network measures was fit with five **continuous distributions**: normal, uniform, gamma, exponential, and chi-squared. -- + ### **Gamma Distribution** is the best fit for betweenness and density. This invalidated further usage of the **CUSUM Control Chart**. --- # Tactical Officer Eduaction Program #### The TOEP officers allow data about their personal and professional e-mail communication to be tracked over a 24-week period. -- #### Subjects with incomplete communication and not identically distributed data collected were eliminated from further examination. <p align="center"> <img width="400" height="400" src="NetworkToep.png"> </p> Graph made with the software ORA --- # Normally distributed but too much variance ### The planning calendar and participant interviews allowed investigators significant events that occurred each week: + #### Academic Requirements + #### The Next Week’s Academic Requirements + #### Administrative Events + #### Group Projects + Social Gatherings + #### Days Off. --- # Normally distributed but too much variance ### **Analysis of variance (ANOVA)** `$$\text{Closeness} = 0.18 − 0.11(\text{Group Projects} ) + 0.11(\text{Social Gatherings}) + 0.0074(\text{Number of Emails})$$` <p align="center"> <img width="470" height="200" src="Captura2.PNG"> </p> --- ## Closeness CUSUM #### When is the model no longer providing a good prediction? -- #### The `\(C_+\)` and `\(C_-\)` statistics were calculated for each week using a `\(k\)` value of 0.5 and a control limit of 3. <p align="center"> <img width="600" height="300" src="Captura3.PNG"> </p> --- ## Summary <p align="center"> <img width="560" height="300" src="Captura4.PNG"> </p> -- #### **Notices:** + #### An increase in group project work was **correlated** with a decrease in communication. + #### The **residuals** were verified as normally distributed to meet the prerequisites of the CUSUM Control Chart. --- # Al-Qaeda Communications Network #### The data are limited in that we do not know the type, frequency, or substance of the communication and all links are non-directional. <p align="center"> <img width="500" height="450" src="Captura5.PNG"> </p> --- ## Averages for mesures <p align="center"> <img width="550" height="300" src="Captura6.PNG"> </p> -- #### **Notices** + #### There might be a significant change in the al-Qaeda network between the years 2000 and 2001. + #### We would be alerted to a critical change in the network prior to the September 11 terrorist attacks. --- class: inverse, middle, center # Conclusions --- # Discussion and further work + ### Social network monitoring can to detect important changes in the monitored communication of both command and control networks as well as **terrorist networks**. + ### Several difficulties were encountered when working with the datasets, for example, the **completeness** of the dataset. + ### **Future research** should focus on near-complete datasets with high resolution. + ### Networks with a set of good predictors to explain varying behavior may be useful in producing models that can be **control charted**. --- class: center, middle, inverse # Thanks! Slides created via the R package **xaringan**.