Perform inference on networks by incorporating reciprocity as a mechanism for tie formation.
Directed networks represent real-world data where interactions have a specific direction. Traditional approaches for analyzing these networks often rely on community detection algorithms, which assume that interactions are determined solely by hidden partitions of nodes. However, many real networks exhibit other mechanisms that influence tie formation, such as reciprocity, the tendency of a pair of nodes to form mutual connections. To account for reciprocity, generative models must drop the standard assumption of conditional independence and instead model the two edges between a pair of nodes jointly.
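As a concrete illustration of the quantity discussed above, the sketch below measures reciprocity as the fraction of directed edges whose reverse edge also exists. This is one common definition, assumed here for illustration; the function name `reciprocity` and the toy adjacency matrix are hypothetical and are not taken from any of the packages described on this page.

```python
import numpy as np

def reciprocity(A):
    """Fraction of directed edges whose reverse edge also exists."""
    A = (np.asarray(A) != 0).astype(int)  # binarize: keep only edge presence
    np.fill_diagonal(A, 0)                # ignore self-loops
    n_edges = A.sum()
    if n_edges == 0:
        return 0.0
    # A[i, j] * A[j, i] is 1 only when both i -> j and j -> i exist
    n_reciprocated = (A * A.T).sum()
    return n_reciprocated / n_edges

# Toy network with edges 0->1, 1->0, 0->2, 2->3
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])
print(reciprocity(A))  # 2 of the 4 edges are reciprocated -> 0.5
```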
We developed several probabilistic models that perform inference in directed networks by incorporating reciprocity. The foundational methods, CRep (Safdari* et al., 2021) and JointCRep (Contisciani et al., 2022), offer two distinct ways to combine reciprocity and community structure in a single probabilistic model. Both relax the conditional independence assumption and explicitly model the dependence between the two directed edges connecting a pair of nodes; they differ in their generative processes.
Specifically, CRep is designed for directed networks with nonnegative discrete weights: it models the conditional distributions with Poisson distributions and approximates the network’s likelihood with a pseudo-likelihood. In contrast, JointCRep uses a bivariate Bernoulli distribution to model the joint distribution of the edges between a node pair in closed form, which makes it suitable for binary directed networks.
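To make the idea of a closed-form joint distribution over a pair of edges more tangible, here is a minimal sketch of a generic bivariate Bernoulli distribution in log-linear form. The parameter names `f_ij`, `f_ji` and `J`, the function `pair_joint`, and the numeric values are assumptions for illustration; the exact parametrization used in JointCRep, where these terms depend on community memberships and a reciprocity parameter, is described in the paper.

```python
import numpy as np

def pair_joint(f_ij, f_ji, J):
    """Return the 2x2 table P[a, b] = P(A_ij = a, A_ji = b).

    Generic log-linear bivariate Bernoulli:
        P(a, b) proportional to exp(a * f_ij + b * f_ji + a * b * J)
    J couples the two edges; J = 0 recovers two independent Bernoullis.
    """
    a = np.array([0, 1])[:, None]
    b = np.array([0, 1])[None, :]
    weights = np.exp(a * f_ij + b * f_ji + a * b * J)
    return weights / weights.sum()

# Hypothetical parameter values, chosen only for illustration
P = pair_joint(f_ij=-2.0, f_ji=-2.0, J=3.0)

# Marginal and conditional probabilities follow directly from the table
marginal_ij = P[1, :].sum()                # P(A_ij = 1)
conditional_ij = P[1, 1] / P[:, 1].sum()   # P(A_ij = 1 | A_ji = 1)

print(f"P(A_ij = 1)            = {marginal_ij:.3f}")
print(f"P(A_ij = 1 | A_ji = 1) = {conditional_ij:.3f}")
```

With a positive coupling `J`, observing the reverse edge raises the probability of the forward edge, which is the kind of dependence a conditionally independent model cannot express.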
Furthermore, we extended these frameworks to address other scenarios and applications: DynCRep (Safdari et al., 2022) extends CRep to analyze dynamic networks, that is, networks that change over time; CRAD (Safdari et al., 2023) builds on the formalism of JointCRep to develop a probabilistic generative approach for anomaly detection on network edges; and VIMuRe (De Bacco et al., 2023) estimates the unobserved network structure from multiply reported data, incorporating a reciprocity parameter based on the principles of CRep to reflect the intuition that reporters tend to nominate the same individuals in both directions of a relationship.
Main takeaways
Explicitly modeling pairwise dependencies makes results more robust and improves performance in prediction and network reconstruction tasks.
Our frameworks accurately capture reciprocity and other model parameters, while also estimating the relative contributions of community structure and reciprocity in determining individual edges.
Our methods function not only as tools for network inference but also as benchmark models, capable of generating synthetic data that align with the underlying assumptions of each algorithm.
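The benchmark use mentioned in the last takeaway can be sketched with a deliberately simplified generator: every node pair is drawn jointly from the same bivariate Bernoulli table, with no community structure. This is an assumption-laden toy, not the generative process of any of the methods above; the names `sample_network`, `f` and `J` and all numeric values are hypothetical.

```python
import numpy as np

def sample_network(n, f, J, rng):
    """Sample a binary directed network by drawing each node pair jointly.

    Simplifying assumption: all pairs share the same edge propensity f and
    pair coupling J (no communities).
    """
    # Unnormalized weights of the pair states (A_ij, A_ji):
    # (0, 0), (1, 0), (0, 1), (1, 1)
    weights = np.exp(np.array([0.0, f, f, 2 * f + J]))
    probs = weights / weights.sum()

    A = np.zeros((n, n), dtype=int)
    iu, ju = np.triu_indices(n, k=1)             # each unordered pair once
    states = rng.choice(4, size=iu.size, p=probs)
    A[iu, ju] = (states == 1) | (states == 3)    # edge i -> j
    A[ju, iu] = (states == 2) | (states == 3)    # edge j -> i
    return A

def reciprocity(A):
    """Fraction of directed edges whose reverse edge also exists."""
    n_edges = A.sum()
    return (A * A.T).sum() / n_edges if n_edges else 0.0

rng = np.random.default_rng(0)
A_indep = sample_network(200, f=-3.0, J=0.0, rng=rng)  # independent edges
A_recip = sample_network(200, f=-3.0, J=4.0, rng=rng)  # coupled edges

print("reciprocity, J = 0:", reciprocity(A_indep))
print("reciprocity, J = 4:", reciprocity(A_recip))
```

Networks generated this way have a known coupling, so an inference method can be checked against them by comparing the reciprocity it recovers with the value used for generation.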
References
[2]
Generative model for reciprocity and community detection in networks
Hadiseh Safdari*, Martina Contisciani*, and Caterina De Bacco
We present a probabilistic generative model and efficient algorithm to model reciprocity in directed networks. Unlike other methods that address this problem such as exponential random graphs, it assigns latent variables as community memberships to nodes and a reciprocity parameter to the whole network rather than fitting order statistics. It formalizes the assumption that a directed interaction is more likely to occur if an individual has already observed an interaction towards her. It provides a natural framework for relaxing the common assumption in network generative models of conditional independence between edges, and it can be used to perform inference tasks such as predicting the existence of an edge given the observation of an edge in the reverse direction. Inference is performed using an efficient expectation-maximization algorithm that exploits the sparsity of the network, leading to an efficient and scalable implementation. We illustrate these findings by analyzing synthetic and real data, including social networks, academic citations, and the Erasmus student exchange program. Our method outperforms others in both predicting edges and generating networks that reflect the reciprocity values observed in real data, while at the same time inferring an underlying community structure. We provide an open-source implementation of the code online.
[5]
Community detection and reciprocity in networks by jointly modelling pairs of edges
Martina Contisciani, Hadiseh Safdari, and Caterina De Bacco
To unravel the driving patterns of networks, the most popular models rely on community detection algorithms. However, these approaches are generally unable to reproduce the structural features of the network. Therefore, attempts are always made to develop models that incorporate these network properties beside the community structure. In this article, we present a probabilistic generative model and an efficient algorithm to both perform community detection and capture reciprocity in networks. Our approach jointly models pairs of edges with exact two-edge joint distributions. In addition, it provides closed-form analytical expressions for both marginal and conditional distributions. We validate our model on synthetic data in recovering communities, edge prediction tasks and generating synthetic networks that replicate the reciprocity values observed in real networks. We also highlight these findings on two real datasets that are relevant for social scientists and behavioural ecologists. Our method overcomes the limitations of both standard algorithms and recent models that incorporate reciprocity through a pseudo-likelihood approximation. The inference of the model parameters is implemented by the efficient and scalable expectation–maximization algorithm, as it exploits the sparsity of the dataset. We provide an open-source implementation of the code online.
[3]
Reciprocity, community detection, and link prediction in dynamic networks
Hadiseh Safdari, Martina Contisciani, and Caterina De Bacco
Many complex systems change their structure over time, in these cases dynamic networks can provide a richer representation of such phenomena. As a consequence, many inference methods have been generalized to the dynamic case with the aim to model dynamic interactions. Particular interest has been devoted to extend the stochastic block model and its variant, to capture community structure as the network changes in time. While these models assume that edge formation depends only on the community memberships, recent work for static networks show the importance to include additional parameters capturing structural properties, as reciprocity for instance. Remarkably, these models are capable of generating more realistic network representations than those that only consider community membership. To this aim, we present a probabilistic generative model with hidden variables that integrates reciprocity and communities as structural information of networks that evolve in time. The model assumes a fundamental order in observing reciprocal data, that is an edge is observed, conditional on its reciprocated edge in the past. We deploy a Markovian approach to construct the network’s transition matrix between time steps and parameters’ inference is performed with an expectation-maximization algorithm that leads to high computational efficiency because it exploits the sparsity of the dataset. We test the performance of the model on synthetic dynamical networks, as well as on real networks of citations and email datasets. We show that our model captures the reciprocity of real networks better than standard models with only community structure, while performing well at link prediction tasks.
[10]
Anomaly, reciprocity, and community detection in networks
Hadiseh Safdari, Martina Contisciani, and Caterina De Bacco
Anomaly detection algorithms are a valuable tool in network science for identifying unusual patterns in a network. These algorithms have numerous practical applications, including detecting fraud, identifying network security threats, and uncovering significant interactions within a data set. In this project, we propose a probabilistic generative approach that incorporates community membership and reciprocity as key factors driving regular behavior in a network, which can be used to identify potential anomalies that deviate from expected patterns. We model pairs of edges in a network with exact two-edge joint distributions. As a result, our approach captures the exact relationship between pairs of edges and provides a more comprehensive view of social networks. Additionally, our study highlights the role of reciprocity in network analysis and can inform the design of future models and algorithms. We also develop an efficient algorithmic implementation that takes advantage of the sparsity of the network.
[7]
Latent network models to account for noisy, multiply reported social network data
Caterina De Bacco, Martina Contisciani, Jonathan Cardoso-Silva, Hadiseh Safdari, Gabriela Lima Borges, Diego Baptista, Tracy Sweet, Jean-Gabriel Young, Jeremy Koster, Cody T Ross, Richard McElreath, Daniel Redhead, and Eleanor A Power
Journal of the Royal Statistical Society Series A: Statistics in Society, 2023
Social network data are often constructed by incorporating reports from multiple individuals. However, it is not obvious how to reconcile discordant responses from individuals. There may be particular risks with multiply reported data if people’s responses reflect normative expectations—such as an expectation of balanced, reciprocal relationships. Here, we propose a probabilistic model that incorporates ties reported by multiple individuals to estimate the unobserved network structure. In addition to estimating a parameter for each reporter that is related to their tendency of over- or under-reporting relationships, the model explicitly incorporates a term for ‘mutuality’, the tendency to report ties in both directions involving the same alter. Our model’s algorithmic implementation is based on variational inference, which makes it efficient and scalable to large systems. We apply our model to data from a Nicaraguan community collected with a roster-based design and 75 Indian villages collected with a name-generator design. We observe strong evidence of ‘mutuality’ in both datasets, and find that this value varies by relationship type. Consequently, our model estimates networks with reciprocity values that are substantially different than those resulting from standard deterministic aggregation approaches, demonstrating the need to consider such issues when gathering, constructing, and analysing survey-based network data.