Publications
Listed in reverse chronological order. Asterisks denote equal contribution.
Preprints
- [P1] Flexible inference in heterogeneous and attributed multilayer networks. Martina Contisciani*, Marius Hobbhahn*, Eleanor A Power, Philipp Hennig, and Caterina De Bacco. 2024
Networked datasets are often enriched by different types of information about individual nodes or edges. However, most existing methods for analyzing such datasets struggle to handle the complexity of heterogeneous data, often requiring substantial model-specific analysis. In this paper, we develop a probabilistic generative model to perform inference in multilayer networks with arbitrary types of information. Our approach employs a Bayesian framework combined with the Laplace matching technique to ease interpretation of inferred parameters. Furthermore, the algorithmic implementation relies on automatic differentiation, avoiding the need for explicit derivations. This makes our model scalable and flexible to adapt to any combination of input data. We demonstrate the effectiveness of our method in detecting overlapping community structures and performing various prediction tasks on heterogeneous multilayer data, where nodes and edges have different types of attributes. Additionally, we showcase its ability to unveil a variety of patterns in a social support network among villagers in rural India by effectively utilizing all input information in a meaningful way.
Journal articles
- [10] Anomaly, reciprocity, and community detection in networks. Hadiseh Safdari, Martina Contisciani, and Caterina De Bacco. Physical Review Research, 2023
Anomaly detection algorithms are a valuable tool in network science for identifying unusual patterns in a network. These algorithms have numerous practical applications, including detecting fraud, identifying network security threats, and uncovering significant interactions within a data set. In this project, we propose a probabilistic generative approach that incorporates community membership and reciprocity as key factors driving regular behavior in a network, which can be used to identify potential anomalies that deviate from expected patterns. We model pairs of edges in a network with exact two-edge joint distributions. As a result, our approach captures the exact relationship between pairs of edges and provides a more comprehensive view of social networks. Additionally, our study highlights the role of reciprocity in network analysis and can inform the design of future models and algorithms. We also develop an efficient algorithmic implementation that takes advantage of the sparsity of the network.
- [9] Community detection in large hypergraphs. Nicolò Ruggeri, Martina Contisciani, Federico Battiston, and Caterina De Bacco. Science Advances, 2023
Hypergraphs, describing networks where interactions take place among any number of units, are a natural tool to model many real-world social and biological systems. Here, we propose a principled framework to model the organization of higher-order data. Our approach recovers community structure with accuracy exceeding that of currently available state-of-the-art algorithms, as tested in synthetic benchmarks with both hard and overlapping ground-truth partitions. Our model is flexible and allows capturing both assortative and disassortative community structures. Moreover, our method scales orders of magnitude faster than competing algorithms, making it suitable for the analysis of very large hypergraphs, containing millions of nodes and interactions among thousands of nodes. Our work constitutes a practical and general tool for hypergraph analysis, broadening our understanding of the organization of real-world higher-order systems. A principled and fast model accurately detects mixed-membership communities in hypergraphs.
- [8] Hypergraphx: a library for higher-order network analysis. Quintino Francesco Lotito, Martina Contisciani, Caterina De Bacco, Leonardo Di Gaetano, Luca Gallo, Alberto Montresor, Federico Musciotto, Nicolò Ruggeri, and Federico Battiston. Journal of Complex Networks, 2023
From social to biological systems, many real-world systems are characterized by higher-order, non-dyadic interactions. Such systems are conveniently described by hypergraphs, where hyperedges encode interactions among an arbitrary number of units. Here, we present an open-source Python library, hypergraphx (HGX), providing a comprehensive collection of algorithms and functions for the analysis of higher-order networks. These include different ways to convert data across distinct higher-order representations, a large variety of measures of higher-order organization at the local and the mesoscale, statistical filters to sparsify higher-order data, a wide array of static and dynamic generative models, and an implementation of different dynamical processes with higher-order interactions. Our computational framework is general, and allows the analysis of hypergraphs with weighted, directed, signed, temporal and multiplex group interactions. We provide visual insights on higher-order data through a variety of different visualization tools. We accompany our code with an extended higher-order data repository and demonstrate the ability of HGX to analyse real-world systems through a systematic analysis of a social network with higher-order interactions. The library is conceived as an evolving, community-based effort, which will further extend its functionalities over the years. Our software is available at https://github.com/HGX-Team/hypergraphx.
- [7] Latent network models to account for noisy, multiply reported social network data. Caterina De Bacco, Martina Contisciani, Jonathan Cardoso-Silva, Hadiseh Safdari, Gabriela Lima Borges, Diego Baptista, Tracy Sweet, Jean-Gabriel Young, Jeremy Koster, Cody T Ross, Richard McElreath, Daniel Redhead, and Eleanor A Power. Journal of the Royal Statistical Society Series A: Statistics in Society, 2023
Social network data are often constructed by incorporating reports from multiple individuals. However, it is not obvious how to reconcile discordant responses from individuals. There may be particular risks with multiply reported data if people’s responses reflect normative expectations—such as an expectation of balanced, reciprocal relationships. Here, we propose a probabilistic model that incorporates ties reported by multiple individuals to estimate the unobserved network structure. In addition to estimating a parameter for each reporter that is related to their tendency of over- or under-reporting relationships, the model explicitly incorporates a term for ‘mutuality’, the tendency to report ties in both directions involving the same alter. Our model’s algorithmic implementation is based on variational inference, which makes it efficient and scalable to large systems. We apply our model to data from a Nicaraguan community collected with a roster-based design and 75 Indian villages collected with a name-generator design. We observe strong evidence of ‘mutuality’ in both datasets, and find that this value varies by relationship type. Consequently, our model estimates networks with reciprocity values that are substantially different than those resulting from standard deterministic aggregation approaches, demonstrating the need to consider such issues when gathering, constructing, and analysing survey-based network data.
- [6] Inference of hyperedges and overlapping communities in hypergraphs. Martina Contisciani, Federico Battiston, and Caterina De Bacco. Nature Communications, 2022
Hypergraphs, encoding structured interactions among any number of system units, have recently proven a successful tool to describe many real-world biological and social networks. Here we propose a framework based on statistical inference to characterize the structural organization of hypergraphs. The method allows inferring missing hyperedges of any size in a principled way, and jointly detecting overlapping communities in the presence of higher-order interactions. Furthermore, our model has an efficient numerical implementation, and it runs faster than dyadic algorithms on pairwise records projected from higher-order data. We apply our method to a variety of real-world systems, showing strong performance in hyperedge prediction tasks, detecting communities well aligned with the information carried by interactions, and robustness against addition of noisy hyperedges. Our approach illustrates the fundamental advantages of a hypergraph probabilistic model when modeling relational systems with higher-order interactions.
- [5] Community detection and reciprocity in networks by jointly modelling pairs of edges. Martina Contisciani, Hadiseh Safdari, and Caterina De Bacco. Journal of Complex Networks, 2022
To unravel the driving patterns of networks, the most popular models rely on community detection algorithms. However, these approaches are generally unable to reproduce the structural features of the network. Therefore, attempts are always made to develop models that incorporate these network properties besides the community structure. In this article, we present a probabilistic generative model and an efficient algorithm to both perform community detection and capture reciprocity in networks. Our approach jointly models pairs of edges with exact two-edge joint distributions. In addition, it provides closed-form analytical expressions for both marginal and conditional distributions. We validate our model on synthetic data in recovering communities, edge prediction tasks and generating synthetic networks that replicate the reciprocity values observed in real networks. We also highlight these findings on two real datasets that are relevant for social scientists and behavioural ecologists. Our method overcomes the limitations of both standard algorithms and recent models that incorporate reciprocity through a pseudo-likelihood approximation. The inference of the model parameters is implemented by the efficient and scalable expectation–maximization algorithm, as it exploits the sparsity of the dataset. We provide an open-source implementation of the code online.
- [4] Multilayer patent citation networks: A comprehensive analytical framework for studying explicit technological relationships. Kyle Higham, Martina Contisciani, and Caterina De Bacco. Technological Forecasting and Social Change, 2022
The use of patent citation networks as research tools is becoming increasingly commonplace in the field of innovation studies. However, these networks rarely consider the contexts in which these citations are generated and are generally restricted to a single jurisdiction. Here, we propose and explore the use of a multilayer network framework that can naturally incorporate citation metadata and stretch across jurisdictions, allowing for a complete view of the global technological landscape that is accessible through patent data. Taking a conservative approach that links citation network layers through triadic patent families, we first observe that these layers contain complementary, rather than redundant, information about technological relationships. To probe the nature of this complementarity, we extract network communities from both the multilayer network and analogous single-layer networks, then directly compare their technological composition with established technological similarity networks. We find that while technologies are more splintered across communities in the multilayer case, the extracted communities match much more closely the established networks. We conclude that by capturing citation context, a multilayer representation of patent citation networks is, conceptually and empirically, better able to capture the significant nuance that exists in real technological relationships when compared to traditional, single-layer approaches. We suggest future avenues of research that take advantage of novel computational tools designed for use with multilayer networks.
- [3] Reciprocity, community detection, and link prediction in dynamic networks. Hadiseh Safdari, Martina Contisciani, and Caterina De Bacco. Journal of Physics: Complexity, 2022
Many complex systems change their structure over time; in these cases, dynamic networks can provide a richer representation of such phenomena. As a consequence, many inference methods have been generalized to the dynamic case with the aim of modeling dynamic interactions. Particular interest has been devoted to extending the stochastic block model and its variants to capture community structure as the network changes in time. While these models assume that edge formation depends only on the community memberships, recent work for static networks shows the importance of including additional parameters capturing structural properties, such as reciprocity. Remarkably, these models are capable of generating more realistic network representations than those that only consider community membership. To this aim, we present a probabilistic generative model with hidden variables that integrates reciprocity and communities as structural information of networks that evolve in time. The model assumes a fundamental order in observing reciprocal data, that is, an edge is observed conditional on its reciprocated edge in the past. We deploy a Markovian approach to construct the network’s transition matrix between time steps, and parameter inference is performed with an expectation-maximization algorithm that leads to high computational efficiency because it exploits the sparsity of the dataset. We test the performance of the model on synthetic dynamical networks, as well as on real networks of citations and email datasets. We show that our model captures the reciprocity of real networks better than standard models with only community structure, while performing well at link prediction tasks.
- [2] Generative model for reciprocity and community detection in networks. Hadiseh Safdari*, Martina Contisciani*, and Caterina De Bacco. Physical Review Research, 2021
We present a probabilistic generative model and efficient algorithm to model reciprocity in directed networks. Unlike other methods that address this problem, such as exponential random graphs, it assigns latent variables as community memberships to nodes and a reciprocity parameter to the whole network rather than fitting order statistics. It formalizes the assumption that a directed interaction is more likely to occur if an individual has already observed an interaction towards her. It provides a natural framework for relaxing the common assumption in network generative models of conditional independence between edges, and it can be used to perform inference tasks such as predicting the existence of an edge given the observation of an edge in the reverse direction. Inference is performed using an efficient expectation-maximization algorithm that exploits the sparsity of the network, leading to an efficient and scalable implementation. We illustrate these findings by analyzing synthetic and real data, including social networks, academic citations, and the Erasmus student exchange program. Our method outperforms others in both predicting edges and generating networks that reflect the reciprocity values observed in real data, while at the same time inferring an underlying community structure. We provide an open-source implementation of the code online.
- [1] Community detection with node attributes in multilayer networks. Martina Contisciani, Eleanor A Power, and Caterina De Bacco. Scientific Reports, 2020
Community detection in networks is commonly performed using information about interactions between nodes. Recent advances have been made to incorporate multiple types of interactions, thus generalizing standard methods to multilayer networks. Often, though, one can access additional information regarding individual nodes, attributes, or covariates. A relevant question is thus how to properly incorporate this extra information in such frameworks. Here we develop a method that incorporates both the topology of interactions and node attributes to extract communities in multilayer networks. We propose a principled probabilistic method that does not assume any a priori correlation structure between attributes and communities but rather infers this from data. This leads to an efficient algorithmic implementation that exploits the sparsity of the dataset and can be used to perform several inference tasks; we provide an open-source implementation of the code online. We demonstrate our method on both synthetic and real-world data and compare performance with methods that do not use any attribute information. We find that including node information helps in predicting missing links or attributes. It also leads to more interpretable community structures and allows the quantification of the impact of the node attributes given as input.