Perform inference on networks with metadata on both edges and nodes.
Advancements in data collection techniques have led to the acquisition of more comprehensive data, particularly by gathering additional information that characterizes the units and their interactions within real-world systems. These enriched data are effectively represented by attributed multilayer networks, which are complex network representations that describe multiple types of interactions among the same set of nodes, while also incorporating node information such as attributes or covariates. To properly analyze such data, models must integrate various sources of information in a convenient and principled manner, leveraging both the network topology and the node metadata.
We developed two distinct probabilistic models, named MTCOV (Contisciani et al., 2020) and PIHAM (Contisciani* et al., 2024), designed to perform community detection and broader inference in attributed multilayer networks. Both models posit the existence of a hidden mixed-membership community structure that drives the generation of both interactions and node attributes, but they differ in their model specification and inference.
Specifically, MTCOV is designed to handle categorical node attributes and nonnegative discrete edge weights. It specifies the likelihood of the data as a linear combination of the likelihoods from the two sources of information, and performs inference with an efficient expectation-maximization (EM) algorithm. PIHAM, on the other hand, flexibly adapts to any combination of input data and employs a Bayesian framework, relying on Laplace approximations and automatic differentiation for parameter inference.
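To make the idea of a linearly combined likelihood concrete, here is a minimal sketch and not the released implementation. It assumes a Poisson term for the edge weights, a categorical term for one-hot node attributes, and a scalar weight `gamma` in [0, 1] that balances the two. The names `U`, `V`, `W`, `beta`, `gamma` and all function names are hypothetical, chosen for illustration only, and the EM updates themselves are not shown.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def network_loglik(A, U, V, W):
    """Poisson log-likelihood of the layers, up to a constant that depends only on A."""
    # Expected weight of edge i -> j in layer l: sum_{k,q} U[i,k] * W[l,k,q] * V[j,q]
    M = np.einsum("ik,lkq,jq->lij", U, W, V)
    return float(np.sum(A * np.log(M + EPS) - M))


def attribute_loglik(X, U, V, beta):
    """Categorical log-likelihood of one-hot node attributes X."""
    # Assumed link between memberships and attributes: average the out- and
    # in-memberships, then map communities to categories through beta.
    pi = 0.5 * (U + V) @ beta
    return float(np.sum(X * np.log(pi + EPS)))


def combined_loglik(A, X, U, V, W, beta, gamma):
    """Linear combination of the two terms; gamma weights the attribute term."""
    return ((1.0 - gamma) * network_loglik(A, U, V, W)
            + gamma * attribute_loglik(X, U, V, beta))


# Toy data: 2 layers, 5 nodes, 3 communities, 4 attribute categories.
rng = np.random.default_rng(0)
n_layers, n_nodes, n_comm, n_cat = 2, 5, 3, 4
U = rng.dirichlet(np.ones(n_comm), size=n_nodes)    # (N, K), rows sum to 1
V = rng.dirichlet(np.ones(n_comm), size=n_nodes)    # (N, K), rows sum to 1
W = rng.random((n_layers, n_comm, n_comm))          # (L, K, K) affinity matrices
beta = rng.dirichlet(np.ones(n_cat), size=n_comm)   # (K, Z), rows sum to 1
A = rng.poisson(1.0, size=(n_layers, n_nodes, n_nodes))  # discrete edge weights
X = np.eye(n_cat)[rng.integers(0, n_cat, size=n_nodes)]  # one-hot attributes

for gamma in (0.0, 0.5, 1.0):
    print(gamma, combined_loglik(A, X, U, V, W, beta, gamma))
```

In this sketch, `gamma = 0` ignores the attributes and `gamma = 1` ignores the network, so intermediate values trade off the two sources of information. How such a weight is chosen in practice (for example by cross-validation) is outside the scope of the sketch.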
In addition to applying these methods to already explored real-world data, such as social and biological networks, we employed this methodology for the first time in the analysis of patent citation networks (Higham et al., 2022). In this context, we not only illustrated the importance of using a multilayer framework for patent citation data analysis but also emphasized the role of a node covariate in driving the inference, alongside the structural information embedded within the network.
Main takeaways
Effectively integrating node attributes with topological information can significantly enhance network inference, for instance by boosting prediction performance.
Better outcomes are achieved when the node metadata are more informative and exhibit some degree of correlation with the information conveyed by the interactions.
When the node metadata offer valuable insights, our models identify communities that align with this information. This approach leads to more interpretable results, where attributes actively influence the inference process.
References
[1]
Community detection with node attributes in multilayer networks
Martina Contisciani, Eleanor A Power, and Caterina De Bacco
Community detection in networks is commonly performed using information about interactions between nodes. Recent advances have been made to incorporate multiple types of interactions, thus generalizing standard methods to multilayer networks. Often, though, one can access additional information regarding individual nodes, attributes, or covariates. A relevant question is thus how to properly incorporate this extra information in such frameworks. Here we develop a method that incorporates both the topology of interactions and node attributes to extract communities in multilayer networks. We propose a principled probabilistic method that does not assume any a priori correlation structure between attributes and communities but rather infers this from data. This leads to an efficient algorithmic implementation that exploits the sparsity of the dataset and can be used to perform several inference tasks; we provide an open-source implementation of the code online. We demonstrate our method on both synthetic and real-world data and compare performance with methods that do not use any attribute information. We find that including node information helps in predicting missing links or attributes. It also leads to more interpretable community structures and allows the quantification of the impact of the node attributes given in input.
[P1]
Flexible inference in heterogeneous and attributed multilayer networks
Martina Contisciani*, Marius Hobbhahn*, Eleanor A Power, Philipp Hennig, and Caterina De Bacco
Networked datasets are often enriched by different types of information about individual nodes or edges. However, most existing methods for analyzing such datasets struggle to handle the complexity of heterogeneous data, often requiring substantial model-specific analysis. In this paper, we develop a probabilistic generative model to perform inference in multilayer networks with arbitrary types of information. Our approach employs a Bayesian framework combined with the Laplace matching technique to ease interpretation of inferred parameters. Furthermore, the algorithmic implementation relies on automatic differentiation, avoiding the need for explicit derivations. This makes our model scalable and flexible to adapt to any combination of input data. We demonstrate the effectiveness of our method in detecting overlapping community structures and performing various prediction tasks on heterogeneous multilayer data, where nodes and edges have different types of attributes. Additionally, we showcase its ability to unveil a variety of patterns in a social support network among villagers in rural India by effectively utilizing all input information in a meaningful way.
[4]
Multilayer patent citation networks: A comprehensive analytical framework for studying explicit technological relationships
Kyle Higham, Martina Contisciani, and Caterina De Bacco
The use of patent citation networks as research tools is becoming increasingly commonplace in the field of innovation studies. However, these networks rarely consider the contexts in which these citations are generated and are generally restricted to a single jurisdiction. Here, we propose and explore the use of a multilayer network framework that can naturally incorporate citation metadata and stretch across jurisdictions, allowing for a complete view of the global technological landscape that is accessible through patent data. Taking a conservative approach that links citation network layers through triadic patent families, we first observe that these layers contain complementary, rather than redundant, information about technological relationships. To probe the nature of this complementarity, we extract network communities from both the multilayer network and analogous single-layer networks, then directly compare their technological composition with established technological similarity networks. We find that while technologies are more splintered across communities in the multilayer case, the extracted communities match much more closely the established networks. We conclude that by capturing citation context, a multilayer representation of patent citation networks is, conceptually and empirically, better able to capture the significant nuance that exists in real technological relationships when compared to traditional, single-layer approaches. We suggest future avenues of research that take advantage of novel computational tools designed for use with multilayer networks.