Attributed multilayer networks | Martina Contisciani

Advancements in data collection techniques have led to the acquisition of more comprehensive data, particularly by gathering additional information that characterizes the units and their interactions within real-world systems. These enriched data are effectively represented by attributed multilayer networks, which are complex network representations that describe multiple types of interactions among the same set of nodes, while also incorporating node information such as attributes or covariates. To properly analyze such data, models must integrate various sources of information in a convenient and principled manner, leveraging both the network topology and the node metadata.
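As a minimal sketch of how such data might be stored, an attributed multilayer network can be held as one adjacency tensor with a slice per interaction type, plus a one-hot matrix for a categorical node attribute. The use of NumPy, the names `A`, `labels` and `X`, and the toy sizes are all assumptions made for illustration, not part of the models described here.

```python
import numpy as np

# Toy sizes, chosen only for illustration.
n_layers, n_nodes, n_categories = 2, 4, 3

# A[l, i, j] = weight of the edge i -> j in layer l (nonnegative integers).
A = np.zeros((n_layers, n_nodes, n_nodes), dtype=int)
A[0, 0, 1] = 1   # layer 0: node 0 -> node 1
A[0, 1, 2] = 2   # layer 0: a weighted edge
A[1, 0, 1] = 1   # layer 1: same node pair, different interaction type
A[1, 3, 0] = 1

# One categorical attribute per node, stored as labels and as a one-hot matrix.
labels = np.array([0, 2, 2, 1])
X = np.eye(n_categories, dtype=int)[labels]   # shape (n_nodes, n_categories)

print(A.shape, X.shape)
```

Every layer shares the same node index, which is what lets a single set of node-level quantities (such as community memberships) explain all layers and the attributes at once.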

We developed two distinct probabilistic models, named MTCOV (Contisciani et al., 2020) and PIHAM (missing reference), designed to perform community detection and broader inference in attributed multilayer networks. Both models posit the existence of a hidden mixed-membership community structure that drives the generation of both interactions and node attributes, but they differ in their model specification and inference.

Specifically, MTCOV is designed to handle categorical attributes and nonnegative discrete weights, and it specifies the likelihood of the data as a linear combination of the likelihoods from the two sources of information. Inference is performed with an efficient expectation-maximization (EM) algorithm. PIHAM, by contrast, flexibly adapts to any combination of input data and employs a Bayesian framework, using Laplace approximations and automatic differentiation for parameter inference.
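The idea of combining the two sources of information linearly can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the weighting parameter `gamma`, the Poisson form for edge weights, the categorical form for attributes, and all function and variable names are assumptions chosen to match the description above (discrete nonnegative weights, categorical attributes, shared mixed memberships).

```python
import numpy as np

def network_log_likelihood(A, U, V, W, eps=1e-12):
    """Poisson log-likelihood of the adjacency tensor A (constant terms dropped)."""
    # Expected weight: M[l, i, j] = sum_{k, q} U[i, k] * W[l, k, q] * V[j, q]
    M = np.einsum("ik,lkq,jq->lij", U, W, V)
    return float(np.sum(A * np.log(M + eps) - M))

def attribute_log_likelihood(X, U, V, beta, eps=1e-12):
    """Categorical log-likelihood of the one-hot attribute matrix X."""
    # pi[i, z] = probability that node i takes category z,
    # driven by the same memberships U and V as the network part.
    pi = 0.5 * (U + V) @ beta
    return float(np.sum(X * np.log(pi + eps)))

def combined_log_likelihood(A, X, U, V, W, beta, gamma):
    """Linear combination of the two terms; gamma in [0, 1] weights the attributes."""
    return ((1.0 - gamma) * network_log_likelihood(A, U, V, W)
            + gamma * attribute_log_likelihood(X, U, V, beta))

# Toy data (hypothetical sizes): L layers, N nodes, K communities, Z categories.
rng = np.random.default_rng(0)
L, N, K, Z = 2, 5, 3, 4
A = rng.poisson(0.5, size=(L, N, N))          # nonnegative discrete weights
X = np.eye(Z)[rng.integers(0, Z, size=N)]     # one-hot categorical attribute
U = rng.dirichlet(np.ones(K), size=N)         # out-going memberships, rows sum to 1
V = rng.dirichlet(np.ones(K), size=N)         # in-coming memberships, rows sum to 1
W = rng.random((L, K, K))                     # one affinity matrix per layer
beta = rng.dirichlet(np.ones(Z), size=K)      # community-to-category probabilities

print(combined_log_likelihood(A, X, U, V, W, beta, gamma=0.5))
```

In this sketch, `gamma = 0` ignores the attributes entirely and `gamma = 1` ignores the network, so intermediate values trade off the two sources. An EM procedure would alternate between updating auxiliary distributions and maximizing this objective over `U`, `V`, `W` and `beta`; that step is omitted here.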

In addition to applying these methods to previously studied real-world data, such as social and biological networks, we used this methodology for the first time to analyze patent citation networks (Higham et al., 2022). In this context, we illustrated the importance of a multilayer framework for patent citation data analysis and emphasized the role of a node covariate in driving the inference, alongside the structural information embedded in the network.

Main takeaways

Effectively integrating node attributes with topological information can significantly enhance network inference, for instance by boosting prediction performance.
Better outcomes are achieved when the node metadata are more informative and exhibit some degree of correlation with the information conveyed by the interactions.
When the node metadata offer valuable insights, our models identify communities that align with this information. This approach leads to more interpretable results, where attributes actively influence the inference process.

Community detection in networks is commonly performed using information about interactions between nodes. Recent advances have been made to incorporate multiple types of interactions, thus generalizing standard methods to multilayer networks. Often, though, one can access additional information regarding individual nodes, attributes, or covariates. A relevant question is thus how to properly incorporate this extra information in such frameworks. Here we develop a method that incorporates both the topology of interactions and node attributes to extract communities in multilayer networks. We propose a principled probabilistic method that does not assume any a priori correlation structure between attributes and communities but rather infers this from data. This leads to an efficient algorithmic implementation that exploits the sparsity of the dataset and can be used to perform several inference tasks; we provide an open-source implementation of the code online. We demonstrate our method on both synthetic and real-world data and compare performance with methods that do not use any attribute information. We find that including node information helps in predicting missing links or attributes. It also leads to more interpretable community structures and allows the quantification of the impact of the node attributes given in input.

The use of patent citation networks as research tools is becoming increasingly commonplace in the field of innovation studies. However, these networks rarely consider the contexts in which these citations are generated and are generally restricted to a single jurisdiction. Here, we propose and explore the use of a multilayer network framework that can naturally incorporate citation metadata and stretch across jurisdictions, allowing for a complete view of the global technological landscape that is accessible through patent data. Taking a conservative approach that links citation network layers through triadic patent families, we first observe that these layers contain complementary, rather than redundant, information about technological relationships. To probe the nature of this complementarity, we extract network communities from both the multilayer network and analogous single-layer networks, then directly compare their technological composition with established technological similarity networks. We find that while technologies are more splintered across communities in the multilayer case, the extracted communities match much more closely the established networks. We conclude that by capturing citation context, a multilayer representation of patent citation networks is, conceptually and empirically, better able to capture the significant nuance that exists in real technological relationships when compared to traditional, single-layer approaches. We suggest future avenues of research that take advantage of novel computational tools designed for use with multilayer networks.
