Bryan Perozzi
Research Areas
Authored Publications
Google Publications
Other Publications
Sort By
Machine Learning on Graphs: A Model and Comprehensive Taxonomy
Ines Chami
Sami Abu-El-Haija
Chris Ré
Journal of Machine Learning Research, vol. 23 (2022), pp. 1-64
Preview abstract
There has been a surge of recent interest in graph representation learning (GRL). GRL methods have generally fallen into three main categories, based on the availability of labeled data. The first, network embedding, focuses on learning unsupervised representations of relational structure. The second, graph regularized neural networks, leverages graphs to augment neural network losses with a regularization objective for semi-supervised learning. The third, graph neural networks, aims to learn differentiable functions over discrete topologies with arbitrary structure. However, despite the popularity of these areas there has been surprisingly little work on unifying the three paradigms. Here, we aim to bridge the gap between network embedding, graph regularization and graph neural networks. We propose a comprehensive taxonomy of GRL methods, aiming to unify several disparate bodies of work. Specifically, we propose the GraphEDM framework, which generalizes popular algorithms for semi-supervised learning (e.g. GraphSage, GCN, GAT), and unsupervised learning (e.g. DeepWalk, node2vec) of graph representations into a single consistent approach. To illustrate the generality of GraphEDM, we fit over thirty existing methods into this framework. We believe that this unifying view both provides a solid foundation for understanding the intuition behind these methods, and enables future research in the area.
View details
Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank
Advances in Neural Information Processing Systems (2022)
Preview abstract
Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding. However, while data privacy is one of the most important recent concerns, existing PPR algorithms are not designed to protect user privacy. PPR is highly sensitive to the input graph edges: the difference of only one edge may cause a big change in the PPR vector, potentially leaking private user data.
In this work, we propose an algorithm which outputs an approximate PPR and has provably bounded sensitivity to input edges. In addition, we prove that our algorithm achieves similar accuracy to non-private algorithms when the input graph has large degrees. Our sensitivity-bounded PPR directly implies private algorithms for several tools of graph learning, such as, differentially private (DP) PPR ranking, DP node classification, and DP node embedding. To complement our theoretical analysis, we also empirically verify the practical performances of our algorithms.
View details
Zero-shot Transfer Learning within a Heterogeneous Graph via Knowledge Transfer Networks
Minji Yoon
Ziniu Hu
Ruslan Salakhutdinov
Advances in Neural Information Processing Systems (2022)
Preview abstract
Data continuously emitted from industrial ecosystems such as social or e-commerce platforms are commonly represented as heterogeneous graphs (HG) composed of multiple node/edge types. State-of-the-art graph learning methods for HGs known as heterogeneous graph neural networks (HGNNs) are applied to learn deep context-informed node representations. However, many HG datasets from industrial applications suffer from label imbalance between node types. As there is no direct way to learn using labels rooted at different node types, HGNNs have been applied to only a few node types with abundant labels. We propose a zero-shot transfer learning module for HGNNs called a Knowledge Transfer Network (KTN) that transfers knowledge from label-abundant node types to zero-labeled node types through rich relational information given in the HG. KTN is derived from the theoretical relationship, which we introduce in this work, between distinct feature extractors for each node type given in an HGNN model. KTN improves the performance of 6 different types of HGNN models by up to 960% for inference on zero-labeled node types and outperforms state-of-the-art transfer learning baselines by up to 73% across 18 different transfer learning tasks on HGs.
View details
GraphWorld: Fake Graphs Bring Real Insights for GNNs
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022)
Preview abstract
The continuing maturity of the deep learning subfield of graph neural networks (GNNs) has motivated recent studies into the standard datasets used to benchmark GNNs. While important improvements have been made to GNN datasets and experimental design, any one dataset provides only a singular, potentially spurious insight into the performance of any GNN being tested. We show that state-of-the-art GNN task datasets do not cover the distribution of graphs in a much larger real-data graph repository, with respect to several key graph metrics. Motivated by this finding, we introduce GraphWorld, a novel distributed framework and software package for testing GNN models on an arbitrarily-large population of \emph{synthetic} task datasets.
GraphWorld allows a user to efficiently generate a \emph{world} of millions of graph datasets, with fine-grained control over graph generator parameters, and benchmark arbitrary GNN models, with built-in hyperparameter tuning. Using GraphWorld to generate diverse graph worlds corresponding to node classification, graph classification, and link prediction tasks, we provide insight into the sensitivity of 10,000+ GNN models to various parameters of graphs and node features and} show comparisons between models that have not been possible to make in any previous work. We also introduce a novel metric with which to explore each models' performance on the graph world, conditioning on graph metrics and graph generator parameters.
View details
Preview abstract
Graph Neural Networks (GNNs) have achieved state-of-the-art results on many graph analysis tasks such as node classification and link prediction. However, important unsupervised problems on graphs, such as graph clustering, have proved more resistant to advances in GNNs. In this paper, we study unsupervised training of GNN pooling in terms of their clustering capabilities.
We start by drawing a connection between graph clustering and graph pooling: intuitively, a good graph clustering is what one would expect from a GNN pooling layer. Counterintuitively, we show that this is not true for state-of-the-art pooling methods, such as MinCut pooling. To address these deficiencies, we introduce Deep Modularity Networks (DMoN), an unsupervised pooling method inspired by the modularity measure of clustering quality, and show how it tackles recovery of the challenging clustering structure of real-world graphs. In order to clarify the regimes where existing methods fail, we carefully design a set of experiments on synthetic data which show that DMoN is able to jointly leverage the signal from the graph structure and node attributes. Similarly, on real-world data, we show that DMoN produces high quality clusters which correlate strongly with ground truth labels, achieving state-of-the-art results.
View details
Examining COVID-19 Forecasting using Spatio-Temporal Graph Neural Networks
Amol Kapoor
Xue Ben
Martin Blais
Shawn O'Banion
MLG workshop @ KDD'2020, epiDAMIK workshop @ KDD'2020 (2020) (to appear)
Preview abstract
In this work, we examine a novel forecasting approach for COVID-19 case prediction that uses Graph Neural Networks and mobility data. In contrast to existing time series forecasting models, the proposed approach learns from a single large-scale spatio-temporal graph, where nodes represent the region-level human mobility, spatial edges represent the human mobility based inter-region connectivity, and temporal edges represent node features through time. We evaluate this approach on the US county level COVID-19 dataset, and demonstrate that the rich spatial and temporal information leveraged by the graph neural network allows the model to learn complex dynamics. We show a 6% reduction of RMSLE and an absolute Pearson Correlation improvement from 0.9978 to 0.998 compared to the best performing baseline models. This novel source of information combined with graph based deep learning approaches can be a powerful tool to understand the spread and evolution of COVID-19. We encourage others to further develop a novel modeling paradigm for infectious disease based on GNNs and high resolution mobility data.
View details
Scaling Graph Neural Networks with Approximate PageRank
Aleksandar Bojchevski
Johannes Klipera
Amol Kapoor
Martin Blais
Benedek András Rózemberczki
Stephan Günnnemann
KDD (2020)
Preview abstract
Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, despite their successes on small datasets, efficiently utilizing them on massive web-scale data remains a challenge.
All recently proposed scalable GNN approaches rely on a message passing procedure to propagate information on the graph, leading to expensive recursive neighborhood expansion (and aggregation) schemes during both training and inference. This limitation is particularly problematic if we want to consider neighbors that are multiple hops away.
In contrast, by leveraging connections between GNNs and personalized PageRank, we develop a model that incorporates multi-hop neighborhood information in a single (non-recursive) step.
Our model \model, is significantly faster than previous scalable approaches while maintaining state-of-the-art prediction performance. Moreover, our algorithm can produce a scalability certificate which guarantees that the predictions will not change if we would have used instead a more expensive non-scalable baseline.
To demonstrate the strengths and the scalability of our approach we both evaluate on existing datasets, and propose a new large scale graph learning setting, using the open academic graph (90M nodes, 3B edges).
Additionally, we discuss the practical applications of large-scale semi-supervised learning, like \model~ at Google to solve node classification problems.
View details
Grale: Designing Networks for Graph Learning
Alexandru Moșoi
Sam Ruth
Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Association for Computing Machinery (2020), 2523–2532
Preview abstract
How can we find the right graph for semi-supervised learning? In real world applications, the choice of which edges to use for computation is the first step in any graph learning process. Interestingly, there are often many types of similarity available to choose as the edges between nodes, and the choice of edges can drastically affect the performance of downstream semi-supervised learning systems. However, despite the importance of graph design, most of the literature assumes that the graph is static.
In this work, we present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes. Grale operates by fusing together different measures of (potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes. Grale is designed for running on large datasets. We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes, and hundreds of trillions of potential edges to score. By employing locality sensitive hashing techniques, we greatly reduce the number of pairs that need to be scored, allowing us to learn a task specific model and build the associated nearest neighbor graph for such datasets in hours, rather than the days or even weeks that might be required otherwise.
We illustrate this through a case study where we examine the application of Grale to an abuse classification problem on YouTube with hundreds of million of items. In this application, we find that Grale detects a large number of malicious actors on top of hard-coded rules and content classifiers, increasing the total recall by 89% over those approaches alone.
View details
Debiasing Graph Embeddings with Metadata-Orthogonal Training
2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2020)
Preview abstract
In many real world graphs, the formation of edges can be influenced by certain sensitive features of the nodes (e.g. their gender, community, or reputation). In this paper we argue that when such influences exist, any downstream Graph Neural Network (GNN) will be implicitly biased by these structural correlations. To allow control over this phenomenon, we introduce the Metadata-Orthogonal Node Embedding Training (MONET) unit, a general neural network architecture component for performing training-time linear debiasing of graph embeddings. MONET operates by ensuring that the node embeddings are trained on a hyperplane orthogonal to that of the node features (metadata). Unlike debiasing approaches in similar domains, our method offers exact guarantees about the correlation between the resulting embeddings and any sensitive metadata. We illustrate the effectiveness of MONET though our experiments on a variety of real world graphs against challenging baselines (e.g. adversarial debiasing), showing superior performance in tasks such as preventing the leakage of political party affiliation in a blog network, and preventing the gaming of embedding-based recommendation systems.
View details
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
Proceedings of the 2019 World Wide Web Conference on World Wide Web
Preview abstract
Can neural networks learn to compare graphs without feature engineering?
In this paper, we show that it is possible to learn representations for graph similarity with neither domain knowledge nor supervision (i.e. feature engineering or labeled graphs).
We propose Deep Divergence Graph Kernels, an unsupervised method for learning representations over graphs that encodes a relaxed notion of graph isomorphism.
Our method consists of three parts.
First, we learn an encoder for each anchor graph to capture its structure.
Second, for each pair of graphs, we train a cross-graph attention network which uses the node representations of an anchor graph to reconstruct another graph.
This approach, which we call isomorphism attention, captures how well the representations of one graph can encode another.
We use the attention-augmented encoder's predictions to define a divergence score for each pair of graphs.
Finally, we construct an embedding space for all graphs using these pair-wise divergence scores.
Unlike previous work, much of which relies on 1) supervision, 2) domain specific knowledge (e.g. a reliance on Weisfeiler-Lehman kernels), and 3) known node alignment, our unsupervised method jointly learns node representations, graph representations, and an attention-based alignment between graphs.
Our experimental results show that Deep Divergence Graph Kernels can learn an unsupervised alignment between graphs, and that the learned representations achieve competitive results when used as features on a number of challenging graph classification tasks.
Furthermore, we illustrate how the learned attention allows insight into the the alignment of sub-structures across graphs.
View details
MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing
Sami Abu-El-Haija
Amol Kapoor
Hrayr Harutyunyan
Nazanin Alipourfard
Kristina Lerman
Greg Ver Steeg
Aram Galstyan
The Thirty-sixth International Conference on Machine Learning (ICML) (2019)
Preview abstract
Recently, many methods have been proposed for semi-supervised learning that extend the convolutional operator from Euclidean domains to graph-structured data by approximating the eigenbasis of the graph Laplacian. However, despite their prevalence, there has not been extensive analysis of the expressive power of these models. In this work, we prove that popular methods (such as the Graph Convolutional Network) do not model and cannot learn a class of neighborhood difference relationships which we call \delta operators. To address this weakness, we propose a new model, MixHop, that can capture these difference relationships by learning mixed feature representations of neighbors at various distances. MixHop requires no additional memory or computational complexity, and outperforms challenging baselines on several graph datasets including citation networks, synthetic graphs, and molecule classification for quantum chemistry. Furthermore, we quantify how the model prioritizes neighborhood information across different network datasets by adding a sparsity regularizer.
View details
Preview abstract
Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph.
But can nodes really be best described by a single vector representation?
In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network).
Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate.
These representations allow for improved reconstruction of the nuanced relationships that occur in the graph -- a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to $90\%$.
In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.
View details
N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification
Sami Abu-El-Haija
Amol Kapoor
Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2019)
Preview abstract
Graph Convolutional Networks (GCNs) have shown significant improvements in semi-supervised learning on graph-structured data. Concurrently, unsupervised learning of graph embeddings has benefited from the information contained in random walks. In this paper, we propose a model: Network of GCNs (NGCN), which marries these two lines of work. At its core, N-GCN trains multiple instances of GCNs over node pairs discovered at different distances in random walks, and learns a combination of the instance outputs which optimizes the classification objective. Our experiments show that our proposed N-GCN model improves state-of-the-art baselines on all of the challenging node classification tasks we consider: Cora, Citeseer, Pubmed, and PPI. In addition, our proposed method has other desirable properties, including generalization to recently proposed semi-supervised learning methods such as GraphSAGE, allowing us to propose N-SAGE, and resilience to adversarial input perturbations.
View details
Preview abstract
Graph embedding methods represent nodes in a continuous vector space, preserving information from the graph (e.g. by sampling random walks). There are many hyper-parameters to these methods (such as random walk length) which have to be manually tuned for every graph. In this paper, we replace random walk hyper-parameters with trainable parameters that we automatically learn via backpropagation. In particular, we learn a novel attention model on the power series of the transition matrix, which guides the random walk to optimize an upstream objective. Unlike previous approaches to attention models, the method that we propose utilizes attention parameters exclusively on the data (e.g. on the random walk), and not used by the model for inference. We experiment on link prediction tasks, as we aim to produce embeddings that best-preserve the graph structure, generalizing to unseen information. We improve state-of-the-art on a comprehensive suite of real world datasets including social, collaboration, and biological networks. Adding attention to random walks can reduce the error by 20% to 45% on datasets we attempted. Further, our learned attention parameters are different for every graph, and our automatically-found values agree with the optimal choice of hyper-parameter if we manually tune existing methods.
View details
Preview abstract
We present HARP, a novel method for learning low dimensional embeddings of a graph's nodes which preserves higher-order structural features. Our proposed method achieves this by compressing the input graph prior to embedding it, effectively avoiding troublesome embedding configurations (i.e. local minima) which can pose problems to non-convex optimization. HARP works by finding a smaller graph which approximates the global structure of its input. This simplified graph is used to learn a set of initial representations, which serve as good initializations for learning representations in the original, detailed graph. We inductively extend this idea, by decomposing a graph in a series of levels, and then embed the hierarchy of graphs from the coarsest one to the original graph. HARP is a general meta-strategy to improve all of the state-of-the-art neural algorithms for embedding graphs, including DeepWalk, LINE, and Node2vec. Indeed, we demonstrate that applying HARP's hierarchical paradigm yields improved implementations for all three of these methods, as evaluated on both classification tasks on real-world graphs such as DBLP, BlogCatalog, CiteSeer, and Arxiv, where we achieve a performance gain over the original implementations by up to 14% Macro F1.
View details
Learning Edge Representations via Low-Rank Asymmetric Projections
ACM International Conference on Information and Knowledge Management (2017) (to appear)
Preview abstract
We propose a new method for embedding graphs while preserving directed edge information. Learning such continuous-space vector representations (or embeddings) of nodes in a graph is an important first step for using network information (from social networks, user-item graphs, knowledge bases, etc.) in many machine learning tasks.
Unlike previous work, we (1) explicitly model an edge as a function of node embeddings, and we (2) propose a novel objective, the "graph likelihood", which contrasts information from sampled random walks with non-existent edges. Individually, both of these contributions improve the learned representations, especially when there are memory constraints on the total size of the embeddings. When combined, our contributions enable us to significantly improve the state-of-the-art by learning more concise representations that better preserve the graph structure.
We evaluate our method on a variety of link-prediction task including social networks, collaboration networks, and protein interactions, showing that our proposed method learn representations with error reductions of up to 76% and 55%, on directed and undirected graphs. In addition, we show that the representations learned by our method are quite space efficient, producing embeddings which have higher structure-preserving accuracy but are 10 times smaller.
View details
When Recommendation Goes Wrong - Anomalous Link Discovery in Recommendation Networks
Michael Schueppert
Jack Saalweachter
Mayur Thakur
Proceedings of the 22th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016)
Preview abstract
We present a secondary ranking system to find and remove
erroneous suggestions from a geospatial recommendation system.
We discover such anomalous links by “double checking”
the recommendation system’s output to ensure that it
is both structurally cohesive, and semantically consistent.
Our approach is designed for the Google Related Places
Graph, a geographic recommendation system which provides
results for hundreds of millions of queries a day. We model
the quality of a recommendation between two geographic entities
as a function of their structure in the Related Places
Graph, and their semantic relationship in the Google Knowledge
Graph.
To evaluate our approach, we perform a large scale human
evaluation of such an anomalous link detection system. For
the long tail of unpopular entities, our models can predict
the recommendations users will consider poor with up to
42% higher mean precision (29 raw points) than the live
system.
Results from our study reveal that structural and semantic
features capture different facets of relatedness to human
judges. We characterize our performance with a qualitative
analysis detailing the categories of real-world anomalies our
system is able to detect, and provide a discussion of additional
applications of our method.
View details
DeepWalk: Online Learning of Social Representations
Steven Skiena
Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2014), pp. 701-710
Polyglot: Distributed Word Representations for Multilingual NLP
Steven Skiena
Proceedings of the Seventeenth Conference on Computational Natural Language Learning (2013), pp. 183-192