Nina Taft
Nina Taft is a Senior Staff Research Scientist at Google, where she leads the Applied Privacy Research group. Prior to joining Google, Nina worked at Technicolor Research, Intel Labs Berkeley, Sprint Labs, and SRI. She received her PhD from UC Berkeley. Over the years, she has worked in the fields of networking protocols, network traffic matrix estimation, Internet traffic modeling and prediction, intrusion detection, recommendation systems, and privacy. Her current interests lie in applications of machine learning for privacy, private data analytics, and user experience. She has been the chair or co-chair of the SIGCOMM, IMC and PAM conferences. (While some papers are listed here, see Google Scholar for a complete listing.)
Authored Publications
Balancing Privacy and Serendipity in CyberSpace
M. Satyanarayanan
Nigel Davies
International Workshop on Mobile Computing Systems and Applications (ACM HotMobile), http://www.hotmobile.org/2022/ (2022)
Unplanned encounters or casual collisions between colleagues have long been recognized as catalysts for creativity and innovation. The absence of such encounters has been a negative side effect of COVID-enforced remote work. However, there have also been positive side effects such as less time lost to commutes, lower carbon footprints, and improved work-life balance. This vision paper explores how serendipity for remote workers can be created by leveraging IoT technologies, edge computing, high-resolution video, network protocols for live interaction, and video/audio denaturing. We reflect on the privacy issues that technology-mediated serendipity raises and sketch a path towards honoring diverse privacy preferences.
Analyzing User Perspectives on Mobile App Privacy at Scale
International Conference on Software Engineering (ICSE) (2022)
In this paper we present a methodology to analyze users' concerns and perspectives about privacy at scale. We leverage NLP techniques to process millions of mobile app reviews and extract privacy concerns. Our methodology is composed of a binary classifier that distinguishes between privacy and non-privacy related reviews. We use clustering to gather reviews that discuss similar privacy concerns, and employ summarization metrics to extract representative reviews to summarize each cluster. We apply our methods to 287M reviews of about 2M apps across the 29 categories in Google Play to identify top privacy pain points in mobile apps. We identified approximately 440K privacy-related reviews. We find that privacy-related reviews occur in all 29 categories, with some issues arising across numerous app categories and others surfacing only in a small set of app categories. We show empirical evidence that confirms dominant privacy themes: concerns about apps requesting unnecessary permissions, collection of personal information, frustration with privacy controls, tracking, and the selling of personal data. As far as we know, this is the first large-scale analysis to confirm these findings based on hundreds of thousands of user inputs. We also observe some unexpected findings, such as users warning each other not to install an app due to privacy issues, users uninstalling apps for privacy reasons, and positive reviews that reward developers for privacy-friendly apps. Finally, we discuss the implications of our method and findings for developers and app stores.
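The three-stage pipeline the abstract describes (classify, cluster, summarize) can be sketched in miniature. The keyword classifier, term-based clustering, and longest-review summarizer below are toy stand-ins for the paper's trained NLP models, and all names and data are invented for illustration:

```python
from collections import defaultdict

PRIVACY_TERMS = {"privacy", "permission", "tracking", "data", "personal"}

def is_privacy_review(text):
    # Stage 1 (binary classifier stand-in): the paper trains a real
    # classifier; a keyword match is only a toy proxy.
    return bool(set(text.lower().split()) & PRIVACY_TERMS)

def cluster_by_concern(reviews):
    # Stage 2 (clustering stand-in): group reviews by the first privacy
    # term they mention, approximating "similar concern" clusters.
    clusters = defaultdict(list)
    for r in reviews:
        for term in r.lower().split():
            if term in PRIVACY_TERMS:
                clusters[term].append(r)
                break
    return clusters

def summarize(clusters):
    # Stage 3 (summarization stand-in): pick the longest review in each
    # cluster as its representative.
    return {term: max(rs, key=len) for term, rs in clusters.items()}

reviews = [
    "Great app, love the design",
    "Why does this need the camera permission to show the weather?",
    "Too much tracking going on here",
    "App asks for a contacts permission it never uses",
]
privacy_reviews = [r for r in reviews if is_privacy_review(r)]
reps = summarize(cluster_by_concern(privacy_reviews))
```

At the paper's scale, each stage would run over millions of reviews; the structure of the pipeline, however, is the same.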
Hark: A Deep Learning System for Navigating Privacy Feedback at Scale
Rishabh Khandelwal
2022 IEEE Symposium on Security and Privacy (SP)
Integrating user feedback is one of the pillars for building successful products. However, this feedback is generally collected in an unstructured free-text form, which is challenging to understand at scale. This is particularly demanding in the privacy domain due to the nuances associated with the concept and the limited existing solutions. In this work, we present Hark, a system for discovering and summarizing privacy-related feedback at scale. Hark automates the entire process of summarizing privacy feedback, starting from unstructured text and resulting in a hierarchy of high-level privacy themes and fine-grained issues within each theme, along with representative reviews for each issue. At the core of Hark is a set of new deep learning models trained on different tasks, such as privacy feedback classification, privacy issues generation, and high-level theme creation. We illustrate Hark’s efficacy on a corpus of 626M Google Play reviews. Out of this corpus, our privacy feedback classifier extracts 6M privacy-related reviews (with an AUC-ROC of 0.92). With three annotation studies, we show that Hark’s generated issues are of high accuracy and coverage and that the theme titles are of high quality. We illustrate Hark’s capabilities by presenting high-level insights from 1.3M Android apps.
"Shhh...be Quiet!" Reducing the Unwanted Interruptions of Notification Permission Prompts on Chrome
Balazs Engedy
Jud Porter
Kamila Hasanbega
Andrew Paseltiner
Hwi Lee
Edward Jung
PJ McLachlan
Jason James
30th USENIX Security Symposium (USENIX Security 21), USENIX Association, Vancouver, B.C. (2021)
Push notifications are an extremely useful feature. In web browsers, they allow users to receive timely updates even if the website is not currently open. On Chrome, the feature has become extremely popular since its inception in 2015, but it is also the least likely to be accepted by users. Our telemetry shows that, although 74% of all permission prompts are about notifications, they are also the least likely to be granted, with only a 10% grant rate on desktop and a 21% grant rate on Android. In order to preserve the feature's utility for websites and to reduce unwanted interruptions for users, we designed and tested a new UI for the notification permission prompt in Chrome.
In this paper, we conduct two large-scale studies of Chrome users' interactions with the notification permission prompt in the wild, in order to understand how users interact with such prompts and to evaluate a novel design that we introduced in Chrome version 80 in February 2020. Our main goals for the redesigned UI are to reduce the unwanted interruptions that notification permission prompts cause for Chrome users, to reduce how often users have to suppress them, and to make it easier to change a previously made choice.
Our results, based on an A/B test using behavioral data from more than 40 million users who interacted with more than 100 million prompts on more than 70 thousand websites, show that the new UI is very effective at reducing the unwanted interruptions and their frequency (up to 30% fewer unnecessary actions on the prompts), with a minimal impact (less than 5%) on the grant rates, across all types of users and websites. We achieve these results thanks to a novel adaptive activation mechanism coupled with a block list of interrupting websites, which is derived from crowd-sourced telemetry from Chrome clients.
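The crowd-sourced blocklist idea can be illustrated with a toy decision rule: a site enters the quieter UI when telemetry shows its prompts are rarely granted. The thresholds and telemetry values below are assumptions for illustration, not Chrome's actual parameters:

```python
# Hypothetical sketch of a telemetry-driven blocklist. Thresholds are
# illustrative, not the values Chrome deploys.
MIN_PROMPTS = 100      # require enough observations before judging a site
MAX_GRANT_RATE = 0.05  # sites granted this rarely are deemed interruptive

def should_use_quiet_ui(prompts_shown, prompts_granted):
    if prompts_shown < MIN_PROMPTS:
        return False  # not enough telemetry; keep the regular prompt
    return prompts_granted / prompts_shown <= MAX_GRANT_RATE

# site -> (prompts shown, prompts granted), all numbers invented
telemetry = {
    "news.example": (5000, 120),   # 2.4% grant rate -> quiet UI
    "chat.example": (5000, 1800),  # 36% grant rate -> regular prompt
    "new.example": (20, 0),        # too little data -> regular prompt
}
quiet = {site for site, (shown, granted) in telemetry.items()
         if should_use_quiet_ui(shown, granted)}
```

The adaptive part of the deployed mechanism also reacts to an individual user's own dismissal history, which this per-site sketch omits.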
A Large Scale Study of Users Behaviors, Expectations and Engagement with Android Permissions
Weicheng Cao
Chunqiu Xia
David Lie
Lisa Austin
Usenix Security Symposium, Usenix, https://www.usenix.org/conference/usenixsecurity21 (2021)
We conduct a global study on the behaviors, expectations, and engagement of 1,719 participants across 10 countries and regions towards Android application permissions. Participants were recruited using mobile advertising and used an application we designed for 30 days. Our app samples user behaviors (decisions made), rationales (via in-situ surveys), expectations, and attitudes, as well as some app-provided explanations. We study the grant and deny decisions our users make, and build mixed-effect logistic regression models to illustrate the many factors that influence this decision making. Among several interesting findings, we observed that users facing an unexpected permission request are more than twice as likely to deny it compared to users who expect it, and that permission requests accompanied by an explanation have a deny rate that is roughly half that of permission requests without explanations. These findings remain true even when controlling for other factors. To the best of our knowledge, this may be the first study of actual privacy behavior (not stated behavior) for Android apps, with users using their own devices, across multiple continents.
Reducing Permission Requests in Mobile Apps
Martin Pelikan
Ulfar Erlingsson
Giles Hogben
Proceedings of ACM Internet Measurement Conference (IMC) (2019)
Users of mobile apps sometimes express discomfort or concerns with what they see as unnecessary or intrusive permission requests by certain apps. However, encouraging mobile app developers to request fewer permissions is challenging because there are many reasons why permissions are requested; furthermore, prior work has shown it is hard to disambiguate the purpose of a particular permission with high certainty. In this work we describe a novel, algorithmic mechanism intended to discourage mobile-app developers from asking for unnecessary permissions. Developers are incentivized by an automated alert, or "nudge", shown in the Google Play Console when their apps ask for permissions that are requested by very few functionally-similar apps---in other words, by their competition. Empirically, this incentive is effective, with significant developer response since its deployment. Permissions have been redacted by 59% of apps that were warned, and this attenuation has occurred broadly across both app categories and app popularity levels. Importantly, billions of users' app installs from Google Play have benefited from these redactions.
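The peer-comparison trigger behind the nudge can be sketched as follows; the 5% threshold and the permission data are illustrative assumptions, not the deployed values:

```python
# Illustrative sketch of the peer-comparison "nudge": flag permissions
# that few functionally-similar apps request. Threshold is assumed.
PEER_USAGE_THRESHOLD = 0.05

def permissions_to_nudge(app_permissions, peer_permission_sets):
    n_peers = len(peer_permission_sets)
    flagged = []
    for perm in app_permissions:
        peers_using = sum(perm in peers for peers in peer_permission_sets)
        if peers_using / n_peers < PEER_USAGE_THRESHOLD:
            flagged.append(perm)  # rare among peers -> nudge the developer
    return flagged

# 100 hypothetical functionally-similar apps: all use INTERNET, 5 also
# use RECORD_AUDIO, none use READ_CONTACTS.
peers = [{"INTERNET"}] * 95 + [{"INTERNET", "RECORD_AUDIO"}] * 5
nudges = permissions_to_nudge(
    {"INTERNET", "READ_CONTACTS", "RECORD_AUDIO"}, peers)
```

Here only READ_CONTACTS falls below the peer-usage threshold, so only it would trigger an alert; the real system additionally has to define "functionally similar" robustly, which is a substantial problem in itself.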
Exploring decision making with Android's runtime permission dialogs using in-context surveys
Thirteenth Symposium on Usable Privacy and Security (SOUPS), Usenix (2017)
A great deal of research on the management of user data on smartphones via permission systems has revealed significant levels of user discomfort, lack of understanding, and lack of attention. The majority of these studies were conducted on Android devices before runtime permission dialogs were widely deployed. In this paper we explore how users make decisions with runtime dialogs on smartphones with Android 6.0 or higher. We employ an experience sampling methodology in order to ask users the reasons influencing their decisions immediately after they decide. We conducted a longitudinal survey with 157 participants over a 6 week period.
We explore the grant and denial rates of permissions, overall and on a per-permission-type basis. Overall, our participants accepted 84% of the permission requests. We observe differences in the denial rates across permission types; these vary from 23% (for microphone) to 10% (for calendar). We find that one of the main factors in granting or denying a permission request is users' expectation of whether or not an app should need that permission. A common reason for denying permissions is that users know they can change them later. Among the permissions granted, our participants said they were comfortable with 90% of those decisions, indicating that for 10% of grant decisions users may be consenting reluctantly. Interestingly, we found that women deny permissions twice as often as men.
Intuitions, analytics, and killing ants: Inference literacy of high school-educated adults in the US
Analytic systems increasingly allow companies to draw inferences about users' characteristics, yet users may not fully understand these systems due to their complex and often unintuitive nature. In this paper, we investigate inference literacy: the beliefs and misconceptions people have about how companies collect and make inferences from their data. We interviewed 21 non-student participants with a high school education, finding that few believed companies can make the type of deeply personal inferences that companies now routinely make through machine learning. Instead, most participants' inference literacy beliefs clustered around one of two main concepts: one cluster believed companies make inferences about a person based largely on a priori stereotyping, using directly gathered demographic data; the other cluster believed that companies make inferences based on computer processing of online behavioral data, but often expected these inferences to be limited to straightforward intuitions. We also find evidence that cultural models related to income and ethnicity influence the assumptions that users make about their own role in the data economy. We share implications for research, design, and policy on tech savviness, digital inequality, and potential inference literacy interventions.
Privacy Mediators: Helping IoT Cross the Chasm
Nigel Davies
Mahadev Satyanarayanan
Sarah Clinch
Brandon Amos
International Workshop on Mobile Computing Systems and Applications (ACM HotMobile) (2016)
Unease over data privacy will retard consumer acceptance of IoT deployments. The primary source of discomfort is a lack of user control over raw data that is streamed directly from sensors to the cloud. This is a direct consequence of the over-centralization of today's cloud-based IoT hub designs. We propose a solution that interposes a locally-controlled software component called a privacy mediator on every raw sensor stream. Each mediator is in the same administrative domain as the sensors whose data is being collected, and dynamically enforces the current privacy policies of the owners of the sensors or mobile users within the domain. This solution necessitates a logical point of presence for mediators within the administrative boundaries of each organization. Such points of presence are provided by cloudlets, which are small locally-administered data centers at the edge of the Internet that can support code mobility. The use of cloudlet-based mediators aligns well with natural personal and organizational boundaries of trust and responsibility.
Streamed video content now makes up the majority of internet traffic. As video quality continues to increase, the strain that streaming traffic places on the network infrastructure also increases. Caching content closer to users, e.g., using Content Distribution Networks, is a common solution to reduce the load on the network. A simple approach to selecting what to put in regional caches is to choose the videos that are most popular globally across the entire customer base. However, this approach ignores distinct regional tastes. In this paper we explore the question of how a video content provider could determine whether to use a cache-filling policy based solely upon global popularity or to take regional tastes into account as well. We propose a model that captures the overlap between inter-regional and intra-regional preferences. We focus on movie content and derive a synthetic model that captures "taste" using matrix factorization, similarly to the method used in recommender systems. Our model enables us to widely explore the parameter space, and to derive a set of metrics providers can use to determine whether populating caches according to regional or global tastes provides better cache performance.
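The global-versus-regional question can be made concrete with a toy hit-rate comparison; all video names and request counts below are invented for illustration:

```python
# Compare two cache-filling policies on assumed per-region request
# counts: fill the regional cache with the globally most popular videos,
# or with that region's own most popular videos.
def top_k(counts, k):
    return set(sorted(counts, key=counts.get, reverse=True)[:k])

def hit_rate(cache, regional_counts):
    total = sum(regional_counts.values())
    hits = sum(c for video, c in regional_counts.items() if video in cache)
    return hits / total

region_a = {"v1": 50, "v2": 30, "v3": 5}   # region A's tastes
region_b = {"v4": 60, "v5": 25, "v1": 10}  # region B's tastes
global_counts = {}
for counts in (region_a, region_b):
    for v, c in counts.items():
        global_counts[v] = global_counts.get(v, 0) + c

K = 2  # assumed cache capacity, in videos
global_cache = top_k(global_counts, K)
regional_cache = top_k(region_b, K)
rate_global = hit_rate(global_cache, region_b)
rate_regional = hit_rate(regional_cache, region_b)
```

With these invented numbers, filling region B's cache by its own tastes beats filling it by global popularity; the paper's model and metrics characterize when (and by how much) this happens as the overlap between regional preferences varies.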
Managing your Private and Public Data: Bringing down Inference Attacks against your Privacy
Amy Zhang
Branislav Kveton
Flavio du Pin Calmon
Nadia Fawaz
Pedro Oliveira
Salman Salamatian
Sandilya Bhamidipati
IEEE Journal on Signal Processing (2015)
We propose a practical methodology to protect a user's private data when the user wishes to publicly release data that is correlated with the private data, in the hope of getting some utility. Our approach relies on a general statistical inference framework that captures the privacy threat under inference attacks, given utility constraints. Under this framework, data is distorted before it is released, according to a probabilistic privacy mapping. This mapping is obtained by solving a convex optimization problem, which minimizes information leakage under a distortion constraint. We address practical challenges encountered when applying this theoretical framework to real-world data. On one hand, the design of optimal privacy-preserving mechanisms requires knowledge of the prior distribution linking private data and data to be released, which is often unavailable in practice. On the other hand, the optimization may become intractable when data assumes values in large alphabets or is high-dimensional. Our work makes three major contributions. First, we provide bounds on the impact of a mismatched prior on the privacy-utility tradeoff. Second, we show how to reduce the optimization size by introducing a quantization step, and how to generate privacy mappings under quantization. Third, we evaluate our method on two datasets, including a new dataset that we collected, showing correlations between political convictions and TV viewing habits. We demonstrate that good privacy properties can be achieved with limited distortion so as not to undermine the original purpose of the publicly released data, e.g. recommendations.
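The leakage-versus-distortion tradeoff at the heart of the framework can be illustrated on a binary toy example: releasing a randomly flipped copy of the public value lowers the mutual information an inference attacker can exploit. The probabilities and the flip mechanism below are illustrative stand-ins for the paper's optimized privacy mapping:

```python
import math

def mutual_information(joint):
    # joint[(s, x)] -> probability; returns I(S; X) in bits, where S is
    # the private attribute and X the released value.
    ps, px = {}, {}
    for (s, x), p in joint.items():
        ps[s] = ps.get(s, 0) + p
        px[x] = px.get(x, 0) + p
    return sum(p * math.log2(p / (ps[s] * px[x]))
               for (s, x), p in joint.items() if p > 0)

def flip_release(joint, flip_prob):
    # Privacy-mapping stand-in: release X' = X flipped with probability
    # flip_prob. flip_prob is the distortion knob; the paper instead
    # solves a convex program for the optimal mapping.
    out = {}
    for (s, x), p in joint.items():
        out[(s, x)] = out.get((s, x), 0) + p * (1 - flip_prob)
        out[(s, 1 - x)] = out.get((s, 1 - x), 0) + p * flip_prob
    return out

# Assumed prior: private bit S and public bit X agree 90% of the time.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
leak_raw = mutual_information(joint)                      # high leakage
leak_noisy = mutual_information(flip_release(joint, 0.3))  # reduced
leak_none = mutual_information(flip_release(joint, 0.5))   # ~0 bits
```

More distortion means less leakage but also less utility in the released data; the paper's optimization picks the mapping that minimizes leakage subject to a distortion budget.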
GraphSC: Parallel Secure Computation Made Easy
Kartik Nayak
Xiao S. Wang
Stratis Ioannidis
Udi Weinsberg
Elaine Shi
IEEE Symposium on Security and Privacy, IEEE (2015)
We propose introducing modern parallel programming paradigms to secure computation, enabling their secure execution on large datasets. To address this challenge, we present GraphSC, a framework that (i) provides a programming paradigm that allows non-cryptography experts to write secure code; (ii) brings parallelism to such secure implementations; and (iii) meets the needs for obliviousness, thereby not leaking any private information. Using GraphSC, developers can efficiently implement an oblivious version of graph-based algorithms (including sophisticated data mining and machine learning algorithms) that execute in parallel with minimal communication overhead. Importantly, our secure version of graph-based algorithms incurs a small logarithmic overhead in comparison with the non-secure parallel version. We build GraphSC and demonstrate, using several algorithms as examples, that secure computation can be brought into the realm of practicality for big data analysis. Our secure matrix factorization implementation can process 1 million ratings in 13 hours, a multiple order-of-magnitude improvement over the only other existing attempt, which requires 3 hours to process 16K ratings.
Perceived Frequency of Advertising Practices
Allen Collins
Aaron Sedley
Allison Woodruff
Symposium on Usable Privacy and Security (SOUPS), Privacy Personas and Segmentation Workshop, Usenix (2015)
In this paper, we introduce a new construct for measuring individuals’ privacy-related beliefs and understandings, namely their perception of the frequency with which information about individuals is gathered and used by others for advertising purposes. We introduce a preliminary instrument for measuring this perception, called the Ad Practice Frequency Perception Scale. We report data from a survey using this instrument, as well as the results of an initial clustering of participants based on this data. Our results, while preliminary, suggest that this construct may have future potential to characterize and segment individuals, and is worthy of further exploration.
Privacy Tradeoffs in Predictive Analytics
Stratis Ioannidis
Andrea Montanari
Udi Weinsberg
Smriti Bhagat
Nadia Fawaz
Sigmetrics, ACM (2014)
Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization.
Privacy Preserving Matrix Factorization
Valeria Nikolaenko
Stratis Ioannidis
Udi Weinsberg
Marc Joye
Dan Boneh
20th ACM Conference on Computer and Communications Security (CCS) (2013)
Privacy Preserving Ridge Regression on Hundreds of Millions of Records
Valeria Nikolaenko
Udi Weinsberg
Stratis Ioannidis
Marc Joye
Dan Boneh
Symposium on Security and Privacy, IEEE (2013), pp. 334-348
Finding a Needle in a Haystack of Reviews: Cold Start Context-Based Hotel Recommender System
Asher Levi
Osnat Mokryn
Christophe Diot
Proceedings of ACM Conference on Recommender Systerms (2012), pp. 115-122
BlurMe: Inferring and Obfuscating User Gender Based on Ratings.
CARE: Content Aware Redundancy Elimination for Challenged Networks
Athula Balachandran
Gianluca Iannaccone
Qingxi Li
Srinivasan Seshan
Udi Weinsberg
Vyas Sekar
ACM Hot Topics in Networking (2012)
Public Health for the Internet (PHI)
Joseph M. Hellerstein
Tyson Condie
Minos N. Garofalakis
Boon Thau Loo
Timothy Roscoe
CIDR (2007), pp. 332-340
Traffic Matrices: Balancing Measurements, Inference and Modeling
Augustin Soule
Anukool Lakhina
Konstantina Papagiannaki
Kave Salamatian
Antonio Nucci
Mark Crovella
Christophe Diot
ACM Sigmetrics: Conference on Measurement and Modeling of Computer Systems (2005)
Combining Filtering and Statistical Methods for Anomaly Detection
Structural Analysis of Network Traffic Flows
Anukool Lakhina
Konstantina Papagiannaki
Mark Crovella
Christophe Diot
Eric Kolaczyk
ACM Sigmetrics: Conference on Measurement and Modeling of Computer Systems (2004), pp. 61-72
Long Term Forecasting of Internet Backbone Traffic: Observations and Initial Models
Traffic Matrix Estimation: Existing Techniques and New Directions
Alberto Medina
Kave Salamatian
Supratik Bhattacharyya
Christophe Diot
ACM Proceedings of SIGCOMM conference (2002), pp. 161-174