Zhuohang Li (Vanderbilt University, Nashville, TN, USA; zhuohang.li@vanderbilt.edu), Andrew Lowy (University of Wisconsin–Madison, Madison, WI, USA; alowy@wisc.edu), Jing Liu (Mitsubishi Electric Research Laboratories, Cambridge, MA, USA; jiliu@merl.com), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories, Cambridge, MA, USA; koike@merl.com), Kieran Parsons (Mitsubishi Electric Research Laboratories, Cambridge, MA, USA; parsons@merl.com), Bradley Malin (Vanderbilt University, Nashville, TN, USA; b.malin@vanderbilt.edu), and Ye Wang (Mitsubishi Electric Research Laboratories, Cambridge, MA, USA; yewang@merl.com)
Abstract.
In distributed learning settings, models are iteratively updated with shared gradients computed from potentially sensitive user data. While previous work has studied various privacy risks of sharing gradients, our paper aims to provide a systematic approach to analyze private information leakage from gradients. We present a unified game-based framework that encompasses a broad range of attacks including attribute, property, distributional, and user disclosures. We investigate how different uncertainties of the adversary affect their inferential power via extensive experiments on five datasets across various data modalities. Our results demonstrate the inefficacy of solely relying on data aggregation to achieve privacy against inference attacks in distributed learning. We further evaluate five types of defenses, namely, gradient pruning, signed gradient descent, adversarial perturbations, variational information bottleneck, and differential privacy, under both static and adaptive adversary settings. We provide an information-theoretic view for analyzing the effectiveness of these defenses against inference from gradients. Finally, we introduce a method for auditing attribute inference privacy, improving the empirical estimation of worst-case privacy through crafting adversarial canary records.
1. Introduction
Ensuring privacy is an important prerequisite for adopting machine learning (ML) algorithms in critical domains that require training on sensitive user data, such as medical records, personal financial information, private images, and speech. Prominent ML models, ranging from compact neural networks tailored for mobile platforms (howard2017mobilenets) to large foundation models (brown2020language; rombach2022high), are often trained on user data via gradient-based iterative optimization. In many cases, such as decentralized learning (dhasade2023decentralized; hsieh2017gaia) or federated learning (FL) (mcmahan2017communication; hard2018federated; guliani2021training), model gradients are directly exchanged in place of raw training data to facilitate joint learning, which opens up an additional channel for potential privacy leakage (lowy2022private).
Recent works have explored information leakage through this gradient channel in various forms, albeit in isolation. For instance, Nasr et al. (nasr2019comprehensive) showed that it is feasible to infer membership (i.e., single-bit information indicating the existence of a target record in the training data pool) from model updates in federated learning. Beyond membership, Melis et al. (melis2019exploiting) demonstrated inference over sensitive properties of the training data in collaborative learning. Other independent lines of work additionally explored attribute inference (lyu2021novel; driouich2022novel) and data reconstruction (zhu2019deep; geiping2020inverting; gupta2022recovering) through shared model gradients. However, some emerging privacy concerns that have so far only been considered under the centralized learning setting, such as distributional inference (suri2022formalizing; chaudhari2023snap) and user-level inference (kandpal2023user; li2022user), have not been well investigated in the gradient leakage setting.
Existing studies on information leakage from gradients have several limitations. First, the majority of the current literature focuses on investigating each individual type of inference attack under its specific threat model while lacking a comprehensive examination of inference attack performance under various adversarial assumptions, which is essential for providing a holistic view of the adversary’s capabilities. For instance, from the attack’s perspective, assuming the adversary has access to a reasonably sized shadow dataset and limited rounds of access to the model’s gradients helps to capture the realistic inference privacy risk under a practical threat model. Conversely, from the defense’s perspective, assuming a powerful adversary with access to record-level gradients and auxiliary information about the private record helps to estimate the worst-case privacy risk, which may facilitate the design of more robust defenses. Second, while several types of heuristic defenses have been explored by prior work, their supposed effectiveness has not been fully verified under more challenging adaptive adversary settings. Moreover, existing studies do not adequately explain why some defenses succeed in reducing the inference risk over gradients while others fail; such an explanation could provide important guidance on the design of more effective defenses.
In this paper, we conduct a systematic analysis of private information leakage from gradients. We start by defining a unified inference game that broadly encompasses four types of inference attacks aimed at inferring common private information of the data from gradients, namely, the attribute inference attack (AIA), property inference attack (PIA), distributional inference attack (DIA), and user inference attack (UIA), as illustrated in Figure 1. Under this framework, we show that information leakage from gradients can be treated as performing statistical inference over a sensitive variable upon observing samples of the gradients, with different definitions of the information encapsulated by the variable being inferred, leading to a generic template for constructing different types of inference attacks. We additionally explore different tiers of adversarial assumptions, with varying numbers of available data samples, numbers of observable rounds of gradients, and varying batch sizes, to investigate how different priors and uncertainties in the adversary’s knowledge about the gradient and data distribution affect the adversary’s inferential power.
We perform a systematic evaluation of these attacks on five datasets (Adult (misc_adult_2), Health (health_heritage), CREMA-D (cao2014crema), CelebA (liu2015deep), UTKFace (zhang2017age)) with three different data modalities (tabular, speech, and image). A common setting in distributed learning is that the data distribution is heterogeneous across different nodes but homogeneous within each node. Under this assumption, where the sensitive variable is common across a batch, we show that a larger batch size leads to higher inference privacy risk from gradients across all considered attacks, highlighting that solely relying on data aggregation is insufficient for achieving meaningful privacy in distributed learning. With a moderate batch size (e.g., $16$), we show that an adversary can launch successful inference attacks with very few shadow data samples ($\leq 1{,}000$). For instance, in the case of property inference on the Adult dataset, the adversary can achieve $0.92$ AUROC with only $100$ shadow data samples. Moreover, we demonstrate that an adversary with access to multiple rounds of gradient updates can perform Bayesian inference to aggregate adversarial knowledge, eventually leading to higher confidence and better attack performance.
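The multi-round aggregation mentioned above amounts to a standard Bayesian update. The sketch below assumes the attack model produces a per-round likelihood for each candidate value of the sensitive variable and that rounds are treated as conditionally independent given that value; the function name and toy numbers are ours, not the paper's implementation.

```python
import numpy as np

def aggregate_rounds(prior, per_round_likelihoods):
    """Combine per-round attack scores by Bayesian inference:
    P(a | g_1..g_r) is proportional to P(a) * prod_i P(g_i | a),
    computed in log space for numerical stability."""
    log_post = np.log(prior)
    for lik in per_round_likelihoods:
        log_post += np.log(lik)
    log_post -= log_post.max()          # rescale before exponentiating
    post = np.exp(log_post)
    return post / post.sum()            # normalize to a distribution

# Toy example: two candidate values of a; each observed round leans
# slightly towards a = 1, so confidence in a = 1 grows with more rounds.
prior = np.array([0.5, 0.5])
rounds = [np.array([0.4, 0.6])] * 3
posterior = aggregate_rounds(prior, rounds)
```

With three rounds, the posterior mass on the favored hypothesis already exceeds the single-round likelihood, illustrating why continuous observation strengthens the adversary.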
We apply the developed inference attacks to evaluate the effectiveness of five common types of defenses from the privacy literature (zhu2019deep; sun2021soteria; wu2023learning; jia2018attriguard; jia2019memguard; shan2020fawkes; song2019overlearning; scheliga2022precode; scheliga2023privacy), including Gradient Pruning (zhu2019deep), Signed Stochastic Gradient Descent (SignSGD) (bernstein2018signsgd), Adversarial Perturbations (madry2018towards), Variational Information Bottleneck (VIB) (alemi2016deep), and Differential Privacy (DP-SGD) (abadi2016deep), against both static adversaries that are unaware of the defense and adaptive adversaries that can adapt to the defense mechanism. We find that most heuristic defense methods only offer a weak notion of “security through obscurity”, in the sense that they defend against static adversaries empirically but can be easily bypassed by adaptive adversaries. Although DP-SGD shows consistent performance against both static and adaptive adversaries, to fully prevent inference attacks it often requires injecting so much noise that the utility of the learning model is diminished. We provide an information-theoretic perspective for explaining and analyzing the (in)effectiveness of the considered defenses and show that the key ingredient of a successful defense is to effectively reduce the mutual information between the released gradients and the sensitive variable, which could serve as a guideline for designing future defenses. Finally, to provide practical guidance in selecting privacy parameters, we introduce an auditing approach for empirically estimating the privacy loss of attribute inference attacks through crafting adversarial canary records to approximate the privacy risk in the worst case.
In summary, our main contributions are as follows:
 •
We provide a holistic analysis of inference privacy from gradients through a unified inference game that broadly encompasses a range of attacks concerning attribute, property, distributional, and user inference.
 •
We demonstrate the weakness of solely relying on data aggregation to achieve privacy against inference attacks in distributed learning. We do this through a systematic evaluation of the four types of attacks on datasets with different modalities under various adversarial assumptions.
 •
Our analyses reveal that reducing the mutual information between the released gradients and the sensitive variable is the key ingredient of a successful defense. This is shown by investigating five common types of defense strategies against inference over gradients from an informationtheoretic perspective.
 •
Our auditing results provide an empirical justification for tolerating large DP parameters when defending against attribute inference attacks (cf. (lowy2024does)). This is achieved by implementing an auditing method for empirically estimating the privacy loss against attribute inference attacks from gradients.
2. Background and Related Work
2.1. Machine Learning Notation
A machine learning (ML) model can be denoted as a function $f_{\bm{\theta}}:{\mathbf{x}}\rightarrow{\mathbf{y}}$, parameterized by ${\bm{\theta}}$, that maps from the input (feature) space to the output (label) space. Training an ML model involves a set of training data and an optimization procedure, such as stochastic gradient descent (SGD). At each step of SGD, a loss function $\mathcal{L}({\bm{\theta}},{\mathcal{D}}_{b})$ is first computed based on the current model and a batch of $k$ training samples ${\mathcal{D}}_{b}=\{({\bm{x}}_{i},{\bm{y}}_{i})\}_{i=1}^{k}$, and then a gradient is computed as ${\bm{g}}=\nabla_{{\bm{\theta}}}\mathcal{L}({\bm{\theta}},{\mathcal{D}}_{b})$. Finally, the model is updated by taking a gradient step towards minimizing the loss.
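This step can be sketched in a few lines. The linear-regression loss below is only a toy stand-in for $\mathcal{L}$, and all names are illustrative.

```python
import numpy as np

def sgd_step(theta, batch, loss_grad, lr=0.1):
    """One SGD step: average the per-example gradients over the
    batch D_b, then move theta against the averaged gradient g."""
    g = np.mean([loss_grad(theta, x, y) for x, y in batch], axis=0)
    return theta - lr * g, g

# Toy stand-in for L: squared error 0.5 * (theta . x - y)^2,
# whose gradient with respect to theta is (theta . x - y) * x.
def loss_grad(theta, x, y):
    return (theta @ x - y) * x

theta = np.zeros(2)
batch = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 2.0)]
theta, g = sgd_step(theta, batch, loss_grad)  # g is what a client would share
```

In distributed settings it is exactly this batch gradient `g` (or a model update derived from it) that leaves the client, which is the leakage channel studied in the rest of the paper.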
2.2. Related Work
Developing ML models in many applications involves training on the users’ private data, which introduces privacy leakage risks from different components of the ML model across several stages of the development and deployment pipeline.
Leakage From Model Parameters (${\bm{\theta}}$). The first way of exposing private information is through analyzing the model parameters. This is connected to the most prominent centralized ML setting, where the model is first developed on a local dataset and then released to the users for deployment. Various forms of privacy leakage have been studied in this setting. White-box membership inference (leino2020stolen; nasr2019comprehensive; sablayrolles2019white) aims at identifying the presence of individual records in the training dataset given access to the full model. Data extraction attacks exploit the memorization of the ML model to extract training samples (haim2022reconstructing; carlini2023extracting), whereas model inversion attacks generate synthetic data samples from the training distribution (yin2020dreaming; wang2021variational). In contrast, for distributional inference attacks (ateniese2015hacking; ganju2018property; suri2022formalizing), the attacker’s goal is to make inferences about the entire training data distribution rather than individuals.
Leakage From Model Outputs ($f_{\bm{\theta}}({\bm{x}})$). Another source of privacy leakage is the model output, which is relevant to more restrictive settings such as machine learning as a service (MLaaS) in cloud APIs, where only black-box access to the ML model is granted. Under this setting, researchers have studied several privacy attacks that can be launched by querying the model and observing the outputs. For instance, query-based model inversion attacks (fredrikson2014privacy; fredrikson2015model) exploit the predicted confidence or labels from the model to make inferences about the input data instance (zhang2020secret) or attribute (mehnaz2022your). Model stealing attacks attempt to recover the confidential model weights (tramer2016stealing) or hyperparameters (wang2018stealing) given query access to the model. Black-box membership inference attacks (salem2018ml; truex2019demystifying; sablayrolles2019white; song2021systematic) and black-box distributional inference attacks (mahloujifar2022property; chaudhari2023snap) allow an adversary to decide whether a data point was included in training or to reveal information about the training data distribution by analyzing the model’s output prediction or confidence.
Leakage From Model Gradients (${\bm{g}}$). The final source of privacy leakage is the gradient of the loss function with respect to the model parameters, which is essential for updating the model with stochastic gradient descent. This is relevant to ML settings that release intermediate model updates during model development, such as distributed training, federated learning, peer-to-peer learning, and online learning. Compared to model parameters, model gradients carry more nuanced information about the small batch of data used for computing the update and thus may reveal more information about the underlying data instances. The current literature studies different types of gradient-based privacy leakage in isolation. One line of work focused on data reconstruction from model gradients (zhu2019deep; geiping2020inverting) or updates (salem2020updates; haim2022reconstructing) with various data types, such as image (zhu2019deep; geiping2020inverting; yin2021see; li2022auditing), text (gupta2022recovering; haim2022reconstructing), tabular (vero2023tableak), and speech data (li2023speech). However, these attacks rely on strong adversarial assumptions and do not generalize to large batch sizes (huang2021evaluating). Another line of work investigated the extraction of private attributes or properties (melis2019exploiting; feng2021attribute) of the private data from model gradients. Specifically, Melis et al. (melis2019exploiting) first revealed that gradients shared in collaborative learning can be used to infer properties of the training data that are uncorrelated with the task label. Lyu et al. (lyu2021novel) explored attribute reconstruction from epoch-averaged gradients on tabular and genomics data. Feng et al. (feng2021attribute) discovered that gradients of Speech Emotion Recognition models leak information about user demographics such as gender and age. Dang et al. (dang2022method) showed that speaker identities can be revealed from the gradients of Automatic Speech Recognition models. Kerkouche et al. (kerkouche2023client) demonstrated the weakness of secure aggregation without differential privacy in federated learning by designing a disaggregation attack that exploits the linearity of model aggregation and client participation across multiple rounds to capture client-specific properties. In contrast to existing studies that design separate treatments for each type of attack, in this work we take a holistic view of information leakage from gradients.
3. Problem Formalization
This section introduces four types of inference attacks from gradients, namely, attribute inference, property inference, distributional inference, and user inference. We formally define information leakage from gradients using a unified security game, following standard practices in machine learning privacy studies (salem2023sok), and discuss variants of threat models that affect the adversary’s inferential power. In Section 4, we describe methods to construct these attacks.
3.1. Attack Definitions
We consider four types of information leakage from model gradients that generally involve two parties, namely, a private learner who releases model gradients computed on a private data batch, and an adversary who tries to make inferences about the private data given access to the gradients. This generic setting captures multiple ML application scenarios such as distributed training, federated learning, and online learning.
Attribute Inference. Attribute inference attacks (AIA) seek to infer a data record’s unknown attribute (feature) from its gradient. Prior works in both centralized (wu2016methodology; yeom2018privacy) and federated settings (lyu2021novel; driouich2022novel) usually assume the record to be partially known; for instance, the adversary may infer a missing entry (e.g., genotype) of a person’s medical record (fredrikson2014privacy). It is worth noting that, in practice, when the attributes are not completely independent, an adversary with partial knowledge about the record may be able to infer the unknown attribute just from the known ones, as in data imputation (jayaraman2022attribute).
Property Inference. Property inference attacks (PIA) aim to infer a global property of the private data batch that is not directly present in the data feature space but is correlated with some of the features (and consequently the gradients). For tabular data, these properties could be sensitive features that have been intentionally excluded from training (e.g., pseudo-identifiers in health records that are required to be removed for HIPAA compliance); for high-dimensional data like image and speech, they could be high-level statistical features capturing the semantics of the data instance (e.g., the race of a face image (melis2019exploiting) or the gender of a speech recording (feng2021attribute)).
Distributional Inference. Distributional inference attacks (DIA) aim to infer the ratio $\alpha$ of the training samples that satisfy some target property (some prior work also refers to distributional inference as property inference). The majority of the current literature on DIA (ganju2018property; suri2022formalizing; mahloujifar2022property; chaudhari2023snap) is in the space of centralized learning, which captures leakage from model parameters. These studies usually define DIA as a distinguishing test between two worlds where the model is trained on two datasets with different ratios ($\alpha_{0}$ and $\alpha_{1}$) (suri2022formalizing). This can be further categorized into property existence tests, which decide if there exists any data point with the target property in the training set, and property size estimation tests, which infer the exact ratio of the property in the training data (chaudhari2023snap). In this work, we extend DIA to the gradient space and consider a general case that combines property existence and property size estimation by formulating DIA as ordinal classification over a set of $m$ ratio bins ($m\geq 3$), i.e., $\{0\},(0,\tfrac{1}{m-1}],(\tfrac{1}{m-1},\tfrac{2}{m-1}],\ldots,(\tfrac{m-2}{m-1},1]$.
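As a concrete reading of this binning, a short sketch (the function name is ours) that maps a ratio $\alpha$ to its ordinal bin index, with bin $0$ reserved for the property existence case:

```python
import math

def ratio_bin(alpha, m):
    """Map a property ratio alpha in [0, 1] to one of m ordinal bins:
    {0}, (0, 1/(m-1)], (1/(m-1), 2/(m-1)], ..., ((m-2)/(m-1), 1].
    Bin 0 captures property existence (alpha is exactly 0)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if alpha == 0.0:
        return 0
    return math.ceil(alpha * (m - 1))  # bins 1 .. m-1
```

For example, with $m=3$ the bins are $\{0\}$, $(0,\tfrac{1}{2}]$, and $(\tfrac{1}{2},1]$, so a batch with 30% of samples holding the property falls into bin 1.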
User Inference. User inference attacks (UIA), or re-identification attacks, aim to identify which user’s data was used to compute the observed gradients. Here, the adversary does not know the user’s exact data used for computing the gradients. Instead, the adversary is provided a set of candidate users and their corresponding underlying user-level data distributions. This setting shares similarities with subject-level membership inference (suri2022subject) in the sense that both attacks measure the privacy risk at the granularity of each individual. However, the user inference attack aims to infer richer information that directly exposes the user’s identity, whereas the membership inference attack only discloses a single bit of information (i.e., whether a given user’s data sample is involved in training). Thus, user inference can be considered a generalization of the subject-level membership inference attack.
We note that, except for attribute inference, which directly exposes (part of) the user’s private data, property inference, distributional inference, and user inference attacks are inferential disclosures (also known as deductive disclosures) that exploit statistical correlations in the data to infer sensitive information from the released gradients with high confidence. We exclude record-level privacy attacks such as membership inference and data reconstruction, as our analysis focuses on distributed learning scenarios where private information can be shared across different data samples within a batch.
3.2. Unified Inference Game
Our framework aims to capture an abstraction of privacy problems in distributed learning settings, where an attacker aims to recover some sensitive information of a particular client from their shared gradients (or model updates). In practical distributed learning settings, the data may be heterogeneously split across the clients, and an attacker may take advantage of side information about a particular client’s local data distribution. Generally, the objective of the attacker is to recover the sensitive information, represented by the variable ${\mathbf{a}}$, which is related to the local data distribution of the client through a joint distribution $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}})=\operatorname{\mathbb{P}}({\mathbf{a}})\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}}\mid{\mathbf{a}})$. As we will detail later, specific choices of what ${\mathbf{a}}$ represents and the corresponding specialized structure of $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}})$ enable the framework to capture attribute, property, distributional, and user inference privacy problems. This joint distribution may capture both the side information available to the attacker and the inherent heterogeneity of the data. To focus on evaluating the effectiveness of gradient-based attacks and defenses, we simplify the modeling of the overall training procedure by updating the model in a centralized fashion on the entire training dataset ${\mathcal{D}}$, but generating gradients for the attacker on batches drawn according to $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}})$.
Definition 3.1.
Unified Inference Game. Let $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}})$ be the joint distribution, ${\mathcal{L}}$ the loss function, ${\mathcal{T}}$ the training algorithm, $r$ the total number of training rounds, and ${\mathcal{R}}\subset[r]$ a set of rounds that are observable to the adversary (we use $[r]$ to denote the discrete set $\{1,2,...,r\}$). The unified inference game from gradients between a challenger (private learner) and an adversary is as follows:
 (1)
Challenger initializes the model parameters as ${\bm{\theta}}_{0}$.
 (2)
Challenger samples a training dataset ${\mathcal{D}}=\{({\bm{x}}_{j},{\bm{y}}_{j})\}_{j=1}^{n}$, where $({\bm{x}}_{j},{\bm{y}}_{j})\overset{\mathrm{i.i.d.}}{\sim}\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}})$.
 (3)
Challenger draws the sensitive variable ${\bm{a}}\sim\operatorname{\mathbb{P}}({\mathbf{a}})$.
 (4)
Challenger draws a batch of $k$ data samples ${\mathcal{D}}_{\bm{a}}=\{({\bm{x}}_{p},{\bm{y}}_{p})\}_{p=1}^{k}$, where $({\bm{x}}_{p},{\bm{y}}_{p})\overset{\mathrm{i.i.d.}}{\sim}\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}}\mid{\bm{a}})$, for the given ${\bm{a}}$.
 (5)
Challenger computes the gradient of the loss on the data batch, ${\bm{g}}_{i}=\nabla_{{\bm{\theta}}_{i-1}}\mathcal{L}({\bm{\theta}}_{i-1},{\mathcal{D}}_{\bm{a}})$.
 (6)
Challenger applies the defense mechanism $\mathcal{M}$ to produce a privatized version of the gradient $\tilde{{\bm{g}}}_{i}=\mathcal{M}({\bm{g}}_{i})$.When no defense is applied, $\mathcal{M}$ is simply the identity function, i.e., $\tilde{{\bm{g}}}_{i}={\bm{g}}_{i}$.
 (7)
The model is updated by applying the training algorithm on the training dataset for one epoch, ${\bm{\theta}}_{i}\leftarrow{\mathcal{T}}({\bm{\theta}}_{i-1},{\mathcal{D}},\mathcal{L},\mathcal{M})$.
 (8)
Steps (5)–(7) are repeated for $r$ rounds.
 (9)
A static adversary ${\mathcal{A}}_{s}$ gets access to ${\mathcal{L}}$, ${\mathcal{T}}$, $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}})$, and the set of (intermediate) model parameters $\Theta=\{{\bm{\theta}}_{i-1}\mid i\in{\mathcal{R}}\}$ and released gradients ${\mathcal{G}}=\{\tilde{{\bm{g}}}_{i}\mid i\in{\mathcal{R}}\}$. An adaptive adversary ${\mathcal{A}}_{a}$ also gets the defense mechanism $\mathcal{M}$.
 (10)
The adversary outputs its inference $\hat{{\bm{a}}}$ of the sensitive variable, i.e., $\hat{{\bm{a}}}\leftarrow{\mathcal{A}}_{s}({\mathcal{L}},{\mathcal{T}},\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}}),\Theta,{\mathcal{G}})$ for the static adversary, or $\hat{{\bm{a}}}\leftarrow{\mathcal{A}}_{a}({\mathcal{L}},{\mathcal{T}},\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}}),\Theta,{\mathcal{G}},\mathcal{M})$ for the adaptive adversary. The adversary wins if $\hat{{\bm{a}}}={\bm{a}}$ and loses otherwise.
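The game above can be summarized as a small simulation loop. Every callable below is a placeholder for the corresponding component of Definition 3.1, and the toy instantiation at the end is ours, not part of the paper:

```python
def inference_game(sample_a, sample_batch_given_a, grad, defense,
                   train_epoch, adversary, theta0, rounds, observable):
    """One play of the unified inference game (steps 3-10), simplified:
    the sensitive variable a is drawn once, gradients on batches drawn
    from P(x, y | a) are privatized by M, and the adversary observes
    only the rounds in R before guessing a."""
    a = sample_a()                        # step (3): draw a ~ P(a)
    theta, observations = theta0, []
    for i in range(1, rounds + 1):
        batch = sample_batch_given_a(a)   # step (4): batch ~ P(x, y | a)
        g = defense(grad(theta, batch))   # steps (5)-(6): gradient, then M
        if i in observable:               # adversary sees rounds in R
            observations.append((theta, g))
        theta = train_epoch(theta)        # step (7): one epoch of T
    return adversary(observations) == a   # step (10): win iff a_hat == a

# Toy instantiation: the batch encodes a in the sign of a scalar "gradient",
# so an undefended adversary that reads the sign always wins.
win = inference_game(
    sample_a=lambda: 1,
    sample_batch_given_a=lambda a: [2.0 * a - 1.0],
    grad=lambda theta, batch: sum(batch),
    defense=lambda g: g,                  # identity: no defense applied
    train_epoch=lambda theta: theta,
    adversary=lambda obs: int(obs[0][1] > 0),
    theta0=0.0, rounds=3, observable={1, 2},
)
```

Swapping `defense` for a noisy or pruned variant is how the later sections evaluate static versus adaptive adversaries within the same game.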
In the above general game, the flexibility of the joint distribution $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}})$ allows capturing various scenarios. Rather than explicitly defining this joint distribution, which in any case depends on the unknown data distribution, we implicitly define it through transformations/filtering of a given dataset. Further, providing the adversary with knowledge of the distribution $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}})$ is realized by providing the adversary with suitable shadow datasets drawn according to such transformation and filtering operations.
Attribute Inference Game. The variable ${\bm{a}}\in[m]$ is a discrete attribute within the features ${\bm{x}}$. Sampling ${\bm{a}}\sim\operatorname{\mathbb{P}}({\mathbf{a}})$ is accomplished by drawing uniformly or according to its marginal empirical distribution within the given training dataset ${\mathcal{D}}$. Drawing the data batch ${\mathcal{D}}_{\bm{a}}$ according to the distribution $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}}\mid{\bm{a}})$ is accomplished by uniformly selecting data samples $({\bm{x}},{\bm{y}})$ from the entire training dataset ${\mathcal{D}}$ with features ${\bm{x}}$ that possess the attribute ${\bm{a}}$.
Property Inference Game.This scenario is similar to attribute inference, except that ${\bm{a}}\in[m]$ is a property associated with, but external to the features of, each data sample (i.e., ${\bm{a}}$ may be some metadata property of each sample, but excluded from the features of ${\bm{x}}$).Drawing the data batch ${\mathcal{D}}_{\bm{a}}$ is handled similarly to the attribute inference case.
Distributional Inference Game. In this class of scenarios, we have a general set of $m$ transformations $\{\Phi_{\bm{a}}\mid{\bm{a}}\in[m]\}$, which are selected by the sensitive variable ${\bm{a}}$. Each transformation $\Phi_{\bm{a}}$ corresponds to implicitly realizing the corresponding $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}}\mid{\bm{a}})$ by applying a general transformation that involves selective sampling from the overall training set ${\mathcal{D}}$. For example, the selection of ${\bm{a}}$ may indicate a particular proportion for the prevalence of a certain attribute or property, and thus the corresponding transformation would select batches of data according to that proportion.
User Inference Game. This is a special case of property inference, where ${\bm{a}}$ specifically corresponds to the identity of the individual who provided the corresponding data samples. Unlike in the other inference attacks, the sensitive variable, as it represents identity, does not take on a fixed set of values. To make the attack operational, similar to prior work on data reconstruction (hayes2024bounding), we assume the inference is over a fixed set of $m$ candidate users randomly sampled from the population at the beginning of each game.
3.3. Threat Model
In this work, we assume the adversary has no control over the training protocol and only passively observes gradients as the model is being updated. In practice, the adversary could be an honest-but-curious parameter server (li2014scaling) in a distributed learning or federated learning setting, a node in decentralized learning (dhasade2023decentralized), or an attacker who eavesdrops on the communication channel. The game defined in Definition 3.1 is similar to games defined in many prior works (carlini2022membership; yeom2018privacy), which capture average-case privacy, as the performance of the attack is measured by its expected value over the random draw of data samples. In Section 7, we consider an alternative game where the data samples are adversarially chosen to provide a measure of worst-case privacy for privacy auditing.
We consider the following aspects that reflect different levels of the adversary’s knowledge:
 •
Knowledge of Data Distribution. Similar to many prior works on inference attacks (shokri2017membership; melis2019exploiting; ye2022enhanced; suri2022formalizing; carlini2022membership; liu2022ml; chaudhari2023snap), we model the adversarial knowledge of the data distribution through access to data samples drawn from this distribution, which are referred to as shadow datasets. A larger shadow dataset implies a more powerful adversary that has more knowledge about the underlying data distribution. For discrete attributes, we additionally consider a more informed adversary who knows the prior distribution of the attribute, which can be estimated by drawing a large amount of data from the population.
 •
Continuous Observation. We use the observable set ${\mathcal{R}}$ to capture the adversary’s ability to observe the gradients continuously. Intuitively, an adversary observing multiple rounds should perform better than a single-round adversary. Assuming a powerful adversary is beneficial for analyzing and auditing defenses. For instance, the privacy analysis in DP-SGD (abadi2016deep) assumes that the adversary has access to all rounds of gradients.
 •
Adaptive Adversary. When evaluating defenses, in addition to the static adversary, we consider a stronger adaptive adversary who is aware of the underlying defense mechanism. This has been demonstrated as pivotal for thoroughly assessing the effectiveness of security defenses(carlini2017adversarial; tramer2020adaptive).
4. Attack Construction
4.1. Inference Attacks
The objective of the inference adversary is to infer the sensitive variable from the observed gradient, i.e., to model the posterior distribution $\operatorname{\mathbb{P}}({\mathbf{a}}\mid{\mathbf{g}})$. The general strategy for implementing inference attacks from gradients is to exploit the following two adversarial assumptions, as defined in the unified inference game in Section 3.2. First, the adversary possesses knowledge about the underlying population data distribution. Operationally, this implies that the adversary is able to draw data samples $({\bm{x}},{\bm{y}})$ with corresponding sensitive variable ${\bm{a}}$ from $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}})$ to construct a shadow dataset. Second, the adversary has access to the training algorithm and the current model parameters, which allows the adversary to compute the gradients ${\bm{g}}$ for each batch of samples within the shadow dataset. With this information, the adversary can train a predictive model $P_{\bm{\omega}}({\mathbf{a}}\mid{\mathbf{g}})$ to approximate the posterior.
Attribute & Property Inference. The attribute and property inference attacks follow a similar procedure; the difference is whether the sensitive variable ${\mathbf{a}}$ is internal or external to the data record. Specifically, the adversary first constructs a shadow dataset ${\mathcal{D}}_{\bm{s}}$ by sampling from the population distribution, i.e., ${\mathcal{D}}_{\bm{s}}=\{({\bm{x}}_{j},{\bm{y}}_{j},{\bm{a}}_{j})\}_{j=1}^{s}$ where $({\bm{x}}_{j},{\bm{y}}_{j},{\bm{a}}_{j})\overset{\mathrm{i.i.d.}}{\sim}\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}})$. Then the adversary draws data batches ${\mathcal{D}}_{\bm{a}}=\{({\bm{x}}_{j},{\bm{y}}_{j})\}_{j=1}^{k}$ from the shadow dataset through bootstrapping: it repeatedly samples the sensitive attribute ${\bm{a}}$ and then draws $k$ records with that attribute from ${\mathcal{D}}_{\bm{s}}$. Next, for each data batch ${\mathcal{D}}_{\bm{a}}$, the adversary computes the gradient ${\bm{g}}_{\bm{a}}=\nabla_{{\bm{\theta}}}\mathcal{L}({\bm{\theta}},{\mathcal{D}}_{\bm{a}})$ using the current model parameters ${\bm{\theta}}$. This yields a set of labeled pairs $({\bm{g}}_{\bm{a}},{\bm{a}})$, which can then be used to train an ML model $P_{\bm{\omega}}({\mathbf{a}}\mid{\mathbf{g}})$ that predicts the sensitive variable from gradient observations. In practice, we find it beneficial to train the predictive model on a balanced dataset, which can be seen as modeling $\frac{\operatorname{\mathbb{P}}({\mathbf{a}}\mid{\mathbf{g}})}{\operatorname{\mathbb{P}}({\mathbf{a}})}$, and to capture the prior knowledge in a separate term. This provides more stable performance for small shadow dataset sizes and skewed sensitive variable distributions.
It is worth noting that we consider a more restrictive setting for attribute inference in which the adversary holds no additional knowledge about the private data besides the gradients, in contrast to prior works that assume the private record is partially known (e.g., (lyu2021novel; driouich2022novel) assume everything is known except the sensitive attribute). Our framework can be easily extended to the general case where the adversary holds arbitrary additional knowledge $\varphi({\bm{x}})$ about the private record ${\bm{x}}$ by training a predictive model $P_{\bm{\omega}}({\mathbf{a}}\mid{\mathbf{g}},\varphi({\bm{x}}))$ using shadow data drawn from $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}},{\mathbf{a}}\mid\varphi({\bm{x}}))$.
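The shadow-gradient pipeline described above can be sketched end to end. The toy example below is our own illustration, not the paper's code: it simulates batch gradients of a linear regression model whose data distribution depends on a binary sensitive attribute, then trains a random forest with 50 estimators (the adversary model the paper reports working well) as $P_{\bm{\omega}}({\mathbf{a}}\mid{\mathbf{g}})$. The data-generating process, dimensions, and the assumption that the attribute shifts both features and labels are all illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
d, k = 8, 16                      # feature dimension and batch size (toy values)
theta = rng.normal(size=d)        # current (observed) model parameters
w_true = rng.normal(size=d)       # population relation between x and y

def batch_gradient(X, y, theta):
    # Gradient of mean-squared error for a linear model -- a stand-in
    # for the gradient the shared training algorithm would produce.
    return X.T @ (X @ theta - y) / len(y)

def sample_batch(a):
    # Modelling assumption: the sensitive attribute a shifts both the
    # feature distribution and the label intercept.
    X = rng.normal(loc=1.0 * a, size=(k, d))
    y = X @ w_true + 3.0 * a
    return X, y

# Shadow phase: collect labeled (gradient, attribute) pairs.
grads, attrs = [], []
for _ in range(300):
    a = int(rng.integers(0, 2))   # sample the sensitive attribute
    X, y = sample_batch(a)
    grads.append(batch_gradient(X, y, theta))
    attrs.append(a)

# Predictive model P_w(a | g), trained on the labeled shadow gradients.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(np.stack(grads[:200]), attrs[:200])
acc = clf.score(np.stack(grads[200:]), attrs[200:])
print(f"attack accuracy on held-out gradients: {acc:.2f}")
```

Because the attribute is shared across every record in a batch, its signal survives the averaging inside the batch gradient, which is exactly what the classifier exploits.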
Distributional Inference. In distributional inference, the sensitive variable is the index of the ratio bin to which the property ratio belongs. The adversary first samples a random bin index ${\bm{a}}$ and then samples a property ratio $\alpha$ within that bin. Next, the adversary draws a data batch ${\mathcal{D}}_{\bm{a}}$ with $\lfloor\alpha k\rfloor$ records with the property and the rest without, and derives the gradient ${\bm{g}}_{\bm{a}}$. The adversary repeats this process to collect a set of labeled gradient and attribute pairs $({\bm{g}}_{\bm{a}},{\bm{a}})$ for training a predictive model. We note that in distributional inference, the sensitive variable is a series of ordinal numbers indicative of the continuous property ratio $\alpha$ and thus should not be treated as a regular multi-class classification target. To utilize the ordering information, we adopt a simple strategy for ordinal classification(frank2001simple), which transforms the $m$-class ordinal classification problem into $m-1$ binary classifications. Specifically, the adversary trains a series of $m-1$ binary classifiers, with the $i$-th classifier $P_{{\bm{\omega}}_{i}}({\mathbf{a}}>i\mid{\mathbf{g}})$ trained to decide whether or not ${\bm{a}}$ is larger than $i$. The final posterior probability can be obtained as
$\displaystyle P_{\bm{\omega}}({\mathbf{a}}={\bm{a}}\mid{\mathbf{g}})=\begin{cases}1-P_{{\bm{\omega}}_{1}}({\mathbf{a}}>1\mid{\mathbf{g}}),&\text{if }{\bm{a}}=1\\P_{{\bm{\omega}}_{{\bm{a}}-1}}({\mathbf{a}}>{\bm{a}}-1\mid{\mathbf{g}})-P_{{\bm{\omega}}_{\bm{a}}}({\mathbf{a}}>{\bm{a}}\mid{\mathbf{g}}),&\text{if }1<{\bm{a}}<m\\P_{{\bm{\omega}}_{m-1}}({\mathbf{a}}>m-1\mid{\mathbf{g}}),&\text{if }{\bm{a}}=m\end{cases}.$
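The case analysis above takes only a few lines to implement. The sketch below (our own helper name `ordinal_posterior`, with hypothetical classifier outputs) combines the $m-1$ binary probabilities $P({\mathbf{a}}>i\mid{\mathbf{g}})$ into a posterior over the $m$ ordinal bins:

```python
import numpy as np

def ordinal_posterior(binary_probs):
    # Frank & Hall-style reduction: binary_probs[i-1] = P(a > i | g),
    # for i = 1..m-1, combined into a posterior over m ordinal classes.
    p = np.asarray(binary_probs, dtype=float)
    m = len(p) + 1
    post = np.empty(m)
    post[0] = 1.0 - p[0]                # P(a = 1) = 1 - P(a > 1)
    for a in range(2, m):               # P(a) = P(a > a-1) - P(a > a)
        post[a - 1] = p[a - 2] - p[a - 1]
    post[m - 1] = p[m - 2]              # P(a = m) = P(a > m-1)
    return post

# Hypothetical outputs of the m-1 = 5 binary classifiers for one gradient.
probs = [0.9, 0.7, 0.4, 0.2, 0.05]
posterior = ordinal_posterior(probs)    # sums to 1 when the p_i are monotone
```

Note the combined scores form a valid distribution only when the binary outputs are monotonically decreasing; otherwise negative entries can appear and need clipping or renormalization in practice.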
User Inference. In contrast to other inference attacks, where the sensitive variable is sampled from a well-defined set of values, in user inference the sensitive variable is the user's identity, which does not take on a fixed set of values. Moreover, the identities that occur at test time are likely not seen during the development of the attack model. As a result, the posterior $\operatorname{\mathbb{P}}({\mathbf{a}}\mid{\mathbf{g}})$ cannot be directly modeled. To resolve this, we employ a training strategy analogous to the prototypical network(snell2017prototypical) for few-shot learning. Specifically, we first train a neural network $u\circ f_{\bm{\omega}}$ that is composed of an encoder $f_{\bm{\omega}}:{\mathbf{g}}\rightarrow{\mathbf{h}}$ that maps the gradient vector to a continuous embedding space and a classifier $u:{\mathbf{h}}\rightarrow{\mathbf{a}}$ that takes the embedding as input and outputs the predicted user identity. Given gradient and sensitive variable pairs $({\bm{g}},{\bm{a}})$ created from the shadow dataset, and since the number of available users in the shadow dataset is finite, the network can be trained in an end-to-end manner using a standard multi-class classification loss such as cross-entropy. After training, the classifier $u$ is discarded. At inference time, the adversary is provided with an observed gradient $\tilde{{\bm{g}}}$ and a set of $m$ candidate data batches $\{{\mathcal{D}}_{i}\mid i\in[m]\}$, where ${\mathcal{D}}_{i}=\{({\bm{x}}_{j},{\bm{y}}_{j})\}_{j=1}^{k}$. Then, the adversary derives the corresponding set of candidate gradients $\{{\bm{g}}_{i}\mid i\in[m]\}$ based on the current model parameters ${\bm{\theta}}$. Finally, the adversary computes the probability of each candidate identity after observing the gradient as
$P_{\bm{\omega}}({\mathbf{a}}={\bm{a}}\mid{\mathbf{g}}=\tilde{{\bm{g}}})=\frac{\exp{(-\|f_{\bm{\omega}}({\bm{g}}_{\bm{a}})-f_{\bm{\omega}}(\tilde{{\bm{g}}})\|_{2})}}{\sum_{i\in[m]}\exp{(-\|f_{\bm{\omega}}({\bm{g}}_{i})-f_{\bm{\omega}}(\tilde{{\bm{g}}})\|_{2})}}.$
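This scoring rule is a softmax over negative embedding distances. The sketch below is a minimal illustration under toy assumptions: `embed` is a hypothetical stand-in for the trained encoder $f_{\bm{\omega}}$, and the candidate gradients are made up.

```python
import numpy as np

def user_posterior(embed, g_obs, candidates):
    # Softmax over negative L2 distances in embedding space, matching
    # the prototypical-network scoring rule above.
    dists = np.array([np.linalg.norm(embed(g) - embed(g_obs))
                      for g in candidates])
    logits = -dists
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

embed = lambda g: np.tanh(g)            # toy encoder (assumption)
g_obs = np.array([0.5, -0.2, 0.1])      # observed gradient
candidates = [np.array([0.5, -0.2, 0.1]),   # candidate 0: same user
              np.array([2.0, 1.0, -1.0]),
              np.array([-1.0, 0.3, 0.8])]
p = user_posterior(embed, g_obs, candidates)
```

The candidate whose embedded gradient is closest to the observed one receives the highest posterior probability.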
4.2. Continual Attack and Adaptive Attack
The inference attack can be further improved if the adversary has access to multiple rounds of gradients or knowledge of the defense mechanism.
Inference under Continual Observation. In cases where continual observation of the gradients is allowed, the adversary can use the set of observed gradients ${\mathcal{G}}=\{\tilde{{\bm{g}}}_{i}\mid i\in{\mathcal{R}}\}$ from multiple rounds to improve the attack. A naive solution would be to train a model to directly approximate $\operatorname{\mathbb{P}}({\mathbf{a}}\mid{\mathcal{G}})$. However, this is generally infeasible in practice because of the high dimensionality of ${\mathcal{G}}$. Instead, the adversary can use Bayesian updating to accumulate adversarial knowledge. Specifically, given a set of observed gradients, the log-posterior can be formulated as
(1)  $\displaystyle\log\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}}\mid{\mathcal{G}})$
(2)  $\displaystyle=\log{\operatorname{\mathbb{P}}({\mathcal{G}}\mid{\mathbf{a}}={\bm{a}})}+\log{\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}})}-\log{\operatorname{\mathbb{P}}({\mathcal{G}})}$
(3)  $\displaystyle\approx\sum_{i\in{\mathcal{R}}}\log{\operatorname{\mathbb{P}}(\tilde{{\bm{g}}}_{i}\mid{\mathbf{a}}={\bm{a}})}+\log\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}})-\log{\operatorname{\mathbb{P}}({\mathcal{G}})}$
(4)  $\displaystyle=\sum_{i\in{\mathcal{R}}}\bigg{(}\log\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}}\mid\tilde{{\bm{g}}}_{i})+\log\operatorname{\mathbb{P}}(\tilde{{\bm{g}}}_{i})-\log\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}})\bigg{)}+\log\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}})-\log{\operatorname{\mathbb{P}}({\mathcal{G}})}$
(5)  $\displaystyle=\sum_{i\in{\mathcal{R}}}\log\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}}\mid\tilde{{\bm{g}}}_{i})-(|{\mathcal{R}}|-1)\log\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}})+{\mathcal{C}},$
where Eq.(3) makes the approximating assumption that the gradients are conditionally independent given ${\bm{a}}$. Since ${\mathcal{C}}=\sum_{i\in{\mathcal{R}}}\log{\operatorname{\mathbb{P}}(\tilde{{\bm{g}}}_{i})}-\log{\operatorname{\mathbb{P}}({\mathcal{G}})}$ is independent of ${\bm{a}}$, it can be treated as a constant; ${\mathcal{C}}=0$ if the gradients $\tilde{{\bm{g}}}_{i}$ are additionally mutually independent. In Eq.(5), the prior term is known and $\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}}\mid\tilde{{\bm{g}}}_{i})$ can be approximated by training a fresh model for each round of observation. The sensitive variable can thus be estimated as $\hat{{\bm{a}}}=\operatorname*{arg\,max}_{\bm{a}}\log\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}}\mid{\mathcal{G}})$.
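The multi-round scoring rule of Eq.(5) reduces to summing per-round log-posteriors and subtracting the log-prior $|{\mathcal{R}}|-1$ times (the constant ${\mathcal{C}}$ does not affect the argmax). A minimal sketch, with hypothetical per-round posteriors in place of trained models:

```python
import numpy as np

def multi_round_score(log_post_per_round, log_prior):
    # Eq.(5) up to the constant C: sum the per-round log-posteriors and
    # subtract the log-prior (|R| - 1) times; C is dropped since it does
    # not depend on a and so does not change the argmax.
    L = np.asarray(log_post_per_round)   # shape: (rounds, classes)
    R = L.shape[0]
    return L.sum(axis=0) - (R - 1) * np.asarray(log_prior)

# Hypothetical per-round posteriors P(a | g_i) for 2 classes, 3 rounds.
post = np.array([[0.60, 0.40],
                 [0.55, 0.45],
                 [0.70, 0.30]])
prior = np.array([0.5, 0.5])             # uniform prior over the classes
scores = multi_round_score(np.log(post), np.log(prior))
a_hat = int(np.argmax(scores))           # estimated sensitive variable
```

Each round here individually favors class 0 only weakly, but the summed log-evidence makes the multi-round estimate more decisive, mirroring the empirical gain of multi-round attacks reported below.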
Adaptive Attack. The adversary can design adaptive attacks if the defense mechanism ${\mathcal{M}}$ is known. Instead of training the predictive model $P_{\bm{\omega}}({\mathbf{a}}\mid{\mathbf{g}})$ using clean gradient pairs $({\bm{g}}_{\bm{a}},{\bm{a}})$, a simple adaptive strategy is to apply the same defense mechanism to the shadow data's gradients and use the transformed gradient pairs $({\mathcal{M}}({\bm{g}}_{\bm{a}}),{\bm{a}})$ to train the predictive model $P_{\bm{\omega}}({\mathbf{a}}\mid{\mathcal{M}}({\mathbf{g}}))$. As we will show in Section 6, this simple strategy is sufficient to bypass several heuristic-based defenses.
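The adaptive strategy is a one-line change to the attack pipeline: transform the shadow gradients with ${\mathcal{M}}$ before training. The toy sketch below (our own construction) uses a SignSGD-style defense as the example ${\mathcal{M}}$ and synthetic gradients whose mean is shifted by the sensitive attribute; all sizes and the shift magnitude are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

sign_defense = np.sign          # example defense M: 1-bit sign compression

rng = np.random.default_rng(1)
a = rng.integers(0, 2, size=300)
# Toy shadow gradients: the sensitive attribute shifts the gradient mean.
grads = rng.normal(size=(300, 100)) + 0.8 * a[:, None]

# Adaptive step: apply the same defense M to the shadow gradients, then
# train P_w(a | M(g)) on the transformed pairs.
defended = sign_defense(grads)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(defended[:200], a[:200])
acc = clf.score(defended[200:], a[200:])
```

Even though each defended coordinate carries only one bit, the attribute still biases the sign pattern across many coordinates, so the adaptive classifier recovers it, which is why sign-based defenses fail against defense-aware adversaries.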
5. Attack Evaluation
In this section, we evaluate the four inference attacks on datasets with different modalities to investigate the impact of various adversarial assumptions. Our findings indicate that the key factors affecting attack performance are: (1) Continual Observation: an adversary can improve the inference by accumulating information from multiple rounds of updates; (2) Batch Size: when the private information is shared across the batch, using a large batch averages out the effect of the other variables, making it easier to infer the sensitive variable; and (3) Adversarial Knowledge: the attack improves with the amount of knowledge of the data distribution (as captured by the number of available shadow data points).
5.1. Experimental Setup
5.1.1. Datasets and Model Architecture.
We consider the following five datasets with different data modalities (tabular, speech, and image) in our experiments.
Table 1. Summary of datasets: task label, sensitive variable, and their Pearson correlation.

Dataset  Type  Task Label  Sensitive Variable  Correlation
Adult  Tabular  Income  Gender  0.1985
Health  Tabular  Mortality  Gender  0.1123
CREMA-D  Speech  Emotion  Gender  0.0133
CelebA  Image  Smiling  High Cheekbones  0.6904
UTKFace  Image  Age  Ethnicity  0.1788
 (1)
Adult(misc_adult_2) is a tabular dataset containing $48{,}842$ records from the 1994 Census database. We train a fully-connected neural network to predict a person's annual income (whether or not it exceeds $50$K a year) and use gender (male or female) as the private attribute. For property and distributional inference attacks, the sex feature is removed.
 (2)
Health(health_heritage) (Heritage Health Prize) is a tabular dataset from Kaggle that contains de-identified medical records of over $55{,}000$ patients' inpatient or emergency room visits. We train a fully-connected neural network to predict whether the Charlson Index (an estimate of patient mortality) is greater than zero. We use the patient's gender (male, female, or unknown) as the private attribute, which is removed for property and distributional inference attacks.
 (3)
CREMA-D(cao2014crema) is a multimodal dataset that contains $7{,}442$ emotional speech recordings collected from $91$ actors ($48$ male and $43$ female). Speech signals are preprocessed using OpenSMILE(eyben2010opensmile) to extract a total of $23{,}990$ utterance-level audio features for automatic emotion recognition. Following prior work(feng2021attribute), we use EmoBase, a standard feature set that contains MFCC, voice quality, fundamental frequency, and other statistical features, resulting in a feature dimension of $988$ for each utterance(haider2021emotion). We train a fully-connected neural network to classify four emotions: happy, sad, angry, and neutral. We use the speaker's gender (male or female) as the target property for inference attacks.
 (4)
CelebA(liu2015deep) contains $202{,}599$ face images, each of which is labeled with $40$ binary attributes. We resize the images to $32\times 32$ pixels and train a convolutional neural network to classify whether the person is smiling and use whether or not the person has high cheekbones as the target property.
 (5)
UTKFace(zhang2017age) consists of over $20{,}000$ face images annotated with age, gender, and ethnicity. We resize the images to $32\times 32$ pixels and select $22{,}012$ images from the four largest ethnicity groups (White, Black, Asian, or Indian) to train a convolutional neural network to classify three age groups ($0$–$30$, $31$–$60$, and $\geq 61$ years old). Ethnicity is used as the target property.
We split each dataset threefold into a training set, a testing set, and a public set. The training set is considered private and is only used for model training and inference attack evaluation. The testing set is reserved for evaluating the utility of the ML model. The public set is accessible to both the adversary and the private learner, and can be used as the shadow dataset for training the adversary's predictive model or for developing defenses as described in Section 6. We provide a summary of the datasets in Table 1, including the task label ${\mathbf{y}}$, the sensitive variable ${\mathbf{a}}$ for AIA and PIA, and the Pearson correlation between ${\mathbf{y}}$ and ${\mathbf{a}}$.
5.1.2. Metrics.
We define the following metrics for measuring inference attack performance:
 (1)
Attack Success Rate (ASR): We measure attack performance by the fraction of trials in which the adversary successfully guesses the sensitive variable, i.e., $p=\sum_{t\in[T]}\mathbbm{1}_{\hat{{\bm{a}}}_{t}={\bm{a}}_{t}}/T$, where $T$ is the total number of trials (i.e., repetitions of the inference game).
 (2)
AUROC: We additionally report the area under the receiver operating characteristic curve (AUROC). For sensitive variables with more than two classes, we report the macro-averaged AUROC.
 (3)
Advantage: We follow prior work(yeom2018privacy; guo2023analyzing) and use the advantage metric to measure the gain in the adversary's inferential power upon observing the gradients. Specifically, the advantage of an adversary is defined by comparing its success rate $p$ to that of a baseline adversary who does not observe the gradients, i.e., $\texttt{Adv}(p)\coloneqq{\max(p-p^{*},0)}/{(1-p^{*})}\in[0,1]$, where $p^{*}$ is the success rate of the baseline adversary. The Bayes optimal strategy for the baseline adversary without observing gradients is to guess the majority class, i.e., $p^{*}=\max_{\bm{a}}\operatorname{\mathbb{P}}({\mathbf{a}}={\bm{a}})$.
 (4)
TPR@$1\%$FPR: Besides average performance metrics, recent work on membership inference(carlini2022membership; ye2022enhanced) argues for understanding the privacy risk of worst-case training data by examining the low false positive rate (FPR) region. Inspired by this, we additionally report the true positive rate (TPR) at an FPR of $1\%$.
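The ASR and advantage metrics above are straightforward to compute; the sketch below (with made-up guesses and labels purely for illustration) mirrors their definitions:

```python
import numpy as np

def attack_success_rate(guesses, truth):
    # ASR: fraction of trials where the adversary's guess is correct.
    guesses, truth = np.asarray(guesses), np.asarray(truth)
    return float((guesses == truth).mean())

def advantage(p, p_star):
    # Adv(p) = max(p - p*, 0) / (1 - p*), where p* is the success rate of
    # the baseline adversary that never sees gradients.
    return max(p - p_star, 0.0) / (1.0 - p_star)

truth   = [0, 1, 1, 0, 1, 1, 1, 0]      # hypothetical trial outcomes
guesses = [0, 1, 1, 1, 1, 0, 1, 0]
p = attack_success_rate(guesses, truth)          # 6/8 correct
p_star = max(np.bincount(truth) / len(truth))    # majority-class rate, 5/8
adv = advantage(p, p_star)
```

An adversary matching the majority-class baseline gets advantage $0$, while a perfect adversary gets advantage $1$, so the metric isolates the gain attributable to observing gradients.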
5.1.3. Adversary’s Model.
We conducted preliminary experiments with various types and configurations of ML models and found that a random forest with $50$ estimators performs best (especially in the low FPR region) for estimating the posterior in AIA, PIA, and DIA with small shadow dataset sizes. For UIA, we use a fully-connected network with one hidden layer as the encoder. The embedding dimension is set to $50$ for the CREMA-D dataset and $100$ for the CelebA dataset. As the gradient vector is extremely high-dimensional (e.g., the gradient dimensions for the CREMA-D and CelebA datasets are $67{,}716$ and $45{,}922$, respectively), we apply a $1$-dimensional max-pooling layer before the adversary's predictive model, with a kernel size of $3$ for the tabular datasets and $10$ for the other datasets, for dimensionality reduction.
5.1.4. Other Attack Settings.
We assume the model parameters ${\bm{\theta}}$ are randomly initialized at the beginning of the inference game. During the game, the model parameters are updated at each epoch using SGD with a learning rate of $0.01$. We evaluate AIA on the tabular datasets and UIA on the datasets that contain user labels (CREMA-D and CelebA), while PIA and DIA are evaluated on all datasets. For AIA, PIA, and DIA, we use a training set of $5{,}000$ samples and a balanced public set that contains a default of $1{,}000$ samples equally divided among the sensitive attribute/property classes. For UIA, we first filter out user identities with fewer than $2\times$ the batch size samples and then split the dataset by user identity. We select $15$ and $30$ users on the CREMA-D dataset, and $150$ and $300$ users on the CelebA dataset, as the training and public sets, respectively. We select more users on the CelebA dataset because the majority of users have very few samples ($\leq 16$). We set $m=6$ for DIA, i.e., inferring over $6$ ratio bins ($\{0\},(0,0.2],(0.2,0.4],\ldots,(0.8,1]$), and $m=5$ for UIA, i.e., choosing from $5$ candidate users. For AIA and PIA, we assume the adversary has access to a prior of the sensitive variable estimated from the population. For DIA and UIA, we assume the adversary holds an uninformed prior, and thus the baseline is simply random guessing. The default batch sizes are $16$ for AIA and PIA, $128$ for DIA, and $8$ for UIA. For AIA, PIA, and DIA, the total number of trials $T$ of each experiment equals the number of random draws of training batches (i.e., $5{,}000$); for UIA, $T$ is the number of random draws of candidate sets, which we set to $1{,}000$. We repeat each experiment with $5$ different random seeds and report the mean and standard deviation of the results.
5.2. Evaluation of Inference Attacks
We evaluate each type of inference attack with a small shadow dataset ($1{,}000$ samples) and compare the results of single-round attacks (where the adversary only observes a single round of gradients) to multi-round attacks (where the adversary has continual observation of the gradients). Due to space limits, we only include a snapshot of the results (one dataset per attack) in Figure 2 and provide the full results in Appendix Figure LABEL:fig:sr_mr.
Attribute Inference. We present the results of AIA in Figure LABEL:fig:sr_mr_aia. We observe that the adversary is able to infer the sensitive attribute with high confidence using only $1{,}000$ shadow data samples. For instance, on the Adult dataset, the multi-round adversary achieves a high average AUROC of $0.9991$ and a TPR@$1\%$FPR of $0.9823$. On the Health dataset, however, the AUROC of the multi-round adversary drops slightly to $0.8122$ while the TPR@$1\%$FPR drops drastically to $0.1611$. This is likely because the sensitive attribute on the Health dataset contains an "unknown" class ($18.9\%$) that is uncorrelated with other features, making it hard to estimate statistically.
Property Inference. Figure LABEL:fig:sr_mr_pia depicts the results of PIA, where we observe that the adversary achieves high performance across all five datasets. Namely, the average AUROCs of the multi-round adversary on the Adult, Health, CREMA-D, CelebA, and UTKFace datasets are $0.9919$, $0.8294$, $0.8970$, $0.9993$, and $0.9167$, respectively. This consistently high attack performance stands in contrast to the generally low correlation between the sensitive properties and the task labels across all datasets, as indicated in Table 1 (except for CelebA, where a spurious relationship exists), which suggests that the observed information leakage is intrinsic to the computed gradients(melis2019exploiting), regardless of the specific data type and learning task.
Distributional Inference. Figure LABEL:fig:sr_mr_dia summarizes the results of DIA. Although distributional inference is a more challenging task ($6$-class ordinal classification), we observe that the multi-round adversary still performs fairly well with a batch size of $128$, achieving an average AUROC of $0.8848$, $0.7806$, $0.7572$, $0.9522$, and $0.7664$ on the Adult, Health, CREMA-D, CelebA, and UTKFace datasets, respectively.
User Inference. We report the results of UIA in Figure LABEL:fig:sr_mr_uia. We observe that the adversary is able to identify the user with relatively high confidence on the CelebA dataset, with an average AUROC and TPR@$1\%$FPR of $0.8935$ and $0.2828$ for the multi-round adversary. On the CREMA-D dataset, the average AUROC of the multi-round adversary is only $0.6808$, which may be due to the low identifiability of the features extracted for emotion recognition.
General Observations. Additionally, we make the following general observations across different types of attacks and datasets. First, the performance of single-round attacks decreases as training progresses. This is because the gradients of the training data become smaller in magnitude as the training loss decreases, and thus the variation within these gradients becomes harder to capture. Second, on most datasets, the multi-round attack performs better than any single-round attack, demonstrating the effectiveness of the Bayesian attack framework. Third, we observe very similar performance for AIA and PIA on the tabular datasets. This indicates that whether the sensitive variable is internal or external to the data features does not affect inference performance.
5.3. Attack Analyses
We investigate the following factors that may affect the performance of inference attacks.
Impact of Batch Sizes. In Figure 3, we study the impact of varying batch sizes on the performance of the inference attacks. We report results on the Adult dataset for AIA, PIA, and DIA, and on the CREMA-D dataset for UIA. We observe that the performance of all four inference attacks improves as the batch size increases. This is because the records within a batch are sampled from the same conditional distribution $\operatorname{\mathbb{P}}({\mathbf{x}},{\mathbf{y}}\mid{\mathbf{a}})$. As the private information ${\bm{a}}$ is shared across the batch, a larger batch size amplifies the private information and suppresses other varying signals, thereby improving inference performance on ${\bm{a}}$. For distributional inference, the difference in the number of samples with the property between ratio bins, $\lfloor\alpha k\rfloor$, also grows with the batch size and thus becomes easier to distinguish. For AIA and PIA, we observe that the gap between the single-round adversary (solid lines) and the multi-round adversary (dashed lines) is largest at a batch size of $4$ and then gradually shrinks as the batch size increases further, due to performance saturation. This result suggests that simply aggregating more data does not protect gradients from inference; in fact, it may even increase the privacy risk in distributed learning settings where data are sampled from the same conditional distribution, indicating that data aggregation alone is insufficient for meaningful privacy.
Impact of Adversary's Knowledge. To investigate the impact of the adversary's knowledge on attack performance, we use PIA as an example and plot the attack performance with varying shadow dataset sizes and numbers of observations on the Adult dataset in Figure 5. We observe the general trend that attack performance increases with the number of observations and the number of available shadow data samples. Interestingly, the attack performance does not always increase monotonically along each axis. For instance, given a small shadow dataset of only $100$ samples, the AUROC of an adversary that observes $10$ rounds does not outperform one that observes only $5$ rounds of gradients. This is likely because, when the model is near convergence, the gradients are small and thus have low variance, which requires more shadow data to accurately estimate the posterior. Such errors in the predictive model accumulate when the sum of single-round log-likelihoods is used to approximate the joint distribution (Eq.(3)), eventually leading to suboptimal performance.
Impact of Model Size. In Figure 4, we use PIA as an example to study the impact of the machine learning model size. We control model size by varying the model width: for fully-connected neural networks, we vary the number of neurons in the hidden layer; for convolutional neural networks, we vary the number of output channels of the first convolutional layer, scaling the remaining convolutional layers accordingly. We observe that attack performance tends to improve slightly with increasing model size, except on the Adult and UTKFace datasets, where performance is saturated. However, most of these improvements are not statistically significant (falling within the margin of error) and thus do not allow for a conclusive statement. We include additional results for the other types of inference attacks in Appendix Figure LABEL:fig:model_size, where we make similar observations. These results demonstrate that all four types of inference attacks generalize to larger model sizes.
6. Defenses
In this section, we investigate five types of strategies for defending against inference from gradients under both static and adaptive adversaries and analyze their performance from an information-theoretic view. The main takeaways from our analyses are: (1) heuristic defenses can defend against static adversaries but are ineffective against adaptive adversaries; (2) DP-SGD(abadi2016deep) is the only considered defense that remains effective against adaptive attacks, at the cost of sacrificing model utility; and (3) reducing the mutual information between the released gradients and the sensitive variable is a key ingredient of a successful defense.
6.1. Privacy Defenses Against Inference
Privacy-enhancing strategies in machine learning generally follow two principles: data minimization and data anonymization. Data minimization strategies, such as the application of cryptographic techniques (e.g., Secure Multiparty Computation and Homomorphic Encryption) and Federated Learning, aim to reveal only the minimal amount of information necessary for achieving a specific computational task, and only to the necessary parties. As shown by prior work(truex2019hybrid; elkordy2023much; lam2021gradient; kerkouche2023client), data minimization alone may not provide sufficient privacy protection and thus should be applied in combination with data anonymization defenses to further reduce privacy risks. However, for heuristic-based privacy defenses, it is important to carefully evaluate their effectiveness against adaptive adversaries. We consider the following five types of representative defenses from the current literature in our experiments:
 (1)
Gradient Pruning. Gradient pruning creates a sparse gradient vector by pruning gradient elements with small magnitudes. This strategy has been used as a baseline for privacy defense in federated learning(zhu2019deep; sun2021soteria; wu2023learning). By default, we set the pruning rate to be $99\%$.
 (2)
SignSGD. SignSGD(bernstein2018signsgd) binarizes the gradients by applying an element-wise sign function, thereby compressing the gradients to $1$ bit per dimension. Similar to gradient pruning, it has been explored in prior work(wu2023learning; yue2023gradient) as a defense against data reconstruction attacks in federated learning. Along similar lines, Kerkouche et al.(kerkouche2020federated) evaluated SignFed, a variant of the SignSGD protocol adapted for federated settings, and found it to be more resilient to privacy and security attacks than the standard federated learning scheme.
 (3)
Adversarial Perturbation. Inspired by prior research on protecting privacy by adapting evasion attacks from adversarial machine learning(jia2018attriguard; jia2019memguard; shan2020fawkes; o2022voiceblock), we explore a heuristic defense strategy against inference attacks that injects adversarial perturbations into the gradients. Specifically, at each round of observation, the defender first trains a neural network $f_{\bm{\phi}}:{\mathbf{g}}\rightarrow{\mathbf{a}}$ to classify the sensitive variable ${\bm{a}}$ from the gradient ${\bm{g}}$ using a public dataset (the same as the shadow dataset). Then, the defense generates a protective adversarial perturbation that causes $f_{\bm{\phi}}$ to misclassify the perturbed gradients. We adopt $l_{\infty}$-bounded projected gradient descent (PGD)(madry2018towards), which generates the adversarial example ${\bm{g}}^{\prime}$ (perturbed gradient) by iteratively taking gradient steps. For AIA, PIA, and DIA, this defense generates an untargeted adversarial perturbation through gradient ascent, i.e., $\tilde{{\bm{g}}}\leftarrow\prod_{{\mathcal{B}}_{\infty}({\bm{g}},\gamma)}\big{(}\tilde{{\bm{g}}}+\alpha\cdot\text{sign}(\nabla_{{\bm{g}}}\mathcal{L}({\bm{\phi}},{\bm{g}},{\bm{a}}))\big{)}$, where ${\mathcal{B}}_{\infty}({\bm{g}},\gamma)$ is the $l_{\infty}$ norm ball centered at ${\bm{g}}$ with radius $\gamma$. For UIA, the defense generates a targeted adversarial perturbation through gradient descent, i.e., $\tilde{{\bm{g}}}\leftarrow\prod_{{\mathcal{B}}_{\infty}({\bm{g}},\gamma)}\big{(}\tilde{{\bm{g}}}-\alpha\cdot\text{sign}(\nabla_{{\bm{g}}}\mathcal{L}({\bm{\phi}},{\bm{g}},{\bm{a}}_{t}))\big{)}$, to make the gradients be misrecognized as a target user ${\bm{a}}_{t}$. By default, we set the total number of steps to $5$, $\gamma=0.005$, and $\alpha=0.002$.
 (4)
Variational Information Bottleneck (VIB). This defense inserts an additional VIB layer(alemi2016deep) that splits the neural network $f_{\bm{\theta}}$ into a probabilistic encoder $p({\mathbf{h}}\mid{\mathbf{x}})$ and a decoder $q({\mathbf{y}}\mid{\mathbf{h}})$, where ${\mathbf{h}}$ is a latent representation that follows a Gaussian distribution. An additional Kullback-Leibler (KL) divergence term is introduced into the training loss: $\mathcal{L}_{VIB}=\mathcal{L}({\bm{\theta}},{\mathcal{D}})+\beta\cdot KL\big(p({\mathbf{h}}\mid{\mathbf{x}})\,\|\,q({\mathbf{z}})\big)$, where $q({\mathbf{z}})=\mathcal{N}(\bm{0},\bm{I})$ is the standard Gaussian. Optimizing this VIB objective reduces the mutual information $I({\mathbf{x}};{\mathbf{h}})$ between the representation and the input by minimizing a variational upper bound. Prior work suggests that this helps reduce the model's dependence on the input's sensitive attributes and improves privacy(song2019overlearning; scheliga2022precode; scheliga2023privacy). We set $\beta=0.01$ as the default for our experiments.
 (5)
Differential Privacy (DP-SGD). Differential privacy (DP)(dwork2006calibrating) provides a rigorous notion of algorithmic privacy.
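The gradient-transforming defenses above can be sketched as simple functions of the gradient vector. The following is our own minimal illustration, not the evaluated implementation: `prune` and `sign_sgd` match the descriptions of gradient pruning and SignSGD, while `dp_sgd_style` only mimics the clip-average-noise shape of DP-SGD with illustrative parameters and carries no calibrated $(\varepsilon,\delta)$ guarantee.

```python
import numpy as np

def prune(g, rate=0.99):
    # Gradient pruning: zero out all but the largest-magnitude entries.
    out = np.zeros_like(g)
    keep = max(1, int(round(g.size * (1 - rate))))
    idx = np.argsort(np.abs(g))[-keep:]
    out[idx] = g[idx]
    return out

def sign_sgd(g):
    # SignSGD: 1-bit-per-dimension compression of the gradient.
    return np.sign(g)

def dp_sgd_style(per_example_grads, clip=1.0, sigma=1.0, rng=None):
    # DP-SGD-shaped release: clip each per-example gradient to norm `clip`,
    # average, and add Gaussian noise. Parameters are illustrative only;
    # a real deployment must calibrate sigma to a target (eps, delta).
    rng = rng if rng is not None else np.random.default_rng()
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(scale=sigma * clip / len(per_example_grads),
                       size=avg.shape)
    return avg + noise

g = np.array([0.05, -2.0, 0.3, 1.2, -0.01])
sparse = prune(g, rate=0.6)     # keeps the 2 largest-magnitude entries
bits = sign_sgd(g)
```

Seen side by side, the first two defenses are deterministic transforms an adaptive adversary can simply train through, whereas the noise in the DP-SGD-style release is irreducible, which foreshadows the evaluation results below.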
6.2. Defense Evaluation
In Figure 6, we compare the performance of defenses against static and adaptive adversaries. Due to space limits, we focus here on PIA on the Adult dataset; full results for all four types of inference attacks are available in Appendix Figure LABEL:fig:defenses_full. We observe that heuristic defenses such as Gradient Pruning, SignSGD, and Adversarial Perturbation can successfully defend against static adversaries, reducing the advantage of the adversary to zero. However, these defenses are ineffective against adaptive adversaries aware of the defense. For instance, in the case of gradient pruning, the adaptive adversary achieves a high advantage ($0.7841$) that is only slightly decreased compared to no defense ($0.9363$). Interestingly, in the case of Adversarial Perturbation, we find that the adaptive adversary's performance is increased, rather than decreased, compared to no defense, reaching a perfect advantage and AUROC of $1.00$. For the remaining defenses, namely VIB and DP-SGD, the attack performance is consistent across static and adaptive adversaries. However, only DP-SGD manages to effectively reduce the advantage of the adaptive adversary to near zero.
To understand the privacy-utility trade-off of these defenses, in Figure 7 we plot the PIA adversary's advantage evaluated on the training data of the Adult dataset against the network's AUROC for predicting the task label on the test set. We consider three different sets of parameters for each type of defense (details in the Appendix). We observe that against static adversaries, SignSGD achieves the best trade-off, approximating the ideal defense (upper-left corner) by reducing the advantage to zero without affecting model utility. Against an adaptive adversary, however, only DP-SGD provides a meaningful notion of privacy, at the cost of diminished model utility. Moreover, there may exist stronger adversaries that are more resilient to these defenses. For instance, in Table 2, we show that an adversary using principal component analysis (PCA) with $50$ principal dimensions for dimensionality reduction can bypass a DP-SGD defense with $\varepsilon=96.90$ and $\delta=10^{-5}$ that thwarts an adversary using max-pooling, and requires $15\times$ larger noise to defeat.
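To illustrate such a dimensionality-reducing adversary, the following sketch projects flattened gradient vectors onto their top principal components to obtain compact attack features; this is a schematic version, and the attack pipeline used in our experiments may differ in details:

```python
import numpy as np

def pca_features(grads, k=50):
    """Project flattened gradient vectors onto their top-k principal
    components, yielding low-dimensional attack features."""
    G = np.asarray(grads, dtype=float)
    G = G - G.mean(axis=0)  # center the gradient matrix
    # right singular vectors = principal directions, sorted by variance
    _, _, vt = np.linalg.svd(G, full_matrices=False)
    return G @ vt[:k].T
```

The attacker then trains its inference model on these $k$-dimensional features instead of the raw (noisy) gradients.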
In the next section, we analyze the underlying principles of these defenses and the necessary ingredients for a successful defense.
6.3. Defense Analyses
In this section, we provide an informationtheoretic perspective for understanding and analyzing defenses against inference attacks from gradients.
Information-theoretic View on Inference Privacy. The inference attacks captured in the unified game can be viewed as performing statistical inference (du2012privacy) on properties of the underlying data distributions upon observing samples of the gradients. A well-known information-theoretic tool for analyzing inference is Fano's inequality, which gives a lower bound on the estimation error of any inference adversary. Formally, consider any data release mechanism that provides ${\mathbf{Y}}$ computed from the private discrete random variable ${\mathbf{X}}$ supported on ${\mathcal{X}}$. Any inference from the observation ${\mathbf{Y}}$ must produce an estimate $\hat{{\mathbf{X}}}$ that satisfies the Markov chain ${\mathbf{X}}\rightarrow{\mathbf{Y}}\rightarrow\hat{{\mathbf{X}}}$. Let ${\mathbf{e}}$ be a binary random variable that indicates an error, i.e., ${\mathbf{e}}=1$ if $\hat{{\mathbf{X}}}\neq{\mathbf{X}}$. Then we have
(6)  $H({\mathbf{X}}\mid{\mathbf{Y}})\leq H({\mathbf{X}}\mid\hat{{\mathbf{X}}})\leq H_{2}({\mathbf{e}})+\operatorname{\mathbb{P}}({\mathbf{e}}=1)\log(|{\mathcal{X}}|-1),$
where $H_{2}({\mathbf{e}})=-\operatorname{\mathbb{P}}({\mathbf{e}}=1)\log\operatorname{\mathbb{P}}({\mathbf{e}}=1)-\big(1-\operatorname{\mathbb{P}}({\mathbf{e}}=1)\big)\log\big(1-\operatorname{\mathbb{P}}({\mathbf{e}}=1)\big)$ is the binary entropy. For $|{\mathcal{X}}|>2$, a standard treatment is to consider the mutual information $I({\mathbf{X}};{\mathbf{Y}})=H({\mathbf{X}})-H({\mathbf{X}}\mid{\mathbf{Y}})$ together with the bound $H_{2}({\mathbf{e}})\leq\log 2$, from which we obtain a lower bound on the error probability:
(7)  $\operatorname{\mathbb{P}}(\hat{{\mathbf{X}}}\neq{\mathbf{X}})\geq\frac{H({\mathbf{X}})-I({\mathbf{X}};{\mathbf{Y}})-\log 2}{\log(|{\mathcal{X}}|-1)}.$
Note that this bound is vacuous when $|{\mathcal{X}}|=2$; a slightly tighter bound can be obtained by treating $H_{2}({\mathbf{e}})$ exactly (rather than using the approximating bound of $\log 2$) and numerically computing the lowest error probability that satisfies the inequality in (6), as noted by prior work (guo2023analyzing). The bound in inequality (7) captures both the prior (via $H({\mathbf{X}})$) and the cardinality of the sensitive variable's alphabet, indicating that data with a large degree of uncertainty is hard to infer or reconstruct, which aligns with the intuition of Balle et al. (balle2022reconstructing). Inequality (7) holds generically for any data release mechanism. In the context of inference from gradients, the adversary's goal is to obtain an estimate of ${\mathbf{a}}$ upon observing $\tilde{{\mathbf{g}}}$, which can be described as the Markov chain ${\mathbf{a}}\rightarrow{\mathbf{x}}\rightarrow{\mathbf{g}}\rightarrow\tilde{{\mathbf{g}}}\rightarrow\hat{{\mathbf{a}}}$. Since the adversary's success rate is $p=1-\operatorname{\mathbb{P}}({\mathbf{e}}=1)$, we immediately obtain an upper bound on the adversary's advantage:
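The numerical procedure mentioned above is a one-dimensional search: find the smallest error probability for which inequality (6) holds with $H_{2}({\mathbf{e}})$ treated exactly. An illustrative sketch (entropies in nats; the grid-scan implementation is ours):

```python
import numpy as np

def h2(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)

def fano_error_lower_bound(h_cond, alphabet_size, grid=100001):
    """Smallest error probability e satisfying
    H(X|Y) <= H2(e) + e * log(|X| - 1),
    i.e., inequality (6) with H2 treated exactly, found by a grid scan.
    Treating H2 exactly keeps the bound meaningful for |X| = 2."""
    log_term = np.log(alphabet_size - 1.0)  # zero when |X| = 2
    for e in np.linspace(0.0, 1.0, grid):
        if h2(e) + e * log_term >= h_cond:
            return float(e)
    return 1.0
```

For example, a larger alphabet (with the same conditional entropy) yields a smaller error lower bound, since the right-hand side grows with $\log(|{\mathcal{X}}|-1)$.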
(8)  $\texttt{Adv}(p)\leq 1-\frac{H({\mathbf{a}})-I({\mathbf{a}};\tilde{{\mathbf{g}}})-\log 2}{(1-p^{*})\log(m-1)}.$
As $H({\mathbf{a}})$ is a constant, this indicates that reducing $I({\mathbf{a}};\tilde{{\mathbf{g}}})$ raises the lower bound on the error probability and consequently diminishes the adversary's advantage. This analysis generalizes to continuous sensitive variables via the continuum version of Fano's inequality (duchi2013distance).
Understanding Defenses. Next, we use the above framework to explain the failures of the heuristic defenses and argue that a successful defense must effectively minimize the mutual information $I({\mathbf{a}};\tilde{{\mathbf{g}}})$ between the released gradients and the sensitive variable. The Gradient Pruning and SignSGD defenses can be viewed as attempts to reduce the number of transmitted bits in the gradients; however, reducing the number of bits does not necessarily reduce the mutual information. The neural network classifier $f_{\bm{\phi}}:{\mathbf{g}}\rightarrow{\mathbf{a}}$ used in the Adversarial Perturbation defense is trained to minimize cross-entropy loss, which provides an approximate upper bound on the conditional entropy $H({\mathbf{a}}\mid{\mathbf{g}})$ and thus serves as a proxy for estimating the mutual information $I({\mathbf{a}};{\mathbf{g}})=H({\mathbf{a}})-H({\mathbf{a}}\mid{\mathbf{g}})$. However, generating adversarial perturbations against this fixed classifier to produce $\tilde{{\mathbf{g}}}$ does not necessarily reduce the mutual information $I({\mathbf{a}};\tilde{{\mathbf{g}}})$, and likely increases it, because the gradient steps $\nabla_{{\bm{g}}}\mathcal{L}({\bm{\phi}},{\bm{g}},{\bm{a}})$ used to generate the protective perturbation themselves contain information about ${\bm{a}}$.
As the perturbation generation process is deterministic, an adaptive adversary can learn to pick up these patterns and gain additional advantage. In the case of VIB, the mechanism is stochastic, but optimizing the VIB objective only gradually reduces the mutual information $I({\mathbf{x}};{\mathbf{h}})$ between the latent representation ${\mathbf{h}}$ and the input ${\mathbf{x}}$, which still does not guarantee a reduction in $I({\mathbf{a}};\tilde{{\mathbf{g}}})$ during the optimization process. By design, differential privacy is not intended to protect against statistical inference, as its goal is to preserve the statistical properties of the dataset while protecting the privacy of individual samples. However, an alternative information-theoretic interpretation of differential privacy is that it places a constraint on mutual information (bun2016concentrated; cuff2016differential). An easy way to see this is that by adding Gaussian noise to the gradients, the DP-SGD algorithm essentially creates a Gaussian channel between the true and released gradients, thereby placing a constraint on $I({\mathbf{g}};\tilde{{\mathbf{g}}})$, which in turn bounds $I({\mathbf{a}};\tilde{{\mathbf{g}}})\leq I({\mathbf{g}};\tilde{{\mathbf{g}}})$ by the data processing inequality. More concretely, for the Gaussian channel $\tilde{{\mathbf{g}}}={\mathbf{g}}+{\mathcal{N}}(\bm{0},\sigma^{2}\bm{I})$, the channel capacity gives the upper bound $I({\mathbf{g}};\tilde{{\mathbf{g}}})\leq\frac{n}{2}\log(1+\frac{P}{\sigma^{2}})$ if the gradients ${\bm{g}}$ satisfy an average power constraint $\mathbb{E}[\|{\bm{g}}\|_{2}^{2}]\leq nP$, where $n$ is the dimensionality of ${\bm{g}}$. One can obtain a stronger result in cases where the $l_{2}$ sensitivity is bounded (e.g., Theorem 2 in (guo2023analyzing)).
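For concreteness, the capacity bound can be evaluated directly. The sketch below assumes entropies in nats and the $\frac{n}{2}\log(1+P/\sigma^{2})$ form of the Gaussian channel capacity under the stated average power constraint (function and parameter names are ours):

```python
import numpy as np

def mi_upper_bound_nats(noise_std, avg_power, dim):
    """Channel-capacity bound on I(g; g~) for g~ = g + N(0, sigma^2 I),
    assuming the average power constraint E[||g||_2^2] <= dim * avg_power.
    Returns (dim / 2) * log(1 + avg_power / sigma^2) in nats."""
    return 0.5 * dim * np.log1p(avg_power / noise_std**2)
```

As expected, the bound vanishes as the noise grows, directly limiting the information about ${\mathbf{a}}$ available to any adversary.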
It is worth noting that the goal of our analyses here is to provide a perspective for understanding the effectiveness of a class of defense strategies, rather than to derive tight bounds. Additionally, as mutual information is a statistical quantity, the mutual-information interpretation of inference privacy inherently captures only the average-case privacy risk. In the next section, we provide a privacy auditing framework for empirically estimating the privacy risk by approximating the worst-case scenario.
$\varepsilon$  Adversary Type  AUROC  TPR@1%FPR  ASR  Advantage
96.90  Max-Pooling
96.90  PCA
6.46  PCA
7. Empirical Estimation of Privacy Risk
In the privacy game defined in Definition 3.1, the data is randomly sampled from the distribution. This captures only the average-case privacy risk and therefore cannot be used to reason about the minimal level of noise required to ensure a given level of privacy, as it may underestimate the worst-case risk. To better understand the worst-case scenario, we provide a privacy auditing framework for empirically estimating the privacy leakage of a specific type of inference attack, namely attribute inference, by allowing the data to be chosen adversarially. We start with a formal definition of per-attribute privacy, following prior work (ahmed2016social; ghazi2022algorithms):
Definition 7.1.
Per-attribute DP. A randomized mechanism ${\mathcal{M}}$ is $(\varepsilon,\delta)$-per-attribute DP if for all pairs of inputs $x,x^{\prime}$ differing in only a single attribute and for all events $S$ defined on the output of ${\mathcal{M}}$, the following inequality holds:
$\operatorname{\mathbb{P}}[{\mathcal{M}}(x)\in S]\leq e^{\varepsilon}\cdot\operatorname{\mathbb{P}}[{\mathcal{M}}(x^{\prime})\in S]+\delta.$
One can show that DP-SGD satisfies $(\varepsilon,\delta)$-per-attribute DP. However, it is hard to derive the privacy parameter analytically, as the per-attribute sensitivity of the gradient is not readily tractable, and the common technique of gradient clipping provides only a very loose bound on the sensitivity. Instead, we seek an empirical estimate of the per-attribute DP for each step through the following audit game.
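Once an attack's operating point is measured, a standard recipe from the DP auditing literature converts its true- and false-positive rates into an empirical lower bound on $\varepsilon$ via the inequality in Definition 7.1 and its complement. The following sketch is illustrative, not necessarily the exact estimator used in our audit:

```python
import numpy as np

def empirical_epsilon(tpr, fpr, delta=1e-5):
    """Lower bound on epsilon implied by an attacker distinguishing
    M(x) from M(x') with rates TPR = P[A=1 | x], FPR = P[A=1 | x'].
    DP implies TPR <= e^eps * FPR + delta and TNR <= e^eps * FNR + delta,
    so eps >= log((TPR - delta) / FPR) and eps >= log((TNR - delta) / FNR)."""
    fnr, tnr = 1.0 - tpr, 1.0 - fpr
    candidates = []
    if fpr > 0.0 and tpr > delta:
        candidates.append(np.log((tpr - delta) / fpr))
    if fnr > 0.0 and tnr > delta:
        candidates.append(np.log((tnr - delta) / fnr))
    return max(candidates, default=0.0)
```

A chance-level attacker (TPR = FPR) certifies nothing, while a near-perfect attacker forces a large empirical $\varepsilon$.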