Analyzing Inference Privacy Risks Through Gradients In Machine Learning (2024)

Zhuohang Li (Vanderbilt University, Nashville, TN, USA; zhuohang.li@vanderbilt.edu), Andrew Lowy (University of Wisconsin–Madison, Madison, WI, USA; alowy@wisc.edu), Jing Liu (Mitsubishi Electric Research Laboratories, Cambridge, MA, USA; jiliu@merl.com), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories, Cambridge, MA, USA; koike@merl.com), Kieran Parsons (Mitsubishi Electric Research Laboratories, Cambridge, MA, USA; parsons@merl.com), Bradley Malin (Vanderbilt University, Nashville, TN, USA; b.malin@vanderbilt.edu), and Ye Wang (Mitsubishi Electric Research Laboratories, Cambridge, MA, USA; yewang@merl.com)

Abstract.

In distributed learning settings, models are iteratively updated with shared gradients computed from potentially sensitive user data. While previous work has studied various privacy risks of sharing gradients, our paper aims to provide a systematic approach to analyze private information leakage from gradients. We present a unified game-based framework that encompasses a broad range of attacks including attribute, property, distributional, and user disclosures. We investigate how different uncertainties of the adversary affect their inferential power via extensive experiments on five datasets across various data modalities. Our results demonstrate the inefficacy of solely relying on data aggregation to achieve privacy against inference attacks in distributed learning. We further evaluate five types of defenses, namely, gradient pruning, signed gradient descent, adversarial perturbations, variational information bottleneck, and differential privacy, under both static and adaptive adversary settings. We provide an information-theoretic view for analyzing the effectiveness of these defenses against inference from gradients. Finally, we introduce a method for auditing attribute inference privacy, improving the empirical estimation of worst-case privacy through crafting adversarial canary records.

1. Introduction

Ensuring privacy is an important prerequisite for adopting machine learning (ML) algorithms in critical domains that require training on sensitive user data, such as medical records, personal financial information, private images, and speech. Prominent ML models, ranging from compact neural networks tailored for mobile platforms(howard2017mobilenets) to large foundation models(brown2020language; rombach2022high), are often trained on user data via gradient-based iterative optimization. In many cases, such as decentralized learning(dhasade2023decentralized; hsieh2017gaia) or federated learning (FL)(mcmahan2017communication; hard2018federated; guliani2021training), model gradients are directly exchanged in place of raw training data to facilitate joint learning, which opens up an additional channel for potential privacy leakage(lowy2022private).

Recent works have explored information leakage through this gradient channel in various forms, albeit in isolation. For instance, Nasr et al.(nasr2019comprehensive) showed that it is feasible to infer membership (i.e., single-bit information indicating the existence of a target record in the training data pool) from model updates in federated learning. Beyond membership, Melis et al.(melis2019exploiting) demonstrated inference over sensitive properties of the training data in collaborative learning. Other independent lines of work additionally explored attribute inference(lyu2021novel; driouich2022novel) and data reconstruction(zhu2019deep; geiping2020inverting; gupta2022recovering) through shared model gradients. However, some emerging privacy concerns that have so far only been considered under the centralized learning setting, such as distributional inference(suri2022formalizing; chaudhari2023snap) and user-level inference(kandpal2023user; li2022user), have not been well investigated in the gradient leakage setting.

Existing studies on information leakage from gradients have several limitations. First, the majority of the current literature focuses on investigating each individual type of inference attack under its specific threat model while lacking a comprehensive examination of inference attack performance under various adversarial assumptions, which is essential for providing a holistic view of the adversary's capabilities. For instance, from the attack's perspective, assuming the adversary to have access to a reasonably-sized shadow dataset and limited rounds of access to the model's gradients helps to capture the realistic inference privacy risk under a practical threat model. Conversely, from the defense's perspective, assuming a powerful adversary with access to record-level gradients and auxiliary information about the private record helps to estimate the worst-case privacy risk, which may facilitate the design of more robust defenses. Second, while several types of heuristic defenses have been explored by prior work, their supposed effectiveness has not been fully verified under more challenging adaptive adversary settings. Moreover, existing studies do not adequately explain why some defenses succeed in reducing the inference risk over gradients while others fail, which could provide important guidance on the design of more effective defenses.

[Figure 1: illustration of the four types of inference attacks from gradients considered in this work (attribute, property, distributional, and user inference).]

In this paper, we conduct a systematic analysis of private information leakage from gradients. We start by defining a unified inference game that broadly encompasses four types of inference attacks aimed at inferring common private information of the data from gradients, namely, the attribute inference attack (AIA), property inference attack (PIA), distributional inference attack (DIA), and user inference attack (UIA), as illustrated in Figure 1. Under this framework, we show that information leakage from gradients can be treated as performing statistical inference over a sensitive variable upon observing samples of the gradients; different definitions of the information encapsulated by this variable lead to a generic template for constructing different types of inference attacks. We additionally explore different tiers of adversarial assumptions, with varying numbers of available data samples, numbers of observable rounds of gradients, and varying batch sizes, to investigate how different priors and uncertainties in the adversary's knowledge about the gradient and data distribution affect the adversary's inferential power.

We perform a systematic evaluation of these attacks on five datasets (Adult (misc_adult_2), Health (health_heritage), CREMA-D (cao2014crema), CelebA (liu2015deep), UTKFace (zhang2017age)) with three different data modalities (tabular, speech, and image). A common setting in distributed learning is that the data distribution is heterogeneous across different nodes but homogeneous within each node. Under this assumption, where the sensitive variable is common across a batch, we show that a larger batch size leads to higher inference privacy risk from gradients across all considered attacks, highlighting that solely relying on data aggregation is insufficient for achieving meaningful privacy in distributed learning. With a moderate batch size (e.g., 16), we show that an adversary can launch successful inference attacks with very few shadow data samples ($\leq 1{,}000$). For instance, in the case of property inference on the Adult dataset, the adversary can achieve 0.92 AUROC with only 100 shadow data samples. Moreover, we demonstrate that an adversary with access to multiple rounds of gradient updates can perform Bayesian inference to aggregate adversarial knowledge, eventually leading to higher confidence and better attack performance.

We apply the developed inference attacks to evaluate the effectiveness of five common types of defenses from the privacy literature (zhu2019deep; sun2021soteria; wu2023learning; jia2018attriguard; jia2019memguard; shan2020fawkes; song2019overlearning; scheliga2022precode; scheliga2023privacy), including Gradient Pruning (zhu2019deep), Signed Stochastic Gradient Descent (SignSGD) (bernstein2018signsgd), Adversarial Perturbations (madry2018towards), Variational Information Bottleneck (VIB) (alemi2016deep), and Differential Privacy (DP-SGD) (abadi2016deep), against both static adversaries that are unaware of the defense and adaptive adversaries that can adapt to the defense mechanism. We find that most heuristic defense methods only offer a weak notion of "security through obscurity", in the sense that they defend against static adversaries empirically but can be easily bypassed by adaptive adversaries. Although DP-SGD shows consistent performance against both static and adaptive adversaries, fully preventing inference attacks often requires injecting so much noise that the utility of the learning model diminishes. We provide an information-theoretic perspective for explaining and analyzing the (in)effectiveness of the considered defenses and show that the key ingredient of a successful defense is to effectively reduce the mutual information between the released gradients and the sensitive variable, which could serve as a guideline for designing future defenses. Finally, to provide practical guidance in selecting privacy parameters, we introduce an auditing approach for empirically estimating the privacy loss of attribute inference attacks by crafting adversarial canary records to approximate the privacy risk in the worst case.

In summary, our main contributions are as follows:

  • We provide a holistic analysis of inference privacy from gradients through a unified inference game that broadly encompasses a range of attacks concerning attribute, property, distributional, and user inference.

  • We demonstrate the weakness of solely relying on data aggregation to achieve privacy against inference attacks in distributed learning. We do this through a systematic evaluation of the four types of attacks on datasets with different modalities under various adversarial assumptions.

  • Our analyses reveal that reducing the mutual information between the released gradients and the sensitive variable is the key ingredient of a successful defense. This is shown by investigating five common types of defense strategies against inference over gradients from an information-theoretic perspective.

  • Our auditing results provide an empirical justification for tolerating large DP parameters when defending against attribute inference attacks (c.f.(lowy2024does)). This is achieved by implementing an auditing method for empirically estimating the privacy loss against attribute inference attacks from gradients.

2. Background and Related Work

2.1. Machine Learning Notation

A machine learning (ML) model can be denoted as a function $f_{\bm{\theta}}: \mathbf{x} \rightarrow \mathbf{y}$ parameterized by $\bm{\theta}$ that maps from the input (feature) space to the output (label) space. The training of an ML model involves a set of training data and an optimization procedure, such as stochastic gradient descent (SGD). At each step of SGD, a loss function $\mathcal{L}(\bm{\theta}, \mathcal{D}_b)$ is first computed based on the current model and a batch of $k$ training samples $\mathcal{D}_b = \{(\bm{x}_i, \bm{y}_i)\}_{i=1}^{k}$, and then the gradient is computed as $\bm{g} = \nabla_{\bm{\theta}} \mathcal{L}(\bm{\theta}, \mathcal{D}_b)$. Finally, the model is updated by taking a gradient step towards minimizing the loss.
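As a concrete illustration of one such step, the following PyTorch sketch computes a batch gradient and applies a single SGD update; the model architecture, batch size, and learning rate are arbitrary choices for illustration, not the setup used in our experiments.

```python
import torch
import torch.nn as nn

# Illustrative model and batch (k = 16 samples, 20 features, 2 classes).
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
x_batch = torch.randn(16, 20)
y_batch = torch.randint(0, 2, (16,))

# Compute the batch loss and the gradient g = ∇_θ L(θ, D_b).
loss = loss_fn(model(x_batch), y_batch)
grads = torch.autograd.grad(loss, list(model.parameters()))

# Take one gradient step towards minimizing the loss (learning rate 0.1).
with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p -= 0.1 * g
```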

2.2. Related Work

Developing ML models in many applications involves training on the users’ private data, which introduces privacy leakage risks from different components of the ML model across several stages of the development and deployment pipeline.

Leakage From Model Parameters ($\bm{\theta}$). The first way of exposing private information is through analyzing the model parameters. This is connected to the most prominent centralized ML setting, where the model is first developed on a local dataset and then released to the users for deployment. Various forms of privacy leakage have been studied in this setting. White-box membership inference(leino2020stolen; nasr2019comprehensive; sablayrolles2019white) aims at identifying the presence of individual records in the training dataset given access to the full model. Data extraction attacks exploit the memorization of the ML model to extract training samples(haim2022reconstructing; carlini2023extracting), whereas model inversion attacks generate synthetic data samples from the training distribution(yin2020dreaming; wang2021variational). In contrast, for distributional inference attacks(ateniese2015hacking; ganju2018property; suri2022formalizing), the attacker's goal is to make inferences about the entire training data distribution rather than individuals.

Leakage From Model Outputs ($f_{\bm{\theta}}(\bm{x})$). Another source of privacy leakage is the model output, which is relevant to more restrictive settings such as machine learning as a service (MLaaS) in cloud APIs, where only black-box access to the ML model is granted. Under this setting, researchers have studied several privacy attacks that can be launched by querying the model and observing the outputs. For instance, query-based model inversion attacks(fredrikson2014privacy; fredrikson2015model) exploit the predicted confidence or labels from the model to make inferences about the input data instance(zhang2020secret) or attribute(mehnaz2022your). Model stealing attacks attempt to recover the confidential model weights(tramer2016stealing) or hyper-parameters(wang2018stealing) given query access to the model. Black-box membership inference attacks(salem2018ml; truex2019demystifying; sablayrolles2019white; song2021systematic) and black-box distributional inference attacks(mahloujifar2022property; chaudhari2023snap) allow an adversary to decide whether a data point was included in training or to reveal information about the training data distribution by analyzing the output prediction or confidence.

Leakage From Model Gradients ($\bm{g}$). The final source of privacy leakage is the gradient of the loss function with respect to the model parameters, which is essential for updating the model with stochastic gradient descent. This is relevant to ML settings that release intermediate model updates during model development, such as distributed training, federated learning, peer-to-peer learning, and online learning. Compared to model parameters, model gradients carry more nuanced information about the small batch of data used for computing the update and thus may reveal more information about the underlying data instances. The current literature studies different types of gradient-based privacy leakage in isolation. One line of work focused on data reconstruction from model gradients(zhu2019deep; geiping2020inverting) or updates(salem2020updates; haim2022reconstructing) with various data types, such as image(zhu2019deep; geiping2020inverting; yin2021see; li2022auditing), text(gupta2022recovering; haim2022reconstructing), tabular(vero2023tableak), and speech data(li2023speech). However, these attacks rely on strong adversarial assumptions and do not generalize to large batch sizes(huang2021evaluating). Another line of work investigated the extraction of private attributes or properties(melis2019exploiting; feng2021attribute) of the private data from model gradients. Specifically, Melis et al.(melis2019exploiting) first revealed that gradients shared in collaborative learning can be used to infer properties of the training data that are uncorrelated with the task label. Lyu et al.(lyu2021novel) explored attribute reconstruction from epoch-averaged gradients on tabular and genomics data. Feng et al.(feng2021attribute) discovered that gradients of speech emotion recognition models leak information about user demographics such as gender and age. Dang et al.(dang2022method) showed that speaker identities can be revealed from the gradients of automatic speech recognition models. Kerkouche et al.(kerkouche2023client) demonstrated the weakness of secure aggregation without differential privacy in federated learning by designing a disaggregation attack that exploits the linearity of model aggregation and client participation across multiple rounds to capture client-specific properties. In contrast to existing studies that design separate treatments for each type of attack, in this work we take a holistic view of information leakage from gradients.

3. Problem Formalization

This section introduces four types of inference attacks from gradients, namely, attribute inference, property inference, distributional inference, and user inference. We formally define information leakage from gradients using a unified security game, following standard practices in machine learning privacy studies(salem2023sok), and discuss variants of threat models that affect the adversary's inferential power. In Section 4, we describe methods to construct these attacks.

3.1. Attack Definitions

We consider four types of information leakage from model gradients that generally involve two parties, namely, a private learner who releases model gradients computed on a private data batch, and an adversary who tries to make inferences about the private data given access to the gradients. This generic setting captures multiple ML application scenarios such as distributed training, federated learning, and online learning.

Attribute Inference. Attribute inference attacks (AIA) seek to infer a data record's unknown attribute (feature) from its gradient. Prior works in both centralized(wu2016methodology; yeom2018privacy) and federated settings(lyu2021novel; driouich2022novel) usually assume the record to be partially known; for instance, the adversary may infer a missing entry (e.g., genotype) of a person's medical record(fredrikson2014privacy). It is worth noting that, in practice, when the attributes are not completely independent, an adversary with partial knowledge about the record may be able to infer the unknown attribute just from the known ones, as in data imputation(jayaraman2022attribute).

Property Inference. Property inference attacks (PIA) aim to infer a global property of the private data batch that is not directly present in the data feature space but is correlated with some of the features (and consequently the gradients). For tabular data, these properties could be sensitive features that have been intentionally excluded from training (e.g., pseudo-identifiers in health records that are required to be removed for HIPAA compliance); for high-dimensional data like images and speech, they could be high-level statistical features capturing the semantics of the data instance (e.g., the race of a face image(melis2019exploiting) or the gender of a speech recording(feng2021attribute)).

Distributional Inference. Distributional inference attacks (DIA) aim to infer the ratio ($\alpha$) of the training samples that satisfy some target property (some prior work also refers to distributional inference as property inference). The majority of the current literature on DIA(ganju2018property; suri2022formalizing; mahloujifar2022property; chaudhari2023snap) is in the space of centralized learning, which captures leakage from model parameters. These studies usually define DIA as a distinguishing test between two worlds where the model is trained on two datasets with different ratios ($\alpha_0$ and $\alpha_1$)(suri2022formalizing). This can be further categorized into property existence tests, which decide whether there exists any data point with the target property in the training set, and property size estimation tests, which infer the exact ratio of the property in the training data(chaudhari2023snap). In this work, we extend DIA to the gradient space and consider a general case that combines property existence and property size estimation by formulating DIA as ordinal classification over a set of $m$ ratio bins ($m \geq 3$), i.e., $\{0\}, (0, \frac{1}{m-1}], (\frac{1}{m-1}, \frac{2}{m-1}], \ldots, (\frac{m-2}{m-1}, 1]$.
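For concreteness, a property ratio can be mapped to its ordinal bin index as in the short sketch below; the 1-indexed bin convention and the function name are our own illustrative choices.

```python
import math

def ratio_bin(alpha, m):
    """Map a property ratio alpha in [0, 1] to one of the m ordinal bins
    {0}, (0, 1/(m-1)], (1/(m-1), 2/(m-1)], ..., ((m-2)/(m-1), 1]."""
    if alpha == 0:
        return 1                              # first bin: property absent
    return 1 + math.ceil(alpha * (m - 1))     # bins 2..m

# Example with m = 5 bins (bin width 1/4):
for a in (0.0, 0.1, 0.25, 0.6, 1.0):
    print(a, "->", ratio_bin(a, m=5))
```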

User Inference. User inference attacks (UIA) or re-identification attacks aim to identify which user's data was used to compute the observed gradients. Here, the adversary does not know the user's exact data used for computing the gradients. Instead, the adversary is provided a set of candidate users and their corresponding underlying user-level data distributions. This setting shares similarities with subject-level membership inference(suri2022subject) in the sense that both attacks measure the privacy risk at the granularity of each individual. However, the user inference attack aims to infer richer information that directly exposes the user's identity, compared to the membership inference attack, which only discloses a single bit of information (i.e., whether a given user's data sample is involved in training). Thus, user inference can be considered a generalization of the subject-level membership inference attack.

We note that, except for attribute inference, which directly exposes (part of) the user's private data, property inference, distributional inference, and user inference attacks are inferential disclosures (also known as deductive disclosures) that exploit statistical correlations in the data to infer sensitive information from the released gradients with high confidence. We exclude record-level privacy attacks such as membership inference and data reconstruction, as our analysis here focuses on distributed learning scenarios where private information can be shared across different data samples within a batch.

3.2. Unified Inference Game

Our framework aims to capture an abstraction of privacy problems in distributed learning settings, where an attacker aims to recover some sensitive information of a particular client from their shared gradients (or model updates). In practical distributed learning settings, the data may be heterogeneously split across the clients, and an attacker may take advantage of side information about a particular client's local data distribution. Generally, the objective of the attacker is to recover the sensitive information, represented by the variable $\mathbf{a}$, which is related to the local data distribution of the client through a joint distribution $\mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a}) = \mathbb{P}(\mathbf{a})\, \mathbb{P}(\mathbf{x}, \mathbf{y} \mid \mathbf{a})$. As we will detail later, specific choices of what $\mathbf{a}$ represents and the corresponding specialized structure of $\mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a})$ enable the framework to capture attribute, property, distributional, and user inference privacy problems. This joint distribution may capture both the side information available to the attacker and the inherent heterogeneity of the data. To focus on evaluating the effectiveness of gradient-based attacks and defenses, we simplify the modeling of the overall training procedure by updating the model in a centralized fashion on the entire training dataset $\mathcal{D}$, while generating gradients for the attacker on batches drawn according to $\mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a})$.

Definition 3.1.

Unified Inference Game. Let $\mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a})$ be the joint distribution, $\mathcal{L}$ the loss function, $\mathcal{T}$ the training algorithm, $r$ the total number of training rounds, and $\mathcal{R} \subset [r]$ a set of rounds that are observable to the adversary, where $[r]$ denotes the discrete set $\{1, 2, \ldots, r\}$. The unified inference game from gradients between a challenger (private learner) and an adversary is as follows:

  (1) Challenger initializes the model parameters as $\bm{\theta}_0$.

  (2) Challenger samples a training dataset $\mathcal{D} = \{(\bm{x}_j, \bm{y}_j)\}_{j=1}^{n}$, where $(\bm{x}_j, \bm{y}_j) \overset{\mathrm{i.i.d.}}{\sim} \mathbb{P}(\mathbf{x}, \mathbf{y})$.

  (3) Challenger draws the sensitive variable $\bm{a} \sim \mathbb{P}(\mathbf{a})$.

  (4) Challenger draws a batch of $k$ data samples $\mathcal{D}_{\bm{a}} = \{(\bm{x}_p, \bm{y}_p)\}_{p=1}^{k}$, where $(\bm{x}_p, \bm{y}_p) \overset{\mathrm{i.i.d.}}{\sim} \mathbb{P}(\mathbf{x}, \mathbf{y} \mid \bm{a})$, for the given $\bm{a}$.

  (5) Challenger computes the gradient of the loss on the data batch, $\bm{g}_i = \nabla_{\bm{\theta}_{i-1}} \mathcal{L}(\bm{\theta}_{i-1}, \mathcal{D}_{\bm{a}})$.

  (6) Challenger applies the defense mechanism $\mathcal{M}$ to produce a privatized version of the gradient, $\tilde{\bm{g}}_i = \mathcal{M}(\bm{g}_i)$. When no defense is applied, $\mathcal{M}$ is simply the identity function, i.e., $\tilde{\bm{g}}_i = \bm{g}_i$.

  (7) The model is updated by applying the training algorithm on the training dataset for one epoch, $\bm{\theta}_i \leftarrow \mathcal{T}(\bm{\theta}_{i-1}, \mathcal{D}, \mathcal{L}, \mathcal{M})$.

  (8) Steps (5)-(7) are repeated for $r$ rounds.

  (9) A static adversary $\mathcal{A}_s$ gets access to $\mathcal{L}$, $\mathcal{T}$, $\mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a})$, the set of (intermediate) model parameters $\Theta = \{\bm{\theta}_{i-1} \mid i \in \mathcal{R}\}$, and the released gradients $\mathcal{G} = \{\tilde{\bm{g}}_i \mid i \in \mathcal{R}\}$. An adaptive adversary $\mathcal{A}_a$ also gets the defense mechanism $\mathcal{M}$.

  (10) The adversary outputs its inference $\hat{\bm{a}}$ of the sensitive variable, i.e., $\hat{\bm{a}} \leftarrow \mathcal{A}_s(\mathcal{L}, \mathcal{T}, \mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a}), \Theta, \mathcal{G})$ for the static adversary, or $\hat{\bm{a}} \leftarrow \mathcal{A}_a(\mathcal{L}, \mathcal{T}, \mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a}), \Theta, \mathcal{G}, \mathcal{M})$ for the adaptive adversary. The adversary wins if $\hat{\bm{a}} = \bm{a}$ and loses otherwise.

In the above general game, the flexibility of the joint distribution $\mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a})$ allows capturing various scenarios. Rather than explicitly defining this joint distribution, which in any case depends on the unknown data distribution, we implicitly define it through transformations/filtering of a given dataset. Further, providing the adversary with knowledge of the distribution $\mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a})$ is realized by providing the adversary with suitable shadow datasets drawn according to such transformations and filtering operations.
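To make the game concrete, the following sketch simulates one round of steps (3)-(6) with no defense and a toy static adversary that compares the released gradient against shadow gradients; the model, the way $\bm{a}$ shifts the data distribution, and the nearest-gradient decision rule are illustrative assumptions rather than the attack construction described in Section 4.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 2)                      # challenger's current model θ
loss_fn = nn.CrossEntropyLoss()

def sample_batch(a, k=16):
    """Draw a batch from P(x, y | a): here, 'a' simply shifts the feature mean."""
    return torch.randn(k, 8) + 0.5 * a, torch.randint(0, 2, (k,))

def gradient(x, y):
    """Step (5): flattened gradient of the batch loss w.r.t. the model parameters."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.flatten() for g in grads])

# Steps (3)-(6): challenger draws a ∈ {0, 1} and releases the (undefended) gradient.
a = torch.randint(0, 2, (1,)).item()
g_released = gradient(*sample_batch(a))      # M is the identity (no defense)

# Step (10): the toy adversary guesses the candidate whose shadow gradient is closest.
shadow = {cand: gradient(*sample_batch(cand)) for cand in (0, 1)}
a_hat = min(shadow, key=lambda c: torch.norm(shadow[c] - g_released).item())
print("true a:", a, "adversary's guess:", a_hat)
```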

Attribute Inference Game. The variable $\bm{a} \in [m]$ is a discrete attribute within the features $\bm{x}$. Sampling $\bm{a} \sim \mathbb{P}(\mathbf{a})$ is accomplished by drawing uniformly or according to its marginal empirical distribution within the given training dataset $\mathcal{D}$. Drawing the data batch $\mathcal{D}_{\bm{a}}$ according to the distribution $\mathbb{P}(\mathbf{x}, \mathbf{y} \mid \bm{a})$ is accomplished by uniformly selecting data samples $(\bm{x}, \bm{y})$ from the entire training dataset $\mathcal{D}$ with features $\bm{x}$ that possess the attribute $\bm{a}$.

Property Inference Game. This scenario is similar to attribute inference, except that $\bm{a} \in [m]$ is a property associated with, but external to the features of, each data sample (i.e., $\bm{a}$ may be some meta-data property of each sample, but excluded from the features $\bm{x}$). Drawing the data batch $\mathcal{D}_{\bm{a}}$ is handled similarly to the attribute inference case.

Distributional Inference Game. In this class of scenarios, we have a general set of $m$ transformations $\{\Phi_{\bm{a}} \mid \bm{a} \in [m]\}$, which are selected by the sensitive variable $\bm{a}$. Each transformation $\Phi_{\bm{a}}$ corresponds to implicitly realizing the corresponding $\mathbb{P}(\mathbf{x}, \mathbf{y} \mid \bm{a})$ by applying a general transformation that involves selective sampling from the overall training set $\mathcal{D}$. For example, the selection of $\bm{a}$ may indicate a particular proportion for the prevalence of a certain attribute or property, and thus the corresponding transformation would select batches of data according to that proportion.

User Inference Game. This is a special case of property inference, where $\bm{a}$ specifically corresponds to the identity of the individual that provided the corresponding data samples. Unlike other inference attacks, the sensitive variable, as it represents identity, does not take on a fixed set of values. To make the attack more operational, similar to prior work on data reconstruction(hayes2024bounding), we assume the inference is over a fixed set of $m$ candidate users randomly sampled from the population at the beginning of each game.

3.3. Threat Model

In this work, we assume the adversary has no control over the training protocol and only passively observes gradients as the model is being updated. In practice, the adversary could be an honest-but-curious parameter server(li2014scaling) in a distributed learning or federated learning setting, a node in decentralized learning(dhasade2023decentralized), or an attacker who eavesdrops on the communication channel. The game defined in Definition 3.1 is similar to games defined in many prior works(carlini2022membership; yeom2018privacy) and captures average-case privacy, as the performance of the attack is measured by its expected value over the random draw of data samples. In Section 7, we consider an alternative game where the data samples are adversarially chosen to provide a measure of worst-case privacy for privacy auditing.

We consider the following aspects that reflect different levels of the adversary’s knowledge:

  • Knowledge of Data Distribution. Similar to many prior works on inference attacks(shokri2017membership; melis2019exploiting; ye2022enhanced; suri2022formalizing; carlini2022membership; liu2022ml; chaudhari2023snap), we model the adversarial knowledge of the data distribution through access to data samples drawn from this distribution, which are referred to as shadow datasets. A larger shadow dataset implies a more powerful adversary that has more knowledge about the underlying data distribution. For discrete attributes, we additionally consider a more informed adversary who knows the prior distribution of the attribute, which can be estimated by drawing a large amount of data from the population.

  • Continuous Observation. We use the observable set $\mathcal{R}$ to capture the adversary's ability to observe the gradients continuously. Intuitively, an adversary observing multiple rounds should perform better than a single-round adversary. Assuming a powerful adversary is beneficial for analyzing and auditing defenses; for instance, the privacy analysis in DP-SGD(abadi2016deep) assumes that the adversary has access to all rounds of gradients.

  • Adaptive Adversary. When evaluating defenses, in addition to the static adversary, we consider a stronger adaptive adversary who is aware of the underlying defense mechanism. This has been demonstrated as pivotal for thoroughly assessing the effectiveness of security defenses(carlini2017adversarial; tramer2020adaptive).

4. Attack Construction

4.1. Inference Attacks

The objective of the inference adversary is to infer the sensitive variable from the observed gradient, i.e., to model the posterior distribution $\mathbb{P}(\mathbf{a} \mid \mathbf{g})$. The general strategy for implementing inference attacks from gradients is to exploit the following two adversarial assumptions defined in the unified inference game in Section 3.2. First, the adversary possesses knowledge about the underlying population data distribution. Operationally, this implies that the adversary is able to draw data samples $(\bm{x}, \bm{y})$ with corresponding sensitive variable $\bm{a}$ from $\mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a})$ to construct a shadow dataset. Second, the adversary has access to the training algorithm and the current model parameters, which allows the adversary to compute the gradient $\bm{g}$ for each batch of samples within the shadow dataset. With this information, the adversary can train a predictive model $P_{\bm{\omega}}(\mathbf{a} \mid \mathbf{g})$ to approximate the posterior.

Attribute & Property Inference. The attribute and property inference attacks follow a similar attack procedure, with the difference being whether the sensitive variable $\mathbf{a}$ is internal or external to the data record. Specifically, the adversary first constructs a shadow dataset $\mathcal{D}_{\bm{s}}$ by sampling from the population distribution, i.e., $\mathcal{D}_{\bm{s}} = \{(\bm{x}_j, \bm{y}_j, \bm{a}_j)\}_{j=1}^{s}$ where $(\bm{x}_j, \bm{y}_j, \bm{a}_j) \overset{\mathrm{i.i.d.}}{\sim} \mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a})$. Then the adversary draws data batches $\mathcal{D}_{\bm{a}} = \{(\bm{x}_j, \bm{y}_j)\}_{j=1}^{k}$ from the shadow dataset through bootstrapping. This is achieved by repeatedly sampling the sensitive attribute $\bm{a}$ and then drawing $k$ records that have the sensitive attribute from $\mathcal{D}_{\bm{s}}$. Next, for each data batch $\mathcal{D}_{\bm{a}}$, the adversary computes the gradient $\bm{g}_{\bm{a}} = \nabla_{\bm{\theta}} \mathcal{L}(\bm{\theta}, \mathcal{D}_{\bm{a}})$ using the current model parameters $\bm{\theta}$. This results in a set of labeled data pairs $(\bm{g}_{\bm{a}}, \bm{a})$, which can then be used for training an ML model $P_{\bm{\omega}}(\mathbf{a} \mid \mathbf{g})$ that predicts the sensitive variable from gradient observations. In practice, we find that it is beneficial to train the predictive model using a balanced dataset, which can be seen as modeling $\frac{\mathbb{P}(\mathbf{a} \mid \mathbf{g})}{\mathbb{P}(\mathbf{a})}$, and to capture the prior knowledge in a separate term. This provides more stable performance for small shadow dataset sizes and skewed sensitive variable distributions.
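A minimal sketch of this shadow-gradient pipeline is given below; the target model, the synthetic shadow data, and the logistic-regression attack model are illustrative stand-ins for the architectures used in our experiments.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

# Illustrative target model; shapes and sizes are assumptions for this sketch.
target_model = nn.Linear(8, 2)
loss_fn = nn.CrossEntropyLoss()

def batch_gradient(x, y):
    """Flattened gradient g_a = ∇_θ L(θ, D_a) for one shadow batch."""
    loss = loss_fn(target_model(x), y)
    grads = torch.autograd.grad(loss, list(target_model.parameters()))
    return torch.cat([g.flatten() for g in grads]).detach().numpy()

# Shadow dataset: features, task labels, and sensitive variable a ∈ {0, 1}.
s = 1000
shadow_x, shadow_y = torch.randn(s, 8), torch.randint(0, 2, (s,))
shadow_a = torch.randint(0, 2, (s,))

# Bootstrap labeled (gradient, a) pairs: sample a, then k records sharing that a.
k, grad_feats, labels = 16, [], []
for _ in range(200):
    a = int(torch.randint(0, 2, (1,)))
    pool = torch.where(shadow_a == a)[0]
    idx = pool[torch.randint(len(pool), (k,))]
    grad_feats.append(batch_gradient(shadow_x[idx], shadow_y[idx]))
    labels.append(a)

# Train the predictive model P_ω(a | g) on the labeled shadow gradients.
attack_model = LogisticRegression(max_iter=1000).fit(np.stack(grad_feats), labels)
```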

It is worth noting that here we consider a more restrictive setting for attribute inference where the adversary holds no additional knowledge about the private data besides the gradients, in contrast to prior works that assume the private record to be partially known (e.g., (lyu2021novel; driouich2022novel) assume that everything is known except for the sensitive attribute). Our framework can be easily extended to the general case where the adversary holds arbitrary additional knowledge $\varphi(\bm{x})$ about the private record $\bm{x}$ by training a predictive model $P_{\bm{\omega}}(\mathbf{a} \mid \mathbf{g}, \varphi(\bm{x}))$ using shadow data drawn from $\mathbb{P}(\mathbf{x}, \mathbf{y}, \mathbf{a} \mid \varphi(\bm{x}))$.

Distributional Inference. In distributional inference, the sensitive variable is the index of the ratio bin to which the property ratio belongs. The adversary first samples a random bin index $\bm{a}$ and then samples a property ratio $\alpha$ within that bin. Next, the adversary draws a data batch $\mathcal{D}_{\bm{a}}$ with $\lfloor \alpha k \rfloor$ records with the property and the rest without the property, and derives the gradient $\bm{g}_{\bm{a}}$. This process is repeated by the adversary to collect a set of labeled gradient and attribute pairs $(\bm{g}_{\bm{a}}, \bm{a})$ to train a predictive model. We note that in the setting of distributional inference, the sensitive variable is a series of ordinal numbers indicative of the continuous property ratio $\alpha$, and thus it should not be treated as a regular multi-class classification target. To utilize the ordering information, we adopt a simple strategy for ordinal classification(frank2001simple), which transforms the $m$-class ordinal classification problem into $m-1$ binary classifications. Specifically, the adversary trains a series of $m-1$ binary classifiers, with the $i$-th classifier $P_{\bm{\omega}_i}(\mathbf{a} > i \mid \mathbf{g})$ trained to decide whether or not $\bm{a}$ is larger than $i$. The final posterior probability can be obtained as

\[
P_{\bm{\omega}}(\mathbf{a} = \bm{a} \mid \mathbf{g}) =
\begin{cases}
1 - P_{\bm{\omega}_1}(\mathbf{a} > 1 \mid \mathbf{g}), & \text{if } \bm{a} = 1 \\
P_{\bm{\omega}_{\bm{a}-1}}(\mathbf{a} > \bm{a}-1 \mid \mathbf{g}) - P_{\bm{\omega}_{\bm{a}}}(\mathbf{a} > \bm{a} \mid \mathbf{g}), & \text{if } 1 < \bm{a} < m \\
P_{\bm{\omega}_{m-1}}(\mathbf{a} > m-1 \mid \mathbf{g}), & \text{if } \bm{a} = m.
\end{cases}
\]
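The following sketch shows how the $m-1$ binary scores $P_{\bm{\omega}_i}(\mathbf{a} > i \mid \mathbf{g})$ can be combined into the posterior above; the clipping of small negative values is an implementation detail added here for numerical safety.

```python
import numpy as np

def ordinal_posterior(p_greater):
    """Combine m-1 binary scores P(a > i | g), i = 1..m-1, into P(a = j | g)."""
    p = np.asarray(p_greater, dtype=float)   # shape (m-1,)
    m = len(p) + 1
    post = np.empty(m)
    post[0] = 1.0 - p[0]                     # a = 1
    for j in range(2, m):                    # 1 < a < m
        post[j - 1] = p[j - 2] - p[j - 1]
    post[m - 1] = p[m - 2]                   # a = m
    return np.clip(post, 0.0, None)          # guard against slightly negative values

# Example: m = 4 bins, scores from three binary classifiers.
print(ordinal_posterior([0.9, 0.6, 0.2]))    # -> [0.1, 0.3, 0.4, 0.2]
```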

User Inference. In contrast to other inference attacks where the sensitive variable is sampled from a well-defined set of values, in user inference the sensitive variable is the user's identity, which does not take on a fixed set of values. Moreover, the identities that occur at test time are likely not seen during the development of the attack model. As a result, the posterior $\mathbb{P}(\mathbf{a} \mid \mathbf{g})$ cannot be directly modeled. To resolve this, we employ a training strategy analogous to the prototypical network(snell2017prototypical) for few-shot learning. Specifically, we first train a neural network $f_{\bm{\omega}} \circ u$ that is composed of an encoder $f_{\bm{\omega}}: \mathbf{g} \rightarrow \mathbf{h}$ that maps the gradient vector to a continuous embedding space and a classifier $u: \mathbf{h} \rightarrow \mathbf{a}$ that takes the embedding as input and outputs the predicted user identity. Given gradient and sensitive variable pairs $(\bm{g}, \bm{a})$ created from the shadow dataset, since the number of available users in the shadow dataset is finite, the neural network can be trained in an end-to-end manner using a standard multi-class classification loss such as cross-entropy. After training, the classifier $u$ is discarded. At inference time, the adversary is provided with an observed gradient $\tilde{\bm{g}}$ and a set of $m$ candidate data batches $\{\mathcal{D}_i \mid i \in [m]\}$, where $\mathcal{D}_i = \{(\bm{x}_j, \bm{y}_j)\}_{j=1}^{k}$. Then, the adversary can derive the corresponding set of candidate gradients $\{\bm{g}_i \mid i \in [m]\}$ based on the current model parameters $\bm{\theta}$. Finally, the adversary computes the probability of each candidate identity after observing the gradient as

\[
P_{\bm{\omega}}(\mathbf{a} = \bm{a} \mid \mathbf{g} = \tilde{\bm{g}}) =
\frac{\exp\!\big(-\|f_{\bm{\omega}}(\bm{g}_{\bm{a}}) - f_{\bm{\omega}}(\tilde{\bm{g}})\|_2\big)}
     {\sum_{i \in [m]} \exp\!\big(-\|f_{\bm{\omega}}(\bm{g}_i) - f_{\bm{\omega}}(\tilde{\bm{g}})\|_2\big)}.
\]
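A sketch of this candidate-scoring step is shown below, assuming gradients have already been flattened to vectors; the encoder architecture and dimensions are illustrative.

```python
import torch
import torch.nn as nn

# Illustrative encoder f_ω mapping flattened gradients to an embedding space.
encoder = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 32))

def identity_posterior(g_observed, candidate_grads):
    """Softmax over negative L2 distances between gradient embeddings."""
    h_obs = encoder(g_observed)                  # f_ω(g~)
    h_cand = encoder(candidate_grads)            # f_ω(g_i), shape (m, d)
    dists = torch.norm(h_cand - h_obs, dim=1)    # ||f_ω(g_i) - f_ω(g~)||_2
    return torch.softmax(-dists, dim=0)          # P(a = i | g~)

g_obs = torch.randn(512)
candidates = torch.randn(5, 512)                 # m = 5 candidate users
print(identity_posterior(g_obs, candidates))
```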

4.2. Continual Attack and Adaptive Attack

The inference attack can be further improved if the adversary has access to multiple rounds of gradients or the defense mechanism.

Inference under Continual Observation. In cases where continual observation of the gradients is allowed, the adversary can use the set of observed gradients $\mathcal{G} = \{\tilde{\bm{g}}_i \mid i \in \mathcal{R}\}$ from multiple rounds to improve the attack. A naive solution would be to train a model to directly approximate $\mathbb{P}(\mathbf{a} \mid \mathcal{G})$. However, this is generally infeasible in practice because of the high dimensionality of $\mathcal{G}$. Instead, the adversary can use Bayesian updating to accumulate adversarial knowledge. Specifically, given a set of observed gradients, the log-posterior can be formulated as

\begin{align}
\log \mathbb{P}(\mathbf{a} = \bm{a} \mid \mathcal{G})
&= \log \mathbb{P}(\mathcal{G} \mid \mathbf{a} = \bm{a}) + \log \mathbb{P}(\mathbf{a} = \bm{a}) - \log \mathbb{P}(\mathcal{G}) \tag{2} \\
&\approx \sum_{i \in \mathcal{R}} \log \mathbb{P}(\tilde{\bm{g}}_i \mid \mathbf{a} = \bm{a}) + \log \mathbb{P}(\mathbf{a} = \bm{a}) - \log \mathbb{P}(\mathcal{G}) \tag{3} \\
&= \sum_{i \in \mathcal{R}} \Big( \log \mathbb{P}(\mathbf{a} = \bm{a} \mid \tilde{\bm{g}}_i) + \log \mathbb{P}(\tilde{\bm{g}}_i) - \log \mathbb{P}(\mathbf{a} = \bm{a}) \Big) + \log \mathbb{P}(\mathbf{a} = \bm{a}) - \log \mathbb{P}(\mathcal{G}) \tag{4} \\
&= \sum_{i \in \mathcal{R}} \log \mathbb{P}(\mathbf{a} = \bm{a} \mid \tilde{\bm{g}}_i) - (|\mathcal{R}| - 1) \log \mathbb{P}(\mathbf{a} = \bm{a}) + \mathcal{C}, \tag{5}
\end{align}

where Eq. (3) makes the approximating assumption that the gradients are conditionally independent given $\bm{a}$. Since $\mathcal{C} = -\log \mathbb{P}(\mathcal{G}) + \sum_{i \in \mathcal{R}} \log \mathbb{P}(\tilde{\bm{g}}_i)$ is independent of $\bm{a}$, it can be treated as a constant ($\mathcal{C} = 0$ if the gradients $\tilde{\bm{g}}_i$ are additionally mutually independent). In Eq. (5), the prior term is known and $\mathbb{P}(\mathbf{a} = \bm{a} \mid \tilde{\bm{g}}_i)$ can be approximated by training a fresh model for each round of observation. The sensitive variable can thus be estimated as $\hat{\bm{a}} = \operatorname*{arg\,max}_{\bm{a}} \log \mathbb{P}(\mathbf{a} = \bm{a} \mid \mathcal{G})$.
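Operationally, the aggregation in Eq. (5) reduces to summing per-round log-posteriors and subtracting the extra prior terms, as in the following sketch; the per-round posteriors would come from the trained attack models, but are hard-coded here for illustration.

```python
import numpy as np

def aggregate_rounds(per_round_posteriors, prior):
    """Combine per-round posteriors P(a | g_i) via Eq. (5), up to the constant C.

    per_round_posteriors: array of shape (|R|, m), one posterior per observed round.
    prior: array of shape (m,), the known prior P(a).
    Returns the index of the most likely value of a.
    """
    post = np.asarray(per_round_posteriors, dtype=float)
    prior = np.asarray(prior, dtype=float)
    n_rounds = post.shape[0]
    log_score = np.log(post).sum(axis=0) - (n_rounds - 1) * np.log(prior)
    return int(np.argmax(log_score))

# Example: three rounds of a binary sensitive variable with a uniform prior.
rounds = [[0.6, 0.4], [0.7, 0.3], [0.55, 0.45]]
print(aggregate_rounds(rounds, prior=[0.5, 0.5]))   # -> 0
```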

Adaptive Attack. The adversary can design adaptive attacks if the defense mechanism $\mathcal{M}$ is known. Instead of training the predictive model $P_{\bm{\omega}}(\mathbf{a} \mid \mathbf{g})$ on clean gradient pairs $(\bm{g}_{\bm{a}}, \bm{a})$, a simple adaptive strategy is to apply the same defense mechanism to the shadow data's gradients and use the transformed pairs $(\mathcal{M}(\bm{g}_{\bm{a}}), \bm{a})$ to train the predictive model $P_{\bm{\omega}}(\mathbf{a} \mid \mathcal{M}(\mathbf{g}))$. As we will show in Section 6, this simple strategy is sufficient to bypass several heuristic-based defenses.
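As a minimal illustration of this adaptive strategy, the sketch below trains the attacker's predictive model on defended shadow gradients; the `defense` and `compute_gradient` callables are hypothetical stand-ins for the known mechanism $\mathcal{M}$ and the gradient computation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_adaptive_attacker(shadow_batches, shadow_labels, defense, compute_gradient):
    """Train the attacker's predictive model on defended shadow gradients.

    shadow_batches:   list of shadow-data batches, one per draw of the sensitive value.
    shadow_labels:    the sensitive-variable label a associated with each batch.
    defense:          callable implementing the known mechanism M(g)  (assumed helper).
    compute_gradient: callable returning the flattened gradient g for a batch (assumed helper).
    """
    features = np.stack([defense(compute_gradient(batch)) for batch in shadow_batches])
    attacker = RandomForestClassifier(n_estimators=50)  # matches the attack model in Section 5.1.3
    attacker.fit(features, shadow_labels)
    return attacker  # attacker.predict_proba approximates P(a | M(g))
```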

5. Attack Evaluation

In this section, we evaluate the four inference attacks on datasets with different modalities to investigate the impact of various adversarial assumptions. The findings presented below indicate that the key factors affecting attack performance are: (1) Continual Observation: an adversary can improve its inference by accumulating information from multiple rounds of updates; (2) Batch Size: when the private information is shared across the batch, a large batch averages out the effect of the other variables, making it easier to infer the sensitive variable; and (3) Adversarial Knowledge: the attack improves with the adversary's knowledge of the data distribution (as captured by the number of available shadow data points).

5.1. Experimental Setup

5.1.1. Datasets and Model Architecture.

We consider the following five datasets with different data modalities (tabular, speech, and image) in our experiments.

Table 1. Summary of the datasets: data type, task label, sensitive variable, and the Pearson correlation between task label and sensitive variable.

Dataset  | Type    | Task Label | Sensitive Variable | Correlation
Adult    | Tabular | Income     | Gender             | -0.1985
Health   | Tabular | Mortality  | Gender             | -0.1123
CREMA-D  | Speech  | Emotion    | Gender             | -0.0133
CelebA   | Image   | Smiling    | High Cheekbones    |  0.6904
UTKFace  | Image   | Age        | Ethnicity          | -0.1788

(1) Adult(misc_adult_2) is a tabular dataset containing 48,842 records from the 1994 Census database. We train a fully-connected neural network to predict the person's annual income (whether or not it exceeds 50K a year) and use gender (male or female) as the private attribute. For property and distributional inference attacks, the sex feature is removed.

(2) Health(health_heritage) (Heritage Health Prize) is a tabular dataset from Kaggle that contains de-identified medical records of over 55,000 patients' inpatient or emergency room visits. We train a fully-connected neural network to predict whether the Charlson Index (an estimate of patient mortality) is greater than zero. We use the patient's gender (male, female, or unknown) as the private attribute, which is removed for property and distributional inference attacks.

(3) CREMA-D(cao2014crema) is a multi-modal dataset that contains 7,442 emotional speech recordings collected from 91 actors (48 male and 43 female). Speech signals are pre-processed using OpenSMILE(eyben2010opensmile) to extract a total of 23,990 utterance-level audio features for automatic emotion recognition. Following prior work(feng2021attribute), we use EmoBase, a standard feature set that contains MFCC, voice quality, fundamental frequency, and other statistical features, resulting in a feature dimension of 988 for each utterance(haider2021emotion). We train a fully-connected neural network to classify four emotions: happy, sad, anger, and neutral. We use the speaker's gender (male or female) as the target property for inference attacks.

(4) CelebA(liu2015deep) contains 202,599 face images, each labeled with 40 binary attributes. We resize the images to 32×32 pixels and train a convolutional neural network to classify whether the person is smiling, using whether or not the person has high cheekbones as the target property.

(5) UTKFace(zhang2017age) consists of over 20,000 face images annotated with age, gender, and ethnicity. We resize the images to 32×32 pixels and select 22,012 images from the four largest ethnicity groups (White, Black, Asian, or Indian) to train a convolutional neural network to classify three age groups (0-30, 31-60, and ≥61 years old). Ethnicity is used as the target property.

We split each dataset three-fold into a training set, a testing set, and a public set. The training set is considered private and is used only for model training and inference attack evaluation. The testing set is reserved for evaluating the utility of the ML model. The public set is accessible to both the adversary and the private learner and can be used as the shadow dataset for training the adversary's predictive model or for developing defenses, as described in Section 6. We provide a summary of the datasets in Table 1, including the task label $\mathbf{y}$, the sensitive variable $\mathbf{a}$ for AIA and PIA, and the Pearson correlation between $\mathbf{y}$ and $\mathbf{a}$.

5.1.2. Metrics.

We define the following metrics for measuring inference attack performance:

(1) Attack Success Rate (ASR): We measure attack performance by the fraction of trials in which the adversary correctly guesses the sensitive variable, i.e., $p = \frac{1}{T}\sum_{t\in[T]} \mathbb{1}_{\hat{\bm{a}} = \bm{a}}$, where $T$ is the total number of trials (i.e., repetitions of the inference game).

(2) AUROC: We additionally report the area under the receiver operating characteristic curve (AUROC). For sensitive variables with more than two classes, we report the macro-averaged AUROC.

(3) Advantage: We follow prior work(yeom2018privacy; guo2023analyzing) and use the advantage metric to measure the gain in the adversary's inferential power upon observing the gradients. Specifically, the advantage of an adversary is defined by comparing its success rate $p$ to that of a baseline adversary who does not observe the gradients, i.e., $\texttt{Adv}(p) \coloneqq \max(p - p^{*}, 0)/(1 - p^{*}) \in [0, 1]$, where $p^{*}$ is the success rate of the baseline adversary. The Bayes-optimal strategy for the baseline adversary is to guess the majority class, so $p^{*} = \max_{\bm{a}} \mathbb{P}(\mathbf{a} = \bm{a})$.

(4) TPR@1%FPR: Beyond average-case performance metrics, recent work on membership inference(carlini2022membership; ye2022enhanced) argues for understanding the privacy risk to worst-case training data by examining the low false positive rate (FPR) region. Inspired by this, we additionally report the true positive rate (TPR) at 1% FPR. A sketch of how these metrics can be computed is given after this list.
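The following sketch (a minimal illustration with hypothetical binary attack outputs, not the authors' evaluation code) computes ASR, advantage, AUROC, and TPR at 1% FPR from attacker scores using scikit-learn's ROC utilities.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def advantage(p, p_star):
    """Adv(p) = max(p - p*, 0) / (1 - p*): gain over the majority-class baseline."""
    return max(p - p_star, 0.0) / (1.0 - p_star)

def tpr_at_fpr(y_true, scores, target_fpr=0.01):
    """TPR at the largest operating point whose FPR does not exceed target_fpr."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    mask = fpr <= target_fpr
    return float(tpr[mask].max()) if np.any(mask) else 0.0

# Hypothetical binary example: true sensitive values and attacker confidences.
y_true = np.array([1, 0, 1, 1, 0, 0])
scores = np.array([0.9, 0.2, 0.8, 0.6, 0.4, 0.1])
preds = (scores >= 0.5).astype(int)

asr = float((preds == y_true).mean())                 # attack success rate p
p_star = max(y_true.mean(), 1 - y_true.mean())        # majority-class baseline p*
print(asr, advantage(asr, p_star), roc_auc_score(y_true, scores), tpr_at_fpr(y_true, scores))
```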

5.1.3. Adversary’s Model.

We conducted preliminary experiments with various types and configurations of ML models and found that a random forest with 50 estimators performs best (especially in the low FPR region) for estimating the posterior in AIA, PIA, and DIA with small shadow dataset sizes. For UIA, we use a fully-connected network with one hidden layer as the encoder. The embedding dimension is set to 50 for the CREMA-D dataset and 100 for the CelebA dataset. As the gradient vector is extremely high-dimensional (e.g., the gradient dimensions for the CREMA-D and CelebA datasets are 67,716 and 45,922, respectively), we apply a 1-dimensional max-pooling layer before the adversary's predictive model, with a kernel size of 3 for tabular datasets and 10 for the other datasets, for dimensionality reduction.
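A minimal sketch of this pipeline (ours, with randomly generated placeholder data and a reduced gradient dimension) is shown below: flattened gradients are max-pooled and then fed to a 50-estimator random forest that serves as the posterior model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pool_gradient(grad, kernel_size):
    """1-D max-pooling over a flattened gradient vector for dimensionality reduction."""
    n = (len(grad) // kernel_size) * kernel_size
    return grad[:n].reshape(-1, kernel_size).max(axis=1)

# Placeholder shadow data; real gradient dimensions are much larger (e.g., 67,716 for CREMA-D).
rng = np.random.default_rng(0)
shadow_gradients = rng.standard_normal((200, 2000))
shadow_labels = rng.integers(0, 2, size=200)

kernel = 10  # kernel size 3 for tabular datasets, 10 otherwise
features = np.stack([pool_gradient(g, kernel) for g in shadow_gradients])

posterior_model = RandomForestClassifier(n_estimators=50).fit(features, shadow_labels)
# posterior_model.predict_proba(...) approximates the per-round posterior P(a | g~) used in Eq. (5).
```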

5.1.4. Other Attack Settings.

We assume the model parameters $\bm{\theta}$ are randomly initialized at the beginning of the inference game. During the game, the model parameters are updated at each epoch using SGD with a learning rate of 0.01. We evaluate AIA on the tabular datasets and UIA on the datasets that contain user labels (CREMA-D and CelebA), while PIA and DIA are evaluated on all datasets. For AIA, PIA, and DIA, we use a training set of 5,000 samples and a balanced public set containing a default of 1,000 samples equally divided among the sensitive attribute/property classes. For UIA, we first filter out user identities with fewer than 2× the batch size samples and then split the dataset by user identity. We select 15 and 30 users on the CREMA-D dataset, and 150 and 300 users on the CelebA dataset, as the training and public sets, respectively. We select more users on the CelebA dataset because the majority of users have very few samples (≤16). We set $m=6$ for DIA, i.e., inferring over 6 ratio bins ($\{0\}, (0, 0.2], (0.2, 0.4], \ldots, (0.8, 1]$), and $m=5$ for UIA, i.e., choosing from 5 candidate users. For AIA and PIA, we assume the adversary has access to a prior of the sensitive variable estimated from the population. For DIA and UIA, we assume the adversary holds an uninformed prior, and thus the baseline is simply random guessing. The default batch sizes are 16 for AIA and PIA, 128 for DIA, and 8 for UIA. For AIA, PIA, and DIA, the total number of trials $T$ of each experiment equals the number of random draws of training batches (i.e., 5,000); for UIA, $T$ is the number of random draws of candidate sets, which we set to 1,000. We repeat each experiment with 5 different random seeds and report the mean and standard deviation of the results.
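For reference, the default settings above can be summarized in a single configuration; the sketch below is only an illustrative summary of the values listed in this subsection, and the dictionary keys are ours.

```python
# Illustrative summary of the default attack-evaluation settings (key names are ours).
DEFAULTS = {
    "learning_rate": 0.01,                               # SGD, one update per epoch
    "train_size": 5000,                                  # AIA / PIA / DIA training set
    "shadow_size": 1000,                                 # balanced public (shadow) set
    "batch_size": {"AIA": 16, "PIA": 16, "DIA": 128, "UIA": 8},
    "num_classes": {"DIA": 6,                            # ratio bins {0}, (0,0.2], ..., (0.8,1]
                    "UIA": 5},                           # candidate users per game
    "trials": {"AIA": 5000, "PIA": 5000, "DIA": 5000, "UIA": 1000},
    "random_seeds": 5,
}
```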

5.2. Evaluation of Inference Attacks

We evaluate each type of inference attack with a small shadow dataset (1,000 samples) and compare the results of single-round attacks (where the adversary only observes a single round of gradients) to multi-round attacks (where the adversary gets continual observation of the gradients). Due to space limits, we only include a snapshot of the results (one dataset per attack) in Figure 2 and provide the full results in Appendix Figure LABEL:fig:sr_mr.

Attribute Inference. We present the results of AIA in Figure LABEL:fig:sr_mr_aia. We observe that the adversary is able to infer the sensitive attribute with high confidence using only 1,000 shadow data samples. For instance, on the Adult dataset, the multi-round adversary achieves a high average AUROC of 0.9991 and a TPR@1%FPR of 0.9823. On the Health dataset, however, the AUROC of the multi-round adversary reduces slightly to 0.8122 while the TPR@1%FPR drops drastically to 0.1611. This is likely because the sensitive attribute on the Health dataset contains an "unknown" class (18.9%) that is uncorrelated with other features, making it hard to estimate statistically.

[Figure 2]

Property Inference. Figure LABEL:fig:sr_mr_pia depicts the results of PIA, where we observe that the adversary achieves high performance across all five datasets. Namely, the average AUROCs of the multi-round adversary on the Adult, Health, CREMA-D, CelebA, and UTKFace datasets are 0.9919, 0.8294, 0.8970, 0.9993, and 0.9167, respectively. This consistently high attack performance contrasts with the generally low correlation between the sensitive properties and the task labels across all datasets, as indicated in Table 1 (except for CelebA, where a spurious relationship exists), suggesting that the observed information leakage is intrinsic to the computed gradients(melis2019exploiting), regardless of the specific data type and learning task.

Distributional Inference. Figure LABEL:fig:sr_mr_dia summarizes the results of DIA. Although distributional inference is a more challenging task (6-class ordinal classification), we observe that the multi-round adversary still performs fairly well with a batch size of 128, achieving an average AUROC of 0.8848, 0.7806, 0.7572, 0.9522, and 0.7664 on the Adult, Health, CREMA-D, CelebA, and UTKFace datasets, respectively.

User Inference. We report the results of UIA in Figure LABEL:fig:sr_mr_uia. We observe that the adversary is able to identify the user with relatively high confidence on the CelebA dataset, with an average AUROC and TPR@1%FPR of 0.8935 and 0.2828 for the multi-round adversary. On the CREMA-D dataset, the average AUROC of the multi-round adversary is only 0.6808, which may be due to the low identifiability of the features extracted for emotion recognition.

General Observations. Additionally, we make the following general observations across different types of attacks and datasets. First, the performance of single-round attacks decreases as training progresses. This is because the gradients of the training data become smaller in magnitude as the training loss decreases, and the variation within these gradients therefore becomes harder to capture. Second, on most datasets, the multi-round attack performs better than any single-round attack, demonstrating the effectiveness of the Bayesian attack framework. Third, we observe very similar performance for AIA and PIA on the tabular datasets, indicating that whether the sensitive variable is internal or external to the data features does not affect inference performance.

5.3. Attack Analyses

We investigate the following factors that may affect the performance of inference attacks.

Impact of Batch Sizes. In Figure 3, we study the impact of varying batch sizes on the performance of the inference attacks. We report results on the Adult dataset for AIA, PIA, and DIA, and on the CREMA-D dataset for UIA. We observe that the performance of all four inference attacks improves as the batch size increases. This is because the records within the batch are sampled from the same conditional distribution $\mathbb{P}(\mathbf{x}, \mathbf{y} \mid \mathbf{a})$. As the private information $\bm{a}$ is shared across the batch, a larger batch size amplifies the private information and suppresses other varying signals, thereby improving inference performance on $\bm{a}$. For distributional inference, the difference in the number of samples with the property between ratio bins, $\lfloor \alpha k \rfloor$, also grows with the batch size and thus becomes easier to distinguish. For AIA and PIA, we observe that the gap between the single-round adversary (solid lines) and the multi-round adversary (dashed lines) is largest at a batch size of 4 and then gradually shrinks as the batch size increases further, due to performance saturation. These results suggest that simply aggregating more data does not protect gradients from inference; in distributed learning where data are sampled from the same conditional distribution, it may even increase the privacy risk. Data aggregation alone is therefore insufficient to achieve meaningful privacy in these settings.

[Figure 3]
[Figure 4]

Impact of Adversary's Knowledge. To investigate the impact of the adversary's knowledge on attack performance, we use PIA as an example and plot the attack performance with varying shadow data sizes and numbers of observations on the Adult dataset in Figure 5. We observe the general trend that attack performance increases with the number of observations and the number of available shadow data samples. Interestingly, the attack performance does not always increase monotonically along each axis. For instance, given a small shadow dataset of only 100 samples, the AUROC of an adversary that observes 10 rounds does not outperform an adversary that observes only 5 rounds of gradients. This is likely because when the model is near convergence, the gradients are small and thus have low variance, which requires more shadow data to accurately estimate the posterior. Such errors in the predictive model accumulate when the summation of single-round log-likelihoods is used to approximate the joint distribution (Eq. (3)), eventually leading to suboptimal performance.

Impact of Model Size. In Figure 4, we use PIA as an example to study the impact of the machine learning model size. We control model size by varying model width. Specifically, for fully-connected neural networks, we vary the number of neurons in the hidden layer; for convolutional neural networks, we vary the number of output channels of the first convolutional layer, with the remaining convolutional layers scaled accordingly. We observe that attack performance tends to improve slightly with increasing model size, except on the Adult and UTKFace datasets, where performance is saturated. However, most of these improvements are not statistically significant (falling within the margin of error) and thus do not allow for a conclusive statement. We include additional results for other types of inference attacks in Appendix Figure LABEL:fig:model_size, where we make similar observations. These results demonstrate that all four types of inference attacks generalize to larger model sizes.

6. Defenses

In this section, we investigate five types of strategies for defending against inference from gradients, under both static and adaptive adversaries, and analyze their performance from an information-theoretic view. The main takeaways from our analyses are: (1) heuristic defenses can defend against static adversaries but are ineffective against adaptive adversaries; (2) DP-SGD(abadi2016deep) is the only considered defense that remains effective against adaptive attacks, at the cost of sacrificing model utility; and (3) reducing the mutual information between the released gradients and the sensitive variable is a key ingredient of a successful defense.

[Figure 5]

6.1. Privacy Defenses Against Inference

Privacy-enhancing strategies in machine learning generally follow two principles: data minimization and data anonymization. Data minimization strategies, such as the application of cryptographic techniques (e.g., Secure Multi-party Computation and Homomorphic Encryption) and Federated Learning, aim to reveal only the minimal amount of information necessary for achieving a specific computational task, and only to the necessary parties. As shown by prior work(truex2019hybrid; elkordy2023much; lam2021gradient; kerkouche2023client), data minimization alone may not provide sufficient privacy protection and thus should be applied in combination with data anonymization defenses to further reduce privacy risks. However, for heuristic-based privacy defenses, it is important to carefully evaluate their effectiveness against adaptive adversaries. We consider the following five types of representative defenses from the current literature in our experiments:

(1) Gradient Pruning. Gradient pruning creates a sparse gradient vector by pruning gradient elements with small magnitudes. This strategy has been used as a baseline privacy defense in federated learning(zhu2019deep; sun2021soteria; wu2023learning). By default, we set the pruning rate to 99%.

(2) SignSGD. SignSGD(bernstein2018signsgd) binarizes the gradients by applying an element-wise sign function, thereby compressing them to 1 bit per dimension. Similar to gradient pruning, it has been explored in prior work(wu2023learning; yue2023gradient) as a defense against data reconstruction attacks in federated learning. Along similar lines, Kerkouche et al.(kerkouche2020federated) evaluated SignFed, a variant of the SignSGD protocol adapted for federated settings, and found it to be more resilient to privacy and security attacks than the standard federated learning scheme.

(3) Adversarial Perturbation. Inspired by prior research on protecting privacy by adopting evasion attacks from adversarial machine learning(jia2018attriguard; jia2019memguard; shan2020fawkes; o2022voiceblock), we explore a heuristic defense strategy against inference attacks that injects adversarial perturbations into the gradients. Specifically, at each round of observation, the defender first trains a neural network $f_{\bm{\phi}}: \mathbf{g} \rightarrow \mathbf{a}$ to classify the sensitive variable $\bm{a}$ from the gradient $\bm{g}$ using a public dataset (the same as the shadow dataset). Then, the defense generates a protective adversarial perturbation that causes $f_{\bm{\phi}}$ to misclassify the perturbed gradients. We adopt $l_{\infty}$-bounded projected gradient descent (PGD)(madry2018towards), which generates the adversarial example $\bm{g}'$ (perturbed gradient) by iteratively taking gradient steps. For AIA, PIA, and DIA, this defense generates an untargeted adversarial perturbation through gradient ascent, i.e., $\tilde{\bm{g}} \leftarrow \prod_{\mathcal{B}_{\infty}(\bm{g},\gamma)} \big( \tilde{\bm{g}} + \alpha \cdot \text{sign}(\nabla_{\bm{g}} \mathcal{L}(\bm{\phi}, \bm{g}, \bm{a})) \big)$, where $\mathcal{B}_{\infty}(\bm{g}, \gamma)$ is the $l_{\infty}$ norm ball centered at $\bm{g}$ with radius $\gamma$. For UIA, the defense generates a targeted adversarial perturbation through gradient descent, i.e., $\tilde{\bm{g}} \leftarrow \prod_{\mathcal{B}_{\infty}(\bm{g},\gamma)} \big( \tilde{\bm{g}} - \alpha \cdot \text{sign}(\nabla_{\bm{g}} \mathcal{L}(\bm{\phi}, \bm{g}, \bm{a}_{t})) \big)$, to make the gradients misrecognized as the target user $\bm{a}_{t}$. By default, we set the total number of steps to 5, $\gamma = 0.005$, and $\alpha = 0.002$. A minimal sketch of this perturbation step is provided after this list.

(4) Variational Information Bottleneck (VIB). This defense inserts an additional VIB layer(alemi2016deep) that splits the neural network $f_{\bm{\theta}}$ into a probabilistic encoder $p(\mathbf{h} \mid \mathbf{x})$ and a decoder $q(\mathbf{y} \mid \mathbf{h})$, where $\mathbf{h}$ is a latent representation that follows a Gaussian distribution. An additional Kullback-Leibler (KL) divergence term is introduced into the training loss: $\mathcal{L}_{VIB} = \mathcal{L}(\bm{\theta}, \mathcal{D}) + \beta \cdot KL(p(\mathbf{h} \mid \mathbf{x}) \,\|\, q(\mathbf{z}))$, where $q(\mathbf{z}) = \mathcal{N}(\bm{0}, \bm{I})$ is the standard Gaussian. Optimizing this VIB objective reduces the mutual information $I(\mathbf{x}; \mathbf{h})$ between the representation and the input by minimizing a variational upper bound. Prior work suggests that this helps reduce the model's dependence on the input's sensitive attributes and improve privacy(song2019overlearning; scheliga2022precode; scheliga2023privacy). We set $\beta = 0.01$ as the default for our experiments.

(5) Differential Privacy (DP-SGD). Differential privacy (DP)(dwork2006calibrating) provides a rigorous notion of algorithmic privacy. DP-SGD(abadi2016deep) enforces it during training by clipping per-example gradients and adding calibrated Gaussian noise to the aggregated gradient before release.
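For the Adversarial Perturbation defense (3) above, the following sketch (ours) illustrates the $l_{\infty}$-bounded PGD update on a flattened gradient vector; `loss_grad_fn` is a hypothetical callable standing in for $\nabla_{\bm{g}} \mathcal{L}(\bm{\phi}, \bm{g}, \bm{a})$ of the defender's classifier $f_{\bm{\phi}}$.

```python
import numpy as np

def perturb_gradient(g, loss_grad_fn, a, steps=5, gamma=0.005, alpha=0.002, targeted=False):
    """l_inf-bounded PGD on a flattened gradient vector g (sketch of defense (3)).

    loss_grad_fn(g, a): gradient of f_phi's loss w.r.t. its input g for label a (assumed helper).
    targeted=False:     gradient ascent away from the true label (AIA / PIA / DIA).
    targeted=True:      gradient descent toward a target user label a (UIA).
    """
    g_tilde = g.copy()
    direction = -1.0 if targeted else 1.0
    for _ in range(steps):
        g_tilde = g_tilde + direction * alpha * np.sign(loss_grad_fn(g_tilde, a))
        # Project back onto the l_inf ball B_inf(g, gamma).
        g_tilde = np.clip(g_tilde, g - gamma, g + gamma)
    return g_tilde
```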

[Figure 6]

6.2. Defense Evaluation

In Figure 6, we compare the performance of defenses against static and adaptive adversaries. Due to space limits, we focus here on PIA on the Adult dataset. The full results, including all four types of inference attacks, are available in Appendix Figure LABEL:fig:defenses_full. We observe that heuristic defenses such as Gradient Pruning, SignSGD, and Adversarial Perturbation can successfully defend against static adversaries, in the sense of reducing the adversary's advantage to zero. However, these defenses are ineffective against adaptive adversaries aware of the defense. For instance, in the case of gradient pruning, the adaptive adversary achieves a high advantage (0.7841) that is only slightly lower than with no defense (0.9363). Interestingly, in the case of Adversarial Perturbation, we found that the adaptive adversary's performance is increased, rather than decreased, compared to no defense, reaching a perfect advantage and AUROC of 1.00. For the remaining defenses, namely VIB and DP-SGD, the attack performance is consistent across static and adaptive adversaries. However, only DP-SGD manages to reduce the advantage of the adaptive adversary to near zero.

To understand the privacy-utility trade-off of these defenses, we plot, in Figure 7, the PIA adversary's advantage evaluated on the training data against the model's AUROC for predicting the task label on the test set of the Adult dataset. We consider three different sets of parameters for each type of defense (details in the Appendix). We observe that in the case of static adversaries, SignSGD achieves the best trade-off, approximating the ideal defense (upper-left corner) by reducing the advantage to zero without affecting model utility. However, in the case of adaptive adversaries, only DP-SGD provides a meaningful notion of privacy, at the cost of diminished model utility. Moreover, there may exist stronger adversaries that are more resilient to these defenses. For instance, in Table 2, we show that an adversary using principal component analysis (PCA) with 50 principal dimensions for dimensionality reduction can bypass the DP-SGD defense with $\varepsilon = 96.90$ and $\delta = 10^{-5}$ that defends against an adversary using max-pooling, and requires 15× larger noise to thwart.

In the next section, we analyze the underlying principles of these defenses and the necessary ingredients for a successful defense.

[Figure 7]

6.3. Defense Analyses

In this section, we provide an information-theoretic perspective for understanding and analyzing defenses against inference attacks from gradients.

Information-theoretic View on Inference Privacy. The inference attacks captured in the unified game can be viewed as performing statistical inference(du2012privacy) on properties of the underlying data distributions upon observing samples of the gradients. A well-known information-theoretic result for analyzing inference is Fano's inequality, which gives a lower bound on the estimation error of any inference adversary. Formally, consider any data release mechanism that provides $\mathbf{Y}$ computed from the private discrete random variable $\mathbf{X}$ supported on $\mathcal{X}$. Any inference from the observation $\mathbf{Y}$ must produce an estimate $\hat{\mathbf{X}}$ that satisfies the Markov chain $\mathbf{X} \rightarrow \mathbf{Y} \rightarrow \hat{\mathbf{X}}$. Let $\mathbf{e}$ be a binary random variable indicating an error, i.e., $\mathbf{e} = 1$ if $\hat{\mathbf{X}} \neq \mathbf{X}$. Then we have

(6)  $H(\mathbf{X} \mid \mathbf{Y}) \leq H(\mathbf{X} \mid \hat{\mathbf{X}}) \leq H_{2}(\mathbf{e}) + \mathbb{P}(\mathbf{e}=1)\log(|\mathcal{X}|-1),$

where $H_{2}(\mathbf{e}) = -\mathbb{P}(\mathbf{e}=1)\log\mathbb{P}(\mathbf{e}=1) - \big(1-\mathbb{P}(\mathbf{e}=1)\big)\log\big(1-\mathbb{P}(\mathbf{e}=1)\big)$ is the binary entropy. For $|\mathcal{X}| > 2$, a standard treatment is to consider the mutual information $I(\mathbf{X};\mathbf{Y}) = H(\mathbf{X}) - H(\mathbf{X} \mid \mathbf{Y})$ and the bound $H_{2}(\mathbf{e}) \leq \log 2$, from which we obtain a lower bound on the error probability:

(7)  $\mathbb{P}(\hat{\mathbf{X}} \neq \mathbf{X}) \geq \dfrac{H(\mathbf{X}) - I(\mathbf{X};\mathbf{Y}) - \log 2}{\log(|\mathcal{X}|-1)}.$

Note that this bound is vacuous when $|\mathcal{X}| = 2$; a slightly tighter bound can be obtained by treating $H_{2}(\mathbf{e})$ exactly (rather than using the approximating bound of $\log 2$) and numerically computing the lowest error probability that satisfies inequality (6), as noted by prior work(guo2023analyzing). The bound in inequality (7) captures both the prior (via $H(\mathbf{X})$) and the cardinality of the sensitive variable alphabet, indicating that data with a large degree of uncertainty is hard to infer or reconstruct, which aligns with intuition from Balle et al.(balle2022reconstructing). Inequality (7) holds generically for any data release mechanism. In the context of inference from gradients, the adversary's goal is to obtain an estimate of $\mathbf{a}$ upon observing $\tilde{\mathbf{g}}$, which can be described by the Markov chain $\mathbf{a} \rightarrow \mathbf{x} \rightarrow \mathbf{g} \rightarrow \tilde{\mathbf{g}} \rightarrow \hat{\mathbf{a}}$. Since the adversary's success rate is $p = 1 - \mathbb{P}(\mathbf{e}=1)$, one immediately obtains an upper bound on the adversary's advantage:

(8)  $\texttt{Adv}(p) \leq 1 - \dfrac{H(\mathbf{a}) - I(\mathbf{a};\tilde{\mathbf{g}}) - \log 2}{(1-p^{*})\log(m-1)}.$

As $H(\mathbf{a})$ is a constant, this indicates that reducing $I(\mathbf{a};\tilde{\mathbf{g}})$ raises the lower bound on the error probability and consequently diminishes the adversary's advantage. This analysis can be generalized to continuous sensitive variables by applying the continuum version of Fano's inequality(duchi2013distance).
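As a numerical illustration of bounds (7) and (8), the sketch below evaluates them for assumed values of $H(\mathbf{a})$, $I(\mathbf{a};\tilde{\mathbf{g}})$, $m$, and $p^{*}$; the leakage value used here is hypothetical and entropies are in nats.

```python
import math

def fano_error_lower_bound(H_X, I_XY, num_classes):
    """Lower bound on P(X_hat != X) from inequality (7); natural-log entropies."""
    if num_classes <= 2:
        return 0.0  # the bound is vacuous for binary alphabets
    return max((H_X - I_XY - math.log(2)) / math.log(num_classes - 1), 0.0)

def advantage_upper_bound(H_a, I_ag, m, p_star):
    """Upper bound on Adv(p) from inequality (8)."""
    err = fano_error_lower_bound(H_a, I_ag, m)
    return min(max(1.0 - err / (1.0 - p_star), 0.0), 1.0)

# Hypothetical numbers: uniform prior over m = 6 classes (H(a) = log 6, p* = 1/6)
# and an assumed leakage of I(a; g~) = 0.5 nats.
m = 6
print(advantage_upper_bound(math.log(m), I_ag=0.5, m=m, p_star=1.0 / m))
```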

Understanding Defenses. Next, we explain the failures of the heuristic defenses using the above framework and argue that a successful defense must effectively minimize the mutual information $I(\mathbf{a};\tilde{\mathbf{g}})$ between the released gradients and the sensitive variable. The Gradient Pruning and SignSGD defenses can be viewed as attempts to reduce the number of transmitted bits in the gradients; however, this does not necessarily reduce the mutual information. The neural network classifier $f_{\bm{\phi}}: \mathbf{g} \rightarrow \mathbf{a}$ used in the Adversarial Perturbation defense is trained to minimize the cross-entropy loss, which provides an approximate upper bound on the conditional entropy $H(\mathbf{a} \mid \mathbf{g})$ and serves as a proxy for estimating the mutual information $I(\mathbf{a};\mathbf{g}) = H(\mathbf{a}) - H(\mathbf{a} \mid \mathbf{g})$. However, generating adversarial perturbations to produce $\tilde{\mathbf{g}}$ against this fixed classifier does not necessarily reduce the mutual information $I(\mathbf{a};\tilde{\mathbf{g}})$, and likely increases it: the gradient steps $\nabla_{\bm{g}}\mathcal{L}(\bm{\phi}, \bm{g}, \bm{a})$ used to generate the protective perturbation themselves contain information about $\bm{a}$, and since the perturbation generation process is deterministic, an adaptive adversary can learn to pick up these patterns and gain additional advantage. In the case of VIB, the mechanism is stochastic, but optimizing the VIB objective only gradually reduces the mutual information $I(\mathbf{x};\mathbf{h})$ between the latent representation $\mathbf{h}$ and the input $\mathbf{x}$, which still does not guarantee a reduction of $I(\mathbf{a};\tilde{\mathbf{g}})$ during optimization. By design, differential privacy is not intended to protect against statistical inference, as its goal is to preserve the statistical properties of the dataset while protecting the privacy of individual samples. However, an alternative information-theoretic interpretation of differential privacy is that it places a constraint on mutual information(bun2016concentrated; cuff2016differential).
An easy way to see this is that by adding Gaussian noise to the gradients, the DP-SGD algorithm essentially creates a Gaussian channel between the true and released gradients, thereby placing a constraint on $I(\mathbf{g};\tilde{\mathbf{g}})$, which in turn bounds $I(\mathbf{a};\tilde{\mathbf{g}})$ as $I(\mathbf{a};\tilde{\mathbf{g}}) \leq I(\mathbf{g};\tilde{\mathbf{g}})$ by the data processing inequality. More concretely, for the Gaussian channel $\tilde{\mathbf{g}} = \mathbf{g} + \mathcal{N}(\bm{0}, \sigma^{2}\bm{I})$, the channel capacity gives the upper bound $I(\mathbf{g};\tilde{\mathbf{g}}) \leq \frac{n}{2}\log\big(1 + \frac{P}{\sigma^{2}}\big)$ if the gradients $\bm{g}$ satisfy an average power constraint $\mathbb{E}[\|\bm{g}\|_{2}^{2}] \leq nP$, where $n$ is the dimensionality of $\bm{g}$. One can obtain a stronger result in cases where the $l_{2}$ sensitivity is bounded (e.g., Theorem 2 in (guo2023analyzing)).
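To give a feel for how the noise scale controls this capacity bound, the sketch below evaluates it for hypothetical values of the gradient dimension, power constraint, and noise level; all numbers are illustrative assumptions, not measurements from our experiments.

```python
import math

def gaussian_channel_capacity_bound(n, P, sigma):
    """Capacity bound (in nats) on I(g; g~) for g~ = g + N(0, sigma^2 I),
    assuming an average power constraint E[||g||_2^2] <= n * P."""
    return 0.5 * n * math.log(1.0 + P / sigma**2)

# Hypothetical setting: n = 67,716 gradient dimensions (CREMA-D) and an assumed
# per-coordinate power P = 1e-4; a larger DP noise scale sigma shrinks the bound.
for sigma in (0.01, 0.1, 1.0):
    print(sigma, gaussian_channel_capacity_bound(67716, 1e-4, sigma))
```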

It is worth noting that the goal of our analyses here is to provide a perspective for understanding the effectiveness of a class of defense strategies, rather than to derive tight bounds. Additionally, as mutual information is a statistical quantity, the mutual-information interpretation of inference privacy inherently captures only the average-case privacy risk. In the next section, we provide a privacy auditing framework for empirically estimating the privacy risk by approximating the worst-case scenario.

Table 2. Attack performance under DP-SGD for adversaries using different dimensionality-reduction strategies (mean ± std).

ε     | Adversary Type | AUROC           | TPR@1%FPR       | ASR             | Advantage
96.90 | MaxPooling     | 0.3004 ± 0.0773 | 0.0017 ± 0.0010 | 0.5732 ± 0.1124 | 0.0001 ± 0.0002
96.90 | PCA            | 0.9825 ± 0.0112 | 0.7284 ± 0.1679 | 0.9437 ± 0.0222 | 0.8239 ± 0.0694
6.46  | PCA            | 0.7010 ± 0.0278 | 0.0471 ± 0.0120 | 0.6995 ± 0.0091 | 0.0598 ± 0.0286

7. Empirical Estimation of Privacy Risk

In the privacy game defined in Definition 3.1, the data is randomly sampled from the distribution, which only captures the average-case privacy risk. It therefore cannot be used to reason about the minimal level of noise required to ensure a certain level of privacy, as it may underestimate the privacy risk in the worst case. To better understand the worst-case privacy risk, we provide a privacy auditing framework for empirically estimating the privacy leakage of a specific type of inference attack, namely attribute inference, by allowing the data to be chosen adversarially. We start with a formal definition of per-attribute privacy, following prior work(ahmed2016social; ghazi2022algorithms):

Definition 7.1.

Per-attribute DP. A randomized mechanism $\mathcal{M}$ is $(\varepsilon, \delta)$-per-attribute DP if, for all pairs of inputs $x, x'$ differing only in a single attribute and for all events $S$ defined on the output of $\mathcal{M}$, the following inequality holds:

$\mathbb{P}[\mathcal{M}(x) \in S] \leq e^{\varepsilon} \cdot \mathbb{P}[\mathcal{M}(x') \in S] + \delta.$

One can show that DP-SGD satisfies $(\varepsilon, \delta)$-per-attribute DP. However, it is hard to derive the privacy parameter analytically, as the per-attribute sensitivity of the gradient is not readily tractable and the common technique of gradient clipping only provides a very loose bound on sensitivity. Instead, we seek to obtain an empirical estimate of the per-attribute DP for each step through the following audit game.
