**Adversarial Attacks**

In progress

With recent technological advances, deep neural networks (DNNs) have spread to numerous applications, ranging from biomedical imaging to the design of autonomous vehicles. Their success relies largely on the increasingly large datasets becoming available, on their high expressiveness and on their empirical performance in various tasks (e.g., computer vision, natural language processing or speech recognition). However, their high representation power is also a weakness that an adversary might exploit to craft adversarial attacks, which could lead the DNN model to take unwanted actions. More precisely, adversarial attacks are almost imperceptible transformations that turn an example correctly classified by a DNN into a new example, called adversarial, which is misclassified.

## 1. Quick reminder about DNN-based classification

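Given some trained DNN \(f\colon\mathcal{X}\to\mathbb{R}^c\), whose \(i\)-th output \(f_i(x)\) scores class \(i\) among \(c\) classes, the label predicted by \(f\) for any input \(x\in\mathcal{X}\subseteq\mathbb{R}^P\) is denoted as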

\[C_f(x) = \underset{i\in\{1,\ldots,c\}}{\mathrm{argmax}}\, f_i(x).\]

## 2. Span of adversarial attacks

There exist multiple definitions of adversarial examples, depending on whether or not we require the adversarial example to be assigned a specific target label by the DNN \(f\).

**Untargeted and targeted attacks.**
Given some valid input instance \(x\in\mathcal{X}\), the adversarial example \(a\) is said to be

- *Untargeted* if \(C_f(a)\neq C_f(x)\).
- *Targeted*, with target \(t\neq C_f(x)\), if \(C_f(a)=t\).
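The two definitions translate directly into checks on predicted labels. The sketch below is purely illustrative: `toy_model` is a hypothetical three-class scorer standing in for a trained DNN, and the helper names are assumptions rather than functions from any library.

```python
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], Sequence[float]]


def predicted_label(f: Model, x: Sequence[float]) -> int:
    """C_f(x): index of the largest score returned by f."""
    scores = f(x)
    return max(range(len(scores)), key=lambda i: scores[i])


def is_untargeted_adversarial(f: Model, x, a) -> bool:
    """True if the prediction for a differs from the prediction for x."""
    return predicted_label(f, a) != predicted_label(f, x)


def is_targeted_adversarial(f: Model, x, a, t: int) -> bool:
    """True if a is classified as the target t, with t different from C_f(x)."""
    if t == predicted_label(f, x):
        raise ValueError("the target must differ from C_f(x)")
    return predicted_label(f, a) == t


def toy_model(x):
    """Hypothetical 3-class scorer on 2-dimensional inputs (not a real DNN)."""
    return [x[0], x[1], -x[0] - x[1]]


x = [1.0, 0.2]  # classified as class 0
a = [0.4, 0.6]  # classified as class 1

print(is_untargeted_adversarial(toy_model, x, a))
print(is_targeted_adversarial(toy_model, x, a, t=1))
print(is_targeted_adversarial(toy_model, x, a, t=2))
```

Any targeted adversarial example is also an untargeted one, while the converse does not hold in general.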

In addition, there exist two main ways of crafting such adversarial examples.

**Perturbation and functional attacks.**
Given some valid input instance \(x\in\mathcal{X}\), the adversarial example \(a\) is said to be

- *Perturbation-based* if \(a=x+\varepsilon\) for some perturbation \(\varepsilon\) small enough.
- *Functional-based* if \(a=h(x)\) where \(h\) models a small degradation operator.

The key question then becomes how much distortion we must add to change the classification. The appropriate distance metric differs from one domain to another. In the space of images, many works suggest that \(\ell_p\) norms are reasonable approximations of human perceptual distance. Most common methods focus on perturbation-based attacks, so in what follows we solely consider this type of attack. Finally, we distinguish between per-instance and universal perturbations.
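To make the notion of distortion concrete, here is a small sketch that measures a perturbation with a few \(\ell_p\) norms. The function name `lp_norm` and the example values are assumptions chosen for illustration; the \(p=0\) case counts modified entries, which is a common convention in the attack literature rather than a true norm.

```python
def lp_norm(eps, p):
    """l_p size of a perturbation given as a flat sequence of floats."""
    if p == float("inf"):
        return max(abs(e) for e in eps)    # largest change on any entry
    if p == 0:
        return float(sum(e != 0 for e in eps))  # number of modified entries
    return sum(abs(e) ** p for e in eps) ** (1.0 / p)


# Hypothetical perturbation of a 4-pixel image: only two pixels are modified.
eps = [0.03, -0.04, 0.0, 0.0]

for p in (0, 1, 2, float("inf")):
    print(p, lp_norm(eps, p))
```

The same perturbation can look small under one norm and large under another, which is why each attack states the norm it controls.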

**Per-instance and universal attacks.**
Given some data distribution \(\mu\) on \(\mathcal{X}\), we consider two types of perturbations, namely

- *Per-instance* if for every \(x\sim\mu\) there exists \(\varepsilon(x)\) such that \(a=x+\varepsilon(x)\) is an adversarial example.
- *Universal* if there exists \(\varepsilon\) such that for every \(x\sim\mu\), \(a=x+\varepsilon\) is an adversarial example.

## 3. Per-instance attacks

**L-BFGS** [Szegedy et al., 2014]. This work was the first to notice the existence of adversarial examples for image classification. Given some adversarial target \(t\neq C_f(x)\), it solves

\[\underset{\varepsilon\in\mathbb{R}^P}{\mathrm{minimize}}\; \lambda\|\varepsilon\|_2 + H(f(x+\varepsilon),t) \quad\text{subject to}\quad x+\varepsilon\in\mathcal{X},\]

where \(H\) denotes the classification loss (e.g., the cross-entropy) and the regularization parameter \(\lambda>0\) is determined by line search in order to ensure that \(C_f(x+\varepsilon)=t\). The authors considered the case where \(\mathcal{X}=[0,1]^P\), so that the constraint forces the \(P\) pixels to lie inside a box. They also advocated a box-constrained L-BFGS solver, which gave its name to this crafting technique.

**FGSM** [Goodfellow et al., 2015]. The *Fast Gradient Sign Method* is one of the first effective techniques to craft an adversarial perturbation. The underlying idea is to perform a single step of size \(\delta>0\) in the direction given by the sign of the gradient of the training loss \(H\) with respect to the input image \(x\) with true label \(y\), i.e.,

\[a = x + \delta\,\mathrm{sign}\big(\nabla_x H(f(x),y)\big).\]
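As a rough illustration of this single signed-gradient step, the sketch below applies it to a linear softmax classifier, for which the input gradient of the cross-entropy loss has a closed form. Everything here is an assumption made for the example: the model `W x + b` stands in for a trained DNN, cross-entropy is taken as the training loss, and the final clipping to \([0,1]^P\) assumes pixel intensities in that range.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def loss_grad_wrt_input(W, b, x, y):
    """Gradient of cross-entropy(softmax(W x + b), y) with respect to x."""
    p = softmax(W @ x + b)
    p[y] -= 1.0  # softmax probabilities minus the one-hot label
    return W.T @ p


def fgsm(W, b, x, y, delta):
    """One step of size delta along the sign of the input gradient."""
    a = x + delta * np.sign(loss_grad_wrt_input(W, b, x, y))
    return np.clip(a, 0.0, 1.0)  # assumption: valid inputs live in [0, 1]^P


# Hypothetical 2-class linear model on 2 "pixels".
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.zeros(2)
x = np.array([0.6, 0.4])
y = 0
delta = 0.15

a = fgsm(W, b, x, y, delta)
print("clean label:", int(np.argmax(W @ x + b)))
print("adversarial label:", int(np.argmax(W @ a + b)))
```

Because only the sign of the gradient is used, every entry of the input moves by at most \(\delta\), so the perturbation is bounded in \(\ell_\infty\) norm by construction.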

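**IFGSM** [Kurakin et al., 2017]. This technique is a multi-step iterative variant of FGSM where the adversarial example \(a^{(k)}=x+\varepsilon^{(k)}\) is updated until it fools the DNN. More formally, starting from \(\varepsilon^{(0)}=0\), it reads

\[\varepsilon^{(k+1)} = \mathrm{Proj}_{\mathcal{B}}\Big(\varepsilon^{(k)} + \delta\,\mathrm{sign}\big(\nabla_x H(f(x+\varepsilon^{(k)}),y)\big)\Big),\]

where \(\mathcal{B}\) denotes the space of allowed perturbations, \(\mathrm{Proj}_{\mathcal{B}}\) the projection onto it, \(\delta>0\) the step size, \(H\) the training loss and \(y\) the true label of \(x\).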

**PGD** [Madry et al., 2018]. The same idea was pursued by other authors, who termed the method *PGD* since it boils down to a *Projected Gradient Descent* algorithm. The only difference lies in the initial point: while IFGSM starts from \(x\), PGD starts from a point randomly sampled in a ball centered at \(x\).
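The iterative scheme and its two initializations can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' code: the model is a hypothetical linear softmax classifier, the loss is cross-entropy, the allowed set \(\mathcal{B}\) is taken to be an \(\ell_\infty\) ball of radius `radius` (so the projection is a clip), and the function and parameter names are illustrative.

```python
import numpy as np


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def loss_grad_wrt_input(W, b, x, y):
    """Gradient of cross-entropy(softmax(W x + b), y) with respect to x."""
    p = softmax(W @ x + b)
    p[y] -= 1.0
    return W.T @ p


def iterative_attack(W, b, x, y, radius, step, n_steps,
                     random_start=False, rng=None):
    """IFGSM when random_start is False, PGD-style random start otherwise."""
    eps = np.zeros_like(x)
    if random_start:
        rng = np.random.default_rng() if rng is None else rng
        eps = rng.uniform(-radius, radius, size=x.shape)
    for _ in range(n_steps):
        if int(np.argmax(W @ (x + eps) + b)) != y:
            break  # the current iterate already fools the classifier
        eps = eps + step * np.sign(loss_grad_wrt_input(W, b, x + eps, y))
        eps = np.clip(eps, -radius, radius)  # projection onto the l_inf ball
    return x + eps


# Hypothetical 2-class linear model on 2 "pixels".
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.zeros(2)
x = np.array([0.62, 0.38])
y = 0

a_ifgsm = iterative_attack(W, b, x, y, radius=0.15, step=0.05, n_steps=10)
a_pgd = iterative_attack(W, b, x, y, radius=0.15, step=0.05, n_steps=10,
                         random_start=True, rng=np.random.default_rng(0))
print(int(np.argmax(W @ a_ifgsm + b)), int(np.argmax(W @ a_pgd + b)))
```

With other choices of \(\mathcal{B}\), such as an \(\ell_2\) ball, only the projection line changes.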

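**DeepFool** [Moosavi-Dezfooli et al., 2016]. A more elaborate, yet similar, approach consists in finding the adversarial perturbation \(\varepsilon(x)\) as the solution of the following optimization problem

\[\underset{\varepsilon}{\mathrm{minimize}}\;\|\varepsilon\|_2 \quad\text{subject to}\quad C_f(x+\varepsilon)\neq C_f(x).\]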
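For an affine classifier the smallest \(\ell_2\) perturbation that changes the label has a closed form: it is the orthogonal projection of \(x\) onto the nearest decision boundary. The sketch below implements only this affine special case, as an illustration of the idea; DeepFool applies such a step repeatedly to a linearization of a general DNN. The function name, the toy model and the small `overshoot` factor used to actually cross the boundary are assumptions of this example.

```python
import numpy as np


def deepfool_affine(W, b, x, overshoot=0.02):
    """Minimal l2 perturbation flipping the label of the affine model W x + b.

    Assumes the rows of W are pairwise distinct, so every boundary is a
    proper hyperplane.
    """
    scores = W @ x + b
    k = int(np.argmax(scores))
    best_eps, best_dist = None, np.inf
    for i in range(W.shape[0]):
        if i == k:
            continue
        w = W[i] - W[k]                 # normal of the boundary between k and i
        gap = scores[k] - scores[i]     # nonnegative since k is the argmax
        dist = gap / np.linalg.norm(w)  # distance from x to that boundary
        if dist < best_dist:
            best_dist = dist
            best_eps = gap / (w @ w) * w  # projection of x onto the boundary
    return (1.0 + overshoot) * best_eps   # step slightly past the boundary


# Hypothetical 3-class affine model on 2-dimensional inputs.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([1.0, 0.2])

eps = deepfool_affine(W, b, x)
print("perturbation:", eps)
print("labels:", int(np.argmax(W @ x + b)), int(np.argmax(W @ (x + eps) + b)))
```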

**CW** [Carlini and Wagner, 2017]. Carlini and Wagner pursue an idea similar to DeepFool, but treat the fooling requirement as a regularization instead of a constraint, i.e.,

\[\underset{\varepsilon}{\mathrm{minimize}}\;\|\varepsilon\|_p + \lambda\, g(x+\varepsilon) \quad\text{subject to}\quad x+\varepsilon\in\mathcal{X},\]

where the first term penalizes the \(\ell_p\)-norm of the added perturbation while the second term, weighted by \(\lambda>0\), enforces the fooling of the DNN classifier \(f\) by means of the function \(g\).

**LogBarrier** [Finlay et al., 2019]. Let \(k=C_f(x)\) be the label predicted for \(x\) by the DNN \(f\). If \(f\) is well trained, then \(k\) should correspond to the true label \(y\). Thus, a necessary and sufficient condition for \(x+\varepsilon\) to be a misclassified adversarial example is to have \(\max_{i\neq k} f_i(x+\varepsilon) - f_k(x+\varepsilon)>0\) with \(\varepsilon\) small. On the one hand, a small perturbation \(\varepsilon\) can be found by minimizing a criterion \(\ell\). On the other hand, the misclassification constraint can be enforced through a negative logarithm penalty (i.e., a logarithmic barrier) weighted by a regularization parameter \(\lambda>0\). The resulting problem reads

\[\underset{\varepsilon}{\mathrm{minimize}}\;\ell(\varepsilon) - \lambda\log\Big(\max_{i\neq k} f_i(x+\varepsilon) - f_k(x+\varepsilon)\Big).\]
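The behaviour of the barrier is easy to see by evaluating the objective on a few candidate perturbations. In this sketch the criterion \(\ell\) is assumed to be an \(\ell_p\) norm, the function takes the scores \(f(x+\varepsilon)\) directly instead of a model, and all names and numbers are illustrative.

```python
import math


def logbarrier_objective(scores_adv, k, eps, lam, p=2):
    """Perturbation size minus lam * log(misclassification margin).

    scores_adv holds the scores f(x + eps); k is the label predicted for x.
    """
    margin = max(s for i, s in enumerate(scores_adv) if i != k) - scores_adv[k]
    if margin <= 0:
        return math.inf  # x + eps is not misclassified: the barrier is infinite
    size = sum(abs(e) ** p for e in eps) ** (1.0 / p)  # assumption: l is an l_p norm
    return size - lam * math.log(margin)


# Hypothetical scores after perturbation, with original prediction k = 0.
feasible = logbarrier_objective([0.2, 0.7, -0.9], k=0, eps=[0.3, -0.4], lam=0.1)
infeasible = logbarrier_objective([0.9, 0.7, -0.9], k=0, eps=[0.1, 0.0], lam=0.1)
print(feasible, infeasible)
```

The objective is finite only on misclassified points and blows up as the margin approaches zero, so an iterative solver started from a misclassified point stays on the adversarial side of the decision boundary.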

## 4. Universal and semi-universal attacks

**UAP** [Moosavi-Dezfooli et al., 2017]. This work seeks a *Universal Adversarial Perturbation* that fools the classifier on almost all training points. To do so, the authors designed an algorithm whose inner loop applies DeepFool to each training instance.

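**UAP-v2** [Shafahi et al., 2020]. This method frames the crafting of universal perturbations as an optimization problem, i.e.,

\[\underset{\varepsilon\in\mathcal{B}}{\mathrm{maximize}}\;\frac{1}{N}\sum_{i=1}^{N} H(f(x_i+\varepsilon),y_i),\]

where \(\{(x_i,y_i)\}_{i=1}^N\) are the training points with their labels, \(H\) is a classification loss and \(\mathcal{B}\) is the set of allowed perturbations (e.g., an \(\ell_p\) ball).

Contrary to the original UAP, it benefits from more efficient solvers since it can be solved using gradient-ascent-based methods.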
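A gradient-ascent solver for a universal perturbation can be sketched as follows. This is an illustrative sketch under several assumptions, not the method of the cited paper: the model is a hypothetical linear softmax classifier, the loss is plain cross-entropy averaged over the training points, the allowed set is an \(\ell_\infty\) ball handled by clipping, and the ascent uses signed-gradient steps.

```python
import numpy as np


def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def universal_perturbation(W, b, X, y, radius, step, n_steps):
    """Signed gradient ascent on the mean cross-entropy over all rows of X."""
    eps = np.zeros(X.shape[1])
    onehot = np.eye(W.shape[0])[y]
    for _ in range(n_steps):
        P = softmax_rows((X + eps) @ W.T + b)
        grad = ((P - onehot) @ W).mean(axis=0)  # gradient of the mean loss w.r.t. eps
        eps = np.clip(eps + step * np.sign(grad), -radius, radius)
    return eps


def fooling_rate(W, b, X, eps):
    """Fraction of points whose predicted label changes when eps is added."""
    clean = np.argmax(X @ W.T + b, axis=1)
    adv = np.argmax((X + eps) @ W.T + b, axis=1)
    return float(np.mean(clean != adv))


# Hypothetical 2-class linear model and three training points of class 0.
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.zeros(2)
X = np.array([[0.62, 0.38], [0.70, 0.30], [0.55, 0.45]])
y = np.array([0, 0, 0])

eps = universal_perturbation(W, b, X, y, radius=0.15, step=0.05, n_steps=5)
print("universal perturbation:", eps)
print("fooling rate:", fooling_rate(W, b, X, eps))
```

In this toy run the single perturbation fools two of the three points: the remaining one lies too far from the decision boundary for the chosen radius, which illustrates why universal attacks are assessed by their fooling rate rather than by success on every input.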

**ADiL** [Frecon et al., 2021]. Contrary to all aforementioned methods, this work is semi-universal as it crafts each adversarial example as \(a(x_i)= x_i + \varepsilon(x_i)\) with \(\varepsilon(x_i)=D v_i\) where \(D\) is a universal dictionary while \(v_i\) is a per-instance coding vector. Given some adversarial targets \(\{t_1,\ldots,t_N\}\), it solves

where \(\mathcal{C}\) encodes some constraints on \(D\) while \(\lambda_1>0\) and \(\lambda_2>0\) are regularization parameters.

## References

- **[Carlini and Wagner, 2017]** N. Carlini and D. Wagner. "Towards Evaluating the Robustness of Neural Networks". IEEE Symposium on Security and Privacy (2017).
- **[Finlay et al., 2019]** C. Finlay, A.-A. Pooladian and A. Oberman. "The logbarrier adversarial attack - making effective use of decision boundary information". Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019).
- **[Frecon et al., 2021]** J. Frecon, L. Anquetil, G. Gasso and S. Canu. "Adversarial Dictionary Learning". Conférence sur l'Apprentissage Automatique (CAp) (2021).
- **[Goodfellow et al., 2015]** I. Goodfellow, J. Shlens and C. Szegedy. "Explaining and Harnessing Adversarial Examples". International Conference on Learning Representations (ICLR) (2015).
- **[Kurakin et al., 2017]** A. Kurakin, I. Goodfellow and S. Bengio. "Adversarial examples in the physical world". International Conference on Learning Representations (ICLR) - Workshop Track (2017).
- **[Madry et al., 2018]** A. Madry, A. Makelov, L. Schmidt, D. Tsipras and A. Vladu. "Towards Deep Learning Models Resistant to Adversarial Attacks". International Conference on Learning Representations (ICLR) (2018).
- **[Moosavi-Dezfooli et al., 2016]** S.-M. Moosavi-Dezfooli, A. Fawzi and P. Frossard. "Deepfool - A Simple and Accurate Method to Fool Deep Neural Networks". IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
- **[Moosavi-Dezfooli et al., 2017]** S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi and P. Frossard. "Universal Adversarial Perturbations". IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).
- **[Shafahi et al., 2020]** A. Shafahi, M. Najibi, Z. Xu, J. Dickerson, L.S. Davis and T. Goldstein. "Universal Adversarial Training". Proceedings of the AAAI Conference on Artificial Intelligence (2020).
- **[Szegedy et al., 2014]** C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow and R. Fergus. "Intriguing properties of neural networks". International Conference on Learning Representations (ICLR) (2014).