Abstract:
Many fundamental properties of neural networks are still not well
understood. This talk studies two of these from an adversarial perspective.
I begin with my main line of research and examine the apparently fundamental
susceptibility of neural networks to adversarial examples. I develop effective
algorithms for generating adversarial examples and find that most training
regimes are ineffective at increasing robustness. I then briefly examine
neural network memorization and demonstrate that training
data can be efficiently extracted from a trained model given only black-box
access to that model. I conclude with directions for future research.
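
For readers unfamiliar with the area, the sketch below illustrates the general
flavor of gradient-based adversarial example generation using the standard fast
gradient sign method (Goodfellow et al., 2015). It is a generic illustration,
not the specific algorithm developed in this work; the PyTorch model, the input
batch x (assumed to lie in [0, 1]), the labels y, and the perturbation budget
eps are all placeholder assumptions.

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, eps):
        # One-step fast gradient sign method: perturb x by eps in the
        # direction that increases the classification loss, producing a
        # candidate adversarial example.
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Step in the sign of the input gradient, then clamp back to the
        # valid pixel range.
        x_adv = x + eps * x.grad.sign()
        return x_adv.clamp(0, 1).detach()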