CS103 - Topics in Representation Learning, Information Theory and Control
Caltech, Winter Term 2019
Location: Friday 9-11am in 106 Annenberg
Instructor: Alessandro Achille (achille@cs.ucla.edu)
Course description
Prerequisites: Background in machine learning, probability, and information theory
Summary: This course will cover current topics in information theory, deep representation learning, and control. We will use an information-theoretic formalization to study properties of the representations learned by deep networks, with a focus on invariance (e.g., to translations and shape variability) and compositionality of the representation. We will then see how these representations can be learned, through implicit or explicit biases, by deep networks, and exploited to perform efficient exploration and control.
Course Details
- First meeting: Friday Jan 11 at 9am in 106 Annenberg
- Lectures on Friday 9-11am in 106 Annenberg
- Office hours: Friday 11am-1pm (after class), or by appointment with the instructor
- Piazza: piazza.com/caltech/winter2019/cs103
- Grading: 70% homework assignments, 20% final review of a paper, 10% participation in class and on Piazza
Instructor
Alessandro Achille (achille@cs.ucla.edu)
Schedule
Weeks | Topics |
---|---|
1 | Introduction: Overview of the class, embodied intelligence (sensing, cognition, action), role of representations |
2-3 | Invariant representations: overview of invariant representations, different formalizations (group invariance/equivariance, contractive representations, statistical independence), deep convolutional representations |
4 | Task-relevant information and information-theoretic formalization of invariance: review of information theory, rate-distortion theory, Information Bottleneck (objective sketched below the schedule), Actionable Information, minimality and invariance |
5-6 | Learning invariant representations: MDL principle, Variational Auto-Encoders, stochastic optimization, SGD as Variational Inference, Kramers' rate |
7 | Life-long learning of compositional representations: Compositionality and disentanglement, current formalizations (total correlation, linear action, causality), β-VAEs |
8-9 | Information and Actions: Variational Inference in Control/Reinforcement Learning, Visual Turing Test, information theoretic duality for control |
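For reference, a sketch of the two objectives underlying the week 4 and weeks 5-7 topics, in their standard form from the cited literature (Tishby et al. for the Information Bottleneck, Higgins et al. for the β-VAE); the notation here is the usual one from those papers, not course-specific:

```latex
% Information Bottleneck Lagrangian: choose a stochastic encoder p(t|x)
% that compresses the input X while preserving information about the
% task variable Y; beta trades off compression against task relevance.
\min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y)

% beta-VAE objective: a variational relative of the same trade-off;
% beta > 1 strengthens the compression/disentanglement pressure on the
% latent representation z.
\max_{\theta, \phi} \; \mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big]
  - \beta \, \mathrm{KL}\!\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```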
Lectures
Lecture slides will be posted here, together with links to suggested readings for the class.
- Lecture 1: Slides. Readings: Steps Towards a Theory of Visual Information, Section 1
- Lecture 2: Slides, Notes. Readings: On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups, Spatial Transformer Networks
- Lecture 3: Slides. Readings: Rich feature hierarchies for accurate object detection and semantic segmentation, Object Recognition from Local Scale-Invariant Features
- Lecture 4: Slides. Readings: Elements of Information Theory, T. Cover and J. Thomas, Chapter 2, The information bottleneck method
- Lecture 5: Slides. Readings: Emergence of Invariance and Disentanglement in Deep Representations, Sections 2-3
- Lecture 6: Slides. Readings: Understanding disentangling in β-VAE
- Lecture 7: Slides. Readings: Kolmogorov's Structure Functions and Model Selection, Emergence of Invariance and Disentanglement in Deep Representations
- Lecture 8: Slides. Readings: A PAC-Bayesian Tutorial with A Dropout Bound, Critical Learning Periods in Deep Neural Networks
- Lecture 9: ...
Assignments
Please submit your assignments by email (achille@cs.ucla.edu) using the subject "CS103 - Assignment {N}".
- Assignment 1: link. Due: Friday February 1. Data for coding assignment: version 1, version 2
- Assignment 2: link. Due: Monday February 25
- Paper review: Write a short review of one paper on the course topics (see also Suggested Readings). Due: Friday March 15
Write a short review (up to 1 page) of one paper of your choice on one of the topics covered in class. The review should follow the format of reviews for machine learning or computer vision conferences (for example, see the CVPR reviewer guidelines). In particular, focus on summarizing the main idea of the paper, explaining how it relates to previous literature, and identifying the strengths and shortcomings of the paper.