- Paper 1: Vincent et al (2010) Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. JMLR 2010 (see the sketch below)
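A minimal sketch of the denoising criterion in Paper 1, assuming a one-hidden-layer PyTorch autoencoder with Gaussian corruption (modules, dimensions, and noise type here are illustrative; the paper stacks several layers and mostly uses masking noise):

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """One-hidden-layer autoencoder trained to reconstruct clean inputs from corrupted ones."""
    def __init__(self, dim_in=784, dim_hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.ReLU())
        self.decoder = nn.Linear(dim_hidden, dim_in)

    def forward(self, x, noise_std=0.3):
        x_corrupt = x + noise_std * torch.randn_like(x)  # corrupt, then encode
        return self.decoder(self.encoder(x_corrupt))

model = DenoisingAutoencoder()
x = torch.rand(32, 784)                      # stand-in batch of flattened images
loss = nn.functional.mse_loss(model(x), x)   # target is the *clean* input
loss.backward()
```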
Analysis of SSL methods
- Paper 2: Kolesnikov, Zhai and Beyer (2019) Revisiting Self-Supervised Visual Representation Learning. CVPR 2019
- Paper 3: Zhai et al (2019) A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark (A GLUE-like benchmark for images). ArXiv 2019
- Paper 4: Asano et al (2019) A critical analysis of self-supervision, or what we can learn from a single image. ICLR 2020
Contrastive methods (see the sketch after this list)
- Paper 5: van den Oord et al. (2018) Representation Learning with Contrastive Predictive Coding (CPC). ArXiv 2018
- Paper 6: Hjelm et al. (2019) Learning deep representations by mutual information estimation and maximization (DIM). ICLR 2019
- Paper 7: Tian et al. (2019) Contrastive Multiview Coding (CMC). ArXiv 2019
- Paper 8: Hénaff et al. (2019) Data-Efficient Image Recognition with Contrastive Predictive Coding (CPC v2: improved CPC evaluated on limited labelled data). ArXiv 2019
- Paper 9: He et al (2020) Momentum Contrast for Unsupervised Visual Representation Learning (MoCo; see also MoCo v2). CVPR 2020
- Paper 10: Chen T et al (2020) A Simple Framework for Contrastive Learning of Visual Representations (SimCLR). ICML 2020
- Paper 11: Chen T et al (2020) Big Self-Supervised Models are Strong Semi-Supervised Learners (SimCLRv2). ArXiv 2020
- Paper 12: Caron et al (2020) Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (SwAV). ArXiv 2020
- Paper 13: Xiao et al (2020) What Should Not Be Contrastive in Contrastive Learning. ArXiv 2020
- Paper 14: Misra and van der Maaten (2020) Self-Supervised Learning of Pretext-Invariant Representations (PIRL). CVPR 2020
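Most of the papers above optimize a variant of the InfoNCE objective; a minimal sketch of the SimCLR-style NT-Xent loss (Paper 10), with batch construction and temperature kept deliberately simple (function and variable names are mine, not the paper's):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss: z1[i] and z2[i] are embeddings of two augmented views of image i."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    n = z1.shape[0]
    sim.fill_diagonal_(float('-inf'))                    # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)                 # the positive pair is the "class"

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)        # stand-in projector outputs
print(nt_xent(z1, z2))
```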
Generative methods
- Paper 15: Dumoulin et al (2017) Adversarially Learned Inference (ALI). ICLR 2017
- Paper 16: Donahue, Krähenbühl and Darrell (2017) Adversarial Feature Learning (BiGAN; concurrent with and similar to ALI). ICLR 2017
- Paper 17: Donahue and Simonyan (2019) Large Scale Adversarial Representation Learning (BigBiGAN). ArXiv 2019
- Paper 18: Chen et al (2020) Generative Pretraining from Pixels (iGPT). ICML 2020
BYOL: bootstrap your own latents (see the sketch after this list)
- Paper 19: Tarvainen and Valpola (2017) Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 2017
- Paper 20: Grill et al (2020) Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (BYOL). ArXiv 2020
- Paper 21: Fetterman and Albrecht (2020) Understanding self-supervised and contrastive learning with "Bootstrap Your Own Latent" (BYOL). Blog post
- Paper 22: Schwarzer and Anand et al. (2020) Data-Efficient Reinforcement Learning with Momentum Predictive Representations. ArXiv 2020
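The ingredient shared by Papers 19, 20 and 22 is a slow-moving "target" network updated as an exponential moving average (EMA) of the online network; a minimal sketch of that update (tau and the toy networks are illustrative; BYOL additionally trains a predictor on top of the online branch):

```python
import copy
import torch

@torch.no_grad()
def ema_update(online, target, tau=0.996):
    """Mean-teacher/BYOL-style update: target <- tau * target + (1 - tau) * online."""
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(tau).add_(p_o, alpha=1.0 - tau)

online = torch.nn.Linear(128, 64)
target = copy.deepcopy(online)   # target starts as a copy, then trails the online net
# ... after each optimizer step on the online network:
ema_update(online, target)
```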
Self-distillation methods (see the sketch after this list)
- Paper 23: Furlanello et al (2018) Born Again Neural Networks. ICML 2018
- Paper 24: Yang et al. (2019) Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students. AAAI 2019
- Paper 25: Ahn et al (2019) Variational information distillation for knowledge transfer. CVPR 2019
- Paper 26: Zhang et al (2019) Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation. ICCV 2019
- Paper 27: Müller et al (2019) When Does Label Smoothing Help? NeurIPS 2019
- Paper 28: Yuan et al. (2020) Revisiting Knowledge Distillation via Label Smoothing Regularization. CVPR 2020
- Paper 29: Zhang and Sabuncu (2020) Self-Distillation as Instance-Specific Label Smoothing. ArXiv 2020
- Paper 30: Mobahi et al. (2020) Self-Distillation Amplifies Regularization in Hilbert Space. ArXiv 2020
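Papers 27-29 analyze distillation through the lens of label smoothing; a minimal sketch contrasting the two target distributions (temperature, smoothing rate, and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL between softened teacher and student distributions (Hinton-style)."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * T * T

def label_smoothing_targets(labels, num_classes, eps=0.1):
    """Uniform-smoothed targets: like distilling from a maximally uninformative teacher."""
    one_hot = F.one_hot(labels, num_classes).float()
    return one_hot * (1 - eps) + eps / num_classes

logits_s, logits_t = torch.randn(4, 10), torch.randn(4, 10)
print(distillation_loss(logits_s, logits_t))
print(label_smoothing_targets(torch.tensor([3, 1, 0, 9]), 10))
```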
Self-training / pseudo-labeling methods (see the sketch after this list)
- Paper 31: Xie et al (2020) Self-training with Noisy Student improves ImageNet classification. CVPR 2020
- Paper 32: Sohn and Berthelot et al. (2020) FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. ArXiv 2020
- Paper 33: Chen et al. (2020) Self-training Avoids Using Spurious Features Under Domain Shift. ArXiv 2020
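A minimal sketch of the confidence-thresholded pseudo-labeling step at the core of Papers 31 and 32 (the weak/strong augmentation pair and the threshold follow FixMatch in spirit; all names and the stand-in classifier are illustrative):

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, x_weak, x_strong, threshold=0.95):
    """Use confident predictions on a weakly augmented view as labels for a strong view."""
    with torch.no_grad():
        probs = F.softmax(model(x_weak), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf.ge(threshold).float()          # keep only confident pseudo-labels
    loss = F.cross_entropy(model(x_strong), pseudo, reduction='none')
    return (loss * mask).mean()

model = torch.nn.Linear(32, 10)                    # stand-in classifier
x_w, x_s = torch.randn(16, 32), torch.randn(16, 32)
print(pseudo_label_loss(model, x_w, x_s))
```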
Iterated learning/emergence of compositional structure
- Paper 34: Ren et al. (2020) Compositional languages emerge in a neural iterated learning model. ICLR 2020
- Paper 35: Guo et al (2019) The emergence of compositional languages for numeric concepts through iterated learning in neural agents. ArXiv 2019
- Paper 36: Cogswell et al. (2019) Emergence of Compositional Language with Deep Generational Transmission. ArXiv 2019
- Paper 37: Kharitonov and Baroni (2020) Emergent Language Generalization and Acquisition Speed are not tied to Compositionality. ArXiv 2020
NLP (see the sketch after this list)
- Paper 38: Peters et al (2018) Deep contextualized word representations (ELMo). NAACL 2018
- Paper 39: Devlin et al (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019
- Paper 40: Brown et al (2020) Language Models are Few-Shot Learners (GPT-3; see also GPT-1 and GPT-2 for more context). ArXiv 2020
- Paper 41: Clark et al (2020) ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. ICLR 2020
- Paper 42: He and Gu et al. (2020) Revisiting Self-Training for Neural Sequence Generation (unsupervised NMT). ICLR 2020
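The pretraining objectives in this section differ per paper, but BERT's masked-LM target (Paper 39) is the common reference point; a minimal sketch of the masking step (token ids and mask rate are assumptions, and the always-[MASK] rule is a simplification: BERT itself keeps or randomizes 20% of selected tokens):

```python
import torch

def mask_tokens(input_ids, mask_token_id=103, mask_prob=0.15):
    """BERT-style masked-LM inputs: the model must recover the ids at masked positions."""
    masked = torch.rand(input_ids.shape) < mask_prob
    labels = input_ids.clone()
    labels[~masked] = -100                    # ignored by cross_entropy(ignore_index=-100)
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id         # simplified: always replace with [MASK]
    return corrupted, labels

ids = torch.randint(1000, 2000, (2, 16))      # stand-in token ids
corrupted, labels = mask_tokens(ids)
```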
Video/multi-modal data (see the sketch after this list)
- Paper 43: Wang and Gupta (2015) Unsupervised Learning of Visual Representations using Videos. ICCV 2015
- Paper 44: Misra, Zitnick and Hebert (2016) Shuffle and Learn: Unsupervised Learning using Temporal Order Verification. ECCV 2016
- Paper 45: Lu et al (2019) ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. NeurIPS 2019
- Paper 46: Hjelm and Bachman (2020) Representation Learning with Video Deep InfoMax (VDIM). ArXiv 2020
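A minimal sketch of the temporal order verification pretext task from Paper 44 (Shuffle and Learn): encode a frame triplet and classify whether it is in the correct order (the encoder, dimensions, and labels are illustrative; in practice ordered and shuffled triplets are mixed in each batch):

```python
import torch
import torch.nn as nn

encoder = nn.Linear(128, 64)                   # stand-in per-frame encoder
classifier = nn.Linear(3 * 64, 2)              # in-order vs shuffled

def order_logits(f1, f2, f3):
    """Shuffle-and-Learn: predict whether a frame triplet is temporally ordered."""
    feats = torch.cat([encoder(f1), encoder(f2), encoder(f3)], dim=1)
    return classifier(feats)

frames = torch.randn(3, 8, 128)                # 3 frames x batch of 8 clips
logits = order_logits(*frames)
labels = torch.ones(8, dtype=torch.long)       # 1 = correct order (0 after shuffling)
loss = nn.functional.cross_entropy(logits, labels)
```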
The role of noise in representation learning
- Paper 47: Bachman, Alsharif and Precup (2014) Learning with Pseudo-Ensembles. NeurIPS 2014
- Paper 48: Bojanowski and Joulin (2017) Unsupervised Learning by Predicting Noise. ICML 2017
SSL for RL, control and planning (see the sketch after this list)
- Paper 49: Pathak et al. (2017) Curiosity-driven Exploration by Self-supervised Prediction (see also a large-scale follow-up). ICML 2017
- Paper 50: Aytar et al. (2018) Playing hard exploration games by watching YouTube (TDC). NeurIPS 2018
- Paper 51: Anand et al. (2019) Unsupervised State Representation Learning in Atari (ST-DIM). NeurIPS 2019
- Paper 52: Sekar and Rybkin et al. (2020) Planning to Explore via Self-Supervised World Models. ICML 2020
- Paper 53: Schwarzer and Anand et al. (2020) Data-Efficient Reinforcement Learning with Momentum Predictive Representations. ArXiv 2020
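A minimal sketch of the curiosity bonus in Paper 49: the intrinsic reward is the forward model's prediction error in a learned feature space (the linear stand-ins and the action encoding are illustrative; the paper also trains an inverse model to shape the features):

```python
import torch
import torch.nn as nn

encoder = nn.Linear(64, 32)                        # phi: observation -> features
forward_model = nn.Linear(32 + 4, 32)              # predicts phi(s') from phi(s) and action

def intrinsic_reward(s, a, s_next):
    """Curiosity reward = error in predicting next-state features."""
    phi, phi_next = encoder(s), encoder(s_next)
    phi_pred = forward_model(torch.cat([phi, a], dim=1))
    return 0.5 * (phi_pred - phi_next).pow(2).sum(dim=1)   # per-transition bonus

s, s_next = torch.randn(8, 64), torch.randn(8, 64)
a = torch.randn(8, 4)                              # one-hot or continuous action encoding
print(intrinsic_reward(s, a, s_next))
```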
SSL theory
- Paper 54: Arora et al (2019) A Theoretical Analysis of Contrastive Unsupervised Representation Learning. ICML 2019
- Paper 55: Lee et al (2020) Predicting What You Already Know Helps: Provable Self-Supervised Learning. ArXiv 2020
- Paper 56: Tschannen et al (2019) On mutual information maximization for representation learning. ArXiv 2019
Unsupervised domain adaptation (see the sketch after this list)
- Paper 57: Shu et al (2018) A DIRT-T Approach to Unsupervised Domain Adaptation. ICLR 2018
- Paper 58: Wilson and Cook (2019) A Survey of Unsupervised Deep Domain Adaptation. ACM Transactions on Intelligent Systems and Technology 2020.
- Paper 59: Mao et al. (2019) Virtual Mixup Training for Unsupervised Domain Adaptation. CVPR 2019
- Paper 60: Vu et al. (2018) ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation. CVPR 2019
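Papers 57 and 60 both include an entropy-minimization term on unlabeled target-domain predictions; a minimal sketch of that loss term (shapes and names are illustrative):

```python
import torch
import torch.nn.functional as F

def entropy_loss(logits):
    """Mean Shannon entropy of the predictions; minimizing it pushes decision
    boundaries away from unlabeled target data (the 'ENT' term in DIRT-T / ADVENT)."""
    p = F.softmax(logits, dim=1)
    return -(p * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

logits = torch.randn(16, 10)          # classifier outputs on an unlabeled target batch
print(entropy_loss(logits))
```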
Scaling (see the sketch below)
- Paper 61: Kaplan et al (2020) Scaling Laws for Neural Language Models. ArXiv 2020
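The headline object in Paper 61 is a power-law fit of loss against scale; a minimal sketch of the parameter-count law L(N) = (N_c / N)^(alpha_N), using the paper's reported constants for non-embedding parameters (treat them as approximate):

```python
def loss_vs_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Kaplan et al.'s fit L(N) = (N_c / N) ** alpha_N for test loss vs
    non-embedding parameter count, with data and compute unconstrained."""
    return (n_c / n_params) ** alpha_n

for n in (1e6, 1e9, 1e12):
    print(f"N = {n:.0e}: predicted loss ~ {loss_vs_params(n):.2f}")
```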