Table of Contents
- 1. Bayesian Learning vs Deep Learning
- 2. Papers
1 Bayesian Learning vs Deep Learning
- Bridge the gap between human learning and deep learning
- Author gives a disclaimer that the talk is biased toward his own work
- Author paper - "Learning-Algorithms from Bayesian Principles"
- Author site
- Author Github
1.1 BL models
- Gaussian processes (GPs), Bayesian networks, probabilistic graphical models (PGMs)
- Trained by updating a prior into a posterior
- Can estimate uncertainty
- Sequential active online learning (lifelong learning)
- Integration (global) allows for uncertainty quantification
1.2 DL Models
- Frequentist
- Stochastic, scalable training
- Better with larger, more complex models
- Differentiation (local), i.e., black-box training
1.3 Bayesian Principles
- Sample -> prior
- Score -> likelihood
- Normalize -> posterior = (likelihood x prior) / evidence
- Bayesian learning can account for changes in the data over time that typical batch-style learning cannot (toy sketch of the loop below)
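A minimal sketch of the sample/score/normalize loop on a coin-flip model; the grid of candidate biases and the observed flips are illustrative assumptions, not from the talk:

```python
import numpy as np

# Sample -> prior: a grid of candidate coin biases with a uniform prior.
theta = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(theta) / theta.size

# Score -> likelihood: how well each candidate explains the observed flips.
flips = [1, 0, 1, 1, 0, 1]  # illustrative data: 1 = heads, 0 = tails
likelihood = np.prod([theta if f else 1.0 - theta for f in flips], axis=0)

# Normalize -> posterior = (likelihood x prior) / evidence.
posterior = likelihood * prior
posterior /= posterior.sum()

# For streaming data, today's posterior becomes tomorrow's prior,
# which is what enables the sequential/online updates mentioned above.
print("posterior mean bias:", (theta * posterior).sum())
```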
1.4 Bringing the two worlds together
- Replace the loss with its expectation under a distribution over the parameters
- Second-order methods can be derived from Bayes by choosing a multivariate Gaussian
- Showed how gradient descent and Newton's method fall out of Bayes (schematic derivation below)
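Schematically (a simplified reading of the author's paper, with step sizes and approximations omitted), the objective is the expected loss minus the entropy of a candidate distribution, and choosing a multivariate Gaussian recovers Newton-like steps:

```latex
% Bayes as optimization over a distribution q(theta):
\min_{q}\; \mathcal{L}(q) \;=\; \mathbb{E}_{q(\theta)}\!\left[\bar{\ell}(\theta)\right] \;-\; \mathcal{H}(q)
% With q = N(mu, Sigma), natural-gradient updates give Newton's method:
\Sigma_{t+1}^{-1} \;\approx\; \nabla^{2}\bar{\ell}(\mu_{t}),
\qquad
\mu_{t+1} \;\approx\; \mu_{t} \;-\; \left[\nabla^{2}\bar{\ell}(\mu_{t})\right]^{-1}\nabla\bar{\ell}(\mu_{t})
```

Fixing the covariance instead (q = N(mu, I/beta)) reduces the same update to plain gradient descent on mu.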
1.4.1 RMSProp and Adam from Bayes
- Choose a Gaussian with diagonal covariance
- Replace the Hessian by the square of the gradients
- Add a square root for the scaling vector (combined update below)
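Chaining those three substitutions onto the Newton-style step above gives the familiar RMSprop update (a schematic correspondence; alpha, beta, epsilon are the usual hyperparameters):

```latex
s_{t+1} \;=\; (1-\beta)\, s_{t} \;+\; \beta\, \hat{g}(\theta_{t})^{2},
\qquad
\theta_{t+1} \;=\; \theta_{t} \;-\; \alpha\, \frac{\hat{g}(\theta_{t})}{\sqrt{s_{t+1}} + \epsilon}
```

Adam additionally keeps a momentum average of the gradient itself.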
1.4.2 Bayes as optimization
- The expectation of the loss function over a distribution, minus that distribution's entropy, is exactly the variational-inference objective (identity below)
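Concretely, if the loss includes the prior, the objective from 1.4 is the negative evidence lower bound (ELBO), so minimizing it is variational inference:

```latex
% With loss l(theta) = -log p(D | theta) - log p(theta):
\mathbb{E}_{q}\!\left[\bar{\ell}(\theta)\right] - \mathcal{H}(q)
\;=\; -\,\mathbb{E}_{q}\!\left[\log p(\mathcal{D}\mid\theta)\right]
\;+\; \mathrm{KL}\!\left(q(\theta)\,\middle\|\,p(\theta)\right)
\;=\; -\,\mathrm{ELBO}(q)
```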
1.5 Uncertainty
- What the model doesn't know
- Example: earthquake-frequency data with two different lines of fit that both match the observations but disagree when extrapolating
- Quantifying uncertainty helps guard against bias in the data
1.6 Some Bayesian deep learning methods
- MC-Dropout [1] (minimal sketch after this list)
- SWAG [2]
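A minimal MC-Dropout sketch in PyTorch, following the idea in [1]; the architecture, dropout rate, and sample count are illustrative assumptions. Dropout stays active at test time and several stochastic forward passes are averaged:

```python
import torch
import torch.nn as nn

# Hypothetical regression net; any dropout-trained model works the same way.
model = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, n_samples=50):
    model.train()  # keeps Dropout stochastic (safe here: no BatchNorm layers)
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    # Mean across passes is the prediction; the spread is the uncertainty.
    return preds.mean(dim=0), preds.std(dim=0)

x = torch.linspace(-1.0, 1.0, 5).unsqueeze(-1)
mean, std = mc_dropout_predict(model, x)
```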
1.7 Variational inference methods
- Enable flexible distributions
- Do not scale well to large problems (ImageNet scale); a toy mean-field example is sketched below
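A toy mean-field Gaussian VI sketch in the spirit of "Weight uncertainty in neural networks" (Bayes by Backprop); the data, the N(0, I) prior, and the unit-variance likelihood are all assumptions made for the example:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(100, 3)
y = x @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)

mu = torch.zeros(3, requires_grad=True)            # variational means
rho = torch.full((3,), -3.0, requires_grad=True)   # softplus(rho) = std dev
opt = torch.optim.Adam([mu, rho], lr=0.05)

for step in range(500):
    sigma = F.softplus(rho)
    w = mu + sigma * torch.randn(3)                # reparameterization trick
    nll = 0.5 * ((x @ w - y) ** 2).sum()           # unit-variance Gaussian likelihood
    kl = (-torch.log(sigma) + 0.5 * (sigma**2 + mu**2) - 0.5).sum()  # KL(q || N(0, I))
    loss = nll + kl                                # negative ELBO
    opt.zero_grad()
    loss.backward()
    opt.step()
```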
1.8 Variational Online Gauss-Newton: VOGN
- Improves RMSprop with a Bayesian touch (schematic update below)
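Schematically (simplified from the VOGN paper; constants and the exact scaling of the prior precision delta vary across presentations), the RMSprop update above changes in three ways: the gradient is evaluated at a weight sampled from the current Gaussian, the scale tracks the mean of squared per-example gradients (a Gauss-Newton approximation), and delta replaces epsilon:

```latex
\theta_{t} \sim \mathcal{N}\!\left(\mu_{t}, \sigma_{t}^{2}\right),
\qquad
s_{t+1} \;=\; (1-\beta)\, s_{t} \;+\; \beta\, \hat{h}(\theta_{t}),
\qquad
\mu_{t+1} \;=\; \mu_{t} \;-\; \alpha\, \frac{\hat{g}(\theta_{t}) + \delta \mu_{t}}{s_{t+1} + \delta}
```

with sigma_t^2 proportional to 1/(s_t + delta), so a posterior variance estimate comes for free from the optimizer's scaling vector.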
1.9 Model View vs. Data View
- Most ML models take the "model view": all they retain is the decision boundary between the two classes
- Bayesian models also see which data points lie close to that boundary, and those points matter more than the boundary itself
- This is the difference between the model view and the data view
2 Papers
- "Dropout as a bayesian approximation" [1]
- "A simple baseline for bayesian uncertainty in deep learning" [2]
- "Practical variational inference for neural networks"
- "Weight uncertainty in neural networks"
- VOGN: "Fast and scalable Bayesian deep learning by weight pertubation in Adam" Khan et al
- "Overcoming catastrophic forgetting" kirkpatrick - Elastic Weight Consolidation paper