1 NeurIPS 2019

This repository collects the papers and my notes from talks and other parts of NeurIPS 2019 that I attended. I created it mostly for my own sake, but also because I found it tedious to track down the papers and slides for the various talks I wanted to go to. Also included are my personal notes, which will be updated as the week goes on.

1.1 Sunday

1.1.1 Habana Labs

  • Notes
  • Talk was a brief showcase of the Habana Goya Inference Processor. If you're not familiar with this type of processor, it lets you fine-tune the inference of your model by reducing its precision without losing accuracy. This sentence, liberally taken from the Nvidia website, summarizes the importance of reduced precision:

Reduced precision inference significantly reduces application
latency, which is a requirement for many real-time services,
auto and embedded applications.
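The quoted claim can be made concrete with a toy sketch of post-training int8 quantization (my own illustration; `quantize_int8` and `dequantize` are made-up helpers, not Habana's or Nvidia's API):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric quantization: map the largest magnitude to 127."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from the int8 codes."""
    return q.astype(np.float32) * scale

weights = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q.nbytes / weights.nbytes)  # 0.25 -- int8 storage is 4x smaller
```

The round-trip error is bounded by half a quantization step, which is why accuracy often survives the precision cut; real inference processors apply the same idea to activations and arithmetic, not just storage.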

1.1.2 Facebook Hardware

  • Notes
  • Very interesting talk on the Open Accelerator Infrastructure. I couldn't stay for the whole thing, but the gist is that the OAI stack brings an open-source-software feel to hardware. The stack aims to reduce the time to integration for AI systems. There is an impressive number of companies behind it, including Habana. For more detail, see the link below.
  • OAI

1.1.3 ML in Finance

  • Notes
  • Personal Note: Two Sigma folks were very nice and approachable.
  • Attended the end of the J.P. Morgan talk and the Two Sigma talk in this session. Both were very high-level talks about ML in finance. J.P. Morgan covered the basics of RL and its application to financial data. Two Sigma's talk broke down different uses of ML for finding opportunities within the market. They provided an example of how to break down an Instagram photo into consumable data for CNNs (object recognition) and LSTMs (sentiment analysis). The process of training forecasting models was also covered.

1.2 Monday (tutorials)

1.2.1 Deep Learning with Bayesian Principles

  • Notes
  • The author, Mohammad Emtiyaz Khan, focused on the benefits of combining Bayesian learning and deep learning approaches. He showed how to derive common DL optimizers like Adam and RMSProp from Bayesian principles. Khan stressed the importance of such Bayesian principles in lifelong learning. This talk was one of my favorites because Khan did a great job of distilling the benefits of Bayesian learning into well-explained equations.

1.2.2 Efficient Processing of Deep Neural Networks

  • Notes
  • Vivienne Sze discussed the various processing methods available and being researched for AI computation. Specifically, Sze drove home the impact of reads from DRAM in training. This talk was quite dense and covered many aspects of AI processing, from chip specifics to co-design and Neural Architecture Search (NAS).

1.2.3 Reinforcement Learning: Past, Present, and Future Perspectives

  • Notes
  • Katja Hofmann, of Microsoft Research, talked about Reinforcement Learning (RL) from its inception to standard practice. Hofmann outlined Deep Q-Learning and some improvements that have been made to the method, such as Bootstrapped DQNs from Osband et al. (2018). She then explained the Actor-Critic model and shared a number of papers I am going to go read.

1.3 Tuesday

1.3.1 Uniform Convergence May Be Unable to Explain Generalization in Deep Learning

1.3.2 Logarithmic Regret for Online Control

1.3.3 Legendre Memory Units: Continuous-Time Representation in RNNs

1.3.4 Point-Voxel CNN for Efficient 3D Deep Learning

  • Site
  • Paper
  • Won a gold medal in the Lyft Challenge
  • Shows a large improvement over PointNet

1.3.5 Conditional Independence Testing Using GANs

1.3.6 Machine Learning Meets Single-Cell Biology: Insights and Challenges

  • Speaker: Dana Pe'er
  • Biology is becoming a data science
  • Mentioned the success of tSNE for biological data
  • Was able to use data science methods to discover a new cell type that forms a checkpoint in DNA structure. The cell is as rare as 7 in 10,000.
  • Was able to map the spatio-temporal development of mammalian endoderm.
    • Setty et al. (2019), Nature

1.3.7 Causal Confusion in Imitation Learning

1.3.8 Generative Modeling by Estimating Gradients of the Data Distribution

  • Paper
  • Stable training as opposed to GANs
  • Better or comparable sample quality to GANs
  • Inpainting example was quite cool

1.3.9 Reducing the Variance in Online Optimization by Transporting Past Gradients

1.3.10 SySCD: A System-Aware Parallel Coordinate Descent Algorithm

1.3.11 Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

1.3.12 Hindsight Credit Assignment

1.3.13 Weight Agnostic Neural Networks

  • Paper
  • Slides
  • Site
  • Questions the importance of architecture in the learning process. Makes the comparison to precocial species, which have certain abilities from birth.
  • Shows that a neural architecture search can find networks that perform on multiple reinforcement learning and supervised learning tasks.
  • Based on NEAT
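The evaluation idea above can be sketched in a toy form (mine, not the paper's code): instead of training weights, an architecture is scored with a single weight shared across all connections, swept over a few candidate values.

```python
import numpy as np

def weight_agnostic_fitness(forward, inputs, targets,
                            shared_weights=(-2.0, -1.0, 1.0, 2.0)):
    """Score an architecture by average performance over shared weight values."""
    scores = [-np.mean((forward(inputs, w) - targets) ** 2)  # negative MSE
              for w in shared_weights]
    return float(np.mean(scores))

# Toy "architecture": y = tanh(w * x), scored against the target y = tanh(x)
x = np.linspace(-1.0, 1.0, 50)
fitness = weight_agnostic_fitness(lambda inp, w: np.tanh(w * inp), x, np.tanh(x))
```

A search like NEAT then evolves structures that score well under this weight-agnostic fitness, which is what lets the architecture itself encode the solution.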

1.3.14 Other Papers

  1. Neural Networks with Cheap Differential Operators
  2. Sequential Neural Processes

1.4 Wednesday

1.4.1 Fast and Accurate Least-Mean-Squares Solvers

  • Paper
  • Novel method to compute the Carathéodory set
  • Mentioned a heavy dependence on the dimensionality being low for the benefits to show.

1.4.2 Calibration tests in multi-class classification: A unifying framework

1.4.3 Verified Uncertainty Calibration

  • Code
  • Paper
  • Platt scaling: its calibration error cannot be reliably estimated
  • Proposes a debiased estimator
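For context, a minimal sketch of what Platt scaling itself does (my own toy, not the paper's code): fit p = sigmoid(a·score + b) to map raw classifier scores to probabilities. The paper's point concerns estimating the calibration error of such continuous methods, which this sketch does not address.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit (a, b) by gradient descent on the logistic loss."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = sigmoid(a * scores + b)
        grad = p - labels                 # dLoss/dLogit for log loss
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)
labels = (rng.random(1000) < sigmoid(2.0 * scores)).astype(float)  # true a=2, b=0

a, b = fit_platt(scores, labels)  # a should land near 2, b near 0
```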

1.4.4 Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

1.4.5 Principal Component Projection and Regression in Nearly Linear Time through Asymmetric SVRG

  • Slides
  • Paper
  • Combines principal component projection and regression (PCP, PCR)

1.4.6 PIDForest: Anomaly Detection via Partial Identification

1.4.7 Guided Similarity Separation for Image Retrieval

  • Paper
  • Graph convolutional network that models the descriptor graph, where the descriptors are what images are compared against for retrieval.

1.4.8 CNAPs: Fast and Flexible Multi-Task Classification Using Conditional Neural Adaptive Processes

  • Paper
  • Slides
  • Code
  • Showed added classes and added test images without re-training
  • Meta-Dataset: a dataset of datasets

1.4.9 Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation

1.4.10 Efficient Meta Learning via Minibatch Proximal Update

1.4.11 Reconciling meta-learning and continual learning with online mixtures of tasks

1.4.12 Beyond Online Balanced Descent: An Optimal Algorithm for Smoothed Online Convex Optimization

  • Slides
  • Paper
  • The goal wasn't to minimize regret, but to achieve an optimal competitive ratio.
  • Competitive ratio: the cost of the learner's actions compared to the best possible course of action in hindsight
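The competitive-ratio objective is just this arithmetic (the numbers below are made up for illustration):

```python
# Ratio of the online learner's total cost to the optimal cost in hindsight.
learner_costs = [3.0, 2.5, 4.0, 1.5]   # cost the online algorithm incurred per round
optimal_costs = [2.0, 2.0, 3.0, 1.0]   # offline optimum, known only in hindsight

competitive_ratio = sum(learner_costs) / sum(optimal_costs)
print(competitive_ratio)  # 1.375
```

A c-competitive algorithm guarantees this ratio never exceeds c on any input, which is a different (worst-case, multiplicative) guarantee than a regret bound.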

1.4.13 Strategizing against No-regret Learners

  • Two types of agents: Strategic and Learning
  • Strategic agents maximize utility whereas learning agents play to learn how to play
  • Bidders' behavior in online auctions is largely consistent with a no-regret learner
  • Mean-based algorithm: play the historically best action
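The mean-based rule can be sketched in its Follow-the-Leader form (my own toy, not the paper's code): each round, play the action with the highest cumulative (equivalently, mean) historical reward.

```python
import numpy as np

def mean_based_play(reward_history):
    """reward_history: (rounds, actions) array; pick the historically best action."""
    return int(np.argmax(reward_history.sum(axis=0)))

rng = np.random.default_rng(1)
means = np.array([0.3, 0.5, 0.7])      # Bernoulli reward means per action
history, plays = [], []
for t in range(500):
    action = mean_based_play(np.array(history)) if history else 0
    plays.append(action)
    history.append((rng.random(3) < means).astype(float))  # full feedback

# After enough rounds the rule locks onto the best action (index 2)
```

The talk's point is that a strategic opponent can exploit exactly this predictability in a learning agent.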

1.5 Thursday

1.5.1 Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs

  • Video
  • Code
  • Introduced "Brain-Score" evaluation system
    • Measures the similarity of human mistakes to artificial neural network mistakes
  • CORnet-S performs similarly to ResNet on ImageNet with significantly fewer layers
  • Theme: shallow networks with recurrent layers can outperform deep networks
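A hedged sketch of the idea behind a Brain-Score-style similarity measure: correlate per-image error patterns of humans and a model. The data here are invented, and the real benchmark is far richer (it includes neural recordings, not just behavior).

```python
import numpy as np

human_errors = np.array([0.1, 0.8, 0.3, 0.9, 0.2])  # per-image human error rate
model_errors = np.array([0.2, 0.7, 0.4, 0.8, 0.1])  # per-image model error rate

# Pearson correlation of the two error patterns; closer to 1 = more brain-like mistakes
similarity = np.corrcoef(human_errors, model_errors)[0, 1]
```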

1.5.2 Learning Perceptual Inference by Contrasting

1.5.3 Universality and individuality in neural dynamics across large populations of recurrent networks

1.5.4 Better Transfer Learning with Inferred Successor Maps

  • Slides
  • Transfer learning in RL
  • Mentioned Dayan (1993), the fifth time I've heard that cited here. Need to read it.
  • Clustered tasks by similarity in reward
  • Bayesian Successor Representation
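For reference, a sketch of a plain (non-Bayesian) successor representation, the object the talk builds on (my own toy, not the authors' code): M[s, s'] estimates the expected discounted future occupancy of s' starting from s, learned with a TD-style update.

```python
import numpy as np

n_states, gamma, alpha = 4, 0.9, 0.1
M = np.eye(n_states)

def sr_update(M, s, s_next):
    """One TD update of the successor matrix after transition s -> s_next."""
    onehot = np.eye(n_states)[s]
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    return M

rng = np.random.default_rng(0)
s = 0
for _ in range(2000):                      # random walk on a 4-state ring
    s_next = (s + rng.choice([-1, 1])) % n_states
    M = sr_update(M, s, s_next)
    s = s_next

# M[0, 0] > M[0, 2]: a state predicts its own occupancy more than the far state
```

Because reward enters only through a final dot product with M, the representation transfers across tasks with shared dynamics, which is what the talk's Bayesian variant exploits.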

1.5.5 A Unified Theory for the Origin of Grid Cells through the Lens of Pattern Formation

  • Slides
  • Site
  • Code (doesn't seem to be posted yet)
  • Neural networks learn grid patterns
  • High level but very interesting talk

1.5.6 Infra-slow brain dynamics as a marker for cognitive function and decline

1.5.7 Agency + Automation: Designing Artificial Intelligence into Interactive Systems

1.5.8 Making AI Forget You: Data Deletion in Machine Learning

1.5.9 XLNet: Generalized Autoregressive Pretraining for Language Understanding

1.5.10 Other Papers

  1. On the Downstream Performance of Compressed Word Embeddings

1.6 Workshops

1.6.1 MLSys: Systems for machine learning

  • This workshop covered tools for machine learning. Many of the tools focused on distributed training, model compilation, and workflow improvements.
  • Some highlights:
    • SKTime: Think scikit-learn but for time series.
    • Condensa: Programmable Model Compression
    • NeMo: toolkit for building AI applications using neural modules
  • Vivienne Sze Talk
    • Stressed that DRAM reads are expensive and MACs are not (relatively)
    • energy estimation tool
    • (More or less the same talk as the other keynote she gave)

1.6.2 Machine Learning for Physical Sciences

  • Modeling Turbulent Flow
    • Rayleigh-Bénard convection model
    • The author's TF-flow model was able to accurately model small-scale eddies
    • Could not speak to what would happen at an increased resolution
    • Trained on 10K sequences of RBC data
  • JAX: MD simulations in pure python
    • JIT compiled for GPU
    • Ability to run model step by step
    • ML as a first class citizen, any function can be a neural network
    • JAX
    • Paper
  • Katie Boumann: Black Hole discovery
  • Alán Aspuru-Guzik
    • ML and MD
    • SMILES
    • Autoencoders for drug discovery
    • "Self driving laboratory" = least amount of experiments, optimal outcome
    • ChemOS
    • The author warns against the use of SMILES due to grammar constraints and suggests that SELFIES be used instead, though SELFIES is feature-incomplete.
  • Lenka Zdeborova
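The JAX points above can be illustrated with a minimal sketch (my own, not the speakers' MD code): a pure Python step function, JIT-compiled for the accelerator, run step by step, with gradients available through the same transforms.

```python
import jax
import jax.numpy as jnp

@jax.jit
def step(position, velocity, dt=0.01):
    """One semi-implicit Euler step of a particle in a harmonic potential."""
    force = -position                  # F = -x for U(x) = x^2 / 2
    velocity = velocity + dt * force
    position = position + dt * velocity
    return position, velocity

pos, vel = jnp.array([1.0, 0.0]), jnp.array([0.0, 1.0])
for _ in range(100):
    pos, vel = step(pos, vel)          # run the model step by step

# "ML as a first-class citizen": jax.grad applies to any pure function,
# so the potential's forces come from autodiff just like a network's gradients
energy = lambda x: 0.5 * jnp.sum(x ** 2)
grad_energy = jax.grad(energy)
```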

Author: Sam Partee

Created: 2019-12-26 Thu 16:52