Table of Contents
- 1. Efficient Processing of Neural Networks
- 2. Co-Design
- 3. Tools
1 Efficient Processing of Neural Networks
- Speaker: Vivienne Sze
- processing at the edge instead of the cloud
- ex. autonomous vehicles generate ~6 GB of data every three seconds
- existing processors consume too much power
- Given the slowdown of Moore's law and Dennard scaling, we need specialized hardware.
1.1 Points of Talk
- What are the key metrics?
- What are the challenges to achieving these metrics?
- What are the design considerations and tradeoffs?
1.2 DNNs
- Key operation is the multiply-and-accumulate (MAC): ~90% of computation
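A MAC is just acc += weight * activation; a minimal sketch of a dot product expressed as repeated MACs (illustrative only, not from the talk):

```python
def dot_product(weights, activations):
    """A dot product as a chain of multiply-and-accumulate (MAC) operations."""
    acc = 0.0
    for w, a in zip(weights, activations):
        acc += w * a  # one MAC: multiply, then accumulate
    return acc

# Every fully connected and convolutional layer reduces to MACs like this.
print(dot_product([0.5, -1.0, 2.0], [1.0, 2.0, 3.0]))  # 4.5
```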
1.3 Metrics
- Accuracy
- Consider quality of result
- Throughput
- important for real time performance
- Latency
- autonomous driving
- Energy and Power consumption
- embedded devices have limited battery capacity
- Hardware Cost
- Flexibility
- Range of DNN models and tasks supported
- ability to support future models
- Scalability
- performance should scale with more resources
1.4 Design objectives of a NN processor
1.4.1 Reduce the time per MAC
- reduce instruction overhead
- increase clock frequency
1.4.2 Avoid unnecessary MACs
1.4.3 increase parallelism
- Perform MACs in parallel
1.4.4 increase PE (processing element) utilization
- distribute workload
- balanced workload (weakest-link phenomenon)
- memory bandwidth to get workload to the PE
1.4.5 evaluation: Eyexam
- graph of MAC/cycle vs MAC/data
- initial slope is memory-bound compute; once data is on-chip, the problem becomes compute-bound and the curve flattens
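A minimal roofline-style sketch of this bound; the parameter names and hardware numbers are assumptions for illustration, not Eyexam's actual interface:

```python
def attainable_macs_per_cycle(macs_per_data, peak_macs_per_cycle, data_per_cycle):
    """Roofline-style bound: rising (memory-bound) slope, then a flat
    (compute-bound) roof, as in the MAC/cycle vs MAC/data plot."""
    memory_bound = data_per_cycle * macs_per_data  # the initial slope
    return min(memory_bound, peak_macs_per_cycle)  # the flat roof

# Assumed hardware: 256 MACs/cycle peak, 16 data words/cycle of bandwidth.
for intensity in [1, 4, 16, 64]:  # MACs per data word fetched
    print(intensity, attainable_macs_per_cycle(intensity, 256, 16))
# Throughput climbs (16, 64, 256), then saturates at the 256 MAC/cycle roof.
```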
1.5 Power Consumption of a NN processor
- MACs are not actually what consumes the power; reading the data is
- a DRAM access costs orders of magnitude more energy than a 16b FP multiply
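A back-of-the-envelope sketch of why data movement dominates; the energy numbers are assumptions chosen to be order-of-magnitude plausible, not figures from the talk:

```python
# Assumed, illustrative energy costs (order of magnitude only).
MAC_FP16_PJ = 1.0       # ~1 pJ per 16b FP multiply-accumulate (assumed)
DRAM_ACCESS_PJ = 640.0  # hundreds of pJ per DRAM word access (assumed)

def layer_energy_pj(num_macs, num_dram_accesses):
    """Total energy estimate: compute energy plus data-movement energy."""
    return num_macs * MAC_FP16_PJ + num_dram_accesses * DRAM_ACCESS_PJ

# Even with 100x fewer DRAM accesses than MACs, memory still dominates:
print(layer_energy_pj(num_macs=1_000_000, num_dram_accesses=10_000))
# compute: 1.0e6 pJ, DRAM: 6.4e6 pJ -> data movement is ~86% of the total
```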
1.5.1 To reduce power usage
- Reduce data movement
- Reduce energy per MAC
- Reduce unnecessary MACs
1.6 Specifications to evaluate metrics
1.6.1 Accuracy
- difficulty of dataset and task should be considered
1.6.2 Throughput
- Number of PEs along with utilization stats
1.6.3 Latency
- batch size used in evaluation
1.6.4 Energy and Power
- not sufficient to report only on-chip power consumption; off-chip memory access power must be reported as well
- without DRAM estimates, one could claim low power consumption but fail drastically at evaluation time
1.6.5 Hardware Cost
- on chip storage, # of PEs, chip area
1.6.6 Flexibility
- number of models supported without customization
1.7 Reduce ops in Matrix Multiply
- FFT: fewer ops than direct convolution, but increases storage requirements
- Strassen: slightly faster (7 multiplies instead of 8 per 2x2 block) but can lead to numerical instability; see the sketch after this list
- Winograd: fewer multiplies for small convolutions (e.g., 3x3 filters)
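A minimal sketch of one Strassen level on a 2x2 block, showing the 7-multiply trick (the extra additions are where instability can creep in):

```python
import numpy as np

def strassen_2x2(A, B):
    """One Strassen level: 7 multiplies instead of 8 for a 2x2 matmul."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # More additions than the naive method: the source of instability.
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
assert np.allclose(strassen_2x2(A, B), A @ B)
```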
1.8 Reduce Instruction Overhead
- Perform more MACs per instruction
- GPU: NVIDIA HMMA instruction performs 64 MACs
- CPU: specialized vector neural network instructions (VNNI)
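A loose Python/NumPy analogy for amortizing instruction overhead: one call dispatches many MACs at once, similar in spirit to HMMA's 64 MACs per instruction (analogy only, not actual ISA behavior):

```python
import numpy as np

w = np.random.rand(1024).astype(np.float32)
x = np.random.rand(1024).astype(np.float32)

# High overhead: one MAC per interpreted "instruction" (loop iteration).
acc = 0.0
for i in range(len(w)):
    acc += w[i] * x[i]

# Amortized overhead: many MACs issued by a single call.
acc_vec = np.dot(w, x)
assert np.isclose(acc, acc_vec, rtol=1e-3)
```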
1.9 Properties we can leverage
- Throughput: DRAM accesses are the bottleneck, around 200x more than MACs for AlexNet
- Input data reuse
- filter reuse
- convolutional reuse
- Spatial architecture (efficient dataflow)
- small on-node memory with inter-node communication
- allows weights, activations, and partial sums to stay on-chip, close to the PEs (sketch of reuse counts below)
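A toy sketch counting how often each value could be reused in a conv layer if kept on-chip; all shapes are assumed for illustration:

```python
# Assumed toy conv-layer shapes (not from the talk).
N, H, W = 4, 32, 32          # batch size, input height/width
M, R, S = 8, 3, 3            # number of filters, filter height/width
E, F = H - R + 1, W - S + 1  # output height/width (stride 1, no padding)

# Each filter weight is used at every output position of every image:
filter_reuse = N * E * F
# Each input activation is used by every filter (input data reuse)...
input_reuse = M
# ...and by up to R*S overlapping filter windows (convolutional reuse).
conv_reuse = R * S

print(filter_reuse, input_reuse, conv_reuse)  # 3600 8 9
```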
2 Co-Design
2.1 Quantization
- reduce the precision to reduce latency in training and inference
- methods
- linear quantization
- log quantization
- non-linear quantization
- 8-bit training with stochastic rounding
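A minimal linear (uniform) quantization sketch with optional stochastic rounding; the symmetric per-tensor scaling scheme is an assumption, one of several possibilities:

```python
import numpy as np

def linear_quantize(x, num_bits=8, stochastic=False):
    """Uniform quantization; stochastic rounding helps low-bit training."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax  # symmetric per-tensor scale (assumed)
    scaled = x / scale
    if stochastic:
        # Round up with probability equal to the fractional part, so the
        # rounding error is zero in expectation.
        floor = np.floor(scaled)
        q = floor + (np.random.rand(*x.shape) < (scaled - floor))
    else:
        q = np.round(scaled)
    return np.clip(q, -qmax - 1, qmax) * scale  # dequantized for comparison

x = np.random.randn(5).astype(np.float32)
print(x)
print(linear_quantize(x))                   # deterministic rounding
print(linear_quantize(x, stochastic=True))  # stochastic rounding
```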
2.2 Design considerations for reduced precision
- impact on accuracy
- does hardware cost exceed benefits?
- evaluation
- 8-bit for inference and 16-bit for training (standard baseline)
2.3 Sparsity
- when using activations like ReLU, many activations are set to 0
- Gate operations (reduce power consumption)
- Skip operations (increase throughput)
- Compression to reduce data movement
- Pruning (Optimal Brain Damage, love that term)
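A minimal sketch of zero-skipping and compression on a ReLU output; the function names are hypothetical:

```python
import numpy as np

def sparse_dot(weights, activations):
    """Skip MACs where the activation is zero (the hardware analogue
    gates or skips the operation to save power or cycles)."""
    acc = 0.0
    for w, a in zip(weights, activations):
        if a != 0.0:  # gate/skip: no multiply, no accumulate
            acc += w * a
    return acc

def compress(activations):
    """Toy compression for data movement: store only nonzeros + indices."""
    idx = np.flatnonzero(activations)
    return idx, activations[idx]

a = np.maximum(np.random.randn(8), 0)  # ReLU output: roughly half zeros
w = np.random.randn(8)
print(sparse_dot(w, a), float(np.dot(w, a)))  # same result, fewer MACs
print(compress(a))                            # less data to move
```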
2.4 Design considerations for Sparsity
- similar to reduced precision
- impact on accuracy
- Do you need extra hardware to identify sparsity?
2.5 Neural Architecture Search (NAS)
- complexity = (number of samples) × (time per sample); see the sketch after the component list below
2.5.1 three main components
- search space (what is the set of all samples)
- optimization (where to sample)
- performance evaluation (how to evaluate)
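A minimal random-search NAS sketch mapping onto the three components above; the search space, proxy evaluator, and sample budget are all hypothetical:

```python
import random

SEARCH_SPACE = {             # 1. search space: the set of all candidates
    "depth": [8, 14, 20],
    "width": [16, 32, 64],
    "kernel": [3, 5, 7],
}

def sample(space):           # 2. optimization: here, uniform random sampling
    return {k: random.choice(v) for k, v in space.items()}

def evaluate(arch):          # 3. performance evaluation: a stand-in proxy
    # Placeholder score; a real NAS trains or cheaply estimates accuracy,
    # so total cost = (number of samples) x (time per sample).
    return -abs(arch["depth"] - 14) - abs(arch["width"] - 32)

num_samples = 20
best = max((sample(SEARCH_SPACE) for _ in range(num_samples)), key=evaluate)
print(best)
```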
2.5.2 design considerations
- optimization algorithm may limit the search space
- probability of convergence to a good model is a commonly overlooked property
3 Tools
- NetAdapt
- platform-aware DNN adaptation
- available on GitHub