Table of Contents
1 Hands-on Workshop: Implementing High-Performance AI Workloads with Habana AI Processors
- Met with speaker earlier in the day.
- Seemed to have the chip running on a PC with Ubuntu 18.04
- The AI processor is meant to address the high latency of GPUs and the low throughput of CPUs
- The bulk of the processing resides in matrix multiplies
1.1 Goya AI processor
- Uses 8 TPC (tensor processing core) cores, each with local memory
- Shared memory between all cores
- Fits into PCIe 4.0
1.2 Synapse API
- Supports ONNX, MXNet, and TensorFlow parsers
- The Habana graph compiler compiles the model into a recipe that is handed to the inference stack
1.3 Habana IR
- profile + network description -> recipe
- The profile is the vehicle for fine-tuning the performance-vs-accuracy trade-off
1.4 Quantization
- very important
- Should be the same for all examples; gave an example of ResNet on ImageNet with three categories
- Called through `hb.model.calibrate.quantize()`
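The talk only named `hb.model.calibrate.quantize()` as the entry point; the underlying idea is standard symmetric int8 quantization. A minimal sketch of that idea in plain Python (not the Habana API):

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats into [-127, 127] with one scale."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Map quantized integers back to approximate floats."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The per-tensor scale is what a calibration pass tunes; the profile mentioned above would steer that performance-vs-accuracy choice.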
1.5 Compilation
- Quantize, set inputs info, acquire the device, compile the recipe
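The four steps above can be sketched as a pipeline. Every function name below is an illustrative stand-in for the flow described in the talk, not the real SynapseAI API:

```python
# Stub pipeline mirroring the compile flow from the notes; all names
# are placeholders, not actual Habana/SynapseAI calls.
def quantize(model):
    return {**model, "quantized": True}

def set_inputs_info(model, shapes):
    return {**model, "input_shapes": shapes}

def acquire_device():
    return "goya:0"  # placeholder device handle

def compile_recipe(model, device):
    # The "recipe" is the compiled artifact handed to the inference stack
    return {"model": model, "device": device, "recipe": "compiled"}

model = {"name": "resnet50"}
model = quantize(model)
model = set_inputs_info(model, {"input": (1, 3, 224, 224)})
device = acquire_device()
recipe = compile_recipe(model, device)
```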
1.6 Difference between TensorRT and Habana AI
- TensorRT is agnostic to batch size
- Habana is optimized to a given batch size
- Habana is better with smaller batches, TensorRT is better with larger batches
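A toy latency model (illustrative numbers, not vendor data) shows the trade-off behind these claims: per-batch overhead is amortized at large batches, while small batches keep latency low:

```python
def latency_ms(batch, fixed_overhead_ms, per_sample_ms):
    """Simple model: fixed per-batch overhead plus a linear per-sample cost."""
    return fixed_overhead_ms + batch * per_sample_ms

def throughput(batch, fixed_overhead_ms, per_sample_ms):
    """Samples per second under the latency model above."""
    return batch / latency_ms(batch, fixed_overhead_ms, per_sample_ms) * 1000.0
```

Under this model, throughput rises with batch size (favoring a batch-agnostic, large-batch design) while latency also rises, so a chip optimized for a given small batch can win on latency.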
1.7 Latency
- Achieved sentences/sec for BERT: 1507.62
- Thousands of images a second for ResNet
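The quoted BERT throughput converts to a per-sentence latency; this is just arithmetic on the quoted number (assuming one sentence processed at a time):

```python
sentences_per_sec = 1507.62  # BERT throughput quoted in the talk
latency_ms_per_sentence = 1000.0 / sentences_per_sec
# roughly 0.66 ms per sentence at this throughput
```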
1.8 Market
- Claims they have the best "available" card on the market
- Mentioned that Nvidia also has a similar card, but it consumes much more power
- So the draw of the card is its inference capability and low power draw
2 Tools used in Talk
- Netron
- chrome://tracing
- A bunch of built-in Habana tools