Table of Contents
1 Hands-on Workshop: Implementing High-Performance AI Workloads with Habana AI Processors
- Met with speaker earlier in the day.
- Seemed to have the chip running on a PC with Ubuntu 18.04
- The AI processor is meant to address the high latency of GPUs and the low throughput of CPUs
- The bulk of the processing resides in matrix multiplies
1.1 Goya AI processor
- Uses 8 TPC (tensor processing core) cores, each with local memory
- Shared memory between all cores
- Fits into PCIe 4.0
1.2 Synapse API
- Supports ONNX, MXNet, and TensorFlow parsers
- The Habana graph compiler compiles the model into a recipe that is handed to the inference stack
1.3 Habana IR
- profile + network description -> recipe
- The profile is the vehicle for fine-tuning the performance-vs-accuracy trade-off
1.4 Quantization
- very important
- Should be the same for all examples; gave an example of ResNet on ImageNet with three categories
- Called through `hb.model.calibrate.quantize()`
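The talk only named `hb.model.calibrate.quantize()` as the entry point; the underlying idea is standard symmetric int8 quantization. A minimal sketch of that idea in plain Python (not the Habana API):

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats into [-127, 127] with one scale."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Map quantized integers back to approximate floats."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The per-tensor scale is what a calibration pass tunes; the profile mentioned above would steer that performance-vs-accuracy choice.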
1.5 Compilation
- Quantize, set inputs info, acquire the device, compile the recipe
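The four steps above can be sketched as a pipeline. Every function name below is an illustrative stand-in for the flow described in the talk, not the real SynapseAI API:

```python
# Stub pipeline mirroring the compile flow from the notes; all names
# are placeholders, not actual Habana/SynapseAI calls.
def quantize(model):
    return {**model, "quantized": True}

def set_inputs_info(model, shapes):
    return {**model, "input_shapes": shapes}

def acquire_device():
    return "goya:0"  # placeholder device handle

def compile_recipe(model, device):
    # The "recipe" is the compiled artifact handed to the inference stack
    return {"model": model, "device": device, "recipe": "compiled"}

model = {"name": "resnet50"}
model = quantize(model)
model = set_inputs_info(model, {"input": (1, 3, 224, 224)})
device = acquire_device()
recipe = compile_recipe(model, device)
```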
1.6 Difference between TensorRT and Habana AI
- TensorRT is agnostic to batch size
- Habana is optimized to a given batch size
- Habana is better with smaller batches, TensorRT is better with larger batches
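A toy latency model (illustrative numbers, not vendor data) shows the trade-off behind these claims: per-batch overhead is amortized at large batches, while small batches keep latency low:

```python
def latency_ms(batch, fixed_overhead_ms, per_sample_ms):
    """Simple model: fixed per-batch overhead plus a linear per-sample cost."""
    return fixed_overhead_ms + batch * per_sample_ms

def throughput(batch, fixed_overhead_ms, per_sample_ms):
    """Samples per second under the latency model above."""
    return batch / latency_ms(batch, fixed_overhead_ms, per_sample_ms) * 1000.0
```

Under this model, throughput rises with batch size (favoring a batch-agnostic, large-batch design) while latency also rises, so a chip optimized for a given small batch can win on latency.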
1.7 Latency
- Achieved sentences/sec for BERT: 1507.62
- Thousands of images a second for ResNet
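The quoted BERT throughput converts to a per-sentence latency; this is just arithmetic on the quoted number (assuming one sentence processed at a time):

```python
sentences_per_sec = 1507.62  # BERT throughput quoted in the talk
latency_ms_per_sentence = 1000.0 / sentences_per_sec
# roughly 0.66 ms per sentence at this throughput
```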
1.8 Market
- Claims they have the best "available" card on the market
- Mentioned that Nvidia also has a similar card, but it consumes much more power
- So the draw of the card is its inference capability and low power draw
2 Tools used in Talk
- Netron
- chrome://tracing
- A bunch of built-in Habana tools