Support batching in model binary format
Depends on #22 (closed) Fatbin extractor
We have to compile a separate TVM model for each different batch size, because the optimal execution configuration differs depending on batch size.
Currently this means completely separate model instances for different batch sizes.
In practice, however, different batch sizes do share model weights, which account for the bulk of a model's memory footprint.
One hiccup is that the abstract model representation uses different weight names and layouts in each batch size's weights file, but this can be untangled.
Some initial tests exist as commented-out code in convert.cpp, verifying that two models compiled with different batch sizes have exactly the same weight blobs.
Ultimately the expected outcome of this is the following:
- a single weights file, shared by all batch versions
- a separate .clockwork file, .so file, and .cuda file for each batch size
e.g. something like:

```
model.clockwork_params  // weights, shared by all batch sizes
model.1.clockwork       // batch size 1
model.1.so
model.1.cuda
model.4.clockwork       // batch size 4
model.4.so
model.4.cuda
```