Support batching in model binary format
Depends on #22 (closed) Fatbin extractor
We have to compile a separate TVM model for each different batch size, because the optimal execution configuration differs depending on batch size.
Currently this means completely separate model instances for different batch sizes.
In practice, however, different batch sizes do share model weights, which account for the bulk of a model's memory footprint.
One hiccup is that the abstract model representation uses different weight names and layouts in each batch size's weights file, but this can be untangled.
Some initial tests exist as commented-out code in convert.cpp, verifying that two models compiled with different batch sizes have exactly the same weight blobs.
Ultimately the expected outcome of this is the following:
- a single weights file, shared by all batch versions
- a separate .clockwork file, .so file, and .cuda file for each batch size
e.g. something like:

```
model.clockwork_params  // weights, shared by all batch sizes
model.1.clockwork       // batch size 1
model.1.so
model.1.cuda
model.4.clockwork       // batch size 4
model.4.so
model.4.cuda
```