GPU frequency scaling
Models differ in computational intensity and therefore heat the GPU to different degrees, which changes the clock frequency the GPU settles at. We can profile the GPU's steady-state frequency for each model and include it in that model's performance profile. Then, instead of reacting to the GPU's frequency changes after the fact, we can simply pre-select this frequency before running the model. This would be another instance of the "exploiting predictability" principle behind Clockwork.
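As a minimal sketch of what pre-selecting the frequency could look like, assuming an NVML-capable GPU that supports locked clocks (Volta or newer) and root privileges; the model names, frequencies, and the `preselect_frequency` helper below are all hypothetical illustrations, not part of Clockwork:

```python
import pynvml

# Hypothetical mapping from model name to its profiled steady-state SM
# frequency in MHz; in practice this would live in the model's
# performance profile. The values here are illustrative only.
STEADY_STATE_MHZ = {"resnet50": 1380, "gpt2": 1230}

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def preselect_frequency(model_name: str) -> None:
    """Lock the GPU's SM clock to the model's profiled steady-state
    frequency before running it, rather than reacting to throttling."""
    mhz = STEADY_STATE_MHZ[model_name]
    # Pin min and max clock to the same value (requires root and a
    # Volta-or-newer GPU).
    pynvml.nvmlDeviceSetGpuLockedClocks(handle, mhz, mhz)
```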
To do this, we would need to check the following:
- Is changing the GPU clock speed fast enough to do on the critical path of inference requests? (See the sketch after this list.)
- Does this actually work, i.e., does the GPU run at the frequency we request?
- Do we get the same steady-state measurements in practice as we do in isolation?
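A first-cut experiment for the first two questions, again assuming pynvml and a GPU that supports locked clocks, could time the clock-change call and read back the resulting SM clock:

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Question 1: how long does changing the clock take?
start = time.perf_counter()
pynvml.nvmlDeviceSetGpuLockedClocks(handle, 1230, 1230)  # 1230 MHz is illustrative
elapsed_us = (time.perf_counter() - start) * 1e6
print(f"nvmlDeviceSetGpuLockedClocks took {elapsed_us:.0f} us")

# Question 2: did the GPU actually settle at the requested frequency?
time.sleep(0.1)  # give the driver a moment to apply the change
sm_mhz = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
print(f"SM clock is now {sm_mhz} MHz")

# Restore the driver's default clock management.
pynvml.nvmlDeviceResetGpuLockedClocks(handle)
pynvml.nvmlShutdown()
```

The third question would require running the profiled models under realistic concurrent load and comparing the observed steady-state frequencies against the isolated profiles.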