Implement a generic controller that handles loading workers' current state, loading new models, and profiling models
This MR introduces a base controller implementation that will handle the model-loading and setup, before handing control over to the desired scheduler.
Scheduler
The Scheduler interface is a simpler version of the controller logic. It only has to deal with infer requests, and it doesn't have to set up workers.
The scheduler has a method called start, which is handed the worker connections (which it uses to send actions to workers) as well as a ClockworkState object that captures the current state of all workers.
The scheduler can assume that, once it begins, no further model loading will occur. It only needs to deal with infer requests from clients, sending actions to workers, and receiving results back from workers.
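As a rough illustration of the shape described above, here is a minimal sketch of what such an interface could look like. Everything except the names Scheduler, start, and ClockworkState is an assumption made for this example (WorkerConnection, InferRequest, WorkerResult, clientInfer, resultFromWorker, and CountingScheduler are hypothetical); the real signatures live in the code under review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are not the real Clockwork types.
struct ClockworkState {};                        // snapshot of all workers
struct WorkerConnection { unsigned worker_id; }; // used to send actions to a worker
struct InferRequest { unsigned model_id; };
struct WorkerResult { unsigned action_id; };

// Assumed interface: no worker setup and no model loading, only
// infer requests from clients and results coming back from workers.
class Scheduler {
public:
    virtual ~Scheduler() = default;

    // Called once, after the startup phase has finished.
    virtual void start(std::vector<WorkerConnection*> workers,
                       const ClockworkState& state) = 0;

    virtual void clientInfer(const InferRequest& request) = 0;
    virtual void resultFromWorker(const WorkerResult& result) = 0;
};

// Toy implementation that only counts what it sees.
class CountingScheduler : public Scheduler {
public:
    bool started = false;
    std::size_t worker_count = 0;
    unsigned infers = 0;
    unsigned results = 0;

    void start(std::vector<WorkerConnection*> workers,
               const ClockworkState&) override {
        started = true;
        worker_count = workers.size();
    }

    void clientInfer(const InferRequest&) override {
        assert(started);  // infer requests are only forwarded after start
        ++infers;
    }

    void resultFromWorker(const WorkerResult&) override {
        assert(started);
        ++results;
    }
};
```

Because model loading is finished before start is called, a scheduler written against an interface like this can treat the ClockworkState it receives as fixed for its whole lifetime.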
ClockworkState
ClockworkState includes all of the workers, their GPUs, the size of their memory, all loaded models, and the profiled execution and loading times.
Currently, profiling isn't implemented, so the exec and load times will show up as 0. This will be implemented soon.
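To make the contents of the state object more concrete, the sketch below shows one way such a snapshot could be laid out. This is an assumed layout for illustration only: the struct and field names (ClockworkStateSketch, WorkerState, GpuState, ModelProfile, and their members) are hypothetical and are not taken from the actual ClockworkState definition.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-model record. The timings default to 0, mirroring the
// fact that profiling is not implemented yet.
struct ModelProfile {
    std::string path;
    std::uint64_t exec_time_ns = 0;
    std::uint64_t load_time_ns = 0;
};

// Hypothetical per-GPU record: memory size plus the models loaded on it.
struct GpuState {
    unsigned id = 0;
    std::size_t memory_bytes = 0;
    std::map<unsigned, ModelProfile> loaded_models;  // keyed by model id
};

// Hypothetical per-worker record.
struct WorkerState {
    unsigned id = 0;
    std::vector<GpuState> gpus;
};

// Hypothetical top-level snapshot covering every worker.
struct ClockworkStateSketch {
    std::vector<WorkerState> workers;

    // Total number of loaded models across all workers and GPUs.
    std::size_t total_loaded_models() const {
        std::size_t total = 0;
        for (const WorkerState& worker : workers) {
            for (const GpuState& gpu : worker.gpus) {
                total += gpu.loaded_models.size();
            }
        }
        return total;
    }
};
```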
ControllerWithStartupPhase
The ControllerWithStartupPhase class in src/clockwork/controller/controller.h is the underlying controller implementation. To use it, pass it an instance of the scheduler that you want it to transition to once startup is complete.
The logic for the startup phase is as follows:
- First, it queries workers for their currently loaded models.
- It then waits for LoadModelFromDisk requests from clients.
- After 10 seconds of inactivity (i.e., once it stops receiving any new LoadModelFromDisk requests), the controller starts profiling the loaded models.
- After profiling has completed, the controller activates the scheduler and forwards all infer requests and worker results to it.
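The startup steps listed above can be read as a small linear state machine. The sketch below is one possible way to express that; the Phase and Event enums and the advance and forwards_to_scheduler functions are hypothetical names chosen for this example, not identifiers from the controller itself.

```cpp
// Hypothetical phases, in the order the startup logic walks through them.
enum class Phase {
    QueryingWorkers,  // asking workers which models they already hold
    AwaitingLoads,    // accepting LoadModelFromDisk requests from clients
    Profiling,        // measuring the loaded models
    Scheduling        // scheduler is active
};

// Hypothetical events that move the controller to the next phase.
enum class Event {
    WorkerStateReceived,
    LoadInactivityElapsed,  // the 10-second quiet period has passed
    ProfilingComplete
};

// Returns the phase that follows `phase` when `event` occurs.
// Events that do not apply to the current phase leave it unchanged.
Phase advance(Phase phase, Event event) {
    switch (phase) {
    case Phase::QueryingWorkers:
        return event == Event::WorkerStateReceived ? Phase::AwaitingLoads : phase;
    case Phase::AwaitingLoads:
        return event == Event::LoadInactivityElapsed ? Phase::Profiling : phase;
    case Phase::Profiling:
        return event == Event::ProfilingComplete ? Phase::Scheduling : phase;
    case Phase::Scheduling:
        return phase;  // terminal: the scheduler stays in control
    }
    return phase;
}

// Infer requests and worker results reach the scheduler only in the last phase.
bool forwards_to_scheduler(Phase phase) {
    return phase == Phase::Scheduling;
}
```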
Subtle points:
- The 10-second countdown only begins after at least one LoadModelFromDisk has been received. The countdown will reset if any new requests are received. It also doesn't start counting down until all pending LoadModel requests have completed.
- The profiling stage isn't currently implemented, and all measurements are initialized to 0. This will come soon.
- You can send infer requests during the startup phase; they will simply buffer on the controller, and time out after 10 seconds. They don't time out immediately, to prevent spamming.
- After the scheduler begins, any LoadModelFromDisk commands will be rejected.
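The countdown rules are easy to get subtly wrong, so here is a minimal sketch of how they might be tracked. The StartupCountdown class and its method names are hypothetical. The sketch also assumes that the countdown restarts from the moment the last pending load completes, which is one reading of the description rather than something it states outright.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper that tracks when the load-inactivity window has elapsed.
class StartupCountdown {
public:
    using Clock = std::chrono::steady_clock;

    explicit StartupCountdown(std::chrono::seconds window = std::chrono::seconds(10))
        : window_(window) {}

    // A new load request arrived: remember it and reset the countdown.
    void on_load_request(Clock::time_point now) {
        seen_load_ = true;
        ++pending_;
        last_activity_ = now;
    }

    // A load finished. Assumption: the countdown restarts from this point.
    void on_load_complete(Clock::time_point now) {
        assert(pending_ > 0);
        --pending_;
        last_activity_ = now;
    }

    // True once at least one load has been seen, nothing is still pending,
    // and the full window has passed since the last activity.
    bool expired(Clock::time_point now) const {
        return seen_load_ && pending_ == 0 && (now - last_activity_) >= window_;
    }

private:
    std::chrono::seconds window_;
    Clock::time_point last_activity_{};
    unsigned pending_ = 0;
    bool seen_load_ = false;
};
```

Passing the current time in explicitly, instead of reading the clock inside the class, keeps the logic deterministic and straightforward to unit test.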
This MR also includes a few bugfixes and minor updates, but shouldn't break anything.