WIP SIMPLE scheduler
Very simple scheduler.
For now, I am using default earliest and latest values, i.e., earliest set to 0, and latest set to UINT64_MAX. Thus, any request pruning happens only on the controller side.
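To illustrate why these defaults push all pruning to the controller, here is a minimal sketch. The `ExecutionWindow` struct and `within_window` function are hypothetical names, not types from this merge request. With `earliest = 0` and `latest = UINT64_MAX`, a window check on the scheduler side accepts every possible timestamp, so it can never reject a request.

```cpp
#include <cstdint>

// Hypothetical request window; the names are assumptions for illustration only.
struct ExecutionWindow {
    std::uint64_t earliest = 0;          // default: may start immediately
    std::uint64_t latest = UINT64_MAX;   // default: never expires
};

// Returns true if `now` lies inside the window (both bounds inclusive).
constexpr bool within_window(const ExecutionWindow& w, std::uint64_t now) {
    return now >= w.earliest && now <= w.latest;
}
```

Once real `earliest` and `latest` values are passed in, the same check would start rejecting requests that arrive too early or too late.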
Batching works if concurrent requests are enabled.
Only one GPU per model, for now. GPUs are still assigned round-robin rather than by load, as discussed in today's meeting.
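As a rough sketch of what "one GPU per model, round-robin" means here: the class below is a hypothetical helper, not the code in this merge request, and the name `RoundRobinGpuAssigner` and the use of string model names are assumptions. Each model is pinned to a single GPU the first time it is seen, and new models take the next GPU in rotation regardless of load.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: pins each model to exactly one GPU, cycling through GPU ids.
class RoundRobinGpuAssigner {
public:
    explicit RoundRobinGpuAssigner(std::size_t num_gpus) : num_gpus_(num_gpus) {
        assert(num_gpus > 0 && "need at least one GPU");
    }

    // Returns the GPU for `model`; a model seen for the first time gets the
    // next GPU in rotation, and keeps that GPU on every later call.
    std::size_t gpu_for(const std::string& model) {
        auto it = assignment_.find(model);
        if (it != assignment_.end()) {
            return it->second;
        }
        const std::size_t gpu = next_;
        next_ = (next_ + 1) % num_gpus_;  // wrap around after the last GPU
        assignment_.emplace(model, gpu);
        return gpu;
    }

private:
    std::size_t num_gpus_;
    std::size_t next_ = 0;
    std::unordered_map<std::string, std::size_t> assignment_;
};
```

A load-based policy would replace the `next_` counter with a lookup of the least-loaded GPU; the rest of the interface could stay the same.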