Getting rid of CPS style Mergable classes

We currently have CPS-style typeclasses for merging hypotheses: MergablePersist and MergableConsume, which are somewhat awkward to write and prove. They were implemented this way since I initially suspected the naive approach would scale badly in larger Iris proof environments.

On second thought, I was not so sure of this anymore, since surely TCEq-hacks must be slower than native typeclass stuff. Additionally, the TCEq stuff causes bigger proof terms, which also negatively affects performance.

After trying and investigating in ci/simplify-merge, we know now the following:

CPS style indeed has higher constant time cost, which makes a difference of up to 3% in examples.
CPS style scales much better for larger environments. The main example that encounters this is rwlock_ticket_bounded. This one gets 14% slower with the naive implementation. If one inspects the merging of the auth and frag own tokens in, for example, acquire_reader_int_spec, we see that the solveStep responsible for this takes 0.7 seconds, and 0.2 seconds if all but the relevant hypothesis are cleared (keeping 1 of 5). In contrast, the CPS style implementation takes 0.3 seconds, ± 0.02 if 4 extra hypotheses are present.

The total build time decreases with 0.6% with the naive implementation. I do not consider this enough to take its bad scaling for granted. It would be nice to use the naive implementation, or maybe have notation and tactics that allow users to easily provide instances, as if the naive implementation was used.

Alternatively, one could delve in Coq's source code and investigate why typeclass resolution gets slow in this example. One of the strange things is that first [ simple apply instance_1 | ... | simple apply instance_n] is also much slower than iSolveTC. What causes this slowdown?