Optimize iIntoEmpValid
This PR is the result of me trying to understand why iPoseProof
was very slow (taking up to 20s) in some instances in our Coq development (https://github.com/logsem/cerise). This happens in particular when: 1) the (Coq) context is large, and 2) the lemma passed to iPoseProof
starts with many forall quantifiers (up to ~35).
It seems then that the source of the slowness is iIntoEmpValid
, which is implemented as first [ notypeclasses refine ... | notypeclasses refine | ... ]
. The forall case is the second one, and it seems that with large contexts, the first notypeclasses refine
(which fails) is responsible for the majority of the time spent in the tactic. (On the example taking 20s, manually doing do 35 notypeclasses refine (into_emp_valid_forall _ _ _ _); iIntoEmpValid
only takes 1s.)
The optimization that I propose here is to add a "fast path" to iIntoEmpValid
for the (_ -> _)
and (forall _, _)
cases, where the application of the corresponding lemma is gated by an Ltac pattern matching. This avoids relying on unification to select the lemma to apply. The fix should be fully backwards compatible: if no "fast path" patterns match, we just fall back to the current implementation.
With this patch, the cumulated time for a full build of our cerise project (on a fast computer) goes from 75 min
to 58 min
, which I think is a good speedup.
Merge request reports
Activity
- Resolved by Armaël Guéneau
- Resolved by Armaël Guéneau
- Resolved by Robbert Krebbers
This is an interesting catch! I wonder how much of a difference this makes on other Iris developments. @jung Could you run CI on reverse dependencies?
An alternative to avoid slow unification would be to seal the definition
IntoEmpValid
. If you have time for that, I would be curious to find out if that also fixes the problem. The advantage is that we don't have the duplicate code for your "fast paths".
- Resolved by Armaël Guéneau
- Resolved by Armaël Guéneau
- Resolved by Armaël Guéneau
- Resolved by Ralf Jung
I think this MR is ready.
You can add a CHANGELOG if you want an attribution (but since it does not make any functional changes, a CHANGELOG is not strictly necessary I would say).
@jung Can we run timing on the reverse dependencies so we can see if we obtain performance gain on other projects too?
- Resolved by Ralf Jung
Thanks for preserving the result of !338 (merged); I do wonder whether this optimization could be moved from Ltac to typeclass hints (via either instances or
Hint Extern
)… guess because you want to generate goals and not evars for non-dependent premises and you need to do it by hand (e.g. becauseshelve_unifiable
does not quite match what you want).EDIT: no action is needed — you might want to add a comment on why this doesn’t work, or not bother. The question might be too half-baked to answer.
Edited by Paolo G. Giarrusso
- Resolved by Armaël Guéneau
- Resolved by Ralf Jung