
Optimize iIntoEmpValid

Merged: Armaël Guéneau requested to merge Armael/iris:optimize_iIntoEmpValid into master

This PR is the result of my trying to understand why iPoseProof was very slow (taking up to 20s) in some instances in our Coq development (https://github.com/logsem/cerise). The slowdown shows up in particular when: 1) the (Coq) context is large, and 2) the lemma passed to iPoseProof starts with many forall quantifiers (up to ~35 in our case).

The source of the slowness seems to be iIntoEmpValid, which is implemented as first [ notypeclasses refine ... | notypeclasses refine ... | ... ]. The forall case is the second branch, and with large contexts the first notypeclasses refine (which fails) accounts for the majority of the time spent in the tactic. (On the example taking 20s, manually doing do 35 notypeclasses refine (into_emp_valid_forall _ _ _ _); iIntoEmpValid only takes 1s.)
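
For reference, the loop behind iIntoEmpValid at this point looks roughly as follows (paraphrased from the Iris proofmode sources; the exact lemma arities and the order of the later branches may differ slightly):

```coq
(* Approximate shape of the pre-existing implementation: every step blindly
   tries the branches in order, so on a goal [IntoEmpValid (forall x, ...) P]
   the failing attempt to apply [into_emp_valid_impl] (and its unification
   against a large context) is paid at every one of the ~35 iterations. *)
Ltac iIntoEmpValid_go := first
  [(* Case [φ → ψ] *)
   notypeclasses refine (into_emp_valid_impl _ _ _ _ _);
     [(* goal for [φ] *)|iIntoEmpValid_go]
  |(* Case [∀ x : A, φ] *)
   notypeclasses refine (into_emp_valid_forall _ _ _ _); iIntoEmpValid_go
  |(* Case [∀.. x : TT, φ] *)
   notypeclasses refine (into_emp_valid_tforall _ _ _ _); iIntoEmpValid_go
  |(* Base case [P ⊢ Q], [P ⊣⊢ Q], [⊢ P] *)
   notypeclasses refine (into_emp_valid_here _ _ _)].
```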

The optimization I propose here is to add a "fast path" to iIntoEmpValid for the (_ -> _) and (forall _, _) cases, where the application of the corresponding lemma is gated by an Ltac pattern match. This avoids relying on unification to select the lemma to apply. The change should be fully backwards compatible: if no "fast path" pattern matches, we simply fall back to the current implementation.
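
Concretely, the fast path could look something like the sketch below (illustrative only, not the exact patch; iIntoEmpValid_go_fallback is a hypothetical name standing for the existing first [...]-based loop):

```coq
(* Sketch of the proposed fast path: gate each lemma application behind a
   syntactic pattern match, so that no failing [notypeclasses refine] (and
   hence no expensive unification against a large context) is attempted. *)
Ltac iIntoEmpValid_go :=
  lazymatch goal with
  | |- IntoEmpValid (_ -> _) _ =>
      (* Non-dependent arrow: apply [into_emp_valid_impl] directly, i.e. the
         branch the original implementation would end up selecting here. *)
      notypeclasses refine (into_emp_valid_impl _ _ _ _ _);
        [(* goal for the premise *)|iIntoEmpValid_go]
  | |- IntoEmpValid (forall _ : _, _) _ =>
      (* Universal quantifier: apply [into_emp_valid_forall] directly. *)
      notypeclasses refine (into_emp_valid_forall _ _ _ _); iIntoEmpValid_go
  | |- _ =>
      (* No fast-path pattern matches: fall back to the original
         [first [..]]-based implementation, so behaviour is unchanged. *)
      iIntoEmpValid_go_fallback
  end.
```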

With this patch, the cumulative time for a full build of our cerise project (on a fast computer) goes down from 75 min to 58 min, which I think is a good speedup.

Merge request reports

Merge request pipeline #53020 passed for 8805f473

Merged by Ralf Jung 3 years ago (Sep 6, 2021 6:39pm UTC)

Pipeline #53456 passed for 95df9887 on master

Activity

  • added 1 commit

  • Armaël Guéneau resolved all threads

  • added 1 commit

    • Resolved by Robbert Krebbers

      This is an interesting catch! I wonder how much of a difference this makes on other Iris developments. @jung Could you run CI on reverse dependencies?

      An alternative to avoid slow unification would be to seal the definition IntoEmpValid. If you have time for that, I would be curious to find out whether that also fixes the problem. The advantage is that we would not need the duplicated code for your "fast paths".
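
      For context, "sealing" refers to the usual Iris idiom of hiding a definition behind an opaque proof so that unification cannot unfold it. A minimal sketch of that idiom on a toy definition foo (illustrative only; the seal record is inlined for self-containedness, and sealing IntoEmpValid itself is not part of this MR):

      ```coq
      (* Illustrative sketch of the Iris-style sealing idiom on a toy
         function [foo]. *)
      Record seal {A} (f : A) := { unseal : A; seal_eq : unseal = f }.

      Definition foo_def (n : nat) : nat := S n.
      (* Ending the proof with Qed makes [foo_aux] opaque, so [foo] does not
         unfold to [foo_def] during unification; [foo_eq] is used to unseal
         it explicitly where needed. *)
      Definition foo_aux : seal (@foo_def).
      Proof. refine {| unseal := foo_def; seal_eq := eq_refl |}. Qed.
      Definition foo := foo_aux.(unseal).
      Definition foo_eq : @foo = @foo_def := foo_aux.(seal_eq).
      ```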

  • added 1 commit

  • Robbert Krebbers resolved all threads

  • added 1 commit

  • Armaël Guéneau resolved all threads

    • Resolved by Ralf Jung

      I think this MR is ready.

      You can add a CHANGELOG entry if you want an attribution (but since this does not make any functional changes, a CHANGELOG entry is not strictly necessary, I would say).

      @jung Can we run timing on the reverse dependencies, so we can see if we obtain a performance gain on other projects too?

    • Resolved by Ralf Jung

      Thanks for preserving the result of !338 (merged); I do wonder whether this optimization could be moved from Ltac to typeclass hints (via either instances or Hint Extern)… I guess it cannot, because you want to generate goals rather than evars for the non-dependent premises, and you need to do that by hand (e.g. because shelve_unifiable does not quite match what you want).

      EDIT: no action is needed — you might want to add a comment on why this doesn’t work, or not bother. The question might be too half-baked to answer.

      Edited by Paolo G. Giarrusso
  • added 1 commit

  • Armaël Guéneau (Author)

    Thanks for the benchmarks! I'm currently rebuilding our development with/without the current state of the PR just in case, but otherwise the PR should be good to go. I've fixed the comment wrt tforall and added a changelog entry.

  • Armaël Guéneau (Author)

    I can confirm that the results are still the same on our repo: 75min (before the PR) -> 58min (after the PR).
