Skip to content
Snippets Groups Projects

More efficient `encode`/`decode` for `string`/`ascii`

Merged Robbert Krebbers requested to merge robbert/string_ascii_countable into master
All threads resolved!

When experimenting with the benchmarks in https://github.com/xavierleroy/canonical-binary-tries I noticed that our encode function for Countable of string is slow: it creates a list for each char which is then converted to a positive and appended to the result.

https://github.com/xavierleroy/canonical-binary-tries/blob/v2/lib/String2pos.v contains a better version, which I have adapted to std++ style. (That repository is BSD licensed so we can use it, our comment contains a link to give credit, of course.).

I also changed the coding of ascii by reusing the one for string; this one is also faster as the benchmark below shows.

The decode function still converts to list, but I don't know how to avoid that without writing 2^8 = 256 cases.

From stdpp Require Import strings countable.
Import Ascii.

Definition long := "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec euismod aliquam dui, non ultrices nisl rhoncus pretium. Donec tristique at diam eget commodo. Ut elit diam, maximus ac metus at, commodo malesuada diam. Morbi vehicula tortor et mauris malesuada, ac congue mauris ultrices. Donec id faucibus tellus. Vestibulum vulputate molestie urna, in sagittis tortor. Pellentesque placerat orci ut velit mattis consectetur. Mauris non nulla sit amet nisi egestas pellentesque. Aenean tincidunt id leo et placerat. Nam fermentum lectus sit amet pretium egestas. Lorem ipsum dolor sit amet, consectetur adipiscing elit.".

Time Eval vm_compute in Pos.iter (λ s, default "" $ decode (encode s)) long 10000.
(* old: 4.759s *)
(* new encode: 1.876s *)
(* new encode/decode: 0.916s *)

Time Eval vm_compute in (Pos.iter (λ s, default "b" $ decode (encode s)) "a" 1000000)%char.
(* old: 2.636s *)
(* new encode: 0.493s *)
(* newer encode/decode: 0.308s *)
Edited by Robbert Krebbers

Merge request reports

Merge request pipeline #88552 passed

Merge request pipeline passed for e6e864c2

Approval is optional

Merged by Robbert KrebbersRobbert Krebbers 1 year ago (Sep 5, 2023 12:21pm UTC)

Merge details

  • Changes merged into master with 1ce70965.
  • Deleted the source branch.
  • Auto-merge enabled

Pipeline #88553 passed

Pipeline passed for 1ce70965 on master

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • added 1 commit

    • d08d2b81 - Better `decode` for `string` that is generated.

    Compare with previous version

  • The decode function still converts to list, but I don't know how to avoid that without writing 2^8 = 256 cases.

    I auto generate this definition now using some Ltac, which indeed improves performance (I updated the MR description).

    The generated code is what you expect:

    string_of_pos =
    fix string_of_pos (p : positive) : string :=
      match p with
      | (p7~1~1~1~1~1~1~1~1)%positive => String "255" (string_of_pos p7)
      | (p7~0~1~1~1~1~1~1~1)%positive => String "127" (string_of_pos p7)
      | (p7~1~0~1~1~1~1~1~1)%positive => String "191" (string_of_pos p7)
      | (p7~0~0~1~1~1~1~1~1)%positive => String "?" (string_of_pos p7)
      | (p7~1~1~0~1~1~1~1~1)%positive => String "223" (string_of_pos p7)
      | (p7~0~1~0~1~1~1~1~1)%positive => String "_" (string_of_pos p7)
      | (p7~1~0~0~1~1~1~1~1)%positive => String "159" (string_of_pos p7)
      | (p7~0~0~0~1~1~1~1~1)%positive => String "031" (string_of_pos p7)
      | (p7~1~1~1~0~1~1~1~1)%positive => String "239" (string_of_pos p7)
      ...
  • Robbert Krebbers changed the description

    changed the description

  • added 1 commit

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • Robbert Krebbers changed title from Better coding for string/ascii to {+More efficient encode/decode+} for {++}string{+/+}ascii{++}

    changed title from Better coding for string/ascii to {+More efficient encode/decode+} for {++}string{+/+}ascii{++}

    • Resolved by Robbert Krebbers
        | (p7~1~0~1~1~1~1~1~1)%positive => String "191" (string_of_pos p7)
        | (p7~0~0~1~1~1~1~1~1)%positive => String "?" (string_of_pos p7)

      That doesn't look like what I would expect, why are some of these numbers and some are characters?

  • Ralf Jung
  • Ralf Jung
  • added 1 commit

    Compare with previous version

  • added 1 commit

    • ffe5989a - Comment, rename, make Local.

    Compare with previous version

  • Ralf Jung
  • LGTM with a few more comments.

  • Robbert Krebbers resolved all threads

    resolved all threads

  • added 1 commit

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • Merging. Thanks for feedback.

  • Robbert Krebbers enabled an automatic merge when the pipeline for e6e864c2 succeeds

    enabled an automatic merge when the pipeline for e6e864c2 succeeds

  • Robbert Krebbers mentioned in commit 1ce70965

    mentioned in commit 1ce70965

  • Please register or sign in to reply
    Loading