More efficient `encode`/`decode` for `string`/`ascii`
When experimenting with the benchmarks in https://github.com/xavierleroy/canonical-binary-tries I noticed that our encode
function for Countable
of string
is slow: it creates a list
for each char which is then converted to a positive
and appended to the result.
https://github.com/xavierleroy/canonical-binary-tries/blob/v2/lib/String2pos.v contains a better version, which I have adapted to std++ style. (That repository is BSD licensed so we can use it, our comment contains a link to give credit, of course.).
I also changed the coding of ascii
by reusing the one for string
; this one is also faster as the benchmark below shows.
The decode
function still converts to list
, but I don't know how to avoid that without writing 2^8 = 256 cases.
From stdpp Require Import strings countable.
Import Ascii.
Definition long := "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec euismod aliquam dui, non ultrices nisl rhoncus pretium. Donec tristique at diam eget commodo. Ut elit diam, maximus ac metus at, commodo malesuada diam. Morbi vehicula tortor et mauris malesuada, ac congue mauris ultrices. Donec id faucibus tellus. Vestibulum vulputate molestie urna, in sagittis tortor. Pellentesque placerat orci ut velit mattis consectetur. Mauris non nulla sit amet nisi egestas pellentesque. Aenean tincidunt id leo et placerat. Nam fermentum lectus sit amet pretium egestas. Lorem ipsum dolor sit amet, consectetur adipiscing elit.".
Time Eval vm_compute in Pos.iter (λ s, default "" $ decode (encode s)) long 10000.
(* old: 4.759s *)
(* new encode: 1.876s *)
(* new encode/decode: 0.916s *)
Time Eval vm_compute in (Pos.iter (λ s, default "b" $ decode (encode s)) "a" 1000000)%char.
(* old: 2.636s *)
(* new encode: 0.493s *)
(* newer encode/decode: 0.308s *)
Merge request reports
Activity
added 1 commit
- d08d2b81 - Better `decode` for `string` that is generated.
The
decode
function still converts tolist
, but I don't know how to avoid that without writing 2^8 = 256 cases.I auto generate this definition now using some Ltac, which indeed improves performance (I updated the MR description).
The generated code is what you expect:
string_of_pos = fix string_of_pos (p : positive) : string := match p with | (p7~1~1~1~1~1~1~1~1)%positive => String "255" (string_of_pos p7) | (p7~0~1~1~1~1~1~1~1)%positive => String "127" (string_of_pos p7) | (p7~1~0~1~1~1~1~1~1)%positive => String "191" (string_of_pos p7) | (p7~0~0~1~1~1~1~1~1)%positive => String "?" (string_of_pos p7) | (p7~1~1~0~1~1~1~1~1)%positive => String "223" (string_of_pos p7) | (p7~0~1~0~1~1~1~1~1)%positive => String "_" (string_of_pos p7) | (p7~1~0~0~1~1~1~1~1)%positive => String "159" (string_of_pos p7) | (p7~0~0~0~1~1~1~1~1)%positive => String "031" (string_of_pos p7) | (p7~1~1~1~0~1~1~1~1)%positive => String "239" (string_of_pos p7) ...
- Resolved by Robbert Krebbers
| (p7~1~0~1~1~1~1~1~1)%positive => String "191" (string_of_pos p7) | (p7~0~0~1~1~1~1~1~1)%positive => String "?" (string_of_pos p7)
That doesn't look like what I would expect, why are some of these numbers and some are characters?
- Resolved by Ralf Jung
- Resolved by Ralf Jung
- Resolved by Robbert Krebbers
enabled an automatic merge when the pipeline for e6e864c2 succeeds
mentioned in commit 1ce70965