Benchmark vectorized reverse and reverse_copy, use traits, optimize reverse_copy tail
#5493
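The title mentions moving the vectorized code to traits. The following is a rough, hypothetical illustration of that style, not the STL's actual code. A traits class owns the per-element-type details (lane count, load, store, lane reversal), and the in-place reverse loop is written once against it. The names block_traits and reverse_blocked and the 32-byte register width are assumptions. std::array and std::reverse emulate what would really be SIMD loads, stores and shuffles.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical traits class: everything the generic loop needs to know about
// one element type lives here. A real implementation would wrap SIMD
// intrinsics; this portable sketch emulates a register with std::array.
template <class T>
struct block_traits {
    static constexpr std::size_t register_bytes = 32; // assumed AVX2-sized register
    static constexpr std::size_t lanes = register_bytes / sizeof(T);
    using block = std::array<T, lanes>;

    static block load(const T* p) {
        block b{};
        std::copy_n(p, lanes, b.begin());
        return b;
    }
    static void store(T* p, const block& b) { std::copy(b.begin(), b.end(), p); }
    static block reverse_lanes(block b) { // stands in for a lane shuffle
        std::reverse(b.begin(), b.end());
        return b;
    }
};

// Generic in-place reverse written once against the traits: swap a block from
// the front with a block from the back, reversing the lanes of each.
template <class T, class Traits = block_traits<T>>
void reverse_blocked(T* first, T* last) {
    assert(first <= last);
    constexpr auto lanes = static_cast<std::ptrdiff_t>(Traits::lanes);
    while (last - first >= 2 * lanes) {
        const auto front = Traits::load(first);
        const auto back = Traits::load(last - lanes);
        Traits::store(first, Traits::reverse_lanes(back));
        Traits::store(last - lanes, Traits::reverse_lanes(front));
        first += lanes;
        last -= lanes;
    }
    std::reverse(first, last); // fewer than 2 * lanes elements remain in the middle
}
```

With these assumed traits, std::uint8_t gets 32 lanes, so 63 elements never enter the block loop (63 < 64) and go straight to std::reverse. std::uint64_t gets 4 lanes, so 63 elements take 7 swap steps and leave 7 middle elements for the scalar reverse.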
⚙️ Changes
reverse_copy, but not for reverse. This is possibly due to overlapping masked stores: they do overlap, although the overlapping part is masked out in at least one of them. (The commit history preserves this attempt.)

⏱️ Benchmark results

r<std::uint8_t>/3449
r<std::uint8_t>/63
r<std::uint8_t>/31
r<std::uint8_t>/15
r<std::uint8_t>/7
r<std::uint16_t>/3449
r<std::uint16_t>/63
r<std::uint16_t>/31
r<std::uint16_t>/15
r<std::uint16_t>/7
r<std::uint32_t>/3449
r<std::uint32_t>/63
r<std::uint32_t>/31
r<std::uint32_t>/15
r<std::uint32_t>/7
r<std::uint64_t>/3449
r<std::uint64_t>/63
r<std::uint64_t>/31
r<std::uint64_t>/15
r<std::uint64_t>/7

rc<std::uint8_t>/3449
rc<std::uint8_t>/63
rc<std::uint8_t>/31
rc<std::uint8_t>/15
rc<std::uint8_t>/7
rc<std::uint16_t>/3449
rc<std::uint16_t>/63
rc<std::uint16_t>/31
rc<std::uint16_t>/15
rc<std::uint16_t>/7
rc<std::uint32_t>/3449
rc<std::uint32_t>/63
rc<std::uint32_t>/31
rc<std::uint32_t>/15
rc<std::uint32_t>/7
rc<std::uint64_t>/3449
rc<std::uint64_t>/63
rc<std::uint64_t>/31
rc<std::uint64_t>/15
rc<std::uint64_t>/7
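The reverse_copy tail experiment described above can be illustrated with a portable sketch. It is not the PR's implementation. A std::array stands in for a SIMD register, and Block, masked_store, reverse_copy_blocked and the Lanes parameter are hypothetical names. Full blocks are loaded from the back of the source, lane-reversed, and stored to the front of the destination. The leftover elements are then written by one masked store whose window overlaps the last full block, with the overlapping lanes masked out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a SIMD register holding Lanes elements of T.
template <class T, std::size_t Lanes>
using Block = std::array<T, Lanes>;

// Emulated masked store: lane i is written only when mask[i] is true,
// so masked-out lanes leave memory untouched.
template <class T, std::size_t Lanes>
void masked_store(T* dest, const Block<T, Lanes>& v, const std::array<bool, Lanes>& mask) {
    for (std::size_t i = 0; i < Lanes; ++i) {
        if (mask[i]) {
            dest[i] = v[i];
        }
    }
}

// Blocked reverse_copy: full blocks come from the back of the source, get
// their lanes reversed, and go to the front of dest. The tail is finished by
// one masked store instead of a scalar loop.
template <std::size_t Lanes, class T>
void reverse_copy_blocked(const T* first, const T* last, T* dest) {
    assert(first <= last);
    const auto n = static_cast<std::size_t>(last - first);
    if (n < Lanes) { // not even one full block: plain scalar path
        std::reverse_copy(first, last, dest);
        return;
    }
    std::size_t done = 0; // elements already written to dest
    while (n - done >= Lanes) {
        Block<T, Lanes> v{};
        std::copy_n(last - done - Lanes, Lanes, v.begin()); // "load"
        std::reverse(v.begin(), v.end());                    // "lane shuffle"
        std::copy(v.begin(), v.end(), dest + done);          // "store"
        done += Lanes;
    }
    const std::size_t r = n - done; // first[0, r) still has to be copied
    if (r != 0) {
        Block<T, Lanes> v{};
        std::copy_n(first, r, v.begin()); // emulated masked load into lanes [0, r)
        std::reverse(v.begin(), v.end()); // values now sit in lanes [Lanes - r, Lanes)
        std::array<bool, Lanes> mask{};
        for (std::size_t i = Lanes - r; i < Lanes; ++i) {
            mask[i] = true;
        }
        // The window dest[n - Lanes, n) overlaps the last full block, but those
        // lanes are masked out, so already-written elements are not touched.
        masked_store<T, Lanes>(dest + (n - Lanes), v, mask);
    }
}
```

With Lanes = 8 and 63 elements, the main loop writes 7 full blocks (56 elements) and the single masked store writes the remaining 7. Inputs shorter than Lanes take the scalar std::reverse_copy path. This emulation only shows that the overlapping masked store is functionally correct; it says nothing about whether real hardware masked stores that overlap like this are fast.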