Improve `search` and `find_end` vectorization by AlexGuteniev · Pull Request #5519 · microsoft/STL · GitHub

@AlexGuteniev
Contributor

@AlexGuteniev AlexGuteniev commented May 17, 2025

Two orthogonal but related changes:

There is nothing more to say about the removal of the 8-bit AVX2 search path. Some details on the SSE4.2 path for 32-bit and 64-bit elements:

  • Traits are separate for AVX2 and SSE4.2, like the algorithms in _Sorting or the bitset algorithms.
  • Unlike in _Sorting, BSF and BSR are traits members. In _Sorting, these functions were called infrequently, so they were not worth differentiating between AVX2 and SSE4.2. Here we use them in the tightest loops, so the performance of the better ones may make a difference. Beware that the CodeQL comments are harder to verify here.
  • The tricky part is making sure that no instructions beyond SSE4.2 are used in the SSE4.2 path, but the different types help a lot, and the tzcnt/lzcnt usage was verified manually.
  • _Size_bytes_1 is deliberately not passed to the extracted functions. For non-inline functions, it looks more optimal to compute it again than to pass a 5th parameter and start using the stack on x64. For inline functions, the compiler will combine variables computed the same way if needed.
  • Originally, I planned to remove the find_end cmpestrm path and always use cmpeq in find_end. The results in "Vectorize search for 32-bit and 64-bit elements, also improve 8-bit and 16-bit vectorization" #5484 (comment) convinced me not to do this, and to keep all existing SSE4.2 paths for 8-bit and 16-bit elements.
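
To illustrate the traits point above, here is a minimal sketch, not the PR's actual code. The names `Sse42Traits`, `Avx2Traits`, `first_match` and `last_match` are hypothetical, and the bit scans are written as portable loops that only model the instruction semantics (BSF/BSR require a nonzero input, while TZCNT/LZCNT are defined for zero). The assumption is that the mask has one bit per compared position, as a movemask-style instruction would produce.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical traits for an SSE4.2-only path: models BSF/BSR,
// whose result is undefined for a zero input, hence the asserts.
struct Sse42Traits {
    static unsigned bsf(std::uint32_t mask) {
        assert(mask != 0);
        unsigned index = 0;
        while ((mask & 1u) == 0) { // scan upward from bit 0
            mask >>= 1;
            ++index;
        }
        return index;
    }

    static unsigned bsr(std::uint32_t mask) {
        assert(mask != 0);
        unsigned index = 31;
        while ((mask & 0x80000000u) == 0) { // scan downward from bit 31
            mask <<= 1;
            --index;
        }
        return index;
    }
};

// Hypothetical traits for an AVX2 path: models TZCNT/LZCNT,
// which are well defined for zero (both return 32 for a 32-bit operand).
struct Avx2Traits {
    static unsigned bsf(std::uint32_t mask) { // TZCNT model
        unsigned count = 0;
        while (count < 32 && ((mask >> count) & 1u) == 0) {
            ++count;
        }
        return count;
    }

    static unsigned bsr(std::uint32_t mask) { // 31 - LZCNT model
        assert(mask != 0);
        unsigned leading = 0;
        while (((mask << leading) & 0x80000000u) == 0) {
            ++leading;
        }
        return 31 - leading;
    }
};

// The hot loop is written once and picks the bit scan through the traits.
// A forward search wants the lowest set bit of the match mask...
template <class Traits>
unsigned first_match(std::uint32_t match_mask) {
    return Traits::bsf(match_mask);
}

// ...while a find_end-style backward search wants the highest set bit.
template <class Traits>
unsigned last_match(std::uint32_t match_mask) {
    return Traits::bsr(match_mask);
}
```

For example, `first_match<Sse42Traits>(0x8001u)` yields 0 and `last_match<Avx2Traits>(0x8001u)` yields 15. Making the bit scans traits members means the SSE4.2 instantiation cannot accidentally pick up the AVX2-era variant.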

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner May 17, 2025 20:28
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews May 17, 2025
@StephanTLavavej StephanTLavavej added the performance Must go faster label May 18, 2025
@StephanTLavavej StephanTLavavej self-assigned this May 18, 2025
@StephanTLavavej StephanTLavavej removed their assignment May 21, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews May 21, 2025
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews May 22, 2025
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 38c0237 into microsoft:main May 22, 2025
40 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews May 22, 2025
@StephanTLavavej
Member

Population, 9 billion. All SIMD instructions. 🤖 🛸 🚀
