Use `SearchValues<string>` for prefix searching in RegexCompiler / source generator by stephentoub · Pull Request #96402 · dotnet/runtime · GitHub

@stephentoub stephentoub commented Jan 2, 2024

We currently use `IndexOf(literal)`, but every call incurs a little overhead to determine how best to perform the search. Now that we have `SearchValues<string>`, we can use it to search for a single substring, even though its bread and butter is searching for multiple substrings. In that case it's effectively the same as `IndexOf(literal)`, except that the result of that examination is cached, so it's done only once rather than on every call.

This also introduces some of the infrastructure necessary to subsequently enable multi-substring search.

Contributes to #85693
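
To illustrate the difference described above, here is a minimal sketch, not the code the compiler or source generator actually emits. The class and method names (`PrefixSearchSketch`, `FindWithIndexOf`, `FindWithSearchValues`) are hypothetical, and it assumes a runtime that exposes the `SearchValues.Create` overload taking strings and a `StringComparison`.

```csharp
using System;
using System.Buffers;

public static class PrefixSearchSketch
{
    // Hypothetical cached field: the analysis of how best to search for
    // "hello" happens once, when the SearchValues<string> is created.
    private static readonly SearchValues<string> s_hello =
        SearchValues.Create(new[] { "hello" }, StringComparison.Ordinal);

    // Previous approach: each call re-examines the literal before searching.
    public static int FindWithIndexOf(ReadOnlySpan<char> input) =>
        input.IndexOf("hello", StringComparison.Ordinal);

    // Approach described in this PR: reuse the precomputed search state.
    public static int FindWithSearchValues(ReadOnlySpan<char> input) =>
        input.IndexOfAny(s_hello);
}
```

Both methods return the index of the first occurrence of the literal, or -1 when it is absent. Passing `StringComparison.OrdinalIgnoreCase` to `SearchValues.Create` would give the case-insensitive variant.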

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Linq; // needed for Enumerable.Repeat when implicit usings are disabled
using System.Text.RegularExpressions;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

[HideColumns("Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(false)]
public partial class Tests
{
    private Regex _regex;
    private string _haystack;

    [Params(true, false)]
    public bool IgnoreCase { get; set; }

    [Params("hello", "hithere")]
    public string Haystack { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _regex = new Regex(@"hello\d", RegexOptions.Compiled | (IgnoreCase ? RegexOptions.IgnoreCase : RegexOptions.None));
        _haystack = string.Concat(Enumerable.Repeat(Haystack, 1000));
    }

    [Benchmark]
    public int Count() => _regex.Count(_haystack);
}

| Method | Toolchain | IgnoreCase | Haystack | Mean | Ratio |
|--------|-----------|------------|----------|-----:|------:|
| Count | \main\corerun.exe | False | hello | 13,957.2 ns | 1.00 |
| Count | \pr\corerun.exe | False | hello | 12,158.0 ns | 0.87 |
| Count | \main\corerun.exe | False | hithere | 556.9 ns | 1.00 |
| Count | \pr\corerun.exe | False | hithere | 370.6 ns | 0.67 |
| Count | \main\corerun.exe | True | hello | 15,978.6 ns | 1.00 |
| Count | \pr\corerun.exe | True | hello | 12,183.9 ns | 0.76 |
| Count | \main\corerun.exe | True | hithere | 485.9 ns | 1.00 |
| Count | \pr\corerun.exe | True | hithere | 499.3 ns | 1.03 |

ghost commented Jan 2, 2024

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: stephentoub
Labels:

area-System.Text.RegularExpressions

Milestone: -

@MihaZupan MihaZupan left a comment

Sweet

This may regress some cases with longer literals until we resolve #96142

@stephentoub stephentoub merged commit ae051b7 into dotnet:main Jan 2, 2024
@stephentoub stephentoub deleted the usesearchvaluesinregexforsinglestring branch January 2, 2024 18:55