KEMBAR78
Avoid runtime "atomic write" check in ConcurrentDictionary for shared generics by jakobbotsch · Pull Request #77005 · dotnet/runtime · GitHub
Skip to content

Conversation

@jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Oct 13, 2022

ConcurrentDictionary caches an s_isValueWriteAtomic in a static readonly, but this trick does not work for shared generics. Instead add an eager check for the reference types and extract the static readonly field so that its value can be baked in even if the key is System.__Canon.

        [Benchmark]
        public int BenchIntString()
        {
            ConcurrentDictionary<int, string> cd = new();
            for (int i = 0; i < 1000000; i++)
                cd[0] = "";

            return cd.Count;
        }

        [Benchmark]
        public int BenchStringInt()
        {
            ConcurrentDictionary<string, int> cd = new();
            for (int i = 0; i < 1000000; i++)
                cd[""] = i;

            return cd.Count;
        }

        [Benchmark]
        public int BenchStringString()
        {
            ConcurrentDictionary<string, string> cd = new();
            for (int i = 0; i < 1000000; i++)
                cd[""] = "";

            return cd.Count;
        }

        [Benchmark]
        public int BenchStringDayOfWeek()
        {
            ConcurrentDictionary<string, DayOfWeek> cd = new();
            for (int i = 0; i < 1000000; i++)
                cd[""] = DayOfWeek.Friday;

            return cd.Count;
        }

        [Benchmark]
        public int BenchStringGuid()
        {
            ConcurrentDictionary<string, Guid> cd = new();
            Guid g = Guid.NewGuid();
            for (int i = 0; i < 1000000; i++)
                cd[""] = g;

            return cd.Count;
        }

        [Benchmark]
        public int BenchIntInt()
        {
            ConcurrentDictionary<int, int> cd = new();
            for (int i = 0; i < 1000000; i++)
                cd[0] = i;

            return cd.Count;
        }

        [Benchmark]
        public int BenchGuidInt()
        {
            ConcurrentDictionary<Guid, int> cd = new();
            Guid g = Guid.NewGuid();
            for (int i = 0; i < 1000000; i++)
                cd[g] = i;

            return cd.Count;
        }
Method Job Toolchain Mean Error StdDev Median Ratio RatioSD
BenchIntString Job-WELKHT \main\corerun.exe 20.97 ms 0.325 ms 0.304 ms 20.96 ms 1.00 0.00
BenchIntString Job-XNZJCP \pr\corerun.exe 18.96 ms 0.372 ms 0.365 ms 18.96 ms 0.90 0.03
BenchStringInt Job-WELKHT \main\corerun.exe 23.22 ms 0.051 ms 0.061 ms 23.22 ms 1.00 0.00
BenchStringInt Job-XNZJCP \pr\corerun.exe 20.06 ms 0.159 ms 0.124 ms 20.10 ms 0.86 0.01
BenchStringString Job-WELKHT \main\corerun.exe 24.90 ms 0.465 ms 0.652 ms 24.91 ms 1.00 0.00
BenchStringString Job-XNZJCP \pr\corerun.exe 22.80 ms 0.454 ms 1.196 ms 22.79 ms 0.91 0.05
BenchStringDayOfWeek Job-WELKHT \main\corerun.exe 23.25 ms 0.098 ms 0.092 ms 23.23 ms 1.00 0.00
BenchStringDayOfWeek Job-XNZJCP \pr\corerun.exe 20.17 ms 0.289 ms 0.256 ms 20.08 ms 0.87 0.01
BenchStringGuid Job-WELKHT \main\corerun.exe 30.43 ms 0.242 ms 0.227 ms 30.42 ms 1.00 0.00
BenchStringGuid Job-XNZJCP \pr\corerun.exe 27.28 ms 0.248 ms 0.232 ms 27.26 ms 0.90 0.01
BenchIntInt Job-WELKHT \main\corerun.exe 16.37 ms 0.050 ms 0.044 ms 16.37 ms 1.00 0.00
BenchIntInt Job-XNZJCP \pr\corerun.exe 16.29 ms 0.029 ms 0.024 ms 16.29 ms 1.00 0.00
BenchGuidInt Job-WELKHT \main\corerun.exe 20.17 ms 0.395 ms 0.439 ms 19.87 ms 1.00 0.00
BenchGuidInt Job-XNZJCP \pr\corerun.exe 20.35 ms 0.362 ms 0.321 ms 20.53 ms 1.02 0.03

(The BenchGuidInt one swings, I can get between 0.95-1.05 ratio by running them more times).

…ictionary

ConcurrentDictionary caches an s_isValueWriteAtomic in a static
readonly, but this trick does not work for shared generics. Instead add
an eager check for the intrinsified cases and fall back to the old check
otherwise.
@ghost ghost assigned jakobbotsch Oct 13, 2022
@ghost
Copy link

ghost commented Oct 13, 2022

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

ConcurrentDictionary caches an s_isValueWriteAtomic in a static readonly, but this trick does not work for shared generics. Instead add an eager check for the intrinsified cases and fall back to the old check otherwise.

[Benchmark]
public int BenchIntString()
{
    ConcurrentDictionary<int, string> cd = new();
    for (int i = 0; i < 1000000; i++)
        cd[0] = "";

    return cd.Count;
}

[Benchmark]
public int BenchStringInt()
{
    ConcurrentDictionary<string, int> cd = new();
    for (int i = 0; i < 1000000; i++)
        cd[""] = i;

    return cd.Count;
}

[Benchmark]
public int BenchStringString()
{
    ConcurrentDictionary<string, string> cd = new();
    for (int i = 0; i < 1000000; i++)
        cd[""] = "";

    return cd.Count;
}

[Benchmark]
public int BenchIntInt()
{
    ConcurrentDictionary<int, int> cd = new();
    for (int i = 0; i < 1000000; i++)
        cd[0] = i;

    return cd.Count;
}

[Benchmark]
public int BenchGuidInt()
{
    ConcurrentDictionary<Guid, int> cd = new();
    Guid g = Guid.NewGuid();
    for (int i = 0; i < 1000000; i++)
        cd[g] = i;

    return cd.Count;
}
Method Job Toolchain Mean Error StdDev Median Ratio RatioSD
BenchIntString Job-MGGPJI \main\corerun.exe 20.68 ms 0.410 ms 0.384 ms 20.50 ms 1.00 0.00
BenchIntString Job-AEWXDI \pr\corerun.exe 17.57 ms 0.083 ms 0.078 ms 17.55 ms 0.85 0.01
BenchStringInt Job-MGGPJI \main\corerun.exe 23.34 ms 0.096 ms 0.090 ms 23.34 ms 1.00 0.00
BenchStringInt Job-AEWXDI \pr\corerun.exe 19.37 ms 0.029 ms 0.026 ms 19.37 ms 0.83 0.00
BenchStringString Job-MGGPJI \main\corerun.exe 24.44 ms 0.468 ms 0.460 ms 24.37 ms 1.00 0.00
BenchStringString Job-AEWXDI \pr\corerun.exe 22.07 ms 0.441 ms 0.850 ms 22.06 ms 0.91 0.03
BenchIntInt Job-MGGPJI \main\corerun.exe 16.51 ms 0.050 ms 0.047 ms 16.52 ms 1.00 0.00
BenchIntInt Job-AEWXDI \pr\corerun.exe 16.43 ms 0.030 ms 0.028 ms 16.44 ms 1.00 0.00
BenchGuidInt Job-MGGPJI \main\corerun.exe 20.13 ms 0.386 ms 0.396 ms 19.84 ms 1.00 0.00
BenchGuidInt Job-AEWXDI \pr\corerun.exe 20.05 ms 0.398 ms 0.391 ms 19.82 ms 1.00 0.00
Author: jakobbotsch
Assignees: -
Labels:

area-System.Collections

Milestone: -

return true;
}

return s_isValueUnderlyingTypeWriteAtomic;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why is this fallback field still needed? Couldn't it have simply been false?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's needed to avoid regressing enums -- they also return true via this check.

I can get a few more cases here by extracting the field into a separate generic class I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a comment explaining why that fallback is necessary would help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've essentially kept the existing code now except extracted it into a new generic class only keyed on the value type. Then I have inlined the shared generic case since there are some limitations around inlining with different shared generics.

Can you take another look? Do you want me to add comment that the check also works for enums?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'm guessing you've reverted to type codes for some of the primitives because of these limitations?

Could you update your benchmark results just in case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typeof checks were unnecessary once I factored the field out into a new static class on only the value type. We can rely on JIT baking in the value of the static readonly field in tier-1 code as long as TValue is a value type, so we only need the reference type check up front after that.

Updated the benchmarks.

@stephentoub
Copy link
Member

Why would the BenchGuidInt get faster here? That it did suggests there's a fair amount of noise, yes?

@jakobbotsch
Copy link
Member Author

Why would the BenchGuidInt get faster here? That it did suggests there's a fair amount of noise, yes?

Yep, there seems to be some variance, I commented on it above -- it seems to swing between 0.95 and 1.05 ratio. BDN also complains about modality so there may be some data alignment effects going on.

@jkotas
Copy link
Member

jkotas commented Oct 14, 2022

Are there cases that this is going to regress? For example, ConcurrentDictionary<string, ValueTuple<string>> ?

}
}

internal static class ConcurrentDictionaryTypeProps<T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you are paying several hundred bytes in type system data structures per ConcurrentDictionary instantiation for this optimization. I do not think it is a problem. ConcurrentDictionary instantiations are pretty expensive. Just pointing it out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my initial attempt I did not have the extra class: 6fde9d8
Of course that does not get as many cases, but I guess there is trade-off as you say.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course that does not get as many cases

Can you elaborate? Other than extra fields for the same TValue answer being stored for different TKeys, and for reference types having a field, what "cases" suffer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConcurrentDictionary<shared_for_generics, not_shared_for_generics>, like ConcurrentDictionary<object, int>.

When the static field is part of the ConcurrentDictionary, these cases require generic lookup. The JIT is not able to see that the value depends on not_shared_for_generics only.
When the static field is in separate type, these cases do not require generic lookup and the tiered JIT can treat the readonly field as constant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks.

@jakobbotsch
Copy link
Member Author

Are there cases that this is going to regress? For example, ConcurrentDictionary<string, ValueTuple<string>> ?

Yes:

Method Job Toolchain Mean Error StdDev Ratio RatioSD
BenchIntValueTupleStringString Job-FTJBOM \main\corerun.exe 27.27 ms 0.522 ms 0.512 ms 1.00 0.00
BenchIntValueTupleStringString Job-BEETVB \pr\corerun.exe 28.32 ms 0.142 ms 0.126 ms 1.04 0.02

But it turns out it is because we cannot inline get_IsValueWriteAtomic:

*************** Inline @[000110] Finishing PHASE Post-import [no changes]
Inlining [000110] failed, so bashing STMT00030 to NOP
INLINER: during 'fgInline' result 'failed this call site' reason 'ldfld needs helper' for 'System.Collections.Concurrent.ConcurrentDictionary`2[int,System.ValueTuple`2[System.__Canon,System.__Canon]]:TryAddInternal(int,System.Nullable`1[int],System.ValueTuple`2[System.__Canon,System.__Canon],bool,bool,byref):bool:this' calling 'System.Collections.Concurrent.ConcurrentDictionary`2[int,System.ValueTuple`2[System.__Canon,System.__Canon]]:get_IsValueWriteAtomic():bool'

This is actually a bit odd to me given that we can inline it fine in the other cases, but I assume there's a good reason. cc @EgorBo anyway for visibility.

If I inline it by hand performance is the same (or very close). I guess I will go back to that and add a comment about it.

@jakobbotsch
Copy link
Member Author

I guess it makes sense since in the other cases we do not need to access the field at runtime. I will inline it by hand.

@jakobbotsch jakobbotsch merged commit e8fa4a7 into dotnet:main Oct 18, 2022
@jakobbotsch jakobbotsch deleted the ConcurrentDictionary-IsValueWriteAtomic-eliminate-in-tier1 branch October 18, 2022 08:30
@ghost ghost locked as resolved and limited conversation to collaborators Nov 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants