
[easy] Remove redundant loadKernel operations #105553

@gmagogsfm

Description

Repro:

TORCH_COMPILE_DEBUG=1 CUDA_VISIBLE_DEVICES=3 python benchmarks/dynamo/torchbench.py --inductor --cpp-wrapper --bfloat16 --accuracy --inference --device cuda --only BERT_pytorch

In the generated output_code.py, the following code appears multiple times. Because there is no control flow in the graph, it only needs to be generated once.

        if (triton_poi_fused_gelu_7 == nullptr) {
            triton_poi_fused_gelu_7 = loadKernel("/tmp/torchinductor_binbao/jo/cjokzfaqrvkztfhvz2yqzxlmpzub2sdi57zctqbad4js7crxsm5n.cubin", "triton_poi_fused_gelu_7_0d1d");
        }
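
One way to fix this would be for the C++ wrapper codegen to remember which kernels it has already emitted a load guard for, and skip re-emitting the guard on later visits to the same kernel. Below is a minimal Python sketch of that idea; the names (WrapperCodegen, emit_load_kernel, loaded_kernels) are hypothetical and illustrative, not Inductor's actual API:

    # Hypothetical sketch: deduplicate loadKernel guards by tracking which
    # kernel variables already have one emitted.
    class WrapperCodegen:
        def __init__(self):
            self.lines = []              # generated C++ source lines
            self.loaded_kernels = set()  # kernels that already have a guard

        def emit_load_kernel(self, name, cubin_path, mangled_name):
            if name in self.loaded_kernels:
                return  # guard already generated; with no control flow, once suffices
            self.loaded_kernels.add(name)
            self.lines.append(f"if ({name} == nullptr) {{")
            self.lines.append(f'    {name} = loadKernel("{cubin_path}", "{mangled_name}");')
            self.lines.append("}")

With something like this in place, visiting the same kernel call site several times during codegen would still produce a single guard in output_code.py.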

cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305

Metadata

Labels

hackathon, oncall: pt2, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
