
[easy] Remove redundant loadKernel operations #105553

@gmagogsfm

Description

Repro:

TORCH_COMPILE_DEBUG=1 CUDA_VISIBLE_DEVICES=3 python benchmarks/dynamo/torchbench.py --inductor --cpp-wrapper --bfloat16 --accuracy --inference --device cuda --only BERT_pytorch

In the generated output_code.py, the following code appears multiple times. Because there is no control flow in the graph, it only needs to be generated once.

        if (triton_poi_fused_gelu_7 == nullptr) {
            triton_poi_fused_gelu_7 = loadKernel("/tmp/torchinductor_binbao/jo/cjokzfaqrvkztfhvz2yqzxlmpzub2sdi57zctqbad4js7crxsm5n.cubin", "triton_poi_fused_gelu_7_0d1d");
        }
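
One way to fix this would be for the C++ wrapper codegen to remember which kernels it has already emitted a load guard for, and skip re-emitting the guard on later visits to the same kernel. Below is a minimal Python sketch of that idea; the names (WrapperCodegen, emit_load_kernel, loaded_kernels) are hypothetical and illustrative, not Inductor's actual API:

    # Hypothetical sketch: deduplicate loadKernel guards by tracking which
    # kernel variables already have one emitted.
    class WrapperCodegen:
        def __init__(self):
            self.lines = []              # generated C++ source lines
            self.loaded_kernels = set()  # kernels that already have a guard

        def emit_load_kernel(self, name, cubin_path, mangled_name):
            if name in self.loaded_kernels:
                return  # guard already generated; with no control flow, once suffices
            self.loaded_kernels.add(name)
            self.lines.append(f"if ({name} == nullptr) {{")
            self.lines.append(f'    {name} = loadKernel("{cubin_path}", "{mangled_name}");')
            self.lines.append("}")

With something like this in place, visiting the same kernel call site several times during codegen would still produce a single guard in output_code.py.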

cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305

Metadata

Labels

hackathon, oncall: pt2, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
