[inductor] FX graph cache: Add support for symbolic shapes by masnesral · Pull Request #111421 · pytorch/pytorch · GitHub

Conversation

@masnesral
Contributor

@masnesral masnesral commented Oct 17, 2023

Stack from ghstack (oldest at bottom):

Summary: Add support for caching graphs that have tensor args with symbolic shapes. The high-level approach is to serialize guards with the on-disk cached object and validate that those guards pass before serving a cached object.

Test Plan: New unit tests
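The flow described in the summary can be sketched as follows. Everything here (compute_key, the entry dicts, the guard callables) is an illustrative stand-in for the real inductor internals, not the actual API:

```python
import hashlib
import pickle

def compute_key(graph_repr, config):
    # The key covers everything that affects codegen (graph, input
    # metadata, system settings, ...).
    return hashlib.sha256(pickle.dumps((graph_repr, config))).hexdigest()

def cached_compile(graph_repr, inputs, cache, compile_fn, make_guard):
    key = compute_key(graph_repr, config="default")
    for entry in cache.get(key, []):
        # Serve a cached entry only if its serialized guards pass for
        # the shapes of the current call.
        if entry["guards"](inputs):
            return entry["compiled"]
    compiled = compile_fn(graph_repr, inputs)
    cache.setdefault(key, []).append(
        {"compiled": compiled, "guards": make_guard(inputs)}
    )
    return compiled
```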

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler


[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added the release notes: fx release notes category label Oct 17, 2023
@pytorch-bot

pytorch-bot bot commented Oct 17, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111421

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e0135af with merge base 12a9e09:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

masnesral added a commit that referenced this pull request Oct 17, 2023
Summary: Add support for caching graphs that have tensor args with symbolic shapes. The high-level approach is to serialize guards with the on-disk cached object and validate that those guards pass before serving a cached object.

Test Plan: New unit tests

ghstack-source-id: 4b027a0
Pull Request resolved: #111421
self.assertEqual(fn(a, b), compiled_fn(a, b))
self.assertEqual(counters["inductor"]["fxgraph_cache_miss"], 1)
self.assertEqual(counters["inductor"]["fxgraph_cache_hit"], 1)

Contributor Author

What's missing here is validation that guards were added after loading from the cache. Still investigating, but does anyone know if there's a straightforward way to access the guards in a structured way so I can do some validation here?

Contributor Author

@ezyang do you have a recommendation? If I have a torch.compiled function, is there a straightforward way to see that our guards were properly added even in the case of a cache hit?

.decode("utf-8")
.lower()
)
return "c" + sha256_hash(hashing_str)
Contributor Author

The prefix character seems to be the existing scheme here to differentiate different kinds of hashes for different types of cached objects. Probably I should use an enum. TODO.
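A sketch of the enum idea from the comment above. The member names, and which prefix maps to which kind of cached object, are guesses for illustration, not the codebase's actual scheme:

```python
from enum import Enum

class HashType(Enum):
    CODE = "c"              # hypothetical: generated source code
    COMPILED_GRAPH = "cg"   # hypothetical: pickled compiled FX graph

def prefixed_hash(hash_type: HashType, digest: str) -> str:
    # The prefix distinguishes what kind of cached object the hash names.
    return hash_type.value + digest
```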

Collaborator

yeah, when I wrote the original PR that was my thought too

Contributor

Is there any reason not to use a longer, more descriptive prefix? It's not like we have a file path length limitation or something.

Contributor Author

I was sticking with the convention already started, but I can definitely turn it into a more descriptive string. There's also an existing precedent for adding an extension, e.g., "cg", which I guessed means "code graph"?

self.fx_args = fx_args
def __init__(
self,
gm: torch.fx.GraphModule,
Contributor Author

Since we need to access the example_args, it was convenient to make the gm and example_args arguments explicit rather than packaging them up in a list.

# in an in-memory cache after loading from disk.
@classmethod
def save_graph(cls, key: str, compiled_graph: CompiledFxGraph):
@staticmethod
Contributor Author

I kept this implementation under the FxGraphCache class, but all the methods are static. I prefer the namespacing of putting these methods in the class, but lemme know if a class of all static methods hurts your eyeballs.

Contributor

Sometimes people prefer @classmethod and cls.method for this reason, I think. Up to you.

if hit:
# Now re-evaluate to add the guards to the current shape env.
# We have to clear the `evaluate_expr` lru_cache to force evaluation.
shape_env.evaluate_expr.cache_clear()
Contributor Author

Please comment. I don't know if clearing the whole cache is problematic, e.g., from a performance perspective. I could, for example, introduce a new context manager analogous to suppress_guards() above that selectively uses a cached vs. non-cached version of shape_env.evaluate_expr, but I thought it was worth asking if that's overkill.

Contributor

Seems off to me that we would clear cache. leaving this to @ezyang to comment

Collaborator

I agree it seems kinda weird that we would need to clear the cache. If a cache hit is preventing you from adding a guard, then that should mean that that guard was already evaluated, no?

Contributor Author

See some of the comments above. The approach here is to:

  1. Load a possible match from the cache.
  2. Evaluate the guards, but in a mode that does not modify the guards in the current environment, because in the case of a miss, we don't actually want the current env to change.
  3. If there's a hit, only then re-evaluate to cause the guards to be added.

The problem here is that the caching at evaluate_expr() interferes with step 3.

Contributor

@masnesral I think a better solution would be to customize the caching on evaluate_expr so that it respects whether or not guards were suppressed or not. WDYT?

Contributor Author

@ezyang is that correct though? If the expression has been evaluated before (even guards suppressed), don't we still want to evaluate it again in order to get the guards added?

Contributor

Hmm, well I guess it didn't matter when the hints were ints but it does matter if the hints are symints. I mean, I wouldn't be opposed to just turning off this caching entirely when you're playing funny tricks with symint hints.
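One way to realize the suggestion above (make the memoization aware of guard suppression) can be sketched with a toy stand-in for ShapeEnv. This is illustrative only: the real evaluate_expr operates on sympy expressions, not strings, and the class below is not the actual implementation:

```python
class ShapeEnvSketch:
    """Toy model of a shape environment with suppression-aware caching."""

    def __init__(self):
        self.guards = []
        self.suppress_guards = False
        self._eval_cache = {}

    def evaluate_expr(self, expr):
        # Only consult and populate the cache for non-suppressed
        # evaluations. That way a cache-lookup probe (run with guards
        # suppressed) never prevents a later real evaluation from
        # recording its guard.
        if not self.suppress_guards and expr in self._eval_cache:
            return self._eval_cache[expr]
        result = eval(expr)  # stand-in for symbolic evaluation
        if not self.suppress_guards:
            self.guards.append(expr)
            self._eval_cache[expr] = result
        return result
```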


_boxed_call: Optional[bool] = None

def __init__(
Contributor Author

I preferred a ctor here rather than extracting all these fields at the caller.

# Inputs to fx_codegen_and_compile
# Anything that affects codegen should go here, so if the signature
# of fx_codegen_and_compile changes, the list and dict should be updated accordingly
graph_args = [gm, example_inputs]
Contributor Author

See comment above. I changed this to be explicit about the two args (gm, example_inputs)


self._check_translation_validate()
return exprs

Contributor Author

The changes here are just about splitting apart the evaluate_guards_for_args function into smaller pieces that I can use for the cache impl. Namely, I need separate phases for creating the guards expression and evaluating it (in a different context)

Contributor

Seems like a good comment to put into the code comments
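The split described above (produce a guards expression, evaluate it later in a different context) can be sketched like this. The function names and the string-based representation are illustrative stand-ins; the real code works on ShapeEnv guards:

```python
def produce_guards_expr(symbols, guards):
    # Serialize the guards as a single boolean expression over the
    # placeholder symbol names, suitable for storing on disk with
    # the cached graph.
    return " and ".join(guards) if guards else "True"

def evaluate_guards_expr(expr, symbols, concrete_sizes):
    # Evaluate the stored expression with the symbols bound to the
    # concrete sizes of the current call (e.g. at cache-lookup time).
    return eval(expr, {}, dict(zip(symbols, concrete_sizes)))
```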

@masnesral masnesral added the topic: not user facing topic category label Oct 17, 2023
@masnesral masnesral marked this pull request as ready for review October 17, 2023 15:59
write(pickle.dumps(disk_compiled_graph), "cg", extra=key, hash_type="cg")

@classmethod
def load_graph(cls, cg_path: str) -> CompiledFxGraph:
Contributor Author

The diff looks weird here. load_graph is replaced by lookup_graph above and the implementation here is for save_graph. So ignore the red lines I guess.

Contributor

@eellison eellison left a comment

looks good but let's wait for full review from @ezyang

"""
See FxGraphCachePickler. Custom reducer to pickle SymInts.
"""
# For hashing purposes, we only care about the name of the symbol and
Contributor

It might be useful in the future to canonicalize symints so that hash(tensor([s1, s1, s2])) == hash(tensor([s3, s3, s4])).. not needed now

Contributor

It's not clear to me that this is sound, given https://github.com/pytorch/pytorch/pull/111421/files#diff-c9b517f8db609ffa866804dfa2689188a4fee20abacaa0b0dca91625c1b5cb8dR705

if you say [s0, s1] == [s4, s3], you have to make sure that you know how to flip the SymInt arguments when eval'ing the guards. That sounds difficult.
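The canonicalization idea can be sketched as renaming symbols in order of first appearance, so that e.g. [s1, s1, s2] and [s3, s3, s4] produce the same hashable form. As the follow-up comment notes, doing this soundly also requires keeping the mapping so guard arguments can be "flipped" back, which is the hard part. A hypothetical sketch:

```python
def canonicalize(symbols):
    # Rename symbols to s0, s1, ... in order of first appearance.
    mapping = {}
    out = []
    for s in symbols:
        if s not in mapping:
            mapping[s] = f"s{len(mapping)}"
        out.append(mapping[s])
    # Return the mapping too: it would be needed to rewrite guard
    # expressions against the renamed symbols.
    return out, mapping
```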



Comment on lines 757 to 758
path = os.path.join(subdir, sha256_hash(content) + ".cg")
write_atomic(path, content)
Contributor

do we need FileLock here ?

Contributor Author

It is held from the caller
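The pattern under discussion, writing to a temp file and atomically renaming it into place while any cross-process lock is held by the caller, can be sketched as follows. This is a generic sketch of the technique, not the actual helper in the codebase:

```python
import os
import tempfile

def write_atomic(path, content: bytes):
    # Write to a temporary file in the same directory, then atomically
    # rename over the destination so readers never observe a partial
    # file. Any FileLock serializing writers is assumed to be held by
    # the caller, as noted above.
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise
```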

FxGraph (the graph module, graph inputs, system settings etc.) into an
FxGraphCacheDetails object, pickle it, and compute a hash for the key.
See FxGraphCachePickler.
- Among the metadata we store, we also include a guards expression that's
Collaborator

Is it possible to end up loading a more generic graph from cache than we need?

Contributor Author

Ohhhh, yes. That's a nice catch. So really I need to consider all versions and pick the "best" option, don't I? Hmm, is there any obvious criteria I can use to determine whether one option is more appropriate than the other? cc @eellison

Contributor

I think maybe as a follow up - not needed in initial pr imo. Generally the behavior for symbolic shapes is to reuse a compilation if it works.

Contributor

Suppose you have a cache for (s0, 4) and another for (5, s1). There is no unique best choice cache entry to load.

Probably the only way to actually do this is some sort of autotuning based thing, where for any given concrete size you've benchmarked which one runs fastest, and you load that particular one.
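The lookup policy settled on above, with multiple candidate entries under one key (e.g. one specialized on (s0, 4) and another on (5, s1)), is simply to reuse the first compilation whose guards pass rather than trying to rank candidates. A hypothetical sketch:

```python
def lookup_graph(entries, concrete_sizes):
    # entries: candidate cache entries sharing one key; each carries a
    # guard predicate over the concrete input sizes of the current call.
    for entry in entries:
        if entry["guards"](concrete_sizes):
            return entry["compiled"]
    return None  # cache miss: caller compiles and appends a new entry
```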


@parametrize("device", ("cuda", "cpu"))
@parametrize("dtype", (torch.float32, torch.bfloat16))
def test_cache_load_function(self, device, dtype):
@parametrize("dynamic", (False, True))
Collaborator

Given that dynamic=None has a different behavior compared to True and False, it might be worth adding here as an option too

Contributor Author

So it turns out there's a shortcoming with that change. None seems to be equivalent to False in terms of the FX graph that gets generated (at least for these tests). So I'd need to rearchitect the tests slightly to make sure each test gets a clean tmp directory. I can do that, but I sorta liked the current behavior of leaving the tmp directory intact for the run of the full set of tests because we'd catch cases of a cache hit when we shouldn't (see my comment on setUpClass)

@parametrize("device", ("cuda", "cpu"))
@parametrize("dtype", (torch.float32, torch.bfloat16))
def test_cache_load_model(self, device, dtype):
@parametrize("dynamic", (False, True))
Collaborator

Same as previous comment


# Mark all tensor arg dimensions as dynamic to cause all shapes
# to be symbolic
def rand_dynamic(*args):
Collaborator

@ani300 ani300 Oct 19, 2023

this is equivalent to dynamic=True on the compile function

@masnesral masnesral changed the title [RFC][inductor] FX graph cache: Add support for symbolic shapes [inductor] FX graph cache: Add support for symbolic shapes Oct 30, 2023
@masnesral
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 31, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@facebook-github-bot facebook-github-bot deleted the gh/masnesral/6/head branch November 4, 2023 14:26
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
…11421)

Summary: Add support for caching graphs that have tensor args with symbolic shapes. The high-level approach is to serialize guards with the on-disk cached object and validate that those guards pass before serving a cached object.

Test Plan: New unit tests

Pull Request resolved: pytorch#111421
Approved by: https://github.com/ezyang
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
…11421)

Summary: Add support for caching graphs that have tensor args with symbolic shapes. The high-level approach is to serialize guards with the on-disk cached object and validate that those guards pass before serving a cached object.

Test Plan: New unit tests

Pull Request resolved: pytorch#111421
Approved by: https://github.com/ezyang

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: inductor release notes: fx release notes category topic: not user facing topic category


6 participants