KEMBAR78
Improve schema gen for nested dataclasses by sydney-runkle · Pull Request #9114 · pydantic/pydantic · GitHub
Skip to content

Conversation

@sydney-runkle
Copy link
Contributor

@sydney-runkle sydney-runkle commented Mar 26, 2024

When we have the schema available for a dataclass (or other type with __pydantic_core_schema) don't rebuild! Greatly improves schema generation time + greatly minimizes the number of recursive schema building calls.

import timeit
from typing import Literal, Annotated
from pydantic import Discriminator, TypeAdapter, BaseModel
from pydantic.dataclasses import dataclass


@dataclass(frozen=True, kw_only=True)
class Cat:
    type: Literal["cat"] = "cat"


@dataclass(frozen=True, kw_only=True)
class Dog:
    type: Literal["dog"] = "dog"


@dataclass(frozen=True, kw_only=True)
class NestedDataClass:
    animal: Annotated[Cat | Dog, Discriminator("type")]


class NestedModel(BaseModel):
    animal: Annotated[Cat | Dog, Discriminator("type")]


def construct_schema():
    @dataclass(frozen=True, kw_only=True)
    class Root:
        data_class: NestedDataClass
        model: NestedModel

        animal: Annotated[Cat | Dog, Discriminator("type")]

# not dividing by 1000 bc we want to view time in ms
print(f"Average schema build time (ish): {timeit.timeit(construct_schema, number=1000)} ms")
# Main: Average schema build time (ish): 1.632956000015838 ms
# This branch: Average schema build time (ish): 1.1692142079991754 ms

Selected Reviewer: @hramezani

@sydney-runkle sydney-runkle added relnotes-performance Used for performance improvements. relnotes-fix Used for bugfixes. labels Mar 26, 2024
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 26, 2024

CodSpeed Performance Report

Merging #9114 will improve performances by 53.81%

Comparing make_schema_gen_better (ee3ebfb) with main (548feec)

Summary

⚡ 1 improvements
✅ 12 untouched benchmarks

Benchmarks breakdown

Benchmark main make_schema_gen_better Change
test_construct_schema 36.8 ms 23.9 ms +53.81%

@sydney-runkle sydney-runkle removed the relnotes-fix Used for bugfixes. label Mar 26, 2024
@sydney-runkle
Copy link
Contributor Author

Please review

@adriangb
Copy link
Member

This is very nice!

Do we have any benchmarks? That'd be a nice to have so we can track this. Assuming you wrote some to come up with this change, can we commit them along with other benchmarks (if they don't already exist)?

There's several scenarios where I imagine we do need to rebuild:

# unsubstituted typevars
class Inner(BaseModel, Generic[T]):
    x: T

class Outer(BaseModel):
    inner: Inner[int]  # need to rebuild innner

# cyclical forward references
class InnerFwd(BaseModel):
    outer: OuterFwd

class OuterFwd(BaseModel):
    inner: InnerFwd

And probably others I'm not thinking of. Do we have tests showing that these use cases work correctly?

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Mar 27, 2024

Deploying pydantic-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: ee3ebfb
Status: ✅  Deploy successful!
Preview URL: https://bdfaf89c.pydantic-docs2.pages.dev
Branch Preview URL: https://make-schema-gen-better.pydantic-docs2.pages.dev

View logs

@sydney-runkle
Copy link
Contributor Author

@adriangb,

There's several scenarios where I imagine we do need to rebuild

Good point - I've added tests for both of these cases both to test_main.py and test_dataclasses.py. Interestingly, the generic handling is different for dataclasses vs models (re concrete subclasses), but I think that's something we can address separately (and loop in @dmontagu).

Do we have any benchmarks? That'd be a nice to have so we can track this.

I've added a benchmark for the basic case with a nested model and dataclass. I think this should work, though I might have to add this in a separate PR so that we get a comparison in main before we merge this...

@sydney-runkle
Copy link
Contributor Author

Ah, added this to a separate PR: #9121

@adriangb adriangb enabled auto-merge (squash) March 27, 2024 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready for review relnotes-performance Used for performance improvements.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants