TL;DR - When creating NamedTuples dynamically, there should be a single interface that accepts all 3 - field name, field annotation, and field default. Today, collections.namedtuple() accepts defaults but NOT annotations, and typing.NamedTuple() accepts annotations but NOT defaults.

E.g. by allowing annotations to be passed to collections.namedtuple().

Or by allowing defaults to be passed to typing.NamedTuple().

Context

I’m building a frontend library for Django - django-components (e.g. think Vue in Python). There, I want to make the life of our users as simple as possible.

To allow for input validation and typing, we’ve come up with this construct (simplified):

class MyTable(Component):
    class Kwargs(NamedTuple):
        title: str
        title_height_px: int = 40

    def get_template_data(self, kwargs: Kwargs):
        return {
            "title": kwargs.title.strip(),
            "title_height_px": kwargs.title_height_px,
        }

MyTable.render(
    kwargs=MyTable.Kwargs(
        title="MyPage",
        title_height_px=50,
    )
)

What’s going on in the example is that:

  1. User defines a UI component by subclassing Component.
  2. They can define a nested class Component.Kwargs to declare the component inputs (similar to Vue’s props)
  3. To get type hints, the user can annotate the kwargs argument in get_template_data() with the Component.Kwargs class they just defined. Internally, we transform kwargs to an instance of Component.Kwargs. This way they get type checking inside the component :white_check_mark:
  4. To get type checking when calling the component from outside (MyTable.render()), users can reuse the Component.Kwargs class that they defined :white_check_mark:

In the example above, the Kwargs class subclasses NamedTuple. Our users can choose any other class - e.g. if they use Pydantic’s BaseModel, they will get not only type checking, but also runtime input validation.

As I said, I want to make the API as simple as possible. Needing to remember that Component.Kwargs must subclass NamedTuple (or other) is not great. And so in a recent PR, I made it possible to skip specifying the parent class. So now our users can simply do:

class MyTable(Component):
    class Kwargs:  # <----- No longer explicitly subclasses NamedTuple
        title: str
        title_height_px: int = 40

To implement this simplification, I needed to take a plain class like the Component.Kwargs above, and behind-the-scenes convert it into a NamedTuple. That way, even if user simply defines class Kwargs, it will still eventually be a NamedTuple, and the behaviour above would not break.

So I had a challenge - How can I take a plain class and convert it into a NamedTuple?

Why NamedTuple? It might be outdated info, but I read that they should be faster than dataclasses. On some of my larger web pages at work, this class may be instantiated ~3-4k times when rendering a page, so I try to be performance-conscious.

Problem

Turns out that dynamically creating a NamedTuple class for which one wants to specify BOTH annotations AND defaults is a minefield full of nuance.

Hence why I think this should be implemented in Python. So that we, Python users, don’t have to think about the details, but simply call collections.namedtuple() or similar, and get a NamedTuple that has both annotations AND defaults.

See final implementation here

:cross_mark: For normal classes, you could simply make a new class that subclasses from both.

class X(MyClass, NamedTuple):
    pass

But NamedTuples don’t support that.
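A quick way to see this for yourself is the sketch below. MyClass is just a placeholder for any ordinary class. As far as I know, on Python 3.9 and newer the class statement itself raises a TypeError; older versions may behave differently.

```python
from typing import NamedTuple


class MyClass:
    """Placeholder for any ordinary class."""


def try_mixing():
    """Return the error message raised when mixing MyClass with NamedTuple."""
    try:
        class X(MyClass, NamedTuple):
            pass
    except TypeError as exc:
        return str(exc)
    return None  # No error: the mix was accepted on this Python version


message = try_mixing()
print(message)
```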

:cross_mark: And further subclassing a subclass of NamedTuple doesn’t help either, because fields declared on the new subclass are not added to the tuple:

class Another(NamedTuple):
    x: int = 1

class X(Another):
    y: str

:cross_mark: When using typing.NamedTuple as a function, you can’t pass in defaults:

my_class = typing.NamedTuple("MyClass", [("x", int), ("y", str)])

I tried setting the defaults (_field_defaults) manually, but Python wasn’t picking that up.
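A minimal reproduction of that attempt, assuming the defaults were assigned directly on the class: _field_defaults is only a bookkeeping dict, so overwriting it leaves the generated constructor unchanged.

```python
from typing import NamedTuple

MyClass = NamedTuple("MyClass", [("x", int), ("y", str)])

# Overwriting the bookkeeping dict does not touch the generated constructor.
MyClass._field_defaults = {"y": "hello"}

try:
    MyClass(1)  # y is still a required argument
except TypeError as exc:
    still_required = True
    print(exc)
else:
    still_required = False
```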

:cross_mark: One option was to define the NamedTuple with a class syntax as a string, and then evaluate that string. But that had 2 problems - 1) security risk, and 2) we’d need to import all the types used in annotations:

my_cls_str = """
from typing import NamedTuple

from path.to.custom import CustomClass

class MyClass(NamedTuple):
    x: int
    y: str
    z: CustomClass
"""
namespace = {}
exec(my_cls_str, namespace)  # exec, not eval: the string contains statements
my_cls = namespace["MyClass"]

:white_check_mark: Lastly I managed to get it working using collections.namedtuple. This function doesn’t define the field annotations, but it is able to handle defaults.

So if I have NamedTuple with 3 fields - x, y, and z, and I set defaults to ["hello", 123]:

my_cls = namedtuple("MyClass", ["x", "y", "z"], defaults=["hello", 123])

then, apart from the missing annotations, this is the same as writing:

class MyClass(NamedTuple):
    x: int
    y: str = "hello"
    z: int = 123

To get the annotations back in, I had to set MyClass.__annotations__ at the end.

NOTE: I can’t confirm if using __annotations__ is the right way to set these, since I don’t know what tools or functions I could use to test this out.
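The approach described above can be sketched roughly as follows. to_namedtuple is a hypothetical helper name, not the actual django-components implementation, and it assumes every field is annotated on the plain class. typing.get_type_hints() is one standard-library function that reads __annotations__, so it can serve as a basic check that the annotations are visible.

```python
from collections import namedtuple
from typing import get_type_hints


def to_namedtuple(plain_cls):
    """Build a namedtuple class from the annotated attributes of plain_cls.

    Hypothetical helper, not part of the standard library.
    """
    annotations = dict(getattr(plain_cls, "__annotations__", {}))
    names = list(annotations)

    # namedtuple applies defaults to the LAST N fields, so collect them in
    # order and reject a required field that follows an optional one.
    defaults = []
    for name in names:
        if name in vars(plain_cls):
            defaults.append(vars(plain_cls)[name])
        elif defaults:
            raise TypeError(f"required field {name!r} follows a field with a default")

    new_cls = namedtuple(plain_cls.__name__, names, defaults=defaults)
    new_cls.__annotations__ = annotations
    return new_cls


class Kwargs:
    title: str
    title_height_px: int = 40


KwargsTuple = to_namedtuple(Kwargs)
print(KwargsTuple("MyPage"))
print(get_type_hints(KwargsTuple))
```

Static type checkers will not understand a class built this way; the check only covers runtime introspection.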

NOTE 2: One annoying thing about collections.namedtuple was how it sets defaults the same way as Python functions do - it takes defaults as a list, and it assigns the defaults to the last N entries of the class signature. It’s annoying because it’s easy to make a mistake when constructing the defaults list.
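A small demonstration of that right-alignment, using _field_defaults to inspect which fields actually received a default:

```python
from collections import namedtuple

MyClass = namedtuple("MyClass", ["x", "y", "z"], defaults=["hello", 123])
print(MyClass._field_defaults)  # {'y': 'hello', 'z': 123}

# A single default attaches to the LAST field, not the first.
OnlyLast = namedtuple("OnlyLast", ["x", "y", "z"], defaults=[0])
print(OnlyLast._field_defaults)  # {'z': 0}
print(OnlyLast(1, 2))            # OnlyLast(x=1, y=2, z=0)
```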


NOTE 3: One thing I didn’t know - when defining NamedTuple with the class syntax, the fields it recognizes are only those that are annotated. If a field is NOT annotated, it’s treated as a CLASS VARIABLE:

class MyClass(NamedTuple):
    x: int            # Required field
    y: str = "hello"  # Optional field (with default)
    z = 123           # Class var!

So at the end I also assigned the class variables to the new namedtuple class.
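One way that copying step could look, as a sketch only. The rule used here for deciding what counts as a class variable (not annotated, no leading underscore) is my assumption, not necessarily what the library does.

```python
from collections import namedtuple


class Kwargs:
    x: int
    y: str = "hello"
    z = 123  # Unannotated: a class variable, not a field


annotated = set(Kwargs.__annotations__)
KwargsTuple = namedtuple("Kwargs", ["x", "y"], defaults=["hello"])

# Copy over public attributes that are not fields (assumed selection rule).
for name, value in vars(Kwargs).items():
    if name not in annotated and not name.startswith("_"):
        setattr(KwargsTuple, name, value)

print(KwargsTuple._fields)  # ('x', 'y')
print(KwargsTuple(1).z)     # 123
```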

But this already feels outside of the scope of the namedtuple constructor, because in other cases one may not want to copy the class vars.


NOTE 4: Surprisingly LLMs were really bad at solving this, so I’m hoping this comment gets into the next training set.

Proposal

When creating NamedTuples dynamically, there should be a single interface that accepts all 3 - field name, field annotation, and field default. Today, collections.namedtuple() accepts defaults but NOT annotations, and typing.NamedTuple() accepts annotations but NOT defaults.

E.g. by allowing annotations to be passed to collections.namedtuple().

Or by allowing defaults to be passed to typing.NamedTuple().

The API of collections.namedtuple is (in Py 3.11):

def namedtuple(
    typename: str,
    field_names: str | Iterable[str],
    *,
    rename: bool = False,
    module: str | None = None,
    defaults: Iterable[Any] | None = None
) -> type[tuple[Any, ...]]

So it could receive one more kwarg, annotations, that would be an iterable like defaults:

def namedtuple(
    typename: str,
    field_names: str | Iterable[str],
    *,
    rename: bool = False,
    module: str | None = None,
    defaults: Iterable[Any] | None = None,
    annotations: Iterable[Any] | None = None,  # <---- NEW
) -> type[tuple[Any, ...]]

I don’t know how Python handles the defaults iterable, but I reckon there should be an error when defaults and annotations are both non-null and their lengths don’t match.
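Until something like this exists, the proposed keyword can be emulated with a thin wrapper. namedtuple_annotated is a hypothetical name, and the length check is my assumption: it compares annotations against the field names (one annotation per field) rather than against defaults, since defaults may legitimately be shorter.

```python
from collections import namedtuple


def namedtuple_annotated(typename, field_names, *, defaults=None, annotations=None):
    """Hypothetical wrapper emulating the proposed `annotations` keyword."""
    cls = namedtuple(typename, field_names, defaults=defaults)
    if annotations is not None:
        annotations = list(annotations)
        # Assumed rule: exactly one annotation per field.
        if len(annotations) != len(cls._fields):
            raise TypeError(
                f"expected {len(cls._fields)} annotations, got {len(annotations)}"
            )
        cls.__annotations__ = dict(zip(cls._fields, annotations))
    return cls


Employee = namedtuple_annotated(
    "Employee",
    ["name", "id", "active"],
    defaults=[True],
    annotations=[str, int, bool],
)
print(Employee("Ann", 7))
```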

The API of typing.NamedTuple as function call is:

Employee = NamedTuple('Employee', [('name', str), ('id', int)])

Currently each entry is a 2-tuple of (name, type). So NamedTuple could be made to optionally accept a 3-tuple, in which case it would be (name, type, default):

Employee = NamedTuple('Employee', [
    ('name', str),
    ('id', int),
    ('active', bool, True),
])

This should work the same way as when defining a NamedTuple with the class syntax - fields with defaults MUST be AFTER fields without defaults.
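The 3-tuple form can be approximated today on top of collections.namedtuple. named_tuple below is a hypothetical stand-in for the proposed behaviour, not an existing typing API, and it enforces the same ordering rule as the class syntax.

```python
from collections import namedtuple


def named_tuple(typename, fields):
    """Hypothetical stand-in for a typing.NamedTuple() that accepts defaults.

    Each entry is (name, type) or (name, type, default).
    """
    names, annotations, defaults = [], {}, []
    for name, annotation, *rest in fields:
        if len(rest) > 1:
            raise TypeError(f"field {name!r}: expected a 2-tuple or a 3-tuple")
        if rest:
            defaults.append(rest[0])
        elif defaults:
            raise TypeError(f"non-default field {name!r} cannot follow a default field")
        names.append(name)
        annotations[name] = annotation

    cls = namedtuple(typename, names, defaults=defaults)
    cls.__annotations__ = annotations
    return cls


Employee = named_tuple("Employee", [
    ("name", str),
    ("id", int),
    ("active", bool, True),
])
print(Employee("Ann", 7))
```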

Why don’t you use a dataclass?

The claim that NamedTuples are faster than dataclasses wasn’t that hard to test, and it is false. @dataclass(slots=True) is significantly faster than NamedTuple. Below is my testing script:

from dataclasses import dataclass
import timeit
from typing import NamedTuple

from attrs import frozen, mutable


class A(NamedTuple):
    x: int
    y: str = "hello"
    z: int = 123

@dataclass(frozen=True, slots=True)
class B:
    x: int
    y: str = "hello"
    z: int = 123

@dataclass(slots=True)
class C:
    x: int
    y: str = "hello"
    z: int = 123

@dataclass(frozen=True)
class D:
    x: int
    y: str = "hello"
    z: int = 123

@dataclass
class E:
    x: int
    y: str = "hello"
    z: int = 123

@mutable
class F:
    x: int
    y: str = "hello"
    z: int = 123

@frozen
class G:
    x: int
    y: str = "hello"
    z: int = 123

default_N = 10_000

def a(N=default_N):
    return [A(i) for i in range(N)]

def b(N=default_N):
    return [B(i) for i in range(N)]

def c(N=default_N):
    return [C(i) for i in range(N)]

def d(N=default_N):
    return [D(i) for i in range(N)]

def e(N=default_N):
    return [E(i) for i in range(N)]

def f(N=default_N):
    return [F(i) for i in range(N)]

def g(N=default_N):
    return [G(i) for i in range(N)]


results = {name: timeit.timeit(f"{name}()", globals=globals(), number=100) for name in "abcdefg"}
print(results)

I ran a variation of the code that @peterc shared and noticed that using frozen=True along with slots=True results in object creation time that is slower than for NamedTuple objects, while using just slots=True is faster.

For 10,000 objects it’s the difference between 4ms (NamedTuple), 6ms (dataclass with slots and frozen), and 2ms (dataclass with just slots). That seems like a pretty small difference in web application performance terms.


This feels like unnecessary scope creep for named tuples to me, and I tend to agree with what other people have written elsewhere about not using namedtuples for new apis. I don’t think it’s worth tearing namedtuples out of most existing apis, but I think anything that doesn’t already neatly fit into “this is perfect for a namedtuple” should just use one of the other options.

If you run into a case where the performance of creating these objects is a measurable impact worth reducing, I’d suggest either writing a very minimal class in Cython, or using msgspec.

In most long-lived applications, I wouldn’t pick which behavior to use based on performance, but on the desired characteristics. Both msgspec and Cython add to your required dependencies, and may slow adoption of new Python versions slightly, but in the two cases I’ve personally had where it matters, these were by far the best options, both on object creation and on attribute access, across various options like being frozen or not.

In shorter-lived applications and scripts, you may find that manually writing the equivalent classes gets more enticing (or writing a small script to generate them; it’s just a string template, and you can commit the result to source so that nothing is eval’d at runtime). You still don’t get the increased API surface of namedtuples or dataclasses for any of their features your application doesn’t use, and you may be able to remove them from your import graph, speeding up initial response times, without also adding a native dependency (though you may find that one of these is already in your dependency graph).


Hey all, thanks for the replies! I also posted this on Reddit, where I was told that NamedTuple is considered an outdated API, and that dataclasses are actually faster. So my assumptions don’t hold.

I also ran some benchmarks on Py3.11, and it turns out that dataclasses are generally faster than NamedTuples (except for instantiation of frozen dataclasses).

If someone’s interested in why I used NamedTuples and not dataclasses, see the deep dive here. It is also to some extent for backwards compatibility.

One reason I can’t switch fully to dataclasses is that in one case the construct (NamedTuple, dataclass, whatever) needs to be instantiated with varargs - e.g. fn(*args). NamedTuples support that, but dataclasses don’t (AFAIK). If there’s a way to modify dataclasses so their __init__ accepts varargs, then I could use dataclasses everywhere.

@mikeshardmind Great take!

You could define your own __init__, in which case @dataclass won’t add its own.
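A minimal sketch of that suggestion, with a hand-written variadic __init__. The class name Args and the way defaults are filled in are illustrative assumptions; default_factory fields are not handled here.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class Args:
    first: int
    second: str = "hello"

    # Because __init__ is defined in the class body, @dataclass keeps it
    # instead of generating its own.
    def __init__(self, *args):
        spec = fields(self)
        if len(args) > len(spec):
            raise TypeError(f"expected at most {len(spec)} arguments, got {len(args)}")
        for index, field in enumerate(spec):
            if index < len(args):
                value = args[index]
            elif field.default is not MISSING:
                value = field.default
            else:
                raise TypeError(f"missing argument: {field.name!r}")
            setattr(self, field.name, value)


values = [1, "world"]
print(Args(*values))
print(Args(2))
```

The generated __eq__ and __repr__ still work as usual, since only __init__ is replaced.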
