KEMBAR78
bpo-40503: Add tests and implementation for ZoneInfo by pganssle · Pull Request #19909 · python/cpython · GitHub
Skip to content

Conversation

@pganssle
Copy link
Member

@pganssle pganssle commented May 4, 2020

This PR brings in the tests, C and pure Python implementations for the zoneinfo module, as specified in PEP 615. Migrating this over from the reference implementation.

For the sake of keeping the review somewhat tightly scoped (HAH) it does not cover:

  1. What's new entry. (bpo-40503: Add documentation and what's new entry for zoneinfo #20006)
  2. Docs (bpo-40503: Add documentation and what's new entry for zoneinfo #20006)
  3. The compile-time configuration of TZPATH (bpo-40503: Add compile-time configuration to PEP 615 #20034) (Moved to this PR)

Those will come in later / separate PRs.

https://bugs.python.org/issue40503

@pganssle pganssle marked this pull request as ready for review May 8, 2020 15:18
@pganssle
Copy link
Member Author

pganssle commented May 8, 2020

@python/windows-team I can't really make heads or tails of the azure pipelines configuration, but I need to try to add a PyPI dependency to the tests (it's particularly important on Windows).

Can one of y'all point me in the right direction here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that this is happening but not actually installing tzdata in the environment correctly (I added a deliberately failing test to the TZ Data tests to check this and it continued to pass), but the GHA workflows are hitting these code paths, so we can defer getting the test requirements installed on AP until a later PR.

@pganssle pganssle added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 11, 2020
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @pganssle for commit 87d8c51 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 11, 2020
@pganssle
Copy link
Member Author

@aeros (or anyone) — any other suggested changes? Are you still planning on further review?

Hoping to get this in ASAP because it's blocking PRs #20034 and bpo-40536, and there's only 1 week until the feature freeze.

@pganssle
Copy link
Member Author

@serhiy-storchaka Would you mind taking a look at the C code here? There's about 2000 lines of it and it would be good to get an expert eye on it.

Also CC-ing @vstinner because it's getting close to feature freeze and there's a ton of code to review so I'm now trying to dragoon specific people instead of sending out general requests for volunteers 😛

@pganssle pganssle requested a review from a team as a code owner May 12, 2020 14:37
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing @zooba's comments from the compile-time-flag PR:

As we discussed, I don't like relying on pip to install into the source tree before we've run the test suite.

In general, system-wide dependencies are installed using the system package manager, or downloaded using specialised scripts and an existing Python installation (see PCbuild/get-externals.py). This would be a new network dependency, and I'd rather keep us to things that are checked in or directly grabbed from GitHub.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My response:

I think it's particularly important that we test against the actual installed tzdata, particularly in the early days of the PR and particularly on Windows, because that's what a majority of Windows users will be using for time zone data.

I don't think we should check in a tzdata wheel to handle optional test dependencies, though.

If there's another mechanism for pulling in a wheel during the PR, I'm open to the idea, though if it takes too much time or effort I'd ask that we temporarily use PyPI and fix it after the main features are merged. I imagine any problems this would cause would be either immediate or very rare (after all, every other CI job for all my other projects and probably the majority of Python projects hits PyPI as part of the set-up, it's not a fly-by-night endpoint), so if we're already at the point where this is passing I'd rather accept it as a temporary additional dependency until after feature freeze when I'll have a bit more breathing room.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather skip the tests that rely on an installed tzdata and test them on buildbots, where we can rely on manual configuration for stuff like this. We need to prioritise validation when it's not present anyway, as that's the default case (and hence what the real majority of users will use).

If we can't mock up detection testing well enough without the real package, then we should probably refactor it so we can. I couldn't easily spot which tests rely on the real installation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, found the tests, and it seems like you've got full coverage both with and without the package, as well as a warning that the ones with the package may randomly break in the future.

We don't like randomly breaking tests independent from CPython changes, so I'm much more comfortable not running those tests in CI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see what failure modes you are worried about here.

These tests are:

  1. Quick to run
  2. Only rely on a single first-party package (i.e. we control the endpoint — no worry about malicious changes to the package)
  3. Only rely on PyPI (which is not a major failure point on any of the CI systems)
  4. For a feature that is guaranteed in the PEP.

It's already the case that the AP runs don't use tzdata (mostly because I couldn't figure out how to get it to work, but I'm fine with just not fixing that), so we have coverage of the "tzdata present" and "tzdata not present" test cases.

I think it's at least a no-brainer to include them in the coverage builds, but configuring the buildbots to use them is much more complicated than configuring the CI to build them.

I understand the reluctance to introduce pip or installed modules as a mandatory part of the testing, because you should be able to develop offline, but CI builds are always done online.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is also that you're relying on pip working with a brand new build of Python. If someone breaks pip in a PR, they'll see your install fail and no tests will run, which is not going to help them fix it.

This is a valid point, but there are some pretty simple ways around it, for example, we could make it so the pip install step doesn't fail the build job, or fails the build job only after the tests are run, e.g.:

# Install tzdata, continue if failing
pip install -rMisc/requirements-test.txt || :

Or to still count "failed to install tzdata" as a failure, but only after the tests run:

PIP_SUCCESS=1
pip install -rMisc/requirements-test.txt || PIP_SUCCESS=0

... # Run the tests

# Set failing exit code if pip install failed.
[[ "$PIP_SUCCESS" -eq 0 ]] && exit 1

I personally think it's a bit overkill (you could make the same argument about "what if someone breaks urllib"), but at least in this specific case it doesn't need to be a blocker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That link doesn't show me which section you're referring to, unfortunately.

Sorry, I don't know why it changes the URL if it's not a direct link. Anyway, I'm referring to zoneinfo._common.load_tzdata:

def load_tzdata(key):
import importlib.resources
components = key.split("/")
package_name = ".".join(["tzdata.zoneinfo"] + components[:-1])
resource_name = components[-1]
try:
return importlib.resources.open_binary(package_name, resource_name)
except (ImportError, FileNotFoundError, UnicodeEncodeError) as e:
raise ZoneInfoNotFoundError(f"No time zone found with key {key}") from e

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could make it so the pip install step doesn't fail the build job, or fails the build job only after the tests are run,

While it sounds like a good idea, my experience has shown that it's a great way to simply never run the tests. You need your dependencies to be deterministic, and you want them to fail if they're essential to the test suite. Silently ignoring failures here means we may as well just skip the relevant tests.

You can also easily mock out enough of a directory structure to make that test pass. Anything that breaks beyond that needs to be fixed in the tzdata module itself, and so the tests should be run there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pganssle: I suggest you to revert all CI config changes and remove test-requirements.txt for now, merge the PR without it, so we will have more time to discuss the best option to run test_zoneinfo with tzdata installed by pip.

There are other options like running pip in test.test_zoneinfo, or in test.regrtest, or in "make test" and "make buildbottest", etc. ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it sounds like a good idea, my experience has shown that it's a great way to simply never run the tests. You need your dependencies to be deterministic, and you want them to fail if they're essential to the test suite. Silently ignoring failures here means we may as well just skip the relevant tests.

Well, as I mentioned, we could easily fail the test suite afterwards.

You can also easily mock out enough of a directory structure to make that test pass.

This would defeat the purpose of the test, in my opinion. We're not trying to test any of ZoneInfo's behavior from having loaded the data, we're trying to test that loading data from the PyPI package works.

Anything that breaks beyond that needs to be fixed in the tzdata module itself, and so the tests should be run there.

Strong disagreement here. tzdata presents an interface and may have multiple consumers (dateutil, for example, will be a consumer). Generally speaking, you don't tests a package against the things that depend on it, you test the package's interface and anything that depends on it tests against the dependency.

We don't test that Python works with numpy, even though numpy is a huge and important consumer of our work. Maybe we should to decrease the feedback loop, but it's more important for numpy to tests against CPython than for CPython to test against numpy.

@pganssle: I suggest you to revert all CI config changes and remove test-requirements.txt for now, merge the PR without it, so we will have more time to discuss the best option to run test_zoneinfo with tzdata installed by pip.

There are other options like running pip in test.test_zoneinfo, or in test.regrtest, or in "make test" and "make buildbottest", etc. ;-)

I guess I'm fine with this. I definitely was not suggesting that this needs to do the configuration long term — migrating into the make file and presumably whatever the Windows equivalent is would be better — but I did think the default should be "let's test the whole thing" rather than "let's not test it rather than have a harmless hack in the short term.

@zooba Can we compromise on including tzdata in the coverage builds? I think it adds no additional risk since coverage is already getting installed, and getting coverage feedback is important for ensuring that the tests are working as expected (the reference implementation has, for example, 100% coverage in the Python builds).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another imported comment from @zooba:

Also as discussed, coverage is the one exception, but it's also not a core part of the PR workflow. We already switch CI providers for PRs every time Victor gets annoyed at network instability, so let's try and keep PR clean. We can leave it in the post-merge workflow if you really want.

My response is the same as to the first one. I'm OK with finding another workflow for getting the wheel, but if it's not simple I'd rather not bog this feature down with it.

@pganssle
Copy link
Member Author

Rebasing this to fix merge conflicts, and I'm going to squash down many of the fixup commits as part of that. Hopefully it doesn't hurt peoples' ability to give this an incremental review too badly.

}

static PyObject *
zoneinfo_from_file(PyTypeObject *type, PyObject *args, PyObject *kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be nice to convert these functions to argument clinic later, to get more efficient code to parse arguments and use FASTCALL calling convention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

I wanted to do this but it was too much work to do it outside of CPython in the reference implementation, and I've been trying to keep them as in sync as possible until the merge.

{NULL}, /* Sentinel */
};

static PyTypeObject PyZoneInfo_ZoneInfoType = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later, it would be interesting to use heap allocated types, rather than static types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

At the moment, this is not going to block anything from the point of view of subinterpreters, though, because we still have a lot of static allocation happening in datetime.

I did spend a bunch of time on this, and ended up giving it up as too much work for too little benefit for the initial implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely not block this PR. It doesn't have to be you who handle this issue. And you're right that datetime should be enhanced first ;-)

dst_abbr = m.group("dst")
dst_offset = None

if std_abbr:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed in the coverage job that this doesn't have branch coverage, and I think it's actually impossible for this to be False, because std_abbr is not an optional group and it is required to have at least one character in it.

@vstinner
Copy link
Member

@pganssle @ambv: I would prefer to get this PR merged ASAP and not wait the last minute (like monday evening). It's very common that when such large feature is merged, many issues pops up and block a release.

It seems like the the CI configuration changes (pip install -r Misc/test-requirements.txt) is kind of controversial. I discussed in private with Paul and he told me that not all test_zoneinfo tests are skipped if tzdata is missing. I suggest to remove Misc/test-requirements.txt and all related CI changes for now, and then propose a follow-up PR which the same changes (or a subset, it's up to you). The idea is just to unblock the PR to be able to merge ASAP.

I also expect some bad surprised with "pip install tzdata" on CIs, and so I would prefer to wait after beta1 dust fall, to have more time to investigate these issues.

Otherwise, the feature will land late, cause issues, and delay Python 3.9 beta1 release.

I'm not going to approve the PR with the current CI configuration changes, because @zooba asked for changes, but also because I'm not sure that it's a best approach (and I don't have time to investigate the issue to see what are the different options). PyPI dependencies is something new and we should take our time to agree on the best solution. https://discuss.python.org/t/optional-pypi-dependencies-in-ci-jobs/4111


I don't have the bandwidth to review such large PR. @pganssle asked for review on python-dev, Twitter and other ways. Well, there are few people available and few people who understands time zones to review properly the code.

I'm fine with merging the code as it and enhance the code later (if the CI config changes are reverted :-)). Well, basically do a "post-commit review" :-) It's just that everybody it busy with other stuff just before the feature freeze.

I would like to have one Python 3.9 release on the announced date ;-) Especially such important milestone (feature freeze)!

I would strongly prefer to not merge zoneinfo after the beta1.

@pganssle
Copy link
Member Author

Per Victor's suggestion, I plan to merge this today, so if anyone has been putting off reviewing this until the last minute, please do so now.

As mentioned earlier, I would prefer it if we can keep tzdata in the coverage job at least for now, because it has already helped me identify one issue since making this PR (unnecessary if statement), and if we have to iterate on this quickly after the beta release, I'm going to want to be able to see cross-platform coverage numbers (which I cannot generate myself, since I don't have a Windows machine).

@pganssle
Copy link
Member Author

Per a sidebar discussion with @zooba, for now we'll compromise on leaving the tzdata installation in the coverage jobs and remove the installation from the other jobs.

I will come up with a more thorough proposal for how to do those tests later, but as long as it's hit during the Python coverage job I'm more comfortable leaving it out of the main tests.

I believe this is ready to merge now, I'll give it a few hours for any last minute reviews, but anyone can feel free to make inline comments even after the merge.

This is the initial implementation of PEP 615, the zoneinfo module,
ported from the standalone reference implementation (see
https://www.python.org/dev/peps/pep-0615/#reference-implementation for a
link, which has a more detailed commit history).

This includes (hopefully) all functional elements described in the PEP,
but documentation is found in a separate PR. This includes:

1. A pure python implementation of the ZoneInfo class
2. A C accelerated implementation of the ZoneInfo class
3. Tests with 100% branch coverage for the Python code (though C code
   coverage is less than 100%).
4. A compile-time configuration option on Linux (though not on Windows)

Differences from the reference implementation:

- The module is arranged slightly differently: the accelerated module is
  `_zoneinfo` rather than `zoneinfo._czoneinfo`, which also necessitates
  some changes in the test support function. (Suggested by Victor
  Stinner and Steve Dower.)
- The tests are arranged slightly differently and do not include the
  property tests. The tests live at test/test_zoneinfo/test_zoneinfo.py
  rather than test/test_zoneinfo.py or test/test_zoneinfo/__init__.py
  because we may do some refactoring in the future that would likely
  require this separation anyway; we may:
        - include the property tests
        - automatically run all the tests against both pure Python and C,
          rather than manually constructing C and Python test classes (similar
          to the way this works with test_datetime.py, which generates C
          and Python test cases from datetimetester.py).
- This includes a compile-time configuration option on Linux (though not
  on Windows); added with much help from Thomas Wouters.
- Integration into the CPython build system is obviously different from
  building a standalone zoneinfo module wheel.
- This includes configuration to install the tzdata package as part of
  CI, though only on the coverage jobs. Introducing a PyPI dependency as
  part of the CI build was controversial, and this is seen as less of a
  major change, since the coverage jobs already depend on pip and PyPI.

Additional changes that were introduced as part of this PR, most / all of
which were backported to the reference implementation:

- Fixed reference and memory leaks

    With much debugging help from Pablo Galindo

- Added smoke tests ensuring that the C and Python modules are built

    The import machinery can be somewhat fragile, and the "seamlessly falls
    back to pure Python" nature of this module makes it so that a problem
    building the C extension or a failure to import the pure Python version
    might easily go unnoticed.

- Adjustments to zoneinfo.__dir__

    Suggested by Petr Viktorin.

- Slight refactorings as suggested by Steve Dower.

- Removed unnecessary if check on std_abbr

    Discovered this because of a missing line in branch coverage.
@pganssle pganssle added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 15, 2020
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @pganssle for commit ac56fcc 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 15, 2020
@pganssle
Copy link
Member Author

I have squashed the commits manually so that I could write a thorough commit message. If anyone wants to see the pre-squashed version for whatever reason, I've preserved it on this branch (plus the reference implementation has a much more detailed commit history anyway): https://github.com/pganssle/cpython/tree/zoneinfo_pre_squash

I have kicked off a final run of the buildbots, and unless there are any objections, I will merge it when everything is green.

@vstinner vstinner merged commit 62972d9 into python:master May 16, 2020
@vstinner
Copy link
Member

I merged your PR, thanks Paul. The PEP is great addition and long awaited feature!

We still have some months to refine the implementation and decide how to install tzdata when running tests.

It is good to have it the first beta1, so users can start to play with it.

Should the Windows installer install tzdata? Or should something explains how to install it if it is missing?

Impressive: we have 72 CI! All are green. Even Refleaks buildbots!

pganssle added a commit that referenced this pull request May 16, 2020
)

This adds the documentation for the `zoneinfo` module added in PEP 615: https://www.python.org/dev/peps/pep-0615/

The implementation itself was GH-19909: #19909

bpo-40503: https://bugs.python.org/issue40503

Co-authored-by: Victor Stinner <vstinner@python.org>
arturoescaip pushed a commit to arturoescaip/cpython that referenced this pull request May 24, 2020
…9909)

This is the initial implementation of PEP 615, the zoneinfo module,
ported from the standalone reference implementation (see
https://www.python.org/dev/peps/pep-0615/#reference-implementation for a
link, which has a more detailed commit history).

This includes (hopefully) all functional elements described in the PEP,
but documentation is found in a separate PR. This includes:

1. A pure python implementation of the ZoneInfo class
2. A C accelerated implementation of the ZoneInfo class
3. Tests with 100% branch coverage for the Python code (though C code
   coverage is less than 100%).
4. A compile-time configuration option on Linux (though not on Windows)

Differences from the reference implementation:

- The module is arranged slightly differently: the accelerated module is
  `_zoneinfo` rather than `zoneinfo._czoneinfo`, which also necessitates
  some changes in the test support function. (Suggested by Victor
  Stinner and Steve Dower.)
- The tests are arranged slightly differently and do not include the
  property tests. The tests live at test/test_zoneinfo/test_zoneinfo.py
  rather than test/test_zoneinfo.py or test/test_zoneinfo/__init__.py
  because we may do some refactoring in the future that would likely
  require this separation anyway; we may:
        - include the property tests
        - automatically run all the tests against both pure Python and C,
          rather than manually constructing C and Python test classes (similar
          to the way this works with test_datetime.py, which generates C
          and Python test cases from datetimetester.py).
- This includes a compile-time configuration option on Linux (though not
  on Windows); added with much help from Thomas Wouters.
- Integration into the CPython build system is obviously different from
  building a standalone zoneinfo module wheel.
- This includes configuration to install the tzdata package as part of
  CI, though only on the coverage jobs. Introducing a PyPI dependency as
  part of the CI build was controversial, and this is seen as less of a
  major change, since the coverage jobs already depend on pip and PyPI.

Additional changes that were introduced as part of this PR, most / all of
which were backported to the reference implementation:

- Fixed reference and memory leaks

    With much debugging help from Pablo Galindo

- Added smoke tests ensuring that the C and Python modules are built

    The import machinery can be somewhat fragile, and the "seamlessly falls
    back to pure Python" nature of this module makes it so that a problem
    building the C extension or a failure to import the pure Python version
    might easily go unnoticed.

- Adjustments to zoneinfo.__dir__

    Suggested by Petr Viktorin.

- Slight refactorings as suggested by Steve Dower.

- Removed unnecessary if check on std_abbr

    Discovered this because of a missing line in branch coverage.
arturoescaip pushed a commit to arturoescaip/cpython that referenced this pull request May 24, 2020
…nGH-20006)

This adds the documentation for the `zoneinfo` module added in PEP 615: https://www.python.org/dev/peps/pep-0615/

The implementation itself was pythonGH-19909: python#19909

bpo-40503: https://bugs.python.org/issue40503

Co-authored-by: Victor Stinner <vstinner@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants