-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
Description
Bug report
Resolution 1 (outdated): #139482 (comment).
Resolution 2 (merged): #139482 (comment).
Bug description:
Problem
Summary
The os.environ.clear() method currently has O(N²) time complexity. This is because it inherits the generic MutableMapping.clear() implementation, which repeatedly calls popitem(). Each popitem() call invokes __iter__(), and for _Environ, __iter__() creates a full snapshot of all environment keys every time. As a result, clearing large environments is extremely slow (e.g., 23 seconds to clear 100,000 variables).
Details
os.environ is an instance of _Environ, which subclasses MutableMapping. _Environ implements the required abstract methods (__getitem__, __setitem__, __delitem__, __iter__, __len__), but inherits generic implementations for methods like clear, pop, popitem, update, and setdefault.
The inherited clear() method is implemented as follows:
class MutableMapping(Mapping):
...
def popitem(self):
try:
key = next(iter(self))
except StopIteration:
raise KeyError from None
value = self[key]
del self[key]
return key, value
def clear(self):
'D.clear() -> None. Remove all items from D.'
try:
while True:
self.popitem()
except KeyError:
pass
...For every item deleted, clear() calls iter(self) inside popitem().
This becomes a performance issue for _Environ because _Environ.__iter__() creates a snapshot of all keys:
class _Environ(MutableMapping):
...
def __iter__(self):
# list() from dict object is an atomic operation
keys = list(self._data)
for key in keys:
yield self.decodekey(key)This leads to quadratic time complexity for os.environ.clear().
Performance Impact
This problem is especially evident on large environments.
On my M3 MacBook Pro, it takes 500ms to clear an environment with only 10K variables.
A more extreme example: 100K variables take 23s to clear.
Environments with thousands of variables are rare, but do exist.
In [3]: for i in range(100000):
...: os.environ[str(i)] = str(i)
...:
In [4]: %time os.environ.clear()
CPU times: user 23.3 s, sys: 94.3 ms, total: 23.4 s
Wall time: 23.4 s
Suggested solution
Implement a custom _Environ.clear() that avoids repeated snapshots (for every key):
def clear(self):
while self._data:
for key in list(self._data):
try:
del self[key]
except KeyError:
passThis implementation makes use of the atomicity of list(dict()), which is important in multithreaded environments and is the reason we call list(self._data) inside __iter__(self).
Further improvement on Linux/FreeBSD could be achieved by using clearenv() which is part of the standard C library there.
Other considerations
More generally, we could consider challenging the existing design decisions:
- Should
__iter__make a snapshot of all environment keys?
- PR introducing this behavior (2017): bpo-30441: Fix bug when modifying os.environ while iterating over it #2409
- Bug report that motivated the change: https://bugs.python.org/issue30441
- Should
MutableMapping.clear()rely onpopitem()in a loop, assuming__iter__is cheap?
CPython versions tested on:
3.12, CPython main branch
Operating systems tested on:
macOS