-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Description
What happened?
We have identified a memory leak that affects Beam Python SDK versions 2.47.0 and above. The leak was triggered by an upgrade to protobuf==4.x.x. We rootcaused this leak to protocolbuffers/protobuf#14571 and it has been remediated in Beam 2.52.0.
[update: 2023-12-19]: Due to another issue related to protobuf upgrade, Python streaming users should continue to apply the mitigation steps below with Beam 2.52.0 or switch to Beam 2.53.0 once available.
Mitigation
Until Beam 2.52.0 is released, consider any of the following workarounds:
-
Use
apache-beam==2.46.0or below. -
Install protobuf 3.x in the submission and runtime environment. For example, you can use a
--requirements_filepipeline option with a file that includes:protobuf==3.20.3 grpcio-status==1.48.2For more information, see: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
-
Use a
pythonimplementation of protobuf by setting aPROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=pythonenvironment variable in the runtime environment. This might degrade the performance since python implementation is less efficient. For example, you could create a custom Beam SDK container from aDockerfilethat looks like the following:FROM apache/beam_python3.10_sdk:2.47.0 ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=pythonFor more information, see: https://beam.apache.org/documentation/runtime/environments/
-
Install protobuf==4.25.0 or newer in the submission and runtime environment.
Users of Beam 2.50.0 SDK should additionally follow mitigation options for #28318.
Additional details
The leak can be reproduced by a pipeline:
with beam.Pipeline(options=pipeline_options) as p:
# duplicate reads to increase throughput
inputs = []
for i in range(32):
inputs.append(
p | f"Read pubsub{i}" >> ReadFromPubSub(topic='projects/pubsub-public-data/topics/taxirides-realtime', with_attributes=True)
)
inputs | beam.Flatten()
Dataflow pipeline options for the above pipeline: --max_num_workers=1 --autoscaling_algorithm=NONE --worker_machine_type=n2-standard-32
The leak was triggered by Beam switching default protobuf package version from 3.19.x to 4.22.x in #24599. The new versions of protobuf also switched the default protobuf implemetation to a upb implementation. The upb implementation had two known leaks that have since been mitigated by protobuf team in: protocolbuffers/protobuf#10088, https://github.com/protocolbuffers/upb/issues/1243 . The latest available protobuf==4.24.4 does not yet have the fix, but we have confirmed that using a patched version built in https://github.com/protocolbuffers/upb/actions/runs/6028136812 fixes the leak.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner