fix: update BlobReadSession ScatteringByteChannel projection to use less CPU #3324

BenWhitehead · 2025-10-02T23:40:40Z

Switch to using a graceful poll of the queue rather than a spin loop poll.

Elements are added to the queue asynchronously by a background gRPC thread. Rather than returning immediately when the queue is empty, the read now waits up to 10 microseconds for an element. The duration may not be optimal, but testing shows good results: StreamingRead.read accounted for ~51% of CPU time before and ~13% after, for roughly the same share of wall time (1.59% before, 1.52% after). This also has the added benefit of reduced perceived latency for an application.
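To illustrate the difference between the two polling styles, here is a minimal sketch using a plain `java.util.concurrent.BlockingQueue`. It is not the actual `StreamingRead` code: the class `TimedPollSketch`, its `offer` and `read` methods, and the `leftovers` field are hypothetical names chosen for this example. Only the 10 microsecond wait comes from the description above, and the leftovers handling loosely mirrors the title of the follow-up commit on this PR.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and structure are assumptions, not library code.
final class TimedPollSketch {
  // Wait budget taken from the PR description.
  private static final long POLL_TIMEOUT_MICROS = 10;

  private final BlockingQueue<ByteBuffer> queue = new LinkedBlockingQueue<>();
  // Unread remainder of the most recently dequeued chunk, if any.
  private ByteBuffer leftovers;

  // Stands in for the background thread that delivers response chunks.
  void offer(byte[] bytes) {
    queue.add(ByteBuffer.wrap(bytes));
  }

  // Copies up to dst.remaining() bytes into dst and returns the count.
  // Returns 0 if no data arrived within the wait budget.
  int read(ByteBuffer dst) {
    // Finish the current chunk before taking the next one from the queue.
    if (leftovers == null || !leftovers.hasRemaining()) {
      try {
        // A bare queue.poll() would return null at once on an empty queue,
        // so a caller looping on read() would spin. The timed poll parks the
        // thread and wakes early as soon as an element is enqueued.
        leftovers = queue.poll(POLL_TIMEOUT_MICROS, TimeUnit.MICROSECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return 0;
      }
      if (leftovers == null) {
        return 0; // timed out, nothing to copy
      }
    }
    int n = Math.min(leftovers.remaining(), dst.remaining());
    // Copy exactly n bytes through a bounded view of the chunk.
    ByteBuffer view = leftovers.duplicate();
    view.limit(view.position() + n);
    dst.put(view);
    leftovers.position(leftovers.position() + n);
    return n;
  }

  public static void main(String[] args) {
    TimedPollSketch session = new TimedPollSketch();
    session.offer(new byte[] {1, 2, 3, 4, 5, 6});
    ByteBuffer dst = ByteBuffer.allocate(4);
    int first = session.read(dst);
    dst.clear();
    int second = session.read(dst);
    System.out.println(first + " then " + second);
  }
}
```

The timed `poll` trades a bounded wait per empty read for far fewer wasted wakeups, which is consistent with the drop in CPU share reported above. The right timeout for a real workload would need to be measured.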

profiled workload results

Download a 128MiB object from offset 0 to EOF 1000 times while async-profiler is profiling with event=cpu, using StorageDataClient.fastOpenReadSession to begin the read while creating the session.

All values below are latency in milliseconds

|        |   p50 |   p90 |   p95 |
|--------|------:|------:|------:|
| before | ~ 266 | ~ 408 | ~ 468 |
| after  | ~ 253 | ~ 384 | ~ 431 |
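The description does not say how the percentiles were derived. As one possible approach, the sketch below computes nearest-rank percentiles from a set of per-download latencies. The class `LatencyPercentiles` and its `percentile` method are hypothetical names for this example, and nearest-rank is an assumed method, not necessarily the one used for the table.

```java
import java.util.Arrays;

// Hypothetical helper; the percentile method used for the PR is not stated.
final class LatencyPercentiles {
  // Returns the p-th percentile (1..100) of the samples using the
  // nearest-rank method: the value at rank ceil(p / 100 * n) in sorted order.
  static long percentile(long[] samplesMillis, int p) {
    if (samplesMillis.length == 0 || p < 1 || p > 100) {
      throw new IllegalArgumentException("need samples and 1 <= p <= 100");
    }
    long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
    Arrays.sort(sorted);
    // Integer form of ceil(p * n / 100), which avoids floating point rounding.
    int rank = (p * sorted.length + 99) / 100;
    return sorted[rank - 1];
  }

  public static void main(String[] args) {
    // Made-up sample values for demonstration only, not measured data.
    long[] samples = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
    System.out.println("p50=" + percentile(samples, 50));
    System.out.println("p90=" + percentile(samples, 90));
    System.out.println("p95=" + percentile(samples, 95));
  }
}
```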

Remove the no-longer-necessary Buffers.totalRemaining call.

async-profiler flame graphs

[Flame graph images: cpu and wall, before and after]


BenWhitehead requested a review from a team as a code owner October 2, 2025 23:40

product-auto-label bot added the size: m and api: storage labels Oct 2, 2025

bajajneha27 previously approved these changes Oct 3, 2025


chore: ensure leftovers is fully consumed before getting the next element from the queue (970242e)

BenWhitehead dismissed bajajneha27’s stale review via 970242e October 3, 2025 18:49

bajajneha27 approved these changes Oct 6, 2025


BenWhitehead merged commit 678fecc into main Oct 6, 2025
25 checks passed

BenWhitehead deleted the fix/brs/less-cpu branch October 6, 2025 17:41

release-please bot mentioned this pull request Oct 6, 2025

chore(main): release 2.58.1 #3318

Merged
