KEMBAR78
[pytorch] Invoke `vector.reserve()` consistently for non-inplace foreach operations by tsunghsienlee · Pull Request #161128 · pytorch/pytorch · GitHub
Skip to content

Conversation

@tsunghsienlee
Copy link
Contributor

Summary:
The reserve() method is used to pre-allocate memory for the result vector before adding elements to it. This is an optimization that makes sense for several reasons:

  1. Performance improvement: By pre-allocating memory for the exact number of elements needed, it avoids multiple reallocations and memory copies that would occur as the vector grows dynamically.

  2. Memory efficiency: It ensures that the vector allocates exactly the amount of memory needed, no more and no less, which is efficient when we know the final size in advance.

  3. Reduced overhead: Each reallocation typically involves:

  • Allocating a new, larger block of memory
  • Copying all existing elements to the new location
  • Destroying the old elements
  • Deallocating the old memory block
  • Consistent performance: Without reservation, vector growth typically follows a geometric progression (like 1, 2, 4, 8, 16...), which can lead to unpredictable performance spikes when reallocation occurs.

Test Plan:
OSS CI & tests

Rollback Plan:

Differential Revision: D80674453

…ach operations

Summary:
The `reserve()` method is used to pre-allocate memory for the result vector before adding elements to it. This is an optimization that makes sense for several reasons:

1. Performance improvement: By pre-allocating memory for the exact number of elements needed, it avoids multiple reallocations and memory copies that would occur as the vector grows dynamically.

2. Memory efficiency: It ensures that the vector allocates exactly the amount of memory needed, no more and no less, which is efficient when we know the final size in advance.

3. Reduced overhead: Each reallocation typically involves:
- Allocating a new, larger block of memory
- Copying all existing elements to the new location
- Destroying the old elements
- Deallocating the old memory block
- Consistent performance: Without reservation, vector growth typically follows a geometric progression (like 1, 2, 4, 8, 16...), which can lead to unpredictable performance spikes when reallocation occurs.

Test Plan:
OSS CI & tests

Rollback Plan:

Differential Revision: D80674453
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 21, 2025

This appears to be a diff that was exported from phabricator, but the PR author does not have sufficient permissions to run CI. @tsunghsienlee, please do step 2 of internal wiki to get write access so you do not need to get CI approvals in the future. If you think this is a mistake, please contact the Pytorch Dev Infra team.

@pytorch-bot pytorch-bot bot added the release notes: foreach_frontend release notes category label Aug 21, 2025
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161128

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6c9ba7b with merge base f987516 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D80674453

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 21, 2025
@tsunghsienlee
Copy link
Contributor Author

@pytorchmergebot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…ach operations (pytorch#161128)

Summary:
The `reserve()` method is used to pre-allocate memory for the result vector before adding elements to it. This is an optimization that makes sense for several reasons:

1. Performance improvement: By pre-allocating memory for the exact number of elements needed, it avoids multiple reallocations and memory copies that would occur as the vector grows dynamically.

2. Memory efficiency: It ensures that the vector allocates exactly the amount of memory needed, no more and no less, which is efficient when we know the final size in advance.

3. Reduced overhead: Each reallocation typically involves:
- Allocating a new, larger block of memory
- Copying all existing elements to the new location
- Destroying the old elements
- Deallocating the old memory block
- Consistent performance: Without reservation, vector growth typically follows a geometric progression (like 1, 2, 4, 8, 16...), which can lead to unpredictable performance spikes when reallocation occurs.

Test Plan:
OSS CI & tests

Rollback Plan:

Differential Revision: D80674453

Pull Request resolved: pytorch#161128
Approved by: https://github.com/Skylion007
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request fb-exported Merged release notes: foreach_frontend release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants