Optimize DPO log probability calculation by retaining necessary cache, saving up to 30GB of memory (#1968) #1969


What does this PR do?
This PR addresses an issue where unnecessary cache clearing was performed during the log probability calculation in the training loop. The original code called torch.cuda.empty_cache() and self.accelerator.free_memory() to avoid out-of-memory (OOM) errors, but these calls were not necessary and may have introduced performance overhead. Removing them optimizes memory management during training.

Fixes # (issue)
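For illustration, the pattern this PR describes looks roughly like the sketch below: a no-grad loop that accumulates per-sequence log probabilities without clearing the CUDA allocator cache after every batch. This is a minimal, hypothetical sketch, not the repository's actual code; the names batch_logps and reference_logps are assumptions, though the gather-over-log_softmax computation is a common way to get DPO-style sequence log probs.

```python
import torch
import torch.nn.functional as F

def batch_logps(logits, labels, ignore_index=-100):
    # Sum of log probabilities of the label tokens for each sequence,
    # computed from model logits (a common DPO-style helper).
    mask = labels != ignore_index
    safe_labels = labels.clone()
    safe_labels[~mask] = 0  # any valid index; masked out below
    per_token = torch.gather(
        F.log_softmax(logits, dim=-1), dim=2, index=safe_labels.unsqueeze(2)
    ).squeeze(2)
    return (per_token * mask).sum(-1)

def reference_logps(model, batches):
    # Compute reference log probs over all batches in one pass.
    logps = []
    with torch.no_grad():
        for input_ids, labels in batches:
            logits = model(input_ids)
            logps.append(batch_logps(logits, labels))
            # Note: no torch.cuda.empty_cache() / self.accelerator.free_memory()
            # inside the loop -- per this PR, clearing the allocator cache every
            # batch is unnecessary here and adds synchronization overhead.
    return torch.cat(logps)
```

The allocator reuses cached blocks across iterations, so forcing a cache flush each batch only trades allocator reuse for extra device synchronization.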
Motivation and Context
This change improves the efficiency of the log probability calculation during training by eliminating redundant memory management operations. This can lead to better performance and more efficient use of GPU memory.
Before submitting
Who can review?
Anyone in the community is welcome to review this PR once the tests have passed. Feel free to tag members or contributors who may be interested in reviewing.