fix eval when using subset host loading data #951
Force-pushed: 3c96a3b to 781eb0e
LGTM!
Hi, @aireenmei Awesome and thank you for the fix! Do we expect to unblock
@ZhiyuLi-goog It should support eval_per_device_batch_size < 1. But let me actually run the tests to make sure. Will get back to you.
Thank you @aireenmei. Awesome.
LGTM!
LGTM. Thank you @aireenmei
  TRAIN_CMD="python3 MaxText/train.py MaxText/configs/base.yml \
  steps=$STEPS eval_steps=$EVAL_STEPS eval_interval=$STEPS \
- per_device_batch_size=8.0 learning_rate=3e-4 enable_checkpointing=false \
+ per_device_batch_size=1.0 learning_rate=3e-4 enable_checkpointing=false \
Do we want to change the per-device batch size here?
Thank you for catching it! This is for my testing only and shouldn't be checked in. I removed it.
Force-pushed: e400f7b to 45fc74d
add process_indices_eval
Force-pushed: 45fc74d to 28741d6
@ZhiyuLi-goog I realized _tfds_data_processing_c4_mlperf.py also needs an update. Please review my latest push.
b/371572923
Tested on v4-128: https://cloudlogging.app.goo.gl/YqjMDsc27SxXHSLaA