New features for CodeParrot training script #16851
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks @loubnabnl, looks pretty clean already! I left a few minor comments, mainly to make the code a bit more concise.
Regarding saving the state: it's great that we can now save it, but what's currently missing is a mechanism to restart the script from a saved state. I don't think we need to do much; you can probably follow the example here:
https://github.com/huggingface/accelerate/blob/main/examples/complete_nlp_example.py
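(For readers of this thread: a minimal sketch of what such a resume mechanism could look like, loosely following the linked complete_nlp_example. The argument name `args.resume_from_checkpoint` and the `step_<N>` folder naming are assumptions for illustration, not necessarily what the PR ends up with.)

```python
# Hypothetical resume logic, modeled on accelerate's complete_nlp_example.
# Assumes the usual training-script names (accelerator, train_dataloader, args).
if args.resume_from_checkpoint is not None:
    # Restores the model, optimizer and any registered lr scheduler / RNG states.
    accelerator.load_state(args.resume_from_checkpoint)
    # Recover how many steps were already done from the folder name, e.g. "step_5000".
    resume_step = int(args.resume_from_checkpoint.split("_")[-1])
else:
    resume_step = 0

for step, batch in enumerate(train_dataloader, start=1):
    if step <= resume_step:
        continue  # skip batches that were already consumed before the checkpoint was written
    ...  # regular training step
```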
examples/research_projects/codeparrot/scripts/codeparrot_training.py (outdated comment, resolved)
elapsed_time_per_iteration = time.time() - t_start
checkpoint_factor = 4 if args.gradient_checkpointing else 3
batch_size = args.train_batch_size * accelerator.state.num_processes * args.gradient_accumulation_steps
factor = (
    24 * checkpoint_factor * batch_size * args.seq_length * config_model.n_layer * (config_model.n_embd**2)
)
flops_per_iteration = factor * (
    1.0
    + (args.seq_length / (6.0 * config_model.n_embd))
    + (tokenizer.vocab_size / (16.0 * config_model.n_layer * config_model.n_embd))
)
tflops = flops_per_iteration / (elapsed_time_per_iteration * accelerator.state.num_processes * (10**12))
Could we move that to a dedicated function, e.g. compute_tflops(elapsed_time, accelerator, args)? It would be nice if the main training loop stayed concise, to make it clearer what's going on.
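(A rough sketch of such a helper, built directly from the snippet above; it assumes `model` and `tokenizer` are visible in the enclosing scope, and the final signature in the PR may differ.)

```python
def compute_tflops(elapsed_time, accelerator, args):
    # Estimate achieved TFLOPs per process for one iteration (formula discussed below).
    config_model = accelerator.unwrap_model(model).config
    checkpoint_factor = 4 if args.gradient_checkpointing else 3
    batch_size = args.train_batch_size * accelerator.state.num_processes * args.gradient_accumulation_steps
    factor = 24 * checkpoint_factor * batch_size * args.seq_length * config_model.n_layer * (config_model.n_embd**2)
    flops_per_iteration = factor * (
        1.0
        + (args.seq_length / (6.0 * config_model.n_embd))
        + (tokenizer.vocab_size / (16.0 * config_model.n_layer * config_model.n_embd))
    )
    return flops_per_iteration / (elapsed_time * accelerator.state.num_processes * (10**12))
```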
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(args.save_dir, save_function=accelerator.save)
accelerator.save_state(args.save_dir)
What exactly does save_state save? A bunch of files? Maybe we could put them in a dedicated folder, e.g. args.save_dir + "/state/".
save_state saves a bunch of files (model, optimizer, ...). I'm now saving them in folders named after the training steps, so that we can resume training from those steps later.
And since save_state already saves the model inside the step folder, I now call save_pretrained on the unwrapped model only for the last checkpoint, so the final model lands in args.save_dir and can be loaded directly from there later.
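(A minimal sketch of the checkpointing pattern described above; `args.save_checkpoint_steps` and the `step_<N>` folder naming are assumptions for illustration.)

```python
# Periodic checkpoints: full training state (model, optimizer, etc.) in a per-step folder.
if step % args.save_checkpoint_steps == 0:
    accelerator.wait_for_everyone()
    accelerator.save_state(f"{args.save_dir}/step_{step}")

# Final checkpoint: also export the bare model at the top level of save_dir,
# so it can later be loaded directly with from_pretrained(args.save_dir).
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(args.save_dir, save_function=accelerator.save)
```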
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(args.save_dir, save_function=accelerator.save)
accelerator.save_state(args.save_dir)
Same as above
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
…add tflops function
Only two minor comments and then it is good to go! 🚀
def compute_tflops(elapsed_time, accelerator, args):
    config_model = accelerator.unwrap_model(model).config
Minor thing: can you add a link to the formula here, either the BigScience repo or the paper itself? So somebody could find out where that black magic formula actually comes from :)
Done!
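(For readers of this thread, the formula implemented in the snippet above can be written as follows; the symbols simply name the quantities used in the code.)

$$
\text{flops/iter} = 24\,c\,B\,s\,l\,h^2\left(1 + \frac{s}{6h} + \frac{V}{16\,l\,h}\right),
\qquad
\text{TFLOPs} = \frac{\text{flops/iter}}{t \cdot n_{\text{proc}} \cdot 10^{12}}
$$

where $c$ is 4 with gradient checkpointing and 3 without, $B$ is the global batch size (per-device batch size × number of processes × gradient accumulation steps), $s$ the sequence length, $l$ the number of layers (n_layer), $h$ the hidden size (n_embd), $V$ the vocabulary size, $t$ the elapsed time per iteration, and $n_{\text{proc}}$ the number of processes.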
examples/research_projects/codeparrot/scripts/codeparrot_training.py (outdated comment, resolved)
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* add tflops logging and fix grad accumulation
* add accelerate tracking and checkpointing
* scale loss of last batch correctly
* fix typo
* compress loss computation

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* add resume from checkpoint argument
* add load_state accelerate from checkpoint, register lr scheduler and add tflops function
* reformat code
* reformat code
* add condition on path for resume checkpoint
* combine if conditions

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* add source for tflops formula

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
This PR adds some features to the CodeParrot training script: TFLOPs logging, Accelerate tracking and checkpointing of the training state, and the ability to resume training from a saved checkpoint.
cc @lvwerra @LysandreJik