Update self-push workflow by ydshieh · Pull Request #17177 · huggingface/transformers · GitHub
Conversation

@ydshieh
Collaborator

@ydshieh ydshieh commented May 11, 2022

What does this PR do?

Update self-push CI workflow file:

  • tests_fetcher.py now outputs a JSON file containing a dictionary that maps test categories to the identified test files (this file is used by the updated push CI below)
  • Tests are reorganized into models (e.g. models/bert, models/gpt2, etc.) and modeling categories (pipeline, tokenization), the same as in scheduled CI
  • notification_service.py and self-scheduled.yml now use [single/multi]-gpu as artifact name prefixes (i.e. no more -docker at the end): with this minimal change, notification_service.py can be reused
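
As a rough illustration of the first two points, the sketch below groups test file paths into category keys and writes the result as JSON. It is only a minimal sketch, not the code in tests_fetcher.py: the function names, the "common" bucket, and the exact key format are assumptions made for this example.

```python
import json
from collections import defaultdict
from pathlib import PurePosixPath


def categorize_test_files(test_files):
    """Map each test file to a category key such as "models/bert" or "pipelines".

    Hypothetical helper: the key format is an assumption for illustration.
    """
    categories = defaultdict(list)
    for path in test_files:
        parts = PurePosixPath(path).parts
        if len(parts) < 3 or parts[0] != "tests":
            # Files that sit directly under tests/ (assumed bucket name).
            key = "common"
        elif parts[1] == "models" and len(parts) >= 4:
            # e.g. tests/models/bert/test_modeling_bert.py -> "models/bert"
            key = f"models/{parts[2]}"
        else:
            # e.g. tests/pipelines/test_pipelines_common.py -> "pipelines"
            key = parts[1]
        categories[key].append(path)
    return {key: sorted(files) for key, files in sorted(categories.items())}


def dump_test_map(test_files, json_path):
    """Write the category -> test files dictionary to a JSON file."""
    test_map = categorize_test_files(test_files)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(test_map, f, indent=2)
    return test_map
```

A workflow could then read the JSON keys to build its job matrix, so that each category runs as a separate job.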

Some workflow runs:

Some tests were made to fail intentionally (to verify their reports). The reports can be found in the transformers-ci-feedback-tests channel.

TODO:

  • create new report channel and add the channel ID to the workflow file

I added some review comments that contain my questions.

@sgugger Maybe you could have a look at the changes in tests_fetcher.py?
@stas00 Maybe you could have a look at the changes regarding DeepSpeed and multi-gpu configurations?

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 11, 2022

The documentation is not available anymore as the PR was closed or merged.

@ydshieh ydshieh requested review from LysandreJik, sgugger and stas00 May 11, 2022 12:50
Contributor

@stas00 stas00 left a comment

I find it difficult to make sense of some of the changes, due to diff being hard to follow, so I'm totally fine with you going ahead with the deepspeed side of things and sort it out if something breaks afterwards. I test on my machine as well, so if it's broken for a day or two on CI it's no big deal.

@ydshieh
Collaborator Author

ydshieh commented May 11, 2022

I find it difficult to make sense of some of the changes, due to diff being hard to follow

OK @stas00, the big diff is probably because I removed unused blocks. There is no real big change: I just copied from the scheduled CI.
See below if you would like to have a quick look.

My only questions are

  • why did we previously use options: --gpus 0 in run_tests_torch_cuda_extensions_multi_gpu?
  • could we use options: --gpus all for the single-gpu case as well as the multi-gpu case?

prev.

image: nvcr.io/nvidia/pytorch:21.03-py3
options: --gpus 0

now.

huggingface/transformers-pytorch-deepspeed-latest-gpu
options: --gpus all

and

prev.

      - name: Install dependencies
        run: |
          apt -y update && apt install -y libaio-dev
          pip install --upgrade pip
          pip install .[deepspeed-testing]

now

      - name: Re-compile DeepSpeed
        working-directory: /workspace
        run: |
          pip install deepspeed # installs the deps correctly
          rm -rf DeepSpeed
          git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build
          DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 python3 -m pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check

@stas00
Contributor

stas00 commented May 11, 2022

thank you for highlighting the changes @ydshieh - that's super helpful.

  1. So for gpus:
  • multi-gpu should be --gpus all (needs 2 gpus)
  • single-gpu should be --gpus 0 (must have only one gpu)

so looking at the diff the original seems to be correct. perhaps not everywhere?

  2. For dependencies the original is correct.

Have a look at what it signifies:

extras["deepspeed-testing"] = extras["deepspeed"] + extras["testing"] + extras["optuna"]

so the change skips installing important dependencies.
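
To make the point concrete, here is a small sketch of how such a combined extra expands. The requirement lists are placeholders invented for this example (the real lists live in setup.py), and `missing_requirements` is a hypothetical helper.

```python
# Placeholder requirement lists; the real ones are defined in setup.py.
extras = {
    "deepspeed": ["deepspeed"],
    "testing": ["pytest", "pytest-xdist"],
    "optuna": ["optuna"],
}

# Same composition as the line quoted above: the combined extra is the
# concatenation of the three lists.
extras["deepspeed-testing"] = extras["deepspeed"] + extras["testing"] + extras["optuna"]


def missing_requirements(installed, extra_name):
    """Return the requirements of `extra_name` that are not in `installed`."""
    return [req for req in extras[extra_name] if req not in installed]
```

With these placeholder lists, an environment where only deepspeed was installed would still lack every entry that comes from the testing and optuna extras.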

and the new instructions aren't correct.

We want the bleeding-edge install (your "now" version) only for the nightly build. self-push should use the released deepspeed version, that pip install .[deepspeed-testing] takes care of (but which of course can be moved into the docker image if the job runs via that image). If it's already there, then there is no need for that last pip call either.

Bottom line - no change from the original in either case logically.

If I missed something please let me know.

@ydshieh
Collaborator Author

ydshieh commented May 12, 2022

so looking at the diff the original seems to be correct. perhaps not everywhere?

The current main branch has a job run_tests_torch_cuda_extensions_multi_gpu in self-push.yml which has --gpus 0.
In the latest commit in this PR, I reverted to the original version regarding DeepSpeed parts, but set --gpus all for multi-gpu job.

Remark 1: some places in self-scheduled.yml have to be fixed.

Remark 2: I checked the expose-gpus-for-use doc, and I think we can still use --gpus all even if the host machine has only 1 GPU. --gpus 0 is necessary only if the host has multiple GPUs but we want to use only 1 of them.

  2. For dependencies the original is correct.
    self-push should use the released deepspeed version, that pip install .[deepspeed-testing]

I will change back to the original version for this part (Done), thank you.

@ydshieh ydshieh marked this pull request as draft May 12, 2022 09:31
@ydshieh ydshieh marked this pull request as ready for review May 12, 2022 10:46
@stas00
Contributor

stas00 commented May 12, 2022

As long as the tests are run with CUDA_VISIBLE_DEVICES=0 for run_tests_single_gpu jobs it indeed doesn't matter if more than 1 gpu is available.

But it's critical we ensure that it is set correctly, otherwise tests requiring a single gpu will get skipped.

Thank you for fixing the places where the settings are incorrect, @ydshieh!
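
A minimal sketch of the kind of sanity check this implies is shown below. It only inspects the CUDA_VISIBLE_DEVICES environment variable and does not query the driver; the function names and the "single-gpu"/"multi-gpu" labels are assumptions made for this example, not part of the workflow.

```python
import os


def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device ids.

    Returns None when the variable is unset, in which case CUDA does not
    restrict visibility and all GPUs exposed to the container are usable.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def check_job_gpus(job_kind, env=None):
    """Return True if the environment matches what the job kind expects."""
    ids = visible_gpu_ids(env)
    if job_kind == "single-gpu":
        # Must be pinned to exactly one device, e.g. CUDA_VISIBLE_DEVICES=0.
        return ids is not None and len(ids) == 1
    if job_kind == "multi-gpu":
        # Either unrestricted, or at least two devices listed explicitly.
        return ids is None or len(ids) >= 2
    raise ValueError(f"unknown job kind: {job_kind!r}")
```

Running such a check at the start of a job would make a wrong setting fail loudly instead of silently skipping the tests that require a single GPU.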

Member

@LysandreJik LysandreJik left a comment

This looks great, thanks for working on it @ydshieh!

The notification_service_deprecated.py can likely be removed now.

Before merging it, could you do a test run when modifying the setup.py to ensure that all tests are run correctly? Thank you!

Comment on lines -499 to +324
python utils/notification_service_deprecated.py push
python utils/notification_service.py "${{ needs.setup.outputs.matrix }}"
Member

Love that!

CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
CI_EVENT: scheduled
Member

Smart!

@ydshieh
Collaborator Author

ydshieh commented May 13, 2022

Before merging it, could you do a test run when modifying the setup.py to ensure that all tests are run correctly? Thank you!

I had to fix a bug that occurred when the test list is just tests, i.e. when setup.py is changed.
A full test workflow run is here.
After looking at some of the failures, I am convinced that this PR is ready to be merged (the failures are the same as in the scheduled CI runs).

Thank you for the reviews!
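
For readers wondering what the setup.py case involves, the sketch below shows one way a bare tests target could be expanded into per-folder entries so that a job matrix can still be built. This is a hypothetical illustration and not the fix made in this PR: `expand_test_targets` and its arguments are invented for the example.

```python
def expand_test_targets(targets, all_test_dirs):
    """Expand the catch-all ["tests"] target into individual test folders.

    `all_test_dirs` is the list of known test folders, supplied by the
    caller (hypothetical; a real script would discover it from the repo).
    """
    if list(targets) == ["tests"]:
        # Everything must run: return one entry per known folder.
        return sorted(all_test_dirs)
    # Otherwise keep the identified targets as they are.
    return list(targets)
```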

@ydshieh ydshieh merged commit 38043d8 into main May 13, 2022
@ydshieh ydshieh deleted the update_push_ci branch May 13, 2022 14:28
Narsil pushed a commit to Narsil/transformers that referenced this pull request May 30, 2022
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
* update push ci

* install git-python

* update comment

* update deepspeed jobs

* fix report

* skip 2 more tests that require fairscale

* Fix changes in test_fetcher.py (to deal with `setup.py` is changed)

* set RUN_PT_TF_CROSS_TESTS=1 and final clean-up

* remove SIGOPT_API_TOKEN

* remove echo "$matrix_folders"

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>