Add self training code for text classification #16738
Merged
This PR adds an implementation of the self-training algorithm (without task augmentation) for classification tasks, as proposed in the EMNLP 2021 paper "STraTA: Self-Training with Task Augmentation for Better Few-shot Learning". For the original codebase, see https://github.com/google-research/google-research/tree/master/STraTA. The code can also be used as a tool for automatic data labeling.
The pull request includes a README.md file with detailed instructions on how to set up a virtual environment and install the necessary packages. It also includes a demo script, run.sh, that shows how to perform self-training with a BERT Base model on the SciTail science entailment dataset using 8 labeled examples per class.
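
For readers new to the technique, here is a minimal sketch of a generic self-training loop. It is not the code added in this PR. The toy nearest-centroid classifier, the function names (`fit_centroids`, `predict`, `self_train`), the distance-based confidence score, and the 0.7 threshold are all assumptions chosen so that the example runs with only the Python standard library. In the PR, the model would be a fine-tuned transformer such as BERT Base rather than class centroids.

```python
from collections import defaultdict


def fit_centroids(examples):
    """Train a toy classifier: one mean feature vector per class (hypothetical stand-in for fine-tuning)."""
    sums, counts = {}, defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids, features):
    """Return (label, confidence). Assumes at least two classes."""
    dists = {
        label: sum((x - c) ** 2 for x, c in zip(features, centroid)) ** 0.5
        for label, centroid in centroids.items()
    }
    best = min(dists, key=dists.get)
    total = sum(dists.values())
    # Illustrative confidence score: close to 1 when the nearest centroid is much closer than the rest.
    confidence = 1.0 if total == 0 else 1.0 - dists[best] / total
    return best, confidence


def self_train(labeled, unlabeled, iterations=3, threshold=0.7):
    """Repeatedly pseudo-label the unlabeled pool and retrain from scratch on labeled + pseudo-labeled data."""
    model = fit_centroids(labeled)  # initial teacher, trained on the few labeled examples only
    for _ in range(iterations):
        pseudo = []
        for features in unlabeled:
            label, confidence = predict(model, features)
            if confidence >= threshold:  # keep only confident pseudo-labels
                pseudo.append((features, label))
        model = fit_centroids(labeled + pseudo)  # the student becomes the next teacher
    return model


# Toy data: one labeled example per class plus a small unlabeled pool.
labeled = [([0.0, 0.0], "neutral"), ([1.0, 1.0], "entails")]
unlabeled = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.0], [1.1, 0.9]]
model = self_train(labeled, unlabeled)
print(predict(model, [0.05, 0.1])[0])
```

The pseudo-labels produced in the final iteration are what makes a loop like this usable for automatic data labeling: the trained model assigns a label and a confidence to each unlabeled example.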