Add inputs vector to calculate metric method #16461
Conversation
The documentation is not available anymore as the PR was closed or merged.

Unfortunately, we can't change the …
Ok, spent a bit of time on this. To enable it, the `EvalPrediction` class would need to become something like:

    from typing import Optional, Tuple, Union

    import numpy as np


    class EvalPrediction:
        """
        Evaluation output (always contains labels), to be used to compute metrics.

        Parameters:
            predictions (`np.ndarray`): Predictions of the model.
            label_ids (`np.ndarray`): Targets to be matched.
            inputs (`np.ndarray`, *optional*): Inputs of the model.
        """

        def __init__(
            self,
            predictions: Union[np.ndarray, Tuple[np.ndarray]],
            label_ids: Union[np.ndarray, Tuple[np.ndarray]],
            inputs: Optional[Union[np.ndarray, Tuple[np.ndarray]]] = None,
        ):
            self.predictions = predictions
            self.label_ids = label_ids
            self.inputs = inputs

        def __iter__(self):
            if self.inputs is not None:
                return iter((self.predictions, self.label_ids, self.inputs))
            else:
                return iter((self.predictions, self.label_ids))

        def __getitem__(self, idx):
            if idx < 0 or idx > 2:
                raise IndexError("tuple index out of range")
            if idx == 2 and self.inputs is None:
                raise IndexError("tuple index out of range")
            if idx == 0:
                return self.predictions
            elif idx == 1:
                return self.label_ids
            elif idx == 2:
                return self.inputs

Then we should add a flag so the user can choose whether or not they want the inputs included for metrics (defaulting to `False`). Can you include them in your PR?
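For context, a rough sketch of the kind of `compute_metrics` function that would consume these inputs (hypothetical: the `tokenizer` variable and the `copy_rate` metric are placeholders standing in for an input-aware metric such as SARI):

    import numpy as np

    def compute_metrics(eval_pred):
        # eval_pred.inputs is only populated when the feature is enabled
        predictions, inputs = eval_pred.predictions, eval_pred.inputs

        # hypothetical decoding step -- `tokenizer` is assumed to be defined elsewhere
        decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
        decoded_inputs = tokenizer.batch_decode(inputs, skip_special_tokens=True)

        # placeholder metric: fraction of predictions that simply copy their input
        copy_rate = np.mean([p.strip() == s.strip() for p, s in zip(decoded_preds, decoded_inputs)])
        return {"copy_rate": float(copy_rate)}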
Hi @sgugger, thanks for adding this change. I've tested this code with the summarization examples from PyTorch and I'm afraid this change will break other people's code. The problem is that, since the inputs are passed in the trainer by default (based on my PR, trainer.py), they also reach `compute_metrics` in run_summarization.py. You can see in the debugger that all 3 (inputs, preds and labels) are coming in.

In my case, I have my own implementation of `compute_metrics`; however, for other users it will fail by default. When you suggest the flag, where should it be located? In the `Trainer`?

This is my testing case: test.json

Thanks, Laura
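To make the concern concrete, a minimal sketch (using the `EvalPrediction` class sketched above and dummy arrays) of how an existing two-element unpacking would break if the inputs were always passed:

    import numpy as np

    preds = np.zeros((2, 5))
    labels = np.zeros((2, 5))
    input_ids = np.zeros((2, 8))

    eval_pred = EvalPrediction(predictions=preds, label_ids=labels, inputs=input_ids)

    # typical pattern in existing compute_metrics implementations,
    # e.g. in the summarization example script
    preds, labels = eval_pred  # ValueError: too many values to unpack (expected 2)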
As I said above, this needs to be controlled by a flag (for instance `include_inputs_for_metrics`).
Thanks for the clarification, that sounds better. I've submitted an additional commit in the PR. It's the first time I update a PR, so let me know if I've missed something.
Thanks for adding the flag. It should be in the TrainingArguments class however, not the Trainer class itself. Also, the inputs should only be gathered when the flag is set to True, otherwise users that do not need it might get OOM errors.
src/transformers/trainer.py (Outdated)

    A function that preprocess the logits right before caching them at each evaluation step. Must take two
    tensors, the logits and the labels, and return the logits once processed as desired. The modifications made
    by this function will be reflected in the predictions received by `compute_metrics`.
    include_inputs_for_metrics: bool = None:
Please follow the same format as for other arguments :-)
Perhaps like this?

    include_inputs_for_metrics (bool, optional):
src/transformers/trainer.py (Outdated)

    tensors, the logits and the labels, and return the logits once processed as desired. The modifications made
    by this function will be reflected in the predictions received by `compute_metrics`.
    include_inputs_for_metrics: bool = None:
        A flag (True|False) determining if inputs will be passed to the EvalPrediction class. This is intended for
Suggested change:

    - A flag (True|False) determining if inputs will be passed to the EvalPrediction class. This is intended for
    + Whether or not the inputs will be passed to the `compute_metrics` function. This is intended for
src/transformers/trainer.py (Outdated)

    callbacks: Optional[List[TrainerCallback]] = None,
    optimizers: Tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR] = (None, None),
    preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] = None,
    include_inputs_for_metrics: bool = None
This flag should be added in the TrainingArguments class, not here.
I thought about the OOM issue, but should I add that check everywhere in the code? I can't think of a more elegant way :) And for the new changes, is it ok to just push another commit to the pull request like I did the last time? Just checking, so I don't make a mess :P
src/transformers/trainer.py (Outdated)

    # Prediction step
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
    inputs_decode = inputs['input_ids']
Set None here if the flag is False
You should make sure that the inputs are left as None, like the labels/losses, by only setting them when the flag is True. I've left a comment to show you where.
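The pattern being asked for here, written out as a small standalone helper for illustration (hypothetical, not part of the Trainer API; in the actual diff it is an inline conditional in the evaluation loop):

    from typing import Optional

    import torch

    def select_inputs_for_metrics(inputs: dict, include_inputs_for_metrics: bool) -> Optional[torch.Tensor]:
        # Return the input ids only when the user opted in; otherwise keep None,
        # so the inputs are never gathered or accumulated across the evaluation
        # loop and cannot cause OOM in the default case.
        if not include_inputs_for_metrics:
            return None
        return inputs["input_ids"]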
That's completely ok, we will squash everything when merging.
Changes done, let me know if there's any further feedback :)
Thanks! Left some last comments and we should be good to merge.
Also remember to run `make style` on your branch once you're done to apply the code-formatting tools :-)
src/transformers/trainer.py (Outdated)

    if args.include_inputs_for_metrics:
        args.include_inputs_for_metrics = True
    else:
        args.include_inputs_for_metrics = False
The TrainingArguments should not be touched by the Trainer, I don't think this is necessary in any case.
Suggested change (remove these lines):

    - if args.include_inputs_for_metrics:
    -     args.include_inputs_for_metrics = True
    - else:
    -     args.include_inputs_for_metrics = False
src/transformers/trainer.py (Outdated)

    preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] = None
We'll need the comma back and no new line here.
src/transformers/trainer.py (Outdated)

    if args.include_inputs_for_metrics:
        metrics = self.compute_metrics(EvalPrediction(inputs=all_inputs, predictions=all_preds,
                                                      label_ids=all_labels))
Suggested change:

    - if args.include_inputs_for_metrics:
    -     metrics = self.compute_metrics(EvalPrediction(inputs=all_inputs, predictions=all_preds,
    -                                                    label_ids=all_labels))
    + if args.include_inputs_for_metrics:
    +     metrics = self.compute_metrics(EvalPrediction(predictions=all_preds,
    +                                                    label_ids=all_labels, inputs=all_inputs))
Needs to be last I think.
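Putting the suggestion together with the default path, the resulting logic in the evaluation loop would look roughly like this (a sketch wrapped in a helper for readability; `all_preds`, `all_labels` and `all_inputs` stand for the arrays accumulated during the loop):

    def run_compute_metrics(compute_metrics, args, all_preds, all_labels, all_inputs):
        # Pass the inputs only when the user opted in; `inputs` is the last
        # keyword so the (predictions, label_ids) order is preserved.
        if args.include_inputs_for_metrics:
            return compute_metrics(
                EvalPrediction(predictions=all_preds, label_ids=all_labels, inputs=all_inputs)
            )
        return compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))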
src/transformers/training_args.py (Outdated)

    `huggingface-cli login`.
    gradient_checkpointing (`bool`, *optional*, defaults to `False`):
        If True, use gradient checkpointing to save memory at the expense of slower backward pass.
    include_inputs_for_metrics (bool, optional):
Suggested change:

    - include_inputs_for_metrics (bool, optional):
    + include_inputs_for_metrics (`bool`, *optional*, defaults to `False`):
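For reference, this is how such a flag is typically declared as a dataclass field in `TrainingArguments` (a sketch; the class name below is a stand-in, and the help text is taken from the suggested docstring above):

    from dataclasses import dataclass, field

    @dataclass
    class MyTrainingArguments:  # stand-in for transformers.TrainingArguments
        include_inputs_for_metrics: bool = field(
            default=False,
            metadata={"help": "Whether or not the inputs will be passed to the `compute_metrics` function."},
        )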
I've added the requested changes. Also, I've run `make style`.
Thanks a lot for all your work on this! It's all good to me apart from the changes at the start of the trainer file.
src/transformers/trainer.py (Outdated)

    import numpy as np
    import torch
    from huggingface_hub import Repository
    from packaging import version
    from torch import nn
    from torch.utils.data import DataLoader, Dataset, RandomSampler, SequentialSampler
    from torch.utils.data.distributed import DistributedSampler
    from tqdm.auto import tqdm

    from . import __version__
    from .configuration_utils import PretrainedConfig
    from .data.data_collator import DataCollator, DataCollatorWithPadding, default_data_collator
    from .debug_utils import DebugOption, DebugUnderflowOverflow
    from .deepspeed import deepspeed_init, deepspeed_reinit, is_deepspeed_zero3_enabled
    from .dependency_versions_check import dep_version_check
This shouldn't be changed by `make style`. Are you sure you have the proper version of the formatting tools installed (`pip install .[quality]` in the repo)?
Yes, I'm sorry. I tried to install some of the dependencies manually, but parts of it were still failing. I've reverted the changes to bring the file back to its original state, also applying your comments about the empty lines.
    from .trainer_pt_utils import smp_forward_backward, smp_forward_only, smp_gather, smp_nested_concat
No need to remove this line.
    logger = logging.get_logger(__name__)
Same here
    # if eval is called w/o train init deepspeed here
    if args.deepspeed and not self.deepspeed:
And same here
Thanks! There is one last issue with a badly formatted docstring. Could you run `make style` on your branch again?
Sure, no problem. I'm having issues with the dependencies to run `make style`. This is my log:
It looks like your branch is not up to date with the main branch for the setup. Can you manually install …
Now it's working :) Thanks for that one. I've committed the updated file.
Yes, it's all good now. Thanks again for your contribution!
Awesome! :) Thanks for leading this effort, good job :D

What does this PR do?
This PR is a suggestion for including the inputs in the EvalPrediction object, so that metrics that depend on the inputs can be computed. For example, simplification metrics such as SARI use not only the predictions and references but also the inputs for the score calculation.
The proposed implementation enables the Trainer to work with such metrics. However, the compute_metrics method should be implemented locally (in the metrics file, for example), since the original method still only receives predictions and references.
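Assuming the flag ends up in `TrainingArguments` as discussed in the review above, opting in from user code would look roughly like this (model, datasets and `compute_metrics` are assumed to be defined elsewhere):

    from transformers import Trainer, TrainingArguments

    training_args = TrainingArguments(
        output_dir="out",
        include_inputs_for_metrics=True,  # opt in to receiving inputs in compute_metrics
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,  # receives an EvalPrediction with .inputs populated
    )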
Supports #15966
Who can review?
@sgugger