use scale=1.0 in floats_tensor called in speech model testers by ydshieh · Pull Request #17007 · huggingface/transformers · GitHub

Conversation

@ydshieh (Collaborator) commented Apr 29, 2022

What does this PR do?

Fix the failure of Speech2TextModelTest.test_pt_tf_model_equivalence. This is caused by

input_features = floats_tensor(
[self.batch_size, self.seq_length, self.input_feat_per_channel], self.vocab_size
)

where input_features gets a large magnitude of about 1e2 (from self.vocab_size=99).

(This probably happened because input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size) was simply copied over from the NLP model testers.)

I changed it to scale=1.0, but I need @patrickvonplaten's expertise to make sure there was no particular reason to use self.vocab_size.
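Concretely, the call keeps the same shape but makes the scale explicit (a sketch using the tester's attribute names; the actual diff lives in the speech model testers):

# after this PR: dummy features land in [0, 1) instead of [0, 99),
# which is much closer to real normalized audio features
input_features = floats_tensor(
    [self.batch_size, self.seq_length, self.input_feat_per_channel], scale=1.0
)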

Details

Current speech model testers have

def prepare_config_and_inputs(self):
    input_values = floats_tensor([self.batch_size, self.seq_length], self.vocab_size)

The self.vocab_size argument is the scale, so the generated dummy input_values have a magnitude of up to self.vocab_size.
For Speech2TextModelTester, we have vocab_size=99.
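For context, here is a simplified sketch of what floats_tensor does (the real helper lives in the test utilities and additionally handles a shared RNG and device placement); it shows why the second positional argument acts as a scale:

import random

import torch


def floats_tensor(shape, scale=1.0, rng=None, name=None):
    """Simplified sketch: a random float32 tensor with values uniform in [0, scale)."""
    rng = rng or random.Random()
    total_dims = 1
    for dim in shape:
        total_dims *= dim
    values = [rng.random() * scale for _ in range(total_dims)]
    return torch.tensor(values, dtype=torch.float).view(shape)

So floats_tensor([batch_size, seq_length], 99) produces values up to ~99, while scale=1.0 keeps them below 1.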

Furthermore, Speech2TextEncoder has

self.embed_scale = math.sqrt(embed_dim) if config.scale_embedding else 1.0

and from the tester's hidden_size=16, we get embed_scale=4.
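In numbers (assuming scale_embedding is enabled, as in the tester's config):

import math

embed_dim = 16  # the tester's hidden_size
embed_scale = math.sqrt(embed_dim)  # = 4.0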

The input_features go through the conv layer(s) and are then scaled:

inputs_embeds = self.conv(input_features)
inputs_embeds = self.embed_scale * inputs_embeds

On CPU, however, the PT and TF conv layers give a difference with a magnitude of about 1e-7 for input values around 1. With the two scalings above, this error grows to about 4e-5, and the PT/TF equivalence test fails.
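A rough back-of-the-envelope check of how the two scalings amplify that baseline difference (numbers taken from the observations above):

conv_diff_unit_inputs = 1e-7  # PT vs. TF conv output diff on CPU for inputs of magnitude ~1
input_magnitude = 99          # old scale: self.vocab_size
embed_scale = 4.0             # sqrt(hidden_size=16)

amplified_diff = conv_diff_unit_inputs * input_magnitude * embed_scale
print(f"{amplified_diff:.1e}")  # ~4.0e-5, enough to break the PT/TF equivalence check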

@HuggingFaceDocBuilderDev commented Apr 29, 2022

The documentation is not available anymore as the PR was closed or merged.


  def prepare_config_and_inputs(self):
-     input_values = floats_tensor([self.batch_size, self.seq_length], self.vocab_size)
+     input_values = floats_tensor([self.batch_size, self.seq_length], scale=1.0)
@patrickvonplaten (Contributor) commented Apr 29, 2022

wow good catch!

@patrickvonplaten (Contributor) left a comment

You're 100% right - this was indeed a bad copy paste!

@patrickvonplaten (Contributor) commented

Thanks for fixing all the tests!

@sgugger (Collaborator) left a comment

Nice fix! Thanks a lot!

@ydshieh merged commit e952e04 into huggingface:main on Apr 29, 2022
@ydshieh deleted the fix_speech_to_text_ci_failure branch on April 29, 2022 at 12:41
stevhliu pushed a commit to stevhliu/transformers that referenced this pull request on May 3, 2022:
use scale=1.0 in floats_tensor called in speech model testers (huggingface#17007)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
elusenji pushed a commit to elusenji/transformers that referenced this pull request on Jun 12, 2022:
use scale=1.0 in floats_tensor called in speech model testers (huggingface#17007)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>