View source on GitHub
Pad model input and generate corresponding input masks.
text.pad_model_inputs(
input, max_seq_length, pad_value=0
)
pad_model_inputs performs the final packaging step that text models commonly
need for their inputs. It pads (or simply truncates) the input to a fixed-size
Tensor of at most 2 dimensions and generates a mask Tensor of the same shape,
with a value of 0 where the corresponding item is a pad value and 1 where it
is part of the original input.
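The pad-or-truncate behavior and the accompanying mask can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the helper name pad_and_mask is hypothetical, it works on nested lists rather than Tensors, and it marks positions by where padding was appended (so an original item that happens to equal pad_value still gets 1 in this sketch).

```python
def pad_and_mask(rows, max_seq_length, pad_value=0):
    """Pad or truncate each row to max_seq_length and build a 0/1 mask.

    Hypothetical helper for illustration only; not part of any library.
    """
    padded, mask = [], []
    for row in rows:
        kept = list(row[:max_seq_length])    # simple truncation of long rows
        n_pad = max_seq_length - len(kept)   # number of pad slots to append
        padded.append(kept + [pad_value] * n_pad)
        mask.append([1] * len(kept) + [0] * n_pad)
    return padded, mask


data, mask = pad_and_mask(
    [[101, 1, 2, 102], [101, 3, 4, 5, 6, 102]], max_seq_length=5
)
print(data)  # [[101, 1, 2, 102, 0], [101, 3, 4, 5, 6]]
print(mask)  # [[1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
```

Every output row has exactly max_seq_length entries, and the mask has the same shape as the padded data.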
Note that a simple truncation strategy (dropping everything after the maximum
sequence length) is used to force the inputs to the specified shape. This may
produce incorrect results, so users should instead apply a Trimmer upstream
to safely truncate large inputs.
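A small pure-Python sketch shows one way simple truncation can go wrong. It rests on assumptions: ids 101 and 102 are treated as start and separator tokens (their roles are inferred from the example data, not stated by this page), and trim_then_combine is a hypothetical stand-in for an upstream trimming step, not the algorithm of any actual Trimmer class.

```python
CLS, SEP = 101, 102  # assumed roles for the ids used in the example data


def naive_truncate(tokens, max_seq_length):
    """Drop everything after max_seq_length, as simple truncation does."""
    return tokens[:max_seq_length]


def trim_then_combine(segment_a, segment_b, max_seq_length):
    """Hypothetical upstream trim: shorten segments, then add special tokens."""
    budget = max_seq_length - 3  # reserve room for CLS and two SEP tokens
    if budget < 0:
        raise ValueError("max_seq_length must be at least 3")
    a, b = list(segment_a), list(segment_b)
    while len(a) + len(b) > budget:
        # Remove one token from the end of the longer segment (ties trim b).
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    return [CLS] + a + [SEP] + b + [SEP]


seg_a, seg_b = [3, 4], [30, 40, 50, 60, 70, 80]
combined = [CLS] + seg_a + [SEP] + seg_b + [SEP]

truncated = naive_truncate(combined, 9)
trimmed = trim_then_combine(seg_a, seg_b, 9)
print(truncated)  # [101, 3, 4, 102, 30, 40, 50, 60, 70]
print(trimmed)    # [101, 3, 4, 102, 30, 40, 50, 60, 102]
```

Both results fit in 9 positions, but only the trimmed version still ends with the separator token; simple truncation silently drops it.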
input_data = tf.ragged.constant([
    [101, 1, 2, 102, 10, 20, 102],
    [101, 3, 4, 102, 30, 40, 50, 60, 70, 80],
    [101, 5, 6, 7, 8, 9, 102, 70],
], np.int32)
data, mask = pad_model_inputs(input=input_data, max_seq_length=9)
print("data: %s, mask: %s" % (data, mask))

data: tf.Tensor(
[[101 1 2 102 10 20 102 0 0]
 [101 3 4 102 30 40 50 60 70]
 [101 5 6 7 8 9 102 70 0]], shape=(3, 9), dtype=int32),
mask: tf.Tensor(
[[1 1 1 1 1 1 1 0 0]
 [1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 0]], shape=(3, 9), dtype=int32)