Concatenate input segments for a model's input sequence.
text.concatenate_segments(
    segments
)
concatenate_segments combines the tokens of one or more input segments into a
single sequence of token values and generates matching segment ids.
concatenate_segments can follow a Trimmer, which limits segment lengths and
emits RaggedTensor outputs, and can be followed by ModelInputPacker.
concatenate_segments first flattens and combines a list of one or more
segments (RaggedTensors of n dimensions) along the 1st axis, then packages
any special tokens into a final n-dimensional RaggedTensor.
Finally, concatenate_segments generates another RaggedTensor (with the
same rank as the final combined RaggedTensor) that contains a distinct int
id for each segment.
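The row-wise behavior described above can be illustrated with a minimal pure-Python sketch. This is not the library implementation: the helper name concatenate_segments_sketch is hypothetical, it handles only 2-D inputs given as nested Python lists, and it assumes every segment has the same number of rows.

```python
def concatenate_segments_sketch(segments):
    """Hypothetical pure-Python illustration for 2-D inputs.

    `segments` is a list of segments; each segment is a list of rows,
    and each row is a list of token ids. Row i of every segment is
    assumed to belong to the same example.
    """
    num_rows = len(segments[0])
    if any(len(seg) != num_rows for seg in segments):
        raise ValueError("All segments must have the same number of rows.")

    combined, ids = [], []
    for row_idx in range(num_rows):
        row_tokens, row_ids = [], []
        for seg_id, seg in enumerate(segments):
            # Append this segment's tokens for the current row ...
            row_tokens.extend(seg[row_idx])
            # ... and one segment id per appended token.
            row_ids.extend([seg_id] * len(seg[row_idx]))
        combined.append(row_tokens)
        ids.append(row_ids)
    return combined, ids


segment_a = [[1, 2], [3, 4], [5, 6, 7, 8, 9]]
segment_b = [[10, 20], [30, 40, 50, 60], [70, 80]]
combined, ids = concatenate_segments_sketch([segment_a, segment_b])
```

Each output row keeps the ragged length of its inputs, so combined and ids always have identical row lengths.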
Example usage:
import tensorflow as tf
import tensorflow_text as text

# Each segment is a RaggedTensor of token ids.
segment_a = tf.ragged.constant([[1, 2],
                                [3, 4],
                                [5, 6, 7, 8, 9]])
segment_b = tf.ragged.constant([[10, 20],
                                [30, 40, 50, 60],
                                [70, 80]])
expected_combined, expected_ids = text.concatenate_segments([segment_a, segment_b])

# segment_a and segment_b have been concatenated as is.
expected_combined = [
    [1, 2, 10, 20],
    [3, 4, 30, 40, 50, 60],
    [5, 6, 7, 8, 9, 70, 80],
]

# ids describing which items belong to which segment.
expected_ids = [
    [0, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1],
]