View source on GitHub
Split input by delimiters that match a regex pattern.
text.regex_split(
    input,
    delim_regex_pattern,
    keep_delim_regex_pattern='',
    name=None
)
regex_split splits input on delimiters that match the regex
pattern given in delim_regex_pattern. Here is an example:
text_input=["hello there"]
# split by whitespace
regex_split(input=text_input,
            delim_regex_pattern="\s")
<tf.RaggedTensor [[b'hello', b'there']]>
By default, delimiters are not included in the split string results.
Delimiters may be included by specifying a regex pattern
keep_delim_regex_pattern. For example:
text_input=["hello there"]
# split by whitespace
regex_split(input=text_input,
            delim_regex_pattern="\s",
            keep_delim_regex_pattern="\s")
<tf.RaggedTensor [[b'hello', b' ', b'there']]>
If there are multiple delimiters in a row, no empty splits are emitted. For example:
text_input=["hello  there"]  # Note the two spaces between the words.
# split by whitespace
regex_split(input=text_input,
            delim_regex_pattern="\s")
<tf.RaggedTensor [[b'hello', b'there']]>
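The three behaviors described above (splitting on a delimiter pattern, optionally keeping delimiters, and emitting no empty splits) can be illustrated with a small pure-Python sketch. This is not the library's implementation: `regex_split_sketch` is a hypothetical helper, it uses Python's `re` module rather than RE2, it works on `str` instead of byte strings, and it returns nested lists instead of a RaggedTensor. It also assumes that a delimiter is kept only when the whole delimiter matches keep_delim_regex_pattern.

```python
import re


def regex_split_sketch(strings, delim_regex_pattern, keep_delim_regex_pattern=""):
    """Hypothetical, simplified model of the documented splitting behavior."""
    delim_re = re.compile(delim_regex_pattern)
    # An empty keep pattern means no delimiters are kept (the documented default).
    keep_re = re.compile(keep_delim_regex_pattern) if keep_delim_regex_pattern else None

    results = []
    for s in strings:
        pieces = []
        pos = 0  # start of the text not yet emitted
        for m in delim_re.finditer(s):
            if m.start() == m.end():
                continue  # ignore zero-width matches
            if m.start() > pos:
                # Non-empty text before the delimiter becomes a piece.
                pieces.append(s[pos:m.start()])
            # Assumption: keep the delimiter only if it fully matches the keep pattern.
            if keep_re is not None and keep_re.fullmatch(m.group()):
                pieces.append(m.group())
            pos = m.end()
        if pos < len(s):
            pieces.append(s[pos:])  # trailing text after the last delimiter
        results.append(pieces)
    return results


print(regex_split_sketch(["hello there"], r"\s"))
# [['hello', 'there']]
print(regex_split_sketch(["hello there"], r"\s", r"\s"))
# [['hello', ' ', 'there']]
print(regex_split_sketch(["hello  there"], r"\s"))
# [['hello', 'there']]
```

Because text between two adjacent delimiters is empty, the `m.start() > pos` check is what prevents empty pieces from appearing in the result. The keep pattern may also be narrower than the delimiter pattern: in this sketch, `regex_split_sketch(["a, b"], r"[\s,]", r",")` keeps the comma but drops the space.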
See https://github.com/google/re2/wiki/Syntax for the full list of supported expressions.
Returns | |
|---|---|
| A RaggedTensor of type string containing the split string pieces. |