String manipulation with stringr :: CHEATSHEET

The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.

Detect Matches

str_detect(string, pattern, negate = FALSE) Detect the presence of a pattern match in a string. Also str_like(). str_detect(fruit, "a")

str_starts(string, pattern, negate = FALSE) Detect the presence of a pattern match at the beginning of a string. Also str_ends(). str_starts(fruit, "a")

str_which(string, pattern, negate = FALSE) Find the indexes of strings that contain a pattern match. str_which(fruit, "a")

str_locate(string, pattern) Locate the positions of pattern matches in a string. Also str_locate_all(). str_locate(fruit, "a")

str_count(string, pattern) Count the number of matches in a string. str_count(fruit, "a")

Subset Strings

str_sub(string, start = 1L, end = -1L) Extract substrings from a character vector. str_sub(fruit, 1, 3); str_sub(fruit, -2)

str_subset(string, pattern, negate = FALSE) Return only the strings that contain a pattern match. str_subset(fruit, "p")

str_extract(string, pattern) Return the first pattern match found in each string, as a vector. Also str_extract_all() to return every pattern match. str_extract(fruit, "[aeiou]")

str_match(string, pattern) Return the first pattern match found in each string, as a matrix with a column for each ( ) group in pattern. Also str_match_all(). str_match(sentences, "(a|the) ([^ ]+)")

Manage Lengths

str_length(string) The width of strings (i.e. number of code points, which generally equals the number of characters). str_length(fruit)

str_pad(string, width, side = c("left", "right", "both"), pad = " ") Pad strings to constant width. str_pad(fruit, 17)

str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...") Truncate the width of strings, replacing content with ellipsis. str_trunc(sentences, 6)

str_trim(string, side = c("both", "left", "right")) Trim whitespace from the start and/or end of a string. str_trim(str_pad(fruit, 17))

str_squish(string) Trim whitespace from each end and collapse multiple spaces into single spaces. str_squish(str_pad(fruit, 17, "both"))
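As a rough illustration of the detection and subsetting functions, the sketch below runs them on a small hand-made vector. It assumes stringr is installed; the vector `x` and the sentence passed to str_match() are hypothetical stand-ins for the `fruit` and `sentences` datasets used in the cheatsheet examples.

```r
library(stringr)

# Hypothetical sample vector, standing in for the `fruit` dataset
x <- c("apple", "banana", "pear", "kiwi")

# Detect matches
has_a   <- str_detect(x, "a")   # does each string contain "a"?
which_a <- str_which(x, "a")    # indexes of the strings that do
n_a     <- str_count(x, "a")    # number of matches per string
loc_a   <- str_locate(x, "a")   # matrix: start/end of the first match

# Subset strings
first3 <- str_sub(x, 1, 3)             # first three characters
with_p <- str_subset(x, "p")           # only strings containing "p"
vowel  <- str_extract(x, "[aeiou]")    # first vowel in each string

# One matrix column for the full match, plus one per ( ) group
m <- str_match("the quick fox", "(a|the) ([^ ]+)")
```

Strings without a match yield NA in str_locate(), str_extract(), and str_match(), so the result always has one entry (or row) per input string.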

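The length-management functions can be tried on literal strings. This is a minimal sketch assuming stringr is installed; the input strings are made up for illustration.

```r
library(stringr)

len <- str_length("banana")                       # number of code points

# Pad to a constant width; side defaults to "left"
left <- str_pad("pear", 8)
both <- str_pad("pear", 8, side = "both")

# Truncate to 9 characters in total, including the "..." ellipsis
short <- str_trunc("a long sentence", 9)

# Remove surrounding whitespace; squish also collapses inner runs of spaces
trimmed  <- str_trim("  pear  ")
squished <- str_squish("  a   long   sentence ")
```

Note that the width passed to str_trunc() counts the ellipsis, so fewer characters of the original string survive than the width suggests.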
Mutate Strings

str_sub() <- value. Replace substrings by identifying the substrings with str_sub() and assigning into the results. str_sub(fruit, 1, 3) <- "str"

str_replace(string, pattern, replacement) Replace the first matched pattern in each string. Also str_remove(). str_replace(fruit, "p", "-")

str_replace_all(string, pattern, replacement) Replace all matched patterns in each string. Also str_remove_all(). str_replace_all(fruit, "p", "-")

str_to_lower(string, locale = "en")¹ Convert strings to lower case. str_to_lower(sentences)

str_to_upper(string, locale = "en")¹ Convert strings to upper case. str_to_upper(sentences)

str_to_title(string, locale = "en")¹ Convert strings to title case. Also str_to_sentence(). str_to_title(sentences)

Join and Split

str_c(..., sep = "", collapse = NULL) Join multiple strings into a single string. str_c(letters, LETTERS)

str_flatten(string, collapse = "") Combines into a single string, separated by collapse. str_flatten(fruit, ", ")

str_dup(string, times) Repeat strings times times. Also str_unique() to remove duplicates. str_dup(fruit, times = 2)

str_split_fixed(string, pattern, n) Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split() to return a list of substrings and str_split_i() to return the ith substring. str_split_fixed(sentences, " ", n=3)

str_glue(…, .sep = "", .envir = parent.frame()) Create a string from strings and {expressions} to evaluate. str_glue("Pi is {pi}")

str_glue_data(.x, ..., .sep = "", .envir = parent.frame(), .na = "NA") Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate. str_glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")

Order Strings

str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹ Return the vector of indexes that sorts a character vector. fruit[str_order(fruit)]

str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹ Sort a character vector. str_sort(fruit)

Helpers

str_conv(string, encoding) Override the encoding of a string. str_conv(fruit,"ISO-8859-1")

str_view(string, pattern, match = NA) View HTML rendering of all regex matches. str_view(sentences, "[aeiou]")

str_equal(x, y, locale = "en", ignore_case = FALSE, ...)¹ Determine if two strings are equivalent. str_equal(c("a", "b"), c("a", "c"))

str_wrap(string, width = 80, indent = 0, exdent = 0) Wrap strings into nicely formatted paragraphs. str_wrap(sentences, 20)

¹ See bit.ly/ISO639-1 for a complete list of locales.
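The difference between replacing the first match and replacing all matches is easiest to see side by side. The sketch below assumes stringr is installed and uses a hypothetical two-element vector `x`; it also shows case conversion and assignment through str_sub().

```r
library(stringr)

# Hypothetical input vector
x <- c("apple", "pear")

first_p <- str_replace(x, "p", "-")       # only the first "p" in each string
all_p   <- str_replace_all(x, "p", "-")   # every "p"
no_p    <- str_remove_all(x, "p")         # same as replacing with ""

upper <- str_to_upper("a string")
title <- str_to_title("a string")

# Assigning into str_sub() modifies the vector in place,
# so work on a copy to keep x unchanged
y <- x
str_sub(y, 1, 3) <- "str"
```

Because `str_sub(y, 1, 3) <- "str"` overwrites `y`, copying the vector first is a simple way to keep the original around.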

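For joining, splitting, and ordering, a short sketch on literal inputs may help. It assumes stringr is installed; the variable `pkg` and the "x1"/"x2"/"x10" labels are hypothetical examples, not part of the cheatsheet.

```r
library(stringr)

# Join
pairs  <- str_c(c("a", "b"), c("A", "B"))      # element-wise concatenation
joined <- str_flatten(c("a", "b", "c"), ", ")  # collapse a vector to one string
twice  <- str_dup("ab", 2)

# Split into at most n pieces; the last piece keeps the remainder
parts <- str_split_fixed("a b c d", " ", n = 3)

# Interpolate an R expression inside { }
pkg <- "stringr"
greeting <- str_glue("Hello {pkg}")

# Sort alphabetically, then with numeric = TRUE so digit runs sort as numbers
labels  <- c("x10", "x2", "x1")
alpha   <- str_sort(labels)
natural <- str_sort(labels, numeric = TRUE)
```

The `numeric = TRUE` option is useful for labels such as file names, where plain alphabetical order places "x10" before "x2".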
CC BY SA Posit Software, PBC • info@posit.co • posit.co • Learn more at stringr.tidyverse.org • Diagrams from @LVaudor on Twitter • HTML cheatsheets at pos.it/cheatsheets • stringr 1.5.0 • Updated: 2023-07
Regular Expressions - Regular expressions, or regexps, are a concise language for describing patterns in strings.

Need to Know

Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.

In R, you write regular expressions as strings, sequences of characters surrounded by quotes ("") or single quotes ('').

Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g.

Special Character   Represents
\\                  \
\"                  "
\n                  new line

Run ?"'" to see a complete list

Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.

Use writeLines() to see how R views your string after all special characters have been parsed.

writeLines("\\.")
# \.

writeLines("\\ is a backslash")
# \ is a backslash

MATCH CHARACTERS see <- function(rx) str_view("abc ABC 123\t.!?\\(){}\n", rx)

string        regexp          matches                                       example
(type this)   (to mean this)  (which matches this)
a (etc.)      a (etc.)        a (etc.)                                      see("a")
\\.           \.              .                                             see("\\.")
\\!           \!              !                                             see("\\!")
\\?           \?              ?                                             see("\\?")
\\\\          \\              \                                             see("\\\\")
\\(           \(              (                                             see("\\(")
\\)           \)              )                                             see("\\)")
\\{           \{              {                                             see("\\{")
\\}           \}              }                                             see("\\}")
\\n           \n              new line (return)                             see("\\n")
\\t           \t              tab                                           see("\\t")
\\s           \s              any whitespace (\S for non-whitespaces)       see("\\s")
\\d           \d              any digit (\D for non-digits)                 see("\\d")
\\w           \w              any word character (\W for non-word chars)    see("\\w")
\\b           \b              word boundaries                               see("\\b")
              [:digit:]¹      digits                                        see("[:digit:]")
              [:alpha:]¹      letters                                       see("[:alpha:]")
              [:lower:]¹      lowercase letters                             see("[:lower:]")
              [:upper:]¹      uppercase letters                             see("[:upper:]")
              [:alnum:]¹      letters and numbers                           see("[:alnum:]")
              [:punct:]¹      punctuation                                   see("[:punct:]")
              [:graph:]¹      letters, numbers, and punctuation             see("[:graph:]")
              [:space:]¹      space characters (i.e. \s)                    see("[:space:]")
              [:blank:]¹      space and tab (but not new line)              see("[:blank:]")
              .               every character except a new line             see(".")

¹ Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]

[Diagram: nested character classes. [:space:] covers new line plus [:blank:] (space, tab). [:graph:] covers [:punct:] (. , : ; ? ! / * @ # - _ " ' [ ] { } ( )), [:symbol:] (| ` = + ^ ~ < > $), and [:alnum:]. [:alnum:] covers [:digit:] (0 to 9) and [:alpha:], which covers [:lower:] (a to z) and [:upper:] (A to Z).]

INTERPRETATION
Patterns in stringr are interpreted as regexes. To change this default, wrap the pattern in one of:

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...) Modifies a regex to ignore cases, match end of lines as well as end of strings, allow R comments within regexes, and/or to have . match everything including \n. str_detect("I", regex("i", TRUE))

fixed() Matches raw bytes but will miss some characters that can be represented in multiple ways (fast). str_detect("\u0130", fixed("i"))

coll() Matches raw bytes and will use locale specific collation rules to recognize characters that can be represented in multiple ways (slow). str_detect("\u0130", coll("i", TRUE, locale = "tr"))

boundary() Matches boundaries between characters, line_breaks, sentences, or words. str_split(sentences, boundary("word"))

ALTERNATES alt <- function(rx) str_view("abcde", rx)

regexp     matches        example
ab|d       or             alt("ab|d")
[abe]      one of         alt("[abe]")
[^abe]     anything but   alt("[^abe]")
[a-c]      range          alt("[a-c]")

ANCHORS anchor <- function(rx) str_view("aaa", rx)

regexp     matches            example
^a         start of string    anchor("^a")
a$         end of string      anchor("a$")

LOOK AROUNDS look <- function(rx) str_view("bacad", rx)

regexp     matches            example
a(?=c)     followed by        look("a(?=c)")
a(?!c)     not followed by    look("a(?!c)")
(?<=b)a    preceded by        look("(?<=b)a")
(?<!b)a    not preceded by    look("(?<!b)a")

QUANTIFIERS quant <- function(rx) str_view(".a.aa.aaa", rx)

regexp     matches            example
a?         zero or one        quant("a?")
a*         zero or more       quant("a*")
a+         one or more        quant("a+")
a{n}       exactly n          quant("a{2}")
a{n, }     n or more          quant("a{2,}")
a{n, m}    between n and m    quant("a{2,4}")

GROUPS ref <- function(rx) str_view("abbaab", rx)

Use parentheses to set precedence (order of evaluation) and create groups.

regexp     matches            example
(ab|d)e    sets precedence    alt("(ab|d)e")

Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. Refer to each group by its order of appearance.

string        regexp          matches                  example
(type this)   (to mean this)  (which matches this)     (the result is the same as ref("abba"))
\\1           \1 (etc.)       first () group, etc.     ref("(a)(b)\\2\\1")
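The escaping rule and the regexp constructs tabulated above can be checked without str_view() by looking at what str_detect(), str_extract(), and str_locate() return. This is a minimal sketch assuming stringr is installed; the test strings mirror the ones used by the helper functions in the tables, and the variable names are hypothetical.

```r
library(stringr)

# One backslash in the regexp is written as two in the R string
writeLines("\\.")                                    # prints \.
dot_literal <- str_detect(c("a.b", "ab"), "\\.")     # escaped: a literal dot
dot_any     <- str_detect(c("a.b", "ab"), ".")       # unescaped: any character

# Character classes (double brackets also work in base R functions)
digits <- str_extract("abc ABC 123", "\\d+")
lower  <- str_extract("abc ABC 123", "[[:lower:]]+")

# Quantifiers and anchors
two_plus <- str_extract_all(".a.aa.aaa", "a{2,}")[[1]]
at_start <- str_locate("aaa", "^a")
at_end   <- str_locate("aaa", "a$")

# Look arounds match a position without consuming the neighbour
before_c     <- str_locate("bacad", "a(?=c)")
not_before_c <- str_locate("bacad", "a(?!c)")
after_b      <- str_locate("bacad", "(?<=b)a")

# Back-references: \\2 repeats group 2 ("b"), \\1 repeats group 1 ("a")
backref <- str_extract("abbaab", "(a)(b)\\2\\1")
```

Using str_locate() for the anchor and look-around cases makes the matched position visible, which a plain TRUE/FALSE from str_detect() would hide.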

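The pattern wrappers from the INTERPRETATION section change how the same pattern string is matched. The sketch below assumes stringr is installed and uses made-up inputs; coll() is left out because its results depend on the locale.

```r
library(stringr)

# regex(): the default interpretation, with optional modifiers
case_sensitive   <- str_detect("I", "i")
case_insensitive <- str_detect("I", regex("i", ignore_case = TRUE))

# multiline = TRUE lets ^ match at the start of every line
line_starts <- str_extract_all("a\nb", regex("^.", multiline = TRUE))[[1]]

# fixed(): compare literally, so "." is no longer a wildcard
fixed_dot <- str_detect(c("a.b", "ab"), fixed("."))

# boundary(): split on word boundaries instead of a regexp
words <- str_split("Hello, world!", boundary("word"))[[1]]
```

fixed() is a reasonable choice when the pattern comes from user input and should not be treated as a regular expression.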