String Functions
Introduction
The header <utilities/string.h> supplies several utility functions that work on strings.
Many of the functions come in two flavours. One version alters the input string in place, while the other returns a new string that is a copy of the input appropriately converted, leaving the original untouched.
For example, utilities::upper_case(str) converts str to upper-case in place. On the other hand, utilities::upper_cased(str) returns a fresh string that is a copy of str converted to upper-case. As you will see below, this is the typical naming style used.
There are other functions where this distinction is unnecessary, such as utilities::starts_with(...).
Case Conversions
void utilities::upper_case(std::string&);              // 1
void utilities::lower_case(std::string&);              // 2
std::string utilities::upper_cased(std::string_view);  // 3
std::string utilities::lower_cased(std::string_view);  // 4

1. Converts a string to uppercase.
2. Converts a string to lowercase.
3. Returns a new string, an uppercase copy of the input string.
4. Returns a new string, a lowercase copy of the input string.
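Our case conversions rely on the std::tolower and std::toupper functions, which convert one char at a time. They therefore work for single-byte character sets such as ASCII but will not correctly convert multi-byte encodings such as UTF-8.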
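The two flavours described in the introduction can be illustrated with a minimal sketch. This is not the library's implementation: the namespace sketch is a hypothetical stand-in, and only the upper-case pair is shown. Note the cast to unsigned char before calling std::toupper, since passing a negative char value to it is undefined behaviour.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical stand-in namespace; not the library's actual code.
namespace sketch {

// In-place flavour: modifies its argument.
inline void upper_case(std::string& str)
{
    // Take each char as unsigned char: std::toupper on a negative char is undefined behaviour.
    std::transform(str.begin(), str.end(), str.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}

// Copying flavour: builds a new string and leaves the input untouched.
inline std::string upper_cased(std::string_view str)
{
    std::string result{str};
    upper_case(result);
    return result;
}

} // namespace sketch
```

Here the copying version is simply built on top of the in-place one, a pattern that extends naturally to the other function pairs on this page.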
Trimming Spaces
void utilities::trim_left(std::string&);                  // 1
void utilities::trim_right(std::string&);                 // 2
void utilities::trim(std::string&);                       // 3
std::string utilities::trimmed_left(std::string_view);    // 4
std::string utilities::trimmed_right(std::string_view);   // 5
std::string utilities::trimmed(std::string_view);         // 6

1. Removes any leading whitespace from the input string.
2. Removes any trailing whitespace from the input string.
3. Removes leading and trailing whitespace from the input string.
4. Returns a new string, a left-trimmed copy of the input string.
5. Returns a new string, a right-trimmed copy of the input string.
6. Returns a new string, a copy of the input string trimmed on both sides.
Our trimming functions rely on the std::isspace function to identify whitespace characters.
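A minimal sketch of how trimming can be built on std::isspace with standard algorithms. It is an illustration under stated assumptions, not the library's code: the namespace sketch and the helper is_space are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>

namespace sketch {

// Hypothetical helper: std::isspace needs an unsigned char value to avoid undefined behaviour.
inline bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

inline void trim_left(std::string& str)
{
    // Erase everything before the first non-space character.
    auto first = std::find_if_not(str.begin(), str.end(), is_space);
    str.erase(str.begin(), first);
}

inline void trim_right(std::string& str)
{
    // Scan backwards; base() points just past the last non-space character.
    auto last = std::find_if_not(str.rbegin(), str.rend(), is_space);
    str.erase(last.base(), str.end());
}

inline void trim(std::string& str)
{
    trim_right(str);
    trim_left(str);
}

inline std::string trimmed(std::string_view str)
{
    std::string result{str};
    trim(result);
    return result;
}

} // namespace sketch
```

Trimming the right end first means the left trim shifts fewer characters, though either order gives the same result.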
Replacing Substrings
void utilities::replace_left(std::string &str, std::string_view target, std::string_view replacement);         // 1
void utilities::replace_right(std::string &str, std::string_view target, std::string_view replacement);        // 2
void utilities::replace(std::string &str, std::string_view target, std::string_view replacement);              // 3
std::string utilities::replaced_left(std::string_view str, std::string_view target, std::string_view replacement);   // 4
std::string utilities::replaced_right(std::string_view str, std::string_view target, std::string_view replacement);  // 5
std::string utilities::replaced(std::string_view str, std::string_view target, std::string_view replacement);        // 6
1. Replace the first occurrence of target in str with replacement.
2. Replace the final occurrence of target in str with replacement.
3. Replace all occurrences of target in str with replacement.
4. Returns a new string, a copy of str with the first occurrence of target changed to replacement.
5. Returns a new string, a copy of str with the final occurrence of target changed to replacement.
6. Returns a new string, a copy of str with all occurrences of target changed to replacement.
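The replacement functions map closely onto std::string::find, rfind, and replace. The sketch below is a hedged illustration, not the library's code: the namespace sketch is hypothetical, and treating an empty target as a no-op is an assumption (the documentation does not say what happens in that case).

```cpp
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch {

inline void replace_left(std::string& str, std::string_view target, std::string_view replacement)
{
    if (target.empty()) return;  // assumption: empty target does nothing
    auto pos = str.find(target);
    if (pos != std::string::npos) str.replace(pos, target.size(), replacement);
}

inline void replace_right(std::string& str, std::string_view target, std::string_view replacement)
{
    if (target.empty()) return;
    auto pos = str.rfind(target);  // search from the end for the final occurrence
    if (pos != std::string::npos) str.replace(pos, target.size(), replacement);
}

inline void replace(std::string& str, std::string_view target, std::string_view replacement)
{
    if (target.empty()) return;
    std::size_t pos = 0;
    while ((pos = str.find(target, pos)) != std::string::npos) {
        str.replace(pos, target.size(), replacement);
        pos += replacement.size();  // skip the inserted text so it is never re-matched
    }
}

inline std::string replaced(std::string_view str, std::string_view target, std::string_view replacement)
{
    std::string result{str};
    replace(result, target, replacement);
    return result;
}

} // namespace sketch
```

Advancing past the inserted text matters when replacement contains target (for example, replacing "a" with "aa"); without it the loop would never terminate.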
We also have functions to replace all contiguous white space sequences in a string:
void utilities::replace_space(std::string &str,
                              const std::string &with = " ",
                              bool also_trim = true);                          // 1
void utilities::condense(std::string &str, bool also_trim = true);             // 2
std::string utilities::replaced_space(std::string_view str,
                                      const std::string &with = " ",
                                      bool also_trim = true);                  // 3
std::string utilities::condensed(std::string_view str, bool also_trim = true); // 4
1. Replaces all contiguous white space sequences in a string with a single white space character or, optionally, something else. By default, the string is also trimmed of white space on both the left and right.
2. Replaces all contiguous white space sequences in a string with a single white space character. By default, the string is also trimmed of white space on both the left and right.
3. Returns a new string, a copy of str with all contiguous white space sequences replaced with a single white space character or, optionally, something else. By default, the output string is also trimmed of white space on both the left and right.
4. Returns a new string, a copy of str with all contiguous white space sequences replaced with a single white space character. By default, the output string is also trimmed of white space on both the left and right.
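One way to implement the copying versions is a single pass that emits each non-space character and collapses each whitespace run. This is a sketch, not the library's code: the namespace sketch is hypothetical, with is taken as std::string_view rather than const std::string& for simplicity, and "also trimmed" is interpreted as dropping any run that touches either end of the string.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch {

inline std::string replaced_space(std::string_view str, std::string_view with = " ", bool also_trim = true)
{
    auto is_space = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    std::string result;
    std::size_t i = 0;
    while (i < str.size()) {
        if (!is_space(str[i])) { result += str[i++]; continue; }
        // Start of a whitespace run: find where it ends.
        std::size_t j = i;
        while (j < str.size() && is_space(str[j])) ++j;
        bool leading = (i == 0), trailing = (j == str.size());
        // With trimming on, runs at either end are dropped; otherwise each run becomes `with`.
        if (!(also_trim && (leading || trailing))) result += with;
        i = j;
    }
    return result;
}

inline std::string condensed(std::string_view str, bool also_trim = true)
{
    return replaced_space(str, " ", also_trim);
}

} // namespace sketch
```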
Erasing Substrings
void utilities::erase_left(std::string &str, std::string_view target);          // 1
void utilities::erase_right(std::string &str, std::string_view target);         // 2
void utilities::erase(std::string &str, std::string_view target);               // 3
std::string utilities::erased_left(std::string_view str, std::string_view target);   // 4
std::string utilities::erased_right(std::string_view str, std::string_view target);  // 5
std::string utilities::erased(std::string_view str, std::string_view target);        // 6
1. Erases the first occurrence of the target substring in str.
2. Erases the final occurrence of the target substring in str.
3. Erases all occurrences of the target substring in str.
4. Returns a new string, a copy of str with the first occurrence of target erased.
5. Returns a new string, a copy of str with the final occurrence of target erased.
6. Returns a new string, a copy of str with all occurrences of target erased.
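Erasing is essentially replacement with an empty string. The following minimal sketch (hypothetical namespace sketch, not the library's code; an empty target is assumed to be a no-op) uses std::string::find, rfind, and erase directly.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch {

inline void erase_left(std::string& str, std::string_view target)
{
    if (target.empty()) return;
    auto pos = str.find(target);
    if (pos != std::string::npos) str.erase(pos, target.size());
}

inline void erase_right(std::string& str, std::string_view target)
{
    if (target.empty()) return;
    auto pos = str.rfind(target);
    if (pos != std::string::npos) str.erase(pos, target.size());
}

inline void erase(std::string& str, std::string_view target)
{
    if (target.empty()) return;
    std::size_t pos = 0;
    // Keep pos after each erase: the text that followed the match now starts there.
    while ((pos = str.find(target, pos)) != std::string::npos) str.erase(pos, target.size());
}

inline std::string erased(std::string_view str, std::string_view target)
{
    std::string result{str};
    erase(result, target);
    return result;
}

} // namespace sketch
```

Because the search resumes at the erase position rather than the start of the string, a match newly formed across the join (as in "aabb" with target "ab") is not erased a second time.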
“Standardizing” Strings
We often need to parse free-form input while looking for a keyword or phrase. Having a facility that converts strings to some standard form is helpful.
void utilities::remove_surrounds(std::string&);               // 1
void utilities::standardize(std::string&);                    // 2
std::string utilities::removed_surrounds(std::string_view);   // 3
std::string utilities::standardized(std::string_view);        // 4
1. Strips any “surrounds” from the input string. For example, the string “(text)” becomes “text”. Multiples also work, so “[[[text]]]” becomes “text”. Only correctly balanced surrounds are ever removed.
2. Standardizes the input string (see below).
3. Returns a new string, a copy of the input with any “surrounds” removed.
4. Returns a new string, a standardized copy of the input.
The standardize functions give you a string stripped of extraneous brackets and similar surrounds. They also replace each run of interior whitespace with a single space, remove all leading and trailing whitespace, and convert the result to upper case. So a string like “< Ace of Clubs >” becomes “ACE OF CLUBS”.
It is a lot easier to parse standardized strings.
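A sketch of how standardization might work, under several assumptions that are not confirmed by this documentation: the namespace sketch and its helpers are hypothetical, only the four pairs (), [], {}, and <> are treated as surrounds, and the upper-casing step is inferred from the “ACE OF CLUBS” example. The balance check rejects strings like “(a) (b)” whose first bracket closes before the final character.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch {

// Assumption: these bracket pairs count as "surrounds".
constexpr std::string_view openers = "([{<";
constexpr std::string_view closers = ")]}>";

inline bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

inline void trim(std::string& s)
{
    s.erase(std::find_if_not(s.rbegin(), s.rend(), is_space).base(), s.end());
    s.erase(s.begin(), std::find_if_not(s.begin(), s.end(), is_space));
}

// True if the first character opens a bracket that is closed by the last character.
inline bool has_surround(const std::string& s)
{
    if (s.size() < 2) return false;
    auto k = openers.find(s.front());
    if (k == std::string_view::npos || s.back() != closers[k]) return false;
    int depth = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == openers[k]) ++depth;
        else if (s[i] == closers[k]) --depth;
        // Depth reaching zero before the end means the opening bracket closed early.
        if (depth == 0 && i + 1 < s.size()) return false;
    }
    return depth == 0;
}

inline void remove_surrounds(std::string& s)
{
    trim(s);
    while (has_surround(s)) {  // peel one layer at a time, re-trimming inside it
        s = s.substr(1, s.size() - 2);
        trim(s);
    }
}

inline std::string standardized(std::string_view str)
{
    std::string s{str};
    remove_surrounds(s);  // also leaves s trimmed at both ends
    std::string out;
    for (char c : s) {
        if (is_space(c)) {
            if (!out.empty() && out.back() != ' ') out += ' ';  // collapse each run to one space
        } else {
            out += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
    }
    return out;
}

} // namespace sketch
```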
Searching
bool utilities::starts_with(std::string_view str, std::string_view prefix);   // 1
bool utilities::ends_with(std::string_view str, std::string_view suffix);     // 2
1. Returns true if str starts with prefix.
2. Returns true if str ends with suffix.
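In C++20 and later, std::string_view has its own starts_with and ends_with members. For C++17, a minimal equivalent can be written with std::string_view::compare; the namespace sketch below is hypothetical and is not the library's implementation.

```cpp
#include <string_view>

namespace sketch {

inline bool starts_with(std::string_view str, std::string_view prefix)
{
    // Compare the first prefix.size() characters of str with prefix.
    return str.size() >= prefix.size() && str.compare(0, prefix.size(), prefix) == 0;
}

inline bool ends_with(std::string_view str, std::string_view suffix)
{
    // Compare the last suffix.size() characters of str with suffix.
    return str.size() >= suffix.size() &&
           str.compare(str.size() - suffix.size(), suffix.size(), suffix) == 0;
}

} // namespace sketch
```

The size check comes first so that the subtraction in ends_with can never underflow.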
Tokenizing
We often want to convert a stream of text into tokens. Here are some functions to help with that:
template<std::input_iterator InIter, std::forward_iterator FwdIter, typename Func>
constexpr void
for_each_token(InIter input_begin, InIter input_end,
               FwdIter delims_begin, FwdIter delims_end, Func token_func);      // 1

template<typename Container_t>
constexpr void
tokenize(std::string_view input, Container_t &output_container,
         std::string_view delimiters = "\t,;: ", bool skip = true);           // 2

std::vector<std::string>
split(std::string_view input,
      std::string_view delimiters = "\t,;: ", bool skip = true);              // 3
1. Given iterators that bracket the input text and others that bracket the possible token delimiters, this function processes the text and passes each token to a user-supplied function.
2. Tokenizes the input text string and places the tokens into output_container.
3. Tokenizes the input text string and returns the tokens as a std::vector of strings.
We have based the for_each_token function on the excellent discussion here.
Function Arguments
Argument | Description |
---|---|
input_begin | To tokenize the string stored in text, input_begin should be std::cbegin(text). |
input_end | To tokenize the string stored in text, input_end should be std::cend(text). |
delims_begin | If the possible token delimiters are in the string delims (which might be "\t,;: "), then delims_begin should be std::cbegin(delims). |
delims_end | If the possible token delimiters are in the string delims (which might be "\t,;: "), then delims_end should be std::cend(delims). |
token_func | Called once for each token as token_func(token.cbegin(), token.cend()). |
output_container | Must be dynamically resizable and support emplace_back(token.cbegin(), token.cend()). |
skip | If true, we ignore empty tokens (e.g., those produced by two spaces in a row). |
delimiters | The characters that delimit tokens. By default, tokens break on tabs, spaces, commas, semicolons, and colons. |
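A common way to write for_each_token is a loop around std::find_first_of, with split layered on top. This is a hedged sketch rather than the library's code: the namespace sketch is hypothetical, the template parameters are plain typenames instead of the C++20 iterator concepts shown above, and in this version a trailing delimiter produces no final empty token.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {

template<typename InIter, typename FwdIter, typename Func>
void for_each_token(InIter input_begin, InIter input_end,
                    FwdIter delims_begin, FwdIter delims_end, Func token_func)
{
    while (input_begin != input_end) {
        // Locate the next delimiter (or input_end if there is none).
        InIter pos = std::find_first_of(input_begin, input_end, delims_begin, delims_end);
        token_func(input_begin, pos);  // the token is [input_begin, pos)
        if (pos == input_end) break;
        input_begin = std::next(pos);  // resume just past the delimiter
    }
}

inline std::vector<std::string> split(std::string_view input,
                                      std::string_view delimiters = "\t,;: ",
                                      bool skip = true)
{
    std::vector<std::string> tokens;
    for_each_token(input.cbegin(), input.cend(), delimiters.cbegin(), delimiters.cend(),
                   [&](auto first, auto last) {
                       if (skip && first == last) return;  // drop empty tokens
                       tokens.emplace_back(first, last);
                   });
    return tokens;
}

} // namespace sketch
```

With the default delimiters, sketch::split("a, b;;c") yields the three tokens "a", "b", and "c"; passing skip = false keeps the empty tokens between adjacent delimiters.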
Extracting Values
We also have a function that attempts to parse a value from a string.
template<typename T>
constexpr std::optional<T> possible(std::string_view str);    // 1

1. Tries to read a value of a particular type from a string.
This function uses std::from_chars to parse a value of a simple numeric type (an integer or floating-point type) from a string. It returns std::nullopt if it fails to parse the input.
Example
auto x = possible<double>(str);
if (x) std::cout << str << ": parsed as the double value " << *x << '\n';
If str holds a valid double, x contains that value and the line is printed to std::cout. Otherwise x is empty and nothing is printed.
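For illustration, here is a sketch of how a function like possible might wrap std::from_chars. It is not the library's code. The namespace sketch is hypothetical, and rejecting partial parses such as "12abc" by requiring the whole input to be consumed is an assumption. The sketch also drops constexpr, since std::from_chars is only usable in constant expressions (for integers) from C++23. Note that std::from_chars itself skips no leading whitespace and accepts no leading '+'.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

namespace sketch {

template<typename T>
std::optional<T> possible(std::string_view str)
{
    T value{};
    const char* first = str.data();
    const char* last = str.data() + str.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    // Assumption: fail unless parsing succeeded AND consumed every character.
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}

} // namespace sketch
```

Floating-point overloads of std::from_chars arrived later than the integer ones in some standard libraries, so possible<double> may need a recent toolchain.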