Splitting a String by Another String in C++: A Flexible Utility Function


In this post, we will explore a flexible utility function for splitting a string based on a given delimiter using C++ and the standard library. This allows us to break down complex strings into smaller parts that are easier to process and manipulate.

The C++ Utility Function to Split a String by Another String

Background: regular expressions are an essential tool for text processing and pattern matching. They provide a concise and powerful way to express complex search patterns and are widely used for tasks such as text validation, data extraction, and string manipulation. The C++ Standard Library has offered the <regex> library since C++11. It uses the ECMAScript regular expression grammar by default and provides various classes, algorithms, and iterators to work with regular expressions in a type-safe manner.

The <regex> library in C++ includes several key components: the std::regex class, which represents a compiled regular expression; the std::regex_search and std::regex_replace algorithms for searching and replacing patterns within strings; std::regex_iterator (with std::sregex_iterator as its alias for std::string input) for traversing the matches in a given input; and std::regex_token_iterator (with std::sregex_token_iterator as its alias for std::string input) for tokenizing strings based on a given pattern.
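
To make these components a little more concrete, here is a minimal sketch that touches three of them. The helper names (has_number, mask_numbers, find_numbers) and the digit pattern are made up for illustration and are not part of the utility function discussed in this post.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: true if `text` contains at least one run of digits.
bool has_number(const std::string& text) {
  static const std::regex digits{"[0-9]+"};
  return std::regex_search(text, digits);
}

// Hypothetical helper: replace every run of digits with a single '#'.
std::string mask_numbers(const std::string& text) {
  static const std::regex digits{"[0-9]+"};
  return std::regex_replace(text, digits, "#");
}

// Hypothetical helper: collect every run of digits found in `text`.
std::vector<std::string> find_numbers(const std::string& text) {
  static const std::regex digits{"[0-9]+"};
  std::vector<std::string> found;
  // A default-constructed sregex_iterator marks the end of the matches.
  std::sregex_iterator end;
  for (std::sregex_iterator it(text.begin(), text.end(), digits); it != end; ++it) {
    found.push_back(it->str());  // it->str() is the text of the whole match
  }
  return found;
}
```

For the input "a1b22c333", find_numbers should return the three strings 1, 22, and 333, while mask_numbers turns "a1b22" into "a#b#".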

Here’s the code snippet for our utility function making use of the regex standard library:

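#include <regex>
#include <string>
#include <vector>

// Splits str on every match of delim_str and returns the non-empty pieces.
// Note: delim_str is compiled as a regular expression, not matched literally.
std::vector<std::string>
split_str(const std::string& str, const std::string& delim_str) {
  std::regex delim{delim_str};
  std::vector<std::string> results;
  std::sregex_token_iterator end;  // default-constructed: end of sequence
  // Submatch index -1 selects the text between matches of delim.
  std::sregex_token_iterator iter(str.begin(), str.end(), delim, -1);
  for ( ; iter != end; ++iter) {
    std::string split(*iter);
    if (!split.empty()) results.push_back(split);
  }
  return results;
}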
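
One thing to keep in mind: because delim_str is compiled as a regular expression, a delimiter that contains regex metacharacters will not be matched literally. For example, "." matches any character, so splitting "1.2.3" on "." leaves no non-empty pieces at all. If you want purely literal delimiters, one possible approach is to escape the delimiter first. The sketch below assumes the default ECMAScript grammar; escape_regex and split_literal are hypothetical names introduced here for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: backslash-escape the characters that are special in
// the default ECMAScript grammar, so the text only matches itself.
std::string escape_regex(const std::string& text) {
  static const std::regex special{R"([.^$|()\[\]{}*+?\\])"};
  // In the replacement format, "$&" stands for the matched character.
  return std::regex_replace(text, special, R"(\$&)");
}

// Hypothetical variant of split_str that treats delim_str as literal text.
std::vector<std::string>
split_literal(const std::string& str, const std::string& delim_str) {
  const std::regex delim{escape_regex(delim_str)};
  std::vector<std::string> results;
  std::sregex_token_iterator end;
  std::sregex_token_iterator iter(str.begin(), str.end(), delim, -1);
  for ( ; iter != end; ++iter) {
    if (iter->length() > 0) results.push_back(iter->str());  // skip empty pieces
  }
  return results;
}
```

With this variant, splitting "1.2.3" on "." should give the three pieces 1, 2, and 3. The sketch does not handle an empty delimiter specially.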

Breaking Down the String Splitting C++ Function Implementation

Let’s go through the code step by step:

  1. First, we include the <regex> header, which provides us with the necessary tools to work with regular expressions in C++.
  2. We define a function called split_str that takes two parameters: a const std::string& called str, which is the input string to be split, and a const std::string& called delim_str, which is the delimiter string to be used for splitting.
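  3. We create a std::regex object called delim from delim_str. This is the pattern that will be used to split the input string. Note that delim_str is interpreted as a regular expression, so characters such as ".", "|", or "+" keep their special regex meaning.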
  4. We declare a std::vector<std::string> called results to store the resulting substrings after splitting.
  5. We define two std::sregex_token_iterator objects: end and iter. The end object serves as a sentinel value indicating the end of the sequence. The iter object is initialized with the beginning and end of the input string, the delimiter pattern, and -1 as the submatch value. The -1 value tells the iterator to return the unmatched parts of the input (i.e., the substrings between the delimiters).
  6. We use a for loop to iterate through the tokens returned by the iterator. Inside the loop, we create a std::string object called split and initialize it with the current token.
  7. We check if the size of the split string is non-zero. If it is, we add it to the results vector. This means that empty pieces, such as the one between two adjacent delimiters, are dropped.
  8. Finally, we return the results vector containing the substrings.
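
To see what the submatch value in step 5 and the size check in step 7 actually do, it can help to look at the unfiltered tokens. The following is a small sketch for experimenting; raw_tokens is a hypothetical helper, not part of the utility function itself.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return every token the iterator yields for the given
// submatch index, without filtering out empty tokens.
std::vector<std::string> raw_tokens(const std::string& str,
                                    const std::string& pattern,
                                    int submatch) {
  const std::regex re{pattern};
  std::sregex_token_iterator iter(str.begin(), str.end(), re, submatch);
  std::sregex_token_iterator end;
  // Each token converts to std::string, so the range constructor can copy them.
  return std::vector<std::string>(iter, end);
}
```

Calling raw_tokens("a::b", "::", -1) yields the parts between the delimiters (a and b), whereas a submatch value of 0 yields the delimiters themselves (a single "::"). For the input "::a::::b" with -1, the iterator yields four tokens, two of which are empty: one before the leading delimiter and one between the two adjacent delimiters. These are the tokens that step 7 filters out.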

Using the C++ Utility Function to Split a String

Here’s a C++ example of how you can use the split_str function:

#include <iostream>
#include <vector>
#include <string>
#include <regex>

std::vector<std::string>
split_str(const std::string& str, const std::string& delim_str) {
  std::regex delim{delim_str};
  std::vector<std::string> results;
  std::sregex_token_iterator end;
  std::sregex_token_iterator iter(str.begin(), str.end(), delim, -1);
  for ( ; iter != end; ++iter) {
    std::string split(*iter);
    if (split.size()) results.push_back(split);
  }
  return results;
}

int main() {
  std::string input = "Hello::World::from::C++";
  std::string delimiter = "::";

  std::vector<std::string> results = split_str(input, delimiter);

  for (const auto& word : results) {
    std::cout << word << std::endl;
  }

  return 0;
}

Compiling and running this program produces the following output:

$ g++ -std=c++20 split-string-by-string-example.cpp -o s && ./s
Hello
World
from
C++

That’s it! We’ve created a flexible and reusable utility function to split a string using the <regex> library. Because the delimiter is compiled as a regular expression, you can pass a plain string such as "::" or a pattern such as "[,;]" to split on either a comma or a semicolon, which makes this function adaptable to various text processing tasks.
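
If you only ever need literal delimiters and do not need pattern matching, the same result can be had without <regex> at all. Here is a minimal sketch based on std::string::find; split_str_find is a hypothetical name, and returning the whole input for an empty delimiter is an assumption made for this example rather than behavior taken from the function above.

```cpp
#include <string>
#include <vector>

// Hypothetical regex-free variant: splits str on each literal occurrence of
// delim and, like split_str, drops empty pieces.
std::vector<std::string>
split_str_find(const std::string& str, const std::string& delim) {
  std::vector<std::string> results;
  if (delim.empty()) {
    // Assumption for this sketch: an empty delimiter means "do not split".
    if (!str.empty()) results.push_back(str);
    return results;
  }
  std::string::size_type start = 0;
  while (start < str.size()) {
    std::string::size_type pos = str.find(delim, start);
    if (pos == std::string::npos) pos = str.size();  // last piece runs to the end
    if (pos > start) results.push_back(str.substr(start, pos - start));
    start = pos + delim.size();  // continue after the delimiter
  }
  return results;
}
```

For the input "Hello::World::from::C++" and the delimiter "::", this should produce the same four pieces as the regex-based version.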
