In this tutorial, we will implement a Suffix Array and a Suffix Tree in C++. These data structures are essential for efficient string processing tasks such as substring search, pattern matching, and more.
A suffix array is a sorted array of all suffixes of a given string. Below is a basic implementation in C++.
#include
#include
#include
#include
void buildSuffixArray(const std::string &s, std::vector &suffixArray) {
int n = s.size();
suffixArray.resize(n);
for (int i = 0; i < n; i++) {
suffixArray[i] = i;
}
std::sort(suffixArray.begin(), suffixArray.end(), [&s](int a, int b) {
return s.substr(a) < s.substr(b);
});
}
int main() {
std::string text = "banana";
std::vector suffixArray;
buildSuffixArray(text, suffixArray);
for (int i : suffixArray) {
std::cout << i << " ";
}
return 0;
}
A suffix tree is a compressed trie of all the suffixes of a given string. Below is a basic implementation in C++.
#include
#include
#include
#include
struct SuffixTreeNode {
std::unordered_map> children;
bool isEndOfWord = false;
};
class SuffixTree {
private:
SuffixTreeNode root;
public:
void insert(const std::string &s) {
SuffixTreeNode *node = &root;
for (char c : s) {
if (!node->children.count(c)) {
node->children[c] = std::make_unique();
}
node = node->children[c].get();
}
node->isEndOfWord = true;
}
void build(const std::string &s) {
for (size_t i = 0; i < s.size(); ++i) {
insert(s.substr(i));
}
}
};
int main() {
SuffixTree tree;
std::string text = "banana";
tree.build(text);
std::cout << "Suffix tree built for: " << text << std::endl;
return 0;
}
How do I avoid rehashing overhead with std::set in multithreaded code?
How do I find elements with custom comparators with std::set for embedded targets?
How do I erase elements while iterating with std::set for embedded targets?
How do I provide stable iteration order with std::unordered_map for large datasets?
How do I reserve capacity ahead of time with std::unordered_map for large datasets?
How do I erase elements while iterating with std::unordered_map in multithreaded code?
How do I provide stable iteration order with std::map for embedded targets?
How do I provide stable iteration order with std::map in multithreaded code?
How do I avoid rehashing overhead with std::map in performance-sensitive code?
How do I merge two containers efficiently with std::map for embedded targets?