C String Overview

String handling in C is traditionally performed using raw character arrays and functions from string.h. While efficient, this approach requires manual management of memory allocation, buffer sizing, and null termination, which can introduce risks such as buffer overflows, memory leaks, and undefined behavior in large or safety-critical systems.

The C String module in this library provides a lightweight, allocator-aware string container implemented in pure C and declared in c_string.h. Unlike conventional C strings, this container explicitly tracks:

  • The current string length (excluding the null terminator)

  • The total allocated buffer size (including space for the terminator)

  • The allocator used to create and release the string memory

By integrating directly with the allocator abstraction defined in c_allocator.h, the string container supports heap, arena, pool, slab, or custom allocation strategies without changing the public API. This enables deterministic memory behavior suitable for embedded, real-time, or safety-regulated environments.

String construction follows a capacity-driven model:

  • A requested capacity of 0 allocates exactly enough memory to store the full input string plus the null terminator.

  • A non-zero capacity allocates space for the requested number of characters plus one byte for the null terminator.

  • If the requested capacity is smaller than the input string length, the stored string is safely truncated and always null-terminated.

  • If the requested capacity is larger, unused buffer space remains available for future operations.
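
The capacity rules above reduce to a few lines of arithmetic. The following is a minimal sketch of that arithmetic only, not the library's implementation; demo_plan_t and demo_plan_capacity are hypothetical names introduced for illustration, and overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of c_string.h): the buffer size and stored
   length implied by the capacity rules described above. */
typedef struct {
    size_t alloc; /* bytes to allocate, including the terminator */
    size_t len;   /* characters actually stored                  */
} demo_plan_t;

static demo_plan_t demo_plan_capacity(const char *cstr, size_t capacity) {
    assert(cstr != NULL);
    size_t src_len = strlen(cstr);
    demo_plan_t p;
    if (capacity == 0) {
        /* Exact fit: the whole input plus the terminator. */
        p.len = src_len;
        p.alloc = src_len + 1;
    } else {
        /* Requested payload plus one byte; truncate if the input is longer. */
        p.len = (src_len < capacity) ? src_len : capacity;
        p.alloc = capacity + 1;
    }
    return p;
}
```

For the input "hello", a capacity of 0 yields 6 allocated bytes, a capacity of 3 yields 4 bytes with the stored text cut to 3 characters, and a capacity of 10 yields 11 bytes with 5 characters stored.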

All functions return explicit success/error state using the *_expect_t pattern defined in c_error.h, avoiding implicit failure modes common in traditional C string handling.

These characteristics make the C String module appropriate for:

  • Deterministic and allocator-controlled memory management

  • Embedded or real-time software requiring bounded behavior

  • Safety-critical systems emphasizing explicit error handling

  • Large applications seeking consistent container abstractions in C

Data Types

The following data structures and derived data types are defined in c_string.h and implemented in c_string.c to support the allocator-aware string container.

string_t

string_t is a public, non-opaque data structure that represents a dynamically managed C-style string. Unlike opaque container types used elsewhere in the library, this structure is intentionally visible so that users may:

  • Inspect internal metadata directly when appropriate

  • Integrate the container with custom utilities or serialization logic

  • Extend behavior through user-defined helper functions or wrappers

The structure stores both the string data and the metadata required for safe memory management through the allocator abstraction defined in c_allocator.h.

Key properties maintained by string_t include:

  • A pointer to a null-terminated character buffer

  • The current logical length of the string (excluding the terminator)

  • The total allocated buffer size in bytes (including space for the terminator)

  • The allocator instance responsible for allocation and release of memory

These fields allow deterministic control over memory usage while preserving compatibility with standard C string operations.

typedef struct {
    char* str;                 // Pointer to null-terminated character buffer
    size_t len;                // Logical string length (excludes '\0')
    size_t alloc;              // Total allocated bytes (includes space for '\0')
    allocator_vtable_t allocator; // Allocator used for memory management
} string_t;

The following invariants are guaranteed for any valid string_t instance:

  • str always points to a null-terminated character sequence.

  • alloc >= len + 1 to ensure space for the terminator.

  • Memory ownership and release are handled exclusively through the stored allocator.

Because the structure is public, users must preserve these invariants when manipulating fields directly. Violating them may result in undefined behavior or allocator misuse.
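
Code that edits the fields directly may want to re-check the invariants afterwards. The sketch below shows one way to express them as a predicate. It uses a stand-in struct, demo_string_t, that mirrors only the str, len, and alloc fields (the allocator field is left out because its type lives in c_allocator.h); demo_string_is_valid is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for string_t with the allocator field omitted (assumption). */
typedef struct {
    char  *str;
    size_t len;
    size_t alloc;
} demo_string_t;

/* Returns true when the documented invariants hold for s. */
static bool demo_string_is_valid(const demo_string_t *s) {
    if (s == NULL || s->str == NULL) {
        return false;
    }
    /* alloc >= len + 1, written so that it cannot overflow. */
    if (s->alloc == 0 || s->len > s->alloc - 1) {
        return false;
    }
    /* The terminator must sit exactly at str[len]. */
    return s->str[s->len] == '\0';
}
```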

string_expect_t

string_expect_t is a lightweight result container used for explicit error handling during string construction and operations. This follows the *_expect_t convention defined in c_error.h and avoids implicit failure modes such as returning NULL without context.

typedef struct {
    bool has_value;
    union {
        string_t* value;
        error_code_t error;
    } u;
} string_expect_t;

When has_value is true, the value field contains a valid string_t pointer that must eventually be released using return_string(). When has_value is false, the error field contains the associated error code describing the failure condition.

String Functions

Creation and Teardown

init_string

string_expect_t init_string(const char *cstr, size_t capacity_bytes, allocator_vtable_t allocator)

Initialize an allocator-backed string container.

Constructs a new string_t instance using the provided C-string input, requested payload capacity, and allocator vtable.

Capacity semantics:

  • If capacity_bytes is 0, the allocation defaults to exactly the length of cstr plus space for the null terminator.

  • If capacity_bytes is non-zero, the container allocates (capacity_bytes + 1) bytes to guarantee space for the terminator.

  • If the requested capacity is smaller than the source string length, the stored string is truncated to fit and always null-terminated.

Memory is obtained exclusively through the supplied allocator and must later be released with return_string().

allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("hello", 0, a);
if (r.has_value) {
    printf("%s\n", const_string(r.u.value)); // prints "hello"
    return_string(r.u.value);
}

Parameters:
  • cstr[in] Null-terminated source C string.

  • capacity_bytes[in] Requested payload capacity in characters (excluding the null terminator).

  • allocator[in] Allocator vtable used for memory management.

Returns:

string_expect_t

  • .has_value = true → valid string_t pointer in .u.value

  • .has_value = false → error code in .u.error

return_string

void return_string(string_t *s)

Release a string and its associated memory.

Frees the internal character buffer and the string_t structure using the allocator stored within the container. For some allocators, such as an arena_t, this function may be a no-op. Passing NULL is safe and performs no action.

After this call, the pointer must not be used.

string_expect_t r = init_string("example", 0, heap_allocator());
if (r.has_value) {
    return_string(r.u.value); // safe cleanup
}

Parameters:
  • s[in] Pointer to string_t instance or NULL.

Utility Functions

get_string_index

static inline char get_string_index(const string_t *s, size_t index)

Safely retrieve a character from a string at a given index.

Returns the character at position index within the logical contents of the string s. The index must satisfy:

0 <= index < s->len

If the index is out of bounds, or if s or s->str is NULL, this function returns the null character ('\0').

The logical string length (s->len) is authoritative. The null terminator stored at s->str[s->len] is considered an implementation detail and is not treated as a valid character.

  • This function does not distinguish between an out-of-bounds access and a valid embedded '\0' character.

  • For applications requiring explicit error signaling, consider using a boolean-return variant with an output parameter.
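
A boolean-return variant of the kind suggested above could look like the following. This is a sketch under assumptions: demo_string_t is a stand-in for string_t without the allocator field, and demo_try_get_index is a hypothetical name that does not exist in the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for string_t with the allocator field omitted (assumption). */
typedef struct {
    char  *str;
    size_t len;
    size_t alloc;
} demo_string_t;

/* Writes the character at index into *out and returns true, or returns
   false (leaving *out untouched) when the arguments or index are invalid. */
static bool demo_try_get_index(const demo_string_t *s, size_t index, char *out) {
    if (s == NULL || s->str == NULL || out == NULL) {
        return false;
    }
    if (index >= s->len) {
        return false;
    }
    *out = s->str[index];
    return true;
}
```

Because success is reported separately, an embedded '\0' inside the logical contents can be told apart from an out-of-bounds index.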

allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("hello", 0u, a);
if (r.has_value) {
    string_t* s = r.u.value;

    char c0 = get_string_index(s, 0);  // 'h'
    char c4 = get_string_index(s, 4);  // 'o'
    char c5 = get_string_index(s, 5);  // '\0' (out of bounds)

    return_string(s);
}

Parameters:
  • s – Pointer to the source string_t.

  • index – Zero-based character index.

Returns:

The character at the specified index if valid; otherwise the null character ('\0').

const_string

static inline const char *const_string(const string_t *s)

Retrieve the internal null-terminated C string.

Returns a pointer to the underlying character buffer owned by the string_t container. The returned pointer remains valid until the string is released with return_string().

Passing NULL is safe and returns NULL.

const char* text = const_string(str);
if (text) {
    puts(text);
}

Parameters:
  • s[in] Pointer to string_t instance or NULL.

Returns:

Pointer to null-terminated character buffer, or NULL.

string_size

static inline size_t string_size(const string_t *s)

Get the logical length of the string.

Returns the number of characters stored in the container, excluding the null terminator.

Passing NULL is safe and returns 0.

size_t n = string_size(str);
printf("length = %zu\n", n);

Parameters:
  • s[in] Pointer to string_t instance or NULL.

Returns:

Character count excluding the null terminator.

string_alloc

static inline size_t string_alloc(const string_t *s)

Get the total allocated buffer size in bytes.

Returns the number of bytes allocated for the internal buffer, including space reserved for the null terminator.

Passing NULL is safe and returns 0.

size_t cap = string_alloc(str);
printf("capacity = %zu bytes\n", cap);

Parameters:
  • s[in] Pointer to string_t instance or NULL.

Returns:

Total allocated bytes for the string buffer.

str_compare

int8_t str_compare(const string_t *s, const char *str)

Compare a bounded string_t against a C string.

Performs a lexicographical comparison between the contents of s and the null-terminated C string str. The comparison is bounded by s->len, meaning the function never reads beyond the initialized region of the string_t buffer.

This function uses a scalar implementation to guarantee:

  • Deterministic execution

  • Strict bounds safety

  • MISRA-compatible control flow

allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("alpha", 0, a);
if (r.has_value) {
    string_t* s = r.u.value;

    int8_t cmp = str_compare(s, "alphabet");
    // cmp == -1  ("alpha" < "alphabet")

    return_string(s);
}

Note

  • Comparison stops at s->len or the first differing character.

  • The function does not read beyond the bounds of s.

Parameters:
  • s – Pointer to the source string_t to compare.

  • str – Pointer to a null-terminated C string.

Return values:
  • INT8_MIN – Invalid argument (NULL pointer or corrupt state).

  • 0 – Strings are equal within the bounded region.

  • -1 – s is lexicographically less than str.

  • 1 – s is lexicographically greater than str.
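
The bounded comparison and its return codes can be illustrated with a short scalar loop. This is a sketch, not the library source: demo_string_t stands in for string_t without the allocator field, demo_str_compare is a hypothetical name, and the rule that a string which is a strict prefix of the other compares as less is inferred from the "alpha" versus "alphabet" example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for string_t with the allocator field omitted (assumption). */
typedef struct {
    char  *str;
    size_t len;
    size_t alloc;
} demo_string_t;

static int8_t demo_str_compare(const demo_string_t *s, const char *str) {
    if (s == NULL || s->str == NULL || str == NULL) {
        return INT8_MIN;
    }
    for (size_t i = 0; i < s->len; ++i) {
        unsigned char a = (unsigned char)s->str[i];
        unsigned char b = (unsigned char)str[i];
        if (b == '\0') {
            return 1;               /* str ended first, so s is greater */
        }
        if (a != b) {
            return (a < b) ? -1 : 1;
        }
    }
    /* All s->len bytes matched: equal only if str ends here as well. */
    return (str[s->len] == '\0') ? 0 : -1;
}
```

The loop never reads s->str past s->len, and it reads str only up to its first terminator.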

string_compare

int8_t string_compare(const string_t *s, const string_t *str)

Compare two bounded string_t objects.

Performs a lexicographical comparison between s and str, both of which are bounded string_t instances.

This function may use SIMD acceleration when supported by the target architecture and enabled at compile time:

  • AVX / AVX2 / AVX-512 on x86

  • SSE2 / SSE3 / SSE4.1 on x86

  • NEON on ARM

  • SVE / SVE2 on ARM

When SIMD is unavailable, the implementation falls back to a fully scalar, MISRA-safe comparison with identical semantics.

allocator_vtable_t a = heap_allocator();

string_expect_t r1 = init_string("delta", 0, a);
string_expect_t r2 = init_string("gamma", 0, a);

if (r1.has_value && r2.has_value) {
    string_t* s1 = r1.u.value;
    string_t* s2 = r2.u.value;

    int8_t cmp = string_compare(s1, s2);
    // cmp == -1  ("delta" < "gamma")

    return_string(s1);
    return_string(s2);
}

Note

  • SIMD is used only for bounded byte comparison and never reads past either string’s initialized region.

  • Return values are architecture-independent.

Parameters:
  • s – Pointer to the first string_t.

  • str – Pointer to the second string_t.

Return values:
  • INT8_MIN – Invalid argument (NULL pointer or corrupt state).

  • 0 – Strings are equal.

  • -1 – s is lexicographically less than str.

  • 1 – s is lexicographically greater than str.

is_string_ptr

bool is_string_ptr(const string_t *s, const void *ptr)

Checks whether a pointer lies within a string’s allocated buffer.

Determines if ptr points to a memory location inside the character storage owned by the string_t instance s.

The valid range is:

[s->str, s->str + string_alloc(s))

This function does not verify:

  • Whether the pointer references initialized characters

  • Whether the pointer aligns to a character boundary

  • Whether the allocator globally owns the pointer

It only checks containment within the string’s allocation.

string_expect_t r = init_string("hello", 0, heap_allocator());
if (r.has_value) {
    string_t* s = r.u.value;

    char* p = s->str + 1;

    if (is_string_ptr(s, p)) {
        *p = 'a';  // safe mutation
    }

    return_string(s);
}

Note

  • Safe for defensive validation in low-level APIs.

  • Useful before performing pointer arithmetic or in-place mutation.

Parameters:
  • s – Pointer to the string instance.

  • ptr – Pointer to test.

Return values:
  • true – Pointer lies within the string’s allocated buffer.

  • false – Pointer is NULL, string is invalid, or outside range.

is_string_ptr_sized

bool is_string_ptr_sized(const string_t *s, const void *ptr, size_t bytes)

Checks whether a sized memory range lies fully within a string’s allocated buffer.

Determines whether the range [ptr, ptr + bytes) is entirely contained within the character storage owned by s.

The string’s valid allocation range is: [s->str, s->str + s->alloc).

This function is useful for validating that an object, sub-buffer, or typed view fits completely within a string before performing operations such as parsing, casting, or in-place mutation.

Note

  • This checks allocation containment, not “used characters” containment. If you want containment within initialized characters, bound against s->len + 1 (or s->len) instead of s->alloc.

  • bytes == 0 returns false to avoid “vacuously true” ranges.

Parameters:
  • s – Pointer to the string instance.

  • ptr – Pointer to the start of the candidate region.

  • bytes – Size (in bytes) of the candidate region.

Return values:
  • true – The entire range lies within the string’s allocated buffer.

  • false – Invalid inputs, overflow, or the range extends outside the buffer.
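
The containment test described for is_string_ptr_sized can be written so that a very large bytes value cannot wrap around. The sketch below is one possible formulation, not the library's code; demo_string_t is a stand-in for string_t without the allocator field, demo_range_in_alloc is a hypothetical name, and the use of uintptr_t assumes a platform that provides it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for string_t with the allocator field omitted (assumption). */
typedef struct {
    char  *str;
    size_t len;
    size_t alloc;
} demo_string_t;

/* True when [ptr, ptr + bytes) lies inside [s->str, s->str + s->alloc). */
static bool demo_range_in_alloc(const demo_string_t *s, const void *ptr, size_t bytes) {
    if (s == NULL || s->str == NULL || ptr == NULL || bytes == 0) {
        return false;               /* zero-length ranges are rejected */
    }
    uintptr_t base = (uintptr_t)s->str;
    uintptr_t p    = (uintptr_t)ptr;
    if (p < base) {
        return false;
    }
    size_t offset = (size_t)(p - base);
    if (offset >= s->alloc) {
        return false;
    }
    /* Compare against the remaining space instead of computing ptr + bytes,
       so the check cannot overflow. */
    return bytes <= s->alloc - offset;
}
```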

find_substr

size_t find_substr(const string_t *haystack, const string_t *needle, const uint8_t *begin, const uint8_t *end, direction_t dir)

Finds the first occurrence of a substring within a bounded region.

Searches for the string needle inside the character data of haystack, restricted to the memory range [begin, end).

The search direction is controlled by dir:

  • FORWARD — scans from begin toward end and returns the earliest match.

  • REVERSE — scans from end toward begin and returns the latest match within the region.

If begin or end is NULL, the search defaults to the used character region of the string:

[haystack->str, haystack->str + haystack->len)

This function is implemented to take advantage of SIMD (Single Instruction, Multiple Data) instructions when supported by the target architecture.

At compile time, architecture-specific vectorized implementations may be selected, including:

  • AVX-512, AVX2, AVX

  • SSE4.1, SSE3, SSE2

  • ARM NEON

  • ARM SVE / SVE2

When SIMD is available, the search compares multiple characters in parallel, significantly improving performance for:

  • long haystacks

  • repeated substring searches

  • forward and reverse scans over large buffers

If no SIMD capability is detected, the function safely falls back to a fully portable scalar implementation with identical semantics.

SIMD usage is completely transparent to the caller:

  • No API differences

  • No alignment requirements

  • No behavioral changes

The return value is a 1-based offset relative to begin:

  Return value    Meaning
  0               Not found or invalid arguments
  1               Match begins exactly at begin
  k + 1           Match begins k bytes after begin

This convention avoids ambiguity between “not found” and “match at index 0”.

Note

  • Region pointers must lie within the allocated buffer of haystack; otherwise the function returns 0.

  • The search is limited to the used string length, not slack allocation beyond haystack->len.

  • An empty needle is treated as found at the beginning of the region and returns 1.

  • SIMD acceleration is optional and architecture-dependent but never changes correctness.
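
The 1-based return convention and the two scan directions can be illustrated with a purely scalar search over a byte range. This is a sketch under assumptions, not the library's SIMD or scalar engine: demo_find and demo_dir_t are hypothetical names, and the region is passed directly as pointers instead of through a string_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { DEMO_FORWARD, DEMO_REVERSE } demo_dir_t;

/* Searches [begin, end) for needle (nlen bytes).
   Returns 0 when not found or on invalid input, else 1 + offset from begin. */
static size_t demo_find(const uint8_t *begin, const uint8_t *end,
                        const uint8_t *needle, size_t nlen, demo_dir_t dir) {
    if (begin == NULL || end == NULL || needle == NULL || end < begin) {
        return 0;
    }
    size_t hlen = (size_t)(end - begin);
    if (nlen == 0) {
        return 1;                   /* empty needle: found at region start */
    }
    if (nlen > hlen) {
        return 0;
    }
    size_t last = hlen - nlen;      /* last offset where a match can start */
    for (size_t k = 0; k <= last; ++k) {
        /* Forward visits 0..last; reverse visits last..0. */
        size_t off = (dir == DEMO_FORWARD) ? k : last - k;
        if (memcmp(begin + off, needle, nlen) == 0) {
            return off + 1;
        }
    }
    return 0;
}
```

With the haystack "bananana" and the needle "ana", a forward scan of the whole string returns 2 and a reverse scan returns 6; a forward scan of the window starting three bytes in returns 1, because the offset is measured from the start of the window.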

Example: Forward search

string_expect_t h = init_string("bananana", 0, heap_allocator());
string_expect_t n = init_string("ana", 0, heap_allocator());

if (h.has_value && n.has_value) {
    size_t pos = find_substr(h.u.value, n.u.value, NULL, NULL, FORWARD);
    // "ana" first appears at index 1 → return value = 2 (1-based)
}

Example: Reverse search

size_t pos = find_substr(h.u.value, n.u.value, NULL, NULL, REVERSE);
// Last occurrence at index 5 → return value = 6

Example: Bounded window search

const uint8_t* base = (const uint8_t*)h.u.value->str;

size_t pos = find_substr(
    h.u.value,
    n.u.value,
    base + 3,                 // begin search inside string
    base + h.u.value->len,    // end of used region
    FORWARD);

// Position is relative to begin, not start of string:
// the window is "anana", so "ana" matches at its first byte → return value = 1

Parameters:
  • haystack – String being searched.

  • needle – Substring to locate.

  • begin – Pointer to start of searchable region inside haystack->str (may be NULL).

  • end – Pointer to one-past-end of searchable region inside haystack->str (may be NULL).

  • dir – Search direction: FORWARD or REVERSE.

Return values:
  • 0 – Not found or invalid input.

  • >0 – 1-based offset of first match relative to begin.

Returns:

size_t

find_substr_lit

size_t find_substr_lit(const string_t *haystack, const char *needle_lit, const uint8_t *begin, const uint8_t *end, direction_t dir)

Find the first occurrence of a literal substring within a string range.

This function searches for the first case-sensitive occurrence of the NUL-terminated C string needle_lit inside the string haystack, optionally constrained to the byte range [begin, end).

The search semantics match find_substr, but the needle is provided as a string literal instead of a string_t object. Internally, the literal length is determined via strlen, and the search is delegated to the same SIMD/scalar substring engine used by find_substr.

See also

find_substr, word_count_lit

Example
allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("Hello world, hello again", 0u, a);
if (!r.has_value) {
    // handle error
}
string_t* text = r.u.value;

// Find first lowercase "hello"
size_t pos = find_substr_lit(text, "hello", NULL, NULL, DIR_FWD);

// pos == 13

return_string(text);

Note

  • Matching is case-sensitive and substring-based (not word-delimited).

  • The search range [begin, end) is validated against the underlying allocation and clamped to the used length of the string.

  • An empty literal ("") is defined as found at the start of the search region and returns the offset corresponding to begin.

  • The literal is not copied; only its length is computed before searching.

Parameters:
  • haystack – Pointer to the source string object to be searched.

  • needle_lit – Pointer to a NUL-terminated C string literal representing the substring to locate.

  • begin – Optional pointer to the beginning of the search region within haystack->str. If NULL, the search begins at the start of the used string.

  • end – Optional pointer to one-past-the-last byte of the search region. If NULL, the search continues to the end of the used string length.

  • dir – Search direction (implementation-defined; typically forward or reverse).

Return values:

SIZE_MAX – Returned if:

  • haystack == NULL

  • haystack->str == NULL

  • needle_lit == NULL

  • the search range is invalid or outside the allocation

  • the literal is not found within the specified region

Returns:

Offset in bytes from the beginning of haystack->str to the first matching occurrence of needle_lit.

String Manipulation

str_concat

bool str_concat(string_t *s, const char *str)

Concatenate a C string onto a CSalt string.

Appends the null-terminated string str to the end of the destination string s. If additional capacity is required, the function attempts to grow the underlying buffer using the allocator associated with s.

This function is safe for overlapping memory regions. If the source pointer lies within the destination buffer and reallocation is required, a temporary copy is created before growth to preserve correctness.
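
The overlap handling described above can be sketched with the standard heap functions. This is only an illustration of the technique (copy an aliasing source before the buffer moves), not the library's implementation: it calls malloc, realloc, and free directly instead of going through an allocator vtable, demo_string_t is a stand-in for string_t without the allocator field, and demo_concat is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for string_t with the allocator field omitted (assumption). */
typedef struct {
    char  *str;
    size_t len;
    size_t alloc;
} demo_string_t;

static bool demo_concat(demo_string_t *s, const char *src) {
    if (s == NULL || s->str == NULL || src == NULL) {
        return false;
    }
    size_t add = strlen(src);
    if (add > SIZE_MAX - s->len - 1) {
        return false;                       /* size overflow */
    }
    size_t need = s->len + add + 1;

    char *tmp = NULL;
    if (need > s->alloc) {
        /* If src points into the destination buffer, growth may move or
           free the bytes it refers to, so copy them first. */
        uintptr_t base = (uintptr_t)s->str;
        uintptr_t p    = (uintptr_t)src;
        if (p >= base && p - base < s->alloc) {
            tmp = malloc(add + 1);
            if (tmp == NULL) {
                return false;
            }
            memcpy(tmp, src, add + 1);
            src = tmp;
        }
        char *grown = realloc(s->str, need);
        if (grown == NULL) {
            free(tmp);
            return false;
        }
        s->str = grown;
        s->alloc = need;
    }

    memmove(s->str + s->len, src, add);     /* memmove tolerates overlap */
    s->len += add;
    s->str[s->len] = '\0';
    free(tmp);                              /* free(NULL) is a no-op */
    return true;
}
```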

string_expect_t r = init_string("Hello", 0, heap_allocator());
if (r.has_value) {
    string_t* s = r.u.value;

    if (str_concat(s, ", world!")) {
        printf("%s\n", const_string(s)); // "Hello, world!"
    }

    return_string(s);
}

Note

  • The allocator stored in s determines growth behavior.

  • Arena allocators may not reclaim intermediate buffers until the arena itself is reset or destroyed.

  • The resulting string is always null-terminated on success.

Parameters:
  • s – Destination string to be extended.

  • str – Null-terminated C string to append.

Return values:
  • true – Concatenation succeeded.

  • false – Invalid arguments, allocation failure, or size overflow.

string_concat

bool string_concat(string_t *s, const string_t *str)

Concatenate one CSalt string onto another.

Appends the contents of str to the destination string s. This function behaves identically to str_concat but obtains the source characters from another managed string_t instance.

The operation respects allocator semantics and may trigger buffer growth using the destination string’s allocator.

string_expect_t a = init_string("CSalt", 0, heap_allocator());
string_expect_t b = init_string(" Library", 0, heap_allocator());

if (a.has_value && b.has_value) {
    string_t* s1 = a.u.value;
    string_t* s2 = b.u.value;

    if (string_concat(s1, s2)) {
        printf("%s\n", const_string(s1)); // "CSalt Library"
    }

    return_string(s1);
    return_string(s2);
}

Note

  • Source and destination may reference the same underlying buffer. Overlap is handled safely.

  • The destination string remains null-terminated on success.

Parameters:
  • s – Destination string to be extended.

  • str – Source string whose contents will be appended.

Return values:
  • true – Concatenation succeeded.

  • false – Invalid arguments, allocation failure, or size overflow.

reset_string

static inline void reset_string(string_t *str)

Reset a string to the empty state without releasing memory.

Sets the logical length of the string to zero and, when a backing buffer exists, writes a null terminator at the first character position.

This allows subsequent concatenation operations (e.g., str_concat or string_concat) to begin writing from the start of the buffer while preserving the previously allocated capacity.

This operation is O(1) and does not invoke the allocator.

allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("hello", 0u, a);
if (r.has_value) {
    string_t* s = r.u.value;

    reset_string(s);

    // String is now empty but reusable
    str_concat(s, "world");
    printf("%s\n", const_string(s));   // prints "world"

    return_string(s);
}

Note

  • If str is NULL, the function performs no action.

  • If the string has no backing buffer (str->str == NULL), the function performs no action.

  • Capacity remains unchanged; only the logical contents are cleared.

Parameters:
  • str – Pointer to the string_t instance to reset.

copy_string

string_expect_t copy_string(const string_t *s, allocator_vtable_t allocator)

Creates a deep value copy of an existing string.

Allocates a new string_t instance using the supplied allocator and copies the character data from the source string s.

The copied string:

  • Contains identical character contents to s

  • Has independent storage (no shared buffer)

  • Uses the provided allocator for memory management

  • Allocates the minimal required capacity of string_size(s) + 1 bytes to store the characters and null terminator

This function performs a value copy, not a structural clone of the original allocation. Any unused capacity in the source string is not preserved in the copy.

allocator_vtable_t a = heap_allocator();

string_expect_t r1 = init_string("hello", 0, a);
if (!r1.has_value) {
    return;
}

string_t* original = r1.u.value;

string_expect_t r2 = copy_string(original, a);
if (r2.has_value) {
    string_t* copy = r2.u.value;

    // Independent modification
    str_concat(copy, " world");

    printf("%s\n", const_string(original)); // "hello"
    printf("%s\n", const_string(copy));     // "hello world"

    return_string(copy);
}

return_string(original);

Note

  • The returned string must be released with return_string.

  • Modifying the copy does not affect the source.

  • Suitable for transferring string ownership between allocators or subsystems.

Parameters:
  • s – Source string to copy.

  • allocator – Allocator used to create the new string.

Return values:
  • has_value = true – .u.value points to a newly allocated deep copy.

  • has_value = false – .u.error contains:

    • NULL_POINTER if s or s->str is NULL

    • Any error propagated from init_string

Returns:

string_expect_t

word_count

size_t word_count(const string_t *s, const string_t *word, const uint8_t *start, const uint8_t *end)

Count case-sensitive occurrences of a substring within a string range.

This function counts the number of non-overlapping, case-sensitive occurrences of word inside the string s, optionally constrained to the byte range [start, end).

Internally, this function repeatedly calls find_substr() and advances the search cursor past each successful match, ensuring forward progress and preventing infinite loops.

See also

find_substr

Example
allocator_vtable_t a = heap_allocator();
string_expect_t r = init_string("Hello world thisHello is hello again Hello", 45, a);
if (!r.has_value) {
    // handle error
}
string_t* text = r.u.value;

r = init_string("Hello", 5, a);
if (!r.has_value) {
   // Handle error
}
string_t* word = r.u.value;
size_t count = word_count(text, word, NULL, NULL);

// count == 3 because matching is case-sensitive ("hello" is not counted):
//   "Hello"
//   "thisHello"
//   "Hello"

Note

  • Matching is case-sensitive.

  • Matches are substring-based, not whole-word delimited. For example, searching "hello" will match "jonhello".

  • Occurrences are counted non-overlapping. To count overlapping matches, advance the cursor by +1 instead of +word->len after each match.
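
The difference between non-overlapping and overlapping counting comes down to how far the cursor advances after a match. The sketch below shows both modes over a plain byte buffer; demo_count is a hypothetical helper written for illustration and does not call find_substr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Counts occurrences of word (wlen bytes) in text (tlen bytes). */
static size_t demo_count(const char *text, size_t tlen,
                         const char *word, size_t wlen, bool overlapping) {
    if (text == NULL || word == NULL || wlen == 0 || wlen > tlen) {
        return 0;
    }
    size_t count = 0;
    size_t pos = 0;
    while (pos <= tlen - wlen) {
        if (memcmp(text + pos, word, wlen) == 0) {
            ++count;
            /* Skip the whole match, or step one byte to allow overlap. */
            pos += overlapping ? 1 : wlen;
        } else {
            ++pos;
        }
    }
    return count;
}
```

For the text "aaaa" and the word "aa", the non-overlapping count is 2 and the overlapping count is 3.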

Parameters:
  • s – Pointer to the source string object to be searched.

  • word – Pointer to the substring to search for.

  • start – Optional pointer to the beginning of the search region within s->str. If NULL, the search begins at the start of the string.

  • end – Optional pointer to one-past-the-last byte of the search region. If NULL, the search continues to the end of the used string length.

Return values:

0 – Returned if:

  • s == NULL

  • s->str == NULL

  • word == NULL

  • word->str == NULL

  • word->len == 0

  • no matches are found

Returns:

The number of non-overlapping occurrences of word found within the specified region.

word_count_lit

size_t word_count_lit(const string_t *s, const char *word, const uint8_t *start, const uint8_t *end)

Count case-sensitive occurrences of a literal substring within a string range.

This function counts the number of non-overlapping, case-sensitive occurrences of the C string literal word inside the string s, optionally constrained to the byte range [start, end).

Internally, this function constructs a temporary non-owning substring view and repeatedly calls find_substr(), advancing the search cursor past each successful match to ensure forward progress and prevent infinite loops.

See also

word_count, find_substr

Example
allocator_vtable_t a = heap_allocator();
string_expect_t r = init_string("Hello world thisHello is hello again Hello", 45, a);
if (!r.has_value) {
    // handle error
}
string_t* text = r.u.value;

size_t count = word_count_lit(text, "Hello", NULL, NULL);

// count == 3 because matching is case-sensitive ("hello" is not counted):
//   "Hello"
//   "thisHello"
//   "Hello"

Note

  • Matching is case-sensitive.

  • Matches are substring-based, not whole-word delimited. For example, searching "Hello" will match "thisHello".

  • Occurrences are counted non-overlapping. To count overlapping matches, advance the cursor by +1 instead of +strlen(word) after each match.

  • The literal word is not copied; only a temporary non-owning view is created for the duration of the search.

Parameters:
  • s – Pointer to the source string object to be searched.

  • word – Pointer to a NUL-terminated C string literal representing the substring to search for.

  • start – Optional pointer to the beginning of the search region within s->str. If NULL, the search begins at the start of the string.

  • end – Optional pointer to one-past-the-last byte of the search region. If NULL, the search continues to the end of the used string length.

Return values:

0 – Returned if:

  • s == NULL

  • s->str == NULL

  • word == NULL

  • word is an empty string ("")

  • no matches are found

Returns:

The number of non-overlapping occurrences of word found within the specified region.

token_count

size_t token_count(const string_t *s, const string_t *delim, const uint8_t *begin, const uint8_t *end)

Count tokens in a string using a string_t delimiter set.

Counts the number of non-empty tokens within the specified byte range of s, where tokens are sequences of bytes not contained in the delimiter set stored in delim.

A token start is defined as a transition from:

delimiter → non-delimiter

The beginning of the search window is treated as if it were preceded by a delimiter, ensuring that a leading non-delimiter byte forms a token.

See also

token_count_lit

Example
allocator_vtable_t a = heap_allocator();

string_expect_t r1 = init_string("one|two||three", 0u, a);
string_expect_t r2 = init_string("|", 0u, a);

if (!r1.has_value || !r2.has_value) {
    // handle error
}

string_t* text  = r1.u.value;
string_t* delim = r2.u.value;

size_t count = token_count(text, delim, NULL, NULL);

// Tokens: "one", "two", "three"
// count == 3

return_string(delim);
return_string(text);

Note

  • Matching is byte-wise and case-sensitive.

  • The window [begin,end) is validated against the allocation and clamped to the used string length.

  • If delim->len == 0, the entire non-empty window is treated as a single token.

  • SIMD acceleration may be used internally depending on the build configuration and target architecture.
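
The transition rule (delimiter → non-delimiter) can be sketched as a single scalar pass with a 256-entry lookup table. This is an illustration under assumptions, not the library's implementation: demo_token_count is a hypothetical name that takes raw buffers instead of string_t objects and performs no window validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts non-empty tokens in text (tlen bytes) separated by any byte found
   in delims (dlen bytes). Returns SIZE_MAX on NULL input. */
static size_t demo_token_count(const char *text, size_t tlen,
                               const char *delims, size_t dlen) {
    if (text == NULL || delims == NULL) {
        return SIZE_MAX;
    }
    bool is_delim[256] = { false };
    for (size_t i = 0; i < dlen; ++i) {
        is_delim[(unsigned char)delims[i]] = true;
    }

    size_t count = 0;
    bool prev_delim = true;         /* window start acts like a delimiter */
    for (size_t i = 0; i < tlen; ++i) {
        bool cur = is_delim[(unsigned char)text[i]];
        if (prev_delim && !cur) {
            ++count;                /* delimiter -> non-delimiter transition */
        }
        prev_delim = cur;
    }
    return count;
}
```

Consecutive delimiters never produce empty tokens, and an empty delimiter set makes any non-empty window count as one token.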

Parameters:
  • s – Pointer to the source string_t to analyze.

  • delim – Pointer to a string_t containing delimiter bytes. The delimiter set consists of the first delim->len bytes.

  • begin – Optional pointer to the first byte of the search window within s->str. If NULL, the search begins at the start of the used string.

  • end – Optional pointer to one-past-the-last byte of the search window. If NULL, the search ends at the used length of the string.

Return values:
  • SIZE_MAX – Returned if:

    • s == NULL

    • s->str == NULL

    • delim == NULL

    • delim->str == NULL

    • [begin,end) lies outside the string allocation

  • 0 – Returned if:

    • the window is empty

    • the window contains only delimiter bytes

Returns:

Number of tokens found in the specified window.

token_count_lit

size_t token_count_lit(const string_t *s, const char *delim, const uint8_t *begin, const uint8_t *end)

Count tokens in a string using a C-string delimiter set.

Counts the number of non-empty tokens within the specified byte range of s, where tokens are sequences of bytes not contained in the delimiter set delim.

A token start is defined as a transition from:

delimiter → non-delimiter

The beginning of the search window is treated as if it were preceded by a delimiter, ensuring that a leading non-delimiter byte forms a token.

The delimiter set is interpreted as the first strlen(delim) bytes of the NUL-terminated C string delim.
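The transition rule above can be illustrated with a scalar sketch. The helper name count_tokens_sketch and the lookup-table approach are assumptions made for illustration; the library's implementation (which may use SIMD) is not shown in this document.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of c_string.h: counts tokens in the
 * byte range [buf, buf + len) using the bytes of delim as delimiters. */
static size_t count_tokens_sketch(const uint8_t *buf, size_t len,
                                  const char *delim)
{
    bool is_delim[256] = { false };
    size_t n = strlen(delim);

    /* Build a 256-entry membership table for the delimiter set. */
    for (size_t i = 0; i < n; ++i) {
        is_delim[(uint8_t)delim[i]] = true;
    }

    size_t count = 0;
    bool prev_was_delim = true;  /* window start acts like a delimiter */

    for (size_t i = 0; i < len; ++i) {
        bool cur = is_delim[buf[i]];
        if (prev_was_delim && !cur) {
            ++count;             /* delimiter -> non-delimiter transition */
        }
        prev_was_delim = cur;
    }
    return count;
}
```

With an empty delimiter set no byte is a delimiter, so any non-empty range yields exactly one token, which matches the note on empty delimiters.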

See also

token_count

Example
allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("  alpha, beta;gamma  ", 0u, a);
if (!r.has_value) {
    // handle error
}

string_t* text = r.u.value;

// Delimiters: space, comma, semicolon
size_t count = token_count_lit(text, " ,;", NULL, NULL);

// Tokens: "alpha", "beta", "gamma"
// count == 3

return_string(text);

Note

  • Matching is byte-wise and case-sensitive.

  • The window [begin,end) is validated against the allocation and clamped to the used string length.

  • If delim is an empty string (""), the entire non-empty window is treated as a single token.

  • SIMD acceleration may be used internally depending on the build configuration and target architecture.

Parameters:
  • s – Pointer to the source string_t to analyze.

  • delim – Pointer to a NUL-terminated C string containing delimiter bytes. Each byte in this string is treated as an independent delimiter.

  • begin – Optional pointer to the first byte of the search window within s->str. If NULL, the search begins at the start of the used string.

  • end – Optional pointer to one-past-the-last byte of the search window. If NULL, the search ends at the used length of the string.

Return values:
  • SIZE_MAX – Returned if:

    • s == NULL

    • s->str == NULL

    • delim == NULL

    • [begin,end) lies outside the string allocation

  • 0 – Returned if:

    • the window is empty

    • the window contains only delimiter bytes

Returns:

Number of tokens found in the specified window.

to_uppercase

void to_uppercase(string_t *s, uint8_t *start, uint8_t *end)

Convert ASCII lowercase characters to uppercase in-place.

Converts all bytes in the specified window of s from 'a'..'z' to 'A'..'Z' using ASCII-only rules. Bytes outside this range are left unchanged.

The conversion is performed in-place and may be internally accelerated using SIMD instructions depending on the build configuration and target architecture.
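As a rough illustration of the ASCII-only rule, the following scalar loop uppercases a byte range in place. The name ascii_upper_range is hypothetical, and the sketch assumes the caller has already validated and clamped the window.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar fallback: uppercases ASCII letters in [start, end).
 * Bytes outside 'a'..'z', including UTF-8 continuation bytes, are kept. */
static void ascii_upper_range(uint8_t *start, uint8_t *end)
{
    for (uint8_t *p = start; p < end; ++p) {
        if (*p >= 'a' && *p <= 'z') {
            *p = (uint8_t)(*p - ('a' - 'A'));  /* 'a' and 'A' differ by 32 */
        }
    }
}
```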

See also

to_lowercase

Example
allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("Hello world", 0u, a);
if (!r.has_value) {
    // handle allocation failure
}

string_t* s = r.u.value;

to_uppercase(s, NULL, NULL);
// s->str == "HELLO WORLD"

return_string(s);

Note

  • The window [start, end) must lie within the string allocation.

  • If end extends beyond the used length, it is clamped to s->len.

  • If arguments are invalid or the window is empty, the function performs a silent no-op.

  • Only ASCII case conversion is performed. UTF-8 multibyte sequences and locale-dependent characters are not modified.

Parameters:
  • s – Pointer to the string_t to modify.

  • start – Optional pointer to the first byte of the conversion window within s->str. If NULL, conversion begins at the start of the used string.

  • end – Optional pointer to one-past-the-last byte of the conversion window. If NULL, conversion continues to the end of the used string.

to_lowercase

void to_lowercase(string_t *s, uint8_t *start, uint8_t *end)

Convert ASCII uppercase characters to lowercase in-place.

Converts all bytes in the specified window of s from 'A'..'Z' to 'a'..'z' using ASCII-only rules. Bytes outside this range are left unchanged.

The conversion is performed in-place and may be internally accelerated using SIMD instructions depending on the build configuration and target architecture.
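One reason ASCII case conversion vectorizes well is that it can be written without branches. The sketch below shows such a formulation in scalar form; it is an illustrative assumption about the general technique, not the library's SIMD code, and ascii_lower_branchless is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch-free variant: lowercases ASCII letters in buf[0..n). */
static void ascii_lower_branchless(uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint8_t c = buf[i];
        /* 1 when c is in 'A'..'Z', else 0 (unsigned wraparound trick). */
        uint8_t is_upper = (uint8_t)((uint8_t)(c - 'A') < 26u);
        /* Setting bit 0x20 maps 'A'..'Z' onto 'a'..'z'. */
        buf[i] = (uint8_t)(c | (uint8_t)(is_upper << 5));
    }
}
```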

See also

to_uppercase

Example
allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("HELLO WORLD", 0u, a);
if (!r.has_value) {
    // handle allocation failure
}

string_t* s = r.u.value;

to_lowercase(s, NULL, NULL);
// s->str == "hello world"

return_string(s);

Note

  • The window [start, end) must lie within the string allocation.

  • If end extends beyond the used length, it is clamped to s->len.

  • If arguments are invalid or the window is empty, the function performs a silent no-op.

  • Only ASCII case conversion is performed. UTF-8 multibyte sequences and locale-dependent characters are not modified.

Parameters:
  • s – Pointer to the string_t to modify.

  • start – Optional pointer to the first byte of the conversion window within s->str. If NULL, conversion begins at the start of the used string.

  • end – Optional pointer to one-past-the-last byte of the conversion window. If NULL, conversion continues to the end of the used string.

drop_substr

void drop_substr(string_t *s, const string_t *substring, uint8_t *min_ptr, uint8_t *max_ptr)

Remove all non-overlapping occurrences of a substring within a window.

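This function removes every non-overlapping occurrence of substring from the string s that lies within the byte range [begin, end). The names begin and end used in this entry refer to the min_ptr and max_ptr arguments in the signature above.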

Removal is performed in-place by shifting the remaining suffix of the string left using memmove, preserving the terminating NUL byte and maintaining valid C-string semantics.

To minimize data movement, matches are located using a reverse search strategy so that shrinking operations occur from right-to-left.

After each removal, if a single ASCII space (' ') immediately follows the removed substring, that space is also removed.

(This helps avoid leaving double-spaces when removing words.)
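The removal strategy described above can be sketched on a plain NUL-terminated buffer. The function drop_all_sketch is a hypothetical stand-in that ignores windows and the string_t wrapper; it only illustrates the right-to-left scan, the memmove shift, and the single trailing space rule.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: removes every non-overlapping occurrence of needle
 * from buf[0..len), scanning right to left, and also drops one ASCII space
 * that directly follows a removed match. Returns the new length. */
static size_t drop_all_sketch(char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || n > len) return len;

    size_t pos = len - n + 1;          /* one past the last candidate start */
    while (pos > 0) {
        --pos;
        if (memcmp(buf + pos, needle, n) != 0) continue;

        size_t cut = n;
        if (pos + n < len && buf[pos + n] == ' ') ++cut;

        /* Shift the suffix (including the NUL) left over the match. */
        memmove(buf + pos, buf + pos + cut, len - (pos + cut) + 1);
        len -= cut;

        /* Step past this match so the next candidate cannot overlap it. */
        pos = (pos >= n - 1) ? pos - (n - 1) : 0;
    }
    return len;
}
```

Because the scan runs from the right, each memmove only shifts a suffix that earlier removals have already shortened.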

See also

drop_substr_lit

See also

find_substr

Example
allocator_vtable_t a = heap_allocator();

string_expect_t r1 = init_string("alpha beta beta gamma", 0u, a);
string_expect_t r2 = init_string("beta", 0u, a);

if (r1.has_value && r2.has_value) {
    string_t* text = r1.u.value;
    string_t* word = r2.u.value;

    drop_substr(text, word, NULL, NULL);
    // Result: "alpha gamma"

    return_string(word);
    return_string(text);
}

Note

  • The window [begin, end) must lie within the string allocation.

  • If end exceeds the used length, it is clamped to s->len.

  • If arguments are invalid, the function performs a silent no-op.

  • Matches are substring-based, not word-delimited.

  • Only one trailing ASCII space is removed per match.

Parameters:
  • s – Pointer to the destination string_t to modify.

  • substring – Pointer to the substring to remove.

  • begin – Optional pointer to the first byte of the search window within s->str. If NULL, the window begins at the start of the used string.

  • end – Optional pointer to one-past-the-last byte of the search window. If NULL, the window extends to the end of the used string.

drop_substr_lit

void drop_substr_lit(string_t *s, const char *substring, uint8_t *min_ptr, uint8_t *max_ptr)

Remove all non-overlapping occurrences of a C-string literal substring.

This function behaves identically to drop_substr, except the substring is provided as a NUL-terminated C string literal rather than a string_t object.

Each non-overlapping occurrence of substring found within the window [begin, end) of s (the min_ptr and max_ptr arguments in the signature above) is removed in-place by shifting the remaining suffix left, preserving the terminating NUL byte.

Matches are processed using a reverse search strategy to minimize the total amount of memory movement required.

If a single ASCII space (' ') immediately follows a removed occurrence, that space is also removed.

See also

drop_substr

See also

find_substr_lit

Example
allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("one two two three", 0u, a);
if (r.has_value) {
    string_t* text = r.u.value;

    drop_substr_lit(text, "two", NULL, NULL);
    // Result: "one three"

    return_string(text);
}

Note

  • The window [begin, end) must lie within the string allocation.

  • If end exceeds the used length, it is clamped to s->len.

  • If arguments are invalid, the function performs a silent no-op.

  • Matches are case-sensitive and substring-based.

  • Only one trailing ASCII space is removed per match.

Parameters:
  • s – Pointer to the destination string_t to modify.

  • substring – NUL-terminated C string containing the substring to remove.

  • begin – Optional pointer to the first byte of the search window within s->str. If NULL, the window begins at the start of the used string.

  • end – Optional pointer to one-past-the-last byte of the search window. If NULL, the window extends to the end of the used string.

replace_substr

bool replace_substr(string_t *string, const string_t *pattern, const string_t *replace_string, char *min_ptr, char *max_ptr)

Replace all non-overlapping occurrences of a substring in-place.

Replaces every case-sensitive, non-overlapping occurrence of pattern with replace_string inside the byte window [min_ptr, max_ptr) of string.

This function is the string_t-based counterpart to replace_substr_lit and follows the same allocator-aware algorithm:

  1. Match count determined using word_count.

  2. Final length computed before modification.

  3. Buffer resized once via the string’s allocator if required.

  4. Replacement performed using reverse search (find_substr with REVERSE) to minimize memory movement.
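The four steps above can be sketched on a malloc'd buffer. This is an illustration under stated assumptions: realloc stands in for the string's allocator, the whole string is the window, and count_matches and replace_all_sketch are hypothetical names rather than library functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Counts non-overlapping occurrences of pat in the first len bytes of buf. */
static size_t count_matches(const char *buf, size_t len,
                            const char *pat, size_t plen)
{
    size_t count = 0;
    for (size_t i = 0; plen > 0 && i + plen <= len; ) {
        if (memcmp(buf + i, pat, plen) == 0) { ++count; i += plen; }
        else { ++i; }
    }
    return count;
}

/* Hypothetical sketch: *buf is a malloc'd NUL-terminated string of length
 * *len. Returns false only when growing the buffer fails. */
static bool replace_all_sketch(char **buf, size_t *len,
                               const char *pat, const char *rep)
{
    size_t plen = strlen(pat), rlen = strlen(rep);
    size_t hits = count_matches(*buf, *len, pat, plen);      /* step 1 */
    if (hits == 0) return true;

    size_t new_len = *len - hits * plen + hits * rlen;       /* step 2 */
    if (new_len > *len) {                                    /* step 3 */
        char *grown = realloc(*buf, new_len + 1);
        if (grown == NULL) return false;  /* contents left unchanged */
        *buf = grown;
    }

    size_t cur = *len;                                       /* step 4 */
    size_t pos = cur - plen + 1;
    while (pos > 0) {
        --pos;
        if (memcmp(*buf + pos, pat, plen) != 0) continue;
        /* Move the tail (with its NUL) to its place after the replacement. */
        memmove(*buf + pos + rlen, *buf + pos + plen, cur - (pos + plen) + 1);
        memcpy(*buf + pos, rep, rlen);
        cur = cur - plen + rlen;
        pos = (pos >= plen - 1) ? pos - (plen - 1) : 0;
    }
    *len = cur;
    return true;
}
```

Sizing the buffer before touching any byte is what allows a failed reallocation to leave the original contents intact.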

See also

find_substr

See also

word_count

Example
allocator_vtable_t a = heap_allocator();

string_expect_t r1 = init_string("one two two three", 0u, a);
string_expect_t r2 = init_string("two", 0u, a);
string_expect_t r3 = init_string("four", 0u, a);

if (r1.has_value && r2.has_value && r3.has_value) {
    string_t* s   = r1.u.value;
    string_t* pat = r2.u.value;
    string_t* rep = r3.u.value;

    replace_substr(s, pat, rep, NULL, NULL);
    // Result: "one four four three"

    return_string(rep);
    return_string(pat);
    return_string(s);
}

Note

  • Matching is case-sensitive and substring-based.

  • Replacements are non-overlapping.

  • The window is interpreted as [min_ptr, max_ptr) (end exclusive).

  • The terminating NUL byte is preserved.

  • On failure, the original string contents remain unchanged.

Parameters:
  • string – Pointer to the destination string_t to modify.

  • pattern – Substring to search for.

  • replace_string – Replacement substring.

  • min_ptr – Optional pointer to the first byte of the replacement window within string->str. If NULL, the window begins at the start of the used string.

  • max_ptr – Optional pointer to one-past-the-last byte of the replacement window. If NULL, the window extends to the end of the used string.

Returns:

true if the operation completed successfully or no replacements were required. false if:

  • any argument is invalid

  • the window lies outside the string allocation

  • memory reallocation fails

replace_substr_lit

bool replace_substr_lit(string_t *string, const char *pattern, const char *replace_string, uint8_t *min_ptr, uint8_t *max_ptr)

Replace all non-overlapping occurrences of a literal substring in-place.

Replaces every case-sensitive, non-overlapping occurrence of the NUL-terminated C string pattern with replace_string inside the byte window [min_ptr, max_ptr) of string.

The operation is performed in-place using allocator-aware resizing:

  1. The number of matches is determined using word_count_lit.

  2. The final required string length is computed before modification.

  3. If necessary, the buffer is reallocated once via the string’s associated allocator.

  4. Matches are processed using reverse search (find_substr_lit with REVERSE) to minimize the total memmove cost.

See also

replace_substr

See also

find_substr_lit

See also

word_count_lit

Example
allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("red green red blue", 0u, a);
if (!r.has_value) {
    // handle allocation failure
}

string_t* s = r.u.value;

replace_substr_lit(s, "red", "yellow", NULL, NULL);
// Result: "yellow green yellow blue"

return_string(s);

Note

  • Matching is case-sensitive and substring-based.

  • Replacements are non-overlapping.

  • The window is interpreted as [min_ptr, max_ptr) (end exclusive).

  • The terminating NUL byte is always preserved.

  • On failure, the original string contents remain unchanged.

Parameters:
  • string – Pointer to the destination string_t to modify.

  • pattern – NUL-terminated substring to search for.

  • replace_string – NUL-terminated replacement substring.

  • min_ptr – Optional pointer to the first byte of the replacement window within string->str. If NULL, the window begins at the start of the used string.

  • max_ptr – Optional pointer to one-past-the-last byte of the replacement window. If NULL, the window extends to the end of the used string.

Returns:

true if the operation completed successfully or no replacements were required. false if:

  • any argument is invalid

  • the window lies outside the string allocation

  • memory reallocation fails

pop_str_token_lit

string_expect_t pop_str_token_lit(string_t *s, const char *token, allocator_vtable_t allocator)

Pop the substring to the right of the last literal token occurrence.

Searches for the last (reverse) occurrence of the C string literal token within the used portion of s. If found, all characters strictly to the right of the token are:

  1. Copied into a newly allocated string_t (using the supplied allocator), and

  2. Removed from s by shrinking its logical length and resetting the null terminator.

The token itself is also removed from s.
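The pop operation described above can be sketched on a plain buffer. In this illustration malloc stands in for the supplied allocator, a NULL return stands in for has_value == false, and pop_right_of_last is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: finds the last occurrence of token in buf[0..*len),
 * returns a malloc'd copy of the bytes to its right, and truncates buf at
 * the start of the token. Returns NULL when the token is empty, not found,
 * or allocation fails; buf is then left unchanged. */
static char *pop_right_of_last(char *buf, size_t *len, const char *token)
{
    size_t tlen = strlen(token);
    if (tlen == 0 || tlen > *len) return NULL;

    /* Visit candidate start offsets from right to left. */
    for (size_t pos = *len - tlen + 1; pos-- > 0; ) {
        if (memcmp(buf + pos, token, tlen) != 0) continue;

        size_t tail = *len - (pos + tlen);
        char *out = malloc(tail + 1);
        if (out == NULL) return NULL;

        memcpy(out, buf + pos + tlen, tail);
        out[tail] = '\0';

        buf[pos] = '\0';    /* drop the token and everything after it */
        *len = pos;
        return out;
    }
    return NULL;
}
```

The copy is made before the source is truncated, so a failed allocation leaves the input untouched.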

Example:

Input string: "alpha/beta/gamma"

Token: "/"

Result:

  • Returned string -> "gamma"

  • Modified input -> "alpha/beta"

Matching is case-sensitive.

allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("red/green/blue", 0u, a);
assert_true(r.has_value);

string_t* s = r.u.value;

string_expect_t popped = pop_str_token_lit(s, "/", a);
assert_true(popped.has_value);

// popped.u.value->str == "blue"
// s->str == "red/green"

return_string(popped.u.value);
return_string(s);

See also

pop_str_token

See also

find_substr_lit

Note

  • The original string is modified only if the token is found.

  • The returned string is independent and must be released by the caller.

  • The search is performed using find_substr_lit in REVERSE mode.

Parameters:
  • s – Pointer to the source string_t to modify.

  • token – Null-terminated C string literal representing the token. Must not be NULL or empty.

  • allocator – Allocator used to construct the returned string.

Returns:

A string_expect_t containing:

  • has_value == true – A newly allocated string containing the substring to the right of the last token occurrence.

  • has_value == false – If:

    • s == NULL

    • s->str == NULL

    • token == NULL

    • token is empty

    • token is not found in s

    • allocation fails

pop_str_token

string_expect_t pop_str_token(string_t *s, const string_t *token, allocator_vtable_t allocator)

Pop the substring to the right of the last string_t token occurrence.

Searches for the last (reverse) occurrence of the substring specified by token within the used portion of s.

If found:

  1. The substring strictly to the right of the token is copied into a new string_t using the supplied allocator.

  2. The original string s is truncated at the beginning of the token.

The token itself is removed from s.

Example:

Input string: "one::two::three"

Token: "::"

Result:

  • Returned string -> "three"

  • Modified input -> "one::two"

Matching is case-sensitive.

allocator_vtable_t a = heap_allocator();

string_expect_t rs = init_string("path/to/file.txt", 0u, a);
string_expect_t rt = init_string("/", 0u, a);

if (rs.has_value && rt.has_value) {
    string_t* s = rs.u.value;
    string_t* t = rt.u.value;

    string_expect_t out = pop_str_token(s, t, a);
    assert_true(out.has_value);

    // out.u.value->str == "file.txt"
    // s->str == "path/to"

    return_string(out.u.value);
    return_string(t);
    return_string(s);
}

See also

find_substr

Note

  • The original string is modified only if the token is found.

  • The returned string must be released by the caller.

  • The search is performed using find_substr in REVERSE mode.

Parameters:
  • s – Pointer to the source string_t to modify.

  • token – Pointer to a string_t representing the token substring. Must not be NULL, and must have non-zero length.

  • allocator – Allocator used to construct the returned string.

Returns:

A string_expect_t containing:

  • has_value == true – A newly allocated string containing the substring to the right of the last token occurrence.

  • has_value == false – If:

    • s == NULL

    • s->str == NULL

    • token == NULL

    • token->str == NULL

    • token->len == 0

    • token is not found in s

    • allocation fails

Generic Macros

The generic macros described in this section are only available when ARENA_USE_CONVENIENCE_MACROS is enabled and the code is not compiled with NO_FUNCTION_MACROS.

concat_string

concat_string(dst, src)

Type-safe generic string concatenation convenience macro.

Dispatches to the correct concatenation routine based on the type of the source argument src using the C11 _Generic selection mechanism.

Supported source types:

  • const char* → str_concat

  • char* → str_concat

  • const string_t* → string_concat

  • string_t* → string_concat

Any unsupported source type triggers a compile-time error in C11 builds via _concat_string_type_error, ensuring strong type safety without runtime overhead.
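The dispatch pattern can be illustrated with a small self-contained analogue. Everything here is a hypothetical stand-in (mini_string_t, mini_append_cstr, mini_append_string, mini_concat); the real macro selects str_concat and string_concat, whose definitions are not shown in this section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for string_t: fixed buffer, no allocator. */
typedef struct { char str[64]; size_t len; } mini_string_t;

static bool mini_append_cstr(mini_string_t *dst, const char *src)
{
    size_t n = strlen(src);
    if (dst->len + n >= sizeof dst->str) return false;  /* would overflow */
    memmove(dst->str + dst->len, src, n + 1);           /* copies the NUL too */
    dst->len += n;
    return true;
}

static bool mini_append_string(mini_string_t *dst, const mini_string_t *src)
{
    return mini_append_cstr(dst, src->str);
}

/* The controlling expression is never evaluated; only its type matters.
 * With no default association, any other source type fails to compile. */
#define mini_concat(dst, src)                      \
    _Generic((src),                                \
        const char *:          mini_append_cstr,   \
        char *:                mini_append_cstr,   \
        const mini_string_t *: mini_append_string, \
        mini_string_t *:       mini_append_string)((dst), (src))
```

A string literal such as "42" decays to char* in the controlling expression, which is why both char* and const char* associations are listed.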

This macro is available only when:

  • ARENA_USE_CONVENIENCE_MACROS is defined, and

  • NO_FUNCTION_MACROS is not defined (to preserve MISRA-style builds).

string_expect_t r = init_string("Answer: ", 0, heap_allocator());
if (r.has_value) {
    string_t* s = r.u.value;

    concat_string(s, "42");
    printf("%s\n", const_string(s));  // "Answer: 42"

    return_string(s);
}

Note

  • The destination string’s allocator controls any required buffer growth.

  • For arena allocators, intermediate buffers may persist until the arena is reset or destroyed.

  • No runtime type checks are performed; dispatch occurs entirely at compile time.

Parameters:
  • dst – Destination string_t instance to be extended.

  • src – Source data to append (const char* or string_t*).

Return values:
  • true – Concatenation succeeded.

  • false – Concatenation failed (allocation error, overflow, or invalid arguments). This return value originates from the selected function.

compare_string

compare_string(lhs, rhs)

Type-safe generic string comparison convenience macro.

compare_string(lhs, rhs) provides a single comparison interface that selects the correct implementation at compile time using the C11 _Generic operator.

Compile-time dispatch rules:

  • If rhs is a C string (const char* or char*), this macro expands to: str_compare((const string_t*)lhs, (const char*)rhs)

  • If rhs is a string object (const string_t* or string_t*), this macro expands to: string_compare((const string_t*)lhs, (const string_t*)rhs)

In other words, the macro performs zero runtime type checks and adds no dispatch overhead—selection happens entirely at compile time.

Availability:

  • Enabled only when ARENA_USE_CONVENIENCE_MACROS is defined, and

  • Disabled when NO_FUNCTION_MACROS is defined (to support MISRA-style builds).

allocator_vtable_t a = heap_allocator();

string_expect_t r1 = init_string("alpha", 0u, a);
string_expect_t r2 = init_string("alphabet", 0u, a);

if (r1.has_value && r2.has_value) {
    string_t* s1 = r1.u.value;
    string_t* s2 = r2.u.value;

    // Dispatches to str_compare(s1, "alphabet")
    int8_t c1 = compare_string(s1, "alphabet");  // -> -1 

    // Dispatches to string_compare(s1, s2) 
    int8_t c2 = compare_string(s1, s2);          // -> -1 

    (void)c1;
    (void)c2;

    return_string(s1);
    return_string(s2);
}

Note

If rhs is not one of the supported types, this macro triggers a compile-time error in C11 builds via COMPARE_STRING_TYPECHECK_.

Note

  • When dispatching to str_compare, comparison is bounded by lhs->len.

  • When dispatching to string_compare, the implementation may use SIMD acceleration internally (depending on build/architecture), but the return semantics remain identical across platforms.
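To make the bounded comparison and the return convention concrete, here is a scalar sketch. The name compare_sketch and its flat (pointer, length) parameters are assumptions for illustration; they are not the signatures of str_compare or string_compare.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: compares the first lhs_len bytes of lhs against the
 * NUL-terminated rhs and folds the result to -1, 0, or 1. */
static int8_t compare_sketch(const char *lhs, size_t lhs_len, const char *rhs)
{
    if (lhs == NULL || rhs == NULL) return INT8_MIN;   /* error sentinel */

    size_t i = 0;
    for (; i < lhs_len && rhs[i] != '\0'; ++i) {
        unsigned char a = (unsigned char)lhs[i];
        unsigned char b = (unsigned char)rhs[i];
        if (a != b) return (a < b) ? -1 : 1;
    }
    if (i < lhs_len) return 1;          /* rhs ended first: lhs is longer */
    return (rhs[i] != '\0') ? -1 : 0;   /* lhs ended first, or both ended */
}
```

Reading lhs only up to lhs_len means the left operand never depends on finding a terminator, while the sentinel INT8_MIN stays distinct from the three ordering results.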

Parameters:
  • lhs – Pointer to the left-hand string_t (treated as const string_t*).

  • rhs – Right-hand operand. Must be one of: const char*, char*, const string_t*, string_t*.

Returns:

int8_t using the semantics of the selected function:

  • INT8_MIN – Invalid argument / error sentinel (e.g., NULL input).

  • -1 – lhs is lexicographically less than rhs.

  • 0 – lhs is equal to rhs.

  • 1 – lhs is lexicographically greater than rhs.

count_words

count_words(s, word, start, end)

Type-safe generic substring occurrence counting convenience macro.

count_words(s, word, start, end) provides a single counting interface that selects the correct implementation at compile time using the C11 _Generic operator.

Compile-time dispatch rules:

  • If word is a C string (const char* or char*), this macro expands to: word_count_lit((const string_t*)s, (const char*)word, start, end)

  • If word is a string object (const string_t* or string_t*), this macro expands to: word_count((const string_t*)s, (const string_t*)word, start, end)

In other words, the macro performs zero runtime type checks and adds no dispatch overhead—selection happens entirely at compile time.

Availability:

  • Enabled only when ARENA_USE_CONVENIENCE_MACROS is defined, and

  • Disabled when NO_FUNCTION_MACROS is defined (to support MISRA-style builds).

allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("Hello world thisHello is hello again Hello", 45u, a);
if (!r.has_value) {
    // handle error
}
string_t* text = r.u.value;

// Dispatches to word_count_lit(text, "Hello", NULL, NULL)
size_t c1 = count_words(text, "Hello", NULL, NULL);   // -> 3

// Dispatches to word_count(text, word_obj, NULL, NULL)
string_expect_t r2 = init_string("hello", 0u, a);
if (r2.has_value) {
    string_t* w = r2.u.value;
    size_t c2 = count_words(text, w, NULL, NULL);     // -> 1
    return_string(w);
    (void)c2;
}

(void)c1;
return_string(text);

See also

word_count

See also

word_count_lit

See also

find_substr

Note

If word is not one of the supported types, this macro triggers a compile-time error in C11 builds via COUNT_WORDS_TYPECHECK_.

Note

Matching is case-sensitive. Occurrences are counted without overlap; the exact counting behavior is defined by the selected function.

Parameters:
  • s – Pointer to the source string_t (treated as const string_t*).

  • word – Substring to search for. Must be one of: const char*, char*, const string_t*, string_t*.

  • start – Optional pointer to the beginning of the search region within s->str. If NULL, the search begins at the start of the string.

  • end – Optional pointer to one-past-the-last byte of the search region. If NULL, the search continues to the end of the used string length.

Return values:

0 – Returned if the selected implementation considers the arguments invalid (e.g., s == NULL, s->str == NULL, word == NULL, empty word, etc.) or if no matches are found.

Returns:

size_t count using the semantics of the selected function.

find_substring

find_substring(haystack, needle, begin, end, dir)

Type-safe generic substring search convenience macro.

find_substring(haystack, needle, begin, end, dir) selects the correct substring search implementation at compile time using the C11 _Generic operator.

Compile-time dispatch rules:

  • If needle is a C string (const char* or char*), this macro expands to: find_substr_lit((const string_t*)haystack, (const char*)needle, begin, end, dir)

  • If needle is a string object (const string_t* or string_t*), this macro expands to: find_substr((const string_t*)haystack, (const string_t*)needle, begin, end, dir)

This macro performs zero runtime type checks and adds no dispatch overhead—selection happens entirely at compile time.

Availability:

  • Enabled only when ARENA_USE_CONVENIENCE_MACROS is defined, and

  • Disabled when NO_FUNCTION_MACROS is defined (to support MISRA-style builds).
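The effect of the dir argument can be illustrated with a plain-C search over a byte range. The enum names SKETCH_FORWARD and SKETCH_REVERSE and the function find_sketch are hypothetical; this section only confirms that a REVERSE mode exists, so the forward name is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed direction names, for illustration only. */
typedef enum { SKETCH_FORWARD, SKETCH_REVERSE } sketch_dir_t;

/* Hypothetical sketch: returns the offset of the first (forward) or last
 * (reverse) occurrence of needle in hay[0..hay_len), or SIZE_MAX. */
static size_t find_sketch(const char *hay, size_t hay_len,
                          const char *needle, sketch_dir_t dir)
{
    if (hay == NULL || needle == NULL) return SIZE_MAX;
    size_t n = strlen(needle);
    if (n == 0 || n > hay_len) return SIZE_MAX;

    size_t last = hay_len - n;   /* highest valid start offset */
    for (size_t k = 0; k <= last; ++k) {
        size_t pos = (dir == SKETCH_FORWARD) ? k : last - k;
        if (memcmp(hay + pos, needle, n) == 0) return pos;
    }
    return SIZE_MAX;
}
```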

See also

find_substr

See also

find_substr_lit

Note

If needle is not one of the supported types, this macro triggers a compile-time error in C11 builds via FIND_SUBSTR_TYPECHECK_.

Parameters:
  • haystack – Pointer to the source string_t (treated as const string_t*).

  • needle – Substring to search for. Supported types: const char*, char*, const string_t*, string_t*.

  • begin – Optional start pointer within haystack->str (or NULL).

  • end – Optional end pointer within haystack->str (or NULL).

  • dir – Search direction (implementation-defined by underlying functions).

Returns:

size_t offset from the beginning of haystack, or SIZE_MAX if not found or if arguments are invalid (per the selected implementation).

count_tokens

count_tokens(s, delim, begin, end)

Type-safe generic token counting convenience macro.

count_tokens(s, delim, begin, end) selects the appropriate token counting implementation at compile time using the C11 _Generic operator.

Dispatch rules:

  • If delim is a C string (const char* or char*), expands to: token_count_lit((const string_t*)s, (const char*)delim, begin, end)

  • If delim is a string object (const string_t* or string_t*), expands to: token_count((const string_t*)s, (const string_t*)delim, begin, end)

No runtime type checks are performed; selection occurs entirely at compile time with zero dispatch overhead.

Availability:

  • Enabled only when ARENA_USE_CONVENIENCE_MACROS is defined

  • Disabled when NO_FUNCTION_MACROS is defined

allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("alpha,beta gamma", 0u, a);
assert_true(r.has_value);

string_t* text = r.u.value;

// Dispatch → token_count_lit
// Delimiter set: ',' and ' ' (each byte is an independent delimiter)
size_t c1 = count_tokens(text, ", ", NULL, NULL);   // == 3

string_expect_t d = init_string(", ", 0u, a);
assert_true(d.has_value);

// Dispatch → token_count
size_t c2 = count_tokens(text, d.u.value, NULL, NULL); // == 3

return_string(d.u.value);
return_string(text);

Note

Passing an unsupported delimiter type triggers a compile-time error via TOKEN_COUNT_TYPECHECK_.

Parameters:
  • s – Pointer to the source string_t.

  • delim – Delimiter specification. Must be one of:

    • const char*

    • char*

    • const string_t*

    • string_t*

  • begin – Optional pointer to the beginning of the search window.

  • end – Optional pointer to one-past-the-last byte of the search window.

Return values:

SIZE_MAX – Invalid argument or out-of-range window.

Returns:

Number of tokens detected in the specified range.

drop_substring

drop_substring(s, needle, begin, end)

Type-safe generic substring removal convenience macro.

drop_substring(s, needle, begin, end) provides a single interface for removing all occurrences of a substring from a string_t within the byte window [begin, end). The correct implementation is selected at compile time using the C11 _Generic operator.

Dispatch rules:

  • If needle is a C string (const char* or char*), this macro expands to: drop_substr_lit((string_t*)s, (const char*)needle, begin, end)

  • If needle is a string object (const string_t* or string_t*), this macro expands to: drop_substr((string_t*)s, (const string_t*)needle, begin, end)

In other words, the macro performs zero runtime type checks and adds no dispatch overhead—selection happens entirely at compile time.

Availability:

  • Enabled only when ARENA_USE_CONVENIENCE_MACROS is defined, and

  • Disabled when NO_FUNCTION_MACROS is defined (to support MISRA-style builds).

allocator_vtable_t a = heap_allocator();

string_expect_t r = init_string("alpha beta beta gamma", 0u, a);
assert_true(r.has_value);
string_t* text = r.u.value;

// Dispatches to drop_substr_lit(text, "beta", NULL, NULL)
drop_substring(text, "beta", NULL, NULL);
// text->str == "alpha gamma"

// Rebuild input for the string_t needle example:
return_string(text);
r = init_string("alpha beta beta gamma", 0u, a);
assert_true(r.has_value);
text = r.u.value;

string_expect_t rn = init_string("beta", 0u, a);
assert_true(rn.has_value);
string_t* needle = rn.u.value;

// Dispatches to drop_substr(text, needle, NULL, NULL)
drop_substring(text, needle, NULL, NULL);
// text->str == "alpha gamma"

return_string(needle);
return_string(text);

See also

drop_substr

See also

drop_substr_lit

Note

If needle is not one of the supported types, this macro triggers a compile-time error in C11 builds via DROP_SUBSTRING_TYPECHECK_.

Note

The behavior (non-overlapping removal, reverse search optimization, optional removal of a single trailing ASCII space after each match, window clamping) is defined by the selected underlying function:

  • drop_substr

  • drop_substr_lit

Parameters:
  • s – Pointer to the destination string_t to modify.

  • needle – Substring to remove. Must be one of: const char*, char*, const string_t*, string_t*.

  • begin – Optional pointer to the first byte of the removal window within s->str. Pass NULL to start at the beginning of the used string.

  • end – Optional pointer to one-past-the-last byte of the removal window. Pass NULL to end at the used length of the string.

replace_substring

replace_substring(s, pattern, replacement, min_ptr, max_ptr)

Type-safe generic substring replacement convenience macro.

replace_substring(s, pattern, replacement, min_ptr, max_ptr) provides a single replacement interface that selects the correct implementation at compile time using the C11 _Generic operator.

Dispatch rules:

  • If pattern (and replacement) are C strings (const char* or char*), this macro expands to: replace_substr_lit((string_t*)s, (const char*)pattern, (const char*)replacement, (uint8_t*)min_ptr, (uint8_t*)max_ptr)

  • If pattern (and replacement) are string objects (const string_t* or string_t*), this macro expands to: replace_substr((string_t*)s, (const string_t*)pattern, (const string_t*)replacement, (char*)min_ptr, (char*)max_ptr)

The macro enforces at compile time that:

  • pattern is one of: const char*, char*, const string_t*, string_t*

  • replacement is one of the same supported types

  • pattern and replacement belong to the same category (both literal, or both string objects)
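The dispatch rules above can be illustrated with a small, self-contained _Generic sketch. Every name in it (demo_string_t, demo_replace_lit, demo_replace_obj, DEMO_REPLACE) is a hypothetical stand-in; the library's actual macro and its type-check helpers are not reproduced here. The sketch selects a function from the type of pattern only, so a mismatched replacement is caught when the selected function receives an incompatible pointer argument, which is a simpler mechanism than a dedicated category check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's string_t. */
typedef struct {
    char*  str;
    size_t len;
} demo_string_t;

/* Tags that identify which implementation was selected. */
enum { DEMO_IMPL_LITERAL = 1, DEMO_IMPL_OBJECT = 2 };

/* Stand-in for the literal-based implementation. */
static int demo_replace_lit(const char* pattern, const char* replacement) {
    (void)pattern;
    (void)replacement;
    return DEMO_IMPL_LITERAL;
}

/* Stand-in for the string-object implementation. */
static int demo_replace_obj(const demo_string_t* pattern,
                            const demo_string_t* replacement) {
    (void)pattern;
    (void)replacement;
    return DEMO_IMPL_OBJECT;
}

/* Selection happens at compile time on the type of pattern.
 * A string literal decays to char* in the controlling expression.
 * Any unsupported type has no matching association and no default,
 * so the compiler rejects the call. */
#define DEMO_REPLACE(pattern, replacement)           \
    _Generic((pattern),                              \
        char*:                demo_replace_lit,      \
        const char*:          demo_replace_lit,      \
        demo_string_t*:       demo_replace_obj,      \
        const demo_string_t*: demo_replace_obj       \
    )((pattern), (replacement))
```

With these stand-ins, DEMO_REPLACE("red", "blue") resolves to demo_replace_lit, and DEMO_REPLACE(&p, &q) with two demo_string_t objects resolves to demo_replace_obj. No branch is chosen at run time.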

Availability:

  • Enabled only when ARENA_USE_CONVENIENCE_MACROS is defined, and

  • Disabled when NO_FUNCTION_MACROS is defined (to support MISRA-style builds).

allocator_vtable_t a = heap_allocator();

// Literal version
string_expect_t r = init_string("red green red", 0u, a);
assert_true(r.has_value);
string_t* s = r.u.value;

// Dispatches to replace_substr_lit(...)
(void)replace_substring(s, "red", "blue", NULL, NULL);
// s->str == "blue green blue"

return_string(s);

// string_t version
string_expect_t r1 = init_string("one two two", 0u, a);
string_expect_t r2 = init_string("two", 0u, a);
string_expect_t r3 = init_string("four", 0u, a);

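if (r1.has_value && r2.has_value && r3.has_value) {
    string_t* t   = r1.u.value;
    string_t* pat = r2.u.value;
    string_t* rep = r3.u.value;

    // Dispatches to replace_substr(...)
    (void)replace_substring(t, pat, rep, NULL, NULL);
    // t->str == "one four four"
}

// Release every string that was created, even if another init_string failed
if (r3.has_value) { return_string(r3.u.value); }
if (r2.has_value) { return_string(r2.u.value); }
if (r1.has_value) { return_string(r1.u.value); }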

See also

replace_substr

replace_substr_lit

Note

The window [min_ptr, max_ptr) is interpreted as end-exclusive. Both underlying implementations validate that the window lies within the string allocation and clamp to the used length.

Parameters:
  • s – Pointer to the destination string_t to modify.

  • pattern – Substring pattern to search for (literal or string_t).

  • replacement – Replacement substring (must match the category of pattern).

  • min_ptr – Optional pointer to the first byte of the replacement window within s->str. Pass NULL to start at the beginning of the used string.

  • max_ptr – Optional pointer to one-past-the-last byte of the replacement window. Pass NULL to end at the used length of the string.

Returns:

true/false as returned by the selected underlying function.
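The window rules for min_ptr and max_ptr (NULL defaults, validation against the allocation, clamping to the used length) can be sketched as a standalone helper. This is a minimal illustration, not the library's code: demo_window_t and demo_resolve_window are hypothetical names, and the buffer, used length, and allocation size are passed explicitly because the field names of string_t other than str are not shown in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: an end-exclusive window [begin, end). */
typedef struct {
    char* begin;
    char* end;
} demo_window_t;

/* Hypothetical helper (not part of the library).
 * buf   : start of the string buffer
 * len   : used length, excluding the terminator
 * alloc : total allocated bytes
 * Returns false if the window is reversed or lies outside the
 * allocation; otherwise stores the window clamped to the used length. */
static bool demo_resolve_window(char* buf, size_t len, size_t alloc,
                                char* min_ptr, char* max_ptr,
                                demo_window_t* out) {
    assert(buf != NULL && out != NULL && len <= alloc);

    /* NULL selects the start or end of the used string. */
    char* begin = (min_ptr != NULL) ? min_ptr : buf;
    char* end   = (max_ptr != NULL) ? max_ptr : buf + len;

    /* Compare as integers so out-of-range pointers can be rejected. */
    uintptr_t lo   = (uintptr_t)buf;
    uintptr_t used = lo + len;
    uintptr_t hi   = lo + alloc;
    uintptr_t b    = (uintptr_t)begin;
    uintptr_t e    = (uintptr_t)end;

    if (b < lo || e > hi || b > e) {
        return false;
    }

    /* Clamp to the used portion of the buffer. */
    if (e > used) { end = buf + len; }
    if (b > used) { begin = buf + len; }

    out->begin = begin;
    out->end = end;
    return true;
}
```

For a 16-byte buffer holding "red green red" (used length 13), passing NULL for both pointers yields the whole used string, while a max_ptr one past the end of the allocation is accepted and clamped back to the used length.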

pop_string_token

pop_string_token(s, token, allocator)

Type-safe generic token pop convenience macro.

pop_string_token(s, token, allocator) selects the correct implementation at compile time using the C11 _Generic operator.

Dispatch rules:

  • If token is a C string (const char* or char*), dispatch to pop_str_token_lit.

  • If token is a string object (const string_t* or string_t*), dispatch to pop_str_token.

This macro performs no runtime type checks and adds no dispatch overhead.

Availability:

  • Enabled only when ARENA_USE_CONVENIENCE_MACROS is defined, and

  • Disabled when NO_FUNCTION_MACROS is defined.

allocator_vtable_t a = heap_allocator();
string_expect_t r = init_string("a/b/c", 0u, a);
assert_true(r.has_value);
string_t* s = r.u.value;

// Dispatches to pop_str_token_lit(s, "/", a)
string_expect_t out1 = pop_string_token(s, "/", a);
assert_true(out1.has_value);
// out1.u.value->str == "c"
// s->str == "a/b"
return_string(out1.u.value);

string_expect_t rt = init_string("/", 0u, a);
assert_true(rt.has_value);
string_t* tok = rt.u.value;

// Dispatches to pop_str_token(s, tok, a)
string_expect_t out2 = pop_string_token(s, tok, a);
// Release the popped token if the call succeeded
if (out2.has_value) {
    return_string(out2.u.value);
}

return_string(tok);
return_string(s);

See also

pop_str_token

pop_str_token_lit

Note

If token is not a supported type, this macro triggers a compile-time error in C11 builds.

Parameters:
  • s – Pointer to the source string_t to modify.

  • token – Token to search for (literal or string_t). Must be one of: const char*, char*, const string_t*, string_t*.

  • allocator – Allocator used to construct the returned string.

Returns:

A string_expect_t as returned by the selected underlying function.
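The pop behavior shown in the example above ("a/b/c" becomes "a/b" and the returned string is "c") can be sketched on a plain NUL-terminated buffer. The sketch assumes the token is matched at its last occurrence, which is inferred from that example rather than stated by the documentation. demo_pop_token is a hypothetical helper that uses malloc in place of the library's allocator and returns NULL where the library would report an error through string_expect_t.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the library): finds the last
 * occurrence of token in buf, returns a newly allocated copy of the
 * text after it, and truncates buf at the token.
 * Returns NULL if the token is empty, not found, or allocation fails;
 * buf is left unchanged in those cases. The caller frees the result. */
static char* demo_pop_token(char* buf, const char* token) {
    assert(buf != NULL && token != NULL);

    size_t tlen = strlen(token);
    if (tlen == 0u) {
        return NULL;
    }

    /* Walk forward through every match and remember the last one. */
    char* last = NULL;
    for (char* p = strstr(buf, token); p != NULL; p = strstr(p + 1, token)) {
        last = p;
    }
    if (last == NULL) {
        return NULL;
    }

    const char* tail = last + tlen;
    size_t n = strlen(tail);

    char* out = malloc(n + 1u);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, tail, n + 1u);  /* copies the terminator as well */

    *last = '\0';               /* truncate the source at the token */
    return out;
}
```

Calling this helper repeatedly on "a/b/c" with the token "/" yields "c", then "b", and then NULL once no separator remains, leaving "a" in the buffer.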