A modern, performant regular expression library for Zig.

zig-utils/zig-regex

zig-regex

A modern, high-performance regular expression library for Zig


Overview

zig-regex is a regular expression engine for Zig built on Thompson NFA construction, which gives matching time linear in the input length. It offers extensive pattern support and advanced optimization capabilities, has zero external dependencies, and gives full memory control through Zig allocators.

Features

Core Regex Features

Feature                 Syntax                        Description
Literals                abc, 123                      Match exact characters and strings
Quantifiers             *, +, ?, {n}, {m,n}           Greedy repetition
Lazy Quantifiers        *?, +?, ??, {n,m}?            Non-greedy repetition
Possessive Quantifiers  *+, ++, ?+, {n,m}+            Atomic repetition (no backtracking)
Alternation             a|b|c                         Match any alternative
Character Classes       \d, \w, \s, \D, \W, \S        Predefined character sets
Custom Classes          [abc], [a-z], [^0-9]          User-defined character sets
Unicode Classes         \p{Letter}, \p{Number}, \X    Unicode property support
Anchors                 ^, $, \A, \z, \Z, \b, \B      Position matching
Wildcards               .                             Match any character
Groups                  (...)                         Capturing groups
Named Groups            (?P<name>...), (?<name>...)   Named capturing groups
Non-capturing           (?:...)                       Grouping without capture
Atomic Groups           (?>...)                       Possessive grouping
Lookahead               (?=...), (?!...)              Positive/negative lookahead
Lookbehind              (?<=...), (?<!...)            Positive/negative lookbehind
Backreferences          \1, \2, \k<name>              Reference previous captures
Conditionals            (?(condition)yes|no)          Conditional patterns
Escaping                \\, \., \n, \t, etc.          Special character escaping

Advanced Features

  • Hybrid Execution Engine: Automatically selects between Thompson NFA (O(n×m)) and optimized backtracking
  • AST Optimization: Constant folding, dead code elimination, quantifier simplification
  • NFA Optimization: Epsilon transition removal, state merging, transition optimization
  • Pattern Macros: Composable, reusable pattern definitions
  • Type-Safe Builder API: Fluent interface for programmatic pattern construction
  • Thread Safety: Safe concurrent matching with proper synchronization
  • C FFI: Complete C API for interoperability
  • WASM Support: WebAssembly compilation target
  • Profiling & Analysis: Built-in performance profiling and pattern linting
  • Comprehensive API: compile, find, findAll, replace, replaceAll, split, iterator support

Quality & Performance

  • Zero Dependencies: Only Zig standard library
  • Linear Time Matching: O(n×m) worst-case for patterns that run on the Thompson NFA engine
  • Memory Safety: Full control via Zig allocators, no hidden allocations
  • Extensive Tests: Comprehensive test suite with 150+ test cases
  • Battle-Tested: Compliance tests against standard regex behavior
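
Because memory comes from a caller-supplied allocator, the caller can decide where it lives and how much of it there is. The sketch below uses only the standard library, not zig-regex itself: a fixed buffer allocator caps the memory available to any code that takes a `std.mem.Allocator`. The `fits` helper and the 256-byte budget are hypothetical, chosen only for illustration.

```zig
const std = @import("std");

/// Tries to allocate `n` bytes from `allocator` and reports whether it fit.
/// Hypothetical stand-in for any API that takes a std.mem.Allocator.
fn fits(allocator: std.mem.Allocator, n: usize) bool {
    const buf = allocator.alloc(u8, n) catch return false;
    allocator.free(buf);
    return true;
}

pub fn main() void {
    // Every allocation is carved out of this stack buffer.
    var backing: [256]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&backing);
    const allocator = fba.allocator();

    std.debug.print("64 bytes fit: {}\n", .{fits(allocator, 64)});
    std.debug.print("1024 bytes fit: {}\n", .{fits(allocator, 1024)});
}
```

A request that exceeds the buffer fails with `error.OutOfMemory` instead of growing silently, which is the kind of control an allocator-parameterized API makes possible.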

Installation

Using Zig Package Manager (zon)

// build.zig.zon
.{
    .name = .your_project, // Zig 0.14+ requires an enum literal here, not a string
    .version = "0.1.0",
    // .fingerprint and .paths (generated by `zig init`) are omitted here
    .dependencies = .{
        .regex = .{
            .url = "https://github.com/zig-utils/zig-regex/archive/main.tar.gz",
            .hash = "...", // filled in by `zig fetch --save <url>`
        },
    },
}

// build.zig
const regex = b.dependency("regex", .{
    .target = target,
    .optimize = optimize,
});
exe.root_module.addImport("regex", regex.module("regex"));

Manual Installation

git clone https://github.com/zig-utils/zig-regex.git
cd zig-regex
zig build

Quick Start

Basic Pattern Matching

const std = @import("std");
const Regex = @import("regex").Regex;

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Simple matching
    const regex = try Regex.compile(allocator, "\\d{3}-\\d{4}");
    defer regex.deinit();

    if (try regex.find("Call me at 555-1234")) |match| {
        std.debug.print("Found: {s}\n", .{match.slice}); // "555-1234"
    }
}

Named Capture Groups

const regex = try Regex.compile(allocator, "(?P<year>\\d{4})-(?P<month>\\d{2})-(?P<day>\\d{2})");
defer regex.deinit();

if (try regex.find("Date: 2024-03-15")) |match| {
    const year = match.getCapture("year");   // "2024"
    const month = match.getCapture("month"); // "03"
    const day = match.getCapture("day");     // "15"
}
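
Captured text usually needs converting before use. As a minimal sketch, assuming the captures are available as plain byte slices (the string literals below stand in for them), the standard library's `std.fmt.parseInt` turns them into integers. The `Date` struct and `toDate` helper are hypothetical and not part of zig-regex.

```zig
const std = @import("std");

/// Hypothetical container for the parsed date parts.
const Date = struct { year: u16, month: u8, day: u8 };

/// Converts three captured substrings into integers.
/// The parameters stand in for the slices obtained from the named captures.
fn toDate(year: []const u8, month: []const u8, day: []const u8) !Date {
    return Date{
        .year = try std.fmt.parseInt(u16, year, 10),
        .month = try std.fmt.parseInt(u8, month, 10),
        .day = try std.fmt.parseInt(u8, day, 10),
    };
}

pub fn main() !void {
    const date = try toDate("2024", "03", "15");
    std.debug.print("{d}-{d}-{d}\n", .{ date.year, date.month, date.day });
}
```

The regex only guarantees that each part consists of digits; range checks such as a month between 1 and 12 would still be the caller's job.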

Unicode Support

// Match any Unicode letter
const regex = try Regex.compile(allocator, "\\p{Letter}+");
defer regex.deinit();

// Match emoji
const emoji_regex = try Regex.compile(allocator, "\\p{Emoji}");
defer emoji_regex.deinit();

// Match grapheme clusters
const grapheme_regex = try Regex.compile(allocator, "\\X+");
defer grapheme_regex.deinit();

Atomic Groups & Possessive Quantifiers

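// Atomic groups and possessive quantifiers never give back what they matched,
// which prevents catastrophic backtracking
const regex = try Regex.compile(allocator, "(?>a+)ab");
defer regex.deinit();
const poss_regex = try Regex.compile(allocator, "a++ab");
defer poss_regex.deinit();

// Plain a+ab matches "aaab" by giving one 'a' back; these cannot, so they find nothing
try std.testing.expect(try regex.find("aaab") == null);
try std.testing.expect(try poss_regex.find("aaab") == null);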
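
To see what atomicity changes, compare `a+ab` with `(?>a+)ab` on the input "aaab", matched from the start of the string. The hand-written matchers below are a sketch of the two behaviours, not zig-regex's implementation; all function names are hypothetical.

```zig
const std = @import("std");

/// Length of the run of 'a' bytes at the start of `s`.
fn runOfA(s: []const u8) usize {
    var n: usize = 0;
    while (n < s.len and s[n] == 'a') n += 1;
    return n;
}

/// a+ab with backtracking: a+ first takes every 'a', then gives them back
/// one at a time until the rest of the input starts with "ab".
fn matchBacktracking(s: []const u8) bool {
    var k = runOfA(s);
    while (k >= 1) : (k -= 1) {
        if (std.mem.startsWith(u8, s[k..], "ab")) return true;
    }
    return false;
}

/// (?>a+)ab: a+ keeps everything it took, so only one split is tried.
fn matchAtomic(s: []const u8) bool {
    const k = runOfA(s);
    return k >= 1 and std.mem.startsWith(u8, s[k..], "ab");
}

pub fn main() void {
    std.debug.print("backtracking: {}\n", .{matchBacktracking("aaab")});
    std.debug.print("atomic:       {}\n", .{matchAtomic("aaab")});
}
```

The atomic version can never succeed for this pattern, because once `a+` has consumed every 'a' the remaining input cannot start with one. That is the trade-off: fewer paths to explore, at the cost of matches that need the quantifier to yield.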

Conditional Patterns

// If group 1 (the optional "a") matched, require "b"; otherwise require "c"
const regex = try Regex.compile(allocator, "(a)?(?(1)b|c)");
defer regex.deinit();

try std.testing.expectEqualStrings("ab", (try regex.find("ab")).?.slice);
try std.testing.expectEqualStrings("c", (try regex.find("c")).?.slice);

Builder API

const Builder = @import("regex").Builder;

var builder = Builder.init(allocator);
defer builder.deinit();

const pattern = try builder
    .startGroup()
    .literal("https?://")
    .oneOrMore(Builder.Patterns.word())
    .literal(".")
    .oneOrMore(Builder.Patterns.alpha())
    .endGroup()
    .build();

const regex = try Regex.compile(allocator, pattern);
defer regex.deinit();

Pattern Macros

const MacroRegistry = @import("regex").MacroRegistry;
const CommonMacros = @import("regex").CommonMacros;

var macros = MacroRegistry.init(allocator);
defer macros.deinit();

// Load common macros
try CommonMacros.loadInto(&macros);

// Define custom macros
try macros.define("phone", "\\d{3}-\\d{4}");
try macros.define("email", "${email_local}@${email_domain}");

// Expand macros in patterns
const pattern = try macros.expand("Contact: ${email} or ${phone}");
defer allocator.free(pattern);
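
The idea behind `${name}` expansion can be sketched with the standard library alone. This is a simplified, single-pass illustration and not the MacroRegistry implementation: the `Macro` struct, `lookup`, `expand`, and the error names are all hypothetical, macro bodies are not re-expanded, and the output goes into a caller-provided buffer instead of allocated memory.

```zig
const std = @import("std");

/// Hypothetical macro table entry: a name and its replacement text.
const Macro = struct { name: []const u8, body: []const u8 };

const ExpandError = error{ UnknownMacro, UnterminatedMacro, BufferTooSmall };

/// Linear search for a macro by name.
fn lookup(macros: []const Macro, name: []const u8) ?[]const u8 {
    for (macros) |m| {
        if (std.mem.eql(u8, m.name, name)) return m.body;
    }
    return null;
}

/// Copies `pattern` into `out`, replacing each `${name}` with its body.
/// Single pass: bodies are copied verbatim and not expanded again.
fn expand(macros: []const Macro, pattern: []const u8, out: []u8) ExpandError![]const u8 {
    var len: usize = 0;
    var i: usize = 0;
    while (i < pattern.len) {
        // Default: copy one byte unchanged.
        var chunk: []const u8 = pattern[i .. i + 1];
        var next: usize = i + 1;
        if (std.mem.startsWith(u8, pattern[i..], "${")) {
            const close = std.mem.indexOfScalarPos(u8, pattern, i + 2, '}') orelse
                return error.UnterminatedMacro;
            chunk = lookup(macros, pattern[i + 2 .. close]) orelse return error.UnknownMacro;
            next = close + 1;
        }
        if (len + chunk.len > out.len) return error.BufferTooSmall;
        @memcpy(out[len .. len + chunk.len], chunk);
        len += chunk.len;
        i = next;
    }
    return out[0..len];
}

pub fn main() !void {
    const macros = [_]Macro{
        .{ .name = "phone", .body = "\\d{3}-\\d{4}" },
    };
    var buf: [128]u8 = undefined;
    const pattern = try expand(&macros, "Call: ${phone}", &buf);
    std.debug.print("{s}\n", .{pattern});
}
```

A macro such as `email` above, whose body refers to other macros, would need the expansion to be repeated or made recursive, with a guard against cycles.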

Documentation

Performance

zig-regex uses Thompson NFA construction to guarantee O(n×m) worst-case time complexity:

  • n = input string length
  • m = pattern length

This avoids the catastrophic (exponential) backtracking that can affect backtracking-based regex engines.
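
The O(n×m) bound comes from tracking every state the NFA could be in at once, so each input byte is examined a bounded number of times. The sketch below illustrates that technique on a hand-built NFA for `(a|b)*abb`, matched against the whole input. It is not zig-regex's engine: the `Edge` table, the bitmask representation, and the `matches` function are assumptions made for this example, and epsilon transitions are left out.

```zig
const std = @import("std");

/// One NFA transition: from state `from`, consume `byte`, go to state `to`.
const Edge = struct { from: u3, byte: u8, to: u3 };

/// Hand-built NFA for (a|b)*abb; state 0 is the start, state 3 accepts.
/// State 0 has two edges on 'a', which is what makes it nondeterministic.
const edges = [_]Edge{
    .{ .from = 0, .byte = 'a', .to = 0 },
    .{ .from = 0, .byte = 'b', .to = 0 },
    .{ .from = 0, .byte = 'a', .to = 1 },
    .{ .from = 1, .byte = 'b', .to = 2 },
    .{ .from = 2, .byte = 'b', .to = 3 },
};
const accept: u3 = 3;

/// Keeps the set of active states as a bitmask and advances all of them
/// together for each input byte. Work per byte is bounded by edges.len,
/// so total work is proportional to input length times NFA size.
fn matches(input: []const u8) bool {
    var current: u8 = 1; // only state 0 is active
    for (input) |c| {
        var next: u8 = 0;
        for (edges) |e| {
            const active = ((current >> e.from) & 1) == 1;
            if (active and e.byte == c) next |= @as(u8, 1) << e.to;
        }
        current = next;
        if (current == 0) return false; // no state survived
    }
    return ((current >> accept) & 1) == 1;
}

pub fn main() void {
    std.debug.print("abb:  {}\n", .{matches("abb")});
    std.debug.print("abab: {}\n", .{matches("abab")});
}
```

No path is ever retried: a state that is already in the set is not added twice, which is why the running time cannot blow up the way a backtracking search can.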

Benchmarks

Pattern: /\d{3}-\d{4}/
Input: 1000-byte string
Time: ~850ns (M1 MacBook Pro)

Pattern: /(?:a|b)*c/
Input: 10000 'a's + 'c'
Time: Linear growth (no exponential backtracking)

Run benchmarks: zig build bench

Building

# Build library
zig build

# Run tests
zig build test

# Run examples
zig build example

# Run benchmarks
zig build bench

# Generate documentation
zig build docs

Development Roadmap

See TODO.md for the complete development roadmap and planned features.

Requirements

  • Zig 0.15.1 or later
  • No external dependencies

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

Inspired by:

  • Ken Thompson's NFA construction algorithm
  • RE2 (Google's regex engine)
  • Rust's regex crate
  • PCRE (Perl Compatible Regular Expressions)

Support


Made with ❤️ for the Zig community
