- Text analysis with tokenization, filtering, and stemming
- Memory-efficient inverted index
- Multiple search algorithms (Linear, Hits-based, Noop)
- Spanish language support with Snowball stemming
- Repository pattern for data persistence
```bash
go get github.com/sonirico/visigoth
```
```go
package main

import (
	"fmt"
	"log"

	"github.com/sonirico/visigoth"
)

type VisigothSearcher struct {
	repo visigoth.Repo
}

type SearchPayload struct {
	Index string
	Terms string
}

func (v *VisigothSearcher) Search(p SearchPayload) error {
	stream, err := v.repo.Search(p.Index, p.Terms, visigoth.HitsSearch)
	if err != nil {
		return err
	}
	// Process results from the stream
	for result := range stream.Chan() {
		fmt.Printf("Document: %s, Hits: %d\n",
			result.Doc().ID(), result.Hits)
	}
	return nil
}

func main() {
	// Create a tokenization pipeline with Spanish support
	tokenizer := visigoth.NewKeepAlphanumericTokenizer()
	pipeline := visigoth.NewTokenizationPipeline(
		tokenizer,
		visigoth.NewLowerCaseTokenizer(),
		visigoth.NewStopWordsFilter(visigoth.SpanishStopWords),
		visigoth.NewSpanishStemmer(true),
	)

	// Create a repository backed by the in-memory index builder
	repo := visigoth.NewIndexRepo(visigoth.NewMemoryIndexBuilder(pipeline))

	// Create the searcher
	searcher := &VisigothSearcher{repo: repo}

	// Index some documents
	repo.Put("courses", visigoth.NewDocRequest("java-course", "Curso de programación en Java"))
	repo.Put("courses", visigoth.NewDocRequest("go-course", "Curso de programación en Go"))
	repo.Put("courses", visigoth.NewDocRequest("python-course", "Curso de programación en Python"))

	// Search
	if err := searcher.Search(SearchPayload{
		Index: "courses",
		Terms: "programación java",
	}); err != nil {
		log.Fatal(err)
	}
}
```
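Given the AND semantics described below (every query token must appear in a matching document), the query "programación java" above should report a hit only for the java-course document: "programación" occurs in all three course descriptions, but "java" occurs only in the first one.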
The package is organized into focused components:
- analyze - Text processing pipeline
- index - Document indexing and storage
- search - Query processing and ranking
- stemmer - Language-specific text normalization
- repos - Data persistence layer with aliasing
- loaders - Document loading utilities
- entities - Core data structures
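To make the layering concrete, here is a minimal self-contained sketch of how the pieces relate conceptually: text passes through an analysis step into an inverted index, which search then queries. The types below (`Analyzer`, `Index`) are toy stand-ins invented for illustration, not Visigoth's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// Analyzer stands in for the analyze/stemmer layers: raw text in,
// normalized tokens out. Here it is just lowercase + whitespace split.
type Analyzer func(text string) []string

// Index stands in for the index layer: an inverted index mapping a
// token to the IDs of the documents that contain it.
type Index map[string][]string

// Put runs a document through the analyzer and records its tokens.
func (ix Index) Put(analyze Analyzer, docID, text string) {
	for _, tok := range analyze(text) {
		ix[tok] = append(ix[tok], docID)
	}
}

// Search looks up a single analyzed term (the real search layer is
// where multi-token AND logic lives; see the algorithms below).
func (ix Index) Search(analyze Analyzer, term string) []string {
	toks := analyze(term)
	if len(toks) == 0 {
		return nil
	}
	return ix[toks[0]]
}

func main() {
	analyze := Analyzer(func(text string) []string {
		return strings.Fields(strings.ToLower(text))
	})
	ix := Index{}
	ix.Put(analyze, "doc1", "Curso de Go")
	ix.Put(analyze, "doc2", "Curso de Java")
	fmt.Println(ix.Search(analyze, "go")) // [doc1]
}
```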
Visigoth provides two main search algorithms, both implementing AND logic (all query tokens must be present in matching documents):
| Feature | HitsSearch | LinearSearch |
|---|---|---|
| Algorithm | Hit counting + threshold filtering | Set intersection |
| Time Complexity | O(T × D + R log R) | O(T × D + I) |
| Space Complexity | O(R) | O(I) |
| Result Ordering | Relevance-based (hit count) + document order | Document index order |
| Best For | Relevance ranking, scoring | Boolean matching, performance |
| Multi-token Efficiency | Constant per token | Early termination possible |
Where:
- T = number of search tokens
- D = average documents per token
- R = number of matching documents
- I = size of intersection sets
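As a rough illustration with made-up numbers: for a query of T = 2 tokens whose posting lists each cover about D = 1,000 documents and which R = 50 documents match, HitsSearch visits roughly T × D = 2,000 postings and then sorts 50 results (about 50 log 50 ≈ 282 comparisons), while LinearSearch performs the same scan plus work proportional to the intersection sizes I.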
```go
// Uses hit counting to rank results by relevance
results := HitsSearch([]string{"programming", "tutorial"}, indexer)
// Returns documents sorted by relevance (most matching tokens first)
```
Process:
- Count hits (unique tokens) per document
- Filter documents with hits ≥ threshold (AND logic)
- Sort by hit count (relevance), then by document order (determinism)
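The steps above can be sketched in plain Go. This is an illustrative toy over a map-based inverted index (`hitsSearch` and its signature are invented for the example), not Visigoth's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// hitsSearch counts hits per document, keeps the documents meeting the
// threshold, then orders by hit count and, for ties, by document order.
func hitsSearch(tokens []string, index map[string][]int) []int {
	// hits counts, per document, how many query tokens it contains.
	// Assumes query tokens are distinct and each posting list mentions
	// a document at most once.
	hits := map[int]int{}
	for _, tok := range tokens {
		for _, doc := range index[tok] {
			hits[doc]++
		}
	}

	// AND logic: a document must contain every query token.
	threshold := len(tokens)
	matched := make([]int, 0, len(hits))
	for doc, h := range hits {
		if h >= threshold {
			matched = append(matched, doc)
		}
	}

	// Relevance first (more hits), then document order for determinism.
	sort.Slice(matched, func(i, j int) bool {
		if hits[matched[i]] != hits[matched[j]] {
			return hits[matched[i]] > hits[matched[j]]
		}
		return matched[i] < matched[j]
	})
	return matched
}

func main() {
	index := map[string][]int{
		"programming": {0, 1, 2},
		"tutorial":    {1, 2},
	}
	fmt.Println(hitsSearch([]string{"programming", "tutorial"}, index)) // [1 2]
}
```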
Best for:
- ✅ Relevance ranking needed
- ✅ User expects "best matches first"
- ✅ Flexible scoring systems
- ✅ Apps with ranked suggestions
```go
// Uses set intersection for exact boolean matching
results := LinearSearch([]string{"programming", "tutorial"}, indexer)
// Returns documents in document index order
```
Process:
- Get document sets for each token
- Compute intersection of all sets (AND logic)
- Return matches in document index order
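A comparable toy sketch of the set-intersection approach, again with an invented `linearSearch` helper rather than the library's code:

```go
package main

import (
	"fmt"
	"sort"
)

// linearSearch intersects the document sets of every query token
// (AND logic) and returns the matches in document index order.
func linearSearch(tokens []string, index map[string][]int) []int {
	if len(tokens) == 0 {
		return nil
	}

	// Seed the candidate set with the first token's documents.
	candidates := map[int]bool{}
	for _, doc := range index[tokens[0]] {
		candidates[doc] = true
	}

	// Narrow the set with every remaining token.
	for _, tok := range tokens[1:] {
		next := map[int]bool{}
		for _, doc := range index[tok] {
			if candidates[doc] {
				next[doc] = true
			}
		}
		candidates = next
		if len(candidates) == 0 {
			return nil // early termination: intersection is already empty
		}
	}

	// Emit matches in document index order.
	result := make([]int, 0, len(candidates))
	for doc := range candidates {
		result = append(result, doc)
	}
	sort.Ints(result)
	return result
}

func main() {
	index := map[string][]int{
		"programming": {0, 1, 2},
		"tutorial":    {1, 2},
	}
	fmt.Println(linearSearch([]string{"programming", "tutorial"}, index)) // [1 2]
}
```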
Best for:
- ✅ Pure boolean matching
- ✅ Performance-critical applications
- ✅ Large queries (many tokens)
- ✅ Consistent ordering needed
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.