Primitive (c) TextIndexing library

This library provides a way to index text documents by the words they contain.

Basic usage

First, create an IndexerSet instance:

var indexerSet = IndexerSet.Create();

An IndexerCreationOptions instance can be passed to this method to specify the following options:

the string comparison type used to compare words in the index
an IStreamParser or ILineParser that defines how words are extracted from document content
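To illustrate why the comparison type matters, here is a minimal sketch that uses only standard .NET collections, not the library itself. ComparisonSketch and CountDistinctWords are hypothetical names introduced for this example; how IndexerCreationOptions actually exposes the option is not shown here.

```csharp
using System;
using System.Collections.Generic;

public static class ComparisonSketch
{
    // Counts how many distinct word keys an index would hold under the given comparer.
    public static int CountDistinctWords(IEnumerable<string> words, StringComparer comparer)
    {
        var keys = new HashSet<string>(words, comparer);
        return keys.Count;
    }

    public static void Main()
    {
        var words = new[] { "Apple", "apple", "APPLE", "banana" };

        // Case-sensitive comparison keeps every casing as a separate word.
        Console.WriteLine(CountDistinctWords(words, StringComparer.Ordinal));           // 4

        // Case-insensitive comparison folds the three spellings of "apple" into one key.
        Console.WriteLine(CountDistinctWords(words, StringComparer.OrdinalIgnoreCase)); // 2
    }
}
```

With a case-insensitive comparison, a query for "apple" would be expected to find documents that contain "Apple" as well.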

Then you can add one or more document sources to obtain documents from. Two standard document source implementations are provided: SingleFileDocumentSource and DirectoryDocumentSource.

indexerSet.Add(new DirectoryDocumentSource(baseDirectory, "*.cs"));
indexerSet.Add(new SingleFileDocumentSource(Path.Combine(baseDirectory, "example.txt")));

The IndexerSet provides the Index property, which can then be used to query documents from the index:

// matches only the word "apple" and returns a single WordDocuments collection
var appleDocuments = indexerSet.Index.GetExactWord("apple");

The returned instance of WordDocuments is a collection of DocumentInfos, each pointing to the original document containing the word being searched for.

// matches all words starting with "ban" and returns the list of WordDocuments
// for each word matched
var banWords = indexerSet.Index.GetWordsStartWith("ban");

This query returns a sequence of WordDocuments, one for each matching word. You can then flatten this sequence with the SelectMany operation:

var banDocuments = banWords.SelectMany(wordDocuments => wordDocuments);
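The flattening step can be tried in isolation with plain LINQ. The sketch below does not use the library: a SortedDictionary stands in for the index, lists of file names stand in for WordDocuments, and FlattenSketch with its methods is a hypothetical helper. The Distinct call is an addition for this example, on the assumption that a document matching several words would otherwise appear more than once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenSketch
{
    // Toy stand-in for an index: word -> names of the documents that contain it.
    public static SortedDictionary<string, List<string>> BuildToyIndex()
    {
        return new SortedDictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase)
        {
            { "apple",   new List<string> { "a.txt" } },
            { "banana",  new List<string> { "a.txt", "b.txt" } },
            { "bandage", new List<string> { "b.txt", "c.txt" } },
        };
    }

    // Returns one document list per word that starts with the prefix.
    public static IEnumerable<List<string>> WordsStartingWith(
        SortedDictionary<string, List<string>> index, string prefix)
    {
        return index
            .Where(entry => entry.Key.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .Select(entry => entry.Value);
    }

    // SelectMany concatenates the per-word lists; Distinct drops repeated documents.
    public static List<string> Flatten(IEnumerable<IEnumerable<string>> perWord)
    {
        return perWord.SelectMany(docs => docs).Distinct().ToList();
    }

    public static void Main()
    {
        var banWords = WordsStartingWith(BuildToyIndex(), "ban");
        var banDocuments = Flatten(banWords);
        Console.WriteLine(string.Join(", ", banDocuments)); // a.txt, b.txt, c.txt
    }
}
```

Without Distinct, b.txt would be listed twice here, once for "banana" and once for "bandage".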

Advanced usage

An IIndex instance can be used without creating an IndexerSet. The following example shows how to create an index and attach an Indexer to it:

var options = new IndexerCreationOptions();
var index = options.CreateIndex();
var parser = options.GetDefaultStreamParser();

var indexer = new Indexer(
    index, 
    new DirectoryDocumentSource(AppDomain.CurrentDomain.BaseDirectory, "*.txt"), 
    parser);
indexer.StartIndexing();

Custom document parsers

To create a custom document parser, implement the ILineParser or IStreamParser interface. The former is given the text line by line and should extract all the words it can from each individual line; an ILineParser implementation should not share state between calls. The latter is given a TextReader and can read the entire document with it.
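The difference between the two parser kinds can be sketched as follows. The member signatures of ILineParser and IStreamParser are not shown in this document, so the example declares its own stand-in interfaces (ISketchLineParser and ISketchStreamParser, both hypothetical) rather than guessing at the real ones. The stream-based parser joins words hyphenated across a line break, which a stateless line-by-line parser cannot do.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical stand-ins; the library's real ILineParser and IStreamParser may look different.
public interface ISketchLineParser
{
    IEnumerable<string> ExtractWords(string line);
}

public interface ISketchStreamParser
{
    IEnumerable<string> ExtractWords(TextReader reader);
}

// Stateless: the result of each call depends only on the line passed in.
public sealed class RegexLineParser : ISketchLineParser
{
    private static readonly Regex Word = new Regex(@"\w+", RegexOptions.Compiled);

    public IEnumerable<string> ExtractWords(string line)
    {
        foreach (Match match in Word.Matches(line))
            yield return match.Value;
    }
}

// Reads the whole document, so it can rejoin a word split by a hyphen at a line end.
public sealed class DehyphenatingStreamParser : ISketchStreamParser
{
    private static readonly Regex Word = new Regex(@"\w+", RegexOptions.Compiled);
    private static readonly Regex LineBreakHyphen =
        new Regex(@"(?<=\w)-\r?\n(?=\w)", RegexOptions.Compiled);

    public IEnumerable<string> ExtractWords(TextReader reader)
    {
        var text = LineBreakHyphen.Replace(reader.ReadToEnd(), "");
        foreach (Match match in Word.Matches(text))
            yield return match.Value;
    }
}

public static class ParserSketch
{
    public static void Main()
    {
        var lineWords = new RegexLineParser().ExtractWords("Hello, text index!");
        Console.WriteLine(string.Join("|", lineWords));   // Hello|text|index

        var reader = new StringReader("full-text index-\ning");
        var streamWords = new DehyphenatingStreamParser().ExtractWords(reader);
        Console.WriteLine(string.Join("|", streamWords)); // full|text|indexing
    }
}
```

Keeping the line parser free of instance state means the same instance can be handed lines from different documents in any order.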
