A Crystal library for building Markov Chains and running Markov Processes.
A Markov Chain is essentially a mechanism for guessing probable future events based on a sample of past events. For a great explanation, watch this Khan Academy video.
Visit the API Documentation for a more in-depth look at the library's functionality.
Add this to your application's shard.yml:
dependencies:
markov:
github: mccallofthewild/markovIn your terminal, install Crystal dependencies with:
$ shards installor
$ crystal depsBegin by requiring the Markov module:
require "markov"A classic Markov text generator. This example will work well for small (array-sized) data sets.
NOTE: Markov::Chain is a generic type which contains, receives and generates elements of LinkType.
We'll start with the sample text:
example_string = "how much wood would a woodchuck chuck if a woodchuck could chuck wood"There are several Markov::Chain constructors to choose from. The simplest one takes in a LinkType array of elements as sample and a seed of LinkType. seed is the element in sample you want to start the chain with. If not provided, a random element will be chosen.
example_arr = example_string.split(" ") #=> ["how","much","wood","would","a","woodchuck","chuck","if","a","woodchuck","could","chuck","wood"]
seed = example_arr[0] #=> "how"
example_chain = Markov::Chain(String).new sample: example_arr, seed: seedFinally, we'll generate a probable sequence of elements with the Markov::Chain#generate method:
puts example_chain.generate(10)Output:
["much", "wood", "would", "a", "woodchuck", "could", "chuck", "if", "a", "woodchuck"]That's it!
If we wanted to get the elements one at a time, we could use the Markov::Chain#next method instead:
puts example_chain.next #=> "much"
puts example_chain.next #=> "wood"
puts example_chain.next #=> "would"This implementation was built for larger data sets, with asynchronous input in mind.
In this example, we will create a Markov::Chain which can generate realistic movie titles.
To begin, we instantiate a Markov::TransitionTable. A TransitionTable is a mechanism for training and implementing Markov processes.
example_table = Markov::TransitionTable(String).newNow we'll add a movie title using the Markov::TransitionTable#add method:
movie_one = %w(the great gatsby) # shortcut syntax for ["the","great","gatsby"]
movie_one.each do |word|
example_table.add(word)
endMarkov::TransitionTable#add adds elements one at a time. At a deeper level, it's adding each new word to the previous word's Transition Matrix (Markov::TransitionMatrix).
For syntactic sugar, if we have an array of elements, we can avoid looping through and #add-ing them by using the Markov::TransitionTable#fill method instead:
movie_one = %w(the great gatsby) # shortcut syntax for ["the","great","gatsby"]
example_table.fill table_with: movie_oneA problem arises at this point:
movie_two = %w(great expectations)
example_table.fill table_with: movie_twoThe above code sequentially adds each word to the TransitionTable. But The Great Gatsby and Great Expectations are two separate movie titles; the "Great" at the beginning of Great Expectations is not a probable transition from the "Gatsby" at the end of The Great Gatsby.
To solve this, use Markov::TransitionTable#reset. #reset clears the TransitionTable's last added key, allowing us to separate titles like so:
movie_one = %w(the great gatsby)
example_table.fill table_with: movie_one
example_table.reset
movie_two = %w(great expectations)
example_table.fill table_with: movie_two
example_table.reset
movie_three = %w(the great escape)
example_table.fill table_with: movie_threeFinally, we can put the TransitionTable to use by passing it to a Markov::Chain constructor as transition_table:
example_chain = Markov::Chain(String).new transition_table: example_table, seed: "great"With small and/or unique data sets, Markov chains are fallible to reaching dead ends. That is, they can often reach a point where there is nothing to transition to.
When this happens in the Markov module, Markov::Exceptions::EmptyTransitionMatrixException is raised.
For example:
dead_end_array = %w(some say the world will end in fire)
dead_end_chain = Markov::Chain(String).new sample: dead_end_array, seed: "fire"
# nothing comes after "fire", so the chain is at a dead end.
dead_end_chain.next # raises `EmptyTransitionMatrixException`To prevent this, use the Markov::Chain#on_dead_end exception handler.
This method takes in a callback block with arguments of: the Markov::Chain's @transition_table, the Markov::Chain instance, and the EmptyTransitionMatrixException raised.
The block's return value of LinkType fills in as the next item in the chain.
dead_end_array = %w(some say the world will end in fire)
dead_end_chain = Markov::Chain(String).new sample: dead_end_array, seed: "fire"
dead_end_chain.on_dead_end do |transition_table, chain, exception|
"some"
end
dead_end_chain.next #=> "some"
dead_end_chain.next #=> "say"
dead_end_chain.next #=> "the"- Fork it ( https://github.com/mccallofthewild/markov/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
- McCall Alexander mccallofthewild - creator, maintainer