Thanks to visit codestin.com
Credit goes to github.com

Skip to content

whitfin/hypex

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hypex

Build Status Coverage Status Hex.pm Version Documentation

Hypex is a fast HyperLogLog implementation in Elixir which provides an easy way to count unique values with a small memory footprint. This library is based on the paper documenting the algorithm written by Philippe Flajolet et al.

Installation

Hypex is available on Hex. You can install the package via:

def deps do
  [{ :hypex, "~> 2.1" }]
end

Usage

Hypex is extremely straightforward to use, you simply create a new Hypex record and start adding values to it:

iex> hypex = Hypex.new(4)
{:hypex, Hypex.Register.Array, 4, {:array, 16, 0, 0, 100}}
iex> hypex = Hypex.update(hypex, "my term")
{:hypex, Hypex.Register.Array, 4,
 {:array, 16, 0, 0,
  {10, {0, 2, 0, 0, 0, 0, 0, 0, 0, 0}, 10, 10, 10, 10, 10, 10, 10, 10, 10}}}
iex> Hypex.cardinality(hypex)
1.0326163382011386

The 4 being passed to Hypex.new/1 is the width which determines the underlying memory structure of a Hypex instance. This value can be within the range 4 <= width <= 16, per the HyperLogLog algorithm. If you don't provide a width, it defaults to 8. Be aware that you should typically scale this number higher based upon the more unique values you expect to see.

You can control the underlying storage register via the second parameter of Hypex.new/2. This defaults to an :array implementation, but Hypex includes a few different implementations based on your use case. Please see the documentation to view the registers available in your current version of Hypex.

For any other examples of how to use Hypex, please read the documentation.

Registers

In most cases it won't matter which register you choose, so it's a good idea to start with the default until you see some reason to change. If you would like some very rough guidance, here are some simple characteristics:

  • Array is the best general purpose, scaling well to all of cardinality, memory and width
  • Bitstring is the most memory efficient, and scales very well for cardinality and width
  • List is good for small widths (i.e. up to 10) and is more inspectable and debuggable
  • Map is sparse with fast writes, can be good for low-to-medium cardinality use cases
  • Tuple is very quick access for read-heavy use cases of small width (i.e. up to 12)

These guidelines are based on Elixir 1.19; it's possible that they display different characteristics on earlier Elixir/OTP versions. Make sure to measure inside your application using your real traffic and data!

Below are some rough benchmarks for the different Hypex registers; these benchmarks can be found in the repository. Any metrics with updates will be inserting a unique value, as a duplicate value is significantly faster due to skipping modification:

## Hypex (Array)

Array Hypex.new/1               0.02 μs/op
Array Hypex.update/2            0.17 μs/op
Array Hypex.cardinality/1       1.89 μs/op
Array Hypex.merge/2             3.91 μs/op

## Hypex (Bitstring)

Bitstring Hypex.new/1           0.07 μs/op
Bitstring Hypex.update/2        0.14 μs/op
Bitstring Hypex.cardinality/1   3.60 μs/op
Bitstring Hypex.merge/2         2.65 μs/op

## Hypex (List)

List Hypex.new/1                0.33 μs/op
List Hypex.update/2             0.53 μs/op
List Hypex.cardinality/1        1.27 μs/op
List Hypex.merge/2              2.40 μs/op

## Hypex (Map)

Map Hypex.new/1                 0.01 μs/op
Map Hypex.update/2              0.10 μs/op
Map Hypex.cardinality/1         2.75 μs/op
Map Hypex.merge/2               0.05 μs/op

## Hypex (Tuple)

Tuple Hypex.new/1                0.68 μs/op
Tuple Hypex.update/2             0.14 μs/op
Tuple Hypex.cardinality/1        1.41 μs/op
Tuple Hypex.merge/2              1.90 μs/op

These tests use a width of 8, and it should be noted that the width heavily impacts these numbers. The smallest widths (4) are measured in ns rather than μs, whereas the largest widths (16) are typically in the millisecond range for cardinality calculations.

Contributions

If you feel something can be improved, or have any questions about certain behaviours or pieces of implementation, please feel free to file an issue. Proposed changes should be taken to issues before any PRs to avoid wasting time on code which might not be merged upstream.

If you do make changes to the codebase, please make sure you test your changes thoroughly, and include any unit tests alongside new or changed behaviours. Hypex currently uses the excellent excoveralls to track code coverage.

$ mix test
$ mix coveralls
$ mix coveralls.html && open cover/excoveralls.html

About

Fast HyperLogLog implementation for Elixir/Erlang

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages