finalfusion-python is a Python module for reading, writing, and
using finalfusion embeddings. It also offers methods to read and
use fastText, word2vec, and GloVe embeddings. The module is
implemented in Rust as a wrapper around the finalfusion crate.
The Python module supports the same types of embeddings as the
finalfusion crate (a short subword example follows this list):
- Vocabulary:
  - No subwords
  - Subwords
- Embedding matrix:
  - Array
  - Memory-mapped
  - Quantized
- Format:
  - finalfusion
  - fastText
  - word2vec
  - GloVe
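Subword vocabularies make it possible to compute embeddings for words
that were never seen during training. A minimal sketch, assuming
"myembeddings.fifu" is a placeholder for an embeddings file with a
subword vocabulary:

import finalfusion

# With a subword vocabulary, an embedding for an out-of-vocabulary word
# is assembled from the embeddings of its character n-grams. The path
# and the query word below are placeholders.
embeds = finalfusion.Embeddings("myembeddings.fifu")
vector = embeds.embedding("Tübingenerin")  # need not be in the vocabulary
print(vector)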
The finalfusion module is available on PyPI for some platforms. You can
use pip to install the module:
$ pip install --upgrade finalfusion

finalfusion can also be built from source. This requires a Rust toolchain
that is installed through rustup. First, you need maturin:
$ cargo install maturin

finalfusion currently requires a nightly version of Rust. You can use
rustup to switch to a nightly build:
# Use the nightly toolchain in the current directory.
$ rustup override set nightly

Now you can build finalfusion-python wheels for Python versions that are
detected by maturin:
$ maturin build --release

The wheels are then in the target/wheels directory.
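A built wheel can then be installed with pip. The exact filename depends
on the finalfusion version, Python version, and platform, so the wildcard
below is only illustrative:

$ pip install target/wheels/finalfusion-*.whl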
finalfusion uses its own embedding format, which supports memory mapping,
subword units, and quantized matrices. Moreover, finalfusion can read
fastText, GloVe, and word2vec embeddings, but does not support memory
mapping those formats. Such embeddings can be converted to finalfusion
format using finalfusion-utils' convert.
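Alternatively, such embeddings can be converted from Python by reading
them in their original format and writing them out in finalfusion format.
A small sketch; the name of the write method is an assumption here, so
check the module documentation for the exact API:

import finalfusion

# Read fastText embeddings and write them in finalfusion format.
# Embeddings.write is assumed; both paths are placeholders.
embeds = finalfusion.Embeddings.read_fasttext("myembeddings.bin")
embeds.write("myembeddings.fifu")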
Embeddings trained with finalfrontier version 0.4.0 and later are in finalfusion format and can be used directly with this Python module.
Embeddings can be loaded as follows:
import finalfusion
# Loading embeddings in finalfusion format
embeds = finalfusion.Embeddings("myembeddings.fifu")
# Or if you want to memory-map the embedding matrix:
embeds = finalfusion.Embeddings("myembeddings.fifu", mmap=True)
# fastText format
embeds = finalfusion.Embeddings.read_fasttext("myembeddings.bin")
# word2vec format
embeds = finalfusion.Embeddings.read_word2vec("myembeddings.w2v")

You can then compute an embedding, or perform similarity and analogy
queries:
e = embeds.embedding("Tübingen")
# default similarity query for "Tübingen"
embeds.word_similarity("Tübingen")
# similarity query based on a vector, returning the closest embedding to
# the input vector, skipping "Tübingen"
embeds.embedding_similarity(e, skip={"Tübingen"})
# default analogy query
embeds.analogy("Berlin", "Deutschland", "Amsterdam")
# analogy query allowing "Deutschland" as answer
embeds.analogy("Berlin", "Deutschland", "Amsterdam", mask=(True,False,True))If you want to operate directly on the full embedding matrix, you can get a copy of this matrix through:
# get copy of embedding matrix, changes to this won't touch the original matrix
embeds.matrix_copy()
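The copy behaves like a regular NumPy array (an assumption consistent
with the module's use of NumPy-compatible data), so it can be fed to any
NumPy routine:

import numpy as np

# Mutating the copy does not affect the embeddings it was copied from.
matrix = embeds.matrix_copy()
print(matrix.shape)
norms = np.linalg.norm(matrix, axis=1)

Finally, access to the vocabulary is provided through: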
v = embeds.vocab()
# get a list of indices associated with "Tübingen"
v.item_to_indices("Tübingen")
# get a list of `(ngram, index)` tuples for "Tübingen"
v.ngram_indices("Tübingen")
# get a list of subword indices for "Tübingen"
v.subword_indices("Tübingen")
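The `(ngram, index)` tuples make it easy to inspect how a word is
segmented into n-grams, for example:

for ngram, index in v.ngram_indices("Tübingen"):
    print(ngram, index)

More usage examples can be found in the examples directory.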