pyparticle is a drop-in replacement for pyspark that makes automated
tests run much faster.
The problem with pyspark is that in test code, creating DataFrames and
running operations on them is slow (often taking many seconds), because
all of that work goes through the JVM. That overhead is worth it for
big data, but test code generally uses small data, which doesn't need
the power of distributed computing. By reimplementing the pyspark API
in native Python, pyparticle sidesteps the JVM completely and saves a
lot of time in tests.
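To make the cost concrete, here is the kind of tiny test that pays the
full JVM price under pyspark. The test itself is a hypothetical
example; the pyspark calls are standard API:

from pyspark.sql import SparkSession

def test_filter_adults():
    # Three rows of toy data, yet pyspark still routes all of this
    # through the JVM.
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("alice", 34), ("bob", 7), ("carol", 19)],
        ["name", "age"],
    )
    assert df.filter(df.age >= 18).count() == 2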
The current way to use pyparticle is to dynamically replace the
pyspark module with it:
import sys
sys.modules["pyspark"] = __import__("pyparticle")
import pyspark  # Automatically imports pyparticle.

The above works because pyparticle implements exactly the same API as
pyspark, including the submodule paths. Note that this replacement
needs to happen before any other code tries to import pyspark.
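If you want a quick sanity check that the replacement took effect, the
aliased module's __name__ gives it away. This is a minimal sketch that
relies only on standard Python import semantics and assumes pyparticle
is installed:

import pyspark

# Because of the sys.modules alias, this name is bound to the
# pyparticle module object, so its __name__ reveals the swap.
assert pyspark.__name__ == "pyparticle"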
If you use pytest, as most projects do, you can simply drop that code
snippet into tests/conftest.py, and all your tests will use pyparticle
by default.
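Concretely, a minimal tests/conftest.py could look like the sketch
below. pytest imports conftest.py while collecting tests, before your
test modules run, so the replacement happens early enough:

# tests/conftest.py
import sys

# Alias pyparticle under the "pyspark" name before any test
# imports pyspark.
sys.modules["pyspark"] = __import__("pyparticle")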