Version: Spark 2.2.1
For a more detailed explanation, see: https://raufer.github.io/2018-02-08-custom-spark-models-with-python-wrappers/
To build a clean assembly JAR of the Scala project:
sbt clean assembly
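The assembly task comes from the sbt-assembly plugin. If you are recreating the build, a typical project/plugins.sbt entry looks like the following (the plugin version here is illustrative, not necessarily the one this repo pins):

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")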
To create a Python 3 virtual environment for the python subfolder:
pipenv install --three
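pipenv reads the dependencies from the subfolder's Pipfile. For orientation, a minimal Pipfile compatible with the commands in this README could look as follows; the exact pins are assumptions, with pyspark matching the Spark version above and nose needed for the test runner:

[packages]
pyspark = "==2.2.1"

[dev-packages]
nose = "*"

[requires]
python_version = "3.6"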
To run the Scala tests:
sbt test
and the Python tests:
pipenv run nosetests tests/
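For reference, a nose-discoverable Python test for the bucketizer could look like the sketch below (file and test names are illustrative; it assumes a local SparkSession and the assembly JAR on the driver classpath so the JVM estimator can be loaded):

# e.g. tests/test_bucketizer.py
import unittest

from pyspark.sql import SparkSession

from custom_spark_ml.feature.bucketizer import Bucketizer


class BucketizerTest(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        # local session; the assembly JAR must be on the driver classpath
        cls.spark = (SparkSession.builder
                     .master("local[2]")
                     .appName("bucketizer-tests")
                     .getOrCreate())

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_bins_match_expected(self):
        # same data as the usage examples below
        data = [(1.0, 0), (5.0, 2), (0.0, 0), (7.0, 2),
                (4.0, 1), (8.0, 3), (10.0, 4)]
        df = self.spark.createDataFrame(data, ["input", "expected"])
        model = (Bucketizer()
                 .setInputCol("input")
                 .setOutputCol("bin")
                 .setNumberBins(4)
                 .fit(df))
        for row in model.transform(df).collect():
            self.assertEqual(row["bin"], row["expected"])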
From Scala:
// assumes an active SparkSession `spark` and this repo's Bucketizer in scope
import spark.implicits._

val df = Seq(
  (1.0, 0),
  (5.0, 2),
  (0.0, 0),
  (7.0, 2),
  (4.0, 1),
  (8.0, 3),
  (10.0, 4)
).toDF("input", "expected")
val nBins = 4

val bucketizer = new Bucketizer()
  .setInputCol("input")
  .setOutputCol("bin")
  .setNumberBins(nBins)

val model = bucketizer.fit(df)
model.transform(df).show()
+-----+--------+---+
|input|expected|bin|
+-----+--------+---+
|  1.0|       0|  0|
|  5.0|       2|  2|
|  0.0|       0|  0|
|  7.0|       2|  2|
|  4.0|       1|  1|
|  8.0|       3|  3|
| 10.0|       4|  4|
+-----+--------+---+
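The snippet above only shows usage. As a sketch of what the JVM side of such an estimator can look like, the following implements equal-width binning learned from the input column's min/max; the trait layout and the binning rule are illustrative assumptions, not necessarily this repo's exact implementation (see the linked post for the real one). With a floor over equal-width bins, the maximum value sits exactly on the upper edge and spills into an extra bin, which would reproduce the bin 4 for 10.0 seen above.

// A sketch of the JVM side: an Estimator that learns equal-width bins.
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.param.{IntParam, Param, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import org.apache.spark.sql.functions.{col, floor, lit, max, min}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

trait BucketizerParams extends Params {
  final val inputCol = new Param[String](this, "inputCol", "input column name")
  final val outputCol = new Param[String](this, "outputCol", "output column name")
  final val numberBins = new IntParam(this, "numberBins", "number of equal-width bins")

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)
  def setNumberBins(value: Int): this.type = set(numberBins, value)
}

class Bucketizer(override val uid: String)
    extends Estimator[BucketizerModel] with BucketizerParams {

  def this() = this(Identifiable.randomUID("bucketizer"))

  override def fit(dataset: Dataset[_]): BucketizerModel = {
    // learn the value range of the input column on the training data
    val Row(lo: Double, hi: Double) =
      dataset.agg(min(col($(inputCol))), max(col($(inputCol)))).head()
    val width = (hi - lo) / $(numberBins)
    copyValues(new BucketizerModel(uid, lo, width).setParent(this))
  }

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField($(outputCol), LongType, nullable = false))

  override def copy(extra: ParamMap): Bucketizer = defaultCopy(extra)
}

class BucketizerModel(override val uid: String, lo: Double, width: Double)
    extends Model[BucketizerModel] with BucketizerParams {

  override def transform(dataset: Dataset[_]): DataFrame =
    // floor of (value - min) / width; the max value sits on the upper
    // edge and therefore spills into one extra bin (10.0 -> bin 4)
    dataset.withColumn($(outputCol), floor((col($(inputCol)) - lit(lo)) / lit(width)))

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField($(outputCol), LongType, nullable = false))

  override def copy(extra: ParamMap): BucketizerModel =
    copyValues(new BucketizerModel(uid, lo, width), extra).setParent(parent)
}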
From Python:
# assumes an active SparkSession `spark` with the assembly JAR on its classpath
from custom_spark_ml.feature.bucketizer import Bucketizer, BucketizerModel
data = [
    (1.0, 0),
    (5.0, 2),
    (0.0, 0),
    (7.0, 2),
    (4.0, 1),
    (8.0, 3),
    (10.0, 4)
]
df = spark.createDataFrame(data, ["input", "expected"])

nBins = 4

bucketizer = Bucketizer() \
    .setInputCol("input") \
    .setOutputCol("bin") \
    .setNumberBins(nBins)
model = bucketizer.fit(df)
model.transform(df).show()

+-----+--------+---+
|input|expected|bin|
+-----+--------+---+
|  1.0|       0|  0|
|  5.0|       2|  2|
|  0.0|       0|  0|
|  7.0|       2|  2|
|  4.0|       1|  1|
|  8.0|       3|  3|
| 10.0|       4|  4|
+-----+--------+---+
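The Python Bucketizer used above is a thin wrapper that delegates to the JVM estimator, following the pyspark.ml.wrapper pattern described in the linked post. A minimal sketch of such a wrapper is shown below; the fully qualified JVM class name is a placeholder, not this repo's actual package:

from pyspark import keyword_only
from pyspark.ml.param import Param, Params, TypeConverters
from pyspark.ml.param.shared import HasInputCol, HasOutputCol
from pyspark.ml.wrapper import JavaEstimator, JavaModel


class Bucketizer(JavaEstimator, HasInputCol, HasOutputCol):
    """Thin Python wrapper around the JVM estimator."""

    numberBins = Param(Params._dummy(), "numberBins",
                       "number of bins to split the input column into",
                       typeConverter=TypeConverters.toInt)

    @keyword_only
    def __init__(self, inputCol=None, outputCol=None, numberBins=2):
        super(Bucketizer, self).__init__()
        # placeholder JVM class name -- adjust to the package in the Scala sources
        self._java_obj = self._new_java_obj(
            "com.custom.spark.ml.feature.Bucketizer", self.uid)
        self._setDefault(numberBins=2)
        kwargs = self._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, inputCol=None, outputCol=None, numberBins=2):
        kwargs = self._input_kwargs
        return self._set(**kwargs)

    def setNumberBins(self, value):
        return self._set(numberBins=value)

    def getNumberBins(self):
        return self.getOrDefault(self.numberBins)

    def _create_model(self, java_model):
        # called by JavaEstimator.fit() to wrap the fitted JVM model
        return BucketizerModel(java_model)


class BucketizerModel(JavaModel):
    """Wrapper around the fitted JVM model; transform() is inherited."""
    pass

JavaEstimator takes care of shipping the declared params to the JVM object before fitting, so the wrapper only has to declare the params and their setters; transform on the model is inherited from JavaModel for free.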