Hi, I managed to get sparkhpc and imnet running on our institute's HPC cluster. However, when I run the code to generate a distributed graph:
import findspark; findspark.init()
import sparkhpc

# Start a Spark standalone cluster through SLURM using the sparkhpc job template
template_path = '/cluster/home/eirikhoy/sparkhpc/build/lib/sparkhpc/templates/sparkjob.slurm.template'
sj = sparkhpc.sparkjob.SLURMSparkJob(ncores=4, template=template_path)

from pyspark import SparkContext
sc = SparkContext(master=sj.master_url())

import imnet
import numpy as np
from scipy.sparse import csr_matrix
import pyspark

# Generate random sequences and build the distributed string graph
strings = imnet.random_strings.generate_random_sequences(5000)
g_rdd = imnet.process_strings.generate_spark_graph(strings, sc, max_ld=2).cache()
I get the error:
UnboundLocalError Traceback (most recent call last)
<ipython-input-15-af167cc949f4> in <module>()
----> 1 g_rdd = imnet.process_strings.generate_spark_graph(strings, sc, max_ld=2).cache()
/cluster/home/eirikhoy/.conda/envs/imnet_v0.2/lib/python2.7/site-packages/imnet/process_strings.pyc in generate_spark_graph(strings, sc, mat, min_ld, max_ld)
189 warn("Problem importing pyspark -- are you sure your SPARK_HOME is set?")
190
--> 191 sqc = SQLContext(sc)
192
193 strings_b = sc.broadcast(strings)
UnboundLocalError: local variable 'SQLContext' referenced before assignment
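From the traceback it looks like the UnboundLocalError is masking an earlier import failure: generate_spark_graph apparently imports SQLContext inside a try/except, only warns when the import fails, and then uses the name anyway. A minimal sketch of that pattern (a hypothetical reconstruction from the traceback above, not the actual imnet source):

def generate_spark_graph(strings, sc):
    try:
        from pyspark.sql import SQLContext
    except ImportError:
        # The failed import is only reported as a warning, so execution continues
        from warnings import warn
        warn("Problem importing pyspark -- are you sure your SPARK_HOME is set?")
    # If the import failed, the name 'SQLContext' was never bound in this scope,
    # so this line raises UnboundLocalError instead of the original ImportError
    sqc = SQLContext(sc)

If that is what is happening, the real question is why the pyspark import fails in this environment, rather than the UnboundLocalError itself.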
Note: I tested it on a local VM and got the same error, so maybe the issue is not one of incorrect dependencies?
Both the SPARK_HOME and JAVA_HOME environment variables are set:
>>> os.environ['SPARK_HOME']
'/cluster/software/Spark/2.4.0-intel-2018b-Python-3.6.6'
>>> os.environ['JAVA_HOME']
'/cluster/software/Java/1.8.0_212'
The rest of the code examples ran fine.
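In case it helps, a quick diagnostic along these lines (just a sketch, run inside the same conda environment) should surface the underlying import error instead of letting it be swallowed:

import findspark; findspark.init()

# Attempt the same import that generate_spark_graph needs and report the real error
try:
    from pyspark.sql import SQLContext
    print("pyspark.sql.SQLContext imports fine")
except ImportError as e:
    print("pyspark import failed: %s" % e)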