Labels: bug (Something isn't working), question (Further information is requested)
Description
Hey, folks --
I'm having trouble getting the basic example provided to run. Specifically, the failure I'm encountering is at the daily-osci-rankings stage. I have confirmed that I have a functioning local install of Hadoop. I'm running on an Ubuntu 20.04 LTS VPS with a fresh install.
I pulled the two most visible errors out of the log below (full log expandable at the bottom of the issue). It's unclear to me whether they are related, though.
Any help pointing me in the right direction would be appreciated!
$ python3 osci-cli.py get-github-daily-push-events -d 2020-01-01
# success
$ python3 osci-cli.py process-github-daily-push-events -d 2020-01-01
# success
$ python3 osci-cli.py daily-osci-rankings -td 2020-01-02
# failure (see full log below)
# ...
[2022-03-22 18:11:11,850] [DEBUG] Answer received: !ysorg.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;\n	at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$12(DataSource.scala:200)\n	at scala.Option.getOrElse(Option.scala:189)\n	at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:200)\n	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:408)\n	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)\n	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)\n	at scala.Option.getOrElse(Option.scala:189)\n	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)\n	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n	at java.lang.reflect.Method.invoke(Method.java:498)\n	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n	at py4j.Gateway.invoke(Gateway.java:282)\n	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n	at py4j.commands.CallCommand.execute(CallCommand.java:79)\n	at py4j.GatewayConnection.run(GatewayConnection.java:238)\n	at java.lang.Thread.run(Thread.java:748)\n
<osci.datalake.local.landing.LocalLandingArea object at 0x7fa5e8753f40> /data landing
<osci.datalake.local.staging.LocalStagingArea object at 0x7fa5e87609a0> /data staging
<osci.datalake.local.public.LocalPublicArea object at 0x7fa5e8760940> /data public
<osci.datalake.local.web.LocalWebArea object at 0x7fa5e8760a90> /web data
# ...
[2022-03-22 18:11:11,855] [DEBUG] Answer received: !yv
Traceback (most recent call last):
  File "osci-cli.py", line 93, in <module>
    cli(standalone_mode=False)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/OSCI/osci/actions/base.py", line 59, in execute
    return self._execute(**self._process_params(kwargs))
  File "/home/ubuntu/OSCI/osci/actions/process/generate_daily_osci_rankings.py", line 49, in _execute
    commits = osci_ranking_job.extract(to_date=to_day).cache()
  File "/home/ubuntu/OSCI/osci/jobs/base.py", line 44, in extract
    commits=Session().load_dataframe(paths=self._get_dataset_paths(to_date, from_date))
  File "/home/ubuntu/OSCI/osci/jobs/session.py", line 39, in load_dataframe
    return self.spark_session.read.load(paths, **options)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pyspark/sql/readwriter.py", line 182, in load
    return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/home/ubuntu/.local/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pyspark/sql/utils.py", line 134, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
Full Error Log:
[2022-03-22 18:11:05,996] [INFO] ENV: None
[2022-03-22 18:11:05,997] [DEBUG] Check config file for env local exists
[2022-03-22 18:11:05,997] [DEBUG] Read config from /home/ubuntu/OSCI/osci/config/files/local.yml
[2022-03-22 18:11:06,000] [DEBUG] Prod yml load: {'meta': {'config_source': 'yaml'}, 'file_system': {'type': 'local', 'base_path': '/data'}, 'areas': {'landing': {'container': 'landing'}, 'staging': {'container': 'staging'}, 'public': {'container': 'public'}}, 'bq': {'project': '', 'secret': '{}'}, 'web': {'fs': 'local', 'base_path': '/web', 'account_name': '', 'account_key': '', 'container': 'data'}, 'github': {'token': ''}, 'company': {'default': 'EPAM'}}
[2022-03-22 18:11:06,000] [DEBUG] Prod yml res: {'meta': {'config_source': 'yaml'}, 'file_system': {'type': 'local', 'base_path': '/data'}, 'areas': {'landing': {'container': 'landing'}, 'staging': {'container': 'staging'}, 'public': {'container': 'public'}}, 'bq': {'project': '', 'secret': '{}'}, 'web': {'fs': 'local', 'base_path': '/web', 'account_name': '', 'account_key': '', 'container': 'data'}, 'github': {'token': ''}, 'company': {'default': 'EPAM'}}
[2022-03-22 18:11:06,000] [INFO] Full config: {'meta': {'config_source': 'yaml'}, 'file_system': {'type': 'local', 'base_path': '/data'}, 'areas': {'landing': {'container': 'landing'}, 'staging': {'container': 'staging'}, 'public': {'container': 'public'}}, 'bq': {'project': '', 'secret': '{}'}, 'web': {'fs': 'local', 'base_path': '/web', 'account_name': '', 'account_key': '', 'container': 'data'}, 'github': {'token': ''}, 'company': {'default': 'EPAM'}}
[2022-03-22 18:11:06,000] [INFO] Configuration loaded for env: local
[2022-03-22 18:11:06,000] [DEBUG] Create new <class 'osci.config.base.LocalFileSystemConfig'>
[2022-03-22 18:11:06,000] [DEBUG] {'fs': 'local', 'base_path': '/web', 'account_name': '', 'account_key': '', 'container': 'data'}
[2022-03-22 18:11:06,000] [DEBUG] Create new <class 'osci.config.base.Config'>
[2022-03-22 18:11:06,000] [DEBUG] Create new <class 'osci.datalake.datalake.DataLake'>
[2022-03-22 18:11:06,113] [INFO] Execute action `daily-osci-rankings`
[2022-03-22 18:11:06,113] [INFO] Action params `{'to_day': '2020-01-02'}`
[2022-03-22 18:11:06,114] [DEBUG] Create new <class 'osci.datalake.reports.general.osci_ranking.OSCIRankingFactory'>
[2022-03-22 18:11:06,114] [DEBUG] Create new <class 'osci.datalake.reports.general.commits_ranking.OSCICommitsRankingFactory'>
[2022-03-22 18:11:06,114] [DEBUG] Create new <class 'osci.jobs.session.Session'>
[2022-03-22 18:11:06,115] [DEBUG] Loaded paths for (None 2020-01-02 00:00:00) []
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[2022-03-22 18:11:08,127] [DEBUG] Command to send: A
fb324a0d50b599ec733f3b3b1bc1d7f4d1c894100f14d4ad6f4af9db025d37ea
[2022-03-22 18:11:08,142] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,142] [DEBUG] Command to send: j
i
rj
org.apache.spark.SparkConf
e
[2022-03-22 18:11:08,143] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,143] [DEBUG] Command to send: j
i
rj
org.apache.spark.api.java.*
e
[2022-03-22 18:11:08,144] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,144] [DEBUG] Command to send: j
i
rj
org.apache.spark.api.python.*
e
[2022-03-22 18:11:08,144] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,144] [DEBUG] Command to send: j
i
rj
org.apache.spark.ml.python.*
e
[2022-03-22 18:11:08,144] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,144] [DEBUG] Command to send: j
i
rj
org.apache.spark.mllib.api.python.*
e
[2022-03-22 18:11:08,144] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,144] [DEBUG] Command to send: j
i
rj
org.apache.spark.sql.*
e
[2022-03-22 18:11:08,144] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,144] [DEBUG] Command to send: j
i
rj
org.apache.spark.sql.api.python.*
e
[2022-03-22 18:11:08,145] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,145] [DEBUG] Command to send: j
i
rj
org.apache.spark.sql.hive.*
e
[2022-03-22 18:11:08,146] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,146] [DEBUG] Command to send: j
i
rj
scala.Tuple2
e
[2022-03-22 18:11:08,146] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,146] [DEBUG] Command to send: r
u
SparkConf
rj
e
[2022-03-22 18:11:08,147] [DEBUG] Answer received: !ycorg.apache.spark.SparkConf
[2022-03-22 18:11:08,148] [DEBUG] Command to send: i
org.apache.spark.SparkConf
bTrue
e
[2022-03-22 18:11:08,154] [DEBUG] Answer received: !yro0
[2022-03-22 18:11:08,154] [DEBUG] Command to send: c
o0
contains
sspark.serializer.objectStreamReset
e
[2022-03-22 18:11:08,158] [DEBUG] Answer received: !ybfalse
[2022-03-22 18:11:08,158] [DEBUG] Command to send: c
o0
set
sspark.serializer.objectStreamReset
s100
e
[2022-03-22 18:11:08,158] [DEBUG] Answer received: !yro1
[2022-03-22 18:11:08,158] [DEBUG] Command to send: m
d
o1
e
[2022-03-22 18:11:08,159] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,159] [DEBUG] Command to send: c
o0
contains
sspark.rdd.compress
e
[2022-03-22 18:11:08,159] [DEBUG] Answer received: !ybfalse
[2022-03-22 18:11:08,159] [DEBUG] Command to send: c
o0
set
sspark.rdd.compress
sTrue
e
[2022-03-22 18:11:08,159] [DEBUG] Answer received: !yro2
[2022-03-22 18:11:08,159] [DEBUG] Command to send: m
d
o2
e
[2022-03-22 18:11:08,159] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,160] [DEBUG] Command to send: c
o0
contains
sspark.master
e
[2022-03-22 18:11:08,160] [DEBUG] Answer received: !ybtrue
[2022-03-22 18:11:08,160] [DEBUG] Command to send: c
o0
contains
sspark.app.name
e
[2022-03-22 18:11:08,160] [DEBUG] Answer received: !ybtrue
[2022-03-22 18:11:08,160] [DEBUG] Command to send: c
o0
contains
sspark.master
e
[2022-03-22 18:11:08,160] [DEBUG] Answer received: !ybtrue
[2022-03-22 18:11:08,160] [DEBUG] Command to send: c
o0
get
sspark.master
e
[2022-03-22 18:11:08,161] [DEBUG] Answer received: !yslocal[*]
[2022-03-22 18:11:08,161] [DEBUG] Command to send: c
o0
contains
sspark.app.name
e
[2022-03-22 18:11:08,162] [DEBUG] Answer received: !ybtrue
[2022-03-22 18:11:08,162] [DEBUG] Command to send: c
o0
get
sspark.app.name
e
[2022-03-22 18:11:08,162] [DEBUG] Answer received: !yspyspark-shell
[2022-03-22 18:11:08,162] [DEBUG] Command to send: c
o0
contains
sspark.home
e
[2022-03-22 18:11:08,163] [DEBUG] Answer received: !ybfalse
[2022-03-22 18:11:08,163] [DEBUG] Command to send: c
o0
getAll
e
[2022-03-22 18:11:08,163] [DEBUG] Answer received: !yto3
[2022-03-22 18:11:08,163] [DEBUG] Command to send: a
e
o3
e
[2022-03-22 18:11:08,164] [DEBUG] Answer received: !yi7
[2022-03-22 18:11:08,164] [DEBUG] Command to send: a
g
o3
i0
e
[2022-03-22 18:11:08,164] [DEBUG] Answer received: !yro4
[2022-03-22 18:11:08,164] [DEBUG] Command to send: c
o4
_1
e
[2022-03-22 18:11:08,165] [DEBUG] Answer received: !ysspark.rdd.compress
[2022-03-22 18:11:08,165] [DEBUG] Command to send: c
o4
_2
e
[2022-03-22 18:11:08,165] [DEBUG] Answer received: !ysTrue
[2022-03-22 18:11:08,166] [DEBUG] Command to send: a
e
o3
e
[2022-03-22 18:11:08,166] [DEBUG] Answer received: !yi7
[2022-03-22 18:11:08,166] [DEBUG] Command to send: a
g
o3
i1
e
[2022-03-22 18:11:08,166] [DEBUG] Answer received: !yro5
[2022-03-22 18:11:08,166] [DEBUG] Command to send: c
o5
_1
e
[2022-03-22 18:11:08,166] [DEBUG] Answer received: !ysspark.serializer.objectStreamReset
[2022-03-22 18:11:08,167] [DEBUG] Command to send: c
o5
_2
e
[2022-03-22 18:11:08,167] [DEBUG] Answer received: !ys100
[2022-03-22 18:11:08,167] [DEBUG] Command to send: a
e
o3
e
[2022-03-22 18:11:08,167] [DEBUG] Answer received: !yi7
[2022-03-22 18:11:08,167] [DEBUG] Command to send: a
g
o3
i2
e
[2022-03-22 18:11:08,167] [DEBUG] Answer received: !yro6
[2022-03-22 18:11:08,167] [DEBUG] Command to send: c
o6
_1
e
[2022-03-22 18:11:08,170] [DEBUG] Answer received: !ysspark.master
[2022-03-22 18:11:08,170] [DEBUG] Command to send: c
o6
_2
e
[2022-03-22 18:11:08,171] [DEBUG] Answer received: !yslocal[*]
[2022-03-22 18:11:08,171] [DEBUG] Command to send: a
e
o3
e
[2022-03-22 18:11:08,171] [DEBUG] Answer received: !yi7
[2022-03-22 18:11:08,171] [DEBUG] Command to send: a
g
o3
i3
e
[2022-03-22 18:11:08,171] [DEBUG] Answer received: !yro7
[2022-03-22 18:11:08,171] [DEBUG] Command to send: c
o7
_1
e
[2022-03-22 18:11:08,172] [DEBUG] Answer received: !ysspark.submit.pyFiles
[2022-03-22 18:11:08,172] [DEBUG] Command to send: c
o7
_2
e
[2022-03-22 18:11:08,172] [DEBUG] Answer received: !ys
[2022-03-22 18:11:08,172] [DEBUG] Command to send: a
e
o3
e
[2022-03-22 18:11:08,172] [DEBUG] Answer received: !yi7
[2022-03-22 18:11:08,172] [DEBUG] Command to send: a
g
o3
i4
e
[2022-03-22 18:11:08,173] [DEBUG] Answer received: !yro8
[2022-03-22 18:11:08,173] [DEBUG] Command to send: c
o8
_1
e
[2022-03-22 18:11:08,173] [DEBUG] Answer received: !ysspark.submit.deployMode
[2022-03-22 18:11:08,173] [DEBUG] Command to send: c
o8
_2
e
[2022-03-22 18:11:08,173] [DEBUG] Answer received: !ysclient
[2022-03-22 18:11:08,173] [DEBUG] Command to send: a
e
o3
e
[2022-03-22 18:11:08,173] [DEBUG] Answer received: !yi7
[2022-03-22 18:11:08,173] [DEBUG] Command to send: a
g
o3
i5
e
[2022-03-22 18:11:08,173] [DEBUG] Answer received: !yro9
[2022-03-22 18:11:08,174] [DEBUG] Command to send: c
o9
_1
e
[2022-03-22 18:11:08,174] [DEBUG] Answer received: !ysspark.ui.showConsoleProgress
[2022-03-22 18:11:08,174] [DEBUG] Command to send: c
o9
_2
e
[2022-03-22 18:11:08,174] [DEBUG] Answer received: !ystrue
[2022-03-22 18:11:08,174] [DEBUG] Command to send: a
e
o3
e
[2022-03-22 18:11:08,174] [DEBUG] Answer received: !yi7
[2022-03-22 18:11:08,174] [DEBUG] Command to send: a
g
o3
i6
e
[2022-03-22 18:11:08,174] [DEBUG] Answer received: !yro10
[2022-03-22 18:11:08,175] [DEBUG] Command to send: c
o10
_1
e
[2022-03-22 18:11:08,175] [DEBUG] Answer received: !ysspark.app.name
[2022-03-22 18:11:08,175] [DEBUG] Command to send: c
o10
_2
e
[2022-03-22 18:11:08,175] [DEBUG] Answer received: !yspyspark-shell
[2022-03-22 18:11:08,175] [DEBUG] Command to send: a
e
o3
e
[2022-03-22 18:11:08,175] [DEBUG] Answer received: !yi7
[2022-03-22 18:11:08,175] [DEBUG] Command to send: m
d
o3
e
[2022-03-22 18:11:08,175] [DEBUG] Answer received: !yv
[2022-03-22 18:11:08,175] [DEBUG] Command to send: r
u
JavaSparkContext
rj
e
[2022-03-22 18:11:08,186] [DEBUG] Answer received: !ycorg.apache.spark.api.java.JavaSparkContext
[2022-03-22 18:11:08,186] [DEBUG] Command to send: i
org.apache.spark.api.java.JavaSparkContext
ro0
e
[2022-03-22 18:11:09,483] [DEBUG] Answer received: !yro11
[2022-03-22 18:11:09,483] [DEBUG] Command to send: c
o11
sc
e
[2022-03-22 18:11:09,489] [DEBUG] Answer received: !yro12
[2022-03-22 18:11:09,490] [DEBUG] Command to send: c
o12
conf
e
[2022-03-22 18:11:09,499] [DEBUG] Answer received: !yro13
[2022-03-22 18:11:09,500] [DEBUG] Command to send: r
u
PythonAccumulatorV2
rj
e
[2022-03-22 18:11:09,501] [DEBUG] Answer received: !ycorg.apache.spark.api.python.PythonAccumulatorV2
[2022-03-22 18:11:09,502] [DEBUG] Command to send: i
org.apache.spark.api.python.PythonAccumulatorV2
s127.0.0.1
i45879
sfb324a0d50b599ec733f3b3b1bc1d7f4d1c894100f14d4ad6f4af9db025d37ea
e
[2022-03-22 18:11:09,502] [DEBUG] Answer received: !yro14
[2022-03-22 18:11:09,502] [DEBUG] Command to send: c
o11
sc
e
[2022-03-22 18:11:09,502] [DEBUG] Answer received: !yro15
[2022-03-22 18:11:09,503] [DEBUG] Command to send: c
o15
register
ro14
e
[2022-03-22 18:11:09,505] [DEBUG] Answer received: !yv
[2022-03-22 18:11:09,505] [DEBUG] Command to send: r
u
PythonUtils
rj
e
[2022-03-22 18:11:09,506] [DEBUG] Answer received: !ycorg.apache.spark.api.python.PythonUtils
[2022-03-22 18:11:09,506] [DEBUG] Command to send: r
m
org.apache.spark.api.python.PythonUtils
isEncryptionEnabled
e
[2022-03-22 18:11:09,506] [DEBUG] Answer received: !ym
[2022-03-22 18:11:09,506] [DEBUG] Command to send: c
z:org.apache.spark.api.python.PythonUtils
isEncryptionEnabled
ro11
e
[2022-03-22 18:11:09,507] [DEBUG] Answer received: !ybfalse
[2022-03-22 18:11:09,508] [DEBUG] Command to send: r
u
org
rj
e
[2022-03-22 18:11:09,509] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,510] [DEBUG] Command to send: r
u
org.apache
rj
e
[2022-03-22 18:11:09,510] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,510] [DEBUG] Command to send: r
u
org.apache.spark
rj
e
[2022-03-22 18:11:09,510] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,511] [DEBUG] Command to send: r
u
org.apache.spark.SparkFiles
rj
e
[2022-03-22 18:11:09,511] [DEBUG] Answer received: !ycorg.apache.spark.SparkFiles
[2022-03-22 18:11:09,511] [DEBUG] Command to send: r
m
org.apache.spark.SparkFiles
getRootDirectory
e
[2022-03-22 18:11:09,511] [DEBUG] Answer received: !ym
[2022-03-22 18:11:09,511] [DEBUG] Command to send: c
z:org.apache.spark.SparkFiles
getRootDirectory
e
[2022-03-22 18:11:09,512] [DEBUG] Answer received: !ys/tmp/spark-133764be-4844-4a91-a340-210c1b419fda/userFiles-58b63090-eb7f-4872-8939-2710678287d1
[2022-03-22 18:11:09,512] [DEBUG] Command to send: c
o13
get
sspark.submit.pyFiles
s
e
[2022-03-22 18:11:09,512] [DEBUG] Answer received: !ys
[2022-03-22 18:11:09,513] [DEBUG] Command to send: r
u
org
rj
e
[2022-03-22 18:11:09,514] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,514] [DEBUG] Command to send: r
u
org.apache
rj
e
[2022-03-22 18:11:09,515] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,515] [DEBUG] Command to send: r
u
org.apache.spark
rj
e
[2022-03-22 18:11:09,515] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,515] [DEBUG] Command to send: r
u
org.apache.spark.util
rj
e
[2022-03-22 18:11:09,515] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,516] [DEBUG] Command to send: r
u
org.apache.spark.util.Utils
rj
e
[2022-03-22 18:11:09,517] [DEBUG] Answer received: !ycorg.apache.spark.util.Utils
[2022-03-22 18:11:09,517] [DEBUG] Command to send: r
m
org.apache.spark.util.Utils
getLocalDir
e
[2022-03-22 18:11:09,519] [DEBUG] Answer received: !ym
[2022-03-22 18:11:09,519] [DEBUG] Command to send: c
o11
sc
e
[2022-03-22 18:11:09,519] [DEBUG] Answer received: !yro16
[2022-03-22 18:11:09,519] [DEBUG] Command to send: c
o16
conf
e
[2022-03-22 18:11:09,520] [DEBUG] Answer received: !yro17
[2022-03-22 18:11:09,520] [DEBUG] Command to send: c
z:org.apache.spark.util.Utils
getLocalDir
ro17
e
[2022-03-22 18:11:09,520] [DEBUG] Answer received: !ys/tmp/spark-133764be-4844-4a91-a340-210c1b419fda
[2022-03-22 18:11:09,520] [DEBUG] Command to send: r
u
org
rj
e
[2022-03-22 18:11:09,521] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,521] [DEBUG] Command to send: r
u
org.apache
rj
e
[2022-03-22 18:11:09,522] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,522] [DEBUG] Command to send: r
u
org.apache.spark
rj
e
[2022-03-22 18:11:09,522] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,522] [DEBUG] Command to send: r
u
org.apache.spark.util
rj
e
[2022-03-22 18:11:09,523] [DEBUG] Answer received: !yp
[2022-03-22 18:11:09,523] [DEBUG] Command to send: r
u
org.apache.spark.util.Utils
rj
e
[2022-03-22 18:11:09,523] [DEBUG] Answer received: !ycorg.apache.spark.util.Utils
[2022-03-22 18:11:09,523] [DEBUG] Command to send: r
m
org.apache.spark.util.Utils
createTempDir
e
[2022-03-22 18:11:09,523] [DEBUG] Answer received: !ym
[2022-03-22 18:11:09,524] [DEBUG] Command to send: c
z:org.apache.spark.util.Utils
createTempDir
s/tmp/spark-133764be-4844-4a91-a340-210c1b419fda
spyspark
e
[2022-03-22 18:11:09,524] [DEBUG] Answer received: !yro18
[2022-03-22 18:11:09,524] [DEBUG] Command to send: c
o18
getAbsolutePath
e
[2022-03-22 18:11:09,525] [DEBUG] Answer received: !ys/tmp/spark-133764be-4844-4a91-a340-210c1b419fda/pyspark-bc66966b-69a0-4a5b-b7ab-b0b7c8e45101
[2022-03-22 18:11:09,525] [DEBUG] Command to send: c
o13
get
sspark.python.profile
sfalse
e
[2022-03-22 18:11:09,525] [DEBUG] Answer received: !ysfalse
[2022-03-22 18:11:09,525] [DEBUG] Command to send: r
u
SparkSession
rj
e
[2022-03-22 18:11:09,544] [DEBUG] Answer received: !ycorg.apache.spark.sql.SparkSession
[2022-03-22 18:11:09,545] [DEBUG] Command to send: r
m
org.apache.spark.sql.SparkSession
getDefaultSession
e
[2022-03-22 18:11:09,567] [DEBUG] Answer received: !ym
[2022-03-22 18:11:09,567] [DEBUG] Command to send: c
z:org.apache.spark.sql.SparkSession
getDefaultSession
e
[2022-03-22 18:11:09,568] [DEBUG] Answer received: !yro19
[2022-03-22 18:11:09,568] [DEBUG] Command to send: c
o19
isDefined
e
[2022-03-22 18:11:09,569] [DEBUG] Answer received: !ybfalse
[2022-03-22 18:11:09,569] [DEBUG] Command to send: r
u
SparkSession
rj
e
[2022-03-22 18:11:09,570] [DEBUG] Answer received: !ycorg.apache.spark.sql.SparkSession
[2022-03-22 18:11:09,570] [DEBUG] Command to send: c
o11
sc
e
[2022-03-22 18:11:09,571] [DEBUG] Answer received: !yro20
[2022-03-22 18:11:09,571] [DEBUG] Command to send: i
org.apache.spark.sql.SparkSession
ro20
e
[2022-03-22 18:11:09,620] [DEBUG] Answer received: !yro21
[2022-03-22 18:11:09,620] [DEBUG] Command to send: c
o21
sqlContext
e
[2022-03-22 18:11:09,621] [DEBUG] Answer received: !yro22
[2022-03-22 18:11:09,621] [DEBUG] Command to send: r
u
SparkSession
rj
e
[2022-03-22 18:11:09,622] [DEBUG] Answer received: !ycorg.apache.spark.sql.SparkSession
[2022-03-22 18:11:09,622] [DEBUG] Command to send: r
m
org.apache.spark.sql.SparkSession
setDefaultSession
e
[2022-03-22 18:11:09,623] [DEBUG] Answer received: !ym
[2022-03-22 18:11:09,623] [DEBUG] Command to send: c
z:org.apache.spark.sql.SparkSession
setDefaultSession
ro21
e
[2022-03-22 18:11:09,623] [DEBUG] Answer received: !yv
[2022-03-22 18:11:09,623] [DEBUG] Command to send: r
u
SparkSession
rj
e
[2022-03-22 18:11:09,624] [DEBUG] Answer received: !ycorg.apache.spark.sql.SparkSession
[2022-03-22 18:11:09,624] [DEBUG] Command to send: r
m
org.apache.spark.sql.SparkSession
setActiveSession
e
[2022-03-22 18:11:09,624] [DEBUG] Answer received: !ym
[2022-03-22 18:11:09,624] [DEBUG] Command to send: c
z:org.apache.spark.sql.SparkSession
setActiveSession
ro21
e
[2022-03-22 18:11:09,625] [DEBUG] Answer received: !yv
[2022-03-22 18:11:09,625] [DEBUG] Command to send: c
o22
read
e
[2022-03-22 18:11:10,432] [DEBUG] Answer received: !yro23
[2022-03-22 18:11:10,432] [DEBUG] Command to send: r
u
PythonUtils
rj
e
[2022-03-22 18:11:10,433] [DEBUG] Answer received: !ycorg.apache.spark.api.python.PythonUtils
[2022-03-22 18:11:10,433] [DEBUG] Command to send: r
m
org.apache.spark.api.python.PythonUtils
toSeq
e
[2022-03-22 18:11:10,433] [DEBUG] Answer received: !ym
[2022-03-22 18:11:10,433] [DEBUG] Command to send: i
java.util.ArrayList
e
[2022-03-22 18:11:10,433] [DEBUG] Answer received: !ylo24
[2022-03-22 18:11:10,434] [DEBUG] Command to send: c
z:org.apache.spark.api.python.PythonUtils
toSeq
ro24
e
[2022-03-22 18:11:10,434] [DEBUG] Answer received: !yro25
[2022-03-22 18:11:10,434] [DEBUG] Command to send: m
d
o24
e
[2022-03-22 18:11:10,435] [DEBUG] Answer received: !yv
[2022-03-22 18:11:10,435] [DEBUG] Command to send: c
o23
load
ro25
e
22/03/22 18:11:10 WARN DataSource: All paths were ignored:
  
[Stage 0:>                                                          (0 + 1) / 1]
                                                                                
[2022-03-22 18:11:11,839] [DEBUG] Answer received: !xro26
[2022-03-22 18:11:11,839] [DEBUG] Command to send: c
o26
toString
e
[2022-03-22 18:11:11,840] [DEBUG] Answer received: !ysorg.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
[2022-03-22 18:11:11,840] [DEBUG] Command to send: c
o26
getCause
e
[2022-03-22 18:11:11,840] [DEBUG] Answer received: !yn
[2022-03-22 18:11:11,840] [DEBUG] Command to send: r
u
org
rj
e
[2022-03-22 18:11:11,842] [DEBUG] Answer received: !yp
[2022-03-22 18:11:11,842] [DEBUG] Command to send: r
u
org.apache
rj
e
[2022-03-22 18:11:11,844] [DEBUG] Answer received: !yp
[2022-03-22 18:11:11,844] [DEBUG] Command to send: r
u
org.apache.spark
rj
e
[2022-03-22 18:11:11,848] [DEBUG] Answer received: !yp
[2022-03-22 18:11:11,848] [DEBUG] Command to send: r
u
org.apache.spark.util
rj
e
[2022-03-22 18:11:11,849] [DEBUG] Answer received: !yp
[2022-03-22 18:11:11,849] [DEBUG] Command to send: r
u
org.apache.spark.util.Utils
rj
e
[2022-03-22 18:11:11,849] [DEBUG] Answer received: !ycorg.apache.spark.util.Utils
[2022-03-22 18:11:11,849] [DEBUG] Command to send: r
m
org.apache.spark.util.Utils
exceptionString
e
[2022-03-22 18:11:11,849] [DEBUG] Answer received: !ym
[2022-03-22 18:11:11,849] [DEBUG] Command to send: c
z:org.apache.spark.util.Utils
exceptionString
ro26
e
[2022-03-22 18:11:11,850] [DEBUG] Answer received: !ysorg.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;\n	at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$12(DataSource.scala:200)\n	at scala.Option.getOrElse(Option.scala:189)\n	at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:200)\n	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:408)\n	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)\n	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)\n	at scala.Option.getOrElse(Option.scala:189)\n	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)\n	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n	at java.lang.reflect.Method.invoke(Method.java:498)\n	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n	at py4j.Gateway.invoke(Gateway.java:282)\n	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n	at py4j.commands.CallCommand.execute(CallCommand.java:79)\n	at py4j.GatewayConnection.run(GatewayConnection.java:238)\n	at java.lang.Thread.run(Thread.java:748)\n
<osci.datalake.local.landing.LocalLandingArea object at 0x7fa5e8753f40> /data landing
<osci.datalake.local.staging.LocalStagingArea object at 0x7fa5e87609a0> /data staging
<osci.datalake.local.public.LocalPublicArea object at 0x7fa5e8760940> /data public
<osci.datalake.local.web.LocalWebArea object at 0x7fa5e8760a90> /web data
[2022-03-22 18:11:11,852] [DEBUG] Command to send: m
d
o0
e
[2022-03-22 18:11:11,853] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,853] [DEBUG] Command to send: m
d
o4
e
[2022-03-22 18:11:11,853] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,853] [DEBUG] Command to send: m
d
o5
e
[2022-03-22 18:11:11,853] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,853] [DEBUG] Command to send: m
d
o6
e
[2022-03-22 18:11:11,853] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,853] [DEBUG] Command to send: m
d
o7
e
[2022-03-22 18:11:11,853] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,853] [DEBUG] Command to send: m
d
o8
e
[2022-03-22 18:11:11,854] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,854] [DEBUG] Command to send: m
d
o9
e
[2022-03-22 18:11:11,854] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,854] [DEBUG] Command to send: m
d
o10
e
[2022-03-22 18:11:11,854] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,854] [DEBUG] Command to send: m
d
o12
e
[2022-03-22 18:11:11,854] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,854] [DEBUG] Command to send: m
d
o15
e
[2022-03-22 18:11:11,854] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,854] [DEBUG] Command to send: m
d
o16
e
[2022-03-22 18:11:11,854] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,855] [DEBUG] Command to send: m
d
o17
e
[2022-03-22 18:11:11,855] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,855] [DEBUG] Command to send: m
d
o18
e
[2022-03-22 18:11:11,855] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,855] [DEBUG] Command to send: m
d
o19
e
[2022-03-22 18:11:11,855] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,855] [DEBUG] Command to send: m
d
o20
e
[2022-03-22 18:11:11,855] [DEBUG] Answer received: !yv
Traceback (most recent call last):
  File "osci-cli.py", line 93, in <module>
    cli(standalone_mode=False)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/OSCI/osci/actions/base.py", line 59, in execute
    return self._execute(**self._process_params(kwargs))
  File "/home/ubuntu/OSCI/osci/actions/process/generate_daily_osci_rankings.py", line 49, in _execute
    commits = osci_ranking_job.extract(to_date=to_day).cache()
  File "/home/ubuntu/OSCI/osci/jobs/base.py", line 44, in extract
    commits=Session().load_dataframe(paths=self._get_dataset_paths(to_date, from_date))
  File "/home/ubuntu/OSCI/osci/jobs/session.py", line 39, in load_dataframe
    return self.spark_session.read.load(paths, **options)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pyspark/sql/readwriter.py", line 182, in load
    return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/home/ubuntu/.local/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pyspark/sql/utils.py", line 134, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException[2022-03-22 18:11:11,879] [DEBUG] Command to send: r
u
org
rj
e
[2022-03-22 18:11:11,881] [DEBUG] Answer received: !yp
[2022-03-22 18:11:11,881] [DEBUG] Command to send: r
u
org.apache
rj
e
[2022-03-22 18:11:11,882] [DEBUG] Answer received: !yp
[2022-03-22 18:11:11,882] [DEBUG] Command to send: r
u
org.apache.spark
rj
e
[2022-03-22 18:11:11,882] [DEBUG] Answer received: !yp
[2022-03-22 18:11:11,882] [DEBUG] Command to send: r
u
org.apache.spark.sql
rj
e
[2022-03-22 18:11:11,882] [DEBUG] Answer received: !yp
[2022-03-22 18:11:11,882] [DEBUG] Command to send: r
u
org.apache.spark.sql.internal
rj
e
[2022-03-22 18:11:11,883] [DEBUG] Answer received: !yp
[2022-03-22 18:11:11,883] [DEBUG] Command to send: r
u
org.apache.spark.sql.internal.SQLConf
rj
e
[2022-03-22 18:11:11,883] [DEBUG] Answer received: !ycorg.apache.spark.sql.internal.SQLConf
[2022-03-22 18:11:11,883] [DEBUG] Command to send: r
m
org.apache.spark.sql.internal.SQLConf
get
e
[2022-03-22 18:11:11,885] [DEBUG] Answer received: !ym
[2022-03-22 18:11:11,885] [DEBUG] Command to send: c
z:org.apache.spark.sql.internal.SQLConf
get
e
[2022-03-22 18:11:11,885] [DEBUG] Answer received: !yro27
[2022-03-22 18:11:11,885] [DEBUG] Command to send: c
o27
pysparkJVMStacktraceEnabled
e
[2022-03-22 18:11:11,886] [DEBUG] Answer received: !ybfalse
: Unable to infer schema for Parquet. It must be specified manually.;
[2022-03-22 18:11:11,924] [DEBUG] Command to send: m
d
o27
e
[2022-03-22 18:11:11,927] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,965] [DEBUG] Command to send: m
d
o26
e
[2022-03-22 18:11:11,966] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,966] [DEBUG] Command to send: m
d
o25
e
[2022-03-22 18:11:11,966] [DEBUG] Answer received: !yv
[2022-03-22 18:11:11,966] [DEBUG] Command to send: m
d
o23
e
[2022-03-22 18:11:11,966] [DEBUG] Answer received: !yv