-
Notifications
You must be signed in to change notification settings - Fork 984
Closed
Labels
Description
Code of Conduct
- I agree to follow this project's Code of Conduct
Search before asking
- I have searched in the issues and found no similar issues.
Describe the bug
- Spark context stooped dut to OOM in
spark-listener-group-shared, and calltryOrStopSparkContext.
25/08/03 19:08:06 ERROR Utils: uncaught error in thread spark-listener-group-shared, stopping SparkContext
java.lang.OutOfMemoryError: GC overhead limit exceeded
25/08/03 19:08:06 INFO OperationAuditLogger: operation=a7f134b9-373b-402d-a82b-2d42df568807 opType=ExecuteStatement state=INITIALIZED user=b_hrvst session=6a90d01c-7627-4ae6-a506-7ba826355489
...
25/08/03 19:08:23 INFO SparkSQLSessionManager: Opening session for [email protected]
25/08/03 19:08:23 ERROR SparkTBinaryFrontendService: Error opening session:
org.apache.kyuubi.KyuubiSQLException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:951)
org.apache.kyuubi.engine.spark.SparkSQLEngine$.createSpark(SparkSQLEngine.scala:337)
org.apache.kyuubi.engine.spark.SparkSQLEngine$.main(SparkSQLEngine.scala:415)
org.apache.kyuubi.engine.spark.SparkSQLEngine.main(SparkSQLEngine.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:732)
The currently active SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:951)
org.apache.kyuubi.engine.spark.SparkSQLEngine$.createSpark(SparkSQLEngine.scala:337)
org.apache.kyuubi.engine.spark.SparkSQLEngine$.main(SparkSQLEngine.scala:415)
org.apache.kyuubi.engine.spark.SparkSQLEngine.main(SparkSQLEngine.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:732)
at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:69)
at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:73)
- The kyuubi engine stop after 12 hours.
25/08/04 07:13:25 ERROR ZookeeperDiscoveryClient: Zookeeper client connection state changed to: LOST, but failed to reconnect in 3 seconds. Give up retry and stop gracefully .
25/08/04 07:13:25 INFO ClientCnxn: Session establishment complete on server zeus-slc-zk-3.vip.hadoop.ebay.com/10.147.141.240:2181, sessionid = 0x3939e22c983032e, negotiated timeout = 40000
25/08/04 07:13:25 INFO ConnectionStateManager: State change: RECONNECTED
25/08/04 07:13:25 INFO ZookeeperDiscoveryClient: Zookeeper client connection state changed to: RECONNECTED
25/08/04 07:13:25 INFO SparkSQLEngine: Service: [SparkTBinaryFrontend] is stopping.
25/08/04 07:13:25 INFO SparkTBinaryFrontendService: Service: [EngineServiceDiscovery] is stopping.
25/08/04 07:13:25 WARN EngineServiceDiscovery: The Zookeeper ensemble is LOST
25/08/04 07:13:25 INFO EngineServiceDiscovery: Service[EngineServiceDiscovery] is stopped.
25/08/04 07:13:25 INFO SparkTBinaryFrontendService: Service[SparkTBinaryFrontend] is stopped.
25/08/04 07:13:25 INFO SparkTBinaryFrontendService: SparkTBinaryFrontend has stopped
25/08/04 07:13:25 INFO SparkSQLEngine: Service: [SparkSQLBackendService] is stopping.
25/08/04 07:13:25 INFO SparkSQLBackendService: Service: [SparkSQLSessionManager] is stopping.
25/08/04 07:13:25 INFO SparkSQLSessionManager: Service: [SparkSQLOperationManager] is stopping.
25/08/04 07:13:45 INFO SparkSQLOperationManager: Service[SparkSQLOperationManager] is stopped.
25/08/04 07:13:45 INFO SparkSQLSessionManager: Service[SparkSQLSessionManager] is stopped.
-
seem the shutdown hook does not work in such case
Lines 375 to 376 in 9a0c49e
// Stop engine before SparkContext stopped to avoid calling a stopped SparkContext addShutdownHook(() => engine.stop(), SPARK_CONTEXT_SHUTDOWN_PRIORITY + 2) -
and
SparkSQLEngineListenerdid not receiveApplicationEndmessage, maybe due tospark-listener-group-sharedOOM? I do not have jstack for that, and can not check whether the thread alive.
Lines 55 to 63 in 9a0c49e
override def onApplicationEnd(event: SparkListenerApplicationEnd): Unit = { server.getServiceState match { case ServiceState.STOPPED => debug("Received ApplicationEnd Message form Spark after the engine has stopped") case state => info(s"Received ApplicationEnd Message from Spark at $state, stopping") server.stop() } }
Affects Version(s)
master
Kyuubi Server Log Output
Kyuubi Engine Log Output
Kyuubi Server Configurations
Kyuubi Engine Configurations
Additional context
No response
Are you willing to submit PR?
- Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- No. I cannot submit a PR at this time.