Description
Play Version (2.5.x / etc)
2.6.3
API
Scala
Operating System
CentOS Linux release 7.4.1708 (Core)
JDK
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
Library Dependencies
N/A
Behavior
I'm considering using Play to write an app that manages a large number of WebSockets. As part of validating this, I compared its performance against a couple of frameworks, including a pure Netty implementation (i.e. non-Play) as an I/O baseline.
My Play app uses the NettyServerProvider with the native transport enabled.
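For completeness, this is roughly the configuration I mean (standard Play 2.6 settings, shown here as an excerpt; adjust to your own setup):

```
# application.conf (relevant excerpt)
play.server.provider = "play.core.server.NettyServerProvider"
play.server.netty.transport = "native"
```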
Once all the sockets are connected, CPU stays high (above 90%) for the whole run, while the pure Netty implementation drops back to about 40% after ramp-up.
I looked at https://github.com/playframework/playframework/blob/master/framework/src/play-netty-server/src/main/scala/play/core/server/NettyServer.scala and compared to my own.
Fundamentally nothing is really different except for the Akka flow wrappers; however, I noticed that the channelSink method reuses the same event loop as the one the boss uses at bootstrap.
I'm not a Netty expert, but every example I have found uses a separate event loop for the boss and for the workers; here is one: https://github.com/netty/netty/blob/4.0/example/src/main/java/io/netty/example/http/websocketx/server/WebSocketServer.java. Netty's ServerBootstrap class also has a specific overload for this, although I know Play uses the Bootstrap class.
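For reference, here is a minimal sketch of the pattern I mean, following the Netty example linked above rather than Play's actual wiring (the port and the empty pipeline are placeholders):

```scala
// Conventional Netty setup: one group accepts connections (boss),
// a separate group handles I/O on the accepted channels (workers).
import io.netty.bootstrap.ServerBootstrap
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.SocketChannel
import io.netty.channel.socket.nio.NioServerSocketChannel
import io.netty.channel.{ChannelInitializer, ChannelOption}

object SeparateGroupsSketch {
  def main(args: Array[String]): Unit = {
    val bossGroup   = new NioEventLoopGroup(1) // only accepts incoming connections
    val workerGroup = new NioEventLoopGroup()  // processes the accepted channels

    try {
      val bootstrap = new ServerBootstrap()
        .group(bossGroup, workerGroup) // the two-group overload mentioned above
        .channel(classOf[NioServerSocketChannel])
        .childOption(ChannelOption.SO_KEEPALIVE, java.lang.Boolean.TRUE)
        .childHandler(new ChannelInitializer[SocketChannel] {
          override def initChannel(ch: SocketChannel): Unit = {
            // application handlers (HTTP / WebSocket codecs, etc.) would go here
          }
        })

      val channel = bootstrap.bind(9000).sync().channel()
      channel.closeFuture().sync()
    } finally {
      workerGroup.shutdownGracefully()
      bossGroup.shutdownGracefully()
    }
  }
}
```

With this split, message processing on worker threads cannot starve the boss loop that accepts new connections.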
So I made a local patch to this NettyServer, and it did significantly reduce CPU usage.
It seems that the boss and the workers should have separate event loops so that message processing doesn't interfere with accepting new connections; however, I can't explain why it frees up quite so much CPU.
Let me know if you are interested in a PR for this; I'll be happy to share my change.
The test setup is as follows:
- 512k concurrent WebSockets with a 256-second ramp-up, i.e. ~2k new sockets per second
- Each connection sends a dummy binary message to the server every minute for 10 minutes
- The server checks connectivity to each client once, in the middle of the run
- Runs on an EC2 m3.xlarge
Reproducible Test Case
N/A