SmartFoxServer 2.15.0 provides several improvements for scaling the UDP protocol to uber-high packet rates, which can be useful for real-time games with massive traffic that use very fast updates (e.g. 30 or 50 packets/s).
During the month of February 2020 we were contacted by a customer running a large online multiplayer game, who reported a potential bottleneck when scaling UDP traffic beyond a certain point. In particular, the problem seemed to be related to packet rate (pps) rather than bandwidth.
After some investigation we confirmed the customer’s findings that threads were blocking on the same DatagramChannel object, causing UDP writes to work serially rather than in parallel (a DatagramChannel is essentially a Java abstraction for a UDP connection between two endpoints).
We found this aspect particularly surprising because of the general design of the Java NIO API and because it seems a largely overlooked “feature” of the Java non-blocking UDP implementation, even among experts (if you want to learn more, the details are in this post from our forum).
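The Javadoc for DatagramChannel states that at most one thread may be writing to a channel at any given time, which is the serialization described above. The following is a minimal sketch, not SmartFoxServer code, that lets several threads send either through one shared channel or through one channel each. `SharedChannelDemo` and `run` are hypothetical names, and the loopback receiver exists only to give the packets a destination.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: names are hypothetical and this is not SmartFoxServer code.
class SharedChannelDemo {

    /** Opens channelCount sender channels, spreads the writer threads across them, returns packets sent. */
    static long run(int channelCount, int threads, int packetsPerThread) {
        List<DatagramChannel> channels = new ArrayList<>();
        try (DatagramChannel receiver = DatagramChannel.open()) {
            // Throwaway receiver on loopback; it never reads, the OS drops the overflow.
            receiver.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress target = (InetSocketAddress) receiver.getLocalAddress();
            for (int i = 0; i < channelCount; i++) channels.add(DatagramChannel.open());

            AtomicLong sent = new AtomicLong();
            List<Thread> workers = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                DatagramChannel ch = channels.get(i % channelCount);
                Thread t = new Thread(() -> {
                    ByteBuffer buf = ByteBuffer.allocate(32);
                    try {
                        for (int p = 0; p < packetsPerThread; p++) {
                            buf.clear();
                            // Only one thread at a time may be writing on a given channel.
                            if (ch.send(buf, target) > 0) sent.incrementAndGet();
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                workers.add(t);
                t.start();
            }
            for (Thread t : workers) t.join();
            return sent.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            for (DatagramChannel ch : channels) {
                try { ch.close(); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        run(1, 4, 200_000);   // four threads contend for a single channel
        long t1 = System.nanoTime();
        run(4, 4, 200_000);   // one channel per thread
        long t2 = System.nanoTime();
        System.out.printf("shared: %d ms, per-thread: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Any timing gap this prints depends heavily on the JDK, OS and hardware, so it is best read as an illustration of the contention rather than as a benchmark.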
» Multiple iterations
To fix this issue we had to dive deep into the guts of Java’s non-blocking API to see exactly which of the many possible solutions would be the best to get rid of potential bottlenecks.
We stacked up the standard SFS2X implementation against three prototypes to see the differences:
- Solution #1: non-blocking UDP w/ DatagramChannel cache (to avoid thread contention)
- Solution #2: non-blocking UDP w/ DatagramChannel cache + dedicated thread pool and message queue (implementing a more aggressive write policy for UDP, compared to TCP)
- Solution #3: old-school blocking UDP + dedicated thread pool and message queue
Running all these tests on a dedicated quad-core Xeon server, we found that the standard SFS2X implementation (v2.14.0) would max out at ~215 Kpps with plenty of CPU still available to keep going.
- Solution #1 provided twice the throughput, with some CPU still to spare (more details here)
- Solution #2 was able to max out all the CPU and push over 1 million pps, which is the kind of result we were looking for (for more details check the last section of this article)
- Solution #3 provided almost the same results as #2
At the end of all our tests, Solution #2 emerged as the most efficient way to solve the throughput issues, thanks to a dedicated thread pool and channel cache that can be fine-tuned for the use case and the hardware available.
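To make the moving parts of Solution #2 concrete, here is a rough sketch of a bounded message queue feeding a fixed pool of writer threads, each stepping through a cache of non-blocking DatagramChannels. It is not the actual SFS2X engine: `UdpWriterSketch`, the drop-when-full policy, the strided channel selection and the use of unbound sender channels are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names and drop policy are assumptions, not SFS2X internals.
class UdpWriterSketch implements AutoCloseable {
    private static final class Packet {
        final ByteBuffer data; final SocketAddress target;
        Packet(ByteBuffer data, SocketAddress target) { this.data = data; this.target = target; }
    }

    private final BlockingQueue<Packet> queue;   // dedicated UDP message queue
    private final DatagramChannel[] cache;       // DatagramChannel cache
    private final ExecutorService writers;       // dedicated UDP writer pool
    private final int threads;
    private final AtomicLong written = new AtomicLong();
    private volatile boolean running = true;

    UdpWriterSketch(int threads, int queueMaxSize, int cacheSize) throws IOException {
        this.threads = threads;
        this.queue = new ArrayBlockingQueue<>(queueMaxSize);
        this.cache = new DatagramChannel[cacheSize];
        for (int i = 0; i < cacheSize; i++) {
            cache[i] = DatagramChannel.open();
            cache[i].configureBlocking(false);
        }
        this.writers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final int id = i;
            writers.execute(() -> writeLoop(id));
        }
    }

    /** Queues a packet; returns false (packet dropped) when the queue is full. */
    boolean enqueue(ByteBuffer data, SocketAddress target) {
        return queue.offer(new Packet(data, target));
    }

    long writtenCount() { return written.get(); }

    private void writeLoop(int id) {
        // Each writer steps through the cache with a stride equal to the pool size, so
        // writers never share a channel when cacheSize is a multiple of the thread count.
        int next = id % cache.length;
        while (running || !queue.isEmpty()) {
            try {
                Packet p = queue.poll(10, TimeUnit.MILLISECONDS);
                if (p == null) continue;
                DatagramChannel ch = cache[next];
                next = (next + threads) % cache.length;
                // A non-blocking send() returns 0 when the socket buffer is full;
                // this sketch simply drops the packet in that case.
                if (ch.send(p.data, p.target) > 0) written.incrementAndGet();
            } catch (IOException e) {
                // UDP is best effort: drop the packet and keep the writer alive.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    @Override
    public void close() {
        running = false;   // writers drain the queue, then exit
        writers.shutdown();
        try {
            writers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (DatagramChannel ch : cache) {
            try { ch.close(); } catch (IOException ignored) { }
        }
    }

    /** Sends `packets` small datagrams to a loopback receiver; returns how many were written. */
    static long demo(int packets) {
        try (DatagramChannel receiver = DatagramChannel.open()) {
            receiver.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketAddress target = receiver.getLocalAddress();
            UdpWriterSketch writer = new UdpWriterSketch(4, 1000, 16);
            for (int i = 0; i < packets; i++) writer.enqueue(ByteBuffer.wrap(new byte[16]), target);
            writer.close();
            return writer.writtenCount();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The three constructor arguments play the same roles as a writer pool size, a queue limit and a channel cache size. A real server would also need replies to leave from the port clients are talking to, which this sketch does not attempt.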
» Revamped server engine
As we’ve mentioned, SmartFoxServer 2.15 comes with a dedicated thread pool and message queue for UDP communications, alongside a new DatagramChannel cache.
What does this all mean for developers?
The good news is that you get all the benefits of the revamped UDP engine out of the box, without any extra requirements. By default the server will scale better than before, without the need to tweak the configuration or update your code.
If you’re running a high-traffic real-time game that relies on the UDP protocol, you will be interested in learning about a few new settings available in SFS2X 2.15.
We have introduced several new low-level settings that can be tweaked in certain scenarios. These settings can be added to SFS2X/config/core.xml:
- udpSocketWriterThreadPoolSize: sets the size of the dedicated UDP thread pool. If not specified it uses the same value as socketWriterThreadPoolSize
- udpSocketWriterQueueMaxSize: sets the maximum size of the dedicated UDP message queue. Default size = 250000
- datagramChannelCacheSize: sets the size of the DatagramChannel cache. Default value = 80
For high-traffic real-time games with a high UDP packet rate you may want to fine-tune the thread pool by setting its size to the number of cores available on your machine.
If you’re running a massively multi-core server (e.g. 32+ cores) you may also want to increase the cache size to something like nThreads * 4.
For instance, on a 96-core machine with extremely high UDP traffic, you could set the UDP thread pool size to 64 and the cache to 64 * 4 = 256.
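The sizing rule of thumb above can be written down as a tiny helper. This is only a sketch: `UdpTuning` and its methods are hypothetical names, and keeping the cache at or above the default of 80 is an assumption rather than a documented rule.

```java
// Hypothetical helper: names are illustrative and not part of the SFS2X API.
class UdpTuning {
    // Default datagramChannelCacheSize quoted in the settings list.
    static final int DEFAULT_CACHE_SIZE = 80;

    /** Starting point for udpSocketWriterThreadPoolSize: one writer thread per available core. */
    static int suggestedThreads() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** nThreads * 4, never going below the default (the lower bound is an assumption). */
    static int suggestedCacheSize(int udpThreads) {
        return Math.max(DEFAULT_CACHE_SIZE, udpThreads * 4);
    }

    public static void main(String[] args) {
        int threads = suggestedThreads();
        System.out.println("udpSocketWriterThreadPoolSize = " + threads);
        System.out.println("datagramChannelCacheSize      = " + suggestedCacheSize(threads));
    }
}
```

With 64 UDP threads `suggestedCacheSize` returns 256, matching the 96-core example. The values still have to be entered by hand in core.xml and validated under real load.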
SmartFoxServer 2.15 comes as an update for any previous installation of 2.14.x, which makes it particularly easy to upgrade existing setups. If you’re running a game based on the UDP protocol for thousands of players we highly recommend this update; otherwise it can be skipped for the time being.
Finally, for those interested in the fine details, we provide below the results of our 1M+ pps (1 million packets/sec) stress test done with SFS2X 2.15, running on a relatively cheap quad-core server.
» 2.15 test results
- Server hardware:
  - Quad-core Xeon E3-1578L (w/ hyper-threading) @ 2.0 GHz
  - 32GB RAM
  - 240GB SSD
  - 10 Gbps network
  - SFS2X 2.15, JRE 8
- SmartFoxServer custom settings:
  - Extension thread pool: 12
  - UDP thread pool: 8
  - DatagramChannel cache: 128
- Stress test parameters:
  - Client packet rate: 30 pps
  - Players per room: 16
  - Total CCU: 2200
  - Total Rooms: 138
  - Client generation speed: 30ms
This means that every Room generates:
16 players * 30 pps = 480 pps (messages sent to the server)
480 pps * 16 players = 7680 pps (updates sent back from the server to a single Room)
The global outgoing packet rate is: 7680 pps * 138 Rooms = ~1.06 Mpps
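For readers who want to plug in their own numbers, the same arithmetic can be expressed as a short calculation. `PacketRate` and `outgoingPps` are hypothetical names, and the model assumes, as the figures above do, that every message is relayed to all players in the Room.

```java
// Hypothetical helper reproducing the arithmetic of the stress test.
class PacketRate {

    /** Total packets per second sent by the server across all Rooms. */
    static long outgoingPps(int playersPerRoom, int clientPps, int rooms) {
        long incomingPerRoom = (long) playersPerRoom * clientPps;   // 16 * 30 = 480
        long outgoingPerRoom = incomingPerRoom * playersPerRoom;    // 480 * 16 = 7680
        return outgoingPerRoom * rooms;
    }

    public static void main(String[] args) {
        System.out.println(outgoingPps(16, 30, 138));   // prints 1059840
    }
}
```

Because the outgoing rate grows with the square of the Room size, doubling the players per Room at the same CCU roughly doubles the outgoing pps.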