High number of threads with 1000 players

shahidbits2012
Posts: 16
Joined: 09 Jun 2021, 07:23

Postby shahidbits2012 » 22 Jun 2021, 10:16

Hi,

We are seeing a thread count of ~1200 in the SFS Admin Dashboard with 1000 users connected to SFS via socket (port 9933). Is this expected, or are we doing something wrong?

Also, we are seeing multiple SFS Worker Ext threads with high CPU utilisation (around 16% each). What are these worker threads for, and how is their life cycle managed?

Thanks
Lapo
Site Admin
Posts: 22999
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: High number of threads with 1000 players

Postby Lapo » 22 Jun 2021, 14:22

Hi,
the thread count can increase if the workload on the server requires it, i.e. if there are lots of long-running operations and the thread pools are all busy. Another reason for high thread usage can be Websockets, as Tomcat uses fairly large thread pools, although typically they don't go that high in number.

Also, the thread count can be high if you're creating lots of TaskScheduler instances in your server side code.

"What are these worker threads for and how its life-cycle is maintained?"

They are the threads that run Extension code (i.e. your custom server side code). If you want to learn more you should probably obtain a thread dump from the AdminTool > Dashboard and post the result here.

Actually, since it's probably going to be a very long text, send us an email to support@... with the dump in text format (as attachment).
We'll take a look.

Question: is the CPU load also high? What's the average?

Thanks
Lapo
--
gotoAndPlay()
...addicted to flash games
shahidbits2012
Posts: 16
Joined: 09 Jun 2021, 07:23

Re: High number of threads with 1000 players

Postby shahidbits2012 » 23 Jun 2021, 05:20

Question: is the CPU load also high? What's the average?
- Yes. The CPU peaks at 100%, with an average of 40-60%.

We have a leaderboard use case: the leaderboard needs to be shared with every player in the room at a regular interval (2 s). We store each player's game progress in a cache hosted on the same machine. The cache does not consume more than 10% of the CPU.
Every 2 s, we get all the players from the cache, sort them, and send 50 records to each player (+/- 25 from the player's rank) as an extension response.
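
For illustration, the per-player slice described above (50 records, +/- 25 around the player's rank) could be sketched roughly as follows. This is not the poster's actual code: `LeaderboardWindow`, `Entry` and the integer `score` field are hypothetical names, and it assumes a higher score means a better rank. The idea is to sort the full list once per tick and then take a clamped window around each player's index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the SFS2X API.
final class LeaderboardWindow {

    // Assumed shape of one cached progress record.
    static final class Entry {
        final String playerId;
        final int score;

        Entry(String playerId, int score) {
            this.playerId = playerId;
            this.score = score;
        }
    }

    // Returns a new list sorted by score, highest first. The input is not modified.
    static List<Entry> sortByScore(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt((Entry e) -> e.score).reversed());
        return sorted;
    }

    // Returns the entries in [rankIndex - radius, rankIndex + radius),
    // clamped to the list bounds. The result is a view, not a copy.
    static List<Entry> windowAround(List<Entry> sorted, int rankIndex, int radius) {
        int from = Math.max(0, rankIndex - radius);
        int to = Math.min(sorted.size(), rankIndex + radius);
        return sorted.subList(from, to);
    }
}
```

With a radius of 25 this yields 50 entries for players in the middle of the list and fewer near the top or bottom. Since `subList` returns a view, the per-player step does no copying; the cost of serialising and sending each window is separate from this.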

Machine Config
We are using a single EC2 machine (8 cores, 64GB). Memory looks fine (<16GB), but the CPU becomes the bottleneck once the CCU count reaches 1000+.

Stress test client:
We are using a Node JS client for stress testing, with each client connecting via socket and sending an extension request with its progress to the SFS server every 2 s. The server stores this data in the cache and performs the sorting logic every 2 s, as described above.

Attachments (dashboard screenshots):
CPU: Screen Shot 2021-06-23 at 10.27.01 AM.png
Memory: Screen Shot 2021-06-23 at 10.28.21 AM.png
Message Q: Screen Shot 2021-06-23 at 10.29.22 AM.png

We are ready for launch but are blocked by this high CPU usage issue. Can you please help us here?
Lapo
Site Admin
Posts: 22999
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: High number of threads with 1000 players

Postby Lapo » 23 Jun 2021, 06:58

Hi,
the graphs you have posted definitely show that most of the hard work is done on the Extension side.
So it seems like the operation of getting the 50 entries and sorting them for each player is using significant resources, unless there's some other heavy work that your Extension does.

To make sure this is the cause of the bottleneck I'd recommend timing the execution of these leaderboard updates, and logging them to the log files so that you can easily see how long it takes for each request to complete. This will give a clearer picture of how expensive the operation is.

Also, is the server side code written in Java?
Maybe you could investigate the sorting code and see if there's any optimization that can be done. Alternatively, you could think of using a longer interval for these updates, to take some load off the server.

A more radical approach could be to move the cache and the sorting logic to a different machine, dedicated to that job. This way SFS2X can focus on handling the game logic rather than working as a database.
In this scenario SFS2X could simply request the pre-sorted results from the other machine running in the private AWS network, even by using plain HTTP.

Hope it helps
Lapo

shahidbits2012
Posts: 16
Joined: 09 Jun 2021, 07:23

Re: High number of threads with 1000 players

Postby shahidbits2012 » 24 Jun 2021, 20:57

Hi Lapo,

Please find more data points.

We are using the TaskScheduler class provided by SFS to schedule a task at a fixed rate (every 2 s). In this task, we perform the following steps:
Step 1) Get players' data from cache
Step 2) Sort players based on a parameter
Step 3) Send sorted data to every player in the room
Step 4) Save data back to cache
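
To see how long each of these steps takes, a small per-step timer could be wrapped around them. The following is a minimal sketch only: `StepTimer` and the step names are hypothetical and not part of the SFS2X API, and it assumes each step can be expressed as a `Runnable`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that accumulates elapsed time per named step.
final class StepTimer {

    // LinkedHashMap keeps the steps in the order they were first timed.
    private final Map<String, Long> nanosByStep = new LinkedHashMap<>();

    // Runs the step and adds its elapsed time to the total for that name,
    // even if the step throws.
    void time(String name, Runnable step) {
        long start = System.nanoTime();
        try {
            step.run();
        } finally {
            nanosByStep.merge(name, System.nanoTime() - start, Long::sum);
        }
    }

    // Total time recorded for a step in milliseconds, or 0 if it never ran.
    long millis(String name) {
        return nanosByStep.getOrDefault(name, 0L) / 1_000_000;
    }

    // One-line summary such as "fetch=3ms, sort=17ms", suitable for a log line.
    String summary() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByStep.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue() / 1_000_000).append("ms");
        }
        return sb.toString();
    }
}
```

Inside the scheduled task, each step would be passed to `time(...)` under its own name and the `summary()` string written to the log once per iteration.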

This is how we are starting the scheduler -

Code:

getSfs().getTaskScheduler().scheduleAtFixedRate(scheduler, 0, 2, TimeUnit.SECONDS);


and, this is the task -

Code:

public void run() {
    try {
        long startTime = System.nanoTime();
        extension.trace("CommonLeaderboardScheduler :: Leaderboard Scheduler (roomId= " + CommonUtils.getRoomId(room) + ") iteration #" + runningCycles);
        leaderboardManager.getLeaderboard(leaderboardConfig.leaderboardType).broadcast(this.room);
        runningCycles++;
        long endTime = System.nanoTime();
        long durationInMillis = (endTime - startTime) / 1000000;
        extension.trace("CommonLeaderboardScheduler :: Leaderboard Scheduler (roomId= " + CommonUtils.getRoomId(room) + "): " + durationInMillis + "ms");
    } catch (Exception e) {
        extension.trace("CommonLeaderboardScheduler :: ERROR: Exception occurred in Leaderboard Scheduler (roomId= " + CommonUtils.getRoomId(room) + ") iteration #" + runningCycles, e.getMessage());
        extension.trace(e);
    }
}


We are using a Java 8 parallel stream to broadcast the data.

Code:

room.getUserList().parallelStream().forEach(user -> sfs.getAPIManager().getSFSApi().sendExtensionResponse())
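
For comparison, `parallelStream()` runs on the JVM's common ForkJoinPool by default, which is shared with any other parallel streams in the same process. One possible alternative is a dedicated fixed-size pool with the recipients split into batches. The sketch below only illustrates that shape: `BatchedBroadcast` is a hypothetical helper, and the `sender` callback stands in for the real per-user send call, whose arguments are not shown in the post. Whether it performs better than the parallel stream would have to be measured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of the SFS2X API.
final class BatchedBroadcast {

    // Splits the items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Submits one task per batch to the given pool and blocks until all
    // batches have been processed. The sender is invoked once per recipient.
    static <T> void broadcast(List<T> recipients, int batchSize,
                              ExecutorService pool, Consumer<T> sender) throws Exception {
        List<Future<?>> pending = new ArrayList<>();
        for (List<T> batch : partition(recipients, batchSize)) {
            pending.add(pool.submit(() -> batch.forEach(sender)));
        }
        for (Future<?> f : pending) {
            f.get(); // rethrows any failure from the batch as ExecutionException
        }
    }
}
```

The pool would be created once (for example with `Executors.newFixedThreadPool(n)`) and reused across iterations rather than created per run; the batch size and `n` are tuning parameters, not recommendations.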


The overall execution details of this task can be found below (the last column lists the attached dashboard screenshot, where one was posted):

CCU                 | Socket Connections | Task Execution Time (avg) | Dashboard Snapshot
100                 | 102                | 65ms                      |
200                 | 201                | 110ms                     | CCU=200.png
500                 | 1070               | 320ms                     | CCU=500.png
800                 | 2641               | 600ms                     |
1000                | 2531               | 650ms                     |
1000 (after 5 min)  | 1808               | 1300ms                    |
1000 (after 10 min) | 1849               | 1800ms                    | CCU=1000 after 10 mins.png
1000 (after 15 min) | 1097               | 800ms                     |
1000 (after 30 min) | 1010               | 700ms                     |

For CCU=1000, here is the breakdown of the subtask timing -

Step 1) + Step 2) = 20ms
Step 3) = 700ms
Step 4) = 12ms

Clearly, broadcasting dominates: 700ms out of 732ms, i.e. about 95% of the task's execution time.
Q1. Can you help us with the best way to broadcast data to 1000-2000 users every 2 s?
Q2. Should we use a Java Executor fixed thread pool instead of a parallel stream?
Q3. Does the SFS API provide something for such an operation?
Q4. What should the system and extension thread pool sizes be for this use case?
Q5. Also, when the socket connection count came down, performance improved. Why are we seeing such a high number of socket connections, e.g. for CCU=800?
Q6. And, for 1000 CCU, is 500-1000ms of CPU-intensive work every 2 seconds high for an 8 core/16GB EC2 machine?

We will be highly grateful.

Thanks!
