Problems after upgrading to SFS2X 2.13.0

Post here your questions about SFS2X. Here we discuss all server-side matters. For client API questions see the dedicated forums.

Moderators: Lapo, Bax

User avatar
rewb0rn
Posts: 177
Joined: 26 Jan 2009, 12:54
Location: Berlin, Germany

Problems after upgrading to SFS2X 2.13.0

Postby rewb0rn » 20 Jul 2017, 07:49

Hello team,

we are experiencing issues after doing an upgrade to SFS2X 2.13.0. We downloaded the SFS 2.13.0 package and put it next to the old 2.11.1 package on our Linux machine. We then manually migrated all configurations and settings to the SFS 2.13.0 installation and started the server. I.e. the hardware and OS (Ubuntu 14) remained the same.

The issue is that we experienced a heavy drop of users several times a day:

smartfox-2017-07-19.png
(87.48 KiB) Not downloaded yet


The X is the value that we received from our custom smartfox monitor, that logs in every minute and reports with a SMS if the user count drops significantly, which was the case at 23:57 with 196 CCU.

Our clients are automatically disconnecting if they do not receive a custom PONG command within 20 seconds after sending a PING, this was the only way for us to reliably detect a silent disconnect on mobile devices. It is possible that this is the reason why the users were disconnecting, but we do not know for sure (we added a log for the next client update).

We now want to understand why the server was freezing, dropping users or was unresponsive and what we have to change in our configuration to fix the problem. My first guess was Memory issues and looking at Munin, this still might be the case:

smartfox-munin-2017-07-19.png
(132.43 KiB) Not downloaded yet


You can see that the memory consumption is much lower since we upgraded to 2.13.0. Could it be that the default JVM memory settings have changed?

After the crash I carefully compared old and new server.xml and core.xml config files. The only notable difference I found was that there is a new setting for the bluebox in 2.13.0. Did any default values change since 2.11.1 that might cause performance issues?

Code: Select all

<clientBlueBoxSettings>
    <useBlueBox>true</useBlueBox>
    <blueBoxPollingRate>700</blueBoxPollingRate>
  </clientBlueBoxSettings>
 




We also checked the smarfox logs. There are some exceptions, although it is not clear to us if they are related to the problem:

Code: Select all

19 Jul 2017 | 23:53:47,006 | WARN  | SocketWriter-1 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 114917, Type: BLUEBOX, Logged: Yes, IP: xxx }, ca6ccaa9bb79995402d9297572f4f264}
19 Jul 2017 | 23:54:31,828 | WARN  | SocketWriter-1 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 114950, Type: BLUEBOX, Logged: Yes, IP: xxx  }, 6ab1e75683ffd15b8a689c90aae162bd}
19 Jul 2017 | 23:54:35,758 | WARN  | SocketWriter-1 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 114968, Type: BLUEBOX, Logged: Yes, IP: xxx  }, 8a002ffa689a4f067673f9459804293b}
19 Jul 2017 | 23:54:36,768 | WARN  | SocketWriter-3 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 114593, Type: BLUEBOX, Logged: Yes, IP: xxx  }, d4ac492c8262416849a9923f47c8586c}
19 Jul 2017 | 23:56:03,628 | WARN  | qtp1495407511-3082 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: a7f0abbd5de644818d2f54387904ba38, cmd: data, data: 46 bytes }
19 Jul 2017 | 23:56:04,333 | WARN  | qtp1495407511-3070 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: a7f0abbd5de644818d2f54387904ba38, cmd: poll, data: NoData }
19 Jul 2017 | 23:56:54,660 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 115164, Type: DEFAULT, Logged: Yes, IP: xxx  }
19 Jul 2017 | 23:56:54,661 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:26,626 | WARN  | qtp1495407511-3062 | bluebox.v3.SessionFilter |     | Invalid request: { ssid: null, cmd: poll, data: NoData }, from: 141.113.11.11
19 Jul 2017 | 23:57:31,992 | WARN  | SFSWorker:Ext:487 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 102274, Type: DEFAULT, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:57:31,992 | WARN  | SFSWorker:Ext:487 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 110302, Type: DEFAULT, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:57:34,645 | WARN  | SFSWorker:Ext:489 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 115184, Type: BLUEBOX, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:57:34,747 | WARN  | SocketWriter-3 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 95649, Type: BLUEBOX, Logged: Yes, IP: xxx  }, b815b43a37b7c48b1d566c7a7bd681b3}
19 Jul 2017 | 23:57:34,747 | WARN  | SocketWriter-3 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 106755, Type: BLUEBOX, Logged: Yes, IP: xxx  }, 97c017b82fbd22eb7186821d8f2fc07f}
19 Jul 2017 | 23:57:34,748 | WARN  | SocketWriter-1 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 111530, Type: BLUEBOX, Logged: Yes, IP: xxx  }, 861b8d2065c9f4afa1fe0d285654d0f9}
19 Jul 2017 | 23:57:34,748 | WARN  | SocketWriter-3 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 114343, Type: BLUEBOX, Logged: Yes, IP: xxx  }, b1325f52bfe8728ff0ba91ef78b4245c}
19 Jul 2017 | 23:57:34,750 | WARN  | SocketWriter-3 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 113785, Type: BLUEBOX, Logged: Yes, IP: xxx  }, b9641cdecd094679bfa1124019b44643}
19 Jul 2017 | 23:57:34,822 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 114927, Type: DEFAULT, Logged: Yes, IP: xxx  }
19 Jul 2017 | 23:57:34,822 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:34,856 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 114675, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:34,856 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:34,863 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 115131, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:34,863 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:34,864 | WARN  | SocketWriter-3 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 105521, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:34,864 | WARN  | SocketWriter-3 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:34,883 | WARN  | SocketWriter-3 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 113743, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:34,883 | WARN  | SocketWriter-3 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:35,031 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 114978, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:35,031 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:35,094 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 115596, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:35,094 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:35,820 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 105179, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:35,820 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:36,705 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 115189, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:36,705 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:37,134 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 116524, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:37,134 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:37,242 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 116258, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:37,242 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:37,332 | WARN  | SocketWriter-3 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 115228, Type: BLUEBOX, Logged: Yes, IP: xxx  }, b7c831cbd8511e5f819926cb0e3fbebc}
19 Jul 2017 | 23:57:37,352 | WARN  | SocketWriter-1 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 115226, Type: BLUEBOX, Logged: Yes, IP: xxx  }, 868ba2e24395dbe4ef8e1afb60e53789}
19 Jul 2017 | 23:57:37,383 | WARN  | SocketWriter-3 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 115598, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:37,383 | WARN  | SocketWriter-3 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:37,413 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 115704, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:37,414 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:37,469 | WARN  | SocketWriter-1 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 116757, Type: BLUEBOX, Logged: Yes, xxx  }, b9e68664b7ed43234a48a34c551d59b9}
19 Jul 2017 | 23:57:37,682 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 116028, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:37,682 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:39,150 | WARN  | qtp1495407511-3061 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: b7c831cbd8511e5f819926cb0e3fbebc, cmd: poll, data: NoData }
19 Jul 2017 | 23:57:40,950 | WARN  | SocketWriter-2 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 114076, Type: BLUEBOX, Logged: Yes, IP: xxx  }, f97eeaef253a29f54af680b39f900418}
19 Jul 2017 | 23:57:42,607 | WARN  | qtp1495407511-3072 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: 868ba2e24395dbe4ef8e1afb60e53789, cmd: data, data: 54 bytes }
19 Jul 2017 | 23:57:42,930 | WARN  | SocketWriter-3 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 116897, Type: BLUEBOX, Logged: Yes, IP: xxx  }, 5c9c02da257ae851e2abcfdd6da6f48b}
19 Jul 2017 | 23:57:43,035 | WARN  | SocketWriter-3 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 116076, Type: BLUEBOX, Logged: Yes, IP: xxx  }, 39a6051d785a9f5a5e505c06c0f5b9c6}
19 Jul 2017 | 23:57:44,056 | WARN  | qtp1495407511-3072 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: 5c9c02da257ae851e2abcfdd6da6f48b, cmd: poll, data: NoData }
19 Jul 2017 | 23:57:44,277 | WARN  | SFSWorker:Ext:428 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 117215, Type: DEFAULT, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:57:44,471 | WARN  | qtp1495407511-3019 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: 39a6051d785a9f5a5e505c06c0f5b9c6, cmd: poll, data: NoData }
19 Jul 2017 | 23:57:44,534 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 115504, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:44,534 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:45,854 | WARN  | qtp1495407511-3062 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: b9e68664b7ed43234a48a34c551d59b9, cmd: data, data: 45 bytes }
19 Jul 2017 | 23:57:49,100 | WARN  | SFSWorker:Ext:635 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 117358, Type: DEFAULT, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:57:49,355 | WARN  | SFSWorker:Ext:582 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 117495, Type: DEFAULT, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:57:49,668 | WARN  | SFSWorker:Ext:67 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 117569, Type: DEFAULT, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:57:50,004 | WARN  | SFSWorker:Ext:634 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 117590, Type: DEFAULT, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:57:50,019 | WARN  | SFSWorker:Ext:204 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 117558, Type: DEFAULT, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:57:50,303 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 115946, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:57:50,303 | WARN  | SocketWriter-1 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:57:50,481 | WARN  | qtp1495407511-3080 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: 01f517ca2f3bef12b84cb9bd8a3cad97, cmd: poll, data: NoData }
19 Jul 2017 | 23:57:56,042 | WARN  | SFSWorker:Ext:621 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 117774, Type: DEFAULT, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:57:58,321 | WARN  | qtp1495407511-3087 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: 868ba2e24395dbe4ef8e1afb60e53789, cmd: poll, data: NoData }
19 Jul 2017 | 23:57:58,407 | WARN  | qtp1495407511-3082 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: b9e68664b7ed43234a48a34c551d59b9, cmd: poll, data: NoData }
19 Jul 2017 | 23:58:01,931 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 117215, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:58:01,931 | WARN  | SocketWriter-2 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:58:02,057 | WARN  | SocketWriter-1 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 118047, Type: BLUEBOX, Logged: Yes, IP: xxx  }, 2fb39a611ee9ea3861d452e933ec05c6}
19 Jul 2017 | 23:58:02,410 | WARN  | SFSWorker:Ext:421 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 118192, Type: DEFAULT, Logged: No, IP: xxx  }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:745)

19 Jul 2017 | 23:58:03,146 | WARN  | SocketWriter-3 | bitswarm.core.SocketWriter |     | Error during write. Session: { Id: 117991, Type: DEFAULT, Logged: No, IP: xxx  }
19 Jul 2017 | 23:58:03,146 | WARN  | SocketWriter-3 | bitswarm.core.SocketWriter |     | java.lang.NullPointerException

19 Jul 2017 | 23:58:04,557 | WARN  | qtp1495407511-3087 | bluebox.v3.BlueBox |     | Invalid request, session expired. Message: { ssid: 2fb39a611ee9ea3861d452e933ec05c6, cmd: poll, data: NoData }
19 Jul 2017 | 23:58:32,597 | WARN  | SocketWriter-3 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 118039, Type: BLUEBOX, Logged: Yes, IP: xxx  }, cd093014b4acc4b45b3c223e51b6db5c}
19 Jul 2017 | 23:59:41,407 | WARN  | SocketWriter-1 | v3.data.BBClient |     | Client BBox message queue limit was reached. Disconnecting: {BBClient - { Id: 118701, Type: BLUEBOX, Logged: Yes, IP: xxx  }, 08ca179b413575cc6f4f183d2db7d484}




Please advise, thanks in advance,
Ruben
User avatar
Lapo
Site Admin
Posts: 23025
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: Problems after upgrading to SFS2X 2.13.0

Postby Lapo » 20 Jul 2017, 08:32

Hello there,
rewb0rn wrote:we are experiencing issues after doing an upgrade to SFS2X 2.13.0. We downloaded the SFS 2.13.0 package and put it next to the old 2.11.1 package on our Linux machine. We then manually migrated all configurations and settings to the SFS 2.13.0 installation and started the server. I.e. the hardware and OS (Ubuntu 14) remained the same.

The issue is that we experienced a heavy drop of users several times a day:

Taking a look at the user diagram, it doesn't look like something very suspicious. The user count gradually diminishes over the course of several hours, it's not a sudden downward spike which could indicate network issues or something like that.

What exactly makes you think this is anomalous instead of the normal flow of users coming and going?
Maybe you have increasing complaints from users being disconnected at certain hours?


Our clients are automatically disconnecting if they do not receive a custom PONG command within 20 seconds after sending a PING, this was the only way for us to reliably detect a silent disconnect on mobile devices. It is possible that this is the reason why the users were disconnecting, but we do not know for sure (we added a log for the next client update).

Okay.
One way to investigate is logging these "silent disconnections" to your log files so that you can later inspect them and see if those are increasing in those hours.


We now want to understand why the server was freezing, dropping users or was unresponsive

I need a clarification.
Do you have evidence that the server froze or dropped users, or is it your assumption based on the above graph?
It would be pretty easy to write a simple Java client that logs in and logs out every few minutes and if it fails it reports an error to your admins.

My first guess was Memory issues and looking at Munin, this still might be the case:

smartfox-munin-2017-07-19.png

You can see that the memory consumption is much lower since we upgraded to 2.13.0. Could it be that the default JVM memory settings have changed?

Well, that's the memory consumption of the whole machine. There is no breakdown of the individual apps running in the system so why are you assuming this is caused by SmartFoxServer and not something else?

It would be better to check/compare the memory stats directly via the SFS AdminTool.

In those munin graphs the apps memory (the green area) is essentially the same, what changes significantly is the blue area ( cache) but I doubt this is related to SmartFoxServer.

After the crash

What do you mean by crash?
A server crash means that it becomes unresponsive and no one can connect. I think you're already jumping to conclusions that are not granted by the data you have shown here.

I carefully compared old and new server.xml and core.xml config files. The only notable difference I found was that there is a new setting for the bluebox in 2.13.0. Did any default values change since 2.11.1 that might cause performance issues?

No, not really.

As regards the attached logs there's nothing that alludes to performance issues, crashes or otherwise.

My suggestion is to monitor the traffic for multiple days and see if the pattern continues, also keep an eye on the SFS AdminTool's Dashboard, especially the "System Queues" tab and see if they get very busy, especially at the hours that you have highlighted.

If you want you can send us screenshost and logs via email to our support@... mail box, if it's more convenient for you.

Hope it helps
Lapo
--
gotoAndPlay()
...addicted to flash games
User avatar
rewb0rn
Posts: 177
Joined: 26 Jan 2009, 12:54
Location: Berlin, Germany

Re: Problems after upgrading to SFS2X 2.13.0

Postby rewb0rn » 20 Jul 2017, 08:42

Hi Lapo,

thanks for your message.

We have dozens of user reports were people complain they were disconnected. The descriptions match the drops that are visible in the screenshot at 18:17, 19:52 and 23:52. As I said we already have a custom monitor that logs in every minute and it reported a CCU of 196 players on 23:52 that is not visible in the screenshot because the CCU logging of the Smartfox has a larger interval than our own monitor. But the CCU at 23:52 definitely was as low as 196 people.

You are right about the memory Cache consumption. While we do not have any additional services installed or removed since the upgrade to 2.13.0, of course some other process might have increased the Cache. However we definitely do have a problem with the service right now and I am just looking at everything I have that might cause the issue. Right now I am unsure how to proceed because we can not have another test run on 2.13.0 until the issue is (presumably) solved.

Thanks in advance.
User avatar
rewb0rn
Posts: 177
Joined: 26 Jan 2009, 12:54
Location: Berlin, Germany

Re: Problems after upgrading to SFS2X 2.13.0

Postby rewb0rn » 20 Jul 2017, 08:54

There are 2 other interesting Munin graphs that correlate to the user drop times. The small increases in memory consumption on the first Munin screenshot also correlate to these peaks. Any idea what might cause the context switches?

smartfox-munin-2-2017-07-19.png
(72.6 KiB) Not downloaded yet


Addendum: We would like to get memory data from the SFS2X dashboard, but the memory and CPU data is not accessible when no admin is online, correct? I do not know of a way to access memory and CPU data for the past.
User avatar
Lapo
Site Admin
Posts: 23025
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: Problems after upgrading to SFS2X 2.13.0

Postby Lapo » 20 Jul 2017, 09:08

rewb0rn wrote:We have dozens of user reports were people complain they were disconnected. The descriptions match the drops that are visible in the screenshot at 18:17, 19:52 and 23:52. As I said we already have a custom monitor that logs in every minute and it reported a CCU of 196 players on 23:52 that is not visible in the screenshot because the CCU logging of the Smartfox has a larger interval than our own monitor. But the CCU at 23:52 definitely was as low as 196 people.

I need to better understand this part. You say your monitoring tool recorded 196 users at around midnight, but from the server monitoring the situation is completely different.

I am not sure I understand how this works. You disconnect people that don't respond to a certain signal, but at the same time the user count is totally different from what you report. I am not sure I understand how this is managed in your server code?

What happens when a client doesn't respond to your PONG signal? Do you physically call a disconnect() on the user and shutdown the connection? Something else?

Thanks
Lapo

--

gotoAndPlay()

...addicted to flash games
User avatar
Lapo
Site Admin
Posts: 23025
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: Problems after upgrading to SFS2X 2.13.0

Postby Lapo » 20 Jul 2017, 09:11

rewb0rn wrote:Addendum: We would like to get memory data from the SFS2X dashboard, but the memory and CPU data is not accessible when no admin is online, correct? I do not know of a way to access memory and CPU data for the past.

No we don't log all those params persistently as it would generate tons of data very quickly.
If there's a pattern (i.e. it happens at the same hour) you can take a look at that hour.

What is the average ratio of socket vs HTTP/BlueBox connections? You can take a look at the AdminTool Dashboard to find it. Maybe take a look during peak hours, such as late afternoon.

Thanks
Lapo

--

gotoAndPlay()

...addicted to flash games
User avatar
rewb0rn
Posts: 177
Joined: 26 Jan 2009, 12:54
Location: Berlin, Germany

Re: Problems after upgrading to SFS2X 2.13.0

Postby rewb0rn » 20 Jul 2017, 09:51

Lapo wrote:I need to better understand this part. You say your monitoring tool recorded 196 users at around midnight, but from the server monitoring the situation is completely different.

I am not sure I understand how this works. You disconnect people that don't respond to a certain signal, but at the same time the user count is totally different from what you report. I am not sure I understand how this is managed in your server code?


I don't think the situation is completely different. It just looks like the smartfox is recording CCU numbers every 5 minutes. Our own monitor logs CCU numbers every minute, so the interval is finer and gives more accurate numbers. This is what our monitor was reporting:

23:57: 196 CCU
23:58: 1636 CCU
23:59 1583 CCU

It looks like the auto-reconnect kicked in, so the drop of users was very short and compensated by the reconnect logic that we implemented on the clients.

Lapo wrote:What happens when a client doesn't respond to your PONG signal? Do you physically call a disconnect() on the user and shutdown the connection? Something else?


Correct, if the client does not get a response within 20 seconds after sending a ping, the client calls sfs.disconnect() and starts a reconnection attempt. However we do not know if all the clients disconnected because of this disconnect() call or because of some other events on the server side.

I will give an estimate on bluebox users later today, but from my experience it should usually be less than 5%. Right now there are still 113 users idling on the SFS 2.13.0 and only one of them is connected with BlueBox.
User avatar
Lapo
Site Admin
Posts: 23025
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: Problems after upgrading to SFS2X 2.13.0

Postby Lapo » 20 Jul 2017, 14:00

rewb0rn wrote:I don't think the situation is completely different. It just looks like the smartfox is recording CCU numbers every 5 minutes. Our own monitor logs CCU numbers every minute, so the interval is finer and gives more accurate numbers. This is what our monitor was reporting:

23:57: 196 CCU
23:58: 1636 CCU
23:59 1583 CCU

It looks like the auto-reconnect kicked in, so the drop of users was very short and compensated by the reconnect logic that we implemented on the clients.

So you're calculating the CCU by reading the amount of Users in the zone, correct?
Then you could check the logs at 23:55 or so and see what happened. It may help to understand what might be going on.

However we do not know if all the clients disconnected because of this disconnect() call or because of some other events on the server side.

With the exception of the "User idle timer", the server never disconnects anyone, ever.
And this is pretty easy to investigate because there's a specific message in the log saying that the user was kicked because of the idle timer.

Bottom line, I'd definitely recommend to inspect the logs around (and before) the time those disconnections happened.
The logs you have posted earlier don't show any data related to mass-disconnections so it must have happened before.

cheers
Lapo

--

gotoAndPlay()

...addicted to flash games
User avatar
rewb0rn
Posts: 177
Joined: 26 Jan 2009, 12:54
Location: Berlin, Germany

Re: Problems after upgrading to SFS2X 2.13.0

Postby rewb0rn » 20 Jul 2017, 17:33

Thanks Lapo. Based on customer feedback we are now sure that the disconnects happened from the client side because the PONG request was not received in time (Players confirmed they were seeing a connection warning before the disconnect happened, which we show if the PONG request was not received within 10 seconds). That means for some reason the Smartfox was unable to respond to any requests within a reasonable amount of time on a global level.

Right now there are 1,700 users online on the SFS 2.13.0 and 20 of them are connected with BlueBox.

I will continue to investigate the issue and report if I gain any additional info.
User avatar
Lapo
Site Admin
Posts: 23025
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: Problems after upgrading to SFS2X 2.13.0

Postby Lapo » 21 Jul 2017, 07:51

Supposing the PONG response comes from your Extension a good place to take a look is the server's Extension Queue which may become very busy if threads are taking too long to execute your code.

This can happen when you have many concurrent "slow calls", such as calls to a database or external web service. If you have hundreds of these you might end up causing other Extension requests to queue up, waiting for a free thread to run them.

SFS2X attempts to solve this issue via auto load-balancing threads pools, but some manual config might be in order if you have lots of slow calls.

You might find messages such as these in your logs:

Code: Select all

Added X new threads, current size is: Y

which means that the server is adding more threads to keep up with an increasing Extension queue size.

As usual you can keep an eye on the queue via AdminTool > Dashboard

cheers
Lapo

--

gotoAndPlay()

...addicted to flash games
User avatar
rewb0rn
Posts: 177
Joined: 26 Jan 2009, 12:54
Location: Berlin, Germany

Re: Problems after upgrading to SFS2X 2.13.0

Postby rewb0rn » 18 Jun 2018, 22:38

Hey Lapo,

what is the log level for these logs? WARN or INFO? I do see these in our logs:

Ext :: Queue size is big: 8658, but all backup thread are already active: 1606


So thread queue backups were happening and reached the limit, however, there are no traces like this:

Added X new threads, current size is: Y


I am looking for these logs to identify when the backup was triggered exactly, but it looks like they do not exist in our log. We log warnings only.

Thanks in advance
User avatar
Lapo
Site Admin
Posts: 23025
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: Problems after upgrading to SFS2X 2.13.0

Postby Lapo » 19 Jun 2018, 07:32

Hi
the former is level WARN, and the latter INFO, so that's why you don't see them.

That's a lot of threads :D Are you running lots of slow I/O operations?

Cheers
Lapo

--

gotoAndPlay()

...addicted to flash games
User avatar
rewb0rn
Posts: 177
Joined: 26 Jan 2009, 12:54
Location: Berlin, Germany

Re: Problems after upgrading to SFS2X 2.13.0

Postby rewb0rn » 19 Jun 2018, 09:50

By default we are running 512 threads each on extension and system queue with backups of 128 threads each because we need to call Facebook and our own servers in the USER_LOGIN event. But in our current case, there seems to be another problem where the queue gets very full and the backups are triggered until the maximum is reached. I am not sure what is the reason, maybe we reached the maximum memory of the JVM but that is only a guess at this point. I am currently running a test with larger amount of memory to see if that solves the problem :)
User avatar
Lapo
Site Admin
Posts: 23025
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: Problems after upgrading to SFS2X 2.13.0

Postby Lapo » 19 Jun 2018, 14:13

Hi,
the reason why you see the message is because too many requests as piling up in the queue and threads are not working fast enough to deal with them.

It's unlikely that this is a memory issue (although 1000s of threads will increase memory usage significantly) but rather a bottleneck, where the producer (users sending requests) is working faster than the consumer (each thread dealing with the long waiting I/O).

You can configure the server to use more threads, it's not an issue. The load balancing thread pool automatically generates the backup threads when needed and shuts them down when the load diminishes.

Which queue is the most busy? Did you check? (AdminTool > Dashboard)

Cheers
Lapo

--

gotoAndPlay()

...addicted to flash games
User avatar
rewb0rn
Posts: 177
Joined: 26 Jan 2009, 12:54
Location: Berlin, Germany

Re: Problems after upgrading to SFS2X 2.13.0

Postby rewb0rn » 20 Jun 2018, 07:48

Yes it is the extension queue. However the problem only appears on 2.13.0 and 2.13.1 (2.13.2 only installed yesterday). The last 3 months we were switching back to 2.11.0 and the problem is not happening there. But we are publishing to HTML5 now so we need binary wss protocol.

The load on the server does not seem to be the only indicator. We had it happening with 2 of our 6 games (~3,000 users online) but also the server was stable for a week with all 6 games (~10,000 users online) before it happened. I was thinking that our backup settings were too sensitive, so I switched backup triggering time from 3 seconds to 20 seconds, but 2 days ago it happened again. So I am looking at memory now (was 1.85GB for JVM, is now 3.82 GB).

If you have any other idea let me know.

By the way: Is there a way to read the values that are tracked in the dashboard through code? I would like to have a better record of statistics and write them to a file instead of having the dashboard open in another tab.

Return to “SFS2X Questions”

Who is online

Users browsing this forum: Google [Bot], Ronaldniff, Stevenor and 42 guests