Building a simple stress test tool

One of the questions that often pops up in our forums is “How do I run a stress test on my game?”

There are several ways to do this. A simple way to stress test your server-side Extension is to build a client application that acts as a player, essentially a “bot”, which can be replicated hundreds or thousands of times to simulate a large number of clients.

» Building the client

For this example we will build a simple Java client using the standard SFS2X Java API, which can be downloaded from here. The same could be done using C#, AS3 and so on.

The simple client will connect to the server, log in as a guest, join a specific Room and start sending messages. This basic example can serve as a template for building more complex interactions for your tests.

» Replicating the load

Before we proceed with the creation of the client logic, let’s see how the “Replicator” will work. By this name we mean the top-level application that takes a generic client implementation and generates many copies of it at a constant interval, until all “test bots” are ready.

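import java.io.FileInputStream;
import java.util.LinkedList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class StressTestReplicator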
{
	private final List<BaseStressClient> clients;
	private final ScheduledThreadPoolExecutor generator;

	private String clientClassName;		// name of the client class
	private int generationSpeed = 250; 	// interval between client connections (ms)
	private int totalCCU = 50;			// total number of concurrent users (CCU)

	private Class<?> clientClass;
	private ScheduledFuture<?> generationTask;

	public StressTestReplicator(Properties config)
    {
		clients = new LinkedList<>();
		generator = new ScheduledThreadPoolExecutor(1);

		clientClassName = config.getProperty("clientClassName");

		// Keep the defaults above when a value is missing or is not a valid integer
		try { generationSpeed = Integer.parseInt(config.getProperty("generationSpeed")); } catch (NumberFormatException e) {}
		try { totalCCU = Integer.parseInt(config.getProperty("totalCCU")); } catch (NumberFormatException e) {}

		System.out.printf("%s, %s, %s\n", clientClassName, generationSpeed, totalCCU);

		try
		{
			// Load main client class
			clientClass = Class.forName(clientClassName);

			// Prepare generation
			generationTask = generator.scheduleAtFixedRate(new GeneratorRunner(), 0, generationSpeed, TimeUnit.MILLISECONDS);
		}
		catch (ClassNotFoundException e)
		{
			System.out.println("Specified Client class: " + clientClassName + " not found! Quitting.");
		}
    }

	void handleClientDisconnect(BaseStressClient client)
	{
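		boolean allDone;

		// Check the list inside the lock: clients disconnect on different threads
		synchronized (clients)
        {
	        clients.remove(client);
	        allDone = clients.isEmpty();
        }

		if (allDone)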
		{
			System.out.println("===== TEST COMPLETE =====");
			System.exit(0);
		}
	}

	public static void main(String[] args) throws Exception
    {
		String defaultCfg = args.length > 0 ? args[0] : "config.properties";

		Properties props = new Properties();
		props.load(new FileInputStream(defaultCfg));

	    new StressTestReplicator(props);
    }

	//=====================================================================

	private class GeneratorRunner implements Runnable
	{
		@Override
		public void run()
		{
			try
            {
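	            int currentClients;

	            // Same lock used by handleClientDisconnect()
	            synchronized (clients)
	            {
	            	currentClients = clients.size();
	            }

	            if (currentClients < totalCCU)
	            	startupNewClient();
	            else
	            	generationTask.cancel(true);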
            }
            catch (Exception e)
            {
	            System.out.println("ERROR Generating client: " + e.getMessage());
            }
		}

		private void startupNewClient() throws Exception
		{
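			// Class.newInstance() is deprecated since Java 9; go through the no-arg constructor
			BaseStressClient client = (BaseStressClient) clientClass.getDeclaredConstructor().newInstance();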

			synchronized (clients)
            {
				clients.add(client);
            }

			client.setShell(StressTestReplicator.this);

			client.startUp();
		}
	}
}

The class starts up by loading an external config.properties file, which looks like this:

clientClassName=sfs2x.example.stresstest.SimpleChatClient

generationSpeed=500

totalCCU=20

The properties are:

  • the name of the class to be used as the client logic (clientClassName)
  • the total number of clients for the test (totalCCU)
  • the interval between each generated client, expressed in milliseconds (generationSpeed)
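The Replicator constructor silently keeps its built-in defaults (250 ms and 50 CCU) when a numeric property is missing or malformed. As a minimal sketch, the parsing could be factored into a small helper that also reports when a default is used. The ConfigReader class and its intProperty method are hypothetical names and are not part of the downloadable sources.

```java
import java.util.Properties;

// Hypothetical helper, not part of the original sources
final class ConfigReader
{
	// Returns the integer value of 'key', or 'fallback' when the key is
	// missing or its value is not a valid integer
	static int intProperty(Properties config, String key, int fallback)
	{
		String raw = config.getProperty(key);

		if (raw == null)
			return fallback;

		try
		{
			return Integer.parseInt(raw.trim());
		}
		catch (NumberFormatException e)
		{
			System.err.printf("Invalid value '%s' for %s, using default %d%n", raw, key, fallback);
			return fallback;
		}
	}
}
```

With such a helper the constructor could read, for instance, `generationSpeed = ConfigReader.intProperty(config, "generationSpeed", 250);`.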

Once these parameters are loaded, the test starts generating the requested clients, one per interval, via a thread-pool based scheduled executor (ScheduledThreadPoolExecutor).

So that the Replicator does not depend on any specific client class, we have created a base class called BaseStressClient which defines a couple of methods:

public abstract class BaseStressClient
{
	private StressTestReplicator shell;

	public abstract void startUp();

	public void setShell(StressTestReplicator shell)
	{
		this.shell = shell;
	}

	protected void onShutDown(BaseStressClient client)
	{
		shell.handleClientDisconnect(client);
	}
}

The startUp() method is where the client code gets initialized, and it must be overridden in the child class. The onShutDown(…) method is invoked by the client implementation to signal to the Replicator that the client has disconnected, so that it can be disposed of.

» Building the client logic

This is the code for the client itself:

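import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import com.smartfoxserver.v2.exceptions.SFSException;

import sfs2x.client.SmartFox;
import sfs2x.client.core.BaseEvent;
import sfs2x.client.core.IEventListener;
import sfs2x.client.core.SFSEvent;
import sfs2x.client.requests.JoinRoomRequest;
import sfs2x.client.requests.LoginRequest;
import sfs2x.client.requests.PublicMessageRequest;
import sfs2x.client.util.ConfigData;

public class SimpleChatClient extends BaseStressClient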
{
	// A scheduler for sending messages shared among all client bots.
	private static ScheduledExecutorService sched = new ScheduledThreadPoolExecutor(1);
	private static final int TOT_PUB_MESSAGES = 50;

	private SmartFox sfs;
	private ConfigData cfg;
	private IEventListener evtListener;
	private ScheduledFuture<?> publicMessageTask;
	private int pubMessageCount = 0;

	@Override
	public void startUp()
	{
	    sfs = new SmartFox();
	    cfg = new ConfigData();
	    evtListener = new SFSEventListener();

	    cfg.setHost("localhost");
	    cfg.setPort(9933);
	    cfg.setZone("BasicExamples");

	    sfs.addEventListener(SFSEvent.CONNECTION, evtListener);
	    sfs.addEventListener(SFSEvent.CONNECTION_LOST, evtListener);
	    sfs.addEventListener(SFSEvent.LOGIN, evtListener);
	    sfs.addEventListener(SFSEvent.ROOM_JOIN, evtListener);
	    sfs.addEventListener(SFSEvent.PUBLIC_MESSAGE, evtListener);

	    sfs.connect(cfg);
	}

	public class SFSEventListener implements IEventListener
	{
		@Override
		public void dispatch(BaseEvent evt) throws SFSException
		{
		    String type = evt.getType();
		    Map<String, Object> params = evt.getArguments();

		    if (type.equals(SFSEvent.CONNECTION))
		    {
		    	boolean success = (Boolean) params.get("success");

		    	if (success)
		    		sfs.send(new LoginRequest(""));
		    	else
		    	{
		    		System.err.println("Connection failed");
		    		cleanUp();
		    	}
		    }

		    else if (type.equals(SFSEvent.CONNECTION_LOST))
		    {
		    	System.out.println("Client disconnected. ");
		    	cleanUp();
		    }

		    else if (type.equals(SFSEvent.LOGIN))
		    {
		    	// Join room
		    	sfs.send(new JoinRoomRequest("The Lobby"));
		    }

		    else if (type.equals(SFSEvent.ROOM_JOIN))
		    {
		    	publicMessageTask = sched.scheduleAtFixedRate(new Runnable()
				{
					@Override
					public void run()
					{
						if (pubMessageCount < TOT_PUB_MESSAGES)
						{
							sfs.send(new PublicMessageRequest("Hello, this is a test public message."));
							pubMessageCount++;

							System.out.println(sfs.getMySelf().getName() + " --> Message: " + pubMessageCount);
						}
						else
						{
							// End of test
							sfs.disconnect();
						}

					}
				}, 0, 2, TimeUnit.SECONDS);
		    }

		}
	}

	private void cleanUp()
	{
		// Remove listeners
    	sfs.removeAllEventListeners();

    	// Stop task
    	if (publicMessageTask != null)
			publicMessageTask.cancel(true);

    	// Signal end of session to Shell
    	onShutDown(this);
	}
}

The class extends the BaseStressClient parent and instantiates the SmartFox API. We then proceed by setting up the event listeners and connection parameters. Finally we invoke the sfs.connect(…) method to get started.

Notice that we also declared a static ScheduledExecutorService at the top of the declarations. This is going to be used as the main scheduler for sending public messages at specific intervals, in this case one message every two seconds.

We chose to make it static so that we can share the same instance across all client objects. This way only one thread will take care of all our messages. If you plan to run thousands of clients or use faster message rates, you will probably need to increase the number of threads passed to the constructor.
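As a rough illustration of that last point, the sketch below schedules one periodic task per simulated bot on a single shared pool that has more than one thread. The class name SharedSchedulerDemo, the pool size of 4 and the counter standing in for the real sfs.send(…) call are all assumptions made for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedSchedulerDemo
{
	// Hypothetical pool size; tune it to the number of bots and the message rate
	static final int SCHEDULER_THREADS = 4;

	// Schedules one periodic task per simulated bot on a single shared pool,
	// waits (up to 5 seconds) until at least 'bots' messages have been sent in
	// total, then stops the pool and returns the number of messages sent
	static int runBots(int bots, long periodMillis)
	{
		ScheduledExecutorService sched = new ScheduledThreadPoolExecutor(SCHEDULER_THREADS);
		AtomicInteger sent = new AtomicInteger();
		CountDownLatch allSent = new CountDownLatch(bots);

		for (int i = 0; i < bots; i++)
		{
			// In the real client this would be sfs.send(new PublicMessageRequest(...))
			sched.scheduleAtFixedRate(() -> {
				sent.incrementAndGet();
				allSent.countDown();
			}, 0, periodMillis, TimeUnit.MILLISECONDS);
		}

		try
		{
			allSent.await(5, TimeUnit.SECONDS);
		}
		catch (InterruptedException e)
		{
			Thread.currentThread().interrupt();
		}
		finally
		{
			sched.shutdownNow();
		}

		return sent.get();
	}
}
```

The right pool size depends on how long each send takes and on the message rate, so it is best found by watching CPU usage on the client machine while the test runs.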

» Performance notes

When replicating hundreds or thousands of clients we should keep in mind that every new instance of the SmartFox class (the main API class) will use a certain amount of resources, namely RAM and Java threads.

For this simple example each instance should take ~1MB of heap memory, which means we can expect 1000 clients to take approximately 1GB of RAM. In this case you will probably need to adjust the heap settings of the JVM by adding the usual -Xmx switch to the startup script.

Similarly, the number of threads in the JVM will increase by 2 units for each new client generated, so for 1000 clients we will end up with 2000 threads, which is a pretty high number.
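The arithmetic above can be turned into a small sketch that also reads the actual heap usage and thread count of the JVM running the bots. The per-client figures are the rough ones quoted in this article for the simple chat client, and the ClientResourceEstimate class is a hypothetical name; more complex clients should be measured rather than estimated.

```java
import java.lang.management.ManagementFactory;

final class ClientResourceEstimate
{
	// Rough per-client figures quoted in this article for the simple chat client
	static final int HEAP_MB_PER_CLIENT = 1;
	static final int THREADS_PER_CLIENT = 2;

	static int estimatedHeapMb(int clients)
	{
		return clients * HEAP_MB_PER_CLIENT;
	}

	static int estimatedThreads(int clients)
	{
		return clients * THREADS_PER_CLIENT;
	}

	// Heap currently in use by this JVM, in megabytes
	static long usedHeapMb()
	{
		Runtime rt = Runtime.getRuntime();
		return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
	}

	// Number of live threads in this JVM (daemon and non-daemon)
	static int liveThreads()
	{
		return ManagementFactory.getThreadMXBean().getThreadCount();
	}

	public static void main(String[] args)
	{
		int clients = 1000;

		System.out.printf("Estimate for %d clients: ~%d MB heap, ~%d threads%n",
				clients, estimatedHeapMb(clients), estimatedThreads(clients));
		System.out.printf("Current JVM: %d MB heap used, %d live threads%n",
				usedHeapMb(), liveThreads());
	}
}
```

Calling usedHeapMb() and liveThreads() periodically from the Replicator would show how close the real numbers are to the estimate as clients are generated.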

Any relatively modern machine (e.g. 2-4 cores, 4GB RAM) should be able to run at least 1000 clients, although the complexity of the client logic and the rate of network messages may reduce this value.

On more powerful hardware, such as a dedicated server, you should be able to run several thousand CCU without much effort.

Before we start running the test let’s make sure we have all the necessary monitoring tools to watch the basic performance parameters:

  • Open the server’s AdminTool and select the Dashboard module. This will allow you to check all vital parameters of the server runtime.
  • Launch your OS resource monitor so that you can keep an eye on CPU and RAM usage.

Here are some important suggestions to make sure that a stress test is executed successfully:

  • Monitor CPU and RAM usage after all clients have been generated and make sure you never exceed 90% of CPU or 90% of RAM. This is of the highest importance to avoid creating a bottleneck between client and server. (NOTE: 90% refers to the whole CPU, not just a single core.)
  • Always run a stress test on a wired Ethernet LAN (local network) where you have access to at least a 100Mbit low-latency connection. Even better if you have a 1Gbps or 10Gbps connection.
  • To reinforce the previous point: never run a stress test over a Wi-Fi connection or, worse, against a remote server. The bandwidth and latency of Wi-Fi are poorly suited to this kind of test. Remember that the point of these stress tests is to assess the performance of the server and custom Extension, not the network.
  • Before running a test make sure the ping time between client and server is no more than 1-5 milliseconds. More than that may suggest an inadequate network infrastructure.
  • Whenever possible make sure not to deliver the full list of Rooms to each client. This can be a major RAM eater if the test involves hundreds or thousands of Rooms. To do so, simply remove all group references from the “Default groups” setting in your test Zone.
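For the latency check, a quick approximation can be taken from the client machine itself by timing how long it takes to open a TCP connection to the server. This is only a sketch: the connect time is a rough stand-in for a real ping, the ConnectLatencyCheck class is a hypothetical name, and the host and port are simply the values used by SimpleChatClient.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ConnectLatencyCheck
{
	// Returns the time needed to open a TCP connection, in milliseconds,
	// or -1 if the connection fails or times out
	static double connectMillis(String host, int port, int timeoutMillis)
	{
		long start = System.nanoTime();

		try (Socket socket = new Socket())
		{
			socket.connect(new InetSocketAddress(host, port), timeoutMillis);
			return (System.nanoTime() - start) / 1_000_000.0;
		}
		catch (IOException e)
		{
			return -1;
		}
	}

	public static void main(String[] args)
	{
		// Same host and port used by SimpleChatClient; adjust to your server
		double ms = connectMillis("localhost", 9933, 1000);

		if (ms < 0)
			System.out.println("Could not connect to the server");
		else
			System.out.printf("TCP connect time: %.2f ms%n", ms);
	}
}
```

Values consistently above a few milliseconds on a LAN are a hint that the network, rather than the server, may end up limiting the test.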

» Adding more client machines

What happens when the machine hits the dreaded 90% resource mark but we still need more CCU for our performance test?

It’s probably time to add another dedicated machine to run more clients. If you don’t have access to more hardware you may consider running the whole stress test in the cloud, so that you can choose the size and number of “stress clients” to employ.

The cloud is also convenient as it lets you clone one machine setup onto multiple servers, giving you a quick way to deploy more instances.

When choosing a cloud provider for your tests, make sure that they don’t charge you for internal bandwidth (i.e. data transfer between private IPs) and that ping times between servers are low.

We have successfully run many performance tests using Jelastic and Rackspace Cloud. The former is economical and convenient for medium-size tests, while the latter is great for very large scale tests and also provides physical dedicated servers on demand.

Amazon EC2 should also work fine for these purposes and there are probably many other valid options as well. A quick Google search will turn up more options.

» Advanced testing

1) Login: in our simple example we have used an anonymous login request and we don’t employ a server-side Extension to check the user credentials. Chances are that your system will use a database for login and you will want to test how the DB performs under high traffic.

A simple solution is to pre-populate the user database with index-based names such as User-1, User-2 … User-N. This way you can build simple client-side logic that generates these names with an auto-increment counter and performs the login. Passwords can be handled similarly using the same formula, e.g. Password-1, Password-2 … Password-N.
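A minimal sketch of such a counter is shown here. The BotCredentials class is a hypothetical helper, not part of the downloadable sources; it uses an AtomicInteger so that bots created on different threads never receive the same index.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the original sources
final class BotCredentials
{
	// Shared by all bots in this JVM so that every client gets a distinct index
	private static final AtomicInteger COUNTER = new AtomicInteger();

	final String userName;
	final String password;

	private BotCredentials(int index)
	{
		userName = "User-" + index;
		password = "Password-" + index;
	}

	// Returns the next pair: User-1/Password-1, User-2/Password-2, ...
	static BotCredentials next()
	{
		return new BotCredentials(COUNTER.incrementAndGet());
	}
}
```

In SimpleChatClient the anonymous login could then be replaced with something like `sfs.send(new LoginRequest(creds.userName, creds.password))`, assuming the two-argument LoginRequest constructor of the Java API. When the test is spread over several machines, each one would need its own starting index so that the names do not collide.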

TIP: When testing a system with an integrated database always monitor the Queue status under the AdminTool > Dashboard. Slowness with DB transactions will show up in those queues.

2) Joining Rooms: another problem is how to distribute clients across multiple Rooms. Suppose we have a game for 4 players and we want to distribute 1000 clients into Rooms of 4 users each. A simple solution is to create this logic on the server side.

The Extension will take a generic “join” request and perform a bit of custom logic:

  • search for a game Room with free slots:
    • if found it will join the user there
    • otherwise it will create a new game Room and join the user
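The search-or-create logic in the list above can be sketched with plain Java collections. This is only a model of the idea: the GameRoomMatcher class is a hypothetical name, lists of player names stand in for real Rooms, and no server-side SFS2X API calls are shown.

```java
import java.util.ArrayList;
import java.util.List;

final class GameRoomMatcher
{
	static final int MAX_PLAYERS = 4;

	// Each inner list models one game Room and holds the names of its players
	private final List<List<String>> rooms = new ArrayList<>();

	// Joins the player to the first Room with a free slot, creating a new Room
	// when all existing ones are full. Returns the index of the Room joined.
	synchronized int join(String player)
	{
		for (int i = 0; i < rooms.size(); i++)
		{
			if (rooms.get(i).size() < MAX_PLAYERS)
			{
				rooms.get(i).add(player);
				return i;
			}
		}

		List<String> room = new ArrayList<>();
		room.add(player);
		rooms.add(room);

		return rooms.size() - 1;
	}

	synchronized int roomCount()
	{
		return rooms.size();
	}
}
```

The method is synchronized because join requests from many clients may arrive concurrently; without it two requests could both claim the last free slot of the same Room.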

Similar logic has been discussed in detail in this post in our support forum.

» Source files

The sources of the code discussed in this article are available for download as a zipped project for Eclipse. If you are using a different IDE you can unzip the archive and extract the source folder (src/), the dependencies (sfs2x-api/) and build a new project in your editor.

Download the sources.