MineRoyale.io Server Architecture

1/14/19

Built on Mistakes

I made many mistakes with servers over the course of building my game. I relied on some libraries that I shouldn't have, and I placed too much stress on fragile points in my server infrastructure. I also made choices that increased costs significantly. Here is how my setup developed, and what I did to improve the servers.

Stage 1: Colyseus

From the outset of developing my game, I used a multiplayer game server framework called Colyseus. It is a great framework, but it is not yet ready for scaling games. I ran it on a single machine with many cores. If there was an available lobby with players in it, the matchmaker on each core would know about it, since all of the processes communicated through Redis. If there was no available room, a new room was simply created on whichever core the client had been load balanced to.
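To make that concrete, here is a minimal sketch of the pattern Colyseus was handling for me: every worker process shares a room registry through Redis, joins an existing lobby when one is advertised, and creates a new one locally otherwise. This is not Colyseus's internal code, just an illustration of the idea, and it assumes the ioredis client.

    // Simplified cross-process room registry backed by Redis.
    // NOT Colyseus internals -- just the pattern it implements for you.
    import Redis from "ioredis";

    const redis = new Redis(); // defaults to 127.0.0.1:6379

    // Advertise a lobby hosted by this process so other workers can see it.
    async function advertiseRoom(roomId: string): Promise<void> {
      await redis.sadd("rooms:available", roomId);
    }

    // Called by the matchmaker on whichever core the client was balanced to.
    async function findOrCreateRoom(): Promise<string> {
      const available = await redis.smembers("rooms:available");
      if (available.length > 0) {
        return available[0]; // join an existing lobby, possibly on another core
      }
      const roomId = `room_${Date.now()}`; // no lobby available: create one here
      await advertiseRoom(roomId);
      return roomId;
    }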

The Redis solution worked well, but there were still some kinks in the framework that caused issues. Sometimes Redis would advertise rooms that did not actually exist, preventing players from connecting until the server was restarted. This was a major issue at the start, which I at least partially fixed by directly editing the Colyseus source code. At that point I knew Colyseus was not the best option, but I wanted to stick with it just a bit longer before giving up the convenience it brought to the backend design.
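For reference, one common way to keep a shared registry from advertising rooms that no longer exist is to give each entry a short TTL and refresh it on a heartbeat, so entries owned by a dead process expire on their own. The sketch below (again assuming ioredis) shows that idea; it is not what Colyseus does internally, nor what my source edits looked like.

    // Guarding against "ghost" rooms: each room key expires unless its owner
    // keeps heartbeating, so entries from crashed processes disappear.
    import Redis from "ioredis";

    const redis = new Redis();
    const ROOM_TTL_SECONDS = 15;

    function keepRoomAlive(roomId: string): NodeJS.Timeout {
      const key = `room:${roomId}`;
      // Re-assert the room's existence every 5 seconds with a 15-second TTL.
      return setInterval(() => {
        redis.set(key, "available", "EX", ROOM_TTL_SECONDS).catch(console.error);
      }, 5000);
    }

    async function listLiveRooms(): Promise<string[]> {
      // Only rooms whose owners are still heartbeating will show up here.
      const keys = await redis.keys("room:*");
      return keys.map((key) => key.replace("room:", ""));
    }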

Stage 2: Leaving AWS

At the start of development, I was using AWS for servers, with Redis on a separate server from my game server. A major pricing mistake I made here was not putting the Redis server in the same availability zone as the game server, which incurred data transfer charges. To avoid these charges, the servers must not only be in the same region, but ALSO in the same availability zone.

That being said, you should never use AWS for multiplayer games, as I quickly learned. At the time of writing, they charge $0.09/GB of outbound traffic from your servers. With a game sending 20 messages per second, gigabytes of outbound data rack up quickly. I quickly decided to move my game to Digital Ocean, then to Vultr soon after. These providers include hundreds of GB of data transfer in plans that start as low as $5/month, and going over those limits only costs $0.01-0.02/GB.
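A rough back-of-envelope calculation shows why. The numbers below are made-up assumptions purely for illustration (including treating the 20 messages per second as going to each connected player), not my game's real figures:

    // Hypothetical monthly egress estimate -- plug in your own numbers.
    const MESSAGES_PER_SECOND = 20;  // assumed to be per connected player
    const BYTES_PER_MESSAGE = 200;   // assumed average payload size
    const CONCURRENT_PLAYERS = 100;  // assumed average concurrency

    const bytesPerMonth =
      MESSAGES_PER_SECOND * BYTES_PER_MESSAGE * CONCURRENT_PLAYERS *
      60 * 60 * 24 * 30;
    const gbPerMonth = bytesPerMonth / 1e9; // roughly 1,000 GB

    console.log(`~${gbPerMonth.toFixed(0)} GB/month outbound`);
    console.log(`AWS egress @ $0.09/GB:    ~$${(gbPerMonth * 0.09).toFixed(0)}/month`);
    console.log(`Vultr overage @ $0.01/GB: ~$${(gbPerMonth * 0.01).toFixed(0)}/month`);

Under these assumptions, bandwidth alone costs far more on AWS each month than an entire $5/month plan elsewhere.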

Stage 3: Switching from Colyseus to uws + load balancer

It was time to scrap Colyseus, as I had my server restarting every few hours just to ensure that players could connect without problems. I chose to use uws. If you want to get into uws quickly, simply look at the older packages of it on npm, like this one. I no longer had the Redis-based matchmaking that Colyseus offered. Instead, I built a load balancer with node-http-proxy, which I put on a separate server. It communicated with my game servers over the private network and proxied each client connection to the server with the soonest-starting lobby. This solution worked well, but node-http-proxy seemed to have some problems.
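Here is a rough sketch of what the balancer did, assuming node-http-proxy. The private-network addresses, ports, and lobby times are placeholders; in my setup the registry was kept up to date by messages from the game servers rather than hard-coded.

    import * as http from "http";
    import httpProxy from "http-proxy";

    const proxy = httpProxy.createProxyServer({ ws: true });

    // Which game server has the soonest-starting lobby? In reality this list
    // was updated by messages from the game servers over the private network.
    interface GameServerInfo { target: string; nextLobbyStartsAt: number; }
    const gameServers: GameServerInfo[] = [
      { target: "http://10.0.0.2:3000", nextLobbyStartsAt: Date.now() + 30_000 },
      { target: "http://10.0.0.3:3000", nextLobbyStartsAt: Date.now() + 90_000 },
    ];

    function soonestLobby(): GameServerInfo {
      return gameServers.reduce((a, b) =>
        a.nextLobbyStartsAt <= b.nextLobbyStartsAt ? a : b);
    }

    const server = http.createServer((req, res) => {
      // At this stage, ordinary HTTP requests were proxied too (see Stage 4).
      proxy.web(req, res, { target: soonestLobby().target });
    });

    // WebSocket upgrade requests get proxied to the chosen game server.
    server.on("upgrade", (req, socket, head) => {
      proxy.ws(req, socket, head, { target: soonestLobby().target });
    });

    server.listen(80);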

Stage 4: Separating web server from WebSocket server

Because I was proxying everything to my game servers, image assets were being served by the servers behind the load balancer, and all of that traffic had to pass back through the load balancer. This regularly caused a huge CPU spike for a few seconds whenever a player loaded the game. To solve it, I separated the server that sent static files (HTML, PNG, etc.) from the game servers that handled the WebSocket connections. Then I proxied only the WebSocket connection to the soonest-starting lobby, and let all other requests go to a separate web server that was not behind the proxy.
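The split looked roughly like the sketch below: the balancer now only touches WebSocket upgrades, while a completely separate, unproxied server hands out static assets. Addresses and paths are placeholders.

    import * as http from "http";
    import * as fs from "fs";
    import * as path from "path";
    import httpProxy from "http-proxy";

    // --- Load balancer: WebSocket upgrades only ---
    const proxy = httpProxy.createProxyServer({ ws: true });
    const balancer = http.createServer((req, res) => {
      // Ordinary HTTP traffic no longer comes through here at all.
      res.writeHead(404);
      res.end();
    });
    balancer.on("upgrade", (req, socket, head) => {
      // Placeholder target; really the server with the soonest-starting lobby.
      proxy.ws(req, socket, head, { target: "http://10.0.0.2:3000" });
    });
    balancer.listen(80);

    // --- Static web server (runs on a separate machine, unproxied) ---
    const staticRoot = path.join(__dirname, "public");
    http.createServer((req, res) => {
      // No path sanitizing here; this is only a sketch.
      const url = !req.url || req.url === "/" ? "/index.html" : req.url;
      fs.readFile(path.join(staticRoot, url), (err, data) => {
        if (err) { res.writeHead(404); res.end(); return; }
        res.end(data);
      });
    }).listen(8080);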

Stage 5: Removing load balancer, letting client connect directly

As mentioned above, node-http-proxy seemed to have some problems. I did not want to use something like nginx or HAProxy, because I was not doing traditional load balancing, but rather load balancing based on game state. I discovered that my load balancer would have CPU spikes at random moments and eventually crash. I needed players to connect directly to the game server, rather than dealing with proxies. So I made a front-facing web server that listened for messages from the WebSocket game servers. Rather than proxying, this server simply told the client which server to connect to. I exposed a port on all of my game servers, making it possible for clients to connect directly. It is not a concern if players jump into a match that has already started, because new arrivals simply spawn as spectators, not actual players.
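Here is a minimal sketch of that final arrangement, with placeholder hostnames and ports: the front-facing server keeps a registry fed by the game servers and answers a plain HTTP request with the address the client should open a WebSocket to.

    import * as http from "http";

    interface GameServerInfo { host: string; port: number; nextLobbyStartsAt: number; }

    // Updated by messages from the game servers over the private network.
    const gameServers: GameServerInfo[] = [
      { host: "game-1.example.com", port: 3000, nextLobbyStartsAt: Date.now() + 30_000 },
      { host: "game-2.example.com", port: 3000, nextLobbyStartsAt: Date.now() + 90_000 },
    ];

    http.createServer((req, res) => {
      if (req.url === "/find-server") {
        const best = gameServers.reduce((a, b) =>
          a.nextLobbyStartsAt <= b.nextLobbyStartsAt ? a : b);
        res.setHeader("Content-Type", "application/json");
        res.end(JSON.stringify({ host: best.host, port: best.port }));
        return;
      }
      res.writeHead(404);
      res.end();
    }).listen(80);

    // On the client: fetch("/find-server"), then open the socket directly:
    //   new WebSocket(`ws://${host}:${port}`)
    // Late joiners are fine because they spawn as spectators anyway.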

Conclusion

I now realize that the best way to build multiplayer infrastructure with matchmaking is to reduce the number of breaking points, such as the load balancer or the Redis server. Let clients connect directly when you can, and never rely on third-party software to make your job easier unless it is widely used in production.

Want to learn more? Join the MineRoyale.io Discord to chat!