HTML5 MMORPG Part 3: Fake mongoose documents are cool

Start by reading part one if you haven't.
In the last post, I left you with my conclusion that SQL is not suitable for storing all the entities of a realtime game server.

So I started looking for other ways to manage that while still keeping the benefits of Entity Systems, and I finally found something: mongodb, with a few tweaks.

NoSQL databases for Entity Systems

mongodb is one of the NoSQL databases. It differs from SQL databases in that it is not relational, but it is extremely fast and has some nice properties that make it possible to run an Entity System on top of it.

In mongodb, instead of SQL rows, you have documents. Basically, a document is a JSON object which mongo stores in an efficient, indexed way (the details of which I won't go into). With mongoose, documents are created from schemas, which are a sort of "JSON template". Each document has a unique ObjectId that can be used to embed references to other documents inside a document.

It looks like this:

var positionSchema = new mongoose.Schema ({  
    x : {type : Number, default : 0},
    y : {type : Number, default : 0},
    z : {type : Number, default : 0}
}); // Describe the structure of the model
var PositionComponent = mongoose.model('positiondatas', positionSchema); // Creates a mongoose model that will manage the collection  
var position1 = new PositionComponent({  
    x : 1,
    y : 3,
    z : 4 }); // Instantiate a document from the model

position1.save(); // Saves the document  

So basically we apply this to our Entity System this way: when you create a component, you instantiate a document for that component.

More precisely, here is the structure of the entities and components:

// The entity Schema
// The components array contains the *name* of every component attached to the object, so we can find them easily
// The data array contains the ID of the data document for each component
var schema = new mongoose.Schema({  
    label : String, // They are here for debugging purposes, and should not be used in game logic
    components : [String],
    data : [mongoose.Schema.Types.ObjectId]
});
var Entity = mongoose.model('entity', schema);  

In addition to that, you need an object that I will call dataModels, which contains every component model you have created, so you can access them later.
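A minimal sketch of what that dataModels registry could look like (the registerComponent helper is my own naming, not something from the post's codebase):

```javascript
// Hypothetical dataModels registry: every mongoose component model is
// stored under its component name so systems can look it up dynamically.
var dataModels = {};

function registerComponent(name, model) {
    dataModels[name] = model;
    return model;
}

// Usage (with the model from the earlier snippet):
// registerComponent('position', PositionComponent);
// dataModels['position'] is now PositionComponent
```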

Given that structure, here is how we would create an entity and assign it a "position" component (the one described before):

var entity = new Entity({label : "player"});  
var position = new PositionComponent ({x : 0, y : 0});  
entity.components.push("position"); // We add a position component to the entity's components list  
entity.data.push (position._id); // We add a reference to the actual component to the data list  
entity.save();  

Now imagine I want to get every component of an entity, using the dataModels object described above:

// Assuming "player" is an entity instance that has a position component
var playerComponents = {}; // Container for the results  
for (var i = 0; i < player.components.length; i++) {  
    var componentName = player.components[i];
    // Finds the component document and puts it in our object
    playerComponents[componentName] = dataModels[componentName].findById(player.data[i]);
}
console.log(playerComponents.position.x); // position of the player  

Note that almost every call to mongoose functions (like the find here) is asynchronous, so we would need to use callbacks in real code.
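For completeness, here is roughly what a callback version of that loop would look like (a sketch: fetchComponents is my own helper name, and I'm assuming mongoose's callback-style findById):

```javascript
// Hypothetical callback-based version of the component lookup loop.
// Calls `done(err, playerComponents)` once every find has completed.
function fetchComponents(player, dataModels, done) {
    var playerComponents = {};
    var pending = player.components.length;
    if (pending === 0) return done(null, playerComponents);
    player.components.forEach(function (componentName, i) {
        dataModels[componentName].findById(player.data[i], function (err, doc) {
            if (err) return done(err);
            playerComponents[componentName] = doc;
            pending -= 1;
            if (pending === 0) done(null, playerComponents);
        });
    });
}
```

Even this "flat" version needs careful bookkeeping; any truly sequential dependency between queries would mean actual nesting.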

First issue: Callbacks are not cool

The first problem here is about callbacks. While you can perfectly well manage to code with callbacks everywhere, it makes everything more complicated. Since every mongo operation is asynchronous, just the piece of code fetching the n components of an entity would need a nesting of n callbacks.

It means that all your code that depends on components (which is basically all your code) needs to run asynchronously. Not cool.

I initially wrote a complicated system of nested callbacks using promises, but then realised that developing a game that way would just be pure pain. Not only that, but the performance was ridiculously low.

So I just got rid of the asynchronous stuff. It's actually quite simple: you don't need everything to be constantly stored in mongo, you just need it to be saved there eventually. Here's the trick:

  • You create all your documents at server start
  • You duplicate everything that needs to be accessed in real time into memory
  • When the game systems need to get/set component values, you work on your in-memory copy of the data instead of running mongo queries
  • In a background task, you regularly save the data back to mongo
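A minimal sketch of that write-behind idea (names like dirty and flush are mine, and saveToMongo stands in for the real mongoose save calls):

```javascript
// In-memory cache with periodic background save (write-behind).
var cache = { entities : {}, components : {} };
var dirty = {}; // component ids touched since the last flush

function setComponentValue(id, field, value) {
    cache.components[id][field] = value; // synchronous, no mongo query
    dirty[id] = true;
}

function flush(saveToMongo) {
    Object.keys(dirty).forEach(function (id) {
        saveToMongo(id, cache.components[id]);
    });
    dirty = {};
}

// Persist whatever changed, e.g. every 10 seconds:
// setInterval(function () { flush(realMongoSave); }, 10000);
```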

With the drawbacks being:

  • You can't use mongo's powerful search functions. But honestly, with underscore.js's collection functions you can run the most complicated search queries without problems
  • Your data isn't saved instantly, which means an unexpected server restart could make some data roll back (more on that later)
  • You have to keep your game data in memory. Don't do that for big data that doesn't need real-time access; use it only for components that need to be accessed fast.
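To make the first point concrete, here is the kind of "query" you can run over the in-memory store with plain collection functions (underscore's _.filter would read almost identically; the data here is made up):

```javascript
// The in-memory position store: plain objects indexed by id.
var positions = {
    a : { x : 0,  y : 0 },
    b : { x : 10, y : 5 },
    c : { x : 12, y : 2 }
};

// Equivalent of a mongo range query: ids of every entity with field > min.
// With underscore this would be _.filter over the keys.
function findByRange(store, field, min) {
    return Object.keys(store).filter(function (id) {
        return store[id][field] > min;
    });
}
// findByRange(positions, 'x', 5) -> ['b', 'c']
```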

The implementation is pretty straightforward: you have a JSON object for each type of component, and one for the entities. Each entry is a mongo document indexed by its ObjectId. Instead of running mongo queries, you just fetch your data from there.
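In code, that structure could look like this (the ids and values are made up for illustration):

```javascript
// One map per component type plus one for entities, each entry being a
// (future) mongo document indexed by its id.
var store = {
    entities : {
        e1 : { _id : 'e1', label : 'player', components : ['position'], data : ['p1'] }
    },
    position : {
        p1 : { _id : 'p1', x : 1, y : 3, z : 4 }
    }
};

// Fetching a component becomes a plain lookup instead of a mongo query:
function getComponent(store, entity, componentName) {
    var index = entity.components.indexOf(componentName);
    return store[componentName][entity.data[index]];
}
// getComponent(store, store.entities.e1, 'position').x -> 1
```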

Models instantiation time

After creating my magic memory cache, I ran more performance benchmarks. I created test position and stats components, and a player assemblage (the combination of a position component and a stats component).

My benchmark was simple: creating 3,000 players, then reading back their position data.

Results:

Time for creating 3,000 players: 825ms  
Time for getting 3,000 players position: 50ms  

Which might be usable for a very small game, but was still pretty bad. So I started wondering what took so long. I wasn't saving anything, so... what?

Then I realized that mongoose model instantiation is really slow. I don't know why, and I don't really care either. I found a github issue about it, which may explain a few things.

So I thought of a workaround: what if, instead of creating documents, I created fake documents that would later be replaced by the real ones?

Here's the idea for an entity creation:

var reservedId = mongoose.Types.ObjectId();  
var entity = {  
    _id : reservedId,
    label : label,
    components : [],
    data : []
};
this.entities[reservedId] = entity;  
replaceByRealEntityInBackground(reservedId, Entity);  
return entity; // returns the fake entity that looks like a real document  

Then you can run the rest of your code with this temporary entity, and the background function will take care of replacing this.entities[reservedId] with a real mongoose document. A similar logic applies to the components.
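One possible shape for that background replacement, deferring the expensive model instantiation with setImmediate (just a sketch under my own naming; the post's actual implementation is still open):

```javascript
// Promote a fake entity into a real mongoose document. `entities` is the
// in-memory store and `Entity` the mongoose model from earlier.
function promoteEntity(entities, id, Entity) {
    entities[id] = new Entity(entities[id]); // real document, same _id
}

// Defer the promotion so entity creation can return immediately.
function replaceByRealEntityInBackground(entities, id, Entity) {
    setImmediate(function () {
        promoteEntity(entities, id, Entity);
    });
}
```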

I don't know yet how to implement the background function. Probably a worker or something; it's probably not that hard anyway. So I ran the benchmark with this new system, and the results were:

Time to create 3,000 players: 196ms  
Time to get 3,000 players position: 41ms  

So the creation time was, as expected, way better. We're talking about a roughly 4× speedup here (825ms down to 196ms).

ObjectIds are slow too

Still, the get time was pretty long, and the creation time could have been better. So I started looking for something else that might be eating the performance.

So, as a shot in the dark, I tried replacing the mongoose.Types.ObjectId() generation with a simple Math.random().

And...

Time to create 3,000 players: 57ms  
Time to get 3,000 players position: 4ms  

Now here it is. That's the performance I was looking for. With these numbers, you can safely assume the entity system will not slow your game down. This is another ~4× on top of the previous speedup; overall, these modifications made creation roughly 14× faster (825ms down to 57ms), making this Entity System actually viable.

So why such a big improvement in the get time, when I only changed the ID generation?
Well, I think the answer is pretty simple: the mongoose ObjectId type must be very slow. ObjectIds are not plain numbers but complicated strings that probably also wrap methods, so every lookup based on that type slows down everything that uses it. And since we use these ids as pointers to components, they are on every hot path.

So how do we get rid of ObjectIds? We don't need strings, nor methods wrapped in our ids. Let's try something dumb and simple:

var EntitySystem = function () {  
    // Code of the entitySystem
    this.entities = {}; // Working copies of the entities, indexed by fake ID
    this.ids = []; // Array containing the real mongo ID for each entity
};
// Other functions
EntitySystem.prototype.createEntity = function (label) {  
    var entityId = this.ids.length;
    this.ids.push(0); // Reserves a temporary ID value
    var entity = {
        _id : entityId,
        label : label,
        components : [],
        data : []
    };
    this.entities[entityId] = entity;
    this.createRealEntityInBackground(entityId);
    return entity;
};
EntitySystem.prototype.createRealEntityInBackground = function (fakeId) {  
    var realId = mongoose.Types.ObjectId(); // Reserves an ID
    this.ids[fakeId] = realId; // Sets the actual ID for the fake entity in our table
};
EntitySystem.prototype.saveEntity = function (entity) {  
    entity._id = this.ids[entity._id]; // Swaps in the real mongoose ID
    var doc = new Entity(entity); // Creates a document corresponding to this entity
    doc.save(); // Saves the generated document
    return doc;
};

Of course, this assumes createRealEntityInBackground runs as a background process, in a worker thread or similar, so invoking it costs virtually nothing.
You can use similar code to manage the components, maybe with a separate ID list for each component type to keep the arrays small.
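The same trick applied to components might look like this (ComponentStore is a hypothetical name; one store per component type keeps each ids array small):

```javascript
// One store per component type, with its own fake-to-real id table.
var ComponentStore = function (name) {
    this.name = name;
    this.ids = [];        // fake id (index) -> real mongo id, filled in later
    this.components = {}; // fake id -> working copy of the component
};
ComponentStore.prototype.create = function (values) {
    var fakeId = this.ids.length;
    this.ids.push(0); // reserve a slot for the real ObjectId
    values._id = fakeId;
    this.components[fakeId] = values;
    return values;
};
```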

So, let's run the final benchmarks with this system:

Time to create 3,000 players: 45ms  
Time to get 3,000 players position: 2ms (between 1 and 2 actually)  

Turns out the performance is even better. We won ~10ms on creation, another 20%, still cool. And the get time was halved. This is probably because my previous version used Math.random(), which is slower to execute and produces float values.
This benchmark doesn't account for the cost of launching the background process, which I haven't implemented yet, but it shouldn't cost more than a function call, so we can safely ignore it.

Final thoughts

In the end, I went pretty far from the initial "full SQL" idea, but the engine actually works just the same way. The core principles are preserved:

  • Components are standalone pieces of data, accessible by any system
  • Entities are just lists of references to components
  • While we don't have the power of SQL queries, we can still use underscore.js to achieve the same results
  • The data has the same form in the database and in the code. The only difference is that we throw it into a mongo document when we need to save it.

Some drawbacks/potential problems:

  • If you have a very large number of entities, the ids table can become quite big. There are many ways to solve this, and you shouldn't really worry about it until it becomes a problem.
  • When you save an entity, you have to replace your working entity with the document the save returns. For example, if your code runs with a fake entity ID and you then save it, you need to swap in the returned entity. Other options are:
    • Keeping documents separate from working entities. This doubles the RAM usage (everything is stored twice).
    • Creating a document each time you save. This means checking, on every save, whether the document already exists (i.e. whether it is an update or an insert), then creating it or fetching it from mongo before updating it.
  • When you destroy an entity or a component, you create empty space in your ids list. This may be a problem if your server runs for a long time (the more time passes, the more your array fills up with "white space"). Maybe make ids an object instead of an array, with a separate ID counter, so you can actually delete entries when you destroy an entity/component?
  • Keep in mind that critical data has to be saved immediately once acted on if you want to avoid server-crash exploits. Imagine you have a trade feature: you could end up with a duplication glitch. You can see an example of this in Minecraft: if you drop items on the floor and the server crashes, your character is not saved (so you keep the items) but the items are still on the floor (so they are duplicated). Many games have had exploits of this kind. This is probably also why moving/trading objects in MMOs like WoW incurs a little bit of lag: server validation.
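The ids-as-an-object idea mentioned above could be sketched like this (IdTable is a hypothetical name):

```javascript
// Ids kept in a plain object with a separate counter, so destroying an
// entity can actually delete its entry instead of leaving a hole.
var IdTable = function () {
    this.nextId = 0;
    this.ids = {}; // fake id -> real mongo id
};
IdTable.prototype.reserve = function () {
    var fakeId = this.nextId++;
    this.ids[fakeId] = 0; // real ObjectId filled in by the background job
    return fakeId;
};
IdTable.prototype.release = function (fakeId) {
    delete this.ids[fakeId]; // no "white space" left behind
};
```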

Just for fun I ran my benchmarks with absurdly big values:

Time to create 100,000 players: 1462ms  
Time to get 100,000 players position: 62ms  
-
Time to create 1,000,000 players: 12,467ms  
Time to get 1,000,000 players position: 490ms  

I think this will do it.

As usual, the code is on github and you can check it to see my implementation. By looking at the commits you can see the different steps explained in this post (mostly in the mongodb branch I think).
