Last updated: 2014-02-23
OpenTTD has a huge gamestate, which changes all of the time. The savegame contains the complete gamestate at a specific point in time. But this state changes completely each tick: Vehicles move and trees grow.
However, most of these changes in the gamestate are deterministic: Without a player interfering, a vehicle always follows its orders in the same way, and trees always grow the same way.
In OpenTTD, multiplayer synchronisation works by creating a savegame when a client joins and transferring that savegame to the client, so that it has the complete gamestate at a fixed point in time.
Afterwards clients only receive ‘commands’, that is: Stuff which is not predictable, like
These commands contain the information on how to execute the command, and when to execute it. Time is measured in ‘network frames’. Mind that network frames do not match in-game time. Network frames also run while the game is paused, so that commands executed while the game is paused have a defined behaviour.
The deterministic part of the gamestate is run by the clients on their own. All they get from the server is the instruction to run the gamestate up to a certain network time, which effectively tells them that no commands are scheduled before then.
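The scheme above can be illustrated with a toy lockstep model. This is a hypothetical Python sketch, not OpenTTD code: `Command`, `Client`, `simulate` and the single-integer gamestate are invented for illustration. Clients that start from the same state and receive the same commands for the same network frames end up in the same state, without the state itself ever being transferred again.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Command:
    """Hypothetical non-predictable action, scheduled for a network frame."""
    frame: int  # network frame at which every client executes it
    delta: int  # toy payload: amount added to the gamestate


@dataclass
class Client:
    """Toy client: the whole 'gamestate' is a single integer."""
    state: int = 0
    frame: int = 0
    queue: list = field(default_factory=list)

    def receive(self, cmd: Command) -> None:
        # Commands arrive ahead of time and wait for their frame.
        self.queue.append(cmd)

    def run_until(self, frame_max: int) -> None:
        # The server only allows running up to frame_max; no other commands
        # can occur before then, so everything else is deterministic.
        while self.frame < frame_max:
            self.frame += 1
            for cmd in [c for c in self.queue if c.frame == self.frame]:
                self.state += cmd.delta
                self.queue.remove(cmd)
            # Deterministic per-frame update (vehicles move, trees grow, ...).
            self.state = (self.state * 31 + 7) % 1_000_003


def simulate(commands, frame_max: int) -> int:
    """Run one client from the initial state and return its final state."""
    client = Client()
    for cmd in commands:
        client.receive(cmd)
    client.run_until(frame_max)
    return client.state
```

Two calls of `simulate([Command(3, 5)], 10)` always return the same value, while scheduling the same command for frame 4 instead yields a different final state: only the commands and their frames need to be shared.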
When a client (which includes the server itself) wants to execute a command (i.e. a non-predictable action), it does this by
In the ideal case all clients have the same gamestate as the server and run in sync. That is, vehicle movement is the same on all clients, and commands are executed the same everywhere and have the same results.
When a Desync happens, it means that the gamestates on the clients (including the server) are no longer the same. Just imagine that a vehicle picks the left line instead of the right line at a junction on one client.
The important thing here is that no one notices when a Desync occurs. The desynced client will continue to simulate the gamestate and execute commands from the server. Once the gamestates differ, they will increasingly spiral out of control: If a vehicle picks a different route, it will arrive at a station at a different time, so it will load different cargo, which causes other vehicles to load other cargo, which causes industries to notice different servicing, which causes industries to change production, … the client could run all day in a different universe.
To limit how long a Desync can remain unnoticed, the server periodically transfers a checksum of the gamestate. Currently this checksum is the state of the random number generator (RNG) of the game logic. A lot of things in OpenTTD depend on the RNG, and if the gamestates differ, it is likely that the RNG is called at different times, so its state differs when checked.
The clients compare this ‘checksum’ with the checksum of their own gamestate at the specific network frame. If they differ, the client disconnects with a Desync error.
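A minimal sketch of this check, assuming a hypothetical 32-bit linear congruential generator in place of OpenTTD's actual game-logic RNG; `ToyRandom`, `checksum` and `check_sync` are illustrative names. A single extra RNG call on one client is enough to change the state that is compared.

```python
class ToyRandom:
    """Hypothetical 32-bit LCG standing in for the game-logic RNG.

    Not OpenTTD's actual generator; any deterministic PRNG works here.
    """

    def __init__(self, seed: int) -> None:
        self.state = seed & 0xFFFFFFFF

    def next(self) -> int:
        # Full-period LCG modulo 2**32 (Numerical Recipes constants).
        self.state = (self.state * 1664525 + 1013904223) & 0xFFFFFFFF
        return self.state


def checksum(rng: ToyRandom) -> int:
    # The "checksum" is simply the current RNG state.
    return rng.state


def check_sync(server_sum: int, client_rng: ToyRandom, frame: int) -> None:
    # The client compares the server's value with its own at the same frame.
    if checksum(client_rng) != server_sum:
        raise RuntimeError(f"Desync detected at network frame {frame}")
```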
The important thing here is: Desync detection is only a last-resort failure detection. It gives no indication of when the Desync happened. The Desync may have occurred long ago and simply not affected the checksum until now. The checksum may have matched 10 times or more since the Desync happened, and only now has the Desync spiraled enough to finally affect the checksum. (There was once a Desync which was only noticed by the checksum after 20 game years.)
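To see why detection can lag far behind the Desync itself, consider this toy model (a hypothetical Python sketch, not OpenTTD code). A vehicle consults the RNG once per 1000 units travelled; on the desynced client its position error grows by one unit per frame, a deliberately slow, linear "spiral". Since a deterministic RNG with a fixed seed is in a state determined only by how often it has been called, comparing call counts stands in for comparing checksums.

```python
from typing import Optional, Tuple


def rng_calls(frame: int, desync_frame: Optional[int]) -> int:
    """How often the toy game logic has called the RNG by `frame`.

    A vehicle moves 7 units per frame and consults the RNG once per 1000
    units travelled. On a desynced client, its position error is 0 at
    `desync_frame` and then grows by one unit per frame.
    """
    position = 7 * frame
    if desync_frame is not None and frame > desync_frame:
        position += frame - desync_frame
    return position // 1000


def first_detection(desync_frame: int, check_every: int,
                    max_frame: int) -> Tuple[Optional[int], int]:
    """Return (detection frame or None, checks passed after the Desync)."""
    passed_after_desync = 0
    for frame in range(check_every, max_frame + 1, check_every):
        # Comparing call counts stands in for comparing RNG states.
        if rng_calls(frame, None) != rng_calls(frame, desync_frame):
            return frame, passed_after_desync
        if frame > desync_frame:
            passed_after_desync += 1
    return None, passed_after_desync
```

With `first_detection(5, 10, 1000)`, a Desync introduced at frame 5 passes 12 checks and is only caught at frame 130.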
Desyncs can be caused by the following scenarios:
Desyncs which are caused by improper cache validation can often be found by enabling cache validation:
Mind that this type of debugging can also be done in singleplayer.
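The idea behind cache validation can be sketched as follows. Assumed for illustration (hypothetical `Train` and `validate_caches`): a cached value is not stored in the savegame but rebuilt on load, so a server whose cache was not updated and a freshly joined client that rebuilt it now simulate with different values. Validation recomputes each cache from scratch and reports where it disagrees with the cached value, which works in singleplayer just as well.

```python
from dataclasses import dataclass, field


@dataclass
class Train:
    """Hypothetical train whose total weight is cached for speed."""
    wagon_weights: list = field(default_factory=list)
    cached_weight: int = 0

    def rebuild_cache(self) -> None:
        # Assumed to happen on load: caches are not part of the savegame.
        self.cached_weight = sum(self.wagon_weights)

    def add_wagon(self, weight: int, update_cache: bool = True) -> None:
        self.wagon_weights.append(weight)
        if update_cache:  # update_cache=False simulates a forgotten update
            self.cached_weight += weight


def validate_caches(trains) -> list:
    """Recompute every cache from scratch; return indices that disagree."""
    return [i for i, train in enumerate(trains)
            if train.cached_weight != sum(train.wagon_weights)]
```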
If you run a server which encounters Desyncs often, you can enable recording of the gamestate alterations. This later allows you to replay the gamestate and locate the cause of the Desync.
There are two levels of Desync recording, which are enabled via ‘-d desync=2’ and ‘-d desync=3’ respectively. Both record all commands to a file ‘commands-out.log’ in the autosave folder.
If you have the savegame from the start of the server and this command log, you can replay the whole game. (see Section 3.1)
If you do not start the server from a savegame, a savegame will also be created just after the map has been generated. It will be named ‘dmp_cmds_*.sav’ and put into the autosave folder.
In addition to that, ‘-d desync=3’ also creates regular savegames at defined points in network time (more precisely defined than regular autosaves). These will be created in the autosave folder and will also be named ‘dmp_cmds_*.sav’.
These saves allow comparing the gamestate with the original gamestate during replaying, and thus greatly help debugging. However, they also take a lot of disk space.
To replay a Desync recording, you need these files:
Next, prepare your OpenTTD for replaying:
The replaying will also compare the checksums which are part of the ‘commands-out.log’ with the replayed gamestate. If they differ, it will trigger a ‘NOT_REACHED’.
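The checksum comparison during replay can be sketched like this. The log is modelled as hypothetical, already-parsed `(kind, frame, value)` tuples; the real ‘commands-out.log’ format is not reproduced here, and `ReplayMismatch` merely stands in for the ‘NOT_REACHED’.

```python
class ReplayMismatch(Exception):
    """Stand-in for the NOT_REACHED that the real replay triggers."""


def toy_tick(state: int) -> int:
    # Deterministic per-frame update of a toy single-integer gamestate.
    return (state * 31 + 7) % 1_000_003


def replay(entries, state: int = 0) -> int:
    """Replay hypothetical (kind, frame, value) log entries in order.

    kind "cmd":  execute a recorded command (here: add value to the state).
    kind "sync": compare the recorded checksum (value) with the replay.
    """
    frame = 0
    for kind, target, value in entries:
        while frame < target:  # run the deterministic part up to the entry
            frame += 1
            state = toy_tick(state)
        if kind == "cmd":
            state += value
        elif kind == "sync" and state != value:
            raise ReplayMismatch(f"checksum mismatch at network frame {frame}")
    return state
```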
If the replay succeeds without mismatch, that is, if the replay reproduces the original server state:
If the replay does not succeed without mismatch, you can check the logs for failed commands. Then you may try to replay with DEBUG_FAILED_DUMP_COMMANDS enabled. If the replay then fails, the test-run of the failed command modified the game state.
If you have the original ‘dmp_cmds_*.sav’, you can also compare those savegames with the ones from your replay. You can also comment out or disable the ‘NOT_REACHED’ mentioned above, to get further ‘dmp_cmds_*.sav’ from the replay after the mismatch has already been detected. See Section 3.3 on how to compare savegames. If the saves differ, you have located the Desync between the last dmp_cmds that matches and the first one that does not. The difference between the saves may point you in the direction of its cause.
If the replay succeeds without mismatch, and you do not have any ‘dmp_cmds_*.sav’ from the original server, it is a lost cause. Enable creation of the ‘dmp_cmds_*.sav’ on the server, and wait for the next Desync.
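Assuming each original ‘dmp_cmds_*.sav’ and its counterpart from the replay have been loaded into comparable Python objects (for example from exported JSON), finding the Desync window is a scan for the first differing pair. `locate_desync` is a hypothetical helper; it assumes that once the gamestates diverge they never converge again. With many large dumps, the same assumption would also allow a binary search instead of a linear scan.

```python
def locate_desync(original, replayed):
    """Find the window in which the Desync happened.

    `original` and `replayed` are chronologically ordered lists of gamestate
    dumps taken at the same network frames (e.g. parsed JSON exports).
    Returns (last matching index, first differing index); either may be None.
    """
    for i, (orig, rep) in enumerate(zip(original, replayed)):
        if orig != rep:
            # Assumes gamestates never re-converge after diverging.
            return (i - 1 if i > 0 else None), i
    return (len(original) - 1 if original else None), None
```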
Finally, you can also compare the ‘commands-out.log’ from the original server with the one from the replay. They will differ in things such as dates, and the original log will contain the chat, but otherwise they should match.
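Because dates and chat are expected to differ, both logs should be normalised before comparing them. The patterns below are assumptions for illustration only (a leading bracketed timestamp, chat lines containing the word `chat`); check them against your actual logs. `normalise` and `diff_logs` are hypothetical helpers built on Python's standard `difflib`.

```python
import difflib
import re

# Assumed formats, for illustration only: adjust them to the real log.
TIMESTAMP = re.compile(r"^\[[^\]]*\]\s*")      # leading "[...]" timestamp
CHAT = re.compile(r"\bchat\b", re.IGNORECASE)  # hypothetical chat marker


def normalise(lines):
    """Strip timestamps and drop chat lines so only commands remain."""
    out = []
    for line in lines:
        line = TIMESTAMP.sub("", line.rstrip("\n"))
        if not CHAT.search(line):
            out.append(line)
    return out


def diff_logs(original, replayed):
    """Return unified-diff lines; an empty list means the logs match."""
    return list(difflib.unified_diff(normalise(original), normalise(replayed),
                                     "original", "replay", lineterm=""))
```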
The binary form of the savegames from the original server and from your replay will always differ:
To compare savegames more semantically, the easiest approach is to first export them to JSON, for example with:
https://github.com/TrueBrain/OpenTTD-savegame-reader
By running:
python -m savegame_reader --export-json dmp_cmds_NNN.sav | jq . > NNN.json
Now you can use any (JSON) diff tool to compare the two savegames in a somewhat human readable way.
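If a generic diff tool produces too much noise, a short script can list only the paths at which two exported savegames differ. This is a sketch assuming both files were produced by the export command above and are parsed with Python's standard `json` module; `json_diff` is a hypothetical helper, not part of the savegame reader. A key missing on one side is reported as `None`, which cannot be told apart from a JSON `null`.

```python
import json
import sys


def json_diff(a, b, path="$"):
    """Yield (path, left, right) for every leaf where two documents differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            sub = f"{path}.{key}"
            if key not in a or key not in b:
                yield sub, a.get(key), b.get(key)
            else:
                yield from json_diff(a[key], b[key], sub)
    elif isinstance(a, list) and isinstance(b, list):
        for i in range(max(len(a), len(b))):
            sub = f"{path}[{i}]"
            if i >= len(a) or i >= len(b):
                yield sub, a[i] if i < len(a) else None, b[i] if i < len(b) else None
            else:
                yield from json_diff(a[i], b[i], sub)
    elif a != b:
        yield path, a, b


def main(left: str, right: str) -> None:
    with open(left) as fa, open(right) as fb:
        for path, x, y in json_diff(json.load(fa), json.load(fb)):
            print(f"{path}: {x!r} != {y!r}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

For example, `python json_diff.py original.json replay.json` would print one line per differing value.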