Surface and Orbit Authority

Server Admin Notes

Hi folks!

As required by rule 9.2 (“The Official Server”), I keep notes on any changes I make to the official servers. This thread is a running log of those notes.

As always, if you want to play Space Engineers with us, you can join the server yourself at play.lithobrake.club:28015.

Hi folks,

As Space Master, I made a small edit this evening to fix a stuck grid. @Canderthal reported that a ship had become mysteriously immobile. After inspecting the grid, I could see no obvious reason it was immovable: it was not docked, embedded in terrain, a station, or attached to a landing gear. I moved the ship about 10 meters with the space master tools, and it flew normally after that.

Hi, folks!

This afternoon, @David_s pinged me on Discord to let me know that all mod blocks had disappeared from nearby grids and the server’s top speed was the default 100 m/s. On further investigation, I discovered that the server no longer had mods configured at all, and hadn’t since at least the preceding restart half an hour earlier. I immediately turned off the server until I could dig further. The server is now back up and running, but I was unable to restore grids damaged by this incident.

What Happened?

Honestly, I still don’t fully understand why the game deconfigured all mods. I read through the server log files, however, and the most recent log file contained entries indicating that it had been unable to contact Steam during mod refresh (a normal part of server startup). This is my best guess as to the smoking gun.

Why Did We Lose Data?

  • Space Engineers’ default backup configuration creates a new backup save every five minutes, and retains the five most recent backups. This gives us a 25-minute rollback window for any changes, including accidental changes and server configuration problems. We did not discover the problem with the server’s mod list until more than 30 minutes had passed, so none of the remaining backups contained data with the mod list intact.

  • Space Engineers discards individual blocks on load if the block is associated with a mod that is not loaded in the server’s configuration. This caused the game to throw away modded guns, ammunition, and other objects.

  • I rely primarily on these backups for ongoing operational continuity. I do not make regular backups of the game, although I do make as-needed backups before upgrades and other major changes. The most recent of these backups was more than a week old and would not have helped recover from this incident.

What’s Changing

In order to reduce the risk of future data loss from the same root cause, I’ve made changes to the server configuration. Backups are now less frequent, and are created every ten minutes instead of every five minutes. I’ve also extended the number of backups from five to twelve, giving us, in total, a 120-minute rollback window. Given how observant y’all are, I believe this is adequate to allow us to recover if we’re affected by this problem again.
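For anyone who wants to check the arithmetic, here is a minimal Python sketch of the rollback window under the old and new settings. It uses the same simplification as the paragraph above (window = backup interval × number of retained backups). The function names are made up for illustration and are not part of the game or its configuration.

```python
def rollback_window_minutes(interval_minutes: int, retained_backups: int) -> int:
    """Approximate age of the oldest retained backup, in minutes."""
    return interval_minutes * retained_backups


def can_roll_back(minutes_since_fault: int,
                  interval_minutes: int,
                  retained_backups: int) -> bool:
    """True if a backup from before the fault should still be on disk."""
    window = rollback_window_minutes(interval_minutes, retained_backups)
    return minutes_since_fault < window


# Old settings: a backup every 5 minutes, 5 kept.
old_window = rollback_window_minutes(5, 5)

# New settings: a backup every 10 minutes, 12 kept.
new_window = rollback_window_minutes(10, 12)

# A fault noticed a little over 30 minutes after it happened.
old_ok = can_roll_back(31, 5, 5)
new_ok = can_roll_back(31, 10, 12)

print(f"old window: {old_window} min, recoverable: {old_ok}")
print(f"new window: {new_window} min, recoverable: {new_ok}")
```

The trade-off is the one described above: a longer interval widens the window for the same number of backups, but a crash can lose up to one full interval of play.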

The increased time between backups may make timewarps worse when the server crashes. I believe this happens infrequently enough that it won’t be an issue, but the server did crash last night. It happens. If this turns out to be a problem, we can revisit it and make other choices.

Other Changes

I took the opportunity while I had the server offline to do some chores:

  • I’ve updated the MOTD, as outlined in the MOTD Changes thread.

  • I’ve updated the Trash Removal settings. Grids are now only protected out to three kilometers from the nearest player, down from ten. This should help reduce the sim-speed drops we’ve been seeing lately. (Other trash-removal settings have not changed: stations, powered grids, respawn points, and grids with production facilities are still protected regardless of distance, as are grids with more than 20 blocks.)

Thank you so much, derspiny, and for dealing with it a second time this morning. I really hope we can get to the bottom of why this keeps happening.

On the subject of “this morning”…

Once again, @David_s gets credit for catching the situation. Once again, all the mods were gone. Once again, there’s a log entry for network errors during server startup and mod updates.

However, the expanded backups allowed us to recover, this time, as I was able to roll back to an earlier version of the save using the game’s backups. We lost 40 minutes during which nobody was playing, rather than losing all mods and all modded blocks created since last night’s incident.

Hi folks,

This evening @Canderthal notified me that his game was crashing as soon as he loaded. Upon checking the area around his base, my game also started to crash instantly. I was unable to determine what was causing the crash, but it appeared to be highly localized.

I’ve rolled the server back by half an hour, and that seems to have undone whatever created the crash. As we have no idea what the problem was, I fully expect we’ll see it again - please let me know immediately if you see this issue, as we only have 120 minutes or so to roll back.

The symptoms to look out for are your game client freezing solid, then exiting after about a minute.

As predicted, we had another round of client crashes this morning. I rolled the server back (~15 minutes lost) to address it.

While investigating, Canderthal noted that the crashes started when he requested a refill from an oxygen tank block on his ship. I don’t know if this is the actual trigger, but it’s worth testing. I won’t have time today to verify it, unfortunately, but please do share any observations.

Edit: I’ve deputized @SurprisingEdge to handle restarts when I’m away from the desk this weekend, as well.

Hi folks!

@SurprisingEdge pointed out today that the sim speed on the server was hovering around 0.7, meaning that the game was running at less than three quarters of its normal speed. After poking around, I concluded that it was likely due to loose debris grids, and did a manual cleanup of a large number of “Small Grid XXXX” grids with small numbers of blocks and no owner. I also removed a few larger grids that were, to visual inspection, obviously unattended debris.

A large proportion of this debris was between 3 and 7 kilometres from a player. The current grid GC settings are to preserve grids within 10 km of a player, something I more or less picked out of a hat when I set up the server. I intend to reduce the garbage collection radius to 3 kilometres sometime after Wednesday, to automate removal of this variety of trash. Please register objections in this thread. As a reminder, trash removal is set up only to remove grids with fewer than 20 blocks that are not stations, not powered, not respawn points, and not manufacturing facilities.
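To make the rule concrete, here is a rough Python sketch of the removal criteria as described above. This is not how the game implements trash removal; the `Grid` fields, the `is_trash` function, and the sample grids are all hypothetical, and the behaviour at exactly 20 blocks is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Grid:
    """Hypothetical summary of a grid, for illustration only."""
    name: str
    block_count: int
    distance_to_nearest_player_m: float
    is_station: bool = False
    is_powered: bool = False
    has_respawn_point: bool = False
    has_production: bool = False


def is_trash(grid: Grid, radius_m: float, block_threshold: int = 20) -> bool:
    """Apply the removal criteria described in the post."""
    # Grids near a player are always kept.
    if grid.distance_to_nearest_player_m <= radius_m:
        return False
    # Assumption: a grid with exactly `block_threshold` blocks is kept.
    if grid.block_count >= block_threshold:
        return False
    # Stations, powered grids, respawn points, and production are protected.
    if (grid.is_station or grid.is_powered
            or grid.has_respawn_point or grid.has_production):
        return False
    return True


debris = Grid("Small Grid 1234", block_count=4,
              distance_to_nearest_player_m=5_000.0)
parked = Grid("Parked Miner", block_count=4,
              distance_to_nearest_player_m=5_000.0, is_powered=True)

# With the current 10 km radius the debris is preserved; at 3 km it is removed.
removed_at_10km = is_trash(debris, radius_m=10_000.0)
removed_at_3km = is_trash(debris, radius_m=3_000.0)
parked_removed = is_trash(parked, radius_m=3_000.0)

print(removed_at_10km, removed_at_3km, parked_removed)
```

A small unpowered grid 5 km from the nearest player sits in the 3 to 7 km band mentioned above, which is why the smaller radius would catch it while leaving a powered ship at the same distance alone.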

By observation, a large proportion of the debris appears to have been spawned by disputes between NPC factions, which is pretty funny.

I would like to note that despite its PCU usage and incredible inconvenience, Derspiny declined to remove the pirate base that keeps sending ships my way. Out of some sort of “ethics” or something?

Bastard.

:wink:

I did another round of grid cleanup today after the sim speed dropped to 0.5. Unfortunately, this time I included larger NPC grids, and in the process, removed a grid someone was actively working on looting. To the affected player, I apologize - doing this by hand is pretty error-prone, and I try to avoid it, but the server was chugging badly and it did help a ton.

@derspiny, first, thank you for all of your efforts. This sounds very tedious. Hopefully we can figure out the right balance of settings to let us have everything we want without all of the manual effort of clearing out the grids.

I will try to remember to put in a screenshot if I am working on looting an NPC grid. Hopefully I can include the grid description.

Grid names are also a huge help. That’s how the admin entity list displays things. It also includes owner, speed, distance from the world origin, and distance from the nearest player, but the grid name itself is the most obvious thing to check.

Feel free to DM me on Discord if you’re not comfortable disclosing this kind of information publicly! You can also email me (owen@grimoire.ca) if you prefer - I check my mail pretty religiously.

This is now in effect.

A few points for discussion :slight_smile:

Is it worth upping the block threshold for cleanup from 20? Not certain what to, but I can see wreckage easily being more than 20 blocks for a large NPC.

How does a regular scheduled cleanup sound? So for example each week the Space Master goes through the grid list, identifies those that are clearly junk, and deletes them. This would catch those that elude the auto cleanup, such as large wrecks that are still powered but otherwise broken.

I’ve been meaning to reply to this.

Surprisingly little wreckage comes in above the 20-block level. However, some does, and that debris will accumulate over time. (Debris rooted in Extra Encounters NPC grids seems to get cleaned up by that mod, but player grids are not.) At some point we’ll need to clean up grids again.

I’m on board with doing it regularly, and I believe that some version of Space Master 2.0 means I can enact a policy of doing so (and reporting on it) pretty much at will. Anyone think I shouldn’t do this?

Assuming 20 blocks includes incomplete blocks, I’m totally for raising that limit, if we think it would help. Laying down 20 incomplete blocks isn’t hard, and if being powered stops it from being cleaned up even if it’s smaller, this seems ok.

No derspiny, I don’t think you shouldn’t do this. :wink: Whether I think you should have to is different though.

I concur. This type of exercise, if needed now, should be temporary and done with the purpose of tuning the settings so that it is not needed in the future. It will probably be needed for the foreseeable future, just much less frequently.

Hi folks!

Earlier this afternoon, @Canderthal experienced an abrupt and quite severe lag spike, which resulted in an otherwise-avoidable collision with an asteroid and the destruction of his ship. After talking to him about what happened and looking at the screenshots, I concluded that this falls in the bucket of “technical issues” rather than “normal bad outcomes,” and opted to remedy.

The solution we worked out was that he would blueprint his ship and then I would spawn it for him, replacing the destroyed ship. I have done this this evening.

I do think this is right on the line, but lag is clearly more technical than not. Discussion very much welcomed on this.

I agree with your assessment and probably would have done the same. This example is right on the line. I do my best to play as though I might lose connection for 30 seconds at a time but there is only so much one can do to mitigate that.

Once we can reduce the impact of loading a player on the server and the NPC server load, we should have a better baseline for what is ‘normal’. At that point, we might expect players to operate safely. I am optimistic that this situation is temporary. I agree that in the long run we don’t want the Space Master to have to make these types of determinations very often.