Surface and Orbit Authority

Server load experiments

As Owen noted on the “Server Admin Notes” thread, I recently got a copy of the save game to experiment with reducing CPU load.

The basic process I followed was:

  1. Load up savegame locally, and note “Server Sim CPU %”.
  2. Delete all grids, and note the CPU% again.
    a. That gives me the upper and lower bounds.
  3. Reload with the grids back, and start deleting grids and watching CPU%.
    a. Delete unowned grids.
    b. Delete debris and “abandoned” grids.
    c. Delete grids belonging to players that haven’t been around in a while.

I also tried disabling in-game scripting to see how much of an impact TIM and such is having.

So results.

These numbers are all approximations, as the sim CPU doesn’t sit steady, so I sat and watched the range for 10-30 seconds, and then took a stab at estimating the median.

Initial CPU load: 124%
Load with in-game scripting disabled: 114%
Load with no grids: 25%

Incremental improvement after deleting (in percentage points of sim CPU):

  • NPC grids and unowned/debris - 10%
  • Grids belonging to Blind Firepower, ValHallas, DarkEngines, Luggage66, Skoox - 32%
  • Grids belonging to CanderThal, Chass3ur, XanderSae03 - 11%
  • Grids belonging to Xenoc, Derspiny, Hez, Echo, Woof - 46%
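
As a quick sanity check on the figures above, here is a minimal Python sketch that replays the deletions in the order listed and confirms they account for the whole gap between the initial load and the no-grids load. The variable names are mine and purely illustrative; the numbers are the rough estimates from this post, and the sketch assumes each figure is a drop in percentage points applied cumulatively.

```python
# Figures copied from the post; all are rough, eyeballed estimates.
initial_load = 124             # Server Sim CPU % with the full save loaded
scripting_disabled_load = 114  # same save, in-game scripting disabled
no_grids_load = 25             # all grids deleted

# Assumption: each value is a drop in percentage points, applied in this order.
reductions = {
    "NPC grids and unowned/debris": 10,
    "Blind Firepower, ValHallas, DarkEngines, Luggage66, Skoox": 32,
    "CanderThal, Chass3ur, XanderSae03": 11,
    "Xenoc, Derspiny, Hez, Echo, Woof": 46,
}

# Load attributable to grids is the gap between the two bounds.
total_from_grids = initial_load - no_grids_load

remaining = initial_load
for group, drop in reductions.items():
    remaining -= drop
    share = drop / total_from_grids
    print(f"{group}: -{drop} points ({share:.0%} of grid load), {remaining}% left")

# Cost of scripting, for comparison with the grid groups.
scripting_cost = initial_load - scripting_disabled_load
print(f"Scripting: about {scripting_cost} points")
```

The four drops sum to the full difference between the two bounds, so the groups together cover every grid in the save. The scripting figure comes out about the same size as the smallest grid group.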

The big lessons for me here have been:

  • Scripting isn’t having that much of an impact.
  • NPCs may or may not have much of an impact, depending how many grids, and what they’re composed of. (Some of the player-owned grids I deleted were NPC grids that were taken over.)
  • Player grids have quite a bit of impact.

Given the last point there, I’m wondering what sorts of guidelines/rules we should have for cleaning up grids of people who are no longer active on the server. I’m also pondering whether the criteria should depend on whether or not they’re a “member in good standing”, according to the charter. I think that’s an interesting conversation to have.

The lists of players above don’t really have much rhyme or reason to them: They’re not quite a list of “member” and “non-member” players, they just sort of happened to be how I was deleting groups of grids by owner. But, given we’re regularly bumping up against the limits of what the server can handle without slowdown, I think it’s worth a conversation about a few things:

  1. What categories of players/people/humans do we have here?
    a. What do we call them?
  2. How do we decide what to manually clean up to keep performance acceptable?
    a. How do the categories above influence/impact this?

I’m very interested to hear others’ thoughts on this.

The way you sorted your experiments kinda forces me to talk about some of my beliefs about who the server is “for.” As the current Space Master, these beliefs probably matter to people besides myself, since they drive what kinds of things I’m likely to support, implement, or pitch in on. Brace yourself, I’m going to talk frankly about some uncomfortable topics.

The way I slice the server population when I’m thinking about proposals and administrative work is in terms of how much investment each player has put into the server. “Investment” in this sense comes from multiple places:

  • Time spent playing the game quietly and building things is investment, although often it’s the most self-serving kind, as most constructions I’ve seen are not shared;
  • Time spent playing the game and talking to other players is investment, which pays off in terms of keeping people engaged and socializing and in terms of setting the tone and culture;
  • Time spent outside the game and talking to other players in Discord or on these forums is investment, of a similar kind;
  • Time spent writing, reading, and passing proposals is an investment in changing the nature of the game we all agree we’re playing; and
  • Time and money directly spent on tasks that support the server, forums, site, and Discord server is an investment in everyone being able to play the game in the first place.

There are probably other kinds of investment I’ve forgotten about.

I’ve sorted them like this because, in my mind, these individual kinds of action show an increasing degree of investment. At the very least, someone who takes the time to read and revise the charter is investing in ways that have a larger, longer-term, and broader impact than someone who builds a very cool spaceship for themselves and their buddies.

I bring this up, because this framework is how I consider what I’m willing to do and support. To me, the question is one of deciding where to divide this list into “important investments” and “not important investments.” Personally, I believe that the server is “for” anyone who invests in the community. Those are the important investments. It’s open to people who want to play solipsistically, too, and I’m more than happy to have people playing even if all they do is build neat things, but silent players’ needs are the least important to me. Those players can play the same game nearly anywhere, and don’t have much dependence on SOA’s community or resources. We even publish the mod list and game configuration in use, so that people who want to play by the same rules can do so even if they don’t do it on the SOA servers.

One important corollary to this is that my sense of who the server is “for” includes people who have made the effort to join as voting members, but also includes - on relatively equal footing - players who participate in the community without joining up or voting.

Someone’s level of investment can and does change over time. I’ve been less active on the server in the last month or so as my life has turned in other directions. In terms of the framework above, I’ve reduced how much I invest in building things and being present on the server, and somewhat reduced my investment in Discord and these forums. As my life swings back the other way, I expect to invest more heavily in those things again, but if I had to be brutally honest with myself, I’d say I’m fairly disinvested right now other than in that last category, where I am fully invested so that the lights stay on. This is normal in any project.

When we’re talking about limited CPU and about deleting grids, we’re talking about reallocating resources from some players to other players. Inevitably, players whose grids are deleted will lose out, as other players build more projects taking up the CPU cycles freed up. I have faith that we can devise a system for doing this that we all agree is “fair” - there are presently only four people to convince, so it’s not socially that hard to have those conversations and find compromise - but in the long run, we’ll need to be cognizant of how and why we’re doing that reallocation and empathetic to those who bear the costs.

TL;DR: I think that, if we want to reckon with reclaiming CPU time on the server, we’re going to need to have a discussion about who will benefit from that reclamation project, and who will lose out.

Having said all of that, I have another suggestion. Torch is an improved version of the Space Engineers dedicated server software, and PingPerfect, our hosting platform, claims to support it. The Torch Concealment Plugin claims to dynamically pause and unpause simulation on a grid-by-grid basis, based on configurable conditions. This might allow us to compromise by concealing grids that aren’t currently in use, without outright deleting them, so that if the players who own those grids return, they can resume without losing the time invested in their projects.

I’m willing to experiment a bit with Torch, but I’ll need time to do that and I don’t know when I can make the time. If there’s interest, I might be encouraged to try it sooner rather than later, though, or if anyone wants to volunteer to figure out how this works, I’d love to see your notes afterwards.

I like the Torch idea. If we can get some performance back without having to remove any grids (other than obviously removable ones) that would be ideal.

“Some performance” turned out to be an understatement.

I rolled out Torch today. With just me online and only my three main ships and their respective shuttles and UCU complement in range, the server CPU thread load was hovering at around 35-40%. Flying far from my ships and watching the server logs allowed me to check the “minimum” load - with all ships concealed, the server consumes about 20% of its CPU capacity. That’s a massive improvement, and probably obviates other work @SurprisingEdge was doing to track down offending grids.

Torch also has plugins to profile grids on demand, to determine which player, grid, faction, or block type is consuming the most simulation time, and by how much. I haven’t installed the profiler plugin, but I might do so when our CPU load creeps up again. That’ll give us much more targeted information about server load, with a lot less effort and data sharing.
