Ghost’s blog » Setti stuff

April 21, 2008

Server statistics plugins for Munin

It was the summer of 2007 when these plugins first saw daylight.

http://css.setti.info/code/munin-srcds/

At the time there was a lot of fuss about server FPS and such. People, both players and other server admins, were throwing around stupid ideas about what server FPS is and how it’s noticeable in-game. GSPs (Game Server Providers) were (and are) selling 1000 FPS servers. Nobody actually knew anything about how the FPS behaves on a server.

Aside from the server FPS, the server reports its uptime, number of players, number of users (which probably means RCON accesses?) and network traffic in the output of the console command “stats”. These stats were easy to piece together along with the FPS stats. There’s also a “CPU” entry in the stats command output, but it’s considered extremely inaccurate, and as such it was left out of the plugins.

After almost a year of having these plugins mostly for our own use, the need arose for something better. The initial versions of the plugins graphed only one server. In April 2008 the plugins got a facelift to wildcard versions, which support any number of servers.

A briefing on Munin plugins

Munin plugins are easy to write. All they need to do is print an rrdtool-compatible configuration string for initializing the “rrd-database”, and then print the measured statistics in a simple “key.value xyz” style. There are also the “autoconfig” and “suggest” features, which help with installing the plugins.

Autoconfig is used to determine whether it is even possible to run a plugin on the server. Every Munin plugin should have this autoconfig feature, which answers either “yes” or “no” depending on whether the plugin can be used on the particular host. A negative answer should have the form “no (reason why it can’t be used)” to help server administrators get the plugin installed correctly. If the answer is “yes”, there’s no need to break down further why it can be used - obviously :).
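The contract described above can be sketched as a small shell function. This is only an illustration, not the srcds plugin itself: the field name fps, the graph title, the read_fps helper and the RCON_TOOL variable are made up for the example. Munin passes the mode as the first argument, and the autoconfig mode is requested with the argument autoconf.

```shell
# Hypothetical: name of the tool the plugin depends on.
RCON_TOOL=${RCON_TOOL:-rcon}

# Hypothetical helper: a real plugin would ask the game server over RCON.
read_fps() {
    echo 450
}

# Dispatch on the first argument the way Munin calls a plugin.
munin_plugin() {
    case "${1:-}" in
        autoconf)
            # "yes", or "no (reason)" so the admin knows what is missing.
            if command -v "$RCON_TOOL" >/dev/null 2>&1; then
                echo "yes"
            else
                echo "no ($RCON_TOOL not found)"
            fi
            ;;
        config)
            # Graph and field definitions.
            echo "graph_title Server FPS"
            echo "graph_vlabel FPS"
            echo "fps.label fps"
            ;;
        *)
            # Normal run: one "field.value number" line per field.
            echo "fps.value $(read_fps)"
            ;;
    esac
}
```

A real plugin file would end with a line such as munin_plugin "$@" so that Munin's argument reaches the function.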

The suggest functionality is slightly more complex. It’s used with so-called “wildcard plugins”, which are just like normal plugins, except that they take a parameter after the base plugin name. For example, the plugin “if_” is used to gather statistics from network interfaces, but there can be several network interfaces, such as eth0 for the LAN and eth1 for an ADSL modem - or some weirdo GSM interface(?). Thus, the “if_” plugin would be linked as “if_eth0”, “if_eth1” and “if_gsm”. Then all network interfaces get their own statistics generated by Munin without the need for a separate plugin for each network interface. This is where the suggest feature comes in. It suggests how the wildcard plugin should be linked into the Munin plugin directory. In this example it would say “eth0”, “eth1” and “gsm”. Then Munin automatically knows how to activate the plugin.
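Here is a rough sketch of both halves of the wildcard idea: how a plugin can read its parameter from the name it was linked as, and how the suggestions turn into links. The interface names are hard-coded from the example above, and the two directory paths are assumptions about a typical install, not something Munin guarantees.

```shell
# Derive the wildcard parameter from the linked name,
# e.g. "/etc/munin/plugins/if_eth0" -> "eth0".
# In a real plugin the name would come from $0.
wildcard_param() {
    name=$(basename "$1")
    echo "${name#if_}"
}

# "suggest" prints one candidate parameter per line.
suggest() {
    printf '%s\n' eth0 eth1 gsm
}

# Print the symlink commands an installer could run from the suggestions.
# Both directories are assumed locations.
link_commands() {
    suggest | while read -r param; do
        echo "ln -s /usr/share/munin/plugins/if_ /etc/munin/plugins/if_$param"
    done
}
```

The commands are only printed, so the sketch can be tried on a machine without Munin installed.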

Special Munin functionality in the srcds_* plugins

The Source Dedicated Server (srcds) plugins have the autoconfig feature but not suggest. The suggest feature would’ve been somewhat difficult to implement, because the plugins need to have the RCON password defined separately. The idea of the suggest feature is that there’s no need for further configuration after the somewhat automagical installation phase. Autoconfig works, however, because it can say “yes” or “no” depending on whether it found the RCON tool or whether it could connect to the defined game server.

Terrible truth

It turned out that the server FPS was fluctuating a lot. There were many custom-compiled servers on the Setti CSS server, but FPS-wise they all performed badly(*). A less highly tuned kernel turned out to be better FPS-wise. Still, the FPS clearly fluctuates along with the number of players.

Server FPS (day)

In the morning hours the FPS seems to vary between 400 and 500. In the evening the FPS can be seen dropping closer to 300. That’s because there are fewer players on the server in the mornings than in the evenings. There are two bots on the server at night, which is probably why the FPS fluctuates in the morning hours too.

The FPS plugin also takes several samples from the server to make sure the reported value doesn’t just happen to be an extreme one. It calculates the mean (alternatively the median; this can be configured directly in the plugin) of the samples. The default Munin timeout, however, is about three seconds, so the plugin takes five samples at intervals of about 0.2 seconds. In reality this means about five samples in 1.5-2.0 seconds. The result should be close enough to the truth at any given time - except during map changes - on the server.
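To show why the choice between mean and median matters with only five samples, here is a small sketch. It is not the plugin's own code, and the sample values are invented; the two functions just average whatever numbers they are given.

```shell
# Arithmetic mean of the numbers given as arguments.
mean() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR) printf "%.1f\n", sum / NR }'
}

# Median: sort numerically and take the middle value,
# or the mean of the two middle values for an even count.
median() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit
            if (NR % 2) printf "%.1f\n", v[(NR + 1) / 2]
            else        printf "%.1f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}
```

With the invented samples 480 455 120 470 460, where one sample hit a momentary dip, mean prints 397.0 and median prints 460.0. The median shrugs off the single outlier, while the mean is dragged down by it.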

Well adequate

Now we can be sure that there’s a server running at about 450 FPS on average at css.setti.info:27015. Nice.

How about your server?

Start from http://css.setti.info/code/munin-srcds/

Post your server stats to the forum thread about the plugin.

(*) Note: The machine running the Setti CSS server also runs a high-traffic MySQL database configured to extreme extents, which may have caused the highly tuned kernels to perform less than optimally.

March 13, 2008

Upgraded server scanner

The Setti server scanner was updated a couple of days ago. The new server scanner constantly searches the official Steam masterservers for new free-for-all servers.

Until now the system was semi-automatic. Semi-automatic means that there was a script to find authentication-free servers from the Steam masterservers, and the results had to be inserted semi-automatically into the main server scanner system. Semi-automatic here, too, means that there was a script which added a list of servers from a file to the main system.

The new system searches for new servers completely automatically. There are a few sanity checks in the system, which prevent false servers from getting onto the list too easily. The system periodically scans all servers listed by the Steam masterservers. After a server has been confirmed a few times to allow the test dummy client to join, the server is added to the main list, where it goes through the same kind of verification all over again - but it’s also listed on the Setti server list.

Here’s a graph of the number of servers found by the server scanner. The gentle slope in the number of servers is caused by the new server scanner. It has found about 800 new servers, which are mainly uninteresting HL1-based servers, though ;).

Number of servers (daily, 2008-03-13)

Most of the new servers “die” quickly, in a week or so. That’s probably because they are home-hosted servers with a dynamic IP. After the IP changes, the old server seems to have died, although most likely there’ll be a new server at another IP. Home-hosted servers usually have high latency and an unreliable connection altogether. That’s why it’s not so important to try to keep up with all the new home-hosted servers, but rather to find the servers which are reliable.

Here’s a graph of how servers disappear over a month. At the right end of the graph there’s that boost of 800 new servers.

Number of servers (monthly, 2008-03-13)

With the new system the curve is expected to be steadier. The number of servers shouldn’t shrink at a steady rate of a dozen per day any more. For every lost server a new server should be found. That way the system can keep the server list fresh without any manual processing.

The historical reason why the server scanner didn’t search Steam’s masterservers before is that there weren’t server patches which allowed the servers to show up on Steam’s masterservers. Maybe there were some patches, but not enough to be of real interest to anybody. Now there are, and they’re listed on the Setti server list.

In case you read this far, you might be interested in reading History of Setti server querier, Setti Masterserver - Better server browser and Masterservers upgrade.

February 22, 2008

Upgraded masterservers

Masterservers upgrade

Both HL1 and HL2 masterservers were upgraded on 2008-02-10.

The masterservers used to be simple creatures, which sent all servers in response to one request. The system worked well with the relatively small CSS server list of a few hundred servers, which could be sent in a couple of 200-server packets. The HL1 server list is much bigger. There are over two thousand servers, which require at least ten packets to list them all. (The limit on the number of servers in one packet comes from the maximum size of a UDP packet.)

The whole list of a few thousand servers, packed in the very compact form used by the game, took about a dozen kilobytes. At peak times there were over a hundred requests per minute to the masterservers, so the masterservers sent out about 20 to 25 kilobytes per second. That’s almost nothing on a high-speed 100 Mbps network, but it wasn’t nice to send all the data at once, because the clients expect only one packet, after which they request more.
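The figures above can be checked with a bit of shell arithmetic. The one number not taken from the text is the size of a single list entry: 6 bytes (a 4-byte IPv4 address plus a 2-byte port) is an assumption about the compact format.

```shell
servers=2000          # "over two thousand" HL1 servers
per_packet=200        # servers per reply packet, as stated in the text
entry_bytes=6         # assumption: 4-byte IPv4 address + 2-byte port
requests_per_min=100  # "over a hundred requests per minute"

packets=$(( (servers + per_packet - 1) / per_packet ))   # rounded up
list_bytes=$(( servers * entry_bytes ))
bytes_per_sec=$(( list_bytes * requests_per_min / 60 ))

echo "packets per full list: $packets"
echo "full list size:        $list_bytes bytes"
echo "outgoing traffic:      $bytes_per_sec bytes/s"
```

Under that assumption the sketch gives 10 packets, 12000 bytes per full list and 20000 bytes per second, which is in line with the "dozen kilobytes" and "20 to 25 kilobytes per second" mentioned above.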

Now the masterservers work more like Valve’s masterservers. Valve’s masterservers still have the better logic of sending the lowest-ping servers first. It’s not a big issue with the Setti masterservers, because regardless of the order in which the servers are sent, it takes only a second to get the full list.

Setti Masterserver

Setti Masterserver usage statistics

January 13, 2008

Match report - Part II - The match

Game Report

Match info

Team1: Setti
Team2: MyrmidonI
Time: 2008-01-12 18.00 UTC
Place: Setti CSS server (css.setti.info:27015)

Result: Setti - MyrmidonI 3 - 27 (1 - 14, 2 - 13)

Match report

We, the Setti team, kept our tactics simple to make them failsafe. Even so, the simplest failsafe tactics were not followed through in the match. The failure to follow the tactics was probably partly because we had so many changes in the lineup and didn’t have time to practice with the whole team.

In the following sections I will go through our tactics on both sides.

Setti on TERRORIST

Full B

I think our best “tac” was “Full B”. We tried it on the first pistol round and then twice with better weapons. The tactic was to run to B, with the two frontmost players throwing flashes through the tunnel as early as possible. The other three would stop for a second or two in the “upper dark”. The two in front would throw another flash farther behind the boxes while running backwards to avoid getting blinded by their own flashes.

The tac worked well enough that we got through the tunnel - at least some of us did. I think andicks000n didn’t fully grasp the idea of the tac, because he didn’t follow it through. Zadick once turned around in the tunnel, which is probably the worst thing to do. Andicks000n and Zadick were the newest players on the team, so they didn’t have as much time to practice the tacs. Maybe that explains it. Toble, k1ller and I got through the tunnel sometimes, but we never got there as simultaneously as we wanted.

Split A

The second best tac was “Split A”. There we would use the “catwalk” and the “long” with a 2-3 or 3-2 split of players, depending on how Toble called it, to get to A.

This tac worked well until we saw the enemy. The long was especially difficult to breach because we couldn’t kill the enemy. The catwalk players had better positions to shoot from, but somehow that also came to nothing. I think none of us really knew what to do once we got to our positions. At the round start it was also unclear who was going which way. Simultaneity was lacking here too.

Silent camping and other miscellaneous tacs

Silent camping worked somewhat. At least we didn’t die for 60 seconds. I don’t think we had good camping positions, though. Eventually the “let’s go get them” part failed because we couldn’t coordinate it well.

There were also a couple of non-coordinated or poorly coordinated attacks, which failed miserably.

Setti on CT

Our tac was 1-1-3 or 2-1-2, which means 1 at B, 1 at center and 3 at A, or 2 at B, 1 at center and 2 at A.

We held the A site quite well. We agreed that the player with the best spawn goes to the “slope” at the end of the “long”. The others stand near A in fixed positions.

The point of failure here was B. The initial tactic, stated by Toble on the first day and ever since, was to *always* run to B and flash the corridor. I think Toble did that once in the match. The B site was k1ller’s property, however. In the demo you can see that k1ller never throws a flash there in time. That way the enemy always got through the tunnel, where they would have been easier to pin down.

Eventually k1ller left his post at B completely and our tacs no longer worked. Toble was also kind of a “freeball” on CT. I’m not sure where he was supposed to be, because he was running somewhere else all the time. Andicks000n failed to communicate a couple of times, and that’s why the enemy had more time to prepare for the plant.

Zadick and I followed the tacs best, but we were too far away and too late if the enemy came to B.

Individual ratings

Toble got only one kill as TERRORIST. As CT he was switching his weapons all the time and running wildly. (7 kills in total)

k1ller didn’t follow all tacs on TERRORIST and got only one kill as CT. He also scrapped the tacs on CT and left B site. (6 kills in total)

Andicks000n killed the most. Too bad that he didn’t know the tacs better. (~20 kills)

Zadick didn’t know the tacs either, but she got some kills. Quite often she was too far from the action, though. (13 kills)

Ghost, me: in my opinion, I got too few kills from the situations I got into. (13 kills)

Overall the whole team should’ve killed more than we did. It’s impossible to win if there are players with 0 - 15 stats.

Match report - Part I - The practical stuff

Game Report

Match info

Team1: Setti
Team2: MyrmidonI
Time: 2008-01-12 18.00 UTC
Place: Setti CSS server (css.setti.info:27015)

Result: Setti - MyrmidonI 3 - 27 (1 - 14, 2 - 13)

Preface

In the beginning

The first step towards the match was taken in September 2007. Team MyrmidonI had had more of its players playing on the Setti CSS server. k1ller suggested a very unofficial match between Setti and MyrmidonI.

The original idea was to organize the match ad hoc, with one extra player on the Setti team as a handicap. At the time team MyrmidonI had a busy schedule and there was little time for the match. Eventually the match was pushed aside to wait for better times.

In December 2007 k1ller saw r0jer of MyrmidonI in-game and asked whether he was interested in the match. The match was set in motion again. After a few days of negotiation the match date was set to January 12th at 18.00 UTC.

Preliminary preparation

Unlike the original idea of an ad hoc game, the upcoming match was going to be played with official CEVO league lo3 settings and no handicaps.

The game also had to be broadcast to a large number of spectators. Because of bad past experiences with SourceTV, the whole SourceTV setup was going to be different this time. Usually, when no more than 250 spectators are expected to watch the game, it’s easy to use the CS:S server’s own SourceTV to broadcast it. However, in previous matches with about 20-30 spectators, the spectators caused noticeable lag on the game server, and they also reported that SourceTV did not work well.

The new SourceTV setup was more in the style of a big professional league. On the actual game server there was only one SourceTV, which broadcast the game to a SourceTV relay. The relay proxy ran as its own process, set to low priority. In a bigger game the relay proxy, or proxies, could’ve been on completely separate servers. This way the actual game server process was isolated from the load generated by the spectators.

Various lineups for Setti team

The Setti team had trouble finding a stable lineup for the match. Here are all the lineups from the original September 2007 lineup to the final day.

2007-09-05: k1ller, Ghost, Maksa, Perplexer, essim0n, Saatanantaja
2007-09-09: k1ller, Ghost, HellKid, Fakfeijs, essim0n, Saatanantaja
2007-12-25: k1ller, Ghost, Maksa, Fakfeijs, essim0n
2008-01-05: k1ller, Ghost, Michaellsland, Fakfeijs, essim0n
2008-01-06: k1ller, Ghost, Perplexer, Fakfeijs, Zeicko
2008-01-11: k1ller, Ghost, Toble, Zadick, andicks000n

Toble was already in the 2008-01-06 lineup as a backup member, so he took part in all the training sessions. Toble also created the tactics for the match.

December 10, 2007

Rescuing a non-booting server

Rescuing pin.setti.info

The server hosting provider has a really advanced rescue system for dedicated servers. It is possible to activate the rescue system through the web control panel and then remotely reboot the server. When the server reboots, it boots the rescue system from the network instead of its own hard drive. The system requires a network interface card capable of the “preboot execution environment” (PXE). The result is that it is possible to get complete root access on the server without loading any files from the hard drive itself. The rescue system, however, runs on the server’s own processor and in its own memory. Cool.

A new kernel install almost always fails a couple of times before it starts working. At those times it is nice to have a good rescue system. The new kernel installed on the server yesterday was no exception in terms of failing. Unfortunately, for some reason the rescue system wasn’t activated automatically, but required some manual rebooting - the reasons for which will remain a mystery.

With the rescue system at hand, it should be an easy task to revert the change in the configuration of the Linux boot loader (LILO) and run the boot loader command again to write the old boot sector back to the disk. This is how it worked on the previous server. You didn’t even notice when the kernel was updated many times. Unfortunately the easy part ended at the “reverting the change” part. Writing the changes back to the disk was a whole other matter.

In Linux there is direct access to all devices in the directory /dev. Anybody can try this out by playing weird tunes from their hard drive with the command:

cat /dev/sda > /dev/audio

The command sends the contents of the hard drive to the sound card, which does its best to interpret the weird stuff directed at it.

Access to the hard drives becomes critical when you want to write something as exotic as a boot sector on the hard drive. The file /dev/sda must be there for LILO to write to.

The way to make use of the rescue system is to make it compatible with the actual server. Then the rescue system can write the wanted boot sector and kernel to the server’s hard drives, and the server can be booted normally. The problem is getting the compatibility to work.

The age-old way is to boot a rescue system, mount the server’s root partition at /some/directory and run “chroot” on that directory to make the server’s root partition the root of the rescue system too. Then all commands run on the rescue system are actually the commands installed on the server. That way it is possible to run the server’s software without ever actually booting from its disk.
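As a rough sketch, the sequence looks like this. The partition /dev/sda2 and the mount point /mnt/server are assumptions for the example. Because the real steps need root on the rescue system, the sketch only prints the commands unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# $1: the server's root partition, $2: where to mount it (both assumed).
enter_server_root() {
    run mkdir -p "$2"
    run mount "$1" "$2"
    run chroot "$2" /bin/sh
}
```

Calling enter_server_root /dev/sda2 /mnt/server lists the three steps. Inside the resulting shell, paths such as /etc/lilo.conf refer to the server's own files.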

Now, the problem, for some reason, is that the files under /dev/ are difficult to create. They are special files called block or character device files. In most Linux distros there are boot scripts under /etc/init.d/ or /etc/rc.d/ which create the files. Usually the command is “makedev” or something similar. In Debian the command does not work here, because the script needs procfs, sysfs and probably a process called “udevd”, which has real difficulties getting started in the chrooted environment. The makedev script only succeeded in creating about ten useless block files under /dev.

The solution is to create the required special files for hard drives manually.

The special files have two weird properties: the “major” and “minor” device numbers. Together the numbers define which device the file is connected to. It seems that the major number corresponds directly to the numbers listed in /proc/devices. The minor is something else - but for the first hard drive (sda) it’s the number of the partition, with 0 standing for the whole disk.

So, now it’s just a matter of creating the device files to get through to the hard drives.

mknod /dev/sda b 8 0
mknod /dev/sda2 b 8 2
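The two commands above follow the usual Linux numbering for sd disks: major 8 covers sda to sdp, and each disk owns 16 consecutive minors, so the minor is 16 times the disk index plus the partition number. The helper below is a hypothetical convenience, not part of any tool, and it only prints the mknod command instead of running it.

```shell
# Print the mknod command for a disk node such as sda, sda2 or sdb1.
# Minor 0 of each block of 16 is the whole disk; 1-15 are partitions.
sd_mknod() {
    dev=$1
    case "$dev" in
        sd[a-p] | sd[a-p][1-9] | sd[a-p]1[0-5]) ;;
        *) echo "unsupported device: $dev" >&2; return 1 ;;
    esac
    letter=$(echo "$dev" | cut -c3)
    part=$(echo "$dev" | cut -c4-)
    # Position of the disk letter in the alphabet, counting from 0.
    disk=$(echo "abcdefghijklmnop" | awk -v l="$letter" '{ print index($0, l) - 1 }')
    echo "mknod /dev/$dev b 8 $(( disk * 16 + ${part:-0} ))"
}
```

For sda and sda2 this reproduces the commands above, and for a second disk sd_mknod sdb1 prints mknod /dev/sdb1 b 8 17.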

Then execute “lilo” and voilà. The working boot sector and kernel are written to the disks! Reboot & pray.

Works.

PS.
Next time it might be a good idea to try copying the working kernel files from the server’s hard drive to the rescue system’s file space (it’d be wrong to say the rescue system’s hard drive, because the rescue system has no hard drive but uses the server’s RAM) and then try to execute LILO completely from the rescue system’s filesystem.

December 8, 2007

The first 64 bit bug - SOLVED

There’s that neat server information box at the upper left on the main page. Its name is si2. How does it work?

There’s a PHP script that queries information from the given address in the same way the game does when doing a “quick refresh”. The server response contains a couple of dozen bytes of data, including the server name, number of players and such. The PHP script also queries detailed information about the players, such as kills and online time. The general information about the server fits into one UDP packet, which is limited to 1400 bytes, but the detailed player information could take more than 1400 bytes, in which case it is split into multiple packets. This brings up some issues with the UDP protocol and how packets flow through the mysterious ‘net. The packet header must contain information about whether all the data fit in one packet or whether there are multiple packets.

The server query protocol is described at http://developer.valvesoftware.com/wiki/Server_Queries#Protocol

The 64 bit bug is in the PHP script’s way of handling the packet header information.

This is the code block in the file class_serverspy.php. The original code was written by Daniel Luft, and it is used by Tim te Beek in si2. The code is GPL’d, so I’m not breaking the law by fixing it :)

The original lines have been commented out with #, and each is followed by the fixed line.

   // 4 byte packet header
   $header = unpack("Nint", substr($cache, 0, 4));
   $packet_type = sprintf("%u", $header['int']);

#   if($packet_type == 0xFFFFFFFF)
   if($packet_type == sprintf("%u", -1))
   {
    // single packet
   }
#   elseif($packet_type == 0xFEFFFFFF)
   // 0xFEFFFFFF read as a signed 32 bit integer is -16777217, not -2
   elseif($packet_type == sprintf("%u", -16777217))
   {
    // multi packets
   }

You can see that the header is taken from the full data, stored in $cache, and converted to an integer by unpack("Nint", …), where “N” means “unsigned long (always 32 bit, big endian byte order)” [ http://fi2.php.net/pack ] and “int” is the index name in the resulting associative array.

The next code line, $packet_type = sprintf(…), looks like it unnecessarily converts the integer value to an integer value again. What "%u" actually does is print the value as an unsigned decimal number, so a negative integer comes out as a large positive one.

The following comparisons, if($packet_type == 0xFFFFFFFF), are the bug. On a 32 bit system the header bytes unpack to -1, and sprintf("%u") prints that as 4294967295, which equals 0xFFFFFFFF. On the 64 bit system the header evidently still unpacks to a negative number, but there sprintf("%u") prints -1 as 18446744073709551615, which is something else entirely. The comparison against 0xFFFFFFFF can never succeed after that :)

After both comparisons have failed for the same reason, the PHP script thinks it got bad data and stops there.

The fix is to compare against the expected “real” value, which has been passed through the same sprintf() function to make sure that both values are in the same comparable format.
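Another way around the whole integer width problem is to never turn the header into a number at all and compare the raw bytes instead. The sketch below shows the idea in shell rather than PHP, so it is an illustration of the technique and not a patch for class_serverspy.php. The byte sequences are the ones implied by the code above: FF FF FF FF for a single packet and FE FF FF FF for a split one.

```shell
# Classify a server reply read from stdin by its first four bytes.
# The bytes are compared as hex text, so integer size never matters.
packet_kind() {
    header=$(od -An -tx1 -N4 | tr -d ' \n')
    case "$header" in
        ffffffff) echo "single" ;;
        feffffff) echo "split" ;;
        *)        echo "unknown" ;;
    esac
}
```

For example, printf '\377\377\377\377I' | packet_kind prints single, and printf '\376\377\377\377' | packet_kind prints split.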

Migrating from pin to pin

The perfect upgrade, the fail

The server running the css.setti.info website and other Setti services was upgraded on 2007-11-30. The upgrade was planned well in advance and its execution was nearly perfect - until midnight.

This is the story of the pin.setti to pin.setti upgrade.

September 5, 2007

Computer specs for ~TOP-10

This is a list of the computers of the current TOP-10 players.

The list contains CPU and GPU information. The release year of the CPU is in parentheses after the player name (uncertain entries are marked with ?). The point of the list is to emphasize the advantage of a high-end CPU in CS:S.

1. DaLiu (2006?)

  • Core2Duo 6600 @ 3400 MHz (425 x 8)
  • GeForce 8800 GTX @ 630/1460/2000 MHz

2. essim0n (2006?)

  • Athlon 64 X2 6000+
  • Geforce 7600GT

3. dred (2006?)

  • Core2Duo E6550 2.33GHz
  • GeForce 8400GS 512 MB

4. k1ller (2006)

  • Core2Duo 6300 @ 2600 MHz (370 x 7)
  • Geforce 7900GS

5. voov (2004)

  • Pentium 4 511 2.8 GHz @ 3.2 GHz
  • Extreme AX800XL

6. Ghost (2004)

  • Sempron 3100+ 1.8 GHz @ 2.4 GHz
  • GeForce 7600GS 256MB

7. TupaC (2003?)

  • Athlon 2400+
  • FX 5600 128MB

8. Saatanantaja (2004?)

  • Athlon 64 3400+
  • Radeon X850 XT

9. Perplexer

  • ?
  • ?

10. Infern0 (2005)

  • Pentium 4 670, 3.8 GHz
  • Radeon 9600 PRO

July 29, 2007

Attempts to fix the lag

2007-07-29

The lag is long gone (and this page is released).

Embarrassingly, the lag had very little to do with all the technical kernel and library stuff. The lag was caused by increased traffic on the various server list and server status pages. The scripts that had proved good at ru.setti.info were taking more and more CPU time, causing the mysterious lag even on the more powerful CPU of pin.setti.info. Nowadays the dynamic pages are cached to reduce the CPU time required to render them.

2007-06-04

Kernels

  • The original kernel of SuSE 10.1 (2.6.16.27-0.9)
  • The same kernel version as the original patched to 1000 HZ (didn’t boot)
  • 1000 HZ kernel version 2.6.21
  • 1000 HZ kernel / hrtimers / real-time pre-emption
  • 1000 HZ kernel / hrtimers / desktop-level pre-emption
  • Tickless kernel / hrtimers / desktop-level pre-emption
  • 300 HZ kernel / hrtimers / desktop-level pre-emption

New system libraries: libgcc_s.so, libstdc++.so, glibc

  • The original system files of SuSE 10.1 (glibc 2.4, GCC 4.1.0)
  • Files from official GCC 4.1.2 package
  • Files from Crowfire.de (libgcc_s.so and libstdc++.so)
  • Files from official GCC 3.3.6 package
  • Files from official GCC 4.2.0 package
  • Glibc 2.5
  • Glibc 2.6
  • Glibc 2.6 compiled with optimizations

Game servers

  • The laggy version build 3048
  • The first beta version build 3070
  • The second beta version build 3090
  • The third beta version build 3128
  • General i686 optimized binaries
  • General i486 optimized binaries

Miscellaneous tweaks

  • Completely fresh install of game server without any modifications
  • 24 player slots
  • 20 player slots
  • 18 player slots
  • Game server prioritized over other programs

How not to screw up the current system when testing the new libraries

Most of the libraries can be used without installing them system-wide. The srcds_run startup script contains the line export LD_LIBRARY_PATH=".:bin:$LD_LIBRARY_PATH", which tells the system to prefer libraries in the main install directory “.” and in the game server’s “bin” directory. Just copying the libgcc_s.so and libstdc++.so files to “bin/” and restarting the server makes them active.

Glibc is such a low-level system library that it needs special treatment. The library can be compiled and installed into a custom directory by running the configure script with the --prefix=/usr/local/glibc-2.6/ argument. Then compiling (make) and installing (make install) will not overwrite the current system libraries, which are needed by practically all processes on the machine. The new glibc can be activated for a certain program by starting the program through /usr/local/glibc-2.6/lib/ld-2.6.so. The ld-2.6.so is a helper program (the dynamic loader), which decides where to look for the system libraries.

The srcds_run startup script uses many small system programs, which require additional system libraries. The script won’t find them unless it is explicitly told where they are. That’s why the game server must be started directly with the appropriate server binary, i.e. srcds_amd. The other way is to define LD_LIBRARY_PATH to contain the original library paths like /lib/, /usr/lib/, /usr/local/lib and such. Then the startup script will work using the original system libraries. The caveat is that the server might launch itself using the original libraries too. Thus, the simplest and most certain way is to set LD_LIBRARY_PATH to “.:bin” and start the right srcds_* binary manually.

So, the commands to set the library path correctly and activate the new glibc for the CS:S server are the following:

export LD_LIBRARY_PATH=".:bin"
/usr/local/glibc-2.6/lib/ld-2.6.so ./srcds_amd -game cstrike +map de_dust2

The command above might fail to start with the following error:

./srcds_amd: error while loading shared libraries: ./srcds_amd: wrong ELF class: ELFCLASS32

In that case, there is a program called linux32, which makes the 64 bit environment look like a 32 bit one. The startup command is then like this (assuming the linux32 program is located at /usr/local/bin/):

/usr/local/glibc-2.6/lib/ld-2.6.so /usr/local/bin/linux32 ./srcds_amd -game cstrike

“linux32”: http://www.novell.com/products/linuxpackages/enterpriseserver/x86-64/linux32.html
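When trying several glibc builds side by side, it can help to assemble the start command from the install prefix instead of retyping it. The function below is a hypothetical helper that only prints the command line. It assumes the loader sits in lib/ under the prefix and is named ld-2.6.so, as in the commands above.

```shell
# Build the command line that starts the game server through a privately
# installed dynamic loader. Printing it first makes it easy to check.
# $1: glibc install prefix, $2: server binary, rest: server arguments.
srcds_command() {
    prefix=${1%/}
    binary=$2
    shift 2
    echo "LD_LIBRARY_PATH=.:bin $prefix/lib/ld-2.6.so ./$binary $*"
}
```

Setting LD_LIBRARY_PATH on the command line like this limits the change to that one process, so the rest of the shell session keeps using the system libraries.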