Ghost’s blog » 2007 » December

December 10, 2007

Rescuing a non-booting server

Rescuing pin.setti.info

The server’s hosting provider has a really advanced rescue system for dedicated servers. The rescue system can be activated through the web control panel, after which the server is rebooted remotely. When the server reboots, it boots the rescue system from the network instead of from its own hard drive. This requires a network interface card capable of the Preboot Execution Environment (PXE). The result is complete root access to the server without loading any files from its hard drive. The rescue system does, however, run on the server’s own processor and in its memory. Cool.

Installing a new kernel almost always fails a couple of times before it starts working, and at those times it is nice to have a good rescue system. The new kernel installed on the server yesterday was no exception in terms of failing. Unfortunately, the rescue system wasn’t activated automatically but required some manual rebooting, for reasons that will remain a mystery.

With the rescue system at hand, it should be an easy task to revert the change in the Linux boot loader (LILO) configuration and run the boot loader command again to write the old boot sector back to the disk. This is how it worked on the previous server; you didn’t even notice when the kernel was updated many times. Unfortunately, the easy part ended at reverting the change. Writing the changes back to the disk was a whole other matter.
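Reverting the change could look roughly like this, assuming (this is not stated in the post) that the new kernel was added to /etc/lilo.conf as its own image= section and made the default. The helper set_lilo_default is hypothetical and only rewrites the configuration text; lilo still has to be run afterwards to write the boot sector.

```python
import re

def set_lilo_default(conf_text: str, label: str) -> str:
    """Return lilo.conf text whose global default= points at `label`.

    Only the configuration text changes; lilo must still be run afterwards
    to write the boot sector.
    """
    labels = re.findall(r"^[ \t]*label[ \t]*=[ \t]*(\S+)", conf_text, flags=re.MULTILINE)
    if label not in labels:
        raise ValueError(f"no image section is labelled {label!r}")
    default_line = f"default={label}"
    pattern = r"^[ \t]*default[ \t]*=.*$"
    if re.search(pattern, conf_text, flags=re.MULTILINE):
        # Replace the existing global default= line (a lambda avoids escape handling).
        return re.sub(pattern, lambda _m: default_line, conf_text, count=1, flags=re.MULTILINE)
    # No default= yet: LILO boots the first image, so add one at the top (global section).
    return default_line + "\n" + conf_text
```

Checking that the label exists first means a typo fails loudly instead of producing a configuration that LILO would reject.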

In Linux, all devices are directly accessible through files in the /dev directory. Anybody can try this out by playing the weird tunes of their hard drive with the command:

cat /dev/sda > /dev/audio

The command sends the contents of the hard drive to the sound card, which does its best to interpret the weird stuff directed at it.

Access to the hard drives becomes critical when you want to write something as exotic as a boot sector to the hard drive. The file /dev/sda must exist for LILO to write to it.
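Since a device file is read like any other file, the boot sector is simply the first 512 bytes of /dev/sda. A minimal sketch, assuming a classic MBR-style boot sector, that reads it and checks the 0x55 0xAA signature in its last two bytes. The helper names are hypothetical, reading the device normally requires root, and the signature only shows that the sector claims to be bootable, not which boot loader wrote it.

```python
BOOT_SECTOR_SIZE = 512

def has_boot_signature(sector: bytes) -> bool:
    """True if the first 512 bytes end with the 0x55 0xAA boot signature."""
    return len(sector) >= BOOT_SECTOR_SIZE and sector[510:512] == b"\x55\xaa"

def read_boot_sector(device: str = "/dev/sda") -> bytes:
    """Read the first sector of a block device (usually needs root)."""
    with open(device, "rb") as disk:
        return disk.read(BOOT_SECTOR_SIZE)
```

After running lilo, has_boot_signature(read_boot_sector()) is a quick sanity check before rebooting.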

The way to make use of the rescue system is to make it compatible with the actual server. Then the rescue system can write the wanted boot sector and kernel to the server’s hard drives, and the server can be booted normally. The hard part is making that compatibility work.

The age-old way is to boot a rescue system, mount the server’s root partition at /some/directory and run “chroot” on that directory, so that the server’s root partition becomes the root partition of the rescue system too. After that, every command typed on the rescue system actually runs the program installed on the server. That way it is possible to work in the server’s own environment without ever actually booting it.
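The procedure can be written down as a short command list. A minimal Python sketch that only builds the commands and runs nothing; the root partition /dev/sda2, the mount point /mnt/server and the helper name are assumptions. The proc and sysfs mounts are optional and are included because tools such as Debian’s makedev script need procfs and sysfs.

```python
import shlex

def chroot_rescue_commands(root_partition: str, mountpoint: str = "/mnt/server") -> list[str]:
    """Build (but do not run) the shell commands for a classic chroot rescue."""
    mp = shlex.quote(mountpoint)
    return [
        f"mkdir -p {mp}",
        f"mount {shlex.quote(root_partition)} {mp}",  # the server's root partition
        f"mount -t proc proc {mp}/proc",              # procfs inside the chroot
        f"mount -t sysfs sysfs {mp}/sys",             # sysfs inside the chroot
        f"chroot {mp} /bin/sh",                       # from here on, the server's own binaries run
    ]

for command in chroot_rescue_commands("/dev/sda2"):
    print(command)
```

Printing instead of executing keeps the sketch safe to try; the commands are meant to be typed as root on the rescue system.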

Now the problem is that, for some reason, the files under /dev/ are difficult to create. They are special files called block or character device files. Most Linux distros have boot scripts under /etc/init.d/ or /etc/rc.d/ that create them, usually with the command “makedev” or something similar. In Debian the command does not work, because the script needs procfs, sysfs and probably the “udevd” process, which has real difficulties starting in the “chrooted” environment. The makedev script only managed to create about ten useless block files under /dev.

The solution is to create the required special files for hard drives manually.

The special files have two odd properties: a “major” and a “minor” device number. Together they tell the kernel which device the file is connected to. The major number identifies the driver and matches the numbers listed in /proc/devices. The minor number selects a device handled by that driver; for the first hard drive it is simply the partition number, with 0 meaning the whole disk.
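Both numbers can be worked out from userspace. A minimal sketch with two hypothetical helpers: block_majors parses the text of /proc/devices, and sd_device_numbers applies the standard Linux numbering for sd disks, in which major 8 covers the first 16 disks (sda to sdp) and each disk gets 16 minors: 0 for the whole disk and 1 to 15 for its partitions. That is why the minor equals the partition number only on the first disk; sdb starts at minor 16.

```python
SCSI_DISK_MAJOR = 8  # "sd" block major for the first 16 disks, sda..sdp

def block_majors(proc_devices: str) -> dict[str, list[int]]:
    """Map each block driver name in /proc/devices text to its major numbers."""
    majors: dict[str, list[int]] = {}
    in_block_section = False
    for raw_line in proc_devices.splitlines():
        line = raw_line.strip()
        if line.endswith(":"):  # section headers: "Character devices:", "Block devices:"
            in_block_section = line == "Block devices:"
        elif in_block_section and line:
            number, name = line.split(None, 1)
            majors.setdefault(name, []).append(int(number))  # "sd" owns several majors
    return majors

def sd_device_numbers(name: str) -> tuple[int, int]:
    """(major, minor) for names like 'sda' or 'sdb3', covering sda..sdp only."""
    if len(name) < 3 or not name.startswith("sd") or not "a" <= name[2] <= "p":
        raise ValueError(f"unsupported device name: {name!r}")
    disk_index = ord(name[2]) - ord("a")
    suffix = name[3:]
    if suffix and not (suffix.isdecimal() and 1 <= int(suffix) <= 15):
        raise ValueError(f"unsupported partition: {name!r}")
    partition = int(suffix) if suffix else 0
    return SCSI_DISK_MAJOR, disk_index * 16 + partition
```

On a running system, block_majors(open("/proc/devices").read()) should list 8 among the majors of "sd" when the SCSI disk driver is loaded.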

So all that remains is creating the device files that give access to the hard drives:

mknod /dev/sda b 8 0
mknod /dev/sda2 b 8 2
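The same two nodes can be created from Python with os.mknod, which takes the file type and permissions in its mode argument and a device number built by os.makedev. A sketch with hypothetical helper names; the 0o660 permissions mirror the usual brw-rw---- of disk devices and are an assumption, and creating device nodes requires root.

```python
import os
import stat

def block_node_args(major: int, minor: int, perms: int = 0o660) -> tuple[int, int]:
    """Return the (mode, device) pair os.mknod() needs for a block special file."""
    return stat.S_IFBLK | perms, os.makedev(major, minor)

def make_block_node(path: str, major: int, minor: int) -> None:
    """Equivalent of `mknod PATH b MAJOR MINOR`; needs root (CAP_MKNOD)."""
    mode, device = block_node_args(major, minor)
    os.mknod(path, mode, device)

# Usage, as root inside the chrooted environment:
#   make_block_node("/dev/sda", 8, 0)
#   make_block_node("/dev/sda2", 8, 2)
```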

Then execute “lilo” and voilà: the working boot sector and kernel are written to the disks! Reboot & pray.

Works.

PS.
Next time it might be a good idea to copy the working kernel files from the server’s hard drive into the rescue system’s file space (it would be wrong to call it the rescue system’s hard drive, because the rescue system has no hard drive and runs in the server’s RAM) and then run LILO entirely from the rescue system’s filesystem.

December 8, 2007

The first 64 bit bug - SOLVED

There’s that neat server information box at the upper left on the main page. Its name is si2. How does it work?

There’s a PHP script that queries information from the given address the same way the game does when doing a “quick refresh”. The server’s response contains a couple dozen bytes of data, including the server name, the number of players and such. The PHP script also queries detailed information about the players, such as kills and online time. The general information about the server fits in one UDP packet, which the protocol limits to 1400 bytes, but the detailed player information can take more than 1400 bytes, in which case it is split into multiple packets. This raises some issues about the UDP protocol and how packets flow through the mysterious ‘net. The packet header must say whether all the data fit in one packet or whether there are multiple packets.
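Because UDP packets can arrive out of order or more than once, a client has to collect every part of a split response before parsing it. A minimal reassembly sketch, assuming the packet number and total count have already been read from each fragment’s header (the exact split header layout differs between engine versions, see the protocol page) and that numbering starts at 0. SplitResponse is a hypothetical name.

```python
from typing import Dict, Optional

class SplitResponse:
    """Collects the fragments of one split UDP response."""

    def __init__(self, total: int) -> None:
        if total < 1:
            raise ValueError("a split response has at least one fragment")
        self.total = total
        self.fragments: Dict[int, bytes] = {}

    def add(self, number: int, payload: bytes) -> Optional[bytes]:
        """Store one fragment; return the joined payload once all have arrived."""
        if not 0 <= number < self.total:
            raise ValueError(f"fragment {number} outside 0..{self.total - 1}")
        self.fragments[number] = payload  # a duplicate simply overwrites
        if len(self.fragments) < self.total:
            return None
        return b"".join(self.fragments[i] for i in range(self.total))
```

Keying fragments by number rather than appending them makes arrival order and duplicates irrelevant.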

The server query protocol is described at http://developer.valvesoftware.com/wiki/Server_Queries#Protocol
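A minimal sketch of the general information query, assuming it is the A2S_INFO request described on that page: a 0xFFFFFFFF header, the byte “T” and the null-terminated string “Source Engine Query”. The helper name, the default port 27015 and the timeout are assumptions, and servers that first answer with a challenge (as newer ones may) are not handled.

```python
import socket

# A2S_INFO request: -1 header, then 'T' and the null-terminated query string.
A2S_INFO_REQUEST = b"\xff\xff\xff\xff" + b"TSource Engine Query\x00"

def query_server_info(host: str, port: int = 27015, timeout: float = 2.0) -> bytes:
    """Send one A2S_INFO request over UDP and return the raw reply datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(A2S_INFO_REQUEST, (host, port))
        reply, _address = sock.recvfrom(4096)  # larger than the 1400-byte packet limit
        return reply
```

The first four bytes of the reply are the packet header that the si2 code below inspects.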

The 64 bit bug is in the PHP script’s way of handling the packet header information.

This is the code block in the file class_serverspy.php. The original code was written by Daniel Luft, and Tim te Beek uses it in si2. The code is GPL’d, so I’m not breaking the law by fixing it :)

The original lines have been commented out with #, each followed by its fixed line.

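   // 4 byte packet header
   $header = unpack("Nint", substr($cache, 0, 4));
   $packet_type = sprintf("%u", $header['int']);

#   if($packet_type == 0xFFFFFFFF)
   // compare in the same sprintf("%u") form as $packet_type
   if($packet_type == sprintf("%u", -1))
   {
    // single packet: header bytes FF FF FF FF
   }
#   elseif($packet_type == 0xFEFFFFFF)
   // the split header arrives as FE FF FF FF (-2 in little-endian byte order),
   // so "N" reads it as 0xFEFFFFFF, which is -16777217 as a signed 32-bit value, not -2
   elseif($packet_type == sprintf("%u", -16777217))
   {
    // multi packets
   }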

In the code, the header is the first four bytes of the received data stored in $cache. unpack("Nint", …) converts them to an integer, where “N” means “unsigned long (always 32 bit, big endian byte order)” [ http://fi2.php.net/pack ] and “int” is the key name in the resulting associative array.

The next code line, $packet_type = sprintf("%u", …), converts the integer into a string holding its unsigned decimal value. That is not as unnecessary as it may look: PHP integers are signed, so on a 32-bit system unpack() returns -1 for the header bytes FF FF FF FF, and “%u” turns that back into 4294967295.

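The following comparisons, such as if($packet_type == 0xFFFFFFFF), are the bug. On a 32-bit system the bit pattern 0xFFFFFFFF read as a signed integer is -1, and sprintf("%u") prints it as 4294967295, which equals 0xFFFFFFFF. On a 64-bit system PHP integers are 64 bits wide, so if the header value ends up sign-extended, sprintf("%u") prints 18446744073709551615 (0xFFFFFFFFFFFFFFFF), which no longer equals 0xFFFFFFFF. Add big endian versus little endian on top of that and it’s all messed up. Nobody knows what the real value is after that :)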
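The width effect can be reproduced in Python, whose integers never overflow, by masking to a chosen width. The hypothetical php_unsigned helper models what sprintf("%u") prints for a signed integer on a PHP build with the given integer width; it assumes, as the fix suggests, that the header value is handled as a signed number.

```python
import struct

def php_unsigned(value: int, int_bits: int) -> int:
    """The number sprintf("%u", value) prints on a PHP build with int_bits-wide integers."""
    return value & ((1 << int_bits) - 1)

single = b"\xff\xff\xff\xff"  # single-packet header bytes
split = b"\xfe\xff\xff\xff"   # split-packet header bytes (-2, little-endian)

# unpack("N", ...) reads big-endian; as signed 32-bit values both are negative.
single_signed = struct.unpack(">i", single)[0]  # -1
split_signed = struct.unpack(">i", split)[0]    # -16777217, i.e. 0xFEFFFFFF

print(hex(php_unsigned(single_signed, 32)))  # 0xffffffff
print(hex(php_unsigned(single_signed, 64)))  # 0xffffffffffffffff
print(hex(php_unsigned(split_signed, 64)))   # 0xfffffffffeffffff
```

Only the 32-bit result matches the literal 0xFFFFFFFF, which is why comparing both sides in the same sprintf("%u") form helps.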

Because both comparisons fail for the same reason, the PHP script thinks it received invalid data and stops there.

The fix is to compare against expected “real” value, which has been passed through the same sprintf() function to make sure that the values are in the same comparable format.
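For comparison, a sketch that sidesteps integer width altogether by decoding the header exactly as the protocol defines it: a signed 32-bit little-endian integer. This is an alternative written in Python, not what si2 does, and packet_kind is a hypothetical name.

```python
import struct

SINGLE_PACKET_HEADER = -1  # 0xFFFFFFFF
SPLIT_PACKET_HEADER = -2   # 0xFFFFFFFE, on the wire as FE FF FF FF

def packet_kind(datagram: bytes) -> str:
    """Classify a reply as 'single' or 'split' from its 4-byte header."""
    if len(datagram) < 4:
        raise ValueError("datagram is shorter than the 4-byte header")
    (header,) = struct.unpack("<i", datagram[:4])  # little-endian, signed, exactly 32 bits
    if header == SINGLE_PACKET_HEADER:
        return "single"
    if header == SPLIT_PACKET_HEADER:
        return "split"
    raise ValueError(f"unexpected packet header {header:#x}")
```

Because the struct format fixes both the width and the byte order, the result is the same on 32-bit and 64-bit machines.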

Migrating from pin to pin

The perfect upgrade, the fail

The server running the css.setti.info website and other Setti services was upgraded on 2007-11-30. The upgrade was planned well in advance, and its execution was nearly perfect, until midnight.

This is the story of the pin.setti to pin.setti upgrade.
