The perfect upgrade, the fail

The server running the css.setti.info website and other Setti services was upgraded on 2007-11-30. The upgrade was planned well in advance and its execution was nearly perfect - until midnight.

This is the story of the pin.setti to pin.setti upgrade.


About a month ago our server hosting provider upgraded their hardware plans to new dimensions. All their new servers are equipped with AMD’s dual-core X2 processors. Even the cheapest solution - our choice - is loaded with 2 GB of RAM. The processor in our server is an X2 5600+.

Since our system at the time was an AMD 3700+ with 1 GB of RAM, we obviously saw this as an attractive choice. During the year the Setti services had also gained a lot of new users, which was showing on the server CPU. In any case, the hardware upgrade was too cheap to ignore.

Backups

There are various services running on the server. The three main services are the Setti masterserver, the Setti CSS server and the website. Then there are several backend systems for the aforementioned services and a couple of smaller non-critical amusements.

Backing up the Setti masterserver means saving two Perl scripts and the database. The first Perl script is a server query system, which is responsible for querying the thousand servers and checking whether they’re alive or not. Then there is the masterserver itself, which reads the live servers from the database and passes the list to players. In addition there are log files, which are required for the masterserver usage statistics.

The Setti CSS server is fairly easy to back up. The most important files are the logs, the custom-made Mattie Eventscript scripts and the Perl files collecting the necessary data from the database for the scripts to use. Other good-to-have files are server.cfg and motd.txt.

The hardest to back up is the Setti website. The files themselves are quite easy to back up, even though they hold about 5.0 GB of data. The hard part is making sure that the web server and the PHP engine have all the features needed by the code in the web files - which is not directly related to backing up.

Other backups include the rest of the databases, scripts, logs and configuration files for various programs. Unfortunately some configuration files are spread around the filesystem and they’re difficult to back up properly.
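
One way to keep scattered configuration files from slipping through is to list them in a manifest and archive from that list. The following is only a sketch: the function name, the manifest file and the archive name are hypothetical and not taken from the actual Setti setup.

```shell
# Sketch: archive every path listed in a manifest file (one path per line).
# backup_from_manifest, the manifest and the archive name are hypothetical.
backup_from_manifest() {
    manifest=$1   # text file listing the files and directories to save
    archive=$2    # gzip-compressed tar archive to create
    # -T reads the names of the files to archive from the manifest.
    tar -czf "$archive" -T "$manifest"
}
```

Called as `backup_from_manifest backup-paths.txt backup.tar.gz`, this picks up anything added to the manifest, so a newly discovered configuration file only needs one extra line in the list.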

Preparation

Since the masterserver is fixed to a certain IP address, it is not easy to just set up a new server, replicate the old server and do the switch. The new server must have the same IP, so it’s a tricky situation. It would’ve been easier if masterserver.vdf could hold a hostname instead of a direct IP address. Then css.setti.info could be pointed to the working masterserver at any time.

Luckily the server hosting provider has an option to bring up the new server on the same IP. It’s a rare feature. The downside is that the old server is gone once the new server is up. There’s no possibility to run both servers in parallel and copy the data from the old server to the new one.

Doing the switch fast requires that the backup files are quickly accessible from the new server. It is not possible to transfer nearly 8 GB of data quickly over a home ADSL connection, so it was important to have all the backups ready somewhere else on a fast connection.
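
The reason is simple arithmetic. As a rough sketch, assuming a home ADSL upstream of 1 Mbit/s and a 100 Mbit/s link between hosted servers (both hypothetical figures, the real line speeds are not stated here) and counting 1 GB as 8000 Mbit:

```shell
# transfer_seconds SIZE_GB RATE_MBIT
# Prints the seconds needed to move SIZE_GB at RATE_MBIT, using 1 GB = 8000 Mbit.
# Both rates used below are assumptions for illustration only.
transfer_seconds() {
    echo $(( $1 * 8000 / $2 ))
}

transfer_seconds 8 1     # assumed 1 Mbit/s ADSL upstream: 64000 s, close to 18 hours
transfer_seconds 8 100   # assumed 100 Mbit/s hosted link: 640 s, under 11 minutes
```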

The last backup was set to be taken automatically at 06.00. The backup was transferred to the “transit” server one hour later at 07.00.

The change was supposed to happen at around 10.00. This left enough time for the backups to be transferred to the “transit” server.

The perfect transition

The old server died around 12.00. The new server was active around 13.00.

It was important to get the database functioning as fast as possible, because it’s the backbone of everything.

Around 14.00 it became clear that the database backup was badly broken. The backup had been taken by bluntly copying the database files instead of the more correct way of “dumping” the database into an SQL file. Copying the files is not wrong in itself, but it’s not supposed to be done while the database is running.
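
For comparison, a dump of a running server might look like the sketch below. It assumes the mysqldump client is installed and can log in without extra options (for example through a client option file); the function name and the MYSQLDUMP override are hypothetical.

```shell
# Sketch: dump a running MySQL server into a single SQL file.
# dump_all_databases and the MYSQLDUMP override are hypothetical names.
dump_all_databases() {
    out_file=$1
    # --single-transaction reads InnoDB tables from one consistent snapshot
    # without stopping the server; --all-databases includes every schema.
    "${MYSQLDUMP:-mysqldump}" --single-transaction --all-databases > "$out_file"
}
```

Note that --single-transaction only gives a consistent snapshot for transactional tables such as InnoDB; other table types would need locking or a stopped server.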

The database configuration file was one of the “some” that were lying around the filesystem in a place that was not backed up. It didn’t even seem like something to back up, because it’s “just a database”. Evidently InnoDB, one of the storage engines in MySQL, did depend on something important in that file. The log file size (innodb_log_file_size) and the data file sizes (innodb_data_file_path) were the critical settings.
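
InnoDB of that era refuses to start when these configured sizes do not match the files it finds on disk, so a file-level copy is only usable together with the settings. A minimal sketch for saving them next to the backup follows; the function name and both file arguments are hypothetical, and the location of the option file is an assumption.

```shell
# Sketch: save the InnoDB sizing options next to a file-level backup.
# save_innodb_settings and both file arguments are hypothetical.
save_innodb_settings() {
    cnf=$1   # MySQL option file, e.g. /etc/mysql/my.cnf (assumed location)
    out=$2   # where to store the extracted lines
    # Option files accept both "_" and "-" in option names.
    grep -E '^[[:space:]]*innodb[_-](log[_-]file[_-]size|data[_-]file[_-]path)[[:space:]]*=' "$cnf" > "$out"
}
```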

While the database was causing all the trouble, it was fairly easy to get the CS:S server running: download and start. No configs, no worries. No admins, no limits - full server of course.

Eventually it turned out that the original database had been created by a MySQL 5.1 release candidate. The database couldn’t be read with either MySQL 6.0 alpha or MySQL 5.0. Even the 5.1 version had problems opening the database files properly, because they had been copied while the server was running.

At 21.40 the database was functional with MySQL 6.0. The working database was dumped in SQL format from 5.1 and then reloaded into 6.0.

At 21.54 the masterserver and the server scanner system were started.

The website was still just a plain white page with a short explanation of what was happening. However, given the situation, this could be considered a nearly perfect transition from one server to another. The time between the old server dying and the new server taking its place was about eight hours.

Failure to succeed

The CS:S server needs to allow all players. One way to allow everybody is to cut authentication out of the server by blocking the authentication servers out of play. However, the authentication servers can’t be blocked out completely, because then some players get dropped when the server can’t verify their status. The solution is to block only part of the traffic to the authentication servers. This is the road to failure.

Everybody can try this at home. Start a new CS:S server. Block outgoing packets of size 81 to port 27014. Then block outgoing packets of size 64 to port 27014. Wait for the frenzy!

With Linux’s iptables firewall the block commands are:

iptables -I OUTPUT -p udp --dport 27014 -m length --length 81 -j DROP
iptables -I OUTPUT -p udp --dport 27014 -m length --length 64 -j DROP

It’s important to wait a short while between setting the blocks. The order might even need to be different. It could also have an effect if you start your server between setting the blocks or if someone joins the server while you’re setting them.

Eventually the authentication server and the CS:S server will start flooding each other at a ridiculous rate.
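
Rules inserted with -I stay active until they are deleted, so it helps to be able to take them out again quickly. The sketch below builds one rule specification and uses it for both inserting (-I) and deleting (-D); it needs root privileges, and the function names and the IPTABLES override are hypothetical.

```shell
# Sketch: one rule specification shared by insert (-I) and delete (-D).
# auth_block_spec, set_auth_block and IPTABLES are hypothetical names.
auth_block_spec() {
    # $1 is the packet length to match.
    echo "-p udp --dport 27014 -m length --length $1 -j DROP"
}

set_auth_block() {
    # $1 is -I (insert) or -D (delete), $2 is the packet length.
    # The unquoted expansions are intentional: the spec must be split into words.
    ${IPTABLES:-iptables} "$1" OUTPUT $(auth_block_spec "$2")
}

# Usage, as root:
#   set_auth_block -I 81    # add the block
#   set_auth_block -D 81    # remove it again
```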

The reason why css.setti.info was cut off from the internet was the flood of 32 000 packets per second to the authentication server. It was around 3 MB/s of data for a few minutes. The css.setti.info server could’ve done more, but apparently the authentication server couldn’t keep up with the speed ;).
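
As a sanity check on these figures, taking 3 MB/s as 3 000 000 bytes per second (an assumption about the unit) gives an average packet of a bit over 90 bytes, which is in the same range as the 64- and 81-byte packets matched by the rules. The helper name is hypothetical.

```shell
# avg_packet_bytes BYTES_PER_SECOND PACKETS_PER_SECOND
# Integer division, so the result is rounded down.
avg_packet_bytes() {
    echo $(( $1 / $2 ))
}

avg_packet_bytes 3000000 32000   # prints 93
```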

Weekend resolution

The server could’ve been brought back up to speed the very same day if it hadn’t been Friday night. Nothing happened during the weekend.

On Monday the case was quickly settled with two emails and one fax sent.

Now, on Thursday the server is back to normal. Small tweaks here and there and css.setti.info is better than ever.

The end is good, all is good