Amanda and Me: A Shotgun Wedding 
I am going to have to get really involved with Amanda. We bought some Zmanda licenses for our Sun X4540, which has a smallish Ultrium IV tape library hung off it, and have made some feeble attempts to back up the 20 TB of data on the Sun to tape.

It's been a big mess. The Zmanda GUI is nothing special, but it's cheap and gets you started. If you have a kiddie-sized SAN with only a few TB it will probably work out of the box. But Amanda does not scale up. Once you start spanning tapes and trying to track the state of 10+ million files, it requires significant tuning to work, especially getting MySQL fast enough to feed Ultrium IV without shoeshining, and you need to know the Amanda shell commands well.

It's still better than dealing with the incompetent bureaucrats at Symantec NetBackup support. NetBackup was our previous backup software: $50K down a black hole. With Amanda, the support is only as good as we users can make it. I haven't seen any good tips online about how to make Amanda scale up, so as I figure it, I'll post some tips here.

Down with Sun, Up with Dell 
The Dell R510 is now promoted to Best Box Ever, and the Sun X4540 is demoted to also-ran. Sun is now subsumed into a huge corporate entity that will not return your phone call unless you are one of the Fortune 500, and Dell finally has a PowerEdge with easy out-of-band management and an onboard RAID that is actually faster than the disks you can slap on it.

I wish I had saved some MRTG graphs, but the R510 has now replaced a decrepit 2-CPU, SATA-based generic shitbox as the sole MTA and MUA for close to 10,000 users. Disk IO is 10X faster (and the 2x12-core CPUs are probably 100X faster). The old box used to spike to load averages well above 100 whenever the Monday morning newsletter got sent out to all 10,000 recipients, or when some hapless user forwarded their entire inbox to Hotmail.

No more. I have yet to see the load average spike above 3. Flat-line. Everyone gets their email a few seconds after it's sent. Best Box Ever.

By the way, we upgraded the MTA/MUA software, CommuniGate Pro, at the same time. If you have the bucks, buy it. It isn't a nightmare to install and configure, like Sendmail or Postfix; support is excellent; it has a web browser interface for users too inept to install Thunderbird; and, unlike Exchange, it is standards compliant and doesn't need a $5000 war chest of tools for backup and administration.

Second Best Box Ever: Dell R510 
Still doomed to have all-local storage on my hosts, I desperately needed something new to host email services for 5000 people. It's a worst-case scenario - 6 million tiny files. We try to spread it out over as many filesystems as possible. The old box has an oldish OS, three XFS filesystems, and is at 100% iowait a lot of the time.

I selected a Dell R510 since Sun is basically out of business (all the sales people seem to have been sacked, and Oracle doesn't seem to have realized yet that Sun made computers). I selected Xeon 5650 processors to take advantage of a 1.3 GHz bus, and got the box fully loaded with 14 disks. The disks have been configured as 7 RAID1 devices.

The proof is in the numbers: Here is some iostat output while testing the (ext3) filesystems by rsyncing one filesystem with 2 million tiny files to 2 other filesystems:
avg-cpu:  %user   %nice    %sys %iowait   %idle
           3.42    0.00    2.64    6.63   87.32

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb1       153.07 137.53 1915.09  7.70 180466.57 1161.82 90233.28   580.91    94.46     1.76    0.92   0.36  69.68
sdc1         0.00 11401.15  3.05 253.77   23.99 93244.58    11.99 46622.29   363.16    60.19  234.41   1.69  43.51
sdd1         0.00 10175.01  1.10 299.80    8.80 83828.49     4.40 41914.24   278.62    46.66  153.69   1.57  47.26


The formatting is a mess, but basically I'm getting 1800+ read IOPS and 500+ write IOPS through the R510's H700, and it's still loafing. In addition, with each generation of PowerEdges the out-of-band (iDRAC) management and server monitoring tools have gotten a little better, and they are finally easy to set up. Not too bad.
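
For anyone who wants to check those IOPS figures without squinting at the columns, here is a minimal sketch that adds up the r/s and w/s fields of the device lines. It assumes the older sysstat "iostat -x" column layout shown above (r/s in field 4, w/s in field 5) and device names starting with "sd"; the function name sum_iops is made up for this example.

```shell
#!/bin/sh
# Sum the r/s and w/s columns of "iostat -x" device lines read from stdin.
# Assumption: the column layout matches the output above (field 4 = r/s,
# field 5 = w/s). Newer sysstat versions order the columns differently.
sum_iops() {
    awk '$1 ~ /^sd/ { r += $4; w += $5 }
         END { printf "read_iops=%.2f write_iops=%.2f\n", r, w }'
}
```

Feeding it the three device lines above gives roughly 1919 reads and 561 writes per second, which is where the "1800+" and "500+" come from. Feed it one report at a time, since it sums every device line it sees.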

Annoying Bug of the Week: Nagios is going insane! 
Is anyone else having this problem with Nagios 3.2.1 (the current version)? It seems to go insane from time to time, and when I look, ownerships are messed up on nagios.cmd and config files, and nagios.cmd is occasionally transmogrified from a named pipe to a plain file (with the wrong ownership as well.)

This causes Nagios to basically go insane: plugins don't report back, my active checks all break, and everyone gets paged for no reason.

I've seen some comments that imply SELinux might somehow be responsible, and that it only happens on Red Hat / CentOS. I'm running SELinux in "permissive" mode but I might as well get rid of it altogether. I'll report back if anything (doesn't) happen.
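
While hunting for the cause, it may help to catch the breakage as soon as it happens. The following is only a diagnostic sketch, not a fix: it checks that the command file is still a named pipe and still owned by the expected user. The function name check_cmd_pipe is made up for this example, and "stat -c" is GNU coreutils syntax.

```shell
#!/bin/sh
# Report whether a Nagios command file is still a named pipe with the
# expected owner. Exit status: 0 = fine, 1 = wrong owner, 2 = not a pipe.
# Usage: check_cmd_pipe PATH OWNER
check_cmd_pipe() {
    cmd_file=$1
    want_owner=$2
    if [ ! -p "$cmd_file" ]; then
        echo "CRITICAL: $cmd_file is not a named pipe"
        return 2
    fi
    # stat -c %U prints the owning user name (GNU coreutils).
    owner=$(stat -c %U "$cmd_file")
    if [ "$owner" != "$want_owner" ]; then
        echo "WARNING: $cmd_file is owned by $owner, expected $want_owner"
        return 1
    fi
    echo "OK: $cmd_file is a named pipe owned by $owner"
    return 0
}
```

Run from cron, something like "check_cmd_pipe /usr/local/nagios/var/rw/nagios.cmd nagios" would at least timestamp when the pipe gets clobbered. Both the path and the owner in that call are assumptions; check command_file in your nagios.cfg for the real location.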

Who needs all that fancy web stuff? Harness the Mighty Power of Whiptail! 
Until I started this job, I didn't know about Whiptail. No link to a project page here: it doesn't seem to have one of its own (it ships as part of the newt library), but it comes with most Linux distros.

This app pops up curses-style dialogs, forms, and lists in a terminal (it is actually built on newt rather than curses). How come I didn't know about it until recently? I wouldn't have had to learn all that fancy web stuff.
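
To give a flavor of it, here is a minimal sketch of a three-item menu. The function name pick_action, the title, and the menu entries are all invented for illustration. The one non-obvious part is the file descriptor shuffle: whiptail draws on the terminal and writes the selection to stderr, so stdout and stderr get swapped to let the caller capture the choice.

```shell
#!/bin/sh
# Show a small menu and print the chosen tag on stdout.
# Arguments to --menu: text, height, width, visible list rows, then
# tag/description pairs. The menu entries here are hypothetical.
pick_action() {
    whiptail --title "Service control" \
        --menu "Choose an action:" 15 50 3 \
        "status"  "Show service status" \
        "restart" "Restart the service" \
        "quit"    "Do nothing" \
        3>&1 1>&2 2>&3
}
```

Typical use is "action=$(pick_action) || exit 0", since whiptail returns a non-zero status when the user picks Cancel or hits Escape.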

Compiling mpt-status for CentOS on a Sun x4100 
I needed a new server, and management found me an X4100 at a garage sale. Not a bad server, but it's been EOLed by Sun, and Sun either never shipped mpt tools with the box, dropped support, or they got tossed in the dumpster when Oracle moved in.

Anyway, you will probably want to monitor your LSI MPT RAID if you find one, so here's how to do it if your distro does not come with the "mpt-status" command:

- Obtain mpt-status from http://freshmeat.net/projects/mptstatus/

- Obtain the X4100 resource CD from Sun. You may have to pay for this. Hopefully you got one with your box. I have an ISO file called X4100_X4200_ResourceCD_4.

- Install the mpt driver from the RPMs on the CD: mptlinux-4.00.05.00-1-rhel5.x86_64.rpm

- Activate the mptctl driver (your distro should have come with mptbase and mptsas): "/etc/rc3.d/S99fusion.mptctl start". Set up an rc3.d link to start this driver on boot!

- You should see mptctl, mptsas, mptscsih (maybe), and mptbase in the output of lsmod at this point. If not, keep hunting for drivers.

- Also on the Sun CDROM is mptlinux-4.00.05.00-src.tar.gz. Create the directory and extract this source into /tmp/mptlinux-4.00.05.00-src.

- Extract the mpt-status source into /tmp/mpt-status-1.2.0.

- Edit the mpt-status Makefile to set:

KERNEL_PATH := /usr/src/kernels/2.6.18-164.15.1.el5-x86_64/include
CFLAGS := -Iincl -Wall -W -O2 \
-I${KERNEL_PATH} \
-I/tmp/mptlinux-4.00.05.00-src/message/fusion


- Run make, and it works!

# ./mpt-status -i 2
ioc0 vol_id 2 type IM, 2 phy, 67 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 4 SEAGATE ST973401LSUN72G 0556, 68 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 3 SEAGATE ST973401LSUN72G 0556, 68 GB, state ONLINE, flags NONE
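
Since the whole point is monitoring, here is a rough sketch of a wrapper that turns that output into a pass/fail check. It assumes the output format shown above, treats any volume state other than OPTIMAL and any disk state other than ONLINE as a problem, and borrows the Nagios plugin exit code convention (0 OK, 2 CRITICAL, 3 UNKNOWN). The function name mpt_health is made up for this example.

```shell
#!/bin/sh
# Read mpt-status output on stdin and summarize the array health.
# Assumption: volume lines have "vol_id" as the second field and disk
# lines have "phy" as the second field, as in the sample output above.
mpt_health() {
    awk '
        $2 == "vol_id" { seen++
                         if ($0 !~ /state OPTIMAL/) { bad++; print "volume problem: " $0 } }
        $2 == "phy"    { seen++
                         if ($0 !~ /state ONLINE/)  { bad++; print "disk problem: " $0 } }
        END {
            if (seen == 0) { print "UNKNOWN: no mpt-status output"; exit 3 }
            if (bad > 0)   { print "CRITICAL: " bad " problem(s)"; exit 2 }
            print "OK: all volumes OPTIMAL, all disks ONLINE"
        }'
}
```

Usage would be along the lines of "./mpt-status -i 2 | mpt_health", run from cron or wrapped as a Nagios check.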



