It's been a big mess. The Zmanda GUI is nothing special, but it's cheap and gets you started. If you have a kiddie SAN with only a few TB, it will probably work out of the box. But Amanda does not scale up. Once you start spanning tapes and trying to track the state of 10+ million files, it takes a significant amount of tuning to work, especially getting MySQL fast enough to keep an Ultrium IV (LTO-4) drive fed without shoeshining, and you need to know the Amanda shell commands well.
It's still better than dealing with the incompetent bureaucrats at Symantec NetBackup support; NetBackup, our previous backup software, was $50K down a black hole. With Amanda, the support is only as good as we users can make it. I haven't seen any other good tips online about how to make Amanda scale up, so as I figure it out, I'll post some tips here.
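For a sense of the numbers involved, here's a back-of-the-envelope sketch in Python. It uses the published LTO-4 native (uncompressed) figures of 800 GB per cartridge and 120 MB/s; the 64 KB average file size is a made-up example, not something I measured. It shows roughly how many files per second the backup pipeline (and its catalog) would have to get through to keep the drive near its native rate, and how quickly a 10+ million file tree fills a single tape.

```python
# Back-of-the-envelope LTO-4 (Ultrium IV) arithmetic.
# Capacity and rate are the published LTO-4 native (uncompressed) figures.
# The 64 KB average file size used below is a hypothetical example value.
LTO4_NATIVE_CAPACITY_BYTES = 800 * 10**9    # 800 GB per cartridge
LTO4_NATIVE_RATE_BYTES_PER_S = 120 * 10**6  # 120 MB/s

def hours_to_fill_tape(rate=LTO4_NATIVE_RATE_BYTES_PER_S,
                       capacity=LTO4_NATIVE_CAPACITY_BYTES):
    """Hours needed to write one full cartridge at a constant rate."""
    return capacity / rate / 3600

def files_per_second_needed(avg_file_bytes, rate=LTO4_NATIVE_RATE_BYTES_PER_S):
    """Files per second that must be read (and cataloged) to sustain `rate`,
    assuming every file is `avg_file_bytes` long."""
    return rate / avg_file_bytes

def files_per_tape(avg_file_bytes, capacity=LTO4_NATIVE_CAPACITY_BYTES):
    """Whole files of `avg_file_bytes` that fit on one cartridge."""
    return capacity // avg_file_bytes

AVG = 64 * 1024  # hypothetical average file size in bytes
print(f"Full tape at native rate: {hours_to_fill_tape():.2f} h")
print(f"Files/s needed to keep streaming: {files_per_second_needed(AVG):.0f}")
print(f"Files per tape: {files_per_tape(AVG):,}")
```

At that hypothetical file size, one cartridge holds over 12 million files, so a single tape's worth of catalog entries is already in the range where the tuning pain starts.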
The Dell R510 is now promoted to Best Box Ever, and the Sun X4540 is demoted to also-ran. Sun is now subsumed into a huge corporate entity that will not return your phone call unless you are one of the Fortune 500, and Dell finally has a PowerEdge with easy out-of-band management and an onboard RAID controller that is actually faster than the disks you can slap on it.
I wish I had saved some MRTG graphs, but the R510 has now replaced a decrepit 2-CPU, SATA-based generic shitbox as the sole MTA and MUA for close to 10,000 users. Disk IO is 10X faster (the 12 cores across two CPUs are probably 100X faster). The old box used to spike to load averages well above 100 whenever the Monday morning newsletter went out to all 10,000 recipients, or when some hapless user forwarded their entire inbox to Hotmail. No more. I have yet to see the load average spike above 3. Flat-line. Everyone gets their email a few seconds after it's sent. Best Box Ever.
By the way, we upgraded the MTA/MUA software, CommuniGate Pro, at the same time. If you have the bucks, buy it. It isn't a nightmare to install and configure, like Sendmail or Postfix; support is excellent; it has a web browser interface for users too inept to install Thunderbird; and, unlike Exchange, it is standards-compliant and doesn't need a $5000 war chest of tools for backup and administration.
Still doomed to have all-local storage on my hosts, I desperately needed something new to host email services for 5000 people. It's a worst-case scenario: 6 million tiny files. We try to spread them out over as many filesystems as possible. The old box has an oldish OS and three XFS filesystems, and it sits at 100% iowait a lot of the time.
I chose a Dell R510 since Sun is basically out of business (all the salespeople seem to have been sacked, and Oracle doesn't seem to have realized yet that Sun made computers). I selected Xeon 5650 processors to take advantage of a 1.3 GHz bus, and got the box fully loaded with 14 disks, configured as 7 RAID1 pairs.
The proof is in the numbers. Here is some iostat output while testing the (ext3) filesystems by rsyncing one filesystem with 2 million files to 2 other filesystems:
avg-cpu:   %user   %nice    %sys %iowait   %idle
            3.42    0.00    2.64    6.63   87.32

Device:  rrqm/s    wrqm/s      r/s     w/s     rsec/s    wsec/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sdb1     153.07    137.53  1915.09    7.70  180466.57   1161.82  90233.28    580.91     94.46      1.76    0.92   0.36  69.68
sdc1       0.00  11401.15     3.05  253.77      23.99  93244.58     11.99  46622.29    363.16     60.19  234.41   1.69  43.51
sdd1       0.00  10175.01     1.10  299.80       8.80  83828.49      4.40  41914.24    278.62     46.66  153.69   1.57  47.26
Basically, I'm getting 1,800+ read IOPS and 500+ write IOPS through the R510's PERC H700, and it's still loafing. In addition, with each generation of PowerEdges the out-of-band (iDRAC) management and server monitoring tools have gotten a little better, until they are finally easy to set up. Not too bad.
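To check the arithmetic, here's a small Python sketch that sums the r/s and w/s columns of a single iostat -x report. It looks columns up by header name rather than position, since the column set and order differ between sysstat versions; the sample text is the report from this test. Keep in mind the first report iostat prints is an average since boot, so feed it a later interval.

```python
# Sum read and write IOPS (r/s and w/s) across all devices in ONE
# `iostat -x` device report. Columns are matched by header name.
SAMPLE = """\
Device:  rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdb1 153.07 137.53 1915.09 7.70 180466.57 1161.82 90233.28 580.91 94.46 1.76 0.92 0.36 69.68
sdc1 0.00 11401.15 3.05 253.77 23.99 93244.58 11.99 46622.29 363.16 60.19 234.41 1.69 43.51
sdd1 0.00 10175.01 1.10 299.80 8.80 83828.49 4.40 41914.24 278.62 46.66 153.69 1.57 47.26
"""

def total_iops(report: str) -> tuple[float, float]:
    """Return (total reads/s, total writes/s) for the device rows in `report`."""
    header = None
    reads = writes = 0.0
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Older sysstat prints "Device:", newer prints "Device".
        if fields[0].startswith("Device"):
            header = fields
            continue
        # Skip avg-cpu lines and anything that doesn't match the device header.
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        reads += float(row["r/s"])
        writes += float(row["w/s"])
    return reads, writes

r, w = total_iops(SAMPLE)
print(f"reads/s: {r:.2f}  writes/s: {w:.2f}")
```

That works out to roughly 1,919 reads/s and 561 writes/s across the three devices.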
Is anyone else having this problem with Nagios 3.2.1 (the current version)? It seems to go insane from time to time, and when I look, the ownership of nagios.cmd and the config files is messed up, and nagios.cmd has occasionally been transmogrified from a named pipe into a plain file (with the wrong ownership as well). When that happens, plugins don't report back, my active checks all break, and everyone gets paged for no reason.
I've seen some comments implying that SELinux might somehow be responsible, and only on Red Hat / CentOS. I'm running SELinux in "permissive" mode, but I might as well get rid of it altogether. I'll report back if anything happens (or doesn't).
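In the meantime, a check like the following might catch the problem before everyone gets paged. It's a minimal Python sketch, not a fix: it only verifies that the command file is still a named pipe (FIFO) and is owned by the expected user. The path /usr/local/nagios/var/rw/nagios.cmd and the owner "nagios" are assumptions (typical source-install defaults); use whatever your nagios.cfg sets as command_file.

```python
import os
import pwd
import stat
import sys

# ASSUMED defaults; adjust to match command_file in your nagios.cfg.
DEFAULT_CMD_FILE = "/usr/local/nagios/var/rw/nagios.cmd"
DEFAULT_OWNER = "nagios"

def check_command_file(path: str, expected_owner: str) -> list[str]:
    """Return a list of problems with the Nagios command file (empty if OK)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist"]
    problems = []
    # The external command file must be a FIFO, not a regular file.
    if not stat.S_ISFIFO(st.st_mode):
        problems.append(f"{path} is not a named pipe (FIFO)")
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid with no passwd entry
        owner = str(st.st_uid)
    if owner != expected_owner:
        problems.append(f"{path} is owned by {owner}, expected {expected_owner}")
    return problems

def main(argv: list[str]) -> int:
    """Print any problems to stderr; return 1 if there were any, else 0."""
    path = argv[1] if len(argv) > 1 else DEFAULT_CMD_FILE
    owner = argv[2] if len(argv) > 2 else DEFAULT_OWNER
    problems = check_command_file(path, owner)
    for p in problems:
        print(p, file=sys.stderr)
    return 1 if problems else 0
```

To run it from cron, call sys.exit(main(sys.argv)) at the bottom of the script; a nonzero exit means the pipe or its ownership has gone wrong again.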
Until I started this job, I didn't know about Whiptail. No link to a project page here: it doesn't seem to have one of its own (it ships as part of the newt library), but it comes with most Linux distros. It pops up dialogs, forms, and lists in a plain terminal. How come I didn't know about it until recently? I wouldn't have had to learn all that fancy web stuff.
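For the curious, here's a minimal Python sketch of driving a whiptail menu from a script. whiptail draws on the terminal and writes the chosen tag to stderr; an exit status of 0 means OK, while Cancel or Esc return nonzero. The menu title and items are hypothetical examples.

```python
import shutil
import subprocess

def menu_argv(title, prompt, items, height=15, width=50, list_height=5):
    """Build a `whiptail --menu` command line.
    `items` is a list of (tag, description) pairs."""
    argv = ["whiptail", "--title", title, "--menu", prompt,
            str(height), str(width), str(list_height)]
    for tag, desc in items:
        argv += [tag, desc]
    return argv

def run_menu(argv):
    """Show the menu; return the chosen tag, or None on Cancel/Esc.
    stdin/stdout stay on the terminal so whiptail can draw; only stderr,
    where the selection is written, is captured."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("whiptail is not installed")
    result = subprocess.run(argv, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        return None
    return result.stderr.strip()

# Example (needs a real terminal; items are hypothetical):
# choice = run_menu(menu_argv("Mail admin", "Pick a task",
#                             [("queue", "Show mail queue"),
#                              ("restart", "Restart MTA")]))
```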
