Amanda Remodel: Down to the Bare Studs, and Goodbye Zmanda For Now
We're just going to have to rip out Zmanda and start over. When we installed Zmanda 3.0.2 last year, it was buggy, and it used Amanda 3.0.0 as the backend, which was even more not ready for prime anything. It was pretty much hopeless, especially on Solaris 10 (which we must have because of ZFS). Even "amtapetype" didn't work. Zmanda thus set our splitsize to 100MB, so the backups span tens of thousands of files on each tape. Between a tape positioning bug involving the "fsf after filemark" property, which Amanda and Solaris each blame on the other, and Amanda's apparent need to assemble nearly an entire tape's worth of data in /tmp, our backups are all useless. Well, we never really made it out of the testing phase, and our primary backup is a few months' worth of ZFS snapshots.

I'm just going to have to rip it out and start all over. Because Zmanda uses a massively tweaked version of Amanda, I think I am just going to blow off Zmanda entirely. Sorry, guys, it was a nice idea to commercialize Amanda, but Zmanda required just as much tweaking as Amanda, and it required fairly comprehensive knowledge of Amanda internals to get working. So, now that I know Amanda better, why should I bother? Zmanda does have Windows, MySQL, etc. clients, but we don't need 'em. It will back up to a cloud, but cloud backup isn't cheap. (Or I should say it has "recurring costs", which tend to not recur around here.)

"service unsupported-transceiver" 
Yay to Cisco for supporting non-Cisco optics in SOME versions of IOS on SOME of their hardware. You have to use some undocumented commands to override the default, which doesn't allow 3rd party optics, and you have to view a disclaimer when you enter the commands, but that's entirely reasonable.

This after a co-worker wasted the better part of a day trying to get a 3rd party SFP to work in an HP ProCurve. He should have tested first, but the HP equipment was chosen for us by a vendor installing a turnkey package. (One more reason Carly Fiorina must be defeated, and Mark Hurd should have a "T" swapped with the "H" in his last name.)

FWIW, Junipers don't do any of this stupidity. Of course, they will deny support to 3rd party optics, but the first thing support will ask you for is the hardware config, and the 3rd party optics show up clearly:

Juniper -> Xcvr 3 REV 01 740-011613 AM0821S9YEJ SFP-SX
Junk? -> Xcvr 0 NON-JNPR P821YQL SFP-SX
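
If you have more than a handful of boxes, you can spot the non-Juniper optics yourself before support does. Here is a minimal Python sketch that scans a saved hardware listing for the NON-JNPR marker. It assumes every transceiver line looks like the two samples above (an "Xcvr" token followed by the slot number, with NON-JNPR standing in for the revision and part number); the function name and the sample string are made up for illustration, and real listings may need more careful parsing.

```python
# The two sample lines from this post, including my "Juniper ->" / "Junk? ->" labels.
SAMPLE = """\
Juniper -> Xcvr 3 REV 01 740-011613 AM0821S9YEJ SFP-SX
Junk? -> Xcvr 0 NON-JNPR P821YQL SFP-SX
"""


def third_party_optics(listing):
    """Return (slot, details) for every transceiver line marked NON-JNPR.

    Hypothetical helper: it assumes the slot number directly follows the
    "Xcvr" token and that everything after NON-JNPR is serial/description.
    """
    flagged = []
    for line in listing.splitlines():
        fields = line.split()
        if "Xcvr" not in fields or "NON-JNPR" not in fields:
            continue
        slot_pos = fields.index("Xcvr") + 1
        if slot_pos >= len(fields) or not fields[slot_pos].isdigit():
            continue  # not shaped like the sample lines; skip it
        details = " ".join(fields[fields.index("NON-JNPR") + 1:])
        flagged.append((int(fields[slot_pos]), details))
    return flagged


for slot, details in third_party_optics(SAMPLE):
    print(f"Xcvr {slot}: non-Juniper optic ({details})")
```

Run against the sample, it flags only the module in slot 0 and leaves the genuine Juniper part alone.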


A lot of 3rd party optics are junk, especially the ones for sale on eBay for 1/10 the price of a legit one. But (just as an example) we've never had a lick of trouble with Fluxlights. They support DOM, cost a tad less than Juniper modules, and are half the price of a Cisco-branded module. Generally, you get what you pay for.

Amanda and Me: A Shotgun Wedding 
I am going to have to get really involved with Amanda. We bought some Zmanda licenses for our Sun X4540, which has a smallish Ultrium IV tape library hung off it, and have made some feeble attempts to back up the 20 TB of data on the Sun to tape.

It's been a big mess. The Zmanda GUI is nothing special, but it's cheap and gets you started. If you have a kiddie SAN with only a few TB it will probably work out of the box. But Amanda does not scale up. Once you start spanning tapes and trying to track the state of 10+ million files, it requires significant amounts of tuning to work, especially with regard to MySQL and getting it fast enough to feed Ultrium IV without shoeshining. You also need to know the Amanda shell commands well.

It's still better than dealing with the incompetent bureaucrats at Symantec NetBackup support. NetBackup was our previous backup software, and that's $50K down a black hole. With Amanda the support is only as good as we users can make it. I haven't seen any good tips online about how to make Amanda scale up, so, as I figure it, I'll post some tips here.

Down with Sun, Up with Dell 
The Dell R510 is now promoted to Best Box Ever, and the Sun X4540 is demoted to also-ran. Sun is now subsumed into a huge corporate entity that will not return your phone call unless you are one of the Fortune 500, and Dell finally has a PowerEdge with easy out of band management and an onboard RAID that is actually faster than the disks you can slap on it.

I wish I had saved some MRTG graphs, but the R510 has now replaced a decrepit 2-CPU, SATA based generic shitbox as the sole MTA and MUA for close to 10,000 users. Disk IO is 10X faster (the 2x12-core CPU probably 100X). The old box used to spike to load averages well above 100 whenever the Monday morning newsletter got sent out to all 10,000 recipients, or when some hapless user forwarded their entire inbox to Hotmail.

No more. I have yet to see the load average spike above 3. Flat-line. Everyone gets their email a few seconds after it's sent. Best Box Ever.

By the way, we upgraded the MTA/MUA software, CommuniGate Pro, at the same time. If you have the bucks, buy it. It isn't a nightmare to install and configure, like Sendmail or Postfix; support is excellent; it has a web browser interface for users too inept to install Thunderbird; and, unlike Exchange, it is standards compliant and doesn't need a $5000 war chest of tools for backup and administration.

Second Best Box Ever: Dell R510 
Still doomed to have all-local storage on my hosts, I desperately needed something new to host email services for 5000 people. It's a worst-case scenario - 6 million tiny files. We try to spread it out over as many filesystems as possible. The old box has an oldish OS, three XFS filesystems, and is at 100% iowait a lot of the time.

I selected a Dell R510 since Sun is basically out of business (all the sales people seem to have been sacked, and Oracle doesn't seem to have realized yet that Sun made computers). I selected Xeon 5650 processors to take advantage of a 1.3GHz bus, and got the box fully loaded with 14 disks. The disks have been configured as 7 RAID1 devices.

The proof is in the numbers. Here is some iostat output while testing the (ext3) filesystems by rsyncing one filesystem with 2 million files to 2 other filesystems:
avg-cpu:  %user   %nice    %sys %iowait   %idle
           3.42    0.00    2.64    6.63   87.32

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb1       153.07 137.53 1915.09  7.70 180466.57 1161.82 90233.28   580.91    94.46     1.76    0.92   0.36  69.68
sdc1         0.00 11401.15  3.05 253.77   23.99 93244.58    11.99 46622.29   363.16    60.19  234.41   1.69  43.51
sdd1         0.00 10175.01  1.10 299.80    8.80 83828.49     4.40 41914.24   278.62    46.66  153.69   1.57  47.26


The formatting is a mess, but basically I'm getting 1800+ read IOPS and 500+ write IOPS through the R510's H700 RAID controller, and it's still loafing. In addition, in each generation of PowerEdges the out of band (iDRAC) management and server monitoring tools have gotten a little better, until they are finally easy to set up. Not too bad.
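
For anyone who wants to check the arithmetic, here is a small Python sketch that adds up the r/s and w/s columns from the three device rows above. The column names come straight from the iostat header; the function name is made up for illustration, and the parsing assumes every device row has exactly the 14 whitespace-separated fields shown in this post.

```python
# Header and device rows copied from the iostat output in this post.
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s "
          "avgrq-sz avgqu-sz await svctm %util").split()

IOSTAT = """\
sdb1       153.07 137.53 1915.09  7.70 180466.57 1161.82 90233.28   580.91    94.46     1.76    0.92   0.36  69.68
sdc1         0.00 11401.15  3.05 253.77   23.99 93244.58    11.99 46622.29   363.16    60.19  234.41   1.69  43.51
sdd1         0.00 10175.01  1.10 299.80    8.80 83828.49     4.40 41914.24   278.62    46.66  153.69   1.57  47.26
"""


def total_iops(text):
    """Sum reads/s and writes/s over every row shaped like the header."""
    reads = writes = 0.0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(HEADER):
            continue  # blank line or something that is not a device row
        # Map each column name to its numeric value, skipping the device name.
        row = dict(zip(HEADER[1:], (float(f) for f in fields[1:])))
        reads += row["r/s"]
        writes += row["w/s"]
    return reads, writes


reads, writes = total_iops(IOSTAT)
print(f"reads/s: {reads:.2f}  writes/s: {writes:.2f}")
```

The totals come to about 1919 reads and 561 writes per second. Nearly all the reads are on sdb1 (the rsync source) and nearly all the writes are on sdc1 and sdd1 (the two destinations).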
