Anyway, you will probably want to monitor your LSI MPT RAID if you have one, so here's how to do it if your distro does not come with the "mpt-status" command:
- Obtain mpt-status from http://freshmeat.net/projects/mptstatus/
- Obtain the X4100 resource CD from Sun. You may have to pay for this. Hopefully you got one with your box. I have an ISO file called X4100_X4200_ResourceCD_4.
- Install the mpt driver from the RPMs on the CD: mptlinux-4.00.05.00-1-rhel5.x86_64.rpm
- Activate the mptctl driver (your distro should have come with mptbase and mptsas): "/etc/rc3.d/S99fusion.mptctl start". Set up an rc3.d link so this driver starts on boot!
- You should see mptctl, mptsas, mptscsih (maybe), and mptbase in the output of lsmod at this point. If not, keep hunting for drivers.
- Also on the Sun CDROM is mptlinux-4.00.05.00-src.tar.gz. Create the directory and extract this source into /tmp/mptlinux-4.00.05.00-src.
- Extract the mpt-status source into /tmp/mpt-status-1.2.0.
- Edit the Makefile with:
KERNEL_PATH := /usr/src/kernels/2.6.18-164.15.1.el5-x86_64/include
CFLAGS := -Iincl -Wall -W -O2 \
-I${KERNEL_PATH} \
-I/tmp/mptlinux-4.00.05.00-src/message/fusion
- Run make, and it works!
# ./mpt-status -i 2
ioc0 vol_id 2 type IM, 2 phy, 67 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 4 SEAGATE ST973401LSUN72G 0556, 68 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 3 SEAGATE ST973401LSUN72G 0556, 68 GB, state ONLINE, flags NONE
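Since the whole point is monitoring, here's a minimal sketch of a checker you could run from cron against that output. It assumes the line format shown above (a "vol_id" line per volume, a "phy" line per disk, each with a "state" field) and treats anything other than OPTIMAL for a volume or ONLINE for a disk as a problem. The function name and the sample are mine, not part of mpt-status, and I haven't tried it against every state string the tool can emit.

```python
import re

# Assumed line formats, based on the "mpt-status -i 2" output above.
VOL_RE = re.compile(r"^(ioc\d+) vol_id (\d+) .*state (\w+)")
PHY_RE = re.compile(r"^(ioc\d+) phy (\d+) .*state (\w+)")

HEALTHY_VOL = {"OPTIMAL"}  # volume states we consider fine
HEALTHY_PHY = {"ONLINE"}   # disk states we consider fine


def find_problems(output: str) -> list:
    """Return a description of every volume or disk not in a healthy state."""
    problems = []
    for line in output.splitlines():
        line = line.strip()
        m = VOL_RE.match(line)
        if m:
            ioc, vol, state = m.groups()
            if state not in HEALTHY_VOL:
                problems.append(f"{ioc} volume {vol}: {state}")
            continue
        m = PHY_RE.match(line)
        if m:
            ioc, phy, state = m.groups()
            if state not in HEALTHY_PHY:
                problems.append(f"{ioc} phy {phy}: {state}")
    return problems


# The healthy output from this box, pasted in as a sample.
SAMPLE = """\
ioc0 vol_id 2 type IM, 2 phy, 67 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 4 SEAGATE ST973401LSUN72G 0556, 68 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 3 SEAGATE ST973401LSUN72G 0556, 68 GB, state ONLINE, flags NONE
"""

print(find_problems(SAMPLE))  # prints []
```

In practice you'd feed it the captured output of "./mpt-status -i 2" and mail yourself whenever the list comes back non-empty.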
Once again I am redoing our failed Symantec-Veritas Netbackup installation. There are a few things I'd rather be doing instead, like anything else, but we're going with Zmanda this time, so it should be a less painful job.
The Juniper project is back on the rails. The DHCP problem seems to be under control, through a combination of reducing DHCP lease times (to hours instead of days) and disabling ICMP blocking in Windows Firewall. It wasn't my decision to block ICMP. Sometimes you can be too paranoid for your own good. For example, ICMP carries the "fragmentation needed" messages that path MTU discovery depends on when traffic crosses a link with a smaller MTU. I can tell you a story about a major website that blocked all ICMP and couldn't communicate with anyone behind a smaller-than-normal MTU. Which is a lot of people.
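To show why blocking all ICMP bites, here's a toy simulation of path MTU discovery. It's a sketch under simplifying assumptions: every packet is sent with Don't Fragment set, there's no TCP MSS clamping, and the router on a too-small link always sends back an ICMP "fragmentation needed" message carrying that link's MTU. The function name and the MTU values are made up for illustration.

```python
from typing import List, Optional


def deliverable_size(path_mtus: List[int], start_mtu: int, icmp_blocked: bool) -> Optional[int]:
    """Return the packet size that finally gets through, or None for a black hole.

    path_mtus: MTU of each link along the path (hypothetical values).
    icmp_blocked: True if the sender's firewall drops incoming ICMP.
    """
    size = start_mtu
    while True:
        # With DF set, the first link smaller than the packet must drop it.
        too_small = next((mtu for mtu in path_mtus if mtu < size), None)
        if too_small is None:
            return size  # the packet fits every hop
        if icmp_blocked:
            # "Fragmentation needed" never reaches the sender, so it keeps
            # retransmitting at the same size and nothing ever arrives.
            return None
        # The ICMP message reports the next-hop MTU; shrink to it and retry.
        size = too_small


# A user behind a 1492-byte link (PPPoE-style), reached over 1500-byte links.
path = [1500, 1492, 1500]
print(deliverable_size(path, 1500, icmp_blocked=False))  # prints 1492
print(deliverable_size(path, 1500, icmp_blocked=True))   # prints None
```

That second case is the website from my story: the handshake works because the SYN packets are tiny, then the first full-size data packet vanishes and the connection just hangs.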
Not much news from the field. We've stopped rolling out Junipers for a while because of massive FAIL in the JunOS DHCP server. Actually, it serves us right, trying to use the switches as DHCP servers. Serves us double-right, for this now-seems-silly idea of assigning one routable subnet to each switch port, à la service provider. Our end users do have a propensity to hang strings of cheap-ass STP-incapable wall-wart-powered hubs off their drops and then "store" patch cables by plugging both ends into one of the hubs, but modern switches have broadcast controls that will effectively allow only the deserving to have their service hosed in this manner. (When I started working here, it was different. Campus-wide outages from looped ports occurred nearly every other day. But my predecessors had disabled spanning tree everywhere and never enabled broadcast controls, for some reason I can't fathom.)
Anyway, back to DHCP. JunOS just could not handle it. It turned out to be a mix of our fault and theirs. First, in some buildings but not all, the PCs have Windows Firewall blocking ICMP. This always invites DHCP FAIL, since hosts (clients and server) can't ping each other to see if an address is already claimed. Second, JunOS was making a horrible mess of the leases database. Third, we made it worse by specifying week-long lease times. Fourth, the JunOS dhcpd would just dump core from time to time.
Well, after setting lease times short, disabling Windows Firewall, and upgrading to the latest JunOS, we're about ready to start more rollouts. Fingers crossed.
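Here's a sketch of why the ICMP blocking hurts. RFC 2131 says a server SHOULD probe an address (for example with an ICMP echo) before reusing it, and servers like ISC dhcpd can do a ping check before offering. I'm assuming the JunOS server behaves roughly the same way, which I haven't verified. The function and the addresses below are hypothetical; the point is that a PC that ignores pings looks exactly like a free address.

```python
from typing import Callable, Dict, Iterable, Optional


def offer_address(pool: Iterable[str], leases: Dict[str, str],
                  responds_to_ping: Callable[[str], bool]) -> Optional[str]:
    """Pick an address to offer, skipping leased ones and ones that answer a ping.

    leases maps address -> client identifier (the server's lease database).
    responds_to_ping stands in for the server's ICMP echo probe.
    """
    for addr in pool:
        if addr in leases:
            continue  # already leased according to the database
        if responds_to_ping(addr):
            continue  # something on the wire is using it; don't hand it out
        return addr
    return None  # pool exhausted


pool = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]
# Hypothetical: a PC is sitting on 10.0.0.10 but the lease database forgot it.
in_use = {"10.0.0.10"}

print(offer_address(pool, {}, lambda a: a in in_use))  # firewall allows ICMP: prints 10.0.0.11
print(offer_address(pool, {}, lambda a: False))        # ICMP blocked: prints 10.0.0.10 (conflict!)
```

Throw a mangled leases database into the mix and you get exactly the duplicate-address mess we were seeing.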
The X4540 was brought to a standstill a few weeks ago by one dead SATA disk. The box didn't hang, but any ZFS I/O did. We didn't lose any data, and it might be buggy hardware and drivers, but still, Sun support had no explanation. That should not happen. Eventually, we're going to give Symantec Netbackup the finger and move to Amanda, which will enable us to upgrade to OpenSolaris. I posted on Slashdot about this and got a reply from "greg1104":
"People need to understand that SATA disks and chipsets are fundamentally weak at error reporting and recovery. There's only so much you can do about that at the driver or OS level if a problem drives the chipset crazy. You really need hardware optimized for that purpose, like a mature and battle-tested RAID controller."
I agree 100%. For now, ZFS is worth the risk. The box is a virtual tape library, so 100% uptime is not a requirement. I'm not going to start shorting the stock of midrange storage companies just yet.