Wandering Thoughts archives

2007-10-31

Jumbo frames on gigabit Ethernet on Solaris 10 x86

We've recently been looking into using jumbo frames on gigabit Ethernet on Solaris 10U4 x86 (aka Solaris 10 8/07); it turns out that this is more work than you might expect, and information on it is somewhat scattered.

First, as of S10U4 only a few gigabit network drivers are documented as supporting jumbo frames:

driver    max MTU (bytes)  chipset
bge       9000             Broadcom, but only on some chipsets; the manual lists BCM5700, 5701, 5702, 5703C, 5703S, 5704C, 5704S, 5714C, 5714S, 5715C and 5715S.
e1000g    16218            Intel PRO/1000
rge       7000             Realtek (RTL8169S/8110S)
sk98sol   9000             SysKonnect SK-98xx

(The xge 10 gigabit Ethernet driver also supports them, with a 9600-byte MTU, but 10G Ethernet is far too rich for our blood right now. The nxge driver for various Sun cards apparently supports jumbo frames, but this is not documented in its manpage.)

Contrary to what you might expect, the drivers do not allow jumbo frames by default. Instead you have to explicitly enable them before you can raise the MTU with ifconfig, and, for more fun, each driver does this differently and hides its configuration file in a different place. For the ones I have personal experience with:

  • bge is configured in /platform/i86pc/kernel/drv/bge.conf
  • e1000g is configured in /kernel/drv/e1000g.conf

(The necessary configuration parameter is documented in each driver's manpage.)
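Once the driver's .conf file allows jumbo frames, raising the MTU is an ifconfig call. Here is a minimal sketch of a hypothetical helper, set_jumbo_mtu, that refuses MTUs above the per-driver maximums in the table before calling ifconfig. It assumes the usual Solaris naming of driver name plus instance number (e1000g0, bge1) and only knows about bge, e1000g and rge; it does not check whether the .conf change has actually been made.

```sh
#!/bin/sh
# Hypothetical helper: raise an interface's MTU, refusing values above the
# documented S10U4 per-driver maximums. Assumes jumbo frames have already
# been enabled in the driver's .conf file (and the driver reloaded).

set_jumbo_mtu() {
    iface=$1
    mtu=$2
    # Interface names are the driver name plus an instance number, so strip
    # only the trailing digits: e1000g0 -> e1000g, bge1 -> bge.
    driver=$(printf '%s\n' "$iface" | sed 's/[0-9]*$//')
    case $driver in
        bge)    max=9000 ;;   # and only on some Broadcom chipsets
        e1000g) max=16218 ;;
        rge)    max=7000 ;;
        *)
            echo "$iface: no documented jumbo frame support for '$driver'" >&2
            return 1 ;;
    esac
    if [ "$mtu" -gt "$max" ]; then
        echo "$iface: MTU $mtu exceeds the $driver maximum of $max" >&2
        return 1
    fi
    ifconfig "$iface" mtu "$mtu"
}
```

For example, 'set_jumbo_mtu e1000g0 9000' runs 'ifconfig e1000g0 mtu 9000', while 'set_jumbo_mtu rge0 9000' fails because rge tops out at 7000.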

I don't know about jumbo frame support on S10U3, because I no longer have an S10U3 system handy to check its manpages.

The quite nice Sun Fire X2100 M2 theoretically has two jumbo-capable bge interfaces (along with two nge ones that are not jumbo-capable), but one of them is the ELOM interface and you probably want that on a separate management network, so in practice you get only one jumbo-capable interface.

JumboFrameGigabit written at 16:55:39

2007-10-21

How I got a corrupted metadb replica that panicked Solaris 10 x86

Since I got asked this in a comment on my entry about clearing metadb replicas, here is what I remember of how I managed to get a metadb replica so corrupted that it panicked Solaris 10u3 x86.

  • I wanted to experiment with metasets on my test machine, so I needed a local metadb replica. Because I hadn't planned for this, I didn't have a spare partition, and because I didn't know any better I put the local metadb replica on that tempting slice 8.

    (Since I was only really interested in metasets, I didn't do any local DiskSuite stuff, although I did make and delete metasets and so on.)

  • sometime later I rebooted the system and it didn't even make it as far as starting GRUB; I believe it gave some initial GRUB message and then hung.

    (I had been crashing the system repeatedly due to some interesting tests so I did not think too much of this at the time.)

  • I booted the machine with a Fedora Core 7 live CD and poked around, verifying that the filesystems were still there.
  • after a while I found the installgrub command, booted the Solaris install CD rescue environment, and ran it to get the machine back to a bootable state. (I believe I may have also rebuilt the boot archive at this point on general principle, since I was getting used to it breaking if I sneezed on the system.)

  • the test Solaris install would then boot but panic, which led me to finding out how you boot Solaris 10 x86 into truly single-user mode.
  • turning off the metainit service let the system boot, but the moment I typed metadb or metainit it would panic.

  • because I was in a hurry and needed the system for other tests, I ignorantly tried to recover the system by erasing the metadb replica by dd'ing zeroes all over slice 8. This destroyed the system completely, since on x86 slice 8 covers the start of the Solaris partition, where the disk label holding the slice partitioning lives, and zeroing it wiped that out.

    (If I had been really clever I would have saved a dd image of slice 8 before doing this, but I was very irritated with Solaris 10u3 x86 at this point.)
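In hindsight, the cheap insurance would have been to save both the slice table and an image of slice 8 first. Here is a minimal sketch of that as a hypothetical save_before_wipe function; the disk name (c0d0 in the example) is an assumption, and it relies on the usual convention that slice 2 covers the whole Solaris partition, so prtvtoc on it prints the full VTOC in a form fmthard can read back.

```sh
#!/bin/sh
# Hypothetical helper: before doing anything destructive to a slice, save
# the disk's VTOC (via prtvtoc) and a raw dd image of the slice itself.
# Usage: save_before_wipe DISK SLICE DIR, e.g. save_before_wipe c0d0 s8 /var/tmp/save

save_before_wipe() {
    disk=$1
    slice=$2
    dir=$3
    mkdir -p "$dir" || return 1
    # Slice 2 conventionally spans the whole Solaris partition.
    prtvtoc "/dev/rdsk/${disk}s2" > "$dir/$disk.vtoc" || return 1
    # Raw image of the slice that is about to be overwritten.
    dd if="/dev/rdsk/${disk}${slice}" of="$dir/${disk}${slice}.img" || return 1
}
```

If the label does get destroyed, 'fmthard -s /var/tmp/save/c0d0.vtoc /dev/rdsk/c0d0s2' should put the slice table back, and the dd image can be written back to the slice the same way it was read.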

On the whole it was a very educational experience and led me to look into a number of useful things so I would be better prepared for a future emergency on any production machines we wind up with.

I have one captured panic message from the system, plus the system disk itself (its syslog has more panic messages, and it would be possible to extract them if I could reconstruct the necessary slice partitioning). I have since tried a bit to reproduce this in a VMware Solaris image without success, so it is not a simple, easy-to-reproduce issue.

(The Solaris 10u3 install I was using was current on all recommended patches and on all released patches that applied to a number of areas of interest to us, including ZFS, iSCSI, and DiskSuite.)

CorruptingMetadb written at 21:19:06

2007-10-19

Some notes on booting single user in x86 Solaris 10

Here are some somewhat sketchy notes, mostly for my own future reference, on various bits of booting Solaris 10 x86 into single-user mode.

On x86 Solaris, kernel boot parameters go on the end of the 'kernel /platform/i86pc/multiboot' line in the GRUB boot entry. You put them in by interrupting the GRUB menu to edit the default boot entry (and then that line). The two most useful parameters are -s for single-user boots and -v to dump kernel messages to the screen as the system boots instead of just to syslog.
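As an illustration, a typical S10U4 GRUB entry edited for a verbose single-user boot might look like the following. This assumes the stock multiboot/boot_archive layout; if your kernel line already carries other arguments, keep them and add the new ones at the end.

```
kernel /platform/i86pc/multiboot -s -v
module /platform/i86pc/boot_archive
```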

However, this single-user mode isn't really completely single-user, because a bunch of svcs things still get started and run. It is possible for some of these to crash, leaving you with a machine that will not come up even when booted with -s.

(You cannot fix this by booting the rescue environment, because you can't use it to turn services off in your regular system; the svcs framework has to be running before you can do things like svcadm disable errant startup bits.)

In this situation, boot with '-m milestone=none'; this starts the svcs framework but runs almost nothing. In particular, the root filesystem is not mounted read-write; you can either run /lib/svc/method/fs-root or do it by hand with

fsck /
mount -o remount,rw /

(You will also need to mount /usr and so on, if applicable.)

At this point you can use svcs and svcadm to modify what will be run, for example to force off svc:/system/metainit. An important safety tip: under at least some situations, trying to svcadm disable an already disabled service will cause svcadm to just give you its general usage message instead of explaining what's actually wrong.

(It is possible to spend some time retyping command lines very carefully, trying to figure out just what stray bit snuck in, before the penny drops. Fortunately I was experimenting, not in the middle of a crisis.)
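One way to sidestep that confusing usage message is to check a service's state before disabling it; 'svcs -H -o state FMRI' prints just the state, with no header. Here is a minimal sketch as a hypothetical disable_service function; note that plain 'svcadm disable' (without -t) is persistent, which is what you want when forcing something like svc:/system/metainit off across reboots.

```sh
#!/bin/sh
# Hypothetical helper: disable an SMF service only if it isn't already
# disabled, so you get a clear message instead of svcadm's usage text.

disable_service() {
    fmri=$1
    # -H drops the header line, -o state prints only the state column.
    state=$(svcs -H -o state "$fmri") || return 1
    if [ "$state" = disabled ]; then
        echo "$fmri is already disabled"
        return 0
    fi
    svcadm disable "$fmri"
}
```

For example, 'disable_service svc:/system/metainit' either disables the service or tells you it was already off.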

Useful information is in /lib/svc/share/README, and the various scripts that are run to start things are in /lib/svc/method/.

(Probably all the hip svcs people already know that, but I didn't.)

SingleUserSolaris written at 23:13:00


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.