The VPN routing problem
An end node machine connecting in through a VPN has two IP addresses: one IP address that is inside the VPN (call it I), and its normal IP address outside the VPN (call it O). The outside address is sometimes called the wild side address, because it is accessible from the wild Internet.
A lot of writing about VPNs assumes that I and O are disjoint from each other, effectively on completely isolated networks; I talks to the corporate network behind the VPN and only the corporate network, and O talks only to the Internet and never to the corporate network (and vice versa; machines inside the corporate network never talk directly to O). This assumption makes routing possible and even simple: routes to the corporate network are established that point to the VPN, and the machine's default route remains going out its usual connection.
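As a sketch of what this simple case looks like, here is roughly how the routes would be set up on a Linux machine with iproute2; the interface name and the corporate address range are invented for illustration:

```shell
# Assumptions: Linux with iproute2; the VPN shows up as interface
# tun0 and the corporate network lives in 10.0.0.0/8 (both made up).

# Route the corporate address space over the VPN interface.
ip route add 10.0.0.0/8 dev tun0

# The default route is untouched, so everything else still goes
# out the machine's regular Internet connection.
ip route show default
```

Because the corporate network and the Internet are disjoint destination ranges, plain destination-based routing is all you need here.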
The problems with this come in if we violate the isolation assumptions; you wind up with asymmetric routing. If the corporate network tries to talk directly to O, the return packets will try to flow back over the VPN, which may or may not work. If the outside world tries to talk to I, the inside address, the return packets will try to flow back out over the end machine's regular Internet connection, which almost certainly won't work.
The core issue is that the machine really has two identities, the inside one and the outside one, and these identities need different routing. The inside identity should route everything over the VPN; the outside identity should route everything over the regular Internet connection.
Normal routing tables have no concept of separate identities; they pick where to send a packet based purely on what its destination address is. So things only completely work out when the routes for the two identities are completely distinct anyways: when the corporate network behind the VPN is in address space that's not reachable from the Internet.
(It can be publicly assigned and even nominally routed; the important thing is that a machine on the corporate network can't make a direct connection to O and a machine on the Internet can't make a direct connection to I.)
In this case, we can use the local IP address that a packet is coming from as a proxy for which identity of the machine sent it, and thus which connection it should go out over. If it comes from I, send it out over the VPN; if it comes from O, send it out over the regular connection.
This sort of routing goes by the general name of 'policy based routing', and is unfortunately a very complicated field with no standardization between systems. For example, Linux can do a certain amount of it purely with routing magic, while at least some other systems put it in the hands of IP filtering systems.
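To make this concrete, here is a minimal sketch of source-based policy routing with Linux's iproute2; all of the addresses, the table name, and the interface names are invented for illustration:

```shell
# Assumptions: Linux with iproute2; the inside address I is
# 10.1.2.3 on tun0, the outside address O is 198.51.100.4 on eth0
# with gateway 198.51.100.1 (all invented for this sketch).

# Create a separate routing table for the VPN identity.
echo "100 vpn" >> /etc/iproute2/rt_tables

# Packets sourced from the inside address consult the 'vpn' table...
ip rule add from 10.1.2.3 table vpn
# ...which sends everything over the VPN.
ip route add default dev tun0 table vpn

# Packets sourced from O fall through to the main table and use
# the normal default route out the regular connection.
ip route add default via 198.51.100.1 dev eth0
```

The `ip rule` line is the policy part: it selects a routing table based on the packet's source address, which is exactly the "which identity sent this" question that ordinary destination-only routing can't ask.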
(This problem doesn't come up with a pure IPSec VPN implementation, because the IPSec specification essentially requires you to have a second routing layer that can match on the source as well as the destination IP address of packets.)
Some problems with iSCSI on Solaris 10 (on x86)
The basic operation of iSCSI goes something like this:
- poke things to discover what iSCSI targets you have available, then
- login to each target, creating sessions and discovering what iSCSI devices (eg, disks) they have available, then
- talk to those iSCSI devices over the sessions you've created to actually do things.
There are three sorts of discovery: static configuration, SendTarget, and iSNS. Static configuration is the least convenient, because it requires you to know the target's full iSCSI name as well as its IP address; SendTarget is widely supported and just needs an IP address.
I started with Linux iSCSI, which separates things into those three phases: you use one command to discover things, another one to log in to some or all of what you've discovered (which registers the actual disks with the kernel), and then you talk with the disks themselves. (And then you can log out of a session, removing those disks from the system.)
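The three Linux phases look something like the following with the open-iscsi `iscsiadm` tool (which may differ from whatever iSCSI implementation you have; the portal address and target name here are invented for illustration):

```shell
# Assumptions: the open-iscsi userland (iscsiadm); the portal
# 192.0.2.10 and the target's iSCSI name are made up.

# Phase 1: discovery (SendTarget style), which lists targets.
iscsiadm -m discovery -t sendtargets -p 192.0.2.10

# Phase 2: log in to one discovered target, which registers its
# disks with the kernel as new SCSI devices.
iscsiadm -m node -T iqn.2000-01.com.example:disk1 -p 192.0.2.10 --login

# Phase 3: use the disks normally; logging out removes them again.
iscsiadm -m node -T iqn.2000-01.com.example:disk1 -p 192.0.2.10 --logout
```

Note that each phase is a separate, explicit command, and in particular you can log in to (or out of) a single target on its own.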
(And yes, you can set Linux up to automatically discover stuff and automatically log in to discovered stuff.)
Solaris 10 mashes these together, and also calls things by different names. In Solaris, discovery methods can be either enabled or disabled, which doesn't mean what you think it means; what it is actually doing is enabling or disabling logins to all targets discovered through that discovery method.
(Solaris does the actual SendTarget discovery the moment you add a target for ST poking; you can see what it found with list discovery-address -v. This is occasionally useful, for example to find out a target's iSCSI name.)
The first problem with this is logging out of sessions. On Solaris, you log out of a session by stopping discovery for the session's target. You do this either by disabling that discovery mode (which logs out of everything discovered that way) or by entirely removing the particular discovery target, which is kind of inconvenient if what you actually want to do is log out temporarily while, say, you shuffle all of an iSCSI device's logical drives around.
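For comparison with the Linux commands, a sketch of the Solaris 10 equivalents, as I understand the iscsiadm subcommands (the portal address is invented for illustration):

```shell
# Assumptions: Solaris 10's iscsiadm; 192.0.2.10 is a made-up
# target portal address.

# Adding a discovery address does the SendTarget discovery
# immediately...
iscsiadm add discovery-address 192.0.2.10

# ...and you can inspect what was found:
iscsiadm list discovery-address -v

# Logins only happen once the discovery method is enabled, and
# they happen for everything discovered through it:
iscsiadm modify discovery --sendtargets enable

# 'Logging out' of one target means removing its discovery
# address (or disabling the whole method):
iscsiadm remove discovery-address 192.0.2.10
```

There is no per-target login or logout command in this model, which is exactly the problem.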
The next problem is that this means there is no convenient way to boot a Solaris machine and bring up sessions with just one or two iSCSI targets (for disaster recovery or whatever). Since there is no way to do a manual login to one target, you have to remove almost all of your listed targets and then enable that discovery method.
The final and biggest problem is that SendTarget discovery returns all available IP addresses for a given target, all of which Solaris will then try to log in to, and Solaris 10 deals very badly with unreachable iSCSI target IP addresses, even if the target is reachable through another path. This effectively renders SendTarget discovery quite dangerous except for devices that only have a single path to them; you need to use static configuration, and I don't know how easily you can do multipath support on top of that.
(If you use SendTarget discovery, your Solaris 10 machine will not come up with the iSCSI disks accessible if you ever lose even one path to any iSCSI target. So much for redundancy. And this assumes that your machine is connected to all of the interfaces your iSCSI device has to start with, which is not necessarily true.)
The unreachable target IPs situation really irritates me, because I really thought better of Sun; it seems an obvious case to handle, yet Solaris 10 blows up relatively spectacularly. (It's not as if iSCSI devices with multiple interfaces are rare high-end gear, either.)
PS: yes, I have the current Solaris 10 on x86 recommended patch set and the latest version of the only iSCSI patch I could find.