File README of Package ib-bonding

Bonding support for operation over IPoIB

	January 14 2007

	Or Gerlitz <ogerlitz@voltaire.com>


This package contains patches to the bonding driver that enable it to
support non-ARPHRD_ETHER netdevices in its High-Availability
(active-backup) mode.

The motivation is to enable the bonding driver in its HA mode to work
with the IP over InfiniBand (IPoIB) driver. With these patches we were
able to enslave IPoIB netdevices and run TCP, UDP, IP (UDP) multicast
and ICMP traffic, with fail-over and fail-back working correctly. Testing
was done on the net-2.6.20 git tree and later also on RH4 and SLES10,
whose IB drivers are based on OFED 1.1.

Moreover, since IPoIB is also the IB ARP provider for the RDMA CM
driver, which is used by native IB ULPs whose addressing scheme is based
on IP (eg iSER, SDP, Lustre, NFSoRDMA, RDS), bonding support for IPoIB
devices **enables** HA for these ULPs. This holds because when the IB
HW informs a ULP that its current IB connection has failed, the ULP
just needs to reconnect, and the bonding device will then issue the IB
ARP over the currently active IPoIB slave.

Please note that the XXX patch must be applied to the IPoIB driver for
it to work with this package.

Detailed information on the patches that this package applies to the
kernel bonding code is provided below.

These patches are not sufficient for configuring IPoIB bonding through
the tools (eg /sbin/ifenslave and /sbin/ifup) provided by packages such
as sysconfig and initscripts, specifically because these tools bring the
bonding device UP before enslaving anything.

As a next step, we plan to look at how to enhance these tools/packages
so that they can bond/enslave with the modified code. As suggested by
the bonding maintainer, this may involve converting ifenslave into a
script based on the bonding sysfs infrastructure rather than on the
somewhat obsolete Documentation/networking/ifenslave.c.

For the convenience of potential users, the package contains example
bash scripts, based on the bonding sysfs support, that can be used to
configure and run the modified bonding driver.
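The key point for IPoIB is the order of operations: the bond is created
and configured, the IPoIB slaves are enslaved, and only then is the
bond brought UP. The sketch below only lists that order as sysfs writes
(each one is the equivalent of `echo VALUE > PATH`); it does not
reproduce the package's scripts. The helper name plan_ipoib_bond(), the
bond/slave names and the miimon value are hypothetical; the sysfs files
(bonding_masters, bonding/mode, bonding/miimon, bonding/slaves) are the
standard bonding sysfs interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One sysfs write: `value` is written into the file at `path`. */
struct sysfs_step {
    char path[128];
    char value[32];
};

/*
 * Hypothetical helper (not part of this package): fill `steps`, in order,
 * with the sysfs writes that build an active-backup bond over IPoIB slaves.
 * All slaves are enslaved while the bond is still DOWN; bringing the bond
 * UP (eg "ip link set bond0 up") is the separate final step.
 * Returns the number of steps filled, or -1 if `max` is too small.
 */
static int plan_ipoib_bond(const char *bond, const char *const slaves[],
                           int nslaves, struct sysfs_step *steps, int max)
{
    int n = 0;

    assert(bond != NULL && steps != NULL && nslaves >= 0);
    if (max < nslaves + 3)
        return -1;
    memset(steps, 0, (size_t)max * sizeof *steps);

    /* 1. create the bond device */
    snprintf(steps[n].path, sizeof steps[n].path,
             "/sys/class/net/bonding_masters");
    snprintf(steps[n].value, sizeof steps[n].value, "+%s", bond);
    n++;

    /* 2. select the HA mode; bonding only accepts this while the bond is down */
    snprintf(steps[n].path, sizeof steps[n].path,
             "/sys/class/net/%s/bonding/mode", bond);
    snprintf(steps[n].value, sizeof steps[n].value, "active-backup");
    n++;

    /* 3. enable link monitoring (100 ms is an arbitrary example value) */
    snprintf(steps[n].path, sizeof steps[n].path,
             "/sys/class/net/%s/bonding/miimon", bond);
    snprintf(steps[n].value, sizeof steps[n].value, "100");
    n++;

    /* 4. enslave each IPoIB device before the bond is brought UP */
    for (int i = 0; i < nslaves; i++) {
        snprintf(steps[n].path, sizeof steps[n].path,
                 "/sys/class/net/%s/bonding/slaves", bond);
        snprintf(steps[n].value, sizeof steps[n].value, "+%s", slaves[i]);
        n++;
    }
    return n;
}
```

For a bond0 over ib0 and ib1 this yields five writes, with both
"+ibX" writes to bond0/bonding/slaves coming before the bond is set UP,
which is exactly the ordering that ifenslave/ifup do not follow.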

Detailed info on the patches
============================

The first patch (dev_setup.patch) sets some of the bond netdevice
attributes and functions to those of the active slave when the enslaved
device is not of ARPHRD_ETHER type. Basically, it overrides the settings
made by ether_setup(), which are netdevice **type** dependent and hence
might not be appropriate for devices of other types. It also prevents
slaves of dissimilar link-layer types (eg ARPHRD_ETHER and
ARPHRD_INFINIBAND) from being enslaved to the same bond.
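The policy described in the paragraph above can be sketched in user
space as follows. This is an illustration of the behaviour, not the code
of dev_setup.patch: struct sketch_netdev is a hypothetical, heavily
simplified stand-in for struct net_device, and only a few of the
type-dependent fields (type, address length, MTU, address) are modelled.
The ARPHRD values mirror the kernel's if_arp.h.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Link-layer type codes, mirroring the kernel's if_arp.h. */
#define SKETCH_ARPHRD_ETHER      1
#define SKETCH_ARPHRD_INFINIBAND 32
#define SKETCH_MAX_ADDR_LEN      32   /* MAX_ADDR_LEN in the kernel */

/* Hypothetical, simplified stand-in for struct net_device. */
struct sketch_netdev {
    unsigned short type;       /* ARPHRD_* link-layer type */
    unsigned char  addr_len;   /* 6 for Ethernet, 20 for IPoIB */
    unsigned int   mtu;
    unsigned char  dev_addr[SKETCH_MAX_ADDR_LEN];
};

/*
 * Sketch only: reject a slave whose type differs from the slaves already
 * enslaved, and let the first non-Ethernet slave override the defaults
 * that ether_setup() put into the bond device.
 * Returns 0 on success, -EINVAL for a slave of a dissimilar type.
 */
static int sketch_enslave(struct sketch_netdev *bond, int nslaves,
                          const struct sketch_netdev *slave)
{
    assert(bond != NULL && slave != NULL);
    assert(slave->addr_len <= SKETCH_MAX_ADDR_LEN);

    /* mutual exclusion: all slaves of one bond share one link type */
    if (nslaves > 0 && slave->type != bond->type)
        return -EINVAL;

    /* first non-Ethernet slave: take over its type-dependent settings */
    if (nslaves == 0 && slave->type != SKETCH_ARPHRD_ETHER) {
        bond->type = slave->type;
        bond->addr_len = slave->addr_len;
        bond->mtu = slave->mtu;
        memcpy(bond->dev_addr, slave->dev_addr, slave->addr_len);
    }
    return 0;
}
```

After an IPoIB device is enslaved first, the bond reports
ARPHRD_INFINIBAND and a 20-byte address length, and a later attempt to
add an Ethernet slave is refused.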

The IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made
of a 3-byte IB QP (Queue Pair) number and the 16-byte IB GID (Global
ID) of the port this IPoIB device is bound to. The QP is a resource
created by the IB HW and the GID is an identifier burned into the HCA
(I have omitted some details here which are not important for the
bonding RFC).
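For reference, the full IPoIB hardware address as defined by RFC 4391
is 20 bytes long (INFINIBAND_ALEN in the kernel): one reserved/flags
byte, then the 3-byte QPN, then the 16-byte GID. The sketch below
decodes that layout; the helper names are hypothetical and the code is
only meant to make the byte offsets concrete.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define IPOIB_HW_ADDR_LEN 20   /* INFINIBAND_ALEN in the kernel */

/*
 * IPoIB link-layer address layout (RFC 4391):
 *   byte  0       reserved/flags
 *   bytes 1..3    QP number (24 bits, big endian)
 *   bytes 4..19   port GID (128 bits)
 */

/* Hypothetical helper: extract the 24-bit QP number. */
static uint32_t ipoib_addr_qpn(const unsigned char addr[IPOIB_HW_ADDR_LEN])
{
    assert(addr != NULL);
    return ((uint32_t)addr[1] << 16) | ((uint32_t)addr[2] << 8) | addr[3];
}

/*
 * Hypothetical helper: format the GID as eight colon-separated 16-bit hex
 * groups. `buf` must hold at least 40 bytes (32 hex digits, 7 colons, NUL).
 */
static void ipoib_addr_gid_str(const unsigned char addr[IPOIB_HW_ADDR_LEN],
                               char buf[40])
{
    char *p = buf;

    assert(addr != NULL && buf != NULL);
    for (int i = 0; i < 16; i += 2) {
        p += sprintf(p, "%02x%02x", addr[4 + i], addr[5 + i]);
        if (i < 14)
            *p++ = ':';
    }
}
```

Since neither the QPN nor the GID can be chosen by software, there is no
meaningful way for bonding to assign its own address to an IPoIB slave,
which is the assumption the next paragraphs build on.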

Basically, the IPoIB spec and implementation do not allow setting the
MAC address of an IPoIB device, and this work was done under that
assumption.

Hence, the second patch (set_mac_address.patch) allows enslaving
netdevices that do not support the set_mac_address() function. In that
case the bond's MAC address is that of the active slave, and remote
peers are notified of the MAC address (neighbour) change by the
Gratuitous ARP that bonding sends when fail-over occurs (the existing
bonding code already does this).
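The difference between the two cases on fail-over can be sketched as
below. This is an illustration of the behaviour described above, not the
code of set_mac_address.patch; struct sketch_dev and sketch_failover()
are hypothetical, and the flag can_set_mac stands for "the driver
implements set_mac_address()".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_ADDR_LEN 32

/* Hypothetical, simplified view of the bond or of a slave device. */
struct sketch_dev {
    unsigned char addr[SKETCH_MAX_ADDR_LEN];
    unsigned char addr_len;
    bool can_set_mac;   /* does the driver implement set_mac_address()? */
};

/*
 * Sketch of active-backup fail-over to `new_active`.
 * Returns true if the bond's own address changed as a result. In both
 * cases bonding sends a Gratuitous ARP on fail-over; when the address
 * changed, that ARP is what tells remote peers about the new address.
 */
static bool sketch_failover(struct sketch_dev *bond,
                            struct sketch_dev *new_active)
{
    assert(bond != NULL && new_active != NULL);

    if (new_active->can_set_mac) {
        /* classic case: the new active slave takes over the bond's
         * address, so the bond's address stays the same */
        memcpy(new_active->addr, bond->addr, bond->addr_len);
        return false;
    }

    /* eg IPoIB: the slave's address cannot be changed, so the bond
     * adopts the address of the new active slave instead */
    memcpy(bond->addr, new_active->addr, new_active->addr_len);
    bond->addr_len = new_active->addr_len;
    return true;
}
```

With IPoIB slaves the bond's address therefore follows the active
slave, and correctness relies on peers updating their neighbour entries
from the Gratuitous ARP.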

Normally, the bonding device is UP before any enslavement takes place.
Once a netdevice is UP, the network stack has it join some multicast
groups (eg the all-hosts group 224.0.0.1). Since ether_setup() has set
the bonding device type to ARPHRD_ETHER and its address length to
ETH_ALEN, the net core code computes a wrong multicast link address for
these joins: ip_eth_mc_map() is called, whereas for multicast joins
taking place **after** the enslavement another ip_xxx_mc_map() is
called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND).
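To see why the early joins go wrong, it helps to look at what the
Ethernet mapping produces. The sketch below implements the standard
RFC 1112 IPv4-to-Ethernet multicast mapping that ip_eth_mc_map()
performs (the function name sketch_eth_mc_map() is hypothetical, and
unlike the kernel function it takes the group in host byte order). The
result is a 6-byte 01:00:5e:xx:xx:xx address, while an IPoIB slave needs
a 20-byte IB multicast address that encodes an IB multicast GID, so
addresses computed before enslavement are meaningless to the IPoIB
device.

```c
#include <assert.h>
#include <stdint.h>

/*
 * IPv4 multicast group -> Ethernet multicast MAC (RFC 1112): the fixed
 * prefix 01:00:5e followed by the low-order 23 bits of the group.
 * `group` is in host byte order (the kernel function takes __be32).
 */
static void sketch_eth_mc_map(uint32_t group, unsigned char mac[6])
{
    assert((group >> 28) == 0xe);   /* must be in 224.0.0.0/4 */
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (group >> 16) & 0x7f;  /* only 23 bits of the group are kept */
    mac[4] = (group >> 8) & 0xff;
    mac[5] = group & 0xff;
}
```

For the all-hosts group 224.0.0.1 (0xE0000001) this yields
01:00:5e:00:00:01, an Ethernet-only address of the wrong length for an
ARPHRD_INFINIBAND device.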

The third patch (allow_not_up_enslave.patch) handles this problem by