------------------------------
  SCSIRAS Kernel Patches
        Overview 
------------------------------

Hard disks are the most common system element to be replaced, and are 
therefore a critical consideration in improving availability. Disk Mirroring 
(RAID-1) is the technique of using redundant disks to record multiple copies 
of the data so that a failure of one disk does not cause data loss.  These 
changes enhance the reliability, availability and serviceability of the 
drivers that are commonly used in a Linux software RAID-1 configuration.   
A separate effort has been made to enable various common hardware RAID 
adapters and their drivers on Linux.  See the "More Information" section 
for details.

    Information about the SCSIRAS kernel patches

These kernel patches are included in the carrier-grade Linux kernel, but
the scsirastools package does not necessarily depend on them.
SCSIRAS consists of a set of kernel changes, grouped into several separate 
patch files, described below.

The SCSIRAS patches add targeted features and improvements that provide
additional Reliability, Availability, and Serviceability (RAS) capabilities
to the aic7xxx driver, the scsi mid-layer, and the md (software RAID) driver
on top of the Linux 2.4.x kernel.  In particular, they enhance the logging
and error reporting capabilities of a software RAID configuration.
Some of these patches are already available in current Linux kernels.

List of the SCSIRAS changes included:
  * aic: Upgrade of the aic7xxx driver from version 6.1.7 to version 6.2.4
    See http://people.freebsd.org/~gibbs/linux/ for associated change history.
  * aic: Removed and handled 4 panic sites in aic7xxx
    Two of these were handled in the aic7xxx v6.2.4 upgrade.
    The "Unexpected async event" change was rejected after some discussion.
    The "Unexpected Command type" change logs the event and continues, since
    there is no downstream impact requiring a panic.
  * aic: Improved recovery from scsi parity errors 
    Comparable logic was added by Justin Gibbs in v6.2.4.
  * aic/scsi/md: Added calls to Enterprise Event Logging (via CONFIG_EVLOG).
    Use macros to substitute for printk calls.
    Make sure multi-part printks are assembled first.
    Add severity and event ids also.
  * scsi: handle hot-inserting new disks (via scsi_rescan)
  * scsi: Improved logging of check conditions and bus_resets in scsi.
    Added more decoding of Illegal commands, and log detected bus resets.
  * scsi: Changed some error messages in scsi to be less informal
    "Ththththaats all folks..." is too informal and vague.
    "I believe this is dead code ..." is too informal, not descriptive enough.
  * scsi: Added display of the device serial number to scsi messages.
    Serial Number is needed to uniquely define each device. Show during boot.
  * scsi: Added a test_unit_ready retry after resets in scsi error handling.
    Resets always cause a 6/29/NN sense error, so clear it now rather than 
    waiting for the next retry.
  * scsi: Added serial number & tallies to /proc/scsi/scsi.
    Tallies can be used to track disk-related problems, for example:
    timeouts 0 resets 0 par_errs 0 disk_errs 0 trans_errs 0 user_errs 0
    The last 3 tallies above are groups of SCSI sense errors.
  * md: Added additional debug messages
    obsoleted by kernel 2.4.18
  * md: improved raid1 error handling during resync/reconstruct
    The most significant one is that if you had a mirrored set in which
    all the devices failed, then write requests would never return,
    whereas they should return with an error.
    fixed now in kernel 2.4.18
  * md: fixed null pointer dereference oops in lvm code, where it 
    referenced an invalid LV if /boot is not on the root fs.
    change removed here, fixed in lvm project updates.
  * md: more granular locking during md resync
    fixed in kernel 2.4.18
  * md: fixed some unchecked pointer sites in md.c
    most fixed in kernel 2.4.18, 
    one still patched at the end of md_setup_drive()
  * md: improvements to resync code speed window
    fixed in kernel 2.4.18
  * md: set md_notifier priority to 1 to avoid race condition with other 
    notifiers it depends on in the stack with priority 0.
    fixed in kernel 2.4.18
  * md: improved md error handling to not mark last disk faulty
    fixed in kernel 2.4.18
  * md: Clarified resync message text
    fixed in kernel 2.4.18
  * md: don't show a bug if we hotadd a disk in a new slot
    fixed in kernel 2.4.18
  * md: if a faulty disk was removed, don't check it any more
    fixed in kernel 2.4.18
  * md: validate superblock values before writing to avoid inconsistencies
    Otherwise some values go negative, and cause problems on reboot.
  * aic: handle overrun errors for SAF-TE hot-insertions, add tallies.
    Often SAF-TE insertions cause overrun errors that don't clear until
    a bus reset occurs, so don't wait for the timeouts; reset the bus as
    soon as the overrun occurs.
    ahc tallies added for parity errors and overruns.
  * aic79: handle overrun errors for SAF-TE hot-insertions. 
    Applies to aic79xx v1.3.0 and previous.
  * md: fix oops in mdrecoveryd writing superblock, from Neil Brown.
    Changes superblock IO to not go through buffer cache, and changes
    handling for rdev->sb so that it is not freed when device fails, 
    but freed when device is removed from array.
  * md: fix bug causing resync not to complete sometimes.
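
As a rough illustration of how the /proc/scsi/scsi tallies could be used
by a monitoring script, the sketch below parses a tally line of the form
shown in the list above.  The function names (parse_tallies,
nonzero_tallies) are hypothetical, and the assumption that the tallies
appear as alternating name/value words on one line is taken only from that
example line; the exact layout written by the patched kernel may differ.

```python
def parse_tallies(line):
    """Parse 'name value name value ...' into a dict of ints.

    The alternating name/value layout is an assumption based on the
    example tally line in this README.
    """
    words = line.split()
    if len(words) % 2 != 0:
        raise ValueError("expected name/value pairs: %r" % line)
    tallies = {}
    for name, value in zip(words[0::2], words[1::2]):
        tallies[name] = int(value)
    return tallies


def nonzero_tallies(tallies):
    """Return only the counters that have recorded at least one event."""
    return {name: count for name, count in tallies.items() if count != 0}


# Example tally line copied from the list above.
sample = "timeouts 0 resets 0 par_errs 0 disk_errs 0 trans_errs 0 user_errs 0"
```

A script could call nonzero_tallies(parse_tallies(line)) periodically and
report any disk whose counters have started to climb.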

Note that with the 2.4.18 kernel, some of the changes above were already
included with the kernel source tree.

Files in kern/* containing the scsiras patches:
 aic_misc.patch      - misc aic7xxx cleanup, see 'aic:' list above
 aic_overrun.patch   - fix hang on overrun after hot-insert for aic7xxx
 aic_evl.patch       - add posix event log handling for each printk
 raid_validate.patch - add validate_sb for superblock inconsistencies, etc.
 raid_null.patch     - handle some null pointer cases, and ITERATE_RDEV escape
 raid_evlog.patch    - add posix event log handling for each printk
 scsi_rescan.patch   - add scsi_rescan code to automatically recognize
                       hot-inserted disks
 scsi_ras.patch      - revised sense decoding, added scsi tallies,
                       clear some check conditions via TUR without waiting,
                       show disk serial number, cleanup some messages.
 scsi_reset.patch    - add the capability to do resets from sg
                       (needed in 2.4.18, already there in 2.4.19)
 scsi_evlog.patch    - add posix event log handling for each printk
 aic79_overrun.patch - fix hang on overrun after hot-insert for aic79xx
 raid_oops.patch     - fixes oops in mdrecoveryd writing superblock,
                       from Neil Brown.
 raid_resync.patch   - fixes bug causing resync not to complete sometimes.
                       (needed in 2.4.18, already included in 2.4.19)

Minimum subset of above required for RedHat Advanced Server 2.1:
 raid_oops-as21.patch - a port of raid_oops.patch above
 raid_iter.patch      - subset of raid_null.patch above

---------------------------------
  Problems/Restrictions
---------------------------------
If you are using the aic79xx driver, it should be version 1.3.2 or greater.
See http://people.freebsd.org/~gibbs/linux/ for the current aic79xx version.

Known issues:

- Kernels with large configurations may not fit on a boot diskette.
  If "mkbootdisk --device /dev/fd0 2.4.18-CGLE-SMP" results in "No space left
  on device", either trim the kernel configuration and rebuild it with fewer
  modules, or reboot to a previous, smaller kernel to make the emergency boot
  diskette that supports root mirroring.

---------------------------------
  Configuring Software RAID1 
---------------------------------
You can set up the RAID devices for a root mirror either during the
Linux OS installation or afterward, on an already installed Linux system.
See the SOFTWARE RAID CONFIGURATION section below for an example of how a
system can be configured for root mirroring.

Note that the kernel must be built with CONFIG_BLK_DEV_MD=y (instead of =m)
in order to use the md driver for root mirroring.
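
One way to verify this before rebooting is to scan the kernel .config
file for the option.  The sketch below is only an illustration: the
function name md_builtin is hypothetical, and it assumes the usual .config
layout of one KEY=value pair per line, with disabled options appearing as
"# ... is not set" comments.

```python
def md_builtin(config_text, option="CONFIG_BLK_DEV_MD"):
    """Return True only if the option is set to 'y' (built in, not a module)."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip comments such as "# CONFIG_BLK_DEV_MD is not set".
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key == option:
            return value == "y"
    # Option not mentioned at all: treat as not built in.
    return False
```

For example, md_builtin(open(".config").read()) could be run from the top
of the kernel source tree; a result of False means the md driver is either
a module or disabled, and root mirroring will not work with that kernel.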

----------------------
  More Information
----------------------
 
scsirastools project             http://sourceforge.net/projects/scsirastools/
Carrier-Grade Linux Enhancements http://developer.osdl.org/
SCSI Draft Standards             http://www.t10.org/drafts.htm
Linux RAID                       http://linas.org/linux/raid.html
Linux LVM                        http://www.sistina.com/products_lvm.htm
Justin Gibbs Adaptec driver      http://people.freebsd.org/~gibbs/linux/
Linux SCSI Generic Driver        http://gear.torque.net/sg/
Linux mdadm utility              http://www.cse.unsw.edu.au/~neilb/source/mdctl/
Linux SCSI subsystem             http://www.kernel.org
                                 http://mirrors.kernel.org/LDP/
Linux SG utility "SCU"           http://www.zk3.dec.com/~rmiller/scu.html
         (another useful tool for SCSI testing & debug)
Intel iSCSI project              http://www.sourceforge.net/projects/intel-iscsi
Intel RAID adapter drivers
         http://support.intel.com/support/motherboards/server/srcu31/index.htm
         http://support.intel.com/support/motherboards/server/srcu31l/index.htm
         http://support.intel.com/support/motherboards/server/srcu21/index.htm

----------------------------------
   SOFTWARE RAID CONFIGURATION
----------------------------------

Setting up a software RAID-1 root mirror

Some Linux distributions offer the option to specify the root devices
as RAID devices during the initial setup when the disks are partitioned.

Otherwise, you can configure an existing system for root mirroring via
the procedure outlined in the UserGuide, section 4.0.

--------------------------------------------------------------
This is an example of how to configure a software RAID root mirror
with RedHat 7.1 Linux at install time.
Instructions may vary with other distributions.

Steps to create a software RAID-1 mirror from the RedHat 7.1 install CD
(assumes disks of at least 9GB):
1) Specify Custom/Everything as the type of installation.
   This ensures that the software RAID option is included.
2) Create partitions on each disk with mount point = <RAID> and
   partition type = Linux RAID, with something like:
      sd*1 = 70 MB   (/boot)
      sd*5 = 133 MB  (swap)
      sd*6 = 750+ MB (root = remainder)
3) Click on the "Make RAID Device" button to specify:
    mount point:   /boot       /           <SWAP>
    md device:     md1         md0         md2
    sd partitions: sda1, sdb1  sda6, sdb6  sda5, sdb5
    RAID type:     RAID1       RAID1       RAID1
4) Don't forget to select firewall = none.
   (My location is secured from outside access, others may choose
    differently.)
5) Install everything else as normal.
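
The layout chosen in step 3 can be written down as data and sanity-checked
before it is typed into the installer.  The sketch below is illustrative
only: the LAYOUT table simply restates the example above, and the helper
names (split_partition, on_distinct_disks) are hypothetical.  It checks the
one property a mirror depends on, namely that the members of each md
device sit on different physical disks.

```python
import re

# The example layout from step 3: md device -> mount point and members.
LAYOUT = {
    "md0": {"mount": "/", "members": ["sda6", "sdb6"]},
    "md1": {"mount": "/boot", "members": ["sda1", "sdb1"]},
    "md2": {"mount": "<SWAP>", "members": ["sda5", "sdb5"]},
}


def split_partition(name):
    """Split a name such as 'sda6' into its disk ('sda') and number (6)."""
    match = re.fullmatch(r"(sd[a-z]+)(\d+)", name)
    if match is None:
        raise ValueError("not an sd partition name: %r" % name)
    return match.group(1), int(match.group(2))


def on_distinct_disks(members):
    """Return True if every member partition is on a different disk."""
    disks = [split_partition(name)[0] for name in members]
    return len(set(disks)) == len(disks)


# List any md device whose members share a disk (empty for this layout).
bad = [md for md, entry in LAYOUT.items()
       if not on_distinct_disks(entry["members"])]
```

A mirror built from two partitions of the same disk, such as sda1 and
sda5, would be flagged, since it gives no protection against the loss of
that disk.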

--------------------------------------------------------------

----------------------------------------
  ChangeLog for scsiras kernel patches 
----------------------------------------
0.8.2 = initial M1, 8/30/01
0.9.3 = candidate for M2
- updated aic to ver 6.2.2, corresponding changes to other aic stuff.
- added scsi_evl.patch & raid_evl.patch
- modified evl patches for new posix_log_printf name
- added two more raid fixes
0.9.4 = post-M2 update
- changed EPRINTK macro for added evlog 1.0 parameter
- added scsi2.patch to retry tur after device_reset in scsi_error.c
- merged scsi2.patch into scsi.patch
- revised scsi_evl.patch to create scsi_evlog.h rather than use scsi.h,
  since it caused a define conflict with aacraid/include/comstruc.h.
0.9.5 = post-M2 update
- removed two aic panic changes for invalid SCB 
0.9.6 = candidate for M3 on 10/26/01
- added scsi2.patch with /proc tallies
0.9.7 = update 11/14/01
- modified aic7xxx_linux.c for leftover printk() -> EPRINTK
0.9.8 = update 11/21/01
- modified scsi_scan.c for one-char EPRINTKs (bug 49)
1.2.0 = update 05/16/02
- re-based on kernel 2.4.18
- added validate_sb to md/raid1 patch
- added scsi_reset.patch
1.2.3 = update 08/29/02
- added scsi_rescan patch:
  . scsi: added scsi_rescan function
  . scsi: check for duplicate reset tallies
  . scsi: some cosmetic sense error message cleanup
  . md:   more superblock checking
  . raid1: added raid_counts()
  . aic:  cosmetic printk message cleanup
