Appendix C. Troubleshooting

Many SCSI problems are caused by cabling and by missing or inappropriate termination. These faults often show up as repeated SCSI bus resets, parity or CRC errors, and sometimes reduced transfer speeds. There is a good SCSI termination tutorial at www.scsita.org/aboutscsi/SCSI_Termination_Tutorial.html, and that site has other useful SCSI information (see W9).

There is also a SCSI FAQ site (see W10) that addresses many configuration and troubleshooting issues. Although its main focus is Windows (and its ASPI interface), much of it is relevant to SCSI in Linux and other Unix implementations.

When something appears to have partially locked up the system, the ps command can help identify which process, and which part of the kernel, is responsible. The invocations below show where each process is waiting. Their output can be forwarded to the maintainers.

ps -eo cmd,wchan
ps -eo fname,tty,pid,stat,pcpu,wchan
ps -eo pid,stat,pcpu,nwchan,wchan=WIDE-WCHAN-COLUMN -o args

The most interesting field for finding the location of a "hang" is "wchan", the kernel location in which the process is sleeping. If this is a kernel address then ps will use /proc/ksyms (replaced by /proc/kallsyms from the 2.6 kernel series onward) to find the nearest symbolic location. The "nwchan" field outputs the numerical address of the "hang".
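
When many processes are running, the full listing can be long. The following is a minimal sketch, not part of the original text, that narrows the output to processes in uninterruptible sleep (a STAT value starting with "D"), which are the usual suspects when a SCSI command never completes. The function names filter_blocked and show_blocked are hypothetical, and the sketch assumes a procps-style ps and a POSIX awk.

```shell
#!/bin/sh
# Keep the header line plus every row whose second column (STAT)
# starts with "D", i.e. uninterruptible sleep.
filter_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Hypothetical wrapper: list blocked processes with a wide wchan column,
# using the same ps format options as the examples above.
show_blocked() {
    ps -eo pid,stat,wchan=WIDE-WCHAN-COLUMN -o args | filter_blocked
}
```

Running show_blocked prints the header followed by any blocked processes. A process that stays in the list across several runs, with the same wchan value, is a reasonable candidate to report.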

If the system is not responding to normal keystrokes, then in text mode <Alt+ScrollLock> should output a stack trace, while <Ctrl+ScrollLock> should output a list of all processes. If system logging is still working, the output will be sent to the log as well as appearing on the console.

If the kernel has been built with the CONFIG_MAGIC_SYSRQ option, then in text mode <Alt+SysRq+H> will list the available commands. Of these, <Alt+SysRq+S> is useful for doing an emergency sync, while <Alt+SysRq+U> will remount file systems in read-only mode. After that, <Alt+SysRq+B> to reboot the machine might be your next move.
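
If the keyboard is unusable but a remote shell still responds, the same sync, remount and reboot sequence can usually be issued by writing the command letters to /proc/sysrq-trigger as root. The sketch below is an illustration under that assumption (the file may be absent on older kernels or when the feature is disabled); the function name emergency_sequence and the SYSRQ_TRIGGER variable are hypothetical. It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Assumed location of the SysRq trigger file; override for testing.
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

# Issue sync (s), read-only remount (u) and reboot (b) in that order.
# Pass "run" as $1 to really write to the trigger file; any other
# value (or none) only prints the planned actions.
emergency_sequence() {
    mode=${1:-dry}
    for key in s u b; do
        if [ "$mode" = "run" ]; then
            echo "$key" > "$SYSRQ_TRIGGER"
            sleep 2   # give the sync and remount a moment to finish
        else
            echo "would write '$key' to $SYSRQ_TRIGGER"
        fi
    done
}

emergency_sequence dry
```

Calling emergency_sequence run as root performs the real sequence and reboots the machine, so it is worth rehearsing with the dry run first.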