gdt's notes on net5501 and NetBSD
1 Introduction
This page contains gdt's notes on the Soekris Engineering net5501 and NetBSD, primarily for my own benefit, but published because it might be useful to others.
I intend to migrate most of this information to the Soekris wiki, once I have reduced the confusion to acceptable levels.
1.1 Links to other people's notes
2 Power
2.1 Soekris-provided Supply
The standard supply from Soekris is a thin wall-wart, taking up only one plug. It ends in a 5.5mm x 2.1mm barrel connector, nominally 12V center, shield ground. With no load, I measured 12.14V.
2.2 Observed currents
Power draw was measured with an Astro Flight DC volt/ammeter/wattmeter, hooked up with Anderson PowerPole connectors to a monstrous 12V linear supply (via a large power diode, because the supply normally charges a 30 Ah battery).
| condition | Voltage (V) | Current (A) | Power (W) |
|---|---|---|---|
| disconnected | 14.25 | - | - |
| POST | 13.86 | 0.46 | 6 |
| idle | 14.00 | 0.33 | 4 |
| + vpn1411 | 13.98 | 0.33 | 4 |
| + 32MB CF | 13.98 | 0.33 | 4 |
| + 40GB 2.5" | 13.96 | 0.38 | 5 |
| during install | 13.78 | 0.56 | 7 |
| NetBSD idle | 13.82 | 0.39 | 5 |
2.3 DC-DC converter vs regulator
The documentation on the web has been unclear on whether the net5501 (and other boards) uses a DC-DC converter or a linear regulator. Specifically, does the board draw less current at a higher supply voltage?
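One way to answer the question is to measure the input current at two different supply voltages with the same load. A linear regulator passes the load current straight through, so input current stays roughly constant; a switching DC-DC converter draws roughly constant power, so input current falls as the voltage rises. The Python sketch below encodes that test. The function name and the 5% minimum voltage spread are assumptions made for illustration, not anything Soekris documents.

```python
def classify_input_stage(v1, i1, v2, i2, min_spread=0.05):
    """Guess the input stage type from two (volts, amps) readings at one load.

    Hypothetical helper. min_spread is an assumed threshold: the two supply
    voltages must differ by at least this fraction for the comparison to
    mean anything.
    """
    if abs(v2 - v1) / max(v1, v2) < min_spread:
        return "inconclusive"
    # Relative change in input current and in input power between readings.
    current_change = abs(i2 - i1) / max(i1, i2)
    power_change = abs(v2 * i2 - v1 * i1) / max(v1 * i1, v2 * i2)
    # Linear regulator: current roughly constant, so power tracks voltage.
    if current_change < power_change:
        return "linear regulator"
    # Switching converter: power roughly constant, so current falls with voltage.
    return "switching converter"
```

For example, hypothetical readings of 0.33 A at both 12 V and 14 V would come out as "linear regulator", while 0.385 A at 12 V and 0.33 A at 14 V (the same 4.62 W) would come out as "switching converter". With a meter that reads to 0.01 A, the two voltages need to be well apart for the difference to rise above the meter's resolution.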
3 BIOS/BOOT info
3.1 PCI status at boot
The 0:17:0 line is the vpn1411.
| Slot | Vend | Dev | ClassRev | Cmd | Stat | CL | LT | HT | Base1 | Base2 | Int |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0:01:2 | 1022 | 2082 | 10100000 | 0006 | 0220 | 08 | 00 | 00 | A0000000 | 00000000 | 10 |
| 0:06:0 | 1106 | 3053 | 02000096 | 0117 | 0210 | 08 | 40 | 00 | 0000E101 | A0004000 | 11 |
| 0:07:0 | 1106 | 3053 | 02000096 | 0117 | 0210 | 08 | 40 | 00 | 0000E201 | A0004100 | 05 |
| 0:08:0 | 1106 | 3053 | 02000096 | 0117 | 0210 | 08 | 40 | 00 | 0000E301 | A0004200 | 09 |
| 0:09:0 | 1106 | 3053 | 02000096 | 0117 | 0210 | 08 | 40 | 00 | 0000E401 | A0004300 | 12 |
| 0:17:0 | 13A3 | 0020 | 0B400000 | 0116 | 0280 | 08 | 40 | 00 | A0005000 | A0006000 | 15 |
| 0:20:0 | 1022 | 2090 | 06010003 | 0009 | 02A0 | 08 | 40 | 80 | 00006001 | 00006101 | |
| 0:20:2 | 1022 | 209A | 01018001 | 0005 | 02A0 | 08 | 00 | 00 | 00000000 | 00000000 | |
| 0:21:0 | 1022 | 2094 | 0C031002 | 0006 | 0230 | 08 | 00 | 80 | A0010000 | 00000000 | 07 |
| 0:21:1 | 1022 | 2095 | 0C032002 | 0006 | 0230 | 08 | 00 | 00 | A0011000 | 00000000 | 07 |
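When comparing several boot logs, it can help to turn the raw comBIOS listing (whitespace-separated columns, one device per line) into records. The Python sketch below assumes the column layout shown above and that the slot's device number is decimal, which is consistent with 0:17:0 matching hifn0 at "pci0 dev 17" in the NetBSD dmesg later on this page. The vendor table holds only the three PCI vendor IDs in this listing (1022 AMD, 1106 VIA, 13A3 Hifn); the function and field names are made up for this example.

```python
# PCI vendor IDs for the three vendors in this listing only.
VENDORS = {"1022": "AMD", "1106": "VIA", "13A3": "Hifn"}


def parse_pci_status(listing):
    """Parse comBIOS 'PCI status at boot' text into a list of dicts."""
    devices = []
    for line in listing.splitlines():
        fields = line.split()
        # Device lines start with bus:device:function; skip header/separator.
        if not fields or fields[0].count(":") != 2:
            continue
        bus, dev, func = (int(part) for part in fields[0].split(":"))
        devices.append({
            "slot": fields[0],
            "bus": bus, "dev": dev, "func": func,
            "vendor": VENDORS.get(fields[1].upper(), fields[1]),
            "device_id": fields[2],
            "base_class": fields[3][:2],  # top byte of the class code
            # The Int column is blank for devices without an interrupt.
            "irq": int(fields[11]) if len(fields) > 11 else None,
        })
    return devices


SAMPLE = """\
Slot   Vend Dev  ClassRev Cmd  Stat CL LT HT Base1    Base2    Int
-------------------------------------------------------------------
0:17:0 13A3 0020 0B400000 0116 0280 08 40 00 A0005000 A0006000 15
0:20:0 1022 2090 06010003 0009 02A0 08 40 80 00006001 00006101
"""

for d in parse_pci_status(SAMPLE):
    print(d["slot"], d["vendor"], d["device_id"], "irq", d["irq"])
```

The 0B base class on the 0:17:0 line is the PCI "processor" class (subclass 40, co-processor), which fits a crypto accelerator such as the vpn1411.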
3.2 PXE booting
I had a number of difficulties with PXE booting. I used a NetBSD/i386 5.1ish VM in VMware on a mac, with pcn0 bridged to en0 on the mac, and a crossover cable to the net5501.
3.2.1 Delays
The net5501 booted OK and started to PXE boot. It got a DHCP address and then downloaded pxebootia32.bin via TFTP. From tcpdump, all looked normal. Then the system just sat there. Sometimes it would start to run pxeboot (or rather, print the text that pxeboot is supposed to print on start) after about 2m15s; this is almost certainly due to a well-known problem with the interaction between NetBSD's standalone boot code and the net5501. Sometimes it would apparently just hang. I did not fully characterize the behavior; the hangs may have been 2m15s delays that I was too impatient to wait out, since I knew of no reason for a delay of more than a few seconds.
3.2.2 Console device confusion
When pxeboot starts, I believe it uses the PC console BIOS calls, not knowing that the machine has a serial console (I set no options in pxebootia32.bin). If I let it continue, I see the lines indicating that it is loading NetBSD over NFS, but no output from the NetBSD kernel. When using GENERIC as the kernel, I then see NFS calls to load init, etc.
In pxebootia32.bin, I ran "consdev com0" to force the console to the serial port, rather than relying on the comBIOS emulation. Then "boot netbsd" proceeded normally, and I saw the kernel device probe lines. I was able to install, fetching sets over the same NFS mount I had used for the kernel. I configured the boot blocks to use a serial console (and thus pass this to the kernel), and the system then booted normally. It seems that the NetBSD kernel does not interact well with the comBIOS VGA-to-serial console emulation.
4 Miscellaneous
4.1 Console Speed
The comBIOS default console speed is 19200. However, NetBSD defaults to 9600, following ancient traditions. I decided to set the comBIOS to 9600 because that seemed easier than adjusting NetBSD.
set ConSpeed 9600
4.2 CF socket lock
The standard metal case has a plastic bushing and machine screw that must be removed to insert a CF, and it can be replaced to lock the CF in place.
4.3 Disk Mounting
On both mounts, the flanges point down, so the top of the disk is flush with the mount. Also, one of the wings near the cable end of the disk is shaped to allow access to the floppy/SATA power connector when the mounting bracket is installed.
4.3.1 PATA
The mini-PCI slot must be populated before installing the 2.5" PATA drive bracket, but this is obvious. Less obvious is which screwholes to use to mount the drive. The bracket has two holes at each end (per side of course). The wiki image appears to show the top of the drive flush with the mount, but I found that the cable did not reach. Using the holes nearer the PATA cable connector made the cable seem slightly too long but solidly connected.
To mount the bracket, you remove the screws holding the board to the case, and replace them with threaded standoffs. Then, the original screws mount the bracket to the standoffs. The board/standoff screws seem very similar to the disk/bracket screws but they are not exactly the same.
4.3.2 SATA
The SATA cables are long; this post shows how to fold them.
http://lists.soekris.com/pipermail/soekris-tech/2008-April/014272.html
5 NetBSD
This section is primarily concerned with NetBSD 5.1 (netbsd-5 branch, 2010-05ish).
5.1 driver support
glxsb0 at pci0 dev 1 function 2: RNG AES
hifn0 at pci0 dev 17 function 0: Hifn 7955, rev. 0
hifn0: 3DES/AES, 32KB dram, interrupting at irq 15
This table lists drivers in dmesg probe order. "not tested" is for things that almost certainly are fine but I haven't actually checked, and "?" is for things about which I am genuinely uncertain.
| driver | status |
|---|---|
| gcscpcib0 | wdogctl - works |
| glxsb0 | rndctl -l shows no bits |
| glxsb0 | opencrypto - works/buggy |
| vr0 | works |
| viaide/wd0 | works |
| gcscehci0 | works (probably) |
| uhci | not tested |
| com0 | works |
| com1 | not tested |
| hifn rng | rndctl -l shows bits |
| hifn aes | opencrypto - works/buggy |
| hifn rsa | ? |
5.2 crypto performance
Note that the sysctl variable "user.cryptodev" should be 1 to enable openssl to use coprocessors. See a tech-crypto thread about this issue. (Only one measurement is presented for each case, but values seem fairly stable from run to run.)
The net5501 results are from:
$ openssl speed -elapsed -evp aes-128-cbc
Values are in thousands of bytes per second processed, as printed by openssl speed (hence the "k" suffix).
| system | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes |
|---|---|---|---|---|---|
| net5501, no /dev/crypto present | 3574.50k | 3901.83k | 3950.56k | 8026.66k | 8126.68k |
| net5501, both glxsb and hifn drivers | 1108.07k | 4067.77k | 9876.57k | 20509.60k | 25928.63k |
| net5501, only hifn | 224.36k | 859.60k | 2752.89k | 7810.30k | 18182.97k |
| MacBook Pro, 2.2 GHz Core 2 Duo, Mac OS X 10.6.5 (output labeled "aes-128 cbc") | 108758.96k | 113031.65k | 112286.86k | 113983.60k | 114841.96k |
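To read the net5501 numbers as speedups, one can divide each hardware-assisted result by the software-only result at the same block size. The Python sketch below does that with the figures above; the helper names are made up for this example, and it assumes each result row ends with the five per-block-size rates in openssl's "k" (thousands of bytes per second) format.

```python
BLOCK_SIZES = [16, 64, 256, 1024, 8192]  # bytes per operation


def parse_speed_row(line):
    """Turn an 'openssl speed' result row into {block_size: 1000s of bytes/s}."""
    # The last five tokens are the per-block-size rates, e.g. '3574.50k'.
    rates = [float(tok.rstrip("k")) for tok in line.split()[-5:]]
    return dict(zip(BLOCK_SIZES, rates))


def speedups(hw, sw):
    """Ratio of hardware-assisted to software-only throughput per block size."""
    return {size: hw[size] / sw[size] for size in BLOCK_SIZES}


software = parse_speed_row("aes-128-cbc 3574.50k 3901.83k 3950.56k 8026.66k 8126.68k")
glxsb_and_hifn = parse_speed_row("aes-128-cbc 1108.07k 4067.77k 9876.57k 20509.60k 25928.63k")
hifn_only = parse_speed_row("aes-128-cbc 224.36k 859.60k 2752.89k 7810.30k 18182.97k")

for label, result in [("glxsb+hifn", glxsb_and_hifn), ("hifn only", hifn_only)]:
    ratios = speedups(result, software)
    print(label, " ".join(f"{size}B:{ratio:.2f}x" for size, ratio in ratios.items()))
```

With these figures, glxsb plus hifn comes out at about 0.31x of software speed for 16-byte blocks and about 3.19x for 8192-byte blocks, and hifn alone only beats software at 8192 bytes, so on this board the hardware path pays off only for larger blocks.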
6 Problems
6.1 Boot delay
There is an issue between the boot code and the net5501, described at http://gnats.netbsd.org/39726.
It is apparently fixed in -current but not in netbsd-5.
6.2 instability associated with rsync/ssh and USB disk
I hooked up a WD Elements 2 TB disk directly to the USB port. The disk has a single GPT partition containing a NetBSD UFS2 filesystem, which was mounted with WAPBL ("-o log"). I started rsyncing bits to it (via ssh, of course) and the machine locked up, but I didn't have anything connected to the console. I tried again, using "-o rump" to run the filesystem code in user space, and that resulted in:
Mar 6 17:17:32 foo /netbsd: umass0 at uhub1 port 1 configuration 1 interface 0
Mar 6 17:17:32 foo /netbsd: umass0: Western Digital Ext HDD 1021, rev 2.00/20.02, addr 2
Mar 6 17:17:32 foo /netbsd: umass0: using SCSI over Bulk-Only
Mar 6 17:17:32 foo /netbsd: scsibus0 at umass0: 2 targets, 1 lun per target
Mar 6 23:03:43 foo /netbsd: umass0: BBB reset failed, TIMEOUT
Mar 6 23:04:48 foo /netbsd: umass0: BBB bulk-in clear stall failed, TIMEOUT
Mar 6 23:05:53 foo /netbsd: umass0: BBB bulk-out clear stall failed, TIMEOUT
Then rumpffs dumped core. Since rumpffs runs the kernel's ffs code in user space, that points to a bug in the ffs code, but it seems likely that the bug was triggered by other misbehavior rather than the other way around. I was able to detach the drive via "drvctl -d umass0".
Later, with -o rump, but without -o log, and using remote syslog, I found an ohci error message:
Mar 13 12:13:22 172.16.32.1 /netbsd: vr0: rx packet lost
Mar 13 12:53:34 172.16.32.1 /netbsd: vr0: rx packet lost
Mar 13 12:53:34 172.16.32.1 /netbsd: ohci0: 1 scheduling overruns
Mar 13 13:57:43 172.16.32.1 /netbsd: vr0: rx packet lost
Mar 13 13:57:43 172.16.32.1 /netbsd: ohci0: 1 scheduling overruns
Mar 13 13:59:54 172.16.32.1 /netbsd: umass0: BBB reset failed, TIMEOUT
Mar 13 14:00:59 172.16.32.1 /netbsd: umass0: BBB bulk-in clear stall failed, TIMEOUT
Mar 13 14:02:04 172.16.32.1 /netbsd: umass0: BBB bulk-out clear stall failed, TIMEOUT
In this case, the machine was locked up solidly. This could be related to PR kern/18820.
Reading 50G of files from the disk, without using rsync/ssh, completed without problems.
Interactive ssh has been fine.
6.2.1 Details and Theories
Other people report no issues with USB on the net5501 with NetBSD (other than that it's slow). I wondered if the hifn chip was used for opencrypto, and if ssh used that. There is a surprising lack of instrumentation about this. After experimenting, I determined that openssl (and hence ssh, I believe) uses opencrypto by default, and that glxsb is preferred to hifn.
The following table collects stability observations. In the results column, "fast" and "slow" describe how soon a crash or lockup occurred, on a scale from about 15 minutes (fast) to several hours (slow). The crypto column refers to use of opencrypto: "none" means /dev/crypto was renamed to /dev/crypto.dont, and a driver name is the accelerator chip I believe was in use. Speed is the write rate in KB/s (all rsync transfers were of image files of 1 MB or more).
| rump | wapbl | rsync | crypto | results | speed (KB/s) | cpu |
|---|---|---|---|---|---|---|
| no | yes | yes | glxsb | slow | | |
| yes | yes | yes | glxsb | slow | | |
| yes | no | yes | glxsb | slow | | |
| yes | no | no | glxsb | stable | | |
| yes | no | yes | none | stable | | |
| no | yes | yes | hifn | fast | | |
| no | yes | yes | none | fast (tstile) | 1880 | (busy, mostly user) |
So, my current belief is that there is a bug in either opencrypto or in both drivers, that, at least in conjunction with USB disk usage, results in some serious problem. I also suspect wapbl when writing large amounts of data, and think it's likely there is more than one bug.
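The belief above can be restated as a rule, "a lockup happens when rsync/ssh traffic coincides with hardware opencrypto use or with wapbl", and checked mechanically against the seven runs in the table. The Python sketch below does that; the rule is a paraphrase of the two suspected bugs, and agreement with seven runs is consistent with it rather than proof of it.

```python
from collections import namedtuple

Run = namedtuple("Run", "rump wapbl rsync crypto result")

# The seven runs from the stability table, transcribed as booleans/strings.
RUNS = [
    Run(False, True, True, "glxsb", "slow"),
    Run(True, True, True, "glxsb", "slow"),
    Run(True, False, True, "glxsb", "slow"),
    Run(True, False, False, "glxsb", "stable"),
    Run(True, False, True, "none", "stable"),
    Run(False, True, True, "hifn", "fast"),
    Run(False, True, True, "none", "fast (tstile)"),
]


def predicts_lockup(run):
    """Suspected rule: rsync/ssh traffic plus hardware crypto, or plus wapbl."""
    return run.rsync and (run.crypto != "none" or run.wapbl)


for run in RUNS:
    observed_lockup = run.result != "stable"
    status = "agrees" if predicts_lockup(run) == observed_lockup else "CONTRADICTS"
    print(run, status)
```

Dropping the wapbl term (blaming opencrypto alone) makes the last run, with crypto "none" and a tstile lockup, contradict the rule, which is why more than one bug seems likely.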
The next step is to remove glxsb and hifn drivers from the kernel, to remove opencrypto hardware support entirely.
Then, I'll start with rump, no wapbl, and see if I can find a stable situation. Here, rsync is yes/no, where no means no significant network traffic, and mode is write for writing files, and read for reading them (with rsync -avc, or with tar to /dev/null).
| rump | wapbl | disk | rsync | mode | results | speed | cpu |
|---|---|---|---|---|---|---|---|
| yes | no | USB | no | read | ok/71G | 2233 | 27% |
| yes | no | USB | yes | write | ok/90G | 1200 | 34% |
| yes | yes | USB | yes | write | ok/150G | | |
| yes | yes | USB | yes | write | no - tstile | | |
Running with rump and wapbl, the VSZ of rumpffs seemed to grow to at least 100 MB more than its RSS. Perhaps there is a leak?
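One way to test the leak theory is to sample the process's VSZ and RSS at intervals during a steady workload and see whether the gap keeps growing. The Python sketch below assumes a ps(1) that accepts the POSIX "-o vsz=" form and reports both sizes in kilobytes, as NetBSD's ps does. The PID of the rumpffs process has to be looked up separately, and the sample count and interval are arbitrary choices.

```python
import subprocess
import time


def parse_vsz_rss(ps_output):
    """Return (vsz_kb, rss_kb) from ps output; tolerates a header line."""
    last = [line for line in ps_output.splitlines() if line.strip()][-1]
    vsz, rss = (int(field) for field in last.split())
    return vsz, rss


def sample_memory(pid, samples=10, interval=60.0):
    """Poll ps for one process; return a list of (vsz_kb, rss_kb, gap_kb).

    check=True makes this raise CalledProcessError if the process exits
    (ps returns a non-zero status when the PID is not found).
    """
    history = []
    for i in range(samples):
        out = subprocess.run(
            ["ps", "-o", "vsz=", "-o", "rss=", "-p", str(pid)],
            capture_output=True, text=True, check=True,
        ).stdout
        vsz, rss = parse_vsz_rss(out)
        history.append((vsz, rss, vsz - rss))
        if i + 1 < samples:
            time.sleep(interval)
    return history
```

A gap_kb value that climbs steadily over hours while the workload stays the same would support the leak theory; a gap that levels off would point more toward ordinary address-space reservation.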