RancherOS/ISOLinux/Syslinux on FreeBSD bhyve

After messing with Docker on my laptop, I thought it would be interesting to set up a VM on my FreeBSD server to run RancherOS. I've been using vm-bhyve to manage the VMs, and have been running Ubuntu without much problem, so I figured another Linux distro would be fine ... but that opened a whole can of worms, and while I did get it running eventually, I learned more about grub and booting on bhyve than I really wanted to. I thought I'd jot down some notes here for future reference.

To start with, bhyve is not a general hypervisor that can boot any PC-compatible disk or CD image you throw at it, the way something like KVM, VMware, or Parallels can. It doesn't start a VM in 16-bit mode and go through an old-school BIOS boot sequence where it reads a Master Boot Record and executes whatever's there. It knows how to load a FreeBSD kernel, and with grub2-bhyve it can boot disks and CDs that use Grub2 - such as Ubuntu.
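(As an aside, vm-bhyve is driving grub2-bhyve under the hood, and you can run it by hand too. A rough sketch - the ISO path, memory size, and VM name here are all made up:

echo "(cd0) /path/to/rancheros.iso" > device.map
grub-bhyve -m device.map -r cd0 -M 2048M rancher

The device.map file maps grub device names to files on the host, and -r picks which device to boot from.)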

Unfortunately, RancherOS doesn't use grub; instead it uses Syslinux/ISOLinux on its ISO images and hard disk installations. When bhyve boots using the grub loader, it doesn't find any grub menu on the disk, and just drops you into a grub command prompt.

GNU GRUB  version 2.00

Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists possible
device or file completions.

grub>

Fortunately, the grub command line is like a mini-OS, with lots of ways to look around the disks, and it turns out you can manually boot things like RancherOS.

The first command to run is:

set pager=1

so that future commands don't just scroll off the screen.

help

displays a list of commands, and help <command> gives a short explanation. ls lets you start poking around, in this case giving:

(cd0) (cd0,msdos1) (host)

Now we're getting somewhere. Trying ls (cd0) gives

Device cd0: Filesystem type iso9660 - Label `RancherOS' - Last modification time 2018-09-19 03:09:12 Wednesday, UUID 2018-09-19-03-09-12-00 - Total size 176128 sectors

ls -l (cd0)/ gives

DIR          20180919030909 boot/
DIR          20180919030912 rancheros/

OK, a boot directory, getting closer. ls -l (cd0)/boot gives

170          20180919030909 global.cfg
66978212     20180919030909 initrd-v1.4.1
DIR          20180919030909 isolinux/
1373         20180919030909 linux-current.cfg
12734        20180919030909 rancher.png
5523216      20180919030909 vmlinuz-4.14.67-rancher2

There we go: isolinux, but no grub files, so no wonder it doesn't boot. After lots and lots of messing around learning grub, I was able to get an initial boot of the CD image from the grub> prompt with:

linux (cd0)/boot/vmlinuz-4.14.67-rancher2
initrd (cd0)/boot/initrd-v1.4.1
boot

And it started! After lots of Linux boot output I was rewarded with:

                ,        , ______                 _                 _____ _____TM
   ,------------|'------'| | ___ \               | |               /  _  /  ___|
  / .           '-'    |-  | |_/ /__ _ _ __   ___| |__   ___ _ __  | | | \ '--.
  \/|             |    |   |    // _' | '_ \ / __| '_ \ / _ \ '__' | | | |'--. \
    |   .________.'----'   | |\ \ (_| | | | | (__| | | |  __/ |    | \_/ /\__/ /
    |   |        |   |     \_| \_\__,_|_| |_|\___|_| |_|\___|_|     \___/\____/
    \___/        \___/     Linux 4.14.67-rancher2

    RancherOS #1 SMP Thu Sep 13 15:37:04 UTC 2018 rancher ttyS0
    docker-sys: 172.18.42.1 eth0: 10.66.0.48 lo: 127.0.0.1
rancher login:

Very cool, but what's the login? The userid is rancher, but there is no default password. According to the Rancher docs, the ISO image is supposed to auto-login. Now what?

After rebooting and getting back to the grub> prompt, and digging around more, I found that cat (cd0)/boot/global.cfg showed:

APPEND rancher.autologin=tty1 rancher.autologin=ttyS0 rancher.autologin=ttyS1 rancher.autologin=ttyS1 console=tty1 console=ttyS0 console=ttyS1 printk.devkmsg=on panic=10

Ah, Linux command parameters, including autologin stuff. To apply them, it ended up being (again at the grub> prompt):

linux (cd0)/boot/vmlinuz-4.14.67-rancher2 rancher.autologin=tty1 rancher.autologin=ttyS0 rancher.autologin=ttyS1 rancher.autologin=ttyS1 console=tty1 console=ttyS0 console=ttyS1 printk.devkmsg=on panic=10
initrd (cd0)/boot/initrd-v1.4.1
boot

(That command line could probably be simplified: since we can see from the banner that our VM console is ttyS0, we probably don't need the params relating to tty1 or ttyS1.) This time I got the cattle banner from above, and a beautiful:

Autologin default
[rancher@rancher ~]$

A simple sudo -s (not requiring a password) gives root access. At that point you can do whatever you like, including installing onto a hard disk.
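Installing to disk from there is roughly this (the cloud-config file is one you write yourself, with at least an ssh_authorized_keys entry - check the RancherOS docs for specifics):

sudo ros install -c cloud-config.yml -d /dev/sda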

To get a RancherOS hard disk installation to boot, you'd have to go through similar steps with grub, exploring around the (hd0,1) disk to find the kernel, initrd, and kernel params. The grub commands for booting can be saved permanently in the vm-bhyve config for this machine with grub_runX lines like:

grub_run0="linux (hd0,1)/boot/vmlinuz-4.14.67-rancher2 rancher.autologin=ttyS0 printk.devkmsg=on rancher.state.dev=LABEL=RANCHER_STATE rancher.state.wait panic=10 console=tty0"
grub_run1="initrd (hd0,1)/boot/initrd-v1.4.1"
grub_run2="boot"

So the full vm-bhyve config file looks like this (in case you're wondering - I hate it when people give snippets of code but don't show exactly where they should go):

loader="grub"
grub_run0="linux (hd0,1)/boot/vmlinuz-4.14.67-rancher2 rancher.autologin=ttyS0 printk.devkmsg=on rancher.state.dev=LABEL=RANCHER_STATE rancher.state.wait panic=10 console=tty0"
grub_run1="initrd (hd0,1)/boot/initrd-v1.4.1"
grub_run2="boot"
cpu=2
memory=2048M
network0_type="virtio-net"
network0_switch="public"
disk0_type="virtio-blk"
disk0_name="disk0"
disk0_dev="sparse-zvol"
uuid="7f9fc9e5-c835-11e8-a327-e03f49af0c7d"
network0_mac="58:9c:fc:05:2a:04"

With that, my VM now boots without manual intervention, even though the virtual disk doesn't use grub.

Winbind failure due to incorrect time

I had the weirdest thing suddenly start happening last night, and it took several hours to finally figure out that it was a time-related issue.

I've got an Ubuntu box that uses pam_winbind to allow logging into the machine with an Active Directory account. Normally I connect with an SSH key, but once in, when doing sudo -s, I enter an AD password to become root. Last night that sudo -s suddenly stopped working.

Luckily I had another non-AD account that I could connect with, and sudo worked for that, so I could become root and poke around. The logs showed:

sudo: pam_unix(sudo:auth): authentication failure; logname=barry.pederson uid=14283 euid=0 tty=/dev/pts/0 ruser=barry.pederson rhost=  user=barry.pederson
sudo: pam_unix(sudo:auth): conversation failed
sudo: pam_unix(sudo:auth): auth could not identify password for [barry.pederson]

That was weird; I could log into other things that used the same AD account, so I knew the password was right and the account wasn't locked out.

I hoped by the next morning, some cache thing would expire and I'd be back in business, but no dice.

Poking around some more, I found that if I disabled my SSH keys I couldn't log in at all, so it was really a pam_winbind issue, not a sudo one. The logs for an SSH password login attempt were a bit more informative:

pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=xxx.yyy.zzz  user=barry.pederson
pam_winbind(sshd:auth): getting password (0x00000388)
pam_winbind(sshd:auth): pam_get_item returned a password
pam_winbind(sshd:auth): request wbcLogonUser failed: WBC_ERR_AUTH_ERROR, PAM error: PAM_AUTH_ERR (7), NTSTATUS: NT_STATUS_LOGON_FAILURE, Error message was: Logon failure
pam_winbind(sshd:auth): user 'barry.pederson' denied access (incorrect password or invalid membership)
Failed password for barry.pederson from x.x.x.x port 50655 ssh2

WTF? I know the password's right - I've been typing it all morning into other systems. I even tried wbinfo --authenticate barry.pederson on this box and it accepted my password.

Much time was spent Googling, trying various tweaks to smb.conf, etc. Finally - I don't remember why - I thought to check the date with ntpdate -d my.ad.server, and it came back with offset -338.308573 sec. Holy crap, that's more than 5 minutes, even though ntpd was running! That number is telling: Kerberos, which AD authentication typically rides on, only tolerates 5 minutes of clock skew by default.

Anyhow, once the clock was fixed to be closer to the AD server's, logins and sudo started working again.
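For the record, the fix itself was nothing fancy - roughly this, though the init script name varies between releases:

service ntp stop
ntpdate my.ad.server
service ntp start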

Make sure virtualization is enabled in the BIOS

I just wasted a fair amount of time on a RedHat 6.1 box being set up as a hypervisor with KVM, trying to figure out why, when I ran virsh version, it was telling me among other things:

internal error Cannot find suitable emulator for x86_64

All the appropriate packages such as qemu-kvm were installed, but it just didn't seem to want to work. Finally, as I was about to try reinstalling RHEL, I remoted into the actual console and saw:

kvm: disabled by bios

D'oh! Looking back in /var/log/messages, the same message was buried deep within all the boot noise. While trying to figure this out I had been searching the logs for virt or qemu, and somehow never searched for kvm. Enabled virtualization in the BIOS and everything's gravy now.

So there you go: if you're Googling that first error message and getting lots of other nonsense, look for the message about the BIOS.
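Two quick checks worth running up front: /proc/cpuinfo tells you whether the CPU supports virtualization at all, and the kernel log is where the BIOS complaint actually shows up:

egrep -c '(vmx|svm)' /proc/cpuinfo
dmesg | grep -i kvm

A zero from the first command means the CPU can't do it at all; the "disabled by bios" line from the second means it can, but the BIOS is sitting on it.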

Rebuilding Hyper-V Linux Integration Components for a kernel upgrade

At work I've been running RedHat Enterprise Linux (RHEL) 5.6 on top of a Windows Server 2008 R2 Hyper-V host, with Linux Integration Components (LinuxIC) installed.

That all worked fine until I did a yum update on RHEL to pick up a new kernel and tried to reboot. The new kernel panicked, saying it couldn't find my LVM volume groups. Fortunately, the old kernel was still on the menu and booted OK. Turns out the new kernel, for whatever reason, wouldn't see the virtual disks for that machine.

OK, so I needed to rebuild LinuxIC for the new kernel while running the old kernel - how to do that? The Makefile and various scripts that come with LinuxIC basically build for and install on the currently running kernel. Fortunately I came across this post showing a trick of replacing /bin/uname with a fake version that reports the kernel version number you want to build for. Tried that and was back in business.

I think this would work too, without messing with the original /bin/uname: create a directory somewhere, say /tmp/fake_uname, and stick this file in it with the name uname (changing the "echo" line to the installed kernel version number you want to build for):

#!/bin/sh
case $1 in
    -r)
        # report the kernel version we want to build for
        echo "2.6.18-238.9.1.el5"
        ;;
    *)
        # anything else gets passed through to the real uname
        exec /bin/uname "$@"
        ;;
esac

Then build and install your Linux IC with the /tmp/fake_uname prepended to your PATH as in

PATH=/tmp/fake_uname:$PATH make
PATH=/tmp/fake_uname:$PATH make install

When the LinuxIC build calls uname it finds the fake version first; if the argument is -r it shows your desired version number, otherwise it falls back to the real uname.
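You can sanity-check the shim before kicking off the build (remembering to make it executable):

chmod +x /tmp/fake_uname/uname
PATH=/tmp/fake_uname:$PATH uname -r

which should print 2.6.18-238.9.1.el5.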

KVM Networking

Still playing with KVM (Kernel-based Virtual Machine), this time checking out some networking features. I've been running Ubuntu 8.04 LTS Server (Hardy Heron), both as the host and as a VM on that host. Networking is set up to use a bridge.

KVM offers different emulated NICs. I took a quick look at running iperf between the VM and the host, and got these speeds for a few select NIC models (there's a sketch after the list of how a run like this is set up):

  • RTL-8139C+ (the default): ~210 Mb/sec
  • Intel e1000 (somewhat recommended here): ~330 Mb/sec
  • virtio: ~700 Mb/sec
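For reference, a rough sketch of one such run (Hardy-era kvm syntax; the disk image name and host address are made up). On the host, boot the VM with a given NIC model and start an iperf server:

kvm -m 512 -hda test.img -net nic,model=virtio -net tap
iperf -s

then inside the VM, push traffic at the host:

iperf -c 10.66.0.1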

The thing about virtio, though, is that it doesn't work when the VM's RAM is set to 4GB. So I guess you can have fast networking or lots of memory, but not both.

Playing with KVM and LVM on Linux

I'm still experimenting with Ubuntu 8.04 Server (Hardy Heron), and have switched from Xen to KVM (Kernel-based Virtual Machine). Xen worked well on a little test machine I had, but when I tried it on a brand-new Supermicro server, it turned out to have a problem with the Intel NIC. Since it seems Ubuntu is recommending KVM over Xen, and the server supports hardware virtualization, I figured I'd give it a try.

One big difference is that KVM does full emulation, which means any disk space you give it from LVM (Logical Volume Manager) will be a full virtual disk, with a partition table. It's a little more complicated to access filesystems within the virtual disk than it was with Xen, so I wanted to jot down some notes here, mostly for myself, on how to do that.

If I've created a logical volume named /dev/myvg/test_vm and installed another Linux on it with a single ext3 filesystem (/dev/sda1 from the point of view of the VM) and some swap space (/dev/sda5), it can be accessed when the VM isn't running with the help of the kpartx utility...

kpartx -av /dev/myvg/test_vm

would read the partition table on the virtual disk and create:

/dev/mapper/myvg-test_vm1 
/dev/mapper/myvg-test_vm2 
/dev/mapper/myvg-test_vm5

Then you can

mount /dev/mapper/myvg-test_vm1 /mnt

to mess with the VM's /dev/sda1. To clean things up when finished, run:

umount /mnt
kpartx -d /dev/myvg/test_vm

Snapshots

If you want to look at the contents of a running VM's disks (perhaps for backing it up) you can use LVM snapshots. For example:

lvcreate --snapshot --size 1G --name test_snap /dev/myvg/test_vm
kpartx -av /dev/myvg/test_snap
mount /dev/mapper/myvg-test_snap1 /mnt
   .
   (play with VM's /dev/sda1 in /mnt)
   .
umount /mnt
kpartx -dv /dev/myvg/test_snap
lvremove /dev/myvg/test_snap

Ubuntu server locale errors

This is mostly a note to myself... After setting up a minimal Ubuntu server install (in Xen), following these instructions using debootstrap, I saw lots of errors like this:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

Checking with locale -a would show

C
POSIX

While a full Ubuntu server install (off a CD) would show:

C
en_US.utf8
POSIX

This command seems to have generated the missing locale and made everybody happy:

localedef --no-archive -i en_US -c -f UTF-8 en_US.UTF-8
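On Ubuntu, the more usual way to do the same thing is with locale-gen (assuming the locales package is installed):

locale-gen en_US.UTF-8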

Xen and UFW on Ubuntu

I've been experimenting with setting up Ubuntu Server 8.04 (Hardy Heron) to run Xen, and had a minor problem with UFW (Uncomplicated Firewall) running in the dom0 blocking network access to a domU running in bridged mode. It seems the fix is just to edit /etc/default/ufw and make this change to enable forwarding:

--- a/default/ufw       Thu Oct 23 10:00:33 2008 -0500
+++ b/default/ufw       Thu Oct 23 10:34:36 2008 -0500
@@ -16,7 +16,7 @@ DEFAULT_OUTPUT_POLICY="ACCEPT"

 # set the default forward policy to ACCEPT or DROP.  Please note that if you
 # change this you will most likely want to adjust your rules
-DEFAULT_FORWARD_POLICY="DROP"
+DEFAULT_FORWARD_POLICY="ACCEPT"

 #
 # IPT backend

and then run ufw disable; ufw enable.
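If you want to double-check that the change took, UFW applies that variable as the policy on the built-in FORWARD chain:

iptables -L FORWARD -n | head -n1

which should show "Chain FORWARD (policy ACCEPT)".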

I believe dom0 is now protected, and it'll be up to the domU to protect itself. I can't say I'm entirely comfortable with Linux iptables; I sure wish PF were available as an alternative.