<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Fun with ones and zeros - unix</title>
<description><![CDATA[Barry's notes on computer software and hardware]]></description>
<link>/blog/tags/unix</link>
<lastBuildDate>Tue, 05 May 2026 09:04:58 -0700</lastBuildDate>
<item>
<title>&quot;man&quot; annoyance</title>
<link>/blog/entries/man-annoyance</link>
<pubDate>Sat, 16 Feb 2008 17:12:09 -0800</pubDate>
<author>bp@barryp.org (Barry Pederson)</author>
<description><![CDATA[
<p>I work on several different Unix-type machines, some FreeBSD, some Linux, and there are always small differences between them that can be very annoying.
</p>
<p>One that's driven me crazy many times is how on some machines, when using <code>man</code> to read a manpage, after getting to the part I'm interested in, and hitting 'q' to quit - the contents of the manpage completely disappear and the screen is restored to what was shown before I ran <code>man</code>.  Other machines I work on don't do that - the contents of the manpage remain on the screen so you can see them as you type the next command.
</p>
<p>After finally getting fed up, I looked into this and it turns out the machines that were acting the way I like have the <code>PAGER</code> environment variable set to <code>more</code>, so <code>man</code> uses that instead of <code>less</code>, which has the screen-restore-on-quit behavior.  Adding:
</p>
<pre><code>PAGER=more; export PAGER
</code></pre><p>to <code>.bashrc</code> seems to have done the trick on my Ubuntu box.  Seems the FreeBSDs already had this by default.  <br />
</p>
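<p>A possible alternative, for anyone who'd rather keep <code>less</code>: its <code>-X</code> option tells it not to send the terminal init/deinit strings, which is typically what triggers the screen restore.  A rough sketch for <code>.bashrc</code>, where the fallback to <code>more</code> is just an assumption about what you'd want when <code>less</code> isn't installed:
</p>
<pre><code># Keep less as the pager but skip the screen save/restore (-X);
# fall back to more when less isn't installed.
if command -v less &gt;/dev/null 2&gt;&amp;1; then
    PAGER="less -X"
else
    PAGER=more
fi
export PAGER
</code></pre>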
<p>It's not a big deal, but I'm just jotting this down in case anyone else has the same annoyance.  Good luck finding this in Google though, if you're just using words like 'more', 'less',  and 'man' ;)
</p>


]]></description>
</item>
<item>
<title>More DHCP Failover</title>
<link>/blog/entries/more-dhcp-failover</link>
<pubDate>Tue, 08 Jan 2008 17:42:00 -0800</pubDate>
<author>bp@barryp.org (Barry Pederson)</author>
<description><![CDATA[<body><p>Earlier I <a href="/blog/entries/dhcp-failover">wrote about DHCP failover</a>, but there's another thing I thought I might mention that could be useful to others....</p>
<p>I had a problem in that one of my servers' CMOS clocks tends to be a bit off, maybe 90 seconds.  When <code>dhcpd</code> starts up, it is unable to enter a normal failover state because of the time difference between it and the other <code>dhcpd</code> server.</p>
<p>I have </p>
<pre><code>ntpdate_enable="YES"
ntpdate_flags="-b x.x.x.x"</code></pre>
<p>in my <code>/etc/rc.conf</code>, along with running <code>openntpd</code>,  but for some reason <code>ntpdate</code> wasn't setting the clock at boot time, and by the time <code>openntpd</code> got the clock tuned up, <code>dhcpd</code> had given up on trying to re-establish failover.  Restarting <code>dhcpd</code> by hand later on always worked OK.</p>
<p>I think what was happening was that the network jack this server was plugged into wasn't coming alive quickly enough to be up and running when <code>ntpdate</code> tried to do its thing.  Something to do with the Cisco switch not having <em>portfast</em> enabled.   </p>
<p>I don't have access to do anything about the switches, so I came up with the workaround of adding a simple script <code>/usr/local/etc/rc.d/000.afterboot.sh</code> to schedule a job to run a few minutes after the machine boots - to adjust the clock and restart <code>dhcpd</code>.  It looks something like:</p>
<div class="source"><pre><span></span><span class="ch">#!/bin/sh</span>
at now + <span class="m">5</span> minutes <span class="s">&lt;&lt;EOF</span>
<span class="s">/etc/rc.d/ntpdate restart</span>
<span class="s">/usr/local/etc/rc.d/isc-dhcpd restart</span>
<span class="s">EOF</span>
</pre></div>
<p>It's a bit of a kludge, but seems to do the trick.</p></body>]]></description>
</item>
<item>
<title>DHCP Failover</title>
<link>/blog/entries/dhcp-failover</link>
<pubDate>Tue, 01 Jan 2008 16:23:00 -0800</pubDate>
<author>bp@barryp.org (Barry Pederson)</author>
<description><![CDATA[<p>I've been setting up DHCP servers at work to use the failover feature available in ISC-DHCP (the <code>net/isc-dhcp3-server</code> port in FreeBSD).  That allows for two servers to work together, sharing a pool of addresses and keeping track of leases handed out by both servers.  The <code>dhcpd.conf(5)</code> manpage discusses this feature somewhat.  I'll jot down some notes here that are a bit more specific about what all had to be done.</p>
<p>Let's assume the two DHCP servers will have the IP addresses 10.0.0.10 and 10.0.0.20 - with .10 being the 'primary' and .20 being the 'secondary' server.  It doesn't really matter which is which - although the logs on the 'primary' server seem a bit more complete.  We'll also assume these servers are giving out addresses from a pool of numbers 10.0.0.100-10.0.0.200, and are running DNS caches - so that the DHCP clients should be told to use them for DNS servers.</p>
<p>We'll also use TCP port 520 for communications between the DHCP servers, so be sure to allow for that through any firewalls.</p>
<h3>Configuration</h3>
<p>On the 10.0.0.10 'primary' machine, the <code>/usr/local/etc/dhcpd.conf</code> file might look like:</p>
<pre><code>failover peer "foo" 
    {
    primary;
    mclt 1800;  # only specified in the primary
    split 128;  # only specified in the primary

    address 10.0.0.10;
    port 520;

    peer address 10.0.0.20;
    peer port 520;

    max-response-delay 30;
    max-unacked-updates 10;
    load balance max seconds 3;                
    }

option domain-name-servers 10.0.0.10, 10.0.0.20;

include "/usr/local/etc/dhcp/master.conf"; </code></pre>
<p>and the same file on the 'secondary' 10.0.0.20 machine is very similar:</p>
<pre><code>failover peer "foo" 
    {
    secondary;

    address 10.0.0.20;
    port 520;

    peer address 10.0.0.10;
    peer port 520;

    max-response-delay 30;
    max-unacked-updates 10;
    load balance max seconds 3;                
    }

option domain-name-servers 10.0.0.20, 10.0.0.10;

include "/usr/local/etc/dhcp/master.conf"; </code></pre>
<p>The failover peer name, &quot;foo&quot; in this example, will also appear in the DHCP pool configuration, and will be used in a script to change the failover state later on.</p>
<p>I created a directory <code>/usr/local/etc/dhcp/</code> to hold the DHCP config files that will be common to both DHCP servers.   That way, it's just a matter of copying the entire directory between servers when a change is made.  The <code>/usr/local/etc/dhcp/master.conf</code> file I included from the main server config might look something like:</p>
<pre><code>omapi-port 7911;

default-lease-time 16200;  # 4.5 hours
max-lease-time 16200;

subnet 10.0.0.0 netmask 255.255.255.0
        {
        option routers 10.0.0.1;

        pool
                {
                failover peer "foo";
                deny dynamic bootp clients;

                range 10.0.0.100  10.0.0.200;
                }
        }</code></pre>
<p>The <code>deny dynamic bootp clients;</code> directive is required for any failover pool.  The <code>omapi-port 7911;</code> directive will be useful later on, when a server needs to be put into the 'partner-down' state because the other server will be off for a while.  </p>
<p>To sync and restart the two servers whenever there's a change to the DHCP configuration, I set up the 10.0.0.20 server to allow root logins through SSH from the root account of 10.0.0.10 using public/private keys, and then put a script named <code>restart_dhcp</code> on the 10.0.0.10 server that looks like:</p>
<pre><code>#!/bin/sh
/usr/local/etc/rc.d/isc-dhcpd restart
scp -pr /usr/local/etc/dhcp 10.0.0.20:/usr/local/etc
ssh 10.0.0.20 /usr/local/etc/rc.d/isc-dhcpd restart</code></pre>
<p>That copies the entire <code>/usr/local/etc/dhcp</code> directory, so if you need to break up your config into more files that get included, they'll all be copied over when you do a restart.</p>
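<p>One possible refinement, sketched here under a few assumptions: <code>dhcpd -t</code> only parses the config and exits, so it can act as a syntax check before anything is restarted.  The <code>sync_dhcp</code> function and the <code>DHCPD_RC</code> and <code>PEER</code> variables are names made up for this example, and <code>dhcpd</code> is assumed to be in the <code>PATH</code>.</p>
<pre><code>#!/bin/sh
# Sketch only: DHCPD_RC and PEER are illustrative variables, not part of
# the original restart_dhcp script.
DHCPD_RC=${DHCPD_RC:-/usr/local/etc/rc.d/isc-dhcpd}
PEER=${PEER:-10.0.0.20}

sync_dhcp() {
    # -t parses the config file and exits without starting the daemon
    if ! dhcpd -t -cf /usr/local/etc/dhcpd.conf; then
        echo "config check failed, leaving both servers alone"
        return 1
    fi
    # Same three steps as restart_dhcp: restart here, copy, restart there
    "$DHCPD_RC" restart
    scp -pr /usr/local/etc/dhcp "$PEER":/usr/local/etc
    ssh "$PEER" "$DHCPD_RC" restart
}</code></pre>
<p>Calling <code>sync_dhcp</code> at the end of the script would run it; a failed check returns before either server is restarted.</p>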
<h3>Failover</h3>
<p>When one server stops unexpectedly, the remaining server will go into a <code>communications-interrupted</code> state, and continue offering up addresses from its half of the DHCP pools, and will renew leases it knows were given out by the other server.  </p>
<p>If the downed server will be out for longer than the <code>mclt</code> value from the server config (1800 seconds, or 30 minutes, in the examples above), you may want to let the surviving server know that it's on its own so that it can use the entire pool of available addresses.  This is done by putting the surviving server into <code>partner-down</code> state.  </p>
<p>This has to be done <em>after</em> the other server is really down.  Doing it before shutting down the other server doesn't work, because the two servers will get themselves back into a <code>normal</code> state very quickly, probably before you get a chance to shut the 2nd server down.</p>
<p>The <code>omshell</code> program can be used to communicate with a running DHCP server daemon to control it in various ways, including changing the failover state.  I put this <code>partner-down</code> script on both the primary and secondary servers:</p>
<pre><code>#!/bin/sh
omshell &lt;&lt; EOF
connect
new failover-state
set name = "foo"
open
set local-state = 1
update
EOF</code></pre>
<p>so when one server is going to be down for a while, I can connect to the other server and just run that script.  </p>
<p>When the downed server comes back up, the two servers automatically start communicating and eventually get themselves back into a <code>normal</code> state.  But only after the recovering server has spent <code>mclt</code> time in <code>recover-wait</code> state, where it renews existing leases but won't offer up new ones.  So you probably wouldn't want to go into a <code>partner-down</code> state if the other server will be down for less than that amount of time.</p>
<p>Running the <code>partner-down</code> script when both servers are really up and running doesn't seem to do any harm; as mentioned above, the two servers will quickly move back into a <code>normal</code> state.  This can be seen by watching the DHCP logs.</p>
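<p>The state can probably also be read back over OMAPI rather than from the logs.  A minimal sketch, assuming <code>omshell</code> prints an object's attributes after <code>open</code>; <code>failover_query</code> is a made-up helper name, and it leaves out the <code>set local-state</code> and <code>update</code> steps so nothing is changed:</p>
<pre><code>#!/bin/sh
# failover_query NAME: open the failover-state object for peer NAME
# without modifying it (hypothetical helper, read-only on purpose)
failover_query() {
    omshell &lt;&lt; EOF
connect
new failover-state
set name = "$1"
open
EOF
}</code></pre>
<p>Running <code>failover_query foo</code> should list attributes such as <code>local-state</code> and <code>partner-state</code> for the &quot;foo&quot; peer.</p>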
<h3>Clean Failover</h3>
<p>It's possible using OMAPI to shut down a server and have the remaining server automatically switch to &quot;partner-down&quot; mode in a clean way, so that when the downed server comes back up both servers quickly move to &quot;normal&quot; mode, without spending the <code>mclt</code> time in <code>recover-wait</code> state.    This script does the trick:</p>
<pre><code>#!/bin/sh
omshell &lt;&lt; EOF
connect
new control
open
set state = 2
update
EOF</code></pre>
<p>When run, it causes the dhcpd daemon on the current server to shut down, and the dhcpd daemon on the other server takes over the DHCP pools completely.</p>
<hr />
<p><strong>Update</strong>: I wrote a <a href="/blog/entries/more-dhcp-failover">bit more</a> about DHCP failover,  talking about how to deal with a clock sync problem when the machine boots by scheduling a <code>dhcpd</code> restart a few minutes after boot time.</p>]]></description>
</item>
<item>
<title>Viewing a man file</title>
<link>/blog/entries/view_manfile</link>
<pubDate>Sun, 05 Mar 2006 21:49:36 -0800</pubDate>
<author>bp@barryp.org (Barry Pederson)</author>
<description><![CDATA[
<p>
This is one of those little things that I just want to jot down for
myself so I have it written down until I learn it for good.  To view
a man file that's not installed in the regular man file locations, just
run
</p>
<blockquote><pre>
nroff -man <i>filename</i> | more
</pre></blockquote>
<p>
Stupidly simple, but unfortunately not mentioned in the manpage for man.
</p>
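<p>
If this comes up often, the pipeline could be wrapped in a small shell function.
A sketch, where <code>viewman</code> is a made-up name and <code>PAGER</code> is used when it's set:
</p>
<blockquote><pre>
# viewman FILE: format a man page source file and page through it
viewman() {
    nroff -man "$1" | ${PAGER:-more}
}
</pre></blockquote>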



]]></description>
</item>
<item>
<title>Restoring Boot Sectors in FreeBSD</title>
<link>/blog/entries/restore_boot</link>
<pubDate>Sat, 04 Feb 2006 16:34:53 -0800</pubDate>
<author>bp@barryp.org (Barry Pederson)</author>
<description><![CDATA[
<p>
At work the other day, we had a long power outage, and afterwards one of our FreeBSD 5.2.1
boxes refused to come back up.  It'd power up, go through the BIOS stuff, show the FreeBSD 
boot manager that lets you select which slice to boot, but when you hit F1, the screen would go
black and the machine would reset.
</p>
<p>
Booted off the 5.2.1 install CD, and after entering fixit mode, was able to mount the disk and see
that the files seemed to be intact.  Couldn't run <code>fsck</code> though, the 5.2.1 CD seemed to be missing 
<code>fsck_4.2bsd</code>.  
</p>
<p>
<a href="http://www.freesbie.org">FreeSBIE</a> 1.1 on the other hand, was able to <code>fsck</code> the 
disk, but that didn't solve the problem.  Next guess was that something in the <code>/boot</code>
directory was hosed.  I'd set up the machine to do weekly dumps of the root partition to another 
machine, and was able to extract <code>/boot</code> from a few days before and pull it back onto
this machine over the network using FreeSBIE, but it still wouldn't boot.
</p>
<p>
Next theory was that something in the boot sectors was bad.  First tried restoring the MBR (Master Boot Record)
from the copy that's kept in <code>/boot</code> - even though it was working well enough to show the <code>F1</code>
prompt to select the slice.  Wanted to keep what 5.2.1 had been using, so mounted the 
non-booting disk readonly and made sure to have <code>boot0cfg</code> use the copy there 
instead of anything
that might have been on the FreeSBIE disc.
</p>
<blockquote><pre>
mkdir /foo
mount -r /dev/twed0s1a /foo
boot0cfg -B -b /foo/boot/boot0 /dev/twed0
reboot
</pre></blockquote>
<p>
Unfortunately, that didn't help.  Each slice (partition in non-BSD terminology) also has boot sectors, and to 
restore them, turns out you use the <code>bsdlabel</code> (a.k.a. <code>disklabel</code>) utility.  Again from 
FreeSBIE:
</p>
<blockquote><pre>
mkdir /foo
mount -r /dev/twed0s1a /foo
bsdlabel -B -b /foo/boot/boot /dev/twed0s1
reboot
</pre></blockquote>
<p>
That did it.  Apparently something in the slice's boot sectors was messed up.  
</p>
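<p>
Since <code>boot0cfg -B</code> overwrites the MBR, it may be worth saving the old one first.  A sketch
using <code>dd</code> to copy the first 512-byte sector; <code>save_mbr</code> and the example output path are
made up here, and the output file should live somewhere that survives the reboot.
</p>
<blockquote><pre>
# save_mbr DEVICE OUTFILE: copy the first 512-byte sector of DEVICE to OUTFILE
save_mbr() {
    dd if="$1" of="$2" bs=512 count=1
}
# e.g. save_mbr /dev/twed0 /some/safe/place/twed0-mbr.bak
</pre></blockquote>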



]]></description>
</item>
</channel>
</rss>