My Nightmare weekend with LFS

Andy Blower andy.blower at sopheon.com
Mon Oct 1 00:49:42 PDT 2001


Hi,

This is going to be a long post attempting to describe, from memory, what
has gone wrong with my PC this weekend. It is probably above the call of
duty to read this - and truly heroic to reply... ;-) But I thought it was
worth a shot anyway. Before I begin, please note there are 2 problems:

1) Why the disk (referred to as 2nd disk below) has problems keeping a
filesystem on one partition after rebooting. This is not urgent any more -
just curious for opinions/theories etc.

2) Why I can't get my dual boot setup to work using 3rd disk in place of the
2nd disk.


History:

Redhat installed 12 months ago on 2nd disk in system, system looked like:

hda1 = Boot manager (partition magic) pointing to Win95, DOS and also hdb2
(Linux boot partition).
hda(various) = Windows partitions
hdb1 = FAT partition
hdb2 = Linux boot partition created by RH with lilo installed here
hdb4 = extended partition
hdb6 = Redhat linux (logical drive)
hdb7 = Linux swap

This worked for ages, although I did not get along with Redhat. So I
installed LFS 3.0 pre4 about 2.5 months ago, had a few problems but got it
working on a 3rd drive (hdc1) using the same boot partition as RH (hdb2).
About this time, the hdb1 FAT partition completely screwed up, losing all my
data on there. I recreated it loads of times, restoring the data (well, what I
had backed up anyway), but each time it would be knackered after a couple of
reboots. I thought it must be something to do with the partition I had
created just above it (hdb5), on which I started the LFS creation before
moving to a third drive when I ran outta space (trust me to ignore space
requirements).

Again this dual boot worked for ages, but with 2 areas of the disk unused:
one after hdb1 and hdb2, before the start of the extended partition, and one
after the start of the extended partition, before the RH Linux partition.


This Weekend:

Well, decided that I was gonna install some software downloaded at work on
my LFS this weekend and was looking forward to it. Booted up LFS and created
a partition to use up the remaining space on hdb before the extended
partition (where the FAT partition used to be), formatted it as ext2 and
rebooted. fsck threw an error on /dev/hdb2 - my boot partition and I had to
enter root pw to do maintenance. I ran e2fsck and corrected the loads of
errors it found. When I mounted hdb2 some of the files had gone. I saved
what I had and dropped the partition. Rebooted using tomsrtbt and chrooted
into LFS (must get around to making a proper boot disk someday..) re-created
the partition and the filesystem on /dev/hdb2 and copied all the files
required back. Re-ran lilo to install on /dev/hdb2 (the boot manager on hda
kicks me to this partition for Linux and then I select the flavour from lilo
menu.)

Booted okay but after a reboot it was knackered again. Spookily like what
happened with the FAT partition months ago!!

Anyway I went round this loop loads (about 40 - no joke) of times, using my
redhat installation and every combination I could think of. At one point I
thought it was the versions of E2fsprogs I was using (1.21) as I was able to
fix problems using the redhat versions whereas the lfs versions just killed
it. Even after upgrading to 1.25 the same thing was happening. Eventually I
decided to completely wipe the first half of the disk (all but the redhat
partition) with zeros. Created a large partition with no fs and typed "dd
if=/dev/zero of=/dev/hdb1" and also "dd if=/dev/zero of=/dev/hdb bs=512
count=1" to clear the MBR. At this point I was thinking the problem was
either a virus (didn't know they could live on a disk without a filesystem)
or just a disk problem that may be solved if I blank it... (if this seems
really dumb - please don't laugh too hard, we are at hour 15 and counting
here... not particularly funny for me ;)
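
For reference, a minimal sketch of what that second dd command wipes,
assuming the classic PC MBR layout (446 bytes of boot code, four 16-byte
partition entries, then the 0x55AA signature). The function name parse_mbr
and the sample partition values are made up for illustration, and it works on
a 512-byte buffer in memory, not on a real disk.

```python
import struct

SECTOR_SIZE = 512    # one sector, same as bs=512 count=1
TABLE_OFFSET = 446   # partition table starts after the boot code
ENTRY_SIZE = 16      # four entries of 16 bytes each
ENTRY_FORMAT = "<B3sB3sII"  # status, CHS start, type, CHS end, LBA start, sector count


def parse_mbr(sector):
    """Return (has_signature, entries) for a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly 512 bytes")
    has_signature = sector[510:512] == b"\x55\xaa"
    entries = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        status, _chs_start, ptype, _chs_end, lba_start, count = struct.unpack(
            ENTRY_FORMAT, raw
        )
        if ptype != 0:  # type 0 means the slot is unused
            entries.append(
                {
                    "slot": i + 1,
                    "bootable": status == 0x80,
                    "type": ptype,
                    "lba_start": lba_start,
                    "sectors": count,
                }
            )
    return has_signature, entries


# Hypothetical sector with one bootable Linux (type 0x83) partition.
sample = bytearray(SECTOR_SIZE)
sample[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = struct.pack(
    ENTRY_FORMAT, 0x80, b"\x00\x00\x00", 0x83, b"\x00\x00\x00", 63, 73647
)
sample[510:512] = b"\x55\xaa"

print(parse_mbr(bytes(sample)))       # one entry, signature present
print(parse_mbr(bytes(SECTOR_SIZE)))  # all zeros: no signature, no entries
```

The point being that bs=512 count=1 clears the primary partition table along
with the boot code, so the entries have to be recreated with fdisk afterwards.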

So, recreated /dev/hdb2, copied files and reran lilo... what do you know,
now my redhat partition is screwed after a reboot in exactly the same manner
and fsck doesn't fix it, it just f**ks it up more!! Now, it worked fine
initially after creating the partition table entry for it using fdisk after
blanking the MBR (and yes, I'm sure that the partition I created and wrote
zeros to finished a long way before this partition started) and this was
what I was worried about. It just then developed this strange 'degrade after
reboot' problem just like other partitions before. Also note that this only
happens with this disk drive.

At this point I am pulling my hair out, so I get some sleep. Sunday I decide
to re-partition the whole problem 2nd disk and see if I can get it working,
as I suspect a HD problem, but I always imagined that would be a bit more
obvious than this losing integrity after a reboot (not power off) on a single
partition... I definitely think I would have just junked the drive at this
point had I been able to change the boot manager pointer to a different
location than /dev/hdb2, but unfortunately Partition magic doesn't work on
hda since I re-did the partitions with Linux fdisk... :-(  I also can't
install lilo on here - that is a risk I cannot take since there is a lot of
un-backed up windows stuff here. So, after completely re-partitioning and
formatting the disk into Linux partitions and restoring the boot files and
lilo to /dev/hdb2 - guess what - same thing happens to one of the partitions
(hdb5 I think - the first one in the extended partition, but it seems pretty
random).

Okay then, only thing left is to junk this drive (it's got nothing of use
left on it now anyway)... So I swap in the 3rd drive, so the LFS partition on
it (was hdc1) becomes hdb1. So, just need to create /dev/hdb2. Unfortunately
the hdb1 LFS partition took up half the disk, and the boot partition needs
to be below some limit I remember reading about during the redhat install..
so I resized and moved the LFS partition up by 36MB and created hdb2 at the
start of the disk - installed files and lilo. This tested that the resize
and move (done by Partition Magic) had worked okay as I was chrooted into
the LFS partition to create the boot partition. Unfortunately I couldn't
boot - I get a Kernel Panic.. not good.
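
The limit in question is most likely the BIOS 1024-cylinder limit that older
lilo setups are subject to (that is an assumption, the post doesn't name it).
A rough sketch of the arithmetic, assuming 512-byte sectors and the common
translated geometries of 16 or 255 heads with 63 sectors per track. The
function names are hypothetical.

```python
def chs_limit_bytes(heads, sectors_per_track, cylinders=1024, sector_size=512):
    """Bytes reachable through BIOS CHS addressing for the given geometry."""
    return cylinders * heads * sectors_per_track * sector_size


def to_mib(n_bytes):
    """Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


# Assumed geometries: 16 heads (no translation) and 255 heads (LBA translation).
for heads in (16, 255):
    limit = chs_limit_bytes(heads, 63)
    print(f"{heads} heads: {limit} bytes = {to_mib(limit):.1f} MiB")
```

Under those assumptions a boot partition at the very start of the disk, like
the new hdb2, sits well inside the limit whichever geometry the BIOS reports.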

So, in case the LFS partition has been mangled I try the following: take the
1st (windows) disk out and make the 3rd disk (currently hdb) into hda, use the
tomsrtbt to chroot into LFS (now hda1) and edit lilo.conf and fstab to
reflect the latest change of device to hda, copy required files to /boot and
write lilo to the MBR. Reboot and voila... LFS is back. Unfortunately
despite all my efforts I cannot get the disk to boot when it's hdb and
jumping into the hdb2 partition from the boot manager on the 1st disk. Not
much of a multi-boot system now...

So, any ideas? Feel free to comment on why you think that I had such strange
behaviour from the disk I eventually removed. Also feel free to point out my
stupidity.. I'm sure I must have messed up somewhere ;-)

Finally, this is the error I get on attempting to boot LFS from boot manager
-> hdb2:

.... stuff ... normal .... stuff ...
[MS-DOS FS Rel 12, FAT 16,check=n,conv=b,uid=0,gid=0,umask=000]
[me=0xf8,cs=4,#f=2,fs=1,fl=12,ds=25,de=512,data=57,se=32,ts=0,ls=512,rc=0,fc=4294967295]
Transaction block size = 512
Kernel panic: VFS: Unable to mount root fs on 03:01


No idea why it says FAT16 or what 03:01 means. I also have tried using the
3rd disk (with LFS on) as hda and using lilo to jump to windows on 1st disk
(as hdb), but it won't boot into windows when this 1st disk is hdb - even if
I set the bootable flag on the windows partition, it
just says "Non-System disk".
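
A note on the 03:01 in the panic message: it looks like the root device
written as major:minor device numbers in hex. Below is a small sketch of the
decoding, assuming the usual Linux IDE numbering (major 3 for hda/hdb, major
22 for hdc/hdd, 64 minor numbers per disk). The function name is just for
illustration.

```python
# Assumed Linux IDE numbering: each major covers a master/slave pair of disks.
IDE_MAJORS = {3: ("hda", "hdb"), 22: ("hdc", "hdd")}
MINORS_PER_DISK = 64


def decode_ide_device(code):
    """Turn a hex 'major:minor' pair such as '03:01' into a /dev name."""
    major_text, minor_text = code.split(":")
    major = int(major_text, 16)
    minor = int(minor_text, 16)
    if major not in IDE_MAJORS or minor >= 2 * MINORS_PER_DISK:
        raise ValueError(f"not an IDE device number handled here: {code}")
    disk = IDE_MAJORS[major][minor // MINORS_PER_DISK]
    partition = minor % MINORS_PER_DISK
    # Minor 0 (or 64) is the whole disk; anything else is a partition number.
    return f"/dev/{disk}{partition}" if partition else f"/dev/{disk}"


for code in ("03:01", "03:41", "03:42", "16:01"):
    print(code, "->", decode_ide_device(code))
```

Read that way, 03:01 is /dev/hda1, so the kernel is looking for its root on
the first partition of the first disk, whereas hdb1 would show up as 03:41.
If hda1 happens to be a FAT partition, that might also account for the FAT16
line just before the panic.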

Hope everyone had a better weekend than me.

Thanks,

Andy.
-- 
Unsubscribe: send email to listar at linuxfromscratch.org
and put 'unsubscribe blfs-support' in the subject header of the message


