strange badblocks problem

marnixk lfs at mkaart.net
Wed Jan 14 23:26:45 PST 2004


Bill's LFS Login wrote:

> This is *correct* operation. Badblocks and the dd as you should be using
> it *will* read the whole partition (or regular file if that was
> specified). Dd can be limited by adding "count=<some number>". If you
> don't do this, its job is to read until end of file (which in this case
> is the whole partition).

Where does dd get its information about the size of the partition from? The
kernel or the FS? I ask because I have discovered something strange: my
home system (also Gentoo based) shows exactly the same behaviour. That more
or less rules out hardware problems, I guess, and makes me suspect the
Gentoo LiveCD that I used on both machines even more. My theory is that
this only happens on partitions that have at some point been mounted while
booted from this CD. I will go to work later and boot from a Knoppix CD or
something similar, create partitions and an FS, mount, unmount, reboot,
mount, and so on. Then I will reboot with some other boot disk, running
badblocks and dd between all steps. If no errors occur, I will reboot with
the LiveCD, mount/unmount, and check whether the problem reappears and
persists.
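
On the question of where dd gets the partition size: as far as I understand
it, dd does not look the size up anywhere itself. It just keeps calling
read() until the kernel reports end of file, and for a block device the
kernel takes that end from the partition size, not from the file system on
it. A minimal Python sketch that asks the kernel the same question
(readable_size is a made-up name for illustration):

```python
import os


def readable_size(path):
    """Return the byte offset at which a sequential reader would hit EOF."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end asks the kernel where the file or device stops.
        # No super-block or other file system metadata is consulted here.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
```

Calling readable_size("/dev/hda9") as root (device name taken from this
thread) should give the size the kernel believes the partition has, which
can then be compared with the block count the file system reports.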

> I have no theory on why that is happening. It would sound as if the
> update of the meta-data got some bad info from somewhere. When you do an
> unmount, certain info about the file system is updated.
> 
> Did you write down the alternate super-blocks when you made the file
> system? If so, we can specify one of the alternates and run the fsck.
> Normally this alternate will not have been updated and should have the
> info as originally transcribed when the FS was created. This could be
> compared with the default super-block that was modified at unmount time.

I tried dumpe2fs to dump the super-block, but I am not sure how to dump the
backup super-blocks (which are located at 32768, 98304, ...). However, when
I run e2fsck -f /dev/hda9 it says all is OK, but when I run e2fsck -b 98304
it says FILE SYSTEM WAS MODIFIED, even if I do this repeatedly. Both runs
report the same line at the end:

/dev/hda6: 11/141696 files (0.0% non-contiguous), 12663/283137 blocks

So I suspect there indeed *is* a difference between those two super-blocks,
but I do not know how to "dump" the backup blocks to compare them to the
first one.
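
One way to compare them by hand is to read the raw super-blocks and diff a
few fields. The sketch below is only that, a sketch: it assumes the usual
ext2 on-disk layout (primary super-block at byte offset 1024, little-endian
fields, magic 0xEF53 at offset 56) and a 4096-byte block size, which would
match backups at 32768 and 98304. The function names and the selection of
fields are my own, not part of any tool.

```python
import struct

EXT2_MAGIC = 0xEF53
SB_SIZE = 1024

# (name, byte offset inside the super-block, little-endian struct format)
FIELDS = [
    ("inodes_count", 0, "<I"),
    ("blocks_count", 4, "<I"),
    ("free_blocks_count", 12, "<I"),
    ("free_inodes_count", 16, "<I"),
    ("log_block_size", 24, "<I"),
    ("blocks_per_group", 32, "<I"),
    ("mtime", 44, "<I"),
    ("wtime", 48, "<I"),
    ("mnt_count", 52, "<H"),
    ("magic", 56, "<H"),
    ("state", 58, "<H"),
]


def read_superblock(path, block=None, block_size=4096):
    """Read selected fields of the primary (block=None) or a backup super-block."""
    # Assumption: a backup sits at the very start of the given block.
    offset = 1024 if block is None else block * block_size
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(SB_SIZE)
    if len(raw) < SB_SIZE:
        raise ValueError("short read at offset %d" % offset)
    sb = {name: struct.unpack_from(fmt, raw, off)[0] for name, off, fmt in FIELDS}
    if sb["magic"] != EXT2_MAGIC:
        raise ValueError("no ext2 magic at offset %d" % offset)
    return sb


def diff_superblocks(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}
```

For example, diff_superblocks(read_superblock("/dev/hda9"),
read_superblock("/dev/hda9", block=98304)) would list the fields that
differ. Some differences (free counts, write time, mount count) may be
normal, since backups are not necessarily updated on every unmount.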
 
> Then I *have* to ask this (don't take offense, I feel we are at the
> stage of "what have we overlooked here"). Are you *sure* that the FS is
> *not* mounted when you are running the fsck? An FS can be mounted
> multiple times at the same time. E.g. two mount statements in
> /etc/fstab.

I am very sure that the FS is not mounted when running the fsck, but it is
not fsck-ing that gives me problems. Filesystems are reported as being
clean, although on my home system, every time it was time to check the
root FS, it said FILE SYSTEM WAS MODIFIED and something about a reboot
being needed. So maybe this has something to do with it as well...
 
> Cat /proc/mounts to be sure that the partition is not referenced twice.
> Be sure it is not mounted under another directory with a "mount --bind"
> (this is harder to see because the secondary mounts don't show the
> partition). Here's an example of things mounted with bind (from "mount").
> 
> /dev/sr0 on /mnt/SourcesCD type ext2 (ro)
> /mnt/workspac/New650MbFs on /mnt/Sources2 type ext2 (rw,loop=/dev/loop0)
> /mnt/SourcesCD on /mnt/archives/Lfs_Sources/Sources1 type none (rw,bind)
> /mnt/SourcesCD on /mnt/archives/Blfs_Sources/Sources1 type none (rw,bind)
> 
> Note the secondary mounts show in place of the device the path of the
> mounted device instead.
> 
> Here's the same from cat /proc/mounts
> 
> /dev/sr0 /mnt/SourcesCD ext2 ro 0 0
> /dev/loop0 /mnt/Sources2 ext2 rw 0 0
> /dev/sr0 /mnt/archives/Lfs_Sources/Sources1 ext2 ro 0 0
> /dev/sr0 /mnt/archives/Blfs_Sources/Sources1 ext2 ro 0 0
> 
> Note that here it does show the base device for the secondary mounts.

Checked all this; not mounted.
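
For anyone repeating the check, here is a small Python sketch that lists
every mount point for a given device in /proc/mounts-style text. It only
relies on the layout shown in the quoted example (device in the first
column, mount point in the second); mounts_of and read_proc_mounts are
names I made up.

```python
def mounts_of(device, mounts_text):
    """Return every mount point whose first field is exactly `device`."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # First field is the device, second is the mount point.
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


def read_proc_mounts(path="/proc/mounts"):
    """Return the kernel's current mount table as text."""
    with open(path) as f:
        return f.read()
```

mounts_of("/dev/hda9", read_proc_mounts()) returning an empty list is what
"not mounted" should look like, bind mounts included, assuming the kernel
lists the partition under that same device name.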

> Ugh! This puts a crimp on what I was hoping might be the problem. The
> further we go, the less hope I have that I can ask the right questions
> to help locate the problem. This because of the "remote" nature, I can't
> get the normal "visual" clues that spark a thought, etc.

I will try the approach that I have described above and try to find out if
the mounting with the gentoo CD (it has a 2.4.21 kernel btw) writes
something bad somewhere. And believe me I would not have had this idea
without your questions and advice so far!

> Right file system type specified everywhere? Ext2 or reiserfs?

yup, pretty sure
 
> I hope that's it. Do you have enough of an LFS system to test it?

See above. Actually my LFS systems are my firewall and mailserver, so I
really do not want to test anything on those machines...

> That's making me *guess* that the FS is still mounted somewhere because
> I think an in-core copy of the super-block (and other meta-data) is
> being used, based on your earlier description that it goes away when you
> reboot.

So do dd and badblocks get their info from the super-block or from this
other meta-data? Is there some way to dump this meta-data when I don't have
the problem and compare it with the case when I do have the problem?

> This is especially pernicious if, as you suspect, it is something wrong
> in the OS that is being run.

<snip>

> Sounds more and more like something flaky in the OS if it's not related
> to dupe mounts.

I hope to find out soon (see above).

> 
> Have you checked to see that all the cables are in good condition and
> *well* seated? I know sometimes intermittent hardware problems (like a
> cable *almost* making solid contact) can cause erratic results.
> Temperature changes of just a few degrees can cause it to appear and
> disappear.
> Shoot. I'm beginning to feel like I've been no help. I would first confirm
> the cabling (data and power) are all good. Look for any nicks in the
> data cables too (a break can be almost invisible).
> 
> If you can carry the drive to a known-good hardware and OS and test
> there, that would be a big plus.

Although I do not suspect hardware problems anymore, based on the above I
will check all this and let you know. And again, your help has been most
welcome!
 
> BTW, what's the size of the drive? Any chance (part of) hda9 is beyond
> some upper limit of the OS or bios or something (I don't think BIOS
> normally affects this, but...).

The drive at work is about 30 GB and the one at home is about 40 GB. Both
are Maxtor, by the way.

Ok, again many thanks, will report back to you later!

Marnix


