DragonFly kernel List (threaded) for 2004-01
Re: Background fsck
:On Mon, 19 Jan 2004 08:45:30 -0800 (PST)
:Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx> wrote:
:
:> I really dislike the concept of a background fsck. I don't trust
:> it.
:
:On Mon, 19 Jan 2004 09:28:30 -0800 (PST)
:Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx> wrote:
:
:> I really dislike a R/W mount on anything dirty.
:
:Matt, I can appreciate that you feel a certain way, but, but, but,
:you're not saying *why* and it's driving me bonkers. :)
:
:-Chris
The problem is that while it is possible to make softupdates
algorithmically robust with regard to recovery, softupdates itself
is such a complex beast that bugs have been, and will continue to be,
found which create unexpected corruption on the disk during a failure.
In addition, it is very difficult to tell when and in what order
data actually winds up on a disk, even with the dependency information
available. For example, let's say you are doing an atomic 64KB
write transaction to a hard drive and a power failure occurs right smack
in the middle of the transaction. You might think that you would be able
to assume that a sequential portion of your 64KB block might have made
it to the disk, but in reality it is possible for *RANDOM* portions
of that 64KB block to wind up on the disk because:
* Most modern HD's do whole-track writes now, and may start writing
the track at any relative sector. So it could start writing the
track in the middle of the 64K block, getting the last half of it
to disk before getting the first half of it to disk.
* If part of your block happens to reside in a spare sector (all
modern disks keep a number of spare sectors on each physical track
to handle media errors), then the actual update on the disk could
be entirely random.
* Modern HD's number sectors either backwards or forwards on the
actual media. Most do it backwards now, so you can't make
assumptions in regards to which portion of your larger write
might have gotten to the disk before other portions of your
larger write.
So what does this all mean? This means that if a power failure occurs
right smack in the middle of a disk I/O, all of softupdate's careful
block ordering could be for naught, which means that unexpected
corruption can creep in no matter what you do.
At least with a journal you can (A) replay the log starting much farther
back than you really need to start in order to add slop and (B) you
can serialize (add serial numbers to) individual log blocks to detect
hardware-related out-of-order failure conditions on reboot. If you are
missing part of your log that's where you stop the replay. And, (C),
since meta-data log flushes do not have to occur at a high rate you
can afford to force the HD to physically flush its caches between the
journal meta-data write and the random meta-data writes.
-Matt
Matthew Dillon
<dillon@xxxxxxxxxxxxx>