DragonFly kernel List (threaded) for 2009-07
DragonFly BSD
DragonFly kernel List (threaded) for 2009-07
[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]

Re: CRC FAILED: LAYER2


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 7 Jul 2009 10:38:04 -0700 (PDT)

:Hi.
:I've just caught this panic this morning.  It's an Athlon 64 X2 running
:an SMP kernel built from source as of July 5th (a few commits after the
:fixes to fdfree()).  It's a 1Tbyte HAMMER-only filesystem and I accidentally
:run `hammer version-upgrade' command on it a several days ago, in case
:it matters.  The machine is turned on at 7:00 am every day, runs cvsync
:to retrieve a few CVS repositories, and convert them to git repos.
:According to today's log something went wrong during the second invocation
:of cvsync.  The script then proceeded to converting the first CVS repository,
:when it caught this panic.  The kernel and vmcore are uploaded at my leaf
:account as ~y0netan1/crash/{kernel,vmcore}.4 .
:
:Thanks in advance.

    The crash dump looks a bit odd.  There seems to be some corruption
    in the layer2 structure.  The unused01/02 fields should both be 0.

(kgdb) print layer2[-4]
$14 = {
  zone = 9 '\t', 
  unused01 = 0 '\0', 
  unused02 = 255, 		<------ should not be 255!
  append_off = 8388608, 
  bytes_free = 1050240, 
  entry_crc = 1541026046
}
(kgdb) 

    This doesn't look like the AHCI bug that I fixed.  It kinda looks
    like memory corruption but I see unused02 set to 255 in several
    layer2 entries.

    Take the filesystem offline if you can and do:

	hammer -f <device> blockmap

    And look for a 'B' in column 1 (indicating a bad CRC).

    I also noticed these in the dmesg for that kernel core:

	pid 880 (sshd), uid 0: exited on signal 11
	pid 881 (sshd), uid 0: exited on signal 6

    signal 6 tends to occur when a machine has memory issues.  It could
    be related.

						-Matt




[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]