DragonFly kernel List (threaded) for 2009-07
DragonFly BSD

Re: Re: CRC FAILED: LAYER2


From: YONETANI Tomokazu <qhwt+dfly@xxxxxxxxxx>
Date: Wed, 8 Jul 2009 08:58:13 +0900

On Tue, Jul 07, 2009 at 10:38:04AM -0700, Matthew Dillon wrote:
> 
> :Hi.
> :I've just caught this panic this morning.  It's an Athlon 64 X2 running
> :an SMP kernel built from source as of July 5th (a few commits after the
> :fixes to fdfree()).  It's a 1Tbyte HAMMER-only filesystem and I accidentally
> :ran the `hammer version-upgrade' command on it several days ago, in case
> :it matters.  The machine is turned on at 7:00 am every day, runs cvsync
> :to retrieve a few CVS repositories, and converts them to git repos.
> :According to today's log, something went wrong during the second invocation
> :of cvsync.  The script then proceeded to convert the first CVS repository,
> :when it caught this panic.  The kernel and vmcore are uploaded at my leaf
> :account as ~y0netan1/crash/{kernel,vmcore}.4 .
> :
> :Thanks in advance.
> 
>     The crash dump looks a bit odd.  There seems to be some corruption
>     in the layer2 structure.  The unused01/02 fields should both be 0.
> 
> (kgdb) print layer2[-4]
> $14 = {
>   zone = 9 '\t', 
>   unused01 = 0 '\0', 
>   unused02 = 255, 		<------ should not be 255!
>   append_off = 8388608, 
>   bytes_free = 1050240, 
>   entry_crc = 1541026046
> }
> (kgdb) 
> 
>     This doesn't look like the AHCI bug that I fixed.  It kinda looks
>     like memory corruption but I see unused02 set to 255 in several
>     layer2 entries.
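
The check described above (unused01 and unused02 should both be 0) can be sketched in C roughly as follows.  The struct only mirrors the field names printed by kgdb; the field widths and the names layer2_sketch, layer2_unused_ok and layer2_count_suspect are assumptions made for illustration, not copies of the HAMMER headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative mirror of the fields shown in the kgdb output.
 * Field widths are assumptions, not taken from the HAMMER sources.
 */
struct layer2_sketch {
    uint8_t  zone;
    uint8_t  unused01;
    uint16_t unused02;
    uint32_t append_off;
    int32_t  bytes_free;
    uint32_t entry_crc;
};

/* True when both reserved fields are zero, as they are expected to be. */
static bool layer2_unused_ok(const struct layer2_sketch *l2)
{
    return l2->unused01 == 0 && l2->unused02 == 0;
}

/* Count the entries in an array whose reserved fields are not zero. */
static size_t layer2_count_suspect(const struct layer2_sketch *l2, size_t n)
{
    size_t suspect = 0;

    for (size_t i = 0; i < n; i++) {
        if (!layer2_unused_ok(&l2[i]))
            suspect++;
    }
    return suspect;
}
```

An entry like the one printed above (unused02 = 255) would fail layer2_unused_ok() in this sketch.  It does not verify entry_crc, which would need the actual CRC routine and coverage used by HAMMER.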

The disk is on the nata driver; the ahci driver is loaded but unused.

>     Take the filesystem offline if you can and do:
> 
> 	hammer -f <device> blockmap
> 
>     And look for a 'B' in column 1 (indicating a bad CRC).

Running this *online* didn't show any 'B', so I guess I have to
boot from a USB memory stick and try it again?
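
For long blockmap output, a small filter can do the looking.  The following is a minimal sketch, assuming only what is stated above: a bad CRC is marked by a 'B' in column 1.  Nothing else about the output format is assumed, and count_bad_lines is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Count the lines read from `in` whose first character is 'B'.
 * Lines longer than the buffer arrive in several chunks; only the
 * chunk that starts a line is tested.
 */
static long count_bad_lines(FILE *in)
{
    char buf[4096];
    long bad = 0;
    int at_line_start = 1;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (at_line_start && buf[0] == 'B')
            bad++;
        /* The next chunk starts a new line only if this one ended in '\n'. */
        at_line_start = (strchr(buf, '\n') != NULL);
    }
    return bad;
}
```

Wrapped in a main() that calls count_bad_lines(stdin), this could be fed the output of the blockmap command through a pipe; a nonzero count means at least one entry was flagged.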

>     I also noticed these in the dmesg for that kernel core:
> 
> 	pid 880 (sshd), uid 0: exited on signal 11
> 	pid 881 (sshd), uid 0: exited on signal 6
> 
>     signal 6 tends to occur when a machine has memory issues.  It could
>     be related.
> 
> 						-Matt

I'll also try memtest after that.  Thanks for the hint.


