DragonFly bugs List (threaded) for 2011-04
[
Date Prev][
Date Next]
[
Thread Prev][
Thread Next]
[
Date Index][
Thread Index]
Re: panic: assertion: p->p_lock == 0 in kern_wait
On Thu, Feb 24, 2011 at 09:29:16AM +0900, YONETANI Tomokazu wrote:
> On Mon, Feb 21, 2011 at 01:41:26PM +0900, YONETANI Tomokazu wrote:
> > Apparently a few hours of pbulk test can trigger this panic (this is
> > on x86_64 and the kernel is built from source as of 5347900e6).
> > As opposed to what KKASSERT claims, p->p_lock doesn't hold the non-zero
> > value in the frame 5:
> >
> :
> > (kgdb) fr 5
> > #5 0xffffffff802989a1 in kern_wait (pid=<value optimized out>,
> > status=0xffffffe05e997a74, options=1528637672, rusage=0x0,
> > res=0xffffffe05e997b58) at /usr/src/sys/kern/kern_exit.c:901
> > 901 KKASSERT(p->p_lock == 0);
> > (kgdb) p p->p_lock
> > $1 = 0
>
> After poking here and the, I think this KKASSERT() can simply go away
> as proc_remove_zombie() will wait for p->p_lock to drop to zero anyway.
>
> /*
> * Unlink the proc from its process group so that
> * the following operations won't lead to an
> * inconsistent state for processes running down
> * the zombie list.
> */
> KKASSERT(p->p_lock == 0);
> proc_remove_zombie(p);
> leavepgrp(p);
The following is what I have in my tree. What it does is to hold proc_token
while waiting for p->p_lock to drop, just as done in proc_remove_zombie().
If I don't hold the proc_token as in the first chunk, I see the
proc_remove_zombie: waiting for %p->p_lock to drop
message a few times every hour on the console. I guess it may also
imply that the race is between a code which holds proc_token for
a long time but not p->p_token, like all*_scan().
diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
index 1e5a110..30fc0af 100644
--- a/sys/kern/kern_exit.c
+++ b/sys/kern/kern_exit.c
@@ -867,8 +867,10 @@ loop:
* put a hold on the process for short periods of
* time.
*/
+ lwkt_gettoken(&proc_token);
while (p->p_lock)
tsleep(p, 0, "reap3", hz);
+ lwkt_reltoken(&proc_token);
/* Take care of our return values. */
*res = p->p_pid;
@@ -898,7 +900,6 @@ loop:
* inconsistent state for processes running down
* the zombie list.
*/
- KKASSERT(p->p_lock == 0);
proc_remove_zombie(p);
leavepgrp(p);
diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c
index cb067f0..048a619 100644
--- a/sys/kern/kern_proc.c
+++ b/sys/kern/kern_proc.c
@@ -661,6 +661,7 @@ proc_remove_zombie(struct proc *p)
{
lwkt_gettoken(&proc_token);
while (p->p_lock) {
+ kprintf("%s: waiting for %p->p_lock to drop\n", __func__, p);
tsleep(p, 0, "reap1", hz / 10);
}
LIST_REMOVE(p, p_list); /* off zombproc */
--
1.7.3.2
[
Date Prev][
Date Next]
[
Thread Prev][
Thread Next]
[
Date Index][
Thread Index]