DragonFly bugs List (threaded) for 2011-04
[
Date Prev][
Date Next]
[
Thread Prev][
Thread Next]
[
Date Index][
Thread Index]
Re: panic: assertion: p->p_lock == 0 in kern_wait
On Thu, Apr 21, 2011 at 11:36:10AM -0700, Matthew Dillon wrote:
> :> After poking here and the, I think this KKASSERT() can simply go away
> :> as proc_remove_zombie() will wait for p->p_lock to drop to zero anyway.
> :>
> :...
> :The following is what I have in my tree. What it does is to hold proc_token
> :while waiting for p->p_lock to drop, just as done in proc_remove_zombie().
> :If I don't hold the proc_token as in the first chunk, I see the
> :
> : proc_remove_zombie: waiting for %p->p_lock to drop
> :
> :message a few times every hour on the console. I guess it may also
> :imply that the race is between a code which holds proc_token for
> :a long time but not p->p_token, like all*_scan().
>
> It looks good, I would make two changes. One would be to shortcut
> the case where p->p_lock is already 0 to avoid unnecessary contention
> with proc_token in the critical exit path.
>
> if (p->p_lock) {
> lwkt_gettoken(&proc_token);
> while (p->p_lock)
> tsleep(p, 0, "reap3", hz);
> lwkt_reltoken(&proc_token);
> }
The problem here is that p->p_lock can become non-zero shortly after
being observed as 0 on this reap3 loop (which originally caused the panic
on KKASSERT), so I think this shortcut logic almost defeats that purpose.
However, even without this shortcut logic, the added kprintf() still very
occasionally shows the diagnostic message.
> :...
> :@@ -661,6 +661,7 @@ proc_remove_zombie(struct proc *p)
> : {
> : lwkt_gettoken(&proc_token);
> : while (p->p_lock) {
> :+ kprintf("%s: waiting for %p->p_lock to drop\n", __func__, p);
> : tsleep(p, 0, "reap1", hz / 10);
> : }
> : LIST_REMOVE(p, p_list); /* off zombproc */
> :--
> :1.7.3.2
>
> This one may unnecessarily spam the kprintf since the wait is 1/10
> of a second. Maybe conditionalize it with a variable so it only issues
> one kprintf().
I'll put it under bootverbose.
> With regards to getting rid of the timeout in the tsleep and using a
> proactive wakeup(), we have to avoid calling wakeup() for 1->0
> transitions unless someone is known to be waiting on p_lock. This
> can be implementing by adding a WAITING flag to the field and using
> atomic_cmpset_int() to handle the (WAITING | 1) -> (0) transition and
> then calling wakeup() if WAITING was set.
>
> I will augment the sys/refcount.h API and add refcount_wait() and
> refcount_release_wakeup() which encapsulate the appropriate atomic
> ops. I will leave it up to you if you want to then use the new API
> functions for PHOLD/PRELE, which would give the tsleep case a
> proactive wakeup() instead of having to wait for it to timeout.
So what I need to do is to change PHOLD/PRELE to use refcount_acquire/
refcount_release_wakeup and replace p->p_lock loop with
refcount_release_wakeup? I'll give it a try.
Best Regards,
YONETANI Tomokazu.
[
Date Prev][
Date Next]
[
Thread Prev][
Thread Next]
[
Date Index][
Thread Index]