DragonFly BSD
DragonFly users List (threaded) for 2006-08

Re: Postfix suddenly stopped working


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Sat, 12 Aug 2006 16:58:32 -0700 (PDT)

:I ran with -HEAD built from a couple of weeks ago and did not see a
:recurrence of the postfix queue "sticking".  Last night, I went back
:to 
:
:DragonFly woodstock.nethamilton.net 1.7.0-PREVIEW DragonFly 1.7.0-PREVIEW #6: Sat Aug 12 12:07:04 CDT 2006     hamilton@xxxxxxxxxxxxxxxxxxxxxxxxx:/usr/obj/usr/src/sys/WOODSTOCK  i386
:
:with a build/installkernel and installworld.
:
:To my surprise, the symptom popped back up this morning.  I checked the source,
:and found that the patch above hadn't been applied.  I applied the patch and 
:rebuilt and installed the kernel, and the queue got stuck again this 
:afternoon.

    There are two commits and I'm not sure whether you applied both of them.
    kern/kern_lockf.c 1.32 and 1.33 both need to be applied.

    PREVIEW isn't HEAD.  I will slip the PREVIEW tag for those two commits
    right now.  If you scrap your manual patch and resync with preview you
    should get both patches.

:I ran vnodeinfo as above, and after ripping out the non-locked stuff from
:the output the results are at http://www.nethamilton.net/lock_debug/stuck1.txt
:(which is pre-patch) and http://www.nethamilton.net/lock_debug/stuck2.txt 
:(post-patch).  I'm not sure what this is trying to tell me aside from 
:confirming that postfix is holding a lock on unix.local.  
:
:A couple of questions:
:1) is this a different problem, since it's occurring even after I applied 
:   the patch?
:2) what can I do to diagnose further?  
:
:I'm happy to fiddle around to gather info on this, but need a little 
:hand holding in terms of exactly what to do.  
:
:-- 
:
:   Jon Hamilton 
:   hamilton@xxxxxxxxx

    From the information you posted I'm guessing that a lock did not get
    released, which is a symptom of the problem the 1.32 commit addresses
    (the patch I emailed you was the 1.33 commit, but PREVIEW had neither
    1.32 nor 1.33).

    I have included the diff between 1.31 and 1.33 of kern_lockf.c below
    for reference but if you update to the latest preview you should
    get the patches automatically.
    
					-Matt
					Matthew Dillon 
					<dillon@xxxxxxxxxxxxx>

Index: kern_lockf.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_lockf.c,v
retrieving revision 1.31
retrieving revision 1.33
diff -u -r1.31 -r1.33
--- kern_lockf.c	27 May 2006 02:03:17 -0000	1.31
+++ kern_lockf.c	3 Aug 2006 16:06:15 -0000	1.33
@@ -38,7 +38,7 @@
  *
  *	@(#)ufs_lockf.c	8.3 (Berkeley) 1/6/94
  * $FreeBSD: src/sys/kern/kern_lockf.c,v 1.25 1999/11/16 16:28:56 phk Exp $
- * $DragonFly: src/sys/kern/kern_lockf.c,v 1.31 2006/05/27 02:03:17 dillon Exp $
+ * $DragonFly: src/sys/kern/kern_lockf.c,v 1.33 2006/08/03 16:06:15 dillon Exp $
  */
 
 #include <sys/param.h>
@@ -239,8 +239,15 @@
 
 	switch(ap->a_op) {
 	case F_SETLK:
-		ap->a_vp->v_flag |= VMAYHAVELOCKS;
+		/*
+		 * NOTE: It is possible for both lf_range and lf_blocked to
+		 * be empty if we block and get woken up, but another process
+		 * then gets in and issues an unlock.  So VMAYHAVELOCKS must
+		 * be set after the lf_setlock() operation completes rather
+		 * then before.
+		 */
 		error = lf_setlock(lock, owner, type, flags, start, end);
+		ap->a_vp->v_flag |= VMAYHAVELOCKS;
 		break;
 
 	case F_UNLCK:
@@ -683,7 +690,7 @@
 			 * Extend brange to cover range and scrap range.
 			 */
 			brange->lf_end = range->lf_end;
-			brange->lf_flags |= brange->lf_flags & F_NOEND;
+			brange->lf_flags |= range->lf_flags & F_NOEND;
 			TAILQ_REMOVE(&lock->lf_range, range, lf_link);
 			if (range->lf_flags & F_POSIX)
 				--count;
@@ -753,20 +760,23 @@
 }
 
 /*
- * Wakeup pending lock attempts.
+ * Wakeup pending lock attempts.  Theoretically we can stop as soon as
+ * we encounter an exclusive request that covers the whole range (at least
+ * insofar as the sleep code above calls lf_wakeup() if it would otherwise
+ * exit instead of loop), but for now just wakeup all overlapping
+ * requests.  XXX
  */
 static void
 lf_wakeup(struct lockf *lock, off_t start, off_t end)
 {
 	struct lockf_range *range, *nrange;
+
 	TAILQ_FOREACH_MUTABLE(range, &lock->lf_blocked, lf_link, nrange) {
 		if (lf_overlap(range, start, end) == 0)
 			continue;
 		TAILQ_REMOVE(&lock->lf_blocked, range, lf_link);
 		range->lf_flags = 1;
 		wakeup(range);
-		if (range->lf_start >= start && range->lf_end <= end)
-			break;
 	}
 }
 
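For readers following the diff above, here is a minimal userland sketch of two of its hunks. It is not kernel code: struct demo_range, DEMO_NOEND, merge_old(), merge_new(), demo_overlap() and demo_wakeup() are hypothetical stand-ins invented for illustration, and the flag value is made up. The first pair of functions shows why the old merge line was a no-op (it ORed brange's own F_NOEND bit back into itself, so the bit from the absorbed range was lost). The last function models, under the assumption that a wakeup can be reduced to setting a marker, how the removed early break could leave later overlapping waiters asleep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for F_NOEND; the real value lives in kernel headers. */
#define DEMO_NOEND 0x0100

/* Simplified range record; the kernel's struct lockf_range has more fields. */
struct demo_range {
	long start;
	long end;
	int  flags;
	int  woken;	/* illustration only: set when the request is "woken" */
};

/* 1.31 merge: ORs brange's own bit into itself, so nothing changes. */
static void
merge_old(struct demo_range *brange, const struct demo_range *range)
{
	brange->end = range->end;
	brange->flags |= brange->flags & DEMO_NOEND;
}

/* 1.33 merge: carries the no-end bit over from the absorbed range. */
static void
merge_new(struct demo_range *brange, const struct demo_range *range)
{
	brange->end = range->end;
	brange->flags |= range->flags & DEMO_NOEND;
}

/* Nonzero if [r->start, r->end] intersects [start, end]. */
static int
demo_overlap(const struct demo_range *r, long start, long end)
{
	return (r->start <= end && r->end >= start);
}

/*
 * Mark every blocked request overlapping [start, end] as woken and return
 * how many were marked.  With stop_early set, mimic the 1.31 loop, which
 * stopped after the first request fully contained in the range.
 */
static int
demo_wakeup(struct demo_range *blocked, size_t n, long start, long end,
    int stop_early)
{
	int count = 0;

	for (size_t i = 0; i < n; ++i) {
		struct demo_range *r = &blocked[i];

		if (demo_overlap(r, start, end) == 0)
			continue;
		r->woken = 1;
		++count;
		if (stop_early && r->start >= start && r->end <= end)
			break;
	}
	return (count);
}

#ifdef LOCKF_DEMO_MAIN
int
main(void)
{
	struct demo_range b = { 0, 9, 0, 0 };
	struct demo_range r = { 10, 19, DEMO_NOEND, 0 };

	merge_new(&b, &r);
	printf("no-end bit after merge_new: %d\n", (b.flags & DEMO_NOEND) != 0);
	return (0);
}
#endif
```

Compiling with -DLOCKF_DEMO_MAIN builds the small driver; without it the file only provides the helper functions. In this model, two waiters at [0,4] and [2,8] with an unlock of [0,10] see only the first one marked when stop_early is set, and both marked when it is clear, which matches the intent of dropping the break in lf_wakeup().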


