DragonFly bugs List (threaded) for 2008-07
DragonFly BSD

Re: hammer_alloc_data panic


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 15 Jul 2008 18:21:58 -0700 (PDT)

:I do not intend to sound discouraging; I'm just worried that the cries of
:those who have hit the reblocking issues and/or some stray bugs are going
:to cover the positive feedback.
:
:Aggelos

    Actually I think we are doing very well, though I can see why
    you might be a little rattled looking at it from the outside.  I
    apologize for that, and I will try to explain what is going on to
    alleviate any concerns.  In fact, I am going to go into great detail;
    this is as much a philosophical document as it is an explanation :-)
    The development style I describe here is virtually the only form of
    development possible for a one-man project, or even a two or
    three-man project.

    Nearly all the bug flow is due to the continued work being pushed into
    the filesystem.  That work essentially ended last weekend with the
    last major mirroring infrastructure commit.

    Virtually none of the bug flow is related to the older HAMMER code
    pertaining to basic filesystem operation.  For example, the UNDO
    crash recovery and filesystem corruption bugs stopped occurring almost
    a month ago.  Basic filesystem operations (read, write, open, close,
    readdir, chmod, etc.) have been stable for well over 2 months.
    Historical lookups and snapshots have been stable for over 3 months.
    I purposefully destabilized truncation for a few days last week,
    and I purposefully destabilized the deadlock handling for a few days
    last weekend, all in order to get the mirroring code operational (and in
    the case of truncation to fix UNDO FIFO issues related to the
    limited UNDO space in small HAMMER filesystems).

    When I said 2 weeks ago that I wasn't sure I would be able to get the
    mirroring and PFS code in, this is what I was talking about.  It isn't
    just coding and committing, it is also getting the basic testing done,
    the utility support done, and fixing the bugs introduced when surgery
    is required on other parts of the filesystem to support the new feature.

    What do I mean by purposeful destabilization?  Let me give you another
    example, taking the mirroring code again.  In order to propagate a
    transaction id up the B-Tree to support incremental mirroring I couldn't
    abort halfway through with an EDEADLK and have the high level code
    retry, because the governing insertion or deletion had already occurred.
    So what I did was implement the propagation *without* deadlock handling,
    got it working, then worked through the deadlocks (the 'purposeful
    destabilization') that I had created.  I knew I was introducing some
    deadlock issues when I did that, but it was still the fastest way to
    get it implemented.
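
    As a rough illustration of the propagation step described above,
    here is a minimal sketch in C.  It is not HAMMER's code: the
    struct node layout, the mirror_tid field name, and both function
    names are hypothetical, and locking is omitted entirely, which
    corresponds to the first pass "without deadlock handling".  It
    assumes every ancestor already carries a transaction id at least
    as large as its children's, so the walk can stop early.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory node; not HAMMER's actual B-Tree layout. */
struct node {
    struct node *parent;    /* NULL at the root */
    uint64_t mirror_tid;    /* highest transaction id at or below this node */
};

/*
 * Raise mirror_tid on a modified node and on each of its ancestors.
 * The walk stops at the first node that already records a tid >= the
 * new one, since (by the assumed invariant) everything above it does
 * too.  Returns the number of nodes updated.
 *
 * No locks are taken here.  In a real filesystem each step up would
 * need the parent locked, and that is where a deadlock could arise
 * after the governing insertion or deletion has already been done.
 */
static int propagate_tid(struct node *n, uint64_t tid)
{
    int updated = 0;

    for (; n != NULL; n = n->parent) {
        if (n->mirror_tid >= tid)
            break;
        n->mirror_tid = tid;
        updated++;
    }
    return updated;
}

/*
 * An incremental scan only needs to descend into a subtree whose
 * recorded tid is newer than the last tid already mirrored.
 */
static int subtree_changed_since(const struct node *n, uint64_t since)
{
    return n->mirror_tid > since;
}
```

    The point of the sketch is only that the upward walk happens after
    the leaf has been modified, so there is no clean place to back out
    and let the caller retry.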

    So what you are seeing is not really new crops of unexpected bugs, but
    instead mostly expected bugs whose flow is carefully managed so they
    will be fixed by the release, and a few I left on the backburner
    (mostly related to filesystem-full issues but also a few related to the
    handling of I/O errors), because I knew I could fix them in a day or two.
    About 80% of the bug flow is from purposeful destabilization, and about 20%
    is in the 'unexpected bug' category.

    HAMMER is a really complex project, and the complexity is somewhat
    of a moving target because all the myriad theory does not always fit
    together seamlessly.  It is not possible to implement each subsystem
    independently of the other subsystems to the point where it is perfect.
    Invariably working on a later subsystem requires going back to the
    earlier ones and making (sometimes major) changes to the algorithms,
    with massive debugging in between each major piece of subsystem work so
    the bugs do not create geometrically complex (and hard to debug)
    failures.

    The constant flow of bugs is the intended outcome for this sort of
    development style.  It is the ONLY single-person development style
    that has even a half chance of working for a complex project,
    something I have learned through the years with various large projects
    such as Diablo, various embedded OSs, DICE (the C compiler I wrote for
    the Amiga many years ago), numerous other projects, and now HAMMER.

    In any case, this week is crunch time for the remaining bugs and I'm still
    on schedule!  I'm quite happy that I get to dedicate this week just to
    fixing bugs, and won't be introducing any new algorithms to start the
    endless bug cycle going again :-).  Even I was feeling a bit flustered
    last week, trying to squeeze that massive, massive mirroring
    implementation in.  That was literally a 100-hour work week for me.
    I was stressing out big-time last week.  This week is smooth sailing.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


