DragonFly kernel List (threaded) for 2003-12
Re: TrustedBSD...

From:	Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date:	Tue, 9 Dec 2003 13:13:42 -0800 (PST)
:I agree and disagree both in principle and practice :-).  Actually, I
:think you'll find that if you dig a little deeper, the placement of MAC
:Framework entry points is done exactly on the philosophy you describe.  In
:order to prevent race conditions, you have to perform access control
:checks on the actual objects, not the names provided in system calls.  We
:place our checks at the front ends of various subsystems: i.e., the top
:layer of VFS, the top layer of the process signalling pieces, etc.  This
:is the point where the name has been resolved to the object, and the
:correct locks are held to make sure you can perform a consistent check. In
:a traditional UNIX kernel, you cannot do this safely at the system call
:layer using wrappers, because that involves multiple lookups, which can be
:raced (time of check, time of use).
:
:However, if you have a compartmentalized kernel (i.e., microkernel) with
:message passing between subsystems, subsystems can perform the checks at
:the point where the message enters, which might accomplish what both you
:want, and what I want architecturally :-).  However, that relies on
:cleaning up object naming, perhaps in the style of Mach
:ports/capabilities, so that the names used in messages are "authoritative"
:and safe to control with. 
:
:The problem with system call level wrappers is pretty hard to fix,
:however, and sometimes, it can cause more security vulnerabilities than it
:solves.  Take, for example, systrace's interception and replacement of
:path names.  There are actually two race conditions here: first, a dual
:copyin, which can be raced by threaded processes and shared memory -- this
:is fixed through proper encapsulation of system call arguments.  Second, a
:semantic race in the implementation of the file system code, which is a
:lot harder to solve.  Systrace's lookup occurs before the kernel has
:resolved the file name from the string passed by the process, so the
:lookup actually occurs twice:  once in the wrapper for the control, and
:once when the actual system call does the work.  Niels has explored using
:a "look aside buffer"  to cache system call arguments to address the first
:problem, but I think the "Separation" in DragonFly will solve this much
:more cleanly.  The second can't be avoided unless the name used for the
:test acts more like a capability (or, you combine the checks with the same
:locked reference used in the file system code, as in FreeBSD).
:
:Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
:robert@xxxxxxxxxxxxxxxxx      Senior Research Scientist, McAfee Research
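
    The dual-copyin race described above can be illustrated with a small
    userland sketch.  Nothing here is systrace or kernel code: user_buf,
    fake_copyin(), policy_allows() and racing_thread_rewrites() are
    hypothetical stand-ins, and the "racing thread" is simulated by a plain
    function call so the outcome is deterministic.  It only covers the
    first race (two fetches of the same argument), not the semantic race
    in name resolution.

```c
#include <assert.h>
#include <string.h>

#define PATH_MAX_DEMO 64

/* Stand-in for user memory that another thread may rewrite at any time. */
static char user_buf[PATH_MAX_DEMO];

static void set_user_path(const char *p)
{
    strncpy(user_buf, p, sizeof(user_buf) - 1);
    user_buf[sizeof(user_buf) - 1] = '\0';
}

/* Hypothetical policy: only paths under /tmp/ are allowed. */
static int policy_allows(const char *path)
{
    return strncmp(path, "/tmp/", 5) == 0;
}

/* Stand-in for copyin(): fetch the string from "user" memory. */
static void fake_copyin(const char *uaddr, char *kaddr, size_t len)
{
    assert(len > 0);
    strncpy(kaddr, uaddr, len - 1);
    kaddr[len - 1] = '\0';
}

/* Simulates a second thread changing the argument after the check. */
static void racing_thread_rewrites(void)
{
    set_user_path("/etc/passwd");
}

/* Racy: the wrapper fetches for the check, the syscall fetches again. */
static int open_double_fetch(char *opened, size_t len)
{
    char checked[PATH_MAX_DEMO];

    fake_copyin(user_buf, checked, sizeof(checked));
    if (!policy_allows(checked))
        return 0;
    racing_thread_rewrites();            /* window between check and use */
    fake_copyin(user_buf, opened, len);  /* second fetch sees new bytes */
    return 1;
}

/* Encapsulated: one fetch, and the check and the use share that copy. */
static int open_single_fetch(char *opened, size_t len)
{
    fake_copyin(user_buf, opened, len);
    if (!policy_allows(opened))
        return 0;
    racing_thread_rewrites();            /* too late to matter */
    return 1;
}
```

    With user_buf set to "/tmp/ok", open_double_fetch() approves the
    request but ends up with "/etc/passwd", while open_single_fetch()
    keeps working on the bytes it actually checked.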

    Yah.  I wasn't planning on actually porting systrace, just porting
    the concepts into the message-passing framework as we implement it.
    We will definitely be leveraging off the kern_*() syscall separation
    work!

    The sequence of events will be:

    * Implement syscall message encapsulation through a 'syscall emulation
      library' which is unconditionally mapped by the kernel into user memory.
      i.e. think of it as a syscall.so userland library which implements all
      the syscalls instead of libc implementing the syscalls.

    * Revector int 0x80 to enter into this library via userland (SEL_UPL)
      to handle legacy int 0x80 syscalls.  This effectively 'turns off'
      direct kernel access via int 0x80.

    * Make the native libc directly aware of the kernel-mapped syscall
      library (i.e. have it call into syscall.so directly instead of issuing
      int 0x80's).

    * Move all emulation code (linux, sysv, etc...) out of the kernel and
      into userland via syscall.so.  The kernel will select the correct
      syscall*.so library to map into the user process when exec()ing.
      This removes all potential security issues from the sysv and linux
      emulation code.

    * Implement a kernel-loadable security 'filter' on syscall messages
      that intercepts the syscall message.  Actually, make it a layering
      of filters.

      * per-user
      * per-group
      * per-jail
      * per-process (child inherited)
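
    As a rough illustration of what a layering of filters could look like,
    here is a userland sketch.  Every name in it (struct sysmsg, struct
    filter_stack, run_filters(), the DEMO_SYS_* numbers, the two example
    filters) is made up for the example and is not an existing DragonFly
    interface.  It assumes a simple "any layer can deny" rule, which is one
    possible policy among several.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Demo syscall numbers, chosen only for this sketch. */
#define DEMO_SYS_OPEN   5
#define DEMO_SYS_UNLINK 10

enum verdict { FILTER_PASS, FILTER_DENY };

/* One slot per layer, evaluated in this order. */
enum scope { SCOPE_USER, SCOPE_GROUP, SCOPE_JAIL, SCOPE_PROCESS, SCOPE_COUNT };

/* Hypothetical syscall message as a filter would see it. */
struct sysmsg {
    int         sysnum;
    const char *path;    /* NULL when the syscall carries no path */
};

typedef enum verdict (*filter_fn)(const struct sysmsg *msg);

struct filter_stack {
    filter_fn layer[SCOPE_COUNT];   /* NULL means no filter at that layer */
};

/* Run each installed layer in turn; the first denial wins. */
static enum verdict run_filters(const struct filter_stack *fs,
                                const struct sysmsg *msg)
{
    assert(fs != NULL && msg != NULL);
    for (int s = 0; s < SCOPE_COUNT; s++) {
        if (fs->layer[s] != NULL && fs->layer[s](msg) == FILTER_DENY)
            return FILTER_DENY;
    }
    return FILTER_PASS;
}

/*
 * The child starts with the parent's per-process filter.  The other
 * layers are copied too, only because this toy has no credentials or
 * jail to look them up from.
 */
static struct filter_stack inherit_on_fork(const struct filter_stack *parent)
{
    return *parent;
}

/* Example per-jail filter: paths must stay under /jail/. */
static enum verdict jail_confine(const struct sysmsg *msg)
{
    if (msg->path != NULL && strncmp(msg->path, "/jail/", 6) != 0)
        return FILTER_DENY;
    return FILTER_PASS;
}

/* Example per-process filter: refuse unlink outright. */
static enum verdict deny_unlink(const struct sysmsg *msg)
{
    return msg->sysnum == DEMO_SYS_UNLINK ? FILTER_DENY : FILTER_PASS;
}
```

    Installing jail_confine at SCOPE_JAIL and deny_unlink at SCOPE_PROCESS
    lets an open of "/jail/etc/motd" through, while an open of
    "/etc/passwd" or any unlink is denied, in the parent and in a forked
    child alike.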

    I see two ways to implement the filter mechanism:

    (1) The filter would have to implement the copyin/copyout layer and then
        call the syscall meat layer.  Any filtered syscalls that don't
        actually have to examine user-supplied data could simply call the
        main syscall entry point after acceptance, so it would not be too
        messy.  This is easier to do than (2) but makes the layering of
        multiple syscall filters difficult.

    (2) Do all copyins necessary for filter operations (basically anything
        that passes a path) prior to executing the first filter.  Then the
        filters need only deal with the data.  Harder but probably the more
        effective solution.
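
    Option (2) might be shaped roughly like the following userland sketch:
    the dispatcher copies the path in once, every filter reads that kernel
    copy, and the syscall "meat" function is handed the same copy.  All
    names (struct kmsg, struct sysent_demo, fake_copyinstr(), dispatch(),
    the example filters and meat functions) are hypothetical, and the
    return codes are arbitrary values picked for the demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KPATH_MAX 64

/* Hypothetical in-kernel syscall message with the path already copied. */
struct kmsg {
    int  sysnum;
    int  has_path;
    char kpath[KPATH_MAX];
};

typedef int (*filter_fn)(const struct kmsg *);  /* nonzero means deny */
typedef int (*meat_fn)(const struct kmsg *);    /* the syscall body */

/* Per-syscall descriptor: does it take a path, and what does the work. */
struct sysent_demo {
    int     takes_path;
    meat_fn meat;
};

/* Stand-in for copyinstr(): bounded copy out of "user" memory. */
static int fake_copyinstr(const char *uaddr, char *kaddr, size_t len)
{
    size_t n = strlen(uaddr);

    if (n >= len)
        return -1;
    memcpy(kaddr, uaddr, n + 1);
    return 0;
}

/* Copy in first, then filter, then run the meat on the same copy. */
static int dispatch(const struct sysent_demo *se, int sysnum,
                    const char *upath,
                    const filter_fn *filters, size_t nfilters)
{
    struct kmsg msg;

    assert(se != NULL && se->meat != NULL);
    memset(&msg, 0, sizeof(msg));
    msg.sysnum = sysnum;
    if (se->takes_path) {
        if (upath == NULL ||
            fake_copyinstr(upath, msg.kpath, sizeof(msg.kpath)) != 0)
            return -2;                  /* bad argument */
        msg.has_path = 1;
    }
    for (size_t i = 0; i < nfilters; i++) {
        if (filters[i](&msg))
            return -1;                  /* denied by a filter */
    }
    return se->meat(&msg);
}

/* Example filters: they never touch user memory, only the message. */
static int deny_dev(const struct kmsg *m)
{
    return m->has_path && strncmp(m->kpath, "/dev/", 5) == 0;
}

static int deny_dotdot(const struct kmsg *m)
{
    return m->has_path && strstr(m->kpath, "..") != NULL;
}

/* Example meat functions with toy return values. */
static int meat_open(const struct kmsg *m)   { return (int)strlen(m->kpath); }
static int meat_getpid(const struct kmsg *m) { (void)m; return 42; }
```

    Because the filters only receive a const struct kmsg, stacking more of
    them does not add copyins, which is the property that makes (2)
    friendlier to layering than (1).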

					-Matt
					Matthew Dillon 
					<dillon@xxxxxxxxxxxxx>
Follow-Ups:
- Re: TrustedBSD...
  - From: Robert Watson
References:
- Re: TrustedBSD...
  - From: Robert Watson