DragonFly BSD
DragonFly commits List (threaded) for 2011-11

git: kernel - Greatly improve shared memory fault rate concurrency / shared tokens


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 15 Nov 2011 01:33:47 -0800 (PST)

commit 54341a3b445fade1bbc473141893a7e06c06ccb5
Author: Matthew Dillon <dillon@apollo.backplane.com>
Date:   Tue Nov 15 01:02:24 2011 -0800

    kernel - Greatly improve shared memory fault rate concurrency / shared tokens
    
    This commit rolls up a lot of work to improve postgres database operations
    and the system in general.  With these changes, pgbench -j 8 -c 40 runs on
    our 48-core Opteron monster at 140000+ tps, and the shm vm_fault rate
    hits 3.1M pps.
    
    * Implement shared tokens.  They work as advertised, with some caveats.
    
      It is acceptable to acquire a shared token while you already hold the same
      token exclusively, but you will deadlock if you acquire an exclusive token
      while you hold the same token shared.
    
      Currently exclusive tokens are not given priority over shared tokens so
      starvation is possible under certain circumstances.
    
    * Create a critical code path in vm_fault() using the new shared token
      feature to quickly fault-in pages which already exist in the VM cache.
      pmap_object_init_pt() also uses the new feature.
    
      This increases fault-in concurrency by a ridiculously huge amount,
      particularly on SHM segments (say when you have a large number of postgres
      clients).  Scaling for large numbers of clients on large numbers of
      cores is significantly improved.
    
      This also increases fault-in concurrency for MAP_SHARED file maps.
    
    * Expand the breadn() and cluster_read() APIs.  Implement breadnx() and
      cluster_readx(), which allow a getblk()'d bp to be passed.  If *bpp is
      not NULL a bp is being passed in; otherwise the routines call getblk().
    
    * Modify the HAMMER read path to use the new API.  Instead of calling
      getcacheblk() HAMMER now calls getblk() and checks the B_CACHE flag.
      This gives getblk() a chance to regenerate a fully cached buffer from
      VM backing store without having to acquire any hammer-related locks,
      resulting in even faster operation.
    
    * If kern.ipc.shm_use_phys is set to 2 the VM pages will be pre-allocated.
      This can take quite a while for a large map and can also lock the machine
      up for a few seconds.  Defaults to off.
    
    * Reorder the smp_invltlb()/cpu_invltlb() combos in a few places, running
      cpu_invltlb() last.
    
    * An invalidation interlock might be needed in pmap_enter() under certain
      circumstances.  Enable the code for now.
    
    * vm_object_backing_scan_callback() was failing to properly check the
      validity of a vm_object after acquiring its token.  Add the required
      check + some debugging.
    
    * Make vm_object_set_writeable_dirty() a bit more cache friendly.
    
    * The vmstats sysctl was scanning every process's vm_map (requiring a
      vm_map read lock to do so), which can stall for long periods of time
      when the system is paging heavily.  Change the mechanism to an LWP flag
      which can be tested with minimal locking.
    
    * Have the phys_pager mark the page as dirty too, to make sure nothing
      tries to free it.
    
    * Remove the spinlock in pmap_prefault_ok().  Since we do not delete page
      table pages, it should not be needed.
    
    * Add a required cpu_ccfence() in pmap_inval.c.  The code generated prior
      to this fix was still correct, and this makes sure it stays that way.
    
    * Replace several manual wiring cases with calls to vm_page_wire().
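
The shared-token caveats listed in the commit message can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: the struct, the enum, the function names and the per-thread bookkeeping are all hypothetical and do not reflect the actual code in sys/kern/lwkt_token.c. Where the kernel would block (or deadlock), the model returns a status code. Treating a repeated exclusive acquisition by the same owner as a nested hold is also an assumption of the model, not something the commit message states.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_THREADS 4   /* hypothetical limit, for the model only */

enum acquire_result {
    ACQ_OK,             /* acquisition succeeds immediately */
    ACQ_WOULD_BLOCK,    /* caller would wait for another holder */
    ACQ_SELF_DEADLOCK   /* caller would wait on its own shared hold */
};

/* Hypothetical model of one token; not the kernel's lwkt_token layout. */
struct model_token {
    int excl_owner;                       /* thread id, or -1 if none */
    int excl_depth;                       /* nested holds by the owner */
    int shared_holds[MODEL_MAX_THREADS];  /* shared holds per thread id */
};

void model_token_init(struct model_token *t)
{
    memset(t, 0, sizeof(*t));
    t->excl_owner = -1;
}

static int model_total_shared(const struct model_token *t)
{
    int i, total = 0;

    for (i = 0; i < MODEL_MAX_THREADS; i++)
        total += t->shared_holds[i];
    return total;
}

enum acquire_result model_get_shared(struct model_token *t, int tid)
{
    assert(tid >= 0 && tid < MODEL_MAX_THREADS);
    if (t->excl_owner == tid) {
        /* Shared while already exclusive: acceptable per the commit. */
        t->excl_depth++;
        return ACQ_OK;
    }
    if (t->excl_owner != -1)
        return ACQ_WOULD_BLOCK;   /* someone else holds it exclusively */
    t->shared_holds[tid]++;
    return ACQ_OK;
}

enum acquire_result model_get_exclusive(struct model_token *t, int tid)
{
    assert(tid >= 0 && tid < MODEL_MAX_THREADS);
    if (t->excl_owner == tid) {
        t->excl_depth++;          /* model assumption: nested hold */
        return ACQ_OK;
    }
    if (t->shared_holds[tid] > 0) {
        /* Exclusive while holding shared: the caller would wait for
         * its own shared hold to drain, which never happens. */
        return ACQ_SELF_DEADLOCK;
    }
    if (t->excl_owner != -1 || model_total_shared(t) > 0)
        return ACQ_WOULD_BLOCK;
    t->excl_owner = tid;
    t->excl_depth = 1;
    return ACQ_OK;
}
```

In this model, a thread that takes the token exclusively and then requests it shared gets ACQ_OK, while a thread that holds it shared and then requests it exclusively gets ACQ_SELF_DEADLOCK. Release paths and the starvation behaviour mentioned above are deliberately left out.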
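
The *bpp convention described for breadnx() and cluster_readx(), together with the HAMMER pattern of calling getblk() and checking B_CACHE first, can be sketched as follows. Everything here is a hypothetical stand-in: struct model_buf, MODEL_B_CACHE, the eight-slot pool and the model_* functions are assumptions made for illustration and do not match the real kernel signatures or struct buf.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_B_CACHE 0x1     /* hypothetical "contents valid" flag */
#define MODEL_POOL_SIZE 8     /* hypothetical tiny buffer pool */

/* Hypothetical stand-in for a buffer; not the kernel's struct buf. */
struct model_buf {
    unsigned long blkno;
    int flags;
};

static struct model_buf model_pool[MODEL_POOL_SIZE];
int model_io_count;           /* number of simulated device reads */

/* Stand-in for getblk(): return the buffer assigned to blkno,
 * recycling the slot (and dropping the cache flag) if needed. */
struct model_buf *model_getblk(unsigned long blkno)
{
    struct model_buf *bp = &model_pool[blkno % MODEL_POOL_SIZE];

    if (bp->blkno != blkno) {
        bp->blkno = blkno;
        bp->flags = 0;
    }
    return bp;
}

/* Stand-in for a breadnx()-style read.  If *bpp is non-NULL the
 * caller passes in a buffer it already obtained; otherwise the
 * routine obtains one itself and returns it through *bpp. */
int model_readx(unsigned long blkno, struct model_buf **bpp)
{
    struct model_buf *bp = *bpp;

    if (bp == NULL)
        bp = *bpp = model_getblk(blkno);
    if ((bp->flags & MODEL_B_CACHE) == 0) {
        model_io_count++;             /* simulated I/O */
        bp->flags |= MODEL_B_CACHE;
    }
    return 0;
}

/* Caller in the style described for HAMMER: get the buffer first,
 * return at once if it is fully cached, else hand it to the reader. */
int model_caller_read(unsigned long blkno, struct model_buf **out)
{
    struct model_buf *bp = model_getblk(blkno);

    *out = bp;
    if (bp->flags & MODEL_B_CACHE)
        return 0;                     /* fast path, no read issued */
    return model_readx(blkno, out);
}
```

The point of the in/out pointer is that the caller can inspect the buffer before deciding whether a read is needed at all, and the reader does not have to look the buffer up a second time.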

Summary of changes:
 sys/gnu/vfs/ext2fs/ext2_alloc.c        |    2 +
 sys/gnu/vfs/ext2fs/ext2_balloc.c       |    2 +
 sys/gnu/vfs/ext2fs/ext2_inode.c        |    2 +
 sys/gnu/vfs/ext2fs/ext2_linux_balloc.c |    3 +-
 sys/gnu/vfs/ext2fs/ext2_linux_ialloc.c |    1 +
 sys/gnu/vfs/ext2fs/ext2_subr.c         |    2 +
 sys/kern/lwkt_thread.c                 |   65 +---
 sys/kern/lwkt_token.c                  |  726 +++++++++++++-------------------
 sys/kern/sysv_shm.c                    |   36 ++-
 sys/kern/usched_bsd4.c                 |    2 -
 sys/kern/vfs_bio.c                     |   34 +-
 sys/kern/vfs_cluster.c                 |    8 +-
 sys/platform/pc32/i386/pmap.c          |    6 +-
 sys/platform/pc64/x86_64/pmap.c        |   18 +-
 sys/platform/pc64/x86_64/pmap_inval.c  |    7 +-
 sys/platform/vkernel/platform/pmap.c   |    9 +-
 sys/platform/vkernel64/platform/pmap.c |    9 +-
 sys/sys/buf.h                          |    6 +-
 sys/sys/buf2.h                         |   26 ++
 sys/sys/globaldata.h                   |    5 +-
 sys/sys/proc.h                         |    1 +
 sys/sys/thread.h                       |   29 +-
 sys/sys/thread2.h                      |    5 +-
 sys/vfs/hammer/hammer_io.c             |    1 +
 sys/vfs/hammer/hammer_ondisk.c         |    1 +
 sys/vfs/hammer/hammer_vnops.c          |   21 +-
 sys/vfs/hammer/hammer_volume.c         |    2 +
 sys/vfs/hpfs/hpfs_alsubr.c             |    2 +
 sys/vfs/hpfs/hpfs_subr.c               |    2 +
 sys/vfs/hpfs/hpfs_vfsops.c             |    1 +
 sys/vfs/hpfs/hpfs_vnops.c              |    1 +
 sys/vfs/isofs/cd9660/cd9660_lookup.c   |    2 +
 sys/vfs/isofs/cd9660/cd9660_rrip.c     |    2 +
 sys/vfs/isofs/cd9660/cd9660_vfsops.c   |    2 +
 sys/vfs/isofs/cd9660/cd9660_vnops.c    |    2 +
 sys/vfs/msdosfs/msdosfs_denode.c       |    2 +
 sys/vfs/msdosfs/msdosfs_fat.c          |    2 +
 sys/vfs/msdosfs/msdosfs_lookup.c       |    2 +
 sys/vfs/msdosfs/msdosfs_vfsops.c       |    2 +
 sys/vfs/ntfs/ntfs_subr.c               |    4 +-
 sys/vfs/ntfs/ntfs_vfsops.c             |    2 +
 sys/vfs/ntfs/ntfs_vnops.c              |    2 +
 sys/vfs/tmpfs/tmpfs_vnops.c            |    2 +
 sys/vfs/udf/udf_vfsops.c               |    2 +
 sys/vfs/ufs/ffs_alloc.c                |    3 +-
 sys/vfs/ufs/ffs_balloc.c               |    2 +
 sys/vfs/ufs/ffs_inode.c                |    1 +
 sys/vfs/ufs/ffs_subr.c                 |    2 +
 sys/vfs/ufs/ffs_vfsops.c               |    2 +
 sys/vfs/userfs/userfs_vnops.c          |    3 +
 sys/vm/phys_pager.c                    |    2 +-
 sys/vm/vm_fault.c                      |  344 +++++++++++++---
 sys/vm/vm_kern.c                       |    1 +
 sys/vm/vm_map.h                        |    3 +-
 sys/vm/vm_meter.c                      |   32 +--
 sys/vm/vm_object.c                     |   73 +++-
 sys/vm/vm_object.h                     |    7 +-
 sys/vm/vm_page.c                       |   70 +++-
 sys/vm/vm_page.h                       |   18 +-
 59 files changed, 961 insertions(+), 665 deletions(-)

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/54341a3b445fade1bbc473141893a7e06c06ccb5


-- 
DragonFly BSD source repository


