DragonFly kernel List (threaded) for 2010-02

kernel work week of 3-Feb-2010 HEADS UP (interleaved swap test)

From:	Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date:	Sun, 7 Feb 2010 22:36:20 -0800 (PST)

    The latest commit adds write clustering and some additional features
    to the swapcache, plus a manual page: swapcache(8).

    The write clustering significantly reduces the IOPS rate for writes to
    the swapcache and appears to improve the SSD's performance (presumably
    it has an easier time write-combining and erasing).
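
    To illustrate why clustering cuts the IOPS rate, here is a minimal
    sketch that coalesces contiguous block numbers into larger runs.  It
    is NOT the swapcache code: the function name cluster_writes and the
    max_cluster limit are hypothetical, chosen only for illustration.

```python
def cluster_writes(blocks, max_cluster=8):
    """Group block numbers into (start, count) runs of contiguous blocks.

    Hypothetical illustration only: each run stands for one device write,
    so fewer runs means fewer I/O operations for the same data.
    """
    runs = []
    for b in sorted(set(blocks)):
        # Extend the last run if this block is adjacent and the run has room.
        if runs and runs[-1][0] + runs[-1][1] == b and runs[-1][1] < max_cluster:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


# Seven dirty blocks become two writes instead of seven.
example_runs = cluster_writes([0, 1, 2, 3, 10, 11, 4])
```

    Capping the run length keeps any single write bounded in size; the
    value 8 is an arbitrary placeholder, not a measured or tuned number.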

    --

    Just for the hell of it I set up two 40G Intel SSDs as 2x interleaved
    swap and ran a non-parallel linear file read test on 10G worth of files
    after they got cached by swapcache.  I was able to achieve around
    300MB/s.

    cat test* | dd of=/dev/null bs=32k
    10066329600 bytes transferred in 32.916703 secs (305812207 bytes/sec)
    10066329600 bytes transferred in 32.879632 secs (306157003 bytes/sec)
    10066329600 bytes transferred in 32.867684 secs (306268298 bytes/sec)
    10066329600 bytes transferred in 32.779923 secs (307088263 bytes/sec)
    10066329600 bytes transferred in 32.837278 secs (306551890 bytes/sec)
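
    As a sanity check on the "around 300MB/s" figure, the dd output above
    can be reduced to an average rate.  This is just arithmetic on the
    numbers already shown; the variable names are mine, and MB here means
    10^6 bytes.

```python
# (bytes transferred, seconds) for the five dd runs listed above.
RUNS = [
    (10066329600, 32.916703),
    (10066329600, 32.879632),
    (10066329600, 32.867684),
    (10066329600, 32.779923),
    (10066329600, 32.837278),
]


def rate_mb_per_sec(nbytes, secs):
    """Throughput in decimal megabytes per second (1 MB = 10**6 bytes)."""
    return nbytes / secs / 1e6


rates = [rate_mb_per_sec(b, s) for b, s in RUNS]
mean_rate = sum(rates) / len(rates)

# Spread between the slowest and fastest run, as a fraction of the mean.
spread = (max(rates) - min(rates)) / mean_rate

# Size of the data set in binary units (1 MiB = 2**20 bytes).
total_mib = RUNS[0][0] / 2**20

if __name__ == "__main__":
    print(f"mean {mean_rate:.1f} MB/s, spread {spread:.2%}, {total_mib:.0f} MiB read")
```

    The runs agree with each other to well under one percent, so the
    result is quite repeatable once the files are cached.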

    I found it difficult to keep both SSDs maxed out.  They were typically
    between 80-100% busy.  There are some practical limitations related
    to how the cluster read-ahead works.  It does synchronous BMAP operations
    which the SSD doesn't prioritize over prior reads still in progress,
    so there are little spots of serialization that reduce performance.

    I tried splitting the file set up into two or three concurrent reads
    going at once and that did saturate the SSDs (95-100% busy), but
    overall performance actually dropped a little, down to 250MB/sec
    in aggregate.  My guess is that the SSD optimizes for linear reads,
    so the more fragmented requests from the concurrent file operations
    weren't as optimal inside the SSD.
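
    For anyone wanting to repeat the concurrent test, a rough sketch of
    one way to do it follows.  The post does not say exactly how the file
    set was split, so the round-robin split, the thread-per-reader
    approach, and the names read_files and concurrent_read are all
    assumptions on my part.

```python
import threading
import time


def read_files(paths, bufsize=32 * 1024):
    """Read each file sequentially in bufsize chunks; return total bytes."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:
                    break
                total += len(chunk)
    return total


def concurrent_read(paths, nreaders):
    """Split paths round-robin over nreaders threads and read them all.

    Returns (total_bytes, elapsed_seconds).  The split strategy is an
    assumption, not the method used in the original test.
    """
    groups = [paths[i::nreaders] for i in range(nreaders)]
    totals = [0] * nreaders

    def worker(i):
        totals[i] = read_files(groups[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nreaders)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals), time.monotonic() - start


# Example use (hypothetical file names matching the test above):
#   import glob
#   nbytes, secs = concurrent_read(sorted(glob.glob("test*")), 2)
#   print(nbytes / secs / 1e6, "MB/s aggregate")
```

    Comparing the aggregate rate for 1, 2 and 3 readers against device
    busy percentages would show whether the same saturation-versus-
    throughput tradeoff appears on other hardware.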

    --

    I also did some testing of the OCZ 120G Colossus.  A single Intel 40G
    is able to do 180-200MB/sec or so reading.  The OCZ doesn't do
    as well despite being advertised as having 128M of RAM cache.  Its
    performance varies between 130MB/s and 220MB/s with an average of
    around 170MB/s.  A key thing to note with the OCZ is that it does
    not do NCQ while the Intel does.  I was a bit surprised, actually.
    I fully expected the OCZ to negotiate command queueing.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
