DragonFly BSD
DragonFly kernel List (threaded) for 2003-09

Re: SLAB allocator now the default.


From: sander <sander@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 29 Sep 2003 02:46:20 +0300 (EEST)

On Sun, 28 Sep 2003, Matthew Dillon wrote:

>
> :>    to allocate whole pages.  The slab allocator does this for power-of-2
> :>    sized requests beyond PAGE_SIZE but does NOT page-align oddly sized
> :>    requests (like a 6K request) beyond PAGE_SIZE, at least until the requests
> :>    get large (greater than 16K).
> :>
> :>    So keeping the power-of-2-allocation-is-power-of-2-aligned characteristic
> :>    is reasonable for power-of-2-sized requests.
> :
> :Structures smaller than, say, 128 bytes should be rounded up to the next
> :larger 2^n size, though.
> :
> :--
> :	Sander
>
>     Well, I don't think you can point to any one thing and say that it
>     will magically solve all the problems.  It takes an integrated approach
>     to make things operate smoothly.
>

There is never a single "solve it all" solution, except in some trivial
cases.

>     For example, there is a rather severe memory and cache efficiency
>     tradeoff here that cannot be ignored.  If one is allocating 32 byte
>     structures and wasting 128 bytes of memory on each one the result is
>     that 80% of your memory accesses wind up using only 20% of your available
>     L2 cache, which makes your cache only 1/3 as effective as it would be
>     if you had compacted the allocations to spread them over the entire L2
>     cache evenly.

Yes, that would be rather wasteful - you need different strategies for
small and non-small allocations.
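
A minimal sketch of what such a split could look like, assuming a 128-byte
small-object threshold and a 64-byte cache line. Both constants and all
function names here are made up for illustration; this is not the DragonFly
allocator's code, just one way to treat small and larger requests differently:

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_LIMIT 128  /* hypothetical threshold ("say 128 bytes") */
#define CACHE_LINE   64  /* assumed; a kernel would detect this at boot */

/* Round n up to the next power of two (n must be at least 1). */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    assert(n >= 1);
    while (p < n)
        p <<= 1;
    return p;
}

/* Round n up to a multiple of align, where align is a power of two. */
static size_t round_up_multiple(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/*
 * Pick the slot size for a request: small requests go to the next
 * power of two, everything else is padded to a cache-line multiple.
 */
static size_t pick_alloc_size(size_t request)
{
    if (request < SMALL_LIMIT)
        return round_up_pow2(request);
    return round_up_multiple(request, CACHE_LINE);
}
```

With these assumptions a 24-byte request lands in a 32-byte slot and a
100-byte request in a 128-byte slot, while an 800-byte request is padded
to 832 bytes.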

>
>     In DragonFly we do several things, and taken together they form a far
>     more effective solution:
>
>     (1) Our slab allocator is per-cpu.
>
>     (2) Because it is per-cpu our slab allocator can make compact
> 	allocations without severe cache line contention.
>
>     (3) We forward modifications to structures to the cpu owning the
> 	structure (the structure that was also allocated on that cpu,
> 	typically), to reduce modifying cache contention and avoid the use
> 	of mutexes (mutexes virtually guarantee cache contention).
>
>     (4) We intend to isolate subsystems in their own cpu-locked threads
> 	so the related data structures remain local to the cpu.
>
>     We don't do everything perfectly... right now the cpu allocating a
>     structure is not necessarily the cpu that is going to use it, for example,
>     but it is simply not possible to cover all the bases right from the
>     start.  As long as the infrastructure and programming model allow for
>     it to be done properly, as a goal, then we can eventually achieve the
>     goal.
>
>     So, at least in regard to DragonFly, aligning memory requests on 128
>     byte boundaries would be detrimental.


Allocating an 800-byte structure on a 64- or 128-byte boundary imposes a
pretty negligible additional overhead, and you can figure out the right
alignment at kernel startup at the latest.
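
To put rough numbers on that, here is a small sketch that computes how much
of a padded slot is lost to alignment. The helper names are hypothetical and
the percentages are integer values rounded down:

```c
#include <assert.h>
#include <stddef.h>

/* Size of an object once padded to a multiple of align (a power of two). */
static size_t aligned_size(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Padding as a percentage of the padded slot, rounded down. */
static unsigned padding_percent(size_t size, size_t align)
{
    size_t padded = aligned_size(size, align);

    return (unsigned)((padded - size) * 100 / padded);
}
```

For an 800-byte structure this gives 832 bytes (about 3% padding) at 64-byte
alignment and 896 bytes (about 10%) at 128-byte alignment. The same
calculation for a 32-byte structure at 128-byte alignment gives 75% padding,
which is why the small case needs a different strategy.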

>
> 					-Matt
> 					Matthew Dillon
> 					<dillon@xxxxxxxxxxxxx>
>


	Sander

+++ Out of cheese error +++





