DragonFly kernel List (threaded) for 2006-07
Re: Reviving userland LWKT
:In actuality that is very similar to ideas I've been throwing around
:regarding the messaging implementation. I'm a big fan of TLV-based
:protocols, and the idea just recently occurred to me also to have the
:lower-layered protocols interpret the leftmost X bits of the type field
:to decide how to translate the message for transmission while the
:detailed format of the message remains opaque. Fantastic!
Yes, exactly! And we need 64 bits for that (at least) in order to be
able to embed the logical host id, subsystem id, object index,
possibly also a boot counter or timestamp of some sort to properly
detect stale objects (for inter-machine communications or when the
link gets lost and later reconnects), maybe a few bits to identify
the structural type (vnode, VM object, descriptor, or something more
opaque), etc.
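To make the idea of packing several identifiers into one 64-bit linkid concrete, here is a minimal Python sketch. The field names, their order, and their bit widths are assumptions chosen purely for illustration; nothing above fixes an actual layout beyond "at least 64 bits".

```python
# Hypothetical layout: the widths below are illustrative assumptions only.
FIELDS = [             # (name, width in bits), most significant first
    ("type", 4),       # structural type: vnode, VM object, descriptor, ...
    ("boot", 12),      # boot counter, used to detect stale objects
    ("host", 16),      # logical host id
    ("subsys", 8),     # subsystem id
    ("index", 24),     # object index
]
assert sum(width for _, width in FIELDS) == 64


def pack_linkid(**values):
    """Combine the named fields into a single 64-bit integer."""
    linkid = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        linkid = (linkid << width) | value
    return linkid


def unpack_linkid(linkid):
    """Split a 64-bit linkid back into its named fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = linkid & ((1 << width) - 1)
        linkid >>= width
    return out


def is_stale(linkid, current_boot):
    """A linkid minted under a different boot counter refers to a dead object."""
    return unpack_linkid(linkid)["boot"] != current_boot
```

With a layout like this, a lower layer could route on the upper bits (host, subsystem) without understanding anything else about the message.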
The data isn't quite opaque, however. It can't be. I'll address
that down below.
:It makes a lot of sense actually -- I will take a closer look at the
:details. I did not have VFS as a particular application in mind; I
:wanted the protocol to be general, but VFS is certainly applicable.
:
:My big question for you is why do you mention this idea as alternative
:to the idea of implementing LWKT? It seems like LWKT would be a
:necessary platform to build this kind of service on. What would the
:userland VFS drivers use for m:n threading and asynchronous requests?
:
: -Eric
We can still implement LWKT, but LWKT can't be the lowest layer
because it isn't a generic transport. An LWKT message might contain
pointers to other structures, pointers to strings, etc... you can't
just bcopy() it into a buffer and transmit it to another host.
A stream or memory FIFO is a generic transport. I think it is
very important to define it at the lowest layer (e.g. the
streaming/memory-FIFO interface) because that is the layer where we
can really tune the system for performance. LWKT messages may be a
good abstraction for userland or even for the kernel, but they need
to be translated in the transport layer. NOTE that, of course,
if the transport layer is just passing the message between two
threads or something like that, then no translation would be
required. But in order to make this generic there has to be
a translation layer of some sort (even if it is a NOP in some cases).
LWKT -> TRANSLATION -> TRANSPORT.
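As a rough illustration of the translation step, the Python sketch below flattens an in-memory message that references a string into a byte buffer, and skips the work entirely when both ends share an address space. The OpenMsg class, the translate/untranslate names, and the wire layout are all hypothetical stand-ins, not part of LWKT.

```python
import struct


class OpenMsg:
    """Stand-in for an in-memory message that references other data."""

    def __init__(self, path, flags):
        self.path = path      # in C this would be a pointer to a string
        self.flags = flags


def translate(msg, local):
    """Prepare a message for the transport layer.

    local=True models two threads sharing memory: translation is a NOP.
    Otherwise the referenced data is copied into a flat byte buffer.
    """
    if local:
        return msg
    path = msg.path.encode()
    # Assumed wire layout: 32-bit flags, 16-bit path length, path bytes.
    return struct.pack(">IH", msg.flags, len(path)) + path


def untranslate(buf):
    """Rebuild the in-memory message from the flat buffer."""
    flags, path_len = struct.unpack_from(">IH", buf, 0)
    start = struct.calcsize(">IH")
    return OpenMsg(buf[start:start + path_len].decode(), flags)
```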
More on data opaqueness. We can't *quite* make the data opaque. My
original posting noted some reserved bits:
    msg {
        linkid  (64 bits)   (specifies the communications end point)
        msgid   (32 bits)   (allows parallel commands to be issued)
        command (16 bits)   (bit 15 indicates a response)
                            (this field is also the error code on response)
        length  (16 bits)
        item {
            itemid  (16 bits)   (bit 15 indicates item recursion)
                                (bit 14 indicates ref'd data)
            itemlen (16 bits)
            data[]              (recursive item if item recursion)
        }
    }
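The layout above can be sketched as a small encoder and decoder in Python. Several details here are assumptions made for the example, not something the layout specifies: big-endian byte order, 'length' counting only the bytes that follow the 16-byte header, 'itemlen' counting only the item's data bytes, and the function names themselves.

```python
import struct

HEADER = struct.Struct(">QIHH")   # linkid, msgid, command, length (16 bytes)
ITEM = struct.Struct(">HH")       # itemid, itemlen
RESPONSE = 0x8000                 # bit 15 of command marks a response


def encode_item(itemid, data):
    return ITEM.pack(itemid, len(data)) + data


def encode_msg(linkid, msgid, command, items):
    """items is a list of (itemid, data_bytes) pairs."""
    body = b"".join(encode_item(itemid, data) for itemid, data in items)
    return HEADER.pack(linkid, msgid, command, len(body)) + body


def decode_msg(buf):
    """Return (linkid, msgid, command, [(itemid, data), ...])."""
    linkid, msgid, command, length = HEADER.unpack_from(buf, 0)
    body = buf[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated message")
    items = []
    off = 0
    while off < len(body):
        itemid, itemlen = ITEM.unpack_from(body, off)
        off += ITEM.size
        if off + itemlen > len(body):
            raise ValueError("truncated item")
        items.append((itemid, body[off:off + itemlen]))
        off += itemlen
    return linkid, msgid, command, items
```

The decoder only looks at the top level of items; recursive items come back as raw data[] for a caller to descend into.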
Bit 15 in the command, to indicate a command or response for the msgid,
is pretty obvious. Each msgid represents a single transaction, and
clearly we need to be able to have multiple transactions running in
parallel on any given object (linkid), hence they are separate fields.
But let's look at bits 14 and 15 in the itemid. The item { } can be a
recursive structure. If bit 15 is set in the itemid then the data[]
consists of zero or more (recursive) item { }'s. Otherwise it indicates
relatively opaque data. That part is fairly obvious too.
But bit 14 is not so obvious. This protocol is going to be used to
pass all sorts of object references around. An object reference
is just a 'linkid', but the protocol needs to be able to identify
which data elements are linkid's in order to properly keep track of
them. In particular, in order to track a reference count for them.
If bit 15 and bit 14 are both set, it indicates that there is at
least one linkid somewhere in the recursive sub-tree. If just bit
14 is set, it indicates that the data[] represents a linkid.
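The Python sketch below shows how a low layer might use those two bits to find every linkid in a message without understanding anything else about it. The helper names, big-endian encoding, 8-byte linkid data, and an itemlen that counts only data bytes are assumptions for the example.

```python
import struct

RECURSE = 0x8000   # bit 15: data[] holds nested items
REF = 0x4000       # bit 14: a linkid is here (leaf) or below (recursive)
ITEM = struct.Struct(">HH")   # itemid, itemlen


def leaf(itemid, data):
    """Encode one item carrying opaque data."""
    return ITEM.pack(itemid, len(data)) + data


def linkref(itemid, linkid):
    """Encode an item whose data[] is a 64-bit linkid."""
    return leaf(itemid | REF, struct.pack(">Q", linkid))


def group(itemid, children):
    """Encode a recursive item; propagate bit 14 up from the children."""
    flags = RECURSE
    for child in children:
        child_id, _ = ITEM.unpack_from(child, 0)
        if child_id & REF:
            flags |= REF
    return leaf(itemid | flags, b"".join(children))


def find_linkids(buf):
    """Collect every linkid in a run of encoded items."""
    found = []
    off = 0
    while off < len(buf):
        itemid, itemlen = ITEM.unpack_from(buf, off)
        off += ITEM.size
        data = buf[off:off + itemlen]
        off += itemlen
        if itemid & RECURSE:
            if itemid & REF:          # a linkid is somewhere below: descend
                found += find_linkids(data)
            # bit 15 without bit 14: no linkids inside, skip the subtree
        elif itemid & REF:
            found.append(struct.unpack(">Q", data)[0])
    return found
```

Because bit 14 propagates upward, the walker can skip whole subtrees that carry no references.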
Here's an example:
    client sends    CMD=OPEN DATA="a/b/c"
    server responds LINKID(bit14set) DATA=<linkid_of_open_file>
In this example the server returns an object reference to the client,
a linkid representing the open file. Clearly this has to be tracked
so the server knows when it can destroy the object (vnode) represented
by the linkid.
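A minimal Python sketch of the server-side bookkeeping this implies: each exported object gets a linkid and a reference count, and the object is destroyed when the count reaches zero. The LinkTable class and its method names are hypothetical, and linkids are plain counters here rather than structured 64-bit values.

```python
class LinkTable:
    """Server-side table of exported objects, keyed by linkid."""

    def __init__(self):
        self.next_id = 1
        self.objects = {}          # linkid -> [object, refcount]

    def export(self, obj):
        """Hand out a new linkid for obj, holding one reference."""
        linkid = self.next_id
        self.next_id += 1
        self.objects[linkid] = [obj, 1]
        return linkid

    def ref(self, linkid):
        self.objects[linkid][1] += 1

    def unref(self, linkid):
        """Drop one reference; return True if the object was destroyed."""
        entry = self.objects[linkid]
        entry[1] -= 1
        if entry[1] == 0:
            del self.objects[linkid]
            return True
        return False


def handle_open(table, path):
    """Model of CMD=OPEN: the reply carries the linkid of the open file."""
    return table.export({"path": path})
```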
Now normally you might think that, ok, well, this could be tracked in
higher layers. But it actually has to be tracked by the transport layer
as well as higher layers for two reasons:
(1) Because the transport layer, or some layer just above it (but below
the API/VFS-interface layer/whatever)... that layer needs to deal with
disconnects and reconnects. It needs to deal with resynchronization as
well. In short, some level of robustness.
(2) Because we are using a flexible recursive data structure, and
because the client and server may be running different versions of
a particular command, one end of the communications protocol might not
be able to completely parse a message sent by the other end. If
a message cannot be completely parsed, the highest layer (i.e. the
code implementing 'open' or 'read') might 'miss' an object reference
that is passed to it.
For example, let's say we have a UNIX box and an APPLE box talking to
each other and the UNIX box sends a cmd=OPEN request and the APPLE
box returns two link references in two item { } structures instead of
one, say to represent two data forks for the file. If the UNIX box
doesn't understand two data forks it won't properly ref count the
second linkid reference. BUT since the linkid reference is defined by
the low level protocol, the protocol *WILL* be able to properly keep
track of the reference and will be able to properly dereference it or
whatever if the higher protocol layer didn't pick up the object.
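One way to picture the two-fork case is the Python sketch below. The low layer records every linkid it saw in a reply, the higher layer claims the ones it understands, and whatever is left unclaimed gets released back to the sender. The class, its methods, and the send_release callback are hypothetical names for illustration.

```python
class TransportRefs:
    """Low-level tracking of linkids received in one reply."""

    def __init__(self, send_release):
        self.send_release = send_release   # callback: tell the peer to drop a ref
        self.pending = set()

    def on_receive(self, linkids):
        """Record every linkid the protocol layer found in the message."""
        self.pending = set(linkids)

    def claim(self, linkid):
        """Called by the higher layer for each reference it picked up."""
        self.pending.discard(linkid)

    def finish(self):
        """Release whatever the higher layer did not claim."""
        dropped = sorted(self.pending)
        for linkid in dropped:
            self.send_release(linkid)
        self.pending.clear()
        return dropped


def old_client_open(transport, reply_linkids):
    """An older client that only understands the first data fork."""
    transport.on_receive(reply_linkids)
    handle = reply_linkids[0]
    transport.claim(handle)
    transport.finish()
    return handle
```

The server never ends up holding a reference that nobody on the client side knows about.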
Another example... say we do a 'stat' command. The server might return
a recursive item { } structure containing items for each stat field
(size, modes, owner, etc). The server's reply might contain item structures
that the client does not recognize. The client needs to simply be
able to ignore the sub elements it does not recognize.
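The skipping behaviour can be sketched in Python as follows. The item ids, the big-endian encoding, and the integer representation of each stat field are assumptions for the example; the point is only that itemlen lets the parser step over anything it does not recognize.

```python
import struct

ITEM = struct.Struct(">HH")   # itemid, itemlen (itemlen counts data bytes only)
RECURSE = 0x8000              # bit 15: data[] holds nested items
ID_MASK = 0x3FFF              # low 14 bits carry the item's identity

# Hypothetical item ids known to this (older) client.
KNOWN = {1: "size", 2: "mode", 3: "uid"}


def item(itemid, data):
    return ITEM.pack(itemid, len(data)) + data


def parse_stat(reply):
    """Parse a recursive stat item, ignoring unrecognized sub-items."""
    itemid, itemlen = ITEM.unpack_from(reply, 0)
    if not itemid & RECURSE:
        raise ValueError("expected a recursive item")
    body = reply[ITEM.size:ITEM.size + itemlen]
    result = {}
    off = 0
    while off < len(body):
        sub_id, sub_len = ITEM.unpack_from(body, off)
        off += ITEM.size
        name = KNOWN.get(sub_id & ID_MASK)
        if name is not None and not sub_id & RECURSE:
            result[name] = int.from_bytes(body[off:off + sub_len], "big")
        off += sub_len        # known or not, the length tells us where to go next
    return result
```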
So by making the data slightly non-opaque, just slightly, we can
develop protocols which interoperate over many releases. It is very
important that 'old' machines be able to talk to 'new' machines and
vice versa.
-Matt
Matthew Dillon
<dillon@xxxxxxxxxxxxx>