DragonFly kernel List (threaded) for 2004-02
Packaging system effort
Hey people,
As some of you might know I've written up my thoughts about what a
proper packaging system should provide etc. It has become a bit
longish, but I just had to add everything in my mind (at least
concerning packaging), I feared otherwise people would start
bikeshedding about things I had thought about but didn't write down.
This text is not about implementation, not even a little bit (if there
is some, ignore it). This is intentional. I think we first need to come
to a conclusion *what* we want and after that start thinking about
*how* we will implement it.
Uhm, there was some talk about a June release, no? :) So the timeline is
pretty tight (if we want to have our own packaging system ready by
then). I propose a week or two of discussion about these points here;
in this time we need to come to a conclusion about what we want and
which of it needs to be in the first release. After that, a few weeks
for implementation proposals and discussion (proposals should be
written down completely and then submitted to the list so that we can
all discuss complete concepts rather than fragments which could be
implemented). After that, start hacking [yes, me too] :)
Now here it comes. Oh yea, the current version will be available at
<http://chlamydia.fs.ei.tum.de/~corecode/packaging.txt> and Justin will
put up the text (after it has been polished) on the blog and the main
page too, I guess.
Thanks for taking the time to read and comment(!)...
cheers
simon
Thoughts about a packaging system
---------------------------------
$Revision: 1.2 $
$Date: 2004/02/25 15:02:27 $
A package building and installation system (referred to as the
packaging system from now on) should provide several features which
come to mind after some time of using other (partly incapable) systems
and thinking about usability. This is what I have been doing for the
last few months, yet I never managed to write down my thoughts. Now
I'll try to write down the mess in my head :)
My current knowledge of packaging systems is limited to Debian's dpkg,
FreeBSD's ports, NetBSD's pkgsrc and Gentoo's portage. As such, I'll
only refer to these systems for comparison with the One True System
(OTS) I'll describe here.
The packaging system should be OS agnostic in almost all parts, and the
package descriptions in large parts. This is a requirement if the
system wants to be the OTS, or at least wants to provide its
functionality to a wider range of OSes (see pkgsrc/zoularis, and
portage to some extent).
The packaging system must provide the necessary infrastructure to hold
descriptions for multiple versions of one ``program'' without lots of
overhead, and with easy usage (read: an easy, sane choice of versions).
This functionality is needed both for specialized
deployment/installation strategies and for multiple OS/arch support as
described below. Portage provides such a feature; ports doesn't really
(ok, there are -devel and versioned directories, but I see these more
as a bandaid).
Multiple architectures must be supported. This means build quirks for
special architectures and the need for multiple versions of one
program, as e.g. i386 might be supported best whereas on amd64 the same
version might only be marked unstable or might not build at all.
It is highly desirable to be able to install multiple versions of one
program at the same time. Besides the means to enable this in the
filesystem (symlinks, variable symlinks, VFS voodoo, etc.), which might
not be available on all target platforms, this also raises some
questions concerning the logic of newly installed packages. Imagine two
perl versions installed, 5.6 and 5.8: which version should the newly
installed spamassassin depend upon?
This brings us to another point: clean build environments, i.e.
environments which only contain the build and/or (to be discussed)
runtime requirements (note that "dependencies" are the opposite, a
misnomer in ports/pkgsrc). This guarantees that configure scripts (or
whatever) don't suck in optionally supported components and create
unregistered requirements. This also needs special filesystem voodoo;
VFS might be a nice thing to use, and pkgsrc does this via buildlink's
symlink system.
As the system should be easily usable and not just academic, easy
tools are strongly needed. This includes tools for all sorts of
maintenance: from updating descriptions to searching to upgrading
installed packages. It is most desirable to also provide graphical
tools (ncurses, X11, web), or at least to provide infrastructure so
that third parties can easily develop them.
The system should be able to track various kinds of requirements and
act differently upon them: build time requirements might easily be
garbage collected because they are not needed once packages are built;
runtime requirements (e.g. shared libraries) might not be in use any
more and could thus be cleaned up too (compare portage's world file).
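A minimal sketch of the garbage collection idea, assuming a
hypothetical in-memory registry where every installed package records
its build and runtime requirements separately, plus a "world" set of
explicitly requested packages. Everything not reachable from the world
set via runtime requirements is a cleanup candidate.

```python
from collections import deque


def collect_garbage(world, requires):
    """Return installed packages that nothing in `world` needs at runtime.

    world    -- set of package names the user installed explicitly
    requires -- hypothetical registry: package -> {"build": set, "run": set}
    """
    keep = set()
    queue = deque(world)
    while queue:
        pkg = queue.popleft()
        if pkg in keep:
            continue
        keep.add(pkg)
        # Only runtime requirements keep a package alive; build time
        # requirements are deliberately not followed, so they become
        # collectable once their dependants are built.
        queue.extend(requires.get(pkg, {}).get("run", ()))
    return set(requires) - keep


installed = {
    "spamassassin": {"build": {"gmake"}, "run": {"perl"}},
    "perl": {"build": set(), "run": set()},
    "gmake": {"build": set(), "run": set()},
}
print(collect_garbage({"spamassassin"}, installed))  # {'gmake'}
```

A real system would ask before removing anything; the function only
computes candidates.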
Concerning shared libraries/runtime requirements: when runtime or
build time (the harder case) requirements are updated, it is not always
necessary to update their dependants too. For example, this could be
the case for security fixes in shared libraries; if the shared object
major version changes or a dependant is linked statically against the
library, this of course can't be applied.
Of course the system must provide an advanced requirement and
collision system which also provides room for meta requirements (MTA,
web server, whatever; compare dpkg and portage). This also means the
ability to fuzzily specify version numbers (>=2.0, everything but 1.4)
and, where applicable, package flags (see below).
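To illustrate fuzzy version constraints and meta requirements together,
here is a small sketch. It assumes purely numeric dotted versions (real
version schemes are messier) and a hypothetical registry where each
installed package lists the virtual names it provides, such as "mta".

```python
import re

# Comparison operators allowed in a constraint such as ">=2.0" or "!=1.4".
OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}
# Two-character operators come first so ">=" is not read as ">".
CONSTRAINT = re.compile(r"^(>=|<=|==|!=|>|<)\s*([0-9]+(?:\.[0-9]+)*)$")


def parse_version(v):
    # Naive: "2.0.18" -> (2, 0, 18), compared element by element.
    return tuple(int(x) for x in v.split("."))


def satisfies(version, constraints):
    """True if `version` meets every constraint in the list."""
    v = parse_version(version)
    for c in constraints:
        m = CONSTRAINT.match(c.strip())
        if not m:
            raise ValueError(f"bad constraint: {c!r}")
        op, ref = m.groups()
        if not OPS[op](v, parse_version(ref)):
            return False
    return True


def candidates(requirement, constraints, installed):
    """(name, version) pairs matching a real or meta requirement.

    installed -- hypothetical registry: name -> (version, provided names)
    """
    return sorted(
        (name, ver)
        for name, (ver, provides) in installed.items()
        if (name == requirement or requirement in provides)
        and satisfies(ver, constraints)
    )


installed = {
    "postfix": ("2.0.18", {"mta"}),
    "sendmail": ("8.12.11", {"mta"}),
    "apache": ("1.3.29", {"www-server"}),
}
print(candidates("mta", [">=2.0", "!=8.12.10"], installed))
# [('postfix', '2.0.18'), ('sendmail', '8.12.11')]
```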
It is also desirable that the system can dynamically include
additional optional requirements if the host system provides them
(e.g. optional GNOME, IPv6 etc.), either automatically or
semi-automatically. This choice could additionally be handled with
package flag settings as described next.
A very strong must-have is a unified package flags system. Ports
provides package flags (e.g. USE_LDAP), but these are not unified and
are per port only. Specifying them in make.conf helps a bit, but
lacking a global registry this can be painful. Portage provides a
better way with USE flags, but flags that only apply to a single
package are not handled correctly. There needs to be a (small and
thoughtfully selected) global flags registry whose entries can take
more than yes/no: on a server system, I most certainly don't want any
X11 stuff being sucked in when installing a package, so "never ever" is
a needed state. Sometimes I might not want X11 support if it's
optional, but don't care when a package requiring X11 installs it too;
this corresponds to a "better not" state. And, of course, there also is
an "if it can support it, use it" state.
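The three states above can be sketched as follows. The state names,
the `resolve` function and the choice of "better not" as the default
for unset flags are all assumptions made for illustration: optional
support is only built in for "use", and a hard requirement is only
refused for "never ever".

```python
from enum import Enum


class FlagState(Enum):
    NEVER = "never"       # "never ever": refuse anything that needs it
    AVOID = "better-not"  # no optional support, but tolerate hard requirements
    USE = "use"           # "if it can support it, use it"


def resolve(package, hard, optional, settings, default=FlagState.AVOID):
    """Return the set of optional flags to enable for `package`.

    hard     -- flags the package cannot be built without
    optional -- flags the package can optionally support
    settings -- global flag registry: flag -> FlagState
    """
    for flag in sorted(hard):
        if settings.get(flag, default) is FlagState.NEVER:
            raise ValueError(f"{package} requires {flag!r}, which is set to 'never'")
    return {f for f in optional if settings.get(f, default) is FlagState.USE}


settings = {"x11": FlagState.NEVER, "ipv6": FlagState.USE, "ldap": FlagState.AVOID}
print(resolve("irssi", set(), {"x11", "ipv6", "ldap"}, settings))  # {'ipv6'}
```

Package local flags could be layered on top by consulting a per-package
dictionary before the global one.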
Packages themselves need the ability to use package local flags too.
This might be the case for e.g. subversion ("I don't want the DAV
server stuff") or PHP ("Well, support this and that..."). Ports
supports this, but it's not unified and not easy to use. Not everybody
wants to page through a Makefile to find out all the flags that can be
used, nor can this be used for recursive requirements. A unified system
is needed which allows the user to customize packages in an easy
(graphical, for example) and unobtrusive (all questions asked before
the unattended installation) way. Nevertheless, some users don't want
these choices, they "just want" some package, so there need to be sane
defaults which are used if the user chooses not to answer any questions
at all.
Package flags and binary packages don't really mix (compare portage: no
binary packages at all), so another feature is needed: package flavors
(compare OpenBSD's ports, I heard). A package flavor is a predefined,
sane set of package flags which can be automatically built into a
binary package. This also gives the user a kind of flexibility without
the need to cope with all possible flags.
All these flags of course need to be registered for all installed
packages so that the recorded preferences are used in upgrades. If
package flags/flavors changed their meaning or were added/deleted, it
might be desirable for the user to be asked to review the settings; if
flags/flavors didn't change, the system should be able to use the old
recorded settings.
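One way to decide when to re-ask is to compare the flag definitions
shipped with the old and new package versions. In this hypothetical
sketch a flag's "meaning" is simply its description string; recorded
choices are reused for unchanged flags, and any addition, removal or
change of meaning triggers a review.

```python
def plan_flag_upgrade(recorded, old_defs, new_defs):
    """Return (reusable settings, whether to ask the user to review).

    recorded -- flag -> value the user chose at install time
    old_defs -- flag -> description shipped with the installed version
    new_defs -- flag -> description shipped with the new version
    """
    changed = {f for f in old_defs.keys() & new_defs.keys()
               if old_defs[f] != new_defs[f]}
    added = set(new_defs) - set(old_defs)
    removed = set(old_defs) - set(new_defs)
    # Keep only choices for flags that still exist with the same meaning.
    reuse = {f: v for f, v in recorded.items()
             if f in new_defs and f not in changed}
    return reuse, bool(changed or added or removed)


old = {"dav": "build the DAV server module", "ssl": "SSL support"}
recorded = {"dav": False, "ssl": True}
print(plan_flag_upgrade(recorded, old, dict(old)))
# ({'dav': False, 'ssl': True}, False)
```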
The system should also be able to support split packages: some
packages (especially X11) are so big that it's highly desirable to
split them. Ports does this by creating several "independent" packages
which just happen to use the same source code. OpenBSD's ports natively
produce several binary packages from one port, as I heard. How this is
implemented needs to be the subject of further discussion.
Debian's way of providing split-off -dev packages is always a highly
controversial point of discussion, which is why I want to comment on it
here. For a source based system like ports, having header files
available is unavoidable, and even when using binary packages the bloat
due to header files and static libs is small. Still, there may be cases
where every file must be considered, so the possibility to prune
development files where feasible might be a nice add-on. The same goes
for foreign gettext language files etc. This could all be implemented
via global package flags. What I'm opposing is the creation or use of
additional packages for -dev headers/libs. The number of distinct
packages should be kept to a minimum.
It is desirable to have a way to import an individually defined set of
packages for easy deployment of multiple systems.
The system must support both building from source and installing from
precompiled binary packages equally well, and be able to fall back to
building from source if binary packages are not available for an
individual configuration. Furthermore, it should be easy to build
binary packages for installation on another system.
As a direct consequence, the system must have strong binary package
distribution support. In the past a lot of people have demanded a
streaming binary format, to be able to install packages straight while
downloading without having to wait for the whole package. This needs to
be discussed further, as installing while downloading is even less
atomic than installing after the download, which can lead to other
major problems.
A nice feature might be the availability of relative (binary)
patchsets between certain (individually selected) versions to reduce
consumed bandwidth and installation time. For binary patching systems
see the bsdiff effort.
If possible, a nice addition would be the optional integration of
installation and build management of the base system. Together with an
advanced and easy binary update system, this would lead to a unified
system update mechanism, at the cost of losing the clear border between
the system and third party products (as is the situation with ports at
the moment). Package flags could easily be used as a way to customize
the base system, as we currently use -DNO_CRYPTO etc. The advantage for
the user is clear: system and third party products appear as the same
category; the OS isn't just kernel + some userland, it appears as
everything provided via the packaging system (see the Linux world).
The system must in any case provide different update strategies which
need to be selectable both globally and per package. This means: on a
critical production server, I don't want to upgrade my software (base
system and third party products such as apache) unless there is a
security problem (which might even be classified into local/remote
root/DoS) or I need new features only provided by a newer version. I'll
call this way of updating the very conservative way. Other users might
upgrade every now and then to a new version which has been tested and
tagged as stable (this can be different for various architectures or
OSes). Some other power users might upgrade to every newly released
version because they don't care about minor instabilities. These update
strategies must not only be selectable for all available packages as a
whole; the user needs to have the ability to specify them individually
for single packages or groups thereof. Ports doesn't provide this,
versions are dictated by committers; portage provides this feature to a
small extent (ACCEPT_KEYWORDS). Debian runs stable/testing/unstable
versions.
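The three strategies can be sketched as a filter over the releases
newer than the installed one. The policy names ("conservative",
"stable", "current"), the `Release` record and the per-package override
dictionary are hypothetical. Here "conservative" moves only to the
newest release that carries a security fix, and otherwise stays put.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: tuple      # e.g. (2, 0, 48)
    stable: bool        # tested and tagged stable for this arch/OS
    security_fix: bool  # fixes a vulnerability present in older versions


def pick_upgrade(installed, releases, policy):
    """Return the version to upgrade to, or None to stay put."""
    newer = sorted((r for r in releases if r.version > installed),
                   key=lambda r: r.version)
    if policy == "current":
        pool = newer
    elif policy == "stable":
        pool = [r for r in newer if r.stable]
    elif policy == "conservative":
        pool = [r for r in newer if r.security_fix]
    else:
        raise ValueError(f"unknown update policy: {policy!r}")
    return pool[-1].version if pool else None


def pick_for(package, installed, releases, overrides, default="stable"):
    # A per-package (or per-group) override wins over the global default.
    return pick_upgrade(installed, releases, overrides.get(package, default))


apache = [Release((2, 0, 47), True, False),
          Release((2, 0, 48), True, True),
          Release((2, 0, 49), False, False)]
print(pick_for("apache", (2, 0, 46), apache, {"apache": "conservative"}))
# (2, 0, 48)
```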
For this to work properly, packages need to carry information about
vulnerabilities, new features etc. so that the admin can choose whether
an upgrade is needed or not. This shouldn't be a whole changelog, just
a summary of the most interesting changes.
The use of cryptographic signatures is a hard requirement. This must
be implemented for both package descriptions and binary packages. An
MD5/SHA1 checksum is no cryptographic signature! This could mean an
openssl requirement for the packaging system itself or the need to
implement some cryptographic functions. The distribution of
certificates and the extent of their default trust need to be discussed
in this context too.
Another important aspect is a powerful build system. It should be
possible to build multiple packages at the same time and have them
synchronized for installation etc. It's just a PITA if you're compiling
KDE or OpenOffice and can't build/install a small package like mpg123
or irssi because this might damage the package db. Existing systems
handle such cases nicely most of the time, but that's just luck.
Pkgsrc implements locking as far as I remember.
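A minimal sketch of the locking idea, assuming a hypothetical lock file
next to the package db: builds run unlocked in parallel, and only the
short step that touches the db takes an exclusive BSD-style `flock`
lock. This uses Python's Unix-only `fcntl` module and is not a
description of how pkgsrc does it.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def package_db_lock(lock_path):
    """Hold an exclusive lock on the package db while the body runs.

    A second installer blocks in flock() until the first one is done,
    so a long KDE build never has to hold the lock itself.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other writers finish
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


# Usage (hypothetical path):
#     with package_db_lock("/var/db/pkg/.lock"):
#         register_package(...)  # hypothetical db update step
```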
Another very nice-to-have is native distributed build support. This is
very much needed if one needs to install customized packages on a slow
machine (firewall/NAT etc.) or does binary package building for
distribution. Portage provides this kind of service via distcc and it
just plainly rocks. You can even build OpenOffice in reasonable time
with 10 boxen compiling :) Another possibility is the use of
distributed pmake.
Display of the build progress is a nice add-on for users for sure.
This can be implemented both in a macroscopic way (x of y packages
built) and a microscopic way (anybody wanna hack make for SIGINFO?).
Speaking of compilation for slow boxes: cross compilation comes to
mind. Is it needed when distcc support exists? Discussion point here.
As times get harder and it has become common for the sources or
configure scripts of major software to get compromised, the system
should include the possibility (hopefully as the default) to build
packages either as non-root or in a chroot/jail (who needs network
access for builds anyway?). This, of course, needs VFS magic or
something similar to map requirements into the chroot.
It should be possible to build and install packages as an unprivileged
user. Sometimes local security policy or the laziness of an admin
demands the installation of a package into the user's home dir. Native
support for this in the packaging system would be a nice point. This
doesn't mean that binary packages need to be relocatable into home
dirs, but the system would need to provide an alternative (user home)
location for the package registry.
An essential duty of a packaging system is tracking installed files.
It must be an easy task to remove a package, and thus all its installed
files, from the system. The system needs to provide collision
management (same file installed by several packages, VFS voodoo?) and
configuration file awareness (see below). Compare portage (automatic
list building) and ports (ugly manually generated plists).
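The core bookkeeping behind file tracking and collision detection fits
in a few lines. `FileRegistry` is a hypothetical in-memory model of the
package db that maps each installed path to exactly one owner; a
colliding install is refused and the conflicts are reported, leaving it
to the caller (or the user) to decide what happens next.

```python
class FileRegistry:
    """Hypothetical model of the installed-files part of a package db."""

    def __init__(self):
        self.owner = {}  # path -> owning package
        self.files = {}  # package -> set of paths it installed

    def register(self, package, paths):
        """Record `paths` for `package`; return collisions (empty if none)."""
        collisions = {p: self.owner[p] for p in paths
                      if p in self.owner and self.owner[p] != package}
        if collisions:
            return collisions  # nothing is recorded on collision
        for p in paths:
            self.owner[p] = package
        self.files.setdefault(package, set()).update(paths)
        return {}

    def remove(self, package):
        """Forget `package`; return the sorted paths that can be deleted."""
        paths = self.files.pop(package, set())
        for p in paths:
            del self.owner[p]
        return sorted(paths)


reg = FileRegistry()
reg.register("perl", ["/usr/local/bin/perl", "/usr/local/man/man1/perl.1"])
print(reg.register("perl58", ["/usr/local/bin/perl"]))
# {'/usr/local/bin/perl': 'perl'}
```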
All config files that might potentially be modified by users (read:
all) need to be treated in a special way: they must not be overwritten,
yet new versions shouldn't be discarded. There must be an easy way for
the user to merge their own changes with upstream changes. If the
config file didn't change on one side (no local modification, or no
upstream change), the system should be intelligent enough to suppress
obsolete merge actions. On temporary deinstallation of a package,
existing config files shouldn't be removed, but on the user's request
the system should be able to purge remaining config files. Compare
ports' .sample files and portage's config file protection system (path
bound, fails e.g. on TeX stuff).
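The "suppress obsolete merges" rule is essentially a three-way
comparison. This sketch assumes the db records a hash of each config
file as it was installed; the hash here only detects changes and is of
course no signature. A merge is requested only when both the user and
upstream changed the file.

```python
import hashlib


def digest(data: bytes) -> str:
    # Change detection only; this is not a cryptographic signature.
    return hashlib.sha256(data).hexdigest()


def config_action(recorded_hash, current, new):
    """Decide what to do with one config file on upgrade.

    recorded_hash -- hash of the file as the old package installed it
    current       -- bytes on disk now, or None if the file is absent
    new           -- bytes shipped by the new package version
    """
    if current is None:
        return "install"  # nothing to protect
    if current == new:
        return "keep"     # already identical
    if digest(new) == recorded_hash:
        return "keep"     # upstream unchanged: keep the user's version
    if digest(current) == recorded_hash:
        return "replace"  # user never edited it: take the new version
    return "merge"        # both sides changed: ask the user


orig = b"a=1\n"
print(config_action(digest(orig), b"a=5\n", b"a=1\nb=2\n"))  # merge
```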
The packaging system descriptions shouldn't consume too much space in
general and too many inodes in particular. It's just horrible to have a
myriad of small files and directories in your /usr (or wherever)
wasting a great deal of inodes. This goes for end users. Package
maintainers could have a different view of the descriptions which could
be collapsed into fewer files later. A possible approach could be one
description file per available package and version plus approximately
one patch file for each version (vs. patch-per-file in ports).
This leads to patches. As the system should aim to be OS agnostic in
most parts, this also applies to patches. They should be specially
crafted so that they ideally don't interfere with the build process on
other platforms/OSes. This means extensive use of #if
defined(__MyOS__) etc.
This portability is the key to close communication and development
with the upstream authors. It should be policy that patches are written
as cleanly as possible and are always submitted upstream. The packaging
system might help with or even enforce this process. Having patches go
in upstream reduces the number of needed files, enhances overall
acceptance of the packaging system and also provides people not using
the packaging system with features and fixes.
The system should provide support for bug tracking so that users can
easily check for known bugs and report new ones or add followup
information to existing ones. As the bug tracking system should be
closely integrated with the packaging system, bugs need to be
associated with packages or specific package versions. This helps
maintainers and committers follow user input better than the GNATS/CVS
decoupling ports currently uses.
It goes without saying that the packaging system implementation must
only have low/moderate requirements for needed tools and processing
power. This means the system should be buildable with only POSIX tools
and a moderately new C/C++/ObjC compiler. If a scripting language is
used, it should be one of the really popular ones: sh, perl, python,
tcl.
The system should be able to bootstrap itself. This means it shouldn't
depend on its tools being included with the host OS. If the system
undergoes changes, the tools need to change too. As seen with ports
several times, having the tools in the base OS only complicates things
and leads to legacy issues (e.g. the tbz/tgz issue). Pkgsrc provides
this in a nice way. Bootstrapping also means registering/tracking the
installation of the packaging system itself (and its requirements).
This seems like a chicken-and-egg problem, but it can and should be
solved.
Package descriptions must be easy to generate and easy to use. This
could lead to different views of the packages, one for the maintainer
side and one for the consumer side, which can be converted into each
other. For ports, e.g., a big show stopper concerning format conversion
(remember pkg_info) and fast processing (see INDEX generation) is the
fact that Makefiles are indeed interpreted and not just parsed. This
means slowdown in processing and also problems in automatic
conversions: you never know how creative a maintainer was in (ab)using
make.
This leads me to one conclusion: don't use an interpreted file format.
Use a standardized description format which only needs to be parsed.
This is much faster and more portable if multiple programs intend to
work with the descriptions. Not having a Turing-complete language to
use when writing a description might at first require some change in
thinking (when moving from ports or portage) and will involve writing
more text/data (e.g. no more using ${variable:C/pattern/replacements/}),
but it will help overall cleanness.
To avoid writing common things over and over again, and thus the
possibility of inconsistencies, the system needs to provide
infrastructure to group common settings as templates. I'll call them
package classes for now. This is similar to portage's eclasses and
ports' special .mk files. These classes shouldn't be created when only
a few consumers exist, as too many classes destroy cleanness and
transparency. It must be possible for a package to use more than one
class at the same time.
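Using several classes at once raises the question of what happens when
two classes set the same key. This hypothetical sketch applies classes
in order, lets the package's own settings win, and refuses silent
conflicts between classes unless the package resolves them explicitly.
The class names and keys are made up for illustration.

```python
def apply_classes(description, class_names, registry):
    """Merge hypothetical package classes, then the package's own settings."""
    merged, conflicts = {}, set()
    for name in class_names:
        for key, value in registry[name].items():
            if key in merged and merged[key] != value:
                conflicts.add(key)
            merged[key] = value
    # A conflict is only acceptable if the package itself decides it.
    unresolved = conflicts - set(description)
    if unresolved:
        raise ValueError(f"classes disagree on {sorted(unresolved)}; "
                         "set them in the package description")
    merged.update(description)
    return merged


registry = {
    "gmake": {"build_requires": "gmake", "make_program": "gmake"},
    "perl-module": {"build_requires": "perl", "install_target": "pure_install"},
}
pkg = {"name": "p5-Foo", "build_requires": "gmake perl"}
print(apply_classes(pkg, ["gmake", "perl-module"], registry)["build_requires"])
# gmake perl
```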
Basic principle: the last instance of decision is always the user, but
she shouldn't have to decide in most cases.
... still to come.
--
/"\ http://corecode.ath.cx/#donate
\ /
\ ASCII Ribbon Campaign
/ \ Against HTML Mail and News