TIP speed up portage with cdb
From Gentoo Linux Wiki
Introduction
This tip shows how to make Portage considerably faster after syncing and when calculating dependencies. The method stores Portage's metadata cache in a database rather than in a large number of flat files. It requires the cdb module for Python (dev-python/python-cdb); the original post is in the Gentoo Forums.
There is also a beta ebuild in Bugzilla, but it is better to follow the steps laid out here by hand.
A similar speed-up with sqlite is described in TIP speed up portage with sqlite.
What is cdb?
cdb is a fast, reliable, simple package for creating and reading constant databases. Its database structure provides several features:
- Fast lookups: A successful lookup in a large database normally takes just two disk accesses. An unsuccessful lookup takes only one.
- Low overhead: A database uses 2048 bytes, plus 24 bytes per record, plus the space for keys and data.
- No random limits: cdb can handle any database up to 4 gigabytes. There are no other restrictions; records don't even have to fit into memory. Databases are stored in a machine-independent format.
- Fast atomic database replacement: cdbmake can rewrite an entire database two orders of magnitude faster than other hashing packages.
- Fast database dumps: cdbdump prints the contents of a database in cdbmake-compatible format.
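The overhead figures in the list above translate into a simple size estimate. The following is only a sketch built from those quoted numbers (2048 bytes fixed, 24 bytes per record, plus keys and data, with a 4 gigabyte ceiling); the function names are hypothetical and are not part of cdb or python-cdb, and the ceiling is assumed here to mean 4 * 1024**3 bytes.

```python
# Size estimate based only on the figures quoted in the feature list.
CDB_FIXED_OVERHEAD = 2048          # bytes used by every database
CDB_PER_RECORD_OVERHEAD = 24       # bytes added per record
CDB_MAX_SIZE = 4 * 1024 ** 3       # assumed meaning of "4 gigabytes"


def estimate_cdb_size(records):
    """Estimate the size in bytes of a cdb holding (key, data) byte strings."""
    total = CDB_FIXED_OVERHEAD
    for key, data in records:
        total += CDB_PER_RECORD_OVERHEAD + len(key) + len(data)
    return total


def fits_in_cdb(records):
    """Return True if the estimate stays within the quoted size limit."""
    return estimate_cdb_size(records) <= CDB_MAX_SIZE


if __name__ == "__main__":
    # Hypothetical sample: 1000 cache entries of 500 bytes each.
    sample = [(b"sys-apps/portage-2.0.54", b"x" * 500)] * 1000
    print(estimate_cdb_size(sample), fits_in_cdb(sample))
```

An empty database comes out at exactly the fixed 2048 bytes, which matches the first figure in the list.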
Warning
The method this tip uses to plug a new module into portage is fine: the portage devs added this hook explicitly for purposes like this. That said, using a third-party plugin with portage means you have to deal with its bugs yourself. For example, if you unmerge python-cdb, you need to remember to remove the custom /etc/portage/modules setting.
Beyond that, an upcoming cache backport to 2.0.x from the >=2.1 line of portage will break this module. This should occur sometime around version 2.0.54. However, the instructions below cover how to repair the problem.
Also, portage-2.1_pre4 will not work with this module as is. See http://forums.gentoo.org/viewtopic-t-261580-postdays-0-postorder-asc-start-175.html#3068457
Getting what you need
Not much is needed for this: just a text editor and the cdb module for Python. So emerge it:
emerge dev-python/python-cdb
Setting up portage
Portage will require us to create/edit two files for this to work:
Create our new module
We first need to create a new file, telling portage how to work with cdb.
So with portage <2.1, create the new file /usr/lib/portage/pym/portage_db_cdb.py and put the following in it:
| File: /usr/lib/portage/pym/portage_db_cdb.py |
# Copyright 2004, 2005 Tobias Bell <tobias.bell@web.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
import portage_db_template
import os
import os.path
import cPickle
import cdb


class _data(object):

    def __init__(self, path, category, uid, gid):
        self.path = path
        self.category = category
        self.uid = uid
        self.gid = gid
        # Pending changes that have not been written to the cdb file yet.
        self.addList = {}
        self.delList = []
        self.modified = False
        self.cdbName = os.path.normpath(os.path.join(
            self.path, self.category) + ".cdb")
        self.cdbObject = None

    def __del__(self):
        if self.modified:
            self.realSync()
        self.closeCDB()

    def realSync(self):
        # A cdb file is constant, so changes are applied by rebuilding the
        # whole file from the old records plus the pending changes.
        if self.modified:
            self.modified = False
            newDB = cdb.cdbmake(self.cdbName, self.cdbName + ".tmp")
            for key, value in iter(self.cdbObject.each, None):
                if key in self.delList:
                    if key in self.addList:
                        newDB.add(key, cPickle.dumps(self.addList[key], cPickle.HIGHEST_PROTOCOL))
                        del self.addList[key]
                elif key in self.addList:
                    newDB.add(key, cPickle.dumps(self.addList[key], cPickle.HIGHEST_PROTOCOL))
                    del self.addList[key]
                else:
                    newDB.add(key, value)
            self.closeCDB()
            # Whatever is left in addList is a new key.
            for key, value in self.addList.iteritems():
                newDB.add(key, cPickle.dumps(value, cPickle.HIGHEST_PROTOCOL))
            newDB.finish()
            del newDB
            self.addList = {}
            self.delList = []
            self.openCDB()

    def openCDB(self):
        prevmask = os.umask(0)
        if not os.path.exists(self.path):
            os.makedirs(self.path, 02775)
            os.chown(self.path, self.uid, self.gid)
        if not os.path.isfile(self.cdbName):
            # Create an empty database so that cdb.init() has a file to open.
            maker = cdb.cdbmake(self.cdbName, self.cdbName + ".tmp")
            maker.finish()
            del maker
            os.chown(self.cdbName, self.uid, self.gid)
            os.chmod(self.cdbName, 0664)
        os.umask(prevmask)
        self.cdbObject = cdb.init(self.cdbName)

    def closeCDB(self):
        if self.cdbObject:
            self.cdbObject = None


class _dummyData:
    cdbName = ""

    def realSync():
        pass
    realSync = staticmethod(realSync)


# Small cache of recently used category databases; the entry that is pushed
# out of the cache is synced to disk.
_cacheSize = 4
_cache = [_dummyData()] * _cacheSize


class database(portage_db_template.database):

    def module_init(self):
        self.data = _data(self.path, self.category, self.uid, self.gid)
        for other in _cache:
            if other.cdbName == self.data.cdbName:
                self.data = other
                break
        else:
            self.data.openCDB()
            _cache.insert(0, self.data)
            _cache.pop().realSync()

    def has_key(self, key):
        self.check_key(key)
        retVal = 0
        if self.data.cdbObject.get(key) is not None:
            retVal = 1
        if self.data.modified:
            if key in self.data.delList:
                retVal = 0
            if key in self.data.addList:
                retVal = 1
        return retVal

    def keys(self):
        myKeys = self.data.cdbObject.keys()
        if self.data.modified:
            for k in self.data.delList:
                myKeys.remove(k)
            for k in self.data.addList.iterkeys():
                if k not in myKeys:
                    myKeys.append(k)
        return myKeys

    def get_values(self, key):
        values = None
        if self.has_key(key):
            if key in self.data.addList:
                values = self.data.addList[key]
            else:
                values = cPickle.loads(self.data.cdbObject.get(key))
        return values

    def set_values(self, key, val):
        # Writes are only queued here; realSync() makes them persistent.
        self.check_key(key)
        self.data.modified = True
        self.data.addList[key] = val

    def del_key(self, key):
        retVal = 0
        if self.has_key(key):
            self.data.modified = True
            retVal = 1
            if key in self.data.addList:
                del self.data.addList[key]
            else:
                self.data.delList.append(key)
        return retVal

    def sync(self):
        pass

    def close(self):
        pass


if __name__ == "__main__":
    import portage
    uid = os.getuid()
    gid = os.getgid()
    portage_db_template.test_database(database,"/tmp", "sys-apps", portage.auxdbkeys, uid, gid)
|
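The module above queues changes in memory and only rebuilds the constant database when it syncs. The following is a simplified, self-contained sketch of that pattern, not the module itself: the class OverlayStore and its method names are hypothetical, a plain dict stands in for the on-disk cdb file, and the handling of a key that is both deleted and re-added is simplified compared with the code above.

```python
import pickle


class OverlayStore(object):
    """Hypothetical stand-in: a read-only base plus pending adds and deletes."""

    def __init__(self, base=None):
        # `base` plays the role of the constant database: key -> pickled bytes.
        self._base = dict(base or {})
        self._adds = {}
        self._dels = set()

    def has_key(self, key):
        if key in self._adds:
            return True
        return key not in self._dels and key in self._base

    def get(self, key):
        if key in self._adds:
            return self._adds[key]
        if key in self._dels or key not in self._base:
            return None
        return pickle.loads(self._base[key])

    def set(self, key, value):
        # Queue the write; nothing touches the base until commit().
        self._dels.discard(key)
        self._adds[key] = value

    def delete(self, key):
        self._adds.pop(key, None)
        self._dels.add(key)

    def commit(self):
        """Rebuild the base in one pass, as a constant database requires."""
        if not self._adds and not self._dels:
            return
        rebuilt = {}
        for key, raw in self._base.items():
            if key in self._dels or key in self._adds:
                continue  # dropped, or replaced by a pending value below
            rebuilt[key] = raw
        for key, value in self._adds.items():
            rebuilt[key] = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
        self._base = rebuilt
        self._adds = {}
        self._dels = set()
```

The design trades memory for fewer rewrites: many small updates during a cache regeneration cost a single rebuild of the file instead of one rebuild per key.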
In portage 2.1 you instead need to create the file /usr/lib/portage/pym/cache/cdb.py
| File: /usr/lib/portage/pym/cache/cdb.py |
# Copyright: 2005 Gentoo Foundation
# Author(s): Brian Harring (ferringb@gentoo.org)
# License: GPL2
# $Id: anydbm.py 1911 2005-08-25 03:44:21Z ferringb $
cdb_module = __import__("cdb")
try:
    import cPickle as pickle
except ImportError:
    import pickle
import copy
import os

import fs_template
from template import reconstruct_eclasses
import cache_errors


class database(fs_template.FsBased):

    autocommits = True
    cleanse_keys = True
    serialize_eclasses = False

    def __init__(self, *args, **config):
        super(database,self).__init__(*args, **config)
        self._db_path = os.path.join(self.location, fs_template.gen_label(self.location, self.label)+".cdb")
        self.__db = None
        try:
            self.__db = cdb_module.init(self._db_path)
        except cdb_module.error:
            # No usable database yet: create the directories and an empty file.
            try:
                self._ensure_dirs()
                self._ensure_dirs(self._db_path)
                self._ensure_access(self._db_path)
            except (OSError, IOError), e:
                raise cache_errors.InitializationError(self.__class__, e)
            try:
                cm = cdb_module.cdbmake(self._db_path, self._db_path+".tmp")
                cm.finish()
                self._ensure_access(self._db_path)
                self.__db = cdb_module.init(self._db_path)
            except cdb_module.error, e:
                raise cache_errors.InitializationError(self.__class__, e)
        # Pending changes that have not been written to the cdb file yet.
        self._adds = {}
        self._dels = {}

    def iteritems(self):
        self.commit()
        return iter(self.__db.each, None)

    def _getitem(self, cpv):
        if cpv in self._adds:
            d = copy.deepcopy(self._adds[cpv])
        else:
            d = pickle.loads(self.__db[cpv])
        return d

    def _setitem(self, cpv, values):
        if cpv in self._dels:
            del self._dels[cpv]
        self._adds[cpv] = values

    def _delitem(self, cpv):
        if cpv in self._adds:
            del self._adds[cpv]
        self._dels[cpv] = True

    def commit(self):
        if not self._adds and not self._dels:
            return
        # Rebuild the database: skip deleted keys, replace updated ones,
        # copy the rest unchanged, then append the new keys.
        cm = cdb_module.cdbmake(self._db_path, self._db_path+str(os.getpid()))
        for (key, value) in iter(self.__db.each, None):
            if key in self._dels:
                del self._dels[key]
                continue
            if key in self._adds:
                cm.add(key, pickle.dumps(self._adds.pop(key), pickle.HIGHEST_PROTOCOL))
            else:
                cm.add(key, value)
        for (key, value) in self._adds.iteritems():
            cm.add(key, pickle.dumps(value, pickle.HIGHEST_PROTOCOL))
        cm.finish()
        self._ensure_access(self._db_path)
        self.__db = cdb_module.init(self._db_path)
        self._adds = {}
        self._dels = {}

    def iterkeys(self):
        self.commit()
        return iter(self.__db.keys())

    def has_key(self, cpv):
        return cpv not in self._dels and (cpv in self._adds or cpv in self.__db)

    def __del__(self):
        if getattr(self, "__db", None):
            self.commit()
            self.__db.finish()
|
You can also get it here. See also the original forum thread here. I hope jstubbs doesn't mind it being put here. ;-)
If the modules could be put on portage where it's not as easy for any hacker to change the link or code, that'd be nice.
Tell portage to use it
After setting that up, we need to tell portage to use our new database. So create the file /etc/portage/modules (if it's not already there) and put the following in it:
| File: /etc/portage/modules |
portdbapi.auxdbmodule = portage_db_cdb.database
eclass_cache.dbmodule = portage_db_cdb.database
|
In the portage 2.1 release, you'll need to add the following lines instead:
| File: /etc/portage/modules |
portdbapi.auxdbmodule = cache.cdb.database
eclass_cache.dbmodule = cache.cdb.database
|
Tell eix to use it
If we use eix (in version < 0.5.4), we will need to inform eix about our new database. So create the file /etc/eixrc (if it's not already there) and put the following in it:
| File: /etc/eixrc |
PORTDIR_CACHE_METHOD='cdb'
|
Warning: This doesn't work for newer versions of eix anymore. Just leave the default method (metadata), because it is independent of the cache used by portage.
Final Steps
Now all we have to do is regenerate the portage cache:
emerge --metadata
Last Thoughts
- Cache updates after a sync will be much faster, and if you use emerge --searchdesc then you'll definitely notice a speedup as well.
- If you want to turn this off, just comment out the two lines we added in /etc/portage/modules.
- Will this work with a /usr/portage that is shared across multiple servers? If so, do the module and cdb need to be installed on every one?
- The cache is entirely separate from /usr/portage. Any machine that uses a shared /usr/portage will have its own cache (unless the cache directory is also shared). The actual cdb (or other) cache files are stored in /var/cache/edb/dep/usr/portage.
- According to DirtyEpic on the Gentoo Forums, upgrading to python-2.4 will break portage. Comment out your /etc/portage/modules file, and remerge portage and python-cdb to fix it. You can also just leave it and run python-updater, which should be done after each python upgrade anyway.
- Likewise, if you want to upgrade to python 2.5 (which is marked testing, ~arch, as of 2007-10-16), you will need to comment out the lines in /etc/portage/modules while you are updating. Make sure to unmask dev-python/python-cdb to get the latest version that works with python 2.5. Once python-cdb is updated, you should be able to uncomment the lines and continue to use cdb. I ran python-updater before updating python-cdb, but if I were doing it again, I would update to python 2.5, comment out the lines, update python-cdb, uncomment the lines, and then run python-updater.
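The advice above to comment out the cdb lines in /etc/portage/modules before a Python upgrade can be scripted. The following is a minimal sketch and not part of Portage: the function name comment_out_cdb_lines is hypothetical, and it assumes that a leading # marks a comment in that file, as the "comment out" advice implies. It works on a string, so it can be tried without touching the real file.

```python
def comment_out_cdb_lines(text):
    """Return `text` with every active line that mentions cdb commented out."""
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        is_active = bool(stripped) and not stripped.startswith("#")
        if is_active and "cdb" in stripped:
            result.append("#" + line)   # disable the cdb backend setting
        else:
            result.append(line)         # keep blank, commented, or unrelated lines
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    # Contents as shown in the portage 2.1 example of this tip.
    modules = (
        "portdbapi.auxdbmodule = cache.cdb.database\n"
        "eclass_cache.dbmodule = cache.cdb.database\n"
    )
    print(comment_out_cdb_lines(modules))
```

Running the function a second time leaves the text unchanged, because lines that are already commented out are skipped.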
Why NO cdb?
by Chaosite at
http://forums.gentoo.org/viewtopic-t-261580-postdays-0-postorder-asc-start-50.html
So, I went to #gentoo-portage, and carpaski was nice enough to explain to me the issues with this module:
This goes to show how the Gentoo Devs have differing opinions. (Note that chaosite is Matan Peled is me - Chaosite 16:55, 8 May 2006 (UTC)... Apparently I've become a major proponent of cdb without noticing it :)
This behavior has led to many arguments in the Gentoo community.
Performance Comparison
For a frame of reference, here is a recent (May 2007) performance comparison on a current Gentoo installation with 498 packages installed:
time emerge --metadata (standard config):
real 11m51.729s
user 0m11.390s
sys 0m5.820s
time emerge --metadata (cdb config):
real 5m45.889s
user 0m9.110s
sys 0m3.010s
That is roughly a 2x improvement in wall-clock time (about 2.06x). This comparison was run on an AMD x64 4400+ nForce4 with 1G and XFS on an md RAID 5 volume across four 500G SATA drives.
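The ratio can be checked directly from the two "real" times quoted above. This is just the arithmetic written out; the helper name to_seconds is made up for the example.

```python
def to_seconds(stamp):
    """Convert a `time`-style stamp such as '11m51.729s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" (wall-clock) times from the two runs quoted above.
standard = to_seconds("11m51.729s")
with_cdb = to_seconds("5m45.889s")
speedup = standard / with_cdb

print("standard: %.3f s, cdb: %.3f s, speedup: %.2fx" % (standard, with_cdb, speedup))
```

Only the wall-clock times are compared here; the user and sys times in the same output are much smaller than the real times in both runs.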
