How to use NFS to make Blobs more efficient

Author:Christian Theune <ct@gocept.com>

Overview

When handling blobs, the biggest goal is to avoid writing operations that require the blob data to be transferred using up IO resources.

When bringing a blob into the system, at least one O(N) operation has to happen, e.g. when the blob is uploaded via a network server. The blob should be extracted as a file on the final storage volume as early as possible, avoiding further copies.

In a ZEO setup, all data is stored on a networked server and passed to it using zrpc. This is a major problem for handling blobs, because it will lock all transactions from committing when storing a single large blob. As a default, this mechanism works but is not recommended for high-volume installations.

Shared filesystem

The solution for the transfer problem is to setup various storage parameters so that blobs are always handled on a single volume that is shared via network between ZEO servers and clients.

Step 1: Setup a writable shared filesystem for ZEO server and client

On the ZEO server, create two directories on the volume that will be used by this setup (assume the volume is accessible via $SERVER/):

  • $SERVER/blobs
  • $SERVER/tmp

Then export the $SERVER directory using a shared network filesystem like NFS. Make sure it’s writable by the ZEO clients.

Assume the exported directory is available on the client as $CLIENT.

Step 2: Application temporary directories

Applications (i.e. Zope) will put uploaded data in a temporary directory first. Adjust your TMPDIR, TMP or TEMP environment variable to point to the shared filesystem:

$ export TMPDIR=$CLIENT/tmp

Step 3: ZEO client caches

Edit the file zope.conf on the ZEO client and adjust the configuration of the zeoclient storage with two new variables:

blob-dir = $CLIENT/blobs
blob-cache-writable = yes

Step 4: ZEO server

Edit the file zeo.conf on the ZEO server to configure the blob directory. Assuming the published storage of the ZEO server is a file storage, then the configuration should look like this:

<blobstorage 1>
    <filestorage>
        path $INSTANCE/var/Data.fs
    <filestorage>
    blob-dir $SERVER/blobs
</blobstorage>

(Remember to manually replace $SERVER and $CLIENT with the exported directory as accessible by either the ZEO server or the ZEO client.)

Conclusion

At this point, after restarting your ZEO server and clients, the blob directory will be shared and a minimum amount of IO will occur when working with blobs.