mike watkins dot ca : Entries tagged with “Durus”


July 12 2007

Python Database Interfaces

Python object databases need some love too

Flávio Coelho recently performed an examination of various Python database API and ORM interfaces to MySQL, Postgres, and SQLite, and included a benchmark for cPickle.

Here's an addition to Flávio's "Fastest Python Database Interface" article and script that includes Durus: performance.py.

I also pointed out on Flávio's blog that his cPickle benchmark should include pickling 100,000 instances of a "Person" class, in addition to 100,000 simple tuples. This shows the overhead of class instantiation and serialization/deserialization, which all of the ORMs and object databases share in some form or another. An example of both can be found in performance.py.
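For readers without performance.py at hand, here is a minimal, stdlib-only sketch of the comparison described above, written for current Python 3 (where cPickle is simply pickle). The Person class and its name/age attributes are my own assumptions, not the class used in performance.py. Building the lists is left outside the timing; what is timed is a pickle round trip, and unpickling each Person re-creates an instance, which is where the extra cost shows up.

```python
import pickle
import time


class Person:
    # Hypothetical record class; the attributes are illustrative only.
    def __init__(self, name, age):
        self.name = name
        self.age = age


def time_round_trip(objects):
    """Pickle then unpickle a list of objects; return (seconds, copy)."""
    start = time.perf_counter()
    data = pickle.dumps(objects, protocol=pickle.HIGHEST_PROTOCOL)
    copy = pickle.loads(data)  # re-creates every Person instance
    return time.perf_counter() - start, copy


def run(n=100_000):
    tuples = [("person%d" % i, i) for i in range(n)]
    people = [Person("person%d" % i, i) for i in range(n)]
    tuple_secs, tuple_copy = time_round_trip(tuples)
    person_secs, person_copy = time_round_trip(people)
    return tuple_secs, person_secs, tuple_copy, person_copy


if __name__ == "__main__":
    tuple_secs, person_secs, _, _ = run()
    print("tuples: %.3fs, Person instances: %.3fs" % (tuple_secs, person_secs))
```

Absolute numbers will vary by machine; the interesting figure is the ratio between the two timings.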

June 08 2007

Python Web Application Diary, Part Six

In part five of this series we dove deep into QP, looked at the fundamentals of any QP application (SitePublisher and SiteDirectory) and explored the use of QPY templating. We also built a rudimentary UI for our Entry object.

In this installment of our web application diary we'll work more with the Durus object database: injecting some data into it, exploring the interactive interpreter (one of the cool features of Durus, to be sure), and starting on a conversion script to take weblog data in PyBlosxom format and insert it into our blog application database.

Tip

Before going further, install Pyrepl. It is required for the QP / Durus interactive interpreter features, and it also adds significant (optional) functionality to Python's own interactive interpreter.

To see Pyrepl at work with regular Python, launch:

pythoni

Durus, the database you already know

Now I know what you are thinking. I think. Well, that is my theory and it is mine and I own that theory. My theory is that you are thinking:

"Object database? what sort of weird and strange alchemy is that? Fear the unknown! Down with the unknown! Destroy the unknown with DELETE FROM queries!" -- you

While object databases are not exactly in common use across the IT industry, the Python community has a long history of kinship with them, with ZODB, the Zope Object Database, arguably the best-known example.

Durus is patterned after ZODB, and indeed was written by developers who had used ZODB extensively. Visit the Durus pages for more information on their rationale for reinventing this particular wheel; from my own experience I can only say that Durus is small and easy to read and understand.

What exactly is an object database? Put simply, Durus and ZODB allow you to persist your Python objects. It's more than pickle, but not unlike pickle in some respects.
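To make the pickle comparison concrete: the standard library's shelve module is a pickle-backed, dict-like store, and persisting an ordinary object with it looks roughly like the sketch below. The Person class and the "einstein" key are illustrative only. What shelve does not give you is an explicit commit of a set of changes, which is exactly what the Durus session later in this article relies on.

```python
import os
import shelve
import tempfile


class Person:
    # Illustrative class; any picklable object would do.
    def __init__(self, name):
        self.name = name


def save_and_reload(path):
    # Store an ordinary Python object in a pickle-backed shelf...
    with shelve.open(path) as db:
        db["einstein"] = Person("Albert Einstein")
    # ...then read it back after the shelf has been closed and reopened.
    with shelve.open(path) as db:
        return db["einstein"]


with tempfile.TemporaryDirectory() as tmp:
    restored = save_and_reload(os.path.join(tmp, "people"))

print(restored.name)  # Albert Einstein
```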

Tip

Launch a log viewer in another terminal window (qp -l blog) so you can watch what happens as we make changes to the Durus database.

Demonstrating Durus, Interactively

QP and Durus let you work with the Durus object database directly. Let's fire up an interactive session to show Durus basics.

% qp -i blog
Profile, connection, publisher, root, sessions, site, users
->>

Working within the interactive session: Pyrepl provides very useful search and command history capabilities. Control-P and Control-N step backward and forward through previously entered lines. Control-R starts a reverse history search: start typing an entry you've made previously (substrings are matched too) and press Control-R again to step through the hits, if any.

Term expansion is perhaps my favorite Pyrepl enhancement; it's certainly the one I use most. Try it now by entering a couple of letters:

->> pu

And press Tab - you'll be rewarded with either publisher or a list of terms in the namespace which match the letters entered so far. A real timesaver.
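Regular Python has a stdlib cousin of this feature: the rlcompleter module, which the standard interactive interpreter uses for Tab completion (enabled by default since Python 3.4 where readline is available). The small sketch below asks it for every completion of a prefix; the namespace is a hypothetical stand-in for the names the QP session lists, and this is not how Pyrepl itself is implemented.

```python
import rlcompleter


def completions(namespace, prefix):
    """Collect every completion rlcompleter offers for prefix in namespace."""
    completer = rlcompleter.Completer(namespace)
    found = []
    state = 0
    while True:
        # complete() returns the state-th match, or None when exhausted.
        match = completer.complete(prefix, state)
        if match is None:
            return found
        found.append(match)
        state += 1


# Hypothetical names standing in for those listed by `qp -i blog`.
session_names = {"publisher": object(), "profile": object(), "sessions": object()}
print(completions(session_names, "pub"))  # ['publisher']
```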

Access to objects: The interactive session gives us access to QP objects (connection, site, publisher), application objects (sessions, users) and a Profile testing class, but the most relevant to our discussion right now is root.

By convention our application data lives under root, which is itself a persistent object. Changes to root persist from session to session, provided connection.commit() has been called to commit them to the database. Let's do some simple examples.

->> from durus.persistent_dict import PersistentDict
->> mydict = PersistentDict()
->> root['test'] = mydict
->> connection.commit()
->>
%

Control-D exits the interactive session, as it also exits a standard Python interpreter. Restart the interpreter to see if our object was 'saved' or persisted.
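The commit-then-restart pattern can be mimicked with a toy, stdlib-only class. ToyConnection is hypothetical and is not how Durus works internally: as I understand it, Durus stores each persistent object as its own record and writes only the changed ones on commit, whereas this toy re-pickles the whole root. It only models the visible behavior: what you commit survives a restart, and what you don't is gone.

```python
import os
import pickle
import tempfile


class ToyConnection:
    """Hypothetical stand-in for a Durus connection, stdlib only."""

    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            # Reopening: load whatever was last committed.
            with open(path, "rb") as f:
                self.root = pickle.load(f)
        else:
            self.root = {}

    def get_root(self):
        return self.root

    def commit(self):
        # Write the current state of root to disk.
        with open(self.path, "wb") as f:
            pickle.dump(self.root, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.db")
    conn = ToyConnection(path)
    conn.get_root()["test"] = {}
    conn.commit()
    conn.get_root()["scratch"] = "never committed"
    reopened = ToyConnection(path)  # like restarting `qp -i blog`
    names = sorted(reopened.get_root())

print(names)  # ['test']
```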

% qp -i blog
Profile, connection, publisher, root, sessions, site, test, users
->>

Very good: test now shows up in our display. Objects living at the root level are conveniently listed as a reminder when we fire up an interactive session. Let's put some data in test, but first, what was test?

->> test
<PersistentDict 17020>
->> test.items()
[]

Right, now I remember. Ok, add some data.

->> test[1] = 'My first persistent data'
->> connection.commit()
->>

Control-D to quit, and restart again to satisfy any fears that you may have about your important data.

% qp -i blog
Profile, connection, journals, publisher, root, sessions, site, test, users
->> test.items()
[(1, 'My first persistent data')]
->>

By now you can see that we are using Python to manage our data and, by subclassing one of the Durus persistent object classes, we can make our Python objects full partners in the Durus object database.
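One consequence worth knowing early: persistent classes in Durus (as in ZODB) notice when you assign to an attribute of a persistent object, but not when you mutate a plain list or dict held in that attribute. That is why Durus ships PersistentDict, used above, and PersistentList. The toy class below is a hypothetical, stdlib-only model of that rule, not Durus's implementation.

```python
class ToyPersistent:
    """Hypothetical illustration of attribute-level change tracking."""

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != "_changed":
            object.__setattr__(self, "_changed", True)

    def is_changed(self):
        return getattr(self, "_changed", False)

    def mark_saved(self):
        # Roughly what a commit does once the object has been written out.
        object.__setattr__(self, "_changed", False)


class Entry(ToyPersistent):
    def __init__(self, title):
        self.title = title
        self.tags = []  # a plain list, not a persistent one


entry = Entry("First post")
entry.mark_saved()             # pretend the entry was just committed

entry.tags.append("python")    # in-place mutation bypasses __setattr__
print(entry.is_changed())      # False: this change would be missed

entry.title = "First post, revised"
print(entry.is_changed())      # True: attribute assignment is noticed
```

In practice that means either using the persistent container classes for mutable attributes, or reassigning the attribute after changing it.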

Durus is the database you already know. No object relational mappers to learn, no SQL to learn or work around.

Durus Mini FAQ

What about performance?
This is too difficult a question to answer simply, but it's been my experience that I can use Durus instead of a SQL database (Postgres is my personal favorite among the open source databases) far more often than not. You won't put an online banking system processing millions of transactions a day onto Durus or ZODB, but you might well build a complex company inventory system on Durus, even one with hundreds of thousands of items and their related history. Third-party solutions extend Durus even further by using a relational database as its storage back-end, transparently to the application (ZODB has similar approaches, I'm told).
What about SQL / queries? How will I ever live?
One of the challenging things for a SQL-oriented developer (that was me, some time ago) is to start thinking in pure Python again. It's not hard, but it does take some realignment of thought before it comes naturally, at least for me. Being able to dispense with relational thinking in the SQL sense brings a lot of design freedom.
What about sharing data with other systems?
My approach has been to export data as CSV or DIF for import into other systems' SQL databases, to provide APIs such as XML-RPC or REST/JSON for other applications, or to use RSS or Atom feeds when it makes sense.
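Taking the queries and data-sharing answers together, a minimal sketch: a "query" becomes an ordinary generator expression plus sorted(), and a CSV export is the stdlib csv module. EntryRecord is a hypothetical stand-in with only the fields needed here, not the article's Entry, and the sample records are made up.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EntryRecord:
    # Hypothetical stand-in holding only the fields this example needs.
    title: str
    created: datetime
    published: bool


def recent_published(entries, limit=10):
    """Python for: WHERE published ORDER BY created DESC LIMIT limit."""
    matching = (e for e in entries if e.published)
    return sorted(matching, key=lambda e: e.created, reverse=True)[:limit]


def to_csv(entries):
    """Render entries as CSV text for import into another system."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title", "created"])
    for e in entries:
        writer.writerow([e.title, e.created.isoformat()])
    return buffer.getvalue()


entries = [
    EntryRecord("Draft", datetime(2007, 6, 1), False),
    EntryRecord("Older post", datetime(2007, 5, 25), True),
    EntryRecord("Newer post", datetime(2007, 6, 8), True),
]
print(to_csv(recent_published(entries)))
```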

The bottom line: Durus objects are Python objects. You've already invested in learning Python, so you already know Durus, and there is no time-to-learn penalty for spending some time with it now. Let's press on.

Entries with no home

In part three of this series we turned a simple Entry object into a full partner of a Durus database merely by subclassing PersistentObject instead of the standard Python new-style class object. In part four we kicked things up a notch by fleshing out our Entry object with specifications provided by the QP module qp.lib.spec.

What we have not done yet is provide a place for our journal entries to 'live'. We need a container for Entry, and early on we decided to call that container Journal. We are really going to kick things up a notch by leveraging functionality provided by QP in qp.lib.keep: a Keep is a mapping of Keyed items using an integer as a key. Let's enhance Entry first, then write some unit tests for Journal, and then write Journal itself.
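To illustrate the idea of a container that hands out integer keys, here is a hypothetical toy version. It is not qp.lib.keep's code, the method names are my own, and starting the keys at 1 is an arbitrary choice; check the qp source for the real Keep and Keyed interfaces.

```python
class ToyKeep:
    """Hypothetical container that assigns integer keys to added items."""

    def __init__(self):
        self._items = {}
        self._next_key = 1  # arbitrary starting point for this sketch

    def add(self, item):
        item.key = self._next_key  # the item remembers its own key
        self._items[item.key] = item
        self._next_key += 1
        return item.key

    def get(self, key, default=None):
        return self._items.get(key, default)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items.values())


class Note:
    """Any object that can carry a `key` attribute will do."""


keep = ToyKeep()
first, second = Note(), Note()
keep.add(first)
keep.add(second)
print(second.key, keep.get(1) is first, len(keep))  # 2 True 2
```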

All the code for the end-result objects will be available at the conclusion of this series, but for those of you following along at home, let's dive in, re-edit our journal.py and clean up our Entry object first. For brevity's sake I have included the imports relevant to both Entry and the Journal object we will be writing.

from dulcinea.base import DulcineaPersistent
from dulcinea.sort import attr_sort
from qp.lib.keep import Keep, Keyed, Stamped
from qp.lib.spec import add_getters_and_setters, boolean, both, datetime_with_tz
from qp.lib.spec import init, pattern, string, spec
from qp.pub.user import User


class Entry(DulcineaPersistent, Keyed, Stamped):
    """
    An entry in a journal.
    """
    title_is = spec(
        (string, None),
        "A string briefly describing the Entry")
    text_is = spec(
        (string, None),
        "The entry content")
    format_is = spec(
        (string, None),
        "Name of the markup format used by text (Textile, reST, ...)")
    published_is = spec(
        boolean,
        "Boolean indicating if Entry can be published")
    author_is = spec(
        User,
        "User responsible for creating entry")
    created_is = datetime_with_tz

    def __init__(self, author):
        Keyed.__init__(self)
        Stamped.__init__(self)
        init(self, author=author, created=self.stamp, published=False)

add_getters_and_setters(Entry)
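The call to add_getters_and_setters is what gives Entry methods such as set_title() and get_published(). To show the general idea, here is a hypothetical, much simplified stand-in: it generates get_x/set_x for every class attribute named x_is, treating each spec as just a type or tuple of types. qp.lib.spec's real specs are richer than this, and the type check in the setter is my own choice for the sketch.

```python
def toy_add_getters_and_setters(cls):
    """Generate get_x/set_x methods for every class attribute named x_is."""
    for attr in list(vars(cls)):  # snapshot: we add attributes as we go
        if not attr.endswith("_is"):
            continue
        name = attr[: -len("_is")]
        allowed = vars(cls)[attr]

        def getter(self, _name=name):
            return getattr(self, _name)

        def setter(self, value, _name=name, _allowed=allowed):
            if not isinstance(value, _allowed):
                raise TypeError("%s must be %r, got %r" % (_name, _allowed, value))
            setattr(self, _name, value)

        setattr(cls, "get_" + name, getter)
        setattr(cls, "set_" + name, setter)
    return cls


class Entry:
    title_is = (str, type(None))
    published_is = bool

    def __init__(self):
        self.title = None
        self.published = False


toy_add_getters_and_setters(Entry)
e = Entry()
e.set_title("Part Six")
e.set_published(True)
print(e.get_title(), e.get_published())  # Part Six True
```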

Let's now write Journal. Before we do, though, let's write the tests we want it to pass, and only then write the object. Typically you might write only some of these tests, at least until you become familiar with the various features of the QP and Dulcinea libraries. In ./test/utest_journal.py we'll add another test class.

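from parlez.journal import Entry, Journal
# UTest and User are assumed to be imported earlier in this file,
# alongside the existing EntryTest.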

class JournalTest(UTest):
    # we'll write this first, and then write Journal

    def _pre(self):
        # set up a journal which we'll use for most tests.
        self.j = Journal('science', User('einstein'))
        # it is automatically taken down following each individual test

    def init_test(self):
        # we want Journal to have a URL name and an owner, so force it
        Journal('musings', User('joe'))

    def create_entry_test(self):
        assert isinstance(self.j.create_entry(), Entry)

    def add_test(self):
        e = self.j.create_entry()
        self.j.add(e)
        assert e in self.j.get_all_entries()
        assert e == self.j.get_entry(1)

    def only_published_test(self):
        # a new journal starts out empty
        assert self.j.get_all_entries() == []
        e = self.j.create_entry()
        self.j.add(e)
        e_published = self.j.create_entry()
        e_published.set_published(True)
        self.j.add(e_published)
        assert e not in self.j.get_entries()
        assert e_published in self.j.get_entries()
        # publish e now
        e.set_published(True)
        assert e in self.j.get_entries()
        # both are now published; newest first, so e (created first) is last
        assert [e_published, e] == self.j.get_recent_entries()

if __name__ == '__main__':
    EntryTest()
    JournalTest()

I've kept this briefer than I'd like it to be, as there are some other tests we need to write to completely cover our Journal object, but these tests of primary functionality - add, retrieve, retrieve all and sort - should give you the spirit of what we are trying to achieve here.
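Here is one way a Journal could satisfy those tests, written as a hypothetical plain-Python sketch with no Durus, qp or Dulcinea. The article's Journal is meant to build on qp.lib.keep and the persistent base classes instead, and the Entry below is a stripped-down stand-in. A counter replaces the timestamp so that two entries created in the same instant still sort deterministically.

```python
import itertools


class Entry:
    """Plain-Python stand-in for the article's Entry."""

    _counter = itertools.count()

    def __init__(self, author):
        self.author = author
        self.published = False
        self.created = next(Entry._counter)  # stands in for a timestamp

    def set_published(self, value):
        self.published = value


class Journal:
    """Hypothetical Journal written to satisfy the JournalTest cases."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        self._entries = {}
        self._next_key = 1

    def create_entry(self):
        return Entry(self.owner)

    def add(self, entry):
        entry.key = self._next_key
        self._entries[entry.key] = entry
        self._next_key += 1

    def get_entry(self, key):
        return self._entries.get(key)

    def get_all_entries(self):
        return list(self._entries.values())

    def get_entries(self):
        """Only entries that may be published."""
        return [e for e in self._entries.values() if e.published]

    def get_recent_entries(self):
        """Published entries, newest first."""
        return sorted(self.get_entries(), key=lambda e: e.created, reverse=True)


# Replay only_published_test against the sketch.
j = Journal("science", "einstein")
assert j.get_all_entries() == []
e = j.create_entry()
j.add(e)
e_published = j.create_entry()
e_published.set_published(True)
j.add(e_published)
assert e not in j.get_entries() and e_published in j.get_entries()
e.set_published(True)
assert [e_published, e] == j.get_recent_entries()
```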

PyBlosxom to Journal Conversion

A common challenge: you've got data in one system and need to move it into a Durus database. A script to perform this task will be included in full at the end of this series. For now let's sketch out what we need to do, and look at how to access an application's Durus database from a script.

PyBlosxom maintains its files in a hierarchy that looks something like this:

../entries/categoryname/file1.txt
../entries/categoryname/someotherfile.rst
../entries/python/2007-06-08-08-44.rst

And so on. My particular installation uses a plugin which parses the entry date from the file name if it is formatted as a datetime in the form of yyyy-mm-dd-hh-mm.ext, so for files formatted like that I can set Entry.created to a datetime parsed from the filename. Otherwise, I need to stat the file and get its creation date from the operating system, which isn't always reliable (in the case of edits and hapless administrators).
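A minimal sketch of that date logic: use the yyyy-mm-dd-hh-mm file name when present, otherwise fall back to the file's timestamp. The regular expression, the UTC default and the choice of st_mtime as the fallback are my assumptions; note that on Unix, stat() does not reliably report a creation time at all. Returning a timezone-aware datetime fits Entry.created, which is specced as datetime_with_tz.

```python
import os
import re
from datetime import datetime, timezone

# yyyy-mm-dd-hh-mm followed by an extension, e.g. 2007-06-08-08-44.rst
DATED_NAME = re.compile(r"^(\d{4}-\d{2}-\d{2}-\d{2}-\d{2})\.\w+$")


def entry_date(path, tz=timezone.utc):
    """Date for an entry file: from its name if dated, else from mtime.

    `tz` is the zone the file-name timestamps are assumed to be in; UTC is
    only a placeholder default.
    """
    match = DATED_NAME.match(os.path.basename(path))
    if match:
        naive = datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M")
        return naive.replace(tzinfo=tz)
    # Fallback: last modification time, which edits will have changed.
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=tz)


print(entry_date("../entries/python/2007-06-08-08-44.rst"))
# 2007-06-08 08:44:00+00:00
```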

The file contents are simple for me to parse: content is either plain text or, in my case, mostly Textile formatted with a sprinkling of reST and Markdown:

Some article title
#author Mike Watkins
The article content.

h2. A subtitle

More content. Etc.

I never used the #author directive; some files use the #parser directive to indicate which formatter should be used; most rely on file extensions (.rst, .txt, .mkd).

Ultimately my script needs to deliver to me:

  • Entry date
  • Format
  • Title
  • Content
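Given the file layout shown above, the format, title and content can be pulled out of an entry along these lines. This is a sketch under stated assumptions: the first line is the title, lines directly after it starting with '#' are directives, a #parser directive overrides the extension, and the format labels (including treating .txt as plain text) are mine, not PyBlosxom's.

```python
import os

# Hypothetical extension-to-format table; the labels are assumptions.
EXTENSION_FORMATS = {".rst": "rest", ".txt": "text", ".mkd": "markdown"}


def parse_entry(path, text):
    """Split an entry file's text into (format, title, content)."""
    lines = text.splitlines()
    title = lines[0].strip() if lines else ""
    directives = {}
    body_start = 1
    # Collect '#name value' directives that follow the title line.
    for line in lines[1:]:
        if not line.startswith("#"):
            break
        name, _, value = line[1:].partition(" ")
        directives[name] = value.strip()
        body_start += 1
    extension = os.path.splitext(path)[1].lower()
    markup = directives.get("parser") or EXTENSION_FORMATS.get(extension, "text")
    content = "\n".join(lines[body_start:]).strip()
    return markup, title, content


sample = """Some article title
#author Mike Watkins
The article content.

h2. A subtitle

More content. Etc.
"""
print(parse_entry("2007-06-08-08-44.txt", sample)[:2])  # ('text', 'Some article title')
```

The entry date comes from the file name or the file system rather than the contents, so it is handled separately.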

And, if I intend to preserve the URLs (I am debating this now... I really dislike the existing Blosxom / PyBlosxom URL design), I'll need to carry that information forward too. For now, let's assume we have a mapping with file paths as keys and, as values, a list of the four data elements noted above, and write a script to import that information into Durus.

Importing data to Durus

Working with a QP application's Durus database is easy. Remember, it's just Python.

from qp.lib.site import Site
from parlez.journal import Entry, Journal

def bloxsom_to_mapping(entrypath):
    # here you'll deal with the specifics - see a future article
    data = {}
    # ...
    return data

def add_journal_entries(data, journal):
    # data maps each file path to [created, format, title, content]
    for path, entry_data in data.items():
        # path I might store, or some component of it, in the Entry
        # object to facilitate mapping old to new URLs in the future.
        # For now, just ignoring it.
        created, markup_format, title, content = entry_data
        entry = journal.create_entry()
        entry.set_format(markup_format)
        entry.set_title(title)
        entry.set_text(content)
        # normally we don't bypass getters/setters, but the import
        # must preserve the original dates
        entry.created = created
        entry.stamp = created
        # add(), the method exercised by JournalTest
        journal.add(entry)

if __name__ == '__main__':
    BLOXSOM_ENTRY_PATH = '/home/mw/bloxsom/entries'
    APP_NAME = 'blog'
    JOURNAL_NAME = 'mw'
    USER_ID = 'mw'

    # the Site object gives us the ability to access
    # configuration information and live objects
    site = Site(APP_NAME)
    pub = site.get_publisher()
    root = pub.get_root()
    users = root['users']
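    # make sure I exist in users
    if USER_ID not in users:
        users.add(pub.create_user(USER_ID))
    # assumption: the users container can fetch a user by id via get()
    user = users.get(USER_ID)
    # create the journal on the first run; reuse it on later runs
    if 'journal' not in root:
        root['journal'] = Journal(JOURNAL_NAME, user)
    journal = root['journal']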
    # move bloxsom data into Entry/Journal
    add_journal_entries(bloxsom_to_mapping(BLOXSOM_ENTRY_PATH),
                        journal)
    # made it here, commit everything to the database
    pub.get_connection().commit()
    # that's it!

Next Installment

When we return in part seven of this series we will further flesh out our UI objects for Entry and Journal, adding methods for creating and editing objects. At that point we'll have a basic journal or weblog application ready to deploy to the world. Subsequent articles will add more functionality.