Cassandra Schema

CDM gives you several ways to define a Cassandra schema. The options are described below.

Using a schema file

This is useful if you already have a schema that you would otherwise load through cqlsh, and you don’t wish to use CQLEngine models.

To use a schema file, make sure SimpleCQLSchema comes first in your installer’s list of base classes:

class MyInstaller(SimpleCQLSchema, Installer):
    pass  # no cassandra_schema() needed; schema.cql is picked up automatically

Put your schema in schema.cql. It will be picked up and loaded automatically, with the file split into individual statements on ;.
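To make "splitting on ;" concrete, here is a minimal sketch of what a naive splitter does with a small schema file. It is an illustration only, not CDM's actual loader. The helper name split_statements and the sample schema are hypothetical.

```python
def split_statements(schema_text):
    """Naively split CQL text into statements on ';' (hypothetical helper)."""
    # Drop empty fragments, such as the one after the trailing semicolon.
    return [part.strip() for part in schema_text.split(";") if part.strip()]


# Sample contents of a schema.cql file (made up for this sketch).
schema = """
CREATE TABLE movies (id uuid PRIMARY KEY, name text);
CREATE INDEX ON movies (name);
"""

statements = split_statements(schema)
for statement in statements:
    print(statement)
```

This works well for plain DDL like tables and indexes, where a semicolon only ever ends a statement.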

CQLEngine Models

This is convenient, as you’ll frequently want to leverage CQLEngine models for validating and inserting data. We’ll use the cassandra_schema() hook to return the classes we want synced to the database.

For example, in movielens-small, we define our Movie model similar to this:

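from cassandra.cqlengine.columns import Date, Float, Integer, Set, Text
from cassandra.cqlengine.models import Model

class Movie(Model):
    __table_name__ = 'movies'
    id = Integer(primary_key=True)  # partition key
    name = Text()
    release_date = Date()
    video_release_date = Date()
    url = Text()
    avg_rating = Float()
    genres = Set(Text)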

In our installer, we return a list of table models:

class MovieLensInstaller(Installer):
    def cassandra_schema(self):
        return [Movie]

Specifying a Schema Inline

This is necessary for UDAs and UDFs, because their bodies can contain semicolons and so can’t simply be split on ;. A future version of CDM may include a parser that supports this properly, but that is unlikely to happen soon. Until then, you can specify the schema as a list of triple-quoted strings:

class MovieLensInstaller(Installer):
    def cassandra_schema(self):
        statements = ["""CREATE TABLE movies
                        (id uuid primary key,
                         name text)""",
                      """CREATE CUSTOM INDEX on movies(name)
                         USING 'org.apache.cassandra.index.sasi.SASIIndex'"""]
        return statements
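To illustrate why functions need to be passed as whole strings, the sketch below runs a naive split on ; over a single UDF definition. The function double_it is a made-up example, and the split shown is an assumption about what semicolon splitting amounts to, not CDM's actual code.

```python
# A single CQL statement whose Java body contains a semicolon.
udf = """CREATE FUNCTION double_it(x int)
RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java
AS 'return x * 2;'"""

# Naive splitting cuts the statement in two, inside the quoted body.
fragments = [part.strip() for part in udf.split(";") if part.strip()]

for fragment in fragments:
    print(repr(fragment))
```

Neither fragment is valid CQL on its own, which is why returning the complete statement as one string avoids the problem.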

Mixed Mode

Some cases are not yet handled by CQLEngine: materialized views, SASI indexes, UDFs, and UDAs are all difficult to express. Because Python lists can hold objects of mixed types, we can use our CQLEngine models for our tables and provide triple-quoted strings for the rest of the schema:

class MovieLensInstaller(Installer):
    def cassandra_schema(self):
        statements = [Movie,
                      """CREATE CUSTOM INDEX on movies(name)
                         USING 'org.apache.cassandra.index.sasi.SASIIndex'"""]
        return statements

This is cool because we can leverage CQLEngine for our database models while keeping the flexibility to use any CQL that it doesn’t support yet.
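A mixed list like this has to be handled item by item on the receiving side. The following is a rough sketch of how such a list could be dispatched; it is not CDM's actual implementation. The apply_schema helper, its execute and sync parameters, and the stand-in Movie class are all hypothetical, chosen so the example runs without a cluster.

```python
import inspect


def apply_schema(items, execute, sync):
    """Send raw CQL strings to execute() and model classes to sync()."""
    for item in items:
        if isinstance(item, str):
            execute(item)  # raw CQL statement
        elif inspect.isclass(item):
            sync(item)  # model class describing a table
        else:
            raise TypeError("unsupported schema item: %r" % (item,))


class Movie:
    """Stand-in for a CQLEngine model (hypothetical, for illustration)."""


index_cql = """CREATE CUSTOM INDEX on movies(name)
               USING 'org.apache.cassandra.index.sasi.SASIIndex'"""

# Record calls in lists instead of talking to a real database.
executed, synced = [], []
apply_schema([Movie, index_cql], execute=executed.append, sync=synced.append)

print(synced)
print(executed)
```

Against a real cluster, sync would plausibly be something like cassandra.cqlengine.management.sync_table and execute a driver session's execute method, though how CDM wires these together is not shown here.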