OSCON Data ObjectDriver

Presentation Transcript

Data::ObjectDriver: A relational mapper that doesn’t suck
July 27th 2006

Relational mapper: 

- Abstraction layer
- Maps from classes and objects to relational database tables
- Build database-backed sites without (much) database knowledge

“Doesn’t suck”?: 

- A bold claim!
- Actually: all relational mappers suck

They all suck: 

- Slower than raw SQL
- Never as powerful as raw SQL
- Reinventing SQL syntax

But…: 

- Reduces code complexity (and overall code)
- Reduces potential for errors
  - Particularly drastic errors

Back story: 

- Movable Type (Six Apart’s first product) was released in 2001
- Server software, and hobbyist software at that
- Minimal prerequisites: only HTML::Template and Image::Size

Options we briefly considered (in 2001): 

- Class::DBI
  - Too many required modules
  - Funky class structure: class is both base object and driver
- Tangram: too theoretical
- Alzabo: too complex

Plus, we were crazy: 

- We wrote full support for Berkeley DB, along with custom-built indexes
- It wasn’t much fun
- But it made sense at the time

Hence MT::ObjectDriver: 

- Supports all of the normal operations: creating, loading, deleting rows
- Very minimal support for JOINs
- Still used in Movable Type today

Fast forward four years, to mid 2005: 

- We’re starting a new project: Vox
- New database architecture (like LiveJournal’s)
- But LJ writes all of its queries by hand
- We wanted to abstract caching and partitioning

Options we considered (in 2005): 

- All of the previous options: discarded for the same reasons
- DBIx::Class wasn't released yet
- Plus, we knew we had a good, stable codebase
  - Worked for four years on MT, 2 on TypePad

Hence Data::ObjectDriver: 

- All of the obvious features of a relational mapper
- Plus: transparent caching and partitioning support
- Plus: layered driver architecture
  - Easy to write your own driver
  - Easy to plug layers together

Typical Layers: 

(Slide diagram; not reproduced in this transcript.)

Goals: 

- Transparent
- Flexible
- Subclassable

Embrace SQL: 

- Don't eliminate it
- It's a good, flexible language
- Replacing SQL with a new syntax (in Perl) is silly

Handles all of the easy things: 

- Creates objects, updates objects
- Looks up objects by primary key
- Searches for objects by various columns

For example: a recipe database: 

    package Recipe;
    use base qw( Data::ObjectDriver::BaseObject );
    use Data::ObjectDriver::Driver::DBI;

    __PACKAGE__->install_properties({
        columns     => [ 'id', 'title' ],
        datasource  => 'recipes',
        primary_key => 'id',
        driver      => Data::ObjectDriver::Driver::DBI->new(
            dsn => 'dbi:SQLite:dbname=global.db',
        ),
    });
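With a class like this, the easy operations (create, update, look up by primary key, search by column) might look like the following. This is a minimal sketch, assuming Data::ObjectDriver, DBI and DBD::SQLite are installed. The temporary database file, the table schema and the recipe titles are assumptions made for illustration; the talk does not show them.

```perl
use strict;
use warnings;
use DBI;
use File::Temp qw( tempfile );

# Assumption: a throwaway SQLite file stands in for global.db.
my ( undef, $db_file ) = tempfile( SUFFIX => '.db', UNLINK => 1 );
my $dsn = "dbi:SQLite:dbname=$db_file";

# Assumption: this schema is illustrative; the talk does not show one.
my $dbh = DBI->connect( $dsn, '', '', { RaiseError => 1 } );
$dbh->do('CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT)');
$dbh->disconnect;

package Recipe;
use base qw( Data::ObjectDriver::BaseObject );
use Data::ObjectDriver::Driver::DBI;

__PACKAGE__->install_properties({
    columns     => [ 'id', 'title' ],
    datasource  => 'recipes',
    primary_key => 'id',
    driver      => Data::ObjectDriver::Driver::DBI->new( dsn => $dsn ),
});

package main;

# Create a row.
my $recipe = Recipe->new;
$recipe->title('Banana Bread');
$recipe->save;

# Look it up by primary key, change it, and save again (an UPDATE).
my $found = Recipe->lookup( $recipe->id );
$found->title('Banana Nut Bread');
$found->save;

# Search by column; in list context search() returns the matching objects.
my @matches = Recipe->search( { title => 'Banana Nut Bread' } );

# Delete the row.
$found->remove;
```

Nothing in the calling code mentions SQL or the driver, so the driver can later be swapped without touching these calls.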

But your traffic grows!: 

- Your single database is overwhelmed with SELECT queries!
- What do you do?
- You add caching

Caching: Goals: 

- Transparent: Automate the easy things, like caching by primary key
- Flexible: Allow the application to mix and match caching drivers per class/table

So, we had this…: 

    __PACKAGE__->install_properties({
        columns     => [ 'id', 'title' ],
        datasource  => 'recipes',
        primary_key => 'id',
        driver      => Data::ObjectDriver::Driver::DBI->new(
            dsn => 'dbi:SQLite:dbname=global.db',
        ),
    });

So we extend our Recipe class: 

    use Cache::Memcached;
    use Data::ObjectDriver::Driver::Cache::Memcached;
    use Data::ObjectDriver::Driver::DBI;

    __PACKAGE__->install_properties({
        columns     => [ 'id', 'title' ],
        datasource  => 'recipes',
        primary_key => 'id',
        driver      => Data::ObjectDriver::Driver::Cache::Memcached->new(
            cache    => Cache::Memcached->new({ servers => [ ... ] }),
            fallback => Data::ObjectDriver::Driver::DBI->new(
                dsn => 'dbi:SQLite:dbname=global.db',
            ),
        ),
    });

Caching: Effect: 

- All primary key lookups now come out of memcached
- Records in memcached are automatically kept in sync with the database
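The effect of stacking a caching driver on a fallback driver can be sketched in plain Perl. This is not Data::ObjectDriver's implementation: `SketchDBDriver` and `SketchCacheDriver` are hypothetical classes, the "database" and "cache" are in-memory hashes, and writing through to the cache on save is an assumption about how the two stay in sync.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database driver: rows live in a hash.
package SketchDBDriver;
sub new {
    my $class = shift;
    return bless { rows => {}, reads => 0 }, $class;
}
sub lookup {
    my ( $self, $id ) = @_;
    $self->{reads}++;    # count how often the "database" is actually read
    return $self->{rows}{$id};
}
sub save   { my ( $self, $id, $row ) = @_; $self->{rows}{$id} = {%$row}; return 1 }
sub remove { my ( $self, $id ) = @_; delete $self->{rows}{$id}; return 1 }

# Hypothetical caching layer with the same interface, wrapping a fallback.
package SketchCacheDriver;
sub new {
    my ( $class, %args ) = @_;
    return bless { cache => {}, fallback => $args{fallback} }, $class;
}
sub lookup {
    my ( $self, $id ) = @_;
    return $self->{cache}{$id} if exists $self->{cache}{$id};    # cache hit
    my $row = $self->{fallback}->lookup($id);                     # cache miss
    $self->{cache}{$id} = $row if defined $row;
    return $row;
}
sub save {
    my ( $self, $id, $row ) = @_;
    $self->{fallback}->save( $id, $row );
    $self->{cache}{$id} = {%$row};    # keep the cached copy in sync
    return 1;
}
sub remove {
    my ( $self, $id ) = @_;
    $self->{fallback}->remove($id);
    delete $self->{cache}{$id};       # never serve a deleted row
    return 1;
}

package main;

my $db     = SketchDBDriver->new;
my $driver = SketchCacheDriver->new( fallback => $db );

$driver->save( 1, { title => 'Banana Bread' } );
my $first  = $driver->lookup(1);    # served from the cache
my $second = $driver->lookup(1);    # served from the cache again
printf "%s (database reads: %d)\n", $first->{title}, $db->{reads};
```

Because both layers expose the same methods, the caller cannot tell whether a lookup was answered by the cache or by the fallback, which is the transparency the slides describe.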

New Feature!: 

- Visitors clamor for a comment feature
- So you add it: comments/notes on recipes

Recipe Notes: 

    package RecipeNote;
    use base qw( Data::ObjectDriver::BaseObject );
    use Cache::Memcached;
    use Data::ObjectDriver::Driver::Cache::Cache;
    use Data::ObjectDriver::Driver::DBI;

    __PACKAGE__->install_properties({
        columns     => [ 'recipe_id', 'note_id', 'author', 'text' ],
        datasource  => 'recipe_note',
        primary_key => [ 'recipe_id', 'note_id' ],
        driver      => Data::ObjectDriver::Driver::Cache::Cache->new(
            cache    => Cache::Memcached->new({ servers => [ ... ] }),
            fallback => Data::ObjectDriver::Driver::DBI->new(
                dsn => 'dbi:SQLite:dbname=global.db',
            ),
        ),
    });

But your site is still growing, oh no!: 

- Visitors to your site post comments on recipes
- Write traffic is crushing your single database server
- What now?
- Partition the data to spread the writes

Partitioning: Background: 

- Move as much data as possible into partitions
- Global database tables are an index into partitions
- All partitioned tables use composite primary keys
- We always know where to look up partitioned data

Partitioning: Goals: 

- Transparent: Caller should never have to care about whether an object is in a partitioned table
- Flexible: Applications define their own partitioning scheme--it's not imposed by the framework
- Simple: Partitioning is hard--try to make it easier

We had this…: 

    package RecipeNote;
    use base qw( Data::ObjectDriver::BaseObject );

    __PACKAGE__->install_properties({
        columns     => [ 'recipe_id', 'note_id', 'author', 'text' ],
        datasource  => 'recipe_note',
        primary_key => [ 'recipe_id', 'note_id' ],
        driver      => Data::ObjectDriver::Driver::Cache::Cache->new(
            cache    => Cache::Memcached->new({ servers => [ ... ] }),
            fallback => Data::ObjectDriver::Driver::DBI->new(
                dsn => 'dbi:SQLite:dbname=global.db',
            ),
        ),
    });

… and now we have this: 

    package RecipeNote;
    use base qw( Data::ObjectDriver::BaseObject );
    use Cache::Memcached;
    use Data::ObjectDriver::Driver::Cache::Cache;
    use Data::ObjectDriver::Driver::SimplePartition;

    __PACKAGE__->install_properties({
        columns     => [ 'recipe_id', 'note_id', 'author', 'text' ],
        datasource  => 'recipe_note',
        primary_key => [ 'recipe_id', 'note_id' ],
        driver      => Data::ObjectDriver::Driver::Cache::Cache->new(
            cache    => Cache::Memcached->new({ servers => [ ... ] }),
            fallback => Data::ObjectDriver::Driver::SimplePartition->new(
                using => 'Recipe',
            ),
        ),
    });

Partitioning: Effect: 

- Recipe notes spread across multiple servers
- Writes are spread across multiple partitions
- Horizontal scaling: add another database server, increase capacity
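The idea that the global table acts as an index into partitions, and that the first half of the composite key always says where to look, can be sketched in plain Perl. This is a toy model rather than the SimplePartition driver: the `partition_id` field, the helper names and the sample data are all assumptions, and hashes stand in for database servers.

```perl
use strict;
use warnings;

# Hypothetical data: the global table maps each recipe to a partition,
# and each partition holds the notes for the recipes assigned to it.
my %global_recipes = (
    1 => { title => 'Banana Bread', partition_id => 0 },
    2 => { title => 'Pad Thai',     partition_id => 1 },
);
my @partitions = ( {}, {} );    # stand-ins for two database servers

# Pick the partition from the first half of the composite key.
sub partition_for {
    my ($recipe_id) = @_;
    my $recipe = $global_recipes{$recipe_id}
        or die "unknown recipe $recipe_id";
    return $partitions[ $recipe->{partition_id} ];
}

# Store a note in whichever partition its parent recipe lives in.
sub save_note {
    my ($note) = @_;
    my $key = join ':', $note->{recipe_id}, $note->{note_id};
    partition_for( $note->{recipe_id} )->{$key} = $note;
    return;
}

# Look a note up by its composite primary key (recipe_id, note_id).
sub lookup_note {
    my ( $recipe_id, $note_id ) = @_;
    return partition_for($recipe_id)->{"$recipe_id:$note_id"};
}

save_note( { recipe_id => 1, note_id => 1, author => 'ann', text => 'Add walnuts' } );
save_note( { recipe_id => 2, note_id => 1, author => 'bob', text => 'More lime' } );

my $note = lookup_note( 2, 1 );
print "$note->{author}: $note->{text}\n";
```

The two writes land on different partitions, while the caller only ever supplies the composite key.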

Fun stuff: Parallelization: 

- We have an asynchronous job system called Gearman
- Client submits a couple of jobs, then waits for all of them to complete while they're worked on in parallel
- We have many partitions of data, all with the same data model
- Wouldn't it be nice to be able to query them in parallel, then merge the results?
- Yes.

Fun stuff: Parallelization: 

- Create a new driver, ParallelQuery
- The driver knows about the various database partitions
- Queries are submitted in parallel, and the results are merged, like a map-reduce algorithm
- A class that represents a data set across all partitions can use this driver
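The query-then-merge shape can be sketched in plain Perl. This is only an illustration of the idea, not the ParallelQuery driver: the partitions are in-memory arrays, the field names are assumptions, and the per-partition queries run one after another here instead of being dispatched in parallel through Gearman.

```perl
use strict;
use warnings;

# Hypothetical partitions, each holding rows with the same shape.
my @partitions = (
    [ { author => 'ann', posted => 3 }, { author => 'cat', posted => 9 } ],
    [ { author => 'bob', posted => 5 } ],
    [ { author => 'dan', posted => 1 }, { author => 'eve', posted => 7 } ],
);

# "Map" step: run the same query against one partition.
sub query_partition {
    my ( $rows, $min_posted ) = @_;
    return [ grep { $_->{posted} >= $min_posted } @$rows ];
}

# "Reduce" step: merge the partial results into one ordered, limited list.
sub merge_results {
    my ( $partials, $limit ) = @_;
    my @all = sort { $b->{posted} <=> $a->{posted} } map { @$_ } @$partials;
    splice @all, $limit if @all > $limit;    # keep only the first $limit rows
    return \@all;
}

# Run sequentially here; a real driver would dispatch these concurrently.
my @partials = map { query_partition( $_, 3 ) } @partitions;
my $newest   = merge_results( \@partials, 3 );

print join( ', ', map { $_->{author} } @$newest ), "\n";
```

Each partition only has to answer the query for its own rows; ordering and limiting across partitions happen once, in the merge step.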

Caching support: 

- Memcached
- Cache::* family of modules
- Simple in-memory cache

Database support: 

- MySQL
- PostgreSQL
- SQLite

Other Useful Features: 

- Views
- (Simple) Query Profiling
- Triggers

Current Status: 

- Data::ObjectDriver has a version number of 0.03
- But don't let that fool you: it's stable

It’s stable!: 

- Based on a codebase that's been stable and production-ready for five years
- Used on TypePad and Vox, and other Six Apart projects
- http://code.sixapart.com/