Object-Relational Persistence with Glorp
This document describes the Glorp object-relational mapper. By reading it, you will learn how to map your objects to a relational database. It presents, step by step, how to use Glorp, covering both its basic and advanced concepts in tutorial form. At the end we include an appendix that revisits relational database concepts. We thank the authors of the previous Glorp tutorials, R. Whitney and N. Pratt, for having granted us the right to use some of their material. This document has been written by E. Maringolo and edited by S. Ducasse.
What is Glorp?
Working in a live object environment such as Pharo is great. You can freely create your domain objects and compose them as you like. Sometimes those objects can be stored in a way that preserves the original design but in other cases you have to store your objects in a Relational Database Management System (a.k.a. RDBMS). This requires you to flatten your object graph into relational tables managed by the RDBMS.
This process of mapping objects to tables is called Object-Relational Mapping (ORM). It imposes certain constraints on the design of your object model to support persisting it in tables. Some models are easier to map onto tables than others; the difficulty lies in what is known as the Object-Relational Impedance Mismatch.
To work with relational databases, Pharo provides a battle-tested ORM created at Camp Smalltalk a decade ago and maintained since then. Its name is Glorp, for Generic Lightweight Object-Relational Persistence. It is written either GLORP (all caps, as an acronym) or Glorp (as a proper name).
Glorp is a full-featured ORM that reduces the impedance mismatch as much as possible. Among other things, it saves you from writing SQL queries by hand, manages transactions that roll back the changes to the objects in your image or commit them to the database, and lets you write simple and complex queries using plain Pharo syntax. We will cover these and other features in this introductory chapter and in the advanced topics chapter.
Installation
Database server
Before installing Glorp you should already have installed the RDBMS of your choice on your machine or a reachable server. It could be PostgreSQL (http://postgresql.org), MySQL (http://dev.mysql.com/), SQLite (http://sqlite.org), or any other of your preference (as long as it is supported). Hereafter we will refer to this RDBMS of your choice as the Platform.
Database drivers
Along with your working installation of your RDBMS you should have installed in your image the database drivers required to connect to the server.
Native drivers
With native drivers the raw protocol is written in Smalltalk, and it uses Pharo sockets to write and read the data from the wire. This has the advantage of being self-contained, because you don't depend on an external file or setup to connect to a server. Also, because everything is written in Smalltalk, it is debuggable using the Pharo tools down to a very low level.
Library wrapper drivers (FFI)
Library wrappers, as their name implies, wrap an underlying operating system shared library like a .dll or .so. These wrappers access the shared library functions by means of FFI (Foreign Function Interface) calls and data structures.
Using an external library lets you rely on an official client implementation of a proprietary protocol (such as Oracle's), and even when the protocol specification is open you can be sure you are using the official version of it. To use such a library from Pharo you need to implement the wrapper classes and methods.
A common driver API
There are plenty of drivers available in Pharo; however, those drivers have different APIs, making them not directly interchangeable. This makes it hard to migrate to a different driver or to support different RDBMSs simultaneously.
To solve that there is Garage (aka Garage Drivers), which provides a common API for the different driver implementations. Garage is Pharo's analogue of ODBC, JDBC, or ADO.NET drivers.
Using this common API your code won't need to change if, for example, you decide to move from MySQL to PostgreSQL, as long as you don't use exclusive features, because all the implemented drivers will conform to the common Garage API.
Available drivers
Driver availability varies between versions of Pharo. At the time of writing, the currently supported drivers are:
- P3 (Native PostgreSQL v3 protocol)
- Garage
  - PostgreSQL (Native)
  - MySQL (Native)
- UDBC SQLite3
Glorp with P3
P3 is a modern, lean and mean PostgreSQL client for Pharo.
P3Client uses frontend/backend protocol 3.0 (PostgreSQL version 7.4 [2003] and later), implementing the simple query cycle. It supports plaintext and md5 password authentication. When SQL queries return row data, it efficiently converts incoming data to objects. P3Client supports most common PostgreSQL types.
More information can be found in its repository. To load it, install it through Metacello:
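The load script below follows the P3 repository's own instructions:

```smalltalk
Metacello new
    baseline: 'P3';
    repository: 'github://svenvc/P3';
    load.
```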
Glorp packages for Garage
Glorp provides a Metacello configuration that loads the core classes and/or its tests.
It may be the case that you want to load Garage in an alpha version of Pharo; in such a case, you should load the most recent version instead of a stable version, which may not be defined for an alpha stream.
One package is the Glorp-Core
and there is also a Glorp-Tests
package.
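A load script along these lines was used in the Pharo 5 era; the SmalltalkHub location and version symbol are assumptions to adapt to your image:

```smalltalk
Metacello new
    smalltalkhubUser: 'DBXTalk' project: 'Garage';
    configuration: 'GarageGlorp';
    version: #stable. "use #development on an alpha stream"
```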
Glorp with UDBC / SQLite3
Glorp may also be configured to work directly with the UDBC SQLite3 driver in Pharo 5 (instead of the Garage drivers):
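One possible load script from that era, assuming the SmalltalkHub UDBC project; the exact repositories may differ in your Pharo version:

```smalltalk
Metacello new
    smalltalkhubUser: 'TorstenBergmann' project: 'UDBC';
    configuration: 'UDBC';
    version: #stable;
    load.
```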
Running the tests
Having loaded the database drivers and the Glorp packages, it is recommended to run the unit tests of Glorp to ensure everything was loaded correctly and is working properly.
Person: Our First Example Class
To put some of the concepts described previously into practice we will create
a Person
class and store it into a PERSON
database table,
all from within the Pharo environment.
Class definition
Let us define a simple class.
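A minimal sketch; the instance variable names are our choice but are reused consistently throughout the tutorial:

```smalltalk
Object subclass: #Person
    instanceVariableNames: 'id firstName lastName birthDate'
    classVariableNames: ''
    package: 'GlorpBook'
```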
Instance methods
Let us define some simple accessor methods.
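Plain accessors, shown here for firstName; the other attributes follow the same pattern:

```smalltalk
Person >> firstName
    ^ firstName

Person >> firstName: aString
    firstName := aString
```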
Defining a DescriptorSystem
As you can see, the class and methods created above don't have anything related to persistence and don't require you to inherit from a particular class.
Glorp models all the involved concepts (such as tables, columns, classes, etc.)
as first class objects, and then links instances of those objects in a DescriptorSystem
.
It is the core of a Glorp system: it holds all the Glorp metadata, such as the tables, descriptors, and class models.
By using a separate artifact (in this case, a class) to define all the metadata of your system, you can decouple your business models from their persistence information. This separation of concerns is a good practice, and helps with the maintainability of your code.
Also, having an orthogonal description of your domain objects allows you to have more than one descriptor for the same business domain class. This is an important difference with patterns such as ActiveRecord where the persistence metadata is defined in the business domain class, and you can't reuse the same class for two different systems using different persistence configurations.
Defining a DescriptorSystem subclass
All the definitions and mappings are defined in a subclass of
DescriptorSystem
, and for that we will create our own subclass as follows:
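A sketch of the subclass definition (the package name is our choice):

```smalltalk
DescriptorSystem subclass: #GlorpBookDescriptorSystem
    instanceVariableNames: ''
    classVariableNames: ''
    package: 'GlorpBook'
```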
We said before that Glorp has a whole metamodel that involves describing the mapped class, the table(s) where it is going to be mapped and the mapping itself. To define each one of those, Glorp follows a convention in the method naming. We will mention the conventions below.
Class Model
We start by describing the class Person
, and the way to
do it is by defining a method with the pattern classModelForYourClass:
.
Here we describe the class model for the Person
class.
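A sketch, assuming the attribute names we chose for the Person class:

```smalltalk
GlorpBookDescriptorSystem >> classModelForPerson: aClassModel
    aClassModel newAttributeNamed: #id.
    aClassModel newAttributeNamed: #firstName.
    aClassModel newAttributeNamed: #lastName.
    aClassModel newAttributeNamed: #birthDate
```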
Note that a class model is a Glorp representation of your domain class. Glorp uses it to
store metadata about your domain objects. The descriptor system provides hooks to let you define
such a class model. classModelForPerson:
is one of these hooks. Based on the naming convention described
before, Glorp automatically associates this class model to your domain class.
Table Model
The class Person
will be stored in a single table (which is the usual case), and we
provide the description of the table in a similar way as with the class
model, i.e., by following the tableForYOURTABLE
convention.
Please notice the upper case of the table name.
Here is the definition of the table model for the PERSON table.
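A sketch; platform is an instance variable of DescriptorSystem, and the field names and sizes are our assumptions:

```smalltalk
GlorpBookDescriptorSystem >> tableForPERSON: aTable
    (aTable createFieldNamed: 'id' type: platform serial) bePrimaryKey.
    aTable createFieldNamed: 'first_name' type: (platform varchar: 100).
    aTable createFieldNamed: 'last_name' type: (platform varchar: 100).
    aTable createFieldNamed: 'birth_date' type: platform date
```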
If the table name is not uppercase, it is necessary to add
the method allTableNames
with the correct case, e.g.:
For non-uppercase table names we define the method allTableNames
.
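For instance, for a hypothetical table named Person instead of PERSON:

```smalltalk
GlorpBookDescriptorSystem >> allTableNames
    "Answer the table names with their exact case."
    ^ #('Person')
```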
The serial datatype, explained in detail below, is an autoincrementing integer datatype. Every time you save a new instance Glorp will obtain a new integer and assign it to your instance.
Expressing mappings
Once we have the class model and the table object, we should define the mappings between the class attributes and the table fields.
In this simple example we use a DirectMapping
, which is a class of
mapping that takes the value in the column and assigns it to the attribute,
and vice versa.
Here is the definition of the mapping between the class and the table:
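A sketch of the descriptor, linking each attribute of the class model to a field of the PERSON table:

```smalltalk
GlorpBookDescriptorSystem >> descriptorForPerson: aDescriptor
    | table |
    table := self tableNamed: 'PERSON'.
    aDescriptor table: table.
    (aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
    (aDescriptor newMapping: DirectMapping) from: #firstName to: (table fieldNamed: 'first_name').
    (aDescriptor newMapping: DirectMapping) from: #lastName to: (table fieldNamed: 'last_name').
    (aDescriptor newMapping: DirectMapping) from: #birthDate to: (table fieldNamed: 'birth_date')
```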
In the above method we can see how the descriptor links each class attribute to the corresponding field name.
Although this is verbose and the advantages may not be visible in this simple example, here lies the power of Glorp's orthogonal descriptor system: you describe everything and then link all the parts the way you want without modifying your domain object.
Creating tables
Assuming we haven't created the database tables externally, Glorp's metamodel
allows you to perform DDL (Data Definition Language) commands such as
CREATE TABLE
or CREATE CONSTRAINT
(among others) using plain
Pharo objects, and it can even determine when to run them.
To do that we must first connect to the database. We will explain how to do so in the following sections.
We will be using a PostgreSQL server running on the same host as our Pharo image. We will then create the Login object and a DatabaseAccessor
for it to interact with the database.
Database Accessor
Glorp is programmed to be agnostic of the Smalltalk dialect and
database driver. To achieve that, instead of talking directly to the
driver it uses an adapter object that acts as an intermediary between
Glorp and the underlying driver. Each database driver requires its
own subclass of DatabaseAccessor
.
We use the recommended accessor, which is the Garage accessor that is compatible with all the drivers supported by the library.
To define it as the default driver, we must execute the following expression once:
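This is the expression documented for the Garage-Glorp integration (class names as published by the Garage project):

```smalltalk
PharoDatabaseAccessor DefaultDriver: GAGlorpDriver.
```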
When using P3 this is not necessary; instead execute:
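```smalltalk
PharoDatabaseAccessor DefaultDriver: P3DatabaseDriver.
```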
Login
All the information necessary to connect to the database is defined in an instance of the
Login
class. It is an abstraction for the connection parameters
used to connect to the database server, like the Platform used,
hostname, port, username, password, database name and other parameters.
Here is a login creation:
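A sketch with made-up credentials; for PostgreSQL the connectString follows the host:port_databasename format:

```smalltalk
| login |
login := Login new
    database: PostgresPlatform new;
    username: 'glorp';
    password: 'secret';
    connectString: 'localhost:5432_glorpbook'; "host:port_databasename"
    yourself.
```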
The database accessor requires an instance of the class Login
to establish a connection
to the database server specified in the login, using the platform specified.
You can make the Login
secure by making its password volatile, so that once the password has been accessed the instance variable is set to nil and the secret is no longer stored in the image. Note that if you want to
use the same login instance for other connections, you must assign the
password string again.
Here we declare a secure Login:
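Assuming the login instance from above:

```smalltalk
login secure: true.
"After the password is first used (e.g. on login), the password
instance variable is reset to nil."
```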
Verifying the connectivity.
If everything goes fine, then executing accessor isLoggedIn
should answer
true.
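A quick check, assuming the login object created above:

```smalltalk
| accessor |
accessor := DatabaseAccessor forLogin: login.
accessor login.
accessor isLoggedIn. "answers true if the connection succeeded"
```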
Another way to test the connection is to perform a basic query: (accessor basicExecuteSQLString: 'SELECT 3+4') contents first first should return 7 as the result. And yes, the contents first first is dirty, but the accessor is not meant to be used this way; we're doing it just to quickly verify that everything is working.
Session object
Once we're able to connect to the database we can start interacting with it using
Glorp's objects. We will do that using an object that orchestrates
all the other objects, the GlorpSession
. The easiest way to get a
session is to ask the DescriptorSystem
passing the Login
instance as the argument.
Here is how we access a session:
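A sketch, assuming the login from the previous section:

```smalltalk
| session |
session := GlorpBookDescriptorSystem sessionForLogin: login.
session login.
```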
How the Parts Play Together
Let us look into the implementation of the class-side method DescriptorSystem class>>sessionForLogin: to understand better how the parts interact.
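The implementation reads roughly as follows (slightly simplified from the Glorp source):

```smalltalk
DescriptorSystem class >> sessionForLogin: aGlorpLogin
    | system session |
    system := self forPlatform: aGlorpLogin database.
    session := GlorpSession new.
    session accessor: (DatabaseAccessor forLogin: aGlorpLogin).
    session system: system.
    ^ session
```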
First we get a system instantiating the receiver (in our case
GlorpBookDescriptorSystem
) passing the platform as argument.
The platform in the system is important because it will be used to
generate SQL code using its particular syntax and data types.
Then we instantiate a GlorpSession and store it in the session temporary variable.
This session object is assigned a DatabaseAccessor
. It uses this
accessor to communicate with the database, hence the name, but to know what to send to the database it needs all the metadata defined in the system object created before.
Note that we ask the abstract DatabaseAccessor class to return a concrete instance for aGlorpLogin; it will return an instance of the database accessor class configured as the default.
About the Platform.
In Glorp terminology a Platform is the RDBMS (for Relational Database
Management System) platform itself. The abstract class DatabasePlatform
defines the
abstract methods to deal with differences of supported data types,
test the support of certain features, and so on.
In Pharo we have
MySQLPlatform
, OraclePlatform
, PostgresPlatform
, SQLServerPlatform
, SQLite3Platform
and UDBCSQLite3Platform
as subclasses of DatabasePlatform.
Saving/Updating/Deleting Objects
It is time to put to work all the classes and schema we just created. To do it
we will create and persist a few instances of our example class Person
, and
then read some of them back. We'll explain a little bit
how Glorp decides what and when to INSERT, UPDATE or DELETE an object using
the concept of Unit of Work. But we should create the required tables first.
Creating Tables
Now that we have our working session object, we can perform our DDL queries, extracting the data from the system and sending commands to the accessor. To create the tables we should first execute the following:
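In the Pharo port of Glorp the session offers a convenience message for this; if your version lacks it, treat the selector as an assumption and iterate over session system tables through the accessor instead:

```smalltalk
session createTables.
```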
And voilà, it will send the required commands to the database to create our tables, sequences, constraints and so on.
Unit of Work and Transactions
If you have worked with RDBMSs before, you know you can use transactions that detach you from immediate writes to the database files, and enable you to roll back (undo) your changes, or do a batch of changes and commit them all at once.
Glorp provides you with automatic handling of such database transactions. It also adds another concept, the Unit of work (aka UOW or UnitOfWork). This UnitOfWork keeps track of all the persisted objects, enabling you to commit or rollback changes at the object level while maintaining the database transaction in sync with it.
Saving New Instances
To save instances, we should send the message register: anInstance
to a session.
Note that this operation should occur within a Unit of Work using the message
inUnitOfWorkDo:
as follows:
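A sketch with sample data of our choosing:

```smalltalk
session inUnitOfWorkDo: [
    session register: (Person new firstName: 'Alan'; lastName: 'Kay'; yourself).
    session register: (Person new firstName: 'John'; lastName: 'Locke'; yourself) ].
```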
As you can see, we didn't say if the instance needs to be inserted or updated, nor did we specify an id for each instance. The framework will determine that by registering the instance in the all-mighty session.
Note that if we are registering an object that was read in a previous unit of work, the object cannot be modified prior to registration in the current unit of work.
Reading instances
We can now read back some of the persisted instances. Again we use
the session object, sending it the message read:
and the class as argument.
This will return a collection of results.
The following expression reads all instances of Person
:
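```smalltalk
session read: Person.
```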
Of course you don't always read all the instances from the database but want to filter the results. For that you can pass a block that will act as a filter.
The following expression reads all instances of Person
matching a criterion:
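```smalltalk
session read: Person where: [:each | each lastName = 'Kay'].
```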
Reading one instance
Another convenient way to retrieve a single instance instead of a Collection
is to use the readOneOf:
or readOneOf:where:
message in an analogous
way to read:
and read:where:
.
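For example:

```smalltalk
session readOneOf: Person where: [:each | each firstName = 'Alan'].
```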
About blocks
The read:where:
message might lead you to think you can pass any block to filter the results
as you can do with select:
in a regular collection, but that is not the case.
Glorp uses blocks as a very clever way to build expressions that are later converted to SQL by the framework. You can do basic filtering using this feature, but keep in mind it doesn't work as a regular block.
Translated selectors
Next to regular operators such as = and >=, you can use some special selectors that get translated to their SQL counterparts.
For instance the similarTo:
selector gets converted to SQL's LIKE
operator.
With this we can retrieve all the Persons with a given last name using an operator that gets translated:
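```smalltalk
session read: Person where: [:each | each lastName similarTo: 'K%'].
```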
Object update
We can also update an instance, i.e., read one and modify it. We do it
in the context of a UnitOfWork using the message inUnitOfWorkDo:
.
Updating an object read within a UnitOfWork:
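A sketch, reusing the sample data registered earlier:

```smalltalk
session inUnitOfWorkDo: [
    | person |
    person := session readOneOf: Person where: [:each | each lastName = 'Locke'].
    person firstName: 'Johnny' ].
```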
Note that updating an object read from the database does not require you to
send the message register:
to the session object,
as we did while populating the table.
This is because when you're working inside a Unit of Work all the objects you read are automatically registered by the session, saving you from explicitly registering them (which is safe to do anyway, but it is not required).
Object deletion
So far we covered Creation, Read, and Update. We're only missing the Deletion of
objects. This is as easy as it gets: just send the message delete:
to the session
object with the object you want to delete as argument.
The following snippet shows how to delete an object inside of a UnitOfWork:
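```smalltalk
session inUnitOfWorkDo: [
    | person |
    person := session readOneOf: Person where: [:each | each lastName = 'Kay'].
    session delete: person ].
```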
Another way to delete objects is passing a condition block:
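A sketch, assuming delete:where: as described above:

```smalltalk
session delete: Person where: [:each | each lastName = 'Kay'].
```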
You can delete an object without being inside of a UnitOfWork; this will cause the object to be deleted right away, without giving you the option to roll back such an action unless you handle the transaction manually.
Transactions
Glorp lets you handle transactions automatically using a unit of work or manually as we will now show.
Rolling back changes
Atomic modifications to a database are important, and to achieve that you need transactions that let you roll back changes if something fails.
A Unit of Work handles errors and other unexpected terminations of the block, so if something fails, it not only rolls back any changes at the database level if necessary, but also, and more importantly, reverts the changes in your objects.
The following example shows that the last name of the Locke object will not be modified, nor the database updated, since the unit of work failed.
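A sketch of a failing unit of work (the signalled error is artificial):

```smalltalk
[ session inUnitOfWorkDo: [
    | person |
    person := session readOneOf: Person where: [:each | each lastName = 'Locke'].
    person lastName: 'Smith'.
    Error signal: 'Unexpected failure' ] ]
        on: Error
        do: [:ex | ex return ].
"The unit of work was aborted: in memory the person keeps the last
name 'Locke', and no UPDATE reached the database."
```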
Manually managing transaction
The inUnitOfWorkDo:
message is a convenient way to isolate a block of execution
within the context of a transaction at both object and database levels. However
it has some limitations, like not being able to handle nested UnitOfWorks or transactions; everything happens in the context of the outermost transaction in which it was evaluated.
If for some reason you need to handle the start/end/rollback of the unit of work manually, you can do so by using the following messages:
beginUnitOfWork
commitUnitOfWork
rollbackUnitOfWork
commitUnitOfWorkAndContinue
The first three have self-describing selectors, and can be used as in the following code snippet:
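A sketch of manual handling:

```smalltalk
session beginUnitOfWork.
[ | person |
    person := session readOneOf: Person where: [:each | each lastName = 'Locke'].
    person firstName: 'John'.
    session commitUnitOfWork ]
        on: Error
        do: [:ex | session rollbackUnitOfWork ].
```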
The message commitUnitOfWorkAndContinue
needs some explanation,
but the concept is simple: It commits the current unit of work, and then
creates a new one, migrating all the objects registered in the committed unit of
work to the newly created, and still open, unit of work. If this paragraph
confuses you, looking at its implementation might explain it better.
It is useful for cases like batch loads or updates, where you want to commit changes every n instances or similar.
Committing and continuing the Unit Of Work:
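A hypothetical batch load committing every 100 registered objects:

```smalltalk
session beginUnitOfWork.
1 to: 1000 do: [:index |
    session register: (Person new
        firstName: 'Person';
        lastName: index printString;
        yourself).
    index \\ 100 = 0 ifTrue: [ session commitUnitOfWorkAndContinue ] ].
session commitUnitOfWork.
```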
We can clean up (delete) the sample instances by running:
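```smalltalk
session inUnitOfWorkDo: [
    (session read: Person) do: [:each | session delete: each ] ].
```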
Glorp Configurations and Additional Concepts
In our previous example we created a simple class that mapped 1:1 with a table using simple data types, but Glorp provides many more features than an ActiveRecord-like mapping. It lets you fine-tune the persistence of your classes. We will go over the different configurations of class models, table data types and constraints, and mappings of all sorts.
Class models
A class model defines the attributes of your domain objects, and how they
compose. Each of your persisted objects should have an instance of
GlorpClassModel
in the descriptor system containing all the
attributes of your class. By convention, these attributes are added by
implementing classModelForYourClass:
as we did previously for the Person example.
The simplest way to add an attribute is using a series of convenience methods returning instances of GlorpAttributeModel
,
described below:
newAttributeNamed: #email
- Used to define simple scalar/literal values such as Numbers, Strings, Dates, etc.
newAttributeNamed: #address type: Address
- Used to define 1:1 (one to one) relations with other objects of your domain model.
newAttributeNamed: #invoices collectionOf: Invoice
- Used to define 1:n (one to many) and n:m (many to many) relations of your class model with other models.
newAttributeNamed: #invoices collection: collectionClass of: Invoice
- Similar to the one above for 1:n and n:m relations, but you can define what kind of Collection is going to be used.
newAttributeNamed: #counters dictionaryFrom: keyClass to: valueClass
- Used to define an attribute that is a Dictionary whose keys are instances of keyClass and whose values are instances of valueClass.
Attribute properties
The above described methods return instances of GlorpAttributeModel
, which
share common properties that you can configure using the following messages.
useDirectAccess: aBoolean
- Lets you define whether the attribute described by the symbol will be accessed by directly reading and writing the instance variable (slot) instead of using a message send. Direct access is the default, and lets you persist variables that don't have getters and setters. It's also useful where the getters and setters
have extra behavior that you don't want triggered by the persistence mechanisms. On the other hand, going through a message send allows you to persist things that don't correspond directly to a single instance variable.
beForPseudoVariable
- Useful for cases where you want to describe an attribute that won't be read nor written, but is still described so it can be used in queries.
For example, in a later section we talk about mapping the position of an object in a collection to a field in the database, since databases don't maintain order. We could map that order field to a pseudo-variable and use it in queries, even though we don't want it stored in the object.
Table model
Glorp also models your database objects, such as tables, constraints, indexes, etc. With this model it is able to determine how to serialize the objects to SQL, how to perform joins to retrieve 1:1 or 1:n relations, and so on.
The descriptor system follows a convention to define the tables:
it uses the tableForTABLENAME:
selector to configure TABLENAME
, the
argument of this method is an instance of DatabaseTable
, and this
method is responsible for describing your table in the relational
database, including field names and their data types, constraints (primary keys
and/or foreign keys), etc.
Adding fields to the table
Let's bring back our example from the beginning of this book:
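Here is the PERSON table model again, as sketched earlier:

```smalltalk
GlorpBookDescriptorSystem >> tableForPERSON: aTable
    (aTable createFieldNamed: 'id' type: platform serial) bePrimaryKey.
    aTable createFieldNamed: 'first_name' type: (platform varchar: 100).
    aTable createFieldNamed: 'last_name' type: (platform varchar: 100).
    aTable createFieldNamed: 'birth_date' type: platform date
```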
As you can see, you can add a field by sending createFieldNamed:type:
to
the table object. The first argument of the method is the name of the field
(aka column) in your table, the second one is the datatype.
For the datatype we're not specifying any particular implementation of the type but instead we send a message to the platform object, and in Glorp jargon the platform is the RDBMS we'll be using.
Doing it this way enables our table model to be RDBMS agnostic, and, for instance,
it will work with SQLite or PostgreSQL (or any other platform).
E.g. if our platform is PostgreSQL then platform varchar
will return
VARCHAR
, but if instead our platform is SQLite then it will return TEXT
because SQLite only supports TEXT
as datatype for character based fields.
Going back to createFieldNamed:type:
, it will return an instance of
DatabaseField
, which has properties of its own, such as whether the field is
nullable, unique, its default value, and some convenience methods such
as bePrimaryKey
which will create a PK (primary key) on the table
for this field.
Commonly used datatypes in alphabetical order are:
| selector | SQL-92 datatype | Pharo class |
| --- | --- | --- |
| blob | BLOB | ByteArray |
| boolean | BOOLEAN | Boolean |
| date | DATE | Date |
| decimal | NUMERIC | ScaledDecimal |
| double | DOUBLE | Float |
| float | REAL | Float |
| integer | INTEGER | Integer |
| serial | SERIAL | Integer |
| time | TIME | Time |
| timestamp | TIMESTAMP | DateAndTime |
| varchar | VARCHAR | String |
You can find more datatypes by browsing the types method category of
DatabasePlatform
or any of its subclasses. Not all databases support all datatypes, and some platforms have their own datatypes. If you decide to use
a datatype that's only supported by a particular RDBMS you will lose the
benefits of being platform agnostic, so it is a tradeoff.
Mappings (aka Descriptor)
Once we have the Class Model describing our business objects and the
Table Models describing the tables where they will be stored, we need to
describe how they are going to work together. To achieve that Glorp uses
an instance of Descriptor
that contains the mappings between the attributes
of the class model and the fields of the tables involved.
As expected, there is a convention to define Descriptors and, you guessed right,
it is descriptorForYourClass:
that will receive as an argument an
instance of Descriptor
on which you will configure the mappings.
As you can see in the descriptorForPerson: method shown earlier, the usual practice is to get a reference to the tables involved
(one in this example, could be more), and then add mappings to
aDescriptor
by sending it the message newMapping:
with an instance of
the class of mapping we want to use.
There are many mappings that Glorp provides, but the most used ones are:
Direct mapping.
To map a number, a string, a date or any other value-like object
you use an instance of DirectMapping
, specifying which attribute from
your class model you want to map to a field of a table.
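For example, inside a descriptorFor...: method:

```smalltalk
(aDescriptor newMapping: DirectMapping)
    from: #firstName
    to: (table fieldNamed: 'first_name').
```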
Although many times they will be the same symbol, the argument you pass to
the from:
parameter is not the symbol of the selector
but instead the name of the attribute you have defined in your classModelFor...:
method.
1:1 relationships.
When you're trying to map a single reference from one object to another,
you use the class OneToOneMapping
and specify the name of the attribute
that this instance is mapping. If everything is as simple as this you don't
have to specify further instructions and Glorp will determine how to retrieve
the instances based on the information provided in the ClassModel, foreign keys,
etc. If you need further instructions you can always define them; more on that later.
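A minimal example:

```smalltalk
(aDescriptor newMapping: OneToOneMapping) attributeName: #address.
```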
1:N relationships.
If you are trying to map a collection that belongs to a parent/owner object
like the invoices of a Person or the items of such invoice to the invoice itself,
then you have to use the OneToManyMapping
class.
It is commonly used when the rows of the table at the N side of the relation have a field that points back to the 1 side.
In our example, the invoices have a Foreign Key pointing back to the persons table.
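A minimal example:

```smalltalk
(aDescriptor newMapping: OneToManyMapping) attributeName: #items.
```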
N:M relationships.
The many-to-many relationship is similar to the previous one, but in this case
the other side of the relation might belong to more than one owner.
For instance, if you have Person and Tag, a Person might have many Tags,
but these tags in turn belong to other instances of Person.
In this case you use the ManyToManyMapping
and also have to specify
the attributeName.
From the Pharo point of view, you will continue to see a regular collection, but storing its elements requires a third table, called a link table. Glorp has a convention to name such link tables, based on composing the names of both sides of the relation.
Following our Person to Tag relation, the table name Glorp will expect is
TAG_ON_PERSON
, and this must also have its definition in the descriptor system,
so in this case there must be a tableForTAG_ON_PERSON:
method.
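Following the Person-to-Tag example (a hypothetical #tags attribute):

```smalltalk
(aDescriptor newMapping: ManyToManyMapping) attributeName: #tags.
```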
Common properties of relationship mappings
All OneToOneMapping
, OneToManyMapping
and ManyToManyMapping
are
convenience subclasses of RelationshipMapping
, and as such they share
common properties that can be modified.
For instance, the only difference between OneToManyMapping
and
ManyToManyMapping
is that the latter is initialized to use a link table,
but you can have a OneToManyMapping
that uses a link table if you prefer; you simply send useLinkTable to it.
Reference Classes.
All RelationshipMapping
instances can specify the class they reference
by sending referenceClass:
with the desired class as argument. This is
used a lot, and is a convenience method that ends up modifying the attribute
in the ClassModel. So you can specify it at whichever place you find more convenient.
Note we only define the class referenced, because the other side of the relationship is the class for which we're defining the mapping itself.
Another attribute common to all RelationshipMapping subclasses is the way the mapping is supposed to join with other tables, that is, its Join object. More on this in the advanced section.
Exclusivity.
Another attribute is the exclusivity of relationships, enabled by sending
beExclusive
to the mapping. This handy message ensures that once the parent
(or left) side of the relation is deleted from the database, Glorp will
manually delete the child (or right) side of it.
In our Person example, if the relation with Address
was exclusive,
then deleting a Person would cause Glorp to delete its address as well.
One caveat of this convenience approach is that it doesn't play well with database foreign key actions like ON DELETE CASCADE, basically because one will happen without the other knowing it: Glorp will try to delete an exclusive object that was already removed from the database by the CASCADE, and because of that it will fail.
Collection type.
In the case of to-many mappings you can specify the collection type of the mapping. For instance, you can send collectionType: Set
to your
mapping and have it initialized to a Set instead of the default OrderedCollection
.
Proxy.
And last, but not least, you can specify whether or not to proxy such a relation.
By sending shouldProxy:
either true
or false
, Glorp will
place proxies or immediately build an instance from the database. If you don't specify this, it will proxy by default.
A Word about Proxies.
To avoid reading the whole object graph every time you read an object that has any kind of relationship mapping, Glorp places a Proxy object instead of reading it from the database. This saves a lot of roundtrips to the database, bandwidth and CPU/memory in the image building objects that you may not ever use.
However, there are cases when you know that certain objects reference other objects that you know that always will be used, if that is the case you can avoid the proxying, and instead force the immediate materialization.
Once Glorp turns a proxy into one of your domain objects, it will store your object in the Session Cache (more on this later), so future references to the same object won't cause an additional read from the database, and instead use the instance in the cache.
And also, Proxies are tricky animals, so beware when debugging something that involves them. Sometimes they can fool the development tools, making them believe they're something they're not!
Extending our Basic Example
Now that we explained in more detail the main parts of Glorp's descriptor system, we're ready to extend our basic example: we're going to create a minimal invoicing model to include the concepts just learnt.
We will
- define new Pharo classes
- describe such new classes to be handled by Glorp
- describe the tables in which such class instances will be stored
- describe the mapping between the instances and the records.
Domain classes
To do so we will create the following classes: Person
,
Address
, Invoice
, InvoiceItem
, all inheriting from
GlorpBookObject
, as a convenience to factor the id attribute.
We will omit the listing of accessor methods, but you should create them. Regarding the id attribute, Glorp doesn't require you to use it as the primary key of your objects.
You can use as many attributes as you want, but for the sake of simplicity we will use it as a surrogate key, which is a key that is a unique identifier, an object not derived from application data.
Here is the hierarchy root of the book classes, together with the domain classes:
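A sketch; the instance variable names are our assumptions, kept consistent with the mappings below:

```smalltalk
Object subclass: #GlorpBookObject
    instanceVariableNames: 'id'
    classVariableNames: ''
    package: 'GlorpBook'

GlorpBookObject subclass: #Person
    instanceVariableNames: 'firstName lastName addresses'
    classVariableNames: ''
    package: 'GlorpBook'

GlorpBookObject subclass: #Address
    instanceVariableNames: 'street city'
    classVariableNames: ''
    package: 'GlorpBook'

GlorpBookObject subclass: #Invoice
    instanceVariableNames: 'issueDate person address items'
    classVariableNames: ''
    package: 'GlorpBook'

GlorpBookObject subclass: #InvoiceItem
    instanceVariableNames: 'description price invoice'
    classVariableNames: ''
    package: 'GlorpBook'
```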
Aside from the accessors, when you're dealing with collections (to-many relations) that are going to be persisted, it is recommended to eagerly initialize all the instance variables that will reference such collections. Otherwise it could cause some failures, because Glorp keeps a copy of the object attributes to enable rolling back changes in memory. Plus... having a complete instance is a good practice too.
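For instance:

```smalltalk
Person >> initialize
    super initialize.
    addresses := OrderedCollection new

Invoice >> initialize
    super initialize.
    items := OrderedCollection new
```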
Class Model Declarations
Let's extend our GlorpBookDescriptorSystem
to describe the new
domain models.
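A sketch of the class models; the #position attribute of InvoiceItem is declared as a pseudo-variable so it can be used by the ordering mapping shown later:

```smalltalk
GlorpBookDescriptorSystem >> classModelForAddress: aClassModel
    aClassModel newAttributeNamed: #id.
    aClassModel newAttributeNamed: #street.
    aClassModel newAttributeNamed: #city

GlorpBookDescriptorSystem >> classModelForPerson: aClassModel
    aClassModel newAttributeNamed: #id.
    aClassModel newAttributeNamed: #firstName.
    aClassModel newAttributeNamed: #lastName.
    aClassModel newAttributeNamed: #addresses collectionOf: Address

GlorpBookDescriptorSystem >> classModelForInvoice: aClassModel
    aClassModel newAttributeNamed: #id.
    aClassModel newAttributeNamed: #issueDate.
    aClassModel newAttributeNamed: #person type: Person.
    aClassModel newAttributeNamed: #address type: Address.
    aClassModel newAttributeNamed: #items collectionOf: InvoiceItem

GlorpBookDescriptorSystem >> classModelForInvoiceItem: aClassModel
    aClassModel newAttributeNamed: #id.
    aClassModel newAttributeNamed: #description.
    aClassModel newAttributeNamed: #price.
    aClassModel newAttributeNamed: #invoice type: Invoice.
    (aClassModel newAttributeNamed: #position) beForPseudoVariable
```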
Table Declarations
Then we should declare the tables that represent our domain.
Here are the table models for the classes Person
, Address
and Invoice
.
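PERSON was already defined earlier; here is a sketch of the new tables (field names and sizes are our assumptions), ending with the PERSON_ON_ADDRESS link table:

```smalltalk
GlorpBookDescriptorSystem >> tableForADDRESS: aTable
    (aTable createFieldNamed: 'id' type: platform serial) bePrimaryKey.
    aTable createFieldNamed: 'street' type: (platform varchar: 100).
    aTable createFieldNamed: 'city' type: (platform varchar: 100)

GlorpBookDescriptorSystem >> tableForINVOICE: aTable
    | personId addressId |
    (aTable createFieldNamed: 'id' type: platform serial) bePrimaryKey.
    aTable createFieldNamed: 'issue_date' type: platform date.
    personId := aTable createFieldNamed: 'person_id' type: platform integer.
    aTable addForeignKeyFrom: personId to: ((self tableNamed: 'PERSON') fieldNamed: 'id').
    addressId := aTable createFieldNamed: 'address_id' type: platform integer.
    aTable addForeignKeyFrom: addressId to: ((self tableNamed: 'ADDRESS') fieldNamed: 'id')

GlorpBookDescriptorSystem >> tableForINVOICE_ITEM: aTable
    | invoiceId |
    (aTable createFieldNamed: 'id' type: platform serial) bePrimaryKey.
    aTable createFieldNamed: 'description' type: (platform varchar: 200).
    aTable createFieldNamed: 'price' type: platform decimal.
    aTable createFieldNamed: 'position' type: platform integer.
    invoiceId := aTable createFieldNamed: 'invoice_id' type: platform integer.
    aTable addForeignKeyFrom: invoiceId to: ((self tableNamed: 'INVOICE') fieldNamed: 'id')

GlorpBookDescriptorSystem >> tableForPERSON_ON_ADDRESS: aTable
    | personId addressId |
    personId := aTable createFieldNamed: 'person_id' type: platform integer.
    aTable addForeignKeyFrom: personId to: ((self tableNamed: 'PERSON') fieldNamed: 'id').
    addressId := aTable createFieldNamed: 'address_id' type: platform integer.
    aTable addForeignKeyFrom: addressId to: ((self tableNamed: 'ADDRESS') fieldNamed: 'id')
```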
As you can see in our INVOICE table, we not only added fields, but also
added Foreign Keys (aka FK)
to our table model using the message addForeignKeyFrom:to:
.
This will add the foreign keys to the list of constraints of the table model, and will be used by Glorp to order writes/deletions and
infer relations.
Also notice that we need to pass the referenced field as argument.
To obtain it we ask the descriptor system for the table of PERSON
and ADDRESS by sending tableNamed:
with the name of the table we want
as argument. This will return the DatabaseTable
with that name,
and to this object we will ask for the field named id.
As you can see in the last method, we created a table model for the table
PERSON_ON_ADDRESS
. This table will be used as a link table. Because
this link table references other tables, we also created Foreign Keys
from each reference field using the same approach described before.
Mapping declarations
Address
The mapping for the class and table address is simple since there is a direct mapping between the class model (the domain model) and the table.
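A sketch:

```smalltalk
GlorpBookDescriptorSystem >> descriptorForAddress: aDescriptor
    | table |
    table := self tableNamed: 'ADDRESS'.
    aDescriptor table: table.
    (aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
    (aDescriptor newMapping: DirectMapping) from: #street to: (table fieldNamed: 'street').
    (aDescriptor newMapping: DirectMapping) from: #city to: (table fieldNamed: 'city')
```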
Invoice
The mapping for invoice is a bit more elaborate, because it not only
contains Direct Mappings, and one-to-one mappings for the Person
and
Address
, but also contains a one-to-many relationship to its items.
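A sketch; the orderBy:/writeTheOrderField pair relies on the #position pseudo-variable declared in the InvoiceItem class model above:

```smalltalk
GlorpBookDescriptorSystem >> descriptorForInvoice: aDescriptor
    | table |
    table := self tableNamed: 'INVOICE'.
    aDescriptor table: table.
    (aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
    (aDescriptor newMapping: DirectMapping) from: #issueDate to: (table fieldNamed: 'issue_date').
    (aDescriptor newMapping: OneToOneMapping) attributeName: #person.
    (aDescriptor newMapping: OneToOneMapping) attributeName: #address.
    (aDescriptor newMapping: OneToManyMapping)
        attributeName: #items;
        orderBy: [:each | each position];
        writeTheOrderField
```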
We introduced the orderBy:
and writeTheOrderField
attributes in
the #items mapping. You can use the first attribute independently or both together.
The orderBy:
attribute instructs
Glorp to read the elements from the database and order (aka sort) them
by the field described in the block. This is useful so the collection that holds
the references to the items (in this example) is always sorted the same way,
because the order the rows come from the database is not always the same,
and you can't trust they will return in the same order as they were written.
The writeTheOrderField
command will make Glorp save the index of the
element in the collection as a field in the row. It will use the field defined
in orderBy:
as its target, in this case position. So if you have an
OrderedCollection with five items, the first element will have 1
stored
in the position field, the second 2
, and so on. If you change the order
of the elements in the collection, the position field will be saved with the
new index.
InvoiceItem
The mapping for each invoice item is slightly different since an item is always
part of an Invoice, and only that invoice; so because of that we have a
OneToOneMapping
referencing its invoice. This way we make the relation
between Invoice and InvoiceItem bidirectional.
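A sketch:

```smalltalk
GlorpBookDescriptorSystem >> descriptorForInvoiceItem: aDescriptor
    | table |
    table := self tableNamed: 'INVOICE_ITEM'.
    aDescriptor table: table.
    (aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
    (aDescriptor newMapping: DirectMapping) from: #description to: (table fieldNamed: 'description').
    (aDescriptor newMapping: DirectMapping) from: #price to: (table fieldNamed: 'price').
    (aDescriptor newMapping: OneToOneMapping) attributeName: #invoice
```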
Person
The mapping for the class Person is more elaborate and worth some explanation of what happens in the last lines.
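A sketch; the explicit Join in the last lines is discussed next:

```smalltalk
GlorpBookDescriptorSystem >> descriptorForPerson: aDescriptor
    | table linkTable |
    table := self tableNamed: 'PERSON'.
    linkTable := self tableNamed: 'PERSON_ON_ADDRESS'.
    aDescriptor table: table.
    (aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
    (aDescriptor newMapping: DirectMapping) from: #firstName to: (table fieldNamed: 'first_name').
    (aDescriptor newMapping: DirectMapping) from: #lastName to: (table fieldNamed: 'last_name').
    (aDescriptor newMapping: ManyToManyMapping)
        attributeName: #addresses;
        referenceClass: Address;
        join: (Join
            from: (table fieldNamed: 'id')
            to: (linkTable fieldNamed: 'person_id'))
```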
Here we introduce the explicit join:
attribute of the Reference Mapping.
Usually Glorp infers and creates such Join by looking at the Foreign Keys
of the described or referenced class, but you can make that Join explicit,
you just have to create a Join
object with the source field and the
referenced field as arguments. In this case the join is pretty straightforward
and between two fields, but it could be between as many fields as you want.
Sample data
We will create some sample data for our example, and for the Persons we will save three instances named after famous Smalltalkers, with random addresses, so don't mail them.
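A sketch with names and addresses of our choosing:

```smalltalk
session inUnitOfWorkDo: [
    #( ('Alan' 'Kay') ('Dan' 'Ingalls') ('Adele' 'Goldberg') ) do: [:pair |
        | person |
        person := Person new
            firstName: pair first;
            lastName: pair second;
            yourself.
        person addresses add: (Address new
            street: 'Evergreen Terrace ', 100 atRandom printString;
            city: 'Springfield';
            yourself).
        session register: person ] ].
```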
Now that we created the sample Persons, let's add an extra address for Alan.
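```smalltalk
session inUnitOfWorkDo: [
    | alan |
    alan := session readOneOf: Person where: [:each | each firstName = 'Alan'].
    alan addresses add: (Address new
        street: 'Another St. 123';
        city: 'Shelbyville';
        yourself) ].
```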
As you can see, we didn't have to register the newly created address because the Person was read inside of a UnitOfWork. If we read the instance again and inspect its addresses we will find the new instance is there.
Before continuing and to make code simpler and more readable we'll create a few convenience methods:
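Hypothetical helpers of our choosing (the original listing is not preserved):

```smalltalk
Invoice >> addItem: anInvoiceItem
    "Keep both sides of the relation consistent."
    anInvoiceItem invoice: self.
    items add: anInvoiceItem

Invoice >> total
    ^ items sum: [:each | each price]
```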
We can now proceed to create a few instances of Invoice. Let's create one invoice for each person in the database, with two items describing donations to the Pharo Consortium and Association with randomized amounts (within a certain range) for each one.
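A sketch using the helper defined above:

```smalltalk
session inUnitOfWorkDo: [
    (session read: Person) do: [:person |
        | invoice |
        invoice := Invoice new
            issueDate: Date today;
            person: person;
            address: person addresses first;
            yourself.
        invoice addItem: (InvoiceItem new
            description: 'Pharo Consortium donation';
            price: (2000 to: 4000) atRandom;
            yourself).
        invoice addItem: (InvoiceItem new
            description: 'Pharo Association donation';
            price: (40 to: 100) atRandom;
            yourself).
        session register: invoice ] ].
```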
As usual, you can read the Invoices by doing session read: Invoice
. Let's
print how much each Person donated.
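A sketch:

```smalltalk
(session read: Invoice) do: [:invoice |
    Transcript
        show: invoice person firstName;
        show: ' donated ';
        show: (invoice items sum: [:each | each price]) printString;
        cr ].
```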
When we presented the ReferenceMappings we mentioned how Glorp puts proxies in place of a referenced object to avoid performing unnecessary roundtrips to the database.
In the example above, assuming all caches are empty, Glorp will perform one query to retrieve all the Invoices and then one extra query to retrieve the data to instantiate each Person. So if you retrieve Invoices for three different persons, Glorp will perform four queries, that is, four roundtrips to the database. Keep that in mind while we continue with other examples.
The message sum:
above will retrieve all the rows from the tables,
instantiate an equivalent number of Invoices and then go over all
the invoices' items to sum them. In our example, we only have a few instances
but what if we had a million of them? Instantiating everything at the Pharo
side wouldn't be convenient for such a simple sum. Performing an SQL SUM()
at the database side would be more efficient, since we would only retrieve
the total instead. But how to do such a query?
The Query object
So far we've been querying objects by means of sending read:
or
read:where:
to the session object, but if you want to perform
a particular query, like limiting the number of results, running an aggregate
function like SUM()
, AVG()
, ordering the rows at the server side,
performing subqueries and many other features then the session object
doesn't provide you with an API for that.
The read: message
Every time you send a read:
message to the session, the session creates a
simple Query
object to read the class passed as parameter, as in the following expression:
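A sketch of the equivalence, assuming Query read: and session execute: from Glorp's query API:

```smalltalk
session read: Person.
"...is roughly equivalent to..."
session execute: (Query read: Person).
```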
But you can instantiate a Query object independently, configure it to fit your needs and
execute it in the session. Enter the Query
object!
SimpleQuery
Let's say we want to know the sum of all the prices of our InvoiceItems;
we can do that by using the retrieve:
method and using an aggregate function
like sum
.
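A sketch, assuming Glorp's sum aggregate selector:

```smalltalk
| query |
query := Query read: InvoiceItem.
query retrieve: [:each | each price sum].
session execute: query.
```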
The retrieve:
method configures the query to return only the retrieved attributes
of the read object. The query will return a collection with the objects
retrieved instead of a collection of instances of the read class.
If we know beforehand that our query will return a single row with a single
field, i.e., a single object, we can configure the query to return that
object directly by setting readsOneObject:
to true.
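Continuing the previous sketch:

```smalltalk
query readsOneObject: true.
session execute: query. "a single number instead of a collection"
```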
If you retrieve more than one object then the query will return a collection of collections, in the form of #(#(...) #(...) ...).
Attributes of queries
The Query object models an SQL SELECT
statement, and like that statement
it can include a WHERE
, ORDER BY
, GROUP BY
, HAVING
, etc.
WHERE expression
You can always include a where expression as you did when reading through the session object. Inside the where expression, you can filter by attributes of referenced objects.
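For instance, filtering invoices by an attribute of the referenced Person:

```smalltalk
| query |
query := Query read: Invoice where: [:each | each person lastName = 'Kay'].
session execute: query.
```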
Or even by interacting with a referenced collection, i.e., retrieving only Invoices containing items with a price above 2000.
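A sketch, assuming Glorp's anySatisfy: collection expression:

```smalltalk
| query |
query := Query read: Invoice
    where: [:each | each items anySatisfy: [:item | item price > 2000]].
session execute: query.
```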
ORDER BY expression
The ORDER BY
expression in a Query object is set by sending orderBy:
to
the query object as many times as fields you want to include in your ordering
criteria.
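A sketch:

```smalltalk
| query |
query := Query read: Person.
query orderBy: [:each | each lastName].
query orderBy: [:each | each firstName].
session execute: query.
```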
GROUP BY expression
As shown in previous code samples you can also perform SQL's aggregate
queries like SUM()
, COUNT()
, etc.; if you want to perform those
aggregate functions grouped by certain expression, then the groupBy:
option comes to the rescue.
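A sketch of what such a query could look like: total donated per person, retrieving the first and last name concatenated (navigating through the invoice's person reference) and grouping by the same concatenated value. The `,` concatenation and the exact navigation path are assumptions:

```smalltalk
| query |
query := Query read: InvoiceItem.
query retrieve: [:each | each invoice person firstName , each invoice person lastName].
query retrieve: [:each | each price sum].
query groupBy: [:each | each invoice person firstName , each invoice person lastName].
session execute: query.
```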
In the above example you can see we're retrieving two fields concatenated and grouping by the same concatenated value. Remember that when using an aggregate function, all the non-aggregated fields retrieved must be included in the GROUP BY. Notice here we're also grouping by a field in another table; Glorp resolves all the JOINs for you.
Retrieving the average donation
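A sketch, assuming an average aggregate selector:

```smalltalk
| query |
query := Query read: InvoiceItem.
query retrieve: [:each | each price average].
query readsOneObject: true.
session execute: query.
```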
Although the GROUP BY expression is used extensively in production systems, it only works for retrieving fields of the query and not full-fledged instances of your objects described in your DescriptorSystem.
LIMIT and OFFSET expressions
You can also limit the number of rows returned by your Query. This is useful if you know your query will return one single row/object or if you only want to retrieve a certain number of rows. The LIMIT expression varies between different databases; Glorp will make sure it generates the right SQL code for the platform you're using.
When the LIMIT expression is used in combination with the OFFSET expression it lets you implement simple pagination queries.
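A sketch; treat the limit: and offset: selectors as assumptions that may vary between Glorp versions:

```smalltalk
| query |
query := Query read: Invoice.
query limit: 10.
query offset: 20.
session execute: query.
```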
Fetching referenced objects
We mentioned before that when you're querying the Invoices Glorp will place a Proxy object for each reference (e.g. Person) it doesn't have in the session cache, and then once the Proxy is hit by a message send it will materialize it by retrieving its contents by performing a database Query.
Imagine this scenario for at least 1000 Invoices that reference a different Person each, that would be 1001 queries and their resulting I/O impact.
In a regular SQL query you would solve that by doing a JOIN with the related table; when reading first-class objects, Glorp makes it almost as easy as doing a JOIN in SQL.
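A sketch using alsoFetch: to bring the referenced Person in the same roundtrip:

```smalltalk
| query |
query := Query read: Invoice.
query alsoFetch: [:each | each person].
session execute: query.
```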
You can specify as many alsoFetch:
references as you want, and Glorp
will knit all together by using JOIN expressions created from the Join
objects defined when configuring the reference mappings of the Class Descriptor.
If nothing else is added to the alsoFetch: expression block, it will perform an INNER JOIN to knit the relations. If you're also fetching a reference that might be nil, then the whole object/row will be excluded from the result. For instance, if an Invoice doesn't have an Address, then with the above example that invoice would be left out by the JOIN.
You can avoid that by doing an OUTER JOIN
on the field you're fetching
by simply adding asOuterJoin
to the field.
Retrieving all at once with an OUTER JOIN:
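```smalltalk
| query |
query := Query read: Invoice.
query alsoFetch: [:each | each person].
query alsoFetch: [:each | each address asOuterJoin].
session execute: query.
```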
The alsoFetch:
is a powerful feature, and can be used in combination with
all the above described Query attributes (except for GROUP BY).
Ensuring fresh data
Every time you read an object inside a Glorp session, it saves it in the session cache, keyed by an object representing the Primary Key in your database. In our example each ClassDescriptor will have a cache keyed by the id field, which happens to be our Primary Key in the tables.
This is great because, as any proper cache would, it avoids going outside of your image to retrieve something from the database, saving you from extra roundtrips. However, when the objects are modified concurrently during the lifetime of your session, you could end up with an old version of one of them, and even if you perform a new Query to the database, Glorp will try to use the same object it had in the session Cache.
As you might expect by now, Glorp provides a way to override this and force
the update of the Cache with the newly read data. Simply set
shouldRefresh: true
, and this will tell Glorp to ignore any attempt
to read from the Cache and instead always use what is new.
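A sketch:

```smalltalk
| query |
query := Query read: Person.
query shouldRefresh: true.
session execute: query.
```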
Set operations
In addition to statements that affect the result set of the query, you can perform set operations like UNION ALL, INTERSECT and EXCEPT (aka subtract) between your query and other queries.
| SQL set operator | Query selector |
| --- | --- |
| UNION ALL | union: |
| INTERSECT | intersect: |
| EXCEPT | minus: |
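A sketch of combining two queries, assuming the selectors listed above:

```smalltalk
| kays ingalls |
kays := Query read: Person where: [:each | each lastName = 'Kay'].
ingalls := Query read: Person where: [:each | each lastName = 'Ingalls'].
session execute: (kays union: ingalls).
```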
Conclusion
The SimpleQuery class offers powerful query logic, and you should play with it to understand how to use it.
Handling inheritance
In this chapter we will look at how Glorp helps us handle objects of different subclasses.
When a class Parent
has two subclasses LeftChild
and
RightChild
, a polymorphic query is one where we can query on the
Parent
class and as a result get instances of Parent
, LeftChild
and RightChild
.
There are three common ways to support polymorphic queries with a relational database:
- A table for each class (abstract or not)
- A table for each concrete class
- One table for all classes
Glorp currently supports the last two options. We will illustrate these two approaches now.
A table for each concrete class
In this approach each concrete class has its own table.
That table holds all the instance variables for the class.
As an example of this approach, let us imagine that the Parent class is an abstract class.
Therefore, we need two tables, which we call LEFTCHILDTABLE
and
RIGHTCHILDTABLE
.
Class definitions
The class definitions are straightforward.
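A sketch; the attribute names match the columns mentioned later in this chapter:

```smalltalk
Object subclass: #Parent
    instanceVariableNames: 'id name'
    classVariableNames: ''
    package: 'GlorpBook'

Parent subclass: #LeftChild
    instanceVariableNames: 'leftValue'
    classVariableNames: ''
    package: 'GlorpBook'

Parent subclass: #RightChild
    instanceVariableNames: 'rightValue'
    classVariableNames: ''
    package: 'GlorpBook'
```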
System Description
We'll extend our System Descriptor to include the newly created classes.
Class model for LeftChild
.
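```smalltalk
GlorpBookDescriptorSystem >> classModelForLeftChild: aClassModel
    aClassModel newAttributeNamed: #id.
    aClassModel newAttributeNamed: #name.
    aClassModel newAttributeNamed: #leftValue
```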
Class model for RightChild
.
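```smalltalk
GlorpBookDescriptorSystem >> classModelForRightChild: aClassModel
    aClassModel newAttributeNamed: #id.
    aClassModel newAttributeNamed: #name.
    aClassModel newAttributeNamed: #rightValue
```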
Describing the tables.
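A sketch; we assume sequenceTypeNamed: is provided by the descriptor system as described just below:

```smalltalk
GlorpBookDescriptorSystem >> tableForLEFTCHILDTABLE: aTable
    (aTable createFieldNamed: 'id' type: (self sequenceTypeNamed: 'parent_seq')) bePrimaryKey.
    aTable createFieldNamed: 'name' type: (platform varchar: 50).
    aTable createFieldNamed: 'left_value' type: platform integer

GlorpBookDescriptorSystem >> tableForRIGHTCHILDTABLE: aTable
    (aTable createFieldNamed: 'id' type: (self sequenceTypeNamed: 'parent_seq')) bePrimaryKey.
    aTable createFieldNamed: 'name' type: (platform varchar: 50).
    aTable createFieldNamed: 'right_value' type: platform integer
```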
We introduced sequenceTypeNamed:
, which we didn't use before. When using
sequence
or serial
datatypes, Glorp will create a database sequence
for each table if required, but because we're using a type resolver
with inheritance, it is necessary that all the child classes that use
a serial
datatype share the same sequence.
The message sequenceTypeNamed:
looks up an existing sequence with that name in the Descriptor System,
and if none was registered before, it instantiates a new one with that name and registers it for later use.
Handling inheritance
So far we created the class models for two classes, and a table model for each
one of them, but nothing involving inheritance. Notice we don't have to create a
class model for the Parent
class, although we will have to create a
class descriptor for it.
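A sketch of the three descriptors; the resolver registration in the last lines is explained next, and the register:abstract: form for the abstract Parent is an assumption based on Glorp's inheritance examples:

```smalltalk
GlorpBookDescriptorSystem >> descriptorForParent: aDescriptor
    (self typeResolverFor: Parent) register: aDescriptor abstract: true

GlorpBookDescriptorSystem >> descriptorForLeftChild: aDescriptor
    | table |
    table := self tableNamed: 'LEFTCHILDTABLE'.
    aDescriptor table: table.
    (aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
    (aDescriptor newMapping: DirectMapping) from: #name to: (table fieldNamed: 'name').
    (aDescriptor newMapping: DirectMapping) from: #leftValue to: (table fieldNamed: 'left_value').
    (self typeResolverFor: Parent) register: aDescriptor

GlorpBookDescriptorSystem >> descriptorForRightChild: aDescriptor
    | table |
    table := self tableNamed: 'RIGHTCHILDTABLE'.
    aDescriptor table: table.
    (aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
    (aDescriptor newMapping: DirectMapping) from: #name to: (table fieldNamed: 'name').
    (aDescriptor newMapping: DirectMapping) from: #rightValue to: (table fieldNamed: 'right_value').
    (self typeResolverFor: Parent) register: aDescriptor
```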
As you can see, we introduced a concept in the Class Descriptor that we never used, explicitly, before: a Type Resolver (or resolver for short).
Type resolver
Every Class Descriptor has a resolver that helps it determine which
class to instantiate. If Glorp doesn't find one explicitly defined, it
creates an IdentityTypeResolver
for your class.
You can define the resolver for your class by implementing a method
named typeResolverForYourClass
that returns an instance of TypeResolver.
We will create one for our class Parent:
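```smalltalk
GlorpBookDescriptorSystem >> typeResolverForParent
    ^ HorizontalTypeResolver forRootClass: Parent
```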
There are different classes of resolvers:
- IdentityTypeResolver
- No inheritance
- HorizontalTypeResolver
- Each Class is persisted in a separate table
- FilteredTypeResolver
- All Classes are persisted in the same table
We chose the HorizontalTypeResolver because we're storing our instances in separate tables. IdentityTypeResolver is the default one: it is for the case when there is no inheritance, where you map a class to a single table (or more than one table, but always for a single class). We will cover FilteredTypeResolver later.
Each of the class descriptor methods registers a resolver.
The method typeResolverFor:
results in a call to typeResolverForParent
.
If you look at the last line in the definition of Descriptors for LeftChild
and RightChild
you'll notice we're registering the Descriptor in the
Type Resolver of the Parent
class; both child classes, unless overridden,
will continue having their default resolver.
Playing with instances
Now we can play with some sample instances
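A sketch:

```smalltalk
session inUnitOfWorkDo: [
    session register: (LeftChild new name: 'lefty'; leftValue: 1; yourself).
    session register: (RightChild new name: 'righty'; rightValue: 2; yourself) ].
```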
With the setup above, we can write queries like the following one, which queries the Parent class:
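```smalltalk
session read: Parent.
"answers a collection containing both LeftChild and RightChild instances"
```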
We can still write queries directly on the subclasses:
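```smalltalk
session read: LeftChild.
```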
There is one restriction on polymorphic queries when using HorizontalTypeResolver (a table for each concrete class): one cannot use orderBy:. The query makes multiple requests to the database, as it has to read both tables, so we cannot have the database order the results for us. As a result the following request will raise an exception, because horizontal resolvers can't order.
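```smalltalk
| query |
query := Query read: Parent.
query orderBy: [:each | each name].
session execute: query. "raises an exception: horizontal resolvers can't order"
```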
One table for all classes
In this alternate approach for table organization, there is one table that holds the data
for all objects in the hierarchy. In this example the one table is called
ALL_DATA
. We include an object_type
column used to store the type
of the object in the row. This allows GLORP to create an instance of the
correct class when it reads a row in the table. Note
that regardless of what type of object is in a row, one column,
either left_value
or right_value
, will be null
.
While this method does waste space in the database, performing a polymorphic
query is faster as there is only one table to access.
We start by describing the new Type Resolver for the Parent
class.
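```smalltalk
GlorpBookDescriptorSystem >> typeResolverForParent
    ^ FilteredTypeResolver forRootClass: Parent
```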
The class definitions remain the same as in the Table for each concrete class example. The descriptor changes a bit.
Since we use a different resolver, we need to supply values for the object_type column and we need more information for the Parent class.
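A sketch of the shared table and of one child descriptor; the register:keyedBy:field: registration is what ties the 'L' key to the object_type column:

```smalltalk
GlorpBookDescriptorSystem >> tableForALL_DATA: aTable
    (aTable createFieldNamed: 'id' type: platform serial) bePrimaryKey.
    aTable createFieldNamed: 'object_type' type: (platform varchar: 1).
    aTable createFieldNamed: 'name' type: (platform varchar: 50).
    aTable createFieldNamed: 'left_value' type: platform integer.
    aTable createFieldNamed: 'right_value' type: platform integer

GlorpBookDescriptorSystem >> descriptorForLeftChild: aDescriptor
    | table |
    table := self tableNamed: 'ALL_DATA'.
    aDescriptor table: table.
    (aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
    (aDescriptor newMapping: DirectMapping) from: #name to: (table fieldNamed: 'name').
    (aDescriptor newMapping: DirectMapping) from: #leftValue to: (table fieldNamed: 'left_value').
    (self typeResolverFor: Parent)
        register: aDescriptor
        keyedBy: 'L'
        field: (table fieldNamed: 'object_type')
```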
Note how to indicate the value of the object_type
column for
LeftChild
objects with the resolver.
The argument for the keyedBy:
parameter is the String 'L'
but it could be any String of your preference, like the class name. Keep in mind that the shorter the key, the less storage space it will take, and also less bandwidth used.
Even though Parent
is still an abstract class, Glorp requires us to give
the mappings for the Parent
instance variables. If Parent
were not abstract
we would have to provide a key value for the column object_type
.
About inheritance with Type Resolvers
Although in our example the class Parent was the superclass of both LeftChild and RightChild, in Glorp terms it is not required that child Descriptors share a common class ancestor, as long as they have the same class attributes as the parent Descriptor.
Under the hood
How Glorp expressions work
At this point you might have noticed that we use blocks in many places to specify the where expression, retrieve a particular field, order the results, and many other things.
Although we use blocks for the convenience and readability they provide, once assigned to a Glorp object such as
SimpleQuery
Glorp converts the block to a Glorp expression; depending
on the context it may be any of the subclasses of GlorpExpression
.
To convert the block to an expression, Glorp passes a MessageArchiver as the first argument of the block. The message archiver intercepts all the message sends via the doesNotUnderstand: hook and returns another archiver, so the next message send is handled by the new archiver, and so on. With this strategy it constructs a tree of instances of GlorpExpression.
Using each as the block argument name can be misleading, but it is used by convention; it makes more sense in the context of a filtering expression than in a retrieve: or groupBy:.
Because it uses doesNotUnderstand:
as a way to build the expression tree,
it is possible that some selectors will be highlighted by Pharo as not
implemented, in particular selectors that are used to instantiate FunctionExpressions.
About special selectors
When writing blocks that involve more than one condition, we don't use
and:
nor or:
selectors, and instead we use AND:
and OR:
respectively. The same applies for isNil
and notNil
. This is so
because the compiler optimizes certain message sends, including the ones
mentioned before.
Example of special selectors:
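A sketch; Glorp's nil-test variants are commonly spelled isNIL and notNIL, so treat the exact selectors as assumptions for your version:

```smalltalk
session read: Person where: [:each |
    (each firstName = 'Alan') AND: (each lastName notNIL)].
```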
Function expressions
The special selectors mentioned before are nothing but a particular case of a
FunctionExpression
, but there are many others like them; consider the
following Glorp query:
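```smalltalk
session read: Person where: [:each |
    each lastName asUppercase similarTo: 'K%'].
```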
You can see there that we send the asUppercase
message to the lastName
attribute and then similarTo:
to that result. The previous query will produce something like this:
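Roughly like this; the exact aliases and column list depend on the platform and the descriptor:

```sql
SELECT t1.id, t1.first_name, t1.last_name
FROM PERSON t1
WHERE UPPER(t1.last_name) LIKE 'K%'
```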
It is very likely that the similarTo:
selector will be highlighted
as missing or not implemented in the current image. This is so because
the MessageArchiver
will resolve the symbol to an instance of one
of the subclasses of FunctionExpression
. It achieves this by rebuilding the archiver as it consumes the selectors; you can find out how by
looking for the implementors of rebuildOn:startingFrom:withOuterScopeBase:
.
In the middle of those implementors you'll find senders to
getFunction:arguments:
, which in turn will query the platform
for the list of functions available, which will end up creating
the list of default functions by sending createBasicFunctionsFor:
to the FunctionExpression class
.
If you ever need to add functions not supported by default, you can extend
the functions in FunctionExpression class >> createBasicFunctionsFor:
or by adding them to your current platform. There you can also read the list of available function selectors and their default mapping to SQL functions.
Glorp class model diagram
The following diagram provides an outline of the most important classes and attributes of Glorp's object model.
Appendix A: Basic Relational Databases Concepts
Tables
A table is a collection of related data held in a structured format within a database. It consists of fields (columns), and rows.
In relational databases and flat file databases, a table is a set of data elements (values) using a model of vertical columns (identifiable by name) and horizontal rows, the cell being the unit where a row and column intersect.
A table has a specified number of columns, but can have any number of rows. Each row is identified by one or more values appearing in a particular column subset. The column subset that uniquely identifies a row is called the primary key.
Rows
In the context of a relational database, a row —also called a record or tuple— represents a single, implicitly structured data item in a table. In simple terms, a database table can be thought of as consisting of rows and columns or fields. Each row in a table represents a set of related data, and every row in the table has the same structure.
Columns
For example, in a table that represents companies, each row would represent a single company. Columns might represent things like company name, company street address, whether the company is publicly held, its VAT number, etc. In a table that represents the association of employees with departments, each row would associate one employee with one department.
Constraints
Constraints make it possible to further restrict the domain of an attribute. For instance, a constraint can restrict a given integer attribute to values between 1 and 10. Constraints provide one method of implementing business rules in the database. SQL implements constraint functionality in the form of check constraints.
Constraints restrict the data that can be stored in relations. These are usually defined using expressions that result in a boolean value, indicating whether or not the data satisfies the constraint. Constraints can apply to single attributes, to a tuple (restricting combinations of attributes) or to an entire relation.
Since every attribute has an associated domain, there are constraints (domain constraints). The two principal rules for the relational model are known as entity integrity and referential integrity.
Primary Key
A primary key uniquely specifies a tuple within a table. In order for an attribute to be a good primary key it must not repeat. While natural attributes (attributes used to describe the data being entered) are sometimes good primary keys, surrogate keys are often used instead.
A surrogate key is an artificial attribute assigned to an object which uniquely identifies it (for instance, in a table of information about students at a school they might all be assigned a student ID in order to differentiate them).
The surrogate key has no intrinsic (inherent) meaning, but rather is useful through its ability to uniquely identify a tuple. Another common occurrence, especially in regard to N:M cardinality, is the composite key: a key made up of two or more attributes within a table that together uniquely identify a record. For example, in a database relating students, teachers, and classes, a class could be uniquely identified by a composite key of its room number and time slot, since no other class could have exactly the same combination of attributes. In fact, the use of such a composite key can be a form of data verification, albeit a weak one.
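Both flavors of key can be sketched in SQL as follows (illustrative names only): a surrogate key for students, and a composite key for classes:

    -- Surrogate key: an artificial student ID with no inherent meaning.
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       VARCHAR(100) NOT NULL
    );

    -- Composite key: room number and time slot together identify a class.
    CREATE TABLE class (
        room_number INTEGER,
        time_slot   INTEGER,
        subject     VARCHAR(100),
        PRIMARY KEY (room_number, time_slot)
    );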
Foreign Keys
A foreign key is a field in a relational table that matches the primary key column of another table. The foreign key can be used to cross-reference tables. Foreign keys do not need to have unique values in the referencing relation. Foreign keys effectively use the values of attributes in the referenced relation to restrict the domain of one or more attributes in the referencing relation.
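For instance, in the employees-and-departments example used throughout this appendix, a foreign key could be declared like this (a sketch; the table and column names are invented):

    CREATE TABLE department (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );

    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,
        name          VARCHAR(100) NOT NULL,
        -- Foreign key: must match a primary key value in department,
        -- but need not be unique among employees.
        department_id INTEGER REFERENCES department (id)
    );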
Indexes
An index is one way of providing quicker access to data. Indices can be created on any combination of attributes of a relation. Queries that filter using those attributes can find matching tuples directly using the index, without having to check each tuple in turn. This is analogous to using the index of a book to go directly to the page on which the information you are looking for is found, so that you do not have to read the entire book to find what you are looking for.
Relational databases typically supply multiple indexing techniques, each of which is optimal for some combination of data distribution, relation size, and typical access pattern.
Indices are usually not considered part of the logical database model, as they are an implementation detail, though they are usually maintained by the same group that maintains the other parts of the database. Note that efficient indexes on both primary and foreign keys can dramatically improve query performance.
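Continuing the sketch above, an index that speeds up lookups of employees by department could be created as follows (the index name is invented):

    -- Lets queries filtering on department_id locate rows via the index
    -- instead of scanning the whole employee table.
    CREATE INDEX idx_employee_department ON employee (department_id);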
Relations
Joins
A SQL join clause combines records from two or more tables in a relational database. A JOIN is a means for combining fields from two (or more) tables by using values common to each table.
ANSI-standard SQL specifies five types of JOIN: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS. As a special case, a table (base table, view, or joined table) can JOIN to itself in a self-join.
INNER JOINs
An inner join requires each record in the two joined tables to have matching records; it is a commonly used join operation in applications, but it should not be assumed to be the best choice in all situations. An inner join creates a new result table by combining the column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows which satisfy the join-predicate. When the join-predicate is satisfied by matching non-NULL values, the column values for each matched pair of rows of A and B are combined into a result row.
The result of the join can be defined as the outcome of first taking the Cartesian product (or Cross join) of all records in the tables (combining every record in table A with every record in table B) and then returning all records which satisfy the join predicate. Actual SQL implementations normally use other approaches, such as hash joins or sort-merge joins, since computing the Cartesian product is slower and would often require a prohibitively large memory space to store.
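Using the employee and department sketch from above, an inner join pairing each employee with its department could look like this:

    -- Only employees whose department_id matches an existing
    -- department id appear in the result.
    SELECT employee.name, department.name
    FROM employee
    INNER JOIN department ON employee.department_id = department.id;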
OUTER JOINs
An outer join does not require each record in the two joined tables to have a matching record: the joined table retains each record even if no other matching record exists. Outer joins subdivide further into left outer joins, right outer joins, and full outer joins, depending on which table's rows are retained (left, right, or both).
The result of a left outer join (or simply left join) for tables A and B always contains all records of the "left" table (A), even if the join-condition does not find any matching record in the "right" table (B). This means that if the ON clause matches 0 (zero) records in B (for a given record in A), the join will still return a row in the result (for that record)—but with NULL in each column from B. A left outer join returns all the values from an inner join plus all values in the left table that do not match to the right table, including rows with NULL (empty) values in the link field.
For example, this allows us to find an employee's department, but still shows employees that have not been assigned to a department (contrary to the inner-join example above, where unassigned employees were excluded from the result).
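The same query as in the inner-join sketch, rewritten as a left outer join, keeps employees without a department and shows NULL for their department name:

    -- Employees with no matching department are still returned,
    -- with NULL in the department.name column.
    SELECT employee.name, department.name
    FROM employee
    LEFT OUTER JOIN department ON employee.department_id = department.id;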
Data manipulation language (DML) queries
SELECT
A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used data manipulation language (DML) command. As SQL is a declarative programming language, SELECT queries specify a result set, but do not specify how to compute it. The database translates the query into a "query plan", which may vary between executions, database versions, and database software. The component responsible for finding the best possible execution plan for the query, within applicable constraints, is called the "query optimizer".
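For example, reusing the company table sketched earlier, the following query retrieves the names of all publicly held companies, ordered alphabetically:

    SELECT name
    FROM company
    WHERE is_public = TRUE
    ORDER BY name;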
INSERT
An SQL INSERT statement adds one or more records to a single table in a relational database. The number of columns and values must be the same. If a column is not specified, the default value for the column is used. The values specified (or implied) by the INSERT statement must satisfy all the applicable constraints (such as primary keys, CHECK constraints, and NOT NULL constraints). If a syntax error occurs or if any constraint is violated, the new row is not added to the table and an error is returned instead.
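Again using the company sketch, this statement adds a single row; the is_public column is omitted, so its default value (NULL, unless a default was declared) is used:

    INSERT INTO company (id, name)
    VALUES (1, 'Acme Corp');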
UPDATE
An SQL UPDATE statement changes the data of one or more records in a table. Either all the rows can be updated, or a subset may be chosen using a condition.
In some databases, such as PostgreSQL, when a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the FROM list, and each output row of the join represents an update operation for the target table. When using FROM, one should ensure that the join produces at most one output row for each row to be modified. In other words, a target row should not join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.
Because of this indeterminacy, referencing other tables only within sub-selects is safer, though often harder to read and slower than using a join.
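A simple conditional update on the company sketch looks like this; only rows matching the WHERE condition are changed:

    UPDATE company
    SET name = 'Acme Corporation'
    WHERE id = 1;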
DELETE
In the structured query language (SQL), the DELETE statement removes one or more records from a table. Any rows that match the WHERE condition are removed from the table; if the WHERE clause is omitted, all rows in the table are removed. The DELETE statement should therefore be used with caution.
The DELETE statement does not return any rows; that is, it will not generate a result set.
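For example, this statement removes only the company with id 1 from the sketch table; without the WHERE clause it would empty the table:

    DELETE FROM company
    WHERE id = 1;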