Schemaless storage via a variant
As we just wrote, a first-class candidate for data sharding in a TSQLRecord is our TDocVariant custom variant type.
You may define:
TSQLRecordData = class(TSQLRecord)
private
  fName: RawUTF8;
  fData: variant;
published
  property Name: RawUTF8 read fName write fName stored AS_UNIQUE;
  property Data: variant read fData write fData;
end;
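For reference, such a class would then be registered in a TSQLModel and served e.g. from a local SQLite3 database. Here is a minimal sketch, assuming the standard mORMot units and an in-memory database (your actual server setup will likely differ):

```pascal
uses
  SynCommons, mORMot, mORMotSQLite3, SynSQLite3Static;

var
  aModel: TSQLModel;
  aServer: TSQLRestServerDB;
begin
  // register our schema-less class in the ORM model
  aModel := TSQLModel.Create([TSQLRecordData]);
  // create a REST server over an in-memory SQLite3 database
  aServer := TSQLRestServerDB.Create(aModel, SQLITE_MEMORY_DATABASE_NAME);
  try
    // will create the Data table, including the unique index on Name
    aServer.CreateMissingTables;
    // ... use aServer (or a remote client) to add/retrieve records
  finally
    aServer.Free;
    aModel.Free;
  end;
end;
```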
Here, we defined two indexed keys, ready to access any data record:
- Via the ID: TID property defined at TSQLRecord level, which will map the SQLite3 RowID primary key;
- Via the Name: RawUTF8 property, which was marked to be indexed by setting the "stored AS_UNIQUE" attribute.
Then, any kind of data may be stored in the Data: variant published property. In the database, it will be stored as JSON UTF-8 text, ready to be retrieved from any client, including AJAX / HTML5 applications. Delphi clients or servers will access those data via late-binding, from their TDocVariant instance.
You just reproduced the schema-less approach of the NoSQL database engines, in a few lines of code! Thanks to the mORMot's JSON RESTful Client-Server design, your applications are able to store any kind of document, and easily access them via HTTP.
The documents stored in such a database can have varying sets of fields,
with different types for each field. One could have the following objects in a single collection of our Data: variant rows:
{ name : "Joe", x : 3.3, y : [1,2,3] }
{ name : "Kate", x : "abc" }
{ q : 456 }
Of course, when using the database for real problems, the data does have a fairly consistent structure. Something like the following would be more common, e.g. for a table persisting student objects:
{ name : "Joe", age : 30, interests : "football" }
{ name : "Kate", age : 25 }
Generally, there is a direct analogy between this schema-less style and dynamically typed languages. Constructs such as those above are easy to represent in PHP, Python and Ruby. And, thanks to our TDocVariant late-binding magic, even our good Delphi is able to handle those structures in our code. What we are trying to do here is make this mapping to the database natural, like:
var
  aRec: TSQLRecordData;
  aID: TID;
begin
  // initialization of one record
  aRec := TSQLRecordData.Create;
  aRec.Name := 'Joe';                            // one unique key
  aRec.Data := _JSONFast('{name:"Joe",age:30}'); // create a TDocVariant
  // or we can use this overloaded constructor for simple fields
  aRec := TSQLRecordData.Create(['Joe',_ObjFast(['name','Joe','age',30])]);
  // now we can play with the data, e.g. via late-binding:
  writeln(aRec.Name); // will write 'Joe'
  writeln(aRec.Data); // will write '{"name":"Joe","age":30}' (auto-converted to JSON string)
  aRec.Data.age := aRec.Data.age+1;  // one year older
  aRec.Data.interests := 'football'; // add a property to the schema
  aID := aClient.Add(aRec,true); // will store {"name":"Joe","age":31,"interests":"football"}
  aRec.Free;
  // now we can retrieve the data either via the created aID integer, or via Name='Joe'
end;
One of the great benefits of these dynamic objects is that schema migrations
become very easy.
With a traditional RDBMS, releases of code might contain data migration scripts. Further, each release should have a reverse migration script in case a rollback is necessary. ALTER TABLE operations can be very slow and result in scheduled downtime.
With a schema-less organization of the data, 90% of the time adjustments to the database become transparent and automatic.
For example, if we wish to add GPA to the student objects, we add the attribute, re-save, and all is well - if we look up an existing student and reference GPA, we just get back null.
Furthermore, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.
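The GPA scenario above could be written as follows - a sketch, reusing the aClient instance and the TSQLRecordData class defined in the previous example:

```pascal
var
  aRec: TSQLRecordData;
begin
  // retrieve an existing student by its unique Name key
  aRec := TSQLRecordData.Create(aClient, 'Name=?', ['Joe']);
  try
    writeln(aRec.Data.gpa); // a missing property just returns null - no error
    aRec.Data.gpa := 3.5;   // add the new attribute via late-binding
    aClient.Update(aRec);   // re-save: no ALTER TABLE, no migration script
  finally
    aRec.Free;
  end;
end;
```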
In fact, SQLite3 is so efficient about its B-TREE index storage that such a structure may be used as a credible alternative to much heavier NoSQL engines, like MongoDB or CouchDB.
With the possibility to add some "regular" fields, e.g. plain numbers (like ahead-computed aggregation values), or text (like a summary or description field), you can still use any needed fast SQL query, without the complexity of the map/reduce algorithm used by the NoSQL paradigm. You could even use the Full Text Search - see FTS3/FTS4 - or RTREE extension, advanced features of SQLite3, to perform your queries.
Then, thanks to mORMot's ability to access any external database engine, you are able to perform a JOINed query of your schema-less data with some data stored e.g. in an Oracle, PostgreSQL or MS SQL enterprise database. Or switch later to a true MongoDB storage, in just one line of code - see MongoDB + ORM = ODM.
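That one line of code may look like the following - a sketch only, assuming a running MongoDB instance on localhost and the mORMotMongoDB unit (see the referenced chapter for the actual setup):

```pascal
uses
  SynMongoDB, mORMotMongoDB;
// ...
// redirect TSQLRecordData persistence from SQLite3 to a MongoDB collection
StaticMongoDBRegister(TSQLRecordData, aServer,
  TMongoClient.Create('localhost', 27017).Open('test'));
```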
JSON operations from SQL code
As we stated, any variant field will be serialized as JSON, then stored as plain TEXT in the database.
In order to make a complex query on the stored JSON, you could retrieve it in your end-user code, then use the corresponding TDocVariant instance to perform the search on its content.
Of course, all this has a noticeable performance cost, especially when the data tend to grow.
The natural way of solving those performance issues is to add some "regular" RDBMS fields, with a proper index, then perform the requests on those fields.
But sometimes, you may need to do some additional query, perhaps in conjunction with "regular" field lookup, on the stored JSON data itself.
In order to avoid the slow conversion on the ORM client side, we defined some SQL functions, dedicated to JSON processing.
The first is JsonGet(), which is able to extract any value from the TEXT field, mapping a variant:
JsonGet(ArrColumn,0) | returns a property value by index, from a JSON array
JsonGet(ObjColumn,'PropName') | returns a property value by name, from a JSON object
JsonGet(ObjColumn,'Obj1.Obj2.Prop') | returns a property value by path, including nested JSON objects
JsonGet(ObjColumn,'Prop1,Prop2') | extracts properties by name, from a JSON object
JsonGet(ObjColumn,'Prop1,Obj1.Prop') | extracts properties by name (including nested JSON objects), from a JSON object
JsonGet(ObjColumn,'Prop*') | extracts properties by wildchar name, from a JSON object
JsonGet(ObjColumn,'Prop*,Obj1.P*') | extracts properties by wildchar name (including nested JSON objects), from a JSON object
If no value matches, this function will return SQL NULL. If the matching value is a simple JSON text or number, it will be returned as a TEXT, INTEGER or DOUBLE value, ready to be used as a result column or in any WHERE clause. If the returned value is a nested JSON object or array, it will be returned as TEXT, serialized as JSON; as a consequence, you may use it as the source of another JsonGet() function, or even gather the results via the CONCAT() aggregate function.
The comma-separated syntax allowed in the property name parameter (e.g. 'Prop1,Prop2,Prop3') will search for several properties at once in a single object, returning a JSON object of all matching values - e.g. '{"Prop2":"Value2","Prop3":123}' if the Prop1 property did not appear in the stored JSON object.
If you end the property name with a * character, it will return a JSON object with all matching properties. Any nested object will have its property names flattened as {"Obj1.Prop":...} within the returned JSON object.
Note that the comma-separated syntax also allows such wildchar search, so that e.g.
JsonGet(ObjColumn,'owner') = {"login":"smith","id":123456} as TEXT
JsonGet(ObjColumn,'owner.login') = "smith" as TEXT
JsonGet(ObjColumn,'owner.id') = 123456 as INTEGER
JsonGet(ObjColumn,'owner.name') = NULL
JsonGet(ObjColumn,'owner.login,owner.id') = {"owner.login":"smith","owner.id":123456} as TEXT
JsonGet(ObjColumn,'owner.I*') = {"owner.id":123456} as TEXT
JsonGet(ObjColumn,'owner.*') = {"owner.login":"smith","owner.id":123456} as TEXT
JsonGet(ObjColumn,'unknown.*') = NULL
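In practice, such a function may appear in a plain SELECT statement. For instance, a hypothetical query mixing an indexed regular field with the JSON content of our Data column (the Data table name being derived from the TSQLRecordData class defined above) could be:

```sql
SELECT ID, Name
  FROM Data
 WHERE Name LIKE 'J%'               -- fast lookup on the indexed Name field
   AND JsonGet(Data, 'age') >= 30;  -- then filter on the stored JSON content
```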
Another function, named JsonHas(), is similar to JsonGet(), but will return TRUE or FALSE depending on whether the supplied property (specified by name or index) does exist. It may be faster to use JsonHas() than JsonGet(), e.g. in a WHERE clause, when you do not want to process this property value, but only return data rows containing the needed information.
JsonHas(ObjColumn,'owner') = true
JsonHas(ObjColumn,'owner.login') = true
JsonHas(ObjColumn,'owner.name') = false
JsonHas(ObjColumn,'owner.i*') = true
JsonHas(ObjColumn,'owner.n*') = false
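For instance, selecting only the rows which define a given property could be written as follows (a hypothetical query on our Data table):

```sql
SELECT ID, Name
  FROM Data
 WHERE JsonHas(Data, 'interests');  -- faster than JsonGet(): the value itself is not extracted
```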
Since the process takes place within the SQLite3 engine itself, and since it uses a SAX-like fast approach (without any temporary memory allocation during the search), those JSON functions can be pretty efficient, and proudly compare to some dedicated NoSQL engines.
The upcoming official SQlite3 extension
As you may have noticed in the SQlite3 timeline, some extension functions for processing JSON are currently being implemented.
The mORMot native implementation is well proven, since it uses the JSON core of the framework. It is also able to handle the "extended" JSON syntax of MongoDB: field names can be serialized without double quotes, which considerably reduces the storage and transmission size when the JSON is stored within a TEXT column, which would otherwise escape all double quote characters.
We guess the performance of our implementation is probably faster than the current SQLite3 extension's. We use a SAX-like fast approach, without any temporary memory allocation during the search, whereas AFAIK the SQLite3 extension performs a lot of memory allocations, creating all nodes of the JSON document, like a DOM.
But here again, we are not exclusive guys. If you like this upcoming JSON extension, you are free to use it! No offense taken!
Feedback is welcome on our forum, as usual.