Schemaless storage via a variant

As we just wrote, a first-class candidate for schema-less data storage in a TSQLRecord is our TDocVariant custom variant type.

You may define:

 TSQLRecordData = class(TSQLRecord)
 private
   fName: RawUTF8;
   fData: variant;
 published
   property Name: RawUTF8 read fName write fName stored AS_UNIQUE;
   property Data: variant read fData write fData;
 end;

Here, we defined two indexed keys, ready to access any data record:

  • Via the ID: TID property defined at TSQLRecord level, which will map the SQLite3 RowID primary key;
  • Via the Name: RawUTF8 property, which was marked as indexed by setting the "stored AS_UNIQUE" attribute.

Then, any kind of data may be stored in the Data: variant published property. In the database, it will be stored as JSON UTF-8 text, ready to be retrieved from any client, including AJAX / HTML5 applications.
Delphi clients or servers will access this data via late-binding, thanks to its TDocVariant instance.

You just reproduced the schema-less approach of the NoSQL database engines, in a few lines of code! Thanks to mORMot's JSON RESTful Client-Server design, your applications are able to store any kind of document, and easily access them via HTTP.
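
For reference, registering such a table on the server side is no different from any other TSQLRecord. Here is a minimal sketch of a stand-alone SQLite3-backed server, assuming the mORMot, mORMotSQLite3 and SynSQLite3Static units are in scope, and 'data.db3' as an arbitrary database file name:

var aModel: TSQLModel;
    aServer: TSQLRestServerDB;
begin
  // register the TSQLRecordData table in the ORM model
  aModel := TSQLModel.Create([TSQLRecordData]);
  try
    // create a REST server persisting into a local SQLite3 database file
    aServer := TSQLRestServerDB.Create(aModel,'data.db3');
    try
      aServer.CreateMissingTables; // create the Data table, if not existing
      // ... use aServer directly, or publish it e.g. via TSQLHttpServer
    finally
      aServer.Free;
    end;
  finally
    aModel.Free;
  end;
end;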

The documents stored in such a database can have varying sets of fields, with different types for each field. One could have the following objects in a single collection of our Data: variant rows:

 { name : "Joe", x : 3.3, y : [1,2,3] }
 { name : "Kate", x : "abc" }
 { q : 456 }

Of course, when using the database for real problems, the data does have a fairly consistent structure. Something like the following would be more common, e.g. for a table persisting student objects:

 { name : "Joe", age : 30, interests : "football" }
 { name : "Kate", age : 25 }

Generally, there is a direct analogy between this schema-less style and dynamically typed languages. Constructs such as those above are easy to represent in PHP, Python and Ruby. And, thanks to our TDocVariant late-binding magic, even our good Delphi is able to handle those structures in our code. What we are trying to do here is make this mapping to the database natural, like:

var aRec: TSQLRecordData;
    aID: TID;
begin
  // initialization of one record
  aRec := TSQLRecordData.Create;
  aRec.Name := 'Joe';                              // one unique key
  aRec.Data := _JSONFast('{name:"Joe",age:30}');   // create a TDocVariant
  // or we can use this overloaded constructor for simple fields
  aRec.Free; // release the first instance before creating an equivalent one
  aRec := TSQLRecordData.Create(['Joe',_ObjFast(['name','Joe','age',30])]);
  // now we can play with the data, e.g. via late-binding:
  writeln(aRec.Name);     // will write 'Joe'
  writeln(aRec.Data);     // will write '{"name":"Joe","age":30}' (auto-converted to JSON string)
  aRec.Data.age := aRec.Data.age+1;    // one year older
  aRec.Data.interests := 'football';   // add a property to the schema
  aID := aClient.Add(aRec,true);       // will store {"name":"Joe","age":31,"interests":"football"}
  aRec.Free;
  // now we can retrieve the data either via the returned aID, or via the Name='Joe' unique key
end;
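
Retrieving the stored document later takes a single constructor call, either via the ID primary key or via the unique Name field. A minimal sketch, assuming the aClient connection and the aID value from the code above, and using the TSQLRecord constructor overloads which accept an ID or a WHERE clause:

var aRec: TSQLRecordData;
begin
  // retrieve by the ID primary key
  aRec := TSQLRecordData.Create(aClient,aID);
  try
    writeln(aRec.Data.age);        // will write 31, via late-binding
  finally
    aRec.Free;
  end;
  // or retrieve by the indexed Name field
  aRec := TSQLRecordData.Create(aClient,'Name=?',['Joe']);
  try
    writeln(aRec.Data.interests);  // will write 'football'
  finally
    aRec.Free;
  end;
end;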

One of the great benefits of these dynamic objects is that schema migrations become very easy.
With a traditional RDBMS, releases of code might contain data migration scripts. Further, each release should have a reverse migration script in case a rollback is necessary.
ALTER TABLE operations can be very slow and result in scheduled downtime.

With a schema-less organization of the data, 90% of the time adjustments to the database become transparent and automatic.
For example, if we wish to add GPA to the student objects, we add the attribute, re-save, and all is well - if we look up an existing student and reference GPA, we just get back null.
Furthermore, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.
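
Translated into our Delphi code, adding such a GPA value is just a late-bound assignment followed by an update. A minimal sketch, assuming an existing aClient connection, a record identified by aID, and gpa as a purely illustrative attribute name:

var aRec: TSQLRecordData;
begin
  aRec := TSQLRecordData.Create(aClient,aID);
  try
    aRec.Data.gpa := 3.5;    // add the new attribute to this document
    aClient.Update(aRec);    // re-save: no ALTER TABLE, no migration script
  finally
    aRec.Free;
  end;
  // existing records without a "gpa" member will simply return null
end;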

In fact, SQLite3 is so efficient with its B-TREE index storage that such a structure may be used as a credible alternative to much heavier NoSQL engines, like MongoDB or CouchDB.
With the possibility to add some "regular" fields, e.g. plain numbers (like ahead-computed aggregation values) or text (like a summary or description field), you can still use any fast SQL query you need, without the complexity of the map/reduce algorithms used by the NoSQL paradigm. You could even use the advanced Full Text Search - see FTS3/FTS4 - or RTREE extension features of SQLite3 to perform your queries.
Then, thanks to mORMot's ability to access any external database engine, you are able to perform a JOINed query of your schema-less data with some data stored e.g. in an Oracle, PostgreSQL or MS SQL enterprise database. Or switch later to a true MongoDB storage, in just one line of code - see MongoDB + ORM = ODM.
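
For instance, redirecting the TSQLRecordData storage to a MongoDB collection could be sketched as follows on the server side (assuming the SynMongoDB and mORMotMongoDB units, a MongoDB instance running on localhost, the aServer instance from the earlier sketch, and 'mydb' / 'Data' as arbitrary database and collection names):

var aMongoClient: TMongoClient;
begin
  aMongoClient := TMongoClient.Create('localhost',27017);
  // redirect TSQLRecordData persistence to MongoDB, before CreateMissingTables:
  // all other tables will stay in their regular SQLite3 storage
  if StaticMongoDBRegister(TSQLRecordData,aServer,
      aMongoClient.Database['mydb'],'Data') = nil then
    raise Exception.Create('StaticMongoDBRegister failed');
  aServer.CreateMissingTables;
end;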

JSON operations from SQL code

As we stated, any variant field would be serialized as JSON, then stored as plain TEXT in the database.
In order to make a complex query on the stored JSON, you could retrieve it in your end-user code, then use the corresponding TDocVariant instance to perform the search on its content.
Of course, all this has a noticeable performance cost, especially when the data tend to grow.

The natural way of solving those performance issues is to add some "regular" RDBMS fields, with a proper index, then perform the requests on those fields.
But sometimes, you may need to run some additional queries on the stored JSON data itself, perhaps in conjunction with "regular" field lookups.
In order to avoid a slow conversion on the ORM client side, we defined some SQL functions dedicated to JSON processing.

The first is JsonGet(), which is able to extract any value from the TEXT field mapping a variant:

JsonGet(ArrColumn,0) returns a property value by index, from a JSON array
JsonGet(ObjColumn,'PropName') returns a property value by name, from a JSON object
JsonGet(ObjColumn,'Obj1.Obj2.Prop') returns a property value by path, including nested JSON objects
JsonGet(ObjColumn,'Prop1,Prop2') extracts properties by name, from a JSON object
JsonGet(ObjColumn,'Prop1,Obj1.Prop') extracts properties by name (including nested JSON objects), from a JSON object
JsonGet(ObjColumn,'Prop*') extracts properties by wildchar name, from a JSON object
JsonGet(ObjColumn,'Prop*,Obj1.P*') extracts properties by wildchar name (including nested JSON objects), from a JSON object

If no value matches, this function returns SQL NULL. If the matching value is a simple JSON text or number, it will be returned as a TEXT, INTEGER or DOUBLE value, ready to be used as a result column or in any WHERE clause. If the returned value is a nested JSON object or array, it will be returned as TEXT, serialized as JSON; as a consequence, you may use it as the source of another JsonGet() function, or even gather the results via the CONCAT() aggregate function.

The comma-separated syntax allowed in the property name parameter (e.g. 'Prop1,Prop2,Prop3') searches for several properties at once in a single object, returning a JSON object of all matching values - e.g. '{"Prop2":"Value2","Prop3":123}' if the Prop1 property did not appear in the stored JSON object.

If you end the property name with a * character, it will return a JSON object with all matching properties.
Any nested object will have its property names flattened as {"Obj1.Prop":...} within the returned JSON object.
Note that the comma-separated syntax also allows such wildchar search, so that e.g.

 JsonGet(ObjColumn,'owner') = {"login":"smith","id":123456} as TEXT
 JsonGet(ObjColumn,'owner.login') = "smith" as TEXT
 JsonGet(ObjColumn,'owner.id') = 123456 as INTEGER
 JsonGet(ObjColumn,'owner.name') = NULL
 JsonGet(ObjColumn,'owner.login,owner.id') = {"owner.login":"smith","owner.id":123456} as TEXT
 JsonGet(ObjColumn,'owner.I*') = {"owner.id":123456} as TEXT
 JsonGet(ObjColumn,'owner.*') = {"owner.login":"smith","owner.id":123456} as TEXT
 JsonGet(ObjColumn,'unknown.*') = NULL

Another function, named JsonHas(), is similar to JsonGet(), but will return TRUE or FALSE depending on whether the supplied property (specified by name or index) exists. It may be faster to use JsonHas() than JsonGet() e.g. in a WHERE clause, when you do not want to process the property value, but only return the data rows containing the needed information.

 JsonHas(ObjColumn,'owner') = true
 JsonHas(ObjColumn,'owner.login') = true
 JsonHas(ObjColumn,'owner.name') = false
 JsonHas(ObjColumn,'owner.i*') = true
 JsonHas(ObjColumn,'owner.n*') = false
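
From the Delphi side, those functions can be used directly in the WHERE clause of a regular ORM query. A minimal sketch, assuming the TSQLRecordData class defined above, an aClient connection, and Data content shaped like the ObjColumn examples (i.e. containing a nested owner object):

var aRec: TSQLRecordData;
begin
  // return only the rows whose Data JSON contains an owner.login property
  aRec := TSQLRecordData.CreateAndFillPrepare(aClient,
    'JsonHas(Data,?)',['owner.login']);
  try
    while aRec.FillOne do
      writeln(aRec.Name,': ',aRec.Data.owner.login);  // late-binding access
  finally
    aRec.Free;
  end;
end;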

Since the process takes place within the SQLite3 engine itself, and since those functions use a fast SAX-like approach (without any temporary memory allocation during the search), they can be pretty efficient, and proudly compare to some dedicated NoSQL engines.

The upcoming official SQLite3 extension

As you may have noticed in the SQLite3 timeline, some extension functions for processing JSON are currently being implemented.

The mORMot native implementation is well proven, since it uses the JSON core of the framework, and is able to handle the "extended" JSON syntax of MongoDB: field names can be serialized without double quotes, which noticeably reduces both the storage size and the transmission size - especially when the JSON is itself stored within a TEXT column, where every double quote character would have to be escaped.

I guess our implementation is probably faster than the current SQLite3 extension.
We use a fast SAX-like approach, without any temporary memory allocation during the search, whereas AFAIK the SQLite3 extension performs a lot of memory allocations, creating all the nodes of the JSON document, like a DOM.

But here again, we are not exclusive guys. If you like this upcoming JSON extension, you are free to use it! No offense taken!

Feedback is welcome on our forum, as usual.