MongoDB client

The SynMongoDB.pas unit features direct optimized access to a MongoDB server.

It gives access to any BSON data, including documents, arrays, and MongoDB's custom types (like ObjectID, dates, binary, regex or Javascript):

  • For instance, a TBSONObjectID can be used to create some genuine document identifiers on the client side (MongoDB does not generate the IDs for you: the common practice is to compute unique identifiers on the client side) - see the small sketch after this list;
  • Generation of BSON content from any Delphi types (via TBSONWriter);
  • Fast in-place parsing of the BSON stream, without any memory allocation (via TBSONElement);
  • A TBSONVariant custom variant type, to store MongoDB's custom type values;
  • Interaction with the SynCommons' TDocVariant custom variant type as document storage and late-binding access;
  • Marshalling BSON to and from JSON, with the MongoDB extended syntax for handling its custom types.
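
As a small sketch of these abilities - relying on the ObjectID() and _Obj() helpers used later in this article, plus the VariantSaveJSON() function from SynCommons - client-side identifier generation and JSON marshalling may look like this:

var oid, doc: variant;
    json: RawUTF8;
begin
  oid := ObjectID;                        // genuine identifier computed on the client side
  doc := _Obj(['_id',oid,'Name','John']); // TDocVariant document storage
  json := VariantSaveJSON(doc);           // custom types use the MongoDB extended syntax
  writeln(json);
end;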

This unit defines some objects able to connect to, and manage, databases and collections of documents on any MongoDB server farm:

  • Connection to one or several servers, including secondary hosts, via the TMongoClient class;
  • Access to any database instance, via the TMongoDatabase class;
  • Access to any collection, via the TMongoCollection class;
  • Some speed-oriented features, like BULK insert or delete mode, and explicit Write Concern settings.

At collection level, you have direct access to the data, either with high-level structures like TDocVariant/TBSONVariant and easy-to-read JSON, or as low-level BSON content.
You can also tune most aspects of the client process, e.g. error handling or write concerns (i.e. how remote data modifications are acknowledged).

Connecting to a server

Here is some sample code, which connects to a MongoDB server and returns the server time:

var Client: TMongoClient;
    DB: TMongoDatabase;
    serverTime: TDateTime;
    res: variant; // we will return the command result as TDocVariant
    errmsg: RawUTF8;
begin
  Client := TMongoClient.Create('localhost',27017);
  try
    DB := Client.Database['mydb'];
    writeln('Connecting to ',DB.Name); // will write 'mydb'
    errmsg := DB.RunCommand('hostInfo',res); // run a command
    if errmsg<>'' then
      exit; // quit on any error
    serverTime := res.system.currentTime; // direct conversion to TDateTime
    writeln('Server time is ',DateTimeToStr(serverTime));
  finally
    Client.Free; // will release the DB instance
  end;
end;

Note that for this low-level command, we used a TDocVariant, and its late-binding abilities.

In fact, if you put your mouse over the res variable during debugging, you will see the following JSON content:

{"system":{"currentTime":"2014-05-06T15:24:25","hostname":"Acer","cpuAddrSize":64,"memSizeMB":3934,"numCores":4,"cpuArch":"x86_64","numaEnabled":false},"os":{"type":"Windows","name":"Microsoft Windows 7","version":"6.1 SP1 (build 7601)"},"extra":{"pageSize":4096},"ok":1}

And we simply access the server time by writing res.system.currentTime.
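
Any other nested member of the result can be reached the same way; for instance, a small sketch based on the JSON content above:

  writeln('Host: ',res.system.hostname); // e.g. 'Acer'
  writeln('OS:   ',res.os.name);         // e.g. 'Microsoft Windows 7'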

Adding some documents to the collection

We will now explain how to add documents to a given collection.

We assume that a DB: TMongoDatabase instance is available. We will then create the documents with a TDocVariant instance, filled via late-binding, and use its doc.Clear pseudo-method to flush any previous property values:

var Coll: TMongoCollection;
    doc: variant;
    i: integer;
begin
  Coll := DB.CollectionOrCreate[COLL_NAME];
  TDocVariant.New(doc);
  for i := 1 to 10 do
  begin
    doc.Clear;
    doc.Name := 'Name '+IntToStr(i+1);
    doc.Number := i;
    Coll.Save(doc);
    writeln('Inserted with _id=',doc._id);
  end;
end;

Thanks to TDocVariant late-binding abilities, code is pretty easy to understand and maintain.

This code will display the following on the console:

Inserted with _id=5369029E4F901EE8114799D9
Inserted with _id=5369029E4F901EE8114799DA
Inserted with _id=5369029E4F901EE8114799DB
Inserted with _id=5369029E4F901EE8114799DC
Inserted with _id=5369029E4F901EE8114799DD
Inserted with _id=5369029E4F901EE8114799DE
Inserted with _id=5369029E4F901EE8114799DF
Inserted with _id=5369029E4F901EE8114799E0
Inserted with _id=5369029E4F901EE8114799E1
Inserted with _id=5369029E4F901EE8114799E2

This means that the Coll.Save() method was clever enough to detect that the supplied document did not have any _id field, so it computed one on the client side before sending the document data to the MongoDB server.

We may have written:

  for i := 1 to 10 do
  begin
    doc.Clear;
    doc._id := ObjectID;
    doc.Name := 'Name '+IntToStr(i+1);
    doc.Number := i;
    Coll.Save(doc);
    writeln('Inserted with _id=',doc._id);
  end;

This will compute the document identifier explicitly before calling Coll.Save().
In this case, we could have called Coll.Insert() directly, which is somewhat faster.

Note that you are not forced to use a MongoDB ObjectID as identifier: you can use any value, provided you are sure that it will be unique. For instance, you can use an integer:

  for i := 1 to 10 do
  begin
    doc.Clear;
    doc._id := i;
    doc.Name := 'Name '+IntToStr(i+1);
    doc.Number := i;
    Coll.Insert(doc);
    writeln('Inserted with _id=',doc._id);
  end;

The console will now display:

Inserted with _id=1
Inserted with _id=2
Inserted with _id=3
Inserted with _id=4
Inserted with _id=5
Inserted with _id=6
Inserted with _id=7
Inserted with _id=8
Inserted with _id=9
Inserted with _id=10

Note that the mORMot ORM will compute a unique series of integers in a similar way, which will be used as the TSQLRecord.ID primary key property.

The TMongoCollection class can also write a list of documents, and send them at once to the MongoDB server. This BULK insert mode - close to the Array Binding feature of some SQL providers, and implemented in our SynDB classes (see BATCH sequences for adding/updating/deleting records) - can increase insertion speed by a factor of 10, even when connected to a local instance: imagine how much time it may save over a physical network!

For instance, you may write:

var docs: TVariantDynArray;
...
  SetLength(docs,COLL_COUNT);
  for i := 0 to COLL_COUNT-1 do begin
    TDocVariant.New(docs[i]);
    docs[i]._id := ObjectID; // compute new ObjectID on the client side
    docs[i].Name := 'Name '+IntToStr(i+1);
    docs[i].FirstName := 'FirstName '+IntToStr(i+COLL_COUNT);
    docs[i].Number := i;
  end;
  Coll.Insert(docs); // insert all values at once
...

You will find some numbers about the speed increase brought by such BULK inserts later in this article.

Retrieving the documents

You can retrieve a document as a TDocVariant instance:

var doc: variant;
...
  doc := Coll.FindOne(5);
  writeln('Name: ',doc.Name);
  writeln('Number: ',doc.Number);

Which will write on the console:

Name: Name 6
Number: 5

You also have access to the full Query parameter, if needed:

  doc := Coll.FindDoc('{_id:?}',[5]);
  doc := Coll.FindOne(5); // same as previous

This Query filter is similar to a WHERE clause in SQL. You can write complex search patterns, if needed - see http://docs.mongodb.org/manual/reference/method/db.collection.find for reference.
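
For instance, a range query may be written as follows - a small sketch reusing the parametrized FindDoc() call shown above, with the standard MongoDB $gte/$lt operators:

  // all documents with 5 <= Number < 8, using MongoDB query operators
  doc := Coll.FindDoc('{Number:{$gte:?,$lt:?}}',[5,8]);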

You can retrieve a list of documents, as a dynamic array of TDocVariant:

var docs: TVariantDynArray;
...
  Coll.FindDocs(docs);
  for i := 0 to high(docs) do
    writeln('Name: ',docs[i].Name,'  Number: ',docs[i].Number);

Which will output:

Name: Name 2  Number: 1
Name: Name 3  Number: 2
Name: Name 4  Number: 3
Name: Name 5  Number: 4
Name: Name 6  Number: 5
Name: Name 7  Number: 6
Name: Name 8  Number: 7
Name: Name 9  Number: 8
Name: Name 10  Number: 9
Name: Name 11  Number: 10

If you want to retrieve the documents directly as JSON, you can write:

var json: RawUTF8;
...
  json := Coll.FindJSON(null,null);
  writeln(json);
...

This will write the following JSON to the console:

[{"_id":1,"Name":"Name 2","Number":1},{"_id":2,"Name":"Name 3","Number":2},{"_id
":3,"Name":"Name 4","Number":3},{"_id":4,"Name":"Name 5","Number":4},{"_id":5,"N
ame":"Name 6","Number":5},{"_id":6,"Name":"Name 7","Number":6},{"_id":7,"Name":"
Name 8","Number":7},{"_id":8,"Name":"Name 9","Number":8},{"_id":9,"Name":"Name 1
0","Number":9},{"_id":10,"Name":"Name 11","Number":10}]

You can note that FindJSON() has two parameters: the Query filter, and a Projection mapping (similar to the column names of a SELECT col1,col2).
So we may have written:

  json := Coll.FindJSON('{_id:?}',[5]);
  writeln(json);

Which would output:

[{"_id":5,"Name":"Name 6","Number":5}]

Note that here we used an overloaded FindJSON() method, which accepts the MongoDB extended syntax (here, the field name is unquoted) and parameters supplied as values.

We can specify a projection:

  json := Coll.FindJSON('{_id:?}',[5],'{Name:1}');
  writeln(json);

Which will only return the "Name" and "_id" fields (since _id is, by MongoDB convention, returned by default):

[{"_id":5,"Name":"Name 6"}]

To return only the "Name" field, you can specify '_id:0,Name:1' as extended JSON for the projection parameter.
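
For instance, a minimal sketch reusing the same overloaded FindJSON() call as above:

  json := Coll.FindJSON('{_id:?}',[5],'{_id:0,Name:1}');
  writeln(json);

Which would output: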

[{"Name":"Name 6"}]

Other methods are able to retrieve data directly as BSON binary content. They will be used for best speed, e.g. in conjunction with our ORM, but for most end-user code, using TDocVariant is safer and easier to maintain.

Updating or deleting documents

The TMongoCollection class has some methods dedicated to altering existing documents.

First, the Save() method can be used to update a document which has been retrieved beforehand:

  doc := Coll.FindOne(5);
  doc.Name := 'New!';
  Coll.Save(doc);
  writeln('Name: ',Coll.FindOne(5).Name);

Which will write:

Name: New!

Note that we used an integer value (5) as key here, but we may use an ObjectID instead, if needed.

The Coll.Save() method could be changed into Coll.Update(), which expects an explicit Query operator, in addition to the updated document content:

  doc := Coll.FindOne(5);
  doc.Name := 'New!';
  Coll.Update(BSONVariant(['_id',5]),doc);
  writeln('Name: ',Coll.FindOne(5).Name);

Note that, by MongoDB's design, any call to Update() with a plain document will replace the whole stored document.

For instance, if you write:

  writeln('Before: ',Coll.FindOne(3));
  Coll.Update('{_id:?}',[3],'{Name:?}',['New Name!']);
  writeln('After:  ',Coll.FindOne(3));

Then the Number field will disappear!

Before: {"_id":3,"Name":"Name 4","Number":3}
After:  {"_id":3,"Name":"New Name!"}

If you need to update only some fields, you will have to use the $set modifier:

  writeln('Before: ',Coll.FindOne(4));
  Coll.Update('{_id:?}',[4],'{$set:{Name:?}}',['New Name!']);
  writeln('After:  ',Coll.FindOne(4));

Which will write the expected values on the console:

Before: {"_id":4,"Name":"Name 5","Number":4}
After:  {"_id":4,"Name":"New Name!","Number":4}

Now the Number field remains untouched.

You can also use the Coll.UpdateOne() method, which will update the supplied fields, and leave the unspecified fields untouched:

  writeln('Before: ',Coll.FindOne(2));
  Coll.UpdateOne(2,_Obj(['Name','NEW']));
  writeln('After:  ',Coll.FindOne(2));

Which will output as expected:

Before: {"_id":2,"Name":"Name 3","Number":2}
After:  {"_id":2,"Name":"NEW","Number":2}

You can refer to the documentation of the SynMongoDB.pas unit, to find out all functions, classes and methods available to work with MongoDB.

Write Concern and Performance

You can take a look at the MongoDBTests.dpr sample - located in the SQLite3-MongoDB sub-folder of the source code repository - and its TTestDirect classes, to find out some performance information.

In fact, this TTestDirect class is inherited twice, to run the same tests with two different Write Concern settings.

The difference between the two classes lies in the client initialization:

procedure TTestDirect.ConnectToLocalServer;
...
  fClient := TMongoClient.Create('localhost',27017);
  if ClassType=TTestDirectWithAcknowledge then
    fClient.WriteConcern := wcAcknowledged else
  if ClassType=TTestDirectWithoutAcknowledge then
    fClient.WriteConcern := wcUnacknowledged;
...

wcAcknowledged is the default safe mode: the MongoDB server confirms the receipt of the write operation. Acknowledged write concern allows clients to catch network, duplicate key, and other errors. But it adds an additional round-trip from the client to the server, and waits for the command to be finished before returning the error status: so it will slow down the write process.

With wcUnacknowledged, MongoDB does not acknowledge the receipt of the write operation. Unacknowledged is similar to errors being ignored; however, drivers will attempt to receive and handle network errors when possible. The driver's ability to detect network errors depends on the system's networking configuration.

The speed difference between the two modes is worth mentioning, as shown by the regression test results, running against a local MongoDB instance:

1. Direct access

 1.1. Direct with acknowledge:
  - Connect to local server: 6 assertions passed  4.72ms
  - Drop and prepare collection: 8 assertions passed  9.38ms
  - Fill collection: 15,003 assertions passed  558.79ms
     5000 rows inserted in 548.83ms i.e. 9110/s, aver. 109us, 3.1 MB/s
  - Drop collection: no assertion  856us
  - Fill collection bulk: 2 assertions passed  74.59ms
     5000 rows inserted in 64.76ms i.e. 77204/s, aver. 12us, 7.2 MB/s
  - Read collection: 30,003 assertions passed  2.75s
     5000 rows read at once in 9.66ms i.e. 517330/s, aver. 1us, 39.8 MB/s
  - Update collection: 7,503 assertions passed  784.26ms
     5000 rows updated in 435.30ms i.e. 11486/s, aver. 87us, 3.7 MB/s
  - Delete some items: 4,002 assertions passed  370.57ms
     1000 rows deleted in 96.76ms i.e. 10334/s, aver. 96us, 2.2 MB/s
  Total failed: 0 / 56,527 - Direct with acknowledge PASSED  4.56s

 1.2. Direct without acknowledge:
  - Connect to local server: 6 assertions passed  1.30ms
  - Drop and prepare collection: 8 assertions passed  8.59ms
  - Fill collection: 15,003 assertions passed  192.59ms
     5000 rows inserted in 168.50ms i.e. 29673/s, aver. 33us, 4.4 MB/s
  - Drop collection: no assertion  845us
  - Fill collection bulk: 2 assertions passed  68.54ms
     5000 rows inserted in 58.67ms i.e. 85215/s, aver. 11us, 7.9 MB/s
  - Read collection: 30,003 assertions passed  2.75s
     5000 rows read at once in 9.99ms i.e. 500150/s, aver. 1us, 38.5 MB/s
  - Update collection: 7,503 assertions passed  446.48ms
     5000 rows updated in 96.27ms i.e. 51933/s, aver. 19us, 7.7 MB/s
  - Delete some items: 4,002 assertions passed  297.26ms
     1000 rows deleted in 19.16ms i.e. 52186/s, aver. 19us, 2.8 MB/s
  Total failed: 0 / 56,527 - Direct without acknowledge PASSED  3.77s

As you can see, the reading speed is not affected by the Write Concern settings, but data writing can be several times faster when each write command is not acknowledged.

Since there is no error handling, wcUnacknowledged should not be used in production. You may use it for replication, or for data consolidation, e.g. feeding a database with a lot of existing data as fast as possible.
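
For instance, such a bulk feeding may look as follows - a minimal sketch combining the WriteConcern property with the bulk Insert() call shown earlier in this article:

  Client.WriteConcern := wcUnacknowledged; // no per-write acknowledgement round-trip
  Coll.Insert(docs);                       // bulk insert of the prepared TVariantDynArray
  Client.WriteConcern := wcAcknowledged;   // back to the default safe mode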

Stay tuned for the next article, which will detail the ORM integration of MongoDB... and for the last one of the series, with benchmarks of MongoDB against SQL engines, via our ORM...
Feedback is welcome on our forum, as usual!