Fast JSON parsing
When it comes to parsing some (textual) content, two directions are
usually envisaged. In the XML world, you usually have to choose between:
- A DOM parser, which creates an in-memory tree structure of objects mapping the XML nodes;
- A SAX parser, which reads the XML content and fires pre-defined events for each XML element.
In fact, DOM parsers internally use a SAX parser to read the XML content. Therefore, with the overhead of object creation and property initialization, DOM parsers are typically three to five times slower than SAX. But DOM parsers are much more convenient for handling the data: once it is mapped into native objects, code can access any given node instantly, whereas SAX-based access has to read the whole XML content again.
Most JSON parsers available for Delphi use a DOM-like approach. For instance, the DBXJSON unit included since Delphi 2010, or the SuperObject or DWS libraries, create a class instance mapping each JSON node.
In a JSON-based Client-Server ORM like ours, profiling shows that a lot of time is spent in JSON parsing, on both Client and Server sides. Therefore, we tried to optimize this part of the library.
In order to achieve the best speed, we used a mixed approach:
- All the necessary conversion (e.g. un-escaping of text) is made in-memory, from and within the JSON buffer, to avoid memory allocation;
- The parser returns pointers to the converted elements (just like the vtd-xml library).
In practice, here is how it is implemented:
- A private copy of the source JSON data is made internally (so that the Client-Side method used to retrieve this data can safely free all allocated memory);
- The source JSON data is parsed, and replaced by its un-escaped UTF-8 text content, within the same internal buffer (for example, strings are un-escaped and a #0 is appended at the end of any field value; numerical values remain text-encoded in place, and will be converted into double only if needed);
- Since data is replaced in-memory (JSON is a bit more verbose than plain UTF-8 text, so there is always enough space), no memory allocation is performed during parsing: the whole process is very fast, not noticeably slower than a SAX approach;
- This heavily profiled code (using pointers and hand-tuned routines) results in very fast parsing and conversion.
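To illustrate why in-place un-escaping needs no allocation, here is a simplified, hypothetical sketch (UnEscapeInPlace is an illustrative name, not the actual framework routine): the destination pointer always trails the source pointer, because an escaped sequence like \n or \u0041 is never shorter than its decoded bytes, so decoded content can safely overwrite the escaped source within the same buffer.

// simplified sketch of in-place JSON string un-escaping (NOT the actual
// framework code): decoded bytes are written back over the escaped source,
// which is always at least as long, so no memory allocation is needed
function UnEscapeInPlace(P: PUTF8Char): PUTF8Char;
var D: PUTF8Char; // destination: trails P, writing into the same buffer
begin
  result := P; // the decoded string starts where the escaped one did
  D := P;
  while (P^<>#0) and (P^<>'"') do begin
    if P^='\' then begin
      inc(P); // skip the backslash, decode the escaped char
      case P^ of
        'n': D^ := #10;
        't': D^ := #9;
        '"','\','/': D^ := P^;
        // '\uXXXX' handling is omitted in this sketch
      else
        exit; // unexpected escape: abort
      end;
    end else
      D^ := P^;
    inc(P);
    inc(D);
  end;
  D^ := #0; // #0-terminate the decoded value in place
end;

The actual framework code handles the full JSON escaping rules (including \uXXXX surrogate pairs converted to UTF-8), but follows this same zero-copy pattern.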
This parsing "magic" is done in the
GetJSONField function, as
defined in the SynCommons.pas unit:
/// decode a JSON field in an UTF-8 encoded buffer (used in TSQLTableJSON.Create)
// - this function decodes in the P^ buffer memory itself (no memory allocation
// or copy), for faster process - so take care that it's an unique string
// - PDest points to the next field to be decoded, or nil on any unexpected end
// - null is decoded as nil
// - '"strings"' are decoded as 'strings'
// - strings are JSON unescaped (and \u0123 is converted to UTF-8 chars)
// - any integer value is left as its ascii representation
// - wasString is set to true if the JSON value was a "string"
// - works for both field names or values (e.g. '"FieldName":' or 'Value,')
// - EndOfObject (if not nil) is set to the JSON value char (',' ':' or '}' e.g.)
function GetJSONField(P: PUTF8Char; out PDest: PUTF8Char;
  wasString: PBoolean=nil; EndOfObject: PUTF8Char=nil): PUTF8Char;
This function allows iterating through the whole JSON buffer content,
retrieving values or property names, and checking the
returned value to follow the JSON structure.
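For instance, a loop over the name/value pairs of a flat JSON object could be sketched as follows (assuming SynCommons is in the uses clause; variable names are illustrative only):

var P, FieldName, FieldValue: PUTF8Char;
    wasString: boolean;
    EndOfObject: AnsiChar;
begin
  // P points inside a writable private copy of the JSON buffer,
  // here just after the '{' of a JSON object
  repeat
    FieldName := GetJSONField(P,P,nil,@EndOfObject); // e.g. '"FieldName":'
    if (FieldName=nil) or (EndOfObject<>':') then
      break; // unexpected end of content
    FieldValue := GetJSONField(P,P,@wasString,@EndOfObject); // e.g. 'Value,'
    // FieldName and FieldValue now point to #0-terminated UTF-8 text,
    // un-escaped in place within the buffer; wasString tells whether the
    // value was a JSON "string" or a raw token (number, true/false/null)
  until (P=nil) or (EndOfObject='}');
end;

Note how P is passed both as the input buffer and as the out PDest parameter: each call consumes one field and advances P to the next one, which is what makes the buffer traversal so compact.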
This in-place parsing of textual content is one of the main reasons why we
used UTF-8 (via
RawUTF8) as the common string type in our
framework, rather than the generic
string type, which would have
introduced a memory allocation and a charset conversion.
Feedback and comments are welcome in our forum, just as usual.