To achieve the best speed, we use a mixed approach:
- All necessary conversions (e.g. un-escaping text) are made in-memory, from
and within the JSON buffer, to avoid memory allocation;
- The parser returns pointers to the converted elements (just like the
vtd-xml library).
In practice, here is how it is implemented:
- A private copy of the source JSON data is made internally (so that the
client-side method used to retrieve this data can safely free all allocated
memory);
- The source JSON data is parsed and replaced by its un-escaped UTF-8 text
content, in the same internal buffer (for example, strings are un-escaped and
a #0 terminator is added at the end of every field value, while numerical
values remain text-encoded in place and are converted into Int64 or double
only when needed);
- Since data is replaced in-memory, and un-escaped UTF-8 text is never longer
than its JSON source (quotes and escape sequences such as \u00e9 take more
bytes than the decoded characters), there is always enough space: no memory
allocation is performed during parsing, and the whole process is very fast,
not noticeably slower than a SAX approach;
- This carefully profiled code (using pointers and tuned loops) results in
very fast parsing and conversion.
This parsing "magic" is done in the GetJSONField function, as defined in the
SynCommons.pas unit:
/// decode a JSON field in an UTF-8 encoded buffer (used in TSQLTableJSON.Create)
// - this function decodes in the P^ buffer memory itself (no memory allocation
// or copy), for faster process - so take care that it's an unique string
// - PDest points to the next field to be decoded, or nil on any unexpected end
// - null is decoded as nil
// - '"strings"' are decoded as 'strings'
// - strings are JSON unescaped (and \u0123 is converted to UTF-8 chars)
// - any integer value is left as its ascii representation
// - wasString is set to true if the JSON value was a "string"
// - works for both field names or values (e.g. '"FieldName":' or 'Value,')
// - EndOfObject (if not nil) is set to the JSON value char (',' ':' or '}' e.g.)
function GetJSONField(P: PUTF8Char; out PDest: PUTF8Char;
  wasString: PBoolean=nil; EndOfObject: PUTF8Char=nil): PUTF8Char;
This function lets you iterate through the whole JSON buffer content,
retrieving values or property names, and check the EndOfObject returned
value to handle the JSON structure.
This in-place parsing of textual content is one of the main reasons why we
use UTF-8 (via RawUTF8) as the common string type in our framework, rather
than the generic string type, which would have introduced a memory allocation
and a charset conversion.
Feedback and comments are welcome in our forum, as usual.