Efficient Routing for Christmas

The New TUriRouter Class

From THttpServerGeneric.Route, or from TRestHttpServer.Route, you can access a new TUriRouter class.
It is the class responsible for the core registration process of all custom URI parsing.

By default, it is disabled. The classical mORMot routing applies.
But once you call THttpServerGeneric.Route or TRestHttpServer.Route, you can register URIs and how the HTTP server should process them.

Internal URI rewrite

Here, we are not talking about HTTP redirection, i.e. returning a 30x HTTP status code to let the client make a new request to another URI.
We allow URI rewriting on the fly, within the server, just before the incoming request is handed over to the ORM, MVC or SOA mORMot router.

It offers, for instance, an alternative URI path to the method-based services or the interface-based services, at HTTP/HTTPS server level.

Now, we could write:

Server.Route.Get('/info', 'root/timestamp/info');

So that any GET on /info will redirect to the internal TRestServer.TimeStamp method-based service, and its hidden /info sub-method, which displays some general statistics about the server.

Or we could write:

Server.Route.Get('/user/<id>', '/root/userservice/new?id=<id>', urmPost);

to rewrite internally e.g. the GET '/user/1234' URI into a POST at '/root/userservice/new?id=1234', as published by an IUserService.New(id: Int64) interface-based service method.

As such, you could have the best of both worlds.
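
To make the idea of a parametrized rewrite more concrete, here is a minimal stand-alone sketch of the placeholder substitution. It is not the mORMot implementation: RewriteUri is a hypothetical helper written for this illustration, it uses plain strings (with allocations), and it only supports a single placeholder at the end of the pattern.

```pascal
program RewriteSketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper (not part of mORMot): match Uri against a Pattern
// ending with one <name> placeholder, then substitute the captured value
// into Target. Returns false if Uri does not match the pattern.
function RewriteUri(const Pattern, Target, Uri: string;
  out Rewritten: string): boolean;
var
  p, q: integer;
  prefix, name, value: string;
begin
  result := false;
  Rewritten := '';
  p := Pos('<', Pattern);
  q := Pos('>', Pattern);
  if (p = 0) or (q <> Length(Pattern)) then
    exit; // this sketch only supports one placeholder, at the end
  prefix := Copy(Pattern, 1, p - 1);    // e.g. '/user/'
  name := Copy(Pattern, p, q - p + 1);  // e.g. '<id>'
  if (Length(Uri) <= Length(prefix)) or
     (Copy(Uri, 1, Length(prefix)) <> prefix) then
    exit; // the static part does not match
  value := Copy(Uri, Length(prefix) + 1, MaxInt); // e.g. '1234'
  if Pos('/', value) > 0 then
    exit; // a parameter value stops at the next '/'
  Rewritten := StringReplace(Target, name, value, [rfReplaceAll]);
  result := true;
end;

var
  s: string;
begin
  if RewriteUri('/user/<id>', '/root/userservice/new?id=<id>',
       '/user/1234', s) then
    writeln(s); // /root/userservice/new?id=1234
end.
```

The real router does the same kind of substitution, but over its tree nodes and without building intermediate strings for a static route.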

This URI redirection may also be very beneficial to a mORMot MVC web application. You could easily redirect some human-friendly URIs into the MVC routing convention, as expected by the MVC interface definition.

Last but not least, if the redirected URI is an integer in range 200..599, then the server will abort the request immediately with an HTTP status error matching the integer:

Server.Route.Get('/admin.php', '403');

This could help to avoid calling the main REST engine, or writing a callback, just to return an error code.
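
The rule on the rewritten target can be sketched in a few lines. TargetAsStatus is a hypothetical helper written for this illustration, not a mORMot function: it only shows the 200..599 range check described above.

```pascal
program StatusTargetSketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: return the HTTP status code if Target is a plain
// integer in 200..599, or 0 if Target should be handled as a rewritten URI
function TargetAsStatus(const Target: string): integer;
begin
  if not TryStrToInt(Target, result) or
     (result < 200) or (result > 599) then
    result := 0;
end;

begin
  writeln(TargetAsStatus('403'));                 // 403
  writeln(TargetAsStatus('root/timestamp/info')); // 0
  writeln(TargetAsStatus('42'));                  // 0
end.
```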

Direct Callbacks Execution

As an alternative, you can assign a TOnHttpServerRequest callback to a given URI, optionally with <parameters>:

TOnHttpServerRequest = function(Ctxt: THttpServerRequestAbstract): cardinal of object;

The Ctxt instance is the low-level structure holding the HTTP/HTTPS request, with all input and output context.
It even has a property to retrieve the named parameters within the URI, i.e. an <id> placeholder in the URI registration will be recognized, and available within the callback from the Ctxt['id'] property.

For instance, it could be used to publish a standard REST process as:

// retrieve a list of picture IDs
Server.Route.Get('/user/<user>/pic', DoUserPic);
// support CRUD access of a given picture by ID
Server.Route.Run([urmGet, urmPost, urmPut, urmDelete], '/user/<user>/pic/<id>', DoUserPic);

Then the callback could be something like this:

function TMyClass.DoUserPic(Ctxt: THttpServerRequestAbstract): cardinal;
var
  user: RawUtf8;
  id: Int64;
  ids: TInt64DynArray;
begin
  user := Ctxt['user'];
  if Ctxt.RouteInt64('id', id) then
  begin
    // manage /user/<user>/pic/<id>
    if CRUDUserPictureFromDatabase(Ctxt, user, id) then
      result := HTTP_SUCCESS
    else
      result := HTTP_NOTFOUND;
  end
  else if RetrieveUserPictureIDListFromDatabase(Ctxt, user, ids) then
    // returned /user/<user>/pic
    result := HTTP_SUCCESS
  else
    result := HTTP_NOTFOUND;
end;

Of course, URI redirection to an interface-based service may be more convenient, but if you want to reuse some existing code, and have the best performance possible, you could follow this pattern.

It may also be handy for some low-level tasks of the HTTP server, like proxying to internal sub-servers, quickly returning some 30x redirection, or generating some HTML pages.

If the callback returns a result of 0, then execution will continue as usual, but you can change some Ctxt fields. It may allow for very efficient and tuned client authorization, even before you execute some regular interface-based services.

You could also redirect all published methods of a class instance using RTTI, via TUriRouter.RunMethods().
It works just like TRestServer method-based services, but here at HTTP server level, with even higher performance, since the mORMot REST engine is not involved.

Another use may be to handle some very tuned HTTP OPTIONS requests, if the default CORS feature is too broad for your case.
Or to quickly return a standard response for HTTP HEAD requests, without any Content-Length header (as it is allowed), to lighten the server process.

Why Not Make It Fast?

About performance, the TUriRouter.Process() method runs with no memory allocation for a static route, using a very efficient Radix Tree algorithm for path lookup, over a thread-safe non-blocking URI parsing with value extraction for rewrite or execution.

Here are some numbers from TNetworkProtocols._TUriTree on my old Core i5 laptop, on a single thread/core:

1000 URI lookups in 37us i.e. 25.7M/s, aver. 37ns
1000 URI static rewrites in 80us i.e. 11.9M/s, aver. 80ns
1000 URI parametrized rewrites in 117us i.e. 8.1M/s, aver. 117ns
1000 URI static execute in 91us i.e. 10.4M/s, aver. 91ns
1000 URI parametrized execute in 162us i.e. 5.8M/s, aver. 162ns

As you can see, this routing won't be the bottleneck in your server process. It has a non-blocking O(1) complexity, with no unneeded memory allocation during its process.
I know of no other Delphi or FPC web or REST framework using custom Radix Tree data structures. Other libraries parse the URI as parts, then check the parts against the registered routes (using hash maps if possible). It is clearly less efficient, and has some known disadvantages which we will discuss below.

How Does It Work?

If we run the following code (from our regression tests):

    router.Get('/plaintext', DoPlainText);
    router.Get('/', DoRequestRoot);
    router.Get('/do/<one>/pic/<two>', DoRequest0);
    router.Get('/do/<one>', DoRequest1);
    router.Get('/do/<one>/pic', DoRequest2);
    router.Get('/do/<one>/pic/<two>/', DoRequest3);
    router.Get('/da/<one>/<two>/<three>/<four>/', DoRequest4);
    writeln(router.Tree[urmGet].ToText);

We get the following output on the console:

/
 d
  a/
    <one>
         /
          <two>
               /
                <three>
                       /
                        <four>
                              /
  o/
    <one>
         /pic
             /
              <two>
                   /
 plaintext

The depth of the Radix Tree is rendered as indentation. The lines above make it easy to understand how the URI parsing is done: from the top / node down to the last exact matching node.
A node can be some static text like /pic or a parameter like <one>. The registered routes (either URI rewrite, or a callback) are assigned to one node.
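
The walk from the top node down to the matching node can be illustrated with a much simpler structure. The sketch below is an assumption-laden toy, not the mORMot code: it uses one node per whole path segment instead of a compressed character-level Radix Tree, it allocates strings freely, it ignores trailing slashes, and it does no backtracking. All names (TNode, AddRoute, Lookup) are hypothetical.

```pascal
program RouteTreeSketch;
{$mode objfpc}{$H+}
uses Classes;
type
  TNode = class // one node per path segment: 'pic' or '<one>'
    Text, Route: string; // Route is set when a registered pattern ends here
    Children: array of TNode;
    destructor Destroy; override;
  end;

destructor TNode.Destroy;
var
  i: integer;
begin
  for i := 0 to High(Children) do
    Children[i].Free;
  inherited Destroy;
end;

// extract the next '/'-delimited segment of Uri, advancing Idx
function NextSegment(const Uri: string; var Idx: integer; out Seg: string): boolean;
var
  start: integer;
begin
  while (Idx <= Length(Uri)) and (Uri[Idx] = '/') do
    inc(Idx);
  start := Idx;
  while (Idx <= Length(Uri)) and (Uri[Idx] <> '/') do
    inc(Idx);
  Seg := Copy(Uri, start, Idx - start);
  result := Seg <> '';
end;

// exact text match first, else (if AllowParam) the first <parameter> child
function FindChild(Node: TNode; const Seg: string; AllowParam: boolean): TNode;
var
  c: integer;
begin
  result := nil;
  for c := 0 to High(Node.Children) do
    if Node.Children[c].Text = Seg then
      exit(Node.Children[c])
    else if AllowParam and (result = nil) and (Node.Children[c].Text[1] = '<') then
      result := Node.Children[c];
end;

procedure AddRoute(Node: TNode; const Pattern, Route: string);
var
  i: integer;
  seg: string;
  next: TNode;
begin
  i := 1;
  while NextSegment(Pattern, i, seg) do
  begin
    next := FindChild(Node, seg, false);
    if next = nil then
    begin // create the missing node for this segment
      next := TNode.Create;
      next.Text := seg;
      SetLength(Node.Children, Length(Node.Children) + 1);
      Node.Children[High(Node.Children)] := next;
    end;
    Node := next;
  end;
  Node.Route := Route;
end;

// return the route registered for Uri ('' if none), storing values in Params
function Lookup(Node: TNode; const Uri: string; Params: TStrings): string;
var
  i: integer;
  seg: string;
begin
  i := 1;
  while NextSegment(Uri, i, seg) do
  begin
    Node := FindChild(Node, seg, true);
    if Node = nil then
      exit(''); // no registered route
    if Node.Text[1] = '<' then // keep the value under the name between < >
      Params.Values[Copy(Node.Text, 2, Length(Node.Text) - 2)] := seg;
  end;
  result := Node.Route;
end;

var
  root: TNode;
  params: TStringList;
begin
  root := TNode.Create;
  params := TStringList.Create;
  AddRoute(root, '/do/<one>/pic', 'DoRequest2');
  AddRoute(root, '/do/<one>/pic/<two>', 'DoRequest0');
  writeln(Lookup(root, '/do/123/pic/456', params)); // DoRequest0
  writeln(params.Values['one'], ' ', params.Values['two']); // 123 456
  params.Free;
  root.Free;
end.
```

Even in this simplified form, the lookup never compares the URI against a list of registered patterns: it only follows one child per segment, which is why the number of routes has so little impact.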

This data structure has several benefits, for our parametrized routing scheme.

1. The nodes form a memory structure in which a URI is very easy to parse, one character at a time, even with thousands of registered routes.

2. Unlike hash-maps, a tree structure also allows us to use dynamic parts like the <one> parameter, since we actually match against the routing patterns instead of just comparing hashes. Hashes map static values, while we need to map URI parameters as dynamic values. Our tree is perfect for this purpose.

3. It greatly reduces the classical routing problem of regular registration based on lists and maps, which can suffer from unexpected behavior, just due to the order of the URI registration calls. With our data structure, we know that there is a single node per path, before even starting the look-up in the prefix-tree. Thanks to the tree structure, we know by design that nodes won't overlap.

4. For even better scalability, the child nodes on each tree level are sorted by depth/priority/usage after each registration. The depth/priority/usage is just the number of sub nodes: children, grandchildren, and so on. Nodes which are part of the most routing paths are evaluated first. This helps to make as many routes as possible reachable as fast as possible. It is also some sort of cost compensation: the longest reachable path (highest cost) can always be evaluated first. You can see this sorting in the above tree sample.

5. As you can see, there is one Radix Tree per supported HTTP method, which are GET, POST, PUT, DELETE, HEAD and OPTIONS. First, it is more efficient than holding a per-method registration in every single node. Second, it greatly reduces any routing problem due to path overlapping between methods.
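
The sorting described in point 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the mORMot code: TNode, Add, SubNodes and Sort are hypothetical names, the priority is recomputed on the fly instead of being stored in each node, and the sort is run once at the end instead of after each registration.

```pascal
program PrioritySketch;
{$mode objfpc}{$H+}

type
  TNode = class
    Text: string;
    Children: array of TNode;
    destructor Destroy; override;
    function Add(const aText: string): TNode;
    function SubNodes: integer;
    procedure Sort;
  end;

destructor TNode.Destroy;
var
  i: integer;
begin
  for i := 0 to High(Children) do
    Children[i].Free;
  inherited Destroy;
end;

// append a new child node and return it
function TNode.Add(const aText: string): TNode;
begin
  result := TNode.Create;
  result.Text := aText;
  SetLength(Children, Length(Children) + 1);
  Children[High(Children)] := result;
end;

// priority = number of children, grandchildren, and so on
function TNode.SubNodes: integer;
var
  i: integer;
begin
  result := Length(Children);
  for i := 0 to High(Children) do
    inc(result, Children[i].SubNodes);
end;

// insertion sort of each tree level, highest sub node count first
procedure TNode.Sort;
var
  i, j: integer;
  n: TNode;
begin
  for i := 1 to High(Children) do
  begin
    n := Children[i];
    j := i - 1;
    while (j >= 0) and (Children[j].SubNodes < n.SubNodes) do
    begin
      Children[j + 1] := Children[j];
      dec(j);
    end;
    Children[j + 1] := n;
  end;
  for i := 0 to High(Children) do
    Children[i].Sort;
end;

var
  root: TNode;
begin
  root := TNode.Create;
  root.Text := '/';
  root.Add('plaintext');                    // no sub node
  root.Add('do/').Add('<one>').Add('/pic'); // 2 sub nodes
  root.Sort;
  writeln(root.Children[0].Text); // do/
  writeln(root.Children[1].Text); // plaintext
  root.Free;
end.
```

This matches the tree output shown earlier, where the d branch, which holds most of the registered paths, is listed before plaintext.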

In practice, it is very fast and efficient: from 6 to 12 million URIs parsed per second and per CPU core, depending on the process (static URI or parametrized URI).

If you are curious, you could look at the source code in our repository:
https://github.com/synopse/mORMot2/blob/master/src/net/mormot.net.server.pas
Some of the code may be a bit difficult to follow, since we use low-level pointers over chars, to avoid intermediate string allocations, and to offer the best performance. Even the values are parsed and stored as integer indexes and lengths, not as pre-allocated string instances... But you can see the class hierarchy, and how the registration is done. Registration is less performance sensitive, so the code is more high-level here. ;-)

What Does The Marmot Say?

marmotinflowers.jpg, Dec 2022

Merry Christmas again, and enjoy!

Synopse Open Source