Full Text Search in the database, using FTS3
By A.Bouchez on 2010, Sunday September 19, 15:10 - SQLite3 Framework - Permalink
Beginning with version 1.9.1 of our SQlite3 framework, you can have a direct access to the FTS3 features of the SQlite3 engine.
A FTS3 table is an SQLite virtual table that allows users to perform full-text searches on a set of documents. The most common (and effective) way to describe full-text searches is "what Google, Yahoo and Altavista do with documents placed on the World Wide Web". Users input a term, or series of terms, perhaps connected by a binary operator or grouped together into a phrase, and the full-text query system finds the set of documents that best matches those terms considering the operators and groupings the user has specified.
With our framework, it's very easy to create such a table.
The dedicated TSQLRecordFTS3 class enables you to define a FTS3 table, just as easy as for any other table, in pure ORM approach (you don't need to write dedicated SQL statements: the ORM will adapt itself to the FTS3 table, and create the corresponding SQL code for you).
Here are some tips for using this class:
- any record which inherits from this class must have only sftUTF8Text
(RawUTF8) fields
- this record has its fID: integer property which may be published as
DocID, to be consistent with SQLite3 praxis, and reflect that it
points to an ID of another associated TSQLRecord
- a good approach is to store your data in a regular TSQLRecord table,
then store your text content in a separated FTS3 table, associated to this
TSQLRecord table via its ID/DocID
- the ID/DocID property can be set when the record is added (i.e. the
ID field is used when the TSQLRest.Add method is called with
a TSQLRecordFTS3 instance), to retrieve any associated
TSQLRecord (note that for a TSQLRecord record, the ID
property can't be set at adding, but is calculated by the engine)
- static tables don't handle TSQLRecordFTS3 classes
- by default, the FTS3 engine ignore all characters >= #80, but handle
low-level case insentivity (i.e. 'A'..'Z') so you must keep your request with
the same range for upper case
- by default, the "simple" tokenizer is used, but you can inherits from
TSQLRecordFTS3Porter class if you want a better English matching,
using the Porter Stemming
algorithm
- in order to make FTS3 queries, use the dedicated TSQLRest.FTS3Match
method, with the MATCH operator (you can use regular queries, but you must
specify 'RowID' instead of 'DocID' or 'ID' because of FTS3 Virtual table
specificity):
var IDs: TIntegerDynArray; if FTS3Match(TSQLMyFTS3Table,'text MATCH "linu*"',IDs) then // you've all matching IDs in IDs[]
Here is an example of such a FTS3 table:
type TFTS3Test = class(TSQLRecordFTS3) private fSubject: RawUTF8; fBody: RawUTF8; published property Subject: RawUTF8 read fSubject write fSubject; property Body: RawUTF8 read fBody write fBody; property DocID: integer read fID write fID; end;
See the TestFTS3 procedure in the SQlite3.pas unit, for an example of how such a record can be handled.
Feedback and comments are welcome on our forum.