Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Azure by (45.3k points)

We are running many products search on a huge catalog with partially matched barcodes.

We started with a simple like query

select * from products where barcode, like '%2345%'

But that takes way too long since it requires a full table scan. We thought a full-text search will be able to help us here using contains.

select * from products where contains(barcode, '2345')

But, it seems like contains doesn't support finding words that partially contain a text but, only full a word match or a prefix. (But in this example we're looking for '123456').

1 Answer

0 votes
by (16.8k points)

Create an FTS Index.

First, create the table terms:

CREATE TABLE Terms

(

    Id int IDENTITY NOT NULL,

    Term varchar(21) NOT NULL,

    CONSTRAINT PK_TERMS PRIMARY KEY (Term),

    CONSTRAINT UK_TERMS_ID UNIQUE (Id)

)

Now,cut barcodes to grams, and save each of them to a table Terms. For example: barcode = '123456', your table should have 6 rows for it: '123456', '23456', '3456', '456', '56', '6'.

Then, create table BarcodeIndex:

CREATE TABLE BarcodesIndex

(

    TermId int NOT NULL,

    BarcodeId int NOT NULL,

    CONSTRAINT PK_BARCODESINDEX PRIMARY KEY (TermId, BarcodeId),

    CONSTRAINT FK_BARCODESINDEX_TERMID FOREIGN KEY (TermId) REFERENCES Terms (Id),

    CONSTRAINT FK_BARCODESINDEX_BARCODEID FOREIGN KEY (BarcodeId) REFERENCES Barcodes (Id)

)

Now, select the barcodes by their parts using the following query:

SELECT b.* FROM Terms t

INNER JOIN BarcodesIndex bi

    ON t.Id = bi.TermId

INNER JOIN Barcodes b

    ON bi.BarcodeId = b.Id

WHERE t.Term LIKE 'SomeBarcodePart%'

This solution force all similar parts of barcodes to be stored nearby, so SQL Server will use Index Range Scan strategy to fetch data from the Terms table. Terms in the Terms table should be unique to make this table as small as possible. This could be done in the application logic: check existence -> insert new if a term doesn't exist. Or by setting option IGNORE_DUP_KEY for clustered index of the Terms table. BarcodesIndex table is used to reference Terms and Barcodes.

Browse Categories

...