How Sphinx’s Multi-Valued Attributes Replaces the Need for an OR Comparison Operator

Whether you use the Sphinx API or SphinxQL, if your Sphinx server is configured properly, you can search and filter by a myriad of indices. But one big obstacle is Sphinx doesn’t offer an OR comparison operator.

THE CHALLENGE

I want a search result set that shows all PUBLIC records and excludes PRIVATE records except for those users that have permission to see certain PRIVATE records (like owners of the record or a peer-to-peer share scenario).

the sphinx.conf might look like:

source srcRecords : srcDB
{
  sql_query_range = SELECT MIN(id),MAX(id) \
                    FROM records_sphinx_index /*sphinx*/
  sql_query       = SELECT id, id as record_id, name, \
                    user_id, status FROM records_sphinx_index \
                    WHERE id>=$start AND id<=$end /*sphinx*/

  sql_attr_uint   = record_id
  sql_attr_uint   = user_id
  sql_attr_string = name
  sql_attr_uint   = status
}

A SQL query to retrieve all public records and only private records that belong to users 1,2 and 3 would ideally look like:

SELECT record_id, name FROM table WHERE (user_id in (1,2,3) AND status='PRIVATE') OR (status = 'PUBLIC');

But alas, Sphinx does NOT support OR, and the above query is an invalid SphinxQL statement. Therefore, I have to take a different approach.

THE SOLUTION

If I create a multi-valued attribute named visible_to that contains a list of user_ids for those who have permission to see the record, then I could query on one field instead of two eliminating the need for an OR.

All PUBLIC records would have visible_to value = “0” and PRIVATE records would have a list of user_ids of those who can access (excluding the value “0” because they are not PUBLIC records). The owner and any users the owner has granted access would be the multi-values. visible_to value = “1,2,3”

Here’s what the updated excerpt from the sphinx.conf might look like:

source srcRecords : srcDB
{
  sql_query_pre   = SET SESSION query_cache_type=OFF

  sql_query_range = SELECT MIN(id),MAX(id) \
                    FROM records_sphinx_index /*sphinx*/
  sql_query       = SELECT id, id as record_id, name, \
                    visible_to FROM records_sphinx_index \
                    WHERE id>=$start AND id<=$end /*sphinx*/

  sql_attr_uint   = record_id
  sql_attr_string = name
  sql_attr_multi  = uint visible_to from field
} 

Now we can querying Sphinx for public records and private records that user_id 3 is allowed to see with a valid SphinxQL statement that looks like:

SELECT record_id, name FROM table WHERE visible_to IN (0,3);

IMPORTANT: You’ll see that I used SOURCE-TYPE of ‘field’ in my sql_attr_multi definition. I did this because there was not a subquery that could accomplish populating both a 0 for PUBLIC records and getting rows for users that have permission to see.

Sphinx documentation doesn’t give examples nor explain how to use SOURCE-TYPE ‘field’. After experimenting, I discovered a simple comma delimited list of ids populated in the the visible_to column of my record_sphinx_index table (the one being indexed) does the trick!

THE TAKE-AWAY

I’m not trying to explain how I populated the visible_to field, rather, I’m sharing my discovery that you can use sql_attr_multi with a SOURCE-TYPE = ‘field’ as long as the referenced field contains a comma delimited list of values.

Learn more about Sphinx here.

Related Posts