Tuesday, January 24, 2012

BM25: the root of search engine algorithms

Recently we have been experimenting with Sphinx (http://sphinxsearch.com/), a popular free search server that powers craigslist.com and many other leading websites. While googling around for Sphinx advice and tips, I learned that one of Sphinx's main weighting modes uses BM25, a full-text ranking algorithm based on keyword frequency whose roots go back to probabilistic retrieval research of the 1970s and 1980s. It turns out it is still very widely used.
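
To make the "keyword frequency" idea concrete, here is a minimal sketch of the textbook Okapi BM25 formula in Python. It is not Sphinx's own code, and Sphinx's ranker may well differ in its details. The function name bm25_scores, the whitespace tokenizer, and the defaults k1=1.2 and b=0.75 are my own assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with textbook BM25.

    Hypothetical helper for illustration only: tokenization is a plain
    lowercase whitespace split, and k1 / b are commonly used defaults.
    """
    if not docs:
        return []

    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    query_terms = query.lower().split()
    scores = []
    for d in tokenized:
        tf = Counter(d)  # term frequency inside this document
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher weight (inverse document frequency).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates, and long documents are penalized.
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "sphinx search server",
    "free search server for websites",
    "cooking recipes",
]
print(bm25_scores("sphinx search", docs))
```

The first document matches both query words and should come out on top, the second matches only "search", and the third matches nothing and scores zero. Repeating a keyword keeps helping a document, but with diminishing returns, and that is the main trick that separates BM25 from naive keyword counting.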

Check out this great explanation of BM25 to understand the roots of modern search engines: