
Newsletter

No 4, September 2009

Measuring and comparing the quality of search results
Author: Dr. David Hawking, Funnelback's Chief Scientist

Improving the quality of the search results delivered by your website or enterprise search facility can have a significant effect on your organisation's revenues, cost base and productivity. It can:

guide customers more quickly and reliably to the point of purchase on a retail or finance site,

cut down the volume of telephone or face-to-face enquiries for a university or government department, or

reduce the time employees take to respond to support requests, write proposals or generate reports.

But how can we measure how well a search engine is working, tune it to maximise performance and compare it with its competitors? Ideally, we'd run each engine or configuration for a period of time and measure revenue and productivity. However, many factors (not least managerial ire when revenue slumps dramatically) make this method impractical in most circumstances. We must look at other methods.

Side-by-side comparisons
Since 2004, the Funnelback team has been using side-by-side comparisons to compare different search tools, or different configurations of Funnelback and other search engines. Sometimes we have anonymised the results (see paper 56) and asked people to vote on which panel, if either, better satisfies the need behind their search.

But the usefulness of a search engine only partly derives from its ability to rank documents so that the ones which most closely answer the query appear at the top of the list. It's important that the screen "real estate" available for search results is used to maximum advantage.

Does the presentation of results allow us to quickly see whether particular answers are useful or not? E.g. title, URL, summary, metadata, quick links, document type icon or thumbnail.

How good is the trade-off between the number of results visible on a screen and the amount of information presented for each result?

Is significant screen real estate taken up by items such as logos, graphics, navigation boilerplate and advertisements, which are not likely to be of value to a person scanning search results? Would some or all of that space be better used to display more results or more information about each result?

What tools are provided to help a person move from a query that is too broad to a more specific, more useful one, e.g. narrower query suggestions, contextual navigation, related queries, also-of-interest tool, result set mining, facets?

Are facilities provided for suggesting corrections for misspelled or unlikely queries, e.g. 'Carlton Furball Club'?

An un-anonymised side-by-side comparison lets real people with real information needs compare systems or configurations which vary on all these dimensions. Participants need a wide, high-resolution screen for the comparison to be meaningful.

Given enough queries and votes, side-by-side comparisons give a very good idea of which configuration is better and how much difference it makes.
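As a rough illustration of how such votes might be summarised, the sketch below tallies votes and applies a two-sided sign test to the decided ones. The vote labels ("left", "right", "neither") and the function name are assumptions made for this example, and the sign test is one common choice of statistic, not necessarily the analysis the Funnelback team uses.

```python
from math import comb


def summarise_votes(votes):
    """Tally side-by-side votes and run a two-sided sign test.

    votes: iterable of "left", "right" or "neither" (hypothetical labels,
    one per judged query).
    """
    votes = list(votes)
    left = votes.count("left")
    right = votes.count("right")
    neither = votes.count("neither")
    decided = left + right

    if decided == 0:
        # Nobody expressed a preference, so there is nothing to test.
        return {"left": 0, "right": 0, "neither": neither,
                "left_share": None, "p_value": 1.0}

    # Null hypothesis: each decided vote is equally likely to go either way.
    # P(X >= k) for X ~ Binomial(decided, 0.5), doubled for a two-sided test.
    k = max(left, right)
    tail = sum(comb(decided, i) for i in range(k, decided + 1)) / 2 ** decided
    p_value = min(1.0, 2 * tail)

    return {
        "left": left,
        "right": right,
        "neither": neither,
        "left_share": left / decided,  # how much difference: share of decided votes
        "p_value": p_value,            # how likely a split this lopsided is by chance
    }


if __name__ == "__main__":
    sample = ["left"] * 8 + ["right"] * 2 + ["neither"] * 5
    print(summarise_votes(sample))
```

The share of decided votes indicates how large the preference is, while the p-value indicates whether enough votes have been collected to trust it. "Neither" votes are reported but left out of the test.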

Flights
A search service with a high volume of queries can compare the value of one search configuration versus another by dividing the user population into "flights". For example, 5% of users might be randomly assigned to an experimental group or flight which is given the opportunity to try a new ranking algorithm or a new interface feature. User behaviour is monitored using clicks (or information transmitted by toolbars) to see whether the experimental system receives more clicks at higher ranks (presumed good) and whether new interface features are clicked on sufficiently to justify their inclusion.
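A minimal sketch of flight assignment and click monitoring follows. It assumes each user has a stable identifier and that the click log is a list of (user_id, rank_clicked) pairs; the function names, the salt string and the use of a hash in place of a true random draw are all choices made for this illustration, not a description of any particular search service.

```python
import hashlib
from statistics import mean


def assign_flight(user_id, experiment_share=0.05, salt="ranking-test-1"):
    """Place a user in the "experiment" or "control" flight.

    Hashing the (hypothetical) salt and user id gives a pseudo-random but
    repeatable bucket, so a returning user always sees the same system.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64  # value in [0, 1)
    return "experiment" if bucket < experiment_share else "control"


def mean_click_rank(click_log, experiment_share=0.05):
    """Average rank of clicked results per flight (rank 1 is the top result).

    click_log: iterable of (user_id, rank_clicked) pairs.
    A lower mean suggests that clicks fall nearer the top of the list.
    """
    ranks = {"control": [], "experiment": []}
    for user_id, rank in click_log:
        ranks[assign_flight(user_id, experiment_share)].append(rank)
    return {flight: (mean(r) if r else None) for flight, r in ranks.items()}


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(20000)]
    in_experiment = sum(assign_flight(u) == "experiment" for u in users)
    print(f"share in experimental flight: {in_experiment / len(users):.3f}")
```

Changing the salt for each new experiment reshuffles the population, so the same users are not always the ones exposed to experimental features.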

Tuning
Neither flights nor side-by-side comparisons allow us to run through millions of different possible parameter combinations to maximise performance in a particular installation or over a set of benchmark installations. In the next issue I plan to describe how we use CSIRO's C-TEST framework to tune our parameters.

Funnelback Pty. Ltd. © 2010