Funnelback, Internet & Enterprise Search Engine. We create powerful, scalable search solutions that meet your business requirements. Use Funnelback Search for your website, your department or your entire organisation. WE MAKE SEARCH WORK.

Newsletter

No 4, September 2009

Funnelback Tips & Tricks

  1. No index directives

    You can mark parts of a page's content so that Funnelback does not index them or make them searchable. This is useful for excluding common navigation elements, headers and footers. There are two ways to do this:

    No index expression in the collection configuration
    Enter a regular expression; content that matches it will not be indexed. Matching content is also ignored when a web crawl decides whether two files are duplicates based on their extracted text.

    Example regular expression to ignore some "breadcrumb" navigation elements in a page:
    noindex_expression=<div class=\"BreadCrumb(.*?)</div>
    Further documentation is available at http://docs.funnelback.com/8.3/noindex_expression_collection_cfg.html
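
To see what such an expression removes, the sketch below applies a pattern of the same shape with Python's re module. This is only an illustration: it assumes the expression ends in a closing </div>, the sample page is invented, and Funnelback's own regular expression engine may behave differently from Python's.

```python
import re

# Assumed pattern, modelled on the breadcrumb example with a closing </div>.
# re.DOTALL lets ".*?" span line breaks inside the matched element.
NOINDEX_PATTERN = re.compile(r'<div class=\"BreadCrumb(.*?)</div>', re.DOTALL)


def strip_noindex(html: str) -> str:
    """Return the page with every match of the pattern removed."""
    return NOINDEX_PATTERN.sub("", html)


# Invented sample page: one breadcrumb element followed by real content.
page = (
    '<div class="BreadCrumb">Home &gt; News</div>'
    '<p>Quarterly results are out.</p>'
)
print(strip_noindex(page))  # prints: <p>Quarterly results are out.</p>
```

Because the match is non-greedy, it stops at the first closing </div>, so a breadcrumb element that contains nested div elements would only be partly matched.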

    Inserting special HTML comments into the document itself
    For example:

    ... This section is indexed ...
     <!--noindex-->
    ... This section is not indexed ...
     <!--endnoindex-->
    ... This section is indexed ...
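
The effect of the comment markers can be imitated in a few lines of Python. This is a rough sketch of the idea, not Funnelback's implementation; the function name indexable_text is hypothetical.

```python
import re

# Everything from <!--noindex--> up to the next <!--endnoindex-->, inclusive.
NOINDEX_BLOCK = re.compile(r"<!--noindex-->.*?<!--endnoindex-->", re.DOTALL)


def indexable_text(document: str) -> str:
    """Return the document with all noindex-marked sections removed."""
    return NOINDEX_BLOCK.sub("", document)


sample = "Intro <!--noindex-->site menu<!--endnoindex--> Body"
print(indexable_text(sample))  # prints: Intro  Body
```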


  2. Restricting results to specific directories

    You can restrict search results to a specific directory in two ways:

    Manually append "v:<path>" to your search query, for example:
    search v:media


    This restricts results to those that have the word "media" in
    the URL. Example result URLs for the query above:

    funnelback.com/pdfs/media/Media_Release_NPS.pdf
    funnelback.com/Media/releases/mediarelease4.shtm
    funnelback.com/media.html
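
One way to picture "the word media in the URL" is to split each URL into words and compare them without regard to case. The sketch below does that for the three URLs above. The splitting rule and the helper names are assumptions made for illustration; the source does not describe how Funnelback tokenises URLs.

```python
import re


def url_words(url: str) -> set[str]:
    """Split a URL on non-alphanumeric characters and lowercase the parts."""
    return {part.lower() for part in re.split(r"[^A-Za-z0-9]+", url) if part}


def matches_v(url: str, word: str) -> bool:
    """True if the word appears as a whole word in the URL (assumed rule)."""
    return word.lower() in url_words(url)


urls = [
    "funnelback.com/pdfs/media/Media_Release_NPS.pdf",
    "funnelback.com/Media/releases/mediarelease4.shtm",
    "funnelback.com/media.html",
    "funnelback.com/multimedia/index.html",  # invented counter-example
]
for url in urls:
    print(matches_v(url, "media"), url)
```

Under this assumed rule the first three URLs match and the invented fourth one does not, because "multimedia" is a different word.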


    If you need to be more specific, for example so that results must sit below /media/, you can manually append the "scope" CGI parameter to the search URL, for example:

    http://bureau-query.funnelback.com/search/search.cgi?query=search&collection=new_website&scope=funnelback.com/media/

    Note the "&scope=funnelback.com/media/" on the end. With this parameter, the search returned results such as:

    funnelback.com/Media/releases/mediarelease5.shtm
    funnelback.com/Media/releases/mediarelease7.shtm
    funnelback.com/Media/index.shtm
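
If you generate search links from a script, the scope parameter can be added with the standard library. The sketch below rebuilds the example URL above; the function name build_search_url is hypothetical, and only the query, collection and scope parameters shown in this tip are used.

```python
from urllib.parse import urlencode

# Search endpoint taken from the example URL in this tip.
SEARCH_ENDPOINT = "http://bureau-query.funnelback.com/search/search.cgi"


def build_search_url(query, collection, scope=None):
    """Build a search URL, optionally restricted to a URL prefix via scope."""
    params = {"query": query, "collection": collection}
    if scope:
        params["scope"] = scope
    # safe="/" keeps the slashes in the scope readable instead of %2F.
    return SEARCH_ENDPOINT + "?" + urlencode(params, safe="/")


print(build_search_url("search", "new_website", scope="funnelback.com/media/"))
```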

  3. Reports blacklist

    The reporting system can be configured to ignore a list of search terms, searches from certain IP addresses, or a combination of the two. This helps prevent unwanted spam searches from appearing in the Query Reports system. The list is kept in the reporting-blacklist.cfg file, accessible from the 'Administer' tab under 'Browse collection configuration files'. Example syntax:

    enter keywords
    @123.123.123.123
    spam search term@192.168.123.123


    Further documentation is available at http://docs.funnelback.com/8.3/reporting_blacklist_cfg.html
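
The three line shapes in the example (term only, @IP only, term@IP) can be read as simple rules. The toy parser below shows one way to interpret them. It is a sketch under assumptions: the names Rule, parse_rule and is_blacklisted are hypothetical, and exact, case-sensitive matching is assumed because the source does not say how Funnelback compares terms.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    term: Optional[str]  # None means "any search term"
    ip: Optional[str]    # None means "any IP address"


def parse_rule(line: str) -> Rule:
    """Parse one blacklist line: 'term', '@ip' or 'term@ip'."""
    text = line.strip()
    term, sep, ip = text.rpartition("@")
    if not sep:  # no '@' present: the whole line is a search term
        return Rule(term=text, ip=None)
    return Rule(term=term or None, ip=ip)


def is_blacklisted(query: str, ip: str, rules: list[Rule]) -> bool:
    """True if any rule matches both the query and the client IP."""
    return any(
        (rule.term is None or rule.term == query)
        and (rule.ip is None or rule.ip == ip)
        for rule in rules
    )


rules = [parse_rule(line) for line in (
    "enter keywords",
    "@123.123.123.123",
    "spam search term@192.168.123.123",
)]
print(is_blacklisted("enter keywords", "10.0.0.1", rules))       # True
print(is_blacklisted("anything", "123.123.123.123", rules))      # True
print(is_blacklisted("spam search term", "10.0.0.1", rules))     # False
```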


Funnelback Pty. Ltd. © 2010