We often get asked for hints and tips on how to make websites more ‘searchable’.
First, even less-than-perfect content can be managed well with a suite of good-quality tools such as analytics, results promotion and strategic auto-completion. These methods are particularly helpful when your resources for correcting content and metadata are limited. But if perfection is your benchmark, read on for 11 ways to improve the quality of your content and encourage better search results.
If some of your documents are very long, consider publishing them as separate chapters or sections. Imagine that your organization has an administrative procedures manual (APM) that is 3,000 pages long, and an HR employee enters the search query "long service leave". A PDF file of the whole APM wouldn't be a good answer to the query, even though it contains the best answer, because the HR employee would then need to search through the very large document for what they actually wanted. A far better answer would be a single HTML file containing "Section 13.4.5: Long Service Provisions".
Make sure every document exposes an accurate last-modified date. This can be achieved by publishing page-level date metadata in a supported format, or by configuring your web server to send the correct document modified dates in the HTTP headers.
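As a rough illustration of the HTTP header option, the sketch below formats a modification time as an HTTP date, the form used by the standard Last-Modified response header, and parses it back. The function names and the sample timestamp are assumptions made for this example; how your web server or CMS actually sets the header will differ.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date (always in GMT)."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def parse_http_date(value: str) -> datetime:
    """Parse an HTTP date, such as a Last-Modified value, into a datetime."""
    return parsedate_to_datetime(value)


# Hypothetical modification time for a page.
modified = datetime(2024, 3, 5, 9, 30, 0, tzinfo=timezone.utc)
header_value = http_date(modified)
print("Last-Modified:", header_value)  # Last-Modified: Tue, 05 Mar 2024 09:30:00 GMT
```

A crawler can only sort or filter by date reliably if this value reflects when the content really changed, not when the page was last rendered.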
Title tags are often used as search result titles, and aid in providing a strong information scent. Titles should aim to be unambiguous, and provide users with a clear indication of the result's content, purpose and context.
Search platforms can be configured to index metadata, and use metadata for display purposes. For example, a metadata abstract can be presented instead of the auto-generated snippet. Good metadata can also be used to provide faceted navigation. Bad metadata is worse than having no metadata at all.
Link text is the words that form the visible text of a hyperlink in your HTML. Avoid using link text like 'More...' or 'Click here...'. Instead, attach the link to descriptive text, for example 'Read our 5 simple tips on how to make the most of your search analytics.'
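One way to find vague link text is to scan your pages for it. The following is a minimal sketch using only the Python standard library; the list of generic phrases, the class name and the sample HTML are all assumptions for illustration, and a real audit would need a phrase list suited to your own site.

```python
from html.parser import HTMLParser

# Assumed list of generic phrases; extend it to suit your site.
GENERIC_LINK_TEXT = {"more", "read more", "click here", "here"}


class LinkTextAuditor(HTMLParser):
    """Collects (href, text) pairs for links whose text is generic."""

    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.vague_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            # Ignore case and trailing dots or ellipses when comparing.
            if text.lower().strip(" .\u2026") in GENERIC_LINK_TEXT:
                self.vague_links.append((self._href, text))
            self._href = None


def find_vague_links(html):
    auditor = LinkTextAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.vague_links


sample = (
    '<p><a href="/tips">Read our 5 simple tips on search analytics</a> '
    '<a href="/news">More...</a> <a href="/contact">Click here</a></p>'
)
print(find_vague_links(sample))
```

In this sample the '/news' and '/contact' links are reported, while the descriptive '/tips' link is not.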
Search crawlers work by following links. With dynamically generated content, they can potentially miss important pages or clutter up indexes with rubbish. When you do generate pages dynamically, give each page a single, short, human-readable URL.
Most search platforms index a frame and its component pages separately. When a particular search result is returned, it may appear without the context that the frame would have provided.
Configure your collections (or use robots.txt files) to prevent the crawler from accessing material that isn't suitable for searching. You may wish to exclude mirror sites and directories of non-textual data. Excess material increases disk space usage and slows down crawling, indexing and query processing. Focusing the material indexed may also improve the quality of results.
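Before deploying robots.txt rules, it can help to check that they block what you intend. The sketch below uses Python's standard urllib.robotparser module against an in-memory rule set; the directory names, the crawler name 'ExampleCrawler' and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules excluding a mirror site and a directory of raw images.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mirror/
Disallow: /images/raw/
"""


def build_parser(robots_txt):
    """Build a robots.txt parser from text rather than fetching a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for url in (
    "https://www.example.com/mirror/index.html",
    "https://www.example.com/policies/leave.html",
):
    allowed = parser.can_fetch("ExampleCrawler", url)
    print(url, "allowed" if allowed else "blocked")
```

With these rules the mirror URL is reported as blocked and the policies URL as allowed.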
Consider excluding pages that are useful in a browsing context but are less likely to be appropriate as search results. Examples include A-Z listing pages and mid- and low-level index pages. The <meta name="robots" content="follow,noindex"/> robots metadata directive is appropriate here.
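To confirm that a listing page carries the robots directives you expect, you could extract them from the page source. This is a minimal sketch using Python's standard html.parser; the class and function names are assumptions for this example, and it only reads the robots meta tag, not HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<head><meta name="robots" content="follow,noindex"/></head>'
directives = robots_directives(page)
# The page should be kept out of the index but its links still followed.
print("noindex" in directives, "follow" in directives)  # True True
```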
Exclude parts of pages that should not be indexed, such as navigational elements, headers and footers (see Controlling indexable content in PADRE for details). The query-biased result summaries on some sites suffer in quality because they include sentences extracted from the site navigation text instead of the main document content. A solution to this problem is to add directives to the web pages indicating that certain sections should not be indexed. Where these pages cannot be modified at the source, the use of a NoIndexFilterInjector is recommended. Note that anchor text is indexed as part of the target document at all times to ensure that ranking quality is not affected.
During crawling, any requested URL that returns a 200 OK status code is treated as a valid page, even if the page itself contains a 'Broken / Not Found' message. Your web server should return a 404 Not Found status code for broken URLs.
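Pages that return 200 with an error message in the body are sometimes called "soft 404s". A simple heuristic for spotting them in crawl output is sketched below; the phrase list and function name are assumptions for illustration, and a phrase match can produce false positives on pages that legitimately discuss missing content.

```python
# Assumed phrases that suggest an error page; tune these for your own site.
NOT_FOUND_PHRASES = ("page not found", "broken link", "404")


def is_soft_404(status_code, body):
    """Return True when a response claims success but reads like an error page."""
    if status_code != 200:
        return False  # A real 404 (or any other status) is not a *soft* 404.
    text = body.lower()
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)


# Hypothetical (status, body) pairs as a crawler might record them.
responses = [
    (200, "<h1>Long Service Provisions</h1>"),
    (200, "<h1>Sorry, page not found</h1>"),
    (404, "<h1>Page not found</h1>"),
]
print([is_soft_404(status, body) for status, body in responses])  # [False, True, False]
```

Any URL flagged this way is a candidate for fixing at the web server so that it returns a genuine 404.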
These guidelines will help you build a site that is highly searchable. A searchable site means an enhanced search experience in Funnelback (and any other search product), plus greater visibility in global search engines such as Yahoo, Google or Bing. This translates to efficiency gains for employees and easier access to information for customers and stakeholders.