Creating a Sitecore.Search Crawler for the File System

Introduction

Welcome to my latest blog post, which explains how to create a Sitecore.Search crawler for the file system as part of the Sitecore.Search framework.

What is a Sitecore.Search Crawler?

A Sitecore.Search crawler is a component in Sitecore that scans a specific storage system, such as a database or file system, extracts information from it, and stores that information in a search index, making it available to Sitecore.Search. It can perform several roles:

  • Indexer – Extracts data from a specified document requested by the crawler or monitor. The extracted data consists of metadata and content.
    • Metadata – The Indexer extracts metadata that is understood by the system. This metadata can be filtered and prioritized, for example by using the _name field.
    • Content – The Indexer also extracts body content and prioritizes it. You can prioritize content in the document by using a boost. A boost is usually applied to a single field only, giving the document a single prioritization.
  • Crawler – Traverses a storage system and uses the indexer to populate the search index.
  • Monitor – Monitors changes in a storage system and updates the search index (not implemented in this example).

Search crawlers implement the Sitecore.Search.Crawlers.ICrawler interface.
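To make the split between the indexer and crawler roles concrete, here is a minimal sketch in plain C#. It does not use the Sitecore API: InMemoryIndex, SketchIndexer, SketchCrawler and the _content field name are all hypothetical names I made up for illustration, and the real ICrawler interface is not shown. Only the _name field comes from the description above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a search index: document key -> (field name -> value).
// This is NOT a Sitecore or Lucene type.
public sealed class InMemoryIndex
{
    private readonly Dictionary<string, Dictionary<string, string>> documents =
        new Dictionary<string, Dictionary<string, string>>();

    public int Count
    {
        get { return documents.Count; }
    }

    public void AddDocument(string key, Dictionary<string, string> fields)
    {
        documents[key] = fields;
    }

    // Returns the stored field value, or null if the document or field is missing.
    public string GetField(string key, string field)
    {
        Dictionary<string, string> fields;
        string value;
        if (documents.TryGetValue(key, out fields) && fields.TryGetValue(field, out value))
        {
            return value;
        }
        return null;
    }
}

// Indexer role: turns one document into metadata and content fields.
public static class SketchIndexer
{
    public static Dictionary<string, string> Extract(string name, string body)
    {
        Dictionary<string, string> fields = new Dictionary<string, string>();
        fields["_name"] = name;          // metadata
        if (body != null)
        {
            fields["_content"] = body;   // content; the field name is an assumption
        }
        return fields;
    }
}

// Crawler role: walks the storage and uses the indexer to populate the index.
public static class SketchCrawler
{
    // storage maps a document name to its body (null if there is no extractable content).
    public static void Crawl(IDictionary<string, string> storage, InMemoryIndex index)
    {
        foreach (KeyValuePair<string, string> entry in storage)
        {
            index.AddDocument(entry.Key, SketchIndexer.Extract(entry.Key, entry.Value));
        }
    }
}
```

A monitor would call the same indexer for individual documents as they change, instead of walking the whole storage; as noted above, that role is left out here.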

This post explains how to extend the Sitecore Desktop search so that its results include files from your Web site root as well as Sitecore items. It shows how to create a Sitecore.Search crawler and integrate it with the Sitecore.Search framework so that the results appear in the Sitecore Desktop Search UI.

The following screenshot shows a typical set of search results after implementing a Sitecore.Search crawler for the file system:

[Screenshot: Quick Search after]

Search crawlers provide a way for Sitecore.Search to consume data from various sources to make your search results more comprehensive. You can design a crawler to index any of the following:

  • Files contained in a specific folder in the file system, for example the Web site file system
  • Tables in an external database
  • Another folder containing Word or PDF documents
  • Content from an external system, such as a CRM
  • An external Web site

Task

  • Implement a simple crawler for files in a specific folder on a Web site. The crawler makes XML and text files in the folder searchable by their content. All other files are searchable by name.
  • Create a processor to integrate with the Quick Search UI so a user can perform actions on the search. For example, open and view a file that appears in the search results.
  • Create a crawler.config file that integrates the crawler with the Quick Search index and installs the processor for the Quick Search UI. This solution can retrieve all Sitecore items and files that match the search criteria without any significant performance overhead.
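The rule in the first task (XML and text files are searchable by content, all other files by name) can be sketched without any Sitecore types. The FileEntry and FileCrawlSketch names below are hypothetical, and I am assuming that "XML and text files" means the .xml and .txt extensions; this is only an illustration of the rule, not the crawler implementation itself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record of what a crawler would hand to the indexer for one file.
public sealed class FileEntry
{
    public string Name { get; set; }
    public string FullPath { get; set; }
    public string Content { get; set; } // null when only the name is searchable
}

public static class FileCrawlSketch
{
    // Assumption: "XML and text files" means these two extensions.
    private static readonly string[] ContentExtensions = { ".xml", ".txt" };

    // True if the file should be searchable by its content, not only by its name.
    public static bool IsContentIndexed(string path)
    {
        string extension = Path.GetExtension(path);
        return Array.Exists(ContentExtensions,
            e => string.Equals(e, extension, StringComparison.OrdinalIgnoreCase));
    }

    // Walks the folder and all subfolders, yielding one entry per file.
    // Content is read fully into memory, which is fine for a sketch but
    // would need a size limit for large files.
    public static IEnumerable<FileEntry> Crawl(string rootFolder)
    {
        foreach (string path in Directory.GetFiles(rootFolder, "*", SearchOption.AllDirectories))
        {
            yield return new FileEntry
            {
                Name = Path.GetFileName(path),
                FullPath = path,
                Content = IsContentIndexed(path) ? File.ReadAllText(path) : null
            };
        }
    }
}
```

Calling FileCrawlSketch.Crawl with the folder you want to index returns every file under it; entries whose Content is null would be indexed by name only.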

Implement this solution in C# using Visual Studio.

Prerequisites

  • Web site built on Sitecore CMS 6
  • Visual Studio 2008

Summary of Steps

To create a Sitecore.Search crawler, complete the following steps:

  1. Create the Sitecore.Search Crawler – implement the ICrawler interface to index the file system
  2. Display search results in the Desktop – integrate with the Quick Search UI so that users can perform actions based on the search results
  3. Create a File Crawler Config File – a sample configuration of the crawler component and UI integration
  4. Test the Sitecore.Search Crawler