Tuesday, April 7, 2009

Default parameters for search collection content sources created by IBM Web Content Management (WCM) searchable sites

Question
What are the default parameters for search collection content sources created by IBM Web Content Management (WCM) searchable sites?




Answer
When you configure a Web Content Management (WCM) site to be searchable, a Managed Web Content site content source (in the specified search collection) is created for you with the following parameters:
    Content Source Name: /<web_content_library>/<wcm_site>
    Portal user id: <copied_from_wcm_site_configuration>
    Portal user password: <copied_from_wcm_site_configuration>
    Stop collecting after (min): 30
    Stop fetching a document after (sec): 60
    Links expire after (days): 7
    Remove broken links after (days): 1
    Schedulers tab: Scheduled Update every 4 hours


You might want to change these default values.


For example, if searches do not return the expected results, try
increasing the "Stop collecting after (min)" value, because the default
of 30 minutes might not be long enough for the crawler to collect all
the WCM content in the site. You can also increase the "Stop fetching a
document after (sec)" value.


The interval between scheduled crawler runs must be longer than the
maximum crawler execution time, because a crawler cannot be started
while it is already running. If a crawler job is triggered while the
crawler is still running, that trigger is ignored, and the crawler
runs again only at the next scheduled time, provided that it is not
still running then.
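
The scheduling rule above can be illustrated with a small simulation. This is only a sketch of the behavior as described here, not Portal code: the function name simulate_schedule and its parameters are hypothetical, times are in minutes, and every crawl is assumed to run for a fixed duration.

```python
def simulate_schedule(interval_min, crawl_min, horizon_min):
    """Return (started, skipped) lists of scheduled trigger times in minutes.

    Hypothetical model: a trigger fires every interval_min minutes, and each
    crawl that starts is assumed to run for exactly crawl_min minutes.
    """
    started, skipped = [], []
    busy_until = 0  # time at which the currently running crawl finishes
    for t in range(0, horizon_min, interval_min):
        if t < busy_until:
            # The crawler is still running, so this trigger is ignored.
            skipped.append(t)
        else:
            started.append(t)
            busy_until = t + crawl_min
    return started, skipped


# Defaults from this article: update every 4 hours, stop collecting after 30 min.
print(simulate_schedule(240, 30, 720))   # ([0, 240, 480], [])

# An interval shorter than the crawl time causes every other trigger to be ignored.
print(simulate_schedule(20, 30, 120))    # ([0, 40, 80], [20, 60, 100])
```

In the second call the effective interval between crawls is 40 minutes rather than the configured 20, which is why the schedule interval should exceed the longest expected crawl.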


Some of the default parameter values for new content sources are configurable in <wp_root>/wcm/shared/app/config/wcmservices/SearchService.properties:
    # Scheduled Update every 4 hours
    SearchService.RecrawlInterval=4

    # Remove broken links after (days)
    SearchService.BrokenLinksExpirationAge=1
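
As a rough way to sanity-check these settings, the sketch below parses properties text in the simple key=value form shown above and compares the recrawl interval with the crawl time limit. The parse_properties helper and the SAMPLE string are illustrative assumptions, not part of WCM; the 30-minute limit is the "Stop collecting after (min)" default from this article, and the interval is taken to be in hours, as the comment in the file suggests.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments.

    Hypothetical helper: it does not handle escapes or line continuations.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Sample content copied from the two entries shown in this article.
SAMPLE = """\
# Scheduled Update every 4 hours
SearchService.RecrawlInterval=4

# Remove broken links after (days)
SearchService.BrokenLinksExpirationAge=1
"""

props = parse_properties(SAMPLE)
recrawl_hours = int(props["SearchService.RecrawlInterval"])
stop_collecting_min = 30  # "Stop collecting after (min)" default

# The interval between runs should be longer than the maximum crawl time.
interval_ok = recrawl_hours * 60 > stop_collecting_min
print(props)
print("interval longer than crawl limit:", interval_ok)
```

To check a real installation, the same function could be given the text of SearchService.properties read from disk instead of SAMPLE.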

