Saturday, April 25, 2009

JCR troubleshooting topic: search index

Problem
As the system administrator of an implementation of IBM Web Content Management, you must troubleshoot the Java Content Repository (JCR) search index component.

Cause
Search is not working properly.

Resolving the problem


Rebuild Search Index

Title: How to rebuild the WebSphere Portal Document Manager search index
Doc #: 1296931
URL: http://www.ibm.com/support/docview.wss?rs=899&uid=swg21296931

NOTE: The SystemOut.log displays the following lines when rebuilding the search index:

* Start: START rebuilding juru index:
* End: DONE rebuilding juru index:
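
To check whether a rebuild has finished, you can scan SystemOut.log for the two markers above. The following is a minimal sketch, assuming only that the marker strings appear verbatim somewhere in a log line; the function name and the sample log path in the usage note are hypothetical.

```python
# Marker strings come from the note above; everything else about the
# log line layout is an assumption.
START_MARKER = "START rebuilding juru index"
DONE_MARKER = "DONE rebuilding juru index"


def rebuild_status(log_lines):
    """Return 'not started', 'in progress', or 'done' based on the last marker seen."""
    status = "not started"
    for line in log_lines:
        if START_MARKER in line:
            status = "in progress"
        elif DONE_MARKER in line:
            status = "done"
    return status
```

For example, `rebuild_status(open("SystemOut.log", errors="replace"))` would report the state of the most recent rebuild recorded in that file, where the path is a placeholder for your own log location.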


Stellent Conversion Errors

There is a database fix for Stellent Conversion Errors with htmlElement.elementData. Contact IBM Support for more information.

If you encounter other Stellent Conversion Errors while rebuilding the search index, contact IBM Support to involve the ODC/DCS team to help resolve the issues with running the conversion.

ParentIDNotFoundException

Example: com.ibm.icm.ts.path.ParentIDNotFoundException: Original parent not found for id: 120337072591476.

This exception can occur if search indexing attempts to index an item that has been deleted. This is normal processing and does not indicate a problem. The failed event is logged in the ICMJCRSTERRORS table.

Note that PK56104 removes many of these messages. PK56104 replaces the former APAR, PK56033.

After PK56104 has been applied, search indexing processes a deleted item only once and logs the exception. It does not attempt to process that deleted item again.

NOTE: PK56104 has been part of the JCR Cumulative fix since PK60132 (JCR Cumulative Fix #3).

This exception can cause the ICMJCRSTERRORS table to grow very large. It is safe to remove the contents of this table after the JCR Cumulative Fix has been applied. Contact IBM Support to review the issue and provide the required SQL if indicated. It is recommended to make a full database backup before directly modifying the database.

Exception during Search


Previous Exceptions

If there is an exception starting a WebSphere service, this may lead to search problems later on. These are the exceptions that have been seen so far:

* java.lang.IllegalStateException: I18N0012I: The Internationalization service is not started on WebSphere_Portal


Duplicate exceptions during search

Example: COM.ibm.db2.jdbc.DB2Exception: SQL0601N The name of the object to be created is identical to the existing name "JCR.TSSTBL_2" of type "TABLE".

To resolve this error, clean up the temporary search tables. Contact IBM Support to review the issue and provide the required SQL if indicated. It is recommended to make a full database backup before directly modifying the database.

Unique constraint violated

Example: java.sql.BatchUpdateException: ORA-00001: unique constraint (WCMICMADMIN.SYS_C0036649) violated
This error has been fixed in 6.0.1.1, but can occur if the search index has not been rebuilt since upgrading to 6.0.1.1.
Solution: Rebuild the search index.

Temp search tables

PK58346 is now available for 6.0.1.1 and later, and greatly reduces the use of temporary search tables.
Note: PK58346 is part of the JCR Cumulative Fix since PK60132 (JCR Cumulative Fix #3).

To clean up the temporary search tables, contact IBM Support to review the issue and provide the required SQL if indicated. It is recommended to make a full database backup before directly modifying the database.


Incorrect Results from Search


Search index failures


If the search is not yielding correct results, make sure that the search index was created with no errors.

You can verify whether a single document was reindexed successfully with the following steps:
1. Change the index maintenance interval to 2 minutes
2. Enable JCR trace at com.ibm.icm.*=finest
3. Edit the document
4. Allow 2 minutes (the index maintenance interval) for the document to be reindexed

Trace search results


Collect the following information:

* Search criteria
* Portal page from where search was invoked
* Any other search options (if advanced search)
* The user's locale
* Expected results
* Actual results
* JCR trace of the failure: com.ibm.icm.*=finest
(For instructions, read the MustGather documentation at the end of this technote)


Trace information:
1. Look for the entry to JCR query:
This will show the actual query that is being executed (including the text search):
Search string: QueryImpl execute includeLocks

Example:
[3/16/08 7:28:05:161 PDT] 0000009c QueryImpl 2 com.ibm.icm.jcr.query.QueryImpl execute includeLocks=false includeReferences=false includePaths=true statement=//element(, icm:documentLibrary)[@jcr:uuid = 'e8b5dc8046f1eb03a15db108d7e720a9']//(element(, ibmcontentwcm:authoringTemplate)|element(, ibmcontentwcm:webCategory))[@ibmcontentwcm:workflowStatus and @icm:authors = 'cn=userid,o=all users'][text-contains(.,'board')] order by text-score(.,'board*') descending propertiesToRetrieve=null


2. Look for the entry to text search (Juru):
This will show what is being sent to text search, and if any truncation has occurred:
Search string: executing search:

Example:
[3/16/08 7:28:05:416 PDT] 000000b6 JCRCFLLoggerI 3 com.ibm.icm.ts.tss.JCRCFLLoggerImpl com.ibm.icm.ts.tss.JuruIndexImpl.result [java.lang.ThreadGroup[name=icmciWorkManager: icmjcrear,maxpri=10]] com.ibm.icm.ts.tss.JuruIndexImpl.result [java.lang.ThreadGroup[name=icmciWorkManager: icmjcrear,maxpri=10]]: executing search: 'board*' with language: en wildcard expansion size: 20

Wildcard term expansion truncated for search: 'board*


3. Look for exit from text search:
This will identify how many results were found from text search (Juru).
Search string: num results:

Example:
[3/16/08 7:28:05:416 PDT] 000000b6 JCRCFLLoggerI 3 com.ibm.icm.ts.tss.JCRCFLLoggerImpl com.ibm.icm.ts.tss.JuruIndexImpl.result [java.lang.ThreadGroup[name=icmciWorkManager: icmjcrear,maxpri=10]] com.ibm.icm.ts.tss.JuruIndexImpl.result [java.lang.ThreadGroup[name=icmciWorkManager: icmjcrear,maxpri=10]]: num results: 4


4. Look for exit from query:
This will identify how many of the search results are returned after JCR has performed a query based on the results from text search.
Search string: query result size

Example:
[3/16/08 7:28:16:216 PDT] 0000009c QueryResultIt 2 com.ibm.icm.jcr.query.QueryResultIteratorImpl QueryResultIteratorImpl query result size=0

If the number of results returned from Juru is different than what is expected, then we must pursue the incorrect search with Juru.

If the number of results returned from Juru is what is expected, then we must pursue the incorrect search with IBM JCR Support, to find out if/where JCR has changed the result list.

If the trace indicates "expansion truncated" as above, Juru search is working as designed: the wildcard search term expands to more terms than are allowed, so Juru truncates the expansion. You can increase the number of expanded search terms with the jcr.textsearch.wildcardTermExpansionSize property in icm.properties. However, a larger wildcard expansion size will impact search performance.
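
The four trace lookups above can be sketched as a small script. It assumes that the search strings quoted in this technote appear verbatim in the trace lines; the function names and the dictionary keys are hypothetical, and when a trace contains several searches only the last values seen are kept.

```python
import re

# Patterns built from the search strings quoted in the steps above.
STATEMENT = re.compile(r"statement=(.*?)(?:\s+propertiesToRetrieve=|$)")
JURU_SEARCH = re.compile(r"executing search:\s*(.*)")
NUM_RESULTS = re.compile(r"num results:\s*(\d+)")
QUERY_SIZE = re.compile(r"query result size=(\d+)")


def summarize_search_trace(lines):
    """Collect the query, Juru request, result counts, and truncation flag."""
    summary = {"statement": None, "juru_search": None,
               "juru_results": None, "jcr_results": None, "truncated": False}
    for line in lines:
        if "QueryImpl execute includeLocks" in line:      # step 1: entry to JCR query
            match = STATEMENT.search(line)
            if match:
                summary["statement"] = match.group(1).strip()
        match = JURU_SEARCH.search(line)                  # step 2: entry to text search
        if match:
            summary["juru_search"] = match.group(1).strip()
        if "expansion truncated" in line:                 # wildcard expansion was cut off
            summary["truncated"] = True
        match = NUM_RESULTS.search(line)                  # step 3: exit from text search
        if match:
            summary["juru_results"] = int(match.group(1))
        match = QUERY_SIZE.search(line)                   # step 4: exit from query
        if match:
            summary["jcr_results"] = int(match.group(1))
    return summary


def next_step(summary, expected_juru_results):
    """Suggest where to pursue the problem, following the two rules above."""
    if summary["juru_results"] is None:
        return "no Juru result count found in trace"
    if summary["juru_results"] != expected_juru_results:
        return "Juru"
    return "JCR"
```

Feeding the lines of a trace file to `summarize_search_trace` and then calling `next_step` with the number of hits you expected from text search gives a first indication of which component to investigate.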

Number of search results returned

At present, JCR cannot retrieve more than 100 results from a Juru search. Note that this number may be further reduced by JCR based on either access control or additional query criteria. The best way to determine whether the number of search results is being limited by this maximum is to look at the search trace (see above). If the number of results returned from Juru is 100, it is very likely that the current search exceeds the maximum of 100.
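
As a quick check for this limit, the trace can be filtered for Juru result counts that reach 100. This is a sketch only: the limit of 100 and the "num results:" string come from this technote, while the function name is hypothetical.

```python
import re

JURU_MAX_RESULTS = 100  # limit stated in the text above
NUM_RESULTS = re.compile(r"num results:\s*(\d+)")


def searches_at_cap(trace_lines, cap=JURU_MAX_RESULTS):
    """Return the trace lines whose Juru result count reached the cap."""
    hits = []
    for line in trace_lines:
        match = NUM_RESULTS.search(line)
        if match and int(match.group(1)) >= cap:
            hits.append(line.rstrip("\n"))
    return hits
```

Any line returned here marks a search whose result list was probably cut off at the maximum.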

Miscellaneous topics on search

Search performance issue in 6.0.1.3

We have encountered a performance issue with search on 6.0.1.3 and JCR Cumulative Fix #3. This problem causes search performance to degrade when a search returns a large number of results. This problem has been fixed with PK64038.

Search Across Locales

Nodes which are indexed in one language are not guaranteed to be searchable from another language. For example, a Turkish language node "fulya" is not searchable from English. This is working as designed.

To verify the search index language, compare the language from the search trace with the workspace language in icm.properties: jcr.workspace.defaultLanguage
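
The comparison can be sketched as follows. The property name and the "with language:" trace text come from this technote. The function names are hypothetical, the properties parsing handles only simple key=value lines, and comparing the two values with plain string equality is an assumption about how they are written.

```python
import re

TRACE_LANGUAGE = re.compile(r"executing search:.*?with language:\s*(\S+)")


def default_language(properties_text):
    """Return the value of jcr.workspace.defaultLanguage, or None if absent."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):   # skip blanks and comments
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "jcr.workspace.defaultLanguage":
            return value.strip()
    return None


def trace_languages(trace_lines):
    """Return the set of languages reported on 'executing search:' trace lines."""
    found = set()
    for line in trace_lines:
        match = TRACE_LANGUAGE.search(line)
        if match:
            found.add(match.group(1))
    return found


def language_mismatches(properties_text, trace_lines):
    """Return trace languages that differ from the workspace default, sorted."""
    expected = default_language(properties_text)
    return sorted(lang for lang in trace_languages(trace_lines) if lang != expected)
```

An empty list from `language_mismatches` means every traced search used the workspace default language.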

Reorganize search index

If a lot of PDM or WCM content has been removed but the search index continues to grow, you may need to perform the administrator function to reorganize the search index. This capability is provided with PK61534. See the readme for PK61534 for instructions.

Reference information

Web Content Management Authoring Inline Advanced Search known limitations:
http://www-1.ibm.com/support/docview.wss?rs=688&uid=swg21259650

Known limitations and issues for Juru search utilized by WCM Authoring UI search:
http://www-1.ibm.com/support/docview.wss?rs=1041&uid=swg21259649

Web Content Management advanced authoring search does not return new or changed content:
http://www-1.ibm.com/support/docview.wss?rs=688&uid=swg21259884
