Saturday, April 25, 2009

JCR Troubleshooting topic: locks and deadlocks

Problem
How does one perform troubleshooting in IBM® Web Content Management regarding locking issues with JCR?


Resolving the problem
Locks

What is the best way to approach locks (AccessDenied, object has violated one or more lock constraints)?


There are two primary types of locks leveraged by the repository: external and internal. External locks are defined by the JCR specification and allow users to place locks on items to prohibit certain actions by other users. WCM supports only a "write" lock on a per-node basis within the product's implementation, which means that a user with the proper permission (PAC Action EDIT (EDITOR role)) can "lock" a node within the system. This "external" lock prohibits any other user from persisting changes to that node, in particular from calling save on that node alone. It does not, however, prohibit another user from indirectly modifying the node by operating on its parent (i.e., another user could still delete the locked node by deleting its parent node).

Internal locks are known as "consistency" locks. These locks are used by the internal implementation to attempt to prohibit situations where merge conflicts or consistency conflicts could be encountered. For instance, if one creates a "dynamic workspace" from a stable workspace and then adds a node to a node that exists in the stable workspace, the system MUST place an "internal" lock on the node in the stable workspace to prohibit any other user from deleting it. If another user deleted the node, then the merge of the dynamic workspace would later fail. The consistency locks prevent situations like this from occurring.

In an effort to allow applications to know when such locks may prohibit actions, all internal locks are EXPOSED as "external" locks on nodes owned by the workspace itself. This allows applications to investigate locks and to take correct actions when "internal" locks exist.
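
The rules above can be illustrated with a toy in-memory model. This is only a sketch for reasoning about the behavior, not the WCM or JCR API: the class LockModelSketch, its methods, and the node paths are all hypothetical, and the only detail taken from this topic is that internal locks appear with an owner of the form "Workspace XXXXX".

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the lock rules described above; hypothetical, not the WCM or JCR API. */
final class LockModelSketch {
    /** Lock owner per node path. Internal locks use an owner of the form "Workspace <id>". */
    private final Map<String, String> lockOwners = new HashMap<>();

    /** External lock: a user places a write lock on a single node. */
    void lockAsUser(String path, String user) {
        lockOwners.put(path, user);
    }

    /** Internal consistency lock, exposed as a lock owned by the workspace itself. */
    void lockAsWorkspace(String path, String workspaceId) {
        lockOwners.put(path, "Workspace " + workspaceId);
    }

    /** Saving a node is allowed only if it is unlocked or locked by the caller. */
    boolean canSave(String path, String user) {
        String owner = lockOwners.get(path);
        return owner == null || owner.equals(user);
    }

    /**
     * Deleting checks the lock on the target node only, never on its descendants.
     * That is why a locked child can still be removed along with its parent.
     */
    boolean canDelete(String path, String user) {
        return canSave(path, user);
    }

    /** True when the lock on the node is an internal lock rather than a user's lock. */
    boolean isInternalLock(String path) {
        String owner = lockOwners.get(path);
        return owner != null && owner.startsWith("Workspace ");
    }

    public static void main(String[] args) {
        LockModelSketch model = new LockModelSketch();
        model.lockAsUser("/library/parent/child", "alice");
        System.out.println("bob can save child: " + model.canSave("/library/parent/child", "bob"));
        System.out.println("bob can delete parent: " + model.canDelete("/library/parent", "bob"));
        model.lockAsWorkspace("/library/parent", "draft1");
        System.out.println("bob can delete parent now: " + model.canDelete("/library/parent", "bob"));
    }
}
```

In this model the consistency lock on the parent is what closes the gap left by the external lock on the child: once a workspace owns a lock on the parent, the delete is refused.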

When an operation is reported as causing an AccessDeniedException (object has violated one or more lock constraints), one or more existing locks prohibit the user from executing the requested operation. Please note that this is not a bug; it is the repository's way of alerting an application that the operation is prohibited at this moment due to a lock constraint. To help identify the source of the constraint, use the following steps:

1. Identify the WCM class that is making the failing request and enable the FINEST trace point for it.
2. Enable the com.ibm.icm.*=finest trace point.
3. Recreate the problem and capture the traces. Look for the entry into the WCM class. From that point, look for the AccessDeniedException. From that exception, walk upward on the same thread ID until you find a trace point for com.ibm.icm.jcr.NodeImpl save. This trace point outputs any locks that prohibited the save from completing. With this knowledge you can engage IBM Support. Note that the output for the Lock object shows the owner of the lock. If the lock is an "internal" lock, the owner is shown in some form as "Workspace XXXXX". This is how you can tell whether the operation is prohibited by an internal lock or by an external lock owned by another true user of the system.
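
The walk described in step 3 can be sketched in Java. This is a rough aid, not an IBM tool. It assumes each trace line begins with a bracketed timestamp followed by an eight-digit hexadecimal thread ID, which this topic does not state. The "Found lock on path ... owned by ..." message text is taken from the NodeImpl save example later in this topic. The class LockTraceScanner and its members are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the lock lines that precede an AccessDeniedException. */
final class LockTraceScanner {
    // Assumed line layout: "[timestamp] threadId rest-of-line".
    private static final Pattern LINE =
            Pattern.compile("^\\[[^\\]]*\\]\\s+(\\p{XDigit}{8})\\s+(.*)$");
    // Message layout taken from the NodeImpl save example in this topic.
    private static final Pattern LOCK =
            Pattern.compile("Found lock on path: (.+?) owned by: (.+)$");

    static final class LockInfo {
        final String path;
        final String owner;
        final boolean internal; // owners of the form "Workspace XXXXX" are internal locks

        LockInfo(String path, String owner) {
            this.path = path;
            this.owner = owner;
            this.internal = owner.startsWith("Workspace ");
        }
    }

    /** Walks upward from the first AccessDeniedException on the same thread ID. */
    static List<LockInfo> locksBehindFirstFailure(List<String> trace) {
        List<LockInfo> locks = new ArrayList<>();
        for (int i = 0; i < trace.size(); i++) {
            if (!trace.get(i).contains("AccessDeniedException")) continue;
            Matcher failure = LINE.matcher(trace.get(i));
            if (!failure.matches()) continue; // no thread ID to follow on this line
            String threadId = failure.group(1);
            for (int j = i - 1; j >= 0; j--) {
                Matcher line = LINE.matcher(trace.get(j));
                if (!line.matches() || !line.group(1).equals(threadId)) continue;
                Matcher lock = LOCK.matcher(line.group(2));
                boolean isLockLine =
                        line.group(2).contains("com.ibm.icm.jcr.NodeImpl save") && lock.find();
                if (isLockLine) {
                    locks.add(new LockInfo(lock.group(1), lock.group(2).trim()));
                } else if (!locks.isEmpty()) {
                    break; // past the block of lock lines for this save
                }
            }
            return locks;
        }
        return locks;
    }

    public static void main(String[] args) throws IOException {
        for (LockInfo lock : locksBehindFirstFailure(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println((lock.internal ? "internal" : "external")
                    + " lock on " + lock.path + " owned by " + lock.owner);
        }
    }
}
```

Run it with the path of the captured trace file as the only argument. If the real trace layout differs from the assumed one, only the LINE pattern should need adjusting.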


In addition, you can use selectableDisplayLocks.jsp to display all locks on a given node (and its children). Contact IBM Support for a copy of this jsp.


The following common exceptions are related to JCR access and locking:
User Name contains a comma (javax.jcr.LoginException)

Example: javax.jcr.LoginException: Login failed for UserId: cn=Smith, John, cn=users,dc=ibm,dc=com. Retrieved authenticated subject with unmatching UserId: CN=Smith\, John,CN=Users,dc=ibm,dc=com

The user name cannot contain a comma. If it does, the name cannot be processed correctly by JCR internals.
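
To find affected users ahead of time, a distinguished name can be checked for a comma inside one of its values. The following is a minimal sketch using the standard javax.naming.ldap classes from the JDK. It expects the escaped form of the DN (as in "CN=Smith\, John,..." above), and the class and method names are hypothetical.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

/** Hypothetical helper: flags distinguished names whose values contain a comma. */
final class DnCommaCheck {
    /** Returns true if any RDN value in the DN contains a comma (escaped as "\," in the DN). */
    static boolean hasCommaInRdnValue(String distinguishedName) {
        try {
            for (Rdn rdn : new LdapName(distinguishedName).getRdns()) {
                // getValue() returns the unescaped value, so "Smith\, John" becomes "Smith, John".
                if (String.valueOf(rdn.getValue()).contains(",")) {
                    return true;
                }
            }
            return false;
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Not a parseable DN: " + distinguishedName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasCommaInRdnValue("CN=Smith\\, John,CN=Users,dc=ibm,dc=com"));
        System.out.println(hasCommaInRdnValue("cn=jsmith,cn=users,dc=ibm,dc=com"));
    }
}
```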

AccessDeniedException

When the user sees an AccessDeniedException from WCM, it can mean one of the following possibilities:

The logged-in user does not have the required Portal Access Control permission for this object

This is a valid exception if the user does not have the correct access rights for the requested action. The administrator must grant the necessary rights through Portal for that object and action.

The JCR node is locked by another workspace

Example:
NodeImpl 3 com.ibm.icm.jcr.NodeImpl save(false, false) Found lock on path: /contentRoot/icm:libraries[8]/Content/epfsite/welcome owned by: Workspace 7c2ba800465031b597d5f719fed3c258
SystemErr R com.ibm.icm.jcr.access.AccessDeniedException: The requested operation violates one or more lock constraints.: [ErrorCode:7591]

This is a common occurrence when the node is being held by another draft. In this case, you must first delete the node and its draft workspace before proceeding.

Older versions of Portal have seen problems after a failed library delete, where drafts are still left in the library. If this is the case, contact IBM Support to review the issue and provide the cleanup tools as needed. It is recommended to make a full database backup before using any tools which directly modify the database.

The JCR node is locked by another user

This is a common occurrence when another user is working on the same node, and it is considered normal behavior. The best course of action for this failure is to log in as the other user and unlock the node.

If that user has been removed, IBM Support has the tools to remove the locks for that user. Contact IBM Support to review the issue and provide the needed tools if indicated. It is recommended to make a full database backup before directly modifying the database.

Delete all JCR locks for a node.

WARNING: Incorrectly updating the database tables can lead to database inconsistencies and deadlocks. You should not remove all of the locks for a node unless you are in the process of deleting that node, and have exhausted all other possibilities.

To delete all of the JCR locks for a node, you need to know the UUID for that node. IBM Support has a utility to internally remove the locks for that node. Contact IBM Support to review the issue and provide the required utility if indicated. It is recommended to make a full database backup before using any tools which directly modify the database.



Tracing all SQL statements (including host variables)

Enabling the pls.debug.trackStatementCursorLeakage setting in icm.properties, combined with JCR trace (com.ibm.icm.*=all), traces all of the SQL from JCR together with the host variables. Note that this setting significantly slows down performance, so you should reset pls.debug.trackStatementCursorLeakage to false after collecting the necessary trace data.

To enable this value, do the following:

1. Stop Portal Server
2. Edit /jcr/lib/com/ibm/icm/icm.properties, and set the following property:
pls.debug.trackStatementCursorLeakage=true
3. Set trace to com.ibm.icm.*=all and restart Portal
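
Because the setting slows the server down, it is easy to leave it on by accident after the trace has been collected. The following sketch reads a copy of icm.properties and reports whether the flag is still enabled. It assumes the flag is enabled with the value true, inferred from the instruction to reset it to false. The class IcmTraceFlagCheck is a hypothetical name, and only java.util.Properties from the JDK is used.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper: reports whether SQL statement tracking is still switched on. */
final class IcmTraceFlagCheck {
    static final String KEY = "pls.debug.trackStatementCursorLeakage";

    /** Returns true only if the property is present and set to "true" (assumed enabling value). */
    static boolean isStatementTrackingOn(Reader icmProperties) {
        Properties props = new Properties();
        try {
            props.load(icmProperties);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Boolean.parseBoolean(props.getProperty(KEY, "false").trim());
    }

    public static void main(String[] args) throws IOException {
        // Pass the full path to icm.properties as the only argument.
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            System.out.println(KEY + " enabled: " + isStatementTrackingOn(reader));
        }
    }
}
```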

Deadlocks
Derby (Portal 6.1)
SQL Exception: A lock could not be obtained within the time requested

Check the derby.log file to verify that the customer is running at least build 639536 of Derby 10.1.3.2.
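
The build check can be automated with a small sketch. It assumes the boot line in derby.log has the form "... Apache Derby - 10.1.3.2 - (639536): instance ...", which is not stated in this topic and may differ between Derby distributions. The class DerbyBuildCheck is a hypothetical name, and only the minimum build number 639536 comes from the text above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the Derby build number from a derby.log boot line. */
final class DerbyBuildCheck {
    static final int MINIMUM_BUILD = 639536;

    // Assumed boot-line fragment: "Apache Derby - <version> - (<build>)".
    private static final Pattern BOOT =
            Pattern.compile("Apache Derby - (\\d+(?:\\.\\d+)*) - \\((\\d+)");

    /** Returns the build number, or -1 if the line does not match the assumed format. */
    static int buildNumber(String bootLine) {
        Matcher m = BOOT.matcher(bootLine);
        return m.find() ? Integer.parseInt(m.group(2)) : -1;
    }

    /** True if the line reports a build at or above the minimum named in this topic. */
    static boolean meetsMinimumBuild(String bootLine) {
        return buildNumber(bootLine) >= MINIMUM_BUILD;
    }

    public static void main(String[] args) {
        String line = "Booting Derby version The Apache Software Foundation"
                + " - Apache Derby - 10.1.3.2 - (639536): instance example";
        System.out.println("build " + buildNumber(line) + " ok: " + meetsMinimumBuild(line));
    }
}
```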

DB2
Database hang during Portal upgrade

We have seen a problem where the customer sees a database hang while upgrading the Portal version, for example when upgrading to 6.0.1.4. This issue can occur when the database user does not have DBADM authority. Note that granting only SYSADM authority to the user is not enough; the user must have explicit DBADM authority.