Friday, March 27, 2009

How to count WCM content items in v6 JCR database ?

As we did have big performances issues in portal v5 due to the high number of WCM content items in our database, we have decided to continue monitoring this count also in v6.
Of course the new JCR design in v6, the WCM software is now supposed to support about 500.000 content entries....but anyway I think it is still a good idea to monitor the volume of data you have in your system (this could be useful to report that to your management, and/or just to prevent reaching the limit).

Now, in v6 counting and analyzing the number of items is little bit more complex than in v5, mainly because the JCR database design is really different compare to a standard SQL schema design.

Actually, the v5 database request for counting the total number of content items are quite similar:
- Old request (in v5):
select count(*) from AJPE where COMPONENT_TYPE='AJPE_CONTENTS'
- New request (in v6.x):
SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877

But the problem is that in JCR database, there is no simple 1 to 1 correspondancy between the WCM content items and the number of entries in db...
In v5, when creating a new content or deleting it, you can see that the count is simply increased (+1) or decreased (-1) accordingly, whatever the status of the content.

Now in v6 JCR model, here is the behaviour of the system:

Let's assume at start you have:
select count(ITEMID) from JCR.EV_ENTRY -> 39004
select count(ITEMID) from JCR.EV_ENTRY where ITEMCLASS=1762875877 -> 27818

I will give you more details in the next section below, but basically:
- select count(ITEMID) from JCR.EV_ENTRY : this request gives you the total number of WCM objects in the db (could be content, but also design elements).
- select count(ITEMID) from JCR.EV_ENTRY where ITEMCLASS=1762875877 : give you only the WCM content items (like news, etc), assuming in your db "1762875877" also corresponds to the content item object type.

After creating 1 Published content, you will see that the count is increased by "+2":
select count(ITEMID) from JCR.EV_ENTRY -> 39006
select count(ITEMID) from JCR.EV_ENTRY where ITEMCLASS=1762875877 -> 27820

After creating 1 new Draft content, you will see that the count is increased only by "+1":
select count(ITEMID) from JCR.EV_ENTRY -> 39007
select count(ITEMID) from JCR.EV_ENTRY where ITEMCLASS=1762875877 -> 27821

Also, please note that deleting content does not affect the above count values.


All these results clearly show that:
- 1 WCM content item can correspond to several JCR nodes,
- Deleting a WCM content will not necessarily decrease the database count (as content is probably not deleted physically, but its status might only be updated).


To validate this last hypothesis (regarding delete management), I did the following tests:

Get the total number of content items:
db2 => SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877
-----------
27818

Get the distinct 'ITEMTYPE' values that exist in the db:
db2 => SELECT DISTINCT ITEMTYPE FROM JCR.EV_ENTRY
ITEMTYPE
--------
D
L
P

Then check the corresponding content items count, depending on the 'ITEMTYPE':
SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877 AND ITEMTYPE='D'
=> 566
SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877 AND ITEMTYPE='L'
=> 0
SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877 AND ITEMTYPE='P'
=> 27252

Now we can see that the count of ITEMTYPE='P' and ITEMTYPE='D' is equal to the total 27818 of content items.

An obvious assumption is that 'D' corresponds to DRAFT contents, and 'P' to PUBLISHED content. These results match the status of our contents in our db : we have about 500 draft content items, and about 13.500 published content items (remind that a published item = 2 entries in JCR.EV_ENTRY table).


So basically, to evaluate the count of WCM content items, I think the following equation makes sense:
[SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877 AND ITEMTYPE='D'] + 0,5*[SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877 AND ITEMTYPE='P']

Note: I have no idea for the moment of what could be the meaning of ITEMTYPE='L' ? (if 'P' is published, and 'D' is draft, then it could correspond to EXPIRED status, but this count does not match our number of expired content....).


Hope this will help you to better understand the structure of the WCM JCR database. Also please find below the detail of the database requests I made to understand the model.

==========================
To connect to the JCR database, you first have to get the following values (in our case, we are using DB2):
# DbType: The type of database to be used for WebSphere Portal JCR domain
jcr.DbType=db2
jcr.DbName=jcrdbe6
jcr.DbSchema=jcr


v6.x PORTAL DATABASES configuration file can be found here:
/portal/WebSphere/wp_profile/ConfigEngine/
properties/wkplc_comp.properties

The table JCR.EV_ENTRY contains the reference to all the content items.

db2 => DESCRIBE TABLE JCR.EV_ENTRY
Column Type Type
name schema name Length Scale Nulls
------------------------------ --------- ------------------ -------- ----- ------
IGID SYSIBM CHARACTER 64 0 No
ITEMID SYSIBM CHARACTER 64 0 No
ITEMCLASS SYSIBM INTEGER 4 0 Yes
ITEMTYPE SYSIBM CHARACTER 1 0 No
STATE SYSIBM BIGINT 8 0 Yes
DELETED SYSIBM CHARACTER 1 0 Yes
PARENTID SYSIBM CHARACTER 64 0 Yes
TSTAMP SYSIBM TIMESTAMP 10 0 Yes

The table JCR.EV_TYPES gives you the list of WCM object types:

SELECT TYPENAME FROM JCR.EV_TYPES
com.ibm.workplace.wcm.services.library.Library
com.aptrix.pluto.site.Site
com.aptrix.pluto.site.SiteArea
com.aptrix.pluto.presentation.Template
com.aptrix.pluto.workflow.stages.WorkflowStage
com.aptrix.pluto.workflow.Workflow
com.aptrix.pluto.workflow.actions.PublishAction
com.aptrix.pluto.content.Content
com.aptrix.pluto.taxonomy.Taxonomy
com.aptrix.pluto.taxonomy.Category
com.aptrix.pluto.content.link.ContentLink
com.aptrix.pluto.presentation.Style
com.aptrix.pluto.workflow.actions.ExpireAction
com.aptrix.pluto.cmpnt.FileResourceCmpnt
com.aptrix.pluto.cmpnt.ImageResourceCmpnt
com.aptrix.pluto.cmpnt.MenuCmpnt
com.aptrix.pluto.cmpnt.HTMLCmpnt
com.aptrix.pluto.cmpnt.NavigatorCmpnt
com.aptrix.pluto.cmpnt.JSPCmpnt


The first field of the JCR.EV_TYPES tables is the object type ID than can be used to identify content in the JCR.EV_ENTRY table. Based on the list below, the value 1762875877 corresponds to the ITEMCLASS column in the JCR.EV_ENTRY table.

SELECT * FROM JCR.EV_TYPES
473104548 com.ibm.workplace.wcm.services.library.Library
298371183 com.aptrix.pluto.site.Site
-161464932 com.aptrix.pluto.site.SiteArea
1494027823 com.aptrix.pluto.presentation.Template
-325197706 com.aptrix.pluto.workflow.stages.WorkflowStage
540603119 com.aptrix.pluto.workflow.Workflow
1448064964 com.aptrix.pluto.workflow.actions.PublishAction
176287587 com.aptrix.pluto.content.Content
-1290676433 com.aptrix.pluto.taxonomy.Taxonomy
-977641252 com.aptrix.pluto.taxonomy.Category
-965915661 com.aptrix.pluto.content.link.ContentLink
629030940 com.aptrix.pluto.presentation.Style
-1363088202 com.aptrix.pluto.workflow.actions.ExpireAction
858827521 com.aptrix.pluto.cmpnt.FileResourceCmpnt
-1535525020 com.aptrix.pluto.cmpnt.ImageResourceCmpnt
-1598427284 com.aptrix.pluto.cmpnt.MenuCmpnt
-1110739776 com.aptrix.pluto.cmpnt.HTMLCmpnt
538485974 com.aptrix.pluto.cmpnt.NavigatorCmpnt
1671285158 com.aptrix.pluto.cmpnt.JSPCmpnt