The Basics – Specialized Query Methods

Following the Running a DQL Query post, this post looks at some simple ways to specialize DQL queries based upon their type (e.g., select, update, insert, delete, create, cached, and execute) and expected return values, and the benefits such specialization provides.  The Basics library of Documentum methods, which contains the specialized query methods as well as all of the other methods discussed in this series, can be found here.

The Basics library contains 10 query methods.  The method signatures follow:

  1. public static IDfCollection runSelectQuery(String query, IDfSession session) throws DfException
  2. public static dmRecordSet runSelectQueryAsRecordSet(String query, IDfSession session) throws Exception
  3. public static int runUpdateObjectQuery(String query, IDfSession session) throws DfException
  4. public static IDfCollection runCachedQuery(String query, IDfSession session) throws DfException
  5. public static String runCreateObjectQuery(String query, IDfSession session) throws DfException
  6. public static int runDeleteObjectQuery(String query, IDfSession session) throws DfException
  7. public static IDfCollection runExecQuery(String query, IDfSession session) throws DfException
  8. public static int runInsertQuery(String query, IDfSession session) throws DfException
  9. public static int runDeleteQuery(String query, IDfSession session) throws DfException
  10. public static int runUpdateQuery(String query, IDfSession session) throws DfException

I will leave the exploration of the code to you, and instead discuss a few of the benefits of these specialized query methods.

  1. Using specialized query methods makes your primary code much simpler to create, read, and maintain.  You don’t have to clutter the code with IDfQuery and IDfCollection objects.
  2. These methods provide simpler handling of return values.  For example, if you are only interested in a single result value (e.g., the number of objects updated), you don’t have to clutter your code with loops, etc. to retrieve it.
  3. Using specialized query methods, you don’t have to worry about choosing the correct or most efficient query-type constant (e.g., DF_READ_QUERY vs. DF_QUERY).
  4. These methods provide rudimentary syntax checking, which could be improved/extended for your purposes.  This can help alleviate receiving cryptic error messages from the query engine.
  5. Because all of these methods eventually call a single “uber” query method, error checking/trapping is centralized.
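Point 5 is the heart of the design.  The sketch below is not the DCTMBasics code; it is a self-contained, DFC-free illustration of the pattern, with an in-memory result list standing in as a hypothetical substitute for an IDfCollection and a simple verb check standing in for the real syntax checking:

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of the design: every specialized method funnels
// into one central runQuery(), so validation and error trapping live in
// exactly one place.  The real DCTMBasics methods work with DFC types
// (IDfSession, IDfQuery, IDfCollection); the in-memory result list here
// is a hypothetical stand-in.
public class QueryFacadeSketch {

    // The single "uber" query method: all checks happen here.
    static List<String> runQuery(String query, String expectedVerb) {
        if (query == null || query.trim().isEmpty()) {
            throw new IllegalArgumentException("empty query");
        }
        // Rudimentary syntax check: the query must start with the verb
        // the specialized caller expects.
        if (!query.trim().toLowerCase().startsWith(expectedVerb)) {
            throw new IllegalArgumentException(
                "expected a " + expectedVerb.toUpperCase() + " query: " + query);
        }
        // A real implementation would execute via IDfQuery here and
        // return the IDfCollection; we fake a one-row result.
        List<String> rows = new ArrayList<>();
        rows.add("0900000180000001"); // pretend r_object_id
        return rows;
    }

    // Specialized wrappers pick the verb check and reduce the result
    // to the type the caller actually wants.
    static List<String> runSelectQuery(String query) {
        return runQuery(query, "select");
    }

    static int runUpdateObjectQuery(String query) {
        // UPDATE ... OBJECT reports how many objects were touched, so
        // the wrapper collapses the "collection" to a single int.
        return runQuery(query, "update").size();
    }
}
```

Because every wrapper delegates to `runQuery`, adding a new check (or better error trapping) in one place benefits all ten specialized methods at once.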

Here are some quick examples:

				// assumes an open IDfSession named 'session'
				String query;
				IDfCollection col;

				// SELECT
				query = "select r_object_id, object_name from dm_document where folder('/Temp')";
				System.out.println("running SELECT query: " + query);
				col = DCTMBasics.runSelectQuery(query,session);
				// do something with IDfCollection
				col.close();

				// CACHED
				query = "select r_object_id, object_name from dm_document where folder('/Temp',descend)";
				System.out.println("running CACHED query: " + query);
				long start = System.currentTimeMillis();
				col = DCTMBasics.runCachedQuery(query,session);
				long stop = System.currentTimeMillis();
				long dif = stop - start;
				// do something with IDfCollection
				col.close();
				System.out.println("duration=" + dif);

				System.out.println("[running second query]");
				start = System.currentTimeMillis();
				query = "select r_object_id, object_name from dm_document where folder('/Temp',descend)";
				col = DCTMBasics.runCachedQuery(query,session);
				stop = System.currentTimeMillis();
				dif = stop - start;
				// do something with IDfCollection
				col.close();
				System.out.println("duration=" + dif);

				// OBJ CREATE
				query = "create dm_document object set object_name = 'DCTMBasics Test Object' link '/Temp'";
				System.out.println("running OBJ CREATE query: " + query);
				String objId = DCTMBasics.runCreateObjectQuery(query,session);
				System.out.println("created " + objId);

				// EXEC
				query = "execute db_stats";
				System.out.println("running EXEC query: " + query);
				col = DCTMBasics.runExecQuery(query,session);
				// do something with IDfCollection
				col.close();

				// OBJ UPDATE
				System.out.println("running OBJ UPDATE query");
				query = "update dm_document object set object_name = 'DCTMBasics updated object name' where r_object_id = '" + objId + "'";
				int cnt = DCTMBasics.runUpdateObjectQuery(query,session);
				System.out.println("updated " + cnt + " objs");

				// OBJ DELETE
				System.out.println("running OBJ DELETE query");
				query = "delete dm_document object where r_object_id = '" + objId + "'";
				cnt = DCTMBasics.runDeleteObjectQuery(query,session);
				System.out.println("deleted " + cnt + " objs");

DFC Query Timing, Take 2

A few years back I wrote a post explaining a brute-force method to capture and calculate the execution time of queries run against a Documentum repository.  I have found this technique useful from time to time to prove to users, managers, and other engineers that the latency in a system is not caused by Documentum*.  🙂
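The brute-force technique amounts to nothing more than wrapping the call in a wall-clock timer.  A minimal self-contained sketch, with a Runnable standing in for the actual DFC query call:

```java
// Minimal brute-force timer: run a task and report elapsed wall-clock
// time.  In the original post the task would be the DFC query call;
// here a short sleep stands in so the sketch is self-contained.
public class QueryTimer {
    static long timeMillis(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { }
        });
        System.out.println("duration=" + elapsed + " ms");
    }
}
```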

Krzysztof Jurkowski has taken this technique to a new level with his DQLPerformanceTest tool, a command-line Java app with lots of features.  To try it out, I reran the queries that were the subject of my previous post on query timing to see if those conclusions still held true using Krzysztof’s tool.  Recall the two queries were:

  • select r_object_id from dm_document where DATEFLOOR(day,"r_creation_date") > DATE("01/01/2014")
  • select r_object_id from dm_document where r_creation_date > DATE("01/01/2014")**

where I was interested in how “expensive” the DATEFLOOR function was.  Here are the transcripts of my two tests using the DQLPerformanceTest tool:

C:\>java -jar DQLPerformanceTest.jar -d repo1 -l dmadmin -p dmadmin -w 10 -q "select r_object_id from dm_document where DATEFLOOR(day,""r_creation_date"") > DATE('01/01/2014')" -f 
Do you want to execute following query 10 time(s)? 
select r_object_id from dm_document where DATEFLOOR(day,"r_creation_date") > DATE('01/01/2014') 
Do you want to proceed [y/n]? 
y 
Script started 
Creating performance testing workers... 
Worker-0 loged in as dmadmin on repo1@DCTM7DEV 
Worker-1 loged in as dmadmin on repo1@DCTM7DEV 
Worker-2 loged in as dmadmin on repo1@DCTM7DEV 
Worker-3 loged in as dmadmin on repo1@DCTM7DEV 
Worker-4 loged in as dmadmin on repo1@DCTM7DEV 
Worker-5 loged in as dmadmin on repo1@DCTM7DEV 
Worker-6 loged in as dmadmin on repo1@DCTM7DEV 
Worker-7 loged in as dmadmin on repo1@DCTM7DEV 
Worker-8 loged in as dmadmin on repo1@DCTM7DEV 
Worker-9 loged in as dmadmin on repo1@DCTM7DEV 
Created 10 workers 
Executing workers... 
Waiting for workers to finish... 
Worker-6: 20318 ms 
Worker-1: 20318 ms 
Worker-3: 21977 ms 
Worker-0: 22120 ms 
Worker-8: 23721 ms 
Worker-7: 23725 ms 
Worker-9: 41218 ms 
Worker-5: 41220 ms 
Worker-2: 41307 ms 
Worker-4: 41309 ms 
Workers finished 
Average time: 29,29694 s 
Median time: 23,23700 s 
Min time: 20,20298 s 
Max time: 41,41268 s 
Script finished 

C:\>java -jar DQLPerformanceTest.jar -d repo1 -l dmadmin -p dmadmin -w 10 -q "select r_object_id from dm_document where r_creation_date > DATE('01/01/2014')" -f 
Do you want to execute following query 10 time(s)? 
select r_object_id from dm_document where r_creation_date > DATE('01/01/2014') 
Do you want to proceed [y/n]? 
y 
Script started 
Creating performance testing workers... 
Worker-0 loged in as dmadmin on repo1@DCTM7DEV 
Worker-1 loged in as dmadmin on repo1@DCTM7DEV 
Worker-2 loged in as dmadmin on repo1@DCTM7DEV 
Worker-3 loged in as dmadmin on repo1@DCTM7DEV 
Worker-4 loged in as dmadmin on repo1@DCTM7DEV 
Worker-5 loged in as dmadmin on repo1@DCTM7DEV 
Worker-6 loged in as dmadmin on repo1@DCTM7DEV 
Worker-7 loged in as dmadmin on repo1@DCTM7DEV 
Worker-8 loged in as dmadmin on repo1@DCTM7DEV 
Worker-9 loged in as dmadmin on repo1@DCTM7DEV 
Created 10 workers 
Executing workers... 
Waiting for workers to finish... 
Worker-1: 5584 ms 
Worker-7: 14351 ms 
Worker-3: 14539 ms 
Worker-0: 14609 ms 
Worker-8: 14835 ms 
Worker-6: 15081 ms 
Worker-4: 15417 ms 
Worker-2: 15418 ms 
Worker-5: 15792 ms 
Worker-9: 15808 ms 
Workers finished 
Average time: 14,14129 s 
Median time: 14,14944 s 
Min time: 5,5579 s 
Max time: 15,15793 s 
Script finished

My conclusion is the same:  DATEFLOOR is expensive to use; in this case, roughly twice as expensive.  No surprise, really.  The point is the tool, not the conclusion.

The DQLPerformanceTest tool is nicely put together and a real cut above my brute force method for query timing.  You can access Krzysztof’s tool from the ECN, or on the dm_misc Tools page.

* – It is also a useful technique for tuning DQL queries to make them perform better before you engage with others.
** – I changed the dates from the original queries to adjust for the fact that A LOT of content has been added to the repository since the original post was written, and I didn’t want the queries to run too long.

Timing Queries

Recently some colleagues and I were trying to debug some slow-running DQL.  We decided that it would be really convenient if iDQL32 or one of the various Documentum utilities (RepoInt, Samson, etc.) included a feature that captured timing statistics.  For example, it would be nice to know that the query ran for 5 seconds, it took 1 second to parse the results, and 1 second to transmit the results back to the client, for a total cost of 7 seconds.  Well, as far as I know, no such feature exists, but the same thing can be accomplished using tracing.  For example, using iAPI32, issue the following command before you run your query to turn on tracing with timing statistics:

API> trace,c,10,c:\query.log

Then run your query.

Then issue the following command to turn tracing off:

API> trace,c,0

The query we were interested in tracing used the DATEFLOOR command.  I was curious about how “expensive” the DATEFLOOR command was.  Here are the two queries I tested followed by the results of the trace.

  • select r_object_id from dm_document where DATEFLOOR(day, "r_creation_date") > DATE("08/01/2000");
  • select r_object_id from dm_document where r_creation_date > DATE("08/01/2000");
1.985  0.109 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.get("readquery,c,select r_object_id from dm_document where DATEFLOOR(day, r_creation_date) > DATE('08/01/2000')") ==> "q0" 
2.094  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.get("get,c,q0,_count") ==> "1" 
2.094  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.get("get,c,q0,_names[0]") ==> "r_object_id" 
2.094  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.get("get,c,q0,_types[0]") ==> "2" 
2.094  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.get("get,c,q0,_lengths[0]") ==> "16" 
2.094  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.exec("next,c,q0") ==> true 
..... 
3.891  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.exec("close,c,q0") ==> true 
3.907  0.031 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.get("readquery,c,select r_object_id from dm_document where r_creation_date > DATE('08/01/2000')") ==> "q0" 
3.938  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.get("get,c,q0,_count") ==> "1" 
3.938  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.get("get,c,q0,_names[0]") ==> "r_object_id" 
3.938  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.get("get,c,q0,_types[0]") ==> "2" 
3.938  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.get("get,c,q0,_lengths[0]") ==> "16" 
3.938  0.016 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.exec("next,c,q0") ==> true 
..... 
5.641  0.000 N/A  [main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9fea8a.exec("close,c,q0") ==> true

So, to get the execution time for the first query, subtract 1.985 from 2.094 to get 0.109 seconds.  This gives you the time from the beginning of the readquery API call to the beginning of the first get API call on the returned collection.  If you subtract the time of the first get API call from the time of the close API call, you find that it took 1.797 seconds to parse the results from the collection object and return them to the client.  Similarly, the second query’s execution took 3.938 – 3.907 = 0.031 seconds, and it took 1.703 seconds to parse its results.
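None of these numbers are magic; they are simple subtractions of adjacent trace timestamps.  A small sketch reproducing the arithmetic above (the timestamps are the ones from the trace excerpt):

```java
// Reproduce the trace arithmetic: execution time is the gap between the
// readquery entry and the first get on the collection; parse/transfer
// time is the gap between that first get and the close.
public class TraceMath {
    static double duration(double from, double to) {
        // round to millisecond precision to avoid binary-fraction noise
        return Math.round((to - from) * 1000.0) / 1000.0;
    }

    public static void main(String[] args) {
        // first query (timestamps in seconds)
        System.out.println(duration(1.985, 2.094)); // execution: 0.109
        System.out.println(duration(2.094, 3.891)); // parse + return: 1.797
        // second query
        System.out.println(duration(3.907, 3.938)); // execution: 0.031
        System.out.println(duration(3.938, 5.641)); // parse + return: 1.703
    }
}
```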

The conclusion is that using the DATEFLOOR command is approximately 3 times more ‘expensive’ than not using it, and, as it turns out, it is not really necessary for this query anyway.

Additional Notes:

The granularity of the timer results will depend upon your system hardware.  On Windows-based systems you can usually expect about 10-15ms granularity.
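On the Java side, if that granularity matters, System.nanoTime() is the standard finer-grained alternative to System.currentTimeMillis() for measuring intervals (it is monotonic and intended for elapsed-time measurement, not wall-clock time).  A quick sketch:

```java
// Time a trivial loop with System.nanoTime, which offers much finer
// resolution than currentTimeMillis for short intervals.
public class GranularityDemo {
    // returns {sum, elapsedNanos}
    static long[] timeLoop(int n) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i; // trivial work to time
        }
        long elapsed = System.nanoTime() - start;
        return new long[] { sum, elapsed };
    }

    public static void main(String[] args) {
        long[] r = timeLoop(1000);
        System.out.println("sum=" + r[0] + ", elapsed=" + r[1] + " ns");
    }
}
```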

By the way, you never have to leave iAPI32 to run your queries if you use the “?” operator.  For example:

API> trace,c,10,c:\query.log
API> ?,c,select r_object_id from dm_document where DATEFLOOR(day, r_creation_date) > DATE('08/01/2000');
API> trace,c,0

For a good explanation of trace files, see: http://www.xense.co.uk/resources_dmcltr_overview.htm
