Captiva Patch List

I was trolling through the EMC Support site and found this: List of Released Patches for Captiva InputAccel, Dispatcher, InputAccel for Invoices, eInput and FormWare (published October 31, 2013).

This document covers Captiva 6.5SP1, 6.5SP2, and 7.0. Interestingly, there is no mention of 6.5SP3 (perhaps it has no patches?), and no indication as to whether the service packs contain a roll-up of all previous patches. Also, the only way to get the patched software is to contact EMC Support directly; they don’t offer a publicly available download site for Captiva patches.

HTA Monitor for InputAccel and Documentum

I recently did some work for a client who had a fairly large Documentum and Captiva installation which spanned numerous servers and environments (16 servers x 3 environments). One of the challenges they faced when troubleshooting a problem was simply determining if all of the required processes (services) on each server were running. They were not using any network monitoring tools like Hyperic, or Nagios, or Reveille.

My solution was to put together a quick batch file that used some standard DOS commands to ping each host, and where possible, solicit some information about the processes it was running. This solution was OK, but required the use of a super user’s password in the clear, and didn’t really provide the information that was needed.
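For illustration, the "ping each host" half of that first attempt might look something like the sketch below. It is written in Python rather than as the original batch file, it is not the script described above, and the host name in the usage line is hypothetical. It only checks that a host answers a ping; it says nothing about which processes are running there.

```python
import platform
import subprocess


def ping_command(host, system=None):
    """Build a one-packet ping command for the given OS name."""
    system = system or platform.system()
    # Windows ping takes -n for the packet count; Unix-like pings take -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def is_reachable(host, timeout=5):
    """Return True if a single ping exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the ping executable could not be started.
        return False
    return result.returncode == 0
```

A call such as `is_reachable("ia-server-01")` would then give a quick yes/no for one server.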

My next attempt at this solution had me writing a VBScript using WMI Scripting API objects to solicit information from the servers. This worked pretty well but suffered from two major drawbacks: 1) the super user’s password was still used in the clear; 2) the server list was so long, the results scrolled off the screen.

My latest approach to this solution is to use HTA, an HTML Application. An HTA is an HTML file run by a Microsoft host program (mshta.exe); it uses HTML for the UI and script (VBScript in my case) for program logic. HTA programs look and act like web pages, but they execute locally without the constraints of the IE security model. In fact, HTAs run as fully trusted applications, just like Microsoft Word. Obviously, this can be both good and bad. Fortunately for me, this was a good thing: it allowed me to quickly port my VBScript to a framework that provided a simple and familiar UI construct, and didn’t make me jump through unnecessary hoops to execute administrative functions on the desktop.

Another plus for HTA is that the code is not compiled and can be easily viewed and modified in any text editor. This makes it easy to change the list of hosts and processes monitored, which makes the application portable to other environments.

Here is a screen shot of the IACheck HTA application.

Notice that the user’s password is protected from view, and the process can be configured to loop, in this case, every 10 minutes. The looping feature allows the application to be launched on the console or admin’s desktop and continuously monitor an environment.
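The core idea (an editable list of hosts and required processes, checked on a loop) can be sketched in a few lines. The version below is Python, not the VBScript in the HTA, and it is only an outline of the logic: the host and process names are placeholders, and `get_running` is a hypothetical callback standing in for whatever actually queries a server (WMI in the original).

```python
import time

# Placeholder hosts and process names; substitute your own.
REQUIRED = {
    "example-ia-host": ["example-ia-service"],
    "example-dctm-host": ["example-docbroker", "example-docbase"],
}


def find_missing(required, get_running):
    """Map each host to the required processes that are not running.

    get_running(host) must return an iterable of running process names,
    or raise OSError if the host cannot be queried.
    """
    report = {}
    for host, wanted in required.items():
        try:
            running = {name.lower() for name in get_running(host)}
        except OSError:
            # Host could not be queried: report everything as missing.
            report[host] = list(wanted)
            continue
        missing = [name for name in wanted if name.lower() not in running]
        if missing:
            report[host] = missing
    return report


def monitor(required, get_running, interval_seconds=600, cycles=None):
    """Check repeatedly; cycles=None loops until interrupted (Ctrl+C)."""
    done = 0
    report = {}
    while cycles is None or done < cycles:
        report = find_missing(required, get_running)
        for host in required:
            status = "MISSING: " + ", ".join(report[host]) if host in report else "OK"
            print(f"{host}: {status}")
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval_seconds)
    return report
```

The default interval of 600 seconds matches the 10-minute loop mentioned above. Passing the query function in as an argument keeps the comparison logic independent of how the servers are actually reached.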

Though originally created to monitor InputAccel and Documentum processes, the application can be easily changed to meet your needs.  If you are interested, the source code for the application can be downloaded here. Simply change the names of the hosts and processes you would like to monitor and double-click the file to run it. I have included a list of Documentum and Captiva processes in the source code so you don’t have to look them up.

UPDATE:  Newer version of the monitor here.

Leveraging InputAccel for OCR

A few months back I chronicled my first experience with Captiva InputAccel development. This week, I’d like to supplement that with another experience I recently had. I have a standalone Java/DFC application that extracts TIF files from a Docbase, merges them into a PDF file using iText, and creates a simple HTML index of all the PDF files created. Recently, the customer asked me to create OCRed PDFs so they could search for words and phrases in each file. I briefly surveyed Google for open source (and proprietary) Java libraries that would OCR TIF or PDF files and wasn’t happy with what I found. Then it occurred to me: Captiva’s NuanceOCR module did exactly what I wanted, and my customer already owned it. It was just a matter of leveraging that existing capability from my Java application.

Long story short: I set up MultiDirectoryWatch (MDW) to watch a folder where my Java application copied PDF files after they were merged. MDW kicked off a batch in InputAccel which performed the OCR and deposited the OCRed file in an output directory watched by my Java application. When the output file arrived, the Java application copied it back to where it belonged.  Simple.
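The client side of that hand-off is just "drop a file, wait for its twin, copy it back". A rough sketch of that loop follows, in Python rather than the Java the application actually used. It assumes the OCRed file comes back under the same file name, that the watcher ignores files ending in `.part`, and that an output file whose size has stopped changing is complete; the function name, parameters, and timeouts are all made up for illustration.

```python
import shutil
import time
from pathlib import Path


def ocr_via_watched_folder(pdf, watch_dir, output_dir, timeout=300.0, poll=2.0):
    """Drop pdf into watch_dir, wait for the same-named file in output_dir,
    then copy the result over the original. Returns the original path."""
    pdf = Path(pdf)
    result = Path(output_dir) / pdf.name

    # Copy under a temporary name, then rename, so the watcher is less
    # likely to pick up a half-written file (assumes it skips *.part).
    staged = Path(watch_dir) / (pdf.name + ".part")
    shutil.copyfile(pdf, staged)
    staged.replace(Path(watch_dir) / pdf.name)

    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        if result.exists():
            size = result.stat().st_size
            # Treat the output as complete once its size is non-zero and
            # unchanged between two consecutive polls.
            if size > 0 and size == last_size:
                shutil.copyfile(result, pdf)
                result.unlink()
                return pdf
            last_size = size
        time.sleep(poll)
    raise TimeoutError(f"no OCR output for {pdf.name} after {timeout} s")
```

Polling is crude but has no dependencies and works across network shares, which is usually where watched folders like these live.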

The interesting part of this process, and why I thought it was blog-worthy, has to do with the short InputAccel process used to do the OCR. I had to include two modules and process steps that I found to be unintuitive. Here are the details of the process:

  • MultiDirectoryWatch
    • Level 0
    • Multi:0.Ready = 8
  • Multi
    • Level 1
    • ImageDivider:1.InputFile = MultiDirectoryWatch:1.OutputImage
    • ImageDivider:1.Ready = 1
  • ImageDivider
    • Level 1
    • NuanceOCR:0.Level0_InputImage = ImageDivider:0.OutputFile
  • NuanceOCR
    • Level 1
    • Format 1 = Adobe PDF with image on text
    • Save file to file system = true
    • File = @(MultiDirectoryWatch.OriginalFileName)
    • Overwrite file if it exists

The parts I found to be unintuitive were the Multi and the ImageDivider steps. It turns out that NuanceOCR (like a lot of other InputAccel modules) only processes one page at a time. So, when I had MDW pass it a PDF composed of numerous pages of TIFs, it only processed the first page. OK, so using ImageDivider became more obvious after that revelation. But Multi? It turns out that Multi is a utility module that is generally used to restructure the internal InputAccel tree (e.g., create folders/documents/pages, delete folders/documents/pages, etc.). It is required for ImageDivider to do its thing.

So, if you ever need a fairly easy and painless way to quickly OCR files, a short InputAccel process like this one may be your answer. The trick is to use Multi and ImageDivider to prepare each page for the OCR module.

Question: Is there a way to access NuanceOCR directly and programmatically (i.e., via an API) without having to create an IA process?

Captiva InputAccel v6.5 Tutorial

Lately, I haven’t been posting as regularly as I usually do, but I have a good reason. I have been spending all of my spare time and creative energy on completing a tutorial that chronicles the how-tos and lessons learned from my first InputAccel project. You can find this tutorial on my Publications page or here directly.

When I started this InputAccel project, I had no experience with the product and was dismayed that there seemed to be scant information freely available on the Internet to help me.  Upon completion of this project, I decided to gather my hard-learned lessons and bundle them into a tutorial for anyone in the same situation I was in.  The tutorial walks through the development, configuration, deployment, and testing and debugging of a simple capture process that utilizes the scanning, indexing, OCR, ODBC export and Documentum export modules. I hope you enjoy it and find it useful.

In the next few weeks I will be attending the EMC TechSet 2012 and learning about D7 and D2.  I expect to have plenty to blog about then, and hope to return to a more regular posting schedule.
