A Language Translation Service for Documentum

I mentioned in June that my article for the EMC Proven Professional Knowledge Sharing Competition was selected for publication and inclusion in this year’s Book of Abstracts. The publication date has finally arrived! My 2015 EMC Proven Professional Knowledge Sharing article, A Language Translation Service for Documentum, is now available on the ECN! Here is the direct link to the PDF. Please read it and let me know what you think.

With the kind consent of EMC, my 2015 Knowledge Sharing article is also available on my Publications page (or here directly) for those of you without an ECN account.

I did a short video interview about this article at EMC World 2015; check it out.

I also did a short demo video if you want to see the translation service in action.

Knowledge Sharing Video

Check out the interview I did for the EMC Knowledge Sharing program explaining my article, A Language Translation Service for Documentum.  The video is linked over at the Armedia site.

UPDATE:  Read the published paper here.

2015 Knowledge Sharing Book of Abstracts

The 2015 Knowledge Sharing Book of Abstracts has been published by EMC. This book contains summaries of all of the 2015 Knowledge Sharing articles chosen for publication this year. The articles are really interesting; download the Book of Abstracts to see which articles interest you and when they will be published. The competition-winning articles can be downloaded immediately. My Knowledge Sharing article, A Language Translation Service for Documentum, is on page 33 and is scheduled to be published as a PDF in September 2015.

Abstract:

In our highly-connected and diverse society, the expectation that online content be multilingual is greater today than ever before. It is expected that licensing agreements, rules and regulations, and disclaimers be available in multiple languages on websites belonging to software vendors, credit card companies, insurers, and other service providers. From a content management perspective, how do you produce, manage, and maintain all of these translations? What if the “source” document changes? How do these changes ripple through to the rest of the translations? What if one of the translations needs to be “tweaked”? How do you keep its versions in sync with the source document?

This Knowledge Sharing article discusses a solution for creating and maintaining multiple translations of content in a Documentum® repository. The solution uses the inherent content management capabilities of the Documentum Content Server to manage content, versions, and relationships among documents, and leverages the Content Server’s infrastructure (specifically Service-based Objects, asynchronous jobs, and external database tables) to integrate with a translation services provider for the production of translations. The translation services provider used for this discussion is Lingotek (www.lingotek.com). Lingotek offers a comprehensive RESTful API that integrates easily with Documentum to provide a seamless solution for the production and management of multilingual content.
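The abstract asks what happens to the translations when the source document changes. One simple way to picture the bookkeeping is to record, for each translation, the source version it was produced from, and treat any translation whose recorded version no longer matches the current source version as stale. The Java below is a minimal sketch of that idea only, under stated assumptions: plain in-memory objects stand in for repository objects, the class, record, and method names are hypothetical, and there are no DFC, SBO, or Lingotek API calls. In the article's design, the Content Server's own versioning and document relationships carry this information. The sketch uses a record, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TranslationSyncSketch {

    // Hypothetical record of a translation and the source version it was produced from
    record Translation(String language, String sourceVersion) {}

    // Hypothetical source document: its current version plus known translations keyed by language
    static final class SourceDocument {
        private String currentVersion;
        private final Map<String, Translation> translations = new TreeMap<>();

        SourceDocument(String initialVersion) {
            this.currentVersion = initialVersion;
        }

        // Records a completed translation against the current source version
        void recordTranslation(String language) {
            translations.put(language, new Translation(language, currentVersion));
        }

        // A new source version (e.g. after a check-in) leaves existing translations behind
        void checkInNewVersion(String newVersion) {
            this.currentVersion = newVersion;
        }

        // Languages whose translation was produced from an older source version
        List<String> staleLanguages() {
            List<String> stale = new ArrayList<>();
            for (Translation t : translations.values()) {
                if (!t.sourceVersion().equals(currentVersion)) {
                    stale.add(t.language());
                }
            }
            return stale;
        }
    }

    public static void main(String[] args) {
        SourceDocument doc = new SourceDocument("1.0");
        doc.recordTranslation("de");
        doc.recordTranslation("fr");
        doc.checkInNewVersion("1.1");
        doc.recordTranslation("fr"); // French re-translated from version 1.1
        System.out.println("Needs re-translation: " + doc.staleLanguages()); // prints "Needs re-translation: [de]"
    }
}
```

In a real deployment, something like the asynchronous job the abstract mentions would be the natural place to run a staleness check like this and queue the out-of-date languages for re-translation.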

UPDATE:  Read the published paper here.

Similarity Index Post at Armedia

FYI and ICYMI – I have a blog post at Armedia recapping my EMC Proven Professional Knowledge Sharing article, Finding Similar Documents without Using a Full Text Index.

2014 Knowledge Sharing Article Published

As I mentioned in June, my 2014 EMC Proven Professional Knowledge Sharing article, Find Similar Documents Without Using A Full Text Index, has been published on the ECN. Here is the direct link to the PDF. Please read it, download the code, give it a try, and let me know what you think.

UPDATE:  With the kind consent of EMC, my 2014 Knowledge Sharing article is now available on my Publications page (or here directly) for those of you without an ECN account.

My cert for being a published author!

My 2014 Knowledge Sharing Abstract

For the past 8 years, EMC has held an annual Knowledge Sharing Competition among its Proven Professionals. This past year, I entered the competition with my article, Finding Similar Documents In Documentum Without Using a Full Text Index. Though I didn’t win, my article was chosen for publication.

Here is the article’s abstract:

This Knowledge Sharing article will discuss how to configure Documentum to enable identification of syntactically similar content without the use of a full text indexing engine. The technique described utilizes a Java Aspect to calculate SimHash values for content objects and stores them in a database view. The database view can then be queried programmatically via an Aspect or by using DQL to identify content similar to a selected object.

Many systems that identify similar content do so by storing a collection of fingerprints (sometimes called a sketch) for each document in a database with other fingerprints. When similar content is requested, these systems apply various algorithms to match the selected content’s fingerprints with those stored in the database. Full text indexing solutions also require databases and index files to store word tokens, stems, synonyms, locations, etc. to facilitate identification of similar content. Some full text search engines can be configured to select the most important words from a document, and build a query using those words to identify similar content in its indexes.

The solution I discuss in the article condenses the salient features of a document into a single, 64-bit hash value that can be attached directly to the content object as metadata, thus eliminating the need for additional databases, indexes, or advanced detection algorithms. Similar content can be detected by simply comparing hash values.
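To make the "single 64-bit hash" idea concrete, here is a minimal Java sketch of the general SimHash technique. It is not the code from the article. It assumes lowercase word tokens as the features, counts each occurrence as one vote, and uses 64-bit FNV-1a as the per-token hash. The class and method names are hypothetical, and the Aspect, DFC, and DQL integration described in the abstract is not shown. Each token's hash votes +1 or -1 on every bit position, and the sign of each total becomes one bit of the fingerprint. Two fingerprints are compared by their Hamming distance, the number of bits in which they differ, so similar content should produce a small distance.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class SimHashSketch {

    // 64-bit FNV-1a hash of a token (an assumed choice; any well-mixed 64-bit hash would do)
    static long fnv1a64(String token) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    // 64-bit SimHash over lowercase word tokens; every occurrence of a token casts one vote
    static long simHash(String text) {
        int[] votes = new int[64];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            long h = fnv1a64(token);
            for (int bit = 0; bit < 64; bit++) {
                votes[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) {
                fingerprint |= 1L << bit; // majority of tokens had this bit set
            }
        }
        return fingerprint;
    }

    // Number of differing bits; a smaller distance means more similar content
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long a = simHash("The quick brown fox jumps over the lazy dog");
        long b = simHash("The quick brown fox jumped over the lazy dog");
        long c = simHash("Documentum repositories store content objects and metadata");
        System.out.println("a vs b: " + hammingDistance(a, b));
        System.out.println("a vs c: " + hammingDistance(a, c));
    }
}
```

Because the fingerprint is just a 64-bit number, it can live on the content object as an ordinary attribute. Finding candidates then reduces to computing Hamming distances against a chosen threshold. The threshold that counts as "similar" is a tuning decision and is not specified here.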

All of the articles selected for publication have been collected into a book of abstracts.  The 2014 book of abstracts can be accessed here (login may be required); mine is on page 41.  My article should be available for download in September 2014.  I will let you know when it is available.
