Wednesday, November 26, 2014

Deploying Nuxeo IDE Customizations

In this article, I have a tip for deploying Nuxeo customizations developed using the Nuxeo Eclipse IDE.

But first, let me mention a few things about how to go about customizing Nuxeo.

There are two main methods for customizing the Nuxeo web application:
  • Nuxeo Studio
  • Nuxeo Eclipse IDE plugin

Nuxeo Studio

Nuxeo Studio is a cloud-based configuration tool.  It's available by subscription, but you don't need it in order to use Nuxeo.  Without Studio you don't miss out on any of the cool end-user Nuxeo product features, but with it you can save yourself a tremendous amount of development and administration time, and configurations made in Studio are guaranteed to be automatically upgradable to future versions of Nuxeo.

Working with Nuxeo without Studio is much harder.  You'll need to hand-code and debug quite a few configuration files, which can mean writing a lot of XML, XHTML and other code: easily hundreds of lines, even for simple configurations.  Creating those files by hand isn't really complex, but it is tedious, and it's easy to introduce syntax errors that can later cost you many hours to track down and fix.

You can do a lot with Studio, and if you're serious about Nuxeo, you really should use it.  We've found, for example, that most projects we work on start out by developing a custom content model and designing associated create, view and edit forms.  With Nuxeo Studio, an analyst could easily build and test the content model and all associated forms without needing assistance from a developer.  Studio also allows you to graphically design workflows, set up automation tasks, and a lot more.

Nuxeo IDE

For many installations, Nuxeo Studio is sufficient for configuring your application, but if you need deeper customizations than Studio allows, you can use Nuxeo's Eclipse plugin.  Nuxeo has great online documentation showing you how to use it.  Unlike Studio, the Nuxeo IDE is a tool that targets Java developers.

Using the Nuxeo Eclipse IDE you can extend and override parts of the Nuxeo application.  From a Nuxeo perspective within Eclipse you can create a Nuxeo project and then add artifacts to it.  You can then deploy your project changes, launch Tomcat and run and debug the Nuxeo application, all within the Eclipse environment.

Deploying the Project Bundle

Now for the tip.  

It's easy to hot reload Nuxeo projects within Eclipse using the Nuxeo IDE plugin.  That feature really speeds up development.  But when deploying your IDE-developed customizations to a new Nuxeo instance, there's an additional deployment file that you need to have in your project.

Nuxeo has an option in the IDE to jar all the files of your project.  To deploy your changes, you just create the jar and then drop it into the nxserver/bundles directory of your new instance and restart.

To create the jar, right-click your project in the Eclipse Nuxeo perspective and select Nuxeo > Export Jar.


That's easy enough.  But there's one more thing you need to do to prepare the jar file for deployment on another server, and if you don't do it, you're likely to run into problems.  This step isn't needed when you're developing and deploying from within Eclipse and is easy to overlook when reading Nuxeo's explanation for how to use the Nuxeo Eclipse IDE.

Every time Tomcat is restarted, the nuxeo.war directory under nxserver is redeployed and expanded.  Because of that, any files you manually add to the nuxeo.war area after a deployment will be lost the next time the war is redeployed.

A feature of the Nuxeo Eclipse IDE is that after the war is expanded, the project's web asset files from the src/main/resources/web directory of your project will be automatically copied into the war area, modifying the standard Nuxeo instance with your customizations.


But if you simply copy your project jar file over to another Nuxeo instance, the web asset files in the jar won't be visible to Tomcat.  Just as the Eclipse hot reload does automatically, the web asset files need to be placed within the expanded war, and that is what the deployment-fragment.xml file does.  This file goes at the top level of your project's src/main/resources/OSGI-INF directory, that is, src/main/resources/OSGI-INF/deployment-fragment.xml.


Here's an example of what you can put in that file:

<?xml version="1.0"?>
<fragment version="1">
  
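  <!-- Register the bundle jar as a Java module of the web application -->
  <extension target="application#MODULE">
    <module>
      <java>${bundle.fileName}</java>
    </module>
  </extension>

  <!-- Process this fragment after all the other fragments -->
  <require>all</require>
  <install>
    <!-- Unpack the bundle into a scratch directory -->
    <delete path="${bundle.fileName}.tmp"/>
    <unzip from="${bundle.fileName}" to="${bundle.fileName}.tmp"/>
    <!-- Merge the bundle's web/nuxeo.war assets into the expanded nuxeo.war -->
    <copy from="${bundle.fileName}.tmp/web/nuxeo.war" to="/"/>
    <!-- Append the bundle's UI labels to the English message bundles -->
    <append from="${bundle.fileName}.tmp/OSGI-INF/I18n/com.formtek.nuxeo.xrefs.messages.properties" to="nuxeo.war/WEB-INF/classes/messages_en_US.properties" addNewLine="true"/>
    <append from="${bundle.fileName}.tmp/OSGI-INF/I18n/com.formtek.nuxeo.xrefs.messages.properties" to="nuxeo.war/WEB-INF/classes/messages_en.properties" addNewLine="true"/>
    <!-- Remove the scratch directory -->
    <delete path="${bundle.fileName}.tmp"/>
  </install>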
</fragment>

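You can see that the install section unjars your bundle into a temporary directory, copies all the assets under your project's web/nuxeo.war directory into the corresponding area of the expanded nuxeo.war, appends the bundle's message properties to the English message files, and then deletes the temporary directory.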

That's it.  With the deployment-fragment.xml file in place, your bundle will be correctly deployed into the target Nuxeo instance when Tomcat starts up.
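Since the fragment file is easy to overlook, it may be worth checking an exported jar before dropping it into nxserver/bundles. Here is a minimal sketch using only the JDK's java.util.jar classes; the BundleJarCheck name is ours, and the two expected paths (OSGI-INF/deployment-fragment.xml and at least one entry under web/nuxeo.war/) are assumptions based on the project layout described in this post. A bundle that has no web assets at all would legitimately fail the second check.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical pre-deployment check for a jar exported from the Nuxeo IDE.
class BundleJarCheck {

    // Returns a list of problems; an empty list means the expected entries are present.
    static List<String> check(String jarPath) throws IOException {
        List<String> problems = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            if (jar.getEntry("OSGI-INF/deployment-fragment.xml") == null) {
                problems.add("missing OSGI-INF/deployment-fragment.xml");
            }
            boolean hasWebAssets = jar.stream()
                    .anyMatch(entry -> entry.getName().startsWith("web/nuxeo.war/"));
            if (!hasWebAssets) {
                problems.add("no entries under web/nuxeo.war/");
            }
        }
        return problems;
    }

    // Usage: java BundleJarCheck path/to/my-bundle.jar
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            List<String> problems = check(path);
            System.out.println(path + (problems.isEmpty() ? ": OK" : ": " + problems));
        }
    }
}
```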

The content of the deployment-fragment.xml file looks something like an Ant build file.  It describes tasks that are run when the bundle is loaded.  Some of the things that you can script in this file include:

    • unzip or unjar files 
    • create folders 
    • move files
    • delete files and folders
    • append files
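To make those tasks concrete, here is a rough emulation in plain Java of what the install section of the example fragment asks for: unzip the bundle to a scratch directory, copy web/nuxeo.war into the server root, append the message file to the two English bundles, and clean up. It is only an illustration under stated assumptions: the class and method names are hypothetical, serverRoot stands in for nxserver, and we guess that addNewLine puts the newline before the appended text. Nuxeo's own deployment code performs the real steps at startup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Rough emulation of the example <install> section; not Nuxeo's deployment code.
class FragmentInstallSketch {

    // <unzip from="jar" to="dir"/>: extract every entry of the bundle jar.
    static void unzip(Path jar, Path dir) throws IOException {
        Path root = dir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(jar))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) throw new IOException("Unsafe entry: " + entry.getName());
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // <copy from="src" to="dst"/>: copy the src directory into dst, merging with existing files.
    static void copyInto(Path src, Path dst) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(src)) {
            paths = walk.collect(Collectors.toList()); // parents come before children
        }
        Path base = dst.resolve(src.getFileName().toString());
        for (Path p : paths) {
            Path target = base.resolve(src.relativize(p).toString());
            if (Files.isDirectory(p)) Files.createDirectories(target);
            else Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // <append from="f" to="t" addNewLine="true"/>; we assume the newline precedes the appended text.
    static void append(Path from, Path to) throws IOException {
        Files.createDirectories(to.getParent());
        Files.write(to, "\n".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        Files.write(to, Files.readAllBytes(from), StandardOpenOption.APPEND);
    }

    // <delete path="p"/>: remove a file or directory tree if present.
    static void delete(Path p) throws IOException {
        if (!Files.exists(p)) return;
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(p)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList()); // children first
        }
        for (Path q : paths) Files.delete(q);
    }

    // Same sequence as the example fragment; serverRoot stands in for nxserver.
    static void install(Path bundleJar, Path serverRoot, String messagesEntry) throws IOException {
        Path tmp = serverRoot.resolve(bundleJar.getFileName() + ".tmp");
        delete(tmp);
        unzip(bundleJar, tmp);
        Path web = tmp.resolve("web/nuxeo.war");
        if (Files.isDirectory(web)) copyInto(web, serverRoot);
        for (String bundle : new String[] {"messages_en_US.properties", "messages_en.properties"}) {
            append(tmp.resolve(messagesEntry), serverRoot.resolve("nuxeo.war/WEB-INF/classes/" + bundle));
        }
        delete(tmp);
    }
}
```

The ordering matters in the same way it does in the fragment: the scratch directory is cleared first so a stale unzip from a previous start can't leak old files into the war.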

[Note that the initial release of Nuxeo 6.0 had a problem handling hot reloads, which is fixed in later releases.  For the 6.0 release, this JIRA explains a workaround.]

Monday, November 17, 2014

Nuxeo 6.0 and Elasticsearch

Apache Lucene and Solr, commercially backed by companies like Lucidworks, have been the dominant open source search options for the last decade.  Solr is now widely used and is tightly integrated with many products, like those from Alfresco and PTC, and it's used by a wide variety of companies and organizations like Comcast, Disney, Goldman Sachs and the FCC.  Here at Formtek, we've integrated it into Formtek Orion software too.

But Solr isn't the only viable open source search option anymore.  For example, it got my attention earlier this year when ECM vendor Nuxeo upgraded the search capabilities of their core product to use Elasticsearch in their 5.9.3 Fast Track release.  That Elasticsearch integration is now officially available in the Nuxeo long term support (LTS) 6.0 release, which came out just last week.

Solr Versus Elasticsearch

What makes Elasticsearch attractive as a technology?

There are actually a lot of similarities between Solr and Elasticsearch.  Both are built on top of Lucene, and both are Java-based, Apache-licensed open source software.  Partly because of that shared Lucene foundation, their feature sets are very comparable.  Both technologies offer:
  • Java API and REST
  • Faceting
  • Highlighting
  • Replication
  • Distribution
But despite the similarities, or maybe because of them, Elasticsearch has seen tremendous growth in mindshare over the last two years.  Google Trends shows that interest in Elasticsearch surpassed interest in Solr in 2014.  So, at this point, while the Solr community is significantly bigger and Solr is more mature, Elasticsearch is growing quickly and is expected to grow even faster, especially now that Elasticsearch has received $70 million in venture funding in June 2014.

Compared to Solr, Elasticsearch is often described as simpler to configure and administer, its use of REST and JSON is seen as more intuitive, and it is built on an architecture that was designed from the ground up for distributed scaling.
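To give a feel for the REST and JSON style, an Elasticsearch full-text search is an HTTP POST of a small JSON document to an index's _search endpoint. The sketch below only builds such a request with the JDK 11 java.net.http client and never sends it; the base URL, the index name (documents) and the field name (title) are placeholders, and the hand-rolled JSON escaping stands in for a real JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) an Elasticsearch full-text search request.
class EsSearchRequest {

    // Minimal JSON string escaping for field names and search text.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // A "match" query: analyzed full-text search on one field.
    static String matchQuery(String field, String text) {
        return "{\"query\":{\"match\":{" + jsonString(field) + ":" + jsonString(text) + "}}}";
    }

    // POST {baseUrl}/{index}/_search with the JSON query as the body.
    static HttpRequest build(String baseUrl, String index, String field, String text) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(matchQuery(field, text)))
                .build();
    }
}
```

For example, matchQuery("title", "pump") produces {"query":{"match":{"title":"pump"}}}, and the same JSON could just as well be sent with curl, which is a large part of the appeal.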

Nuxeo Implementation of Elasticsearch

Some of the benefits that Nuxeo gains from Elasticsearch in its 6.0 release include:

  • Faster full text search
  • Query features like facets, geo location, and "more results like this"
  • Consistency with Nuxeo's NXQL query language
  • Ability to aggregate data for running reports and generating statistics
  • High horizontal scalability by adding Elasticsearch nodes

Eric Barroca, Nuxeo CEO, commented that "with Elasticsearch, we have separated the query engine from the database, which has major implications for architectural flexibility and performance.  Because Elasticsearch scales horizontally, the Nuxeo Platform now has virtually infinite scalability."


Eventual Consistency


I first ran into the problem of 'eventual consistency' when working with Alfresco and Solr implementations, and I like Nuxeo's solution for it in their Elasticsearch implementation.

In short, the problem is that a repository which uses an external search engine often takes time to update the search indexes after any changes are made in the repository.  As a way to make client software seem more responsive, repositories like Alfresco and Nuxeo separate out the process of updating the search index from the database transaction. 

'Eventual consistency' or 'asynchronous indexing' refers to a small gap of time, often just seconds, between when a database operation occurs and when the request to update the search index to reflect the data changes is queued and then finally processed.  Ultimately both the database and search index will be consistent.
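A toy model can make that gap easier to picture. In the hypothetical sketch below, save commits to the "database" immediately but only queues the index update, and processQueue plays the role of the background indexer. It is a simulation of the idea only, not how Alfresco or Nuxeo actually implement indexing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

// Toy model of asynchronous indexing; not Alfresco's or Nuxeo's actual design.
class AsyncIndexModel {
    private final Map<String, String> database = new TreeMap<>(); // doc id -> title, committed data
    private final Map<String, String> index = new TreeMap<>();    // what search queries see
    private final Queue<String[]> pending = new ArrayDeque<>();   // queued index updates

    // The database write commits immediately; the index update is only queued.
    void save(String id, String title) {
        database.put(id, title);
        pending.add(new String[] {id, title});
    }

    // Stands in for the background indexer catching up.
    void processQueue() {
        String[] update;
        while ((update = pending.poll()) != null) {
            index.put(update[0], update[1]);
        }
    }

    List<String> searchIndex(String term) { return matching(index, term); }      // fast, may be stale
    List<String> queryDatabase(String term) { return matching(database, term); } // always current

    private static List<String> matching(Map<String, String> docs, String term) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            if (doc.getValue().contains(term)) ids.add(doc.getKey());
        }
        return ids;
    }
}
```

Right after save, searchIndex misses the new or changed document while queryDatabase already finds it; once processQueue runs, the two agree again, which is the "eventual" in eventual consistency.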

In Alfresco 4.0 you had to choose a search engine: either Lucene or Solr.  Lucene searches were 'in transaction' so that database and search indexes were always consistent, while Solr searches would use 'eventual consistency'.  Depending on your use case, it was possible to choose either the Solr or Lucene implementation, and that one engine would then be used for all queries.  But with Alfresco 5.0 Lucene is no longer available, so 'in transaction' consistency is no longer an option.

For most use cases, eventual consistency doesn't cause a problem.  But it means that if a query were to fire off immediately after a database update, the search results may not be totally consistent with what's actually in the database.

With Nuxeo 6.0 there are two ways to search data in the repository:
  • Elasticsearch index query, and
  • Direct relational or NoSQL database query
Based on your use case, Nuxeo 6.0 lets you decide which of these two types of query to run, and both types can be used at different points in the same client application.  Elasticsearch queries will be fast but rely on 'eventual consistency'.  Queries made directly to the database will likely be slower, but they assure that the results are totally accurate.
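In application code, that choice can come down to one flag per call site. The router below is a hypothetical sketch: the two functions stand in for the Elasticsearch path and the direct database path (in Nuxeo, NXQL run through CoreSession.query goes to the database), and the QueryRouter and Consistency names are our own.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical router: each call site states whether it can tolerate index lag.
class QueryRouter {
    enum Consistency { EVENTUAL, STRICT }

    private final Function<String, List<String>> indexQuery;    // stands in for the Elasticsearch path
    private final Function<String, List<String>> databaseQuery; // stands in for the database path

    QueryRouter(Function<String, List<String>> indexQuery,
                Function<String, List<String>> databaseQuery) {
        this.indexQuery = indexQuery;
        this.databaseQuery = databaseQuery;
    }

    List<String> query(String nxql, Consistency consistency) {
        return consistency == Consistency.STRICT
                ? databaseQuery.apply(nxql)   // slower, but reflects the latest commits
                : indexQuery.apply(nxql);     // faster, may trail recent writes
    }
}
```

A search box could pass EVENTUAL, while code that has just created a document and immediately lists its parent folder would pass STRICT so that the new document is guaranteed to appear.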


Wednesday, November 12, 2014

Nuxeo Platform 6.0 is Released

Version 6.0 of the Nuxeo platform was officially released today.

Nuxeo has been pretty busy over the last year and they've added some innovative features to their enterprise content management (ECM) platform that really set them apart from other ECM vendors.

While a number of the big features in the new Nuxeo release have been available via 'Fast Track' preview releases made periodically since last December, those features will now all officially roll up and become part of the fully-supported Nuxeo product feature set going forward.

Some of the major highlights of the Nuxeo platform 6.0 release include:

Elasticsearch - extremely scalable and distributed search engine.  Enables hierarchical faceted search.

Collections - a lightweight folder-like object for grouping documents.  Bulk operations like export and download can then be applied to the collection.

MongoDB - optional NoSQL-backend storage offering high flexibility, easy sharding and replication

Mule Connector - enables Nuxeo Automation operations to be inserted inside a Mule Flow, allowing easy integration with other software platforms like Salesforce, Marketo, SAP, and Magento

User Interface Enhancements - including a spreadsheet editor and lightbox support.

CMIS - supports the CMIS 1.1 specification, including the new JSON Browser binding

Mobile APIs - includes native client SDKs for iOS and Android, including offline sync

JavaScript API - includes two implementations, one for Node.js and another for jQuery

SAML2 and OAuth 2.0 - enables secure authentication for client applications

AES Encryption - encrypts content with an AES algorithm before it is written to the store

A complete list that documents the changes and new features of the Nuxeo platform 6.0 release can be found in the product release notes here.

Nuxeo's Josh Fletcher will also give an overview of the Nuxeo 6.0 release in a webinar next week, on November 18th.

You can also test drive the latest release here (login: Administrator/Administrator).

The next step in Nuxeo's open product roadmap is just two months away with the 7.1 Fast Track release planned for mid-January 2015.

Monday, November 10, 2014

CMIS Document Migration with Apache Chemistry and Camel


The Headache of Data Migration 

Migrating data between different content repositories can be difficult.  The primary goal of a migration project is to move the stored files, associated metadata and filing hierarchy from one system into another with as little loss as possible.

Migrations typically require that an analyst first create a detailed map for how document types and properties will be transferred between the two systems, and then a developer implements that strategy by writing a migration script.  The actual migration process can be tedious and involve a sequence of imports and exports and things like parallel intermediate files or databases which hold normalized property data.

Something Easier: The Apache Camel camel-cmis Component

Recently, while looking at how to migrate content stored in an Alfresco repository into a Nuxeo repository, I came across a blog article by Bilgin Ibryam about the Apache Camel connector for CMIS, a component he contributed to the Camel project.  I was impressed that, in just two lines of Java code, he could define a program that moves all the data from an Alfresco repository into Nuxeo, recursively iterating through the folder hierarchy from the repository root node and preserving that hierarchy in the move.

While an indiscriminate migration of all content from one repository into another wasn't exactly what I was looking for, I did find that the camel-cmis component was a good starting point for creating a simple migration tool that could move content easily between CMIS-compliant repositories.

Besides the repo-to-repo copy, the camel-cmis component also has the ability to identify groups of documents by using a CMIS query and can then pipe the document data from the result set into the next processing step of a Camel route.
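For example, a query that selects the documents filed directly in one folder can use the CMIS IN_FOLDER predicate. The hypothetical helper below builds such a statement and escapes the folder id, since CMIS QL string literals escape single quotes and backslashes with a backslash. The resulting string is what would be passed as the camel-cmis consumer's query option.

```java
// Hypothetical helper for building CMIS QL statements for a camel-cmis consumer.
class CmisQueries {

    // CMIS QL string literals escape ' and \ with a backslash.
    static String literal(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Objects of typeId filed directly in the folder; IN_TREE would also include subfolders.
    // typeId is inserted as-is, so it must come from trusted code, not user input.
    static String inFolder(String typeId, String folderId) {
        return "SELECT * FROM " + typeId + " WHERE IN_FOLDER(" + literal(folderId) + ")";
    }
}
```

For instance, inFolder("cmis:document", "744385f3-27fd-4096-a29a-e6108d35cfa0") yields SELECT * FROM cmis:document WHERE IN_FOLDER('744385f3-27fd-4096-a29a-e6108d35cfa0').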

Migrating Engineering Documents from Alfresco to Nuxeo

My goal was to migrate into Nuxeo a set of engineering documents stored in Alfresco and defined by a content model and a document type based on Alfresco aspects.

To do that, I tweaked the camel-cmis component to accept source and target folders, rather than migrate all documents from the repository starting at the repository root.

I also modified the camel-cmis component to accept custom metadata properties, and by using CMIS 1.1 secondary types, Alfresco aspect data can be handled as well.  Both Nuxeo and Alfresco support CMIS 1.1.

And finally, I created a simple Camel Message Translator (Java bean) that maps the names of the document types and properties extracted from Alfresco to the names in the content model that are used by Nuxeo.  In this case, the property name translations were defined in a simple key-value property file which, when applied, maps the extracted property names before passing them into Nuxeo.
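The translator bean itself can be quite small. The sketch below assumes the mapping file is a standard Java .properties file and that the extracted properties arrive as a Map; how the modified camel-cmis component actually hands properties to the bean isn't described here, so that wiring is left out. The class name and the example eng:/drw: names are made up. Unmapped names pass through unchanged, which is one design choice; dropping them is another.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical message translator: renames Alfresco CMIS type and property
// names to Nuxeo ones using a key=value mapping file.
class PropertyNameTranslator {
    private final Properties mapping = new Properties();

    // ':' separates keys from values in .properties files, so a key such as
    // eng:drawingNumber must be written as eng\:drawingNumber.
    PropertyNameTranslator(Reader mappingFile) throws IOException {
        mapping.load(mappingFile);
    }

    // Unmapped names pass through unchanged.
    String translateName(String alfrescoName) {
        return mapping.getProperty(alfrescoName, alfrescoName);
    }

    Map<String, Object> translate(Map<String, Object> alfrescoProps) {
        Map<String, Object> nuxeoProps = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : alfrescoProps.entrySet()) {
            Object value = e.getValue();
            // The document type travels as the value of cmis:objectTypeId, so map it too.
            if ("cmis:objectTypeId".equals(e.getKey()) && value instanceof String) {
                value = translateName((String) value);
            }
            nuxeoProps.put(translateName(e.getKey()), value);
        }
        return nuxeoProps;
    }
}
```

With a mapping line like D\:eng\:drawing=Drawing, an Alfresco document of type D:eng:drawing would arrive in Nuxeo as type Drawing.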



With that in place, it's possible to write a simple Camel route (in the Java DSL, inside a RouteBuilder's configure() method) that migrates the data under an Alfresco folder to a Nuxeo folder:
    
// folderId is accepted by the modified camel-cmis component described above, not the stock one
from("cmis://http://54.198.64.173/alfresco/api/-default-/public/cmis/versions/1.1/atom?username=admin&password=admin&folderId=744385f3-27fd-4096-a29a-e6108d35cfa0")
    .to("bean:translate")   // map Alfresco type/property names to the Nuxeo content model
    .to("cmis://http://localhost:8080/nuxeo/atom/cmis?username=Administrator&password=Administrator&folderId=66d138e4-b0e6-41ee-91c2-aa6fc5991c5e");

This Camel route recursively copies the contents of a specified Alfresco folder and its children to a folder in the Nuxeo repository, maintaining the folder hierarchy.  The following screenshots show how documents and folder structure were moved from an Alfresco Share folder into Nuxeo.



Documents in Alfresco Share

Documents Migrated to Nuxeo

You can see that the documents moved from Alfresco were all engineering AutoCAD DWG files.  The files, custom metadata, and foldering hierarchy were copied into Nuxeo.  Then within Nuxeo we can see the migrated documents.  Also, through a configuration of Nuxeo, we are able to display the engineering metadata and render the AutoCAD file content as both thumbnails and preview images.

Using CMIS tools, and software plug-ins for engineering data management and AutoCAD document management, Formtek can assist organizations with ECM migration to the Nuxeo platform.

Footnotes on CMIS and Camel

The use of CMIS makes it easy to interact with compliant content repositories in a standard way, and it enables easy sharing of content between repositories from different vendors.  CMIS defines protocol bindings for both REST (AtomPub) and SOAP (Web Services), and CMIS 1.1 adds a JSON-based Browser binding.

The Apache Chemistry project provides open source implementations of the CMIS standard.  Both the Alfresco and Nuxeo implementations of CMIS are based on the Chemistry libraries.  Chemistry's CMIS server libraries are available only for Java.  CMIS client libraries exist for Java, Python, PHP, .NET and Objective-C, but the Java libraries are the most complete and best tested.

Apache Camel is an open source framework for implementing Enterprise Integration Patterns (EIP).  It lets you use messaging and transport models like HTTP, ActiveMQ, JMS, JBI, SCA, and CXF to grab data, transform it, and move it to different endpoints.