Wednesday, November 21, 2012

Synchronization of File Properties with Alfresco Metadata Properties

At Formtek we recently had a request to customize Alfresco Share for synchronizing document header properties with corresponding metadata for documents stored in Share.

I'm not able to share the code from the project, but I thought that outlining its basic concept would serve as an example of the kinds of things that can be implemented within Share.

There were a number of requirements for this project, but two of the core ones were:

  1. On uploads and metadata updates, synchronize the properties of Microsoft Office files (Word/Excel/PowerPoint, all versions) and PDF files with the corresponding metadata for the document in Alfresco.
  2. Provide a method for 'publishing' a synchronized document into another location as a PDF.  The file header of the published document should carry the values of the Alfresco metadata at the time the document was published.

When I talk about file content header properties, I'm referring to the types of properties that can be set in the header of Microsoft Office and PDF files.  For example, the next figure is a screenshot from Microsoft Word 2010 for setting the standard properties (Title, Author, Keywords, and Subject) and custom properties.


Properties and custom properties can be similarly defined in PDF files.

Property Extraction on File Upload
When one of these files with properties/custom properties is uploaded into Alfresco, the document that is created captures this additional data based on a mapping properties file that is configured.
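
As a rough illustration of the idea (this is not the project's code), the mapping can be thought of as a lookup from header property names to Alfresco property names. The sketch below assumes the header properties have already been read into a dictionary; the mapping entries, the `acme:projectCode` property, and the `extract_metadata` function are all hypothetical names chosen for the example.

```python
# Hypothetical mapping: file header property name -> Alfresco property name.
# cm:title, cm:author and cm:description come from Alfresco's standard content
# model; acme:projectCode is a made-up custom property for illustration.
HEADER_TO_ALFRESCO = {
    "Title": "cm:title",
    "Author": "cm:author",
    "Subject": "cm:description",
    "ProjectCode": "acme:projectCode",
}


def extract_metadata(header_props, mapping=HEADER_TO_ALFRESCO):
    """Return Alfresco-style metadata for each mapped header property present.

    Header properties without a mapping entry are ignored.
    """
    return {
        mapping[name]: value
        for name, value in header_props.items()
        if name in mapping
    }
```

Calling `extract_metadata` with the properties read from an uploaded file would give the values to store on the new document; anything not listed in the mapping is simply left alone.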

Within Share, the metadata would get mapped to something similar to the following panel view of the property data in the Share document detail window.


In this case, the mapping file that specifies how properties in the content file map to the Alfresco metadata properties is as follows:


Property Updates on Alfresco Metadata Edits
This mapping of properties on upload resembles standard Alfresco property extraction, or rather a specialized version of it that also knows how to map custom property values.  What is different is that two-way synchronization with the properties in the file also occurs.  Note that the mapping also correctly handles datatypes such as boolean, text, number, and date.
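
To make the datatype handling concrete, here is a minimal sketch of converting raw property values into typed ones. It assumes the values arrive as strings and that dates use ISO format (YYYY-MM-DD); the datatype names, the `CONVERTERS` table, and the `convert` function are assumptions for the example, not the actual implementation.

```python
from datetime import date


def to_boolean(text):
    """Parse a few common spellings of true/false; reject anything else."""
    normalized = text.strip().lower()
    if normalized in ("true", "yes", "1"):
        return True
    if normalized in ("false", "no", "0"):
        return False
    raise ValueError(f"not a boolean value: {text!r}")


# Hypothetical datatype names, each paired with a function that parses a string.
CONVERTERS = {
    "text": str,
    "boolean": to_boolean,
    "number": float,
    "date": date.fromisoformat,  # assumes ISO dates such as 2012-11-21
}


def convert(raw_value, datatype):
    """Convert a raw string into the Python type for the given datatype name."""
    try:
        converter = CONVERTERS[datatype]
    except KeyError:
        raise ValueError(f"unsupported datatype: {datatype!r}") from None
    return converter(raw_value)
```

Keeping the converters in one table means that supporting another datatype only requires adding an entry, without touching the mapping logic.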

The property mapping is bi-directional so that when properties are updated in Alfresco, the electronic file associated with the document will be rewritten.  That means that the next time the document is downloaded, the properties in the file will be consistent with the corresponding properties in Alfresco.
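
The reverse direction can be sketched by inverting the same kind of mapping: given the Alfresco properties that changed, work out which header properties need to be rewritten in the file. The mapping entries and the function names below are hypothetical, and the sketch stops short of actually rewriting an Office or PDF file.

```python
# Hypothetical mapping from file header property names to Alfresco property names.
HEADER_TO_ALFRESCO = {
    "Title": "cm:title",
    "Author": "cm:author",
    "ProjectCode": "acme:projectCode",  # made-up custom property
}


def invert(mapping):
    """Build the Alfresco -> header direction, rejecting ambiguous mappings."""
    inverted = {}
    for header_name, alfresco_name in mapping.items():
        if alfresco_name in inverted:
            raise ValueError(
                f"{alfresco_name} is mapped from more than one header property"
            )
        inverted[alfresco_name] = header_name
    return inverted


def header_updates(changed_metadata, mapping=HEADER_TO_ALFRESCO):
    """Return the header properties to rewrite for the changed Alfresco properties.

    Alfresco properties with no mapping entry are ignored.
    """
    alfresco_to_header = invert(mapping)
    return {
        alfresco_to_header[name]: value
        for name, value in changed_metadata.items()
        if name in alfresco_to_header
    }
```

Rejecting a mapping where two header properties point at the same Alfresco property keeps the write-back unambiguous; whether the real project enforces this is not something the post states.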

Publishing to PDF
When a synchronized file is rendered as a PDF file and 'published', the user can select a target folder in the current Share site or in a different one.  For our customization, we actually call the 'publish' action 'transfer' to avoid confusion with the 'Publish' action already available in Share.

The user clicks on the 'Transfer to...' action for the document to start the process.


After that, the user selects the target location of the published PDF document using a re-engineered version of the Copy/Move to dialog from Share:


The rendered PDF file is then available as a new document in the target location.


When we download the file associated with this document and open it in Adobe Reader, we can examine the settings of the file properties.

The standard properties in the newly created PDF file are shown as:


And custom properties are seen here:


Tracking Published Documents

Within the original document, we also keep track of when the document has been published.  A panel in the document details page in Share for the original document now shows how many times the document has been published/transferred and to where.
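
One simple way to picture this tracking is as a list of transfer records kept alongside the original document. The sketch below is only an illustration under that assumption; the `TransferRecord` and `TransferHistory` classes and the example paths are hypothetical and say nothing about how the data is actually stored in Alfresco.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class TransferRecord:
    """One publish/transfer event: where the PDF went and when."""
    target_path: str
    transferred_at: datetime


@dataclass
class TransferHistory:
    """Transfer events recorded against a single original document."""
    records: list = field(default_factory=list)

    def record(self, target_path, transferred_at):
        """Append a new transfer event to the history."""
        self.records.append(TransferRecord(target_path, transferred_at))

    @property
    def count(self):
        """Number of times the document has been transferred."""
        return len(self.records)

    def targets(self):
        """Target locations in the order the transfers happened."""
        return [r.target_path for r in self.records]


history = TransferHistory()
history.record("/Sites/engineering/Published", datetime(2012, 11, 20, 9, 30))
history.record("/Sites/sales/Handouts", datetime(2012, 11, 21, 14, 0))
```

After the two calls above, `history.count` is 2 and `history.targets()` lists both folders, which is the kind of summary the panel displays.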

