Channel: SCN : All Content - Process Integration (PI) & SOA Middleware

ExcelTransformBean Part 1: Convert various Excel formats to simple XML easily

Update 3 Nov 2014: Refactoring of source code to cater for SimpleExcel2XML and SimpleXML2Excel conversions.

 

Introduction

There are already a handful of blogs on SCN dealing with Excel to XML conversion, so why another one?

 

Here is my wish list for a comprehensive solution:

  • Able to read both Excel formats (XLS and XLSX)
  • Behaves in a similar way to MessageTransformBean
  • Highly configurable - develop/deploy once, use multiple times
  • Able to handle XML special characters

 

The reference section below lists some of the more popular approaches. However, they come with limitations and drawbacks, including (non-exhaustively):

  • XLSX files store string contents in a separate sharedStrings.xml file inside the zipped XLSX file, so the sheet XML alone does not contain the cell text
  • JExcel API does not support Excel 2007 XLSX formats
  • Limitations in handling formulas, formatting and special characters in cells
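
To make the first point concrete, here is a small sketch (plain JDK, no POI) that lists the parts inside a ZIP container. The archive it builds is a hypothetical two-entry stand-in and not a valid workbook; the class and method names are mine and are not part of ExcelTransformBean.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Illustration only: shows that cell text and sheet structure live in separate ZIP parts. */
final class XlsxParts {

    /** Returns the names of all entries found in the given ZIP bytes. */
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /** Builds a tiny stand-in archive with two parts. This is NOT a complete XLSX file. */
    static byte[] sampleArchive() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            // The sheet part only holds an index (0) into the shared strings table.
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zip.write("<c t=\"s\"><v>0</v></c>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            // The actual text sits in a different part.
            zip.putNextEntry(new ZipEntry("xl/sharedStrings.xml"));
            zip.write("<sst><si><t>Order</t></si></sst>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(entryNames(sampleArchive()));
    }
}
```

Any approach that parses the sheet part directly has to resolve those indexes against sharedStrings.xml itself, which is the extra work a library such as POI takes care of.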

 

ExcelTransformBean is an attempt to provide a generic adapter module solution (a la MessageTransformBean) that is highly configurable and reusable. It is based on the Apache POI API. By using the API's combined SS (spreadsheet) usermodel interface, a single piece of logic reads both XLS and XLSX files.
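
As a rough illustration of why one entry point can serve both formats, the two file types can be told apart from their first bytes: binary XLS is an OLE2 compound document, while XLSX is a ZIP archive. The sketch below uses only the JDK and is not taken from the module; in POI itself, WorkbookFactory.create(InputStream) returns a Workbook for either format, though whether the module uses exactly that call is an assumption on my part. The class name is hypothetical.

```java
import java.util.Arrays;

/** Illustration only: guesses the container type from the leading signature bytes. */
final class ExcelFormatSniffer {

    enum Format { XLS, XLSX, UNKNOWN }

    // Signature of an OLE2 compound document (used by binary XLS, but also by old DOC/PPT).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header signature of a ZIP archive, "PK" 03 04 (used by XLSX, but also any ZIP).
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies the content by signature only; it does not validate the workbook. */
    static Format detect(byte[] content) {
        if (startsWith(content, OLE2)) {
            return Format.XLS;
        }
        if (startsWith(content, ZIP)) {
            return Format.XLSX;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[] {0x50, 0x4B, 0x03, 0x04}));
    }
}
```

A signature check like this only identifies the container, so a real reader still needs a proper parser behind it.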

 

This first part covers Excel to simple XML conversion, while the second part covers simple XML to Excel conversion.

 

Source Code

The full source code can be found in the following public repository on GitHub.

GitHub repository for ExcelTransformBean

 

This module is based on the Apache POI 3.9 library. For the Java project to compile and build successfully, the following JAR files need to be referenced/imported into the project.

  • poi-3.9-20121203.jar
  • poi-ooxml-3.9-20121203.jar
  • poi-ooxml-schemas-3.9-20121203.jar
  • xmlbeans-2.3.0.jar
  • dom4j-1.6.1.jar

 

The library files can be downloaded from Apache's website. A direct link to the ZIP file is provided below.

Apache POI 3.9 ZIP file

 

Update: The source code has been refactored. Always get the latest version from the GitHub repository.

 

Module Parameter Reference

Below is a list of the parameters for configuring the module for Excel to XML conversion (conversionType = 'SimpleExcel2XML'). Certain parameters automatically take their default values if they are not configured.

 

Parameter Name | Allowed values | Default value | Remarks
conversionType | SimpleExcel2XML | - | Required field. Determines conversion type.
sheetName | - | - | The name of the active Excel sheet to extract. Either sheetName or sheetIndex must be populated.
sheetIndex | Integer values beginning from 0 | - | The index of the active Excel sheet to extract (starts from 0). Either sheetName or sheetIndex must be populated.
skipEmptyRows | YES, NO | YES | Determines whether empty rows are skipped.
rowOffset | Integer values beginning from 1 | 0 | Starting row to begin extracting content from (e.g. 0 = start from first row, 1 = start from second row). If processFieldNames = 'fromFile' and rowOffset = 0, the first line is always skipped.
processFieldNames | fromFile, fromConfiguration, notAvailable | - | Required field. Determines the naming of each column of the rows, and the number of columns to extract:
  • fromFile = Column names and number of columns are determined from the header line of the sheet
  • fromConfiguration = Column names and number of columns are determined from parameter fieldNames
  • notAvailable = Column names will be set as ColumnX, where X = 1, 2, 3, 4, and so on. Number of columns will be determined from parameter columnCount
fieldNames | - | - | Names of the columns. Required field when processFieldNames = 'fromConfiguration'.
columnCount | Integer values beginning from 1 | - | Number of columns for extraction. Required field when processFieldNames = 'notAvailable'.
recordName | - | Record | XML element name for each row (record) in the output.
documentName | - | - | Required field. Document name of the root element of the XML output.
documentNamespace | - | - | Required field. Namespace of the root element of the XML output.
formatting | excel, raw | excel | Controls how the cell contents are formatted in the XML output:
  • excel = Cells are displayed the same way as the Excel formatting of the corresponding cell
  • raw = Raw values of cells are displayed
evaluateFormulas | YES, NO | YES | Controls how cell contents with formulas are displayed in the XML output:
  • YES = Cells are displayed with the result of formula evaluation
  • NO = Cells are displayed with the actual formula
emptyCellOutput | suppress, defaultValue | suppress | Controls how empty cells are displayed in the XML output:
  • suppress = Empty cells are not displayed (no corresponding XML tags for empty cells)
  • defaultValue = Empty cells will be displayed with a default value
emptyCellDefaultValue | - | <blank> | If emptyCellOutput = 'defaultValue', all empty cells will be populated with the value in this parameter.
indentXML | YES, NO | NO | Determines whether the XML output is indented.
debug | YES, NO | NO | Displays the contents of each extracted cell in the Audit Log. WARNING: Use this only for debugging in non-productive systems.
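
The three processFieldNames modes can be pictured with a small sketch. This is my own simplified reading of the table and not the module's source; the class and method names are hypothetical, and the comma as separator for fieldNames is inferred from the value used in Scenario 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: derives the column names for each processFieldNames mode. */
final class FieldNameResolver {

    /**
     * @param processFieldNames fromFile, fromConfiguration or notAvailable
     * @param headerRow         cell texts of the header line (used by fromFile)
     * @param fieldNames        configured names, assumed comma-separated (used by fromConfiguration)
     * @param columnCount       configured number of columns (used by notAvailable)
     */
    static List<String> resolve(String processFieldNames, List<String> headerRow,
                                String fieldNames, int columnCount) {
        List<String> names = new ArrayList<>();
        switch (processFieldNames) {
            case "fromFile":
                // Names and number of columns both come from the sheet's header line.
                names.addAll(headerRow);
                break;
            case "fromConfiguration":
                // Names and number of columns both come from the fieldNames parameter.
                for (String name : fieldNames.split(",")) {
                    names.add(name.trim());
                }
                break;
            case "notAvailable":
                // Generated names Column1..ColumnN, with N taken from columnCount.
                for (int i = 1; i <= columnCount; i++) {
                    names.add("Column" + i);
                }
                break;
            default:
                throw new IllegalArgumentException(
                    "Unsupported processFieldNames: " + processFieldNames);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(resolve("notAvailable", null, null, 5));
    }
}
```

In every mode the resulting list also fixes how many columns are read from each row, which is why columnCount is only needed when no names are available.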

 

 

Example Scenarios

Here are some example scenarios of the behavior of the conversion based on different configuration options.

 

Scenario 1

Excel 2007 XLSX file format.

Extract Sheet1 with column names determined directly from header line of file.

The special character & is converted automatically.

 

Module parameters

Parameter Name | Parameter Value
conversionType | SimpleExcel2XML
sheetName | Sheet1
processFieldNames | fromFile
documentName | MT_Order
documentNamespace | urn:equalize:com
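
To show what documentName, documentNamespace and the default recordName mean for the output, here is a rough JDK-only sketch that builds such a document with DOM. The exact shape of the module's output (for instance the namespace prefix, "ns" here) is an assumption on my part, as are the class name and the sample row "Nuts & Bolts". Writing the text through DOM is what turns & into &amp; without any manual escaping.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Illustration only: one root element, one record element per row, one field element per cell. */
final class SimpleXmlWriter {

    static String toXml(String documentName, String documentNamespace, String recordName,
                        List<String> fieldNames, List<List<String>> rows, boolean indentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // Root element lives in the configured namespace; the "ns" prefix is an assumption.
            Element root = doc.createElementNS(documentNamespace, "ns:" + documentName);
            doc.appendChild(root);

            for (List<String> row : rows) {
                Element record = doc.createElementNS(null, recordName);
                root.appendChild(record);
                for (int i = 0; i < fieldNames.size() && i < row.size(); i++) {
                    // Field names must be valid XML element names, otherwise DOM rejects them.
                    Element field = doc.createElementNS(null, fieldNames.get(i));
                    field.setTextContent(row.get(i)); // special characters are escaped on output
                    record.appendChild(field);
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, indentXml ? "yes" : "no");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML output", e);
        }
    }
}
```

Calling toXml("MT_Order", "urn:equalize:com", "Record", ...) with a header of Order and Material would yield one Record element per data row under the MT_Order root.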

 

Result

Input | in1.png
Output | out1.png

 

Scenario 2

Excel binary XLS file format.

Extract sheet at index 0. Column names are provided from configuration.

Row offset provided to skip first two lines.

Cells are not formatted, so raw values are displayed.

 

Module parameters

Parameter Name | Parameter Value
conversionType | SimpleExcel2XML
sheetIndex | 0
processFieldNames | fromConfiguration
fieldNames | Order,Date,Material,Quantity
rowOffset | 2
recordName | Line
documentName | MT_CustomOrder
documentNamespace | urn:equalize:com
formatting | raw
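
The difference between formatting = 'excel' and formatting = 'raw' is easiest to see on a numeric cell. The sketch below is a loose analogy using java.text.DecimalFormat, not the module's code: Excel format codes and DecimalFormat patterns are different languages, and the pattern used here merely happens to be valid in both. The class name and the value 1234.5 are hypothetical, and the exact raw text the module emits may differ.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Illustration only: the same stored number rendered with and without a display format. */
final class CellFormattingDemo {

    /**
     * @param rawValue      the number as stored in the cell
     * @param numberPattern display pattern of the cell (DecimalFormat syntax in this sketch)
     * @param formatting    "excel" to apply the pattern, "raw" to ignore it
     */
    static String render(double rawValue, String numberPattern, String formatting) {
        if ("raw".equals(formatting)) {
            // Raw mode: show the stored value without any display formatting.
            return Double.toString(rawValue);
        }
        // Excel mode: show the value the way the cell format would display it.
        DecimalFormat format =
            new DecimalFormat(numberPattern, DecimalFormatSymbols.getInstance(Locale.US));
        return format.format(rawValue);
    }

    public static void main(String[] args) {
        System.out.println(render(1234.5, "#,##0.00", "excel")); // 1,234.50
        System.out.println(render(1234.5, "#,##0.00", "raw"));   // 1234.5
    }
}
```

Raw output is usually the safer choice when the receiving system parses the numbers again, since it carries no grouping separators.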

 

Result

Input | in2.png
Output | out2.png

 

 

Scenario 3

Excel 2007 XLSX file format.

Extract sheet at index 0.

Column names are not available. Number of columns = 5.

Row offset provided to skip first line.

Empty rows are included.

Cells containing a formula are displayed with the formula instead of its result.

Empty cells are displayed with default value "space".

 

Module parameters

Parameter Name | Parameter Value
conversionType | SimpleExcel2XML
sheetIndex | 0
processFieldNames | notAvailable
columnCount | 5
rowOffset | 1
documentName | MT_CustomOrder
documentNamespace | urn:equalize:com
skipEmptyRows | NO
evaluateFormulas | NO
emptyCellOutput | defaultValue
emptyCellDefaultValue | space
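
How rowOffset, skipEmptyRows and the two emptyCellOutput modes interact can be sketched as follows. This is my own simplified model working on plain string lists rather than POI cells. It assumes that an included empty row gets the default value in every column, which the article does not spell out. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: selects rows and fills or suppresses empty cells. */
final class RowFilterDemo {

    /** Each input row is a list of cell texts; null, "" or a missing cell counts as empty. */
    static List<Map<String, String>> select(List<List<String>> rows, int columnCount,
                                            int rowOffset, boolean skipEmptyRows,
                                            String emptyCellOutput, String emptyCellDefaultValue) {
        List<Map<String, String>> records = new ArrayList<>();
        for (int r = rowOffset; r < rows.size(); r++) {
            List<String> row = rows.get(r);
            Map<String, String> record = new LinkedHashMap<>();
            boolean allEmpty = true;
            for (int c = 0; c < columnCount; c++) {
                String cell = c < row.size() ? row.get(c) : null;
                String name = "Column" + (c + 1); // naming used when processFieldNames = notAvailable
                if (cell != null && !cell.isEmpty()) {
                    allEmpty = false;
                    record.put(name, cell);
                } else if ("defaultValue".equals(emptyCellOutput)) {
                    record.put(name, emptyCellDefaultValue);
                }
                // With "suppress", an empty cell simply produces no entry (no XML tag).
            }
            if (allEmpty && skipEmptyRows) {
                continue; // the whole row is dropped
            }
            records.add(record);
        }
        return records;
    }
}
```

With the Scenario 3 settings (rowOffset = 1, skipEmptyRows = NO, emptyCellOutput = defaultValue), the first line is dropped, blank rows are kept, and every empty cell carries the configured default value.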

 

Result

Input | in3.png
Output | out3.png

 

 

Reference

Part 2 - ExcelTransformBean Part 2: Convert simple XML to various Excel formats easily

 

This article does not cover the steps for creating a custom adapter module. This can be found easily via SCN search. It is also listed in the reference section of the article below regarding adapter module testing.

Standalone testing of Adapter Module in NWDS

 

Alternative methods for Excel conversion

PI/XI: Reading MS Excel's XLSX and XLSM files with standard PI modules - easily...

Excel Files - How to handle them in SAP XI/PI (The Alternatives)

A Simple approach in Reading Excel File

