Introduction
There are already a handful of blogs on SCN dealing with Excel to XML conversion, so why another one?
Here is my wish list for a comprehensive solution:
- Able to read all kinds of Excel formats (XLS and XLSX)
- Behaves similarly to MessageTransformBean
- Highly configurable - develop/deploy once, use multiple times
- Able to handle XML special characters
The reference section below lists some of the more popular approaches. However, they come with limitations and drawbacks, including (non-exhaustively):
- XLSX files store string contents in a separate sharedStrings.xml file inside the zipped XLSX file
- JExcel API does not support Excel 2007 XLSX formats
- Limitations in handling formulas, formatting and special characters in cells
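The first limitation follows from the XLSX container itself: the file is a ZIP archive, and the cell strings sit in their own part rather than next to the cells. The JDK-only sketch below lists the parts of such an archive. The class name `XlsxPartLister` and the hand-built sample archive are illustrative assumptions; the sample only carries the two part names of interest and is not a valid workbook.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class XlsxPartLister {

    /** Returns the names of all parts (ZIP entries) found in an XLSX-style archive. */
    static List<String> listParts(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a tiny stand-in archive (hypothetical content, not a real workbook). */
    static byte[] sampleArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"xl/worksheets/sheet1.xml", "xl/sharedStrings.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Lists the sheet part and the separate shared-strings part.
        System.out.println(listParts(new ByteArrayInputStream(sampleArchive())));
    }
}
```

A solution that only unzips the file and reads the sheet part therefore has to resolve string references against `xl/sharedStrings.xml` itself.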
ExcelTransformBean is an attempt to provide a generic adapter module solution (a la MessageTransformBean) that is highly configurable and reusable. It is based on the Apache POI API. By using the API's combined SS interface, it reads all kinds of Excel files (XLS and XLSX) with a single logic.
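To see why a single entry point for both formats is feasible, it helps to know that the two file types are distinguishable from their first bytes: binary XLS files are OLE2 compound documents, while XLSX files are ZIP archives. The sketch below is not POI code and not taken from the module; `ExcelFormatSniffer` is a hypothetical JDK-only helper. With POI, this kind of detection is hidden behind the SS usermodel (for example `WorkbookFactory.create(InputStream)`), so the module does not need such code itself.

```java
final class ExcelFormatSniffer {

    enum Format { XLS, XLSX, UNKNOWN }

    // Signature of an OLE2 compound document (container used by binary XLS).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Local file header signature of a ZIP archive (container used by XLSX).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /**
     * Guesses the container format from the leading bytes.
     * Note: the signatures identify the container only; an OLE2 file could also be
     * a Word document, and a ZIP file is not necessarily a workbook.
     */
    static Format detect(byte[] content) {
        if (startsWith(content, OLE2_MAGIC)) {
            return Format.XLS;
        }
        if (startsWith(content, ZIP_MAGIC)) {
            return Format.XLSX;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] content, int[] magic) {
        if (content == null || content.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((content[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```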
Source Code
The full source code can be found in the following public repository on GitHub.
GitHub repository for ExcelTransformBean
This module is based on the Apache POI 3.9 library. For the Java project to compile and build successfully, the following JAR files need to be referenced/imported into the project.
- poi-3.9-20121203.jar
- poi-ooxml-3.9-20121203.jar
- poi-ooxml-schemas-3.9-20121203.jar
- xmlbeans-2.3.0.jar
- dom4j-1.6.1.jar
The library files can be downloaded from Apache's website; a direct link to the ZIP file is provided below.
Module Parameter Reference
Below is a list of the parameters for configuring the module. Parameters that have a default value automatically inherit it if they are not configured.
Parameter Name | Allowed values | Default value | Remarks |
---|---|---|---|
sheetName | | | The name of the active Excel sheet to extract. Either sheetName or sheetIndex must be populated. |
sheetIndex | Integer values beginning from 0 | | The index of the active Excel sheet to extract (starts from 0). Either sheetName or sheetIndex must be populated. |
skipEmptyRows | YES, NO | YES | Determines whether empty rows are skipped |
rowOffset | Integer values beginning from 0 | 0 | Starting row to begin extracting content from (i.e. 0 = start from first row, 1 = start from second row). If processFieldNames = 'fromFile' and rowOffset = 0, the first line is always skipped |
processFieldNames | fromFile, fromConfiguration, notAvailable | | Required field. Determines the naming of each column of the rows, and the number of columns to extract |
fieldNames | | | Names of columns. Required field when processFieldNames = 'fromConfiguration' |
columnCount | Integer values beginning from 1 | | Number of columns for extraction. Required field when processFieldNames = 'notAvailable' |
recordName | | Record | XML element name for each row of record in output |
documentName | | | Required field. Document name of root element of XML output |
documentNamespace | | | Required field. Namespace of root element of XML output |
formatting | excel, raw | excel | Controls how the cell contents are formatted in XML output |
evaluateFormulas | YES, NO | YES | Controls how cell contents with formulas are displayed in XML output |
emptyCellOutput | suppress, defaultValue | suppress | Controls how empty cells are displayed in XML output |
emptyCellDefaultValue | | <blank> | If emptyCellOutput = 'defaultValue', all empty cells will be populated with the value in this parameter |
indentXML | YES, NO | NO | Determines if XML output will be indented or not |
debug | YES, NO | NO | Displays the contents of each extracted cell in the Audit Log. WARNING: Use this only for debugging in non-productive systems |
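The defaulting and the "required field" rules in the table can be read as a small piece of logic. The following is a minimal sketch of that reading, not the module's actual code: the class `ModuleParameters`, its method names and the error messages are hypothetical, and the configuration is modelled as a plain `Map` rather than the adapter framework's own configuration object.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ModuleParameters {

    private static final List<String> FIELD_NAME_MODES =
            Arrays.asList("fromFile", "fromConfiguration", "notAvailable");

    /** Lays the supplied parameters over the default values listed in the table. */
    static Map<String, String> withDefaults(Map<String, String> supplied) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("skipEmptyRows", "YES");
        cfg.put("rowOffset", "0");
        cfg.put("recordName", "Record");
        cfg.put("formatting", "excel");
        cfg.put("evaluateFormulas", "YES");
        cfg.put("emptyCellOutput", "suppress");
        cfg.put("emptyCellDefaultValue", "");
        cfg.put("indentXML", "NO");
        cfg.put("debug", "NO");
        cfg.putAll(supplied);
        return cfg;
    }

    /** Throws IllegalArgumentException if a rule from the table is violated. */
    static void validate(Map<String, String> cfg) {
        if (isBlank(cfg, "sheetName") && isBlank(cfg, "sheetIndex")) {
            throw new IllegalArgumentException("Either sheetName or sheetIndex must be populated");
        }
        require(cfg, "documentName");
        require(cfg, "documentNamespace");
        require(cfg, "processFieldNames");
        String mode = cfg.get("processFieldNames");
        if (!FIELD_NAME_MODES.contains(mode)) {
            throw new IllegalArgumentException("Invalid processFieldNames: " + mode);
        }
        if (mode.equals("fromConfiguration")) {
            require(cfg, "fieldNames");
        }
        if (mode.equals("notAvailable")) {
            require(cfg, "columnCount");
        }
    }

    private static void require(Map<String, String> cfg, String name) {
        if (isBlank(cfg, name)) {
            throw new IllegalArgumentException("Missing required parameter: " + name);
        }
    }

    private static boolean isBlank(Map<String, String> cfg, String name) {
        String value = cfg.get(name);
        return value == null || value.trim().isEmpty();
    }
}
```

Calling `validate(withDefaults(params))` once at the start of processing would surface configuration mistakes before any cell is read.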
Example Scenarios
Here are some example scenarios of the behavior of the conversion based on different configuration options.
Scenario 1
Excel 2007 XLSX file format.
Extract Sheet1 with column names determined directly from header line of file.
The special character & is automatically converted.
Module parameters
Parameter Name | Parameter Value |
---|---|
sheetName | Sheet1 |
processFieldNames | fromFile |
documentName | MT_Order |
documentNamespace | urn:equalize:com |
Result
[Figures: Input (Excel sheet) and Output (XML)]
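The automatic conversion of & in this scenario is the kind of thing an XML writer does on its own when cell values are written as character data. Below is a minimal sketch using the JDK's `XMLStreamWriter`. It is not the module's implementation: the class `RowsToXml`, the `ns` prefix, the exact element layout (namespaced root, unqualified record and field elements) and the sample values are all assumptions made for illustration.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class RowsToXml {

    /** Writes one record element per row and one child element per field. */
    static String convert(String documentName, String documentNamespace, String recordName,
                          List<String> fieldNames, List<List<String>> rows) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            // Root element in the configured namespace (prefix "ns" is an assumption).
            xml.writeStartElement("ns", documentName, documentNamespace);
            xml.writeNamespace("ns", documentNamespace);
            for (List<String> row : rows) {
                xml.writeStartElement(recordName);
                for (int i = 0; i < fieldNames.size() && i < row.size(); i++) {
                    xml.writeStartElement(fieldNames.get(i));
                    // writeCharacters escapes &, < and > in the cell value.
                    xml.writeCharacters(row.get(i));
                    xml.writeEndElement();
                }
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical data; the real scenario input is not reproduced here.
        System.out.println(convert("MT_Order", "urn:equalize:com", "Record",
                Arrays.asList("Order", "Customer"),
                Arrays.asList(Arrays.asList("1001", "Smith & Sons"))));
    }
}
```

Building the output through a writer like this, rather than by string concatenation, is what keeps special characters from breaking the resulting XML.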
Scenario 2
Excel binary XLS file format.
Extract sheet at index 0. Column names are provided from configuration.
Row offset provided to skip first two lines.
No formatting of cells, so raw values displayed.
Module parameters
Parameter Name | Parameter Value |
---|---|
sheetIndex | 0 |
processFieldNames | fromConfiguration |
fieldNames | Order,Date,Material,Quantity |
rowOffset | 2 |
recordName | Line |
documentName | MT_CustomOrder |
documentNamespace | urn:equalize:com |
formatting | raw |
Result
[Figures: Input (Excel sheet) and Output (XML)]
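The interplay of rowOffset, processFieldNames and skipEmptyRows used in this scenario can be sketched in a few lines. This is one reading of the parameter descriptions, not the module's code: `RowSelector` is a hypothetical helper, a sheet is modelled as a list of string lists, and the only special case implemented is the documented one (with 'fromFile' and rowOffset = 0, the first line is skipped because it holds the column names).

```java
import java.util.ArrayList;
import java.util.List;

final class RowSelector {

    /** Returns the rows that would be converted to records. */
    static List<List<String>> select(List<List<String>> sheet, int rowOffset,
                                     boolean fieldNamesFromFile, boolean skipEmptyRows) {
        // Documented rule: with fromFile and offset 0, the header line is never data.
        int start = (fieldNamesFromFile && rowOffset == 0) ? 1 : rowOffset;
        List<List<String>> result = new ArrayList<>();
        for (int i = start; i < sheet.size(); i++) {
            List<String> row = sheet.get(i);
            if (skipEmptyRows && isEmpty(row)) {
                continue;
            }
            result.add(row);
        }
        return result;
    }

    /** A row counts as empty when it has no cells or only null/blank cells. */
    private static boolean isEmpty(List<String> row) {
        if (row == null) {
            return true;
        }
        for (String cell : row) {
            if (cell != null && !cell.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```

With the settings of this scenario (rowOffset = 2, field names from configuration), `select(sheet, 2, false, true)` drops the first two lines and any empty rows after them.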
Scenario 3
Excel 2007 XLSX file format.
Extract sheet at index 0.
Column names are not available. Number of columns = 5.
Row offset provided to skip first line.
Empty rows are included.
Cells with formulas are displayed with the formula instead of the result.
Empty cells are displayed with default value "space".
Module parameters
Parameter Name | Parameter Value |
---|---|
sheetIndex | 0 |
processFieldNames | notAvailable |
columnCount | 5 |
rowOffset | 1 |
documentName | MT_CustomOrder |
documentNamespace | urn:equalize:com |
skipEmptyRows | NO |
evaluateFormulas | NO |
emptyCellOutput | defaultValue |
emptyCellDefaultValue | space |
Result
[Figures: Input (Excel sheet) and Output (XML)]
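The two emptyCellOutput settings differ only in what happens to a cell without content. A minimal sketch of that decision follows, assuming a row is a list of strings padded to columnCount; `EmptyCellPolicy` is a hypothetical helper, and a `null` in the result stands for a field that is left out of the XML. The configured default value is passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

final class EmptyCellPolicy {

    /**
     * Returns the values to emit for one row of columnCount cells.
     * A null entry marks a suppressed field (no element is written for it).
     */
    static List<String> apply(List<String> row, int columnCount,
                              String emptyCellOutput, String emptyCellDefaultValue) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < columnCount; i++) {
            // Cells beyond the end of the row are treated as empty.
            String cell = i < row.size() ? row.get(i) : null;
            boolean empty = cell == null || cell.isEmpty();
            if (!empty) {
                out.add(cell);
            } else if ("defaultValue".equals(emptyCellOutput)) {
                out.add(emptyCellDefaultValue);
            } else {
                out.add(null); // "suppress": omit the field
            }
        }
        return out;
    }
}
```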
Reference
This article does not cover the steps for creating a custom adapter module. This can be found easily via an SCN search. It is also listed in the reference section of the article below regarding adapter module testing.
Standalone testing of Adapter Module in NWDS
Alternative methods for Excel conversion
PI/XI: Reading MS Excel's XLSX and XLSM files with standard PI modules - easily...
Excel Files - How to handle them in SAP XI/PI (The Alternatives)