The source schema comes from an SQL database. It comprises a single repeatable Order record, containing all the fields.
The destination schema comprises a number of records which are embedded.
Our requirement here is to group the source data based on a number of criteria:
We could receive some 50,000 input rows to the mapping. Hence the need to find the optimum method to carry out this processing.
Firstly, it seems difficult to use the mapper for such processing. A map in full XSLT is definitely needed. But there is no Group By in XSLT 1.0, so the Muenchian method must be used.
https://en.wikipedia.org/wiki/XSLT/Muenchian_grouping
This method entails creating a composite key.
<xsl:key name="" match="" use=""/>
A key is built from a name, an XPath query and the aspect enabling identification of the row sought, which might be an element, attribute or a calculated value.
Note that a key cannot be created in a template. We cannot therefore use a variable in the XPath query.
In our case, the first grouping uses the ShipmentNumber field.
The first key is therefore as follows:
<xsl:key name="groupByShipment" match="/s0:ListOrders/Order" use="./ShipmentNumber"/>
Then, to carry out the grouping, the key must be called in a for each loop.
The generate_id() function then serves to identify a specific node depending on the key that is used.
<xsl:for-each select= "/s0:ListOrders/Order[generate-id(.)=generate-id(key('groupByShipment', ./ShipmentNumber))]">
On exiting the for each loop, we therefore have the equivalent of a ‘distinct’ executed against the ShipmentNumber field.
Within the for each loop, we assign all the rows with the same ShipmentNumber to a variable.
<xsl:variable name="Shipment" select="key('groupsByShipment', ./ShipmentNumber)"/>
Now we have to group on the ContainerNumber field, without losing the grouping by ShipmentNumber. We therefore need a second key:
<xsl:key name="groupByContainer" match="/s0:ListOrders/Order" use="concat(./ShipmentNumber, ./ContainerNumber)"/>
Using ‘concat’ on the two fields ungroups two identical ContainerNumbers that do not have the same ShipmentNumber.
We therefore have a new for each loop, which this time is going to group on ContainerNumber. Here, our start point is no longer the root, but the variable containing all the rows with the same ShipmentNumber.
Note that had we not used ‘concat’ but just ContainerNumber, the key would only have been valid once per ContainerNumber, and therefore, on the second iteration of groupeBy. Shipment, we would have been unable to have any ContainerNumber already found in the first iteration.
Then, the same operation is repeated for each grouping.