Case transform is a part of Platform set of Transform in data services. It deals with the branching logic i.e it separates source data into multiple output data sets based on a condition.
For example, the source data from different countries is diverted to separate country tables based on certain conditions.
The condition based on which the data is branched has two parts: Label and Expression.
The label is the path name to the target table and the expression has the SQL logic that separates the data.
For example, define a label Region_INDIA where expression is Employee.regionId = 1.
Here, Employee is the source table having regionId as a column.
To understand it more clearly, login to the data services. Create some sample data in Microsoft SQL Server like below:
Import this table in your data services.
Create a datastore that connects data services to your database:
Right click on your datastore and select Search. Search for external table regionId:
Right click on the table and select ‘Import’. You can see now the table imported under your datastore:
Create a new project -Case_transform. Create a job under it and name it as ‘Case’.
Workflow is not mandatory, so create dataflow under the job.
Inside your dataflow, add the source table and make it as source.
Double click on the case transform and give the expressions. Click the ‘Add’ button. Give a label as ‘IND_Table’.
In the expression window, drag the region_id. Type ‘=1’
Similarly define other expressions.
Observe the options in case transform.
The option ‘Produce default output with label’ stores the data which do not satisfy any expression in a default table. Thus the rows with region PAK and China will go to default table.
‘Row can be true for one case only’ option means that if a row satisfies more than one condition, then if this option is selected, the system will allow it to go into one output table only.
Click the back button.
Create template tables and add them as target. Join the case transform to each of these tables.
Validate your transform. Ignore warnings if any.
Run the job and observe the data in each of the target template tables.
Points to Remember:
- Case transform is just for branching of data, it does not have an input and an output schema
- All the tables created as target should be connected to case transform, otherwise it will give an error
- If the option ‘Preserve expression order’ is unchecked, it will allow the system to populate simple expressions first. An example of simple expression is regionId = 1.An example of complex expression is substr(regionId,1,1) = 3.
Change the expression for UK table like below-
Create an ALL template table and add an expression like below-
Add a template table and join it to case transform for the ALL_Table label. Remove the default table as it is optional.
Now execute the job, Make sure that the ‘Preserve expression order’ option is unchecked in the case transform-
The UK table now does not have any entries-
The ALL_Table has the UK entries-
thus the complex expression was skipped and executed in the last. Since the option ‘Row can be TRUE for once case only’ is checked, the UK table is empty.
If the user checks this option, the expressions are executed in order no matter they are simple or complex.
In this case, the UK table will have entries and the ALL table will not have UK entries.
If the option ‘Row can be true for one CASE only’ is not selected, the ALL table will have entries of all the countries.
Observe that the option ‘Preserve expression order’ is only available when ‘Row can be TRUE for one case only’ option is selected.
Refer to the ebook for more details: