SAP BO Data Integrator / Data Services

Data Services integrates with SAP BI, SAP R/3, and other SAP applications, as well as non-SAP databases.

The purpose of the Data Services tool is to perform ETL, through batch jobs and online (real-time) jobs, using bulk and delta load processing of both structured and unstructured data, and to load the result into a warehouse (SAP or non-SAP).

Data Services is the combination of Data Integrator (DI) and Data Quality (DQ). Previously these were separate tools: Data Integrator handled the ETL part, and Data Quality handled data profiling and data cleansing. With Data Services, DI and DQ are combined into one interface, which provides a complete solution (data integration and quality) on one platform.

It also combines the separate job servers and repositories of DI and DQ into one.

Data Federator:- The output of Data Federator is virtual data. Federator can supply this data as input to Data Services, and with it we can present data from multiple sources as a single source.
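The idea of presenting several sources as one virtual source can be sketched in plain Python. This is only an illustration of the concept, not Data Federator code: the `federate` function and the two sample sources are hypothetical names made up for this example.

```python
from itertools import chain
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def federate(*sources: Iterable[Row]) -> Iterator[Row]:
    """Lazily present several row sources as one; no rows are copied or stored."""
    return chain.from_iterable(sources)


# Two hypothetical sources that share the same column layout.
sql_rows: List[Row] = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ben"}]
flat_file_rows: List[Row] = [{"id": 3, "name": "Chen"}]

# The consumer sees a single stream and does not know where each row came from.
virtual_view = federate(sql_rows, flat_file_rows)
ids = [row["id"] for row in virtual_view]
```

Because the combined view is an iterator, rows are read from the underlying sources only when the consumer asks for them, which is the sense in which the data is "virtual" here.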

Data Services Scenarios:-

    Source              Data Services      Warehouse

  • SQL          ->     DS          ->     SQL
  • Flat File    ->     DS          ->     SQL
  • Flat File    ->     DS          ->     BI
  • R/3          ->     DS          ->     BI
  • R/3          ->     DS          ->     SQL
  • SQL          ->     DS          ->     BI

Using Data Services, we can move data from any supported source to any supported target database.

Data Services is a utility for running ETL processes. It is not a warehouse, so it does not stage any data in itself.

Data Services can create ETL processes and can build a warehouse (SAP / non-SAP).

DS is used mainly for three sorts of projects:

  • Migration
  • Warehouse or DB building
  • Data Quality

Data Profiling:- Pre-processing of the data before the ETL to check its health. Profiling tells us whether the data is good or bad before we load it.
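As a rough illustration of what a health check on the data can look like, here is a minimal Python sketch that counts rows, empty values, and distinct values per column. It does not use the Data Services profiler; the `profile` function and the sample rows are assumptions made for this example.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def profile(rows: List[Row]) -> Dict[str, Dict[str, int]]:
    """Per column: total rows, null/blank values, and distinct non-blank values."""
    columns = sorted({col for row in rows for col in row})
    report: Dict[str, Dict[str, int]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v.strip() != ""]
        report[col] = {
            "rows": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


# Hypothetical sample: a duplicated key and two missing country values.
sample: List[Row] = [
    {"customer_id": "C1", "country": "IN"},
    {"customer_id": "C2", "country": ""},
    {"customer_id": "C2", "country": None},
]
report = profile(sample)
```

A high null count or an unexpected distinct count (for example, fewer distinct keys than rows) is the kind of signal that tells us the data needs cleansing before the ETL runs.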

Advantages of Data Services over SAP BI/BW ETL process

  • It is a GUI-based framework
  • It has built-in configuration for multiple data sources
  • It has numerous built-in transformations (Integrator, Quality, Platform)
  • It performs data profiling
  • External systems can be added easily
  • It supports the Export Execution Command to load data into the warehouse in batch mode
  • It generates ABAP code automatically
  • It recognizes structured and unstructured data
  • It can generate a warehouse (SAP / non-SAP)
  • It supports large-scale data cleansing, consolidation, and transformation
  • It can do real-time data loads, full data loads, and incremental data loads
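The difference between a full load and an incremental (delta) load from the list above can be sketched in Python. This is a simplified model that assumes every source row carries a `changed_at` timestamp; the function names and the sample data are hypothetical and are not part of Data Services.

```python
from datetime import datetime
from typing import Dict, List

Row = Dict[str, object]


def full_load(source: List[Row]) -> Dict[object, Row]:
    """Load every source row into a fresh target, keyed by id."""
    return {row["id"]: row for row in source}


def delta_load(source: List[Row], target: Dict[object, Row], last_run: datetime) -> int:
    """Insert or update only rows changed after last_run; return how many were loaded."""
    changed = [row for row in source if row["changed_at"] > last_run]
    for row in changed:
        target[row["id"]] = row
    return len(changed)


source: List[Row] = [
    {"id": 1, "amount": 100, "changed_at": datetime(2024, 1, 1)},
    {"id": 2, "amount": 250, "changed_at": datetime(2024, 1, 1)},
]
target = full_load(source)          # initial load
last_run = datetime(2024, 1, 2)

# Later, row 2 is updated and row 3 is added in the source.
source[1] = {"id": 2, "amount": 300, "changed_at": datetime(2024, 1, 5)}
source.append({"id": 3, "amount": 75, "changed_at": datetime(2024, 1, 6)})

loaded = delta_load(source, target, last_run)   # only rows 2 and 3 move
```

The delta load touches two rows instead of three, which is the point of incremental loading when the source is large and only a small part of it changes between runs.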

Data Integrator / Services Architecture

 

[Figure: Data Integrator / Services architecture flowchart]

When you use Data Services to load the data, there is no concept of process chains, DTPs, or InfoPackages.

Data Integrator Components

Designer

  • It is used to create the ETL process
  • It has a wide set of transformations
  • It includes all the artifacts of the project (Work Flow, Data Flow, Data Store, Tables)
  • It is the gateway for profiling
  • All the Designer objects are reusable

Management Console (URL-based / web-based tool)

  • It is used to activate the repositories
  • It allows us to activate user profiles for a specific environment
  • You can create users and user groups and assign the users to the user groups with privileges
  • It allows you to schedule jobs automatically or execute them on demand
  • You can execute the jobs from any geographic location, as this is a web-based tool
  • You can connect the repositories to connections (Dev / Qual / Prod)
  • You can customize the data stores in the Management Console

Access Server

  • It receives XML input (real-time data)
  • XML inputs can be loaded into the warehouse using the Access Server
  • It is responsible for the execution of online / real-time jobs
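To make the XML-to-warehouse step above concrete, here is a minimal Python sketch that turns one incoming XML message into a row ready for loading. It is not the Access Server API: the message layout, the `message_to_row` function, and the `warehouse_rows` list are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def message_to_row(xml_text: str) -> Dict[str, str]:
    """Flatten the child elements of one XML message into a column -> value row."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


# A hypothetical real-time message, handled as soon as it arrives.
incoming = (
    "<order>"
    "<order_id>1001</order_id>"
    "<customer>ACME</customer>"
    "<amount>250.00</amount>"
    "</order>"
)

warehouse_rows: List[Dict[str, str]] = []   # stand-in for the target table
warehouse_rows.append(message_to_row(incoming))
```

Unlike a batch job, which processes a whole data set on a schedule, this style of processing handles one message at a time as it arrives.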

Repository Manager

  • It allows you to create repositories (Local, Central, and Profiler)
  • Repositories are created in a standard database
  • The Data Services system tables are available here

Metadata Integrator

  • It generates auto documentation
  • It generates sample reports and semantic layers
  • It generates job-based statistics dashboards

Job Server

This is the server responsible for executing the jobs. We cannot execute a job without first assigning the local / central repository.

Data Integrator Objects

Projects:-

A project is a folder in which you store all the related jobs in one place. In other words, it is a way to organize jobs.

Jobs:-

Jobs are the executable part of Data Services. A job sits under a project. There are two types of jobs:

  1. Batch jobs
  2. Online (real-time) jobs

Work Flows:-

A work flow acts as a folder that contains related data flows. Work flows are reusable. They are also optional, i.e. you can execute a job that contains a data flow and no work flow.

Conditionals:-

A conditional contains work flows or data flows, and these are controlled by scripts. Scripts decide whether or not to trigger the conditional.

Scripts:-

Scripts are sets of code used to define or initialize global variables, control the flow of conditionals or of execution, print statements at runtime, and assign default values to variables.
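The interplay between scripts and conditionals can be modelled in a few lines of Python. This is a conceptual sketch only and is not written in the Data Services scripting language; the variable name `$G_LOAD_TYPE`, the two flow functions, and `run_job` are hypothetical.

```python
from typing import Callable, Dict, List

Globals = Dict[str, object]
Step = Callable[[], str]


def init_script(job_globals: Globals) -> None:
    """Script step: assign a default to a global variable and print it at runtime."""
    job_globals.setdefault("$G_LOAD_TYPE", "DELTA")
    print("Load type:", job_globals["$G_LOAD_TYPE"])


def run_conditional(
    job_globals: Globals,
    condition: Callable[[Globals], bool],
    then_steps: List[Step],
    else_steps: List[Step],
) -> List[str]:
    """Conditional step: run one branch depending on the values the script set."""
    steps = then_steps if condition(job_globals) else else_steps
    return [step() for step in steps]


def full_load_flow() -> str:
    return "full load data flow ran"


def delta_load_flow() -> str:
    return "delta load data flow ran"


def run_job(job_globals: Globals) -> List[str]:
    init_script(job_globals)
    return run_conditional(
        job_globals,
        condition=lambda g: g["$G_LOAD_TYPE"] == "FULL",
        then_steps=[full_load_flow],
        else_steps=[delta_load_flow],
    )


default_run = run_job({})                        # script falls back to DELTA
full_run = run_job({"$G_LOAD_TYPE": "FULL"})     # value supplied at execution time
```

The script runs first and fixes the variable values; the conditional then only reads those values to pick a branch, which keeps the decision logic in one place.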

Data Flow:-

The actual data processing happens here. 

Source Data Store:-

It acts as the place from which data from the database / SAP is imported into the Data Services local repository.

Target Data Store:-

It is the collection of dimension and fact tables that make up the data warehouse.

Transformations:-

These are the query transformations that are used to carry out the ETL process. They are broadly categorized into 3 types (Platform, Quality, and Integrator).
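What a query-style transformation does inside a data flow (filter the rows, then map input columns to output columns) can be sketched as follows. This is a plain Python analogy, not a Data Services transform; `query_transform`, the column names, and the sample rows are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


def query_transform(
    rows: Iterable[Row],
    where: Callable[[Row], bool],
    mapping: Dict[str, Callable[[Row], object]],
) -> Iterator[Row]:
    """Keep rows that pass the filter, then build each output column from an expression."""
    for row in rows:
        if where(row):
            yield {out_col: expr(row) for out_col, expr in mapping.items()}


# Hypothetical source rows as they might arrive from a flat file.
source = [
    {"cust": " acme ", "amt": "250.00", "status": "OPEN"},
    {"cust": "globex", "amt": "10.50", "status": "CANCELLED"},
]

target = list(
    query_transform(
        source,
        where=lambda r: r["status"] != "CANCELLED",
        mapping={
            "customer_name": lambda r: r["cust"].strip().upper(),
            "amount": lambda r: float(r["amt"]),
        },
    )
)
```

The filter plays the role of a WHERE clause and the mapping plays the role of the output column list, so one transform covers both row selection and column-level cleansing.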

File Format:-

It contains the file formats of various legacy systems.

Variables:-

You can create local and global variables and use them in the project. Variable names start with the “$” symbol.

Functions:-

There are numerous built-in functions (string, math, lookup, enrich, and so on).

Template Table:-

These are the temporary tables that are used to hold the intermediate data or the final data.

Data Store:-

A data store acts as a port through which you define the connections to the source or target systems. You can create multiple configurations in one data store to connect it to different systems.
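One data store holding several configurations can be pictured with a small Python model. This is only a sketch of the idea: the class names, the DEV / PROD labels, and the host names are hypothetical and do not reflect how Data Services stores its connection details.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Configuration:
    host: str
    database: str
    user: str


@dataclass
class DataStore:
    name: str
    configurations: Dict[str, Configuration] = field(default_factory=dict)
    default: str = ""

    def add(self, label: str, config: Configuration) -> None:
        """Register a configuration; the first one added becomes the default."""
        self.configurations[label] = config
        if not self.default:
            self.default = label

    def connection(self, label: str = "") -> Configuration:
        """Return the named configuration, or the default when no label is given."""
        return self.configurations[label or self.default]


# Hypothetical systems; the jobs always refer to the same data store name.
store = DataStore(name="DS_SALES")
store.add("DEV", Configuration("dev-db.example.com", "sales_dev", "etl_dev"))
store.add("PROD", Configuration("prod-db.example.com", "sales_prod", "etl_prod"))

dev = store.connection()
prod = store.connection("PROD")
```

Because the jobs refer only to the data store name, switching from one system to another means choosing a different configuration rather than editing the jobs.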

ATL:-

ATL files are like BIAR files. The format is named after a company; unlike BIAR, ATL has no full form.

A Project / Job / Work Flow / Data Flow / Table can be exported to ATL so that it can be moved from Development to Quality and from Quality to Production.

Similarly, you can import a Project / Job / Work Flow / Data Flow / Table that was exported to ATL back into Data Services.