SAP Data Services builds momentum with BigQuery
Thanks to key benefits such as low startup costs and fast deployment, cloud-based analytics services like Google BigQuery are rapidly gaining popularity. This does not mean, however, that companies will completely abandon their on-premise data centers, given security concerns and other factors. For this reason, many companies have chosen a hybrid approach to implementing their big data analytics solutions, which requires bi-directional ETL capabilities to move and transform data between on-premise and cloud applications. With the upcoming release in November 2015, you can rely on SAP Data Services to do just that.
Native Google BigQuery datastore support since DS 4.2 SP4
SAP Data Services features a rich set of out-of-the-box transformations, with over 80 built-in functions, including native text data processing, data masking, and data quality features that let users prepare only relevant, trusted information before loading it into BigQuery tables. The software supports the JSON data format, so you can use the same Designer UI to build, by drag and drop, a dataflow that loads data into BigQuery either as a flat structure or with nested/repeated fields.
For example, using a Data Quality transform to cleanse a JSON document containing multilevel hierarchical data and load it into Google BigQuery takes just a few simple steps. The diagram below illustrates how to create a dataflow in DS that flattens the data, performs the required transformations, rebuilds the hierarchical structure as needed, and loads the result into BigQuery for analytics.
Diagram 1 – an example of cleansing multilevel hierarchical data from a JSON document and loading it into Google BigQuery
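The flatten, transform, and re-nest flow in Diagram 1 can be sketched outside Data Services to make each stage concrete. This is a minimal Python sketch, not the DS implementation: the input records, the field names `fullName`, `children`, and `childName`, and the `cleanse` rule (trim whitespace and title-case names) are hypothetical stand-ins for a real Data Quality transform. The final stage writes newline-delimited JSON, one object per line, which is the JSON layout BigQuery expects when loading data.

```python
import json

# Hypothetical input: nested JSON records with a repeated "children" field.
# Illustrative only; not actual Data Services or BigQuery sample data.
SOURCE = [
    {"fullName": "  john doe ", "children": [{"name": "jane"}, {"name": "john jr"}]},
    {"fullName": "MIKE JONES", "children": []},
]


def flatten(records):
    """Unnest: emit one flat row per (person, child); keep childless persons."""
    rows = []
    for rec in records:
        # [None] keeps a person with no children as a single row.
        children = rec.get("children") or [None]
        for child in children:
            rows.append({
                "fullName": rec["fullName"],
                "childName": child["name"] if child else None,
            })
    return rows


def cleanse(rows):
    """Stand-in for a Data Quality step: collapse whitespace, title-case names."""
    def fix(value):
        return " ".join(value.split()).title() if value is not None else None
    return [{key: fix(value) for key, value in row.items()} for row in rows]


def nest(rows):
    """Regroup flat rows into one record per person with a repeated field.

    Assumes (hypothetically) that the cleansed fullName identifies a person.
    """
    grouped = {}
    for row in rows:
        person = grouped.setdefault(
            row["fullName"], {"fullName": row["fullName"], "children": []}
        )
        if row["childName"] is not None:
            person["children"].append({"name": row["childName"]})
    return list(grouped.values())


def to_ndjson(records):
    """Serialize as newline-delimited JSON: one object per line."""
    return "\n".join(json.dumps(rec) for rec in records)


if __name__ == "__main__":
    print(to_ndjson(nest(cleanse(flatten(SOURCE)))))
```

Flattening first means the cleansing rule only has to deal with scalar columns; the nested shape is rebuilt afterwards, which mirrors the order of stages shown in the diagram.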
Enhanced SQL Transform reads data from the BigQuery datastore in DS 4.2 SP6
You can write any BigQuery SQL statement, such as selections, projections, and joins, directly in the SQL Transform for complex data retrieval. The software then pushes the entire statement down to the database layer, so the query is executed by the native BigQuery analytics engine rather than by the Data Services engine. This gives optimal performance even when you are working with complex data from multiple tables that contain deeply nested structures.
For example, to find the number of children each person has in personsData.json, you can use the SQL Transform to aggregate over the repeated children field within each record.
After you click the “Update schema” button, Data Services automatically populates the output schema with the column information returned by the SELECT statement.
Running the job returns the following result:
As shown below, running the same query in the BigQuery Web UI returns identical results.
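The per-record aggregation in the children example can be modeled in a few lines, which is handy for sanity-checking a small result set by hand. This is a minimal Python sketch under stated assumptions: the records and names below are hypothetical, and the query in the comment is only an illustration of BigQuery legacy SQL's `COUNT(...) WITHIN RECORD` form with a placeholder table name, not the exact statement used in the SQL Transform above.

```python
# Illustrative BigQuery legacy SQL for this kind of aggregation
# (table name is a hypothetical placeholder):
#   SELECT fullName, COUNT(children.name) WITHIN RECORD AS numberOfChildren
#   FROM [mydataset.persons]


def count_children(records):
    """Return (fullName, numberOfChildren) for each record, scoped per record."""
    result = []
    for rec in records:
        # A missing or empty repeated field counts as zero children.
        children = rec.get("children") or []
        # COUNT(field) ignores NULLs, so only non-null names are counted.
        n = sum(1 for child in children if child.get("name") is not None)
        result.append((rec["fullName"], n))
    return result


# Hypothetical sample records (not the contents of personsData.json).
persons = [
    {"fullName": "Alice", "children": [{"name": "Ann"}, {"name": "Abe"}]},
    {"fullName": "Bob", "children": [{"name": "Bea"}, {"name": "Ben"}, {"name": "Bo"}]},
    {"fullName": "Carol"},
]

if __name__ == "__main__":
    print(count_children(persons))
```

The key point is that the count is scoped to each record: one output row per person, rather than one row per child as a plain join or flatten would produce.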