Quantcast
Channel: SCN : All Content - Data Services and Data Quality
Viewing all articles
Browse latest Browse all 4013

A KISS approach to naming standards in Data Services

$
0
0

A strict naming schema for all DS objects (projects, jobs, workflows, dataflows, datastores, file formats, custom functions) is essential when working in a multi-user environment. The central repository has no folder concept or hierarchy or grouping functionality. The only way to distinguish between objects from one grouping and another one is by name. Most effective approach for naming objects is based on prefixing.

 

Note: In order to display the full names of DS objects in the Designer workspace, increase the icon name length. You do this by selecting Tools --> Options from the Designer menu, then expand Designer and select General. In this window, specify the value of 64 in the “Number of characters in workspace icon name” box.

 

1.png

 

General note: Versioning should not be handled by naming conventions. So, never include a version number in an object name. Use the central repository concept for maintaining successive versions of any object.

 

 

1.  Reusable objects

 

 

ObjectNaming ConventionExample
Project<project_name>BI4B
Job<project_name>_<job_name>BI4B_D
Workflow contained in one job only

<project_name>_<job_name>_[XT|TF|LD|AG…]

<project_name>_<job_name>_[XT|TF|LD|AG…]_<workflow_name>

<project_name>_<job_name>_<workflow_name>

BI4B_D_XT

BI4B_D_LD_Facts

BI4B_D_Init

Workflow that is reused

<project_name>_COMMN_[XT|TF|LD|AG…]_<workflow_name>

COMMN_[XT|TF|LD|AG…]_<workflow_name>

BI4B_COMMN_Init

COMMN_ErrorHandling

Dataflow contained in one job only<project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>BI4B_D_LD_Opportunities
Dataflow that is reused<project_name>_COMMN_[XT|TF|LD|AG…]_<dataflow_name>

BI4B_COMMN_LD_JobCycles

COMMN_LD_Jobs

Embedded Dataflow<project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>_EMBBI4B_D_LD_Employees_EMB
ABAP Dataflow<project_name>_<job_name>_XT_<dataflow_name>_ABAPBI4B_D_XT_KNA1_ABAP
Custom Function contained in one job only<project_name>_<function_name>BI4B_getDate
Custom Function that is reusedCOMMN_<function_name>COMMN_dateKey

 

 

 

1.1. Projects: <project_name>


Give every DS project a 5-character short name. The name has to be short, because it will be used as a prefix for the name of all reusable objects defined within the project.

 

E.g.: P2345

 

 

 

1.2. Jobs: <project_name>_<job_name>


Give every job a 5-character short name. Use < project name>_ as a prefix for the job name. The name has to be short, because it will be used as a prefix for the name of all workflows and dataflows defined within that job.

 

E.g.: P2345_J2345

 

 

 

1.3. Workflows: <project_name>_<job_name>_[XT|TF|LD|AG…][_<workflow_name>]


Name every workflow with <project_name>_<job name>_ as a prefix. Use COMMN_ as prefix for shared workflows, used across projects, <project_name>_COMMN_ when used in multiple jobs within a given project.

 

Workflows are often used to group dataflows for serial or parallel execution. In a typical ETL job, dataflows are executed in “stages”: a first set of dataflows have be executed (in parallel) before a next set can be started; and so on. A data warehouse loading job may extract data from the sources, load them into staging, optionally transform from staging-in to staging-out before loading into the core EDW and aggregating into the semantic layer.

 

Distinguish between job stages by extending the prefix with a 2 character code:

 

  • XT: extract from source to staging
  • TF: transform from staging-in to staging-out
  • LD: load from staging into the core EDW layer
  • AG: load (physically aggregate) from core to semantic layer

 

The workflow name will be used as a prefix for the name of all embedded workflows and dataflows.

 

E.g.: P2345_J2345_XT

 

Within a workflow, objects (scripts, sub-workflows, dataflows) must either all be defined in parallel or all sequentially, and will be executed as such. There is no limit to the number of objects within a workflow. When the number of objects is higher than the number of processors available, DS will internally control the execution order of embedded parallel objects. Only when there are fewer objects than the number of processors available, they will really be executed in parallel.

 

Complex hierarchical structures can be defined by nesting workflows. There is no limit to the number of nesting levels, either. With nested workflows, use a name (_Facts for facts extraction or load, _Dims for dimension processing…) combined with an outline numbering scheme (1, 11, 111, 112, 12, 2…).

 

E.g.: P2345_J2345_LD_Dims21

 

Some workflows may not contain dataflows at all; they only contain not reusable objects. In that case, just name the workflow according to its function.

 

E.g. for a workflow embedding an initialization script: P2345_J2345_Initialise


 

1.4. Dataflows

 

DS supports three types of dataflows. The dataflow names must be unique across the different types. To distinguish the embedded and ABAP dataflows from the regular ones, use a suffix in their name.

 

  • Regular dataflows: <project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>

 

According to design and development best practices there should only be a single target table in a dataflow. Name a dataflow according to that target table.

 

Use <project_name>_<job name>_ as a prefix. Use COMMN_ as prefix for shared dataflows, used across projects, <project_name>_COMMN_ when used in multiple  jobs within a given project. Distinguish between dataflow locations (extract, transform, load, aggregate…) by extending the prefix with a 3 character code (XT_, TF_, LD_, AG_...) as from the embedding workflow.

 

E.g.: P2345_J2345_XT_S_TABLE1, P2345_J2345_LD_TargetTable

 

  • Embedded dataflows: <project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>_EMB

 

Name every embedded dataflow with <project_name>_<job name>_ as a prefix; use _EMB as a suffix for the dataflow name. Distinguish between dataflow locations (extract, transform, load and aggregate) by extending the prefix with a 3 character code (XT_, TF_, LD_ and AG_).

 

E.g.: P2345_J2345_LD_TargetTable_EMB

  

  • ABAP dataflows: <project_name>_<job_name>_XT_<dataflow_name>_ABAP

 

An ABAP dataflow is always used as a source in a regular dataflow. Reuse that name for the  ABAP dataflow and add _ABAP as a suffix to make it unique.

 

E.g.: P2345_J2345_XT_S_TABLE1_ABAP

 

 

 

1.5. Custom Functions: <project_name>_<function_name>

 

Give every Custom Function a descriptive name. Use <project_name>_<job name>_ as a prefix. Use COMMN_ as prefix for shared custom functions, used across projects.

 

E.g.: P2345_TrimBlanksExt

 

 

 

2.  Datastores: [SAP|BWS|BWT|HANA…]_<datastore_name>


As datastores are often used in multiple projects, they do not follow the same naming conventions as for other reusable projects.

 

Name a datastore in line with its physical name, and make the prefix depend on the object’s type:

 

 

Datastore TypeDatabase TypeNaming ConventionExample
DatabaseSQL Server SQL_SQL_OC4A1
DatabaseOracle ORA_ORA_ITSD
DatabaseTeradata TD_TD_Staging
DatabaseDB2 DB2_DB2_MDM
DatabaseMySQL MySQL_MySQL_GEB
DatabaseSybase ASE ASE_ASE_CRN
DatabaseSybase IQ IQ_IQ_CMDWH
SAP Applications SAP_SAP_MDR
SAP BW as a source BWS_BWS_Achme
BW as a target BWT_BWT_Hana
SAP Hana HANA_HANA_DB
Adapter AS_AS_Nexus
Web Services WS_WS_Weather

 

Note 1: Pay attention when choosing datastore names. A datastore name cannot be changed anymore once the object has been created.

 

Note 2: Landscape-related information should not be handled with datastore names. So, never include a physical system indicator (DEV, T, QA…) in a datastore name. Landscape information should be configured using datastore configurations. Create one datastore, then create a datastore configuration for every tier (development, test, QA, production…) in the landscape.

 

 

 

3.  File formats: <project_name>_<file_format_name>

 

 

Reuse the file name for the format name of a project-specific file. Use < project name>_ as a prefix.

 

E.g.: P2345_ISO_Cntr_Codes

 

Note: Pay attention when choosing a file format names. A file format name cannot be changed anymore once the object has been created.

 

 

 

4.  Not reusable objects

 

Because not reusable objects are only defined within the context of a workflow or a dataflow, no strict naming standards are necessary. Names will only serve documentation purposes.

 

Use meaningful names for workflow objects (Script, Condition, While Loop, Try, Catch).

 

Do not change the transform names unless you want to stress the specific purpose of a Query transform, e.g. Join, Order, OuterJoin…


Viewing all articles
Browse latest Browse all 4013

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>