RiskTech Forum

SAS: Governing Data Acquisition

Posted: 26 January 2017  |  Author: David Loshin  |  Source: SAS

As the application stack supporting big data has matured, it has demonstrated the feasibility of ingesting, persisting and analyzing potentially massive data sets that originate both within and outside of conventional enterprise boundaries. But what does this mean from a data governance perspective?

Most aspects of data governance for internally created data sets are not significantly impacted. Because the processes are managed internally, the definition and implementation of policies governing these data sets should be directly integrated into the development life cycle. But data sets that are acquired from outside the organization’s boundaries pose challenges when there is little or no metadata or information about their provenance.

As interest in exploiting big data analytics explodes, it's valuable to establish good practices early on that will prevent rampant and uncontrolled data downloads and ingestion. Not doing so will create a risk of conflicting interpretation of data semantics – and that can lead to undesired analytical outcomes.

The importance of process

We must consider establishing procedures for introducing new data sources and artifacts into the environment. These procedures should simplify integration efforts while harmonizing inferred semantics and interpretations, so that the data is used consistently. An overall process would analyze ingested data sets to determine the best ways to align their content with a managed and governed internal information architecture.

An example process would include procedures such as these:

     - Statistically analyze the values in each column.
     - Perform type inference (i.e., attempt to infer whether the values are character strings, numeric values,
       dates, etc.).
     - Match column names (if provided) to known data types.
     - Identify columns and reference domains that are undetermined so they can be investigated further.
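
To make the procedures above concrete, here is a minimal Python sketch of how an acquired CSV file might be profiled before it is admitted to a governed environment. It is an illustration under stated assumptions, not a prescribed implementation: the KNOWN_COLUMNS catalog, the function names and the ISO date format are all hypothetical, and a real process would draw expected types from the organization's own metadata repository.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical catalog of known column names and their expected types.
# In practice this would come from a governed metadata repository.
KNOWN_COLUMNS = {"customer_id": "integer", "signup_date": "date", "balance": "float"}


def infer_value_type(value):
    """Classify one raw string as integer, float, date, or string."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        pass
    try:
        # Assumption: dates arrive as YYYY-MM-DD; other formats fall through.
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "string"


def profile_column(name, values):
    """Compute basic statistics, infer a type, and flag the column for review."""
    non_empty = [v.strip() for v in values if v.strip()]
    type_counts = Counter(infer_value_type(v) for v in non_empty)

    if len(type_counts) == 1:
        inferred = next(iter(type_counts))
    elif set(type_counts) == {"integer", "float"}:
        inferred = "float"  # mixed integers and decimals are treated as float
    else:
        inferred = "undetermined"  # empty column or conflicting value types

    # Match the column name (if provided) against the known catalog.
    expected = KNOWN_COLUMNS.get(name.strip().lower())

    return {
        "column": name,
        "count": len(values),
        "missing": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "inferred_type": inferred,
        "expected_type": expected,
        # Unknown names, undetermined types and mismatches need investigation.
        "needs_review": inferred == "undetermined" or expected != inferred,
    }


def profile_csv(text):
    """Profile every column of CSV text with a header row (assumes rectangular data)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))
    return [profile_column(n, list(c)) for n, c in zip(header, columns)]
```

Any column whose report has needs_review set would be routed to a data steward for further investigation rather than loaded as-is. The sketch treats a column as undetermined whenever its values disagree on type; a more tolerant rule (for example, accepting a type shared by most values) is a policy decision the governance process would have to make.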

These are just a few ideas for procedures to follow when instituting a governed process for data acquisition. By formally developing data acquisition policies, you can ensure consistency in transformation as well as in use.