Does a schema for Data Warehouse Automation metadata remove the need to buy software?

For a few days I have been adding content to the Frequently Asked Questions section regarding the generic schema for Data Warehouse Automation.

A common question, and one that was raised again in today’s Ensemble Modelling conference, is what the benefit is for an organisation to take this approach over purchasing vendor software licenses (‘off the shelf’ solutions). Is adopting a schema and approach such as this at odds with purchasing ETL or Data Warehouse Automation software?

I personally don’t believe so.

There are many considerations on this. First of all, the mechanism to record metadata can -in almost all cases- be used to interface with APIs that ETL platforms usually have. And, not every ETL tool has the same level of support for metadata management, or at all (except of course the ETL object itself).

This is a core principle of the (intent of the) schema, in that it functionally decouples the management and storage of metadata from the application of it (i.e. as a deployed objects in a target platform).

For example, you can manage your metadata in PowerDesigner, export it against the schema and generate Powercenter mappings from it. Or, you can manage the metadata in TEAM, export it against the schema and generate SSIS via Biml.

This, and various other use-cases, have been implemented in addition to the SQL-based output that can be generated by applying a templating engine (such as Handlebars, Pebble or Jinja). Depending on what technology you use, you can also take the templating engine further to generate objects that can be ‘uploaded’ into an ETL platform as well – as has been done for ARM templates for example.

Alternatively, you could also start ‘small’ and move towards a licensed platform later on. For me, I prefer using an ETL tool for it’s connectivity to various systems and use SQL or code from there – at least as a starting assumption.

In all the above cases, there is an opportunity to share know-how on how to best interface with various tooling and platforms in a way that can be reusable.

How good it that?

Could you build your own (enterprise grade) ETL tool like this, perhaps using templating engines only? Yes, definitely, but whether you should depends on various things.

ETL platforms (still) have merit, and the metadata schema may provide a standardised way to code against their APIs to get the benefits these platforms provide.

Does the same apply for Data Warehouse Automation solutions (often a bit of a combination between various components, including design, metadata management and ETL) when it comes to playing nice with this kind of schema definition? Less so, but I believe it still does to a certain extent. I can picture ways to upload the metadata into these tools in some cases, and it may have benefits to work around some of the functionality that is sometimes perceived as (too) rigid.

Of course, the support model decisions comes into play here.

As an organisation, do you invest in (the skills required to) take ownership of the patterns and potentially code or would you rather leave this to vendors. There are obvious pros and cons for this, and much depends on how simple you can keep things, and how satisfied you are with the available functionality.

Another aspect is the degree of lock-in you are comfortable with and the extend of (support for) interoperability between multiples products.

Perhaps there is merit in starting small using templating engines and moving to larger platforms later on, assuming you can upload the schema easily (which I think could be possible). At least, it supports a gradual build-up of know-how.

My personal view, and the focus of this weblog, is to simplify Data Warehousing to the extent that can all it really takes is some configuration of metadata – something I believe will benefit all parties involved (including software vendors).

Although it is only one of the necessary components for a full solution, I think that the schema definition for automation metadata may help defining everything that is relevant (including PII, security, data types etc. etc.), even though not all tools may support all use-cases for it yet.

For now, it’s just a great way to quickly test out new ideas and patterns!

Roelant Vos

Ravos Business Intelligence admin

You may also like...

Leave a Reply