For many businesses, business intelligence is the end goal of implementing a data stack. After all, business intelligence is the only tangible result of a data stack for most observers. Because of this, many of the upstream processes of extracting, loading, and transforming data get overlooked because “we can just handle the data in a BI tool.” This mindset not only leads to poor architectural decisions throughout the data stack, but also likely requires more effort within your BI tool than necessary. On top of this, by performing data transformation in your BI tool, you are locking yourself into that tool, because the work completed there will not be transferable if you move to another BI tool in the future. To avoid getting locked into a BI tool and to capitalize on data stack modularity, we can handle data transformation as its own layer of the stack.
What Is Data Transformation?
Data transformation is the process of preparing and modeling data so that it supports business processes. Traditional definitions limit the term to changing the format or values of data. Untitled believes that data transformation should also include modeling, documentation, and even testing of your data. In this post we will explain why data transformation is critical to the success of your modern data stack.
The Reason Why Data Transformation is Critical for BI
The beauty of the modern data stack is twofold. First, there is a lower technical barrier to entry than ever before. This allows less technical people or small, one-person shops to leverage their data just like large enterprises around the world. Second, modern data stack technologies resemble microservices in that they are focused on solving specific problems really well and are loosely coupled to the technologies around them. Using tools built this way prevents lock-in, giving the business flexibility to choose which providers to use.
Nowhere is this principle more relevant than in the data transformation layer. You’ve probably heard of dbt by now, and if you haven’t, you should definitely check it out. A tool like dbt acts as its own microservice focused on data transformation, but when building a microservice architecture out of modern data stack tools, it can be hard to know where the job of one tool ends and the next begins. When it comes to data transformation, modularizing the work that forms the foundation of all data modeling is critical.
By modularizing data transformation as a component of the modern data stack, data transformation can operate independently from other modules and is only loosely coupled to them, allowing for all tools and processes in the stack to be easily interchanged, preventing lock-in. So what does this actually look like for data transformation?
To begin building a data transformation module, it is important to first understand the data stack that the data transformation module will be integrated with. This is critical because even the ETL process can have a detrimental impact on business intelligence. This relates back to the difficulty of understanding where the job of one tool ends and another begins. Below are three reasons why data transformation should be a separate data stack module, instead of being performed within a business intelligence tool.
Changing ETL tools
As we’ve discussed, the beauty of the modern data stack is the flexibility granted by microservice-oriented solutions that can be easily interchanged without dramatically impacting a data stack. However, if systems in the modern data stack are not modularized and loosely coupled, switching systems becomes much more difficult.
For example, let’s say an organization is switching from using Stitch Data as an ETL tool to Fivetran. While much, if not all, of the same data is going to be made available by both tools, the schemas, tables, and even field names are almost guaranteed to have variation between the tools. If data transformation is completed within a business intelligence tool, this poses a serious challenge.
Not only will dashboards and reports break because the data model is deprecated and field names have changed, but it may be necessary to completely rebuild any existing dashboards and reports to match how the new ETL tool structures the data. When data transformation is built as a separate module in the data stack, switching ETL providers can be handled by updating a single configuration file, if using a tool like dbt. The business intelligence layer never needs to know that the ETL provider changed, and the change itself is handled with relative ease because of the loose coupling that a separate data transformation module provides.
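To make the idea concrete, here is a minimal sketch in Python rather than dbt itself. It assumes a hypothetical orders table whose raw column names differ between the two ETL providers. The column names, the COLUMN_MAPPINGS dictionary, the ACTIVE_PROVIDER setting, and the standardize function are all made up for illustration and do not reflect the real Stitch or Fivetran schemas.

```python
# Hypothetical raw column names per ETL provider; real schemas will differ.
COLUMN_MAPPINGS = {
    "stitch": {"id": "order_id", "amt": "amount_usd", "created": "ordered_at"},
    "fivetran": {
        "order_id": "order_id",
        "total_amount": "amount_usd",
        "created_at": "ordered_at",
    },
}

# The single setting that changes when the ETL provider is swapped.
ACTIVE_PROVIDER = "stitch"


def standardize(raw_row, provider=None):
    """Rename provider-specific columns to the standardized model's names."""
    mapping = COLUMN_MAPPINGS[provider or ACTIVE_PROVIDER]
    return {standard: raw_row[raw] for raw, standard in mapping.items()}


# The same order as each (hypothetical) provider might deliver it.
stitch_row = {"id": 1, "amt": 25.0, "created": "2023-01-05"}
fivetran_row = {"order_id": 1, "total_amount": 25.0, "created_at": "2023-01-05"}

# Both providers yield the same standardized row for the BI layer.
print(standardize(stitch_row, "stitch"))
print(standardize(fivetran_row, "fivetran"))
```

Because dashboards would only ever see order_id, amount_usd, and ordered_at, swapping providers in this sketch means changing ACTIVE_PROVIDER and nothing downstream.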
By developing a data transformation module, data transformation can be designed so that standardized data models are referenced by business intelligence tools. This is advantageous because it reduces lock-in for both ETL and business intelligence tool selection. With a standardized data transformation module, if ETL tools are swapped, the change can be easily reflected through simple configuration of the data transformation module. The same is true for business intelligence tool selection. Because data transformation has been standardized through the use of a separate module, any business intelligence tool can reference any of the data models designed within this module.
Additionally, if all data is transformed within a business intelligence tool, that tool has now added a significant amount of lock-in to your data stack. Because the business intelligence tool is being used not only for reporting but also for data transformation, moving to a different tool in the future becomes significantly more complex.
By transforming data in a separate module from your business intelligence module, all tools, processes, and users have access to the same data models. When data is transformed in a business intelligence tool, the way it is transformed is siloed to the business intelligence tool that performed the transformation. By avoiding transforming data in a business intelligence tool, you not only prevent business intelligence tool lock-in, but also enable the entire data stack and the users interacting with it to have access to standardized data models.
For the vast majority of organizations, it is unlikely that there is only one use case for the data processed in a data stack. Because of this, enabling all tools and users to have access to the same layer of standardized data transformation means that these tools and users will all be referencing the same exact data. This should provide confidence, knowing that there is a standardized layer of data being referenced by any process, and not siloed in a business intelligence tool.
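As a rough illustration of why a shared layer builds confidence, the sketch below has two consumers read from one standardized model. Everything here is hypothetical: the raw order fields, the build_orders_model function, and the two consumer functions stand in for a transformation module, a dashboard, and a finance report.

```python
from collections import defaultdict


def build_orders_model(raw_orders):
    """Transformation layer: the one place business rules are applied."""
    return [
        {
            "order_id": row["order_id"],
            "ordered_on": row["ordered_at"][:10],  # keep the date part only
            "amount_usd": float(row["amount"]),
        }
        for row in raw_orders
        if row["status"] != "cancelled"  # rule defined once, used everywhere
    ]


def dashboard_daily_revenue(orders_model):
    """Consumer 1: a dashboard showing revenue per day."""
    totals = defaultdict(float)
    for order in orders_model:
        totals[order["ordered_on"]] += order["amount_usd"]
    return dict(totals)


def finance_total_revenue(orders_model):
    """Consumer 2: a finance report showing total revenue."""
    return sum(order["amount_usd"] for order in orders_model)


# Made-up sample data for illustration.
raw_orders = [
    {"order_id": 1, "ordered_at": "2023-01-05T09:00:00", "amount": "25.0", "status": "paid"},
    {"order_id": 2, "ordered_at": "2023-01-05T14:30:00", "amount": "10.5", "status": "paid"},
    {"order_id": 3, "ordered_at": "2023-01-06T08:15:00", "amount": "4.25", "status": "paid"},
    {"order_id": 4, "ordered_at": "2023-01-06T11:00:00", "amount": "99.0", "status": "cancelled"},
]

orders = build_orders_model(raw_orders)
print(dashboard_daily_revenue(orders))
print(finance_total_revenue(orders))
```

Since the rule for excluding cancelled orders lives in the model rather than in either consumer, the dashboard and the finance report cannot drift apart on what counts as revenue.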
By keeping the data transformation module separate from business intelligence tooling, the only effort required when using a business intelligence tool is telling the BI tool how to navigate the data models developed in the data transformation module. As has been discussed, by logically separating these processes into the modules that allow for a loosely coupled, microservice architecture, technological lock-in is prevented, giving organizations massive amounts of flexibility when it comes to capitalizing on their data and modifying their data stacks.
How Can Untitled Help?
Untitled believes that the tools leveraged within the modern data stack should be agnostic to the technologies that come before or after them. By using the same principles discussed here, Untitled is able to deploy any variation of the modern data stack. This gives our clients the flexibility to easily change technologies in the future, preventing the lock-in that can hinder the growth of any organization.