On the blog, we’ve talked about the many ways in which Big Data and Business Intelligence are reshaping the business world, particularly in how they help business owners make better decisions. As industries grow more reliant on data, it pays to start asking where that data comes from and how it changes along the way.

What is Data Lineage?

Data lineage is the path data takes as it goes through transformations over time. Tracing lineage back is not always an easy task, but it gives you a clear picture of where a particular piece of data originated, how it merged with other data, and what the data looks like at the end of the chain. A useful analogy is the rings of a tree trunk, which record the tree's history layer by layer.
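
To make the idea concrete, here is a minimal sketch in Python of a value that carries its own lineage as it is transformed and merged. The names used here (Traced, from_source, apply_step, merge, and the example sources) are hypothetical and chosen only for illustration; real lineage tools capture far more detail.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Traced:
    """A value bundled with the ordered steps that produced it."""
    value: int
    lineage: Tuple[str, ...]


def from_source(value: int, source_name: str) -> Traced:
    # The first lineage entry records where the data originated.
    return Traced(value, (f"source:{source_name}",))


def apply_step(item: Traced, step_name: str, fn: Callable[[int], int]) -> Traced:
    # Each transformation appends its name to the existing lineage.
    return Traced(fn(item.value), item.lineage + (f"step:{step_name}",))


def merge(left: Traced, right: Traced, step_name: str,
          fn: Callable[[int, int], int]) -> Traced:
    # A merge keeps the lineage of both inputs, then records itself.
    combined = left.lineage + right.lineage + (f"merge:{step_name}",)
    return Traced(fn(left.value, right.value), combined)


# Hypothetical example: two order feeds combined into one total.
web = from_source(120, "web_orders")
store = from_source(80, "store_orders")
web_net = apply_step(web, "apply_discount", lambda v: v - 20)
total = merge(web_net, store, "sum_channels", lambda a, b: a + b)

for entry in total.lineage:
    print(entry)
```

Reading total.lineage from first entry to last shows the origin of each input, the transformation applied, and the point where the two feeds were merged.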



Why is data lineage important?

It’s a mistake to brush off the importance of data lineage, and here’s why.

It’s essential to good data governance

A proper data governance program needs data that is validated and whose attribute changes are transparent. Data of unknown origin, or data that has gone through obscure transformations, goes against the principles of good data governance and undermines data quality.

Without knowing the changes that the data went through, you could be making decisions based on partial or missing information. 

Meeting regulatory requirements

Organizations in certain industries are required by regulation to track and store the origin of their data. For example, BCBS 239, the Basel Committee on Banking Supervision's principles for risk data aggregation and risk reporting, expects banks to document the data they acquire as a risk management measure.

Data quality

Business intelligence and analytics tools are only as good as the quality of the data they receive. In large organizations, data often goes through multiple nodes, where it is merged, modified, and transformed.

If the origin of the data is questionable, the entire process becomes unreliable. With proper monitoring of data lineage, you can audit the data acquisition process and remove low-quality data.
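
As a rough illustration of such an audit, the sketch below separates records whose origin is known and trusted from those whose origin is missing or unrecognized. The source names, the record layout, and the audit function are assumptions made for this example, not part of any particular tool.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical list of sources the organization considers reliable.
TRUSTED_SOURCES: Set[str] = {"crm_export", "billing_system"}

Record = Dict[str, object]


def audit(records: List[Record], trusted: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Split records into (kept, rejected) based on their recorded source."""
    kept: List[Record] = []
    rejected: List[Record] = []
    for record in records:
        # A record with no "source" key yields None, which is never trusted.
        if record.get("source") in trusted:
            kept.append(record)
        else:
            rejected.append(record)
    return kept, rejected


records = [
    {"id": 1, "source": "crm_export"},
    {"id": 2},                          # origin was never recorded
    {"id": 3, "source": "scraped_list"},
]
kept, rejected = audit(records, TRUSTED_SOURCES)
print("kept:", [r["id"] for r in kept])
print("rejected:", [r["id"] for r in rejected])
```

In practice the rejected records would be reviewed rather than silently dropped, since a missing origin can also point to a gap in how lineage is captured.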

Changing business processes

As businesses change, so do their underlying processes. How efficiently those changes can be made, however, depends on how clear the data lineage is.

For example, if you’re acquiring unexpected leads from a lead acquisition page, the marketing team will need to figure out where those leads are coming from and adjust the funnel accordingly, whether the leads are good or bad. If you’ve mapped out the data lineage, such changes can be planned and executed precisely.

Best practices with data lineage

Managing and inspecting data lineage is an arduous task. That’s because you’ll be dealing with an enormous volume of structured and unstructured data from multiple sources.

For a start, you’ll want to know where the data comes from and where it goes within your organization. Asking these five questions will give you a clearer picture:

  • Who is using the data?
  • What does the data contain?
  • When is the data created?
  • When is the data being used?
  • Why is the data stored/used?
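
One way to keep the answers to these questions together is a simple record per dataset. The sketch below is only an assumption about how such a record might look; the field names and the unanswered helper are hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import List, Optional


@dataclass
class LineageRecord:
    dataset: str
    used_by: Optional[List[str]] = None      # Who is using the data?
    contents: Optional[str] = None           # What does the data contain?
    created_at: Optional[datetime] = None    # When is the data created?
    last_used_at: Optional[datetime] = None  # When is the data being used?
    purpose: Optional[str] = None            # Why is the data stored/used?


def unanswered(record: LineageRecord) -> List[str]:
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical record for a dataset of marketing leads.
leads = LineageRecord(
    dataset="leads",
    used_by=["marketing"],
    contents="contact details from the lead acquisition page",
    created_at=datetime(2021, 3, 1, 9, 0),
)
print(unanswered(leads))
```

Listing the unanswered fields gives a quick view of which questions still need an owner before the lineage for that dataset can be considered documented.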

Understanding how different data relates to different nodes also helps in charting the data lineage. A set of data captured by the sales team could pass through the finance, delivery, and customer management departments. By learning how the data is used and transformed at each node, you get the most complete overview of the data process.
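
The flow between nodes can be sketched as a small graph in which each node lists the nodes it receives data from. The department names follow the example above, but the specific connections between them and the trace_back function are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List

# Hypothetical flow: each node maps to the nodes it receives data from.
UPSTREAM: Dict[str, List[str]] = {
    "sales": [],
    "finance": ["sales"],
    "delivery": ["finance"],
    "customer_management": ["delivery", "sales"],
}


def trace_back(node: str, upstream: Dict[str, List[str]]) -> List[str]:
    """Return every node that feeds into `node`, nearest first (breadth-first)."""
    seen: List[str] = []
    queue = deque(upstream.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(upstream.get(current, []))
    return seen


print(trace_back("customer_management", UPSTREAM))
```

Walking the graph upstream from any node answers the question of where its data originated, and reversing the edges would show which nodes are affected when a source changes.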

Investing in the right tools helps you monitor data lineage and present it in easily understood visuals. This saves precious time when the data manager needs to investigate the path the data has taken.


Learn more about how Axual can help you manage data lineage.
