Few topics create as much confusion as the Data Life Cycle, thanks to a flood of contradictory content online. The goal of this post is to help you learn how data actually matures through its existence, and how you can turn this knowledge into processes that enable your data to grow from an infant obligation into a mature asset.
Let’s begin by sharing the most comprehensive visualization of the Information Life Cycle that you can find online:
Next, I will explain the 7 phases in the above picture that your data can evolve through. Let me warn you: without a management plan and supportive actions, data does not progress automatically through the phases, and in many cases it never reaches its full potential. Rather, it limps through the cycle. As a data steward, it is your privilege and responsibility to see your data flourish. Here’s how:
Create / Collect
Creation or Collection of data results in the first manifestation of your most precious asset: Raw Data! However, just like every ore that might contain some gold, it takes a lot of refinement before the true value of your data unfolds.
It is important to realize that Data Creation or Collection needs to be a three-stage process. First comes a PLAN that answers the big W’s (What, Why, Where, When) and outlines the important HOW. Second comes the actual assembly of data, either produced from observations through instruments or collected from an existing source. Third, the data needs to be checked in a stage frequently referred to as quality assurance.
Let me be clear.
Only data that is the result of an executed plan, that was properly collected or created, and that passed a previously defined quality assurance check can be promoted to the next lifecycle phase. Much of the common data mess found in research labs and enterprises is the result of haphazard, unchecked bit-pollution. If you want to be FAIR to your data (more on this later), you will not allow it to be born into a torn patchwork family, but rather into a well-established household with plans, rules and order. Just as Tolstoy said: “Happy families are all alike; every unhappy family is unhappy in its own way”; it’s the same with your data.
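To make this gate concrete, here is a minimal Python sketch of the plan, collect, check sequence. The `CollectionPlan` class, its field names, the example temperature range, and the `passes_quality_assurance` function are all illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CollectionPlan:
    """Hypothetical plan answering the big W's and the HOW."""
    what: str
    why: str
    where: str
    when: str
    how: str
    # Quality checks are defined BEFORE any data is collected.
    checks: list[Callable[[list[float]], bool]] = field(default_factory=list)


def passes_quality_assurance(plan: CollectionPlan, raw_data: list[float]) -> bool:
    """Return True only if every predefined check accepts the data."""
    return all(check(raw_data) for check in plan.checks)


plan = CollectionPlan(
    what="sample temperature",
    why="stability study",
    where="lab 2",
    when="daily",
    how="calibrated thermometer",
    checks=[
        lambda values: len(values) > 0,                            # not empty
        lambda values: all(-80.0 <= v <= 150.0 for v in values),  # assumed plausible range
    ],
)

good = [21.5, 22.0, 21.8]
bad = [21.5, 999.0]

print(passes_quality_assurance(plan, good))  # True
print(passes_quality_assurance(plan, bad))   # False
```

Note that a plan without any checks would accept everything in this sketch, which is exactly the haphazard situation described above.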
Description of data is, without a doubt, the milestone in a data set’s life that has the greatest impact on its future career. A mishap here will turn a promising future Nobel Laureate into a misguided social annoyance. Description is the process of creating INFORMATION: the combination of the RAW DATA that graduated from phase one of the life cycle and a set of METADATA that explains what that data actually is.
Metadata, or data about your data, preserves for future users all the details that went into the plan, creation, and collection process. It provides a framework within which a user can position your data and derive value from it.
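The idea that information is raw data plus metadata can be sketched in a few lines of Python. The `Information` class, the `describe` function, and the list of required metadata keys are hypothetical names chosen for illustration; a real metadata schema would be far richer.

```python
from dataclasses import dataclass

# Assumed minimum: the answers from the collection plan must travel with the data.
REQUIRED_KEYS = ("what", "why", "where", "when", "how")


@dataclass(frozen=True)
class Information:
    """Raw data bundled with the metadata that explains it."""
    raw_data: tuple[float, ...]
    metadata: dict[str, str]


def describe(raw_data: list[float], metadata: dict[str, str]) -> Information:
    """Promote raw data to information, refusing incomplete descriptions."""
    missing = [key for key in REQUIRED_KEYS if not metadata.get(key)]
    if missing:
        raise ValueError(f"metadata incomplete, missing: {missing}")
    return Information(tuple(raw_data), dict(metadata))


info = describe(
    [21.5, 22.0, 21.8],
    {
        "what": "sample temperature in degrees Celsius",
        "why": "stability study",
        "where": "lab 2",
        "when": "daily",
        "how": "calibrated thermometer",
    },
)
print(info.metadata["what"])
```

In this sketch, data that arrives without a complete description raises an error instead of quietly moving on to the next phase.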
This is where we find one major problem with most alternative representations of the “Data Lifecycle”. Once phase 1 and phase 2 are completed successfully, your DATA has undergone a metamorphosis and emerged as a beautiful piece of INFORMATION that can have a butterfly effect on your business. At this stage it no longer makes sense to speak about DATA. This is where the INFORMATION LIFECYCLE begins. Information is what needs to be passed on to the next stage, because information brings power and value.
Ironically, we have all used DATA at some point in our careers. Those were the frustrating moments when someone handed over a file with numbers and statistics that simply made no sense when we looked at them. You might recall desperate attempts to open related reports or to talk to colleagues and peers to give some meaning to the symbols in front of you. What you were dealing with was the most frequently encountered type of bit-pollution found in databases and data warehouses today: unevolved raw data that can only be understood by its creator.
Most of our data science problems originate in one simple fact: people attempt to do science with data. It’s a problem that could be remedied if they used information instead. A conscientious data steward will make sure that information (data + metadata) is the only object used for any productive work.
Let’s assume that you are one of those good stewards and your information is ready for use in phase 3. What exactly happens now? Well, this is the great thing about information: anything goes at this stage! Your innovation and abilities are the only boundaries to the value that you generate from your information. Whatever your ingenuity decides to do at this stage, the result will always be some different kind of information: information about customers, information about products, information about nature, maybe even information about your previous information!
Beware: do not use information at this stage to produce data once more. That would be the equivalent of humans giving birth to in-vitro implanted monkeys, so basically an evolutionary step down.
In the previous phase your information has begotten more information! True business value and expansion of our current knowledge frontier can only happen if your insights are shared. There are many distribution channels used to communicate our results, from PowerPoint to peer-reviewed publications. Sharing is the graduation party of our information, the moment where it can shine and glory in its own importance. For many myopic-minded members of mankind, this is the end of the lifecycle.
It’s unfortunate when a young and promising football player is injured in his first NFL game, before he has ever experienced the full benefits of a professional football career. Similarly, a piece of information that shone only once and was then discarded in the trash can of history has been shortchanged of its potential.
In order to let your information grow to its full capacity, it has to be delivered to an appropriate first employer, full of great examples, role models, processes, and opportunities to grow and improve. The necessary next phase for information requires the data steward to move it into a long-term domicile.
Archive: a word with terrible connotations. A dusty data grave that will afflict you with deadly curses if you unravel its mummified contents. You might fear that sending information into an archive is damning it to eternal hellfire; however, the contrary is true. A living archive, as required by the information lifecycle, is true exaltation for information. Until now, the narrow scope of its original use kept the information separated from the wide universe of its siblings and cousins. In the archive it is united with all the information your enterprise or lab has collected in the past. Completely new analytics and integration paths become available when a sufficiently large basis of well-harmonized information can be queried.
Sometimes the term data preservation is employed to give this phase a more positive description, but while it is a less dusty term, it captures only part of the archive’s responsibilities. Preservation, or keeping the information alive, is necessary but not sufficient to reach the most desired last phase of the lifecycle, the REUSE phase, after which reincarnation/iteration begins.
Reuse is the holy grail of data stewardship. If all your information produces value repeatedly, you will experience exponential growth in all your endeavors. Nevertheless, there is no free lunch. In order for you to reuse your data, the archive has to perform a few more basic but absolutely mandatory tasks. It has to make your information FINDABLE and ACCESSIBLE. And it has to guarantee INTEROPERABILITY. This means the archive keeps the information at all times in a state where search operations can quickly identify it and a user is able to retrieve it. Through well-managed information policies, such as strict adherence to data standards (like Allotrope) and preservation best practices (like OAIS), the archive maintains all information in a state where it can be compared and integrated with all other information.
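As a rough illustration of what FINDABLE and ACCESSIBLE mean in practice, the following Python sketch keeps a keyword index over the metadata of everything deposited. The `Archive` class and its `deposit`, `find`, and `retrieve` methods are invented for this example; they do not represent Allotrope, OAIS, or any specific archiving product.

```python
class Archive:
    """Toy in-memory archive that indexes metadata at deposit time."""

    def __init__(self):
        self._records = {}  # identifier -> (data, metadata)
        self._index = {}    # lowercase keyword -> set of identifiers

    def deposit(self, identifier, data, metadata):
        """Store information and index every word of its metadata values."""
        self._records[identifier] = (data, metadata)
        for value in metadata.values():
            for word in str(value).lower().split():
                self._index.setdefault(word, set()).add(identifier)

    def find(self, keyword):
        """FINDABLE: return the identifiers whose metadata mention the keyword."""
        return sorted(self._index.get(keyword.lower(), set()))

    def retrieve(self, identifier):
        """ACCESSIBLE: return the stored (data, metadata) pair."""
        return self._records[identifier]


archive = Archive()
archive.deposit("run-001", [21.5, 22.0], {"what": "sample temperature", "where": "lab 2"})
archive.deposit("run-002", [7.1, 7.0], {"what": "sample pH", "where": "lab 2"})

print(archive.find("temperature"))  # ['run-001']
print(archive.find("lab"))          # ['run-001', 'run-002']
```

Interoperability is deliberately left out of this sketch: it depends on shared standards for how data and metadata are written, which a toy index cannot show.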
This illustrates how important an archive is in achieving the goal of truly FAIR data, which, as you now realize, should actually be termed Findable, Accessible, Interoperable and Reusable information.
Not to repeat myself too much, but just to make sure this message hits home: if you consistently cannot reuse your data without enormous effort, you are not developing your most important company assets the way they deserve!
If you manage to treat data kindly, let it grow into information, and be consistently FAIR to it through Information Lifecycle Management compliant software (like the ZONTAL Space platform), you will reap infinite reuse and competitive advantage from your core assets.
Even the best things have to come to an end, and sometimes you might decide it is better for certain information to be made unavailable. To do this, an exit from the ever-iterating lifecycle loop needs to be triggered from the archiving system. A delete can come in different forms and shapes. It could be the deletion of either data or metadata, or even both (information). Depending on your needs (for example, regulatory compliance), an archiving system must provide this flexibility.
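The three forms of deletion mentioned above can be sketched as follows. The `DeleteScope` enum, the `delete` function, and the dictionary layout of a record are assumptions made for this example, not the behavior of a specific archiving system.

```python
from enum import Enum


class DeleteScope(Enum):
    DATA = "data"                # remove the raw data, keep its description
    METADATA = "metadata"        # remove the description, keep the raw data
    INFORMATION = "information"  # remove both


def delete(record: dict, scope: DeleteScope) -> dict:
    """Return a copy of the record with the requested parts removed."""
    result = dict(record)
    if scope in (DeleteScope.DATA, DeleteScope.INFORMATION):
        result["data"] = None
    if scope in (DeleteScope.METADATA, DeleteScope.INFORMATION):
        result["metadata"] = None
    return result


record = {"data": [21.5, 22.0], "metadata": {"what": "sample temperature"}}

print(delete(record, DeleteScope.DATA))
print(delete(record, DeleteScope.INFORMATION))
```

Keeping the metadata after the data is gone can be useful when you still need a trace that the information once existed. Whether that is allowed depends on the regulation that triggered the delete.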