Big Data vs. Big Information – The Importance of Data Hygiene in the Oil and Gas Industry
Oil and gas companies have long stuck to antiquated systems. The industry needs wholesale evolution, innovation, and diversification, and that starts with a close look at companies whose workflows remain untouched by digitalization. Whether because of a workforce that is not tech-savvy, aging employees, or corporate disdain, many oil and gas companies have been slow to transform into contemporary companies. Over the past two decades, however, the industry did begin to understand that data collection was desirable. Yet the disconnect between what is desirable and what is required results in poor collection frameworks and a decided lack of data hygiene – i.e., the collective processes that keep data clean enough to be reused for many purposes. Artificial Intelligence (AI) could be the avenue by which the resources industry evolves, leveraging historical data, building new forms of value, and attracting a new generation of talent. But to make use of the promise of AI, companies must ensure that the data they capture is useful, flexible, and machine-readable.
“Dirty data,” meaning data that is poorly structured, incomplete, rife with duplicates, improperly parsed, or simply wrong, is a considerable challenge for organizations of any size. Mike Roberts, VP of AI/ML at Hypergiant, puts it this way: “In our experiences with oil and gas companies, we encounter data that is collected routinely, but rarely accessed and, as a result, may not be subject to the kind of iterative feedback that would ensure its quality. We encounter data collected via traditional means, like paper forms, and stored in a semi-digital state, like scanned images. And we encounter data fragmented across a variety of data storage systems, often lacking keys to tie the records from those various stores together. Digital transformation is more than simply digitizing analog information; it’s creating a complete and coherent data ecosystem that data science can plug into.”
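To make the idea of dirty data concrete, here is a minimal sketch of a hygiene pass in Python. The records, field names (`well_id`, `date`, `pressure_psi`), the expected date format, and the `clean_readings` function are all hypothetical, invented for illustration; they are not taken from Hypergiant or from any real system. The sketch only shows the kinds of checks involved: normalizing a key, parsing values, setting aside incomplete or unparsable rows, and dropping duplicates.

```python
from datetime import datetime

# Hypothetical raw readings, as they might look after being transcribed
# from paper forms. Every value starts out as an unvalidated string.
RAW_READINGS = [
    {"well_id": "W-101 ", "date": "2020-03-01", "pressure_psi": "1,250"},
    {"well_id": "w-101", "date": "2020-03-01", "pressure_psi": "1250"},   # duplicate of the first row
    {"well_id": "W-102", "date": "2020-03-01", "pressure_psi": ""},       # incomplete: no pressure
    {"well_id": "W-103", "date": "03/02/2020", "pressure_psi": "980.5"},  # date not in the assumed format
    {"well_id": "W-104", "date": "2020-03-02", "pressure_psi": "1100"},
]


def clean_readings(raw):
    """Return (cleaned, rejected) lists built from raw string records.

    Assumes dates should be ISO formatted (YYYY-MM-DD) and that a well
    reports at most one reading per day; both are illustrative choices.
    """
    cleaned, rejected, seen = [], [], set()
    for row in raw:
        # Normalize the key so "w-101" and "W-101 " refer to the same well.
        well_id = row.get("well_id", "").strip().upper()
        try:
            day = datetime.strptime(row.get("date", "").strip(), "%Y-%m-%d").date()
            pressure = float(row.get("pressure_psi", "").replace(",", ""))
        except ValueError:
            # Missing or unparsable values: keep the row for human review.
            rejected.append(row)
            continue
        if not well_id:
            rejected.append(row)
            continue
        key = (well_id, day)
        if key in seen:
            continue  # drop duplicate readings for the same well and day
        seen.add(key)
        cleaned.append({"well_id": well_id, "date": day, "pressure_psi": pressure})
    return cleaned, rejected


cleaned, rejected = clean_readings(RAW_READINGS)
print(f"{len(cleaned)} clean rows, {len(rejected)} rows set aside for review")
```

Rejected rows are kept rather than discarded, so that someone can repair them at the source. That is one way to provide the kind of iterative feedback the quote says is often missing.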
One critical understanding that must become more pervasive in the industry is the difference between data and information. Data is raw and unprocessed, and can appear random, truncated, and lacking in organizing principles. Information, by contrast, is data that has been processed, structured, or presented in a way that incorporates context, thereby making it useful.
Upstream oil and gas companies have been contending with immense data sets and colossal files for years. But one of the unique shortfalls of the industry is the ephemeral nature of collected data – too often it is discarded, ignored, or analyzed only in a cursory fashion. Oil and gas companies do not always view data as a resource with value and can instead be dismissive of its worth; data is often viewed as a physical asset that is largely descriptive. That is one of the dirty secrets of the industry: most data goes unused despite prevailing analyst sentiment that “data is the new oil.” Unknown and unused data – dark data – comprises more than half of all data collected. According to Lucidworks, 7.5 septillion gigabytes of data are produced every day, and only 10 percent of it is utilized significantly.