Introduction
Data Quality Management (DQM) was initially developed by Inner Logix in 2003 and is an adaptation of the Six Sigma methodology to the management of data quality. The Six Sigma methodology, first implemented for manufacturing by Motorola in the 1980s, is widely recognized as a leading methodology for improving customer satisfaction and business processes. This document outlines how industry members have implemented and executed Data Quality Management within the upstream oil and gas industry, with a key focus on well-related data such as wells, wellbores, deviation surveys, log data, marker picks, perforations, completions, production/injection data, core data, and more.
Faster Data Verification Time
Data users know that it is their responsibility to verify that they are working with the correct data. The time a data user spends verifying and correcting the data is referred to as Data Verification Time (DVT). Industry has dramatically reduced Data Verification Time by implementing Data Quality Management. There is a strong relationship between Data Verification Time and the quality of the data; this relationship is described in the sections below.
Root Causes of Increased Data Verification
Industry members determined the key data defects that had a significant impact on Data Verification Time.
These data defects included:
Inconsistent Data: When data users identified multiple versions of the truth, they had to determine which “value” was the correct one.
Missing Data: There are two distinct types of this problem:
Missing wells: If the user determined that a well was missing, then the complete well data had to be located and loaded into the project.
Missing data: If the user determined that key data, required for the workflow, was missing, then the missing data needed to be located and updated in the project.
Invalid Data: When data errors were found, the data user would have to determine the correct values and make the corrections.
Out-of-date Data: Some data would change over time, such as production data, perforations, completions, ownership, and status. Some of these defects were easy to spot (e.g. missing production data for the last four months); others were harder, such as when a wellbore had been recompleted but the new completion data was not present.
A key driver of implementing Data Quality Management (DQM) was to reduce the number of data errors, i.e. data defects, that were being discovered during the data verification processes, thereby reducing the Data Verification Time.
Implementing DQM
DQM follows the DMAIC process of Six Sigma, which consists of the following five steps:
Define the problem.
Measure the size of the problem.
Analyze the root cause of the problem.
Implement improvements.
Control the process to ensure that the improvements continue.
The Define Step
This step identified the key data defects that were causing increased Data Verification Time. These problems were identified by working with data users, thereby creating a list of friction points. For each friction point, an estimate was obtained of its impact on Data Verification Time, i.e. the man-hours per occurrence of a data defect.
The Measure Step
For each friction point, a set of measuring rules was identified and executed against the data to detect and count data defects, thereby enabling the measurement of data quality.
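As an illustration only, the following is a minimal sketch of what such measuring rules could look like, assuming the wells are available as simple records; the field names, rules, and quality score are hypothetical and not taken from any specific implementation.

```python
# Minimal sketch of measuring rules run against well records held as plain
# dictionaries. The field names and rules below are illustrative assumptions.
wells = [
    {"uwi": "100010203040W500", "surface_lat": 56.1, "surface_lon": -117.3, "has_deviation_survey": True},
    {"uwi": "100010203050W500", "surface_lat": None, "surface_lon": -117.3, "has_deviation_survey": False},
]

def rule_missing_coordinates(well):
    # Defect when either surface coordinate is absent.
    return well["surface_lat"] is None or well["surface_lon"] is None

def rule_missing_deviation_survey(well):
    # Defect when no deviation survey is loaded for the well.
    return not well["has_deviation_survey"]

measuring_rules = {
    "Missing surface coordinates": rule_missing_coordinates,
    "Missing deviation survey": rule_missing_deviation_survey,
}

# Count defects per friction point and express data quality as the share of
# rule checks that passed.
defect_counts = {name: sum(rule(w) for w in wells) for name, rule in measuring_rules.items()}
total_checks = len(wells) * len(measuring_rules)
data_quality = 1 - sum(defect_counts.values()) / total_checks
print(defect_counts)
print(f"data quality = {data_quality:.0%}")
```

Because each rule is tied to one friction point, the same defect counts can feed directly into the cost analysis described in the next step.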
The Analysis Step
Defect analysis was performed to identify:
Cost of the problem: This was obtained by multiplying the number of defects for each friction point by the estimated impact in man-hours (a worked sketch follows this list).
Root causes: A root cause analysis was performed to better understand underlying factors involved in producing the data defects.
Examples of root causes found were:
Data was not moved between data sources in a timely manner.
Data was loaded incorrectly into data sources.
Data corrections were not propagated across all data sources.
Incorrect cartographic reference systems.
Incorrect unit systems.
Incorrect data identifiers.
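A minimal worked sketch of the cost calculation referenced above is shown here; the friction point names, defect counts, and man-hour estimates are hypothetical.

```python
# Hypothetical inputs: defect counts come from the measure step, man-hour
# impacts per defect come from the define step estimates.
defects_per_friction_point = {"Inconsistent well headers": 420, "Missing deviation surveys": 130}
man_hours_per_defect = {"Inconsistent well headers": 0.5, "Missing deviation surveys": 2.0}

# Cost of the problem = number of defects x estimated impact, per friction point.
cost_per_friction_point = {
    name: count * man_hours_per_defect[name]
    for name, count in defects_per_friction_point.items()
}
total_cost = sum(cost_per_friction_point.values())
print(cost_per_friction_point)                 # {'Inconsistent well headers': 210.0, 'Missing deviation surveys': 260.0}
print(f"total cost = {total_cost} man-hours")  # total cost = 470.0 man-hours
```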
The Improvement Step
The improvement step was where action was taken to mitigate the data defects. This consisted of three key parts:
Automatic data corrections using a set of data correction rules (a sketch follows this list).
Modifications to existing processes to prevent data defects from occurring.
Manually correcting data defects was implemented as a final solution for rare cases in which other solutions were not possible.
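As an illustration of the first of these parts, below is a minimal sketch of automatic data correction rules, assuming well records with a free-form identifier and a depth unit flag; the field names, normalization convention, and unit conversion are hypothetical.

```python
# Minimal sketch of automatic correction rules applied to well records held as
# dictionaries. Field names and conventions are illustrative assumptions.
FEET_TO_METRES = 0.3048

def correct_identifier(well):
    # Normalize the well identifier: strip embedded spaces and use upper case.
    well["uwi"] = well["uwi"].replace(" ", "").upper()

def correct_depth_units(well):
    # Convert total depth to metres when it was loaded in feet.
    if well.get("td_unit") == "ft":
        well["td"] = round(well["td"] * FEET_TO_METRES, 1)
        well["td_unit"] = "m"

correction_rules = [correct_identifier, correct_depth_units]

def apply_corrections(wells):
    for well in wells:
        for rule in correction_rules:
            rule(well)
    return wells

wells = [{"uwi": "100 0102 0304 0w5 00", "td": 8200.0, "td_unit": "ft"}]
print(apply_corrections(wells))
# [{'uwi': '100010203040W500', 'td': 2499.4, 'td_unit': 'm'}]
```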
The Control Step
The key objective of this step was to ensure that the improvements to the processes continued to be performed. The measure step provided a count of data defects, which made it possible to understand how the actions taken in the improvement step affected data quality over time. It was expected that:
The observed data quality should increase over time.
The data users would experience a decrease in the Data Verification Time.
If either of these two expectations was not met, then each step of the DMAIC process was re-evaluated.
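As an illustration, the sketch below shows one way the control step could track data quality over time and flag when the expected improvement is not materializing; the scores and the trend check are hypothetical.

```python
# Hypothetical weekly data quality scores (share of rule checks passed),
# produced by re-running the measuring rules from the measure step.
weekly_quality = [0.62, 0.66, 0.70, 0.74, 0.77, 0.80]

def quality_is_improving(scores, window=3):
    # Compare the average of the most recent scores with the preceding window.
    if len(scores) < 2 * window:
        return True  # not enough history yet to judge the trend
    recent = sum(scores[-window:]) / window
    earlier = sum(scores[-2 * window:-window]) / window
    return recent > earlier

if not quality_is_improving(weekly_quality):
    print("Data quality is not improving; re-evaluate each step of the DMAIC process.")
```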
The Initial Implementation
When companies started to implement DQM, it initially appeared overwhelming. They asked themselves, “Where do we start?” and “What do we measure?”. After talking with data users, they had a list of problems a mile long and wondered, “Which friction points do we prioritize?”
To implement DQM and overcome this initial sense of being overwhelmed, the companies started by obtaining senior management sponsorship and by following a structured approach with stakeholder engagement.
Obtaining a Senior Management Sponsor
Implementing Data Quality Management (DQM) typically involved a coordinated effort between multiple interdependent departments. Because of these cross-department dependencies, it was important to have a senior manager who sponsored the DQM initiative. Senior management sponsors were able to align and coordinate the departments towards the same goal, ensuring better cooperation and a better outcome.
Stakeholder Engagement
The stakeholders were identified as three groups: the data users, the data managers, and senior management. After identifying the stakeholders, they set out to ensure that all stakeholders understood the Six Sigma principles and the DMAIC process. If the stakeholders were unfamiliar with Six Sigma or the DMAIC process, a training session was held to ensure all stakeholders were educated on the methodologies.
They split the DMAIC process into parts, referring to each part as an Epic. The execution of each Epic was designed to have a manageable duration of 6-10 weeks and consisted of a series of key meetings held with the stakeholders:
Define Meeting: During this meeting the stakeholders identified the area of interest, data sources, and data types that were to be included in the Epic. Data types were selected in order of hierarchy, starting with root data types such as wells, wellbores, and deviation surveys; future Epics would include lower-level data types such as log curves, marker picks, and completions. For each data type, they identified a list of friction points. For each friction point, they estimated its “cost”, i.e. the number of man-hours spent addressing each data defect. The meeting concluded with the selection of the top 10 friction points to be addressed in the Epic. This created a scope for the measure step, restricting the data defects measured for calculating data quality to the friction points included in the Epic.
Measure and Improvement Meeting: Stakeholders were presented with the initial measurement results of the data quality within the scope of the Epic, after which discussions were held to identify improvement strategies.
Epic Summary Meeting: After the improvement strategy had been implemented, the stakeholders were presented with the data quality measurements over time. If the data quality had improved, then the Epic was considered successful, and they moved on to the next Epic.
Control Meeting: Stakeholders continued to meet to review and ensure that the data quality improvements were maintained for each Epic.
Conclusion
The Data Quality Management (DQM) framework presented in this paper is a systematic and iterative approach to identifying, prioritizing, measuring, and reducing data defects. By applying this framework, companies in the upstream oil and gas industry made significant improvements to their data quality, reducing Data Verification Time by up to 80%.