Reducing Risk in the Petroleum Industry: Machine Data and Human Intelligence
Naveen Viswanath
Reducing Risk in the Petroleum Industry
by Naveen Viswanath
Copyright © 2016 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editor: Tim McGovern
Production Editor: Shiny Kalapurakkel
Copyeditor: Gillian McGarvey
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Panzer
August 2016: First Edition
Revision History for the First Edition
2016-08-11: First Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Reducing Risk in the Petroleum Industry, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
Table of Contents
Reducing Risk in the Petroleum Industry: Machine Data and Human Intelligence
Introduction
Operational Risk
Long-Term Risk
Conclusion
Bibliography
Reducing Risk in the Petroleum Industry: Machine Data and Human Intelligence
Introduction
To the buzzword-weary, Big Data has become the latest in the infinite series of technologies that “change the world as we know it.” But amidst the hype, there is an epochal shift: the current exponential growth in data is unprecedented and is not showing any signs of slowing down.
Compared to the short timelines of technology startups, the long history of the petroleum industry provides stark examples to illustrate this change. Seismic research happens early in the exploration and extraction stages. In 1990, one square kilometer yielded 300 megabytes of seismic data. In 2015, it yielded 10 petabytes, roughly 33 million times more, according to Satyam Priyadarshy, chief data scientist at Halliburton. First principles, intuition, and manual arts are overwhelmed by this volume and variety of data. Data-driven models, however, can derive immense value from this data flood. This report gathers highlights from Strata+Hadoop World conferences that showcase the use of data science to minimize risk in the petroleum industry.
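The growth figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, which the source does not specify, and the annualized figure further assumes steady growth over the 25 years, which is a simplification made only for illustration.

```python
# Assumption: decimal (SI) units; the report does not state which convention applies.
MEGABYTE = 10**6
PETABYTE = 10**15

bytes_per_km2_1990 = 300 * MEGABYTE
bytes_per_km2_2015 = 10 * PETABYTE

# Ratio between the two figures quoted in the text (about 33 million).
growth_factor = bytes_per_km2_2015 / bytes_per_km2_1990

# Assumption: constant year-over-year growth between 1990 and 2015.
years = 2015 - 1990
annual_factor = growth_factor ** (1 / years)

print(f"Growth factor: {growth_factor:,.0f}x")
print(f"Implied annual growth factor: {annual_factor:.2f}x")
```

Under the steady-growth assumption, the implied annual factor comes out close to 2, meaning the seismic data per square kilometer would have roughly doubled every year.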
In the short term, data can be used to mitigate operational risk. Given good data, machine learning can be used to optimize well completion parameters such as the amount and type of proppant used. Ben Hamner, chief technology officer at the data science startup Kaggle, says these are the biggest drivers of well cost and the biggest expense when drilling the well. They also have a proportionate impact on how much a well can produce. Using completion parameters from machine learning on one well, the gain after costs was $700,000.
Priyadarshy shared how pipelining seismic, drilling, and production data can be used for long-term reservoir management. Since it can be expensive to move data from offshore or remote operations, models use the data on site, and the results are aggregated with previously collected data and models.
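One way to picture processing data on site and aggregating only the results is to ship compact summaries instead of raw measurements. The sketch below is an illustration, not the method described by Priyadarshy: the `Summary` type, the `summarize` and `merge` helpers, and the sample pressure values are all hypothetical. It uses the standard pairwise formula for combining counts, means, and sums of squared deviations.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean

    @property
    def variance(self):
        # Sample variance; needs at least two observations.
        return self.m2 / (self.count - 1)


def summarize(values):
    """Compute a compact summary on site, where the raw data lives."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return Summary(n, mean, m2)


def merge(a, b):
    """Combine two summaries without access to the raw values."""
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.count * b.count / n
    return Summary(n, mean, m2)


# Hypothetical pressure readings: one batch stays offshore, one is already archived.
offshore = summarize([201.0, 203.5, 199.8, 202.2])
archive = summarize([198.4, 200.1, 204.0])
combined = merge(offshore, archive)
print(combined.count, round(combined.mean, 2), round(combined.variance, 2))
```

Only three numbers per batch cross the network, and the merged summary matches what would be computed from the pooled raw data.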
Oliver Mainka (vice president of product management at SAP), Hamner, and Priyadarshy all agree that the quality of data determines the value that can be derived from it. Machines are very good at spotting new patterns in oceans of data. The iterative use of human intelligence to clean the input data and validate results based on experience makes machine data-crunching an effective generator of value. Big or small, using all the available data is justified if it generates value.
Operational Risk
The spectrum of available data can be used to answer a variety of questions. High-quality input data is required for most analyses, and the output data can address different realms, like current operational risk and longer-term organizational challenges.
Here are some examples of addressing operational risk during different stages of the upstream process.
Exploration
Exploration is an exciting time during which there can be immense payback for making the correct choices. The right data and the information that results from processing it can be valuable tools in the upstream arsenal.
Domain expertise on data sources
The oil and gas industry has been a prolific user of data for a long time, as Chevron’s Martin Waterhouse points out. Just as keeping oil flowing is a complex operation running across continents, keeping information flowing can be just as much of a challenge. Big oil companies are large, but they are not monoliths. They are conglomerations of organizations, each of which could be considered a large company on its own. The culture of the people, the role data plays, and the length of time that data is retained can be very different in each organization. It can take years to figure out whom to ask questions, where things are done, and how the company functions. Connecting domain expertise with the latest in modeling and predictive analytics is as important as implementing those models, and the payoff is worth the effort.
In unconventional production (shale), well production is highly correlated with location. Machine learning can help determine where to acquire acreage. The input data can come from:
Geology
Core samples are rich and accurate, but also rare and very expensive
Drilling and completion
Amount of proppant and fluid, number of stages, and injection rate
Production
Publicly available in the US; varies by state
Garbage in, garbage out applies here just as much as anywhere else. Human intelligence is critical for quality control of data. Domain experts can tell the difference between a bad sensor measurement and slowed production because of transport issues. For good performance, a combination of manual and automated approaches is used to correct data when possible and reject it otherwise. Hamner estimates that 95% of the effort in tackling predictive problems in the industry lies in deeply understanding data sources and how they fit into the business use case. A related challenge is how to expose results to key decision-makers.
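The "correct when possible, reject otherwise" approach can be sketched as a small automated pass that leaves an audit trail for human review. Everything here is an assumption made for illustration: the `clean_series` helper, the physical range limits, the `-999.0` placeholder value, and the rule that an isolated bad reading is replaced by the average of its two valid neighbors. An automated rule like this cannot tell a faulty sensor from a genuine production change, which is where the domain expert comes in.

```python
def clean_series(readings, low, high):
    """Correct isolated bad readings and reject the rest.

    Returns (cleaned, flags); flags holds "ok", "corrected", or "rejected"
    for each position so a domain expert can review what was changed.
    """
    def valid(x):
        return x is not None and low <= x <= high

    cleaned, flags = [], []
    for i, value in enumerate(readings):
        if valid(value):
            cleaned.append(value)
            flags.append("ok")
            continue
        prev_ok = i > 0 and valid(readings[i - 1])
        next_ok = i + 1 < len(readings) and valid(readings[i + 1])
        if prev_ok and next_ok:
            # Assumed correction rule: average the two valid neighbors.
            cleaned.append((readings[i - 1] + readings[i + 1]) / 2)
            flags.append("corrected")
        else:
            # Not enough context to repair the value, so drop it.
            cleaned.append(None)
            flags.append("rejected")
    return cleaned, flags


# Hypothetical daily production readings; -999.0 stands in for a sensor error code.
daily_rate = [101.0, 99.5, -999.0, 100.5, None, None, 98.0]
cleaned, flags = clean_series(daily_rate, low=0.0, high=500.0)
for value, flag in zip(cleaned, flags):
    print(flag, value)
```

The flags are kept alongside the cleaned values so that corrected and rejected points can be escalated to someone who knows the asset.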
Integrating disparate data sources
A variety of sources can contribute to the data repository. These can range from automated high-sample-rate sensors to a human dropping a rope in a tank every six months. They can include audio, video, handwritten notes, and text reports. The challenge is to convert data with these different sample rates, accuracies, accessibilities, costs, and difficulties into a validated, usable form. In a case that (like many others) cuts across both data varieties and domains, André Karpištšenko and his team at Marinexplore Inc. (now PlanetOS) have been working to ease the flow and increase the utility of ocean-related data.
In many parts of the world, risk is synonymous with weather. The advent of inexpensive, robust drones powered by wave and solar energy has made available data that was once impossible to gather (in the eye of a storm) or too expensive (across the Pacific), which can keep us better informed of upcoming weather. This can directly impact planning locations for offshore drilling platforms and shipping routes for oil tankers.
Risk is also equated to uncertainty. In the ocean, no two days are the same, and attributes like wind, waves, ocean currents, temperature, and pressure vary depending on location and time. A prompt, easily accessible system is more valuable than one with long data collection and processing times, when delays can render information useless.
When data is democratized, the experts are not isolated anymore. There are no long timelines to process and visualize data. Data streams from sensors, models, and simulations are available to everyone. This can even involve sharing, that often maligned word. Since many data sources (satellites, models, gliders, buoys) are capital-intensive, Marinexplore started sharing public data as a demonstration of using existing resources well. Now, leading companies are thinking about how to better exchange data. Karpištšenko’s aim is a borderless ocean-data analysis world.
Drilling and Production
Over the life of a well, the risk-return equation can be optimized with predictive maintenance. Predictive maintenance, as understood by data folk, uses predictive analytics to understand causation and correlation with millions or even billions of records as a matter of course, and formulates predictions about machine failure in order to proactively service devices instead of relying on isolated inspections.
In a compressor, monitoring oil temperatures and vibrations in real time offers direct cost advantages: operating until the desired point on the PF curve (potential failure, functional failure) maximizes utility (by not servicing too soon) and minimizes downtime (by not servicing too late). This, says Mainka, can result in big numbers. Even a 0.1% reduction in maintenance costs can translate into millions of dollars saved. For example, in Europe, maintenance cost is estimated to be 450 billion euros. Of this, 300 billion could be addressed by maintenance improvements and 70 billion is lost due to ineffective maintenance.
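To get a feel for the scale of these numbers, the figures quoted above can be combined in a short calculation. Applying the 0.1% reduction to the European total is an assumption made here for illustration; the source does not say which cost base that percentage refers to.

```python
# Figures quoted in the text (euros, Europe).
total_maintenance_cost = 450_000_000_000
addressable_by_improvements = 300_000_000_000
lost_to_ineffective_maintenance = 70_000_000_000

# Assumption for illustration: a 0.1% reduction applied to the European total.
saving_from_tenth_percent = total_maintenance_cost // 1000

# Shares of the total represented by the other two figures.
addressable_share = addressable_by_improvements / total_maintenance_cost
lost_share = lost_to_ineffective_maintenance / total_maintenance_cost

print(f"0.1% saving: {saving_from_tenth_percent:,} euros")
print(f"Addressable by improvements: {addressable_share:.1%}")
print(f"Lost to ineffective maintenance: {lost_share:.1%}")
```

On that assumption, a 0.1% reduction corresponds to 450 million euros, and about two thirds of the total is open to improvement.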
The methods chosen for data processing should be able to handle the characteristics of the incoming data. Priyadarshy highlights the characteristics of different types of upstream data. During seismic studies, the volume of data is very large, but the velocity is slow and the data does not have to be analyzed in real time. The value is significant because if you wrongly choose the drilling location of a well, it could cost you a few hundred million dollars. The complementary example is during drilling. The volume of data is much smaller compared to seismic studies, but the velocity is faster, and sometimes you have to analyze the data in real time. If predictive models fail, it can be expensive (when a drill bit gets stuck, for example). The value of real-time data in any particular case is significant but not as high as that of well location.
Sensors in real time
Sensors are becoming more pervasive, but what companies do with them still varies significantly. Mainka offers an example: consider six data sources producing trillions of records. Processing all of them as a matter of course, in real time, is new for 98% of companies, even though these are sophisticated companies (Fortune 100, Fortune 500).
Sensor maturity translates to lower cost and improved robustness. Petabytes of data are now collected by millions of sensors. The challenge is how to use this data fast enough so that value is not lost due to collection and processing delays. Karpištšenko shares an example from the early life of Marinexplore: once buoy data was collected and analyzed, it took a customer three months to make a decision. Given that the ocean is highly dynamic, this delay seems to negate the usefulness of the information. Marinexplore’s platform can show measurements from sensors and data from models and simulations (such as daily sea temperatures) in seconds instead of months or years.
Data methods
A few data science methods can be applied verbatim, whereas others require tailoring to suit the petroleum industry. While explaining use cases, the speakers offer a glimpse into their instantiation of this world.
Asset-intensive industries are especially interested in maximizing asset productivity. Mainka describes how either the end user or the manufacturer is involved depending on whether the assets are owned or rented. By looking at billions of records, models can create rules and back-calculate possible root causes of failure. Anomalies can be either good or bad. If good, try to repeat them. If bad, try to avoid them. Multiple rules can be chained together to classify scenarios.
In each case, by monitoring future performance, the system can be iteratively improved. When an impending failure is detected, from the perspective of the manufacturer, the next step could be to offer preventative maintenance service for a positive customer experience. The risk of unscheduled maintenance and associated costs can thus be reduced. Organizations that generate the majority of maintenance work orders from preventative and predictive inspections and use sophisticated reliability-based maintenance procedures and tools to increase asset availability have 27% lower unplanned downtime without any increase in service and maintenance cost.
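The idea of chaining rules to classify scenarios can be sketched with two simple anomaly checks on the compressor signals mentioned earlier (oil temperature and vibration). This is a toy illustration, not Mainka's or SAP's implementation: the z-score test, the threshold of 3, the field names, the scenario labels, and the sample history are all assumptions.

```python
from statistics import mean, stdev


def z_score(value, history):
    """Distance of a reading from the historical mean, in standard deviations."""
    return (value - mean(history)) / stdev(history)


def is_anomalous(value, history, threshold=3.0):
    # Assumed rule: anything more than `threshold` deviations away is an anomaly.
    return abs(z_score(value, history)) > threshold


def classify(reading, history):
    """Chain two single-signal rules into a scenario label (labels are hypothetical)."""
    temp_flag = is_anomalous(reading["oil_temp_c"], history["oil_temp_c"])
    vib_flag = is_anomalous(reading["vibration_mm_s"], history["vibration_mm_s"])
    if temp_flag and vib_flag:
        return "service soon"
    if temp_flag or vib_flag:
        return "watch"
    return "normal"


# Hypothetical baseline collected while the compressor was known to be healthy.
history = {
    "oil_temp_c": [80, 81, 79, 80, 82, 78, 80, 81],
    "vibration_mm_s": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1],
}

print(classify({"oil_temp_c": 95, "vibration_mm_s": 3.5}, history))
print(classify({"oil_temp_c": 95, "vibration_mm_s": 2.05}, history))
print(classify({"oil_temp_c": 80, "vibration_mm_s": 2.0}, history))
```

Comparing these labels against what later happens to the asset is one way to refine the thresholds over time, in line with the iterative improvement described above.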
As with most modeling, machine learning applied to exploration and production can be validated against future performance. Hamner lists the following model evaluation strategies as being useful in picking parameters for deeper study or for selecting between models:
Random cross-validation
Test performance with randomly withheld wells. This could be biased when correlation exists between wells.
Time-based validation
Use results from existing wells to predict new well performance. This can correct for the bias in random cross-validation but is harder in newer plays with not as many wells.
Spatial validation
Test performance with held-out geographic areas. This corrects for spatial biases and is applicable in newer plays. It also helps quantify the performance of acreage evaluation models.
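The three evaluation strategies differ only in how wells are assigned to the training and test sets. The sketch below shows one possible way to build each split using the standard library; the `Well` record, its fields, the grid-cell approach to defining geographic areas, and the sample wells are all hypothetical and not taken from Hamner's talk.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Well:
    well_id: str
    first_production_year: int
    easting_km: float
    northing_km: float


def random_split(wells, test_fraction=0.25, seed=0):
    """Random validation: withhold a random subset of wells."""
    shuffled = list(wells)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def time_split(wells, cutoff_year):
    """Time-based validation: train on older wells, test on newer ones."""
    train = [w for w in wells if w.first_production_year < cutoff_year]
    test = [w for w in wells if w.first_production_year >= cutoff_year]
    return train, test


def spatial_split(wells, cell_km, test_cells):
    """Spatial validation: hold out every well that falls in the chosen grid cells."""
    def cell(w):
        return (int(w.easting_km // cell_km), int(w.northing_km // cell_km))

    train = [w for w in wells if cell(w) not in test_cells]
    test = [w for w in wells if cell(w) in test_cells]
    return train, test


# Hypothetical wells, for illustration only.
wells = [
    Well("A", 2010, 1.0, 1.0),
    Well("B", 2011, 2.0, 8.0),
    Well("C", 2013, 12.0, 3.0),
    Well("D", 2014, 14.0, 9.0),
]

print([w.well_id for w in time_split(wells, cutoff_year=2013)[1]])
print([w.well_id for w in spatial_split(wells, cell_km=10.0, test_cells={(1, 0)})[1]])
print(len(random_split(wells)[1]))
```

In each case a model would be fitted on the first list and scored on the second. The random split can place neighboring, correlated wells on both sides, which is the bias the time-based and spatial splits are meant to reduce.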
In oil and gas, drilling is based on physics and first principles, with data crunching to generate metrics and evaluate key performance indicators (KPIs). However, using the volumes of data already stored, the goal is to learn, innovate, and move to holistic data-driven analytics in real time. Priyadarshy details the three aspects that make now seem like the right time:
Hardware
From a single processor to distributed grid processing
Data
From local files to flexible, nonrelational distributed file systems
Applications
From one machine, one processor to parallel distributed frameworks
This confluence of developments has made real-time analytics not
only possible, but the new normal in industry.
Long-Term Risk
Different aspects of long-term risk require unique approaches and solutions. Practical matters whose value can be quantified, like reservoir management, are better understood than institutional ones, like loss of expertise, whose value is more difficult to quantify.
Practical
The oil and gas industry was one of the first aggregators of large amounts of data. Most of the data challenges in upstream operations revolve around storage. Upstream data is expensive to gather, and it isn’t clear at the time what will be useful in the future. Because companies could use it at some future time for some yet-to-be-determined purpose, they store as much as they can. Chevron has exabytes of such data, according to Waterhouse. The long arc of data analytics in the industry reaches back to the ’80s and ’90s, when Chevron was an early adopter of Cray supercomputers, used for reservoir modeling. More recently, to maximize production over the