An In-depth Look at Machine Learning

Over the last decade, many industries have applied Machine Learning to increase efficiency and reduce costs, or to solve previously unsolvable technical problems. For example, Machine Learning is used for image recognition in many fields, including autonomous driving. Predictive maintenance for jet engines, wind turbines, pumps, motors, and a variety of other industrial equipment is based largely on Machine Learning techniques. Machine Learning techniques have also been used extensively in retail, internet and social networks, and other technology-driven industries. More recently, the oil & gas industry has also been investing significantly in Machine Learning to add business value.

The availability of vast amounts of data, efficient algorithms to process those data, and increased computational power are a few of the key factors that have led to the proliferation of Machine Learning applications. In this article, we will focus on what Machine Learning is, and how it can add value to businesses, especially when combined with domain expertise. To illustrate the application of Machine Learning, a case study on Predictive Maintenance of pumps is presented.

To better understand Machine Learning, we refer to the definition of a well-posed learning problem given by Tom Mitchell (1997): “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” For example, for pump Predictive Maintenance, the “experience” is the suction and discharge pressure data; the “task” is to correctly classify the fault state of the pump; and the “performance measure” is a metric of the classifier's accuracy.

Machine Learning can be broadly classified into “supervised” and “unsupervised” learning. In supervised learning, both the input and output data are given, and the goal is to build a model that correctly predicts outputs. Supervised learning can be of a “regression” or “classification” type. In regression, a continuous function is developed to predict outputs. In classification, the model predicts discrete outputs such as different fault states of a pump or pass/fail states for a system. By comparison, in unsupervised learning, there is no concept of inputs and outputs; instead, the goal is to find useful patterns (groups or structures) in the data. Unsupervised learning is sometimes used to gain a better understanding of the most important variables which are then used in supervised learning.


Numerous Machine Learning algorithms have been published in the literature. These include gradient descent, regularization, neural networks, support vector machines, collaborative filtering, k-means clustering, and principal component analysis. Most of these algorithms have roots in statistics and mathematical optimization. Machine Learning algorithms are capable of handling a large number of variables (hundreds to thousands), which has allowed engineers to build closed-form input/output relationships for complex systems that would otherwise not be possible. Machine Learning algorithms are also able to leverage large amounts of data (millions of data points per variable). Note, however, that many Machine Learning applications use only a small amount of data.

A key aspect of any Machine Learning project is what is called “feature engineering,” which is essentially identifying the features or variables that yield the most efficient and effective Machine Learning model. For engineering problems, knowledge of the system physics and of how the equipment operates (or malfunctions) is invaluable for choosing the proper features and building a useful and cost-effective model. Combining domain expertise with Machine Learning generally provides a more accurate and cheaper model than the brute-force approach of dumping a large set of data into a Machine Learning tool and hoping that the tool will provide the best model.

To illustrate a few Machine Learning concepts, we now describe an example of Predictive Maintenance for a triplex positive-displacement pump. This example has broad applications because similar pumps are used extensively in oil & gas as well as in other industries. Also, the procedure would be similar for Predictive Maintenance of any other industrial equipment. Unexpected failure of a pump can cause operational disruptions, and the resulting revenue loss can often be significantly greater than the price of the failed component. For pumps operating subsea or downhole, mobilizing the repair itself can be very expensive.

Figure 1: Schematic of Pump and Fault States

Figure 1 shows a schematic of the pump and the fault states. The objective in this Predictive Maintenance example is to identify degradation or fault states in a pump before the pump completely fails. Six specific fault states are identified: blocked plunger, worn bearing, leaking valve, blocked plunger with worn bearing, blocked plunger with leaking valve, and leaking valve with worn bearing. Only the suction and discharge pressure time series data, which can be easily collected, are used as input variables. In the example shown, data from a Simulink® model of the pump are used; the pump model is based on that used by MathWorks®. Our full study involved validating the Simulink® model against lab tests of a pump with and without faults, and using both the measured and simulated data; only results from the simulated data are discussed here.

As a first step, 100 sets of time-series suction and discharge pressures are generated for each of the six fault states and the nominal state. Data are randomly separated into a 70% training set and a 30% cross-validation set. The model is fitted to the training set, and the accuracy of the model is evaluated based on the cross-validation set.
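The random 70/30 split described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `split_train_validation`, the fixed seed, and the `(state, run)` placeholders standing in for the pressure records are all assumptions made for this example.

```python
import random

def split_train_validation(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and validation lists."""
    rng = random.Random(seed)          # fixed seed only so the example is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Seven pump states: the nominal state plus the six fault states in the article.
states = [
    "nominal",
    "blocked plunger",
    "worn bearing",
    "leaking valve",
    "blocked plunger with worn bearing",
    "blocked plunger with leaking valve",
    "leaking valve with worn bearing",
]

# Placeholder records: (state, run index) stands in for one simulated
# suction/discharge pressure time series.
dataset = [(state, run) for state in states for run in range(100)]

train_set, validation_set = split_train_validation(dataset)
```

Note that this is a plain random split over all records; whether the study balanced the split across fault states is not stated, so no stratification is assumed here.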

Figure 2: Power Spectral Density (PSD) of Discharge Pressure

Statistical and spectral analysis of the pressure time series is performed to identify features that can distinguish the different fault states. Figure 2 shows the power spectral density (PSD) of the discharge pressure for all fault states. While different states have different PSD signatures, many states are not easily distinguishable from one another. The Worn Bearing state is the hardest to distinguish from the PSD alone.

From the physics of pump operation, it is known that small faults in the plunger, bearing, or valve can distort the phasing between the suction and discharge pressures. Such phasing is captured in the cross spectral density (CSD) between these two channels. Figure 3 shows the imaginary component of the CSD; it can be seen that most states exhibit a distinct CSD signature. Based on these observations, the frequency and the real and imaginary amplitudes of the 8 largest CSD peaks are used as features in the Machine Learning classifier model. Thus, a total of 24 features (3 per peak) and over 700 data sets are used to build the classifier.
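As a rough illustration of how 24 such features could be extracted, the sketch below computes a single-segment cross-spectrum with NumPy and keeps the frequency, real part, and imaginary part of the 8 largest peaks. The function name `csd_peak_features`, the Hann window, the unnormalized single-segment estimate, and the simple local-maximum peak picking are assumptions made for this example; the article does not describe the estimator or peak-detection method used in the study.

```python
import numpy as np

def csd_peak_features(suction, discharge, fs, n_peaks=8):
    """Return a flat vector [freq, real, imag] * n_peaks from the cross-spectrum.

    Convention assumed here: Pxy = conj(X) * Y, with x = suction and
    y = discharge, so a discharge signal lagging suction by phase phi
    gives a cross-spectrum phase of -phi at that frequency.
    """
    x = np.asarray(suction, dtype=float)
    y = np.asarray(discharge, dtype=float)
    # Remove the mean so the DC component does not dominate the spectrum.
    x = x - x.mean()
    y = y - y.mean()

    window = np.hanning(len(x))
    X = np.fft.rfft(x * window)
    Y = np.fft.rfft(y * window)
    pxy = np.conj(X) * Y                      # unnormalized cross-spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    # Local maxima of the cross-spectrum magnitude (interior bins only).
    mag = np.abs(pxy)
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    peak_idx = np.flatnonzero(is_peak) + 1

    # Keep the n_peaks largest peaks, strongest first.
    order = np.argsort(mag[peak_idx])[::-1][:n_peaks]
    top = peak_idx[order]

    # Zero-pad if fewer than n_peaks peaks were found.
    features = np.zeros((n_peaks, 3))
    features[:len(top)] = np.column_stack(
        [freqs[top], pxy[top].real, pxy[top].imag]
    )
    return features.ravel()
```

For two sinusoids of the same frequency, the imaginary part of the strongest peak changes sign and size with the phase lag between the channels, which is the kind of phasing information the text attributes to the CSD. An averaged estimator such as Welch's method would normally give a smoother spectrum than this single-segment version.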

Figure 3: Imaginary component of Cross Spectral Density (CSD) of Suction and Discharge Pressures

After trying various algorithms, a three-layer feed-forward Neural Network was found to provide the best results. The network has one hidden layer with 50 nodes and an output layer with 7 nodes, one for each pump state to be identified (the six fault states plus the nominal state). Regularization was used when estimating the network parameters, to limit overfitting.
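A network of this shape (24 inputs, 50 hidden nodes, 7 outputs) can be sketched in NumPy as follows. The layer sizes come from the text; everything else is an assumption made for illustration: the tanh hidden activation, softmax output, cross-entropy loss, L2 weight penalty `lam`, plain gradient descent, and all function names. The data at the end are synthetic stand-ins, not the pump data.

```python
import numpy as np

def init_params(n_in=24, n_hidden=50, n_out=7, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Return hidden activations and softmax class probabilities."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    logits = H @ params["W2"] + params["b2"]
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    return H, P

def loss_and_grads(params, X, y, lam):
    """Cross-entropy loss with an L2 penalty on the weights, and its gradients."""
    n = X.shape[0]
    H, P = forward(params, X)
    data_loss = -np.log(P[np.arange(n), y]).mean()
    penalty = 0.5 * lam * (np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2))

    d_logits = P.copy()
    d_logits[np.arange(n), y] -= 1.0
    d_logits /= n
    d_hidden = (d_logits @ params["W2"].T) * (1.0 - H ** 2)   # tanh derivative
    grads = {
        "W2": H.T @ d_logits + lam * params["W2"],
        "b2": d_logits.sum(axis=0),
        "W1": X.T @ d_hidden + lam * params["W1"],
        "b1": d_hidden.sum(axis=0),
    }
    return data_loss + penalty, grads

def train(X, y, n_hidden=50, n_classes=7, lam=1e-3, lr=0.1, n_steps=300, seed=0):
    """Full-batch gradient descent; returns the parameters and the loss history."""
    params = init_params(X.shape[1], n_hidden, n_classes, seed)
    history = []
    for _ in range(n_steps):
        loss, grads = loss_and_grads(params, X, y, lam)
        history.append(loss)
        for name in params:
            params[name] -= lr * grads[name]
    return params, history

# Synthetic stand-in data (NOT the pump data): 140 samples, 24 features, 7 classes.
rng = np.random.default_rng(1)
y_demo = np.arange(140) % 7
X_demo = rng.normal(size=(140, 24))
X_demo[np.arange(140), y_demo] += 3.0   # shift one feature per class so classes differ
params, history = train(X_demo, y_demo)
_, probs = forward(params, X_demo)
predicted = probs.argmax(axis=1)
```

The penalty weight `lam` controls the trade-off between fitting the training data and keeping the weights small; in practice it would be tuned on the cross-validation set rather than fixed as it is here.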

Results from the classifier are presented in the so-called “Confusion Matrix” shown in Figure 4. The horizontal axis is the actual class (i.e., fault state) from the data, while the vertical axis is the class predicted by the classifier. If every class is always predicted correctly, the confusion matrix has non-zero entries only along the diagonal. Non-zero off-diagonal terms represent erroneous predictions, or “confusion,” in the model. Note that this confusion matrix is based on the cross-validation data set, which was not used in developing the neural network model. The first observation from this confusion matrix is that whether the pump is good (nominal) or faulty is always predicted with 100% accuracy. Further, most fault states are correctly identified with close to 95% accuracy. The exceptions are Leak P1 with Block P1 and Leak P1 with Worn Bearing (P1 denotes the one of the three plungers in which faults were induced). Even for these two states, the model predicts the fault state as Block P1 and Worn Bearing, respectively, each of which is one of the true faults in the corresponding combined state. The overall accuracy of this model is 92%, which is considered to be good. For comparison purposes, a similar Neural Network model was developed using only the PSD of the discharge pressure; that approach resulted in an overall accuracy of 85%, with much lower accuracy for several fault states.
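To make the layout of the confusion matrix concrete, the sketch below builds one with the same orientation as described above (columns are the actual class, rows are the predicted class) and derives the overall and per-class accuracy from it. The function names and the six toy labels are made up for this illustration; they are not results from the study.

```python
def confusion_matrix(actual, predicted, labels):
    """matrix[row][col]: row = predicted class, col = actual class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix

def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def per_class_accuracy(matrix, labels):
    """For each actual class (column), the fraction predicted correctly."""
    result = {}
    for col, label in enumerate(labels):
        column_total = sum(row[col] for row in matrix)
        result[label] = matrix[col][col] / column_total if column_total else float("nan")
    return result

# Toy data for illustration only.
labels = ["Nominal", "Block P1", "Worn Bearing"]
actual = ["Nominal", "Nominal", "Block P1", "Block P1", "Worn Bearing", "Worn Bearing"]
predicted = ["Nominal", "Nominal", "Block P1", "Worn Bearing", "Worn Bearing", "Worn Bearing"]

cm = confusion_matrix(actual, predicted, labels)
accuracy = overall_accuracy(cm)
by_class = per_class_accuracy(cm, labels)
```

In the toy data, one Block P1 sample is predicted as Worn Bearing, so it appears as a single off-diagonal count in the Block P1 column. Many libraries use the transposed layout (rows for the actual class), so the orientation should be checked before comparing matrices.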

Figure 4: Confusion Matrix showing results of Machine Learning classifier to identify fault states of pump.

This case study on pump Predictive Maintenance illustrates that, by combining the domain expertise (phasing between the suction and discharge pressure) and spectral data analysis with Machine Learning, a very accurate classifier can be built for detection of multiple fault states of a pump. Furthermore, only easily measurable pressure data are used, which avoids the need for complicated sensors.

The benefit of this Machine Learning based Predictive Maintenance is that an unexpected failure can be turned into planned maintenance. By considering only the pressure data, degradation or fault states of pump components can be identified before complete failure and breakdown of the pump. With this type of advance knowledge of the component degradation, operators can plan repairs and maintenance before operations are disrupted.

A similar Machine Learning approach is being used for Predictive Maintenance of a variety of equipment across multiple industries. Moreover, Predictive Maintenance is not the only application of Machine Learning. We have used Machine Learning along with domain expertise for developing intelligent alarms for offshore riser operations, for detecting a broken mooring line, for developing very efficient surrogate models for complex and time-consuming computational analyses, and even for semi-autonomous robotic surgery. Stress Engineering Services (SES) believes that Machine Learning is a powerful tool that will help us solve engineering problems, using both data and simulations, in a variety of industries and applications.


Puneet Agarwal, PhD, PE – Senior Associate, Houston Office

Puneet has been a Stress Engineering Services employee for over 10 years, and is a valued member of the Upstream Group. His core areas of expertise are machine learning, probability & statistics, and digital signal processing. He is also an expert in structural mechanics and dynamics. He has combined his data science and engineering mechanics skills to solve problems in the upstream, midstream, and downstream oil & gas industry, as well as in the wind energy industry.
