Survival Analysis in PetroVisor

Learn how to create, evaluate and interpret Survival Analysis Models in PetroVisor

Introduction

The Cox proportional hazards model is a commonly used statistical technique for analyzing the survival times of individuals or, in the case of hydraulic systems, the failure times of components such as centrifugal pumps. It is particularly valuable when assessing the relationship between the survival time of a subject and one or more predictor variables (PV Signals). These signals should encompass any relevant operational or environmental factors that could impact the pump's lifespan, such as pump speed, pressure, and temperature.

Creating Models

To create a model in PetroVisor, navigate to the ML module and create a new model of type Survival Analysis.

Needed Inputs:

  • Features: The predictor variables (signals) used to explain survival time, for example pump speed, pressure, and temperature. PetroVisor accepts a wide range of predictor variables as input features.
  • Time To Events: The time elapsed until the event occurs (i.e. the survival time) for each subject.
  • Events: Whether the event (a failure, incident, or other specified outcome) occurred for each subject. In survival analysis, a subject with no observed event is treated as censored: it is known only to have survived at least as long as its recorded time.
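
As a rough illustration of how these three inputs line up, the sketch below uses plain Python with made-up values. The field names (pump_speed, pressure, time_to_event, event) and the 1/0 event encoding are assumptions for this example, not PetroVisor's actual schema.

```python
# Hypothetical records, one per pump. All names and values are illustrative only.
records = [
    {"pump_speed": 55.0, "pressure": 120.0, "time_to_event": 340.0, "event": 1},
    {"pump_speed": 60.0, "pressure": 110.0, "time_to_event": 150.0, "event": 1},
    {"pump_speed": 50.0, "pressure": 130.0, "time_to_event": 500.0, "event": 0},
]

FEATURE_NAMES = ["pump_speed", "pressure"]


def split_inputs(rows, feature_names):
    """Split rows into the three inputs: features, times to event, event flags."""
    features = [[row[name] for name in feature_names] for row in rows]
    times = [row["time_to_event"] for row in rows]
    events = [row["event"] for row in rows]

    # Basic sanity checks: durations must be positive, flags must be 0 or 1.
    if any(t <= 0 for t in times):
        raise ValueError("time to event must be positive")
    if any(e not in (0, 1) for e in events):
        raise ValueError("event flag must be 0 (no event observed) or 1 (event)")
    return features, times, events


features, times, events = split_inputs(records, FEATURE_NAMES)
n_events = sum(events)
n_censored = len(events) - n_events
print(f"{n_events} observed events, {n_censored} censored")
```

In this sketch the third pump has no observed failure, so its 500.0 is a lower bound on its survival time rather than a failure time.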

PetroVisor uses the Cox proportional hazards model to estimate the hazard function at specific time points or across time intervals. The model's predictions indicate how likely an event is at different times, given the input features and the model's coefficients.
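
In its standard form, the Cox model writes the hazard as a baseline hazard multiplied by an exponential term in the features: h(t | x) = h0(t) * exp(b1*x1 + ... + bk*xk). The sketch below computes only the exponential factor, which is enough to compare two subjects because the baseline hazard cancels in the ratio. The coefficient and feature values are hypothetical, and this is not PetroVisor's internal implementation.

```python
import math


def relative_hazard(coefficients, features):
    """Return exp(sum of coefficient * feature value), the Cox relative hazard."""
    linear_predictor = sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return math.exp(linear_predictor)


# Hypothetical fitted coefficients (log hazard change per unit of each signal).
coefficients = {"motor_frequency": 0.123, "pressure": -0.223}

# Two hypothetical pumps that differ by one unit of motor frequency.
pump_a = {"motor_frequency": 50.0, "pressure": 10.0}
pump_b = {"motor_frequency": 51.0, "pressure": 10.0}

# The baseline hazard h0(t) cancels, so this ratio holds at every time t.
ratio = relative_hazard(coefficients, pump_b) / relative_hazard(coefficients, pump_a)
print(round(ratio, 3))  # 1.131
```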

Evaluating Models - Explainability and Feature Importance

After training, the models can be evaluated using the following metrics:

  • Coefficients: These represent the estimated effect of each signal on the log hazard. For instance, if the signal is motor frequency, a coefficient of 0.123 indicates that a one-unit increase in motor frequency increases the log hazard by 0.123, with all other signals held constant.
  • Exp(Coefficients): The exponential of the coefficient, known as the hazard ratio (HR), illustrates the proportional change in the hazard for a single-unit increase in the corresponding predictor variable. For instance, if the exp(coef) value for 'motor frequency' is 1.131, it suggests that, on average, the hazard increases by around 13.1% for every one-unit rise in 'motor frequency'.
  • (LogLik) Log-Likelihood: The log-likelihood is a measure of how well the model describes the observed data. Higher values indicate a better fit, showing that the model aligns well with the actual outcomes. In survival analysis, comparing changes in the log-likelihood helps in evaluating different models. The likelihood ratio test, which is based on the log-likelihood, is utilized to determine whether adding or removing variables significantly enhances the model's performance.
  • Z Value: The z-score is computed by dividing the coefficient by its standard error. If the true coefficient were zero, the z-score would approximately follow a standard normal distribution (mean zero, unit variance). A larger absolute z-score indicates stronger evidence that the coefficient differs from zero; it does not by itself measure the size of the effect.
  • P Value: The p-value associated with the z-score indicates the statistical significance of the coefficient. A threshold of 0.05 is commonly used: if the p-value is less than 0.05, the coefficient is generally deemed significant.
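
The relationship between a coefficient, its hazard ratio, z-score, and p-value can be sketched with the standard library alone. The coefficient 0.123 is the motor frequency example from above; the standard error of 0.05 is a hypothetical value chosen for illustration, and the two-sided normal p-value is the usual Wald test, assumed here rather than confirmed as PetroVisor's exact computation.

```python
import math


def coefficient_summary(coef, std_err):
    """Derive the hazard ratio, z-score, and two-sided p-value for one coefficient."""
    hazard_ratio = math.exp(coef)
    z = coef / std_err
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {"exp_coef": hazard_ratio, "z": z, "p": p_value}


summary = coefficient_summary(coef=0.123, std_err=0.05)  # std_err is hypothetical
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the z-score is 2.46 and the p-value falls below 0.05, so the coefficient would be considered significant at that threshold.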

Interpreting Results

Hazard Ratios (HR) describe how the hazard of an event, such as hydraulic system failure, changes with each signal being analyzed, with all other signals held constant:
  • HR greater than 1 signifies a higher likelihood of the event occurring as the variable increases. For example, if the HR for pump speed is 1.5, the hazard of pump failure increases by 50% for every unit increase in pump speed.
  • HR less than 1 indicates a reduced likelihood of the event occurring as the variable increases. For instance, if the HR for pressure is 0.8, the hazard of pump failure decreases by 20% with each unit increase in pressure.
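
The percentages above follow directly from the hazard ratio. A minimal sketch, using a hypothetical helper name (percent_change_in_hazard) and the two example values from this section:

```python
def percent_change_in_hazard(hazard_ratio, units=1.0):
    """Percent change in hazard for an increase of `units` in the variable.

    Effects are multiplicative in the Cox model, so a change of `units`
    scales the hazard by hazard_ratio ** units.
    """
    return (hazard_ratio ** units - 1.0) * 100.0


print(round(percent_change_in_hazard(1.5), 1))           # pump speed, one unit
print(round(percent_change_in_hazard(0.8), 1))           # pressure, one unit
print(round(percent_change_in_hazard(1.5, units=3), 1))  # pump speed, three units
```

Because the effect compounds, a three-unit increase with an HR of 1.5 multiplies the hazard by 1.5^3 = 3.375, which is an increase of 237.5% rather than 3 x 50% = 150%.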