Data Analysis - Well Clustering

version - 0.75

The purpose of this notebook is to cluster wells and to start linking the clusters geographically and to static data. It requires the field data and a quicklook (or final) forecast, so that the full history of each well can be analysed.

It lets you:
- cluster wells on any property
- visualise and analyse the clusters
- forecast the clusters

Libraries

[1]:

import os
import inspect
import time
from importlib import reload

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import frflib
from frflib.data_class import input_data
from frflib.forecast_methods.forecast_wf import ForecastWF
from frflib.utils import pca_wf
from frflib.utils.var_computations import sum_dyn_df
from frflib.utils.filter import filter_date, filt_cat
from frflib.plots.plotly_ import (plot_histogram, plot_time_serie, plot_stacked_series,
                                  plot_scatter, plot_regression, pie_plot, timeserie_heatmap)
from frflib.plots.forecast_plots import plot_group_forecast

InputData = input_data.InputData

# Reload pca_wf so that edits to the module are picked up without restarting the kernel
reload(pca_wf)

plt.style.use('ggplot')

[2]:

# Locate the sample_data folder inside the installed frflib package
PATH_FRFLIB = os.path.dirname(inspect.getsourcefile(frflib))
PATH_TO_DATA = os.path.join(PATH_FRFLIB, "sample_data")

Data loading

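Set the main parameters here: the project ID and the name of the saved forecast workflow (wf_name). Both are used to build the paths of the HDF files to load.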

[3]:

project_id="01"
wf_name="dataset_01_simple_wf"

WF_PATH=os.path.join(PATH_TO_DATA,f"{wf_name}.hdf")
DATA_PATH=os.path.join(PATH_TO_DATA,f"dataset_{project_id}.hdf")

[4]:

field_data = InputData.load_hdf(DATA_PATH)
df_st = field_data.df_static
wf = ForecastWF.load_from_hdf(WF_PATH, field_data)
field_data._compute_indicators()
wf.enrich_data()

Selection of the area of interest

Keep only the wells of the Fluvial reservoir; wl holds the names of the selected wells.

[5]:

field_data = field_data.subset(category='reservoir', filt_value='Fluvial')
wl = field_data.filter_cat(category='reservoir', filt_value='Fluvial')

[6]:

df_full = wf.get_result_summary(include_stat=True).loc[wl]

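Clustering

Choose the arguments used to compute the clustering: x_axis is the variable used as the abscissa of each well curve, min_x is the threshold on that variable used to filter out wells (see the counts printed below), k_comp is, presumably, the number of principal components kept after the PCA, and n_clusters is the number of clusters.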

[7]:

df_enriched = wf.enriched_data.df_dynamic.loc[wl]
x_axis = 'cum_active_days'
k_comp = 3
min_x = 120
n_clusters = 4
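Before a PCA can be run, each well's history has to become a fixed-length row vector (df_oil has one row per well). The internals of pca_wf.pca_and_cluster are not shown in this notebook, so the following is only a sketch of one way to build such a matrix. It assumes df_enriched is indexed by well name, with several rows per well and numeric x_axis and y_var columns; `build_well_matrix` is a hypothetical helper, and truncating every curve at min_x is an assumption about how min_x is used.

```python
import numpy as np
import pandas as pd


def build_well_matrix(df, x_axis, y_var, min_x):
    """Hypothetical helper: one row per well, one column per x value in [0, min_x).

    Assumes `df` is indexed by well name (several rows per well) and has
    numeric columns `x_axis` (e.g. cumulative active days) and `y_var`.
    """
    grid = np.arange(min_x)
    rows = {}
    for well, g in df.groupby(level=0):
        g = g.sort_values(x_axis)
        x = g[x_axis].to_numpy(dtype=float)
        if x.max() < grid[-1]:
            continue  # history shorter than min_x: the well is left out
        # Linear interpolation onto the common grid (np.interp needs increasing x)
        rows[well] = np.interp(grid, x, g[y_var].to_numpy(dtype=float))
    return pd.DataFrame.from_dict(rows, orient="index", columns=grid)
```

Under these assumptions, `build_well_matrix(df_enriched, x_axis, 'oil_rate_stbd', min_x)` would give a matrix comparable to df_oil. Interpolating onto a common grid matters because the PCA compares wells column by column, so a given column must correspond to the same x value for every well.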

Computing on oil rate (oil_rate_stbd)

[8]:

y_var = 'oil_rate_stbd'
df_oil, pca_oil, kmeans_oil, df_class_oil = pca_wf.pca_and_cluster(df_enriched, x_axis, y_var, min_x, k_comp, n_clusters)

[9]:

n_kept = df_oil.dropna().shape[0]
print(f"There are {n_kept} wells left in the dataset and {len(wl) - n_kept} removed based on the min_x value")

There are 102 wells left in the dataset and 2 removed based on the min_x value
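pca_wf.pca_and_cluster is frflib's own helper and its implementation is not shown here; it may well use scikit-learn with different settings. As a rough, numpy-only illustration of the technique its name suggests (PCA on the matrix of well curves, then k-means on the first k_comp components), here is a sketch with hypothetical helper names. The explained-variance ratio it also returns is one way to judge whether k_comp components are enough.

```python
import numpy as np


def pca_fit_transform(X, k):
    """Project the rows of X (one well curve per row) onto the first k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, sorted by decreasing singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_ratio = s[:k] ** 2 / np.sum(s ** 2)
    return Xc @ Vt[:k].T, Vt[:k], mean, explained_ratio


def kmeans(Z, n_clusters, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        # Next seed: the point farthest from all seeds chosen so far
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each well to its nearest centroid, then move centroids to cluster means
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    # Final assignment against the final centroids
    labels = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centers
```

Applied to this notebook, it would read `Z, axes, mean, ratio = pca_fit_transform(df_oil.to_numpy(), k_comp)` followed by `labels, centers = kmeans(Z, n_clusters)`. The farthest-point seeding keeps the sketch deterministic; scikit-learn's KMeans uses k-means++ seeding by default instead.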

Computing on BSW (basic sediment and water, bsw_vv)

[10]:

y_var = 'bsw_vv'
df_bsw, pca_bsw, kmeans_bsw, df_class_bsw = pca_wf.pca_and_cluster(df_enriched, x_axis, y_var, min_x, k_comp, n_clusters)

[11]:

n_kept = df_bsw.dropna().shape[0]
print(f"There are {n_kept} wells left in the dataset and {len(wl) - n_kept} removed based on the min_x value")

There are 102 wells left in the dataset and 2 removed based on the min_x value

Step 1

Plot oil_rate_stbd of every well against cumulative active days (cum_active_days).

[12]:

# df_oil has one row per well: transpose it so that each well is drawn as one line
_ = plt.plot(df_oil.T)

[Figure: oil_rate_stbd of each well vs. cum_active_days]

Step 2

After computing the PCA, we make a scatter plot of the first two principal components of all the wells.

[13]:

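# pca_oil was already fitted by pca_and_cluster: project the curves without refitting,
# so the points stay in the same space as the cluster centroids
principalComponents = pca_oil.transform(df_oil)
ax = sns.scatterplot(x=principalComponents[:, 0], y=principalComponents[:, 1], alpha=0.3, color='black')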

[Figure: wells projected on the first two principal components]

Step 3

Then we display the same scatter plot colored by cluster and overlay the cluster centroids.

[14]:

i, j = 0, 1  # indices of the principal components on the x and y axes
n_colors = kmeans_oil.cluster_centers_.shape[0]  # one color per cluster
colorslist = ["green", "yellow", "orange", "red"]
color_list = sns.color_palette(colorslist, n_colors=n_colors)

# Wells, colored by their ordered cluster label
ax = sns.scatterplot(x=principalComponents[:, i], y=principalComponents[:, j],
                     hue=kmeans_oil.ordered_labels_, alpha=0.3, palette=color_list)

# Cluster centroids, drawn on the same axes with the same palette
sns.scatterplot(x=kmeans_oil.cluster_centers_[:, i], y=kmeans_oil.cluster_centers_[:, j],
                hue=kmeans_oil.label_rank, ax=ax, legend=False, palette=color_list)


[Figure: principal-component scatter colored by cluster, with the cluster centroids]
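ordered_labels_ and label_rank are not attributes of scikit-learn's KMeans, so they presumably come from frflib's own wrapper, and the notebook does not say how the order is defined. The renaming step in Merge and Export refers to cumulative production, so a plausible rule is to rank clusters by the level of their centroid curve. A minimal sketch under that assumption, with a hypothetical `rank_clusters` helper (on a common x grid, the mean of a rate curve is proportional to its cumulative volume over the window):

```python
import numpy as np


def rank_clusters(labels, centroid_curves):
    """Hypothetical helper: relabel clusters 1..K by increasing mean centroid level.

    labels:          cluster id (0..K-1) of each well
    centroid_curves: (K, n_x) centroid curve of each cluster in the original units
    Returns (ordered label of each well, rank of each original cluster id).
    """
    # Mean of each centroid curve, proportional to its cumulative volume on a common grid
    level = np.asarray(centroid_curves, dtype=float).mean(axis=1)
    order = np.argsort(level)                   # cluster ids, lowest level first
    rank = np.empty(len(order), dtype=int)
    rank[order] = np.arange(1, len(order) + 1)  # rank[c] is the 1-based rank of cluster c
    return rank[np.asarray(labels)], rank
```

Under this assumption, for oil rate the highest rank is the most productive cluster, while for BSW rank 1 is the cluster with the lowest water cut.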

Step 4

Finally, we plot the centroid curve of each cluster together with the production curves of its wells.

[15]:

pca_wf.plot_wells(df_oil, kmeans_oil, pca_oil, inverted=False)

[Figure: oil-rate centroid curve of each cluster with the curves of its wells]
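The cluster centroids live in principal-component space, so drawing a centroid curve requires mapping them back onto the x grid. For a PCA without whitening this is the linear map `centers @ components + mean`, which is what scikit-learn's `PCA.inverse_transform` computes when `whiten=False`. Whether pca_wf.plot_wells does exactly this is an assumption, and `centroid_curves` is a hypothetical name.

```python
import numpy as np


def centroid_curves(centers_pc, components, mean):
    """Hypothetical helper: map cluster centroids from PC space back to curves.

    centers_pc: (K, k) centroids in principal-component coordinates
    components: (k, n_x) principal axes, one per row (orthonormal)
    mean:       (n_x,) mean curve subtracted before the PCA
    """
    return np.asarray(centers_pc) @ np.asarray(components) + np.asarray(mean)


# Usage on synthetic curves: 6 wells, 4 x values, all 4 components kept
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
Z = (X - mean) @ Vt.T
cluster = [0, 1, 2]  # pretend the first three wells form one cluster
curve = centroid_curves(Z[cluster].mean(axis=0, keepdims=True), Vt, mean)
```

Because the map is linear, the curve of a k-means centroid (the mean of its wells in PC space) is the cluster's average curve projected onto the kept components. With all components kept, as here, it equals the plain average of the wells' curves; with k_comp components it is a smoothed version of it.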

Plot the BSW clusters

[16]:

pca_wf.plot_wells(df_bsw, kmeans_bsw, pca_bsw, inverted=True)

[Figure: BSW centroid curve of each cluster with the curves of its wells]

Merge and Export

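Rename the clusters based on cumulative production. For oil rate, cluster 4 becomes "excellent" and cluster 1 "bad"; for BSW the mapping is reversed, so cluster 1 becomes "excellent" and cluster 4 "bad".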

[17]:

df_class_oil.columns = ['group_oil_num']
df_class_bsw.columns = ['group_bsw_num']

# Oil rate: higher cluster number = better class; BSW: lower cluster number = better class
df_class_oil['group_oil'] = df_class_oil['group_oil_num'].map({4: 'excellent', 3: 'good', 2: 'average', 1: 'bad'})
df_class_bsw['group_bsw'] = df_class_bsw['group_bsw_num'].map({1: 'excellent', 2: 'good', 3: 'average', 4: 'bad'})

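Confusion Matrix

Cross-tabulate the oil-rate groups against the BSW groups to see how the two classifications overlap.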

[18]:

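order = ['bad', 'average', 'good', 'excellent']
confusion_matrix = pd.crosstab(df_class_oil['group_oil'], df_class_bsw['group_bsw'],
                               rownames=['group_oil'], colnames=['group_bsw'])
# reindex (rather than .loc) keeps all four classes, with zeros, even if one is empty in either grouping
confusion_matrix = confusion_matrix.reindex(index=order, columns=order, fill_value=0)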

[19]:

# Reverse the rows so that the 'excellent' oil group is at the top of the heatmap
sns.heatmap(confusion_matrix.iloc[::-1, :], annot=True, cmap="YlGnBu", cbar=False)


[Figure: heatmap of the oil-group vs. BSW-group confusion matrix]
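Here the "confusion matrix" is really a contingency table between two clusterings, neither of which is ground truth. One compact summary is the share of wells on the diagonal, i.e. wells that get the same class for oil rate and for BSW. The sketch below uses toy labels and hypothetical helper names. It also reindexes the table, so a class absent from one grouping still appears as a row or column of zeros.

```python
import pandas as pd

ORDER = ["bad", "average", "good", "excellent"]


def group_crosstab(oil_groups, bsw_groups, order=ORDER):
    """Hypothetical helper: oil-vs-BSW table with every class present, in a fixed order."""
    table = pd.crosstab(oil_groups, bsw_groups)
    # reindex fills classes missing from either grouping with zeros instead of failing
    return table.reindex(index=order, columns=order, fill_value=0)


def diagonal_agreement(table):
    """Fraction of wells that get the same class for both properties."""
    values = table.to_numpy()
    return values.trace() / values.sum()


# Toy example: 4 wells, no 'average' well in either grouping
oil = pd.Series(["good", "good", "bad", "excellent"], name="group_oil")
bsw = pd.Series(["good", "bad", "bad", "good"], name="group_bsw")
table = group_crosstab(oil, bsw)
share = diagonal_agreement(table)  # 0.5: 2 of the 4 wells agree
```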