Visualisation of The Data - Jupyter Notebook

Uploaded by

Naineni Shiny
12/20/23, 11:57 PM Visualisation of the Data - Jupyter Notebook

In [1]: import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

In [2]: df = pd.read_excel('2001_final.xlsx')

In [3]: df

Out[3]: ad_observation_id depth temperature salinity density ao_wmo_number latitu

0 2900168_29/10/2001 7.1 28.690 36.367 1023.197021 2900168 10.0

1 2900168_29/10/2001 9.4 28.696 36.367 1023.195007 2900168 10.0

2 2900168_29/10/2001 19.2 28.697 36.367 1023.195007 2900168 10.0

3 2900168_29/10/2001 28.9 28.702 36.367 1023.192993 2900168 10.0

4 2900168_29/10/2001 39.8 28.690 36.365 1023.195007 2900168 10.0

... ... ... ... ... ... ...

2224 2900164_31/12/2001 1699.3 4.101 34.866 1027.668945 2900164 5.9

2225 2900164_31/12/2001 1799.4 3.621 34.840 1027.697998 2900164 5.9

2226 2900164_31/12/2001 1898.9 3.269 34.818 1027.714966 2900164 5.9

2227 2900164_31/12/2001 1999.6 2.875 34.798 1027.734985 2900164 5.9

2228 2900164_31/12/2001 2000.5 2.876 34.798 1027.734985 2900164 5.9

2229 rows × 9 columns


In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2229 entries, 0 to 2228
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ad_observation_id 2229 non-null object
1 depth 2229 non-null float64
2 temperature 2229 non-null float64
3 salinity 2229 non-null float64
4 density 2229 non-null float64
5 ao_wmo_number 2229 non-null int64
6 latitude 2229 non-null float64
7 longitude 2229 non-null float64
8 date 2229 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(6), int64(1), object(1)
memory usage: 156.9+ KB

In [5]: df1 = df.copy()  # independent copy, so later changes to df do not touch df1


In [6]: df1

Out[6]: ad_observation_id depth temperature salinity density ao_wmo_number latitu

0 2900168_29/10/2001 7.1 28.690 36.367 1023.197021 2900168 10.0

1 2900168_29/10/2001 9.4 28.696 36.367 1023.195007 2900168 10.0

2 2900168_29/10/2001 19.2 28.697 36.367 1023.195007 2900168 10.0

3 2900168_29/10/2001 28.9 28.702 36.367 1023.192993 2900168 10.0

4 2900168_29/10/2001 39.8 28.690 36.365 1023.195007 2900168 10.0

... ... ... ... ... ... ...

2224 2900164_31/12/2001 1699.3 4.101 34.866 1027.668945 2900164 5.9

2225 2900164_31/12/2001 1799.4 3.621 34.840 1027.697998 2900164 5.9

2226 2900164_31/12/2001 1898.9 3.269 34.818 1027.714966 2900164 5.9

2227 2900164_31/12/2001 1999.6 2.875 34.798 1027.734985 2900164 5.9

2228 2900164_31/12/2001 2000.5 2.876 34.798 1027.734985 2900164 5.9

2229 rows × 9 columns


In [7]: df1=df1.drop_duplicates()
df1

Out[7]: ad_observation_id depth temperature salinity density ao_wmo_number latitu

0 2900168_29/10/2001 7.1 28.690 36.367 1023.197021 2900168 10.0

1 2900168_29/10/2001 9.4 28.696 36.367 1023.195007 2900168 10.0

2 2900168_29/10/2001 19.2 28.697 36.367 1023.195007 2900168 10.0

3 2900168_29/10/2001 28.9 28.702 36.367 1023.192993 2900168 10.0

4 2900168_29/10/2001 39.8 28.690 36.365 1023.195007 2900168 10.0

... ... ... ... ... ... ...

2224 2900164_31/12/2001 1699.3 4.101 34.866 1027.668945 2900164 5.9

2225 2900164_31/12/2001 1799.4 3.621 34.840 1027.697998 2900164 5.9

2226 2900164_31/12/2001 1898.9 3.269 34.818 1027.714966 2900164 5.9

2227 2900164_31/12/2001 1999.6 2.875 34.798 1027.734985 2900164 5.9

2228 2900164_31/12/2001 2000.5 2.876 34.798 1027.734985 2900164 5.9

2229 rows × 9 columns
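
The row count is the same before and after drop_duplicates(), which suggests the file contained no fully duplicated rows. A minimal sketch of how that could be checked explicitly is given below. The `toy` frame and its values are hypothetical stand-ins, not the notebook's dataset.

```python
import pandas as pd

# Hypothetical toy frame with one fully duplicated row (not the notebook's data)
toy = pd.DataFrame({
    "depth": [7.1, 9.4, 9.4, 19.2],
    "temperature": [28.690, 28.696, 28.696, 28.697],
    "salinity": [36.367, 36.367, 36.367, 36.367],
})

# duplicated() flags each repeat of an earlier row; the sum is the number
# of rows that drop_duplicates() would remove
n_dupes = int(toy.duplicated().sum())
deduped = toy.drop_duplicates()

print(n_dupes, len(toy), len(deduped))
```

If `n_dupes` is zero, drop_duplicates() leaves the frame unchanged, which matches what the output above shows.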

In [8]: df = df[['date','temperature','salinity']].copy()  # explicit copy, so later in-place changes act on an independent frame


In [9]: df

Out[9]: date temperature salinity

0 2001-10-29 06:02:32 28.690 36.367

1 2001-10-29 06:02:32 28.696 36.367

2 2001-10-29 06:02:32 28.697 36.367

3 2001-10-29 06:02:32 28.702 36.367

4 2001-10-29 06:02:32 28.690 36.365

... ... ... ...

2224 2001-12-31 00:43:39 4.101 34.866

2225 2001-12-31 00:43:39 3.621 34.840

2226 2001-12-31 00:43:39 3.269 34.818

2227 2001-12-31 00:43:39 2.875 34.798

2228 2001-12-31 00:43:39 2.876 34.798

2229 rows × 3 columns

In [10]: df.index = df.pop('date')


df

Out[10]: temperature salinity

date

2001-10-29 06:02:32 28.690 36.367

2001-10-29 06:02:32 28.696 36.367

2001-10-29 06:02:32 28.697 36.367

2001-10-29 06:02:32 28.702 36.367

2001-10-29 06:02:32 28.690 36.365

... ... ...

2001-12-31 00:43:39 4.101 34.866

2001-12-31 00:43:39 3.621 34.840

2001-12-31 00:43:39 3.269 34.818

2001-12-31 00:43:39 2.875 34.798

2001-12-31 00:43:39 2.876 34.798

2229 rows × 2 columns
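
As an alternative to popping the column and assigning it to the index, the same result can probably be reached with set_index, which returns a new frame and leaves the source untouched. The sketch below uses a small hypothetical `toy` frame. The name `ts` is an assumption introduced for illustration.

```python
import pandas as pd

# Hypothetical toy frame shaped like the notebook's data (values are made up)
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "2001-10-29 06:02:32",
        "2001-10-29 06:02:32",
        "2001-12-31 00:43:39",
    ]),
    "depth": [7.1, 9.4, 1699.3],
    "temperature": [28.690, 28.696, 4.101],
    "salinity": [36.367, 36.367, 34.866],
})

# Select the columns of interest, then move 'date' into the index;
# set_index returns a new DataFrame, so 'toy' keeps its 'date' column
ts = toy[["date", "temperature", "salinity"]].set_index("date")

print(ts.index.name, list(ts.columns))
```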

In [11]: import seaborn as sns


In [12]: sns.pairplot(df1)

Out[12]: <seaborn.axisgrid.PairGrid at 0x2c6cf612280>

In [13]: df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2229 entries, 0 to 2228
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ad_observation_id 2229 non-null object
1 depth 2229 non-null float64
2 temperature 2229 non-null float64
3 salinity 2229 non-null float64
4 density 2229 non-null float64
5 ao_wmo_number 2229 non-null int64
6 latitude 2229 non-null float64
7 longitude 2229 non-null float64
8 date 2229 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(6), int64(1), object(1)
memory usage: 238.7+ KB


In [14]: df1 = df1.drop(columns=['ad_observation_id'])

Variation of Temperature and Salinity Based on the DEPTH

In [15]: df1 = df1.sort_values(by='depth')

# Line graph
fig, ax1 = plt.subplots(figsize=(10, 6))

# Plotting temperature against depth as a line with markers
ax1.plot(df1['temperature'], df1['depth'], marker='o', color='black', label
ax1.set_xlabel('Temperature')
ax1.set_ylabel('Depth', color='blue')
ax1.tick_params('y', colors='blue')

# Creating a secondary x-axis for salinity (twiny shares the depth y-axis)
ax2 = ax1.twiny()
ax2.plot(df1['salinity'], df1['depth'], marker='x', color='red', label='Sal
ax2.set_xlabel('Salinity', color='red')
ax2.tick_params('x', colors='red')

plt.title('Temperature, Depth, and Salinity Relationship (Line Graph)')
plt.legend()
plt.show()
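
With twin axes, plt.legend() acts on the current axes only, so a single legend normally has to be assembled from the handles of both axes. Depth profiles are also often drawn with the surface at the top. The sketch below shows one way to do both on synthetic data. The exponential profile shapes, the variable names, and the Agg backend choice are assumptions made for illustration, not the notebook's measurements.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic profile (assumed shapes, not the notebook's data)
depth = np.linspace(5, 2000, 50)
temperature = 3 + 25 * np.exp(-depth / 300)
salinity = 34.8 + 1.5 * np.exp(-depth / 500)

fig, ax1 = plt.subplots(figsize=(10, 6))
line_t, = ax1.plot(temperature, depth, color="black", label="Temperature")
ax1.set_xlabel("Temperature")
ax1.set_ylabel("Depth")
ax1.invert_yaxis()  # put the surface (small depth) at the top

ax2 = ax1.twiny()  # second x-axis that shares the depth axis
line_s, = ax2.plot(salinity, depth, color="red", label="Salinity")
ax2.set_xlabel("Salinity", color="red")

# Pass handles from both axes so one legend lists both curves
ax2.legend(handles=[line_t, line_s], loc="lower right")
```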



In [16]: df1

Out[16]:        depth  temperature  salinity      density  ao_wmo_number  latitude  longitude  date
         1214     5.2       27.520    36.029  1023.327026        2900080     5.023     63.625  200… 12-… 04:38:…
         1740     5.3       27.904    36.040  1023.210999        2900080     4.910     63.375  200… 12-… 05:08:…
         1844     5.4       28.274    36.081  1023.119995        2900080     4.754     63.104  200… 12-… 05:41:…
         1110     5.5       27.749    36.145  1023.340027        2900080     5.139     63.844  200… 12-… 05:41:…
          333     6.5       28.999    36.061  1022.864014        2900164     5.286     59.843  200… 11-… 22:38:…
          ...     ...          ...       ...          ...            ...       ...        ...  ...
         1038  2007.5        2.735    34.792  1027.743042        2900164     5.685     59.218  200… 12-… 22:29:…
         1598  2007.6        2.997    34.800  1027.725952        2900167     6.857     62.363  200… 12-… 20:49:…
          686  2008.6        2.829    34.796  1027.738037        2900164     5.556     59.597  200… 11-… 22:31:…
          545  2009.9        2.903    34.802  1027.735962        2900168     9.313     60.849  200… 11-… 23:02:…
          967  2010.8        2.967    34.799  1027.728027        2900167     7.089     62.598  200… 12-… 20:55:…

2229 rows × 8 columns


In [17]:
# Assuming df1 is your DataFrame containing "salinity" and "temperature" co
plt.figure(figsize=(13, 9))

# Scatter plot with colors based on salinity and temperature
scatter = plt.scatter(df1["salinity"], df1["temperature"], s=65, c=df1["sal

plt.xlabel('Salinity', fontsize=25)
plt.ylabel('Temperature', fontsize=25)
plt.title('Salinity vs Temperature', fontsize=25)

# Adding colorbar to show the mapping of colors to salinity values
#cbar = plt.colorbar(scatter)
#cbar.set_label('Salinity', fontsize=20)

plt.show()


In [18]: df1

Out[18]:        depth  temperature  salinity      density  ao_wmo_number  latitude  longitude  date
         1214     5.2       27.520    36.029  1023.327026        2900080     5.023     63.625  200… 12-… 04:38:…
         1740     5.3       27.904    36.040  1023.210999        2900080     4.910     63.375  200… 12-… 05:08:…
         1844     5.4       28.274    36.081  1023.119995        2900080     4.754     63.104  200… 12-… 05:41:…
         1110     5.5       27.749    36.145  1023.340027        2900080     5.139     63.844  200… 12-… 05:41:…
          333     6.5       28.999    36.061  1022.864014        2900164     5.286     59.843  200… 11-… 22:38:…
          ...     ...          ...       ...          ...            ...       ...        ...  ...
         1038  2007.5        2.735    34.792  1027.743042        2900164     5.685     59.218  200… 12-… 22:29:…
         1598  2007.6        2.997    34.800  1027.725952        2900167     6.857     62.363  200… 12-… 20:49:…
          686  2008.6        2.829    34.796  1027.738037        2900164     5.556     59.597  200… 11-… 22:31:…
          545  2009.9        2.903    34.802  1027.735962        2900168     9.313     60.849  200… 11-… 23:02:…
          967  2010.8        2.967    34.799  1027.728027        2900167     7.089     62.598  200… 12-… 20:55:…

2229 rows × 8 columns


In [19]: import matplotlib.pyplot as plt


import seaborn as sns
from sklearn.linear_model import LinearRegression

# Assuming df1 is your DataFrame
X = df1[['temperature', 'depth']]
Y = df1['salinity']

# Create an instance of LinearRegression
lin_reg5 = LinearRegression()

# Fit the model
lin_reg5.fit(X, Y)

# Make predictions using the fitted model
predictions = lin_reg5.predict(X)

# Plotting the scatter plot
sns.set(font_scale=1)
plt.figure(figsize=(15, 15))

# Scatter plot
plt.scatter(Y, predictions, s=65, label='Actual vs. Predicted Salinity')

# Diagonal line for perfect fit
plt.plot([Y.min(), Y.max()], [Y.min(), Y.max()], '--', color='red', label='

plt.xlabel('Actual Salinity', fontsize=25)
plt.ylabel('Predicted Salinity', fontsize=25)
plt.title('Actual vs. Predicted Salinity in Multi-linear Regression', fonts
plt.legend()
plt.show()
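
The actual-versus-predicted plot shows the fit visually. The fitted coefficients and the R-squared score would put numbers on it. The sketch below illustrates this on synthetic data. The assumed linear relation, the noise level, and the name `model` are hypothetical and are not taken from the notebook's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-ins for temperature, depth and salinity (assumed relation)
rng = np.random.default_rng(0)
temperature = rng.uniform(2, 29, size=200)
depth = rng.uniform(5, 2000, size=200)
salinity = 34.5 + 0.06 * temperature - 0.0001 * depth + rng.normal(0, 0.02, size=200)

# Two predictors, one target: the same setup as the multi-linear cell
X = np.column_stack([temperature, depth])
model = LinearRegression().fit(X, salinity)
predictions = model.predict(X)

# coef_ holds one slope per predictor, in column order (temperature, depth)
score = r2_score(salinity, predictions)
print(model.coef_, model.intercept_, score)
```

Because the score here is computed on the same rows used for fitting, it describes goodness of fit rather than predictive performance on unseen data.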


In [20]: salt = df1.iloc[:, 2:3].values


salt

Out[20]: array([[36.029],
[36.04 ],
[36.081],
...,
[34.796],
[34.802],
[34.799]])

In [21]: temp = df1.iloc[:, 1:2].values


temp

Out[21]: array([[27.52 ],
[27.904],
[28.274],
...,
[ 2.829],
[ 2.903],
[ 2.967]])

In [22]: from sklearn.linear_model import LinearRegression

In [23]: lin_reg=LinearRegression()

In [24]: lin_reg=LinearRegression()
lin_reg.fit(temp,salt)

Out[24]: LinearRegression()

In [25]: sns.set(font_scale=1)
plt.figure(figsize=(15, 15))
plt.scatter(temp,salt,s=65)
plt.plot(temp,lin_reg.predict(temp), color='red', linewidth=2)
plt.xlabel('Temperature',fontsize=25)
plt.ylabel('Salinity',fontsize=25)
plt.title('salinity prediction using temperature',fontsize=25)
plt.show()


In [26]: import matplotlib.pyplot as plt


import seaborn as sns
from sklearn.linear_model import LinearRegression

# Assuming df1 is your DataFrame
X = df1[['temperature', 'depth']]
Y = df1['salinity']

# Create an instance of LinearRegression
lin_reg5 = LinearRegression()

# Fit the model
lin_reg5.fit(X, Y)

# Make predictions using the fitted model
predictions = lin_reg5.predict(X)

# Plotting the scatter plot
sns.set(font_scale=1)
plt.figure(figsize=(15, 15))

# Scatter plot
plt.scatter(Y, predictions, s=65, label='Actual vs. Predicted Salinity')

# Diagonal line for perfect fit
plt.plot([Y.min(), Y.max()], [Y.min(), Y.max()], '--', color='red', label='

plt.xlabel('Actual Salinity', fontsize=25)
plt.ylabel('Predicted Salinity', fontsize=25)
plt.title('Actual vs. Predicted Salinity in Multi-linear Regression', fonts
plt.legend()
plt.show()


In [27]: import operator

In [28]: plt.scatter(temp,salt, s=65)


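# Sort the (temperature, salinity) pairs by temperature so the line is drawn left to right
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(temp,salt), key=sort_axis)
X_test, y_pred = zip(*sorted_zip)
# Plot the sorted pairs (the unsorted arrays would zigzag between points)
plt.plot(X_test, y_pred, color='g')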
plt.show()
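
Sorting paired arrays before drawing a line can also be done with NumPy alone. The sketch below uses np.argsort on a few hypothetical values. The names `order`, `temp_sorted` and `salt_sorted` are introduced for illustration only.

```python
import numpy as np

# Hypothetical column vectors shaped like the notebook's temp and salt arrays
temp = np.array([[27.5], [2.8], [15.0], [9.1]])
salt = np.array([[36.0], [34.8], [35.4], [35.0]])

# argsort returns the row order that sorts temperature ascending;
# applying the same order to both arrays keeps each pair together
order = np.argsort(temp[:, 0])
temp_sorted = temp[order]
salt_sorted = salt[order]

print(temp_sorted.ravel(), salt_sorted.ravel())
```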


In [29]: from sklearn.preprocessing import PolynomialFeatures

In [30]: from sklearn.preprocessing import PolynomialFeatures

In [31]: pol = PolynomialFeatures(degree=3)
# Expand salinity into polynomial terms (1, s, s^2, s^3)
Slt_pol = pol.fit_transform(salt)
# Fit a linear model on the polynomial terms to predict temperature
lin_reg2 = LinearRegression()
lin_reg2.fit(Slt_pol, temp)

Out[31]: LinearRegression()

In [32]: Predict_Tmp_pol = lin_reg2.predict(pol.transform([[33]]))  # reuse the fitted transformer
Predict_Tmp_pol

Out[32]: array([[65.63395657]])

In [33]: pol = PolynomialFeatures(degree=3)
# Expand salinity into polynomial terms (1, s, s^2, s^3)
Slt_pol = pol.fit_transform(salt)
# Fit a linear model on the polynomial terms to predict temperature
lin_reg2 = LinearRegression()
lin_reg2.fit(Slt_pol, temp)

Out[33]: LinearRegression()

In [34]: Predict_Tmp_pol = lin_reg2.predict(pol.transform([[33]]))  # reuse the fitted transformer
Predict_Tmp_pol

Out[34]: array([[65.63395657]])
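
The salinity values visible in the outputs above lie roughly between 34.8 and 36.4, so a query of 33 appears to fall outside the fitted range, where a cubic polynomial can extrapolate poorly. One way to keep the transformer and the regressor together, and to check a query against the training range, is a scikit-learn pipeline. The sketch below uses synthetic data. The assumed relation and the names `poly_model`, `query` and `inside_range` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic salinity and temperature columns (assumed relation, not the notebook's data)
rng = np.random.default_rng(0)
salt = rng.uniform(34.8, 36.4, size=(200, 1))
temp = 3 + 15 * (salt - 34.8) + rng.normal(0, 0.5, size=(200, 1))

# The pipeline fits PolynomialFeatures once and reuses it at predict time
poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_model.fit(salt, temp)

# Check that the query lies inside the training range before trusting the prediction
query = np.array([[35.5]])
inside_range = bool(salt.min() <= query[0, 0] <= salt.max())
pred = poly_model.predict(query)

print(inside_range, pred.shape)
```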

In [35]: from sklearn.metrics import r2_score



# Assuming you have Polynomial Regression results stored in Tmp_head_pol
Tmp_head_pol = lin_reg2.predict(Slt_pol)

# Initialize degerlendirme as an empty dictionary
degerlendirme = {}

# Calculate R-squared score for Polynomial Regression
polynomial_r2_score = r2_score(temp, Tmp_head_pol)

# Update degerlendirme with the new R-squared score
degerlendirme["Polynomial Regression R_Square Score"] = polynomial_r2_score

# Print or use degerlendirme
print("Polynomial Regression R_Square Score:", degerlendirme["Polynomial Re

Polynomial Regression R_Square Score: 0.7337768474290758


In [36]: import numpy as np

In [37]: sns.set(font_scale=2.0)
plt.figure(figsize=(13, 9))
x_grid = np.arange(salt.min(), salt.max(), 0.1)  # scalar bounds for the plotting grid
x_grid = x_grid.reshape(-1,1)
plt.scatter(salt,temp,s=65)
plt.plot(x_grid,lin_reg2.predict(pol.fit_transform(x_grid)) , color='red',
plt.xlabel('Slt',fontsize=25)
plt.ylabel('Temp',fontsize=25)
plt.title('Temperature prediction by salinity values',fontsize=25)
plt.show()

In [38]: x=df.drop(['salinity'],axis=1)
y=df[['salinity']]

In [39]: x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)

In [40]: from sklearn.tree import DecisionTreeRegressor



dt_reg = DecisionTreeRegressor() # create DecisionTreeReg with sk
dt_reg.fit(x_train,y_train)

Out[40]: DecisionTreeRegressor()


In [41]: dt_predict = dt_reg.predict(x_train)

In [42]:

# Create Decision Tree Regressor and fit the model
tree_reg = DecisionTreeRegressor()
tree_reg.fit(temp, salt)

# Set seaborn font scale
sns.set(font_scale=2.0)

# Create a new figure
plt.figure(figsize=(13, 9))

# Create a grid for smoother plot
x_grid = np.arange(temp.min(), temp.max(), 0.1).reshape(-1, 1)

# Scatter plot
plt.scatter(temp, salt, s=65)

# Plot Decision Tree Regression line
plt.plot(x_grid, tree_reg.predict(x_grid), color='red', linewidth=5)

# Set labels and title
plt.xlabel('Temperature', fontsize=25)
plt.ylabel('Salinity', fontsize=25)
plt.title('Salinity Prediction based on Temperature (Decision Tree Regressi

# Show the plot
plt.show()


In [43]: rmse = np.sqrt(mean_squared_error(y_train,dt_predict))
r2 = r2_score(y_train,dt_predict)
# dt_predict holds predictions on x_train, so these are training-set scores
print("RMSE Score for Training set: " +"{:.2}".format(rmse))
print("R2 Score for Training set: " +"{:.2}".format(r2))

RMSE Score for Training set: 0.042
R2 Score for Training set: 0.99
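
The scores above are computed from predictions on x_train, so they describe how well the tree reproduces its own training data. An unpruned decision tree can memorise training rows, so a held-out evaluation is usually more informative. The sketch below compares the two on synthetic data. The assumed relation, the fixed random_state values, and the helper `report` are hypothetical and were introduced for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic temperature feature and salinity target (assumed relation)
rng = np.random.default_rng(0)
x = rng.uniform(2, 29, size=(500, 1))
y = 34.8 + 0.05 * x[:, 0] + rng.normal(0, 0.1, size=500)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
dt_reg = DecisionTreeRegressor(random_state=0).fit(x_train, y_train)

def report(name, features, target):
    """Return (name, RMSE, R2) for the fitted tree on the given split."""
    pred = dt_reg.predict(features)
    rmse = float(np.sqrt(mean_squared_error(target, pred)))
    return name, rmse, float(r2_score(target, pred))

train_scores = report("train", x_train, y_train)
test_scores = report("test", x_test, y_test)
print(train_scores)
print(test_scores)
```

A large gap between the two rows would point to overfitting. Limiting max_depth is one common way to narrow it.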

In [44]: from sklearn.ensemble import RandomForestRegressor



rf_reg = RandomForestRegressor(n_estimators=5, random_state=0)
rf_reg.fit(x_train,y_train)
rf_predict = rf_reg.predict(x_train)
#rf_predict.mean()

C:\Users\shiny\AppData\Local\Temp\ipykernel_22996\2504878549.py:4: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  rf_reg.fit(x_train,y_train)


In [45]:

# Create Random Forest Regressor and fit the model
forest_reg = RandomForestRegressor(n_estimators=100, random_state=42)
forest_reg.fit(temp, salt)

# Set seaborn font scale
sns.set(font_scale=2.0)

# Create a new figure
plt.figure(figsize=(13, 9))

# Create a grid for smoother plot
x_grid = np.arange(temp.min(), temp.max(), 0.1).reshape(-1, 1)

# Scatter plot
plt.scatter(temp, salt, s=65)

# Plot Random Forest Regression line
plt.plot(x_grid, forest_reg.predict(x_grid), color='red', linewidth=5)

# Set labels and title
plt.xlabel('Temperature', fontsize=25)
plt.ylabel('Salinity', fontsize=25)
plt.title('Salinity Prediction based on Temperature (Random Forest Regressi

# Show the plot
plt.show()

C:\Users\shiny\AppData\Local\Temp\ipykernel_22996\311321806.py:3: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  forest_reg.fit(temp, salt)


In [46]: rmse = np.sqrt(mean_squared_error(y_train,rf_predict))
r2 = r2_score(y_train,rf_predict)
# rf_predict holds predictions on x_train, so these are training-set scores
print("RMSE Score for Training set: " +"{:.2}".format(rmse))
print("R2 Score for Training set: " +"{:.2}".format(r2))

RMSE Score for Training set: 0.12
R2 Score for Training set: 0.92
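
The DataConversionWarning shown for the random forest cells is raised when the target is passed as a column vector. Passing a one-dimensional target should avoid it. The sketch below illustrates this on synthetic data, recording warnings to confirm that none mention a column vector. The frames `x` and `y`, the assumed relation, and the name `n_conversion_warnings` are hypothetical.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic single-feature data shaped like the notebook's x and y frames
rng = np.random.default_rng(0)
x = pd.DataFrame({"temperature": rng.uniform(2, 29, size=100)})
y = pd.DataFrame(
    {"salinity": 34.8 + 0.05 * x["temperature"] + rng.normal(0, 0.1, size=100)}
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rf = RandomForestRegressor(n_estimators=5, random_state=0)
    # Select the column as a 1-D array instead of passing the one-column frame
    rf.fit(x, y["salinity"].to_numpy())

# Count warnings that complain about a column-vector target
n_conversion_warnings = sum("column-vector" in str(w.message) for w in caught)
rf_predict = rf.predict(x)
print(n_conversion_warnings, rf_predict.shape)
```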
