Assignment3 Part2

The project analyzes three datasets to build a machine learning model using BERT for classifying patient notes, achieving a mean F1 score of 78%. The model's performance is evaluated based on accuracy and F1 score due to the imbalanced nature of the data, with accuracy reaching 99% but deemed unreliable. The findings highlight the complexity of the data and the challenges faced in model prediction, emphasizing the importance of F1 score for assessment.


2. Present the final results, followed by an explanation of the findings. Discuss how the results and findings address the research problem statement of your project. (If your project has an application development aspect, discuss the functionalities of the application with illustrative examples in this section.)

Three datasets are considered:

1. The feature dataset — shape: (143, 3)
2. The patient_notes dataset — shape: (42146, 3)
3. A sample of the train.csv data — shape: (14300, 6)

The three datasets above are merged into a single data frame (merged_df).
The merged_df data is shown below:
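The merge step can be sketched with pandas. The join keys and column names below (case_num, pn_num, feature_num, etc.) are assumptions inferred from the dataset shapes, and tiny toy frames stand in for the real files:

```python
import pandas as pd

# Toy stand-ins for the three datasets; the column names are assumptions.
features = pd.DataFrame({"feature_num": [0, 1], "case_num": [0, 0],
                         "feature_text": ["Chest-pain", "Family-history"]})
patient_notes = pd.DataFrame({"pn_num": [10, 11], "case_num": [0, 0],
                              "pn_history": ["note text one", "note text two"]})
train = pd.DataFrame({"id": ["a", "b"], "case_num": [0, 0],
                      "pn_num": [10, 11], "feature_num": [0, 1],
                      "annotation": ["[]", "[]"], "location": ["[]", "[]"]})

# Attach the note text, then the feature description, to each train row.
merged_df = (train
             .merge(patient_notes, on=["pn_num", "case_num"], how="left")
             .merge(features, on=["feature_num", "case_num"], how="left"))
print(merged_df.shape)  # one row per train row, now with pn_history and feature_text
```

Left joins keep every train row even if a note or feature description were missing.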

Visualizing:

Based on the patient notes, the cases are categorized; the result shows that most of the patients belong to case_num = 3.
The figure below displays a count plot of the most frequent case numbers. In this graph, case number 5 has the highest count, indicating that most patients are related to that particular case.

Next is a statistical presentation of the most commonly used words in the patient notes.
Observing these words, they are stop words: common English words that appear in nearly every patient note and contribute nothing to model prediction. Hence, we removed these stop words.
After removing the stop words, the table below shows the most common words in the patient notes.

Basic wordcloud plot of the common words in patient notes after removing stop words:

Although there are useful words such as 'pain' and 'epigastric', stop words still remain in the patient notes, even after removing them with the following two modules:

from wordcloud import WordCloud, STOPWORDS
from nltk.corpus import stopwords

Hence, the data is very complex and many irregularities were found, which makes prediction hard for the model. This is one of the drawbacks: the imported stop-word lists do not always cope with complex raw sentences.
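A minimal sketch of the stop-word removal step, using a small hand-rolled stop-word set in place of the combined WordCloud and NLTK lists:

```python
import re

# Small hand-rolled stop-word set standing in for the combined
# WordCloud STOPWORDS and nltk.corpus.stopwords lists used in the project.
STOP_WORDS = {"the", "a", "an", "is", "was", "of", "with", "no", "and", "to", "in"}

def remove_stop_words(text: str) -> list[str]:
    """Lowercase, tokenize on word characters, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("The patient is a 45yo male with epigastric pain, no nausea."))
# ['patient', '45yo', 'male', 'epigastric', 'pain', 'nausea']
```

As noted above, fixed lists like these miss misspellings and run-on tokens common in raw clinical notes, which is why stop words survive the cleaning pass.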
Tokenization techniques:

Tokenization is the process of breaking a raw sentence into smaller units referred to as tokens. This process is central to developing an NLP model: insights are drawn from the text by analyzing the series of tokens. Each token is assigned a numeric value so that it can be consumed by the model.

input_ids are the vocabulary indices of the tokens in the sentence.
attention_mask indicates whether a token should be attended to (1) or ignored as padding (0).
token_type_ids indicates which sequence a token belongs to when there are multiple sequences.
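These three outputs can be illustrated with a toy encoder. This is not the real BERT WordPiece tokenizer; the vocabulary and ids below are made up for the sketch:

```python
# Toy vocabulary; real BERT uses a ~30k-entry WordPiece vocabulary.
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, "patient": 5, "reports": 6,
         "epigastric": 7, "pain": 8}

def encode(sentence: str, max_len: int = 8):
    tokens = ["[CLS]"] + sentence.lower().split() + ["[SEP]"]
    input_ids = [vocab.get(t, 0) for t in tokens]
    attention_mask = [1] * len(input_ids)   # 1 = real token
    token_type_ids = [0] * len(input_ids)   # single sequence -> all zeros
    pad = max_len - len(input_ids)
    return {"input_ids": input_ids + [0] * pad,
            "attention_mask": attention_mask + [0] * pad,  # 0 = padding, ignored
            "token_type_ids": token_type_ids + [0] * pad}

enc = encode("patient reports epigastric pain")
print(enc["input_ids"])       # [101, 5, 6, 7, 8, 102, 0, 0]
print(enc["attention_mask"])  # [1, 1, 1, 1, 1, 1, 0, 0]
```

In the project itself, these tensors come from the bert-base-uncased tokenizer rather than a hand-built function like this.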

Model building:

PyTorch, an ML framework, is used because it provides various modules that support building NLP models. Hence, we imported the modules below for our use case. After tokenization, we used the BERT technique to get the final outcome.

import torch.nn as nn
from torch import optim
from torch.utils.data import DataLoader
from torch.utils.data import Dataset

The architecture of the model is defined as below:


Model name: bert-base-uncased
Hyperparameters:

Dropout: 0.5
Learning rate: 1e-5
Optimizer: AdamW
Building block: Linear

Loss function used: BCEWithLogitsLoss


Layers:
outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask,
                    token_type_ids=token_type_ids)   # BERT encoder outputs
logits = self.fc1(outputs[0])                        # hidden states -> first linear layer
logits = self.fc2(self.dropout(logits))
logits = self.fc3(self.dropout(logits)).squeeze(-1)  # one logit per token
No. of epochs: 3
Batch Size: 10
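A runnable sketch of this architecture follows. The BERT encoder is replaced by a stub embedding so the snippet runs without downloading weights (in the real model, self.bert is the pretrained bert-base-uncased encoder), and the hidden sizes (768 → 512 → 300 → 1) are assumptions inferred from the tuned variant shown later:

```python
import torch
import torch.nn as nn

class NoteClassifier(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        # Stub for the pretrained BERT encoder; layer sizes are assumptions.
        self.bert = nn.Embedding(30522, hidden)
        self.fc1 = nn.Linear(hidden, 512)
        self.fc2 = nn.Linear(512, 300)
        self.fc3 = nn.Linear(300, 1)
        self.dropout = nn.Dropout(0.5)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        hidden_states = self.bert(input_ids)             # (batch, seq_len, hidden)
        logits = self.fc1(hidden_states)
        logits = self.fc2(self.dropout(logits))
        logits = self.fc3(self.dropout(logits)).squeeze(-1)  # one logit per token
        return logits

model = NoteClassifier()
ids = torch.randint(0, 30522, (2, 16))                   # batch of 2, seq length 16
logits = model(ids)
print(logits.shape)  # torch.Size([2, 16])
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros_like(logits))
```

Training pairs this with torch.optim.AdamW(model.parameters(), lr=1e-5) and BCEWithLogitsLoss, matching the hyperparameters listed above.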

With this setup, the model has been trained and the metrics below were calculated.
Time taken to build the model: 95 minutes.

Mean F1 score: 75.9%

Performance Tuning:

1. Added one more linear layer with input size 300, output size 150

self.fc1 = nn.Linear(768, 512)


self.fc2 = nn.Linear(512, 300)
self.fc3 = nn.Linear(300, 150)
self.fc4 = nn.Linear(150, 1)

Observed values are as below:


Mean F1 score: 73.2%

2. Changed the dropout value to 0.05

Results are as below:

Mean of the metrics:

{'Accuracy': 0.9934958112237656,
'f1': 0.7842475466560781,
'precision': 0.7697296996353311,
'recall': 0.7993235625704622}

Mean F1 score: 78%

Best model:
With the second performance-tuning change, i.e. setting the dropout to 0.05, we obtain the best F1 score of 78%, the highest among all the runs.

Below are the graph plots of the metric values observed in each epoch.

Observations: Based on the above graphs, the accuracy is consistently 99%. In classification models, achieving 99% accuracy is not always to be expected; whether it is meaningful depends on the data being considered.
In this classification problem, accuracy is defined as:

Accuracy = Number of correctly classified cases / Total cases

Here, due to the imbalanced data, the model counts almost every case as correctly classified, which inflates this metric.
Reason for considering the F1 score:
The F1 metric is considered because it is far less affected by imbalanced data. The dataset is highly imbalanced, and with this imbalance it is hard for the model to predict as expected. The F1 score reflects how the positive class is actually distributed in the data, through its precision and recall components, so it gives a better assessment of model performance.

Therefore, the mean F1 score is used to assess the model's predictions, thus addressing our problem statement.
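The effect can be seen on a toy imbalanced example, where a near-majority-class predictor scores high accuracy but a poor F1:

```python
# 95 negatives, 5 positives; the model predicts the majority class
# everywhere except for one true positive.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1] + [0] * 4

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy)  # 0.96 -> looks excellent
print(f1)        # ~0.33 -> exposes the weak minority-class performance
```

Accuracy barely notices the four missed positives, while F1, driven by a recall of only 0.2, drops sharply, which is exactly why it is the metric reported here.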

Since the data is huge and complex (raw text), the model takes up to 95 minutes for each run. This points to two of the Vs of Big Data: Volume and Variety.
