BERT Tokenizer

The document outlines a process for importing libraries and loading a pretrained BERT model to generate embeddings for a sample dataset of product reviews. It includes a function to obtain BERT embeddings and demonstrates how to compute and print these embeddings for specific phrases. The embeddings illustrate how the word 'love' is represented in different contexts within the reviews.



March 9, 2025

0.1 Importing Required Libraries


[5]: from transformers import BertTokenizer, BertModel
import torch

0.2 Sample Dataset


[6]: reviews = [
         "I love this product amazing quality",
         "Terrible product poor quality",
         "I love the amazing service"
     ]

0.3 Importing and Loading Pretrained BERT Model


[7]: tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
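To see what the tokenizer actually produces before the model runs, here is a minimal sketch (the encoded variable is illustrative, not part of the original notebook):

encoded = tokenizer(reviews[0], return_tensors="pt")
# Subword token IDs, with [CLS] and [SEP] added automatically
print(encoded["input_ids"])
# Expected tokens for bert-base-uncased (every word here is in-vocabulary):
# ['[CLS]', 'i', 'love', 'this', 'product', 'amazing', 'quality', '[SEP]']
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist()))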

0.4 Function to Get BERT Embeddings


[8]: def get_bert_embedding(text):
         inputs = tokenizer(text, return_tensors="pt", padding=True,
                            truncation=True, max_length=512)
         with torch.no_grad():
             outputs = model(**inputs)
         # Use the hidden state of the [CLS] token (position 0) as the embedding
         return outputs.last_hidden_state[:, 0, :].squeeze().numpy()
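The function takes the hidden state of the [CLS] token (position 0) as a sentence-level embedding. A common alternative, sketched below under the same tokenizer/model setup (it is not part of the original notebook), is to mean-pool all token embeddings using the attention mask:

def get_mean_pooled_embedding(text):
    # Illustrative alternative: average all token embeddings, weighted by
    # the attention mask, instead of taking only the [CLS] position.
    inputs = tokenizer(text, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    mask = inputs["attention_mask"].unsqueeze(-1)           # (1, seq_len, 1)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)  # (1, 768)
    return (summed / mask.sum(dim=1)).squeeze().numpy()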

0.5 Generating BERT Embeddings for Reviews


[9]: bert_embeddings = [get_bert_embedding(review) for review in reviews]
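A quick sanity check on the result (the shape follows from bert-base-uncased having a hidden size of 768):

print(len(bert_embeddings))      # 3, one embedding per review
print(bert_embeddings[0].shape)  # (768,)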

0.6 BERT Embedding Output
[10]: print("\nBERT Embedding Review 1 (first 3 dims):", bert_embeddings[0][:3], "...
↪")

print("BERT Embedding 'love' in context (Review 1):", get_bert_embedding("I␣


↪love this")[:3], "...")

print("BERT Embedding 'love' in context (Review 3):", get_bert_embedding("I␣


↪love the")[:3], "...")

BERT Embedding Review 1 (first 3 dims): [0.19232687 0.19475281 0.09595714] ...
BERT Embedding 'love' in context (Review 1): [-0.07139711 0.1853717 0.0144626] ...
BERT Embedding 'love' in context (Review 3): [ 0.08918754 0.08665477 -0.01285441] ...
This cell computes contextual embeddings for “I love this” and “I love the” to show how BERT's representation of “love” shifts with the surrounding context.
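Note that get_bert_embedding returns the [CLS] vector for the whole phrase, not the vector for the “love” token itself. If the goal is the token-level representation, a sketch such as the following could be used (get_token_embedding is a hypothetical helper, not part of the original notebook):

def get_token_embedding(text, target_token):
    # Return the hidden state of the first occurrence of target_token.
    inputs = tokenizer(text, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    with torch.no_grad():
        outputs = model(**inputs)
    idx = tokens.index(target_token)  # raises ValueError if the token is absent
    return outputs.last_hidden_state[0, idx, :].numpy()

print(get_token_embedding("I love this", "love")[:3], "...")
print(get_token_embedding("I love the", "love")[:3], "...")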
