3/3/25, 12:37 PM NLP_Lab_1.ipynb - Colab
keyboard_arrow_down Tokenization
import nltk
nltk.download('punkt_tab')
from nltk.tokenize import word_tokenize


def tokenize_text(text):
    """Split *text* into NLTK word tokens and return them as a list."""
    return word_tokenize(text)


# Example usage:
text = "This is an example sentence. Tokenization is important in NLP."
tokens = tokenize_text(text)
tokens
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data] Package punkt_tab is already up-to-date!
['This',
'is',
'an',
'example',
'sentence',
'.',
'Tokenization',
'is',
'important',
'in',
'NLP',
'.']
keyboard_arrow_down Stopwords removal
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

text = "This is a sample sentence, showing off stop word filtering."
words = word_tokenize(text)

# Build the stopword set ONCE. The original called stopwords.words('english')
# inside the comprehension, which re-loads the corpus wordlist for every
# token (O(tokens * stopwords)); a set gives one load plus O(1) membership.
stop_words = set(stopwords.words('english'))
filtered_text = [word for word in words if word.lower() not in stop_words]
print(filtered_text)  # Output: ['sample', 'sentence', ',', 'showing', 'stop', 'word', 'filtering', '.']
['sample', 'sentence', ',', 'showing', 'stop', 'word', 'filtering', '.']
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
https://colab.research.google.com/drive/1ZkyIk18BbWhzTjFZ_358EaB-3VBgogi9?authuser=3#scrollTo=eq6UqK-1k9j4&printMode=true 1/4
3/3/25, 12:37 PM NLP_Lab_1.ipynb - Colab
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
keyboard_arrow_down POS tagging
import nltk
from nltk.tokenize import word_tokenize  # explicit import so this cell is self-contained

nltk.download('averaged_perceptron_tagger_eng')

text = "The quick brown fox jumps over the lazy dog."
words = word_tokenize(text)
pos_tags = nltk.pos_tag(words)

# Keep only nouns (tags starting with 'NN': NN, NNS, NNP, NNPS).
# NOTE: the perceptron tagger tags 'brown' as a noun in this sentence,
# so the actual output is ['brown', 'fox', 'dog'] — the original comment
# claiming ['fox', 'dog'] did not match the real run.
filtered_text = [word for word, tag in pos_tags if tag.startswith('NN')]
print(filtered_text)  # Output: ['brown', 'fox', 'dog']
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data] /root/nltk_data...
[nltk_data] Unzipping taggers/averaged_perceptron_tagger_eng.zip.
['brown', 'fox', 'dog']
keyboard_arrow_down Stemming
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download('punkt')

# Porter stemmer: chops suffixes heuristically, so results need not be
# dictionary words (e.g. 'easily' -> 'easili', 'adventure' -> 'adventur').
ps = PorterStemmer()

# Example sentence to stem.
text = "Running runners run easily and are loving the adventure."

# Tokenize, then stem each token.
words = word_tokenize(text)
stemmed_words = [ps.stem(w) for w in words]

print("Stemmed Words:", stemmed_words)
Stemmed Words: ['run', 'runner', 'run', 'easili', 'and', 'are', 'love', 'the', 'adventur', '.']
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
keyboard_arrow_down lemmatization
https://colab.research.google.com/drive/1ZkyIk18BbWhzTjFZ_358EaB-3VBgogi9?authuser=3#scrollTo=eq6UqK-1k9j4&printMode=true 2/4
3/3/25, 12:37 PM NLP_Lab_1.ipynb - Colab
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Resources needed by the tokenizer and the WordNet lemmatizer.
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('omw-1.4')

lemmatizer = WordNetLemmatizer()

# Sample text with irregular plurals and verb forms.
text = "The leaves are falling from the trees and the wolves are howling."
words = word_tokenize(text)

# With no POS argument, lemmatize() treats each word as a noun: plurals
# like 'leaves'/'trees'/'wolves' are reduced, while verb forms such as
# 'falling' and 'are' pass through unchanged (visible in the cell output).
lemmatized_words = [lemmatizer.lemmatize(w) for w in words]

print("Lemmatized Words:", lemmatized_words)
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
Lemmatized Words: ['The', 'leaf', 'are', 'falling', 'from', 'the', 'tree', 'and', 'the', 'wolf', 'are', 'howling', '.']
keyboard_arrow_down word frequency count
from collections import Counter
import re

# Sample text (the PDF extraction truncated this literal; the closing
# quote is restored here so the cell is valid Python again).
text = "Natural Language Processing is amazing! NLP is a subset of AI, and AI is the future."

# Preprocessing: lowercase, then strip every character that is neither a
# word character nor whitespace (removes '!', ',' and '.').
text = re.sub(r'[^\w\s]', '', text.lower())

# Tokenize on whitespace and count occurrences of each word.
words = text.split()
word_counts = Counter(words)

print("Word Frequency:", word_counts)
Word Frequency: Counter({'is': 3, 'ai': 2, 'natural': 1, 'language': 1, 'processing': 1, 'amazing': 1, 'nlp': 1, 'a': 1, 'subset': 1, 'of': 1, 'and': 1, 'the': 1, 'future': 1})
https://colab.research.google.com/drive/1ZkyIk18BbWhzTjFZ_358EaB-3VBgogi9?authuser=3#scrollTo=eq6UqK-1k9j4&printMode=true 3/4
3/3/25, 12:37 PM NLP_Lab_1.ipynb - Colab
https://colab.research.google.com/drive/1ZkyIk18BbWhzTjFZ_358EaB-3VBgogi9?authuser=3#scrollTo=eq6UqK-1k9j4&printMode=true 4/4