1.
Write a Python program to perform the following tasks on text: a)
Tokenization b) Stop-word removal
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
def preprocess_text(text):
# Tokenization
tokens = word_tokenize(text)
# Removing stop words
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
return filtered_tokens
def main():
text = "NLTK is a leading platform for building Python programs to work with
human language data."
preprocessed_text = preprocess_text(text)
print("Original Text:")
print(text)
print("\nTokenized Text:")
print(preprocessed_text)
if __name__ == "__main__":
main()
Output:-
Original Text:
NLTK is a leading platform for building Python programs to work with
human language data.
Tokenized Text:
['NLTK', 'leading', 'platform', 'building', 'Python', 'programs', 'work', 'human',
'language', 'data', '.']
2. Write a Python program to implement the Porter stemmer algorithm
for stemming
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
def preprocess_text(text):
# Tokenization
tokens = word_tokenize(text)
# Removing stop words
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in
stop_words]
return filtered_tokens
def apply_stemming(tokens):
porter = PorterStemmer()
stemmed_tokens = [porter.stem(token) for token in tokens]
return stemmed_tokens
def main():
text = "NLTK is a leading platform for building Python programs to
work with human language data."
preprocessed_text = preprocess_text(text)
stemmed_text = apply_stemming(preprocessed_text)
print("Original Text:")
print(text)
print("\nTokenized Text:")
print(preprocessed_text)
print("\nStemmed Text:")
print(stemmed_text)
if __name__ == "__main__":
main()
Output:-
Original Text:
NLTK is a leading platform for building Python programs to work with human
language data.
Tokenized Text:
['NLTK', 'leading', 'platform', 'building', 'Python', 'programs', 'work', 'human',
'language', 'data', '.']
Stemmed Text:
['nltk', 'lead', 'platform', 'build', 'python', 'program', 'work', 'human', 'languag', 'data',
'.']
3. Write a Python program for a) Word analysis b) Word generation,
with output.
import nltk
from nltk.corpus import brown
def word_analysis():
# Load the Brown corpus
nltk.download('brown')
words = brown.words()
# Calculate word frequency
freq_dist = nltk.FreqDist(words)
# Print 10 most common words
print("10 Most Common Words:")
print(freq_dist.most_common(10))
def word_generation():
# Load the Brown corpus
nltk.download('brown')
words = brown.words()
# Generate words using bigrams
bigrams = nltk.bigrams(words)
word_dict = {}
for w1, w2 in bigrams:
if w1 not in word_dict:
word_dict[w1] = []
word_dict[w1].append(w2)
# Generate a sentence
import random
sentence = []
current_word = random.choice(list(word_dict.keys()))
sentence.append(current_word)
for _ in range(10):
next_word = random.choice(word_dict[current_word])
sentence.append(next_word)
current_word = next_word
# Print the generated sentence
print("\nGenerated Sentence:")
print(' '.join(sentence))
def main():
print("Word Analysis:")
word_analysis()
print("\nWord Generation:")
word_generation()
if __name__ == "__main__":
main()
Output:-
Word Analysis:
10 Most Common Words:
[('the', 62713), (',', 58334), ('.', 49346), ('of', 36080), ('and', 27915), ('to', 25732), ('a',
21881), ('in', 19536), ('that', 10237), ('is', 10011)]
Word Generation:
Generated Sentence:
combination of radiologist in their own issues for financing their ability to create a
different thing . And in contrast to learn to the games where you have been
4. Create a sample list of at least 5 words with ambiguous senses and
write a Python program to implement WSD (word sense disambiguation)
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize
def wsd(sample_sentences):
for sentence in sample_sentences:
words = word_tokenize(sentence)
for word in words:
synset = lesk(words, word)
if synset is not None:
print("Word:", word)
print("Definition:", synset.definition())
print("Example:", synset.examples())
print("-------------------------------------------------")
def main():
sample_sentences = [
"The bank can guarantee deposits will eventually cover future tuition costs
because it invests in adjustable-rate mortgage securities.",
"I went to the bank to deposit my money.",
"The bark of the tree was rough.",
"I heard a loud bark from the dog.",
"I need to address the issue with the address provided."
]
wsd(sample_sentences)
if __name__ == "__main__":
main()
Output:-
Word: bank
Definition: a financial institution that accepts deposits and channels the
money into lending activities
Example: ['he cashed a check at the bank', 'that bank holds the mortgage
on my home']
-------------------------------------------------
Word: bank
Definition: a financial institution where money is kept for saving or
commercial purposes or is invested, supplied for loans, or exchanged.
Example: ['he cashed a check at the bank', 'that bank holds the mortgage
on my home']
-------------------------------------------------
Word: bark
Definition: the sound made by a dog
Example: ['the dog's barking kept me awake all night']
-------------------------------------------------
Word: bark
Definition: tough protective covering of the woody stems and roots of trees
and other woody plants
Example: ['it was stripped of bark']
-------------------------------------------------
Word: address
Definition: the place where a person or organization can be found or
communicated with
Example: ['he didn't leave an address', 'my address is 123 Main Street']
-------------------------------------------------
Word: address
Definition: give a speech to
Example: ['The chairman addressed the board of trustees']
-------------------------------------------------
5. Install the NLTK toolkit and perform stemming
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
# Sample text
text = "It is important to be very pythonly while you are
pythoning with python. All pythoners have pythoned poorly at
least once."
# Tokenize the text
words = word_tokenize(text)
# Create a PorterStemmer object
porter = PorterStemmer()
# Stem each word in the text
stemmed_words = [porter.stem(word) for word in words]
# Print the stemmed words
print("Original text:")
print(text)
print("\nStemmed text:")
print(" ".join(stemmed_words))
Output:-
Original text:
It is important to be very pythonly while you are pythoning with
python. All pythoners have pythoned poorly at least once.
Stemmed text:
It is import to be veri pythonli while you are python with python .
all python have python poorli at least onc .
6. Create a sample list of at least 10 words, perform POS tagging, and
find the POS for any given word
import nltk
# Sample list of words
sample_words = ["Python", "Programming", "Language", "is", "widely", "used",
"for", "developing", "various", "applications"]
# Perform POS tagging
pos_tags = nltk.pos_tag(sample_words)
# Function to find POS for a given word
def find_pos(word):
for w, pos in pos_tags:
if w.lower() == word.lower():
return pos
return "POS not found"
# Test the function with a given word
given_word = "Python"
pos = find_pos(given_word)
print(f"POS tag for '{given_word}': {pos}")
Output:-
POS tag for 'Python': NN
7. Write a Python program to
a) Perform morphological analysis using the NLTK library
b) Generate n-grams using NLTK's n-grams utility
c) Implement n-gram smoothing, with output
import nltk
from nltk.util import ngrams
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from collections import Counter
import math
def morphological_analysis(text):
# Tokenize the text
tokens = nltk.word_tokenize(text)
# Remove stopwords
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
# Perform lemmatization
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(token) for token in filtered_tokens]
return lemmas
def generate_ngrams(text, n):
# Tokenize the text
tokens = nltk.word_tokenize(text)
# Generate n-grams
n_grams = list(ngrams(tokens, n))
return n_grams
def calculate_ngram_smoothing(n_grams):
# Count occurrences of n-grams
n_gram_counts = Counter(n_grams)
# Calculate probabilities with Laplace smoothing
n_gram_probabilities = {}
for n_gram in n_gram_counts:
context = n_gram[:-1]
context_count = sum(1 for ng in n_grams if ng[:-1] == context)
probability = (n_gram_counts[n_gram] + 1) / (context_count + len(n_gram_counts))
n_gram_probabilities[n_gram] = probability
return n_gram_probabilities
def main():
text = "The quick brown fox jumps over the lazy dog."
print("Original Text:", text)
# a) Morphological Analysis
morph_analysis_result = morphological_analysis(text)
print("\nMorphological Analysis:", morph_analysis_result)
# b) Generate n-grams
n=3
n_grams = generate_ngrams(text, n)
print("\n{}-grams:".format(n), n_grams)
# c) N-Grams Smoothing
n_gram_probabilities = calculate_ngram_smoothing(n_grams)
print("\nN-Gram Probabilities (with Laplace smoothing):", n_gram_probabilities)
if __name__ == "__main__":
main()
Output:-
Original Text: The quick brown fox jumps over the lazy dog.
Morphological Analysis: ['The', 'quick', 'brown', 'fox', 'jump', 'lazy', 'dog', '.']
3-grams: [('The', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps'),
('fox', 'jumps', 'lazy'), ('jumps', 'lazy', 'dog'), ('lazy', 'dog', '.')]
N-Gram Probabilities (with Laplace smoothing): {('The', 'quick', 'brown'):
0.16666666666666666, ('quick', 'brown', 'fox'): 0.16666666666666666, ('brown',
'fox', 'jumps'): 0.16666666666666666, ('fox', 'jumps', 'lazy'): 0.16666666666666666,
('jumps', 'lazy', 'dog'): 0.16666666666666666, ('lazy', 'dog', '.'):
0.16666666666666666}
8. Using the SpeechRecognition and pyttsx3 packages, convert an audio
file to text and text to an audio file.
import speech_recognition as sr
import pyttsx3
def audio_to_text(audio_file):
# Initialize the recognizer
recognizer = sr.Recognizer()
# Load the audio file
with sr.AudioFile(audio_file) as source:
audio_data = recognizer.record(source)
# Convert audio to text
try:
text = recognizer.recognize_google(audio_data)
return text
except sr.UnknownValueError:
return "Speech Recognition could not understand audio"
except sr.RequestError as e:
return f"Could not request results from Speech Recognition service; {e}"
def text_to_audio(text, output_file):
# Initialize the Text-to-Speech engine
engine = pyttsx3.init()
# Save the text to an audio file
engine.save_to_file(text, output_file)
engine.runAndWait()
if __name__ == "__main__":
# Audio file to text
audio_file = "audio_sample.wav"
text = audio_to_text(audio_file)
print("Text from audio:", text)
# Text to audio
output_file = "output_audio.wav"
text_to_audio(text, output_file)
print("Text converted to audio")
Output:-
Text from audio: hello how are you
Text converted to audio