Python computer science technology .pptx

Python + NLTK
Natural Language Processing

Brief History of Python
• Invented in the Netherlands in the early 1990s by Guido van Rossum
• Named after Monty Python
• Open sourced from the beginning
• Used by Google from the beginning
• Increasingly popular
• https://www.python.org/downloads/

Naming Rules
• Names are case sensitive and cannot start with a number. They can
contain letters, numbers, and underscores.
bob Bob _bob _2_bob_ bob_2 BoB
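• There are some reserved words that cannot be used as names. In Python 3
they include:
and, as, assert, break, class, continue, def, del, elif,
else, except, finally, for, from, global, if, import, in,
is, lambda, not, or, pass, raise, return, try, while, with, yield
• In Python 2, exec and print were also reserved; in Python 3 they are
ordinary built-in functions.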
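
The naming rules above can be checked from Python itself. This is a minimal sketch assuming Python 3; the helper name is_valid_name is hypothetical, while str.isidentifier() and keyword.iskeyword() are standard library calls.

```python
import keyword

def is_valid_name(name):
    """Return True if `name` can be used as a variable name (hypothetical helper)."""
    # isidentifier() checks the character rules; iskeyword() rejects reserved words.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["bob", "_2_bob_", "2bob", "class"]:
    print(candidate, is_valid_name(candidate))

# The full list of reserved words for the running interpreter:
print(keyword.kwlist)
```

Printing keyword.kwlist is the most reliable way to see the reserved words, since the list differs between Python versions.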

Assignment
• You can assign to multiple names at the same time
>>> x, y = 2, 3
>>> x
2
>>> y
3
This makes it easy to swap values
>>> x, y = y, x
• Assignments can be chained
>>> a = b = x = 2
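
One detail worth knowing about chained assignment: every name is bound to the same object. For an immutable value such as 2 this is invisible, but with a mutable value such as a list, a change made through one name shows up through the other. A short sketch (the variable names are illustrative):

```python
# Tuple unpacking swaps two names without a temporary variable.
x, y = 2, 3
x, y = y, x
print(x, y)    # 3 2

# Chained assignment binds both names to the SAME list object.
a = b = []
a.append(1)
print(b)       # [1]
print(a is b)  # True

# Separate lists need separate objects.
c, d = [], []
c.append(1)
print(d)       # []
```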

A Code Sample (in IDLE)
x = 10 - 5  # A comment.
y = "Hello"  # Another one.
z = 3.45
if z == 3.45 or y == "Hello":
    x = x + 1
    y = y + " World"  # String concatenation.
print(x)
print(y)

Array
• Python's built-in array-like type is the list
• array = [2, 54, 5, 7, 8, 9]
• list1 = list()  # empty list, same as []
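
Basic operations on a list like the one above can be sketched as follows. The variable names are illustrative. Note that Python calls this type a list; the separate array module is only needed for compact, typed numeric arrays.

```python
numbers = [2, 54, 5, 7, 8, 9]
print(numbers[0])    # first element: 2
print(numbers[-1])   # last element: 9
print(numbers[1:3])  # slice: [54, 5]
print(len(numbers))  # 6

empty = list()       # same as []
empty.append("token")
print(empty)         # ['token']
```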

NLTK
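• NLTK (Natural Language Toolkit) is a widely used Python library for NLP
(Natural Language Processing)
• Install it with: pip install nltk
• The tokenizer and stop word examples also need NLTK data packages,
downloaded once with nltk.download('punkt') and nltk.download('stopwords')
(newer NLTK releases ask for 'punkt_tab' instead of 'punkt')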

Tokenization
• Tokenization is the process of splitting a piece of text into
smaller units called tokens (for example, words or sentences).
• Sentence: "Never give up"
• 3 tokens: "Never", "give", "up"

from nltk import word_tokenize, sent_tokenize
sent = ("I will walk 500 miles and I would walk 500 more, just to be the "
        "man who walks a thousand miles to fall down at your door!")
print(word_tokenize(sent))  # list of word and punctuation tokens
print(sent_tokenize(sent))  # list of sentences (here, a single sentence)
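
To show what a tokenizer does without depending on NLTK's downloaded models, here is a rough regular-expression sketch. It is a simplified stand-in, not NLTK's algorithm: simple_word_tokenize and simple_sent_tokenize are hypothetical helpers, and they will mishandle cases such as abbreviations and contractions that word_tokenize and sent_tokenize are designed for.

```python
import re

def simple_word_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def simple_sent_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace (naive)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

print(simple_word_tokenize("Never give up"))
print(simple_word_tokenize("500 more, just to be the man!"))
print(simple_sent_tokenize("Never give up. Try again!"))
```

Punctuation marks come out as separate tokens, which is also how word_tokenize treats the comma and exclamation mark in the example sentence.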

Stop words
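• Stop words are words that are very common in text documents and
carry little meaning on their own
• Examples: a, an, the, you, your, etc.
• Print all English stop words with print(stopwords.words('english'))
(after from nltk.corpus import stopwords)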

Stop Word Removal
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
example_sent = "early symptoms of the coronavirus"
stop_words = set(stopwords.words('english'))  # a set gives fast membership tests
word_tokens = word_tokenize(example_sent)
# Keep only the tokens that are not stop words
filtered_sentence = [w for w in word_tokens if w not in stop_words]

Stop Word Removal
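# Equivalent to the list comprehension, written as an explicit loop
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)
print(word_tokens)        # ['early', 'symptoms', 'of', 'the', 'coronavirus']
print(filtered_sentence)  # ['early', 'symptoms', 'coronavirus']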
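
The same filtering idea can be sketched with only the standard library, so it runs without downloading the NLTK corpus. The small STOP_WORDS set here is a hand-picked assumption for illustration, not NLTK's list, and remove_stop_words is a hypothetical helper. Tokens are lowercased before the lookup because NLTK's English stop word list is lowercase, so "The" would otherwise slip through.

```python
import re

# Hand-picked sample for illustration; NLTK's English list is much longer.
STOP_WORDS = {"a", "an", "the", "of", "you", "your", "and", "to", "in", "is"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the word tokens of `text` that are not stop words."""
    tokens = re.findall(r"\w+", text)
    # Compare in lowercase, but keep each token's original spelling.
    return [t for t in tokens if t.lower() not in stop_words]

print(remove_stop_words("early symptoms of the coronavirus"))
print(remove_stop_words("The symptoms of a cold"))
```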

Stemming
• Stemming is the process of reducing morphological variants of a word
to a common root/base form (the stem).
• Stemming is used in information retrieval systems such as
search engines.
• It is also used to determine domain vocabularies in domain
analysis.
• Examples of words that stem to the root word "like":
-> "likes"
-> "liked"
-> "likely"
-> "liking"
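
To make the idea concrete, here is a deliberately crude suffix-stripping sketch. It is not the Porter algorithm that NLTK's PorterStemmer implements; the suffix list and the toy_stem helper are assumptions chosen only so that the "like" family collapses to one stem.

```python
# Suffixes to strip, longest first (an illustrative, incomplete list).
SUFFIXES = ("ing", "ely", "ed", "es", "ly", "s", "e")

def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least `min_stem_len` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem_len:
            return stem
    return word

for w in ["like", "likes", "liked", "likely", "liking"]:
    print(w, "->", toy_stem(w))
```

All five words map to the same stem, "lik", which shows that a stem need not be a dictionary word. For real text, NLTK's PorterStemmer (from nltk.stem import PorterStemmer) applies a far more careful rule set.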

More from Athar Baig
