Python computer science technology .pptx
Python + NLTK
Natural Language Processing
Brief History of Python
• Invented in the Netherlands, early 90s by Guido van Rossum
• Named after Monty Python
• Open sourced from the beginning
• Used by Google from the beginning
• Increasingly popular
• https://www.python.org/downloads/
Naming Rules
• Names are case sensitive and cannot start with a number. They can
contain letters, numbers, and underscores.
bob Bob _bob _2_bob_ bob_2 BoB
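• There are some reserved words. The list below is from Python 2; in Python 3,
print and exec are functions rather than reserved words, and True, False, None,
as, with, and yield are reserved as well:
and, assert, break, class, continue, def, del, elif,
else, except, exec, finally, for, from, global, if,
import, in, is, lambda, not, or, pass, print, raise,
return, try, while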
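The naming rules above can be checked from Python itself. This is a minimal sketch assuming Python 3; the helper name is_usable_name and the candidate list are made up for illustration, while str.isidentifier and keyword.iskeyword come from the standard library.

```python
import keyword

# Candidate names; "2bob" starts with a digit and "class" is reserved.
candidates = ["bob", "_2_bob_", "bob_2", "2bob", "class", "print"]

def is_usable_name(name):
    """True if name has valid identifier syntax and is not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)

for name in candidates:
    print(name, is_usable_name(name))

# The full reserved-word list for the running interpreter.
print(keyword.kwlist)
```

Under Python 3, "print" counts as usable because it is an ordinary function name there, although rebinding it is rarely a good idea.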
Assignment
• You can assign to multiple names at the same time
>>> x, y = 2, 3
>>> x
2
>>> y
3
• This makes it easy to swap values
>>> x, y = y, x
• Assignments can be chained
>>> a = b = x = 2
Conditionals: if, elif, else
A Code Sample (in IDLE)
x = 10 - 5 # A comment.
y = "Hello" # Another one.
z = 3.45
if z == 3.45 or y == "Hello":
    x = x + 1
    y = y + " World" # String concat.
print(x)
print(y)
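The sample above uses only a bare if. A small sketch of the full if / elif / else chain follows; the function name describe and the test values are illustrative and not taken from the slides.

```python
def describe(z):
    """Classify a number using an if / elif / else chain."""
    if z < 0:
        return "negative"
    elif z == 0:
        return "zero"
    else:
        return "positive"

for value in (-2, 0, 3.45):
    print(value, describe(value))
```

Only the first branch whose condition is true runs; else catches everything that is left.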
Array (Python list)
• array = [2, 54, 5, 7, 8, 9]
• list1 = list() #empty list
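What the slide calls an array is Python's built-in list. A few common operations are sketched below; the variable names are illustrative.

```python
numbers = [2, 54, 5, 7, 8, 9]

numbers.append(10)         # add one item to the end
first = numbers[0]         # indexing starts at 0
last_two = numbers[-2:]    # negative indexes count from the end
ordered = sorted(numbers)  # returns a new sorted list; numbers is unchanged

print(first, last_two, ordered, len(numbers))
```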
NLTK
• NLTK (Natural Language Toolkit) is a widely used library for NLP
(Natural Language Processing) in Python
• Install it with: pip install nltk
Tokenization
• Tokenization is a way of separating a piece of text into
smaller units called tokens.
• Example sentence: "Never give up"
• 3 tokens: "Never", "give", "up"
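# The tokenizers need NLTK's Punkt data, downloaded once with
# nltk.download('punkt') (some recent NLTK versions ask for 'punkt_tab').
from nltk import word_tokenize, sent_tokenize
sent = "I will walk 500 miles and I would walk 500 more, just to be the man who walks a thousand miles to fall down at your door!"
print(word_tokenize(sent))
print(sent_tokenize(sent))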
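To see what tokenization does without installing anything, here is a rough standard-library sketch. It is not NLTK's algorithm: the function names and the regular expressions are simplifying assumptions, and real tokenizers handle abbreviations, contractions, and other edge cases far better.

```python
import re

def simple_word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)

def simple_sent_tokenize(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

text = "Never give up. Try again!"
words = simple_word_tokenize(text)
sentences = simple_sent_tokenize(text)
print(words)      # ['Never', 'give', 'up', '.', 'Try', 'again', '!']
print(sentences)  # ['Never give up.', 'Try again!']
```

Like word_tokenize, this sketch emits punctuation marks as separate tokens.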
Stop words
• Stop words are words that occur very frequently in text
documents
• Examples: a, an, the, you, your, etc.
• To print all English stop words, import stopwords from nltk.corpus
and call print(stopwords.words('english'))
Stop Word Removal
# The stop-word lists are downloaded once with nltk.download('stopwords').
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
example_sent = """early symptoms of the coronavirus"""
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)
filtered_sentence = [w for w in word_tokens if w not in stop_words]
Stop Word Removal (equivalent loop)
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)
print(word_tokens)
print(filtered_sentence)
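The filtering step can be sketched without NLTK. The tiny STOP_WORDS set below is a hypothetical stand-in for NLTK's much longer English list, and remove_stop_words is an illustrative name. The sketch also lower-cases each token before the lookup, since a stop-word list stored in lower case would otherwise miss a capitalized "The".

```python
import re

# Hypothetical mini stop-word set for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "you", "your", "and", "to", "in", "is"}

def remove_stop_words(text):
    """Return the word tokens of text that are not stop words."""
    tokens = re.findall(r"[A-Za-z']+", text)
    # Compare in lower case so "The" and "the" are treated alike.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words("The early symptoms of the coronavirus"))
# ['early', 'symptoms', 'coronavirus']
```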
Stemming
• Stemming is the process of reducing morphological
variants of a word to a common root/base form.
• Stemming is used in information retrieval systems like
search engines.
• It is used to determine domain vocabularies in domain
analysis.
• Examples of words that stem to the root word "like":
-> "likes"
-> "liked"
-> "likely"
-> "liking"
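The idea behind stemming can be shown with a toy suffix stripper. This is not the Porter algorithm or any NLTK stemmer (NLTK provides real ones such as nltk.stem.PorterStemmer); the suffix list, the minimum stem length, and the name toy_stem are assumptions made for illustration.

```python
# Suffixes to strip, ordered longest first so "ely" is tried before "ly".
SUFFIXES = ("ing", "ely", "ed", "es", "ly", "e", "s")
MIN_STEM_LENGTH = 3  # avoid stripping very short words down to nothing

def toy_stem(word):
    """Strip the first matching suffix, keeping at least MIN_STEM_LENGTH letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word

words = ["like", "likes", "liked", "likely", "liking"]
stems = {w: toy_stem(w) for w in words}
print(stems)
```

All five words map to the same stem, "lik". A stem only has to be shared by the variants; it does not have to be a dictionary word, which is also true of many real stemmers.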
