Charotar University of Science and Technology [CHARUSAT]
NLP Assignment-1
Last date for submission: 30/07/2024
Subject code : CE478/CS476 Semester : 7 Academic Year : 2024-25
Subject name : Natural Language Processing
1 Why is Natural Language Technology not yet perfect, but still good enough for several
useful applications?
2 Explain in brief: Different Applications of NLP
3 Explain: Lexical Ambiguity with an example
4 State True/False with reason: Ambiguity is pervasive.
5 What is the primary difference between natural and computer languages? Why?
Give an example.
6 State True/False with reason: Formal programming languages are designed to be
unambiguous.
7 How are function words and content words different? Give an example.
8 Define: corpus, morphemes
9 How many types and tokens are there in this sentence: Will will
10 What is the importance of TTR (type/token ratio)? What does it indicate?
11 How is TTR calculated in the following example?
12 What does a high TTR value indicate? What does a low TTR value
indicate?
13 Why is TTR not a valid measure of text complexity by itself?
14 What are hapax legomena?
15 Explain: Zipf’s Law with an example.
16 State True/False with reason: The frequency of a word is inversely proportional
to its length.
17 Explain: Tokenization and sentence segmentation with an example. What are the
different issues in tokenization?
18 How can the problem of deciding where sentences begin and end be
solved?
19 Explain in detail: Sentence segmentation using decision tree.
20 Explain: Specific issues related to different languages. Why “normalize” the
text? How is it performed?
21 In a corpus, you find that the word of rank 4 has a frequency of 600.
What can be the best guess for the rank of a word with frequency 300?
22 In the sentence, "The only thing we have to fear is fear itself", the ratio
of the total number of word tokens to the number of word types is _______.
23 Let the rank of two words, w1 and w2, in a corpus be 1600 and 400,
respectively. Let m1 and m2 represent the number of meanings of w1 and w2
respectively. The ratio m1 : m2 would tentatively be (choose one answer from the
options below)
1. 1:4
2. 4:1
3. 1:2
4. 2:1
24 If the first corpus has TTR1 = [value missing in source] and the second corpus
has TTR2 = [value missing in source], where TTR1 and TTR2 represent the
type/token ratios of the first and second corpus respectively, then
which of the following is/are false?
1. First corpus has more tendency to use different words.
2. Second corpus has more tendency to use different words.
3. The TTR value can sometimes be greater than 1.
4. A high TTR indicates a high degree of lexical variation, while a low TTR
indicates the opposite.
25 What is the number of unique words (vocabulary size) in a document where the
total number of words = 12000, K = 3.71, and Beta = 0.69?
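
Several of the numerical questions above (type/token ratio, Zipf's law, and vocabulary size from K and Beta) can be checked with a few lines of Python. The block below is a minimal sketch, not a prescribed solution. The function names are hypothetical, the tokenizer is a simplification that lowercases the text (so "Will" and "will" count as one type; a case-sensitive count would differ), Zipf's law is taken in its simplest form (frequency x rank is roughly constant), and the vocabulary estimate assumes Heaps' law, V = K * N^Beta. The sample inputs are deliberately different from those in the questions.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """TTR = number of distinct word types / total number of word tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens)


def zipf_rank(known_rank, known_freq, target_freq):
    """Estimate a rank from a frequency, assuming frequency * rank is constant."""
    return known_rank * known_freq / target_freq


def heaps_vocabulary(n_tokens, k, beta):
    """Estimate vocabulary size with Heaps' law: V = k * N ** beta."""
    return k * n_tokens ** beta


if __name__ == "__main__":
    # 8 tokens, 3 types (a, rose, is)
    print(type_token_ratio("A rose is a rose is a rose"))
    # A word of rank 2 occurs 1000 times; estimated rank of a word occurring 500 times
    print(zipf_rank(2, 1000, 500))
    # Vocabulary estimate for a 10000-token text with k = 10, beta = 0.5
    print(heaps_vocabulary(10000, 10, 0.5))
```

Real corpora follow these laws only approximately, so the results are best read as estimates.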