NLP Unit 2 Imp
1. What do you mean by Morphology?
Morphology is the branch of linguistics that deals with the internal structure and formation of words. It studies how words
are built from smaller meaningful units called morphemes. A morpheme is the smallest meaning-bearing unit in a language. Morphemes can be either free (able to stand alone as words) or bound (they must be
attached to other morphemes to convey meaning). Morphology explains how words change form to indicate things like
tense, number, gender, case, and so on. For example, in English, the word "unhappiness" is made up of the root
"happy", the prefix "un-" meaning "not", and the suffix "-ness" which turns the word into a noun. Thus, morphology helps
us understand how new words are formed and how existing words are modified according to grammatical rules.
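To make this concrete, here is a minimal Python sketch of affix stripping; the small prefix/suffix lists, the helper name split_morphemes, and the crude spelling-repair rule are illustrative assumptions, not a real morphological analyzer.

```python
# Toy affix stripping: peel off one known prefix and one known suffix.
# The affix lists and the "i" -> "y" repair rule are illustrative only.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "ment", "ly"]

def split_morphemes(word):
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    stem = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if stem.endswith(s)), "")
    if suffix:
        stem = stem[:-len(suffix)]
    if stem.endswith("i"):          # crude spelling repair: "happi" -> "happy"
        stem = stem[:-1] + "y"
    return prefix, stem, suffix

print(split_morphemes("unhappiness"))   # ('un', 'happy', 'ness')
```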
2. What are issues related to Morphology of Indian Languages?
Indian languages are morphologically rich and diverse, which poses several challenges in natural language processing.
One of the major issues is the high degree of inflection. Words in Indian languages undergo numerous changes to
express tense, gender, number, mood, etc., leading to a large number of word forms for a single root word. This makes
morphological analysis difficult. Another issue is the agglutinative nature of some Indian languages like Tamil and
Telugu, where multiple suffixes are added to a root word to express complex meanings, forming very long words.
Furthermore, Indian languages often have dialectal variations and spelling inconsistencies, which affect the accuracy of
morphological tools. Additionally, there is a lack of standard digital linguistic resources and annotated corpora for many
Indian languages, making it harder to build effective computational models for these languages.
3. What is the relationship between Morphology and Finite State Automata?
A Finite State Automaton (FSA) is a mathematical model used to recognize patterns in input data, and it plays an important role in computational morphology. Morphology is concerned with the structure and formation of words, and an FSA can be used to model these structures. A morphological analyzer can use an FSA to decide whether a word form is valid
by moving through different states based on the morphemes it processes. For example, a finite state machine can
recognize word forms by defining transitions between states for roots and affixes. This approach is efficient in handling
regular morphological rules and is widely used in language processing tasks such as spell checking, stemming, and morphological analysis. Therefore, FSAs provide a systematic way to model the morphological rules of a language
computationally.
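As a small illustration, the sketch below encodes a toy automaton over morphemes; the lexicon, state names, and transitions are made-up assumptions meant only to show how moving between states recognizes valid word forms.

```python
# Toy FSA over morphemes: an optional "un", then a root, then an optional "ness".
TRANSITIONS = {
    ("start", "un"): "prefixed",
    ("start", "happy"): "root",
    ("start", "kind"): "root",
    ("prefixed", "happy"): "root",
    ("prefixed", "kind"): "root",
    ("root", "ness"): "noun",
}
ACCEPTING = {"root", "noun"}

def accepts(morphemes):
    state = "start"
    for m in morphemes:
        state = TRANSITIONS.get((state, m))   # follow the transition, if any
        if state is None:
            return False                      # no valid transition: reject
    return state in ACCEPTING

print(accepts(["un", "happy", "ness"]))   # True  -> "unhappiness" is well formed
print(accepts(["ness", "happy"]))         # False -> not a valid word form
```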
4. Discuss the difference between Word Segmentation and Tokenization.
Word segmentation and tokenization are two fundamental processes in natural language processing, but they serve
slightly different purposes. Word segmentation refers to the process of identifying word boundaries in a continuous string
of text. This is particularly important in languages like Chinese, Japanese, and Thai, where spaces are not used to
separate words. In such languages, word segmentation involves determining where one word ends and another begins.
On the other hand, tokenization is a broader process that involves dividing text into smaller units called tokens. These
tokens can be words, punctuation marks, or other meaningful elements. While tokenization is straightforward in
languages like English, where words are typically separated by spaces, it can be more complex when dealing with
compound words or punctuation. The main difference is that word segmentation focuses on identifying words in
languages with no spaces, whereas tokenization applies to many languages and involves breaking text into analyzable
units.
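The sketch below contrasts the two processes: a simple regex tokenizer for space-delimited English, and greedy longest-match (maximum matching) segmentation for unspaced text; the tiny dictionary and the unspaced example string are illustrative assumptions.

```python
import re

# Tokenization: split space-delimited text into word and punctuation tokens.
def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("I can't wait!"))   # ['I', 'can', "'", 't', 'wait', '!']

# Word segmentation: greedy longest match against a (tiny, illustrative)
# dictionary, as needed for text written without spaces between words.
DICT = {"i", "like", "ilike", "new", "york", "newyork"}

def segment(text, max_len=7):
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in DICT:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])   # unknown character: emit it as-is
            i += 1
    return words

print(segment("ilikenewyork"))   # ['ilike', 'newyork'] (greedy longest match)
```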
5. Describe Part-of-Speech Tagging with suitable example.
Part-of-Speech (POS) tagging is the process of assigning a grammatical category, such as noun, verb, adjective, etc., to
each word in a sentence. This is a crucial step in many natural language processing applications, as it helps the
computer understand the structure and meaning of sentences. POS tagging is not always straightforward because the
same word can have different tags depending on its context. For example, the word "book" can be a noun in the
sentence "I read a book" and a verb in "I will book a ticket." POS tagging systems use linguistic rules or statistical
models to determine the most likely tag for a word based on its context in the sentence. This tagging allows further
processing such as parsing, machine translation, and question-answering systems.
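As a quick illustration, the snippet below runs NLTK's off-the-shelf tagger on the two "book" sentences; it assumes the required tokenizer and tagger resources have already been fetched with nltk.download().

```python
import nltk
# Assumes the tokenizer ('punkt') and tagger ('averaged_perceptron_tagger')
# resources have been downloaded beforehand via nltk.download().

for sentence in ["I read a book", "I will book a ticket"]:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
# "book" should come out as a noun (NN) in the first sentence
# and as a verb (VB) in the second, based on its context.
```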
6. Discuss Maximum Entropy Model of Part-of-Speech Tagging.
The Maximum Entropy (MaxEnt) model is a statistical method used for part-of-speech tagging and other natural
language processing tasks. It is based on the principle of maximum entropy, which states that among all possible
probability distributions, the one with the highest entropy (i.e., the most uniform) should be chosen, provided it satisfies
the known constraints. In the context of POS tagging, the MaxEnt model considers multiple features of the context in
which a word appears, such as the words before and after it, word prefixes and suffixes, and other lexical clues. It then
uses these features to predict the most probable tag for a given word. The advantage of the MaxEnt model is its ability
to combine many types of information in a flexible and powerful way. It does not assume independence between
features, making it suitable for complex natural language tasks.
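A minimal sketch of this idea is shown below, using scikit-learn's logistic regression (multinomial logistic regression is equivalent to a MaxEnt classifier) over simple contextual features; the toy training sentences, feature set, and tag names are illustrative assumptions, not a trained tagger.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(words, i):
    # Contextual features for the word at position i: the word itself,
    # the previous word, and a two-character suffix.
    return {
        "word": words[i].lower(),
        "prev": words[i - 1].lower() if i > 0 else "<s>",
        "suffix2": words[i][-2:].lower(),
    }

# Tiny illustrative training set (word sequences with their tags).
train = [
    (["I", "read", "a", "book"],           ["PRON", "VERB", "DET", "NOUN"]),
    (["I", "will", "book", "a", "ticket"], ["PRON", "AUX", "VERB", "DET", "NOUN"]),
]
X = [features(words, i) for words, tags in train for i in range(len(words))]
y = [tag for _, tags in train for tag in tags]

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

test = ["I", "will", "book", "it"]
print(clf.predict(vec.transform([features(test, 2)])))   # likely ['VERB']
```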
7. What is smoothing in the context of N-gram models and why is it necessary? Provide example to show how
smoothing works.
In the context of N-gram language models, smoothing is a technique used to handle the problem of zero probability for
unseen word sequences. N-gram models predict the likelihood of a word based on the previous words, but if a particular
sequence of words has not been seen in the training data, it is assigned a zero probability. This can severely affect the
performance of the model, especially in applications like speech recognition or machine translation. Smoothing solves
this problem by assigning a small non-zero probability to unseen word combinations. One simple method is add-one (Laplace) smoothing, where one is added to the count of every possible N-gram and the vocabulary size is added to the denominator so that the probabilities still sum to one. For example, if the bigram "love pizza" did not occur in the training data, its probability would be zero; after applying add-one smoothing, it receives a small positive probability, allowing the model to handle new combinations more effectively. Smoothing makes the model more
robust and realistic in practical applications.
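The small numeric sketch below applies the add-one (Laplace) formula P(w | prev) = (count(prev, w) + 1) / (count(prev) + V), where V is the vocabulary size; the toy corpus is an illustrative assumption.

```python
from collections import Counter

corpus = "i love cake . i love tea . we hate pizza .".split()

unigrams = Counter(corpus)                   # counts of single words
bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent word pairs
V = len(unigrams)                            # vocabulary size (8 here)

def p_add_one(prev, word):
    # Add-one smoothed bigram probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(p_add_one("love", "cake"))    # (1 + 1) / (2 + 8) = 0.2  (seen bigram)
print(p_add_one("love", "pizza"))   # (0 + 1) / (2 + 8) = 0.1  (unseen bigram)
```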
8. Give an introductory note on Corpora. Write significance of Corpora analysis.
A corpus (plural: corpora) is a large and structured collection of texts that are used for linguistic research and
computational language processing. Corpora can include books, newspapers, websites, social media content, and
spoken dialogues, and they are often annotated with additional linguistic information such as part-of-speech tags or
syntactic structures. Corpus analysis is the study of language using these collections, allowing researchers to
investigate language patterns, frequency of word usage, grammar rules, and much more. The significance of corpora in natural language processing lies in their role as training data for machine learning models. By analyzing large corpora,
computers can learn how language is actually used in real life, rather than relying only on fixed rules. This helps in
building applications like machine translation, voice assistants, grammar checkers, and search engines. Corpora also
support language learning, dictionary creation, and linguistic theory development.
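As a very small illustration of corpus analysis, the sketch below counts word frequencies over three made-up "documents"; a real study would use a large corpus such as news text or web pages.

```python
from collections import Counter

# Three toy documents standing in for a real corpus.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

tokens = [word for doc in corpus for word in doc.lower().split()]
freq = Counter(tokens)

print(freq.most_common(2))   # [('the', 4), ('cat', 2)]
```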