NLP Notes

Mod 1

What is Natural Language Processing? Explain ambiguity in natural languages with suitable examples.

Natural Language Processing (NLP) is a branch of computer science and artificial intelligence that
helps computers understand, interpret, and interact with human language. NLP enables machines to
process text or speech in ways that are meaningful, such as translating languages, understanding
spoken commands, or summarizing long documents.

Think of NLP as the technology behind systems like Siri, Google Translate, or chatbots that respond to
your questions. The goal of NLP is to teach machines to understand the way humans naturally
communicate.

Applications of Natural Language Processing

1. Chatbots

Enables systems to have conversations with users, answering questions or performing tasks.

Chatbots are programs designed to converse with people in a way that sounds human. Depending on
their complexity, they may respond only to specific keywords or hold full conversations that are
hard to distinguish from a human's. Chatbots built with Natural Language Processing and Machine
Learning try to work out the actual meaning of a sentence rather than just matching words, and they
can improve over time by learning from their conversations. A chatbot works in two broad steps.
First, it identifies the meaning of the question and collects any information from the user that is
needed to answer it. Then it answers the question appropriately.
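The two steps above can be sketched as a tiny keyword-based bot. This is only an illustration of the simplest kind of chatbot mentioned earlier, not how learning-based chatbots work; the intent names, keywords, and replies are all made up for the example.

```python
import re

# Hypothetical intents: each maps trigger keywords to a canned reply.
INTENTS = {
    "greeting": ({"hello", "hi", "hey"}, "Hello! How can I help you?"),
    "hours": ({"open", "hours", "timing"}, "We are open from 9 am to 5 pm."),
    "goodbye": ({"bye", "goodbye"}, "Goodbye!"),
}
FALLBACK = "Sorry, I did not understand that."


def identify_intent(message):
    """Step 1: guess what the user means from keywords in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for name, (keywords, _reply) in INTENTS.items():
        if words & keywords:
            return name
    return None


def respond(message):
    """Step 2: answer according to the identified intent."""
    intent = identify_intent(message)
    return INTENTS[intent][1] if intent else FALLBACK


print(respond("Hi there"))
print(respond("What are your opening hours?"))
```

A bot like this fails as soon as the user phrases a question with words outside its keyword lists, which is why larger systems learn from data instead.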

2. Autocomplete in Search Engines

Have you noticed that search engines tend to guess what you are typing and automatically complete
your sentences? For example, on typing “game” in Google, you may get suggestions such as
“game of thrones”, “game of life” or, if you are interested in maths, “game theory”. All these
suggestions are provided using autocomplete that uses Natural Language Processing to guess what
you want to ask. Search engines use their enormous data sets to analyze what users are
probably typing when they enter particular words and suggest the most common possibilities. They
use Natural Language Processing to make sense of these words and how they are interconnected to
form different sentences.
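The idea of suggesting the most common completions can be sketched with a frequency count over past queries. This is a minimal sketch under the assumption that a query log is available; the log below is invented, and real search engines use far richer signals.

```python
from collections import Counter

# Hypothetical query log; a real search engine would use far more data.
QUERY_LOG = [
    "game of thrones", "game of thrones", "game of thrones",
    "game of life", "game of life",
    "game theory",
    "gardening tips",
]


def suggest(prefix, log=QUERY_LOG, k=3):
    """Return the k most frequent logged queries that start with prefix."""
    counts = Counter(q for q in log if q.startswith(prefix.lower()))
    return [query for query, _count in counts.most_common(k)]


print(suggest("game"))  # ['game of thrones', 'game of life', 'game theory']
```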

3. Voice Assistants

These days voice assistants are all the rage! Whether it’s Siri, Alexa, or Google Assistant, almost
everyone uses one of these to make calls, place reminders, schedule meetings, set alarms, surf the
internet, etc. These voice assistants have made life much easier. But how do they work? They use a
complex combination of speech recognition, natural language understanding, and natural language
processing to understand what humans are saying and then act on it. The long term goal of voice
assistants is to become a bridge between humans and the internet and provide all manner of
services based on just voice interaction. However, they are still a little far from that goal seeing as Siri
still can’t understand what you are saying sometimes!

4. Language Translator

Want to translate a text from English to Hindi but don’t know Hindi? Well, Google Translate is the
tool for you! While it’s not exactly 100% accurate, it is still a great tool to convert text from one
language to another. Google Translate and other translation tools use sequence-to-sequence
modeling, a technique in Natural Language Processing. Earlier, language translators
used statistical machine translation (SMT), which meant they analyzed millions of documents that
were already translated from one language to another (English to Hindi in this case) and then looked
for the common patterns and basic vocabulary of the language. However, this method was less
accurate than sequence-to-sequence modeling.

5. Sentiment Analysis

Almost all the world is on social media these days! And companies can use sentiment analysis to
understand how a particular type of user feels about a particular topic, product, etc. They can use
natural language processing, computational linguistics, text analysis, etc. to understand the general
sentiment of the users for their products and services and find out if the sentiment is good, bad, or
neutral. Companies can use sentiment analysis in a lot of ways such as to find out the emotions of
their target audience, to understand product reviews, to gauge their brand sentiment, etc.
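One simple way to label text as good, bad, or neutral is to count words from a sentiment lexicon. The sketch below assumes a tiny hand-made word list; it is not what commercial tools use, and it serves only to show the basic idea.

```python
import re

# Tiny hand-made lexicon, purely illustrative.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "slow"}


def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "good"
    if score < 0:
        return "bad"
    return "neutral"


print(sentiment("I love this phone, the camera is great"))
print(sentiment("Terrible battery and slow delivery"))
```

Word counting of this kind ignores negation and sarcasm, so it is at best a rough baseline.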

6. Grammar Checkers

Grammar and spelling are very important when writing professional reports for your superiors
or even assignments for your lecturers. After all, major errors may get you fired or failed! That’s
why grammar and spell checkers are very important tools for any professional writer. They can not
only correct grammar and check spellings but also suggest better synonyms and improve the overall
readability of your content. And guess what, they utilize natural language processing to do so!
The underlying NLP models are trained on millions of sentences to learn what correct usage
looks like.

7. Email Classification and Filtering

Email is still one of the most important methods of professional communication. However, all of us
still get plenty of promotional emails that we don’t want to read. Thankfully, some services (Gmail,
for example) automatically sort incoming mail into sections such as Primary, Social, and Promotions,
which means we rarely have to open the Promotions section! But how does this work? Email services use
natural language processing to identify the contents of each email with text classification so that
it can be put in the correct section.
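Text classification of this kind can be sketched with a small Naive Bayes classifier written with the standard library only. Naive Bayes is used here as one common technique, not as the method any particular email service uses; the training messages and labels are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labelled examples; real services train on far more mail.
TRAINING = [
    ("Promotions", "huge sale discount offer buy now"),
    ("Promotions", "limited offer free coupon sale"),
    ("Social", "alice commented on your photo"),
    ("Social", "bob sent you a friend request"),
    ("Primary", "meeting agenda for monday attached"),
    ("Primary", "please review the project report"),
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label and messages per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, text in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("Big discount sale this weekend", model))
```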

Explain the challenges of Natural Language processing

Natural Language Processing (NLP) involves making computers understand and interact with human
language, which is incredibly complex and nuanced. Here are some of the major challenges NLP
faces:

1. Ambiguity in Language

 Problem: Words, phrases, and sentences can have multiple meanings depending on context.

 Example: The word "bank" can mean a financial institution or the side of a river.

 Challenge: Teaching machines to correctly interpret the intended meaning requires deep
context understanding.

2. Diversity of Language

 Problem: Different languages have unique grammar, syntax, and vocabulary. Even within one
language, there are variations in dialects, slang, and regional expressions.

 Example: American English uses "elevator," while British English uses "lift."

 Challenge: Building systems that work across all languages and their nuances is extremely
resource-intensive.

3. Lack of Context Understanding

 Problem: Machines struggle to grasp context, tone, and implied meaning in conversations.

 Example: "I can’t recommend this movie enough" is strong praise, but the negation "can’t" may lead a machine to read it as negative.

 Challenge: Recognizing sarcasm, idioms, and emotions requires advanced contextual
modeling.

4. Handling Large Vocabulary

 Problem: Human language constantly evolves with new words, phrases, and abbreviations.

 Example: Words like "selfie" and "yeet" were not in common use a couple of decades ago but are now widespread.

 Challenge: Keeping NLP systems updated to handle expanding vocabularies.

5. Morphological Complexity

 Problem: Words change forms based on tense, plurality, gender, or case, especially in
morphologically rich languages.

 Example: In Hindi, लड़का (boy) becomes लड़के (boys) in the plural and लड़कों before a postposition (the oblique plural).

 Challenge: Developing NLP systems that can handle such transformations accurately.

6. Dependency on High-Quality Data

 Problem: NLP models require large, annotated datasets for training, which are often
unavailable for many languages or domains.

 Example: It’s easy to find English datasets, but much harder for regional Indian languages.

 Challenge: Creating and maintaining these datasets is time-consuming and expensive.


7. Polysemy and Synonymy

 Problem: One word can have multiple meanings (polysemy), and different words can mean
the same thing (synonymy).

 Example: The word "light" can mean not heavy or bright. Similarly, "happy" and "joyful" are
synonyms.

 Challenge: Differentiating and understanding word meanings in various contexts is difficult.

8. Named Entity Recognition (NER) Challenges

 Problem: Identifying proper nouns (e.g., people, places) in text is tricky, especially for new or
uncommon names.

 Example: In "Apple acquired Beats," distinguishing between the company Apple and the fruit
can be challenging.

 Challenge: NER systems often fail with overlapping entities or unseen names.

9. Noise in Data

 Problem: Real-world text data is often messy, containing spelling errors, incomplete
sentences, and irrelevant content.

 Example: Tweets or social media posts with hashtags, emojis, and abbreviations like "OMG
ur gr8!"

 Challenge: Cleaning and preprocessing noisy data is essential for accurate NLP.
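A cleaning step for noisy social media text can be sketched with a few regular expressions. The abbreviation table below is an assumption made for the example; a real pipeline would need a much larger one and more careful rules.

```python
import re

# Hypothetical abbreviation table; a real system would need a much larger one.
ABBREVIATIONS = {"omg": "oh my god", "ur": "you are", "gr8": "great"}


def normalize(text):
    """Lowercase, strip URLs, hashtag signs, emojis and punctuation, expand abbreviations."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"#(\w+)", r"\1", text)       # keep the hashtag word, drop '#'
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # drop emojis and punctuation
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]
    return " ".join(words)


# "\U0001F600" is a grinning-face emoji.
print(normalize("OMG ur gr8! \U0001F600 #NLP"))  # oh my god you are great nlp
```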

10. Sarcasm and Sentiment

 Problem: Understanding sarcasm, irony, and mixed sentiments is hard for machines.

 Example: "What a fantastic day" could mean the opposite if spoken sarcastically.

 Challenge: Detecting subtle nuances in human language to avoid misinterpretation.

11. Multimodal Inputs

 Problem: Real-world applications often combine text, speech, and visual inputs.

 Example: Understanding spoken commands with background noise or recognizing
handwritten text.

 Challenge: Integrating data from multiple modalities into a single coherent system.

12. Ethical and Bias Concerns


 Problem: NLP systems can unintentionally learn and replicate biases present in training data.

 Example: Gender bias in machine translation (e.g., rendering a gender-neutral pronoun as "she"
when the sentence mentions a nurse and as "he" when it mentions a doctor).

 Challenge: Ensuring fairness and avoiding discrimination in NLP applications.

13. Real-Time Processing

 Problem: Many NLP applications, like virtual assistants, require instant responses.

 Example: A delay in responding to a voice command can ruin the user experience.

 Challenge: Balancing accuracy with speed in resource-constrained environments.

14. Lack of Resources for Low-Resource Languages

 Problem: Many languages (especially regional ones) lack sufficient training data and tools for
NLP development.

 Example: Indian regional languages like Assamese or Odia often lack robust NLP systems.

 Challenge: Developing multilingual NLP solutions for such languages.

What is Ambiguity in Natural Languages?

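Ambiguity happens when a word, phrase, or sentence has more than one possible meaning. This is a
common challenge in NLP because computers need to figure out the correct meaning based on the
context, just like humans do. For example, the word "bank" may be a financial institution or the
side of a river, and the sentence "I saw the man with the telescope" may mean that I used a
telescope or that the man had one.

Discuss the challenges in various stages of natural language processing.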

1. Lexical Analysis (Tokenization)

 What happens: The input text is divided into smaller units (tokens) like words, phrases, or
symbols.

 Example: For the sentence "NLP is exciting!", tokens are:

o NLP, is, exciting, !.

 Challenges:

o Handling punctuation: Deciding whether punctuation marks (e.g., !, .) should be
treated as separate tokens.

o Compound words: Splitting words like "ice-cream" or "e-mail" can be ambiguous.

o Language-specific rules: In languages like Chinese or Japanese, there are no spaces
between words, making tokenization more difficult.
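The tokenization step and two of its challenges can be sketched with one regular expression. The pattern below is one possible design choice, not a standard: it keeps hyphenated words together and treats each punctuation mark as its own token.

```python
import re

# Words (optionally hyphenated) first, then any single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("NLP is exciting!"))   # ['NLP', 'is', 'exciting', '!']
print(tokenize("I like ice-cream."))  # ['I', 'like', 'ice-cream', '.']
```

A pattern like this relies on spaces to find word boundaries, so a run of Chinese or Japanese characters with no spaces would come out as one long token; such languages need dictionary-based or learned segmenters.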

2. Syntactic Analysis (Parsing)

 What happens: Analyzes the grammatical structure of the text and builds a parse tree.

 Example: For "The dog barked," parsing identifies:

o Subject: The dog

o Verb: barked.

 Challenges:

o Grammatical ambiguities: Sentences like "Old men and women sat down" can have
multiple interpretations (does "old" describe only the men, or the women too?).

o Complex sentences: Parsing long, nested, or incomplete sentences is
computationally expensive.

o Language variation: Differences in grammar rules across languages add complexity.
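Grammatical ambiguity can be made concrete by counting how many parse trees a grammar allows. The sketch below uses a CYK-style chart over a toy grammar for the phrase "old men and women", where "old" may attach to "men" alone or to the whole coordination. The grammar and lexicon are assumptions written for this example.

```python
from collections import defaultdict

# Toy lexicon and binary rules (parent, left child, right child), assumed for illustration.
LEXICON = {"old": ["ADJ"], "men": ["NP"], "women": ["NP"], "and": ["CONJ"]}
RULES = [
    ("NP", "ADJ", "NP"),      # old + [noun phrase]
    ("NP", "NP", "CONJP"),    # [noun phrase] + [and noun phrase]
    ("CONJP", "CONJ", "NP"),  # and + [noun phrase]
]


def count_parses(words, goal="NP"):
    """Count distinct parse trees that derive `goal` over the whole word list."""
    n = len(words)
    # chart[(i, j)][symbol] = number of ways symbol can span words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1)][tag] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point inside the span
                for parent, left, right in RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][goal]


print(count_parses("old men and women".split()))  # 2
print(count_parses("men and women".split()))      # 1
```

The chart fills every span of the input for every split point, which also hints at why parsing long sentences becomes expensive.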

3. Semantic Analysis

 What happens: The system interprets the meaning of the text based on word meanings and
relationships.

 Example: In "The bank is on the river," semantic analysis determines:

o bank = side of a river (not a financial institution).

 Challenges:

o Word sense disambiguation: Choosing the correct meaning of ambiguous words like
"bank" or "light."

o Idiomatic expressions: Phrases like "kick the bucket" (meaning "to die") cannot be
interpreted literally.

o Context dependency: The same words can mean different things in different
contexts.
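Word sense disambiguation can be sketched with a simplified Lesk-style overlap: pick the sense whose definition shares the most words with the sentence. The glosses and stopword list below are hand-written assumptions for the example, not taken from a real dictionary.

```python
import re

# Hand-written glosses, assumed for illustration (not from a real dictionary).
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts money deposits and gives loans",
        "side of a river": "the sloping land beside a river or stream of water",
    }
}
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "is", "on", "in", "to"}


def content_words(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = content_words(sentence) - {word}
    overlaps = {
        sense: len(context & content_words(gloss))
        for sense, gloss in SENSES[word].items()
    }
    # On a tie, max() returns the first sense listed.
    return max(overlaps, key=overlaps.get)


print(disambiguate("bank", "The bank is on the river"))
print(disambiguate("bank", "She deposits money at the bank"))
```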

4. Pragmatic Analysis

 What happens: Determines the real-world context and intent behind the text.

 Example: For "Can you pass the salt?" pragmatic analysis understands:

o It’s a polite request, not a question about ability.

 Challenges:

o Sarcasm and irony: Machines struggle to detect sarcasm or hidden meanings, e.g.,
"Oh great, another meeting!".

o Context understanding: Requires background knowledge or conversation history to
infer intent.

o Cultural differences: Pragmatic meaning can vary widely across cultures and
languages.

5. Morphological Analysis

 What happens: Words are broken into their root forms or morphemes.

 Example: For "running," morphological analysis identifies:

o Root: run

o Suffix: -ing.

 Challenges:

o Irregular forms: Handling irregular words like "went" (past tense of "go").

o Complex languages: Languages like Finnish or Turkish have highly inflected words
that can carry many suffixes at once.

o Homographs: Words like "lead" (to guide) and "lead" (a metal) have the same form
but different meanings.
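Root-and-suffix analysis, together with the irregular-form problem, can be sketched with a lookup table plus naive suffix stripping. The table, the suffix list, and the doubling rule are assumptions for illustration; real morphological analyzers are far more thorough.

```python
# Hypothetical lookup table for irregular forms that no suffix rule can recover.
IRREGULAR = {"went": ("go", "past tense"), "ran": ("run", "past tense")}
SUFFIXES = ["ing", "ed", "s"]


def analyze(word):
    """Return (root, suffix or feature) via table lookup, then naive suffix stripping."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            root = word[: -len(suffix)]
            # Undo consonant doubling: "runn" -> "run" (but keep "fall", "pass").
            if len(root) > 2 and root[-1] == root[-2] and root[-1] not in "aeiouls":
                root = root[:-1]
            return root, "-" + suffix
    return word, None


print(analyze("running"))  # ('run', '-ing')
print(analyze("went"))     # ('go', 'past tense')
```

Rules this simple break on many words (for example, "making" would yield "mak"), and they cannot tell the two meanings of "lead" apart, since that needs context rather than word form.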
