Relevance of The Results: Documents Are Retrieved Relevant Irrelevant Measure

- Precision and recall are common measures used to evaluate the relevance of search results. Precision measures the percentage of retrieved documents that are relevant, while recall measures the percentage of relevant documents that are retrieved.
- Examples are provided to demonstrate calculating precision and recall based on true positives, false positives, true negatives, and false negatives. Precision and recall are important concepts for information retrieval systems.

Uploaded by

Ali Hasan

Relevance of the Results

● A number of documents are retrieved against a query


● Could be relevant or irrelevant
● How to measure the relevance of the results?
– Precision
– Recall
– F-measure

Selected topics in IR And NLP by Dr. Maheen Bakhtyar 1


Relevance of the Results
● Precision
– Precision is the fraction of retrieved instances that are relevant

                  Relevant    Not Relevant
Retrieved            TP            FP
Not Retrieved        FN            TN


Relevance of the Results
● Precision
                  Relevant    Not Relevant
Retrieved            TP            FP
Not Retrieved        FN            TN

Precision = TP / (TP + FP)

http://en.wikipedia.org/wiki/Precision_and_recall


Relevance of the Results
● Precision Example:
Query: “Capital Cities in Pakistan”
(The relevant pages are those about Quetta, Peshawar, Karachi and Lahore.)

Results Retrieved:            Results Not Retrieved:
● Page about Quetta           ● Page about Sydney
● Page about Peshawar         ● Page about Melbourne
● Page about Berlin           ● Page about Perth
● Page about Bangkok          ● Page about Moscow
● Page about Kuala Lumpur     ● Page about Paris
● Page about Beijing          ● Page about Lahore
● Page about Karachi

● TP = 3
● TN = 5
● FP = 4
● FN = 1

Precision = TP / (TP + FP) = 3 / (3 + 4) = 0.43 = 43%
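The four counts in this example can be reproduced with plain set operations. The following is a minimal sketch in Python, not part of the original slides. The variable names (`relevant`, `retrieved`, `not_retrieved`, `collection`) are assumptions introduced here, and the relevant set is taken to be the pages about Quetta, Peshawar, Karachi and Lahore, as the slide's counts imply.

```python
# Relevant pages for the query (assumed from the counts used in the slides).
relevant = {"Quetta", "Peshawar", "Karachi", "Lahore"}

retrieved = {"Quetta", "Peshawar", "Berlin", "Bangkok",
             "Kuala Lumpur", "Beijing", "Karachi"}
not_retrieved = {"Sydney", "Melbourne", "Perth", "Moscow", "Paris", "Lahore"}
collection = retrieved | not_retrieved

tp = len(retrieved & relevant)               # relevant and retrieved
fp = len(retrieved - relevant)               # retrieved but not relevant
fn = len(relevant - retrieved)               # relevant but not retrieved
tn = len(collection - retrieved - relevant)  # neither retrieved nor relevant

precision = tp / (tp + fp)
print(tp, fp, fn, tn)        # 3 4 1 5
print(round(precision, 2))   # 0.43
```
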


Relevance of the Results
● Recall
– Recall is the fraction of relevant instances that are retrieved

                  Relevant    Not Relevant
Retrieved            TP            FP
Not Retrieved        FN            TN

Recall = TP / (TP + FN)

http://en.wikipedia.org/wiki/Precision_and_recall


Relevance of the Results
● Recall Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Quetta           ● Page about Sydney
● Page about Peshawar         ● Page about Melbourne
● Page about Berlin           ● Page about Perth
● Page about Bangkok          ● Page about Moscow
● Page about Kuala Lumpur     ● Page about Paris
● Page about Beijing          ● Page about Lahore
● Page about Karachi

● TP = 3
● TN = 5
● FP = 4
● FN = 1

Recall = TP / (TP + FN) = 3 / (3 + 1) = 0.75 = 75%


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Quetta           ● Page about Sydney
● Page about Peshawar         ● Page about Melbourne
● Page about Beijing          ● Page about Perth
● Page about Karachi          ● Page about Berlin
                              ● Page about Bangkok
                              ● Page about Kuala Lumpur
                              ● Page about Moscow
                              ● Page about Paris
                              ● Page about Lahore

● TP = 3
● TN = 8
● FP = 1
● FN = 1

Precision = TP / (TP + FP) = 3 / (3 + 1) = 0.75 = 75%
Recall = TP / (TP + FN) = 3 / (3 + 1) = 0.75 = 75%


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Quetta           ● Page about Sydney
● Page about Beijing          ● Page about Melbourne
● Page about Bangkok          ● Page about Perth
● Page about Kuala Lumpur     ● Page about Berlin
● Page about Moscow           ● Page about Peshawar
                              ● Page about Paris
                              ● Page about Karachi
                              ● Page about Lahore

● TP = 1
● TN = 5
● FP = 4
● FN = 3

Precision = TP / (TP + FP) = 1 / (1 + 4) = 0.2 = 20%
Recall = TP / (TP + FN) = 1 / (1 + 3) = 0.25 = 25%


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Quetta           ● Page about Sydney
● Page about Beijing          ● Page about Melbourne
                              ● Page about Perth
                              ● Page about Berlin
                              ● Page about Peshawar
                              ● Page about Paris
                              ● Page about Karachi
                              ● Page about Lahore
                              ● Page about Bangkok
                              ● Page about Kuala Lumpur
                              ● Page about Moscow

● TP = 1
● TN = 8
● FP = 1
● FN = 3

Precision = TP / (TP + FP) = 1 / (1 + 1) = 0.5 = 50%
Recall = TP / (TP + FN) = 1 / (1 + 3) = 0.25 = 25%


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Quetta           ● Page about Sydney
● Page about Peshawar         ● Page about Melbourne
● Page about Karachi          ● Page about Perth
● Page about Lahore           ● Page about Berlin
                              ● Page about Beijing
                              ● Page about Paris
                              ● Page about Bangkok
                              ● Page about Kuala Lumpur
                              ● Page about Moscow

● TP = 4
● TN = 9
● FP = 0
● FN = 0

Precision = TP / (TP + FP) = 4 / (4 + 0) = 1 = 100%
Recall = TP / (TP + FN) = 4 / (4 + 0) = 1 = 100%


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Sydney           ● Page about Quetta
● Page about Melbourne        ● Page about Peshawar
● Page about Perth            ● Page about Karachi
● Page about Berlin           ● Page about Lahore
● Page about Beijing
● Page about Paris
● Page about Bangkok
● Page about Kuala Lumpur
● Page about Moscow

● TP = 0
● TN = 0
● FP = 9
● FN = 4

Precision = TP / (TP + FP) = 0 / (0 + 9) = 0 = 0%
Recall = TP / (TP + FN) = 0 / (0 + 4) = 0 = 0%


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Sydney           (none)
● Page about Melbourne
● Page about Perth
● Page about Berlin
● Page about Beijing
● Page about Paris
● Page about Bangkok
● Page about Kuala Lumpur
● Page about Moscow
● Page about Quetta
● Page about Peshawar
● Page about Karachi
● Page about Lahore

● TP = 4
● TN = 0
● FP = 9
● FN = 0

Precision = TP / (TP + FP) = 4 / (4 + 9) = 0.31 = 31%
Recall = TP / (TP + FN) = 4 / (4 + 0) = 1 = 100%

Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
(none)                        ● Page about Sydney
                              ● Page about Melbourne
                              ● Page about Perth
                              ● Page about Berlin
                              ● Page about Beijing
                              ● Page about Paris
                              ● Page about Bangkok
                              ● Page about Kuala Lumpur
                              ● Page about Moscow
                              ● Page about Quetta
                              ● Page about Peshawar
                              ● Page about Karachi
                              ● Page about Lahore

● TP = 0
● TN = 9
● FP = 0
● FN = 4

Precision = TP / (TP + FP) = 0 / (0 + 0), which is undefined (nothing was retrieved)
Recall = TP / (TP + FN) = 0 / (0 + 4) = 0 = 0%

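When nothing is retrieved, TP + FP = 0 and the precision ratio has no defined value, so any code computing these measures has to decide what to return in that case. The following is a minimal sketch, assuming a hypothetical helper `safe_ratio` that returns `None` for an undefined ratio. This is one possible convention; some evaluation tools report 0 instead.

```python
from typing import Optional

def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    if denominator == 0:
        return None
    return numerator / denominator

# Counts from the example in which no page is retrieved.
tp, fp, fn = 0, 0, 4
precision = safe_ratio(tp, tp + fp)
recall = safe_ratio(tp, tp + fn)
print(precision, recall)   # None 0.0
```
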
Relevance of the Results
High precision: most of the retrieved results are relevant
High recall: most of the relevant results are retrieved

Applications requiring high precision? Security systems, biometric-based systems

Applications requiring high recall? Search engines


Relevance of the Results

● F-Measure:
– Combines precision and recall into a single score (their harmonic mean)

F-Measure = 2 x (prec x rec) / (prec + rec)
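As a sketch of how the F-measure combines the two scores, the hypothetical function `f_measure` below computes the harmonic mean of precision and recall. Returning `None` when both inputs are zero is an assumption made here, since the ratio is then 0 / 0.

```python
from typing import Optional

def f_measure(precision: float, recall: float) -> Optional[float]:
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    if precision + recall == 0:
        return None  # 0 / 0: undefined
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.75, 0.75))            # 0.75
print(round(f_measure(0.2, 0.25), 2))   # 0.22
print(round(f_measure(0.5, 0.25), 2))   # 0.33
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values: precision 0.5 with recall 0.25 gives about 0.33 rather than the arithmetic mean 0.375.
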


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Quetta           ● Page about Sydney
● Page about Peshawar         ● Page about Melbourne
● Page about Beijing          ● Page about Perth
● Page about Karachi          ● Page about Berlin
                              ● Page about Bangkok
                              ● Page about Kuala Lumpur
                              ● Page about Moscow
                              ● Page about Paris
                              ● Page about Lahore

● TP = 3
● TN = 8
● FP = 1
● FN = 1

Precision = TP / (TP + FP) = 3 / (3 + 1) = 0.75 = 75%
Recall = TP / (TP + FN) = 3 / (3 + 1) = 0.75 = 75%
F-Measure = 2 x (prec x rec) / (prec + rec) = 0.75 = 75%


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Quetta           ● Page about Sydney
● Page about Beijing          ● Page about Melbourne
● Page about Bangkok          ● Page about Perth
● Page about Kuala Lumpur     ● Page about Berlin
● Page about Moscow           ● Page about Peshawar
                              ● Page about Paris
                              ● Page about Karachi
                              ● Page about Lahore

● TP = 1
● TN = 5
● FP = 4
● FN = 3

Precision = TP / (TP + FP) = 1 / (1 + 4) = 0.2 = 20%
Recall = TP / (TP + FN) = 1 / (1 + 3) = 0.25 = 25%
F-Measure = 2 x (prec x rec) / (prec + rec) = 0.22 = 22%


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Quetta           ● Page about Sydney
● Page about Beijing          ● Page about Melbourne
                              ● Page about Perth
                              ● Page about Berlin
                              ● Page about Peshawar
                              ● Page about Paris
                              ● Page about Karachi
                              ● Page about Lahore
                              ● Page about Bangkok
                              ● Page about Kuala Lumpur
                              ● Page about Moscow

● TP = 1
● TN = 8
● FP = 1
● FN = 3

Precision = TP / (TP + FP) = 1 / (1 + 1) = 0.5 = 50%
Recall = TP / (TP + FN) = 1 / (1 + 3) = 0.25 = 25%
F-Measure = 2 x (prec x rec) / (prec + rec) = 0.33 = 33%

Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Quetta           ● Page about Sydney
● Page about Peshawar         ● Page about Melbourne
● Page about Karachi          ● Page about Perth
● Page about Lahore           ● Page about Berlin
                              ● Page about Beijing
                              ● Page about Paris
                              ● Page about Bangkok
                              ● Page about Kuala Lumpur
                              ● Page about Moscow

● TP = 4
● TN = 9
● FP = 0
● FN = 0

Precision = TP / (TP + FP) = 4 / (4 + 0) = 1 = 100%
Recall = TP / (TP + FN) = 4 / (4 + 0) = 1 = 100%
F-Measure = 2 x (prec x rec) / (prec + rec) = 1 = 100%


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Sydney           ● Page about Quetta
● Page about Melbourne        ● Page about Peshawar
● Page about Perth            ● Page about Karachi
● Page about Berlin           ● Page about Lahore
● Page about Beijing
● Page about Paris
● Page about Bangkok
● Page about Kuala Lumpur
● Page about Moscow

● TP = 0
● TN = 0
● FP = 9
● FN = 4

Precision = TP / (TP + FP) = 0 / (0 + 9) = 0 = 0%
Recall = TP / (TP + FN) = 0 / (0 + 4) = 0 = 0%
F-Measure = 2 x (prec x rec) / (prec + rec) = 0 / 0, which is undefined (often reported as 0 by convention)


Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
● Page about Sydney           (none)
● Page about Melbourne
● Page about Perth
● Page about Berlin
● Page about Beijing
● Page about Paris
● Page about Bangkok
● Page about Kuala Lumpur
● Page about Moscow
● Page about Quetta
● Page about Peshawar
● Page about Karachi
● Page about Lahore

● TP = 4
● TN = 0
● FP = 9
● FN = 0

Precision = TP / (TP + FP) = 4 / (4 + 9) = 0.31 = 31%
Recall = TP / (TP + FN) = 4 / (4 + 0) = 1 = 100%
F-Measure = 2 x (prec x rec) / (prec + rec) = 0.47 = 47%

Relevance of the Results
● Example:
Query: “Capital Cities in Pakistan”

Results Retrieved:            Results Not Retrieved:
(none)                        ● Page about Sydney
                              ● Page about Melbourne
                              ● Page about Perth
                              ● Page about Berlin
                              ● Page about Beijing
                              ● Page about Paris
                              ● Page about Bangkok
                              ● Page about Kuala Lumpur
                              ● Page about Moscow
                              ● Page about Quetta
                              ● Page about Peshawar
                              ● Page about Karachi
                              ● Page about Lahore

● TP = 0
● TN = 9
● FP = 0
● FN = 4

Precision = TP / (TP + FP) = 0 / (0 + 0), which is undefined (nothing was retrieved)
Recall = TP / (TP + FN) = 0 / (0 + 4) = 0 = 0%
F-Measure = 2 x (prec x rec) / (prec + rec) is undefined, because precision is undefined

Introduction: Index Terms

IR systems usually adopt index terms to process queries

Index term:
– a keyword or group of selected words
– all words (more general)

Stemming might be used:
– connect: connecting, connection, connections

Stop words may be eliminated:
– a, an, the, with, of, ...

An inverted file is built for the chosen index terms
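A minimal sketch of how an inverted file might be built from index terms, assuming whitespace tokenization, the stop-word list shown on the slide, and a toy three-document collection invented for illustration. Stemming is omitted to keep the example short, and the function name `build_inverted_index` is hypothetical.

```python
from collections import defaultdict

# Stop words listed on the slide (the slide's list is open-ended).
STOP_WORDS = {"a", "an", "the", "with", "of"}

# Toy collection (hypothetical documents, for illustration only).
docs = {
    1: "the cat with a collar",
    2: "a dog with a leash",
    3: "the cat and the dog",
}

def build_inverted_index(documents):
    """Map each index term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

index = build_inverted_index(docs)
print(index["cat"])   # [1, 3]
print(index["dog"])   # [2, 3]
```
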
Formal Characterization of IR Models

An IR model is characterized by four components:

● D is a set of logical views to represent documents in the collection.

● Q is a set of logical views to represent queries (= user information needs).

● F is a framework for modeling relationships between document and query representations (e.g., Boolean, vector space, probabilistic).

● R(qi,dj) is a ranking function which associates a real number with a query qi ∈ Q and a document dj ∈ D. Such a ranking orders the documents with respect to the query qi.
More about Index Terms
•Each document is represented by a set of representative keywords or index terms
•An index term is a document word useful for remembering the document's main themes
•Usually, index terms are nouns because nouns have meaning by themselves
•However, some search engines assume that all words are index terms (full text representation)
Classic IR Models – Basic Concepts
•Not all terms are equally useful for representing the document contents: less frequent terms allow identifying a narrower set of documents
•The importance of the index terms is represented by weights associated with them
•Let
  •ki be an index term
  •dj be a document
  •wij be a weight associated with (ki,dj)
•The weight wij quantifies the importance of the index term for describing the document contents
Classic IR Models – Basic Concepts
•ki is an index term
•dj is a document
•t is the total number of index terms
•K = {k1, k2, …, kt} is the set of all index terms
•wij >= 0 is a weight associated with (ki,dj)
  –wij = 0 indicates that the term does not belong to the document
•vec(dj) = (w1j, w2j, …, wtj) is a weighted vector associated with the document dj
•gi(vec(dj)) = wij is a function which returns the weight associated with the pair (ki,dj)
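To make the notation concrete, the sketch below builds vec(dj) for a toy document, using raw term counts as the weights wij. The choice of raw counts is an assumption for illustration only, since the slides do not fix a weighting scheme, and the names `K`, `doc_vector` and `g` are hypothetical.

```python
# Index terms k1..kt (hypothetical vocabulary, t = 4).
K = ["cat", "dog", "collar", "leash"]

def doc_vector(text, terms=K):
    """Return vec(dj) = (w1j, ..., wtj), using raw term counts as weights."""
    words = text.lower().split()
    return tuple(words.count(k) for k in terms)

def g(i, vec):
    """gi(vec(dj)): the weight of the i-th index term (1-based) in the vector."""
    return vec[i - 1]

vec_d1 = doc_vector("cat collar cat")
print(vec_d1)         # (2, 0, 1, 0)
print(g(1, vec_d1))   # 2
print(g(2, vec_d1))   # 0, so term k2 does not occur in the document
```
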
The Boolean Model

•Simple model based on set theory
•Queries specified as Boolean expressions
  –precise semantics
  –neat formalism
  –q = ka ∧ (kb ∨ ¬kc)
     = (ka ∧ kb) ∨ (ka ∧ ¬kc)   (applying the distributive law gives a disjunctive normal form, or DNF)
•Terms are either present or absent. Thus wij ∈ {0,1}
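A minimal sketch of Boolean retrieval for the query above, assuming each document is represented as the set of index terms it contains (equivalent to binary weights wij ∈ {0,1}). The toy documents and the function name `matches` are invented for illustration. The block also checks that the original query and its DNF form agree on every combination of term values.

```python
from itertools import product

def matches(doc_terms):
    """Evaluate q = ka AND (kb OR NOT kc) on a document's term set."""
    ka = "ka" in doc_terms
    kb = "kb" in doc_terms
    kc = "kc" in doc_terms
    return ka and (kb or not kc)

# Toy documents, each given as the set of index terms it contains.
docs = {
    "d1": {"ka", "kb", "kc"},
    "d2": {"ka", "kc"},
    "d3": {"ka"},
    "d4": {"kb"},
}

result = [name for name, terms in docs.items() if matches(terms)]
print(result)   # ['d1', 'd3']

# The query and its DNF agree on all 8 truth assignments.
dnf_equivalent = all(
    (a and (b or not c)) == ((a and b) or (a and not c))
    for a, b, c in product([False, True], repeat=3)
)
print(dnf_equivalent)   # True
```
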
Simple Query Language: Boolean
– Terms + Connectors (or operators)
– terms
   • words
   • normalized (stemmed) words
   • phrases
   • thesaurus terms
– connectors
   • AND
   • OR
   • NOT
Boolean Queries

Cat

Cat OR Dog

Cat AND Dog

(Cat AND Dog)

(Cat AND Dog) OR Collar

(Cat AND Dog) OR (Collar AND Leash)

(Cat OR Dog) AND (Collar OR Leash)
Boolean Queries

(Cat OR Dog) AND (Collar OR Leash)
– Each of the following combinations works:
Boolean Queries

(Cat OR Dog) AND (Collar OR Leash)
– None of the following combinations work:
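The term combinations that satisfy or fail this query can be enumerated mechanically. The sketch below assumes a document is described only by which of the four terms it contains, and sorts every combination by whether `(Cat OR Dog) AND (Collar OR Leash)` accepts it. The names `TERMS`, `query`, `works` and `fails` are introduced here for illustration.

```python
from itertools import product

TERMS = ("Cat", "Dog", "Collar", "Leash")

def query(cat, dog, collar, leash):
    """(Cat OR Dog) AND (Collar OR Leash)"""
    return (cat or dog) and (collar or leash)

works, fails = [], []
for flags in product([False, True], repeat=len(TERMS)):
    # Names of the terms present in this combination.
    present = tuple(t for t, f in zip(TERMS, flags) if f)
    (works if query(*flags) else fails).append(present)

print(len(works), len(fails))       # 9 7
print(("Cat", "Leash") in works)    # True
print(("Cat", "Dog") in fails)      # True
```

A document matches only if it contains at least one term from each parenthesized group, so a page mentioning both Cat and Dog but neither Collar nor Leash is rejected.
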
Boolean Logic
C = A
C = ¬A (the complement of A)
C = A ∩ B
C = A ∪ B
De Morgan's Laws:
¬(A ∩ B) = ¬A ∪ ¬B
¬(A ∪ B) = ¬A ∩ ¬B
Boolean Queries

– Usually expressed as INFIX operators in IR
   ((a AND b) OR (c AND b))
– NOT is a UNARY PREFIX operator
   ((a AND b) OR (c AND (NOT b)))
– AND and OR can be n-ary operators
   (a AND b AND c AND d)
– Some rules (De Morgan revisited)
   NOT(a) AND NOT(b) = NOT(a OR b)
   NOT(a) OR NOT(b) = NOT(a AND b)
   NOT(NOT(a)) = a
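These identities can be checked exhaustively, since each variable takes only two values. A short sketch (the variable name `checked` is introduced here only to count the cases):

```python
from itertools import product

checked = 0
for a, b in product([False, True], repeat=2):
    # De Morgan's laws
    assert ((not a) and (not b)) == (not (a or b))
    assert ((not a) or (not b)) == (not (a and b))
    # Double negation
    assert (not (not a)) == a
    checked += 1

print(checked)   # 4
```
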
Boolean queries

Small variations in a query can generate very different results
– data AND compression AND retrieval
– text AND compression AND retrieval

The user should be able to pose complex queries like:
– (text OR data OR image) AND
  (compression OR compaction OR decompression) AND
  (archiving OR retrieval OR storage)
– ...but many users are not able (or willing)...
