SW Unit-V Notes

The document discusses semantic web applications including semantic web services, e-learning, semantic bioinformatics, and enterprise application integration. It also covers semantic search technology and how semantic methods can improve search results. Key aspects covered include using metadata to understand information, enabling programs to interoperate, and automating discovery and composition of web services.


Unit-V

Syllabus
Semantic Web Application: Semantic Web Services, e-Learning, Semantic
Bioinformatics, Enterprise Application Integration, Knowledge Base. Semantic Search
Technology: Search Engines, Semantic Search, Semantic Search Technology, Web
Search Agents, Semantic Methods, Latent Semantic Index Search, TAP, Swoogle.

Introduction to Semantic Web Applications


1. Semantic Web Basics: It's about using web applications that understand not just
information but also data about that information (metadata). This helps in better
discovery, automation, integration, and reuse of content across different applications.
2. Expanded Infrastructure: The Semantic Web isn’t just for web pages but extends to
databases, services, programs, sensors, and personal devices. Software agents can use this
data to search, filter, and organize information more effectively.
3. Ease of Information Exchange: Current web systems struggle with transferring content
between applications. Semantic Web technologies like RDF and XML aim to make this
easier by allowing databases and apps to share and combine data seamlessly.
4. Benefits and Opportunities: Using the Semantic Web can lead to improved data
integration, better search results, and more efficient use of web services. It's particularly
beneficial in areas like e-Learning, bioinformatics, and enterprise applications, enabling
applications to work together without manual connections. This integration presents
significant opportunities for businesses to leverage these technologies for better results
and competitive advantages.

Introduction to Semantic Web Services

1. Challenges in Integration: Current web systems struggle to integrate databases and
programs seamlessly. For example, in e-business (especially B2B interactions), running
someone else’s program locally can be difficult.

2. Need for Semantic Web Services: Semantic Web Services aim to address this by
enabling programs to interoperate on the web. These services are self-contained and self-
described, allowing other applications to discover and use them.

3. Web Services and Semantic Web Integration: Tim Berners-Lee suggests combining
Web Services' business logic with the Semantic Web's meaningful content. Technologies
like UDDI, WSDL, and SOAP could be enhanced with OWL to automate Semantic Web
Services, improving interaction with business rule engines.

4. Vision for Semantic Web Services: The goal is to automate the discovery,
invocation, composition, and monitoring of web services through machine
processing. OWL-S (Web Ontology Language for Services) is designed to help
achieve this by providing a framework for describing and declaring services in a
standardized way.
SEMANTIC SEARCH

Semantic search methods can augment and improve traditional search results by using, not just
words, but concepts and logical relationships. There are two approaches to improving search
results through semantic methods: (1) the direct use of Semantic Web metadata and (2) Latent
Semantic Indexing (LSI).
The Semantic Web will provide more meaningful metadata about content, through the use of
RDF and OWL documents that will help to form the Web into a semantic network. In a
semantic network, the meaning of content is better represented and logical connections are
formed between related information. However, most semantic-based search engines suffer
increasingly severe performance problems because of the large and rapidly growing scale of
the Web. In order for semantic search to be effective in finding responsive results, the network
must contain a great deal of relevant information.
e-LEARNING

E-learning has various tools and apps but lacks a cohesive, collaborative environment. The
goal is to create a seamless Educational Semantic Web, where systems can communicate and
share content easily. On the one hand, we wish to achieve interoperability among educational
systems and on the other hand, to have automated, structured, and unified authoring.

The Semantic Web is the key to enabling this interoperability by capitalizing on:
(1) semantic conceptualization and ontologies,
(2) common standardized communication syntax, and
(3) large-scale integration of educational content and usage.
RDF describes objects and their relationships. It allows easy reuse of information for
different devices, such as mobile phones and PDAs, and for presentation to people with
different capabilities, such as those with cognitive or visual impairments.
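As a small illustration of this idea (not part of the original notes, and with hypothetical names and URIs), the Python rdflib library can describe a learning resource as RDF triples and serialize them for reuse across devices:

# A minimal sketch using rdflib; every URI and property name here is made up.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

EX = Namespace("http://example.org/elearning/")
g = Graph()
g.bind("ex", EX)
g.bind("dc", DC)

lesson = URIRef("http://example.org/elearning/lesson42")
g.add((lesson, RDF.type, EX.Lesson))                         # the object being described
g.add((lesson, DC.title, Literal("Introduction to RDF")))    # its relationships/attributes
g.add((lesson, DC.language, Literal("en")))
g.add((lesson, EX.deliveryHint, Literal("mobile, screen-reader")))

# The same triples can be re-rendered for a phone, a PDA, or a screen reader.
print(g.serialize(format="turtle"))   # returns a Turtle string in recent rdflib versions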

It is possible that in the near future students will be able to extract far more data from a
networked computer or wireless device, far more efficiently. Based on a few specific search
terms, library catalogues could be scanned automatically and nearest library shelf marks
delivered immediately to students, alongside multimedia and textual resources culled from the
Web itself. Students could also be directed to relevant discussion lists and research groups.

By tailored restructuring of information, future systems will be able to deliver content to the
end-user in a form applicable to them, taking into account users’ needs, preferences, and prior
knowledge. Much of this work relies on vast online databases and thesauri, such as WordNet,
which categorize synonyms into distinct lexical concepts.
The educational sector can also use Internet Relay Chat (IRC) (http://www.irc.org/), a tool
that can also be leveraged by the Semantic Web. IRC is a chat protocol where people can meet on
channels and talk to each other. The Semantic Web community is enhancing this capability by
writing robots that can help to log the chat when members are away. It can also assist with
meetings, discussions, and recording of results.

Semantic Bioinformatics

1. Semantic Web for Scientific Data: The Semantic Web can help unlock scientific
data from different applications by providing a standardized way to understand and
share information. This is especially valuable for life scientists.

2. Interest Groups and Standards: Organizations like the World Wide Web
Consortium are creating interest groups like the Semantic Web Health Care and Life
Sciences Interest Group (HCLSIG) to apply Semantic Web technology in healthcare
and life sciences. They develop use cases and standard specifications for these fields.

3. Historical Context: The early growth of the World Wide Web was boosted by its
adoption in the high-energy physics community. Similarly, the Semantic Web could
gain momentum in life sciences if key ontologies (like frameworks for drug
discovery) are made available.

4. Life Sciences and Semantic Web: Life sciences, particularly in areas like drug
discovery, use many databases and systems globally. Efforts like the Biological
Pathways Exchange aim to create standard formats for exchanging information on
biological pathways, aiding research and collaboration.

ENTERPRISE APPLICATION INTEGRATION


This section explains how the Semantic Web, through technologies like ontologies, can revolutionize
various industries, particularly focusing on enterprise application integration and call center
operations.
1. Semantic Web and Enterprise Integration: Similar to how the current web integrates
human-oriented information systems, the Semantic Web can integrate applications with
well-defined data meanings.

2. Example with British Telecom: British Telecom is developing a system called OntoShare
for its call center. This system stores information in an ontology (a structured knowledge
base) and automatically shares relevant information with call center agents.

3. XML to RDF: XML documents used in business integration can be transformed into a
more structured RDF data model. This enables different systems to communicate effectively
with shared semantics.

4. Middleware and Web Services: Middleware facilitates interaction between applications
across different computing platforms. With the web as a primary access channel,
middleware now supports web application development, typically through application
servers and web services.

5. Challenges of Distributed Applications: Managing the complexity of distributed
applications, such as component dependencies and versions, remains a challenge.
Middleware and ontologies can help in managing these complexities.

6. Ontologies in Middleware: Ontologies cover various aspects like component descriptions,
service policies, quality of service, etc. They provide a formal way to represent concepts and
relationships, aiding in reasoning and inference at runtime.

7. Foundational Ontologies: Foundational ontologies ensure that conceptual models are
extendable and adaptable to new requirements. They draw from philosophy, logic, and
software engineering to guide ontology design practices.

8. Benefits of Ontologies: Ontologies enhance information retrieval and automate query
answering, maintenance, and document generation, thereby improving efficiency and
reducing manual effort.

In essence, the Semantic Web, facilitated by technologies like ontologies, offers a structured
approach to integrating diverse systems and improving information management in various
industries.

Semantic Search Technology


Today, searching the Web is an essential capability whether you are sitting at your desktop PC or
wandering the corporate halls with your wireless PDA. As a result, the business of commercial
search engines has become a vital and lucrative part of the Web. As search engines have become
commonplace tools for virtually every user of the Web, companies, such as Google and Yahoo!,
have become household names.
Recently, efforts have been made to implement limited semantic search by Google and other
innovators. However, even with Google, it is common that searches return substantial unwanted
results and may well miss the important information that you need. Semantic search methods
could augment and improve traditional search results by using not just words, but concepts and
logical relationships. There are two basic approaches to improving search results through
semantic methods:
(1) using Semantic Web documents and
(2) Latent Semantic Indexing (LSI)
SEARCH ENGINES

Commercial search engines are based upon one of two forms of Web search technologies: human
directed search and automated search.
1. Human-directed search: This type of search engine uses databases of keywords and
concepts. When you search for something, it looks for pages containing those keywords
and ranks them based on keyword frequency. More advanced versions consider where the
keywords are located on the page, like in titles or body text. Some, like Yahoo!, use topic
hierarchies to help refine search results. However, these hierarchies are manually created,
making them time-consuming and not frequently updated.
2. Automated search: This relies on web crawlers or bots that automatically follow links on
websites, collecting information and building indexes. Unlike human-directed search, this
method doesn't understand the meaning of the content; it just collects data based on links.
Search engines today mainly use huge databases of web page references gathered through
this automated process.

There are two implementations of search engines:
1. Individual search engines and
2. Metasearchers.
Individual search engines (e.g., Google) compile their own searchable databases on the Web,
while Metasearchers do not compile databases, but instead search the databases of multiple sets
of individual engines simultaneously.
Ranking and Relevancy
In ranking Web pages, search engines follow a certain set of rules. Their goal, of course, is to
return the most relevant pages at the top of their lists. To do this, they look for the location and
frequency of keywords and phrases in the Web page document and, sometimes, in the Hypertext
Markup Language (HTML) Meta tags. They check out the title field and scan the headers and text
near the top of the document. Some of them assess popularity by the number of links that are
pointing to a given site: the more links, the greater the popularity of the page. Because Web
search engines use keywords, they are subject to the two well known linguistic phenomena that
strongly degrade a query’s precision and recall:
1. Polysemy (one word might have several meanings) and
2. Synonymy (several words or phrases might designate the same concept).
There are several characteristics required to improve a search engine’s performance. It is
important to consider useful searches as distinct from fruitless ones.
To be useful, there are three necessary criteria:
(1) maximum relevant information
(2) minimum irrelevant information and
(3) meaningful ranking, with the most relevant results first.

The first of these criteria, getting all of the relevant information available, is called recall.
Without good recall, we have no guarantee that valid, interesting results will not be left out of our
result set. We want the rate of false negatives, relevant results that we never see, to be as low as
possible. The second criterion, minimizing irrelevant information so that the proportion of
relevant documents in our result set is very high, is called precision. With too little precision, our
useful results get diluted by irrelevancies, and we are left with the task of sifting through a large
set of documents to find what we want. High precision means the lowest possible rate of false
positives. There is an inevitable trade-off between precision and recall. Search results generally
lie on a continuum of relevancy, so there is no distinct place where relevant results stop and
extraneous ones begin. This is why the third criterion, ranking, is so important. Ranking has to do
with whether the result set is ordered in a way that matches our intuitive understanding of what is
more and what is less relevant. Of course, the concept of “relevance” depends heavily on our own
immediate needs, our interests, and the context of our search.
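To make these two measures concrete, the short sketch below (illustrative only, with made-up document identifiers) computes precision and recall from the set of retrieved documents and the set of documents actually relevant to a query:

# Precision and recall for a single query; the document sets are invented.
def precision_recall(retrieved, relevant):
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4", "d5"}   # what the engine returned
relevant = {"d2", "d4", "d7", "d9"}          # what the user actually needed
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")   # precision=0.40 recall=0.50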

Google Search Algorithm

The heart of Google search software is PageRank, a system for ranking Web pages developed by
the founders Larry Page and Sergey Brin at Stanford University. PageRank relies on the vast link
structure as an indicator of an individual page’s value. Essentially, Google interprets a link from
page A to page B as a vote, by page A, for page B. Important sites receive a higher PageRank.
Votes cast by pages that are themselves “important” weigh more heavily and help to make other
pages important. Google combines PageRank with sophisticated text-matching techniques to
find pages that are both important and relevant to the search. Google goes far beyond the number
of times a term appears on a page and examines all aspects of the page’s content (and the content
of the pages linking to it) to determine if it is a good match for the query.
The PageRank of a Web page is therefore calculated as a sum of the PageRanks of all pages
linking to it (its incoming links), divided by the number of links on each of those pages (its
outgoing links).
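The calculation described above can be sketched in a few lines of Python. This is only an illustration of the simplified formula: the production algorithm also applies a damping factor and many other refinements, and the link structure below is invented.

# Simplified PageRank: a page's rank is the sum of the ranks of the pages linking
# to it, each divided by the number of outgoing links on the linking page.
def pagerank(links, iterations=20):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing) if outgoing else 0.0
            for target in outgoing:
                new_rank[target] += share   # page "votes" for each target it links to
        rank = new_rank
    return rank

# Hypothetical link structure: A and C both vote for B, B votes for C.
links = {"A": ["B"], "B": ["C"], "C": ["B"]}
print(pagerank(links))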
Traditional search engines are based almost purely on the occurrence of words in documents.
Search engines like Google, however, augment this with information about the hyperlink
structure of the Web. Nevertheless, their shortcomings are still significant: there is a
semantic gap between what a user wants and what they get; users cannot provide feedback
regarding the relevance of returned pages; users cannot personalize the ranking mechanism that
the search engine uses; and the search engine cannot learn from past user preferences.

Semantic Search
This section discusses the challenges with traditional search engines like Google and introduces the
concept of semantic-based search engines.
1. Current Search Engine Challenges: Despite Google's popularity, finding specific
information amid vast search results remains a problem. Traditional search engines focus on
keywords, often leading to irrelevant results.
2. Semantic-based Search Engines: Semantic search aims to find documents with similar
concepts, not just matching keywords. There are two main approaches: Latent Semantic
Indexing (LSI) and Semantic Web Documents.
3. LSI Approach: LSI organizes existing web content into a semantic structure, capturing
implicit associations between words and text objects. This allows retrieval based on the
underlying meaning of documents, not just keywords.
4. Semantic Web's Role: The Semantic Web, facilitated by RDF and OWL documents, aims to
create a more meaningful network of content. This enables better representation of content
meaning and logical connections between related information.
5. Performance Challenges: However, semantic network-based search engines face
performance issues due to the scale of the network. While semantic search can provide more
relevant results, processing the vast network efficiently is a challenge.
6. Early Efforts: Early semantic-based search engines, such as Cycorp's Cyc, relied heavily on natural
language processing to understand queries. Cyc combines a vast knowledge base with the web,
enabling websites to add intelligent capabilities and understand ambiguous concepts.
In essence, semantic-based search engines aim to improve search accuracy by understanding the
meaning behind content, but they face challenges in managing the scale of information and
computational complexity.
SEMANTIC SEARCH TECHNOLOGY
As Web ontologies become more advanced, RDF and OWL tags will offer semantic
opportunities for search.
Searching Techniques
This subsection examines the practical problems with semantic search, focusing in particular on the
incompleteness problem and the halting problem.
1. Incompleteness Problem: Semantic search involves exploring logical relationships and
concepts. However, this leads to a branching set of possibilities, making it challenging to find all
possible correct solutions. In a complex logical system, there can be statements that cannot be
logically proven, as demonstrated by Gödel's incompleteness theorem.
2. Halting Problem: This is a problem in computer science where it's impossible to determine
whether a program will ever stop running or continue indefinitely. Alan Turing proved this in
1936. The halting problem implies that certain algorithms may never provide a definite answer.
3.Implications for Semantic Search: In the context of semantic search on the web, there are
millions of facts and rules that can combine in complex ways, creating an infinite space of
potential proofs. This leads to inherent incompleteness issues, as it's impossible to examine every
possible factual statement. Instead, search strategies like depth-first search are used to explore
portions of the search tree.
A depth-first search would start at the top of the tree and go as deeply as possible down some
path, expanding nodes as you go, until you find a dead end. A dead end is either a goal (success)
or a node where you are unable to produce new children. So the system cannot prove anything
beyond that point.
Let us walk through a depth-first search and traverse the tree (a short code sketch of the same
procedure follows the steps below). Start at the top node and go as deeply as possible:
1. Start at the highest node.
2. Go as deeply as possible down one path.
3. When you run into a dead-end (i.e., a false statement), back-up to the last node that you turned
away from. If there is a path there that you have not tried, go down it. Follow this option until you
reach a dead-end or a goal (a true statement with no child nodes).
4. If this path leads to another dead-end, go back up a node and try the other branches.
5. This path leads to a goal. In other words, this final node is a positive result to the query. So you
have one answer. Keep searching for other answers by going up a couple more nodes and then
down a path you have not tried.
6. Continue until you reach more dead-ends and have exhausted search possibilities.
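The walkthrough above can be condensed into a short recursive procedure. The sketch below is only an illustration (the tree structure and statements are hypothetical): it follows each path as deeply as possible and collects every goal it reaches.

# Depth-first search over a proof tree. Each node is (statement, is_true, children);
# a true leaf is a goal (an answer), and any other leaf is a dead end.
def depth_first(node, answers):
    statement, is_true, children = node
    if not children:
        if is_true:
            answers.append(statement)   # goal reached: one answer to the query
        return                          # dead end or goal: back up either way
    for child in children:              # try each branch as deeply as possible
        depth_first(child, answers)

# Hypothetical proof tree for a query.
tree = ("query", None, [
    ("fact A", False, []),                        # dead end
    ("rule B", None, [("fact B1", True, []),      # goal
                      ("fact B2", False, [])]),   # dead end
])
answers = []
depth_first(tree, answers)
print(answers)   # ['fact B1']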
Advantages of Depth-First Search (DFS):
- Efficient for searching trees.
- Requires less memory as it only remembers the path back up.
Disadvantage:
- Must trace paths all the way to the end.

Advantages of Breadth-First Search (BFS):
- Searches layer by layer.
- Guarantees finding simplest proofs first (Ockham's Razor benefit).
Disadvantages:
- Memory-intensive for large trees; requires storing all unexamined nodes.
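For comparison, a breadth-first version of the same search (again only a sketch, over the same hypothetical proof-tree structure used above) expands the tree layer by layer with a queue, which is why it finds the shallowest proofs first but must keep every unexamined node in memory:

from collections import deque

# Breadth-first search over the same (statement, is_true, children) proof tree.
def breadth_first(root):
    answers, queue = [], deque([root])
    while queue:
        statement, is_true, children = queue.popleft()   # examine the oldest node first
        if not children and is_true:
            answers.append(statement)                    # shallow goals are found first
        queue.extend(children)                           # all unexamined nodes are stored
    return answers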

Uninformed vs. Informed Search


- Uninformed searches (like DFS and BFS) lack information about goal states.
- Informed searches use heuristics to estimate path costs or steps to the goal.
- Informed searches (like A*) perform better than uninformed ones, behaving in a more rational
manner.

Examples
- Uninformed searches: depth-first, breadth-first, uniform-cost, depth-limiting, iterative
deepening.
- Informed searches: best-first, hill-climbing, beam, A*, IDA* (iterative deepening A*).
- Informed searches can significantly improve search efficiency.
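As a small illustration of the difference, a greedy best-first search orders its queue by a heuristic estimate of how close each node is to the goal (A* would additionally add the cost already incurred). The graph and heuristic values below are invented:

import heapq

# Greedy best-first search: always expand the node the heuristic rates closest to the goal.
def best_first(graph, heuristic, start, goal):
    visited = set()
    queue = [(heuristic[start], start, [start])]
    while queue:
        _, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (heuristic[neighbor], neighbor, path + [neighbor]))
    return None

graph = {"S": ["A", "B"], "A": ["G"], "B": ["A"]}
heuristic = {"S": 3, "A": 1, "B": 2, "G": 0}   # estimated steps to the goal G
print(best_first(graph, heuristic, "S", "G"))  # ['S', 'A', 'G']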
WEB SEARCH AGENTS
While Web search engines are powerful and important to the future of the Web, there is another
form of search that is also critical: Web search agents.
A Web search agent will not perform like a commercial search engine. Search engines use
database lookups from a knowledge base.
In the case of the Web search agent, the Web itself is searched and the computer provides the
interface with the user. The agent’s percepts are documents connected through the Web utilizing
HTTP. The agent’s actions are to determine if its goal of seeking a Web site containing a
specified target (e.g., keyword or phrase), has been met and if not, find other locations to visit. It
acts on the environment using output methods to update the user on the status of the search or the
end results.
Some important points on Web search agents:
1. Agent Intelligence:
- An intelligent agent can make rational decisions to achieve a goal efficiently.
- It can generate possible outcomes and search through them to reach the desired goal state.
2. Building an Intelligent Web Search Agent:
- Requires mechanisms for keyword searches, handling exclusions, and self-seeding when
necessary.
- The agent should search through various paths to find the target keyword or phrase.
3. Implementation:
- Requires knowledge of programming, working with web protocols (HTTP), HTML, sorting,
and search algorithms.
- Many languages and tools are available for building web search agents, with efficient sorting
algorithms improving performance.
4. Phases of Web Search Agent Design:
Initialization: Setting up variables and obtaining necessary information for the search.
Perception: Contacting websites, retrieving information, identifying the presence of the target,
and finding paths to other URLs.
Action: Making decisions based on gathered information, determining if the goal is met, and
deciding the next steps.
Effect: Completing the search loop until the goal is achieved or cannot be achieved.
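A toy version of these four phases can be sketched with the Python standard library. This is purely illustrative: the seed URL and target phrase are hypothetical, and a real agent would also need robots.txt handling, politeness delays, proper HTML parsing, and error recovery.

import re
import urllib.request
from collections import deque

def search_agent(seed_url, target, max_pages=20):
    # Initialization: set up the frontier, the visited set, and the result list.
    frontier, visited, hits = deque([seed_url]), set(), []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        # Perception: contact the site and retrieve the document over HTTP.
        try:
            html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue
        # Action: decide whether the goal (target phrase present) has been met.
        if target.lower() in html.lower():
            hits.append(url)
        # Effect: extract paths to other URLs and continue until done.
        frontier.extend(re.findall(r'href="(https?://[^"]+)"', html))
    return hits

# Hypothetical usage:
# print(search_agent("http://example.org/", "semantic web"))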

SEMANTIC METHODS
Semantic search methods augment and improve traditional search results by using not just words,
but meaningful concepts. Several major companies are seriously addressing the issue of semantic
search. There are two approaches to improving search results through semantic methods:
(1) LSI, and
(2) Semantic Web documents.
LATENT SEMANTIC INDEX SEARCH (LSI)
Building on the criteria of precision, ranking, and recall requires more than
brute force. Assigning descriptors and classifiers to text helps find relevant documents even
without an exact match to the search query. Fully described datasets can also show the scope and
distribution of the document collection through taxonomy.
Drawback of Taxonomies
- Taxonomies, while helpful, face challenges when categorizing things that resist easy
classification (like whether a tomato is a fruit or vegetable).
- Solutions like ontology taxonomies address these challenges.
Latent Semantic Indexing (LSI):
- Developed in the late 1980s, LSI examines documents as a whole to identify semantic
similarities.
- It considers documents with many words in common as semantically close.
- LSI doesn't understand the meaning of words but notices patterns in word usage.
Benefits of LSI:
- In an LSI-indexed database, the search engine returns documents based on similarity values
calculated for content words.
- LSI doesn't require an exact match and can return relevant documents even if they don't
contain the searched keyword.
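The notes do not go into the underlying mathematics, but LSI is conventionally implemented by factoring a term-document matrix with a singular value decomposition (SVD) and comparing documents in the reduced "concept" space. The numpy sketch below, with an invented matrix, is only an illustration of that idea:

import numpy as np

# Term-document matrix: rows are terms, columns are documents (raw counts, invented).
term_doc = np.array([
    [1, 1, 0],   # "car"
    [0, 1, 0],   # "automobile"
    [0, 0, 1],   # "flower"
], dtype=float)

# Truncated SVD keeps only the k strongest "concepts" in the collection.
U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T    # one row per document, in concept space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# d0 mentions only "car", but "car" and "automobile" co-occur in d1, so d0 and d1
# end up close in concept space while the "flower" document d2 stays far away.
print(cosine(doc_vectors[0], doc_vectors[1]))   # close to 1
print(cosine(doc_vectors[0], doc_vectors[2]))   # close to 0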
Searching for Content
A semantic search engine is a remarkably useful solution. It can discover if two documents are
similar even if they do not have any specific words in common and it can reject documents that
share only uninteresting words in common. Latent semantic indexing looks at patterns of words
within a set of documents. Natural language is full of redundancies, and not every word that
appears in a document carries semantic meaning. Frequently used words in English often do not
carry content: examples include functional words, conjunctions, prepositions, and auxiliary verbs.
The first step in doing LSI, therefore, is stemming the words and culling these extraneous
‘noise’ words from a document.
For each document:
(1) Stem all of the words and throw away any common ‘noise’ words;
(2) For each of the remaining words, visit and remember each document that has a direct
relationship to this word. Score each document based on a distance function from the
original document and the relative scarcity of the word in common;
(3) For each of the as-yet-unvisited related documents now being tracked, recursively
perform the same operation as above.
The particular weighting algorithm that was used is this:
(1) For each increase in distance, divide a baseline score by two;
(2) The score of each document is equal to the baseline divided by the square root of the
popularity of the word.
Overall, this algorithm delivers a cheap semantic lookup based on walking through a document
and creating a word graph.
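A rough sketch of this weighting, as one possible interpretation of the description above (not the authors' actual code), could look like the following; the index structures are assumed to exist already:

import math

# word_index maps each content word to the set of documents containing it;
# doc_words maps each document to its stemmed, noise-free content words.
def score_related(start_doc, doc_words, word_index, max_distance=3, baseline=1.0):
    scores, visited, frontier = {}, {start_doc}, {start_doc}
    for distance in range(1, max_distance + 1):
        weight = baseline / (2 ** distance)      # halve the baseline per step of distance
        next_frontier = set()
        for doc in frontier:
            for word in doc_words.get(doc, ()):
                related = word_index.get(word, set())
                popularity = len(related)
                for other in related:
                    if other in visited:
                        continue
                    # rarer shared words contribute more than very common ones
                    scores[other] = scores.get(other, 0.0) + weight / math.sqrt(popularity)
                    next_frontier.add(other)
        visited |= next_frontier
        frontier = next_frontier
    return scores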
There are many other scoring algorithms that could be used. Additionally, a thesaurus could be
applied to help bridge semantic issues.
One interesting challenge would be to make the algorithm work “on the fly” so that as new
documents were added they would self-score. Another challenge would be to find a way to
distribute the algorithm over multiple machines for scalability.
Semantic Search Engine Application

TAP
An example of semantic search technology is TAP (http://tap.stanford.edu/), a
distributed project involving researchers from Stanford, IBM, and the World Wide Web
Consortium (W3C). TAP leverages automated and semi-automated techniques to extract
knowledge bases from unstructured and semi-structured bodies of text. The system is able
to use previously learned information to learn new information, and can be used for
information retrieval. Ontology describes concepts and relationships with a set of
representational vocabulary. The aim of building ontologies is to share and reuse
knowledge. Since the Semantic Web is a distributed network, there are different ontologies
that describe semantically equivalent things. As a result, it is necessary to map elements of
these ontologies if we want to process information on the scale of the Web. An approach
for semantic search can be based on text categorization. In text categorization, ontology
maps compare each element of one ontology with each element of the other ontology and
then determine a similarity metric on a per-pair basis. Matched items are those whose
similarity values are greater than a certain threshold.
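As a small illustration of per-pair similarity with a threshold (not TAP's actual matching code), a simple string-similarity measure from the Python standard library can stand in for the metric; real ontology mapping would also consider labels, structure, and instances:

from difflib import SequenceMatcher

# Compare every element of one ontology with every element of the other and
# keep the pairs whose similarity value exceeds the threshold.
def match_elements(ontology_a, ontology_b, threshold=0.8):
    matches = []
    for a in ontology_a:
        for b in ontology_b:
            similarity = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if similarity > threshold:
                matches.append((a, b, round(similarity, 2)))
    return matches

# Hypothetical ontology element names.
print(match_elements(["Author", "Publication", "Organisation"],
                     ["author", "publications", "organization"]))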
In TAP, existing documents are analyzed using semantic techniques and converted into
Semantic Web documents using automated techniques or manually by the document
author using standard word processing packages. Both automated and guided analyses are
used for intelligent reasoning systems and agents. As a result, traditional information
retrieval techniques are enhanced with more deeply structured knowledge to provide more
accurate results. The solutions are built on a core technology called Semantic Web
Templates.
SWOOGLE
“Swoogle” is a crawler-based indexing and retrieval system for
Semantic Web
documents using RDF and OWL. It extracts metadata and computes relations
between documents. Discovered documents are also indexed by an
information retrieval system to compute the similarity among a set of
documents and to compute rank as a measure of the importance of a
Semantic Web document (SWD). An SWD is known for its semantic content.
Since no conventional search engines can take advantage of semantic
features,
a search engine customized for SWDs, especially for ontologies, is
necessary to access, explore, and query the Web’s RDF and OWL
documents.
A prototype Semantic Web search engine, Swoogle facilitates the finding
of appropriate ontologies and helps users specify terms and qualify their
type (class or property). In addition, the Swoogle ranking mechanism
sorts ontologies by their importance.
Advantages of Swoogle
1. Swoogle helps users integrate Semantic Web data distributed on the
Web. It enables querying SWDs with constraints on the classes and
properties. By collecting metadata about the Semantic Web, Swoogle
reveals interesting structural properties, such as how the
Semantic Web is connected, how ontologies are referenced, and how
an ontology is modified externally.
2. Swoogle is designed to scale up to handle millions of documents
and enables rich query constraints on semantic relations. The
Swoogle architecture consists of a database that stores metadata
about the SWDs. Two distinct Web crawlers discover SWDs and
compute semantic relationships among the SWDs.
Semantic Web Documents
A Semantic Web Document is a document in RDF or OWL that is accessible to software
agents. There are two kinds of SWDs: Semantic Web ontologies (SWOs) and Semantic Web
databases (SWDBs). A document is an SWO when its statements define new classes and
properties or extend existing definitions with new properties. A document is considered an
SWDB when it does not define or extend terms. An SWDB can introduce individuals and make
assertions about them.
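As an illustration of the distinction (a rough heuristic, not Swoogle's actual classifier), one could check with the Python rdflib library whether a document defines any classes or properties; the example URL is hypothetical:

from rdflib import Graph
from rdflib.namespace import OWL, RDF, RDFS

# Heuristic: a document that defines classes or properties is treated as an SWO;
# one that only makes assertions about individuals is treated as an SWDB.
def classify_swd(source):
    g = Graph()
    g.parse(source)   # recent rdflib versions infer the serialization (RDF/XML, Turtle, ...)
    defines_terms = (
        any(g.subjects(RDF.type, OWL.Class)) or
        any(g.subjects(RDF.type, RDFS.Class)) or
        any(g.subjects(RDF.type, RDF.Property)) or
        any(g.subjects(RDF.type, OWL.ObjectProperty)) or
        any(g.subjects(RDF.type, OWL.DatatypeProperty))
    )
    return "SWO" if defines_terms else "SWDB"

# Hypothetical usage:
# print(classify_swd("http://example.org/ontology.owl"))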
Swoogle Architecture
Swoogle architecture can be broken into four major components:
1. SWD discovery
2. metadata creation
3. data analysis, and
4. interface.
These components work independently and interact with one another through a
database. The SWD discovery component discovers potential SWDs on the Web. The
metadata creation component caches a snapshot of an SWD and generates objective
metadata about SWDs at both the syntax and the semantic level. The data analysis
component uses the cached SWDs and the created metadata to derive analytical reports,
such as classification of SWOs and SWDBs, rank of SWDs, and the Information
Retrieval (IR) index for the SWDs. The interface component focuses on providing data
service.
Finding SWDs
It is not possible for Swoogle to parse all the documents on the Web to see if they are
SWDs; however, the crawlers employ a number of heuristics for finding SWDs, starting
with a Google crawler that searches URLs using the Google Web service. By looking at
the entire Semantic Web, it is hard to capture and analyze relations at the RDF node level.
Therefore, Swoogle focuses on SWD level relations that generalize RDF node level
relations.
Ranking SWDs
The Swoogle algorithm, Ontology Rank, was inspired by Google's PageRank
algorithm and is used to rank search results. Ontology Rank takes
advantage of the fact that the graph formed by SWDs has a richer set of
relations and the edges represent explicit semantics derivable from the
RDF and OWL. Given SWDs A and B, Swoogle classifies inter-SWD links into
four categories, among them imports(A,B) and extends(A,B).

These relations should be treated as follows: if a surfer observes the
imports(A,B) relation while visiting A, it will follow this link
because B is semantically part of A. Similarly, the surfer may follow
the extends(A,B) relation because it can understand the defined term
completely only when it browses both A and B. Therefore, a different
weight, reflecting the probability of following that kind of link, is
assigned to each of the four categories of inter-SWD relations.
To aggregate RDF node level relations into SWD level relations, Swoogle counts the number
of references: the more terms in B referenced by A, the more likely a
surfer will follow the link from A to B.
Indexing and Retrieving SWDs
Central to a Semantic Web search engine is the problem of indexing and searching SWDs.
It is useful to apply IR techniques to documents that are not entirely subject to markup: it is
conceivable that there will be some text documents that contain embedded markup, and search
should cover both the structured and the unstructured components of such documents. Information
retrieval techniques have some valuable characteristics, such as researched methods for
ranking matches, computing similarity between documents, and employing relevance
feedback. These complement and extend the retrieval functions inherent in Swoogle.
Swoogle is intended to support services needed by software agents and programs via web
service interfaces. Using Swoogle, one can find all of the Semantic Web documents that
use a set of properties or classes.
