SEMANTIC WEB
UNIT-1
Definition of the Traditional Web (Web 1.0)
The Traditional Web, also known as Web 1.0, refers to the first generation of the World
Wide Web, where websites were static, read-only, and primarily used for one-way
information sharing. It was a publisher-driven web with minimal interactivity and limited
user participation.
Key Characteristics
1. Static Content
o Web pages were fixed and manually updated by developers with no real-time
changes.
2. Read-Only Structure
o Users could only view or read content; no option to edit, comment, or
contribute.
3. HTML-Based
o Websites were built primarily using HTML with simple layouts and basic
media.
4. Limited Interactivity
o User logins, forms, and dynamic user experiences were uncommon.
5. Centralized Publishing
o Information was controlled by webmasters or site owners; no user-generated
content.
6. Poor Data Understanding
o Machines could not interpret the meaning of data due to lack of metadata or
semantics.
7. Basic Navigation
o Users navigated using hyperlinks, directories (like Yahoo!), and early search
engines.
8. No Standard Metadata
o Metadata (data about data) was not standardized, making automated retrieval
difficult.
9. Visual-Focused Design
o Success depended on visual presentation, not the meaning or structure of
content.
10. Low Personalization
o Content was the same for all users; no customization or adaptive experiences.
Semantic Web:-
The Semantic Web is an extension of the current web that aims to make data more machine-
readable, enabling computers to better understand and respond to complex human requests
based on their meaning (or semantics). It was introduced by Tim Berners-Lee, the inventor
of the World Wide Web.
The primary goal of the Semantic Web is to create a more intelligent and interconnected
web where data is structured in a way that machines can understand, interpret, and reason
about. This allows for:
Improved data sharing and integration: Making it easier to combine information from
different sources.
Enhanced search and retrieval: Providing more relevant and precise results based on the
meaning of data.
Automation and reasoning: Enabling computers to perform complex tasks, draw
inferences, and support decision-making.
Personalized experiences: Delivering tailored content and services based on understanding
user needs and context.
Better data understanding (machine-readable data)
Interconnectivity (links across websites)
Automation and AI integration (recommendations, assistance)
Standardized information sharing (use of common data formats)
Definition of the Semantic Web
The Semantic Web is an evolution of the traditional web that enables machines to
understand, interpret, and reason with data by giving it well-defined meaning using
structured formats and linked data. It transforms the web from a collection of documents into
a web of interrelated data, understandable by both humans and computers.
Key Characteristics
1. Machine-Readable Data
o Data is structured in a way that machines can understand and process it
intelligently.
2. Use of Standard Technologies
o Core technologies include RDF (Resource Description Framework), OWL
(Web Ontology Language), and SPARQL (query language).
3. Linked Data Structure
o Data items are interlinked, creating a web of data rather than isolated
documents.
4. Meaningful Data Relationships
o Concepts and entities are described in relation to one another, enabling context-
aware processing.
5. Improved Search and Retrieval
o Enables semantic search, which understands context and intent, not just
keywords.
6. Ontology-Based Modeling
o Uses ontologies to define and share a common understanding of data across
systems.
7. Enhanced Interoperability
o Facilitates seamless data integration and communication between diverse
systems and platforms.
8. Supports Inference and Reasoning
o Allows systems to draw new conclusions based on existing data (e.g., infer
relationships).
9. Foundation for Intelligent Applications
o Powers smart assistants, recommendation systems, and knowledge graphs like
Google Knowledge Graph or Wikidata.
10. Goal: A Smarter Web
o Aims to build an intelligent, adaptive, and automated web experience by making web
data understandable and actionable by machines.
Feature | Traditional Web | Semantic Web
Purpose | Share documents and link pages | Share data in a meaningful, machine-understandable way
Content Format | Mostly HTML, designed for human reading | RDF, OWL, SPARQL, designed for machine processing
Data Meaning | Implied (for humans to interpret) | Explicit (defined using metadata and ontologies)
Data Linking | Hyperlinks between documents | Links between data elements (concepts, entities, etc.)
Search | Keyword-based search (e.g., Google) | Semantic search (meaning-based queries)
Interoperability | Limited; based on file formats and APIs | High; uses shared vocabularies and ontologies
User Interaction | Humans read and interpret pages | Machines can integrate and reason over data
Standards Used | HTML, CSS, JavaScript | RDF, OWL, SPARQL, URI
Example | A webpage showing a book | A dataset describing book properties (title, author, ISBN)
Challenges and Limitations of the Semantic Web
The Semantic Web, while promising a more intelligent and interconnected digital
environment, faces technical, organizational, and social challenges that hinder its
widespread adoption. These challenges stem from complexities in technology, data
integration, standardization, performance, and user acceptance.
Key Challenges and Limitations
1. Technical Complexity
o Implementing Semantic Web technologies like RDF, OWL, and SPARQL
requires specialized knowledge that many organizations and developers may
lack.
2. Data Integration Issues
o Combining data from diverse sources is difficult due to inconsistencies in
formats, vocabularies, and schemas, which can affect interoperability.
3. Data Quality Concerns
o Semantic systems depend heavily on accurate, complete, and unbiased data.
Poor-quality data can lead to misinterpretation or false conclusions.
4. Lack of Standardization
o Despite existing standards, inconsistent adoption leads to fragmentation and
isolated data silos, undermining the Semantic Web’s goal of connectivity.
5. Scalability Problems
o Semantic technologies may not scale well with large datasets, leading to
performance issues in query execution and data processing.
6. Privacy and Security Risks
o Richer and more detailed data connections increase the risk of exposing
sensitive information, raising privacy and ethical concerns.
7. Tooling and Infrastructure Gaps
o While tools exist, many are still immature, fragmented, or hard to use, making
practical implementation more difficult.
Core Technologies of the Semantic Web
1. RDF (Resource Description Framework)
RDF (Resource Description Framework) is a standard way to describe data and
relationships between data using simple subject–predicate–object statements, called triples.
Example: A Book and Its Author
Book title: "The Hobbit"
Author: J.R.R. Tolkien
Publisher: George Allen & Unwin
Year: 1937
The book is a type of "Book"
It has a title
It has an author
It has a publisher
It was published in a specific year
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix schema: <https://schema.org/> .
@prefix ex: <http://example.org/book/> .
ex:the-hobbit
a schema:Book ;
dc:title "The Hobbit" ;
dc:creator "J.R.R. Tolkien" ;
dc:publisher "George Allen & Unwin" ;
dc:date "1937" .
Part Meaning
dc:title Title of the book
dc:creator Author (creator of the content)
dc:publisher The publishing company
dc:date Publication year
schema:Book Type (declares it's a Book)
dc: stands for the Dublin Core vocabulary (used for describing titles, creators, publishers, dates, etc.).
schema: stands for the Schema.org vocabulary, which supplies the Book class; Dublin Core
Elements defines properties such as dc:title but no Book class.
ex: is your custom namespace, used to define things in your own domain (in this case,
books).
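To make the triple structure concrete, the description above can be modelled in plain Python as (subject, predicate, object) tuples. This is only an illustrative sketch: the PREFIXES table, the expand helper, and the triples list are hypothetical names chosen for this example and do not come from any RDF library.

```python
# Prefix table: short names mapped to full namespace URIs,
# mirroring the @prefix lines of the Turtle example.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/book/",
}


def expand(name: str) -> str:
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


# Each statement about the book is one (subject, predicate, object) triple.
book = expand("ex:the-hobbit")
triples = [
    (book, expand("dc:title"), "The Hobbit"),
    (book, expand("dc:creator"), "J.R.R. Tolkien"),
    (book, expand("dc:publisher"), "George Allen & Unwin"),
    (book, expand("dc:date"), "1937"),
]

for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

All four statements share the same subject URI, which is what lets a machine recognise them as facts about one resource.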
2. OWL (Web Ontology Language)
OWL (Web Ontology Language) is a language used to create ontologies — structured
frameworks that define the types of things (classes), their properties, and the relationships
between them in a way that computers can understand and reason about.
Example: Person and Student
A Person class
A Student class that is a subclass of Person
An individual named Alice, who is a Student
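The kind of reasoning this example enables can be sketched in a few lines of Python. The code below is not OWL and does not use an OWL reasoner; it is a minimal illustration, under the assumption that the ontology is stored as triples, of the single rule "if X is a C and C is a subclass of D, then X is a D". The identifiers ex:Person, ex:Student, ex:Alice and the function infer_types are hypothetical names for this example.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Asserted facts: Student is a subclass of Person, Alice is a Student.
facts = {
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:Alice", TYPE, "ex:Student"),
}


def infer_types(triples):
    """Apply the subclass rule repeatedly until no new type facts appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, p1, c) in list(inferred):
            if p1 != TYPE:
                continue
            for (c2, p2, d) in list(inferred):
                if p2 == SUBCLASS_OF and c2 == c and (x, TYPE, d) not in inferred:
                    inferred.add((x, TYPE, d))
                    changed = True
    return inferred


closure = infer_types(facts)
# The fact "Alice is a Person" was never asserted; it is derived.
print(("ex:Alice", TYPE, "ex:Person") in closure)
```

A real OWL reasoner supports many more constructs (property restrictions, equivalence, disjointness); the sketch only shows why declaring Student as a subclass of Person is useful to a machine.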
3. SPARQL
SPARQL (SPARQL Protocol and RDF Query Language) is a standard query language
and protocol for querying and manipulating data stored in RDF (Resource Description
Framework) format.
It allows users to:
Retrieve data using pattern matching (subject–predicate–object)
Filter and transform query results
Update RDF data (supported since SPARQL 1.1 through SPARQL Update)
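The pattern-matching idea behind a SPARQL query can be imitated in plain Python. The sketch below is not a SPARQL engine; it assumes a small in-memory list of triples and a hypothetical match function in which terms beginning with "?" act as variables, as they do in SPARQL. The sample data is illustrative only.

```python
# A tiny in-memory graph of (subject, predicate, object) triples.
graph = [
    ("ex:the-hobbit", "dc:title", "The Hobbit"),
    ("ex:the-hobbit", "dc:creator", "J.R.R. Tolkien"),
    ("ex:the-silmarillion", "dc:title", "The Silmarillion"),
    ("ex:the-silmarillion", "dc:creator", "J.R.R. Tolkien"),
    ("ex:dune", "dc:title", "Dune"),
    ("ex:dune", "dc:creator", "Frank Herbert"),
]


def match(pattern, triples):
    """Return one variable binding (a dict) per triple matching the pattern.

    Terms starting with '?' are variables; other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?book WHERE { ?book dc:creator "J.R.R. Tolkien" }
books = [b["?book"] for b in match(("?book", "dc:creator", "J.R.R. Tolkien"), graph)]
print(books)
```

A real query usually combines several such patterns and joins them on shared variables; this sketch handles a single pattern only.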
4. Linked Data
A method of connecting related datasets across different sources using standardized
identifiers (URIs).
Linked Data is a method of publishing structured data on the Web in such a way that it is
interconnected, machine-readable, and can be easily discovered and linked to other data
sources.
It uses standard web technologies such as:
URIs to identify resources
HTTP to retrieve data
RDF to describe data
SPARQL to query data
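Why shared URIs matter can be shown with a small Python sketch. It assumes two hypothetical sources (library_data and publisher_data) that describe the same book under one URI; the merge function and the property names are invented for this illustration and do not reflect any particular Linked Data toolkit.

```python
from collections import defaultdict

# Both sources identify the book with the same URI.
HOBBIT = "http://example.org/book/the-hobbit"

library_data = [
    (HOBBIT, "title", "The Hobbit"),
    (HOBBIT, "author", "J.R.R. Tolkien"),
]
publisher_data = [
    (HOBBIT, "publisher", "George Allen & Unwin"),
    (HOBBIT, "year", "1937"),
]


def merge(*sources):
    """Group statements by subject URI, regardless of which source made them."""
    merged = defaultdict(dict)
    for source in sources:
        for subject, prop, value in source:
            merged[subject][prop] = value
    return dict(merged)


combined = merge(library_data, publisher_data)
print(combined[HOBBIT])
```

Because the identifier is shared, no record-matching heuristics are needed: statements from both sources fall under the same key.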
Applications
1. Google Knowledge Graph
2. Healthcare systems
3. E-commerce
4. Smart assistants
Stages Toward Smart Data in Semantic Web
1. Text & Databases (Pre-XML)
Raw text and traditional databases with limited structure; no standard semantic
markup.
Before XML, data was stored and exchanged using plain text files, proprietary
formats, and early database systems (like hierarchical or relational models).
There was no standard way to describe data structure, making integration
and machine processing difficult.
Example:
A CSV file storing employee records
ID,Name,Department
101,John Smith,HR
102,Jane Doe,Finance
2. XML Documents
XML (eXtensible Markup Language) documents are text-based files used
to store and transport data in a structured and machine-readable format.
XML uses custom tags to define and describe data, making it both human-
readable and machine-processable. It is platform-independent and widely
used for data interchange between systems.
Data is structured using XML; schema is usually specific to one domain,
allowing some machine-readability.
<Employee>
<ID>101</ID>
<Name>John Smith</Name>
<Department>HR</Department>
</Employee>
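The step from stage 1 to stage 2 can be sketched with Python's standard library: the CSV records shown earlier are turned into Employee elements like the one above. This is a minimal sketch; the csv_to_xml function and the wrapping Employees root element are assumptions made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_TEXT = """ID,Name,Department
101,John Smith,HR
102,Jane Doe,Finance
"""


def csv_to_xml(text: str) -> ET.Element:
    """Turn each CSV row into an <Employee> element with one child per column."""
    root = ET.Element("Employees")  # wrapper element (an assumption)
    for row in csv.DictReader(io.StringIO(text)):
        employee = ET.SubElement(root, "Employee")
        for column, value in row.items():
            ET.SubElement(employee, column).text = value
    return root


root = csv_to_xml(CSV_TEXT)
print(ET.tostring(root, encoding="unicode"))
```

In the CSV form the meaning of each value depends on its position under the header row; in the XML form every value carries its own tag.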
3. Taxonomies & Mixed Vocabulary
In the context of the Semantic Web and data organization, taxonomies refer to
hierarchical classifications of terms or concepts, often used to categorize and
structure information. Mixed vocabulary occurs when data or metadata is
described using multiple taxonomies or ontologies, often from different
domains or sources. This helps enrich meaning, but also introduces challenges
in alignment and interoperability.
Use of controlled vocabularies or taxonomies to classify data; data may come
from multiple domains with different vocabularies.
<Book rdf:about="book123">
<dc:title>Semantic Web Fundamentals</dc:title>
<schema:author>Jane Doe</schema:author>
<dc:subject>Computer Science</dc:subject>
</Book>
dc: = Dublin Core vocabulary (used for general metadata)
schema: = Schema.org vocabulary (used for structured data on the web)
This is a mixed vocabulary approach—combining multiple taxonomies for
richer semantics.
4. Ontologies & Rules
An ontology is a formal, structured representation of knowledge that defines
the concepts (classes), properties, and relationships between entities in a
specific domain. It provides a shared vocabulary and a logic-based framework
for data integration and reasoning.
Rich semantic models defining classes, properties, and logical rules enabling
reasoning and inference. Data is highly interconnected and machine-
understandable.
How does XML fit into the Semantic Web?
XML (eXtensible Markup Language) is a markup language used to store and transport
data in a structured, text-based format that is both human-readable and machine-
readable.
XML is a platform-independent, self-descriptive language designed to represent
hierarchical data using custom tags, making it easy to exchange information between
systems.
<book>
<title> </title>
<author> </author>
<year> </year>
</book>
“In the Semantic Web, XML is used to structure data, but it does not define what the
data means. It serves as a format for storing and transporting data, while technologies
like RDF, OWL, and SPARQL add the meaning and logic”.
RDF/XML (adds semantics):
<rdf:Description rdf:about="http://example.org/book/hobbit">
<dc:title>The Hobbit</dc:title>
<dc:creator>J.R.R. Tolkien</dc:creator>
</rdf:Description>
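The following Python sketch shows how the RDF/XML fragment above maps onto triples. It uses only the standard library XML parser, not an RDF library, and handles just this simple shape of RDF/XML. The fragment is wrapped in an rdf:RDF root that declares the rdf: and dc: namespaces so that the parser accepts it; that wrapper and the to_triples function are additions made for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The fragment from the text, wrapped in a root element that declares
# the namespaces (wrapper added for this example).
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/book/hobbit">
    <dc:title>The Hobbit</dc:title>
    <dc:creator>J.R.R. Tolkien</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def to_triples(xml_text):
    """Read each rdf:Description and emit (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as '{namespace}local'; drop the braces
            # to obtain the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, prop.text))
    return triples


for triple in to_triples(RDF_XML):
    print(triple)
```

XML supplies the syntax here; the meaning comes from reading the elements as RDF statements whose predicates are globally defined URIs.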
Underlying Technologies of XML
Technology Purpose
XSD Data validation (modern schema)
DTD Older form of validation
XPath Navigating/selecting elements
XQuery Querying XML data
XSLT Transforming XML
Namespaces Avoiding name conflicts
DOM Programmatic access to XML
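Two of the technologies in the table, XPath selection and programmatic tree access, can be tried with Python's standard library. Note the assumptions: xml.etree.ElementTree supports only a subset of XPath and is a tree API in the spirit of DOM rather than the W3C DOM itself, and the sample Employees document is invented for this sketch.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """
<Employees>
  <Employee><ID>101</ID><Name>John Smith</Name><Department>HR</Department></Employee>
  <Employee><ID>102</ID><Name>Jane Doe</Name><Department>Finance</Department></Employee>
</Employees>
"""

root = ET.fromstring(XML_TEXT.strip())

# XPath-style selection: employees whose Department child has the text 'HR'.
hr_names = [e.findtext("Name") for e in root.findall("./Employee[Department='HR']")]

# Programmatic tree access: iterate over child elements and read their values.
departments = {e.findtext("ID"): e.findtext("Department") for e in root}

print(hr_names)
print(departments)
```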
How do web services fit into the Semantic Web?
Web services provide machine-to-machine communication over the Web — they allow
systems to share data or perform tasks remotely.
In the Semantic Web, web services become smarter when they use semantic technologies
(like RDF, OWL) to describe what they do and how to interact with them — making automatic
discovery, composition, and execution possible.
URI
URI (Uniform Resource Identifier) is a string of characters used to identify a resource on
the internet or in a digital system. In the context of the Semantic Web, URIs uniquely identify
concepts, entities, or data items, making them globally recognizable and linkable.
URIs are essential for linking and sharing structured data.
Every concept (e.g., a person, book, city) can be assigned a URI.
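As a small illustration of assigning URIs to concepts, the Python sketch below builds a URI from a label and then splits it into its parts. The base namespace http://example.org/ and the mint_uri helper are hypothetical; real projects choose their own namespace and naming policy.

```python
from urllib.parse import quote, urlsplit

BASE = "http://example.org/"  # hypothetical namespace for this example


def mint_uri(kind: str, label: str) -> str:
    """Build a URI for a concept from its kind (e.g. 'book') and its label."""
    slug = quote(label.strip().lower().replace(" ", "-"))
    return f"{BASE}{kind}/{slug}"


book_uri = mint_uri("book", "The Hobbit")
parts = urlsplit(book_uri)

print(book_uri)
print(parts.scheme, parts.netloc, parts.path)
```

Once a concept has such an identifier, any dataset can refer to it unambiguously by using the same string.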
The Business Case for the Semantic Web
The Semantic Web offers real business value by making data more connected,
understandable, and actionable — not just by humans, but also by machines. Here's a
breakdown of why businesses invest in Semantic Web technologies.
1. Improved Data Integration
Businesses use data from many sources (databases, APIs, documents), but it’s often
siloed and inconsistent.
Uses RDF, ontologies, and URIs to create a common data model, making it easier to
merge and understand data from different sources.
Example: A retailer links supplier, sales, and customer data for smarter inventory
decisions.
2. Smarter Search and Discovery
Traditional keyword search can't capture meaning or intent.
With structured and semantically enriched data, search becomes more accurate and
context-aware.
Example: A knowledge base that understands "CEO of Tesla" and returns "Elon Musk"
without keyword matching.
3. Automation and AI-Driven Decisions
Manual data processing is time-consuming and error-prone.
Ontologies and rules (OWL, SWRL) allow automated reasoning, enabling systems to
infer new facts and support intelligent decision-making.
Example: An e-commerce platform recommends products based on a user's semantic
profile and browsing context.
4. Better Interoperability
Systems use different data formats and schemas.
Uses shared vocabularies (like FOAF, SKOS, Dublin Core) to ensure
interoperability across applications and organizations.
Example: Healthcare providers share patient records across platforms using the same
medical ontology.
5. Enhanced Customer Experience
Disconnected data leads to poor personalization.
By linking and understanding data about users, products, and context, businesses can
offer richer, more personalized services.
Example: A travel site automatically suggests tours based on a customer’s past bookings
and interests.
Example Use Cases:
Google: Uses semantic data (Schema.org) for rich search results.
Amazon: Uses semantic relationships to improve product recommendations.
Pharmaceuticals: Use ontologies to link research data, clinical trials, and regulations.
Real-World Use Cases
Sector Semantic Web Application
E-commerce Linked product data, smart recommendations
Healthcare Medical ontologies, patient data integration
Finance Linked financial datasets, risk analysis
Publishing Knowledge graphs (e.g., BBC uses RDF for content)
Government Open linked data for public services (e.g., data.gov)
Use of the Semantic Web in Business
1. Sales Support
Enhances customer profiling through linked data (e.g., preferences, history).
Enables personalized product recommendations using semantic matching.
Integrates sales data from multiple systems for a 360° view of the customer.
2. Strategic Vision & Decision Support
Helps build knowledge graphs that support strategic planning and foresight.
Enables semantic reasoning over market trends, customer behavior, and performance
metrics.
Supports predictive analytics by integrating and interpreting diverse data sources.
3. Administration
Automates routine administrative tasks using machine-readable policies and rules.
Standardizes and links internal documents, HR records, and compliance files for better
workflow efficiency.
Facilitates data governance and improves access control through ontology-based
systems.
4. Marketing
Powers semantic SEO with structured data (e.g., Schema.org) to improve visibility in
search engines.
Enables context-aware campaigns by understanding customer interests and intent.
Combines social, demographic, and behavioral data to refine audience segmentation.
5. Business Development
Supports market research by semantically linking external data (e.g., competitors,
partners, industry trends).
Identifies new opportunities through automated reasoning across business datasets.
Aids in partnership analysis using linked data (e.g., LinkedIn, Crunchbase, DBpedia).
6. Corporate Data Sharing & Collaboration
Facilitates interoperability between departments and partner organizations using
shared ontologies.
Reduces data silos by enabling semantic data exchange across systems and platforms.
Supports collaborative platforms with machine-understandable data structures.
XML
XML, or Extensible Markup Language, is a markup language designed for encoding
documents in a format that is both human-readable and machine-readable. It's used for storing,
transmitting, and reconstructing data, and is particularly useful for exchanging data between
different systems.
<message>
<text> Hello, world! </text>
</message>
Why is XML Important?
Extensible Markup Language (XML) is a markup language that provides rules for defining
any kind of data. Unlike a programming language, XML cannot carry out computations
on its own. Instead, any programming language or piece of software can be used to
process and manage the structured data it describes.
Features of XML
Numerous aspects distinguish XML from other languages. The list of key XML features is
shown below.
Extensible and Human Readable
Even if additional data is added, most XML applications will continue to function as
intended.
Overall Simplicity
XML simplifies data availability, platform changes, data transport, and sharing.
It makes it easier to upgrade or move to new operating systems, applications, or
browsers without losing data. A wide range of consumers, including people,
computers, voice assistants, and news feeds, can access the data.
Separates Data from HTML
With XML, data can be stored in separate XML files. You then do not need to
edit the HTML to change the underlying data, and can keep HTML/CSS focused on
display and style.
Allows XML Validation
An XML document can be validated against a DTD or an XML Schema. Validation
confirms that the document is well-formed and follows the declared structure, so
problems caused by flawed XML are caught early.
XML Supports Unicode
Given that XML is compatible with Unicode, it may transmit virtually any
information in any written human language.
Used to Create New Languages
Many Internet languages are built on XML. WSDL is used to describe available
web services. WML is a markup language for handheld devices that use WAP.
RSS is used for news feeds.
How Does XML Work?
XML works by providing a predictable data format. Because XML has strict formatting
rules, programs that process or display the encoded data will report an error if the
formatting is incorrect.
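This behaviour is easy to observe with Python's standard library parser. The sketch below is illustrative only; the is_well_formed helper and the two sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it raises an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<message><text>Hello, world!</text></message>"
bad = "<message><text>Hello, world!</message>"  # <text> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser rejects the second string outright instead of guessing what was meant, which is what makes the format predictable for the programs that consume it.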
Benefits of XML
Here are a few major benefits of using XML:
Supports Inter-Business Transactions
When a business sells a product or service to another business, the two companies must
share details including price, specifications, and delivery times. They can electronically
exchange all the required information and automatically finalize complicated deals
using Extensible Markup Language (XML), all without the need for human
participation.
High Independence
One of XML's most obvious advantages is that XML data is independent of any particular
platform or programming language. XML files can therefore be easily packaged for
portability and used on a variety of platforms.
Readability
Although XML files are not primarily intended to be read directly, anyone with a little
understanding of the format can inspect the data structure within them. XML represents
its syntax with plain text characters rather than a compressed or binary encoding.
Highly Customizable
Users can define their own tags to suit their needs, because the XML specification
does not prescribe a fixed set of tags. If it suits your purposes, you can also reuse
tags defined by other users.
Suitable for Any Platform
The contents of an XML file can be read and parsed quickly by software as well as
by people. This makes it simple to pass data between many different programs and
processes.
Improve Search Efficiency
XML files can be sorted and categorized more effectively and precisely than other
forms of documents by computer programs like search engines. The word "mark", for
instance, can be either a noun or a verb. Based on XML tags, search engines can
categorize "mark" correctly and return pertinent search results. As a result, XML makes
it easier for computers to interpret natural language.
Design Flexible Applications
You can easily update or change the design of your application using XML. Numerous
technologies, particularly more recent ones, include built-in XML support. They can
automatically read and process XML data files, so you can make changes without
having to completely reformat your database.
Impact of XML on the enterprise
Extensible Markup Language (XML) has had a significant and lasting impact on the
enterprise landscape, playing a crucial role in shaping how businesses handle data.
XML's core strength lies in its ability to define and store data in a shareable and
structured manner, facilitating seamless information exchange across diverse systems
and platforms.
1. Data Integration Across Platforms
Enterprises use multiple systems (ERP, CRM, databases).
XML acts as a universal format for sharing data between these systems.
Example: XML enables integration between SAP, Oracle, and Salesforce.
2. Web Services & API Communication
SOAP (Simple Object Access Protocol) is based on XML.
Many B2B (Business-to-Business) integrations rely on XML for structured data
exchange.
Example: Banking & Financial Services use XML for secure transactions and reporting.
3. Document Management & Storage
XML is widely used for storing structured data in documents (e.g., invoices,
contracts, reports).
Formats like DOCX (Microsoft Word), SVG (Scalable Vector Graphics), and
RSS (Really Simple Syndication) feeds are based on XML.
Example: E-commerce businesses use XML for product catalogs and orders.
4. Standardization & Compliance
Many industries use XML-based standards for data exchange and regulatory
compliance:
Healthcare: HL7 (Health Level 7) for medical data exchange.
Finance: XBRL (Extensible Business Reporting Language) for financial reporting.
Government: XML-based legal and tax document formats
5. Automation & Workflow Optimization
XML enables automated processing of business documents (e.g., electronic invoices,
shipping notices).
Used in BPM (Business Process Management) to define workflows.
Example: Airlines use XML to manage flight reservations and ticketing.
Advantages of XML for Enterprises
Reduces Integration Costs – Easier to connect different systems.
Improves Data Accuracy & Consistency – Structured format prevents errors.
Enhances Data Security – Can be encrypted and validated using schemas (XSD).
Facilitates Scalability – Suitable for small businesses and large enterprises alike.
Boosts Automation & AI Adoption – Helps in data analytics, machine learning, and
automation.