Web Science Synergies: Exploring Web Knowledge through the Semantic Web
Exploring Web Data & Knowledge through
the Semantic Web
Dr. Stefan Dietze
L3S Research Center

27/11/13
Pluto & the seven Dwarfs?

Pluto the dwarf planet?

“…solar system… #pluto”

“A little semantics goes a long way” (J. Hendler [1])

Semantic Web
• Adding meaning through shared vocabularies and schemas (e.g. DBpedia)
• W3C standards RDF & SPARQL for data & knowledge representation and querying
• Persistent URIs to reference & interlink data on the Web

[Figure: RDF graph linking the text “…solar system… #pluto” to Web entities. Nodes: yago:AstronomicalObjects, dbp:CelestialBody, dbp:Pluto, dbp:SolarSystem, dbp:Pluto(mythology), dbp:DwarfPlanetPluto. Edge labels: typeOf (twice), dwarfPlanetOf, redirectOf, namedAfter.]

[1] Hendler, J., The Dark Side of the Semantic Web, IEEE Intelligent Systems, Jan/Feb 2007

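The graph on this slide can be read as a set of subject-predicate-object triples, which is the data model RDF is built on. A minimal Python sketch of that idea follows. The prefixed names are taken from the slide, but which nodes each edge connects is an assumption (the figure layout is not recoverable here), and the `match` helper is hypothetical, not the API of any RDF library.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Assumed reading of the slide's graph: node and edge names come from the
# slide, but the subject/object pairing of each edge is a guess.
TRIPLES: List[Triple] = [
    ("dbp:Pluto", "typeOf", "dbp:CelestialBody"),
    ("dbp:Pluto", "typeOf", "yago:AstronomicalObjects"),
    ("dbp:Pluto", "dwarfPlanetOf", "dbp:SolarSystem"),
    ("dbp:Pluto", "namedAfter", "dbp:Pluto(mythology)"),
    ("dbp:DwarfPlanetPluto", "redirectOf", "dbp:Pluto"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# "What is dbp:Pluto?" expressed as the pattern (dbp:Pluto, typeOf, ?o).
pluto_types = sorted(obj for _, _, obj in match(TRIPLES, s="dbp:Pluto", p="typeOf"))
print(pluto_types)
```

A SPARQL query generalises this kind of pattern matching to several joined patterns evaluated over a remote graph.
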
Semantic Web / Linked Data
• De-facto standard for sharing data on the Web
• Vision: well connected graph of open Web data
• 350+ datasets and 32 billion triples in LOD Cloud alone
• Other “incarnations”:
  • Google Knowledge Graph
  • Facebook Open Graph
  • http://schema.org

• “HTTP-accessibility” (SPARQL, URI-dereferencing)
• “Structure” & “Semantics” (=> shared/linked vocabularies)
• “Interlinked”
• “Persistent”

[Figure: example vocabularies and datasets: BBC Programmes, FOAF, DBpedia Ontology, Geo Ontology, Gene Ontology, Dublin Core, BIBO]

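“HTTP-accessibility” through URI dereferencing means that an HTTP GET on a resource's URI returns a description of that resource, with the format requested through the `Accept` header. The sketch below only prepares such a request and does not send it. The DBpedia resource URI is used as an example, the function name is hypothetical, and whether a particular server honours the requested media type is an assumption.

```python
import urllib.request


def build_dereference_request(
    uri: str, media_type: str = "text/turtle"
) -> urllib.request.Request:
    """Prepare (but do not send) a GET request asking for an RDF serialisation."""
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


req = build_dereference_request("http://dbpedia.org/resource/Pluto")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# To actually dereference the URI (requires network access):
# body = urllib.request.urlopen(req, timeout=10).read()
```
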
That’s awesome, but...

Hm, really?

…why are there so few datasets actually used?
• Data reuse and in-links are focused on trusted “reference graphs” such as DBpedia (i.e. Wikipedia)
• Long tail of LD datasets which are neither reused nor linked to (LOD Cloud alone consists of 300+ datasets)
• Explanations?

• “HTTP-accessibility” (SPARQL, URI-dereferencing)
• “Structure” & “Semantics” (=> shared/linked vocabularies)
• “Interlinked”
• “Persistent”

Open data is more diverse than we think

Accessibility of datasets?
• Less than 50% of all SPARQL endpoints are actually responsive at any given point in time
• “THE” SPARQL protocol? No, but many variants & subsets
• …

[Figure: SPARQL endpoint availability over time [Buil-Aranda et al 2013]]

Shared vocabularies & schemas, but:
• …still very heterogeneous [d’Aquin, WebSci13]
• …data partially messy and not conformant (RDFS, schemas) [HoganJWS2012]
• …even widely used reference datasets such as DBpedia are noisy [Paulheim2013]

[Figure: co-occurrence graph of data types in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties]

• Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y., SPARQL Web-Querying Infrastructure: Ready for Action?, International Semantic Web Conference 2013 (ISWC2013).
• D’Aquin, M., Adamou, A., Dietze, S., Assessing the Educational Linked Data Landscape, ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
• Paulheim, H., Bizer, C., Type Inference on Noisy RDF Data, Semantic Web – ISWC 2013, Lecture Notes in Computer Science Volume 8218, 2013, pp. 510-525.
• Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S., An empirical survey of Linked Data conformance, Journal of Web Semantics 14, pp. 14–44, 2012.

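The availability figure above comes from repeatedly probing endpoints. As a rough illustration only, and not the measurement method of Buil-Aranda et al., the sketch below builds a SPARQL-protocol GET URL for a trivial `ASK` probe and summarises a set of probe outcomes. The helper names and the sample outcomes are hypothetical; `http://dbpedia.org/sparql` is used as an example endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# A cheap probe: any non-empty store answers "true".
PROBE_QUERY = "ASK { ?s ?p ?o }"


def probe_url(endpoint: str, query: str = PROBE_QUERY) -> str:
    """Build a SPARQL-protocol GET URL carrying the query in `query=`."""
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{urlencode({'query': query})}"


def availability(results: dict) -> float:
    """Fraction of endpoints whose probe succeeded (value True)."""
    return sum(results.values()) / len(results) if results else 0.0


url = probe_url("http://dbpedia.org/sparql")

# Hypothetical probe outcomes, for illustration only (not measured data).
sample = {"endpoint-a": True, "endpoint-b": False, "endpoint-c": False, "endpoint-d": True}

print(url)
print(availability(sample))
```
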
Too many/diverse datasets, too little information
• Which datasets are useful & trustworthy for case XY (e.g. “learning about the solar system”)?
• Which topics (e.g. “Astronomy”) are covered by dataset X?
• Which datasets describe/offer videos (slides, publications, statistics etc.)?

Data curation and dataset profiling
• Which datasets are useful & trustworthy for case XY (e.g. “learning about the solar system”)?
• Which topics (e.g. “Astronomy”) are covered by dataset X?
• Which datasets describe/offer videos (slides, publications, statistics etc.)?

• Catalog of data (LinkedUp Catalog): classification of datasets according to resource types, disciplines/topics, data quality, accessibility, etc.
• Infrastructure for distributed/federated querying

[Figure: the LinkedUp Dataset Catalog describes the datasets]

Dataset profiling: what’s all the data about

BBC Programme
<po:Programme …>
<po:Series>Wonders of the Solar System</.>
<po:Actor>Brian Cox</…>
</po:Programme…>

Yovisto Video
<yo:Video …>
<dc:title>Pluto & the Dwarf Planets</dc:title>
…
</yo:Video…>

[Figure: profiling pipeline from example records to dataset metadata in the LinkedUp Dataset Catalog. Stages: schema mappings, entity disambiguation, topic profile extraction. Labels shown: po:Programme, bibo:Film, yov:Video, “contains”, AAISO, BIBO, FOAF, db:Astro. Objects, db:Astronomy.]

LinkedUp Data Catalog in a nutshell
• Explore & query for datasets/types & topics
  http://data.linkededucation.org/linkedup/categories-explorer
  http://data.linkededucation.org/linkedup/catalog/
• Federated queries using type mappings

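One way to picture federated queries using type mappings: a query for a shared resource type is rewritten into one query per dataset, each using that dataset's own type. The sketch below is illustrative only and does not describe the catalog's actual implementation. The prefixed type names appear on the slides, while the shared label "video", the dataset keys, and the function name are assumptions.

```python
# Hypothetical type mappings: a shared type mapped to dataset-specific
# types. The pairing of types with datasets is an assumption.
TYPE_MAPPINGS = {
    "video": {
        "bbc": "po:Programme",
        "yovisto": "yov:Video",
    },
}


def federated_queries(shared_type: str, limit: int = 10) -> dict:
    """Return one SPARQL SELECT query string per dataset for a shared type."""
    queries = {}
    for dataset, local_type in TYPE_MAPPINGS.get(shared_type, {}).items():
        # PREFIX declarations are omitted; a real query would need them.
        queries[dataset] = (
            f"SELECT ?resource WHERE {{ ?resource a {local_type} }} LIMIT {limit}"
        )
    return queries


queries = federated_queries("video")
for dataset, query in queries.items():
    print(dataset, "->", query)
```

Each generated query would then be sent to the corresponding dataset's endpoint and the results merged.
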
LinkedUp Challenge: using open data for learning
http://linkedup-challenge.org

• Open Data Competition to promote tools and applications that analyse / integrate (Linked) Web data
• Organised by the LinkedUp project over 2 years (“Veni”, “Vidi”, “Vici”) with 40,000 EUR in awards
• Veni Competition: 22 submissions, 8 shortlisted for presentation at the Open Knowledge Conference (17 September, Geneva, Switzerland)

1st Place: PoliMedia
Exploring political debates & events
http://www.polimedia.nl/

• Cross-media exploration & analysis of political events (parliament debates and media coverage)
• Automatically generated links between transcripts of debates, newspaper articles, and radio bulletins
• (Linked) Data available at http://data.polimedia.nl
• Data sources: 1) newspapers of the historical newspaper archive, 2) radio bulletins of the Dutch National Press Agency (ANP)
• 9000+ debates (1945 – 1995)
• Over 3000 media links

Martijn Kleppe, Max Kemman, Henri Beunders (Erasmus Universiteit Rotterdam), Laura Hollink, Damir Juric (Vrije Universiteit Amsterdam), Johan Oomen, Jaap Blom (Nederlands Instituut voor Beeld en Geluid)

Outlook: more “focused” data reuse challenges
http://linkedup-challenge.org/

Open Track
• Scalable tools and applications using (Linked) open data for educational purposes
• LinkedUp data catalog
• Promotion of selected Veni submissions

Focused Track
• Simplifying complex information to make it accessible (example: publications from Elsevier)
• Recommender system for educational resources (courses, MOOCs) relevant to user interests

• Approx. 20,000 EUR awards budget
• Final events at 11th Extended Semantic Web Conference (ESWC2014)
• Submission: 14 February 2014

Thank you!

REFERENCES
• D’Aquin, M., Adamou, A., Dietze, S., Assessing the Educational Linked Data Landscape, ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
• Fetahu, B., Dietze, S., d’Aquin, M., Nunes, B.P., Generating structured Profiles of Linked Data Graphs, ISWC2013, 12th International Semantic Web Conference.
• Nunes, B.P., Dietze, S., Casanova, M.A., Kawase, R., Fetahu, B., Nejdl, W., Combining a co-occurrence-based and a semantic measure for entity linking, ESWC 2013, 10th Extended Semantic Web Conference, May 2013.
• Paulheim, H., Bizer, C., Type Inference on Noisy RDF Data, Semantic Web – ISWC 2013, Lecture Notes in Computer Science Volume 8218, 2013, pp. 510-525.
• Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S., An empirical survey of Linked Data conformance, Journal of Web Semantics 14, pp. 14–44, 2012.
• Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y., SPARQL Web-Querying Infrastructure: Ready for Action?, International Semantic Web Conference 2013 (ISWC2013).

WWW

See also (data)
• http://datahub.io/group/linked-education
• http://data.linkededucation.org
• http://data.linkededucation.org/linkedup/catalog/
• http://lak.linkededucation.org

See also (general)
• http://linkedup-project.eu
• http://linkedup-challenge.org
• http://linkededucation.org
• http://linkeduniversities.org
