Introduction to Linked Data
Oscar Corcho, Asunción Gómez Pérez ({ocorcho, asun}@fi.upm.es), Universidad Politécnica de Madrid
Universidad del Valle, Cali, Colombia, September 10th 2010
Credits: Raúl García Castro, Oscar Muñoz, Jose Angel Ramos Gargantilla, María del Carmen Suárez de Figueroa, Boris Villazón, Alex de León, Víctor Saquicela, Luis Vilches, Miguel Angel García, Manuel Salvadores, Guillermo Alvaro, Juan Sequeda, Carlos Ruiz Moreno and many others
Work distributed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 license
Contents: Introduction to Linked Data; Linked Data publication; Methodological guidelines for Linked Data publication; RDB2RDF tools; Technical aspects of Linked Data publication; Linked Data consumption
What is the Web of Linked Data?
An extension of the current Web… where information and services are given well-defined and explicitly represented meaning… so that it can be shared and used by humans and machines… better enabling them to work in cooperation.
How? By promoting information exchange through tagging web content with machine-processable descriptions of its meaning, by providing the technologies and infrastructure to do this, and by establishing clear principles on how to publish data.
What is Linked Data?
Linked Data is a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.
Part of the Semantic Web; exposing, sharing and connecting data; technologies: URIs and RDF (although others are also important).
The four principles (Tim Berners-Lee, 2006)
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
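Principles 2 and 3 depend on a client being able to dereference an HTTP URI and ask for data rather than a web page, which it normally does through the HTTP Accept header. The minimal sketch below only builds such a request and does not send it. The media-type preferences are an assumption, and the DBpedia URI is used purely as an example.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Media types a Linked Data client might prefer (an assumption; adjust as needed):
# RDF serializations first, HTML as a lower-priority fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"


def linked_data_request(uri: str) -> Request:
    """Build an HTTP GET request that dereferences a URI and asks for RDF."""
    # Principle 2: only HTTP(S) URIs can be looked up this way.
    if urlsplit(uri).scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP URI: {uri}")
    return Request(uri, headers={"Accept": RDF_ACCEPT})


req = linked_data_request("http://dbpedia.org/resource/Madrid")
print(req.full_url, req.get_header("Accept"))
# Sending it (network access required) would be: urllib.request.urlopen(req)
```

With a URN or any other non-HTTP identifier, the function refuses the input. Such names cannot be looked up, which is the reason for the second principle.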
Linked Open Data evolution: 2007, 2008, 2009 [figure: LOD cloud diagrams]
LOD Cloud, May 2007. Facts and focal points:
DBpedia: RDFized version of Wikipedia; many ingoing and outgoing links
Music-related datasets
Big datasets include FOAF and US Census data
Size: approx. 1 billion triples, 250k links (figure from [4])
LOD Cloud, September 2008. Facts:
More than 35 datasets interlinked
Commercial players joined the cloud, e.g., BBC
Companies began to publish and host datasets, e.g., OpenLink, Talis, or Garlik
Size: approx. 2 billion triples, 3 million links (figure from [4])
LOD Cloud, March 2009. Facts:
A large part comes from the Linking Open Drug cloud and the BIO2RDF project (bottom of the figure)
Notable new datasets: Freebase, OpenCalais, ACM/IEEE
Size: more than 10 billion triples (figure from [4])
LOD clouds
Why Linked Data? Basically, to move from a Web of documents to a Web of Data.
Let's try an example: "Tell me which football players, born in the province of Albacete, in Spain, have scored a goal in the World Cup final."
Disclaimer: sorry to use an example about football, but you have to understand that for several years Spaniards will be talking about football a lot ;-)
Example courtesy of Guillermo Alvaro Rey
Information search in the Web of documents (?). Example courtesy of Guillermo Alvaro Rey
What we were actually looking for. Example courtesy of Guillermo Alvaro Rey
It would be better to make a data query… (football players from Albacete who played in UEFA Euro 2008). Example courtesy of Guillermo Alvaro Rey
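When the facts are published as triples, the question stops being a document search and becomes pattern matching over data. The minimal sketch below evaluates a basic graph pattern against an in-memory list of triples. All `ex:` identifiers and the facts themselves are hypothetical placeholders, not real data. A real deployment would send an equivalent SPARQL query to an endpoint.

```python
# Hypothetical facts; "ex:" identifiers are placeholders, not real data.
TRIPLES = [
    ("ex:player1", "ex:bornIn", "ex:townA"),
    ("ex:townA", "ex:inProvince", "ex:Albacete"),
    ("ex:player1", "ex:playedIn", "ex:Euro2008"),
    ("ex:player2", "ex:bornIn", "ex:townB"),
    ("ex:townB", "ex:inProvince", "ex:Madrid"),
    ("ex:player2", "ex:playedIn", "ex:Euro2008"),
]

# "Football players born in the province of Albacete who played in Euro 2008",
# written as a basic graph pattern; terms starting with "?" are variables.
QUERY = [
    ("?player", "ex:bornIn", "?town"),
    ("?town", "ex:inProvince", "ex:Albacete"),
    ("?player", "ex:playedIn", "ex:Euro2008"),
]


def match(triples, patterns, binding=None):
    """Yield every variable binding that satisfies all patterns (a naive nested-loop join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind a fresh variable, or check consistency with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # no break: the whole triple matched this pattern
            yield from match(triples, rest, candidate)


players = sorted({b["?player"] for b in match(TRIPLES, QUERY)})
print(players)
```

The shared variables `?player` and `?town` perform the joins that a person would otherwise carry out by reading several web pages.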
How should we publish data? Formats in which data is published nowadays: XML, HTML, DBs, APIs, CSV, XLS…
However, from a Web of Data point of view they have two main limitations: they are difficult to integrate, and the data is not linked together, as happens with Web documents.
Which format do we use then? RDF (Resource Description Framework)
Data model based on triples: subject, predicate, object
<Oscar> <lives in> <Madrid>
<Madrid> <is the capital of> <Spain>
<Spain> <is the champion of> <Football World Cup>
…
Serialised in different formats: RDF/XML, RDFa, N3, Turtle, JSON…
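The slide's triples can be written in N-Triples, the simplest of these serializations, with one statement per line ending in " .". The minimal sketch below mints the slide's resources under a hypothetical `http://example.org/` namespace. It also adds one language-tagged label. RDF 1.1 N-Triples is UTF-8, so "España" needs no escaping.

```python
BASE = "http://example.org/"  # hypothetical namespace for the slide's example
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(local: str) -> str:
    """An IRI term in N-Triples syntax, minted under the example namespace."""
    return f"<{BASE}{local}>"


def literal(text: str, lang: str = "") -> str:
    """An (optionally language-tagged) literal with backslash, quote and newline escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


# The slide's triples, plus one language-tagged label (subject, predicate, object).
TRIPLES = [
    (iri("Oscar"), iri("livesIn"), iri("Madrid")),
    (iri("Madrid"), iri("isCapitalOf"), iri("Spain")),
    (iri("Spain"), f"<{RDFS_LABEL}>", literal("España", "es")),
]


def to_ntriples(triples) -> str:
    """One statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(TRIPLES))
```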
URIs (Uniform Resource Identifiers)
Two types of identifiers can be used to identify Linked Data resources:
URIRefs (URI References): a URI and an optional fragment identifier, separated from the URI by the hash symbol '#', e.g., http://www.ontology.org/people#Person, abbreviated as people:Person.
Plain URIs can also be used, as in FOAF: http://xmlns.com/foaf/0.1/Person
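The two identifier styles behave differently when they are looked up. For a hash URI, the client never sends the fragment to the server. It fetches the document and then finds the fragment inside it. The sketch below expands prefixed names and splits off the fragment. The `people` prefix is inferred from the slide's `people:Person` abbreviation. The prefix table itself is an illustrative assumption.

```python
from urllib.parse import urldefrag

# Prefix table: "people" is inferred from the slide's people:Person abbreviation.
PREFIXES = {
    "people": "http://www.ontology.org/people#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as people:Person into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def document_and_fragment(uri: str):
    """Split a URI into the part an HTTP client fetches and the fragment.

    Fragments are never sent to the server, so for a hash URI the client
    retrieves the document and then looks up the fragment inside it.
    """
    result = urldefrag(uri)
    return result.url, result.fragment


print(document_and_fragment(expand("people:Person")))
print(document_and_fragment(expand("foaf:Person")))
```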
How do we publish Linked Data?
Exposing relational databases or other similar formats as Linked Data: D2R, Triplify, R2O, NOR2O, Virtuoso, Ultrawrap…
Using native RDF triplestores: Sesame, Jena, OWLIM, Talis Platform…
Incorporating it in the form of RDFa in CMSs like Drupal
How do we consume Linked Data?
Linked Data browsers: to explore things and datasets and to navigate between them. Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE), OpenLink RDF Browser (OpenLink, UK), Zitgist RDF Browser (Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland)
Linked Data mashups: sites that mash up (i.e., combine) Linked Data. Revyu.com (KMI, UK), DBtune Slashfacet (Queen Mary, UK), DBpedia Mobile (FU Berlin, DE), Semantic Web Pipes (DERI, Ireland)
Search engines: to search for Linked Data. Falcons (IWS, China), Sindice (DERI, Ireland), MicroSearch (Yahoo, Spain), Watson (Open University, UK), SWSE (DERI, Ireland), Swoogle (UMBC, USA)
Listing on this slide by T. Heath, M. Hausenblas, C. Bizer, R. Cyganiak, O. Hartig
Linked Data browsers (Disco)
Linked Data Mashup (LinkedGeoData). © Migración de datos a la Web de los Datos - Enfoques, técnicas y herramientas (Data migration to the Web of Data: approaches, techniques and tools), Luis Manuel Vilches Blázquez
Linked Data Mashup (DBpedia Mobile): http://wiki.dbpedia.org/DBpediaMobile. © Migración de datos a la Web de los Datos - Enfoques, técnicas y herramientas (Data migration to the Web of Data: approaches, techniques and tools), Luis Manuel Vilches Blázquez
Linked Data Search Engines (Sindice and SIG.MA): entity lookup service; find a document that mentions a URI or a keyword.
Linked Data Search Engines (NYT): The New York Times: Alumni In The News, http://data.nytimes.com/schools/schools.html
Linked Data Search Engines (NYT): The New York Times: source code is available… and is based on SPARQL queries
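Applications such as the NYT demo talk to a SPARQL endpoint over HTTP. Under the SPARQL Protocol, a query can be sent with GET as the URL-encoded `query` parameter. The minimal sketch below builds such a URL. The endpoint is hypothetical and the query is a generic label lookup. It is not the NYT code.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the SPARQL endpoint you want to query.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?resource ?label
WHERE { ?resource rdfs:label ?label }
LIMIT 10"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query as the `query` parameter of a SPARQL Protocol GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
# To execute it (network access required), send a GET to `url` with
# Accept: application/sparql-results+json and parse the JSON bindings.
```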
One additional motivation: Open Government
Government and state administration should be opened at all levels to effective public scrutiny and oversight.
Objectives: transparency, participation, collaboration, inclusion, cost reduction, interoperability, reusability, leadership, market & value.
Some links:
B. Obama – Transparency and Open Government
T. Berners-Lee – Raw data now!
J. Manuel Alonso – ¿Qué es Open Data? (What is Open Data?)
Open Government Data
8 Principles of Open Government Data
Open Government: USA and UK [figure labels: bottom-up, top-down]
Linked Data Mashup (data.gov): Clean Air Status and Trends (CASTNET), http://data-gov.tw.rpi.edu/demo/exhibit/demo-8-castnet.php
Linked Data in the UK
Education: http://education.data.gov.uk/id/school/106661
Parliament: http://parliament.psi.enakting.org/id/member/1227
Maps: e.g., London: http://data.ordnancesurvey.co.uk/id/7000000000041428, http://map.psi.enakting.org
Transport: http://www.dft.gov.uk/naptan/
SameAs service: http://www.sameas.org
Challenges: http://gov.tso.co.uk/openup/sparql/gov-transport
Linked Data Mashup (data.gov.uk): Research Funding Explorer, http://bis.clients.talis.com/
Open Government in Spain: Euskadi
Open Government in Spain: Abredatos
Open Government in Spain: Zaragoza
Open Government in Spain: Asturias
Linked Data Mashup (Water quality): water quality on Asturias' beaches, http://datos.fundacionctic.org/sandbox/asturias/playas/
Contents: Introduction to Linked Data; Linked Data publication; Methodological guidelines for Linked Data publication; RDB2RDF tools; Technical aspects of Linked Data publication; Linked Data consumption
GeoLinkedData: an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. The initiative started by publishing diverse information sources, such as those of the National Geographic Institute of Spain (IGN-E) and the National Statistics Institute (INE). http://geo.linkeddata.es
Motivation: 99.171% English, 0.019% Spanish. The Web of Data is mainly for English speakers, with a poor presence of Spanish. Source: Billion Triples dataset at http://km.aifb.kit.edu/projects/btc-2010/ (thanks to Aidan and Richard)
Related Work
Impact of geo.linkeddata.es: number of triples in Spanish in July 2010: 1,412,248; at the end of August 2010: 21,463,088. Asunción Gómez Pérez
Process for Publishing Linked Data on the Web:
1. Identification of the data sources
2. Vocabulary development
3. Generation of the RDF data
4. Publication of the RDF data
5. Data cleansing
6. Linking the RDF data
7. Enable effective discovery
1. Identification and selection of the data sources: Instituto Geográfico Nacional and Instituto Nacional de Estadística
1. Identification and selection of the data sources
Instituto Geográfico Nacional (Spanish Geographic Institute): multilingual (Spanish, Basque, Galician, Catalan); conceptualization mismatches; granularity (scale concept); textual information; particularities: longitude, latitude
Instituto Nacional de Estadística (Spanish Statistics Institute): monolingual; numerical information; particularities: geo (textual level), temporal
Asunción Gómez Pérez
1. Identification and selection of the data sources: IGN-E
1. Identification and selection of the data sources: Industry Production Index (by year and province)
2. Vocabulary development. http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#whichvocabs This is not enough.
2. Vocabulary development
Features:
Lightweight: taxonomies and a few properties
Consensus-based vocabularies: to avoid mapping problems
Multilingual: Linked Data is multilingual
The NeOn methodology can help to re-engineer non-ontological resources into ontologies. Pros: it uses domain terminology already agreed upon by domain experts.
Remove from heavyweight ontologies those features that you don't need.
Reuse existing vocabularies.
Asunción Gómez Pérez
[Figure: NeOn methodology scenarios (1 to 9) for building ontologies. Knowledge resources: ontological resources (ontology design patterns; ontology repositories and registries; ontologies in F-Logic, RDF(S) and OWL), non-ontological resources (glossaries, dictionaries, lexicons, classification schemas, thesauri, taxonomies) and alignments. Activities: ontological resource reuse (ontology aligning, ontology merging), ontology design pattern reuse, non-ontological resource reuse and reengineering, ontological resource reengineering, ontology restructuring (pruning, extension, specialization, modularization), ontology localization, and ontology specification, conceptualization, formalization and implementation, plus scheduling. Ontology support activities: knowledge acquisition (elicitation), documentation, configuration management, evaluation (V&V), assessment.]
Vocabulary development: Specification
Content requirements: identify the set of questions that the ontology should answer, for example:
Which are the provinces in Spain?
Where are the beaches?
Where are the reservoirs?
Identify the production index in Madrid.
Which is the city with the highest production index?
Give me Madrid's latitude and altitude.
…
Non-content requirements: the ontology must be available in the four official Spanish languages.
Asunción Gómez Pérez
2. Lightweight Ontology Development
Reused vocabularies, following the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation:
hydrOntology: hydrographical phenomena (rivers, lakes, etc.)
SCOVO: scv:Dimension, scv:Item, scv:Dataset
FAO Geopolitical: names and international code systems for territories and groups
WGS84 Geo Positioning: an RDF vocabulary
GML: ontology for the OGC Geography Markup Language
Time: vocabulary for instants, intervals, durations, etc.
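Reusing WGS84 means a place is described with `geo:lat` and `geo:long` from the W3C Basic Geo vocabulary. This covers competency questions such as "Give me Madrid's latitude…" in the specification. The minimal sketch below emits that description in Turtle. The resource IRI is hypothetical, the coordinates are approximate values for Madrid, and no altitude value is invented. The vocabulary does define `geo:alt` for altitude.

```python
# W3C Basic Geo (WGS84 lat/long) vocabulary namespace.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def point_turtle(resource_iri: str, lat: float, lon: float, alt=None) -> str:
    """Describe a resource as a geo:Point with WGS84 latitude/longitude (and optional altitude)."""
    pairs = ["a geo:Point", f'geo:lat "{lat}"', f'geo:long "{lon}"']
    if alt is not None:
        pairs.append(f'geo:alt "{alt}"')
    body = " ;\n    ".join(pairs)
    return f"@prefix geo: <{GEO}> .\n\n<{resource_iri}> {body} .\n"


# Hypothetical resource IRI; the coordinates are approximate values for Madrid.
print(point_turtle("http://example.org/resource/Madrid", 40.4168, -3.7038))
```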
Context: the INSPIRE Directive. Objectives: INSPIRE aims to provide harmonized sources of geographic information to support the formulation, implementation and evaluation of Community policies (environment, etc.). Sources of geographic information: databases of the EU Member States at local, regional, national and international level. Luis Manuel Vilches Blázquez
INSPIRE – Annexes. Luis Manuel Vilches Blázquez
hydrOntology
Geographic information suffers from a great diversity of problems (multiple sources, heterogeneity of content and structure, ambiguity of natural language, etc.), so a shared model is needed to solve the problems of harmonizing and structuring hydrographic information.
hydrOntology is a global domain ontology developed following a top-down approach. Its goals are to:
cover most of the cartographically representable phenomena associated with the hydrographic domain;
serve as a harmonization framework among the different producers of geospatial information at national and international level;
take the necessary first steps towards a better organization and management of geographic (hydrographic) information.
Luis Manuel Vilches Blázquez
Sources
Categories: thesauri and bibliography; feature catalogues; dictionaries and monographs. Sources include Getty, FTT ADL, BCN25, GEMET, WFD, CC.AA. (Spanish autonomous communities), EGM & ERM, BCN200, Nomenclátor Geográfico Nacional and Nomenclátor Conciso.
Luis Manuel Vilches Blázquez
Structuring criteria
Water Framework Directive: proposed by the EU Parliament and Council; list of definitions of hydrographic phenomena
SDIGER project: INSPIRE pilot project covering two river basins, two countries and two languages
Semantic criteria: geographic dictionaries, the Dictionary of the Real Academia de la Lengua, WordNet, Wikipedia, bibliography from several fields of knowledge
Inheritance: current structuring of catalogues
Advice from IGN toponymy experts
Luis Manuel Vilches Blázquez
Modelling of the hydrographic domain. Luis Manuel Vilches Blázquez
Implementation & Formalization (+ Pellet): more than 150 concepts (classes), 47 relation types (properties) and 64 attribute types. Luis Manuel Vilches Blázquez
2. Vocabulary development: hydrOntology. Asunción Gómez Pérez
3. Generation of the RDF Data
From the data sources: geographic information (databases), statistical information (.xls) and geospatial information.
Different technologies for RDF generation: re-engineering patterns; R2O and ODEMapster; annotation tools; geometry generation.
3. Generation of the RDF Data: NOR2O (INE), ODEMapster (IGN), Geometry2RDF (IGN geospatial column)
3. Generation of the RDF Data / instances
NOR2O is a software library that implements the transformations proposed by the Patterns for Re-engineering Non-Ontological Resources (PR-NOR). Currently we have 16 PR-NORs. PR-NORs define a procedure that transforms the components of a Non-Ontological Resource (NOR) into ontology elements (http://ontologydesignpatterns.org/).
Resources handled by NOR2O: classification schemes, thesauri, lexicons.
Example, FAO Water classification: a classification scheme using the path enumeration data model, implemented in a database.
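The minimal sketch below shows the kind of transformation a PR-NOR describes for a classification scheme stored with the path enumeration data model, as in the FAO example. It is not NOR2O's code or API. The rows are hypothetical. The mapping choices (`owl:Class` for each category, `rdfs:subClassOf` for the parent link) are assumptions, and a given PR-NOR may prescribe different ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
BASE = "http://example.org/ontology/"  # hypothetical target namespace


def class_iri(label: str) -> str:
    """Mint a class IRI from a label, e.g. 'Inland water' -> .../InlandWater."""
    return BASE + "".join(word.capitalize() for word in label.split())


def reengineer(rows):
    """Turn (path, label) rows of a path-enumerated classification scheme into triples.

    Each row becomes a class; the row whose path is the prefix up to the last
    '.' is its parent, which yields an rdfs:subClassOf triple.
    Labels are kept as plain Python strings standing in for literals.
    """
    by_path = {path: class_iri(label) for path, label in rows}
    triples = []
    for path, label in rows:
        cls = by_path[path]
        triples.append((cls, RDF_TYPE, OWL_CLASS))
        triples.append((cls, RDFS + "label", label))
        parent_path, sep, _ = path.rpartition(".")
        if sep:  # top-level codes such as "1" have no parent
            triples.append((cls, RDFS + "subClassOf", by_path[parent_path]))
    return triples


# Hypothetical rows, standing in for a database table that uses path enumeration.
ROWS = [("1", "Water"), ("1.1", "Inland water"), ("1.1.1", "River"), ("1.2", "Marine water")]
for triple in reengineer(ROWS):
    print(triple)
```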
Re-engineering Model for NORs [figure: a Non-Ontological Resource is turned into an RDF(S) ontology through NOR reverse engineering, a transformation based on the Patterns for Re-engineering Non-Ontological Resources (PR-NOR), and ontology forward engineering; stages shown include requirements, specification, conceptualization, formalization, design and implementation]
PR-NOR library at the ODP Portal (technological support): http://ontologydesignpatterns.org/wiki/Submissions:ReengineeringODPs
3. Generation of the RDF Data – NOR2O [figure: NOR2O applied to the Industry Production Index data by year and province]
hydrOntology & databases: NGN, 1:25,000, multilingual
3. Generation of the RDF Data – R2O & ODEMapster: creation of the R2O mappings
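3. Generation of the RDF Data – Geometry2RDF
Geometries are extracted as GML 3.1.1 with the Oracle SDO_UTIL package:
    -- Stream ('Arroyo') geometries from table BCN200_0301L_RIO (rivers), exported as GML 3.1.1.
    -- SDO_UTIL.TO_GML311GEOMETRY returns a CLOB; TO_CHAR may fail when the GML
    -- exceeds the VARCHAR2 size limit.
    SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(c.geometry)) AS Gml311Geometry
    FROM "BCN200"."BCN200_0301L_RIO" c
    WHERE c.Etiqueta = 'Arroyo';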
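Geometry2RDF then turns each GML geometry returned by the query into RDF. The slides do not show its actual output format. The minimal sketch below assumes a LineString encoded with `gml:posList` and converts it into a WKT string that could be attached to a resource as a literal. The sample geometry is hypothetical.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # namespace used by GML 3.1.1


def gml_linestring_to_wkt(gml: str) -> str:
    """Convert a GML LineString with a gml:posList into a WKT LINESTRING.

    Coordinates are copied in the order given; axis order depends on the
    CRS in srsName and is not interpreted here.
    """
    root = ET.fromstring(gml)
    pos_list = root.find(f".//{{{GML_NS}}}posList")
    if pos_list is None or not pos_list.text:
        raise ValueError("no gml:posList found")
    dim = int(pos_list.get("srsDimension", "2"))
    values = pos_list.text.split()
    if len(values) % dim:
        raise ValueError("coordinate count is not a multiple of srsDimension")
    points = [" ".join(values[i:i + dim]) for i in range(0, len(values), dim)]
    return f"LINESTRING ({', '.join(points)})"


# Hypothetical query result: a two-point stream segment.
SAMPLE = (
    '<gml:LineString xmlns:gml="http://www.opengis.net/gml" srsName="EPSG:4258">'
    '<gml:posList srsDimension="2">-3.70 40.41 -3.69 40.42</gml:posList>'
    "</gml:LineString>"
)
print(gml_linestring_to_wkt(SAMPLE))
```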
3. Generation of the RDF Data – RDF graphs
Sources: IGN and INE. So far: 7 RDF named graphs with 1,412,248 triples (BTN25, BCN200, IPI, …):
http://geo.linkeddata.es/dataset/IGN/BTN25
http://geo.linkeddata.es/dataset/IGN/BCN200
http://geo.linkeddata.es/dataset/INE/IPI
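Named graphs keep each source dataset separately addressable. In the N-Quads serialization, a fourth term on each line names the graph the triple belongs to. The minimal sketch below uses the graph IRIs listed on the slide. The quads themselves (subjects and classes under `example.org`) are hypothetical.

```python
from collections import Counter

# Named-graph IRIs as listed on the slide.
BTN25 = "http://geo.linkeddata.es/dataset/IGN/BTN25"
BCN200 = "http://geo.linkeddata.es/dataset/IGN/BCN200"
IPI = "http://geo.linkeddata.es/dataset/INE/IPI"

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical quads (subject, predicate, object, graph); all terms are IRIs here.
QUADS = [
    ("http://example.org/resource/river1", RDF_TYPE, "http://example.org/ontology/River", BCN200),
    ("http://example.org/resource/river2", RDF_TYPE, "http://example.org/ontology/River", BCN200),
    ("http://example.org/resource/lake1", RDF_TYPE, "http://example.org/ontology/Lake", BTN25),
    ("http://example.org/resource/ipi2009", RDF_TYPE, "http://example.org/ontology/Observation", IPI),
]


def to_nquads(quads) -> str:
    """Serialize IRI-only quads as N-Quads: the fourth term names the graph."""
    return "\n".join(f"<{s}> <{p}> <{o}> <{g}> ." for s, p, o, g in quads)


print(to_nquads(QUADS))
print(Counter(g for *_, g in QUADS))  # number of triples per named graph
```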
4. Publication of the RDF Data
Access via SPARQL, Linked Data and HTML, including provenance support. Tools: Pubby 0.3 and Virtuoso 6.1.0.
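Pubby provides a Linked Data front end over a SPARQL endpoint such as Virtuoso. A common way to serve the same resource to browsers and to RDF clients is a 303 See Other redirect chosen by the Accept header. The minimal sketch below illustrates that pattern only and is not Pubby's code. The `/resource/` and `/data/` paths are assumptions. The `/page/` path matches the geo.linkeddata.es page URL given for Granada in the linking step. q-values are ignored for simplicity.

```python
from typing import Tuple

# RDF media types that trigger a redirect to the data document (a simplification:
# q-values in the Accept header are ignored).
RDF_TYPES = ("text/turtle", "application/rdf+xml", "application/n-triples")


def dereference(path: str, accept: str) -> Tuple[int, str]:
    """Return (status, Location) for a request to a resource URI path.

    Assumes a Pubby-style layout: /resource/X identifies the thing itself and is
    answered with 303 See Other, pointing at /data/X (RDF) or /page/X (HTML).
    """
    prefix = "/resource/"
    if not path.startswith(prefix):
        return 404, ""
    name = path[len(prefix):]
    wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
    return 303, ("/data/" if wants_rdf else "/page/") + name


print(dereference("/resource/Provincia/Granada", "text/html"))
print(dereference("/resource/Provincia/Granada", "text/turtle"))
```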
4. Publication of the RDF Data
4. Publication of the RDF Data – License
License for GeoLinkedData: Creative Commons Attribution-ShareAlike 3.0 / GNU Free Documentation License. Each dataset (IGN, INE, etc.) will have its own specific license.
5. Data cleansing
Problems found: lack of documentation of the IGN datasets; broken links (Spain, IGN resources); lack of documentation of the ontology; missing English and Spanish labels.
Options for building a Spanish ontology that imports some concepts from another ontology (in English):
1. Import the English ontology and add annotations, such as a Spanish label, to its terms.
2. Import the English ontology, create new concepts and properties with Spanish names, and map them to their English equivalents.
3. Re-declare the terms of the English ontology that you need (using the same URIs as in the English ontology) and add a Spanish label.
4. Create your own classes and properties that model the same things as the English ontology.
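A cleansing pass can detect the missing-label problem automatically and then apply the first option listed above, adding a Spanish `rdfs:label` to an imported term. The minimal sketch below does both. The data is hypothetical, and the set of required languages is an assumption.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def missing_labels(terms, labels, required=("en", "es")):
    """Map each term to the required languages for which it has no rdfs:label.

    `labels` holds (term, text, language) tuples; terms with all labels are omitted.
    """
    present = {(term, lang) for term, _, lang in labels}
    report = {}
    for term in terms:
        missing = [lang for lang in required if (term, lang) not in present]
        if missing:
            report[term] = missing
    return report


def label_triple(term: str, text: str, lang: str) -> str:
    """An N-Triples statement annotating a term with a label in the given language."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{term}> <{RDFS_LABEL}> "{escaped}"@{lang} .'


# Hypothetical data: one FOAF term and one local ontology class.
PERSON = "http://xmlns.com/foaf/0.1/Person"
RIVER = "http://example.org/ontology/River"
LABELS = [(PERSON, "Person", "en"), (PERSON, "Persona", "es"), (RIVER, "River", "en")]

print(missing_labels([PERSON, RIVER], LABELS))
print(label_triple(RIVER, "Río", "es"))
```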
5. Data cleansing: URIs in Spanish
Example: http://geo.linkeddata.es/ontology/Río
RDF allows UTF-8 characters in URIs, but Linked Data URIs have to be URLs as well, so non-US-ASCII characters have to be percent-encoded:
http://geo.linkeddata.es/ontology/R%C3%ADo
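The encoding shown above is UTF-8 followed by percent-encoding of each byte. "í" becomes the bytes C3 AD, written as %C3%AD. The minimal sketch below applies this to the path, query and fragment of an IRI with Python's standard library. It leaves existing "%" escapes untouched so that already-encoded URIs are not double-encoded. It does not handle non-ASCII host names, which would need IDNA.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# RFC 3986 sub-delimiters, which may appear unencoded in paths, queries and fragments.
SUB_DELIMS = "!$&'()*+,;="


def iri_to_uri(iri: str) -> str:
    """Percent-encode (as UTF-8) characters not allowed in URI path, query and fragment.

    '%' is kept as-is so existing escapes are not encoded twice.
    Host names are not handled (non-ASCII hosts would need IDNA).
    """
    parts = urlsplit(iri)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe="/:@%" + SUB_DELIMS),
        quote(parts.query, safe="/?:@%" + SUB_DELIMS),
        quote(parts.fragment, safe="/?:@%" + SUB_DELIMS),
    ))


print(iri_to_uri("http://geo.linkeddata.es/ontology/Río"))
print(unquote("R%C3%ADo"))  # decoding gives back the Spanish label
```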
6. Linking of the RDF Data
Tool: Silk – A Link Discovery Framework for the Web of Data.
First set of links: provinces of Spain, linking GeoLinkedData with Geonames and DBpedia (86% accuracy).
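Silk discovers links from declarative link specifications that combine similarity metrics over resource properties. The code below is not Silk or its API. It swaps in a simple technique, label similarity (Python's `difflib` ratio after stripping accents) with a best-match threshold, to sketch how province resources could be paired and emitted as `owl:sameAs` links. The identifiers are hypothetical shorthand, and the 0.9 threshold is an assumption.

```python
import unicodedata
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label: str) -> str:
    """Lower-case a label and strip accents, so 'Málaga' and 'Malaga' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower().strip()


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def discover_links(source: dict, target: dict, threshold: float = 0.9):
    """Link each source resource to its most similar target label if the score passes the threshold."""
    links = []
    for s_uri, s_label in source.items():
        score, t_uri = max((similarity(s_label, t_label), t_uri) for t_uri, t_label in target.items())
        if score >= threshold:
            links.append((s_uri, OWL_SAME_AS, t_uri))
    return links


# Hypothetical identifiers and labels for two Spanish provinces and three candidates.
SOURCE = {"gld:Provincia/Granada": "Granada", "gld:Provincia/Malaga": "Málaga"}
TARGET = {"dbp:Granada": "Granada", "dbp:Malaga": "Malaga", "dbp:Madrid": "Madrid"}

for link in discover_links(SOURCE, TARGET):
    print(link)
```

Label matching alone cannot tell a province from a city with the same name, which is one reason why linking accuracy (86% on the slide) stays below 100% without richer comparisons.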
6. Linking of the RDF Data: http://geo.linkeddata.es/page/Provincia/Granada. Asunción Gómez Pérez
7. Enable effective discovery
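The slides do not say which discovery mechanism was used. One common option, assumed here, is to publish a VoID description that tells crawlers and tools where the SPARQL endpoint is and how large the dataset is. The minimal sketch below generates such a description in Turtle. The dataset and endpoint IRIs are hypothetical, and the triple count is the July 2010 figure reported in these slides.

```python
def void_description(dataset_iri: str, title: str, endpoint: str, triples: int) -> str:
    """Describe a dataset with VoID so tools can find its SPARQL endpoint and size."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_iri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    void:sparqlEndpoint <{endpoint}> ;",
        f"    void:triples {triples} .",
    ])


# Hypothetical IRIs; the triple count is the July 2010 figure from these slides.
print(void_description("http://example.org/dataset/geolinkeddata", "GeoLinkedData",
                       "http://example.org/sparql", 1412248))
```

The VoID specification suggests serving such a file at the well-known path /.well-known/void on the dataset's host.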
DEMO: http://geo.linkeddata.es/
Provinces
IndustryProductionIndex – Capital of Province
Rivers
Beaches
Contents: Introduction to Linked Data; Linked Data publication; Methodological guidelines for Linked Data publication; RDB2RDF tools; Technical aspects of Linked Data publication; Linked Data consumption

Introduction to Linked Data

  • 1.
    IntroductiontoLinked DataOscar Corcho,Asunción Gómez Pérez ({ocorcho, asun}@fi.upm.es)Universidad Politécnica de MadridUniversidad del Valle, Cali, ColombiaSeptember 10th 2010Credits: Raúl García Castro, Oscar Muñoz, Jose Angel Ramos Gargantilla, María del Carmen Suárez de Figueroa, Boris Villazón, Alex de León, Víctor Saquicela, Luis Vilches, Miguel Angel García, Manuel Salvadores, Guillermo Alvaro, Juan Sequeda, Carlos Ruiz Moreno and manyothersWorkdistributedunderthelicenseCreativeCommonsAttribution-Noncommercial-Share Alike 3.0
  • 2.
    ContentsIntroductiontoLinked DataLinked DatapublicationMethodologicalguidelinesforLinked Data publicationRDB2RDF toolsTechnicalaspects of Linked Data publicationLinked Data consumption2
  • 3.
    Whatisthe Web ofLinked Data?An extension of the current Web…… where information and services are given well-defined and explicitly represented meaning, …… so that it can be shared and used by humans and machines, ...... better enabling them to work in cooperationHow?Promoting information exchange by tagging web content with machineprocessable descriptions of its meaning. And technologies and infrastructure to do thisAnd clear principles on how to publish datadata
  • 4.
    What is LinkedData?Linked Data is a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.Part of the Semantic WebExposing, sharing and connecting dataTechnologies: URIs and RDF (although others are also important)
  • 5.
    The fourprinciples (TimBerners Lee, 2006)Use URIs as names for things Use HTTP URIs so that people can look up those names. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL) Include links to other URIs, so that they can discover more things. http://www.w3.org/DesignIssues/LinkedData.html5http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
  • 6.
    Linked Open Dataevolution2007
  • 7.
  • 8.
    20097LOD Cloud May2007Facts:Focal points:
  • 9.
    DBPedia: RDFizedvesion ofWikipiedia; many ingoing and outgoing links
  • 10.
  • 11.
    Big datasets includeFOAF, US Census data
  • 12.
    Size approx. 1billion triples, 250k linksFigure from [4]
  • 13.
    8LOD Cloud September2008Facts:More than 35 datasets interlinked
  • 14.
    Commercial players joinedthe cloud, e.g., BBC
  • 15.
    Companies began topublish and host dataset, e.g. OpenLink, Talis, or Garlik.
  • 16.
    Size approx. 2billion triples, 3 million linksFigure from [4]
  • 17.
    9LOD Cloud March2009Facts:Big part from Linking Open Drug cloud and the BIO2RDF project (bottom)
  • 18.
    Notable new datasets:Freebase, OpenCalais, ACM/IEEE
  • 19.
    Size > 10billion triplesFigure from [4]
  • 20.
  • 21.
    WhyLinked Data?Basically, tomovefroma Web of documentsto a Web of DataLet’s try anexample:Tell me whichfootballplayers, born in theprovince of Albacete, in Spain, havescored a goal in theWorld Cup finalDisclaimer:Sorryto use anexampleaboutfootball, butyouhavetounderstandthatforseveralyearsSpaniardswillbetalkingaboutfootball a lot ;-)Example courtesy of Guillermo Alvaro Rey
  • 22.
    Informationsearch in theWeb of documents¿?Example courtesy of Guillermo Alvaro Rey
  • 23.
    What we wereactually looking forExample courtesy of Guillermo Alvaro Rey
  • 24.
    Itwouldbebettertomake a dataquery…(footballplayersfrom Albacete whoplayedEurocup 2008)Example courtesy of Guillermo Alvaro Rey
  • 25.
    Howshouldwepublish data?Formats inwhich data ispublishednowadays…XMLHTMLDBsAPIsCSVXLS…However, mainlimitationsfrom a Web of Data point of viewDifficulttointegrateData isnotlinkedtoeachother, as ithappenswith Web documents.
  • 26.
    Which format dowe use then?RDF (ResourceDescription Framework)Data modelBasedon triples: subject, predicate, object<Oscar> <vive en> <Madrid><Madrid> <es la capital de> <España><España> <es campeona de> <Mundial de Fútbol>…Serialised in differentformatsRDF/XML, RDFa, N3, Turtle, JSON…
  • 27.
    URIs (Universal-UniformResourceIdentifer)Two typesof identifiers can be used to identify Linked Data resourcesURIRefs(Unique Resource IdentifiersReferences)A URI and an optional FragmentIdentifier separated from the URI by the hash symbol ‘#’http://www.ontology.org/people#Personpeople:PersonPlain URIs can also be used, as in FOAF:http://xmlns.com/foaf/0.1/Person17
  • 28.
    How do wepublishLinkedData?ExposingRelationalDatabasesorother similar formatsintoLinked DataD2RTriplifyR2ONOR2OVirtuosoUltrawrap…Usingnative RDF triplestoresSesameJenaOwlimTalisplatform…Incorporatingit in theform of RDFa in CMSslikeDrupal18
  • 29.
    How do weconsume Linked Data?Linked Data browsersTo explore things and datasets and to navigate between them.Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE), OpenLink RDF Browser (OpenLink, UK), Zitgist RDF Browser (Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland)Linked Data mashupsSites that mash up (thus combine Linked data)Revyu.com (KMI, UK), DBtune Slashfacet (Queen Mary, UK), DBPedia Mobile (FU Berlin, DE), Semantic Web Pipes (DERI, Ireland) Search enginesTo search for Linked Data.Falcons (IWS, China), Sindice (DERI, Ireland), MicroSearch (Yahoo, Spain), Watson (Open University, UK), SWSE (DERI, Ireland), Swoogle (UMBC, USA)Listing on this slide by T. Heath, M. Hausenblas, C. Bizer, R. Cyganiak, O. Hartig19
  • 30.
  • 31.
    Linked Data Mashup(LinkedGeoData)© Migración de datos a la Web de los Datos - Enfoques, técnicas y herramientasLuis Manuel Vilches Blázquez
  • 32.
    Linked Data Mashup(DBpedia Mobile)http://wiki.dbpedia.org/DBpediaMobile© Migración de datos a la Web de los Datos - Enfoques, técnicas y herramientasLuis Manuel Vilches Blázquez
  • 33.
    Linked DataSearch Engines (Sindice and SIG.MA)Entity lookup service. Find a document that mentions a URI or a keyword.
  • 34.
    Linked Data SearchEngines(NYT)The New York Times: Alumni In The Newshttp://data.nytimes.com/schools/schools.html
  • 35.
    Linked Data SearchEngines(NYT)The New York Times: Source code is available… and is based on SPARQL queries
  • 36.
    Oneadditionalmotivation: Open GovernmentGovernmentand state administration should be opened at all levels to effective public scrutiny and oversightObjectives:TransparencyParticipationCollaborationInclusionCost reductionInteroperabilityReusabilityLeadershipMarket & Value26Some Links:
  • 37.
    B. Obama–Transparency and Open Government
  • 38.
    T. Berners-Lee- Raw data now!
  • 39.
    J. ManuelAlonso - ¿Qué es Open Data?
  • 40.
  • 41.
    8 Principles ofOpen Government DataOpen Government. USA and UK27BOTTOM-UPTop-down
  • 42.
    Linked Data Mashup(data.gov)Clean Air Status and Trends (CASTNET)http://data-gov.tw.rpi.edu/demo/exhibit/demo-8-castnet.php
  • 43.
    Linked Data inthe UKEducationhttp://education.data.gov.uk/id/school/106661Parliamenthttp://parliament.psi.enakting.org/id/member/1227MapsE.g., London: http://data.ordnancesurvey.co.uk/id/7000000000041428http://map.psi.enakting.orgTransporthttp://www.dft.gov.uk/naptan/SameAs servicehttp://www.sameas.orgChallengeshttp://gov.tso.co.uk/openup/sparql/gov-transport29
  • 44.
    Linked Data Mashup(data.gov.uk)Research Funding Explorerhttp://bis.clients.talis.com/
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
    Linked Data Mashup(Waterquality)Water quality in Asturias’ beacheshttp://datos.fundacionctic.org/sandbox/asturias/playas/
  • 50.
    ContentsIntroductiontoLinked DataLinked DatapublicationMethodologicalguidelinesforLinked Data publicationRDB2RDF toolsTechnicalaspects of Linked Data publicationLinked Data consumption36
  • 51.
    GeoLinkedDataIt is anopen initiative whose aim is to enrich the Web of Data with Spanish geospatial data.This initiative has started off by publishing diverse information sources, such as National Geographic Institute of Spain (IGN-E) and National Statistics Institute (INE)http://geo.linkeddata.es
  • 52.
    Motivation 99.171 % English 0.019% SpanishThe Web of Data ismainlyforEnglishspeakersPoorpresence of SpanishSource:Billion Triples dataset at http://km.aifb.kit.edu/projects/btc-2010/Thanks to Aidan and Richard
  • 53.
  • 54.
    Impact of Geo.linkeddata.esNumberof triples in Spanish (July 2010): 1.412.248 Number of triples in Spanish (EndAugust 2010): 21.463.08840Asunción Gómez Pérez
  • 55.
    Processfor Publishing LinkedData onthe WebIdentificationof the data sourcesVocabularydevelopmentGenerationof the RDF DataPublicationof the RDF data Data cleansingLinking the RDF dataEnable effective discovery
  • 56.
    1. Identification andselection of the data sourcesIdentificationof the data sourcesInstituto GeográficoNacionalVocabularydevelopmentGenerationof the RDF DataPublicationof the RDF data Data cleansingLinking the RDF dataInstituto Nacionalde EstadísticaEnable effective discovery
  • 57.
    1. Identification andselection of the data sourcesInstituto Geográfico Nacional (GeographicSpanishInstitute)Multilingual (Spanish, Vasc, Gallician, Catalan)ConceptualizationmistmatchesGranularity (scale concept)Textual informationParticularatiesLongitudelatitudeInstituto Nacional de Estadística (StatisticSpanishInstitute)
  • 58.
  • 59.
  • 60.
  • 61.
    1. Identification andselection of the data sourcesIGN-E
  • 62.
    1. Identification andselection of the data sourcesIndustryProductionIndexYearProvince
  • 63.
    2. Vocabulary developmenthttp://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#whichvocabsIdentificationofthe data sourcesVocabularydevelopmentGenerationof the RDF DataThisisnotenoughPublicationof the RDF data Data cleansingLinking the RDF dataEnable effective discovery
  • 64.
    2. VocabularydevelopmentFeaturesLightweight :Taxonomies and a fewpropertiesConsensuatedvocabulariesToavoidthemappingproblemsMultilingualLinked data are multilingualTheNeOnmethodology can helptoRe-enginer Non ontologicalresourcesintoontologiesPros: use domainterminologyalreadyconsensuatedbydomainexpertsWithdraw in heavyweightontologiesthosefeaturesthatyoudon’tneedReuseexistingvocabularies47Identificationof the data sourcesVocabularydevelopmentGenerationof the RDF DataPublicationof the RDF data Data cleansingLinking the RDF dataEnable effective discoveryAsunción Gómez Pérez
  • 65.
    Knowledge ResourcesOntological ResourcesO.Design Patterns34O. Repositories and Registries56FlogicRDF(S)OWLOntologicalResourceReuse O. Aligning O. Merging562Ontology DesignPattern ReuseNon Ontological ResourceReuse436Non Ontological Resources2Ontological ResourceReengineering7GlossariesDictionariesLexicons5Non Ontological ResourceReengineering46ClassificationSchemasThesauriTaxonomiesAlignments2RDF(S)1FlogicO. ConceptualizationO. ImplementationO. FormalizationO. SpecificationSchedulingOWL8Ontology Restructuring(Pruning, Extension, Specialization, Modularization)9O. Localization1,2,3,4,5,6,7,8, 9Ontology Support Activities: Knowledge Acquisition (Elicitation); Documentation; Configuration Management; Evaluation (V&V); Assessment48
  • 66.
    Vocabularydevelopment: SpecificationContent requirements:Identifythe set of questionsthattheontologyshouldanswerWhichone are theprovinces in Spain?Where are thebeaches?Where are thereservoirs?Identifytheproductionindex in MadridWhichoneisthecitywithhigherproductionindex?Give me Madrid latitude and altitude….Non-contentrequirementsTheontologymustbe in thefourofficialSpanishlanguages49Asunción Gómez Pérez
  • 67.
    2. Lightweight OntologyDevelopmentWGS84 Geo Positioning: an RDF vocabularyscv:Dimensionscv:Itemscv:Datasethydrographical phenomena (rivers, lakes, etc.)Vocabulary for instants, intervals, durations, etc.Names and international code systems for territories and groupsOntology for OGC Geography Markup Language reusedFollowing the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation.hydrOntology,SCOVO, FAO Geopolitcal, WGS84, GML, and Time
  • 68.
    Objetivos:INSPIRE intenta conseguirfuentes armonizadas de Información Geográfica para dar soporte a la formulación, implementación y evaluación de políticas comunitarias (Medio Ambiente, etc).Fuentes de Información Geográfica: Bases de datos de los Estados Miembros (UE) a nivel local, regional, nacional e internacional.Contexto – Directiva INSPIRE Luis Manuel Vilches Blázquez
  • 69.
    INSPIRE - AnexosLuisManuel Vilches Blázquez
  • 70.
    hydrOntologyExistencia de grandiversidad de problemas (múltiples fuentes, heterogeneidad de contenido y estructuración, ambigüedad del lenguaje natural, etc.) en la información geográfica.Necesidad de un modelo compartido para solventar los problemas de armonización y estructuración de la información hidrográfica.hydrOntology es una ontología global de dominio desarrollada conforme a un acercamiento top-down. Recubrir la mayoría de los fenómenos representables cartográficamente asociados al dominio hidrográfico.Servir como marco de armonización entre los diferentes productores de información geo-espacial en el entorno nacional e internacional.Comenzar con los pasos necesarios para obtener una mejor organización y gestión de la información geográfica (hidrográfica).Luis Manuel Vilches Blázquez
  • 71.
    FuentesTesauros y BibliografíaCatálogosde fenómenosGettyFTT ADLBCN25GEMETWFDCC.AA.EGM & ERMDiccionarios yMonografíasBCN200Nomenclátor Geográfico NacionalNomenclátor ConcisoLuis Manuel Vilches Blázquez
  • 72.
    Structuring criteria. Water Framework Directive: proposed by the EU Parliament and Council; list of definitions of hydrographic features. SDIGER project: INSPIRE pilot project covering two river basins, two countries and two languages. Semantic criteria: geographic dictionaries, the dictionary of the Real Academia de la Lengua, WordNet, Wikipedia, bibliography from several areas of knowledge. Inheritance: current structure of the catalogues. Advice from IGN experts in toponymy.
  • 73.
    Modelling of the hydrographic domain
  • 74.
    Implementation & formalization (+ Pellet): more than 150 concepts (classes), 47 relation types (properties) and 64 attribute types.
  • 75.
  • 76.
    3. Generation of RDF. From the data sources: geographic information (databases), statistical information (.xls) and geospatial information. Different technologies for RDF generation: re-engineering patterns, R2O and ODEMapster, annotation tools, geometry generation.
    (Process: identification of the data sources, vocabulary development, generation of the RDF data, publication of the RDF data, data cleansing, linking the RDF data, enable effective discovery.)
  • 77.
    3. Generation of the RDF Data: NOR2O (INE), ODEMapster (IGN), Geometry2RDF (geospatial column, IGN)
  • 78.
    3. Generation of the RDF Data / instances. NOR2O is a software library that implements the transformations proposed by the Patterns for Re-engineering Non-Ontological Resources (PR-NOR). Currently we have 16 PR-NORs. PR-NORs define a procedure that transforms the components of a Non-Ontological Resource (NOR) into ontology elements (http://ontologydesignpatterns.org/). NOR2O handles classification schemes, thesauri and lexicons. Example: the FAO Water classification, a classification scheme that follows a path enumeration data model and is implemented in a database.
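The path enumeration idea can be illustrated with a small sketch: each item of the classification scheme stores the path of codes from the root (for example `1.2`), so the parent of an item is found by dropping the last path segment, and that parent becomes the `rdfs:subClassOf` target. This is not NOR2O's code and not necessarily the exact PR-NOR it applies; the `.` separator, the namespace `http://example.org/fao-water#`, the `C1_2`-style class names and the sample rows are all hypothetical.

```python
"""Sketch: classification scheme (path enumeration model) -> RDFS class hierarchy.

Hypothetical assumptions: paths use '.' as separator and classes are minted
under an example namespace. Objects are IRIs except labels (plain strings).
"""
BASE = "http://example.org/fao-water#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"


def class_uri(path):
    # Mint a class URI from the enumerated path, e.g. "1.2" -> ...#C1_2
    return BASE + "C" + path.replace(".", "_")


def scheme_to_triples(rows):
    """rows: iterable of (path, label) tuples read from the path-enumeration table."""
    triples = []
    for path, label in rows:
        cls = class_uri(path)
        triples.append((cls, RDF_TYPE, OWL_CLASS))
        triples.append((cls, RDFS_LABEL, label))
        if "." in path:  # non-root item: the parent path drops the last segment
            parent = path.rsplit(".", 1)[0]
            triples.append((cls, RDFS_SUBCLASS, class_uri(parent)))
    return triples


if __name__ == "__main__":
    sample = [("1", "Water bodies"), ("1.1", "Rivers"), ("1.2", "Lakes")]  # invented rows
    for s, p, o in scheme_to_triples(sample):
        print(s, p, o)
```

Root items (paths without a separator) get no `rdfs:subClassOf` triple, so the scheme's top level becomes the top of the class hierarchy.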
  • 79.
    Re-engineering Model for NORs (figure): NOR reverse engineering (implementation, design, requirements) of the non-ontological resource; transformation at the conceptual level, guided by the Patterns for Re-engineering Non-Ontological Resources (PR-NOR); ontology forward engineering (specification, conceptualization, formalization, implementation in RDF(S)) to produce the ontology.
  • 80.
    PR-NOR library at the ODP Portal. Technological support: http://ontologydesignpatterns.org/wiki/Submissions:ReengineeringODPs
  • 81.
    3. Generation of the RDF Data – NOR2O: NOR2O applied to a table with Year, Industry Production Index and Province.
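As an illustration of how a statistics table with Year, Province and Industry Production Index might be expressed with the SCOVO vocabulary reused in the lightweight ontology, the sketch below turns one row into an `scv:Item` linked to its dataset and to two dimensions, with the index as `rdf:value`. It assumes the SCOVO terms `scv:Item`, `scv:dataset` and `scv:dimension` from `http://purl.org/NET/scovo#`; the base URI `http://example.org/ipi/`, the URI layout and the `xsd:decimal` typing are hypothetical choices, not NOR2O's actual output.

```python
from urllib.parse import quote

SCV = "http://purl.org/NET/scovo#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/ipi/"  # hypothetical base for items and dimensions


def iri(uri):
    return f"<{uri}>"


def row_to_ntriples(year, province, value):
    """Turn one (Year, Province, Industry Production Index) row into N-Triples lines."""
    prov = quote(province, safe="")  # percent-encode names such as "Ávila"
    item = f"{BASE}item/{prov}/{year}"
    triples = [
        (iri(item), iri(RDF + "type"), iri(SCV + "Item")),
        (iri(item), iri(SCV + "dataset"), iri(BASE + "dataset")),
        (iri(item), iri(SCV + "dimension"), iri(f"{BASE}year/{year}")),
        (iri(item), iri(SCV + "dimension"), iri(f"{BASE}province/{prov}")),
        # the index value is kept as the lexical form given in the table
        (iri(item), iri(RDF + "value"), f'"{value}"^^{iri(XSD + "decimal")}'),
    ]
    return [" ".join(t) + " ." for t in triples]
```

For example, `row_to_ntriples(2008, "Madrid", "98.5")` (a made-up value) yields five N-Triples lines describing one observation.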
  • 82.
  • 83.
    3. Generation of the RDF Data – R2O & ODEMapster: creation of the R2O mappings
  • 84.
    3. Generation of the RDF Data – Geometry2RDF. Oracle SDO_UTIL package:
      SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry
      FROM "BCN200"."BCN200_0301L_RIO" c
      WHERE c.Etiqueta='Arroyo'
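The query above returns each river geometry as GML 3.1.1. A minimal sketch of one possible next step is shown below: it reads a `gml:LineString` and rewrites its `gml:posList` as a WKT string that could then be attached to the river resource as a literal. It assumes the GML namespace `http://www.opengis.net/gml` and a `posList` with an optional `srsDimension` attribute; it handles only line strings and is not Geometry2RDF's implementation.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"


def gml_linestring_to_wkt(gml):
    """Convert a GML LineString (with a gml:posList) into a WKT LINESTRING."""
    root = ET.fromstring(gml)
    pos_list = root.find(f".//{{{GML_NS}}}posList")
    if pos_list is None or not pos_list.text:
        raise ValueError("no gml:posList found")
    dim = int(pos_list.get("srsDimension", "2"))  # coordinates per point
    coords = pos_list.text.split()
    if len(coords) % dim:
        raise ValueError("coordinate count is not a multiple of srsDimension")
    # Group the flat coordinate list into points, e.g. "x y" pairs in 2D
    points = [" ".join(coords[i:i + dim]) for i in range(0, len(coords), dim)]
    return "LINESTRING (" + ", ".join(points) + ")"
```

Polygons and multi-geometries would need their own branches; they are left out to keep the sketch short.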
  • 85.
    3. Generation of the RDF Data – Geometry2RDF
  • 86.
    3. Generation of the RDF Data – Geometry2RDF
  • 87.
    3. Generation of the RDF data – RDF graphs (IGN, INE). So far: 7 RDF named graphs, 1,412,248 triples (BTN25, BCN200, IPI, …):
    http://geo.linkeddata.es/dataset/IGN/BTN25
    http://geo.linkeddata.es/dataset/IGN/BCN200
    http://geo.linkeddata.es/dataset/INE/IPI
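Keeping each source in its own named graph means every statement carries the graph it belongs to. A minimal sketch, assuming the dataset URIs listed above are used directly as graph names, writes statements as N-Quads and counts statements per graph; the subject URI and the `"Tajo"` label in the usage are hypothetical.

```python
from collections import Counter

# Assumption: the dataset URIs double as named-graph names
GRAPHS = {
    "BTN25": "http://geo.linkeddata.es/dataset/IGN/BTN25",
    "BCN200": "http://geo.linkeddata.es/dataset/IGN/BCN200",
    "IPI": "http://geo.linkeddata.es/dataset/INE/IPI",
}


def nquad(s, p, o, graph):
    """Serialize one statement as an N-Quads line.

    s, p and graph are IRIs; o is an already formatted term (<IRI> or "literal").
    """
    return f"<{s}> <{p}> {o} <{graph}> ."


def triples_per_graph(lines):
    """Count statements per named graph in N-Quads lines written by nquad()."""
    counts = Counter()
    for line in lines:
        graph = line.rstrip(" .").rsplit(" ", 1)[1]  # last term is the graph
        counts[graph.strip("<>")] += 1
    return counts
```

For example, `nquad("http://example.org/r1", "http://www.w3.org/2000/01/rdf-schema#label", '"Tajo"', GRAPHS["BCN200"])` places a label in the BCN200 graph.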
  • 88.
    4. Publication of the RDF Data: SPARQL endpoint (Virtuoso 6.1.0); Linked Data and HTML views (Pubby 0.3), including provenance support.
  • 89.
    4. Publication of the RDF Data
  • 90.
    4. Publication of the RDF Data - License. License for GeoLinkedData: Creative Commons Attribution-ShareAlike 3.0; GNU Free Documentation License. Each dataset (IGN, INE, etc.) will have its own specific license.
  • 91.
    5. Data cleansing. Problems found: lack of documentation of the IGN datasets; broken links (Spain, IGN resources); lack of documentation of the ontology; missing English and Spanish labels.
    Options for building a Spanish ontology that imports some concepts from another ontology (in English):
    1. Import the English ontology and add annotations, such as Spanish labels, to its terms.
    2. Import the English ontology, create new concepts and properties with Spanish names, and map them to their English equivalents.
    3. Re-declare the terms of the English ontology that we need (using the same URIs as in the English ontology) and add Spanish labels.
    4. Create your own classes and properties that model the same things as the English ontology.
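The first option (import the English ontology and annotate its terms) amounts to adding `rdfs:label` triples with a Spanish language tag to the imported terms. A minimal sketch that emits such triples as N-Triples is below; the term IRI `http://example.org/hydro#River` is hypothetical, and only N-Triples escaping of backslashes, quotes and line breaks is handled.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    # N-Triples string escaping: backslash first, then quotes and line breaks
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def spanish_labels(labels):
    """labels maps an English-ontology term IRI to its Spanish label."""
    return [f'<{term}> <{RDFS_LABEL}> "{escape_literal(text)}"@es .'
            for term, text in labels.items()]
```

The same triples would also serve option 3, where the English terms are re-declared under their original URIs before the labels are added.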
  • 92.
    5. Data cleansing: URIs in Spanish, e.g. http://geo.linkeddata.es/ontology/Río. RDF allows non-ASCII (UTF-8) characters in URIs, but Linked Data URIs have to be URLs as well, so non-US-ASCII characters have to be percent-encoded: http://geo.linkeddata.es/ontology/R%C3%ADo
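Python's standard library performs exactly this UTF-8 percent-encoding. The sketch below mints ontology URLs under the `http://geo.linkeddata.es/ontology/` namespace from the example; the helper names `ontology_url` and `local_name` are hypothetical.

```python
from urllib.parse import quote, unquote

BASE = "http://geo.linkeddata.es/ontology/"


def ontology_url(name):
    # quote() encodes the string as UTF-8 and percent-escapes every byte outside
    # the unreserved set, so "í" (U+00ED, bytes C3 AD) becomes %C3%AD
    return BASE + quote(name, safe="")


def local_name(url):
    # Reverse mapping: decode the percent-escapes back to the Spanish name
    return unquote(url[len(BASE):])


if __name__ == "__main__":
    print(ontology_url("Río"))  # http://geo.linkeddata.es/ontology/R%C3%ADo
```

Using `safe=""` also escapes `/`, so a local name can never introduce an extra path segment.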
  • 93.
    6. Linking of the RDF Data. Silk – A Link Discovery Framework for the Web of Data. First set of links: provinces of Spain, with 86% accuracy. GeoLinkedData is linked to Geonames and DBpedia.
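Silk is configured with declarative link specifications; the sketch below is not Silk's API but a plain Python illustration of the kind of label comparison such a specification describes: normalize the province names (lower case, accents removed), score every candidate pair with `difflib.SequenceMatcher`, and emit `owl:sameAs` for the best match above a threshold. The 0.9 threshold and the `example.org` URIs are hypothetical; real link specifications usually combine several metrics.

```python
import unicodedata
from difflib import SequenceMatcher

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    # Lower-case and strip accents so "Ávila" and "Avila" compare equal
    decomposed = unicodedata.normalize("NFKD", label.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def discover_links(source, target, threshold=0.9):
    """source/target map resource URI -> label.

    Returns (source_uri, owl:sameAs, target_uri) for the best-scoring target
    of each source resource, if that score reaches the threshold.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAMEAS, best_uri))
    return links
```

Comparing every pair is quadratic; link discovery frameworks use indexing or blocking to avoid this on large datasets.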
  • 94.
    6. Linking of the RDF Data: http://geo.linkeddata.es/page/Provincia/Granada
  • 95.
    7. Enable effective discovery
  • 96.
  • 97.
  • 98.
  • 99.
  • 100.
  • 101.
    Contents: Introduction to Linked Data; Linked Data publication; Methodological guidelines for Linked Data publication; RDB2RDF tools; Technical aspects of Linked Data publication; Linked Data consumption
  • 102.
    Ontology-based Access to DBs. Approaches:
    1. Build a new ontology from 1 DB schema and 1 DB.
    2. Align the ontology built with approach 1 with a legacy ontology.
    3. Align an existing DB with a legacy ontology: a) massive dump (semantic data warehouse); b) query-driven.
    4. Align an ontology network with n DB schemas and other data sources: a) massive dump (semantic data warehouse); b) query-driven.
    (Figure labels: new ontology, existing ontology.)
  • 103.
    Ontology-based Access to Databases. (Figure: an ontology with the concepts Universidad (university), Profesor (professor) and Doctorando (PhD student) is related to a relational model in an RDB with the tables Organización (organization) and Personal (staff).)
    Question: names of the professors of the university UPM.
    * A professor is a person whose position is "docente" (teaching staff).
    * A university is an organization of type "3".
    Processor: the query is processed according to the formal description of the correspondence.
    Query: values of the column nombre (name) of the records of the table Personal whose column puesto (position) is "docente" and that are related to at least one record of the table Organización with the value "3" in the column tipo (type) and "UPM" in the column nombre.
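The rewriting described above can be reproduced on a toy database. The sketch uses `sqlite3` from Python's standard library; since the slide does not show how Personal and Organización are related, it assumes a hypothetical foreign key `Personal.org_id`, uses the unaccented table name `Organizacion`, and fills the tables with invented rows.

```python
import sqlite3

# Hypothetical schema: the slide does not say how Personal and Organización are
# related, so a foreign key Personal.org_id -> Organizacion.id is assumed.
SCHEMA = """
CREATE TABLE Organizacion (id INTEGER PRIMARY KEY, nombre TEXT, tipo TEXT);
CREATE TABLE Personal (id INTEGER PRIMARY KEY, nombre TEXT, puesto TEXT,
                       org_id INTEGER REFERENCES Organizacion(id));
"""

# Ontology-level question: names of the Professors of the University "UPM".
# Correspondences: Professor  -> Personal rows with puesto = 'docente'
#                  University -> Organizacion rows with tipo = '3'
REWRITTEN_SQL = """
SELECT p.nombre FROM Personal AS p
WHERE p.puesto = 'docente'
  AND EXISTS (SELECT 1 FROM Organizacion AS o
              WHERE o.id = p.org_id AND o.tipo = '3' AND o.nombre = ?)
"""


def professors_of(conn, university):
    """Run the rewritten SQL query and return the professors' names."""
    return [row[0] for row in conn.execute(REWRITTEN_SQL, (university,))]


def build_demo_db():
    """In-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Organizacion VALUES (?, ?, ?)",
                     [(1, "UPM", "3"), (2, "ACME", "1")])
    conn.executemany("INSERT INTO Personal VALUES (?, ?, ?, ?)",
                     [(1, "Ana", "docente", 1), (2, "Luis", "doctorando", 1),
                      (3, "Eva", "docente", 2)])
    return conn


if __name__ == "__main__":
    print(professors_of(build_demo_db(), "UPM"))  # ['Ana']
```

Eva is excluded because her organization is not of type "3", and Luis because his position is not "docente".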
  • 104.
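    Align data sources with legacy ontologies. (Figure: a relational model M1 with an Aeropuertos (airports) table is aligned with two legacy ontologies, O1 and O2, whose concepts include CentroComunicaciones (communications hub), PuntoGPS (GPS point), Estación (station), PuntoEuropeo (European point), Aeropuerto (airport), PuntoAsiatico (Asian point) and PuntoEspañol (Spanish point). Mapping functions from the Aeropuertos table, e.g. f(Aeropuertos)=PuntoEuropeo. Correspondences: RC(O1,M1) and RC(O2,M1).)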
  • 105.
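    R2O is a declarative language to specify mappings between relational data sources and ontologies. (Figure: an R2O mapping document, <xml>R2O Mapping</xml>, relates a relational model in an RDB, with the tables Organization and Persons, to an ontology with the concepts University, Professor and Student.)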
  • 106.
    Example: types of mappings needed. Attribute mapping with transformation (regular expression); attribute direct mapping; relation mapping with transformation (regular expression); relation mapping with transformation (keyword search).
  • 107.
    Population example (II). The Operation element defines a transformation, based on a regular expression, that is applied to the database column to extract property values.
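A regular-expression operation of this kind can be sketched in a few lines. The column content (a name followed by a province in parentheses) and the pattern are hypothetical examples, not taken from the project's R2O mappings; the function simply returns the first capturing group, or `None` when the pattern does not match.

```python
import re

# Hypothetical column format "<name> (<province>)": the pattern captures the
# text inside the trailing parentheses as the property value.
PROVINCE_IN_PARENS = r"\(([^)]+)\)\s*$"


def regex_transform(column_value, pattern):
    """Return the first capturing group of pattern in column_value, or None."""
    match = re.search(pattern, column_value)
    return match.group(1) if match else None
```

Returning `None` lets the caller skip generating the property for rows where the value cannot be extracted.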
  • 108.
    R2O (Relational-to-Ontology) Language
    Mapping cases (for concepts and for attributes):
    One or more concepts can be extracted from a single data field (not in 1NF).
    A view maps exactly one concept in the ontology.
    A column in a database view directly maps an attribute or a relation.
    A subset of the columns in the view maps a concept in the ontology.
    A subset (selection) of the records of a database view maps a concept in the ontology.
    A column in a database view maps an attribute or a relation after some transformation.
    A subset of the records of a database view maps a concept in the ontology, but the selection cannot be made using SQL.
    A set of columns in a database view maps an attribute or a relation.
  • 109.
    R2O Basic Syntax
    <conceptmap-def name="Customer">
      <identified-by> Table key </identified-by>
      <uri-as>operation</uri-as>
      <applies-if>condition</applies-if>
      <joins-via> expression </joins-via>
      <documentation>description …</documentation>
      <described-by>attributes,relations</described-by>
    </conceptmap-def>
    <attributemap-def name="http://esperonto/ff#Title">
      <aftertransform>
        <operation oper-id="constant">
          <arg-restriction on-param="const-val">
            <has-column>fsb_ajut.titol</has-column>
          </arg-restriction>
        </operation>
      </aftertransform>
    </attributemap-def>
    <relationmap-def name="http://esperonto/ff#isCandidateFor">
      <to-concept name="http://esperonto/ff#FundOpp">
        <joins-via>
          <operation oper-id="equals">
            <arg-restriction on-param="value1">
              <has-column>fsb_ajut.id</has-column>
            </arg-restriction>
            <arg-restriction on-param="value2">
              <has-column>fsb_candidate.forFund</has-column>
            </arg-restriction>
          </operation>
        </joins-via>
      </to-concept>
    </relationmap-def>
  • 110.
    ODEMapster generates RDF instances from relational instances based on the mapping description expressed in the R2O document
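To make the idea of materialization concrete, the sketch below applies a much simplified, hypothetical stand-in for an R2O concept map (table, key, URI pattern, an `applies-if`-like condition and column-to-property pairs) to an SQLite table and produces triples. It is not ODEMapster or the R2O format; the `customer` table, its columns and the `example.org` namespace are invented for illustration. The SQL is assembled from the mapping definition, which is assumed to be trusted.

```python
import sqlite3

EX = "http://example.org/ontology#"  # hypothetical ontology namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical, much simplified stand-in for an R2O concept map
MAPPING = {
    "concept": EX + "Customer",
    "table": "customer",
    "key": "id",
    "uri_pattern": "http://example.org/resource/Customer/{id}",
    "condition": "active = 1",            # plays the role of applies-if
    "attributes": {"name": EX + "name"},  # column -> ontology property
}


def materialize(conn, m):
    """Return (subject, predicate, object) triples for the rows selected by m."""
    cols = [m["key"], *m["attributes"]]
    # Built from the (trusted) mapping definition, not from user input
    sql = f"SELECT {', '.join(cols)} FROM {m['table']} WHERE {m['condition']}"
    conn.row_factory = sqlite3.Row  # access columns by name
    triples = []
    for row in conn.execute(sql):
        subject = m["uri_pattern"].format(**{m["key"]: row[m["key"]]})
        triples.append((subject, RDF_TYPE, m["concept"]))
        for column, prop in m["attributes"].items():
            if row[column] is not None:  # skip NULLs instead of emitting empty values
                triples.append((subject, prop, row[column]))
    return triples
```

This corresponds to the offline (materialized dump) style of use; a query-driven approach would instead translate each incoming query into SQL at runtime.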
  • 111.
  • 112.
  • 113.
  • 114.
  • 115.
  • 116.
    Online mode (runtime query execution)
  • 117.
    Offline mode (materialized RDF dump)
    Contents: Introduction to Linked Data; Linked Data publication; Methodological guidelines for Linked Data publication; RDB2RDF tools; Technical aspects of Linked Data publication; Linked Data consumption
  • 118.
    Using an RDF repository. It allows storing and accessing RDF data. For example, Sesame (http://www.openrdf.org/):
    Download it from http://www.openrdf.org/download.jsp (openrdf-sesame-2.3.0-sdk.zip).
    Deploy the .war in Tomcat (JDK and Tomcat needed).
    Create a repository at http://localhost:8080/openrdf-sesame.
    Check: http://localhost:8080/openrdf-sesame/repositories/XXX and http://localhost:8080/openrdf-sesame/repositories/XXX/statements
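Once the repository exists, it can be queried over HTTP. The sketch below only builds the requests, assuming the Sesame 2 HTTP protocol conventions of a `query` parameter on `/repositories/{id}` and a `/statements` resource (as in the check URLs above); `XXX` stays a placeholder for the repository id, and the Accept headers chosen are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SERVER = "http://localhost:8080/openrdf-sesame"


def sparql_request(repository_id, query):
    """GET request evaluating a SPARQL query on the repository."""
    url = f"{SERVER}/repositories/{repository_id}?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


def statements_request(repository_id):
    """GET request for all statements in the repository, as RDF/XML."""
    return Request(f"{SERVER}/repositories/{repository_id}/statements",
                   headers={"Accept": "application/rdf+xml"})


def fetch(req):
    """Send the request (needs a running server) and return the body as text."""
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch(sparql_request("XXX", "SELECT * WHERE { ?s ?p ?o } LIMIT 10"))` would return the first ten statements as SPARQL XML results once a repository with that id exists.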
  • 119.
    Linked Data frontend. To expose data as Linked Data, including content negotiation, etc. For example, Pubby (http://www4.wiwiss.fu-berlin.de/pubby/).
    Installation: use pubby-0.3.zip; deploy the webapp folder (and rename it) in Tomcat; modify config.n3; restart Tomcat.
    Check: http://localhost:8080/XXX/
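The content negotiation that a frontend like Pubby performs can be modelled as a redirect decision. The sketch below is a simplified model, not Pubby's code: it assumes resource URIs under `/resource/` are answered with `303 See Other`, pointing HTML clients to `/page/` and RDF clients to `/data/` (the `/page/` form matches GeoLinkedData URLs such as `/page/Provincia/Granada`). It ignores `q` values in the Accept header.

```python
RDF_MEDIA_TYPES = ("application/rdf+xml", "text/turtle", "text/n3")


def negotiate(path, accept):
    """Return (status, location) for a GET on a Linked Data resource URI.

    Simplified model: substring match on the Accept header, q-values ignored.
    """
    prefix = "/resource/"
    if not path.startswith(prefix):
        return 404, None
    local = path[len(prefix):]
    wants_rdf = any(media in accept for media in RDF_MEDIA_TYPES)
    target = "/data/" if wants_rdf else "/page/"
    return 303, target + local
```

The 303 redirect keeps the URI of the thing itself (the province) distinct from the URIs of the documents that describe it.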
  • 120.
  • 121.
    Java abstraction over RDF repositories: RDF2Go (http://rdf2go.semweb4j.org/)
  • 122.
    Add a SPARQL explorer. For example, SNORQL (http://wiki.github.com/kurtjx/SNORQL/)
  • 123.
  • 124.
    Contents: Introduction to Linked Data; Linked Data publication; Methodological guidelines for Linked Data publication; RDB2RDF tools; Technical aspects of Linked Data publication; Linked Data consumption
  • 125.
    RelFinder: finding relations in Linked Data. E.g., relations between the films “Pulp Fiction”, “Kill Bill” and “Reservoir Dogs”.
  • 126.
    Exercise on data.gov.uk: public schools in London that contain the word “music”.
    Exercise: find information in DBpedia, e.g. http://dbpedia.org/resource/Darth_Vader (image by http://www.flickr.com/photos/bflv/). Find fictional serial killers in DBpedia (etc.).
  • 127.
    Designing URI sets for the Public Sector (UK): http://www.cabinetoffice.gov.uk/media/301253/puiblic_sector_uri.pdf
  • 128.
  • 129.
    Introduction to Linked Data. Oscar Corcho, Asunción Gómez Pérez ({ocorcho, asun}@fi.upm.es). Universidad Politécnica de Madrid. Universidad del Valle, Cali, Colombia. September 10th 2010. Credits: Raúl García Castro, Oscar Muñoz, Jose Angel Ramos Gargantilla, María del Carmen Suárez de Figueroa, Boris Villazón, Alex de León, Víctor Saquicela, Luis Vilches, Miguel Angel García, Manuel Salvadores, Guillermo Alvaro, Juan Sequeda, Carlos Ruiz Moreno and many others. Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0

Editor's Notes

  • #64: 727 times; the term-based, record-based thesaurus