Semantic Search on the Public Web with Creative Commons 2006.03.07 Mike Linksvayer
Billion$ (0) Let's get the hype out of the way....
Billion$ (1) Let's get the hype out of the way....
Billion$ (2) Let's get the hype out of the way....
Billion$ (3) This calls for a mashup...
Billion$ (4)
Billion$ (5) Fortunately CC's founders thought of that from the beginning...
Billion$ (6)
Billion$ (7)
About Creative Commons
 
Core Licensing Suite: Creator/Licensor chooses license options
  Every Creative Commons license allows the world to copy and distribute a work, provided that the licensee credits the creator/licensor.
  In addition, the creator/licensor may apply the following conditions:
    NonCommercial
    No Derivatives
    ShareAlike
 
Simple License Generator
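A minimal sketch of what a license generator does with the three optional conditions, assuming the URL pattern that appears in the metadata example in this deck. The function name `license_url` and its parameters are hypothetical, not Creative Commons code.

```python
# Hypothetical helper: turns the licensor's choices into a license URL.
BASE = "http://creativecommons.org/licenses"


def license_url(noncommercial=False, no_derivatives=False,
                share_alike=False, version="2.5"):
    # ShareAlike is a condition on derivative works, so it cannot be
    # combined with No Derivatives.
    if no_derivatives and share_alike:
        raise ValueError("No Derivatives and ShareAlike cannot be combined")
    parts = ["by"]  # crediting the creator/licensor applies to every license
    if noncommercial:
        parts.append("nc")
    if no_derivatives:
        parts.append("nd")
    if share_alike:
        parts.append("sa")
    return f"{BASE}/{'-'.join(parts)}/{version}/"


print(license_url(noncommercial=True, share_alike=True))
# prints http://creativecommons.org/licenses/by-nc-sa/2.5/
```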
Internet Archive Free Hosting for CC works http://www.archive.org/
Creative Commons Metadata
Creative Commons Metadata Example
<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://example.com/article.html">
    <dc:title>An Example Article</dc:title>
    <dc:date>2003-10-01</dc:date>
    <dc:type rdf:resource="http://purl.org/dc/dcmitype/Text" />
    <license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/" />
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by-nc-sa/2.5/">
    <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <permits rdf:resource="http://web.resource.org/cc/Distribution" />
    <requires rdf:resource="http://web.resource.org/cc/Notice" />
    <requires rdf:resource="http://web.resource.org/cc/Attribution" />
    <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
    <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
    <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
  </License>
</rdf:RDF>
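To show how a tool might read this metadata, here is a rough sketch that pulls the work-to-license link and the permits/requires/prohibits terms out of a shortened copy of the example. It treats the RDF/XML as plain XML using Python's standard library rather than a real RDF parser, which is an assumption that only holds for this simple serialization. The names `summarize` and `SAMPLE` are made up for illustration.

```python
import xml.etree.ElementTree as ET

CC = "http://web.resource.org/cc/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the slide's example.
SAMPLE = """<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://example.com/article.html">
    <license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/" />
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by-nc-sa/2.5/">
    <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <requires rdf:resource="http://web.resource.org/cc/Attribution" />
    <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  </License>
</rdf:RDF>"""


def summarize(rdf_xml):
    """Return ({work_url: license_url}, {license_url: terms})."""
    root = ET.fromstring(rdf_xml)
    licenses = {}
    for lic in root.findall(f"{{{CC}}}License"):
        terms = {"permits": [], "requires": [], "prohibits": []}
        for kind in terms:
            for el in lic.findall(f"{{{CC}}}{kind}"):
                uri = el.get(f"{{{RDF}}}resource")
                terms[kind].append(uri.rsplit("/", 1)[-1])  # keep the short name
        licenses[lic.get(f"{{{RDF}}}about")] = terms
    works = {}
    for work in root.findall(f"{{{CC}}}Work"):
        lic_el = work.find(f"{{{CC}}}license")
        works[work.get(f"{{{RDF}}}about")] = lic_el.get(f"{{{RDF}}}resource")
    return works, licenses


works, licenses = summarize(SAMPLE)
print(works)
print(licenses)
```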
Rights Description Use Cases
  Discovery
  Expression
  Commerce
  Management(1)
Rights Description vs. Rights Management(2)
  Copy/Use promotion vs. Copy/Use protection
  Encourage fans vs. Discourage casual pirates
  Resource management vs. Customer management
  Web content model vs. 20th century content model
  Not mutually exclusive in theory.
Why Semantic Web?
  Small organization, no central registration for every license
  Decentralization: Let a thousand search engines bloom; web as API
  Existing RDF tools could take advantage of CC RDF
Why RDF-in-HTML comments? (yuck)
  Considered: Robots.txt-like HTML meta tags LINK to external RDF file
  RDF-in-HTML comments wins because
    Metadata colocated with human visible HTML, only single copy & paste for licensors
    Full power of RDF
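As a sketch of what a crawler has to do with RDF-in-HTML comments, the following pulls RDF blocks out of comment markers with a regular expression. The sample page and the names `RDF_IN_COMMENT` and `extract_rdf_blocks` are invented for illustration, and a production crawler would likely need something more tolerant of malformed markup.

```python
import re

# Matches an HTML comment whose body is a single <rdf:RDF>...</rdf:RDF> block.
RDF_IN_COMMENT = re.compile(r"<!--\s*(<rdf:RDF.*?</rdf:RDF>)\s*-->", re.DOTALL)


def extract_rdf_blocks(html):
    """Return the RDF/XML strings embedded in HTML comments."""
    return RDF_IN_COMMENT.findall(html)


# Hypothetical page: the human-visible notice and the metadata sit together.
PAGE = """<html><body>
<p>This work is licensed under a Creative Commons license.</p>
<!--
<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="">
    <license rdf:resource="http://creativecommons.org/licenses/by/2.5/" />
  </Work>
</rdf:RDF>
-->
</body></html>"""

blocks = extract_rdf_blocks(PAGE)
print(len(blocks))  # prints 1
```

Each extracted string can then be handed to whatever RDF or XML tooling the indexer already uses.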
CC Search History I
  PostgreSQL/tsearch2/Python prototype (early 2004)
  Sloooowwwww, but did what a prototype should do
CC Search History II
  CC-Nutch (late 2004)
  Nutch aims to be an open source search engine comparable to commercial web-scale search engines
  Built on top of the Lucene full-text index
  CC plugin only ~500 lines of code (not counting UI, CC-required additions to Nutch core)
  http://search.creativecommons.org uses Nutch, >1m CC-licensed pages indexed
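The general idea of a license-aware index, full-text postings plus a per-page license field that queries can filter on, can be illustrated with a toy in-memory version. This is not the Nutch or Lucene API and not the CC plugin's code; `ToyIndex`, its methods, and the example.com pages are all hypothetical.

```python
from collections import defaultdict


class ToyIndex:
    """Toy full-text index with a license field per page (illustration only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of page URLs
        self.license_codes = {}           # page URL -> e.g. {"by", "nc", "sa"}

    def add(self, url, text, license_url):
        for term in text.lower().split():
            self.postings[term].add(url)
        # The license code is the path segment before the version,
        # e.g. "by-nc-sa" in .../licenses/by-nc-sa/2.5/
        code = license_url.rstrip("/").split("/")[-2]
        self.license_codes[url] = set(code.split("-"))

    def search(self, term, exclude=()):
        """Pages containing term whose license has none of the excluded codes."""
        hits = self.postings.get(term.lower(), set())
        blocked = set(exclude)
        return sorted(u for u in hits if not self.license_codes[u] & blocked)


idx = ToyIndex()
idx.add("http://example.com/a", "free remix music",
        "http://creativecommons.org/licenses/by/2.5/")
idx.add("http://example.com/b", "remix samples",
        "http://creativecommons.org/licenses/by-nc-sa/2.5/")

# Find pages about "remix" that may be used commercially.
print(idx.search("remix", exclude=("nc",)))  # prints ['http://example.com/a']
```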
 
CC Search History III
  Yahoo! Search for Creative Commons (early 2005)
  Search CC-licensed subset of Yahoo!’s index (~15m* pages)
  *very rough guesstimate
 
 
 
CC Search History IV
  Google CC search (November 2005)
  Search CC-licensed subset of Google’s index (~45m* pages)
  *very rough guesstimate
 
 
 
CC Search History V (the future)
  Better metadata formats
  Image and Video search
  Derivatives search
  Content commerce search
  “Live” web search
  “Management” (desktop, workgroup)
  Semantic mashups
Future CC metadata formats
  “Semantic XHTML” AKA “lowercase semantic web” AKA “microformats” (now)
    <a rel="license" href="http://creativecommons.org/licenses/by/2.5/">
  RDF/A AKA XHTML2 metadata (in working group)
  GRDDL (gleaning resource descriptions from dialects of languages)
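For comparison with the RDF approach, here is a small sketch of how a crawler might pick up the rel="license" microformat using only Python's standard HTML parser. The class name `LicenseLinkParser`, the helper `find_licenses`, and the sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collects href values of <a> and <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        # rel is a space-separated list of tokens, so split before testing.
        rels = (attrs.get("rel") or "").lower().split()
        if "license" in rels and attrs.get("href"):
            self.licenses.append(attrs["href"].strip())


def find_licenses(html):
    parser = LicenseLinkParser()
    parser.feed(html)
    return parser.licenses


PAGE = ('<p>Some rights reserved: <a rel="license" '
        'href="http://creativecommons.org/licenses/by/2.5/">CC BY 2.5</a></p>')
print(find_licenses(PAGE))
# prints ['http://creativecommons.org/licenses/by/2.5/']
```

The trade-off against RDF is visible here: the parser is tiny, but the markup only says that a license applies, not what the license permits, requires, or prohibits.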
 
 
 
 
Image and Video search
  Better metadata formats
  Image and Video search
  Derivatives search
  Content commerce search
  “Live” web search
  “Management” (desktop, workgroup)
  Semantic mashups
Searching for Derivative Works
Creative Commons (0)
Creative Commons (0)
Creative Commons (0)
Creative Commons (0)
Derivatives search
  RDF/XML snippet:
    <dc:source rdf:resource="http://ccmixter.org/media/files/victor/3385"/>
  Query like Yahoo! link: search or Technorati Cosmos search
    source:http://ccmixter.org/media/files/victor/3385
  “Who sampled this” as the new “who linked to this”
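The source: query can be thought of as a reverse index over dc:source statements, in the same way link: search is a reverse index over hyperlinks. A minimal sketch, assuming the crawler has already extracted (work, sources) pairs; the function names and the example.com remix URLs are hypothetical.

```python
from collections import defaultdict


def build_source_index(works):
    """works: iterable of (work_url, [source_urls]) pairs from dc:source metadata."""
    derived_from = defaultdict(list)
    for work_url, sources in works:
        for src in sources:
            derived_from[src].append(work_url)
    return derived_from


def search(index, query):
    """Answer a 'source:<url>' query with the works that declare that source."""
    prefix = "source:"
    if not query.startswith(prefix):
        raise ValueError("only source: queries are supported in this sketch")
    return index.get(query[len(prefix):], [])


ORIGINAL = "http://ccmixter.org/media/files/victor/3385"
crawled = [
    ("http://example.com/remix-a", [ORIGINAL]),
    ("http://example.com/remix-b", [ORIGINAL]),
    ("http://example.com/unrelated", []),
]
index = build_source_index(crawled)

# "Who sampled this?"
print(search(index, "source:" + ORIGINAL))
# prints ['http://example.com/remix-a', 'http://example.com/remix-b']
```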
Content commerce search
  Transaction costs should be low even if rights are reserved
  Commercial terms and other commerce described by metadata associated with a work
  Find me work I can use at a price I can pay for usage rights warranty/paper trail (even if rights not reserved)
  Reintermediate consumer and creator
“Live” web search (feeds)
  Feeds are explicitly metadata-rich (unlike typical web page)
  Existing blog search ignores metadata
  Web search will become more like blog search, vice versa?
“Management” (desktop, workgroup)
  Desktop search (OS-level)
  Content creation and media player integration
  XMP
  Semantic Wikis
Semantic mashups
Issues for Semantic Search on the Public Web
  Metadata quality
  Trust
  Scalability
  Usability
  Compatibility
  Critical mass
  State of the art IR works very well: high expectations!
Semantic Search on the Public Web with Creative Commons 2006.03.07 Mike Linksvayer Questions, feedback, flames: [email_address] http://developer.creativecommons.org
