DEVELOP OPEN SOURCE SEARCH ENGINE
26th Feb 2012   Ritesh Ambastha – CEO, iWillStudy.com
Open Source Search Engines

 Sphinx   Lucene    Datapark Search   Zettair

 YaCy     Xapian    SWISH-E           Seeks

 Recoll   OpenFTS   Nutch             Namazu
Platform Ideas!
Credits: http://zooie.wordpress.com
Comparison
Credits: http://zooie.wordpress.com
Comparison
Credits: http://zooie.wordpress.com
We are going to talk about Sphinx & Apache Solr
Sphinx
   Sphinx is an open source full
    text search server.
   It's written in C++ and works
    on Linux (RedHat, Ubuntu, etc.),
    Windows, MacOS, Solaris, FreeBSD,
    and a few other systems.
   Sphinx lets you batch index
    and search data stored in an
    SQL database, NoSQL storage,
    or just files, quickly and easily.
Sphinx

 Text processing features
 Searching via SphinxAPI is as simple as
  3 lines of code, and querying via
  SphinxQL is even simpler
 Sphinx clusters scale up to billions of
  documents and tens of millions of search
  queries per day, powering top websites
  such as Craigslist, DailyMotion, and NetLog.
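
As a rough illustration of the SphinxQL point above, the sketch below builds a basic full-text query and, optionally, sends it over the MySQL wire protocol that SphinxQL speaks. The index name `articles`, the host, and the use of the third-party `pymysql` client are assumptions for illustration and do not come from the slides. Port 9306 is the conventional SphinxQL listener port, but the actual value depends on your `searchd` configuration.

```python
def build_sphinxql(index, text, limit=10):
    """Return (sql, params) for a basic SphinxQL full-text query."""
    # Index names cannot be bound as parameters, so validate them strictly.
    if not index.replace("_", "").isalnum():
        raise ValueError(f"unsafe index name: {index!r}")
    sql = f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT %s"
    return sql, (text, int(limit))


def search(text, index="articles", host="127.0.0.1", port=9306):
    """Run the query against a searchd instance (assumed to be listening)."""
    import pymysql  # third-party; imported lazily so the builder works without it

    sql, params = build_sphinxql(index, text)
    conn = pymysql.connect(host=host, port=port)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```

Calling `search("open source")` would return the matching document ids, provided a `searchd` instance with an `articles` index is reachable. Connection options may need adjusting for a particular setup.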
Performance and scalability
   Indexing performance: Sphinx indexes up to 10-15 MB
    of text per second per single CPU core.
   Searching performance: Searching through the
    1,000,000-document, 1.2 GB text collection that the
    Sphinx developers use for everyday development and
    testing runs at 500+ queries/sec on a 2-core desktop
    machine with 2 GB of RAM.
   Scalability: The biggest known Sphinx cluster indexes
    almost 5 billion documents, resulting in over 6 TB of
    data.
   The busiest known one is, unsurprisingly, Craigslist, a
    top-10 website in the US that serves 50+ million search
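
To put the figures above in perspective, here is a back-of-the-envelope sketch. It assumes decimal units (1 MB = 10^6 bytes) and treats the quoted numbers as exact, which they are not ("up to", "almost", "over"). The function names are made up for this example.

```python
MB = 10**6   # decimal megabyte (an assumption; the slides do not say)
TB = 10**12  # decimal terabyte


def indexing_seconds(corpus_bytes, mb_per_sec_per_core, cores=1):
    """Time to index a corpus at a given per-core throughput."""
    return corpus_bytes / (mb_per_sec_per_core * MB * cores)


def avg_doc_bytes(total_bytes, num_docs):
    """Average document size for a collection."""
    return total_bytes / num_docs


# The 1.2 GB test collection on a single core, at 10 and 15 MB/s.
slow = indexing_seconds(1200 * MB, 10)
fast = indexing_seconds(1200 * MB, 15)

# Average document size: test collection vs. the biggest known cluster.
test_doc = avg_doc_bytes(1200 * MB, 1_000_000)
cluster_doc = avg_doc_bytes(6 * TB, 5 * 10**9)

print(f"indexing time: {fast:.0f} to {slow:.0f} s")
print(f"avg doc size: {test_doc:.0f} B (test), {cluster_doc:.0f} B (cluster)")
```

Under these assumptions the test collection indexes in roughly 80 to 120 seconds on one core. Both collections also average about 1.2 KB per document, so the test corpus looks like a reasonable miniature of the large cluster.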
Key Features
   Batch and Real-Time full-text indexes
   Non-text attributes support
   SQL database indexing
   Non-SQL storage indexing
   Easy application integration
   Advanced full-text searching syntax
   Rich database-like querying features
   Better relevance ranking
   Flexible text processing
   Distributed searching
http://lucene.apache.org/solr/
Solr is the popular, blazing-fast open source enterprise
search platform from the Apache Lucene project.
Its major features include powerful full-text search, hit
highlighting, faceted search, dynamic clustering, database
integration, rich document (e.g., Word, PDF) handling,
and geospatial
Solr is written in Java and runs as a standalone full-text
search server within a servlet container such as Tomcat.
Solr Features
   Advanced Full-Text Search Capabilities
   Optimized for High Volume Web Traffic
   Standards Based Open Interfaces - XML, JSON
    and HTTP
   Comprehensive HTML Administration Interfaces
   Server statistics exposed over JMX for monitoring
   Scalability - Efficient Replication to other Solr
    Search Servers
   Flexible and Adaptable with XML configuration
   Extensible Plugin Architecture
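
The "open interfaces" point can be made concrete with a small sketch of querying Solr over plain HTTP and asking for JSON. The base URL `http://localhost:8983/solr`, the `/select` path, and the field name passed as `facet_field` are assumptions for illustration. Newer Solr versions put a core or collection name in the path, so adjust `base` accordingly. The parameters `q`, `wt`, `rows`, `facet`, `facet.field`, and `hl` are standard Solr query parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(query, base="http://localhost:8983/solr", rows=10,
                    facet_field=None, highlight=False):
    """Build a Solr select URL that requests a JSON response."""
    params = [("q", query), ("wt", "json"), ("rows", rows)]
    if facet_field:
        # Faceted search: count matching documents per value of a field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    if highlight:
        # Hit highlighting: return snippets with the matches marked.
        params.append(("hl", "true"))
    return f"{base}/select?{urlencode(params)}"


def fetch(url):
    """GET the URL and decode the JSON body (needs a running Solr server)."""
    with urlopen(url) as resp:
        return json.load(resp)
```

For example, `fetch(solr_select_url("title:lucene", facet_field="category"))` would return the parsed response as a dictionary, assuming a Solr instance is running at the default address and the schema defines those fields.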
What is it all about?
Solr is based on Lucene
More about Lucene
