Introduction to Solr
Solr is a standalone enterprise search server with a web-services-like API. You put documents into it (called "indexing") via XML over HTTP (RESTful). You query it via HTTP GET and receive XML results. Its main features are:
- Advanced Full-Text Search Capabilities
- Optimized for High Volume Web Traffic
- Standards Based Open Interfaces - XML and HTTP
- Comprehensive HTML Administration Interfaces
- Scalability - Efficient Replication to other Solr Search Servers
- Flexible and Adaptable with XML configuration
- Extensible Plugin Architecture
Set up Solr
To set up Solr, follow this guideline. Once Solr is set up, you effectively have an indexing service up and running.
The HTTP/XML interface of the indexer has two main access points: the update URL, which maintains the index, and the select URL, which is used for queries. In the default configuration, they are found at:
http://[hostname:port]/solr/update
http://[hostname:port]/solr/select
To add a document to the index, we POST an XML representation of its fields to the update URL. You can also delete documents, or update one by re-posting it with the same unique key. Every change operation must be followed by a commit to be flushed to the file system. Once we have indexed some data, an HTTP GET on the select URL does the querying.
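To make the two access points a little more concrete, here is a minimal Python sketch that only builds the messages and the query URL; it does not send anything. The field names (id, title), the host and port, and the helper names are my own assumptions for illustration, and the fields you can actually post depend on your schema.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET


def build_add_xml(fields):
    """Build an <add><doc>...</doc></add> message from a dict of field values."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_delete_xml(doc_id):
    """Build a <delete><id>...</id></delete> message for one unique key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def build_select_url(base_url, query, rows=10):
    """Build the GET URL for a query against the select access point."""
    params = urlencode({"q": query, "rows": rows})
    return base_url.rstrip("/") + "/select?" + params


if __name__ == "__main__":
    # Hypothetical document and server address, for illustration only.
    print(build_add_xml({"id": "doc-1", "title": "Hello Solr"}))
    print("<commit/>")  # a commit message follows the change operations
    print(build_select_url("http://localhost:8983/solr", "title:hello"))
```

The XML strings would be POSTed to the update URL (with an XML content type), while the select URL is simply fetched with a GET.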
Powerful Features Behind Solr
If you followed the guideline above, you are already familiar with indexing, searching, and facet browsing. Now let's get down to how to make Solr a scalable solution with great performance.
Caching
TBA
Distribution and Replication
For applications that receive large volumes of queries, a single Solr server may not be enough to meet performance requirements. Therefore, Solr provides mechanisms for replicating the Lucene index across multiple servers that are part of a load-balanced suite of query servers. The replication process is handled through a combination of event listeners enabled through the solrconfig.xml file and several shell scripts (located in solr/bin of the example application).
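As a rough picture of what a "load-balanced suite of query servers" means from the client's point of view, here is a small round-robin sketch in Python. In practice a dedicated load balancer usually does this job; the class name and the host names below are hypothetical.

```python
from itertools import cycle


class QueryServerPool:
    """Hand out query-server select URLs in round-robin order."""

    def __init__(self, base_urls):
        servers = list(base_urls)
        if not servers:
            raise ValueError("at least one query server is required")
        self._servers = cycle(servers)

    def next_select_url(self):
        # Each call moves on to the next server, wrapping around at the end.
        return next(self._servers).rstrip("/") + "/select"


if __name__ == "__main__":
    # Hypothetical query servers, for illustration only.
    pool = QueryServerPool(["http://search1:8983/solr", "http://search2:8983/solr"])
    for _ in range(3):
        print(pool.next_select_url())
```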
In a replicating architecture, one Solr server acts as the master server, providing copies of the index (called snapshots) to one or more slave servers that handle query requests. Indexing commands are sent to the master server and queries are sent to the slave servers. The master server can create snapshots manually or by configuring the <updateHandler> section of solrconfig.xml to trigger snapshot creation when commit and/or optimize events are received. In either the manual or the event-driven process, the snapshooter script is invoked on the master server, creating a directory on the server named snapshot.yyyymmddHHMMSS where yyyymmddHHMMSS is the actual time the snapshot was created. The slave servers then use rsync to copy only those files in the Lucene index that have been changed.
For example, a postCommit listener that runs snapshooter looks like this (the args and env values are placeholders):
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>
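The snapshot.yyyymmddHHMMSS naming convention is easy to work with in your own tooling. The Python sketch below is not how the shipped shell scripts are implemented; it only illustrates the naming scheme, and picking the newest snapshot from a directory listing is my assumption about what a monitoring or cleanup script might want to do.

```python
from datetime import datetime

SNAPSHOT_PREFIX = "snapshot."
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # yyyymmddHHMMSS


def snapshot_name(created_at):
    """Return the snapshot directory name for a given creation time."""
    return SNAPSHOT_PREFIX + created_at.strftime(TIMESTAMP_FORMAT)


def latest_snapshot(names):
    """Return the most recent snapshot name from a listing, or None if there is none."""
    dated = []
    for name in names:
        if not name.startswith(SNAPSHOT_PREFIX):
            continue  # not a snapshot directory
        try:
            stamp = datetime.strptime(name[len(SNAPSHOT_PREFIX):], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # suffix is not a valid timestamp
        dated.append((stamp, name))
    return max(dated)[1] if dated else None


if __name__ == "__main__":
    print(snapshot_name(datetime.now()))
    # Hypothetical directory listing, for illustration only.
    print(latest_snapshot(["index", "snapshot.20080315142530", "snapshot.20080316010000"]))
```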
Reference
Below are some cool references I found:
- Search smarter with Apache Solr, Part 1: Essential features and the Solr schema
- Search smarter with Apache Solr, Part 2: Solr for the enterprise
- Advanced Lucene





































