Have you ever been in a situation where you have an enterprise app whose data lives in a SQL database, and the business team asks you for a text search feature?
If so, your first attempt was probably a text-based search using a database query. It might work, but a database’s ability to rank results by relevance is either non-existent or a bolted-on afterthought, so the relevance ranking of results coming out of a database won’t be satisfactory.
Additionally, if we add performance to the equation, an RDBMS searching large volumes of text data will be quite slow and provide a poor user experience.
Full-text search engines excel at quickly and efficiently searching large volumes of structured and unstructured text data.
They have relevancy ranking capabilities to determine the best match for any text query.
There are many solutions and products on the market, and one of the most popular, open-source, and easy-to-use enterprise search platforms is Apache Solr.
Apache Solr is highly reliable, scalable, and fault-tolerant, providing distributed indexing, replication, and load-balanced querying, automated failover and recovery, centralized configuration, and more.
Solr powers the search and navigation features of many of the world’s largest internet sites.
Its key features include:
Advanced full-text search
Optimized for high-volume traffic
Standards-based open interfaces: XML, JSON, and HTTP
Comprehensive administration interfaces
Highly scalable and fault tolerant
Flexible and adaptable with easy configuration
Near-real-time indexing
Extensible plugin architecture
Schema when you want, schemaless when you don’t
Faceted search and filtering
Advanced configurable text analysis
Highly configurable and user-extensible caching
Security built right in
Advanced storage options
Query suggestions, spelling correction, and more
Rich document parsing
Multiple search indices
Solr supports two deployment modes:
A single Solr server, with the ability to create replicas (master/slave).
A cluster of Solr servers (SolrCloud), with Apache ZooKeeper acting as the central configuration manager and cluster coordinator; query load balancing is handled by Solr itself rather than by ZooKeeper.
A deep dive into the technical part
In this section, we will dive into setting up Solr in standalone mode, just to show you how easy and straightforward it is; then we’ll look at the different ways Solr supports adding data; and finally, the search part.
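The standalone setup can be sketched in a few commands. The version number below is an assumption — check the Solr downloads page for the latest release:

```shell
# Download and extract a Solr release (8.11.2 here is an assumed version).
wget https://archive.apache.org/dist/lucene/solr/8.11.2/solr-8.11.2.tgz
tar xzf solr-8.11.2.tgz
cd solr-8.11.2

# Start Solr in standalone mode on the default port, 8983.
bin/solr start

# Confirm it is up; the admin dashboard is at http://localhost:8983/solr
bin/solr status
```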
Voila! You are done and have a full-text search engine installed and operating!
How cool and easy is that?
Solr documents are stored in a core (called a collection in SolrCloud), which may also be referred to as an index.
So let’s create our first core, named “hello_solr”. You can use the admin dashboard, or the command line by running the following command:
bin/solr create -c <name>
Indexing is the process of feeding Solr with data. Solr’s basic unit of information is a document, which is a set of data that describes something. A recipe document would contain the ingredients, the instructions, the preparation time, the cooking time, the tools needed, and so on. A document about a person, for example, might contain the person’s name, biography, favorite color, and shoe size. A document about a book could contain the title, author, year of publication, number of pages, and so on.
In the Solr universe, documents are composed of fields, which are more specific pieces of information. Shoe size could be a field. First name and last name could be fields.
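To make this concrete, here is a sketch of the person example as a Solr JSON document. The field names are hypothetical, and the commented curl line assumes a core named hello_solr running on the default port:

```shell
# An illustrative Solr JSON document (field names are hypothetical).
DOC='[{"id": "person-1", "name": "Jane Doe", "favorite_color": "blue", "shoe_size": 38}]'
echo "$DOC"

# With Solr running, it could be indexed through the JSON update handler:
# curl -X POST -H 'Content-Type: application/json' \
#      'http://localhost:8983/solr/hello_solr/update?commit=true' -d "$DOC"
```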
You have multiple options for adding documents to Solr, such as the update request handlers, client libraries like SolrJ, and the Post Tool:
Post Tool: bin/post (backed by post.jar) lets you quickly upload content to your system.
For the sake of simplicity, I’ll use the Post Tool to index a set of XML files that ship with Solr in the example/exampledocs directory. I recommend playing around with this tool, as it can post various types of content to Solr, including files in Solr’s native XML and JSON formats, CSV files, a directory tree of rich documents, or even a simple short web crawl.
bin/post -c hello_solr example/exampledocs/*.xml
The output should be similar to the following:
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/hello_solr/update…
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file gb18030-example.xml (application/xml) to [base]
POSTing file hd.xml (application/xml) to [base]
POSTing file ipod_other.xml (application/xml) to [base]
POSTing file ipod_video.xml (application/xml) to [base]
POSTing file manufacturers.xml (application/xml) to [base]
POSTing file mem.xml (application/xml) to [base]
POSTing file money.xml (application/xml) to [base]
POSTing file monitor.xml (application/xml) to [base]
POSTing file monitor2.xml (application/xml) to [base]
POSTing file mp500.xml (application/xml) to [base]
POSTing file sd500.xml (application/xml) to [base]
POSTing file solr.xml (application/xml) to [base]
POSTing file utf8-example.xml (application/xml) to [base]
POSTing file vidcard.xml (application/xml) to [base]
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/hello_solr/update…
Time spent: 0:00:00.153
As of now, you have a search engine with data indexed and ready to be searched! How difficult do you think searching is?
Hmmm… you guessed it correctly! It will be as easy as the previous steps!
All you need is to call an API, passing your core name as a path parameter and the query term as a query parameter, and that’s it!
Let’s search for “video”: open your browser and go to http://localhost:8983/solr/hello_solr/select?q=video
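A few query variations can be sketched as URLs against the select handler. The cat field below is an assumption based on the example documents we indexed:

```shell
# Base URL of the select (search) handler for our core.
BASE="http://localhost:8983/solr/hello_solr/select"

# Simple keyword search:
Q1="${BASE}?q=video"
# Restrict the returned fields and limit the number of rows:
Q2="${BASE}?q=video&fl=id,name,price&rows=5"
# Match everything and facet on the 'cat' field from the example docs:
Q3="${BASE}?q=*:*&facet=true&facet.field=cat"

echo "$Q1"
# With Solr running, fetch results with, e.g.: curl "$Q1"
```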
Now, try changing the query terms and playing around with the different ways of querying Solr, dive deeper into the concepts, and HAPPY SEARCHING!