Apache Solr Search Engine: What you need to know!

Have you ever been in a situation where you have an enterprise app with data stored in an SQL database, and the business team asks you for a text search feature?

If you answered yes, your first attempt was most likely a text search written as a database query. That may work, but a database’s support for relevancy is either non-existent or a bolted-on afterthought, so the ranking of the results it returns is unlikely to be satisfactory.

Performance is a concern too: with large volumes of text data, an RDBMS can be quite slow at this kind of search and provide a poor user experience.

If you agree with these points, this is where a full-text search engine comes in.

Full-text search engines excel at quickly and efficiently searching large volumes of structured and unstructured text data.
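A toy illustration of why this is fast: search engines such as Solr (built on Apache Lucene) rely on an inverted index, which maps each term to the documents that contain it, so a query looks terms up directly instead of scanning every row the way a `LIKE '%video%'` query would. The Python sketch below assumes a naive lowercase tokenizer and hypothetical sample documents; it is a simplification, not Lucene's actual index format.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analysis: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map every term to the set of document IDs containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all_terms(index: dict[str, set[str]], query: str) -> set[str]:
    # Boolean AND: intersect the posting sets of all query terms.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, for illustration only.
docs = {
    "1": "Apple iPod with video playback",
    "2": "Video card with 256MB memory",
    "3": "Hard drive, 500GB",
}
index = build_inverted_index(docs)
print(sorted(search_all_terms(index, "video")))         # ['1', '2']
print(sorted(search_all_terms(index, "video memory")))  # ['2']
```

Each lookup touches only the posting sets of the query terms, which is why the cost does not grow with a full scan of the data.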

They have relevancy ranking capabilities to determine the best match for any text query.
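To make “relevancy ranking” concrete, here is a Python sketch of the classic BM25 scoring formula, the kind of similarity that Lucene, and therefore Solr, uses by default in recent versions. Documents that mention a query term more often score higher (with diminishing returns), rare terms weigh more than common ones, and long documents are normalized. The parameters k1 = 1.2 and b = 0.75 are the usual defaults; the tokenizer and sample documents are assumptions, and Lucene's implementation differs in details.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive analysis: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(docs: dict[str, str], query: str,
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    if n_docs == 0:
        return {}
    # Average document length (fall back to 1 to avoid dividing by zero).
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs or 1.0
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rarer terms get a higher inverse document frequency (always > 0 here).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation, normalized by document length.
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:
            scores[doc_id] = score  # only documents matching at least one term
    return scores


def rank(docs: dict[str, str], query: str) -> list[str]:
    # Highest-scoring document first.
    scores = bm25_scores(docs, query)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample documents, for illustration only.
docs = {
    "a": "video video video player",
    "b": "portable music player with video",
    "c": "hard drive",
}
print(rank(docs, "video"))  # ['a', 'b']
```

A plain SQL `LIKE` filter would return documents "a" and "b" with no notion of which one is the better match; the score is what gives the result list its order.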

There are many solutions on the market, and one of the most popular open-source, easy-to-use enterprise search platforms is Apache Solr.

Apache Solr is highly reliable, scalable, and fault-tolerant, providing distributed indexing, replication, and load-balanced querying, automated failover and recovery, centralized configuration, and more. 

Solr powers the search and navigation features of many of the world’s largest internet sites.

Solr Features:

  • Advanced full-text search
  • Optimized for High Volume Traffic
  • Standards Based Open Interfaces – XML, JSON and HTTP.
  • Comprehensive Administration Interfaces
  • Easy Monitoring 
  • Highly Scalable and Fault Tolerant
  • Flexible and Adaptable with easy configuration
  • Near Real-Time Indexing
  • Extensible Plugin Architecture
  • Schema when you want, schemaless when you don’t
  • Powerful Extensions
  • Faceted Search and Filtering
  • Geospatial Search
  • Advanced Configurable Text Analysis
  • Highly Configurable and User Extensible Caching
  • Performance Optimization 
  • Security built right in
  • Advanced Storage Options
  • Monitor-able Logging
  • Query Suggestions, Spelling and More
  • Rich Document Parsing
  • Multiple search indices
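Faceted search and filtering map onto ordinary request parameters. The Python sketch below builds a `/select` URL with a filter query (`fq`) and field facets (`facet=true`, `facet.field`), and turns Solr's flat facet list into a dictionary. The parameter names are standard Solr ones; the helper names, the `hello_solr` core, and the fields `inStock` and `cat_str` are assumptions that depend on your schema, and the flat `[value, count, ...]` layout assumes the default JSON response format.

```python
from urllib.parse import urlencode


def faceted_select_url(base_url: str, core: str, query: str,
                       filters: list[str], facet_fields: list[str]) -> str:
    # Repeated keys (several fq or facet.field values) are passed as a list of pairs.
    params = [("q", query)]
    params += [("fq", fq) for fq in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/{core}/select?{urlencode(params)}"


def facet_counts(flat: list) -> dict:
    # Default JSON output lists facets as [value1, count1, value2, count2, ...].
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical core and field names; adjust them to your own schema.
url = faceted_select_url("http://localhost:8983/solr", "hello_solr",
                         "video", ["inStock:true"], ["cat_str"])
print(url)
```

In a real response, the flat list for each field sits under `facet_counts`, then `facet_fields`, then the field name.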

Solr deployment modes:

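  • Stand-alone
    • A single Solr server, with the option of replicating its indexes to other servers (leader/follower replication, called master/slave in older releases).
  • Cloud (SolrCloud)
    • A cluster of Solr servers coordinated by Apache ZooKeeper, which provides centralized configuration management and keeps track of the cluster state.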

A deep dive into the technical part

In this section, we will dive into setting up Solr in standalone mode, just to show you how easy and straightforward it is, then look at the ways Solr supports adding data, and finally at searching.

Setting up a standalone Solr server

  1. Download the latest release from the Solr website: https://lucene.apache.org/solr/downloads.html. At the time of writing, the latest version is 9.0.0.
  2. Extract the Solr distribution archive into a directory of your choosing, for example your home directory:
cd ~/
tar zxf solr-9.0.0.tgz
  3. Start Solr from the extracted directory:
cd solr-9.0.0
bin/solr start
  4. Visit http://localhost:8983/solr/ to open the Solr Admin UI.
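If you script the setup, it helps to wait until Solr answers HTTP requests before creating cores. A minimal Python sketch using only the standard library, assuming the default port 8983 from the steps above; the helper name `wait_for_solr` is hypothetical, and it simply treats any HTTP 200 from the given URL as “up”.

```python
import time
import urllib.request


def wait_for_solr(url: str = "http://localhost:8983/solr/",
                  timeout: float = 30.0, interval: float = 1.0) -> bool:
    # Poll the URL until it returns HTTP 200 or the timeout expires.
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (e.g. connection refused) and socket timeouts
            # are all OSError subclasses: Solr is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_solr()` right after `bin/solr start` returns `True` once the Admin UI responds, or `False` after 30 seconds.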

Voilà! You are done and have a full-text search engine installed and running!

How cool and easy is that!

Core Creation

Solr stores documents in a core. In SolrCloud the equivalent is called a collection (which can span several cores across the cluster), and both are often simply referred to as an index.

So let’s create our first core, named “hello_solr”. You can use the Admin UI or the command line by running the following command:

bin/solr create -c hello_solr
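Once the core is created, you can check for it over HTTP through the CoreAdmin API's `STATUS` action. The Python sketch below builds that request and interprets the JSON reply; it assumes that Solr reports a core it does not know as an empty object under `status`. The helper names and the sample responses in the usage note are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def core_status_url(base_url: str, core: str) -> str:
    # CoreAdmin API: /admin/cores?action=STATUS&core=<name>
    params = {"action": "STATUS", "core": core, "wt": "json"}
    return f"{base_url}/admin/cores?{urlencode(params)}"


def core_exists(status_response: dict, core: str) -> bool:
    # Assumption: an unknown core shows up as an empty object under "status".
    return bool(status_response.get("status", {}).get(core))


def fetch_core_exists(base_url: str, core: str) -> bool:
    # Performs a live HTTP call; requires a running Solr instance.
    with urllib.request.urlopen(core_status_url(base_url, core)) as resp:
        return core_exists(json.load(resp), core)
```

For example, `fetch_core_exists("http://localhost:8983/solr", "hello_solr")` should return `True` after the `create` command has succeeded.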

Data indexing

Indexing is the process of feeding Solr with data. Solr’s basic unit of information is a document, which is a set of data that describes something. A recipe document would contain the ingredients, the instructions, the preparation time, the cooking time, the tools needed, and so on. A document about a person, for example, might contain the person’s name, biography, favorite color, and shoe size. A document about a book could contain the title, author, year of publication, number of pages, and so on.

In the Solr universe, documents are composed of fields, which are more specific pieces of information. Shoe size could be a field. First name and last name could be fields.

You have multiple options for adding documents to Solr, including the post tool, sending XML, JSON, or CSV data to Solr’s update handler over HTTP, and client libraries such as SolrJ.

For the sake of simplicity, I’ll use the post tool to index the set of XML files that ship with Solr in the example/exampledocs directory. I recommend starting out with this tool, as it can post various types of content to Solr, including files in Solr’s native XML and JSON formats, CSV files, a directory tree of rich documents, or even a simple short web crawl.

bin/post -c hello_solr example/exampledocs/*.xml

The output should be similar to the following:

SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/hello_solr/update…
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file gb18030-example.xml (application/xml) to [base]
POSTing file hd.xml (application/xml) to [base]
POSTing file ipod_other.xml (application/xml) to [base]
POSTing file ipod_video.xml (application/xml) to [base]
POSTing file manufacturers.xml (application/xml) to [base]
POSTing file mem.xml (application/xml) to [base]
POSTing file money.xml (application/xml) to [base]
POSTing file monitor.xml (application/xml) to [base]
POSTing file monitor2.xml (application/xml) to [base]
POSTing file mp500.xml (application/xml) to [base]
POSTing file sd500.xml (application/xml) to [base]
POSTing file solr.xml (application/xml) to [base]
POSTing file utf8-example.xml (application/xml) to [base]
POSTing file vidcard.xml (application/xml) to [base]
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/hello_solr/update…
Time spent: 0:00:00.153
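As the output shows, the post tool sends its files to the core's `/update` URL, so you can also send documents yourself over HTTP. A minimal Python sketch using only the standard library, assuming the `hello_solr` core from above: each document is a JSON object whose keys are field names, a JSON array of them is POSTed to `/update`, and `commit=true` makes them searchable right away. The helper names and the sample documents (including their field names) are hypothetical.

```python
import json
import urllib.request


def build_update_request(base_url: str, core: str, docs: list[dict],
                         commit: bool = True) -> urllib.request.Request:
    # A JSON array of documents sent to /update adds (or replaces) them.
    url = f"{base_url}/{core}/update"
    if commit:
        url += "?commit=true"
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_docs(base_url: str, core: str, docs: list[dict]) -> int:
    # Performs a live HTTP call; returns the HTTP status code.
    with urllib.request.urlopen(build_update_request(base_url, core, docs)) as resp:
        return resp.status


# Hypothetical documents: every key is a field name.
docs = [
    {"id": "book-1", "title": "Example Book One", "author": "Jane Doe", "pages": 320},
    {"id": "book-2", "title": "Example Book Two", "author": "John Roe", "pages": 180},
]
req = build_update_request("http://localhost:8983/solr", "hello_solr", docs)
print(req.full_url)  # http://localhost:8983/solr/hello_solr/update?commit=true
```

With Solr running, `index_docs("http://localhost:8983/solr", "hello_solr", docs)` sends the request; committing on every call is convenient for experiments, while bulk loads usually commit less often.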

SEARCHING!!

At this point you have a search engine with indexed data, ready to be searched! How difficult do you think searching is?

Hmm... you guessed it! It’s just as easy as the previous steps!

All you need to do is call an HTTP API, passing your core name in the URL path and your search term as the q query parameter. That’s it!

Let’s search for “video”: open your browser and go to the following link:

http://localhost:8983/solr/hello_solr/select?q=video
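The same search can be issued from code. A Python sketch using only the standard library: `urlencode` takes care of escaping multi-word or special-character queries, and the hits live under `response` (`numFound` and `docs`) in Solr's JSON output. The helper names are hypothetical, and `rows` (the page size) is optional.

```python
import json
import urllib.request
from urllib.parse import urlencode


def select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    # q holds the query; rows caps the number of documents returned.
    return f"{base_url}/{core}/select?{urlencode({'q': query, 'rows': rows})}"


def parse_results(body: dict) -> tuple[int, list[dict]]:
    # Solr's JSON layout: {"response": {"numFound": N, "start": 0, "docs": [...]}}
    response = body["response"]
    return response["numFound"], response["docs"]


def search(base_url: str, core: str, query: str) -> tuple[int, list[dict]]:
    # Performs a live HTTP call against a running Solr instance.
    with urllib.request.urlopen(select_url(base_url, core, query)) as resp:
        return parse_results(json.load(resp))
```

With Solr running, `search("http://localhost:8983/solr", "hello_solr", "video")` returns the total hit count and the first page of matching documents.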

Or execute the following command from your terminal!

curl http://localhost:8983/solr/hello_solr/query -d '{"query": "video"}'
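That command uses Solr's JSON Request API, where the search is a JSON body POSTed to `/query`. A Python equivalent, sketched with the standard library; in this API `limit` plays the role of `rows` and `fields` the role of `fl`. The helper names and the field names `id` and `name` are assumptions for illustration.

```python
import json
import urllib.request
from typing import Optional


def build_json_query(base_url: str, core: str, query: str, limit: int = 10,
                     fields: Optional[list[str]] = None) -> urllib.request.Request:
    # JSON Request API body: "limit" is the page size, "fields" selects returned fields.
    body = {"query": query, "limit": limit}
    if fields:
        body["fields"] = fields
    return urllib.request.Request(
        f"{base_url}/{core}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_json_query(req: urllib.request.Request) -> dict:
    # Performs a live HTTP call against a running Solr instance.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


req = build_json_query("http://localhost:8983/solr", "hello_solr", "video",
                       fields=["id", "name"])
```

The JSON body is easier to build programmatically than a long query string, which is one reason to prefer it once requests grow beyond a single `q` parameter.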

WOW! You made it!

Now try changing the query terms, play around with the different ways of querying Solr, dig deeper into the concepts, and HAPPY SEARCHING!

Want to be part of the Andela Community? Then join the Andela Talent Network!

With more than 175,000 technologists in our community, in over 90 countries, we’re committed to creating diverse remote engineering teams with the world’s top talent. And our network members enjoy being part of a talented community, through activities, benefits, collaboration, and virtual and in-person meetups.

All you need to do to join the Andela Talent Network is to follow our simple sign-up process. 

Submit your details via our online application then…

Complete an English fluency test – 15 minutes.

Complete a technical assessment on your chosen skill (Python, Golang, etc.) – 1 hour.

Meet with one of our Senior Developers for a technical interview – 1 hour.


Visit the Andela Talent Network sign-up page to find out more.

If you found this blog useful, check out our other blog posts for more essential insights!
