Liferay 6.1 + SOLR 4 + Zookeeper = Massive scalability
Across several projects we have been configuring Liferay + SOLR in an Active/Active cluster with HA. After trying all the available alternatives, we decided to go a step further and integrate SOLR with ZooKeeper instead of using the recommended Master/Slave solutions.
I've been using this configuration since SOLR 3.X and Liferay 5.2.X, but with SOLR 4.X this solution has become unbeatable, and I think it's time to give back to the community the patch we have been testing and using.
In this article you will find:
1. The purpose of this fork/code
2. How to setup a cloud with Zookeeper + SOLR 4.0
3. How to configure the SOLR-WEB Liferay plugin to work with a SOLR cluster based on ZooKeeper
1. The purpose of this fork is to compile solr-web against SOLR 4.0 and to add the classes needed to connect to a ZooKeeper-based cloud.
Once it is compiled, you can use the plugin in Liferay with SOLR 4.0.
There are three options when installing the server:
- A single server
- A cluster of servers based on Master/Slave replication
- A cluster of servers based on ZooKeeper / SolrCloud
The objective of this document is to explain the ZooKeeper / SolrCloud alternative, because the other two are already well documented.
2. How to setup a cloud with Zookeeper + SOLR 4.0
This document explains how to configure SOLR with ZooKeeper in an Active/Active cluster. It is a good approach and I am sure it works fine, but I prefer to install ZooKeeper as a separate service: it takes a little more RAM, but it lets you update the components separately.
A minimum of 3 ZooKeeper servers is needed in order to elect the Master. It's okay to have more ZooKeeper servers than SOLR servers, since ZooKeeper is a decoupled service.
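To make the "minimum of 3" rule concrete, here is a minimal sketch of the majority arithmetic a ZooKeeper ensemble relies on: it stays available only while more than half of its servers can reach each other. The class and method names are hypothetical and exist only for illustration; nothing here comes from ZooKeeper's own API.

```java
// Hypothetical helper, for illustration only: majority-quorum arithmetic
// for a ZooKeeper ensemble of a given size.
public class QuorumSketch {

    // Smallest number of servers that forms a strict majority of the ensemble.
    public static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble needs at least one server");
        }
        return ensembleSize / 2 + 1;
    }

    // Number of servers that can fail while a majority is still available.
    public static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("ensemble=%d quorum=%d tolerated failures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

With 3 servers the ensemble survives the loss of one. A fourth server raises the quorum to 3 without raising the number of failures tolerated, which is why odd ensemble sizes are usually preferred.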
In this link you can find my solrconfig.xml
Here is my zoo.cfg :
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/data/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=172.16.0.224:2888:3888
server.2=172.16.0.225:2888:3888
server.3=172.16.0.226:2888:3888
server.4=172.16.0.227:2888:3888
With that configured properly you will get a cluster, and in the great new admin UI of SOLR 4.0 you will see something like this:
3. How to configure the SOLR-WEB Liferay plugin to work with a SOLR cluster based on ZooKeeper
Now you are ready to integrate Liferay and the cluster you've created.
First you need to compile the plugin from this fork on GitHub, which includes the class com.liferay.portal.search.solr.server.BasicCloudSolrServer
Or you can take the direct route: if you have Liferay 6.1, you can download this WAR and deploy it in your Liferay.
After that you have to configure /WEB-INF/classes/META-INF/solr-spring.xml to point to your ZooKeeper servers.
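As a rough sketch of what "pointing to your ZooKeeper servers" means in practice: ZooKeeper clients are normally given a comma-separated list of host:clientPort pairs, and SolrJ 4.x's CloudSolrServer accepts such a string as its zkHost constructor argument. The helper below builds that string from the servers in the zoo.cfg above. The class and method names are hypothetical, and how BasicCloudSolrServer in the fork actually receives the value is an assumption to verify against its solr-spring.xml.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only: builds a ZooKeeper connection
// string ("host1:port,host2:port,...") from a list of ensemble hosts.
public class ZkHostSketch {

    public static String zkHost(List<String> hosts, int clientPort) {
        // Clients connect to clientPort (2181 in the zoo.cfg above), not to
        // the 2888/3888 ports, which the servers use among themselves.
        return hosts.stream()
                .map(host -> host + ":" + clientPort)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList(
                "172.16.0.224", "172.16.0.225", "172.16.0.226", "172.16.0.227");
        System.out.println(zkHost(hosts, 2181));
    }
}
```

Listing every ensemble member, rather than a single host, lets the client fall back to another ZooKeeper server if the one it is connected to goes down.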
At this point there are some open questions worth sharing:
- Should we install a ZooKeeper on every Liferay machine and point each Liferay to its own ZooKeeper?
- When is it more efficient to have SOLR installed on the same machine as the Liferay Portal?
- Should the class "com.liferay.portal.search.solr.server.BasicCloudSolrServer" be included in the Liferay trunk?
Summary:
Pros:
- The architecture is 100% active: all machines are candidates for Master in the index/search process.
- 100% shared-nothing architecture: no need for network shares, NFS, etc.
- We can manage the configuration files centrally.
- The architecture is multi-master by itself, i.e. if the Master fails, the cluster itself chooses the new Master.
- Replication of files and indices is very efficient.
Cons:
- The additional ZooKeeper process consumes a bit more RAM and CPU.
Thank you Israel! This is very much appreciated, I may use it soon on one of my projects. I've seen that some files reference "com.liferay.portal.kernel.search.IndexWriter" as the bean id while others use "com.liferay.portal.search.solr.SolrIndexWriterImpl". It's only an aesthetic/consistency matter, but you may want to check it.
If I've eye shiners tomorrow I know who to blame ;)
Regards,
Xabi.