I was recently involved in configuring a hardware load balancer for a client. The SharePoint solution being load balanced consisted of three separate web applications, two using FBA and the third using NTLM. Several issues occurred during the process, and I want to share some of the techniques we used to resolve them.
So Which Server Is This Running On?
The first and foremost issue when diagnosing any problem with a load balancer is knowing which of the servers in the pool is serving up the response. This is easily solved through the use of response headers: each server adds a header to its responses that identifies it.
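As a minimal sketch of reading such a header from a client, assuming each server has been configured to add a custom response header: the header name `X-Served-By` is a hypothetical placeholder, not the name used in the original setup, so substitute whatever header your servers actually send.

```python
from urllib.request import urlopen

# Hypothetical header name; use whatever custom header each server was given.
NODE_HEADER = "X-Served-By"


def node_from_headers(headers, header_name=NODE_HEADER):
    """Return the node name found in the response headers, or None if absent.

    `headers` may be a dict or a list of (name, value) pairs.
    Header names are compared case-insensitively.
    """
    wanted = header_name.lower()
    for name, value in dict(headers).items():
        if name.lower() == wanted:
            return value.strip()
    return None


def serving_node(url, header_name=NODE_HEADER, timeout=10):
    """Request `url` and report which node answered, based on the header."""
    with urlopen(url, timeout=timeout) as response:
        return node_from_headers(response.headers.items(), header_name)
```

Calling `serving_node` repeatedly against the load-balanced URL shows which node is answering at that moment. This sketch sends no credentials, so a web application that requires authentication may reject the request before any header can be read.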
Can I Avoid The Load Balancer?
The next objective is to allow a specific machine to be accessed without going through the load balancer. This can be done using DNS entries and host files, and needs to achieve two objectives:
- Direct the initial request to a specific server
- Access local services that have the same domain as the initial request
The way to achieve objective 1 is to use DNS:
Set up DNS entries for WebApp-Server1.DomainName.com, WebApp-Server2.DomainName.com, etc. This will allow easy access from anywhere on the network to specific servers.
The way to achieve objective 2 is to use host files on each machine in the load balancer pool:
Add host entries on the machine so that requests that would usually be routed through the load balancer resolve to the local machine instead, e.g. an entry that maps WebApp.DomainName.com, the name the load balancer routes requests for, to the machine's own IP address. This keeps requests on the same machine without the need to go through the load balancer.
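A small sketch of both steps, formatting a hosts-file entry and then confirming that the bypass names resolve where you expect. The IP addresses used in the usage note are hypothetical, taken from the documentation address range, and the helper names are illustrative only.

```python
import socket


def hosts_line(ip_address, host_name):
    """Format one hosts-file entry: the IP address, then the name mapped to it."""
    return f"{ip_address}\t{host_name}"


def resolution_mismatches(expected, resolve=socket.gethostbyname):
    """Return {name: (expected_ip, actual_ip)} for names that resolve elsewhere.

    `expected` maps host names to the IP address each should resolve to.
    `actual_ip` is None when the name does not resolve at all.
    """
    mismatches = {}
    for name, expected_ip in expected.items():
        try:
            actual_ip = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            actual_ip = None
        if actual_ip != expected_ip:
            mismatches[name] = (expected_ip, actual_ip)
    return mismatches
```

For example, `hosts_line("192.0.2.11", "WebApp.DomainName.com")` produces the line to add on the server whose own address is 192.0.2.11, and running `resolution_mismatches({"WebApp.DomainName.com": "192.0.2.11"})` on that server should return an empty dictionary once the entry is in place.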
Is This A Load Balancer Issue?
Whenever a change is made to an infrastructure, there is a risk that something will stop working. If you are introducing a load balancer into the equation, it is vital to be able to prove whether a symptom that has appeared is the result of the load balancing.
Ideally there are certain conditions that should be in place prior to introducing the load balancer:
- All functionality within the SharePoint web applications has been tested on each node in the pool
- The DNS entries that will eventually point at the load balancer have been tested pointing at each node in the pool in turn
- The DNS entries that will bypass the load balancer have all been tested
If these conditions have been met then any issues that arise should be a result of the load balancer.
If an issue does arise, then it is vital to test the issue on each of the nodes in the pool by bypassing the load balancer, and to test each of the nodes with the load balancer directing traffic to the node. This will identify whether the issue lies with a specific node, the load balancer, or a combination of both.
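The comparison described above can be sketched as a simple decision over the pass/fail results for each node. This is only a rough heuristic under the assumption that you have recorded one result per node for each route; the function name and result labels are illustrative.

```python
def classify_issue(direct_ok, via_balancer_ok):
    """Suggest where a fault lies from per-node pass/fail results.

    direct_ok:       {node: bool}, results when bypassing the load balancer
    via_balancer_ok: {node: bool}, results when the load balancer directs
                     traffic to that node
    """
    failing_direct = {node for node, ok in direct_ok.items() if not ok}
    failing_balanced = {node for node, ok in via_balancer_ok.items() if not ok}

    if not failing_direct and not failing_balanced:
        return "not reproduced"
    if not failing_direct:
        # Every node works on its own; failures appear only behind the balancer.
        return "load balancer"
    if failing_balanced == failing_direct:
        # The same nodes fail with or without the balancer in the path.
        return "node"
    # Anything else needs a closer look at both the nodes and the balancer.
    return "combination"
```

For example, `classify_issue({"Server1": True, "Server2": True}, {"Server1": False, "Server2": False})` returns `"load balancer"`, because both nodes pass when accessed directly and fail only when reached through the load balancer.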
It can be very important to gather as much evidence as possible pointing to the cause of the issue, as neither you nor your client may have access to the load balancer. The hosting provider may be the only party with access, and they may charge for changes or have little interest in helping to investigate and identify the cause.
Any solution with a load balancer has requirements for high availability. This must be tested.
The ideal test is to artificially fail one node:
- Test the solution, identifying which node is serving the responses
- Have one user carry on testing, whilst continuously checking the node that is serving the responses
- Whilst the testing is taking place, stop the node that is serving the responses from listening for requests
- Monitor what the user experiences as the load balancer recognises the issue and transfers the traffic to the other node
This test should be carried out in both directions for all web applications in the solution.
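If the node serving each response is recorded at regular intervals during the test, the results can be summarised afterwards. The sketch below assumes a list of (timestamp, node) observations collected by whatever polling you use, with `None` recorded when a request failed; the function name and data shape are illustrative.

```python
def summarise_failover(observations):
    """Summarise polling results taken while a node is deliberately failed.

    observations: list of (timestamp_seconds, node_name_or_None) in time order;
                  None means the request failed or named no node.
    Returns (transitions, longest_outage_seconds), where each transition is
    (timestamp, from_node, to_node). An outage still in progress at the end
    of the observations is not counted.
    """
    transitions = []
    longest_outage = 0.0
    last_node = None      # last node that answered successfully
    outage_start = None   # timestamp of the first failure in the current outage

    for timestamp, node in observations:
        if node is None:
            if outage_start is None:
                outage_start = timestamp
            continue
        if outage_start is not None:
            # Outage runs from the first failed poll to the first success after it.
            longest_outage = max(longest_outage, timestamp - outage_start)
            outage_start = None
        if last_node is not None and node != last_node:
            transitions.append((timestamp, last_node, node))
        last_node = node

    return transitions, longest_outage
```

A transition from one node to the other with a short outage suggests the load balancer detected the failure and moved the traffic. No transition, or an outage that never ends, is the evidence to take to whoever manages the load balancer.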
In summary there are several steps you can take to reduce the possible impact of introducing a load balancer, and also to help you diagnose any issues that arise:
- When you introduce a load balancer into an environment, ensure that you have tested every web application via the intended load balancer URL, and via the URLs designed to bypass the load balancer.
- Set up response headers so that you know which node is serving the response
- Test the failover of the load balancer
Have fun load balancing.