1) The ConnectTimeout parameter on each
The best way to defend against a hung Portal is to set the ServerIOTimeout attribute (missing by default, which is why most people miss it). This governs how long the plugin will wait after establishing a connection for data to be returned. Set this value to your pain threshold for waiting for a page to return. After this timeout, that server will be marked down.
2) Performance tests show that "Random" is a more efficient workload management policy than "Round Robin". There is evidence that shows when using Round Robin and a server gets marked down, a larger than normal allotment of that server's traffic is shifted to the very next server in line instead of being evenly redistributed across the cluster.
3) Consider using the