Development of a Decentralised Virtual Service Redirector
for Internet applications

Yinong Chen, Scott Hazelhurst, Vashti Galpin, Roger Mateer, Conrad Mueller
Highly Dependable Systems Research Programme
Department of Computer Science
University of the Witwatersrand, Johannesburg
South Africa


Keywords:	fault tolerance, dependable computing, Internet, firewall, redirector, fault-tolerant protocol


Abstract
An ongoing project of the Highly Dependable Systems Research Programme (PHDS) 
at the University of the Witwatersrand is the development of a dependable 
autonomous decentralised system using readily available hardware and software 
components. The experimental system consists of Unix stations connected by 
Ethernet. Real-time communication protocols and fault-tolerant protocols 
have been implemented to provide a fault-tolerant real-time environment that 
supports various applications. Motivated by the requirements of South African 
industry, an Internet service redirector was chosen as the application of the 
PHDS system. A firewall 
application has been implemented on the virtual redirector. Current work on 
this project includes: introduction of mobile agent technology into the system 
and implementation of a testbed to evaluate the availability and performance 
of the system.


Overview 
The reliability and availability of network servers, including servers for 
Internet data caching, web servers, DNS servers and firewalls, are becoming 
a matter of concern to service providers and their clients. Many service 
providers use redundant servers as backup spares in their systems to 
address this problem. Redundant servers are currently deployed in the 
following ways:

*  User Configuration. The redundant servers with different IP addresses 
   are all connected to the network. The users have to put the IP addresses of 
   the primary and backup servers in their client software configuration. When 
   the primary server is not available, the client software looks for a backup 
   server's IP address. 
   The drawback of this implementation is that the service provider has 
   to rely on users to set up their configuration correctly.
*  Manual Switch. The primary and backup servers have the same IP address. 
   When the primary server is down, a backup spare can be manually switched 
   on to replace the primary one.
*  Hardware service redirector. The IP address of the server is in fact that 
   of the redirector, which redirects each client request to one of the 
   available redundant servers. Such a redirector is very expensive, which 
   may not be acceptable to many service providers. Another drawback is that 
   there is a single point of failure: the failure of the redirector makes 
   the entire service unavailable.
*  Software service redirector, e.g., ConnectControl in Firewall-1 
   from Check Point Software Technologies. The principle is the same as for 
   the hardware redirector, except that it is implemented in software. It 
   runs on a single machine, so there is still a single point of failure 
   in the system.

The aim of this research is to explore an alternative way of implementing 
service redirection that overcomes the problems of current techniques, 
achieving low cost together with high availability and reliability by using 
a distributed fault-tolerant system. 
The design objectives are:

*  redundant servers,
*  elimination or reduction of the probability of the single point of 
   failure in the system,
*  scalable configuration to cope with different kinds of applications and 
   different levels of performance and dependability requirements,
*  fail-safe behaviour for critical tasks.

The proposed approach to achieving these aims and objectives uses:

*  an autonomous decentralised system to support redundancy at the modular level,
*  fault-tolerant protocols and mobile agents to implement fault detection 
   and reconfiguration,
*  a distributed load-balancing algorithm for task allocation,
*  replicated firewalls to ensure fail-safe security checking,
*  formal specification and verification of the core part of the design to 
   ensure high confidence in the correctness of the design,
*  evaluation and modelling of system behaviour in a number of process algebras.

The PHDS system consists of a number of redundant servers connected by three 
disjoint networks. Net1 is used for the internal communication among the 
firewall nodes. Net2 is used to connect to the Intranet that must be protected 
by the firewall. Net3 connects the firewall to the outside world. All redundant 
servers are given the same IP address. Any request either from the Internet or 
from the Intranet will be received by all servers. However, it will be dealt 
with by only a subset of the available servers in the form of task replication. 
A hash algorithm in each server is used to decide whether a server should handle 
an incoming request or just drop it. The hashing algorithm will take the current 
load and working state of each server into account. A heartbeat protocol is used 
to check the availability of each server. A comparison protocol running among 
the servers checks whether the outputs of the replicated servers agree. The 
agreements and disagreements are logged in a syndrome table for fault diagnosis. 
The syndrome table, together with the results of the heartbeat protocol, can be 
used by a reconfiguration protocol to localise faults. When a node is diagnosed 
as faulty, a replica of its task sends a mobile agent to a fault-free node to 
start a new replica on that node.
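
The handle-or-drop decision described above can be illustrated with a short 
Python sketch. It is not the PHDS hashing algorithm, whose details are not 
given here. It assumes rendezvous (highest random weight) hashing discounted 
by load, a replication degree of two, and that every server holds the same 
view of load and working state (for example, as distributed by the heartbeat 
protocol). The names score, handlers and should_handle are hypothetical.

```python
import hashlib

def score(request_key: str, server_id: str, load: float) -> float:
    """Deterministic pseudo-random score, discounted by load (0.0 to 1.0).

    Assumption: rendezvous hashing stands in for the unspecified PHDS hash.
    """
    digest = hashlib.sha256(f"{request_key}|{server_id}".encode()).digest()
    base = int.from_bytes(digest[:8], "big") / 2**64
    return base * (1.0 - load)

def handlers(request_key: str, servers: dict, replicas: int = 2) -> list:
    """Return the ids of the servers that should process this request.

    servers maps server_id -> (alive, load). Servers that are not alive
    are never selected. Ties are broken by server id so that every node
    computes the same ranking.
    """
    alive = {sid: load for sid, (up, load) in servers.items() if up}
    ranked = sorted(
        alive,
        key=lambda sid: (score(request_key, sid, alive[sid]), sid),
        reverse=True,
    )
    return ranked[:replicas]

def should_handle(my_id: str, request_key: str, servers: dict,
                  replicas: int = 2) -> bool:
    """Run locally on each server: handle the request or silently drop it."""
    return my_id in handlers(request_key, servers, replicas)

if __name__ == "__main__":
    view = {"s1": (True, 0.2), "s2": (True, 0.5), "s3": (False, 0.0)}
    request = "196.0.0.7:40123->80"  # hypothetical connection identifier
    for sid in view:
        print(sid, should_handle(sid, request, view))
```

Because every server evaluates the same function on the same inputs, no 
coordination message is needed per request; the decision is only as 
consistent as the shared view of load and working state.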

The hashing algorithm and the protocols that support the hashing algorithm form 
a dependable redirector in the decentralised system. This redirector is 
distributed in all servers and has the same function as a physical redirector. 
We call this redirector a virtual redirector. 
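
Among the supporting protocols, fault localisation combines the syndrome 
table with heartbeat results. The following Python sketch shows one way this 
could work. The diagnosis rule is an assumption, not taken from the PHDS 
implementation: a node is suspected if its heartbeat is missing, or if it 
disagrees with more live peers than it agrees with. The function name 
diagnose and the data layout are hypothetical.

```python
from collections import defaultdict

def diagnose(syndromes: dict, heartbeat_ok: dict) -> set:
    """Return the set of nodes suspected to be faulty.

    syndromes maps a pair (node_a, node_b) to True if the two replicas
    produced the same output and False otherwise.
    heartbeat_ok maps a node to True if its heartbeat was received.
    """
    # A node with a missing heartbeat is suspected immediately.
    suspects = {node for node, ok in heartbeat_ok.items() if not ok}

    agree = defaultdict(int)
    disagree = defaultdict(int)
    for (a, b), same in syndromes.items():
        if a in suspects or b in suspects:
            continue  # ignore comparisons that involve a silent node
        for node in (a, b):
            if same:
                agree[node] += 1
            else:
                disagree[node] += 1

    # Assumed rule: suspect a node that is outvoted by its live peers.
    for node in set(agree) | set(disagree):
        if disagree[node] > agree[node]:
            suspects.add(node)
    return suspects

if __name__ == "__main__":
    table = {("n1", "n2"): True, ("n1", "n3"): False, ("n2", "n3"): False}
    alive = {"n1": True, "n2": True, "n3": True}
    print(diagnose(table, alive))
```

With three nodes and a single faulty node, the two fault-free nodes agree 
with each other and outvote the third. With only two live nodes, a 
disagreement leaves both suspected under this rule, since there is no third 
opinion to localise the fault.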

We have implemented an experimental system with a hashing algorithm, a heartbeat 
protocol, a comparison protocol and a real-time communication protocol 
[Bingham96, Chen98]. Currently, we are working on the following tasks 
surrounding this project.

*  Testbed. A testbed that runs the PHDS virtual redirector and a monolithic 
   physical redirector is being implemented [Mateer&Chen99]. The aim is to 
   compare the performance and availability of the virtual and physical 
   redirectors.
*  Firewall modelling. Firewalls are important for network security and 
   performance. Problems here include the firewall becoming a bottleneck and 
   access rules that are difficult to understand. We are investigating more 
   effective means of representing firewall rule databases.
*  Mobile agent technique. We are exploring the effectiveness of mobile 
   agents in the PHDS system. For example, mobile agents may be introduced 
   into the current reconfiguration protocol. When a computer fails, the 
   critical tasks on that computer have to be moved to another computer. 
   Currently, all tasks are stored on all computers. Reconfiguration is 
   implemented by sending the current state of a task from a replica of 
   the failed node to another node. That node then activates the same 
   task with the received state. For the experimental system with only 
   three nodes, this approach is workable. However, for a larger 
   application with more nodes, it would not be an efficient way to 
   implement reconfiguration. With the mobile agent technique, each node 
   stores only the tasks allocated to it. When a node fails, a replica of 
   the failed node can migrate the task (the process) and its current 
   state to another node. Through a special code, the receiving node can 
   recognise that an incoming packet is a mobile agent and activate the 
   packet as a task.

*  Formal specification and verification. For highly available systems, 
   validating the specification and ensuring the correctness of the 
   implementation are very important. Planned future work includes 
   using our systems as case studies for ongoing work on formal 
   specification and verification methodologies.

*  Evaluation and modelling. The research involves finding criteria 
   to determine what is required for a general technique for fault 
   tolerance, defining a technique and evaluating it in terms of the 
   criteria. An important part of this evaluation will involve case 
   studies of reasonable size. Currently, the PHDS virtual redirector, 
   specifically relating to highly available firewall services, is 
   being modelled using process algebras.
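
The mobile agent migration described in the list above can be sketched in 
Python as follows. This is only an illustration under stated assumptions, 
not the PHDS design: the marker AGENT_MAGIC, the JSON packet layout and the 
functions pack_agent, is_agent and activate are all hypothetical. The 
sketch shows a packet that carries both the task code and its current 
state, and a receiver that recognises the packet by a special code and 
activates it as a task.

```python
import json

# Hypothetical "special code" that marks a packet as a mobile agent.
AGENT_MAGIC = b"PHDS-AGENT\n"

def pack_agent(source: str, entry: str, state: dict) -> bytes:
    """Bundle task code, entry-point name and current state into one packet."""
    body = json.dumps({"source": source, "entry": entry, "state": state})
    return AGENT_MAGIC + body.encode("utf-8")

def is_agent(packet: bytes) -> bool:
    """Receiver-side check: does this packet carry a mobile agent?"""
    return packet.startswith(AGENT_MAGIC)

def activate(packet: bytes):
    """Load the migrated task and resume it with the state it carried.

    Assumption: packets come only from trusted, authenticated peer nodes.
    Executing code received from the network is unsafe otherwise.
    """
    if not is_agent(packet):
        raise ValueError("not a mobile agent packet")
    agent = json.loads(packet[len(AGENT_MAGIC):].decode("utf-8"))
    namespace = {}
    exec(agent["source"], namespace)  # define the task in a fresh namespace
    return namespace[agent["entry"]](agent["state"])

# A toy task: it counts handled requests and resumes from the carried state.
TASK_SOURCE = """
def resume(state):
    state["handled"] += 1
    return state
"""

if __name__ == "__main__":
    packet = pack_agent(TASK_SOURCE, "resume", {"handled": 41})
    print(is_agent(packet), activate(packet))
```

In this arrangement the receiving node does not need a stored copy of the 
task, which is the point of the mobile agent approach for larger 
configurations.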


References
Bingham96  W. Bingham, Token Protocols for Real Time Systems, Honours Research Report, Department of Computer Science, University of the Witwatersrand, 1996.

Chen98  Y. Chen, A Redundant Virtual Service Redirector for Computer Networks, in Digest of Fast Abstracts: IEEE 28th Annual International Symposium on Fault-Tolerant Computing (FTCS-28), Munich, June 1998, pp. 27-28.

Hazelhurst98  S. Hazelhurst, A. Fatti and A. Henwood, Binary Decision Diagram Representations of Firewall and Router Access Lists, Technical Report TR-Wits-CS-1998-3, Department of Computer Science, University of the Witwatersrand, October 1998. Also in Proceedings of the 1998 South African Institute of Computer Scientists and Information Technology Annual Research and Development Conference.

Hazelhurst99  S. Hazelhurst, Algorithms for Analysing Firewall and Router Access Lists, Technical Report TR-Wits-CS-1999-5, Department of Computer Science, University of the Witwatersrand, July 1999. To appear: South African Networking and Telecommunications Conference, September 1999.

Mateer&Chen99  R. Mateer and Y. Chen, Highly-Available Firewall Service Using Virtual Redirectors, Submitted to: South African Institute of Computer Scientists and Information Technology Annual Research and Development Conference, Mount Amanzi, South Africa, 17-19 November 1999.