Development of a Decentralised Virtual Service Redirector for Internet Applications

Yinong Chen, Scott Hazelhurst, Vashti Galpin, Roger Mateer, Conrad Mueller
Highly Dependable Systems Research Programme
Department of Computer Science
University of the Witwatersrand, Johannesburg
South Africa

Keywords: fault tolerance, dependable computing, Internet, firewall, redirector, fault-tolerant protocol

Abstract

An ongoing project of the Highly Dependable Systems Research Programme (PHDS) at the University of the Witwatersrand is the development of a dependable autonomous decentralised system built from readily available hardware and software components. The experimental system consists of Unix workstations connected by Ethernet. Real-time communication protocols and fault-tolerant protocols have been implemented to provide a fault-tolerant real-time environment that supports various applications. Motivated by the requirements of South African industry, we chose an Internet service redirector as the application of the PHDS system. A firewall application has been implemented on the virtual redirector. Current work on the project includes introducing mobile agent technology into the system and implementing a testbed to evaluate the availability and performance of the system.

Overview

The reliability and availability of network servers, including Internet data-caching servers, web servers, DNS servers and firewalls, are of growing concern to service providers and their clients. Many service providers address this problem by keeping redundant servers as backup spares. Current implementations of redundant servers are:

* User Configuration. The redundant servers, each with a different IP address, are all connected to the network. Users have to enter the IP addresses of the primary and backup servers in the configuration of their client software. When the primary server is unavailable, the client software tries a backup server's IP address. The drawback of this approach is that the service provider has to rely on users to set up their configuration correctly.
* Manual Switch. The primary and backup servers have the same IP address. When the primary server goes down, a backup spare can be switched on manually to replace it.
* Hardware service redirector. The server's IP address actually belongs to a redirector, which forwards each client request to one of the available redundant servers. Such redirectors are very expensive, which may not be acceptable to many service providers. A further drawback is that the redirector is a single point of failure: if it fails, the entire service becomes unavailable.
* Software service redirector, e.g., ConnectControl in Firewall-1 from Check Point Software Technologies. The principle is the same as for the hardware redirector, except that it is implemented in software. It runs on a single machine, so the system still has a single point of failure.

The aim of this research is to explore an alternative way of implementing service redirection that overcomes the problems of current techniques, achieving low cost together with high availability and reliability by using a distributed fault-tolerant system.
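For comparison, the client-side failover that the user-configuration approach relies on can be sketched in a few lines of Python. This is a hypothetical illustration, not code from any product: the server list, the port and the function name `connect_with_failover` are all assumptions. It also makes the drawback noted above concrete, because the client can only fall back to addresses the user has entered correctly.

```python
import socket

# Hypothetical client configuration: primary first, then backups
# (addresses taken from the 192.0.2.0/24 documentation range).
CONFIGURED_SERVERS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def connect_with_failover(servers, port, timeout=2.0, connect=socket.create_connection):
    """Try each configured server in order; return (host, connection) for the first success.

    `connect` defaults to socket.create_connection and can be replaced by a stub
    so that the fallback logic can be exercised without a network.
    """
    failures = []
    for host in servers:
        try:
            return host, connect((host, port), timeout)
        except OSError as exc:  # connection refused, unreachable or timed out
            failures.append(f"{host}: {exc}")
    raise ConnectionError("no configured server reachable: " + "; ".join(failures))
```

A real client would call something like `connect_with_failover(CONFIGURED_SERVERS, 8080)`, where port 8080 is likewise only a placeholder. If the user mistypes or omits a backup address, no amount of client logic can recover, which is the dependence on user configuration that the redirector approaches try to remove.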
The design objectives are:

* redundant servers;
* elimination of single points of failure in the system, or at least reduction of their probability;
* a scalable configuration that can cope with different kinds of applications and different levels of performance and dependability requirements;
* fail-safe behaviour for critical tasks.

The proposed approach to achieving these aims and objectives uses:

* an autonomous decentralised system to support redundancy at the module level;
* fault-tolerant protocols and mobile agents to implement fault detection and reconfiguration;
* a distributed load-balancing algorithm for task allocation;
* replicated firewalls to ensure fail-safe security checking;
* formal specification and verification of the core of the design, to give high confidence in its correctness;
* evaluation and modelling of system behaviour in a number of process algebras.

The PHDS system consists of a number of redundant servers connected by three disjoint networks. Net1 carries internal communication among the firewall nodes, Net2 connects the firewall to the Intranet that it must protect, and Net3 connects the firewall to the outside world. All redundant servers are given the same IP address, so every request, whether from the Internet or from the Intranet, is received by all servers. Each request, however, is handled by only a subset of the available servers, which replicate the task. A hashing algorithm in each server decides whether that server should handle an incoming request or drop it; the algorithm takes the current load and working state of each server into account. A heartbeat protocol checks the availability of each server, and a comparison protocol running among the servers checks whether the outputs of the replicated servers agree. Agreements and disagreements are logged in a syndrome table for fault diagnosis.
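The request-filtering and checking steps just described can be illustrated with a minimal Python sketch. It assumes that every node holds the same membership view, built from heartbeats (last heartbeat time and reported load per node), so that each node reaches the same decision independently. The paper does not specify its hash function; rendezvous (highest-random-weight) hashing is swapped in here, and load is taken into account crudely by skipping overloaded nodes when others are available. All names and constants (`HEARTBEAT_TIMEOUT`, `MAX_LOAD`, `REPLICAS`, `ClusterView`, `replicas_for`, `should_handle`, `compare_outputs`) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical parameters; the paper gives no concrete values.
HEARTBEAT_TIMEOUT = 3.0  # seconds of silence before a node is presumed down
MAX_LOAD = 0.9           # nodes above this load are skipped when others are available
REPLICAS = 2             # number of servers that replicate each request


@dataclass
class ClusterView:
    """Membership view that every node is assumed to share, built from heartbeats."""
    last_heartbeat: dict = field(default_factory=dict)  # node -> time of last heartbeat
    load: dict = field(default_factory=dict)            # node -> reported load in [0, 1]

    def record_heartbeat(self, node, now, load):
        self.last_heartbeat[node] = now
        self.load[node] = load

    def live_nodes(self, now):
        # A node counts as live if its last heartbeat is recent enough.
        return sorted(n for n, t in self.last_heartbeat.items()
                      if now - t <= HEARTBEAT_TIMEOUT)


def score(request_key, node):
    """Deterministic pseudo-random score for a (request, node) pair."""
    digest = hashlib.sha256(f"{request_key}|{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(request_key, view, now):
    """Rendezvous hashing: the REPLICAS highest-scoring eligible nodes handle the request."""
    live = view.live_nodes(now)
    eligible = [n for n in live if view.load[n] <= MAX_LOAD] or live
    ranked = sorted(eligible, key=lambda n: (score(request_key, n), n), reverse=True)
    return ranked[:REPLICAS]


def should_handle(me, request_key, view, now):
    """Every node receives every request; it keeps it only if it is a chosen replica."""
    return me in replicas_for(request_key, view, now)


def compare_outputs(outputs, syndrome):
    """Compare replica outputs pairwise and count agreements in a syndrome table."""
    nodes = sorted(outputs)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            entry = syndrome.setdefault((a, b), {"agree": 0, "disagree": 0})
            entry["agree" if outputs[a] == outputs[b] else "disagree"] += 1


if __name__ == "__main__":
    view = ClusterView()
    for node, load in [("s1", 0.2), ("s2", 0.5), ("s3", 0.95)]:
        view.record_heartbeat(node, now=0.0, load=load)
    key = "198.51.100.7:40312"  # hypothetical request key: client address and port
    print(replicas_for(key, view, now=1.0))  # s3 is overloaded, so s1 and s2 are chosen
    syndrome = {}
    compare_outputs({"s1": "accept", "s2": "accept"}, syndrome)
    print(syndrome)  # {('s1', 's2'): {'agree': 1, 'disagree': 0}}
```

Because each node evaluates the same deterministic ranking over the same view, exactly the chosen replicas keep a request and the others drop it without exchanging messages. This holds only if all nodes really do see the same heartbeat view, which a real deployment would have to guarantee through its protocols.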
The syndrome table, together with the results of the heartbeat protocol, can be used by a reconfiguration protocol to localise faults. When a node is diagnosed as faulty, its replica task will send a mobile agent to a fault-free node to start a new replica on that node. The hashing algorithm and the protocols that support it together form a dependable redirector in the decentralised system. This redirector is distributed over all servers and performs the same function as a physical redirector; we call it a virtual redirector.

We have implemented an experimental system with a hashing algorithm, a heartbeat protocol, a comparison protocol and a real-time communication protocol [Bingham96, Chen98]. Current work on this project includes the following tasks.

* Testbed. A testbed that runs both the PHDS virtual redirector and a monolithic physical redirector is being implemented [Mateer&Chen99]. The aim is to compare the performance and availability of the virtual and physical redirectors.
* Firewall modelling. Firewalls are important for network security and performance. Problems include the firewall becoming a bottleneck and access rules that are hard to understand. We are investigating more effective ways of representing firewall rule databases [hazelhurst98, hazelhurst99].
* Mobile agent technique. We are exploring the effectiveness of mobile agents in the PHDS system. For example, mobile agents may be introduced into the current reconfiguration protocol. When a computer fails, the critical tasks on it have to be moved to another computer. Currently, every task is stored on every computer. Reconfiguration is implemented by sending the current state of a task from a replica of the failed node to another node, which then activates the same task with the received state. For the experimental system, which has only three nodes, this approach is adequate.
  However, for a larger application with more nodes it would not be an efficient way to implement reconfiguration. With the mobile agent technique, each node stores only the tasks allocated to it. When a node fails, a replica of the failed node can migrate the task (the process) and its current state to another node. A special code lets the receiving node recognise an incoming packet as a mobile agent, so that it can activate the packet as a task.
* Formal specification and verification. For highly available systems, validating the specification and ensuring the correctness of the implementation are very important. Planned future work includes using our systems as case studies for ongoing work on formal specification and verification methodologies.
* Evaluation and modelling. This research involves finding criteria that determine what a general fault-tolerance technique requires, defining such a technique, and evaluating it against those criteria. An important part of this evaluation will be case studies of reasonable size. Currently, the PHDS virtual redirector, specifically as it relates to highly available firewall services, is being modelled using process algebras.

References

Bingham96 W. Bingham, Token Protocols for Real Time Systems, Honours Research Report, Department of Computer Science, University of the Witwatersrand, 1996.
Chen98 Y. Chen, A Redundant Virtual Service Redirector for Computer Networks, in Digest of Fast Abstracts: IEEE 28th Annual International Symposium on Fault-Tolerant Computing (FTCS-28), Munich, June 1998, pp. 27-28.
hazelhurst98 S. Hazelhurst, A. Fatti and A. Henwood, Binary Decision Diagram Representations of Firewall and Router Access Lists, Technical Report TR-Wits-CS-1998-3, Department of Computer Science, University of the Witwatersrand, October 1998; in Proceedings of the 1998 South African Institute of Computer Scientists and Information Technology Annual Research and Development Conference.
hazelhurst99 S. Hazelhurst, Algorithms for Analysing Firewall and Router Access Lists, Technical Report TR-Wits-CS-1999-5, Department of Computer Science, University of the Witwatersrand, July 1999; to appear in South African Networking and Telecommunications Conference, September 1999.
Mateer&Chen99 R. Mateer and Y. Chen, Highly-Available Firewall Service Using Virtual Redirectors, submitted to South African Institute of Computer Scientists and Information Technology Annual Research and Development Conference, Mount Amanzi, South Africa, 17-19 November 1999.