William Cooke

Contents:
  1. Overview
  2. Implementation
    1. Technology
    2. Coding
  3. Evaluation
    1. Alternate Options
    2. Appropriate Extensions
    3. Conclusion

Rule Engine

Overview

The Rule Engine (RE) is the central component of the EMS and essentially acts as its brain. It is a reactive component: it takes in data about the world and acts according to its current rule set. This up-to-date information is retrieved through the ORM component, which provides a secure interface for connecting to the stored data. All of the effectors in the system can be seen as dumb, in the sense that they can only receive messages and act on them. They have no decision-making power, which allows all decisions to be made and maintained centrally.

The RE acts on the current world state and rule set. By analysing the data from the sensor network, the RE infers which decisions should be made and how to update the devices connected to the system. To reach the devices themselves, the RE maintains an active connection to the basestation, through which it sends the appropriate messages.

The RE will be adaptable: the rules will not be static, and an interface will allow authenticated users to update and change them. This interface will be exposed through the Web Service and a SOAP interface.

Additionally, the RE should be operationally independent of all the other components; that is, it should still be able to run, albeit with a reduced feature set, when the other components cannot be contacted. This is an important feature because the RE is the component that holds the view of the whole world state.

Implementation

Technology

The technologies used in developing the RE are varied. The principal language is Java 6, chosen so that we can use RMI to communicate between the internal components. External entities connect through the SOAP interface instead, so the implementation behind it could easily be replaced with another system. The rule data is stored in a number of MySQL tables.

For the settings in which our EMS is expected to be used, it is acceptable for the RE to poll the sensor information in the database, because the polling granularity can be matched to the rate of incoming data. Polling also means that the system has no circular dependencies, which can often lead to livelock.

Coding

This section gives an overview of the important classes built for the RE, how they operate together, and their interfaces to the other elements in the system.

RuleEngine
The RE itself is the central server for all of the functionality. After starting, it immediately attempts to bind itself to the RMI registry and then to the other required components. It maintains the links to the other elements, re-binding if a link fails, and so maintains its own state irrespective of whether the rest of the system is functional.
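The re-binding behaviour can be sketched roughly as follows. This is an illustration only, not the project's code: Connector, RemoteCall and ReboundLink are hypothetical names, and Connector stands in for whatever actually obtains a remote stub (for example an RMI registry lookup).

```java
/** Hypothetical stand-in for whatever obtains a remote stub (e.g. an RMI registry lookup). */
interface Connector<T> {
    T connect() throws Exception;
}

/** Hypothetical call made against a linked component. */
interface RemoteCall<T, R> {
    R apply(T stub) throws Exception;
}

/** Holds a link to another component and re-binds it when a call fails. */
class ReboundLink<T> {
    private final Connector<T> connector;
    private final int maxAttempts;
    private T stub; // null while the link is down

    ReboundLink(Connector<T> connector, int maxAttempts) {
        this.connector = connector;
        this.maxAttempts = maxAttempts;
    }

    /** Runs the call, re-binding after a failure; returns the fallback if every attempt fails. */
    synchronized <R> R call(RemoteCall<T, R> call, R fallback) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                if (stub == null) {
                    stub = connector.connect();
                }
                return call.apply(stub);
            } catch (Exception e) {
                stub = null; // drop the broken link so the next attempt re-binds
            }
        }
        return fallback; // component unreachable: carry on with a reduced feature set
    }
}
```

Returning a fallback value rather than propagating the failure is one way of letting the caller keep running when a component cannot be contacted; whether the real RE does this is not stated here.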

This component acts as a remote interface and exposes a number of features that allow the other systems to change the world and rule state. These include starting and stopping the server, adding and modifying rules, and forcing the execution of particular actions. It also validates and verifies user input: when a rule is added or updated, the RE first checks that the syntax is correct and then verifies that all of the required devices exist within the system.

World State

The world state, or knowledge base, is used by the system to decide which rule effects should be invoked. This object is updated by the RE whenever new data is retrieved from the sensors, or whenever devices or rules are changed.

The sensor data, which is updated before each rule evaluation, is retrieved from the ORM through its "get latest measurement" interface. This gives us a view of the world as it is now, from which we can decide which rules to invoke.

When the RE is started, the world state puts all of the effectors into an inconsistent state, because we cannot be sure whether devices were switched on or off while the system was offline. While running, it then builds up a model of the state of each device, as certain rules within the system are contingent on the state of a particular device. This also allows rules to carry conditions on when they should be invoked.

For example, suppose a rule says that a radiator should be turned on when the temperature drops below 20 degrees. When the temperature drops below 20 degrees, the radiator is sent a message to switch on. This is fine; however, on the next iteration the temperature may still be below 20 degrees, so the message would be sent again, and would continue to be sent. To minimise the number of messages, the rule should instead state: turn on the radiator if the temperature drops below 20 degrees and the radiator is off.
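A minimal sketch of the guarded radiator rule, assuming the radiator is known to be off at the start. The class and field names are made up for illustration, and the inconsistent start-up state is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: the radiator rule with the "and the radiator is off" guard. */
class RadiatorRule {
    static final double THRESHOLD = 20.0;   // degrees, from the example in the text
    boolean radiatorOn = false;             // assumed known-off at the start of the sketch
    final List<String> sent = new ArrayList<String>();

    /** One evaluation pass over the latest temperature reading. */
    void evaluate(double temperature) {
        // Without the !radiatorOn guard this would send a message on every cold pass.
        if (temperature < THRESHOLD && !radiatorOn) {
            sent.add("radiator ON");
            radiatorOn = true; // record the new state so later passes are suppressed
        }
    }
}
```

Three consecutive cold readings (say 19, 18.5 and 19.5 degrees) leave exactly one message in sent, whereas the unguarded rule would have produced three.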

Finally, the rules are held in memory so that they can be evaluated quickly on each pass of the evaluation. They are indexed by both device and id, so that when new information comes in about a particular device we can check just the corresponding rules rather than searching through all of them.
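One plausible shape for such a double index is a pair of hash maps, sketched below. IndexedRule and RuleIndex are hypothetical names, and device identifiers are assumed to be strings such as "16FF:6"; the actual classes may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal rule record: an id plus the devices its conditions mention. */
class IndexedRule {
    final int id;
    final List<String> devices;
    IndexedRule(int id, List<String> devices) { this.id = id; this.devices = devices; }
}

/** Keeps rules reachable both by id and by the devices they depend on. */
class RuleIndex {
    private final Map<Integer, IndexedRule> byId = new HashMap<Integer, IndexedRule>();
    private final Map<String, List<IndexedRule>> byDevice = new HashMap<String, List<IndexedRule>>();

    void add(IndexedRule rule) {
        remove(rule.id); // replacing a rule must not leave stale device entries behind
        byId.put(rule.id, rule);
        for (String device : rule.devices) {
            List<IndexedRule> list = byDevice.get(device);
            if (list == null) {
                list = new ArrayList<IndexedRule>();
                byDevice.put(device, list);
            }
            list.add(rule);
        }
    }

    void remove(int id) {
        IndexedRule old = byId.remove(id);
        if (old == null) return;
        for (String device : old.devices) {
            List<IndexedRule> list = byDevice.get(device);
            if (list != null) list.remove(old);
        }
    }

    IndexedRule get(int id) { return byId.get(id); }

    /** Rules that mention the given device; empty if there are none. */
    List<IndexedRule> forDevice(String device) {
        List<IndexedRule> list = byDevice.get(device);
        if (list == null) return Collections.<IndexedRule>emptyList();
        return Collections.unmodifiableList(list);
    }
}
```

When a new reading arrives for one device, only forDevice(device) needs evaluating, which is the saving described above.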

Rule Syntax

The rule syntax is fairly straightforward and intuitive once the user understands its features. The rules are primarily logical, with certain elements of predicate logic to give the language slightly more expressive power.

The logical part of the system allows us to specify a Boolean expression over sensors using various mathematical operators. For example, if we wanted to turn on the lights in a building, we could express the triggering condition as:

sensor(sensor_id) <= 99

This allows us to construct rules whose conditions are calculated from the active world state data.

State-based rules extend the logic system by allowing a rule to execute only if an effector is in a particular state, that is, when the device holds a particular value (as with something like a thermostat) or is on or off. This allows us to minimise the number of messages sent, as described in the World State section above.

Temporal rules add a time-based element to the execution of rules; the time is supplied as an additional parameter that is passed through the basic rule system. There are two cases: absolute and relative. With absolute time, we assign a particular time or date at which the rule should be executed. Relative time looks at the length of time that has passed since a particular reading took place. For example, one might want a light in a room to be turned off after a period of no activity.

Each rule is assigned a priority; by default this is low for normal rules and real (the highest) for temporal rules. When the rules are invoked and more than one message would be sent to a particular effector, the RE performs erasure over the messages with lower priorities. So where a temporal rule fires and executes a particular action, that action always takes precedence over the action of any lower-priority rule.
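The erasure step might look something like the sketch below. PendingMessage and Erasure are hypothetical names, and encoding priorities as integers where a larger number means a higher priority is an assumption. On a tie this sketch simply keeps the earlier message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical message waiting to be sent to an effector. */
class PendingMessage {
    final String effector;
    final String action;
    final int priority; // assumed encoding: larger number = higher priority

    PendingMessage(String effector, String action, int priority) {
        this.effector = effector;
        this.action = action;
        this.priority = priority;
    }
}

class Erasure {
    /** Keeps, for each effector, only the message with the highest priority. */
    static List<PendingMessage> erase(List<PendingMessage> pending) {
        Map<String, PendingMessage> winners = new LinkedHashMap<String, PendingMessage>();
        for (PendingMessage m : pending) {
            PendingMessage current = winners.get(m.effector);
            // Strictly greater: on equal priority the earlier message is kept.
            if (current == null || m.priority > current.priority) {
                winners.put(m.effector, m);
            }
        }
        return new ArrayList<PendingMessage>(winners.values());
    }
}
```

If a low-priority rule wants effector 16FF:6 ON while a temporal rule wants it OFF, only the OFF message survives erasure.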

Example:

effector(16FF,6,-1) & sensor(11EA,1) >= 10 ==> effector(16FF,6,ON)

This says that if effector 16FF:6 is in an inconsistent state and sensor 11EA:1 is currently reading 10 or above, then turn on effector 16FF:6.
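To make the shape of the syntax concrete, here is a rough regex-based parser for rules of the form shown in the example. The grammar is inferred from that single example and is an assumption, as are the names ParsedRule and RuleParser. Only the <= and >= operators appear in this document; the others accepted below are guesses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical result of parsing: the raw conditions plus the effect to apply. */
class ParsedRule {
    final List<String> conditions = new ArrayList<String>();
    String effectDevice; // e.g. "16FF:6"
    String effectState;  // e.g. "ON"
}

class RuleParser {
    // effector(address,port,state), where state may be a word or a number such as -1
    private static final Pattern EFFECTOR =
        Pattern.compile("effector\\((\\w+),(\\w+),(-?\\w+)\\)");
    // sensor(address,port) OP number
    private static final Pattern SENSOR =
        Pattern.compile("sensor\\((\\w+),(\\w+)\\)\\s*(<=|>=|==|!=|<|>)\\s*(-?\\d+(?:\\.\\d+)?)");

    /** Parses "cond & cond ==> effector(...)"; throws IllegalArgumentException on bad syntax. */
    static ParsedRule parse(String text) {
        String[] halves = text.split("==>");
        if (halves.length != 2) {
            throw new IllegalArgumentException("expected exactly one ==>");
        }
        ParsedRule rule = new ParsedRule();
        for (String raw : halves[0].split("&")) {
            String cond = raw.trim();
            if (!EFFECTOR.matcher(cond).matches() && !SENSOR.matcher(cond).matches()) {
                throw new IllegalArgumentException("bad condition: " + cond);
            }
            rule.conditions.add(cond);
        }
        Matcher effect = EFFECTOR.matcher(halves[1].trim());
        if (!effect.matches()) {
            throw new IllegalArgumentException("bad effect: " + halves[1].trim());
        }
        rule.effectDevice = effect.group(1) + ":" + effect.group(2);
        rule.effectState = effect.group(3);
        return rule;
    }
}
```

Parsing the example rule yields two conditions and the effect 16FF:6 set to ON. A syntax check of this kind would naturally come before verifying that the referenced devices exist.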

Resolution Engine

The resolution engine is very simple and is contained within the Logic System class as a set of static methods, the crucial one being eval. This method takes the arguments that a rule could use together with the current world state, and returns those rules that should be invoked.

The actual assessment looks at each Condition object within a particular rule. As conditions can only be joined with the AND operator, we can simply check whether each condition holds and move on to the next. If any condition does not hold, we can immediately falsify the entire rule and move on to the next rule.
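The AND-only evaluation reduces to a short-circuiting loop. The sketch below uses hypothetical names (WorldCondition, AtLeast, Conjunction) rather than the project's Condition class, and assumes the world state can be viewed as a map from sensor id to the latest reading.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical shape of a condition: checked against a snapshot of sensor readings. */
interface WorldCondition {
    boolean holds(Map<String, Double> readings);
}

/** Example condition: the named sensor reads at or above a threshold. */
class AtLeast implements WorldCondition {
    private final String sensor;
    private final double threshold;

    AtLeast(String sensor, double threshold) {
        this.sensor = sensor;
        this.threshold = threshold;
    }

    public boolean holds(Map<String, Double> readings) {
        Double value = readings.get(sensor);
        // A missing reading never satisfies the condition.
        return value != null && value >= threshold;
    }
}

class Conjunction {
    /** True only if every condition holds; stops at the first one that fails. */
    static boolean allHold(List<WorldCondition> conditions, Map<String, Double> readings) {
        for (WorldCondition c : conditions) {
            if (!c.holds(readings)) {
                return false; // one failed condition falsifies the whole rule
            }
        }
        return true;
    }
}
```

Because the loop returns at the first failure, later conditions in a rule are never evaluated once the rule is known to be false.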

Temporal rules have the additional proviso that the current time must fall within a particular timeframe. A drift can be specified to widen this timeframe; in our case it is double the time between wake-ups of the RE, to ensure that the rule is invoked. This is more critical for relative rules than for absolute ones, because for absolute rules we can check once per day whether a rule has not yet been executed past its set time and, if so, execute it.
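The drift check can be written as a simple window test. This is a sketch under the assumption that times are held as milliseconds and that the window opens at the due time and closes one drift later; TemporalWindow and isDue are illustrative names.

```java
/** Illustrative drift-window test for temporal rules. */
class TemporalWindow {
    /**
     * True if a rule due at dueMillis should fire on a wake-up at nowMillis.
     * The drift is double the wake-up interval, as described in the text.
     */
    static boolean isDue(long dueMillis, long nowMillis, long wakeIntervalMillis) {
        long drift = 2 * wakeIntervalMillis;
        long late = nowMillis - dueMillis; // negative means the rule is not due yet
        return late >= 0 && late <= drift;
    }
}
```

With a 30 second wake-up interval and a rule due at t = 100 s, wake-ups at 125 s and 155 s both fall inside the window, while 95 s and 185 s do not. Since two consecutive wake-ups can land in the window, a caller built on this sketch would also need to record that the rule has already fired.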

Evaluation

Alternate Options

During development there were a number of alternative paths that could have been taken but that, due to certain constraints, could not be explored fully. Some of these could become extensions with additional work, though they are more concerned with changing pieces of the already established system.

Firstly, there was an alternative implementation in the Mandarax inference engine project, which includes a full logical system that could have been used instead of the relatively simple one that was implemented. This would have made the logic system more complete, but might have added too much complexity for the users of the system. Although an abstraction could have made it simpler to use, it was decided that in this case it was easier to re-invent the wheel and build our own system, which allowed easier customisation in our relatively fluid development flow.

Using the ORM to retrieve the current world state can be problematic: we put additional strain on the ORM, and we also have to wait for the data to be persisted to the database. Another option would be to modify the basestation so that the data handler multicasts the sensor data to the ORM and the RE at the same time; the provision is certainly there in the message listener interface. This would stop unnecessary polling of the ORM and keep the current world state more up to date.

Another option is to distribute the rules across the sensor network. This not only takes responsibility away from the RE and reduces its execution time, but also speeds up the response of the devices. Currently, however (although not in the examples we used), the EMS would be taking decisions over long periods of time, so the time taken to propagate messages through the sensor network would not matter. The problem is deciding how the devices should react when they are not being coordinated by the RE: should they continue to run the rules independently, or stop? Additionally, the system could end up in an inconsistent rule state, with the RE holding a different set of rules from the various devices. This could be made an extension to the current system, with some sort of ranking of how complex a rule is allowed to be before it must be controlled centrally.

A number of parts of the system currently use polling to update data, chiefly the temporal rules and the RE updating the world state. Testing made it clear that when the temporal rules are polled they can fail to execute, because the poll falls outside the drift timeframe. If they were instead set off by a clock mechanism, the appropriate rules would be guaranteed to execute. The problem could also be partially solved by ensuring that the sensor data is passed through in a timely manner, as with the direct interface. The other possibility is for the database to inform the RE when it receives new information. This would be possible using triggers, with the database executing some native code; however, it would make the ORM circularly reliant on the RE, which is an unneeded complication, and the same result could be achieved by bypassing the ORM altogether with the direct interface.

The connection to the device controller interface is currently blocking and has to wait for the returning message (though this is not noticed on the RE side, as multiple threads send the messages). On the device controller side this shows up when a particular rule is invoked: the effectors are affected sequentially rather than in parallel. Making the connection asynchronous with a callback interface would improve this, and would allow the device controller to maintain the consistency of the devices' world state, which is a slightly more logical separation of concerns. Additionally, it would prevent the system from accidentally exhausting the thread pool or using up all the memory because of lost messages.
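As a rough idea of what a callback interface could look like on the RE side, the sketch below wraps a blocking send in a fixed-size thread pool and reports the outcome through a callback. All names (DeviceController, DeviceCallback, AsyncDispatcher) are hypothetical. Note that this only hides the blocking call behind a callback; a fully asynchronous design would also need changes on the device controller side.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical blocking interface to the device controller. */
interface DeviceController {
    boolean send(String effector, String action) throws Exception;
}

/** Hypothetical callback invoked once the controller has answered or failed. */
interface DeviceCallback {
    void onResult(String effector, boolean success);
}

/** Sends messages on a fixed number of worker threads and reports back via callbacks. */
class AsyncDispatcher {
    private final DeviceController controller;
    private final ExecutorService pool;

    AsyncDispatcher(DeviceController controller, int threads) {
        this.controller = controller;
        // Fixed size: the number of threads cannot grow, though queued tasks still can.
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Returns immediately; the callback runs on a worker thread. */
    void sendAsync(final String effector, final String action, final DeviceCallback callback) {
        pool.execute(new Runnable() {
            public void run() {
                boolean ok;
                try {
                    ok = controller.send(effector, action);
                } catch (Exception e) {
                    ok = false; // a lost or failed message is reported, not silently dropped
                }
                callback.onResult(effector, ok);
            }
        });
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The callback is where the sender could update its record of each device's state once the outcome is known.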

Appropriate Extensions

There were a number of extensions that would have been nice to have in the system, but which could not be added due to time constraints. Unlike the options above, these would not require any major architectural changes to the system, so they are classified here.

A number of changes could be made to the current rule system. The first is a more active error reporting and validation system. For example, it is possible to include two rules at the same priority level that execute contradictory actions; in this case the response is undefined. It would be more user-friendly to flag a warning when this is about to happen. Another nice extension would be to allow the OR Boolean operator within the preconditions, which would allow many rules to be merged rather than having several rules for one situation. Currently the system requires a rule per state of any given effector to ensure every possibility is captured.
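A very simple form of the suggested warning could compare rule effects pairwise, as sketched below. RuleEffect and ConflictCheck are hypothetical names. The check is deliberately conservative: it flags any two rules at the same priority that set the same effector to different states, without testing whether their preconditions can actually hold at the same time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of what a rule does when it fires. */
class RuleEffect {
    final int ruleId;
    final String effector;
    final String action;
    final int priority;

    RuleEffect(int ruleId, String effector, String action, int priority) {
        this.ruleId = ruleId;
        this.effector = effector;
        this.action = action;
        this.priority = priority;
    }
}

class ConflictCheck {
    /** Returns one warning per pair of same-priority rules with contradictory actions. */
    static List<String> warnings(List<RuleEffect> rules) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < rules.size(); i++) {
            for (int j = i + 1; j < rules.size(); j++) {
                RuleEffect a = rules.get(i);
                RuleEffect b = rules.get(j);
                boolean sameTarget = a.effector.equals(b.effector);
                boolean samePriority = a.priority == b.priority;
                boolean contradictory = !a.action.equals(b.action);
                if (sameTarget && samePriority && contradictory) {
                    out.add("rules " + a.ruleId + " and " + b.ruleId
                            + " conflict on " + a.effector);
                }
            }
        }
        return out;
    }
}
```

Such a check could run when a rule is added or updated, alongside the existing syntax and device validation.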

In the initial designs it was suggested that the RE could respond to complex rule operations based upon data over time, as analysed by the reporting engine. This extension was never completed, as the reporting engine was finished very late in the development cycle and there was no time to integrate the changes. It would be useful for specialist situations, though it would probably require re-coding for each change unless substantial work was done on some sort of generic reporting structure.

Allowing the world state to be exported to the GUI would have a number of benefits. Firstly, we would easily be able to see how a rule affects the world (rather than having to be where the effector is to see that it has changed state). Additionally, we could confirm that the system is responding as we would expect. Currently the world state is used only for managing the internal state of the system.

Conclusion

The RE met all of the requirements highlighted in the specification, with quite a high degree of success. A few of the conscious design decisions that were made, in particular the sensor reading framework, meant that it could sometimes be quite cumbersome to test various parts of the system. The work also showed that other methods would make the system more generic and allow a wider variety of problems to be solved by this one system.

Overall the system performed acceptably, given the constraints placed on it. The main constraint was man-hours: both the size of the team and the amount of time that we had to explore the problem. Given more time, the additional features mentioned above would allow this to become a more commercially viable and adaptable system. The extra time would also give the team a longer design phase and hence more time for prototyping the various solutions; in a couple of cases decisions had to be made without proper testing of the available methods.