Monday, September 1, 2008

Maintaining data consistency in mobile database broadcasts

Abstract
The broadcast-based data dissemination in wireless environments poses new challenging issues on data consistency of transaction processing. Due to concurrent execution of update transactions and broadcast of data items, mobile transactions generated by mobile clients may observe inconsistency of data values. Among the concurrency control techniques providing data consistency in broadcast environments, Update First with Order (UFO) algorithm has been shown to be more efficient and has minimal impact on server, client and the broadcast schedule. In this paper, we propose a mobile database system that uses a modified UFO algorithm utilizing rule-based approach for concurrency control among mobile and update transactions in the broadcast environment. We use Event-Condition-Action (ECA) rules to make decisions on broadcasting without conflicting. Moreover, this system has an inference engine that makes deduction using forward chaining control flow to provide effective consistency checking during updates. The system discussed in this paper, provides site autonomy between the mobile clients and the server with minimum upstream communication and data consistency that are desirable features to the scalability of applications, which are running in broadcast environments.

Introduction
The increasing ability to interconnect computers through internetworking, wireless networks, high bandwidth satellite, and cable networks has spawned a new class of information centered applications based on data dissemination. In contrast to traditional, where data are delivered from servers to clients on demand, a wide range of emerging database applications benefit from a broadcast mode for data dissemination. In such applications, the server repetitively broadcasts data to a client population without a specific request. Clients monitor the broadcast channel and retrieve the data items they need as they arrive on the broadcast channel. Such applications typically involve a small number of servers and a much larger number of clients with similar interests. Examples include weather information systems, electronic commerce applications, such as auction and electronic tendering, and traffic control information systems. With data broadcast, mobile transactions do not need to inform the broadcast server before accessing a data item and can get it from the “air” while it is being broadcast.

As data dissemination systems continue to evolve, more and more sophisticated client applications will require reading current and consistent data despite updates at the server. Since many data items in mobile computing systems are used to record the real-time information such as current traffic conditions of the roads, current weather conditions of the cities, and news updates, their values will be highly dynamic and sensitive. The updates capture the most current information for the system and refresh the values of the data items in the database [9][10]. Accessing out-dated data items is undesirable and will significantly affect the usefulness of the information to mobile clients [2]. Thus allowing the execution of updates while it is being broadcast is important in maintaining the validity and freshness of the data items. However, if concurrent execution of updates and data broadcast is allowed, the problem of concurrency control must be addressed. Given the limited amount of bandwidth available for clients to communicate with the broadcast server, achieving data consistency efficiently is a challenging research issue.

Unfortunately, conventional concurrency protocols, such as two phase locking, optimistic method and timestamp ordering [1] are not suitable for mobile computing systems as the overhead for setting locks and detecting data conflicts in a mobile environment can be very heavy [8].

In this paper, we study the problem of disseminating consistent data items to mobile transactions while allowing updates to be executed concurrently at the database server. An efficient and pioneering method is by broadcasting multiple versions of data items [18]. Consistent data items are provided to mobile transactions by requiring the mobile transactions to read data items committed at the same point of time. The basic multi-version broadcast method is extended in [16,17] for systems with client caches where multiple versions of data items are maintained. By reading cached data items, which are committed at the same time, the data access delay can be much reduced and at the same time data consistency is ensured. Another efficient method for concurrency control between mobile transactions and update transactions is a data re-broadcast scheme called Update First with Ordering (UFO) [3]. Although UFO can provide the most updated values of data items to mobile transactions and at the same time maintain the serializability of the execution between update and mobile transactions, it is designed for read-only mobile transactions in which operations are unordered. In this paper, we extend the UFO protocol for read-write mobile transactions, which uses rule-based approach.


Figure 1: Architecture of Mobile Database System


The following are the contributions of the paper
UFO algorithm is extended based on rule-base approach for mobile transactions in which read-write operations can be done.
Enhancements based on an invalidation scheme are done for accessing consistent data.
The organization of the remaining parts of the paper is as follows. Section 2 reviews related work on broadcasting consistent data items. Section 3 describes the system architectural model. In Section 4, we describe the correctness of transactions in broadcast environments. Section 5 discusses the problem of data inconsistency in data broadcast using examples. Section 6 introduces the modified UFO algorithm with a detailed discussion on its correctness and properties. The paper concludes in Section 7.

Related works
Data management in broadcast environments receives a lot of attention in these few years. However, there are only a few studies on transaction processing [8]. In [11], Datacycle architecture is introduced for high throughput database systems. The entire contents of the database are repetitively broadcast to the hosts over a high bandwidth network. It could be very expensive to use the serializability as the correctness criterion in the Datacycle architecture [8].

In [8], a new correctness criterion is proposed to allow read-only transactions to read current and consistent data in broadcast environments without contacting the server. However, the serializability is not maintained in their protocol. Two different read-only transactions may perceive the effects of update transactions in different serial orders. It may be hazardous to certain applications such as mobile stock trading where a buy/sell trade will be triggered to exploit the temporary pricing relationships among stocks. From the trader’s perspective, the inability of database management system to maintain the serializability may have important financial consequences [12]. For instance, if the users who submitted multiple read-only transactions communicate and compare their query results, they may be confused [13]. In [8], they proposed two protocols, F-Matrix and R-Matrix. Although the F-Matrix shows better performance, it suffers from high overheads in terms of expensive computation and high bandwidth requirement for additional control information.

In [4,6], multi-version data broadcast approaches are suggested to resolve the problem of reading inconsistent data values. In addition to the most updated version, all the previous versions within a time frame need to be broadcast as well. Another method to detect the non-serializability problem is to broadcast serialization graphs [4,6]. Each client maintains its local serialization graph to ensure that the schedules of all the committed transactions are serializable. The first drawback of this method is also the heavy overhead in broadcasting the serialization graphs since every data conflict at the database server has to be broadcast. Secondly, each client must listen to the transmission channel to maintain and update its serialization graph [5]. This leads to another serious problem, which can affect the correctness of this approach: the need to maintain the serialization graphs at the client mobile even when it is disconnected. The mobile network is unreliable and disconnection is frequent. When disconnection occurs, the mobile client cannot obtain new serialization information about its transaction, making it virtually impossible to ensure serializability of transaction execution.

In [3], UFO algorithm can maintain the serializability of update transactions at the database server and the read-only mobile transactions. This algorithm has been shown to be more efficient and has minimal impact on server, client and the broadcast schedule. However, update operations of mobile clients have not been taken into account. In this paper, we propose a mobile database system that uses a modified UFO algorithm, utilizes a rule-based approach for concurrency control among mobile and update transactions in the broadcast environment. Moreover, this system has an inference engine that makes deduction using forward chaining control flow to provide effective consistency checking during updates.

System architecture model
Figure 1 presents a general architecture of the mobile database system similar to that described in [14,15]. Architecture model consists of stationary and mobile components. Mobile component is the mobile client, which can connect fixed network via a wireless link. Stationary components could be fixed host or base stations. Fixed Host in a fixed network does not have capable of connecting to a mobile unit. Base stations are augmented with wireless interface to communicate with mobile computers, called Mobile Support Station (MSS). In the above architecture model, database servers residing on fixed host and MSS compose a distributed database system. Every database server possesses site autonomy and supports local transaction processing through Two-Phase Locking mechanism. As a part of the distributed database system, each MSS can also function as a mobile transaction coordinator, which receives transaction operations from mobile clients and coordinates their execution. A mobile client can move arbitrarily at any moment within a wireless cell or among different cells. When it is connected with the fixed network, it always communicates with the MSS supporting the cell where it is currently located. In addition, it can disconnect from the fixed network at any time. When it reconnects with the fixed network, it fetches back the results of previous operations and sends subsequent operations.

An important property of the data items in a mobile computing system is that their values can be highly dynamic and sensitive as they are used to maintain the useful information such as weather, highway conditions, traffic directions, news, stock quotes and the location of a moving object [10]. To maintain the freshness and validity of the data items, data items should be updated often by the generation of update transactions. It is assumed that the updates are generated by means of some external environment such as capture devices, and sensors. The broadcast server disseminates all the data items periodically in each broadcast cycle. The duration of a broadcast cycle may be fixed or variable depending on the adopted broadcast scheduling algorithm, which is also used for selecting data items to broadcast.

Update transactions consist of one to several write operations and its arrival rate can be very high. It is assumed that a well-formed concurrency control protocol, such as two phase locking [3], is used for concurrency control amongst the update transactions at the database server. Mobile clients issue mobile transactions to access data items at the database. It is assumed that each mobile transaction consists of a sequence of read operations and defined with a deadline. Moreover mobile transaction will become totally useless after its deadline and has to be aborted or restarted.

In our model, the broadcast process is modeled as a long read-only transaction, called broadcast transaction. The length of a broadcast transaction is defined based on the life span of a mobile transaction that is equal to the time required to broadcast its data items. Thus, the data item set of a broadcast transaction includes all the data items, which are broadcast during the period from (current time - life-span of a mobile transaction) to current time. The reason of choosing the life span of a mobile transaction to define the length of a broadcast transaction is that for normal cases a mobile transaction will be finished within its life span. Note that the broadcast transaction actually does not possess any characteristics of a transaction.

The following are the assumptions of the system model are summarized below:
All update transactions are processed at the database servers.
The arrival rate of update transactions is high and random.
All mobile transactions are read-write.
Each mobile transaction has a deadline, i.e., arrival time + life span. It is important to complete a mobile transaction before its deadline. Otherwise, it should be restarted.
The broadcast process is modeled as a broadcast transaction.
Correctness of transactions in broadcast environments
Dissemination of data items to mobile transactions may be performed concurrently with the execution of update transactions. So it is possible that mobile transactions may read inconsistent data values due to uncontrolled interleaving of update and mobile transactions. Moreover, Mobile transaction should access updated value of data items. Thus, there are two fundamental requirements for disseminating data items to mobile transactions: Mutual consistency and Currency. Mutual consistency ensures that (a) the server maintains mutually consistent data and (b) clients can read mutually consistent data. Currency ensures that clients can access data that is current.

Mutual Consistency
Concurrency control amongst update transactions are to be done by a well-formed concurrency control protocol such as Two-Phase Locking [1], we have to consider the inconsistent problem of data conflicts between mobile transactions and update transactions. Our proposed protocol will concentrate on resolving this type of data conflicts to ensure that all mobile transactions will read consistent data values. In this paper, we adopt serializability as the correctness criterion of database consistency as it is widely accepted in the database community. If the serialization graph of a set of transactions is acyclic, then the schedule is serializable.

Currency
The Important characteristic of the mobile database systems is that the values of the data items are highly dynamic and the arrival rates of the updates can be very high. While data items are being broadcast from the database server to the mobile clients, updates of data items are being installed into the database. In such a dynamic environment, it is difficult to maintain a strict consistency between the external environment and the corresponding values of the data items in the database. This is especially true in the case of distributed database system. For example, in a weather monitoring system, updates are generated at the weather monitoring place database server. For most cases, stale information is much less useful to mobile clients. In order to minimize the staleness of the information, the system has to process update transactions as soon as possible and broadcast the most up-to-second information, e.g., the latest committed value of a data item. Since providing the most updated values of data items in a mobile network can be very expensive, an alternative is to bound the degree of “staleness” of the data items within a pre-defined bound. For some data items, e.g., weather information, the data values will be totally useless if they are “older” than a certain pre-defined time bound [7].

Data inconsistency in dissemination
In this section, we briefly discuss the inconsistency problem in data dissemination [3] using some examples.

Case1: Data Conflict between mobile and update transaction

Suppose the update transaction updates the weather for cities ‘Chennai’ and then ‘Mumbai’ and mobile clients wants to access the value of weather for cities ‘Mumbai’ and then ‘Chennai’.

If the execution history is:
Broadcasts weather value for ‘Mumbai’
Mobile transaction reads weather value for ‘Mumbai’
Update transaction updates weather value for ‘Chennai’
Update transaction updates weather value for ‘Mumbai’
Broadcasts weather value for ‘Chennai’
Mobile transaction reads weather value for ‘Chennai’
The mobile transaction may observe inconsistent data values. The reason is that mobile transaction reads the weather value for ‘Mumbai’, which is in conflict with update transaction, before update.

The serialization graph is cyclic because of MT U MT. Thus, it is non-serializable.

Case2: Mobile transaction may conflict with two or more update transaction

Suppose there are two update transactions U1 and U2 such that U1 updates weather value for ‘Chennai’ and then ‘Bangalore’, and U2 updates weather value for ‘Bangalore’ and then ‘Mumbai’.

If the execution history is:
Broadcasts weather value for ‘Chennai’
Mobile transaction reads weather value for ‘Chennai’
Update transaction1 updates weather value for ‘Chennai’
Update transaction1 updates weather value for ‘Bangalore’
Update transaction2 updates weather value for ‘Bangalore’
Update transaction2 updates weather value for ‘Mumbai’
Broadcasts weather value for ‘Mumbai’
Mobile transaction reads weather value for ‘Mumbai’
The serialization graph is cyclic because of U2 MT U1 U2.

Thus, it is non-serializable.

Case3: Involving two or more broadcast transaction

A mobile transaction may not able to find all its required data items in a single broadcast cycle. Non-serlializability may occur over more than one broadcast transaction.

If the execution history is:
Broadcasts weather value for ‘Chennai’
Mobile transaction reads weather value for ‘Chennai’
End of Broadcast Cycle1
Update transaction updates weather value for ‘Mumbai’
Update transaction updates weather value for ‘Chennai’
Broadcast transaction2 broadcasts weather value for ‘Mumbai’
Mobile transaction reads weather value for ‘Mumbai’
The serialization graph is cyclic because of MT U MT. Thus, it is non-serializable.

Note that we ignore the broadcast transaction as they don’t have any real effect on the database consistency and their function is only to provide mobile transaction to read.

Modified Ufo
In [3], Update First with Order (UFO) protocol is proposed for broadcasting consistent data items to mobile transactions with unordered read operations, i.e. each mobile transaction consists of a set of read operations and the operations can be executed in any order. In this paper, we propose a modified UFO algorithm, for read-write mobile transactions, which uses the rule-based approach. The issues of accessing consistent data items and processing operations under disconnection will also be addressed.

The basic principle of the modified UFO algorithm is to ensure that if a data conflict occurs between a broadcast transaction (BT) and an update transaction (U), the serialization order between them will always be U ® BT. Since mobile transactions (MT) read data items from broadcast transactions, the serialization order between broadcast transactions and mobile transactions is always BT ® MT. Thus, the serialization order between update transactions and mobile transactions will always be U ® MT and the schedules will be serializable.

Basically, the OUFO protocol consists of two parts:
Execution of update transactions
Conflict resolution between update and broadcast transactions
Execution of Update Transactions
The execution of an update transaction is divided into two phases: the execution phase and the commit phase. During the execution phase, the operations of an update transaction are executed and data conflicts with other update transactions are resolved using a conventional concurrency control protocol such as 2PL. The new values from the write operations of a transaction are written in a private workspace of the transaction instead into the database immediately. When all the operations of the update transaction have been completed, it enters the commit phase in which permanent updates of the database will be performed by copying the new values from its private workspace into the database. Data conflict with the broadcast transaction will be checked in the commit phase, which is performed in a critical section. So, we can see that the update transactions adopt 2PL for resolving data conflicts with other update transactions and use an optimistic approach to detect conflicts with the broadcast transaction. There are two important advantages in dividing the execution of the update transactions into two phases. Firstly, it can significantly reduce the blocking probability and delay of the broadcast transactions. If a broadcast transaction wants to read a data item, which is already locked by an update transaction during its commit phase, the broadcast transaction will be blocked until the update transaction is committed based on the principles of 2PL. At the same time, the update transaction, which is holding the lock, may be blocked due to data conflicts with other update transactions. Due to transitive blocking, the blocking time of the broadcast transactions can be very long. Dividing the execution of an update transaction into two phases can greatly reduce the blocking probability and blocking time of broadcast transactions since data conflicts between update and broadcast transactions will occur only when the update transaction is in the commit phase, which is much shorter. Secondly, the detection of data conflicts between the update and broadcast transactions will become much easier. At the commit phase, the system will know which data items have been accessed by the update transaction. By comparing the write set of the update transaction with the read set of the broadcast transaction, the system can easily determine whether there is any data conflict between them.

Conflict Resolution between Update and Broadcast Transactions
Data conflict between an update transaction and a broadcast transaction is detected when the update transaction enters its commit phase. Re-broadcast is used to resolve the conflict by reversing the serialization order from BT ® U to U ® BT. The details of the algorithms at both the server and mobile clients are shown in the following sections.

Algorithm at the Server:

It is performed when an update transaction enters the commit phase.
On(transaction commit)
If OBT Ç OU = { }
then
BT and U have no dependency
else
for each data item di Î { OBT Ç OU }
re-broadcast data item di
endif

where OBT = set of data items of broadcast transaction BT
OU = set of data items of update transaction U

By re-broadcasting the conflicting data item, the serialization order between the broadcast transaction and the update transaction is reversed. The set of data items in the broadcast transaction consists of those data items, which are broadcast in the period from (current time ??life-span of a mobile transaction) to current time. Thus, the set of data items in the broadcast transaction is not fixed. After the broadcast of a data item, the data item will be included and the last data item in the broadcast transaction will be removed.

Algorithm at the Mobile Client
The data items requested by a MT are represented by a sequence of read operations. Processing of a MT starts from the first read operation in the sequence. Each data item received from a BT is matched with the requesting data item of the executing operation. If there is a match, the MT will read the data item and the operation will be processed. The process is repeated for the next read operation until there is no more operation in the sequence. In case a data item, which is already read, is re-broadcast while the MT is waiting for other data items, the MT will be restarted from the operation which requests that data item. It will use the re-broadcast value for the execution of the operation. There are two reasons for the restart. Firstly, it is to ensure the serializability order U ® MT. The second reason is to provide the most updated data values to the mobile transactions. The algorithm at the mobile client is shown below where it is assumed that the MT reads all its data items from BT and there is no cache at the client.

dc = the data item required by the first operation in LMT loop until LMT exhausted
read data item di from BT (di is the currently broadcasting item)
if di = dc
then
MT processes di and updates LMT
else
if di Î SMT
MT repeats the processing on di , updates LMT
Restarts its execution from the operation, which requires di
endif
endif
dc = the data item requested by the next operation in LMT
end loop
where LMT = sequence of read operations in MT
SMT = set of data items have been accessed by MT

Correctness and implementation of modified UFO

Correctness
The schedules of the committed transactions produced from our algorithm are always serializable. To prove this, we will show that the serialization order between all committed mobile transactions and their conflicting update transactions will always be U ® MT, if the mobile transactions read the conflicting data items from the broadcast transaction.

Let MT, BT and U be mobile transaction, broadcast transaction and update transaction, respectively. If there is a data conflict between BT and U, and the broadcast of the conflicting data item is before the update of the data item, the serialization between BT and U will be BT ® U. According to our algorithm, the conflicting data item will be re-broadcast immediately and the serialization order will be reset to U ® BT. If the broadcast is after the update, the serialization order will be U ® BT. Since MT reads data items from BT, their serialization order will always be BT ® MT. Since serialization order is transitive, the serialization order between MT and U will always be U ® MT for the data conflict between MT and U and the conflicting data item is broadcast during the length of the broadcast transaction.

It is obvious to see that the currency of data items observed by mobile transactions can be maximized due to the data re-broadcast mechanism of our algorithm since all the data items which are broadcast within the length of the broadcast transaction are the most updated version.

Implementation
We discuss the implementation and characteristics of our proposed algorithm. This algorithm does not require any changes in the mobile clients except in getting re-broadcast data items. The main overheads of the our algorithm are:
Division of the execution of update transactions into two phases
Checking of the data sets of a broadcast transaction and an update transaction whenever an update transaction wants to enter the commit phase
Re-broadcast of conflicting data items if these items are broadcast before the start of the commit phase of the update transaction.
Dividing the execution of update transactions into two phases is trivial and should not incur much additional overhead. The overhead is for checking conflicting data sets. To speed up the checking process, the data items to be updated may be sorted according to their IDs.

The main overhead of this is data re-broadcast. The number of re-broadcast depends on the probability of data conflict which in turn depends on the broadcast schedule, arrival rate and the update pattern of the update transactions. The algorithm at the mobile clients is simple and does not incur much additional overhead for checking. The only additional work is to replace the old version of a data item in case it is re-broadcast.

The main advantages of our proposed algorithm are summarized below.

Consistency: It ensures that the execution schedules of the transactions are serializable.

Currency: It can maximize the currency of the data items read by mobile transactions by re-broadcast. In general, it can only provide the committed values at the last broadcast cycle as all the database updates are performed at the end of a broadcast cycle.

Broadcast algorithm: It can be applied with many other broadcast algorithms, which may only broadcast a subset of the data items in the database.

Conclusion
Data broadcast has been shown to be an efficient method for disseminating data items to transactions generated by mobile clients. If the broadcast of data items and the execution of update transactions are uncontrolled to maintain the validity of the data items at the database, mobile transactions may observe inconsistent data values. In this paper, we study the issue of concurrency control between update transactions and mobile transactions. Our proposed protocol uses ECA rules to make decisions on broadcasting without conflicting. Although the protocol is simple, it can effectively maintain the schedule between update and mobile transactions to be serializable and at the time maximize the currency of the data items observed by the mobile transactions. This system has an inference engine that makes deduction using forward chaining control flow to provide effective consistency checking during updates. The system discussed in this paper, provides site autonomy between the mobile clients and the server with minimum upstream communication and data consistency that are desirable features to the scalability of applications, which are running in broadcast environments.

No comments: