Wednesday, September 17, 2008

mc database

B. Mobile Databases
The conventional database model is changing fast [1][3].
Mobile devices are now capable of complex processing tasks
and have significant storage capacity. As a result, a new
approach, storing a database on every machine that is part of
a network, is gaining significance. The notion of one central
autonomous database is thus fading fast, with local databases
at each machine gaining prominence. Mobile Hosts (MHs) can
now act as machines for data processing and native storage;
the physical location of databases is therefore changing.
Frequent disconnection in mobile devices leads to two design
approaches. The first assumes that an MH will be offline for
only a very short period as it changes regions, while the
second assumes that a mobile user may be offline for a long
time and come online only briefly. This distinction is a major
design consideration for any application targeting such devices.
The fundamental requirement is that a mobile user must be able
to work with his or her own local copy of the database even
when the connection to the network is broken. It is essential to
handle both cases while maintaining user transparency at all
times. For example, we may initially have a permanent
connection to the network; as battery power runs low, we can
switch to the other approach, where the MH comes online only
intermittently, for very brief periods, either when needed or to
perform a batch of network-related tasks together. Limiting the
use of broadcasts and other connectivity resources, in order to
make optimum use of available battery power, is therefore a
significant concern.
Identifying a particular MH (the source of the database
under consideration) is also difficult. How does one locate the
MH in question? This is complicated by the fact that MHs
move all the time, showing up intermittently in various places,
and by the fact that service areas (BH regions) are not always
contiguous [2][5] and may even overlap, as in figure 2.
Moreover, an MH does not always move in a straight line. As
soon as an MH enters a new region, the new region's manager
(the BH of that region) detects a foreign entity and asks it to
identify itself. Upon receiving its details, the new BH contacts
the home BH (or old BH) of the MH and asks whether the
mobile entity is what it claims to be; though time consuming,
this step is important from the safety point of view. Once
credentials are established, the MH may work in its new
region. The old BH may then transfer any transactions started
by that MH to the new BH, or it may instead return only
results (if the transactions are not location dependent) to the
new BH; one of these approaches is chosen depending on
context.
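The verification handshake just described can be sketched in a few lines. This is an illustrative sketch only: the class and method names (`BaseHost`, `admit`, `verify`) are our own assumptions, and the real credential check and transaction transfer would go through the underlying network and security layers.

```python
class BaseHost:
    """Toy model of a BH (region manager) in the handoff protocol."""

    def __init__(self, region):
        self.region = region
        self.registered = {}          # mh_id -> credentials known to this BH

    def register(self, mh_id, credentials):
        self.registered[mh_id] = credentials

    def verify(self, mh_id, credentials):
        # The old/home BH confirms (or denies) an MH's claimed identity.
        return self.registered.get(mh_id) == credentials

    def admit(self, mh_id, credentials, old_bh):
        # The new BH detects a foreign MH, checks with its old BH, and
        # only then registers it locally; in-flight transactions started
        # by the MH would be transferred here as well.
        if not old_bh.verify(mh_id, credentials):
            return False
        self.register(mh_id, credentials)
        return True
```

For example, an MH registered with its home BH in region R1 would be admitted into region R2 only after R2's BH has checked its credentials with R1's BH.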
Another, more severe problem is how an MH finds out which
BH it must communicate with. Consider the case shown in
figure 2: when the MH moves, it may enter an overlapped
region, and it must then decide which BH to communicate
with. This problem may be solved with a flurry of messages
between the concerned BHs and the MH. It is generally
another major design consideration, and it remains a key
aspect of research in this area.
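One plausible resolution, once the flurry of messages has yielded a reply from each candidate BH, is simply to pick the BH with the strongest reported signal. The function below is a sketch of that idea only; the text does not prescribe a particular selection rule, and real signal probing and tie-breaking are outside its scope.

```python
def choose_bh(replies):
    """Pick the base host with the strongest signal from an overlap region.

    `replies` is a list of (bh_id, signal_strength) pairs collected from
    the BH/MH message exchange; returns None if no BH answered.
    """
    if not replies:
        return None
    return max(replies, key=lambda reply: reply[1])[0]
```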
As with all new approaches, such a system brings significant
problems and complications. Apart from conventional
difficulties, a mobile database system may suffer from:
• Failure of sites
• Loss of messages
• Failure of communication links
• Network partition
Moreover, executing queries and supporting concurrent
execution of transactions becomes extremely complex.
Further, distributed (mobile) databases may be categorized as
either heterogeneous or homogeneous depending on the
schema types they support. It is not uncommon for whole
distributed systems to be designed around a single fixed
schema set; significantly more interesting (and complex) is
designing distributed systems that allow different schema sets
in the underlying databases at different sites on the network.
We tackle several of these issues in the following sections.
C. Multidatabase System
A multidatabase system [6] is a collection of physically
separated but logically integrated databases. That is, different
databases are combined by an application software layer to
give the user the illusion of one integrated database. This
additional layer is completely portable and pluggable [7]. The
constituent databases, called local databases, generally have
different hardware and software implementations. It is usually
more economical to retain the existing local database schema
(and the associated hardware and software) than to replace it,
hence the need to maintain local autonomy. MDASs are
multidatabase systems capable of large-scale operation over a
wireless medium.
D. Transactions and Concurrency
The fundamental notion governing all database activity is the
set of four properties commonly known as the ACID properties
[8]. These properties take on a whole new meaning in the
distributed setting. Mobile transactions are completely different
from conventional flat transactions and must be defined
carefully, as they are often unable to follow the ACID model
(sometimes a more complex strategy, such as tree-based
modeling, is adopted). The properties of transactions and of
hosts on the move must be considered at all times, with due
attention to a sound recovery approach (generally detailed
logs); to this end a proper requirements analysis is a must [1].
The difficulties and security concerns of mobile transactions
must be well understood, and the search cost and overhead of
locating the correct MH are another important parameter.
Transactions thus take much longer to complete in mobile
environments, where a transaction may be defined as a
collection of geographically distributed queries arranged so as
to obtain location-specific results. There is therefore a need to
deviate temporarily from the ACID properties when it comes to
mobile transactions. A general transaction may look like the
following:
begin_transaction() {
    if (lock_table_success) {
        // statements of the transaction
        select * from table_account;

        // do some work

        transaction_successful = true;
        // transaction code ends
    }
    if (transaction_successful)
        commit_work();
    else
        rollback_work();
}
Concurrency of transactions means that more than one
transaction can execute simultaneously, with none of them
aware of the presence of the others. Moreover, the database(s)
must remain in a consistent state when these transactions
finish, though temporary inconsistency may be allowed while
a transaction completes its work. Concurrency increases design
complexity; the performance benefit realized as a result,
however, is enormous. Mobile and distributed systems
represent even higher levels of complexity: transactions
become long lived and concurrency becomes harder to
achieve. A number of approaches to these problems in
massively heterogeneous systems have been devised, ranging
from relatively simple locking and graph-based protocols to
complex distributed algorithms [8][9]. Transactions take much
longer to finish in mobile systems, and communication failures
may cause further problems [2][10][11][12]. A detailed
approach to handling transactions in a massively distributed
system is discussed in a subsequent section.
E. Deadlocks
The problem of deadlocks, though of important consequence,
is generally ignored by centralized systems. This is largely
because deadlocks are a rare occurrence in such systems while
the overhead of handling them can be significant. The situation
in the distributed context is different. As an analogy, consider
a peer-to-peer database system in which each node can
generate both local and global queries. A system of just 200
nodes with 10 queries per node can create a significant number
of transactions; competition for resources is immense and the
problem of deadlocks takes center stage.
A formal definition is in order. A system is in a deadlock
state if there exists a set of transactions {T0, T1, ..., Tn} such
that every transaction in the set is waiting for another
transaction in the set [8]. In a multiprogramming environment,
analogous to the mobile database model, several processes
may compete for a finite number of resources. Such a situation
can lead to a deadlock, where processes or transactions are
blocked, waiting indefinitely for a resource held by another
blocked process. Deadlock detection and avoidance therefore
form an important part of any distributed system, and
techniques for deadlock management are essential to its
efficient working. A number of algorithms for handling
deadlocks are available [8][13][14][15], in both the distributed
and the centralized sense.
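Since a deadlock is, by the definition above, a cycle in the waits-for relation among transactions, detection reduces to cycle detection in a wait-for graph. The sketch below (an illustration, not one of the cited algorithms) shows the idea with a depth-first search:

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph.

    `wait_for` maps each transaction id to the set of transactions it is
    waiting for; a cycle means the system is in a deadlock state.
    """
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on stack, finished
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:       # back edge: cycle found
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in list(wait_for))
```

For example, T0 waiting for T1 while T1 waits for T0 is reported as a deadlock, whereas a simple chain of waits is not.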
II. System Architecture
We have devised a system capable of operation in large-scale
distributed (mobile) environments. The approach considers all
aspects of such a system, the problems each aspect faces, and
implementations validated by rigorous performance testing.
Our main focus was to design a system capable of extending
commercially available centralized databases to operate safely
under the rigors imposed by large-scale distribution. We have
opted for the mobile database model, wherein each node or
machine has its own local copy of a database; the databases
present may even have different schemas. Each node is capable
of submitting queries or transactions to any other currently
registered node. The approach is fully distributed, with only an
initial registration with a MainServer needed. For this purpose
any host may act as a server, chosen by a simple election
algorithm when the system is initially started. We have thus
eliminated the need for central transaction managers, resource
managers, etc., which (we feel) only act as centralized
bottlenecks when such a large system is considered. All nodes
have their own concurrency control and log maintenance for
effective restoration from failure, so a fully distributed
approach is maintained. After initial registration with the
acting server, a host is capable of complete autonomous
operation. It simply needs to import a list of current hosts,
which can be done with a single button in the interface. It is
then possible to address and communicate with (work with the
database of) any host currently on the network. Even the
central MainServer is expendable; it is implemented only
because some mechanism is needed to hold node details. To
the same effect, a simple distribution list could be cycled
through the network to provide hosts with the necessary
information; we have, however, opted for the server-based
approach. The architecture of this system is shown below in
figure 3.
The presented model is in fact similar to a multidatabase
system [6]: as in most multidatabase systems, local autonomy
is maintained even after joining the global system, with the
fundamental difference that our approach is fully distributed
and there is no global layer controlling the system's operation.
Figure 3. System Structure (components: User Interface, System Manager,
Connections Manager, Transaction Processing System, Transaction Submitter,
Transaction Coordinator, Transaction and Query Manager (local), Concurrency
Manager, Lock Manager, Deadlock Handler, Recovery Manager, Update Manager,
File Handler, Stable Storage, Run-Time Storage, Log Files, Local Database
and Database Software)
Complete local autonomy also carries a cost: a global
transaction may starve if a local user does not give it time to
run. The user may do this inadvertently, as most user-based
transactions are long lived. For this purpose an option is
provided, and the user is encouraged to commit her work
periodically, creating what amounts to sagas: the user's work
is committed and global transactions are given a chance to
spawn from that point. Long-lived global transactions may
also have explicit commit points to allow effective scheduling
of transactions.
A. Local Database and stable storage
The proposed model is built on a bottom-up strategy, i.e.
databases are already present and are not modified in any way
to join the system. This approach is necessary because most
organizations and people already have database management
systems of their choice installed and configured. These
databases are stored on disk and more often than not have
different schemas, policies and data structures, reflecting the
many different commercially available databases. Log files [8]
are maintained for all operations so that operation can resume
effectively after a failure; it is assumed that all writes to disk
are persistent and survive failures. Upon startup, every node
checks whether this is its first time on the network; if not, the
recovery manager routine is called, taking due account of
previous transactions and the last system shutdown. A typical
log file has the following contents:
!#LOG_FILE$4522#!
!#INFILE$172.16.4.3$1111transactionList.txt#!
!#COMMIT_POINTS$5#!
!#COMMIT_1$done#!
!#COMMIT_2$done#!
!#COMMIT_3$done#!
!#COMMIT_4$done#!
!#COMMIT_5$fail#!
!#LOG_FILE$END#!
The above log file is for transaction id 4522 at this node. The
second line gives the name of the file in which the transaction
list is stored; this is made to match the submitter id and the
transaction id at that node. This approach gives us an efficient
naming scheme, a method to submit transaction results back to
the corresponding nodes, and support for transaction status
queries. Transactions branch into sub-transactions at commit
points, and the status of each sub-transaction is stored, which
speeds up recovery. For example, in the above case the
transaction manager is called by the recovery manager and is
able to go straight to the last (5th) part of the particular
transaction. Moreover, a master (summary) log file is also
maintained to speed up recovery further, so that only the
necessary transactions (for example, the ones that did not
complete) are examined in detail.
!#LOG_FILE$MASTER#!
!#
<4522>$<172.16.4.3>$<172.16.5.34>
~~STATUS~~
fail
#!
!#
<4523>$<172.16.5.34>$<172.16.5.34>
~~STATUS~~
done
#!
!#LOG_FILE$END#!
The above master file shows two transactions: the first is a
globally submitted transaction, the second a local transaction
(note that its two ids are the same). The first transaction did
not complete, so the recovery manager opens the
corresponding log file (the one shown previously) and
transfers control to the transaction manager, sending it straight
to the last (5th) part of transaction 4522. An efficient
mechanism for keeping transactions and their sub-transactions
running is thus in place.
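The recovery behaviour just described can be illustrated with a small parser for the per-transaction log format. The field layout is inferred from the example logs above, not from a published specification, and the function name `resume_point` is our own:

```python
def resume_point(log_lines):
    """Find where recovery should resume in a per-transaction log.

    Parses !#KEY$VALUE#! records and returns (txn_id, index of the first
    commit point not marked 'done'), or (txn_id, None) if the whole
    transaction completed.
    """
    txn_id, pending = None, None
    for line in log_lines:
        body = line.strip().removeprefix("!#").removesuffix("#!")
        fields = body.split("$")
        if fields[0] == "LOG_FILE" and fields[1] != "END":
            txn_id = fields[1]                       # header record
        elif fields[0].startswith("COMMIT_") and fields[0] != "COMMIT_POINTS":
            idx = int(fields[0].split("_")[1])
            if fields[1] != "done" and pending is None:
                pending = idx                        # first unfinished part
    return txn_id, pending
```

On the example log for transaction 4522, this reports that recovery can skip straight to commit point 5.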
B. Network Connections
LAN (including wireless) communication hardware typically
offers unreliable multicasts and unicasts. Further problems
arise from packet loss and packet reordering inside switches
and network cards. The presented model is implemented
entirely over UDP and supports timeouts and retransmissions
as necessary. A unique thread is created for each
communication session and each host to create a safe and
efficient network environment. Threads are further grouped
into thread pools to logically integrate and simplify thread
monitoring and to provide the best possible performance.
UDP was chosen simply because it entails the least overhead.
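A minimal sketch of the timeout-and-retransmit scheme over UDP might look as follows. The function `send_with_retry` and its parameters are illustrative assumptions; real message framing, acknowledgement matching, and encryption are omitted.

```python
import socket

def send_with_retry(sock, payload, addr, retries=3, timeout=0.5):
    """Send a UDP datagram and wait for an acknowledgement,
    retransmitting on timeout; raise after `retries` failed attempts."""
    sock.settimeout(timeout)
    for _ in range(retries):
        sock.sendto(payload, addr)
        try:
            ack, _ = sock.recvfrom(4096)   # any reply counts as an ack here
            return ack
        except socket.timeout:
            continue                       # lost datagram: retransmit
    raise TimeoutError("no acknowledgement after %d attempts" % retries)
```

In the real system each such exchange would run on its own thread from the pool, so a slow peer never blocks the others.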
Since wireless communication is susceptible to eavesdropping
(radio signals pass through walls), security is a prime concern.
All communication is encrypted with a unique key hard-coded
into the application (this can of course be changed to add
key-based security features).
C. User Interface
The GUI is simple to use, with on-screen help. Through it,
users may submit local and global transactions; global
transactions can also be spread over more than one site. All
activity is effectively monitored and visible on screen. The
user may also initiate status detection if she feels her
transactions' results are long overdue. Working in the
background is the update manager (which may be disabled). It
is responsible for keeping information about nodes on the
network up to date, and may be modified to serve as a means
of background message passing between hosts, leaving scope
for improvement and the future addition of automatic handling
of tasks by nodes.
D. Transaction Processing System
In the mobile database environment it is essential for users at
various sites to be able to access databases at other sites. In
effect, a database stored at one host has to cater to the
simultaneous needs of many other hosts while maintaining its
autonomy. This capability is of extreme importance in
applications such as banking and is, in fact, one of the key
growth areas for M-Commerce. Real-time fast access to
databases brings several advantages: improved efficiency,
better application performance (less waiting time for users),
etc. It also, however, comes with a host of new problems.
A number of security issues arise from simultaneous data
access and database modification by more than one transaction
at a time. In fact, most transactions in these settings span
multiple sites and need sub-transaction management (sagas
and nested sagas). Another issue is the local autonomy of a
site: whether or not to allow global entities to control
transaction processing. If this is allowed, local autonomy is
often jeopardized; if not, external transactions may have to
wait a long time before being executed, leading to indefinite
delays, unnecessary aborts and hidden deadlocks [6].
Tradeoffs of some kind must be made; generally a proper
transaction-processing scheme is implemented and maintained
across all sites to make sure the ACID properties are always
satisfied.
It is a design consideration in our model, and fundamental to
its operation, that a host does not treat a global transaction as
special. Once submitted, a transaction is totally dependent on
the host for its completion. This approach is consistent with
our decision to maintain local autonomy. Once completed,
results are sent back to the originating host.
To maintain the ACID properties, transactions must be
completed as a whole or aborted, with compensating
transactions issued whenever needed. Moreover, before a
successful commit, the changes made by a transaction are
visible only to itself; this prevents cascading rollbacks in case
the transaction aborts. However, to speed up operation, since
most transactions are long lived, explicit commit points may
be specified so as to generate sagas and handle many
transaction operations in parallel. The submitter specifies the
explicit commit points, and the transaction effectively
branches at each such point to have its partial work committed.
Global transactions, however, may not give immediate results;
the transactions are stored locally and submitted to the desired
host(s) only when complete. This is consistent with
minimizing communication so as to make optimum use of
battery power on mobile hosts. Once the transaction is
processed, the results are sent back to the submitter.
Transactions are moved around in a fixed format. Strings of
data (queries) are stored as files mapped logically as trees,
where the root node is always a string of the form:
!#<transaction_id>$<submitter_host>$<executing_host>#!
This string enables us to distinguish transactions from one
another, recover from failures, and handle re-submissions in
case of network loss or delay. The remaining nodes contain the
query statements that make up the transaction.
Consider the transaction fragment given below.
select * from table_1;
insert into table_1 values(23, “rookie”);
commit;
select * from table_1;
select * from 172.16.3.2#table_45;
insert into 172.16.3.2#table_45 values (34, 34,
“veteran”);
commit;
This amounts, logically, to the following (figure 4):

!#<4522>$<172.16.4.3>$<172.16.5.34>#!

Transaction subpart 1:
select * from table_1;
insert into table_1 values(23, “rookie”);
commit;

Transaction subpart 2:
select * from table_1;
select * from 172.16.3.2#table_45;
insert into 172.16.3.2#table_45 values (34, 34, “veteran”);
commit;

Figure 4. Nested Transactions
As can be seen from the figure above, transactions divide at
the commit points specified by users, amounting to sagas. This
approach is extremely effective in increasing system
throughput, as global transactions or other submitted
transactions' parts are given a chance to run once the previous
work has been committed. The sub-transactions are often
executed independently of one another, and parts of
transactions may be restored or rolled back depending on
context. Moreover, it is also possible to send sub-transactions
off to other hosts efficiently. Corresponding naming
conventions are adopted so that results can be sent back to the
designated hosts as well.
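The commit-point branching can be sketched as a simple split of the statement list. This is an illustration of the saga-style division shown in figure 4, not the system's actual parser:

```python
def split_at_commits(statements):
    """Split a transaction's statements into sub-transactions at each
    'commit;' boundary, mirroring the branching of Figure 4."""
    parts, current = [], []
    for stmt in statements:
        current.append(stmt)
        if stmt.strip().lower() == "commit;":
            parts.append(current)          # a committed sub-transaction
            current = []
    if current:                            # trailing work with no commit yet
        parts.append(current)
    return parts
```

Applied to the fragment above, this yields two sub-transactions, each ending in its own commit.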
When a transaction has completed its work, it must be made
permanent. Local transactions are committed at user discretion.
Global transaction queries are committed at the sites they are
sent to, and any results generated are stored in a file and
forwarded to the respective submitter. Under this scheme,
transactions execute independently at each site, with logs
maintained so that this holds regardless of failures. However,
if a transaction fails at one site, it becomes the responsibility
of a manual operator to make sure that the required work is
done at that site, particularly since partial commits are allowed
and sub-transactions may already have been committed at
other sites. Transactions once committed cannot be rolled
back; if possible a compensating transaction is generated,
otherwise manual intervention is needed. The ‘OPTIMISM’
model, however, guarantees delivery of the necessary alerts
and messages to the concerned hosts. In case of failure before
a commit, the work of a transaction is rolled back
automatically. Local transactions may also be rolled back at
user discretion. When a transaction is rolled back, no other
transaction is affected, since the work done by a transaction is
visible only to itself before a commit.
A unique feature of this model is its support for heterogeneous
databases and schemas. Different databases having different
relations (tables) may be used remotely to generate
transactions and submit results. A user at a particular site
simply specifies the name of the site where the database is
located and is provided all the details of that site's schema,
which she may then keep in mind when generating
transactions. Caching is used along with this form of explicit
querying to maximize performance. Apart from explicit
queries, a thesaurus [16] is also maintained at the MainServer
(shown in figure 5). It is updated periodically and also by
explicit messages from hosts regarding their schema changes.
The central thesaurus is, however, a central point of failure,
which runs against our distributed approach; it is possible to
replicate the thesaurus at various hosts, and the additional
facility of explicit queries removes the central point of failure
altogether. The thesaurus is most useful when a transaction is
to span numerous sites: a single message to the thesaurus
yields information about the different schemas present across
the entire system. This can be extremely effective in
decreasing network congestion, transaction generation time,
and eventually the transaction's round-trip execution time.
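For illustration, a reply in the schema format shown in figure 5 can be parsed into a usable dictionary as follows. The record layout (`@name$type$type...@` between `!#` and `#!` markers) is inferred from the example messages, not from a formal specification:

```python
def parse_schema_reply(reply):
    """Parse a thesaurus or direct schema reply into
    {table_name: [column_types]}."""
    schema = {}
    # split("@") alternates filler and record text; [1::2] keeps the
    # payloads between each @...@ pair
    for record in reply.split("@")[1::2]:
        name, *types = record.split("$")
        schema[name] = types
    return schema
```

A host can cache the parsed result and regenerate transactions against remote tables without querying the thesaurus again.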
Apart from transactions, the mobile hosts themselves also
move around the network, so an MH may show up anywhere
on the network under any BH. It is the responsibility of the
BH to recognize this MH and register it by contacting its
previous region's BH [17]. When a mobile entity moves to a
new region, it must be assigned a new IP [18]. In fact, any
change in the attachment point of an MH should be completely
transparent to the protocols and applications running on the
stationary hosts. Any mobile communication framework can
handle this client mobility; we have used the PMADE [17]
framework for this purpose. The ‘OPTIMISM’ model
effectively deals with the movement both of MHs and of
transactions. It is possible for an MH on the move to generate
transactions that need to be distributed to nodes in other
regions. A unique naming scheme based on figure 4 is used to
identify transactions and results coming in from other nodes.
The PMADE framework ensures delivery of messages to the
designated MHs even if they move to other regions, by
suitably maintaining system state information and adding
appropriate message headers for re-routing of message
packets.
Frequent disconnection is also handled effectively (the
problem arises because wireless connectivity is at best
sporadic). The user may continue to work on the local copy of
the database, while global transactions to be submitted, or the
results of previous global transactions, are simply queued and
sent off to their respective destinations as soon as a network
connection is available. Work queues are thus maintained,
with due consideration given to minimizing the use of
broadcasts to save battery power.
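The queue-and-flush behaviour can be sketched as follows. `OutboundQueue` and its callback interface are illustrative assumptions; the real system ties this to the UDP layer and battery-aware scheduling:

```python
import collections

class OutboundQueue:
    """Queue global transactions/results while disconnected and flush
    them when connectivity returns."""

    def __init__(self, send):
        self.send = send                    # stand-in for the network layer
        self.pending = collections.deque()
        self.online = False

    def submit(self, dest, payload):
        self.pending.append((dest, payload))
        if self.online:
            self.flush()

    def flush(self):
        while self.pending:
            self.send(*self.pending.popleft())

    def reconnect(self):
        self.online = True
        self.flush()                        # drain everything queued offline
```

While offline, submissions accumulate locally and the user keeps working on the local database copy; a single `reconnect` drains the queue in order.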
D.1 Transaction Coordinator
The transaction coordinator is responsible for managing the
various transactions at a node. It distinguishes sub-transactions
from one another and sorts incoming results, adding or
removing the necessary information to/from the
outgoing/incoming message streams to manage the various
global and local transactions effectively.
D.2 Transaction and Query Manager
This part of the transaction processing system is responsible for
the actual execution of queries at a node. It is in effect the
interface to the underlying database. It deals with the execution
of a scheduler-selected transaction.
MHs query the global thesaurus (which maintains universal schema sets) or
send it automatic updates; an MH may instead send a direct query to a
node's TPS, caching replies against its local database. Example thesaurus
reply for host 172.16.3.2:
!#
@table_44$varchar$integer@
@table_45$integer$integer$varchar@
#!
Example direct schema query reply:
!#
@Account_Info$varchar$integer$integer@
#!
Figure 5. Managing Heterogeneity of Databases, support for different
schema sets
D.3 Concurrency Manager
Probably the most essential part of the design is the
concurrency manager. It is the main arm responsible for
transaction concurrency while maintaining isolation and
consistency. It is designed with deadlock prevention in mind
(soon to be integrated with a full-fledged detection and
resolution algorithm) and keeps the system in a safe state so as
to avoid degradation of system performance. Stable-storage
logs and database information are also controlled through the
concurrency manager; all such information is subject to locks
and may only be accessed in turn.
The resource that all transactions compete for is the
underlying database. The concurrency manager resolves the
simultaneous work of many transactions by taking into
account a number of factors (particularly the kind of
transaction), maintaining queues, and executing transactions in
parallel. Local transactions are, however, given preference
over global transactions; this again is consistent with the
notion of local autonomy.
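The local-over-global preference can be sketched as a two-class ready queue, FIFO within each class. This is an illustration of the stated policy, not the system's actual scheduler:

```python
import heapq
import itertools

class TransactionScheduler:
    """Ready queue that always dispatches local transactions before
    global ones, FIFO within each class."""
    LOCAL, GLOBAL = 0, 1

    def __init__(self):
        self.heap = []
        self.order = itertools.count()      # arrival order breaks ties

    def add(self, txn, is_local):
        cls = self.LOCAL if is_local else self.GLOBAL
        heapq.heappush(self.heap, (cls, next(self.order), txn))

    def next_txn(self):
        return heapq.heappop(self.heap)[2] if self.heap else None
```

Note that with this policy alone a global transaction can still starve under a steady local load, which is exactly why the model encourages periodic commits and saga-style branching.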
