[Please remove <h1>]
Commercial relational engines are a multi-billion dollar industry, and are
testimony to the reductions in the cost of developing and maintaining information
systems that derive from using the SQL language for managing data. This
research is guided by the hypothesis that comparable reductions in the cost
of developing and maintaining embedded software systems could
also ensue from using SQL. This induces a repository style of architecture
for an embedded system in which the software becomes a collection of subsystem
or feature clients that submit SQL requests to a common database. As usual,
the requests, e.g. SQL queries, are based on an understanding of the data in
terms of a global database schema.
To realize the full benefit for cost
savings, it is important that this global schema continues to abstract all
knowledge of internal data encoding or external data sources, and that it remains
the job of database technology to support this interface by an appropriate
translation of the SQL requests. But to compete with the largely current practice
of using low-level languages such as C, new or improved capabilities with database
technology are needed in a number of areas. First, there is a need for improving
query optimization since, as suggested, query plans generated by query optimizers
must be comparable in performance to low level code written directly by expert
programmers. Second, the unique properties of embedded systems that affect
the indexing and encoding of data have to be taken into consideration via enhancement
in schema languages and in reasoning about integrity constraints. And third,
there is a need for alternative transaction models. For embedded software,
subsystems implementing different features can have undesirable interactions
that are not prevented by following traditional concurrency control protocols.
To this end, we concentrate on advanced concurrency protocols, and on incorporating
dynamic knowledge of database interaction by clients in database schema.
Better query optimizers
Recall that query plans generated by query optimizers must be comparable in
performance to low level code written directly by expert programmers. This
places much greater demands on the sophistication of various facets of query
optimization, including: 1) physical data models and data encoding, 2) semantic
query optimization, 3) view integration capabilities, and 4) conceptual
data models and modeling. A common thread with these issues relates to a variety
of ongoing work on developing and using description logics as a lingua
franca for capturing the underlying relevant knowledge to query optimization,
and on the underlying technology of tableaux based theorem proving [1, 2].
For example, we are currently investigating the overlap between reasoning involved
in query optimization based on query graphs and in tableaux algorithms for
reasoning about subsumption in description logics [3]. In both cases, an underlying
graph-like artifact called a tableau is created, searched and modified. The
benefits derive in part from the abundance of reductions of various database
schema languages to sentences in description logics. Relational schema, ER
models, UML class and object diagrams, object-relational schema, and the first
two levels of OWL are examples of such languages. In related work, we are
also interested in the development of particular dialects of description logics
that are more suited to use in query optimization [4, 5].
Physical data modeling
Again, there are a number of properties of embedded systems that affect the
indexing and encoding of data. One is that a greater appreciation for the workload
becomes possible with the deployment of a dedicated database server for a particular
embedded system. It now becomes reasonable to assume, e.g., that SQL
operations are specified at compile time, and that more specific performance
requirements can also be determined at compile time for particular requests.
Decisions on the selection of appropriate indexing can therefore be driven
more by threshold requirements, e.g., that dictate logarithmic or even constant
runtime for any plan that might be chosen for a particular query. Another is
the need for more sophisticated data integration capabilities in which sources
other than primary data storage can be specified. (Embedded systems are invariably
deployed as components of larger distributed systems that will need to exhibit
appropriate emergent behavior.) And yet another property is that primary data
storage can be comprised of main or flash memory, which in turn leads to additional
issues in the selection of data structures. Current research is focused on
automating the selection of local data encoding, including the use of main
or flash memory, on description logic based data integration and on complexity
based approaches to query optimization [6, 7].
On a related topic, we are continuing work on defining m SQL, a subset of
SQL with a top-k operator for which it is always possible to find data encodings
and indices together with query plans that satisfy logarithmic bounds on execution
time. Note that embedded systems can require part of the query and update workload
to satisfy such bounds; this is in contrast to mainstream work in query optimization
and physical design for applications in information systems. As a consequence,
the usual case in which much of query optimization is dominated by join order
selection is replaced by an alternative pattern recognition problem for queries
in m SQL, and the typical goal that "best use" is made of the available store
is replaced with an alternative goal of minimizing the store required to preserve
the efficiency of query evaluation [7].
Dynamic aspects of client/server interaction
An embedded software system can have subsystems that require coordinated access
to a database in ways that are not currently enforced or supported by the relatively
simple interaction protocols and schema languages that underlie current database
servers. For communications software in particular, subsystems implementing
different features can have undesirable interactions that are not prevented
by following such protocols, regardless of the level of understanding for the
meaning of the underlying data that can be captured in current database schema
languages. There is a need for database servers to be much more client aware
in terms of the dynamic aspects of their interaction with the server. One important
consequence is that this enables the use of deadlock free locking protocols
and a consequent decoupling of concurrency control from data recovery, a necessity
for many embedded systems in which logging and general abort capabilities would
otherwise lead to unacceptable runtime overhead [8].
In general, we have realized that database schema must be capable of a deeper
integration of both static and dynamic constraints, and that other components
of a database system must in turn be able to use these constraints in compiling
SQL-based client-engine interaction with a proficiency that rivals expert low-level
programming. As a consequence, we are collaborating with other colleagues on
combining work in the areas of description logics, dynamic logics and model
checking, with the goal of capturing relevant knowledge and of providing the
necessary reasoning services to these components.
References
- D. Toman and G. E. Weddell. "Query
Processing in Embedded Control Programs," In Proc. International
Workshop on Databases in Telecommunications, 2001, pages 68-87.
- H. Liu, D. Toman and G. W. Weddell. "Fine Grained Information
Integration with Description Logics," In Proc. International Workshop on Description Logics, CEUR-WS, vol. 53, 2003, pages 1-12.
- A. Palacios. CGU: A Common Graph Utility for Description Logic Reasoning
and Conjunctive Query Optimization. MMath thesis, School of Computer Science,
University of Waterloo, 2005.
- D. Toman and G. E. Weddell. "On Reasoning About Structural Equality
in XML: A Description Logic Approach," Theoretical Computer Science, 336(1): 181-203, 2005.
- D. Toman and G. E. Weddell. "On the Interaction between Inverse Features and Path-functional Dependencies in Description Logics," In
Proc. International Joint Conference on Artificial Intelligence, 2005, pages 603-608.
- L. Stanchev and G. E. Weddell. "Index Selection for Embedded Control Applications using Description Logics," In Proc.
International Workshop on Description Logics, 2003.
- L. Stanchev. Automating Physical Design for an Embedded Control Program. PhD thesis, School of Computer Science, University of Waterloo,
2005.
- H. Yu and G. E. Weddell. "Investigations in Tree Locking for Compiled Database Applications," In Proc. CASCON, 2004, pages
217-231.
Related Links
Prof. David Toman's research
Prof. Grant Weddell's research