Embedded DBMS | Database Research Group | UW

[Please remove <h1>]

Commercial relational engines are a multi-billion dollar industry, and are testimony to the reductions in the cost of developing and maintaining information systems that derive from using the SQL language for managing data. This research is guided by the hypothesis that comparable reductions in the cost of developing and maintaining embedded software systems could also ensue from using SQL. This induces a repository style of architecture for an embedded system in which the software becomes a collection of subsystem or feature clients that submit SQL requests to a common database. As usual, the requests, e.g. SQL queries, are based on an understanding of the data in terms of a global database schema.

To realize the full benefit for cost savings, it is important that this global schema continues to abstract all knowledge of internal data encoding or external data sources, and that it remains the job of database technology to support this interface by an appropriate translation of the SQL requests. But to compete with the largely current practice of using low-level languages such as C, new or improved capabilities with database technology are needed in a number of areas. First, there is a need for improving query optimization since, as suggested, query plans generated by query optimizers must be comparable in performance to low level code written directly by expert programmers. Second, the unique properties of embedded systems that affect the indexing and encoding of data have to be taken into consideration via enhancement in schema languages and in reasoning about integrity constraints. And third, there is a need for alternative transaction models. For embedded software, subsystems implementing different features can have undesirable interactions that are not prevented by following traditional concurrency control protocols. To this end, we concentrate on advanced concurrency protocols, and on incorporating dynamic knowledge of database interaction by clients in database schema.

Better query optimizers

Recall that query plans generated by query optimizers must be comparable in performance to low level code written directly by expert programmers. This places much greater demands on the sophistication of various facets of query optimization, including: 1) physical data models and data encoding, 2) semantic query optimization, 3) view integration capabilities, and 4) conceptual data models and modeling. A common thread with these issues relates to a variety of ongoing work on developing and using description logics as a lingua franca for capturing the underlying relevant knowledge to query optimization, and on the underlying technology of tableaux based theorem proving [1, 2]. For example, we are currently investigating the overlap between reasoning involved in query optimization based on query graphs and in tableaux algorithms for reasoning about subsumption in description logics [3]. In both cases, an underlying graph-like artifact called a tableau is created, searched and modified. The benefits derive in part from the abundance of reductions of various database schema languages to sentences in description logics. Relational schema, ER models, UML class and object diagrams, object-relational schema, and the first two levels of OWL are examples of such languages. In related work, we are also interested in the development of particular dialects of description logics that are more suited to use in query optimization [4, 5].

Physical data modeling

Again, there are a number of properties of embedded systems that affect the indexing and encoding of data. One is that a greater appreciation for the workload becomes possible with the deployment of a dedicated database server for a particular embedded system. It now becomes reasonable to assume, e.g., that SQL operations are specified at compile time, and that more specific performance requirements can also be determined at compile time for particular requests. Decisions on the selection of appropriate indexing can therefore be driven more by threshold requirements, e.g., that dictate logarithmic or even constant runtime for any plan that might be chosen for a particular query. Another is the need for more sophisticated data integration capabilities in which sources other than primary data storage can be specified. (Embedded systems are invariably deployed as components of larger distributed systems that will need to exhibit appropriate emergent behavior.) And yet another property is that primary data storage can be comprised of main or flash memory, which in turn leads to additional issues in the selection of data structures. Current research is focused on automating the selection of local data encoding, including the use of main or flash memory, on description logic based data integration and on complexity based approaches to query optimization [6, 7].

On a related topic, we are continuing work on defining m SQL, a subset of SQL with a top-k operator for which it is always possible to find data encodings and indices together with query plans that satisfy logarithmic bounds on execution time. Note that embedded systems can require part of the query and update workload to satisfy such bounds; this is in contrast to mainstream work in query optimization and physical design for applications in information systems. As a consequence, the usual case in which much of query optimization is dominated by join order selection is replaced by an alternative pattern recognition problem for queries in m SQL, and the typical goal that "best use" is made of the available store is replaced with an alternative goal of minimizing the store required to preserve the efficiency of query evaluation [7].

Dynamic aspects of client/server interaction

An embedded software system can have subsystems that require coordinated access to a database in ways that are not currently enforced or supported by the relatively simple interaction protocols and schema languages that underlie current database servers. For communications software in particular, subsystems implementing different features can have undesirable interactions that are not prevented by following such protocols, regardless of the level of understanding for the meaning of the underlying data that can be captured in current database schema languages. There is a need for database servers to be much more client aware in terms of the dynamic aspects of their interaction with the server. One important consequence is that this enables the use of deadlock free locking protocols and a consequent decoupling of concurrency control from data recovery, a necessity for many embedded systems in which logging and general abort capabilities would otherwise lead to unacceptable runtime overhead [8].

In general, we have realized that database schema must be capable of a deeper integration of both static and dynamic constraints, and that other components of a database system must in turn be able to use these constraints in compiling SQL-based client-engine interaction with a proficiency that rivals expert low-level programming. As a consequence, we are collaborating with other colleagues on combining work in the areas of description logics, dynamic logics and model checking, with the goal of capturing relevant knowledge and of providing the necessary reasoning services to these components.

References

  1. D. Toman and G. E. Weddell. "Query Processing in Embedded Control Programs," In Proc. International Workshop on Databases in Telecommunications, 2001, pages 68-87.
  2. H. Liu, D. Toman and G. W. Weddell. "Fine Grained Information Integration with Description Logics," In Proc. International Workshop on Description Logics, CEUR-WS, vol. 53, 2003, pages 1-12.
  3. A. Palacios. CGU: A Common Graph Utility for Description Logic Reasoning and Conjunctive Query Optimization. MMath thesis, School of Computer Science, University of Waterloo, 2005.
  4. D. Toman and G. E. Weddell. "On Reasoning About Structural Equality in XML: A Description Logic Approach," Theoretical Computer Science, 336(1): 181-203, 2005.
  5. D. Toman and G. E. Weddell. "On the Interaction between Inverse Features and Path-functional Dependencies in Description Logics," In Proc. International Joint Conference on Artificial Intelligence, 2005, pages 603-608.
  6. L. Stanchev and G. E. Weddell. "Index Selection for Embedded Control Applications using Description Logics," In Proc. International Workshop on Description Logics, 2003.
  7. L. Stanchev. Automating Physical Design for an Embedded Control Program. PhD thesis, School of Computer Science, University of Waterloo, 2005.
  8. H. Yu and G. E. Weddell. "Investigations in Tree Locking for Compiled Database Applications," In Proc. CASCON, 2004, pages 217-231.

Related Links

Prof. David Toman's research

Prof. Grant Weddell's research