AUCQL is only a prototype, and an incomplete one at that.
Many features have yet to be implemented.
- Full SELECT. Only a limited part of SQL's SELECT statement
has been implemented. The GROUP BY, HAVING, and ORDER BY clauses have
not been implemented.
SQL types and type checking have not been implemented.
WHERE clause expressions have been simplified, as have
SELECT clause expressions.
Nested SELECTs have also not been implemented.
In the opinion of some of the authors (OK, only Curtis's :-)),
a full implementation of the SELECT statement may not even be desirable
for semistructured databases.
The end user will be a WWW user, not a database programmer.
For WWW users, search will most likely be the most important part
of a semistructured query language.
For them, a GUI or search-engine style of query language
is preferable to SQL.
Nor is SQL desirable for hard-core programmers.
An object-oriented API with hooks into
AUCQL's algebraic operations would be better.
So complete implementation of SQL's SELECT will be put on hold.
- Path indexes. We have yet to address the issue of
building path indexes in property space.
- Culling intermediate results.
There is one problematic query for the current prototype of AUCQL.
The query is quite short.
SELECT *
FROM ()* All,
COALESCE(TRANSACTION_TIME, All) TT;
This selects everything in the database and computes the transaction time (TT) for it.
This is a problem because all the paths to every node must
be computed and then retained for the coalescing.
In other operations, the paths are computed, used, and then discarded
(so only one set of variable assignments is ever in memory).
But coalescing needs to know about all the paths between a pair
of nodes, so all the paths must be retained.
So coalescing is an expensive but necessary operation in
semistructured databases,
just as it is in relational databases.
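To illustrate why every path has to be kept, here is a minimal Python sketch of coalescing. It is not AUCQL's code. It assumes each path between a pair of nodes carries one half-open transaction-time period [start, end) with integer endpoints, and that adjacent periods are merged; the names coalesce and coalesce_paths are hypothetical.

```python
from collections import defaultdict

def coalesce(periods):
    """Merge overlapping or adjacent half-open periods [start, end)."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous period: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def coalesce_paths(paths):
    """paths: iterable of (source, target, (start, end)), one per path.

    Every period for a node pair is retained in memory until the
    input is exhausted, because a later path may bridge two earlier ones.
    """
    by_pair = defaultdict(list)
    for source, target, period in paths:
        by_pair[(source, target)].append(period)
    return {pair: coalesce(ps) for pair, ps in by_pair.items()}
```

The dictionary by_pair is the cost: it grows with the number of paths, whereas the other operations can discard each path as soon as it has been used.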
- Name space confusion.
This is more of a language issue. What is a variable vs. what
is a required NAME property?
Consider the following expressions.
FROM movie Movie,
movie.review Review1,
Movie.review Review2,
Move.review Review3,
Movie.review Nodes
Variable Review1 is independent of variable
Movie, but Review2 extends the path in variable Movie.
There are also some FROM clause ordering constraints.
The variable Movie must be defined before it can be used in the
FROM clause (the compiler won't complain; it will happily generate
null values for that variable).
Review3 is also independent of Movie, but is problematic because
the user has no warning that they mistyped/misused a variable name
(the 'Move' defaults to a match on the NAME property, (NAME! Move)).
Finally, the compiler will barf on Nodes, since Nodes is now a
reserved word (the NODES operation).
But this is confusing because of the changing case sensitivity/insensitivity.
With explicit MATCH operations no such trouble exists,
but in order to be more like Lorel, we had to add some
fancy syntactic sugar, so there are namespace confusions.
It is not clear to us how to best resolve this design issue.
- Parenthesis nightmare.
We should have used {} or square brackets rather than () as the enclosing
delimiters for descriptors.
The reason is that () is massively overloaded: it is used in expressions,
in regular expressions, and now in descriptors.
So a single-token lookahead parser has no way of disambiguating
the many uses of ().
We currently do a horrible hack, err, a quite sensible munging
of the input stream (we make one pass over the input stream
prior to parsing to transform the descriptor delimiters to {}).
This should be sanitized.
- Aggregates.
The code and syntax are in place for aggregates, but we haven't
debugged them yet. For some reason, a semantic error is generated.
- SELECT DISTINCT.
Should be simple to add (hash the output lines).
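A minimal Python sketch of the "hash the output lines" idea follows. It is not AUCQL's code: the function name distinct_lines is hypothetical, and it assumes the result is available as a sequence of text lines.

```python
import hashlib

def distinct_lines(lines):
    """Yield each output line the first time it appears, in order."""
    seen = set()
    for line in lines:
        # Store a fixed-size digest instead of the whole line.
        digest = hashlib.sha1(line.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield line
```

Only the digests are kept, so memory grows with the number of distinct lines rather than with their total length.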
- GROUP BY, HAVING, ORDER BY.
After aggregates and SELECT DISTINCT are implemented...
- Sanity checking on dimensions.
The issue here is that some dimensions, e.g., time and security,
need special operations, e.g., OVERLAPS. A few of these are
built into AUCQL, but semantic checking to ensure that operations
are only used properly is not currently in place.
So if you want to, you can write '12 OVERLAPS 14' and most likely
the run-time evaluator will die a horrible death.
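The kind of check that is missing might look like the following Python sketch. It is not AUCQL's code. It assumes a period is a pair of integers (start, end) read as half-open, and the names SemanticError, is_period, and overlaps are hypothetical. A compiler would apply the same test to operand types rather than to values.

```python
class SemanticError(Exception):
    """Raised when an operation is applied to the wrong kind of operand."""

def is_period(value):
    """True if value looks like a (start, end) period with start <= end."""
    return (
        isinstance(value, tuple)
        and len(value) == 2
        and all(isinstance(v, int) for v in value)
        and value[0] <= value[1]
    )

def overlaps(left, right):
    """OVERLAPS on half-open periods, rejecting non-period operands."""
    for operand in (left, right):
        if not is_period(operand):
            raise SemanticError(f"OVERLAPS needs a period, got {operand!r}")
    return left[0] < right[1] and right[0] < left[1]
```

With such a check, overlaps(12, 14) is rejected with a SemanticError instead of reaching the evaluator.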
- Testing.
No extensive, rigorous testing has been done (that's what the
WWW interface is for!), so some stuff just may be buggy.
We'll fix it if you find it.
Just e-mail Curtis.Dyreson at usu.edu.
- FIXED Nov. 23 1998.
Nested operations.
Currently, you can't do the following, which you should
be able to do.
MATCH(MATCH(...
Instead you have to use an intermediate variable.
MATCH(...) X
MATCH(X, ...
On the TODO list.
- FIXED Nov. 23 1998.
Cycles.
We assume that the input is an acyclic semistructure.
Cycles will cause a problem (of non-termination) when doing certain
Kleene closure matching, but are OK for all other operations.
Cycles can be broken by marking visited nodes to prevent revisiting,
which we plan to add in the future.
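Marking visited nodes can be sketched in Python as follows. This is not AUCQL's code. It assumes the semistructure is a dictionary mapping each node to a list of (label, child) edges, and it collects the nodes reachable under a Kleene closure rather than enumerating paths; the name closure_reachable is hypothetical.

```python
def closure_reachable(graph, start):
    """Return the nodes reachable from start by zero or more edges.

    graph: dict mapping node -> list of (label, child) edges.
    Each node is marked when first seen, so a cycle cannot cause
    the traversal to loop forever.
    """
    visited = {start}
    stack = [start]
    order = []
    while stack:
        node = stack.pop()
        order.append(node)
        for _label, child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                stack.append(child)
    return order
```

On a cyclic input such as a -> b -> c -> a the traversal visits each node once and stops.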
- PARTIALLY FIXED Nov. 23 1998.
Type checking.
Only limited type checking is performed.
More advanced type checking is needed to prevent brain-dead code
like the following.
FROM DIMENSION(TRANSACTION_TIME, ...) TT,
MATCH(TT, ...
Some types are checked, just not all currently.
Curtis E. Dyreson,
Michael H. Böhlen, and
Christian S. Jensen
© 1998-2000. All rights reserved.