A Top-Down Proof Procedure for Generalized Data Dependencies

Dependency theory allows the expression and modelling of constraints that the data must satisfy in order to reflect correctly the world that a relational database intends to describe. Since the introduction of functional dependencies by Codd ([Cod72]), many kinds of dependencies have been studied in the litterature, and a lot of work has been carried out in the late 70’s and early 80’s. Database dependencies theory is still an active area of research [SF00] [Her95] [LL97a] [LL97b] [LL98]. Functional and multivalued dependencies are the most known classes of data dependencies. In practice, these two kinds are sufficient in general to express constraints ([Ull88]). Nevertheless, more general classes have been introduced, with the purpose of finding a uniform way to express constraints ([BV81],[SU82]). This paper deals with the class commonly known for generalizing most of the dependencies: that of tuple-generating and equality-generating dependencies (TGDs and EGDs) ([BV84b]). For a survey on this general class of dependencies, we refer the reader to [LL99] or [FV86].

The central problem is the implication problem, which is to decide whether a dependency is logically implied by a given set of dependencies. A process that could solve this problem would provide a solution to find the minimal cover of a set of dependencies, to decide whether a dependency is redundant within a set, useful during the constraint acquisition stage, etc. A procedure has already been designed in [BV84b] for that purpose: the well-known chase. Unfortunately, as the implication problem for TGDs and EGDs is semi-decidable, the chase is only a proof procedure, and therefore the process may run forever. As we argue in this paper, the chase is clearly a bottom-up procedure: from hypotheses to conclusion. Also, the chase entails the dynamic creation of new constants in the general case of TGDs.

We introduce a new proof procedure which is top-down: from conclusion to hypothesis that is goal-directed. The originality of this procedure is that it does not act as classical theorem proving procedures, by requiring a special form of the dependencies set, such as clausal form, obtained after Skolemization. We show, with our procedure, that this step is useless, and that the notion of piece allows infering directly on the dependencies set.

Therefore, our top-down chase is not simply the usual chase reversed, but a new way of solving the implication problem. The fact it can be performed top-down is the first contribution of this paper. The second contribution is to avoid dynamic creation of constants, as well as Skolemization of the dependencies set, usually applied on the original knowledge base prior to top-down proofs. This is realized by taking advantage of the form of the dependencies. To our knowledge, this has not been realized before. Indeed, dynamic creation of constants can be costly in proof procedures, such as the chase, in order to take into account the effect of existential quantifiers.

While it is true that top-down approaches can take exponential time longer w.r.t. bottom-up approaches, many reasons allow us to think that the proof procedure presented in this paper is efficient. These arguments will be detailed in the last section. The efficiency w.r.t. the bottom-up chase is currently being assessed in details.

More recently, constrained dependencies ([Mah94], [BCW95], [Mah97]) and ordered dependencies [Ng99] have been introduced. They originate in the constraint programming field and permit expression of semantic relations on variables, thus giving them an interpretation. The chase procedure has been redesigned in [MS96], still in a bottom-up way, in order to deal with constrained tuple-generating dependencies. This work in the dependency theory gives new perspectives for the top-down chase procedure we present.

The top-down chase originates in the conceptual graphs model, which is a knowledge representation model introduced by Sowa ([Sow84]). The base model has been extended with graph rules and an inference method, called piece resolution ([SM96]). The logical roots of this process have been studied in [CS98], and constitute the basis of the top-down chase. Proofs of the lemmas and theorems of this paper are therefore derived from these two last-mentioned.

Section 2 describes the framework and the implication problem for data dependencies. We sketch the existing (bottom-up) chase. In section 3 the top-down chase is explained. Section 4 closes the paper with some concluding remarks.

2 The Framework

The following definitions are for the most part taken from [BV84b], though simplified for the purpose of this paper. The first subsection states some assumptions we make throughout this paper. The second subsection presents the necessary definitions. The third subsection is a description of the kind of dependencies this paper focus on. Note that they are known to capture the semantics of most of the dependency types studied in the litterature. The fourth subsection presents formally the implication problem and then describes the traditional chase procedure, which has been designed to solve the implication problem. The chase is clearly a bottom-up mechanism, from hypotheses to conclusion.

2.1 Preliminaries

For sake of clarity, we assume several restrictions on the model. First, we assume that the universal relation assumption holds, i.e. the database is modelled with only one relation, because it usually permits a simpler formal presentation of the approaches. Secondly, dependencies are typed (many-sorted), so attribute domains are disjoint. Thirdly, we do not address dependencies with constants, as the last one illustrated below.

TGDs and EGDs can all be expressed in first-order logic. Informally speaking, an equality-generating dependency (EGD) says that if some tuples exist in the database, then some values in these tuples must be equal. A tuple-generating dependency (TGD) says that if some tuples exist in the database, then some other tuples, whose values are not necessarily taken from the first tuples, must also exist in the database. Thus some values may be unknown.

For example, to express a classical functional dependency stating that two customers having the same name also have the same identifier, we use the following EGD which is also a functional dependency :

To express that a invoice is always related to an existing order, we use the following TGD :

By introducing constants, we can state some more specialized constraints. For example, the invoice 23 is related to an order taken by the customer 12 :

However, the framework used in this paper is that of tableaux, because it is well-suited when the universal relation assumption holds. On the other hand, this assumption is very unpractival for real-world applications, because it would imply modelling the whole database schema with only one relation. That is the reason why we do not use real-world examples throughout this paper. Note that these restrictions appear here for clarity reasons only and all the results are applicable in the unrestricted model.

2.2 Attributes, relations and tableaux

Definition 1 (Attribute) Attributes are symbols taken from a given finite set U={A₁,… ,A_n} called the universe. All sets of attributes are subsets of U . The complement of an attribute A in U is denoted by A .

Definition 2 (Relation) With each attribute A is associated an infinite set, its domain, denoted DOM(A) , such that DOM(A)∩ DOM(B)=∅ for A≠B . Let Dom=∪_A∈
UDOM(A) . For an attribute set X , an X -value is a mapping w:X→ Dom , such that w(A)∈ DOM(A) for all A∈ X . A tuple is a U -value. A relation is a set (not necessarily finite) of tuples. We use the letters t,u,v,w,… to denote tuples, and I,J,… to denote relations. For a tuple w and a set Y⊆ U , we denote the restriction of w to Y by w[Y] . For an attribute A , we do not distinguish between w[A] (which is an A -value) and w(A) (which is an element of DOM(A) ). Let I be a relation. The set of X -values in I is I[X]={w[X]|w∈ I}. The set of values in I is VAL(I)=∪_{A∈ U}I[A] .

Definition 3 (Valuation) A valuation is a mapping h:Dom→ Dom , such that a∈ DOM(A) implies h(a)∈ DOM(A) for all a∈ Dom . The valuation h can be extended to tuples and relations as follows. Let w be a tuple. Then h(w)=h∘ w (where ∘ denotes functional composition). Let I be a relation. Then h(I)={h(w)|w∈ I} . Usually, we are interested only in a small subset of Dom , for example, VAL(I) . We let h be undefined for other values, and say that h is a valuation on I . We associate with each attribute A an infinite set of constants CONS(A)⊆ DOM(A) . Let CONS=∪_{A∈ U}CONS(A). For any valuation h and a constant a∈ CONS(A) , we require h(a)=a . Let I be a relation, CONS(I) is the set of constants appearing in VAL(I).

Example 1 Let I be a relation, and w be a tuple, as shown below. Let h be a valuation on w such that a₁↦ a₁ , b₁↦ b₁ , c₁↦ c₁ and d₁↦ d₂ . Then h(w)∈ h(I) .

Definition 4 (Extension) Let I be a relation, and let h be a valuation on I . An extension of h to another relation I′ is a valuation h′ on I′ , which agrees with h on VAL(I) . If h is the identity relation on VAL(I) , then we say that h is the identity on I .

2.3 Dependencies

An equality-generating dependency (EGD) says that if some tuples exist in the database, then some values in these tuples must be equal.

Definition 5 (EGD) An Equality-Generating Dependency (EGD) is a finite pair <(a₁,a₂),I> , where a₁ and a₂ are A -values for some attribute A , and I is a finite relation, such that a₁,a₂∈ I[A] . We also call this EGD an A -EGD. A relation J satisfies <(a₁,a₂),I> if, for any valuation h such that h(I)⊆ J , we have h(a₁)=h(a₂) . Note that, if a₁=a₂ , then <(a₁,a₂),I> is trivially satisfied by every relation and the EGD is trivial.

Example 2 Consider the relation I of example 1. Then E=<(c₁,c₂),I> is an EGD, as shown below, and K satisfies E . Note that E is equivalent to the following functional dependency : A→ C.

A tuple-generating dependency (TGD) says that if some tuples exist in the database, then some other tuples, whose values are not necessarily taken from the first tuples, must also exist in the database. Thus some values may be unknown.

Definition 6 (TGD) A Tuple-Generating Dependency (TGD) is a pair of finite relations <I′ ,I> . It is satisfied by a relation J if for any valuation h , such that h(I)⊆ J , there is an extension h′ of h to I′ so that h′ (I′ )⊆ J .

Example 3 Consider the relation I of example 1. Let I′ be another relation, as shown below. Then T=<I′,I> is a TGD and K′ satisfies T .

2.4 The implication problem and the (bottom-up) chase

Let D be a set of dependencies, and d be a dependency. The implication problem is to decide whether D⊨ d , that is to determine whether d is true in every database in which each dependency of D is true. Let SAT(D) be the set of all relations, only composed of constants, that satisfy all the dependencies in D , then the implication problem is equivalent to decide whether SAT(D)⊆ SAT(d) .

The implication problem can also be considered under a different view: that of finite implication problem. The finite implication problem is to decide whether d is satisfied by every finite relation that satisfies all dependencies in D. However, as detailed below, this problem admits no proof procedure.

The chase procedure has been designed to solve the implication problem. The reader can refer to [BV84b] for a complete description. Intuitively speaking, if the dependency to be proven is the TGD <I′ ,I> , or the EGD <(a₁,a₂),I> , the chase procedure takes the relation I , and treats it as if it formed a set of tuples. Then it applies repeatedly the dependencies of D , following two distinct rules, one for TGDs, whose effect is to add tuples to the relation, and one for EGDs, whose effect is to “identify” two constants. When the dependency to be proven is a TGD, the procedure stops when obtaining the tuples of I′ . When it is a EGD, the procedure stops when obtaining the identification of a₁ and a₂ .

This mechanism has been shown to be sound and complete in [Var84]. Note that the implication problem for TGDs and EGDs is semi-decidable. Thus the chase may not stop.

The chase procedure is clearly a bottom-up (or forward chaining) one. Indeed, rule applications are generating new tuples or identification of constants. This is executed until we obtain the desired conclusion. The goal to be proven is not used to guide the process. Moreover, when applying a TGD rule whose effect is to add tuples to the relation, existential quantification always requires a costly dynamic creation of new constants.

We now show that we can apply a top-down (or backward) procedure, in which the process is goal-directed.

3 The Top-Down Chase

Depending on the type of the dependencies, the implication problem is solvable or recursively unsolvable ([BV81, Var84, GL82]). This means that in the first case, there is a decision procedure, hence an algorithm that always halt, whereas in the second case, there is a proof procedure: if the implication is true, then the process will terminate. In the other case, it might never stop. The finite implication problem, on the other hand, is not even partially solvable. Therefore, there can be no proof procedure for it.

A subset of TGDs is known as Full TGDs. These dependencies have the same form as Datalog rules. In this case, the notion of piece, stated in the introduction, is useless. Therefore, dedicated top-down theorem proving procedures can be applied to solve the implication problem involving only Full TGDs, such as Query-Subquery ([Vie86]) and OLD-Resolution ([TS86]). The principle aim of these procedures is to provide a terminating algorithm for top-down evaluation of Datalog rules. Indeed, even if the implication problem is decidable for this particular class, their top-down evaluation might not terminate. To tackle this problem, tabulation (or memoing) techniques must be applied in order to cut down looping sequences. Notice that these techniques do not affect completeness. In the general case of TGDs, however, there are unfinishing sequences that remain undetectable, due to the undecidable nature of the implication problem. In the case of Full TGDs, the implication problem and the finite implication problem coincide and are both decidable.

For this kind of dependencies, as well as all other subsets (functional and multivalued dependencies) the interest of the top-down chase is to be also applicable. However, in these particular cases, specific proof procedures shall be more efficient because they implement memoing. The top-down chase and the classical bottom-up chase are clearly needed when dependencies are more complex (i.e. not a decidable subset), and also to provide a general way to solve the implication problem. Notice that memoing can be added to the top-down chase too, but this is an implementation-level extension.

There have been a lot of work carried out also in the active database area. For a survey, we refer the reader to [WC95]. Active databases generally support the automatic triggering of updates in response to internal or external events. These updates are expressed by mean of rules which are slightly different from TGDs and EGDs. Usually, these rules can be expressed in Datalog with negation [BDR98] [AHV95]. The main difference lies in the fact that the variables occuring in the conclusion are all universally quantified, whereas in database dependencies they are existentially quantified if they do not occur in the hypothesis too.

The first subsection presents a proof procedure for TGDs only. The second subsection shows how this procedure can take EGDs into account as well, by using some reduction theorems.

3.1 A proof procedure for TGDs

Let D be a set of TGDs without constants, and let d=<J′,J> be a TGD without constants. Intuitively speaking, to decide whether D⊨ d , we start with the tuples of J′ and treat it as if it formed a relation. Let Q be this relation. Q is considered as the goal to reach. On the other hand, we add J to D , after replacing each symbol of J by a new constant, by transforming each tuple into a TGD with constants. Then we try to remove the tuples of Q, by applying successively a rule, giving each time a new goal. These rule applications may introduce new tuples. If we achieve in removing all tuples, i.e. obtaining an empty goal, then D⊨ d .

Compared with classical theorem proving methods, this proof procedure relies on a complex core rule that does not require any modification of the dependencies in D. Indeed, in order to apply the classical resolution method, one needs to rewrite D and ¬ d under clausal form. In order to do so, the Skolemization step would generate constants or functions due to the presence of existential quantifiers in the conclusion. Then the resolution method would try to generate the empty clause. When obtained, then the Herbrand theorem ensures that D ⊨ d.

In the present case, the top-down proof procedure does not require rewriting the dependencies of D nor dynamically creating new constants or functions. Indeed, the originality is to allow inferring directly on dependencies of D, thus providing a general mechanism. Notice that this can be obtained thanks to the particular form of TGDs and EGDs.

Let us now explain this process in a more formal way. We need the notion of distinct substitution that replaces each symbol of a relation by a new constant.

Definition 7 (Distinct substitution) Let J be a relation without constants. A distinct substitution ω on J is a valuation on J such that ω (a) is a new distinct constant for each a∈ VAL(J).

Now, we can add to D a set of TGDs with constants, each of them being of the form <u,∅ > , with u corresponding to a distinct tuple in ω (J) . Thus we add | ω (J)| TGDs. The following theorem states that checking whether D⊨ d can be reduced to checking whether D_ω⊨ Q , where D_ω is the result of adding the new TGDs to D , and Q is the goal. Note that the added TGDs are the only ones of D_ω having constants in it. More formally, we have :

Theorem 1 Let D be a set of TGDs without constants, and let d=<J′,J> be a TGD without constants. Let ω be a distinct substitution on J . Let ω ′ be an extension of ω on J′ such that ω ′ is the identity on VAL(J′)−VAL(J) . Let D_ω denote D∪ { <u,∅ > | u∈ ω (J)} and let Q denote the TGD <ω ′(J′),∅ > with possibly some constants. Then D⊨ d if and only if D_ω⊨ Q . Q is also called a goal.

Example 4 Let d be a TGD without constants, as shown below.

Let ω be a distinct substitution on J such that a₄↦ a , a₅↦ a′ , b₃↦ b , b₄↦ b′ , c₃↦ c , c₄↦ c′ and d₃↦ d , with a,a′,b,b′,c,c′,d being constants. Let ω ′ be an extension of ω on J′ such that ω ′ is the identity on VAL(J′)−VAL(J) . Let D_ω denote D∪ { <u,∅ > | u∈ ω (J)} :

We now describe the main step of the process. Given a TGD and a goal, this rule constructs a new goal. We need to introduce the notion of piece. A piece is a set of tuples that are semantically linked. When applying the rule, we will see that these tuples can be treated at the same time. This notion is an alternative to the classical Skolemization process that replaces unknown values by constants, prior to using classical first-order logic provers. A piece is a set of tuples which share unknown values, and are therefore treated as a whole. This is the originality of the top-down chase.

Definition 8 (Piece) Let L be a relation, and v⊆ VAL(L) . Pieces of L in relation to v are defined in the following way : for all tuples u,v∈ L , u and v belong to the same piece if and only if there is a sequence of tuples (u₁,… ,u_m) of L such that u₁=u and u_m=v and ∀ i=1… m−1 , ∃ a∈ v such that a∈ VAL(u_i) and a∈ VAL(u_i+1) . As a corollary, if a tuple does not share any element of v with another, it forms itself a piece. By construction, the set of pieces is unique for a given relation L and a given set of symbols v.

Example 5 Let L be a relation, as shown below, and v={a₂,c₁,d₂} . Then there are three pieces of L in relation to v : P₁ is containing tuples of L that are sharing the symbols a₂ and c₁ , P₂ is containing tuples of L that are sharing the symbol d₂ , and P₃ is containing the last tuple of L that does not share any of these symbols and therefore is itself a piece.

Definition 9 (core rule) Let T=<I′,I> be a TGD of D_ω and Q_n=<V,∅ > be a goal, such that (VAL(V)∩ (VAL(I)∪ VAL(I′)))\ CONS=∅ . If there is a valuation h on I∪ I′∪ V such that:

then the result of the application of the rule, denoted Q_n+1=T(T,Q_n) , is Q_n+1=<h(V)\ P∪ h(I),∅ > . Q_n+1 becomes a new goal. Thus it is obtained by removing P from h(V) and by adding h(I) . Note that there may be several possible TGDs obtained by an application of the rule, depending on the valuation h . In a sense, the piece notion allows us to group together some tuples according to some particular symbols (which correspond in logic to existentially quantified variables of T) they share.

Example 6 Consider the TGD T of example 3, and the TGD Q of example 4, here called Q_n, as shown below. Let V=ω ′(J′). Therefore, Q=<V,∅>. We can check that (VAL(V)∩ (VAL(I)∪ VAL(I′)))\ CONS=∅ . Let h be a valuation on I∪ I′ ∪ V such that ∀ a∈ VAL(I)−VAL(I′) , h(a)=a and ∀ a∈ VAL(I′)−VAL(I) , h(a)=a . Therefore h(a₂)=a₂ . h is also defined by: h(a₄)=a₂ , h(b₁)=b′, h(b₂)=b , h(c₁)=c′ , h(c₂)=c , and h(d₅)=d₂ .

There are two pieces of h(V) in relation to (VAL(I′)−VAL(I))\ CONS={a₂} , which are the first tuple of h(V) and the last two tuples of h(V) . The second piece P is such that P⊆ h(I′) . We construct Q_n+1=T(T,Q_n) . Let V′=h(V)\ P∪ h(I) :

The following theorem states that a goal is implied by a set of TGDs if and only if there is a sequence of rule applications such that the sequence starts with the original goal, and gives the empty goal as an end result. The proof is detailed at the end of the paper.

Theorem 2 (Soundness and Completeness) D_ω⊨ Q if and only if there is a sequence ((T₁,Q₁),… ,(T_n,Q_n)) of rule applications, such that T₁,… T_n∈ D_ω and Q₁=Q and Q_n=<∅ ,∅ > and ∀ i∈ [1..n−1] , Q_i+1=T(T_i,Q_i) . Note that the last TGD used, T_n , is of the form <u,∅ > .

It is important to notice that whenever D_ω⊨ Q, theorem 2 ensures that there is at least one sequence of rule applications, but it does not give a method to find it. Therefore, to implement the top-down chase, we need to add a search strategy (breadth-first or depth-first for example). In order to illustrate the need for a search strategy, let us detail in the following three kinds of sequences for the same set of TGDs D_ω and the same goal Q. The first one stops and ends with an empty goal, thus proving the implication. The second one does not terminate (although in this particular case, the loop might be detected and stopped), and the third one is stuck (no rule can be applied any more) and thus needs backtracking whenever the search algorithm is depth-first.

In the following examples, we will consider that D_ω contains the TGD T of example 6, as well as another TGD S detailed below. Remember also that in 4 it has been shown that in order to prove that a dependency d is implied by a set of dependencies D, we construct a goal to be proven (i.e. without hypothesis), and we add some TGDs (also without hypothesis) to D, giving D_ω. Therefore, in the present case, D_ω contains T as well as two more TGDs presented in example 6. Finally, the goal to be proven is Q, already presented in example 4 . D_ω is detailed below:

Example 7 The following example illustrates a successful proof sequence of rule applications, ending with the empty goal. Consider the first rule application of example 6, giving a new goal Q₁=T(T,Q) to be proven :

Consider the TGD U₁. Let U₁=<I′,I> and Q₁=<V₁,∅ >. We can check that (VAL(V₁)∩ (VAL(I)∪ VAL(I′)))\ CONS=∅ . Let h be a valuation on I∪ I′ ∪ V₁ such that ∀ a∈ VAL(I)−VAL(I′) , h(a)=a and ∀ a∈ VAL(I′)−VAL(I) , h(a)=a (trivial). h is also defined by: h(a₁)=a and h(d₁)=d . Therefore we have :

As there is no element in (VAL(I′)−VAL(I))\ CONS, each tuple of h(V₁) is itself a piece. The third tuple P is such that P ⊆ h(I′) . We construct Q₂=T(U₁,Q₁) . Let Q₂=<h(V₁)\ P∪ h(I), ∅> :

Now consider the TGD U₂. Let U₂=<I′,I> and Q₂=<V₂,∅ >. We can check that (VAL(V₂)∩ (VAL(I)∪ VAL(I′)))\ CONS=∅ . Let h be a valuation on I∪ I′ ∪ V₂ such that ∀ a∈ VAL(I)−VAL(I′) , h(a)=a and ∀ a∈ VAL(I′)−VAL(I) , h(a)=a (trivial). h is also defined by: h(d₂)=d′ . Therefore we have :

As there is no element in (VAL(I′)−VAL(I))\ CONS, each tuple of h(V₂) is itself a piece. The first tuple P is such that P ⊆ h(I′) . We construct Q₃=T(U₂,Q₂) . Let Q₃=<h(V₂)\ P ∪ h(I), ∅> :

Following the same schema, we apply the very same rule to Q₃, giving Q₄=<h(V₃)\ P ∪ h(I), ∅> :

Example 8 The following example illustrates a non-terminating sequence of rule applications. Consider the first rule application of example 6, giving the new goal Q₁=T(T,Q) to be proven, as well as the TGD T :

Let T=<I′,I> and Q₁=<V₁,∅ >. We can check that (VAL(V₁)∩ (VAL(I)∪ VAL(I′)))\ CONS={a₁, d₁, d₂} . Therefore, we must rename these symbols either in T or in Q₁. Let now Q₁ be :

Let h be a valuation on I∪ I′ ∪ V₁ such that ∀ a∈ VAL(I)−VAL(I′) , h(a)=a and ∀ a∈ VAL(I′)−VAL(I) , h(a)=a . Therefore h(a₁)=a₁ , h(a₂)=a₂, and h(d₁)=d₁ . h is also defined by: h(a₃)=a₂ , h(b₁)=b′ , h(b₂)=b , h(c₁)=c , h(c₂)=c′ , h(d₃)=d₂ and h(d₄)=d₂ . Therefore we have :

There are two pieces of h(V₁) in relation to (VAL(I′)−VAL(I))\ CONS={a₂} , which are the first tuple of h(V₁) and the last two tuples of h(V₁) . The second piece P is such that P⊆ h(I′) . We construct Q₂=T(T,Q₁) . Let V₂=h(V₁)\ P∪ h(I) :

Therefore Q₃=Q₁ w.r.t. symbol renaming, thus the sequence is entering a loop. Note that in this particular case, it is simple to detect and thus to stop. However, this is an implementation-level treatment.

Example 9 The following example illustrates a stuck sequence (no rule can be applied any more) that thus needs backtracking whenever the search algorithm is depth-first. Consider again the goal Q₁=T(T,Q) to be proven, as well as the TGD S :

Let S=<I′,I> and Q₁=<V₁,∅ >. We can check that (VAL(V₁)∩ (VAL(I)∪ VAL(I′)))\ CONS={a₁, d₁, d₂} . Therefore, we must rename these symbols either in S or in Q₁. Let now Q₁ be :

Let h be a valuation on I∪ I′ ∪ V₁ such that ∀ a∈ VAL(I)−VAL(I′) , h(a)=a and ∀ a∈ VAL(I′)−VAL(I) , h(a)=a . Therefore h(d₁)=d₁, h(d₂)=d₂ and h(d₃)=d₃ . h is also defined by: h(a₁)=a , h(a₃)=a , h(b₁)=b , h(b₂)=b′ , h(c₁)=c′ , h(c₂)=c , h(d₄)=d₃ and h(d₅)=d₃ . Therefore we have :

There are two pieces P₁ and P₂ of h(V₁) in relation to (VAL(I′)−VAL(I))\ CONS={d₂, d₃} , which are the first (identical) two tuples of h(V₁) and the last tuple of h(V₁) . The two piece are included in h(I′). Note that we can save one sequence step by removing both pieces at the same time instead of performing two steps with the same valuation h. We construct Q₂=T(T,Q₁) . Let V₂=h(V₁)\ P₁ \ P₂ ∪ h(I) :

One can now verify that there is no applicable rule to Q₂ amongst T, S, U₁ and U₂.

3.2 Dealing with EGDs

EGDs add some difficulties because they include an equality predicate. Nevertheless, it has been shown in [BV84b] that the implication problem for TGDs and EGDs without constants is reducible to the implication problem for TGDs without constants. Thus, our core rule is sufficient.

Let e be the A -EGD e=<(a₁,a₂),J> . Let w₁ be a tuple such that w₁[A]=a₁ and for all attributes B∈ A , w₁[B]∉J[B] . Let w₂ be a tuple such that w₂[A]=w₁[A] and w₂[A]=a₂ . We associate with e two TGDs. e₁ is <w₁,J∪ {w₂}> , e₂ is <w₂,J∪ {w₁}> . For a given set D of TGDs and EGDs, let D^* be the result of replacing each EGD e in D by its two associated TGDs e₁ and e₂ .

Example 10 Consider the EGD E of example 2. We associate to E two TGDs e₁ and e₂ , as shown below.

The following theorems allow us to ignore the presence of EGDs by reducing the implication of a dependency by a set of TGDs and EGDs to the implication of a TGD by a set of TGDs only:

Theorem 3 ([BV84b]) Let D be a set of TGDs and EGDs without constants and let d be a TGD without constants. Then D⊨ d if and only if D^*⊨ d .

Theorem 4 ([BV84b]) Let D be a set of TGDs and EGDs without constants. Let e be a non-trivial A -EGD without constants and let e₁ and e₂ be its two associated TGDs. Then D⊨ e if and only if D^*⊨ e₁ and there is a non-trivial A -EGD in D .

Thus, we can now generalize theorem 1 and give the final following theorem that reduces the implication problem of TGDs and EGDs to the existence of a top-down chase using the rule:

Theorem 5 Let D be a set of TGDs and EGDs without constants:

4 Conclusion and remarks

As a conclusion, we discuss several points. First we compare the top-down chase with a backward formal system and with other proof procedures. Then we discuss the contributions of our work. We conclude this paper by lifting some restrictions on the model and pointing out some perspectives.

4.1 Comparison with formal systems

Many formal systems have been studied for data dependencies ([BFH77], [Sci82], [BV84a], [SU82]). In [BV84a], some formal systems for TGDs and EGDs are studied. Two of these systems are backward, but only one, namely the T3 formal system, has some similarities with the top-down chase. We sketch here the main differences. We refer the reader to this paper for more details about formal systems for TGDs and EGDs.

The T3 system deals with TGDs and their unknown values (i.e. non-constant symbols of VAL(J′)−VAL(J) ) in the following way : the process starts with a goal tuple Q and applies a TGD T=<J′,J> by making J′ and J coincide. Typically this is achieved using especially the collapsing, augmentation and projection rules. When J′ and J coincide, it uses the transitivity rule to derive a new goal. A derivation tree is not linear , as it is the case for SLD-resolution.

The top-down chase leads to a linear inference in the sense that it uses directly TGDs from the base without first applying rules on them. There is only one rule. Obviously, it is a more complex step.

4.2 Discussion

To our knowledge, this is the first time a top-down proof procedure is used to solve the implication problem for dependencies. As such, a first constribution is to have shown that this can be performed top-down.

The top-down chase avoids rewriting the dependencies set under clausal form, and avoids the corresponding Skolemization process, necessary in order to use classical top-down theorem proving procedures. In the top-down chase, only the preliminary transformation of the TGD to be proven requires creating as many new constants as universally quantified variables appearing in it. Thus this step is unsignificant. By providing a way to infer directly on the original form of dependencies, by way of the notion of piece, the top-down chase is conceptually speaking a simpler approach, this is our second contribution.

Compared to the classical chase, it does not entails the dynamic creation of new constants. This is our third contribution.

Note that the top-down chase is not the usual chase simply reversed. The core rule is totally different from those used in the bottom-up approach of [BV84b]. Indeed, it has been shown that there is a strong relationship between the bottom-up chase and resolution with paramodulation [BC90][NR01]. Whereas the latters acts on the Skolemized set of clauses, the top-down chase does not need this prior step.

We are currently investigating in details the efficiency of the top-down chase. Actually, this is not trivial to assess, because of the various parameters that come into play. We agree with [BMSU86] on the fact that “ it is unreasonable to expect proofs of optimality, or even superiority of one algorithm over another, in all cases ”.

First of all, there is no need for Skolemizing the dependencies set nor dynamic constant creation. This might increase the efficiency. Indeed, on the contrary of the bottom-up chase, existentially quantified variables in the TGD conclusions never transform into constants. The only new symbols that are created are those of the current goal that are temporary variables. The goal can grow by the addition of hypotheses of the dependencies used by the T−rule. Notice that these symbols are, on the other hand, deleted whenever they are mapped to existing constants, or whenever some pieces are removed from the goal. Depending on the search strategy, they are also deleted from memory whenever the sequence of rule applications is stuck or successful (and needs for example backtracking when doing depth-first search).

On the other hand, top-down approaches can take exponential time longer w.r.t. bottom-up approaches. However, the notion of piece performs some dynamic optimizations by dealing with groups of atoms instead of one at the time, thus failures are detected earlier w.r.t. classical top-down procedures, and irrelevant goals are cut down earlier. This feature dramatically reduces the number of backtracks. As an illustration, we have compared our method with Prolog over 5000 proofs generated at random, varying each time the size and the content of the dependencies set, and the size of the dependencies. The top-down chase has been implemented on top of the CoGITo platform [GS98], which is a set of tools for conceptual graphs management. These tests showed that the top-down chase provides a logarithmic average improvement for the number of backtracks as well as for the proof duration (when the process stops), in spite of a non-optimized implementation. This reduces drastically the drawback of the top-down approach, and gives the top-down chase a fair practical level of efficiency. The fact that the search for the right symbol mappings has an exponential complexity does not practically seem to act as a brake. Indeed, whereas for SLD-Resolution, some irrelevant goals may lead to exploring a branch of the resolution tree, the top-down chase detects failures earlier and allows saving the average time.

However, either approach could run faster in practice on given data [BMSU86]. That is why we think presenting an example in which the top-down chase would be more efficient than the bottom-up chase would not be significant and would not provide any worthwhile addition to the discussion.

For all these reasons, we plan to implement the chase of [BV84b] in order to practically compare their efficiency over a random dataset. We think dependencies might be divided into classes for which one or the other approach would be better but identifying them is still an open problem.

We must mention that some optimisations to the bottom-up approach have been made by magic-sets in [MS96]. The principle of magic sets ([BMSU86]) is to perform at compile time some optimizations that are usually performed at run-time, by rewriting the set of dependencies before inference. This leads to avoiding the generation of irrelevant facts during the process, which is the essence of the top-down approach. We shall take these optimizations into account for the implementation.

4.3 Extension of the model

As already stated, we assumed some restrictions on the model that can be easily lifted. The reduction of EGDs to TGDs also works in this unrestricted model, which is stated in [BV84b]. Thus all the results can be extended, as they are in [CS98] for piece resolution.

There has been a renewed interest in data dependency theory with the introduction of constrained dependencies and ordered dependencies. These type of dependency can express a wide variety of constraints on the data ([BCW95]), besides generalizing most of the temporal dependencies of the taxonomy presented in [JS92]. The chase procedure has been redesigned, still in a bottom-up way, in order to deal with constrained tuple-generating dependencies ([MS96]), which are constrained functional dependencies. Our procedure can serve as a basis for the design of a top-down chase for constrained tuple-generating dependencies.

References

Appendix

Proof of theorem 1

Proof 1 (If). Let K∈ SAT(D) be a relation. Let k be a valuation such that k(J)⊆ K . So, by construction, K∈ SAT({ <u,∅ > | u∈ k (J)}) . It follows that K∈ SAT(D_k) . As D does not contain constants, the only difference between D_k and D_ω lies in the constants respectively introduced by the valuation k and ω . Therefore, we can construct a relation K_ω such that K_ω∈ SAT(D_ω) and K is obtained from K_ω by renaming constants of ω (J) in order to match with k(J) . Let k′ be an extension of k on J′ such that k′ is the identity on VAL(J′)−VAL(J) . By construction, as there is a valuation h on ω ′(J′) such that h(ω ′(J′))∈ K_ω , there is a valuation h′ on k′(J′) identical to h on VAL(J′)−VAL(J) , such that we have h′(k′(J′))∈ K . Hence K∈ SAT(d) , and D⊨ d .

(Only If). Let K∈ SAT(D_ω) be a relation. (i) There is a valuation k on ω (J) such that k(ω (J))⊆ K . As ω (J) contains only constants, then ω (J)⊆ K . (ii) As D⊆ D_ω , then D_ω⊨ D and K∈ SAT(D) . As D⊨ d , then K∈ SAT(d) . As ω (J)⊆ K (cf. i) and K∈ SAT(d) , then there is a extension k′ of k on ω ′(J′) such that k′(ω ′(J′))⊆ K . Hence K∈ SAT(Q) , and D_ω⊨ Q .

Lemma 1 Let Q=<I,∅ > and Q′ =<H,∅ > be two goals and T=<J′,J> be a TGD. If Q′=T(T,Q) then {Q′,T}⊨ Q .

Proof 2 Let K∈ SAT({Q′,T}) , then K∈ SAT(Q′) and K∈ SAT(T) . (i) By construction, there is a valuation h such that Q′=<h(I)\ P∪ h(J),∅ > , so there is a valuation k such that k(h(I)\ P∪ h(J))⊆ K . We also have k(h(I)\ P)⊆ K , so K∈ SAT(<h(I)\ P,∅ >) . (ii) Trivially, we have K∈ SAT(<h(J),h(J)>) , and also K∈ SAT(<h(J),h(I)\ P∪ h(J)>) . Moreover, we have K∈ SAT(T) , thus K∈ SAT(<J′,J>) and we also have K∈ SAT(<h(J′),h(J)>) and, as P⊆ h(J′) , K∈ SAT(<P,h(J)>) . Therefore K∈ SAT(<P,h(I)\ P∪ h(J)>) . (iii) Suppose K∉SAT(Q) . Then there is not any valuation k such that k(I)⊆ K . Therefore, for every valuation k such that k(h(I)\ P)⊆ K , we have k(P)⊈K ( k exists because K∈ SAT(<h(I)\ P,∅ >) - cf. i). As there is a valuation k such that k(h(I)\ P∪ h(J))⊆ K (cf. i), then K∉SAT(<P,h(I)\ P∪ h(J)>) , which leads to a contradiction (cf. ii). So K∈ SAT(Q) and {Q′,T}⊨ Q .

Lemma 2 Let Q and Q′ be two goals and T be a TGD. If {Q′,T}⊨ Q, but Q′⊭Q, then there is a rule application B=T(T,Q), such that Q′⊨ B.

Proof 3 For this proof, we shall swap to the first-order logic framework. Φ will denote, for a given TGD, the equivalent formula in FOL. {Q′,T}⊨ Q, therefore the set {Φ(Q′),Φ(T),¬ Φ(Q)} is unconsistant. SLD-Resolution allows to derive the empty clause. We are then sure that there is a linear refutation, starting from the negative clause corresponding to ¬ Φ(Q), provided that Φ(Q′) and Φ(T) are under clausal form with exactly on positive litteral.

To do that, we must rewrite Φ(Q′), Φ(T) and ¬ Φ(Q) under clausal form, in the following way:

• Φ(Q′) is under the form Φ(Q′)=∃ x₁... ∃ x_h(A₁∧ ... ∧ A_j). We need to introduce Skolem constants that we shall denote q_i, i∈ [1..h], each of them being respectively replacing the variable x_i. We construct j clauses¹ under the form Q′_i=A_i[q₁,...,q_h], i∈ [1..j].

• Φ(T) is under the form Φ(T)=∃ y₁... ∃ y_k(C₁∧ ... ∧ C_l) ← H₁∧ ... ∧ H_n, universally closed by the variables x₁,...,x_p. We need to introduce Skolem functions that we shall denote f_i(x₁,...,x_p), i∈ [1..k], each of them being respectively replacing the variable y_i. We construct l clauses under the form T_i=(C_i∨ ¬ H₁ ∨ ... ∨ ¬ H_n)[x₁,...,x_p, f₁(x₁,...,x_p), ...,f_k(x₁,...,x_p)] with i∈ [1..l].

• Φ(Q) is under the form Q=∃ x₁... ∃ x_s(Q₁∧ ... ∧ Q_t). The negation of Φ(Q) is under the form ¬ Q₁∨ ... ∨ ¬ Q_t, universally closed by the variables x₁,...,x_s. We construct a clause under the form NQ=(¬ Q₁∨ ... ∨ ¬ Q_t)[x₁,...,x_s]

The linear refutation starting with NQ exists. As NQ is composed only by negative litterals, NQ can only be resolved with clauses in which there are positive litterals, i.e. the clauses Q′_a,a ∈ [1...j] and T_b,b ∈ [1..l].

Let us suppose that the refutation only use the clauses Q′_a, a∈ [1..j]. Then the clauses T_b, b∈ [1..l] are unnecessary, and thus {Φ(Q′), ¬ Φ(Q)} is unconsistant, thus Φ(Q′)⊨ Φ(Q), which is in contradiction with the hypothesis. Therefore the resolution does not only use the clauses Q′_a, a∈ [1..j].

Remember that T_i=(C_i∨ ¬ H₁ ∨ ... ∨ ¬ H_n)[x₁,...,x_p, f₁(x₁,...,x_p), ...,f_r(x₁,...,x_p)] with i∈ [1..l]. If the refutation does not only use the clauses Q′_a, a∈ [1..j], then it uses at least one of the atoms C_a, a∈ [1..l] of the clauses T_i,i∈[1..l] (which are the only other positive litterals). This step will give a resolvent containing the litterals ¬ H₁ ∨ ... ∨ ¬ H_n. The f₁(x₁,...,x_p),...,f_r(x₁,...,x_p) are Skolem functions. They correspond to existentially quantified variables of Φ(T), therefore there is a substitution between a Q_b, b∈ [1..t] and a C_a, a∈ [1..l], which does not maps from the equivalent existentially quantified variables (1st condition of the T-rule verified) nor from the variables of H_i, i∈[1..n] that are not in C_a. Therefore, within the tableaux framework, there is an equivalent valuation h that satisfied the first condition of the T-rule (that concerns symbols of the conclusion of T not in the hypothesis, i.e. existentially quantified variables), and the second condition of the T-rule (that concerns symbols of the hypothesis of T not in the conclusion of T, i.e. universally quantified variables appearing only in the hypothesis). Indeed, as the variables do not come into account in the substitution, we force the equivalent symbols (in the tableaux valuation) to be identical. This has no consequences, because we have supposed that no symbols have the same name in T and Q.

This resolution step has instanced the variables of NQ. Therefore litterals of NQ containing, as a term, one of these variables have them also instanced. Thus, as there is no function symbols nor variables in Q′_a, a∈ [1..j], these litterals are unified with C_a[x₁,...,x_p, f₁(x₁,...,x_p), ...,f_r(x₁,...,x_p)], a∈ [1..n]. We are therefore sure that there is at least one piece of the conclusion of h(Q) appearing in the conclusion of h(T). Thus the third condition of the T-rule is satisfied.

Let us focus on the litterals added to the resolvent in the resolution step: ¬ H₁ ∨ ... ∨ ¬ H_n. There are two possibilities: either there are unified with the C_a[x₁,...,x_p,f₁(x₁,...,x_p), ...,f_r(x₁,...,x_p)], a∈ [1..n], or with the A_a, a∈ [1..j]. Let us suppose that at the moment they come into account within the resolution, they are unified with the C_a[x₁,...,x_p,f₁(x₁,...,x_p),...,f_r(x₁,...,x_p)], a∈ [1..n], then the resolvent would contain again the same atoms, because negative atoms of the clauses T_i,i∈[1..l] are the same in each clause T_i,i∈[1..l]. They are thus necessarily unified with the A_a, a∈ [1..j], for the resolution to have an end (which is the case). Thus we see that negative litterals of the clauses T_i,i∈[1..l] can be only unified with the A_a, a∈ [1..j]. Therefore, the negation of the new goal has a refutation which does not need the clauses T_b, b∈ [1..l], and thus {Φ(Q′),¬ Φ(B)} is unconsistant, and Q′⊨ B

Lemma 3 Let Q be a goal and Γ={T₁,...,T_n} be a set of TGDs. If Γ ⊨ Q, then there is a finite sequence of indices i₁,...,i_p such that {T_i₁,...,T_{i_p}}⊨ Q, with i_j ∈ [1..n] and such that ∀ j ∈ [1..p], there is a goal B_j−1 such that {B_j−1,T_{i_j}}⊨ B_j with B₀=<∅,∅> and B_p=Q.

Proof 4 By induction on p. At step 1, T_i₁⊨ Q. Thus {<∅,∅>, T_i₁}⊨ Q. By the lemma, there is a rule application between Q and T_i₁, giving a goal B₁ such that <∅,∅>⊨ B₁. Let us suppose this induction true until step p−1: there is a finite sequence of indices i₁,...,i_p−1 such that {T_i₁,...,T_{i_p−1}}⊨ Q, with i_j ∈ [1..n] and such that ∀ j ∈ [1..p−1], there is a goal B_j−1 such that {B_j−1,T_{i_j}}⊨ B_j with B₀=<∅,∅> and B_p−1=Q. At step p, we must show that {B_p−1, T_{i_p}} ⊨ Q and that Γ ⊨ B_p−1 Let T_{i_p} the TGD corresponding to the first clause interfering with Q in the SLD-resolution process described in the previous lemma. Let us show that the resolutions with T_{i_p} can be performed at the beginning of the SLD-resolution, and that Γ ⊨ B_p−1. To do that, let us suppose that we can not group the resolutions with the clauses coming from T_{i_p}. Then there is necessarily a resolution with a clause coming from T_{i_p} that needs a previous resolution with a clause not coming from T_{i_p}. If it is the case, then (i) either there is a litteral of NQ that can not be unified with an atom of a clause coming from T_{i_p}, but it will be possible later (if it is not possible later, then we do not need this clause), (ii) or this litteral will appear later in a resolvent, and will be resolved with a clause coming from T_{i_j}, j ≤ p−2. If a litteral of NQ can not be unified with an atom of the clause coming from T_{i_p}, it will not be possible later, because the resolution process does create other opportunities.

We now have a new clause resulting of a sequence of resolutions with the clauses coming from T_{i_p}. Thus this clause has a linear refutation. Let us show that this clause corresponds to a goal. To do that, let us perform the inverse transformation (i.e. clausal-form to FOL and thus tableaux). This clause only contains litterals from the original goal, and litterals coming from the hypothesis of T_{i_p}, all negative. Moreover, it does not contain function symbols. Therefore we can perform the inverse transformation under the form of a goal B_p−1 and thus Γ ⊨ B_p−1.

As Γ ⊨ B_p−1, by induction, there is a finite sequence of indices i₁,...,i_p−1 such that {T_i₁,...,T_{i_p−1}}⊨ Q, with i_j ∈ [1..n] and such that ∀ j ∈ [1..p−1], there is a goal B_j−1 such that {B_j−1,T_{i_j}}⊨ B_j with B₀=<∅,∅>. As we showed that {B_p−1,T_{i_p}}⊨ Q, then there is a finite sequence of indices i₁,...,i_p such that {T_i₁,...,T_{i_p}}⊨ Q, with i_j ∈ [1..n] and such that ∀ j ∈ [1..p], there is a goal B_j−1 such that {B_j−1,T_{i_j}}⊨ B_j with B₀=<∅,∅> and B_p=Q

Lemma 4 Let Q′ and Q be two goal such that Q′ ⊨ Q, and let Γ={T₁,...,T_n} be a set of TGDs. If there is a sequence of rule applications starting from Q′ and terminating with success, then there is a sequence of rule applications starting from Q and terminating with success.

Proof 5 Let us show that if there is a rule application between Q′ and a TGD T giving a new goal B′, then there is also a rule application between Q and T giving a new goal B such that B′ ⊨ B. We then conclude by recurrence that Q′⊨ Q, thus the set {Φ(Q′),¬ Φ(Q)} is unconsistant.

The SLD-resolution allows to produce the empty clause from the set of clauses corresponding to {Φ(Q′),¬ Φ(Q)}. There is therefore a linear refutation starting from the negative clause corresponding to ¬ Φ(Q), provided that Φ(Q′) be under clausal form with exactly one positive litteral. Let us perform the same transformation as in lemma 2. The linear refutation starting from NQ exists. As NQ is composed only of positive litterals, NQ can be resolved only with clauses having positive litterals, i.e. the clauses Q′_a,a ∈ [1...j]. Each litteral of NQ is thus unifiable with one of the A_a[q₁,...,q_h], a∈ [1..j]. All the atoms of NQ thus appear in the A_a[q₁,...,q_h], a∈ [1..j] and can only differ by a variable of NQ corresponding to a term in the A_a[q₁,...,q_h], a∈ [1..j]. Therefore tuples of Q are a subset of the tuples of Q′, and can only differ by extra symbols in Q.

If there is a rule application between Q′ and a TGD T then there is a valuation h′ and (at least) one piece of the conclusion of h′(Q′) appearing entirely in the conclusion of h′(T). As tuples of Q can only have extra symbols, and as tuples of Q all appear in Q′, there is also a valuation h and (at least) one piece of the conclusion of h(Q) appearing entirely in the conclusion of h(T). Let us construct the now goal B by removing from the conclusion of h(Q) only the pieces corresponding to those removed from the conclusion of h′(Q′) when B′ was constructed. Then the new goal B contains some tuples of h(Q) that differ, by construction, from that of B′ by potentially extra symbols, and some tuples of h(T) that also differ from that of B by potentially extra symbols. By performing the transformation of ¬ Φ(B) and Φ(B′) under clausal form, it is easy to show that each litteral of the clause coming from ¬ Φ(B) is unifiable with an atom of a clause coming from Φ(B′), and thus that the set {Φ(Q′),¬ Φ(Q)} is unconsistant, and B′ ⊨ B. By recurrence on the number of rule applications starting with Q′, we conclude that there is also a sequence of rule applications starting from Q and terminating with success.

Proof of theorem 2

Proof 6

(If). By induction on the number of T-rule applications. Trivially, as Q_n=<∅ ,∅ > , then D_ω⊨ Q_n . Assume the induction is true for Q_i, ∀ i∈ 2,… ,n , thus D_ω⊨ Q₂ . Let us prove that D_ω⊨ Q₁ . By lemma 1, as Q₂=T(T₁,Q₁) , then {Q₂,T₁}⊨ Q₁ , and also {Q₂,D_ω}⊨ Q₁ . As D_ω⊨ Q₂ , it follows that D_ω⊨ Q₁ .

(Only If). By lemma 3, there is a sequence of indices i₁,...,i_p such that {T_i₁,...,T_{i_p}}⊨ Q, with i_j ∈ [1..n] and such that ∀ j ∈ [1..p], there is a goal B_j−1 such that {B_j−1,T_{i_j}}⊨ B_j with B₀=<∅,∅> and B_p=Q. The proof is made by induction on p. At step 1, we have {<∅,∅>, T_i₁ }⊨ B₁, therefore by lemma 2, there is a rule application between B₁ and T_i₁, giving a goal B₀ such that <∅,∅>⊨ B₀. As <∅,∅> ⊨ <∅,∅>, the resolution terminates with success. Let us suppose this hypothesis true until step p−1, i.e. there is a sequence of rule applications starting from B_p−1 and terminating with success. At step p, we have B_p=Q. By lemma 2, there is a rule application between Q et T_{i_p}, giving a goal B such that B_p−1⊨ B. According to the induction hypothesis, there is a sequence of rule applications starting from B_p−1 and terminating with success. By lemma 4, there is also a sequence of rule applications starting from B, thus from Q, terminating with success.

The notation F[t₁,...t_k], where F is a formula means that terms of F (variables, constants and functions) are t₁,...,t_k

A Top-Down Proof Procedure for Generalized Data Dependencies

Stéphane Coulondre
INSA / University of Lyon, 20 Av Albert Einstein, 69621 Villeurbanne Cedex, France
Stephane.Coulondre(at)insa-lyon.fr

1 Introduction

2 The Framework

2.1 Preliminaries

2.2 Attributes, relations and tableaux

2.3 Dependencies

2.4 The implication problem and the (bottom-up) chase

3 The Top-Down Chase

3.1 A proof procedure for TGDs

3.2 Dealing with EGDs

4 Conclusion and remarks

4.1 Comparison with formal systems

4.2 Discussion

4.3 Extension of the model

References

Appendix

Proof of theorem 1

Proof of theorem 2

Proof of theorem 5

A Top-Down Proof Procedure for Generalized Data Dependencies

Stéphane Coulondre INSA / University of Lyon, 20 Av Albert Einstein, 69621 Villeurbanne Cedex, France Stephane.Coulondre(at)insa-lyon.fr

1 Introduction

2 The Framework

2.1 Preliminaries

2.2 Attributes, relations and tableaux

2.3 Dependencies

2.4 The implication problem and the (bottom-up) chase

3 The Top-Down Chase

3.1 A proof procedure for TGDs

3.2 Dealing with EGDs

4 Conclusion and remarks

4.1 Comparison with formal systems

4.2 Discussion

4.3 Extension of the model

References

Appendix

Proof of theorem 1

Proof of theorem 2

Proof of theorem 5

Stéphane Coulondre
INSA / University of Lyon, 20 Av Albert Einstein, 69621 Villeurbanne Cedex, France
Stephane.Coulondre(at)insa-lyon.fr