Copyright (C) 1995-1997 The University of Melbourne.
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions.
Mercury is a new general-purpose programming language, designed and implemented by a small group of researchers at the University of Melbourne, Australia. Mercury is based on the paradigm of purely declarative programming, and was designed to be useful for the development of large and robust "real-world" applications. It improves on existing logic programming languages by providing increased productivity, reliability and efficiency, and by avoiding the need for non-logical program constructs. Mercury provides the traditional logic programming syntax, but also allows the syntactic convenience of user-defined functions, smoothly integrating logic and functional programming into a single paradigm.
Mercury requires programmers to supply type, mode and determinism declarations for the predicates and functions they write. The compiler checks these declarations, and rejects the program if it cannot prove that every predicate or function satisfies its declarations. This improves reliability, since many kinds of errors simply cannot happen in successfully compiled Mercury programs. It also improves productivity, since the compiler pinpoints many errors that would otherwise require manual debugging to locate. The fact that declarations are checked by the compiler makes them much more useful than comments to anyone who has to maintain the program. The compiler also exploits the guaranteed correctness of the declarations for significantly improving the efficiency of the code it generates.
To facilitate programming-in-the-large, to allow separate compilation, and to support encapsulation, Mercury has a simple module system. Mercury's standard library has a variety of pre-defined modules for common programming tasks -- see the Mercury Library Reference Manual.
Mercury's syntax is similar to the syntax of Prolog, with some additional declarations for types, modes, determinism, the module system, and pragmas, and with the distinction that function symbols may stand also for invocations of user-defined functions as well as for data constructors.
A Mercury program consists of a set of modules. Each module is a file containing a sequence of items (declarations and clauses). Each item is a term followed by a period. Each term is composed of a sequence of tokens, and each token is composed of a sequence of characters. Like Prolog, Mercury has the Definite Clause Grammar (DCG) notation for clauses.
Tokens in Mercury are the same as in ISO Prolog. The only difference is the `#line' token, which is used as a line number directive (see below).
The different tokens are as follows. Tokens may be separated by whitespace or line number directives.
"
).
Strings may contain backslash escapes. `\a' stands for "alert"
(a beep character), `\b' for backspace, `\r' for carriage-return,
`\f' for form-feed, `\t' for tab, `\n' for newline,
`\v' for vertical-tab. An escaped backslash, single-quote, or
double-quote stands for itself. The sequence `\x' introduces
a hexadecimal escape; it must be followed by a sequence of hexadecimal
digits and then a closing backslash. It is replaced
with the character whose character code is identified by the hexadecimal
number. Similarly, a backslash followed by an octal digit is the
beginning of an octal escape; as with hexadecimal escapes, the sequence
of octal digits must be terminated with a closing backslash.
A backslash followed immediately by a newline is deleted; thus an
escaped newline can be used to continue a string over more than one
source line. (String literals may also contain embedded newlines.)
'
). Quoted names can contain
backslash escapes of the same form as for strings.
Syntactically, terms in Mercury are exactly the same as in ISO Prolog, except that as an extension we permit higher-order terms, as described below. However, the meaning of some terms in Mercury is different to that in Prolog. See section Data-terms.
A term is either a variable or a functor.
A functor is an integer, a float, a string, a name, a compound term, or a higher-order term.
A compound term is a name followed without any intervening whitespace by an open parenthesis (i.e. an open_ct token), a sequence of argument terms separated by commas, and a close parenthesis. Compound terms may also be specified using operator notation, as in Prolog.
A higher-order term is a variable followed without any intervening
whitespace by an open parenthesis (i.e. an open_ct token),
a sequence of argument terms separated by commas, and a close
parenthesis. A higher-order term is equivalent to a compound term
whose functor is the empty name, and whose arguments are the
the variable followed by the arguments of the higher-order term.
That is, a term such as Var(Arg1, ..., ArgN)
is
parsed as "(Var, Arg1, ..., ArgN)
,
Each item in a Mercury module is either a declaration or a clause. If the top-level functor of the term is `:-/1', the item is a declaration, otherwise it is a clause. There are three types of clauses. If the top-level functor of the item is `:-/2', the item is a rule. If the top-level functor is `-->/2', the item is a DCG rule. Otherwise, the item is a fact. There are two types of rules and facts. If the top-level functor of the head of a rule is `=/2', the rule is a function rule, otherwise it is a predicate rule. If the top-level functor of the head of a fact is `=/2', the fact is a function fact, otherwise it is a predicate fact.
The allowed declarations are:
:- type :- pred :- func :- inst :- mode :- pragma :- module :- interface :- implementation :- import_module :- use_module :- external :- end_module
The `type', `pred' and `func' declarations are used for the type system, the `inst' and `mode' declarations are for the mode system, the `pragma' declarations are for the C interface, and for compiler hints about inlining, and the remainder are for the module system. They are described in more detail in their respective chapters.
(The current implementation also allows `when/2' declarations, but ignores them. This helps when one wants to write a program that is both a Mercury program and an NU-Prolog program.)
A function fact is an item of the form `Head = Result'.
A predicate fact is an item of the form `Head',
where the top-level functor of Head
is not :-/1
, :-/2
, -->/2
, or =/2
.
In both cases, the Head term must not be a variable.
The top-level functor of the Head
determines which predicate or function the fact belongs to;
the predicate or function must have been declared
in a preceding `pred' or `func' declaration in this module.
The arguments of the head must be valid data-terms.
A fact is equivalent to a rule whole body is `true'.
A function rule is an item of the form
`Head = Result :- Body'.
A predicate rule is an item of the form
`Head :- Body' where the top-level
functor of `Head' is not =/2
.
In both cases, the Head term must not be a variable.
The top-level functor of the Head determines which predicate or
function the clause belongs to; the predicate or function must have
been declared in a preceding `pred' or `func' declaration in
this module.
The arguments of the head must be valid data-terms.
The Body must be a valid goal.
A goal is a term of one of the following forms:
some Vars Goal
all Vars Goal
Goal1, Goal2
Goal1 ; Goal2
true
fail
not Goal
\+ Goal
Goal1 => Goal2
Goal1 <= Goal2
Goal1 <=> Goal2
if CondGoal then ThenGoal else ElseGoal
CondGoal -> ThenGoal ; ElseGoal
Term1 = Term2
Term1 \= Term2
call(Closure)
call(Closure1, Arg1)
call(Closure2, Arg1, Arg2)
call(Closure3, Arg1, Arg2, Arg3)
Var
Var(Arg1)
Var(Arg2)
Var(Arg2, Arg3)
call/N
syntax, i.e.
`call(Var)', `call(Var, Arg1)', etc.
Call
pred
declaration
in the module or in the interface of an imported module.
The arguments must be valid data-terms.
DCG-rules in Mercury have identical syntax and semantics to DCG-rules in Prolog.
A DCG-rule is an item of the form `Head --> Body'. The Head term must not be a variable. A DCG-rule is an abbreviation for an ordinary rule with two additional implicit arguments appended to the arguments of Head. These arguments are fresh variables which we shall call V_in and V_out. The Body must be a valid DCG-goal, and is an abbreviation for an ordinary goal. The next section defines a mathematical function `DCG-transform(V_in, V_out, DCG-goal)' which specifies the semantics of how DCG goals are transformed into ordinary goals. (The `DCG-transform' function is purely for the purposes of exposition, to define the semantics -- it is not part of the language.)
A DCG-goal is a term of one of the following forms:
some Vars DCG-goal
transform(V_in, V_out, some Vars DCG_goal) = some Vars transform(V_in, V_out, DCG_goal)
all Vars DCG-goal
transform(V_in, V_out, all Vars DCG_goal) = all Vars transform(V_in, V_out, DCG_goal)
DCG-goal1, DCG-goal2
transform(V_in, V_out, (DCG-goal1, DCG-goal2)) = (transform(V_in, V_new, DCG_goal1), transform(V_new, V_out, DCG_goal2))where V_new is a fresh variable.
DCG-goal1 ; DCG-goal2
transform(V_in, V_out, (DCG_goal1 ; DCG_goal2)) = ( transform(V_in, V_out, DCG_goal1) ; transform(V_in, V_out, DCG_goal2) )
{ Goal }
transform(V_in, V_out, { Goal }) = (Goal, V_out = V_in)
[Term, ...]
transform(V_in, V_out, [Term1, ...]) = (V_in = [Term, ... | V_Out])
[]
transform(V_in, V_out, []) = (V_out = V_in)
not DCG-goal
\+ DCG-goal
transform(V_in, V_out, not DCG_goal) = (not transform(V_in, V_new, DCG_goal), V_out = V_in)where V_new is a fresh variable.
if CondGoal then ThenGoal else ElseGoal
CondGoal -> ThenGoal ; ElseGoal
transform(V_in, V_out, if CondGoal then ThenGoal else ElseGoal) = if transform(V_in, V_cond, CondGoal) then transform(V_cond, V_out, ThenGoal) else transform(V_in, V_out, ElseGoal)
=(Term)
transform(V_in, V_out, =(Term)) = (Term = V_in, V_out = V_in)
DCG-call
transform(V_in, V_out, p(A1, ..., AN)) = p(A1, ..., AN, V_in, V_out)
Syntactically, a data-term is just a term.
There are a couple of differences from Prolog. The first one is that double-quoted strings are atomic in Mercury, they are not abbreviations for lists of character codes. The second is that Mercury terms may contain function applications, higher-order function applications, and lambda expressions.
A data-term is either a variable, a data-functor, a higher-order function application, or a lambda expression.
A data-functor is an integer, a float, a string, a character literal (any single-character name), a name, or a compound data-term. A compound data-term is a compound term whose form does not match the form of a lambda expression or higher-order function application and whose arguments are data-terms. If a data-functor is a name or a compound data-term, its top-level functor must name a function, predicate, or data constructor declared in the program or in the interface of an imported module.
A lambda expression is a compound term of one of the following forms
lambda([Var1::Mode1, Var2::Mode2, ...] is Det, Goal) pred(Var1::Mode1, Var2::Mode2, ...) is Det :- Goal pred(Var1::Mode1, Var2::Mode2, ..., DCGMode0, DCGMode1) is Det --> DCGGoal func(Var1::Mode1, Var2::Mode2, ...) = (Var::Mode) is Det :- Goal func(Var1, Var2, ...) = Var :- Goal
where Var1, Var2, ... are zero or more variables, Mode1, Mode2, ... are zero or more modes (see section Modes), DCGMode0 and DCGMode1 are modes (see section Modes), Det is a determinism (see section Determinism), Goal is a goal (see section Goals), and DCGGoal is a DCG Goal (see section DCG-goals). The `:- Goal' part is optional; if it is not specified, then `:- true' is assumed. A lambda expression denotes a higher-order predicate or function term whose value is the predicate or function of the specified arguments determined by the specified goal. The form of lambda expression using `-->' as its top level functor is a syntactic abbreviation: an expression of the form
pred(Var1::Mode1, Var2::Mode2, ..., DCGMode0, DCGMode1) is Det --> DCGGoal
is equivalent to
pred(Var1::Mode1, Var2::Mode2, ..., DCGVar0::DCGMode0, DCGVar1::DCGMode1) is Det :- Goal
where DCGVar0 and DCGVar1 are fresh variables, and Goal is the result of `DCG-transform(DCGVar0, DCGVar1, DCGGoal)' where DCG-transform is the function specified in see section DCG-goals. See section Higher-order programming.
A higher-order function application is a compound term of one of the following two forms
apply(Func, Arg1, Arg2, ..., ArgN) FuncVar(Arg1, Arg2, ..., ArgN)
where N >= 0, Func is a term of type `func(T1, T2, ..., Tn) = T', FuncVar is a variable of that type, and Arg1, Arg2, ..., ArgN are terms of types `T1', `T2', ..., `Tn'. The type of the higher-order function application term is T. It denotes the result of applying the specified function to the specified arguments. See section Higher-order programming.
The rule for implicit quantification in Mercury is not the same as the usual one in mathematical logic. In Mercury, variables that do not occur in the head of a clause are implicitly existentially quantified around their closest enclosing scope (in a sense to be made precise in the following paragraphs). This allows most existential quantifiers to be omitted, and leads to more concise code.
An occurrence of a variable is in a negated context if it is in a negation, in a universal quantification, in the condition of an if-then-else, in an inequality, or in a lambda expression.
Two goals are parallel if they are different disjuncts of the same disjunction, or if one is the "else" part of an if-then-else and the other goal is either the "then" part or the condition of the if-then-else, or if they are the goals of disjoint (distinct and non-overlapping) lambda expressions.
If a variable occurs in a negated context and does not occur outside of that negated context other than in parallel goals (and in the case of a variable in the condition of an if-then-else, other than in the "then" part of the if-then-else), then that variable is implicitly existentially quantified inside the negation.
The treatment of inequality, universal quantification, implication, and logical equivalence as abbreviations can cause the introduction of double negations which could make otherwise well-formed code mode-incorrect. To avoid this problem, the language specifies that double negations are removed after syntax analysis, before mode analysis is performed.
The type system is based on polymorphic many-sorted logic.
Certain special types are builtin, or are defined in the Mercury library:
char
, int
, float
, string
.
int
, float
,
and string
. (For char
, the standard syntax suffices.)
pred
, pred(T)
, pred(T1, T2)
, ...
(func) = T
, func(T1) = T
,
func(T1, T2) = T
, ...
univ
.
univ
is defined in the standard library module std_util
,
along with the predicates type_to_univ/2
and univ_to_type/2
.
With those predicates, any type can be converted to the universal type
and back again.
The universal type is useful for situations
where you need heterogeneous collections.
io__state
.
io__state
is defined in the standard library module io
,
and represents the state of the world.
Predicates which perform I/O are passed the old state of the world
and produce a new state of the world.
In this way, we can give a declarative semantics to code that performs I/O.
New types can be introduced with `:- type' declarations. There are several categories of derived types:
:- type fruit ---> apple ; orange ; banana ; pear. :- type strange ---> foo(int) ; bar(string). :- type employee ---> employee( string, % name int, % age string % department ). :- type tree ---> empty ; leaf(int) ; branch(tree, tree). :- type list(T) ---> [] ; [T | list(T)]. :- type pair(T1, T2) ---> T1 - T2.If the body of a discriminated union type definition contains a term whose top-level functor is
';'/2
,
the semi-colon is normally assumed to be a separator.
This makes it difficult to define a type
whose constructors include ';'/2
.
To allow this, curly braces can be used to quote the semi-colon.
It is then also necessary to quote curly braces.
The following example illustrates this:
:- type tricky ---> { int ; int } ; { { int } }.This defines a type with two constructors,
';'
/2 and '{}'/1
,
whose argument types are all int
.
Each discriminated union type definition introduces a distinct type.
Mercury considers two discriminated union types that have the same bodies
to be distinct types (name equivalence).
Having two different types with the same name and arity in the program
is an error.
(We hope to relax this in the future, to allow types with the same name
and arity in different modules.)
:- type money == int. :- type assoc_list(KeyType, ValueType) == list(pair(KeyType, ValueType)).Like discriminated union type definitions, equivalence type definitions must be transparent. Mercury treats an equivalence type as an abbreviation for the type on the right hand side of the definition; the two are equivalent in all respects in scopes where the equivalence type is visible.
:- type t1. :- type t2(T1, T2).declare types
t1/0
and t2/2
to be abstract types.
Such declarations are only useful in the interface section of a module.
This means that the type names will be exported,
but the constructors (functors) for these types will not be exported.
The implementation section of a module
must have give the definition of all the abstract types
named in the interface section of the module.
Abstract types may be defined as either discriminated union types
or as equivalence types.
Constructors may be overloaded among different types: there may be any number of constructors with a given name and arity, so long as they all have different types. However, there must be only one constructor with a given name, arity, and result type. (There is no particularly good reason for this restriction; in the future we may allow several such functors as long as they have different argument types.) Note that excessive overloading of constructors can slow down type checking and can make the program confusing for human readers, so overloading should not be over-used.
The argument types of each predicate must be explicitly declared with a `:- pred' declaration. The argument types and return type of each function must be explicitly declared with a `:- func' declaration. These declarations may be polymorphic. For example:
:- pred member(T, list(T)). :- func length(list(T)) = int.
There must only be one predicate with a given name and arity in each module, and only one function with a given name and arity in each module.
The compiler infers the types of data-terms, and in particular the types of variables and overloaded constructors, functions, and predicates. A type assignment is an assignment of a type to every variable and of a particular constructor, function, or predicate to every name in a clause. A type assignment is valid if it satisfies the following conditions.
Each constructor in a clause must have been declared in at least one visible type declaration. The type assigned to each constructor term must match one of the type declarations for that constructor, and the types assigned to the arguments of that constructor must match the argument types specified in that type declaration.
The type assigned to each function call term must match the return type of one of the `:- func' declarations for that function, and the types assigned to the arguments of that function must match the argument types specified in that type declaration.
The type assigned to each predicate argument must match the type specified in one of the `:- pred' declarations for that predicate. The type assigned to each head argument in a predicate clause must exactly match the argument type specified in the corresponding `:- pred' declaration.
The type assigned to each head argument in a function clause must exactly match the argument type specified in the corresponding `:- func' declaration, and the type assigned to the result term in a function clause must exactly match the result type specified in the corresponding `:- func' declaration.
(Here "match" means to be an instance of, i.e. to be identical to for some substitution of the type parameters, and "exactly match" means to be identical up to renaming of type parameters.)
One type assignment A is said to be more general than another type assignment B if there is a binding of the type parameters in A that makes it identical (up to renaming of parameters) to B. If there is more than one valid type assignment, the compiler must choose the most general one. If there are two valid type assignments which are not identical up to renaming and neither of which is more general than the other, then there is a type ambiguity, and compiler must report an error. A clause is type-correct if there is a unique (up to renaming) most general valid type assignment. Every clause in a Mercury program must be type-correct.
The mode of a predicate, or function, is a mapping from the initial state of instantiation of the arguments of the predicate, or the arguments and result of a function, to their final state of instantiation. To describe states of instantiation, we use information provided by the type system. Types can be viewed as regular trees with two kinds of nodes: or-nodes representing types and and-nodes representing constructors. The children of an or-node are the constructors that can be used to construct terms of that type; the children of an and-node are the types of the arguments of the constructors. We attach mode information to the or-nodes of type trees.
An instantiatedness tree is an assignment of an instantiatedness -- either free or bound --- to each or-node of a type tree, with the constraint that all descendants of a free node must be free.
A term is approximated by an instantiatedness tree if for every node in the instantiatedness tree,
When an instantiatedness tree tells us that a variable is bound, there may be several alternative function symbols to which it could be bound. The instantiatedness tree does not tell us which of these it is bound to; instead for each possible function symbol it tells us exactly which arguments of the function symbol will be free and which will be bound. The same principle applies recursively to these bound arguments.
Mercury's mode system allows users to declare names for instantiatedness trees using declarations such as
:- inst listskel = bound( [] ; [free | listskel] ).
This instantiatedness tree describes lists
whose skeleton is known but whose elements are distinct variables.
As such, it approximates the term [A,B]
but not the term [H|T]
(only part of the skeleton is known),
the term [A,2]
(not all elements are variables),
or the term [A,A]
(the elements are not distinct variables).
As a shorthand, the mode system provides `free' and `ground' as names for instantiatedness trees all of whose nodes are free and bound respectively. The shape of these trees is determined by the type of the variable to which they apply.
As execution proceeds, variables may become more instantiated.
A mode mapping is a mapping
from an initial instantiatedness tree to a final instantiatedness tree,
with the constraint that no node of the type tree
is transformed from bound to free.
Mercury allows the user to specify mode mappings directly
by expressions such as inst1 -> inst2
,
or to give them a name using declarations such as
:- mode m :: inst1 -> inst2.
Two standard shorthand modes are provided, corresponding to the standard notions of inputs and outputs:
:- mode in :: ground -> ground. :- mode out :: free -> ground.
Prolog fans who want to use the symbols `+' and `-' can do so by simply defining them using a mode declaration:
:- mode (+) :: in. :- mode (-) :: out.
These two modes are enough for most functions and predicates. Nevertheless, Mercury's mode system is sufficiently expressive to handle more complex data-flow patterns, including those involving partially instantiated data structures. (The current implementation does not handle partially instantiated data structures yet.)
For example, consider an interface to a database that associates data with keys, and provides read and write access to the items it stores. To represent accesses to the database over a network, you would need declarations such as
:- type operation ---> lookup(key, data) ; set(key, data). :- inst request = bound( lookup(ground, free) ; set(ground, ground) ). :- mode create_request :: free -> request. :- mode satisfy_request :: request -> ground.
`inst' and `mode' declarations can be parametric. For example, the following declaration
:- inst listskel(Inst) = bound( [] ; [Inst | listskel(Inst)] ).
defines the inst `listskel(Inst)' to be a list skeleton whose elements have inst `Inst'; you can the use insts such as `listskel(listskel(free))', which represents the instantiation state of a list of lists of free variables. The standard library provides the parametric modes
:- mode in(Inst) :: Inst -> Inst. :- mode out(Inst) :: free -> Inst.
so that for example the mode `create_request' defined above could have be defined as
:- mode create_request :: out(request).
A predicate mode declaration assigns a mode mapping to each argument of a predicate. A function mode declaration assigns a mode mapping to each argument of a function, and a mode mapping to the function result. Each mode of a predicate or function is called a procedure. For example, given the mode names defined by
:- mode out_listskel :: free -> listskel. :- mode in_listskel :: listskel -> listskel.
the (type and) mode declarations of the function length and predicate append are as follows:
:- func length(list(T) = int. :- mode length(in_listskel) = out. :- mode length(out_listskel) = in. :- pred append(list(T), list(T), list(T)). :- mode append(in, in, out). :- mode append(out, out, in).
Note that functions may have more than one mode, just like predicates; functions can be reversible.
Alternately, the mode declarations for `length' could use the standard library modes `in/1' and `out/1':
:- func length(list(T)) = int. :- mode length(in(listskel)) = out. :- mode length(out(listskel)) = in.
If a predicate or function has only one mode, the `pred' and `mode' declaration can be combined:
:- func length(list(T)::in) = (int::out). :- pred append(list(T)::in, list(T)::in, list(T)::out).
If there is no mode declaration for a function, the compiler assumes a default mode for the function in which all the arguments have mode `in' and the result of the function has mode `out'. (However, there is no requirement that a function have such a mode; if there is any explicit mode declaration, it overrides the default.)
A function or predicate mode declaration is an assertion by the programmer that for all possible argument terms and (if applicable) result term for the function or predicate that are approximated (in our technical sense) by the initial instantiatedness trees of the mode declaration and all of whose free variables are distinct, if the function or predicate succeeds then the resulting binding of those argument terms and (if applicable) result term will in turn be approximated by the final instantiatedness trees of the mode declaration, with all free variables again being distinct. We refer to such assertions as mode declaration constraints. These assertions are checked by the compiler, which rejects programs if it cannot prove that their mode declaration constraints are satisfied.
Note that with the usual definition of append, the mode
:- mode append(in_listskel, in_listskel, out_listskel).
would not be allowed, since it would create aliasing between the different arguments -- on success of the predicate, the list elements would be free variables but they would not be distinct.
In Mercury it is always possible to call a procedure with an
argument that is is more bound than the initial inst specified for that
argument in the procedure's mode declaration. In such cases, the
compiler will insert additional unifications to ensure that the
argument actually passed to the procedure will have the inst specified.
For example, if the predicate p/1
has mode `p(out)', you
can still call `p(X)' if X
is ground. The compiler will
transform this code to `p(Y), X = Y' where Y
is a fresh
variable. It is almost as if the predicate p/1
has another mode
`p(in)'; we call such modes "implied modes".
To make this concept precise, we introduce the following definition. A term satisfies an instantiatedness tree if for every node in the instantiatedness tree,
The mode set for a predicate or function is the set of mode declarations for the predicate or function. A mode set is an assertion by the programmer that the predicate should only be called with argument terms that satisfy the initial instantiatedness trees of one of the mode declarations in the set (i.e. the specified modes and the modes they imply are the only allowed modes for this predicate or function). We refer to the assertion associated with a mode set as the mode set constraint; these are also checked by the compiler.
A predicate or function p is well-moded with respect to a given mode declaration if given that the predicates and functions called by p all satisfy their mode declaration constraints, there exists an ordering of the literals in the definition of p such that
We say that a predicate or function is well-moded if it is well-moded with respect to all the mode declarations in its mode set, and we say that a program is well-moded if all its predicates and functions are well-moded.
The mode analysis algorithm checks one procedure at a time. It abstractly interprets the definition of the predicate or function, keeping track of the instantiatedness of each variable, and selecting a mode for each call and unification in the definition. To ensure that the mode set constraints of called predicates and functions are satisfied, the compiler may reorder the elements of conjunctions; it reports an error if no satisfactory order exists. Finally it checks that the resulting instantiatedness of the procedure's arguments is the same as the one given by the procedure's declaration.
The mode analysis algorithm annotates each call with the mode used.
Mode declarations can also specify so-called "unique modes". Mercury's unique modes are similar to "linear types" in some functional programming languages such as Clean. They allow you to specify when there is only one reference to a particular value, and when there will be no more references to that value. If the compiler knows there are will be more references to a value, it can perform "compile-time garbage collection" by automatically inserting code to deallocate the storage associated with that value. Even more importantly, the compiler can also simply reuse the storage immediately, for example by destructively updating one element of an array rather than making a new copy of the entire array in order to change one element. Unique modes are also the mechanism Mercury uses to provide declarative I/O.
We have not yet implemented unique modes fully, and the details are still in a state of flux. So the following should be considered tentative.
In addition to the insts mentioned above (`free', `ground', and `bound(...)'), Mercury also provides "unique" insts `unique' and `unique(...)' which are like `ground' and `bound(...)' respectively, except that they carry the additional constraint that there can only be one reference to the corresponding value. There is also an inst `dead' which means that there are no references to the corresponding value, so the compiler is free to generate code that reuses that value. There are three standard modes for manipulation unique values:
% unique output :- mode uo :: free -> unique. % unique input :- mode ui :: unique -> unique. % destructive input :- mode di :: unique -> dead.
Mode `uo' is used to create a unique value. Mode `ui' is used to inspect a unique value without losing its uniqueness. Mode `di' is used to deallocate or reuse the memory occupied by a value that will not be used.
Note that a value is not considered `unique' if it might be needed on backtracking. This means that unique modes are generally only useful for code whose determinism is `det' or `cc_multidet' (see section Determinism).
"Well it just so happens that your friend here is only mostly dead.
There's a big difference between mostly dead and all dead...
Now, mostly dead is slightly alive.
Now, all dead -- well, with all dead, there's usually only one thing that you can do.""What's that?"
"Go through his clothes and look for loose change!"
--- from the movie "The Princess Bride".
To allow for backtrackable destructive updates -- that is, updates whose effect is undone on backtracking, perhaps by recording the overwritten values on a "trail" so that they can be restored after backtracking -- Mercury also provides "mostly unique" modes. The insts `mostly_unique' and `mostly_dead' are equivalent to `unique' and `dead', except that only references which will be encountered during forward execution are counted - it is OK for `mostly_unique' or `mostly_dead' values to be needed again on backtracking.
Mercury defines some standard modes for manipulating "mostly unique" values, just as it does for unique values:
% mostly unique output :- mode muo :: free -> unique. % mostly unique input :- mode mui :: unique -> unique. % mostly destructive input :- mode mdi :: unique -> dead.
The implementation of the mode analysis algorithm is not quite complete; as a result, it is not possible to use nested unique modes, i.e. modes in which anything but the top level of a variable is unique. If you do, you will get unique mode errors when you try to get a unique field of a unique data structure. It is also not possible to use unique-input modes; only destructive-input and unique-output modes work.
The Mercury compiler does not (yet) reuse `dead' values. The only destructive update in the current implementation occurs in library modules, e.g. for I/O and arrays. We do however plan to implement structure reuse and compile-time garbage collection in the very near future.
For each mode of a predicate or function, we categorise that mode according to how many times it can succeed, and whether or not it can fail before producing its first solution.
det
).
semidet
).
multi
).
nondet
).
failure
.
erroneous
.
The determinism annotation erroneous
is used on the library
predicate `error/1', but apart from that those last two determinism
annotations are generally not needed.
To summarize:
Maximum number of solutions Can fail? 0 1 > 1 no erroneous det multi yes failure semidet nondet
The determinism of each mode of a predicate or function is indicated by an annotation on the mode declaration. For example:
:- pred append(list(T), list(T), list(T)). :- mode append(in, in, out) is det. :- mode append(out, out, in) is multi. :- mode append(in, in, in) is semidet. :- func length(list(T)) = int. :- mode length(in) = out is det. :- mode length(in(list_skel)) = out is det. :- mode length(in) = in is semidet.
An annotation of `det' or `multidet' is an assertion that for every value each of the inputs, there exists at least one value of the outputs for which the predicate is true, or (in the case of functions) for which the function term is equal to the result term. Conversely, an annotation of `det' or `semidet' is an assertion that for every value each of the inputs, there exists at most one value of the outputs for which the predicate is true, or (in the case of functions) for which the function term is equal to the result term. These assertions are called the mode-determinism assertions; they can play a role in the semantics, because in certain circumstances they may allow an implementation to perform optimizations that would not otherwise be allowed, such as optimizing away a goal with no outputs even though it might infinitely loop.
If the mode of the predicate is given in the :- pred
declaration
rather than in a separate :- mode
declaration,
then the determinism annotation goes on the :- pred
declaration
(and similarly for functions).
In particular, this is necessary
if a predicate does not have any argument variables.
For example:
:- pred loop(int::in) is erroneous. loop(X) :- loop(X). :- pred p is det. p. :- pred q is failure. q :- fail.
If there is no mode declaration for a function, then the default mode for that function is considered to have been declared as `det'. If you want to write a partial function, i.e. one whose determinism is `semidet', then you must explicitly declare the mode and determinism.
In Mercury, a function is supposed to be a true mathematical function of its arguments; that is, the value of the function's result should be determined only by the values of its arguments. Hence, for any mode of a function that specifies that all the arguments are fully input (i.e. for which the initial inst of all the arguments is a ground inst), the determinism of that mode can only be `det', `semidet', `erroneous', or `failure'.
The determinism categories form this lattice:
erroneous / \ failure det \ / \ semidet multi \ / nondet
The higher up this lattice a determinism category is, the more the compiler knows about the number of solutions of predicates of that determinism.
The determinism of goals is inferred from the determinism of their component parts, according to the rules below. The inferred determinism of a procedure is just the inferred determinism of the procedure's body.
For procedures that are local to a module,
the determinism annotations may be omitted;
in that case, their determinism will be inferred.
(To be precise, the determinism of procedures without a determinism annotation
is defined as the least fixpoint of the transformation which,
given an initial assignment
of the determinism det
to all such procedures,
applies those rules to infer
a new determinism assignment for those procedures.)
It is an error to omit the determinism annotation for procedures that are exported from their containing module.
If a determinism annotation is supplied for a procedure, the declared determinism is compared against the inferred determinism. If the declared determinism is greater than or not comparable to the inferred determinism (in the partial ordering above), it is an error. If the declared determinism is less than the inferred determinism, it is not an error, but the implementation may issue a warning.
The determinism category of each goal is inferred according to the following rules. These rules work with the two components of determinism category: whether the goal can fail without producing a solution, and the maximum number of solutions of the goal (0, 1, or more). If the inference process below reports that a goal can succeed more than once, but the goal generates no outputs that are visible from outside the goal, the final determinism of the goal will be based on the goal succeeding at most once, since the compiler will implicitly prune away any duplicate solutions.
det
, semidet
, or failure
,
depending on its mode.
A unification that assigns the value of one variable to another
is deterministic.
A unification that constructs a structure and assigns it to a variable
is also deterministic.
A unification that tests whether a variable has a given top function symbol
is semideterministic,
unless the compiler knows the top function symbol of that variable,
in which case its determinism is either det or failure
depending on whether the two function symbols are the same or not.
A unification that tests two variables for equality
is semideterministic,
unless the compiler knows that the two variables are aliases for one another,
in which case the unification is deterministic,
or unless the compiler knows that the two variables
have different function symbols in the same position,
in which case the unification has a determinism of failure.
The compiler knows the top function symbol of a variable
if the previous part of the predicate definition
contains a unification of the variable with a function symbol,
or if the variable's type has only one function symbol.
det
.
The conjunction `(A, B)' can fail
if either A can fail, or if A can succeed at least once,
and B can fail.
The conjunction can succeed at most zero times
if either A or B can succeed at most zero times.
The conjunction can succeed more than once
if either A or B can succeed more than once
and both A and B can succeed at least once.
(If e.g. A can succeed at most zero times,
then even if B can succeed many times
the maximum number of solutions of the conjunction is still zero.)
Otherwise, i.e. if both A and B succeed at most once,
the conjunction can succeed at most once.
( L = [], empty(Out) ; L = [H|T], nonempty(H, T, Out) )If L is input to the disjunction, then the disjunction is a switch on L. A switch can fail if the various arms of the switch do not cover all the function symbols in the type of the switched-on variable, or if the code in some arms of the switch can fail, bearing in mind that in each arm of the switch, the unification that tests the switched-on variable against the function symbol of that arm is considered to be deterministic. A switch can succeed several times if some arms of the switch can succeed several times, possibly because there are multiple disjuncts that test the switched-on variable against the same function symbol. A switch can succeed at most zero times only if all arms of the switch can succeed at most zero times.
failure
.
A disjunction `(A ; B)' that is not a switch
can fail if both A and B can fail.
It can succeed at most zero times
if both A and B can succeed at most zero times.
It can succeed at most once
if one of A and B can succeed at most once
and the other can succeed at most zero times.
Otherwise, i.e. if either A or B can succeed more than once,
or if both A and B can succeed at least once,
it can succeed more than once.
erroneous
,
then the determinism of the negation is erroneous
.
If the determinism of the negated goal is failure
,
the determinism of the negation is det
.
If the determinism of the negated goal is det
or multi
,
the determinism of the negation is failure
.
Otherwise, the determinism of the negation is semidet
.
Note that "perfect" determinism inference is an undecidable problem, because it requires solving the halting problem. (For instance, in the following example
:- pred p(T, T). :- mode p(in, out) is det. p(A, B) :- ( something_complicated(A, B) ; B = A ).
`p/0' can have more than one solution
only if `something_complicated' can succeed.)
Sometimes, the rules specified by the Mercury language
for determinism inference will infer a determinism
that is not as precise as you would like.
However, it is generally easy to overcome such problems.
The way to do this is to replace the compiler's static checking
with some manual run-time checking.
For example, if you know that a particular goal should never fail,
but the compiler infers that goal to be semidet
,
you can check at runtime that the goal does succeed,
and if it fails, call the library predicate `error/1'.
:- pred q(T, T). :- mode q(in, out) is det. q(A, B) :- ( goal_that_should_never_fail(A, B0) -> B = B0 ; error("goal_that_should_never_fail failed!") ).
The predicate error/1
has determinism erroneous
,
which means the compiler knows that it will never succeed or fail,
so the inferred determinism for the body of q/2
is det
.
(Checking assumptions like this is good coding style anyway.
The small amount of up-front work that Mercury requires
is paid back in reduced debugging time.)
Mercury's mode analysis knows that
computations with determinism erroneous can never succeed,
which is why it does not require the "else" part to generate
a value for `B'.
The introduction of the new variable `B0' is necessary
because the condition of an if-then-else is a negated context,
and can export the values it generates
only to the "then" part of the if-then-else,
not directly to the surrounding computation.
(If the surrounding computations had direct access
to values generated in conditions,
they might access them even if the condition failed.)
Normally, attempting to call
a nondet
or multi
mode of a predicate
from a predicate declared as semidet
or det
will cause a determinism error.
So how can we call nondeterministic code from deterministic code?
There are several alternative possibilities.
If you just want to see if a nondeterministic goal is satisfiable or not,
without needing to know what variable bindings it produces,
then there is no problem -
determinism analysis considers nondet
and multi
goals
with no non-local output variables to be
semidet
and det
respectively.
If you want to use the values of output variables, then you need to ask yourself which one of possibly many solutions to a goal do you want? If you want all of them, you need to use the predicate `solutions/2' in the standard library module `std_util', which collects all of the solutions to a goal into a list -- see section Higher-order programming.
If you just want one solution and don't care which,
the calling predicate should be declared nondet
or multi
.
The nondeterminism should then be propagated up the call tree
to the point at which it can be pruned.
In Mercury, pruning can be achieved in several ways.
The first way is the one mentioned above: if a goal has no non-local output variables then the implementation will only attempt to satisfy the goal once. Any potential duplicate solutions will be implicitly pruned away.
The second way is to rely on the fact that
the implementation will only seek a single solution to `main/2',
so alternative solutions to `main/2'
(and hence also to nondet
or multi
predicates
called directly or indirectly from `main/2')
are implicitly pruned away.
This is one way to achieve "don't care" style nondeterminism in Mercury.
The other situation in which you may want pruning and committed choice style nondeterminism is when you know that all the solutions returned will be equivalent. For example, you might want to find the maximum element in a set by iterating over the elements in the set. Iterating over the elements in a set in an unspecified order is a nondeterministic operation, but no matter which order you remove them, the maximum value in the set should be the same.
We may eventually extend Mercury to allow users to write
unique [X] Goal
as a special quantifier, meaning
"there exists a unique X
for which `Goal' is true".
This would allow the implementation
to prune alternative solutions for `Goal'
if `X' was the only output variable of `Goal'.
Note that specifying a user-defined equivalence relation as the equality predicate for user-defined types (see section User-defined equality predicates) means that the `unique' quantifier could be used to express more general forms of equivalence. For example, if you define a set type which represents sets as unsorted lists, you would want to define a user-defined equivalence relation for that type, which could sort the lists before comparing them. The `unique' quantifier could then be used for sets even though the lists used to represent the sets might not be in the same order in every solution.
Currently the language does not support the `unique' quantifier. (However, it is possible to achieve a similar effect by using the C interface to type-cast a higher-order predicate term.)
In addition to the determinism annotations described earlier, there are
"committed choice" versions of multi
and nondet
, called cc_multi
and cc_nondet
.
These can be used instead of multi
or nondet
if all calls
to that mode of the predicate occur in a context in which only one
solution is needed.
Such single-solution contexts are determined as follows.
The compiler will check that all calls to a committed-choice mode of a predicate do indeed occur in a single-solution context.
There are several reasons to use committed choice determinism annotations. One reason is for efficiency: committed choice annotations allow the compiler to generate much more efficient code. Another reason is for doing I/O, which is allowed only in `det' or `cc_multi' predicates, not in `multi' predicates. Another is for dealing with types that use non-canonical representations (see section User-defined equality predicates). And there are a variety of other applications.
It would be nice to be able to declare a mode of a predicate as both `multi' and `cc_multi', and have the compiler call the appropriate one depending on whether the call comes from a single-solution context or not. However, the current implementation does not yet support this.
When defining abstract data types, often it is convenient to use a non-canonical representation --- that is, one for which a single abstract value may have more than one different possible concrete representations. For example, you may wish to implement an abstract type `set' by representing a set as an (unsorted) list.
:- module set_as_unsorted_list. :- interface. :- type set(T). :- implementation. :- import_module list. :- type set(T) ---> set(list(T)).
In this example, the concrete representations `set([1,2])' and `set([2,1])' would both represent the same abstract value, namely the set containing the elements 1 and 2.
For types such as this, which do not have a canonical representation, the standard definition of equality is not the desired one; we want equality on sets to mean equality of the abstract values, not equality of their representations. To support such types, Mercury allows programmers to specify a user-defined equality predicate for user-defined types:
:- type set(T) ---> set(list(T)) where equality is set_equals.
Here `set_equals' is the name of a user-defined predicate that is used for equality on the type `set(T)'. It could for example be defined in terms of a `subset' predicate.
:- pred set_equals(set(T)::in, set(T)::in) is semidet. set_equals(S1, S2) :- subset(S1, S2), subset(S2, S1).
A type declaration for a type T may contain a `where equality is equalitypred' specification only if the following conditions are satisfied:
Types with user-defined equality can only be used in limited ways. Because there multiple representations for the same abstract value, any attempt to examine the representation of such a value is a conceptually non-deterministic operation. In Mercury this is modelled using committed choice nondeterminism.
The semantics of a specifying `where equality is equalitypred' on the type declaration for a type T are as follows:
Mercury supports higher-order functions and predicates with currying, closures, and lambda expressions. (To be pedantic, it would be more accurate to say that Mercury supports higher-order procedures: in Mercury, when you construct a higher-order predicate term, you only get one mode of a predicate or function; if you want multiple modes, you must pass multiple higher-order procedures.)
To create a higher-order predicate or function term, you can use a lambda expression, or, if the predicate or function has only one mode and it is not a zero-arity function, you can just use its name. For example, if you have declared a predicate
:- pred sum(list(int), int). :- mode sum(in, out) is det.
the following three unifications have the same effect:
X = lambda([List::in, Length::out] is det, sum(List, Length)) Y = (pred(List::in, Length::out) is det :- sum(List, Length)) Z = sum
In the above example, the type of `X', `Y', and `Z' is `pred(list(int), int)', which means a predicate of two arguments of types `list(int)' and `int' respectively. [The syntax using `lambda' is supported to enable programs to work in both Mercury and Prolog, because the syntax using `pred' and `:-' can't be easily emulated in Prolog. When we have implemented better debugging environments for Mercury, the syntax using `lambda' will be deprecated.]
Similarly, given
:- func scalar_product(int, list(int)) = list(int). :- mode scalar_product(in, in) = out is det.
the following three unifications have the same effect:
X = (func(Num, List) = NewList :- NewList = scalar_product(Num, List)) Y = (func(Num::in, List::in) = (NewList::out) is det :- NewList = scalar_product(Num, List)) Z = sum_func
In the above example, the type of `X', `Y', and `Z' is `func(int, list(int)) = list(int)', which means a function of two arguments, whose types are `int' and `list(int)', with a return type of `int'. As with `:- func' declarations, if the modes and determinism of the function are omitted in a higher-order function term, then the modes default to `in' for the arguments, `out' for the function result, and the determinism defaults to `det'.
If the predicate or function has more than one mode, you must use an explicit lambda expression to specify which mode you want.
You can also create higher-order function terms of non-zero arity and higher-order predicate terms by "currying", i.e. specifying the first few arguments to a predicate or function, but leaving the remaining arguments unspecified. For example, the unification
Sum123 = sum([1,2,3])
binds `Sum123' to a higher-order predicate term of type `pred(int)'. Similarly, the unification
Double = scalar_product(2)
binds `Double' to a higher-order function term of type `func(list(int)) = list(int)'.
For higher-order predicate expressions that thread an accumulator pair, we have syntax that allows you to use DCG notation in the goal of the expression. For example,
Pred = (pred(Strings::in, Num::out, di, uo) is det --> io__write_string("The strings are: "), { list__length(Strings, Num) }, io__write_strings(Strings), io__nl )
is equivalent to
Pred = (pred(Strings::in, Num::out, IO0::di, IO::uo) is det :- io__write_string("The strings are: ", IO0, IO1), list__length(Strings, Num), io__write_strings(Strings, IO1, IO2), io__nl(IO2, IO) )
Higher-order function terms of zero arity can only be created using an explicit lambda expression; you have to use e.g. `(func) = foo' rather than plain `foo', because the latter denotes the result of evaluating the function, rather than the function itself.
Note that when constructing a higher-order term, you cannot just use the name of a builtin language construct such as `=', `\=', `call', or `apply', and nor can such constructs be curried. Instead, you must either use an explicit lambda expression, or you must write a forwarding predicate or function. For example, instead of
list__filter([1,2,3], \=(2), List)
you must write either
list__filter([1,2,3], (pred(X::in) is semidet :- X \= 2), List)
or
list__filter([1,2,3], not_equal(2), List)
where you have defined `not_equal' using
:- pred not_equal(T::in, T::in) is semidet. not_equal(X, Y) :- X \= Y.
Another case when this arises is when want to curry a higher-order term. Suppose, for example, that you have a higher-order predicate term `OldPred' of type `pred(int, char, float)', and you want to construct a new higher-order predicate term `NewPred' of type `pred(char, float)' from `OldPred' by supplying a value for for just the first argument. The solution is the same: use an explicit lambda expression or a forwarding predicate. In either case, the body of the lambda expression or the forwarding predicate must contain a higher-order call with all the arguments supplied.
Once you have created a higher-order predicate term (sometimes known as a closure), the next thing you want to do is to call it. For predicates, you use the builtin goal call/N:
call(Closure)
call(Closure1, Arg1)
call(Closure2, Arg1, Arg2)
For example, the goal
call(Sum123, Result)
would bind `Result' to the sum of `[1, 2, 3]', i.e. to 6.
For functions, you use the builtin expression apply/N:
apply(Closure)
apply(Closure1, Arg1)
apply(Closure2, Arg1, Arg2)
For example, given the definition of `Double' above, the goal
List = apply(Double, [1, 2, 3])
would be equivalent to
List = scalar_product(2, [1, 2, 3])
and so for a suitable implementation of the function `scalar_product/2' this would bind `List' to `[2, 4, 6]'.
One extremely useful higher-order predicate in the Mercury standard
library is solutions/2
, which has the following declaration:
:- pred solutions(pred(T), list(T)). :- mode solutions(pred(out) is nondet, out) is det.
The term which you pass to `solutions/2' is a higher-order predicate term. You can pass the name of a one-argument predicate, or you can pass a several-argument predicate with all but one of the arguments supplied (a closure). The declarative semantics of `solutions/2' can be defined as follows:
solutions(Pred, List) is true iff all [X] (call(Pred, X) <=> list__member(X, List)) and List is sorted.
where `call(Pred, X)' invokes the higher-order predicate term `Pred' with argument `X', and where `list__member/2' is the standard library predicate for list membership. In other words, `solutions(Pred, List)' finds all the values of `X' for which `call(Pred, X)' is true, collects these solutions in a list, sorts the list, and returns that list as its result. Here's an example: the standard library defines a predicate `list__perm(List0, List)'
:- pred list__perm(list(T), list(T)). :- mode list__perm(in, out) is nondet.
which succeeds iff List is a permutation of List0. Hence the following call to solutions
solutions(list__perm([3,1,2]), L)
should return all the possible permutations of the list `[3,1,2]' in sorted order:
L = [[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]].
See also `unsorted_solutions/2' and `solutions_set/2', which are defined in the standard library module `std_util' and documented in the Mercury Library Reference Manual.
In Mercury, the mode and determinism of a higher-order predicate or function term are part of that term's inst, not its type. This allows a single higher-order predicate to work on argument predicates of different modes and determinism, which is particularly useful for library predicates such as `list__map' and `list__foldl'.
The language contains builtin `inst' values
pred is Determinism pred(Mode) is Determinism pred(Mode1, Mode2) is Determinism ... (func) = Mode is Determinism func(Mode1) = Mode is Determinism func(Mode1, Mode2) = Mode is Determinism ...
These insts represent the instantiation state of variables bound to higher-order predicate and function terms with the appropriate mode and determinism. For example, `pred(out) is det' represents the instantion state of being bound to a higher-order predicate term which is `det' and accepts one output argument; the term `sum([1,2,3])' from the example above is one such higher-order predicate term which matches this instantiation state.
As a convenience, the language also contains builtin `mode' values of the same name (and they are what we have been using in the examples up to now). These modes map from the corresponding `inst' to itself. It is as if they were defined by
:- mode (pred is Determinism) :: in(pred is Determinism). :- mode (pred(Inst) is Determinism) :: in(pred(Inst) is Determinism). ...
using the parametric inst `in/1' mentioned in section Modes which maps an inst to itself.
If you want to define a predicate which returns a higher-order predicate term, you would use a mode such as `free -> pred(...) is ...', or `out(pred(...) is ...)'. For example:
:- pred foo(pred(int)). :- mode foo(free -> pred(out) is det) is det. foo(sum([1,2,3])).
Note that in Mercury it is an error to attempt to unify two higher-order terms. This is because equivalence of higher-order terms is undecidable in the general case.
For example, given the definition of `foo' above, the goal
foo(lambda([X::out] is det, X = 6))
is illegal. If you really want to compare higher-order predicates for equivalence, you must program it yourself; for example, the above goal could legally be written as
P = lambda([X::out] is det, X = 6), foo(Q), all [X] (call(P, X) <=> call(Q, X)).
Note that the compiler will only catch direct attempts at higher-order unifications; indirect attempts (via polymorphic predicates, for example `(list__append([], [P], [Q])' may result in an error at run-time rather than at compile-time.
The Mercury module system is simple and straightforward.
Each module must start with a module
declaration,
specifying the name of the module.
An interface
declaration specifies
the start of the module's interface section:
this section contains declarations for the types, data constructors,
instantiation states, modes, functions and predicates exported by this module.
Mercury provides support for abstract data types,
since the definition of a type may be kept hidden,
with only the type name being exported.
An implementation
declaration specifies
the start of the module's implementation section.
Any entities declared in this section are local to the module
and cannot be used by other modules.
The implementation section must of course contain definitions
for all abstract data types, functions and predicates exported by the module,
as well for all local types, functions and predicates.
The module may optionally end with an end_module
declaration.
If a module wishes to make use of entities exported by other modules,
then it must explicitly import those modules using one or more
import_module
or use_module
declarations.
These declarations may occur either in the interface or the implementation
section. If the imported entities are used in the interface section,
then the corresponding import_module
or use_module
declaration must also be in the interface section. If the imported
entities are only used in the implementation section, the
import_module
or use_module
declaration should be in
the implementation section.
The names of predicates, functions, constructors, types, modes and insts
can be explicitly module qualified using the `:' operator,
i.e. `module:name'. This is useful both for readability and
for resolving name conflicts. Uses of entities imported using
use_module
declarations must be explicitly module qualified.
Currently we also support `__' as an alternative module qualifier,
so you can write module__name
instead of module:name
.
We are considering changing the module qualifier from `:'
to `.' in a future version, so that we can use `:' as
a type qualifier instead. Hence for the time being we recommend
that you use `__' rather than `:' as module qualifier.
Certain optimizations require information or source code for predicates defined in other modules to be as effective as possible. At the moment, inlining and higher-order specialization are the only optimizations that the Mercury compiler can perform across module boundaries.
One module must export a predicate `main', which must be declared as either
:- pred main(io__state::di, io__state::uo) is det.
or
:- pred main(io__state::di, io__state::uo) is cc_multi.
(or any declaration equivalent to one of the two above).
For example, here is the definition of a simple module for managing queues:
:- module queue. :- interface. % Declare an abstract data type. :- type queue(T). % Declare some predicates which operate on the abstract data type. :- pred empty_queue(queue(T)). :- mode empty_queue(out) is det. :- mode empty_queue(in) is semidet. :- pred put(queue(T), T, queue(T)). :- mode put(in, in, out) is det. :- pred get(queue(T), T, queue(T)). :- mode get(in, out, out) is semidet. :- implementation. % Queues are implemented as lists. We need the `list' module % for the declaration of the type list(T), with its constructors % '[]'/0 % and '.'/2, and for the declaration of the predicate % list__append/3. :- import_module list. % Define the queue ADT. :- type queue(T) == list(T). % Declare the exported predicates. empty_queue([]). put(Queue0, Elem, Queue) :- list__append(Queue0, [Elem], Queue). get([Elem | Queue], Elem, Queue). :- end_module queue.
Mercury has a standard library which includes modules for lists, stacks, queues, priority queues, sets, bags (multi-sets), maps (dictionaries), random number generation, input/output and filename and directory handling. See the Mercury Library Reference Manual for details.
The Mercury standard library has a standard naming convention in which
every entity exported by a module is prefixed by the module name and
two underscores. We have found this convention improves the
readability and maintainability of our code, and so we recommend that
you follow it in your code too. (Eventually, these names will be converted
to names of the form module.name
; this will allow you to omit the
prefix in places where it does not improve readability.)
A legal Mercury program is one that complies with the syntax, type, mode, determinism, and module system rules specified in earlier chapters. If a program does not comply with those rules, the compiler must report an error.
For each legal Mercury program, there is an associated predicate calculus theory whose language is specified by the type declarations in the program and whose axioms are the completion of the clauses for all predicates in the program, plus the usual equality axioms extended with the completion of the equations for all functions in the program, plus axioms corresponding to the mode-determinism assertions (see section Determinism), plus axioms specifying the semantics of library predicates and functions. The declarative semantics of a legal Mercury program is specified by this theory.
Mercury implementations must be sound: the answers they compute must be true in every model of the theory. Mercury implementations are not required to be complete: they may fail to compute an answer in finite time, or they may exhaust the resource limitations of the execution environment, even though an answer is provable in the theory. However, there are certain minimum requirements that they must satisfy with respect to completeness.
There is an operational semantics of Mercury programs called the strict sequential operational semantics. In this semantics, the program is executed top-down, starting from `main/2', and function calls within a goal, conjunctions and disjunctions are all executed in depth-first left-to-right order. Conjunctions and function calls are "minimally" reordered as required by the modes: the order is determined by selecting the first mode-correct sub-goal (conjunct or function call), executing that, then selecting the first of the remaining sub-goals which is now mode-correct, executing that, and so on. (There is no interleaving of different individual conjuncts or function calls, however; the sub-goals are reordered, not split and interleaved.) Function application is strict, not lazy.
Mercury implementations are required to provide a method of processing Mercury programs which is equivalent to the strict sequential operational semantics.
There is another operational semantics of Mercury programs called the strict commutative operational semantics. This semantics is equivalent to the strict sequential operation semantics except that there is no requirement that function calls, conjunctions and disjunctions be executed left-to-right; they may be executed in any order. (The order may even be different each time a particular goal is entered.)
As well as providing the strict sequential operational semantics, Mercury implementations may optionally provide additional implementation-defined operational semantics, provided that any such implementation-defined operational semantics are at least as complete as the strict commutative operational semantics. An implementation-defined semantics is "at least as complete" as the strict commutative semantics if and only if the implementation-defined semantics guarantees to compute an answer in finite time for any program for which an answer would be computed in finite time for all possible executions under the strict commutative semantics (i.e. for all possible orderings of conjuctions and disjunctions).
Thus, to summarize, there are in fact a variety of different operational semantics for Mercury. In one of them, the strict sequential semantics, there is no nondeterminism -- the behaviour is always specified exactly. Programs are executed top-down using SLDNF (or something equivalent), mode analysis does "minimal" reordering (in a precisely defined sense), function calls, conjunctions and disjunctions are executed depth-first left-to-right, and function evaluation is strict. All implementations are required to support the strict sequential semantics, so that a program which works on one implementation using this semantics will be guaranteed to work on any other implementation. However, implementations are also allowed to support other operational semantics, which may have non-determinism, so long as they are sound with respect to the declarative semantics, and so long as they meet a minimum level of completeness (they must be at least as complete as the strict commutative semantics, in the sense that every program which terminates for all possible orderings must must also terminate in any implementation-defined operational semantics).
This compromise allows Mercury to be used in several different ways. Programmers who care more about ease of programming and portability than about efficiency can use the strict sequential semantics, and can then be guaranteed that if their program works on one correct implementation, it will work on all correct implementations. Compiler implementors who want to write optimizing implementations that do lots of clever code reorderings and other high-level transformations or that want to offer parallelizing implementations which take maximum advantage of parallelism can define different semantic models. Programmers who care about efficiency more than portability can write code for these implementation-defined semantic models. Programmers who care about efficiency and portability can achieve this by writing code for the commutative semantics. Of course, this is not quite as easy as using the strict sequential semantics, since it is in general not sufficient to test your programs on just one implementation if you are to be sure that it will be able to use the maximally efficient operational semantics on any implementation. However, if you do write code which works for all possible executions under commutative semantics (i.e. for all possible orderings of conjunctions and disjunctions), then you can be guaranteed that it will work correctly on every implementation, under every possible implementation-defined semantics.
The University of Melbourne Mercury implementation offers eight
different semantics, which can be selected with different
combinations of the `--no-reorder-conj', `--no-reorder-disj',
and `--fully-strict' options. (The `--fully-strict' option
prevents the compiler from improving completeness by optimizing away infinite
loops or calls to error/1
.) The default semantics are the
commutative semantics. Enabling all of these options gives you the
the strict sequential semantics. Enabling just some of them gives
you a semantics somewhere in between.
Future implementations of Mercury may wish to offer other operational semantics. For example, they may wish to provide semantics in which function evaluation is lazy, rather than strict; semantics with a guaranteed fair search rule; and so forth.
The Mercury distribution includes a number of examples of the use of the C interface that show how to interface C++ with Mercury and how to set up `Mmake' files to automate the build process. See the `samples/c_interface' directory in the Mercury distribution.
A declaration of the form
:- pragma c_code(Pred(Var1::Mode1, Var2::Mode2, ...), may_call_mercury, C_Code).
or
:- pragma c_code(Func(Var1::Mode1, Var2::Mode2, ...) = (Var::Mode), may_call_mercury, C_Code).
means that any calls to the specified mode of Pred or Func will be replaced by the C code given in C_Code. The C code fragment may refer to the specified variables (Var1, Var2, ..., and Var) directly by name.
If there is a pragma c_code
declaration for a mode of a predicate
or function, then that mode of the predicate may not have determinism
`multi' or `nondet', there must not be any clauses for that
predicate or function, and there must be a pragma c_code
declaration
for every mode of the predicate or function.
For example, the following piece of code gives a predicate, `c_write_string/3', which has a similar effect to the Mercury library predicate `io__write_string/3':
:- pred c_write_string(string, io__state, io__state). :- mode c_write_string(in, di, uo) is det. :- pragma c_code(c_write_string(S::in, IO0::di, IO::uo), may_call_mercury, "puts(S); IO = IO0;").
If the C code does not recursively invoke Mercury code, as in the above example, then you can use `will_not_call_mercury' in place of `may_call_mercury' in the declarations above. This allows the compiler to use a slightly more efficient calling convention. (If you use this form, and the C code does invoke Mercury code, then the behaviour is undefined -- your program may misbehave or crash.)
The C code in a pragma c_code
declaration
for any procedure whose determinism indicates that it could fail
must assign a truth value to the macro `SUCCESS_INDICATOR'.
For example:
:- pred string__contains_char(string, character). :- mode string__contains_char(in, in) is semidet. :- pragma c_code(string__contains_char(Str::in, Ch::in), will_not_call_mercury, "SUCCESS_INDICATOR = (strchr(Str, Ch) != NULL);").
SUCCESS_INDICATOR
should not be used other than as the target of
an assignment.
(For example, it may be #define
d to a register, so you should not
try to take its address.)
Procedures whose determinism indicates that that they cannot fail
should not access SUCCESS_INDICATOR
.
Arguments whose mode is input will have their values set by the
Mercury implementation on entry to the C code. If the procedure
succeeds, the C code must set the values of all output arguments
before returning. If the procedure fails, the C code need only
set SUCCESS_INDICATOR
to false (zero).
Any macros, function prototypes, or other C declarations that are used in `c_code' pragmas must be included using a `c_header_code' declaration of the form
:- pragma c_header_code(HeaderCode).
HeaderCode can be a C `#include' line, for example
:- pragma c_header_code("#include <math.h>")
or
:- pragma c_header_code("#include ""tcl.h""").
or it may contain any C declarations, for example
:- pragma c_header_code(" extern int errno; #define SIZE 200 struct Employee { char name[SIZE]; } extern int bar; extern void foo(void); ").
Mercury automatically includes certain headers such as <stdlib.h>
,
but you should not rely on this, as the set of headers which Mercury
automatically includes is subject to change.
Definitions of C functions or global variables may be included using a declaration of the form
:- pragma c_code(Code).
For example,
:- pragma c_code(" int bar = 42; void foo(void) {} ").
Such code is copied verbatim into the generated C file.
It is also possible to export Mercury procedures to C, so that you can call Mercury code from C (or from other languages that can interface to C, e.g. C++).
A declaration of the form
:- pragma export(Pred(Mode1, Mode2, ...), "C_Name_1").
or
:- pragma export(Func(Mode1, Mode2, ...) = Mode, "C_Name_2").
exports a procedure for use by C.
For each Mercury module containing `pragma export' declarations, the Mercury implementation will automatically create a header file for that module which declares a C function C_Name() for each of the `pragma export' declarations. Each such C function is the C interface to the specified mode of the specified Mercury predicate or function.
The interface to a Mercury procedure is determined as follows. Input arguments are passed by value. For output arguments, the caller must pass the address in which to store the result. If the Mercury procedure can fail, then its C interface function returns a truth value indicating success or failure. If the Mercury procedure is a Mercury function that cannot fail, and the function result has an output mode, then the C interface function will return the Mercury function result value. Otherwise the function result is appended as an extra argument. Arguments of type `io__state' or `store__store(_)' are not passed at all; that's because these types represent mutable state, and in C modifications to mutable state are done via side effects, rather than argument passing.
Calling polymorphically typed Mercury procedures from C is a little bit more difficult than calling ordinary (monomorphically typed) Mercury procedures. The simplest method is to just create monomorphic forwarding procedures that call the polymorphic procedures, and export them, rather than exporting the polymorphic procedures.
If you do export a polymorphically typed Mercury procedure, the compiler will prepend one `type_info' argument to the parameter list of the C interface function for each polymorphic type variable in the Mercury procedure's type signature. The caller must arrange to pass in appropriate `type_info' values corresponding to the types of the other arguments passed. These `type_info' arguments can be obtained using the Mercury `type_of' function in the Mercury standard library module `std_util'.
A Mercury implementation should allow you to link with object files or libraries that were produced by compiling C code. The exact mechanism for linking with C object files is implementation-dependent. The following text describes how it is done for the University of Melbourne Mercury implementation.
To link an existing object file or library into the Mercury executable, set the `Mmake' variable `MLLIBS' in the `Mmake' file in the directory in which you are working. For example add the following line to the `Mmake' file:
MLLIBS = my_functions.o -L/usr/local/contrib/lib -lfancy_library
As illustrated by the example, the values for `MLLIBS' are similar to those taken by the Unix linker.
For more information, see the "Libraries" chapter of the Mercury users guide, and the `man' pages for `mmc' and `ml'.
For each of the Mercury types int
, float
, char
,
and string
, there is a C typedef for the corresponding type in C:
Integer
, Float
, Char
, and String
respectively.
In the current implementation, `Integer' is a typedef for an integral type whose size is the same size as a pointer; `Float' is a typedef for `double' (unless the program and the Mercury library was compiled with `-DUSE_SINGLE_PREC_FLOAT', in which case it is a typedef for `float'); `Char' is a typedef for `char'; and `String' is a typedef for `Char *'.
Mercury variables of type int
, float
, char
, or
string
are passed to and from C as C variables whose type is
given by the corresponding typedef. Mercury variables of any other
type are passed as a `Word', which in the current implementation
is a typedef for an unsigned type whose size is the same size as a pointer.
Mercury lists can be manipulated by C code using the following macros, which are defined by the Mercury implementation.
list_is_empty(list) /* test if a list is empty */ list_head(list) /* get the head of a list */ list_tail(list) /* get the tail of a list */ list_empty() /* create an empty list */ list_cons(head,tail) /* construct a list with the given head and tail */
Note that the use of these macros is subject to some caveats (see section Memory management).
The inbuilt Mercury type c_pointer
can be used to pass C pointers
between C functions which are called from Mercury. For example:
:- module pointer_example. :- interface. :- type complicated_c_structure. % Initialise the abstract C structure that we pass around in Mercury. :- pred initialise_complicated_structure(complicated_c_structure::uo) is det. % Perform a calculation on the C structure. :- pred do_calculation(int::in, complicated_structure::di, complicated_structure::uo) is det. :- implementation. % Our C structure is implemented as a c_pointer. :- type complicated_c_structure == c_pointer. :- pragma c_header_code(" extern struct foo *init_struct(void); extern struct foo *perform_calculation(int, struct foo *); "); :- pragma c_code(initialise_complicated_structure(Structure::uo), may_call_mercury, "Structure = init_struct();"). :- pragma c_code(do_calculation(Value::in, Structure0::di, Structure::uo, may_call_mercury, "Structure = perform_calculation(Value, Structure0);").
Passing pointers to dynamically-allocated memory from Mercury to code written in other languages, or vice versa, is in general implementation-dependent.
The current Mercury implementation supports two different methods of memory management: conservative garbage collection, or no garbage collection. (With the latter method, heap storage is reclaimed only on backtracking.)
Conservative garbage collection makes inter-language calls simplest. When using conservative garbage collection, heap storage is reclaimed automatically. Pointers to dynamically-allocated memory can be passed to and from C without taking any special precautions.
When using no garbage collection, you must be careful not to retain
pointers to memory on the Mercury heap after Mercury has backtracked
to before the point where that memory was allocated.
You must also avoid the use of the macros
list_empty()
and list_cons()
.
(The reason for this is that they may access Mercury's `hp' register,
which might not be valid in C code. Using them in the bodies of
procedures defined using `pragma c_code' with
`will_not_call_mercury' would probably work, but we don't advise it.)
Instead, you can write Mercury functions to perform these actions
and use `pragma export' to access them from C.
This alternative method also works with conservative garbage collection.
Future Mercury implementations may use non-conservative methods of garbage collection. For such implementations, it will be necessary to explicitly register pointers passed to C with the garbage collector. The mechanism for doing this has not yet been decided on. It would be desirable to provide a single memory management interface for use when interfacing with other languages that can work for all methods of memory management, but more implementation experience is needed before we can formulate such an interface.
In certain compilation grades (see the "Compilation model options" section of the Mercury User's Guide), the University of Melbourne Mercury implementation supports trailing. Trailing is a means of having side-effects, such as destructive updates to data structures, undone on backtracking. The basic idea is that during forward execution, whenever you perform a destructive modification to a data structure that may still be live on backtracking, you should record whatever information is necessary to restore it on a stack-like data structure called the "trail". Then, if a computation fails, and execution backtracks to before those those updates were performed, the Mercury runtime engine will traverse the trail back to the most recent choice point, undoing all those updates.
The interface used is a set of C functions (which are actually implemented as macros) and types. Typically these will be called from C code within `pragma c_code' declarations in Mercury code.
For examples of the use of this interface, see the modules `extras/trailed_update/tr_array.m' and `extras/clpr/cfloat.m' in the Mercury distribution.
A "choice point" is a point in the computation to which execution might backtrack. The "current" choice point is the one that was most recently encountered; that is also the one to which execution will branch if the current computation fails.
When you trail an update, the Mercury engine will ensure that if execution ever backtracks to the choice point that was current at the time of trailing, then the update will be undone.
If the Mercury compiler determines that it will never need to backtrack to a particular choice point, then it will "prune" away that choice point. If a choice point is pruned, the trail entries for those updates will not necessarily be discarded, because in general they may still be necessary in case we backtrack to a prior choice point.
The simplest form of trailing is value trailing. This allows you to trail updates to memory and have the Mercury runtime engine automatically undo them on backtracking.
MR_trail_value()
void MR_trail_value(Word *address, Word value);Ensures that if future execution backtracks to the current choice point, then value will be placed in address.
MR_trail_current_value()
void MR_trail_current_value(Word *address);Ensures that if future execution backtracks to the current choice point, the value currently in address will be restored. `MR_trail_current_value(address)' is equivalent to `MR_trail_value(address, *address)'.
For more complicated uses of trailing, you can store the address of a C function on the trail and have the Mercury runtime call your function back whenever future execution backtracks to the current choice point, or whenever that choice point is pruned, either because execution commits to never backtracking over that point, or because an exception was thrown, or possibly during garbage collection.
Note that currently Mercury does not support exception handling, and the garbage collector in the current Mercury implementation does not garbage-collect the trail; these two cases are mentioned only so that we can cater for possible future extensions.
MR_trail_function()
typedef enum { MR_undo, MR_exception, MR_commit, MR_solve, MR_gc } MR_untrail_reason; void MR_trail_function( void (*untrail_func)(Word, MR_untrail_reason), void *value );A call to `MR_trail_function(untrail_func, value)' adds an entry to the function trail. The Mercury implementation ensures that if future execution ever backtracks to current choicepoint, then
(*untrail_func)(value, MR_undo)
will be called.
It also ensures that if the current choice point is pruned because
execution commits to never backtracking to it,
then (*untrail_func)(value, MR_commit)
will be called.
It also ensures that if execution requires that the current goal be
solvable, then (*untrail_func)(value, MR_solve)
will be called. This happens in calls to solutions/2
, for example.
(MR_commit
is used for "hard" commits, i.e. when we commit
to a solution and prune away the alternative solutions; MR_solve
is used for "soft" commits, i.e. when we must commit to a solution
but do not prune away all the alternatives.)
MR_exception and MR_gc are currently not used ---
they are reserved for future use.
Typically if the untrail_func is called with reason being `MR_undo' or `MR_exception', then it should undo the effects of the update(s) specified by value, and the free any resources associated with that trail entry. If it is called with reason being `MR_commit' or `MR_solve', then it not undo the update(s); instead, it may check for floundering (see the next section). In the `MR_commit' case it may, in some cases, be possible to also free resources associated with the trail entry. If it is called with anything else (such as `MR_gc'), then it should probably abort execution with an error message.
Another use for the function trail is check for floundering in the presence of delayed goals.
Often, when implementating certain kinds of constraint solvers, it may not be possible to actually solve all of the constraints at the time they are added. Instead, it may be necessary to simply delay their execution until a later time, in the hope the constraints may become solvable when more information is available. If you do implement a constraint solver with these properties, then at certain points in the computation -- for example, after executing a negated goal -- it is important for the system to check that their are no outstanding delayed goals which might cause failure, before execution commits to this execution path. If there are any such delayed goals, the computation is said to "flounder". If the check for floundering was omitted, then it could lead to unsound behavior, such as a negation failing even though logically speaking it ought to have succeeded.
The check for floundering can be implemented using the function trail, by simply calling `MR_trail_function()' to add a function trail entry whenever you create a delayed goal, and putting the appropriate check for floundering in the `MR_commit' and `MR_solve' cases of your function. (For examples of this, see the `ML_cfloat_untrail_func()' function in the file `extras/clpr/cfloat.m' and the `ML_var_untrail_func' function in the file `extras/trailed_update/var.m' in the Mercury distribution.) If your function does detect floundering, then it should print an error message and then abort execution.
If a mutable data structure is updated multiple times, and each update is recorded on the trail using the functions described above, then some of this trailing may be redundant. It is generally not necessary to record enough information to recover the original state of the data structure for every update on the trail; instead, it is enough to record the original state of each updated data structure just once for each choice point occurring after the data structure is allocated, rather than once for each update.
The functions described below provide a means to avoid redundant trailing.
MR_ChoicepointId
MR_current_choicepoint_id()
MR_null_choicepoint_id()
typedef ... MR_ChoicepointId; MR_ChoicepointId MR_current_choicepoint_id(void); MR_ChoicepointId MR_null_choicepoint_id(void);The type
MR_ChoicepointId
is an abstract type used
to hold the identity of a choice point. Values of this
type can be compared using C's `==' operator.
MR_current_choicepoint_id()
returns a value indicating
the identity of the most recent choice point; that is, the
point to which execution would backtrack if the current computation
failed.
MR_null_choicepoint_id()
returns a "null" value that is
distinct from any value ever returned by MR_current_choicepoint_id
.
(Note that MR_null_choicepoint_id()
is a macro that is guaranteed to be suitable for use as a
static initializer, so that it can for example be used to
provide the initial value of a C global variable.)
The way these functions are generally used is as follows.
When you create a mutable data structure, you should call
MR_current_choicepoint_id()
and save the value it returns
as a `prev_choicepoint' field in your data structure.
(If your mutable data structure
is a C global variable, then you can use MR_null_choicepoint_id()
for the initial value of this `prev_choicepoint' field.)
When you are about to modify your mutable data structure,
you can then call MR_current_choicepoint_id()
again and
compare the result from that call with the value saved in
the `prev_choicepoint' field in the data structure.
If they are different, then you must trail the update,
and update the prev_choicepoint field with the new value;
furthermore, you must also take care that on backtracking the
previous value of the `prev_choicepoint' field in your data
structure is restored to its previous value, by trailing that update too.
But if MR_current_choice_id()
and the prev_choicepoint
field
are equal, then you can safely perform the update to your data
structure without trailing it.
Note that there is a cost to this -- you have to include an extra field in your data structure for each part of the data structure which you might update, you need to perform a test for each update to decide whether or not to trail it, and if you do need to trail the update, then you have an extra field that you need to trail. Whether or not the benefits from avoiding redundant trailing outweigh these costs will depend on your application.
A declaration of the form
:- pragma inline(Name/Arity).
is a hint to the compiler that all calls to the predicate(s) or function(s) with name Name and arity Arity should be inlined.
The current Mercury implementation is smart enough to inline simple predicates even without this hint.
A declaration of the form
:- pragma no_inline(Name/Arity).
ensures the compiler will not inline this predicate. This may be used simply for performance concerns (inlining can cause unwanted code bloat in some cases) or to prevent possibly dangerous inlining when using low-level C code.
A declaration of the form
:- pragma obsolete(Name/Arity).
declares that the predicate(s) or function(s) with name Name and arity Arity are "obsolete": it instructs the compiler to issue a warning whenever the named predicate(s) or function(s) are used.
`pragma obsolete' declarations are intended for use by library developers, to allow gradual (rather than abrupt) evolution of library interfaces. If a library developer changes the interface of a library predicate, they should leave the old version of that predicate in the library, but mark it as obsolete using a `pragma obsolete' declaration, and document how library users should modify their code to suit the new interface. The users of the library will then get a warning if they use obsolete features, and can consult the library documentation to determine how to fix their code. Eventually, when the library developer deems that users have had sufficient warning, they can remove the old version entirely.
The `source_file' pragma and `#line' directives provide support for preprocessors and other tools that generate Mercury code. The tool can insert these directives into the generated Mercury code to allow the Mercury compiler to report diagnostics (error and warning messages) at the original source code location, rather than at the location in the automatically generated Mercury code.
A `source_file' pragma is a declaration of the form
:- pragma source_file(Name).
where Name is a string that specifies the name of the source file.
For example, if a preprocessor generated a file `foo.m' based on a input file `foo.m.in', and it copied lines 20, 30, and 31 from `foo.m.in', the following directives would ensure that any error or warnings for those lines copied from `foo.m' were reported at their original source locations in `foo.m.in'.
:- module foo. :- pragma source_file("foo.m.in"). #20 % this line comes from line 20 of foo.m #30 % this line comes from line 30 of foo.m % this line comes from line 31 of foo.m :- pragma source_file("foo.m"). #10 % this automatically generated line is line 10 of foo.m
Note that if a generated file contains some text which is copied from a source file, and some which is automatically generated, it is a good idea to use `pragma source_file' and `#line' directives to reset the source file name and line number to point back to the generated file for the automatically generated text, as in the above example.
The `pragma' declarations described above are a standard part of the Mercury language. However, as an extension, implementations may also choose to support additional pragmas with implementation-dependent semantics.
The University of Melbourne Mercury implementation supports the following additional pragmas:
Large tables of facts can be compiled using a different algorithm that is more efficient and produces more efficient code.
A declaration of the form
:- pragma fact_table(Name/Arity, FileName).
tells the compiler that the predicate or function with name Name and arity Arity is defined by a set of facts in an external file FileName. Defining large tables of facts in this way allows the compiler to use a more efficient algorithm for compiling them. This algorithm uses less memory than would normally be required to compile the facts so much larger tables are possible.
Each mode is indexed on all its input arguments so the compiler can produce very efficient code using this technique.
In the current implementation, the table of facts is compiled into a separate C file named `FileName.c'. The compiler will automatically generate the correct depencencies for this file when the command `mmake main_module.depend' is invoked. This ensures that the C file will be compiled to `FileName.o' and then linked with the other object files when `mmake main_module' is invoked.
The main limitation of the `fact_table' pragma is that predicates or functions defined as fact tables can only have arguments of types `string', `int' or `float'.
The compiler includes a termination analyser which can be used to prove termination of predicates and functions. Details of the analysis is available in "Termination Analysis for Mercury" by Chris Speirs, Zoltan Somogyi and Harald Sondergaard. In P. Van Hentenryck, editor, "Static Analysis: Proceedings of the 4th International Symposium", Lecture Notes in Computer Science. Springer, 1997. A longer version is available for download from http://www.cs.mu.oz.au/publications/tr_db/mu_97_09.ps.gz
The analysis is based around an algorithm proposed by Gerhard Groger and Lutz Plumer in their paper "Handling of mutual recursion in automatic termination proofs for logic programs." in K. Apt, editor, The Proceedings of the Joint International Conference and Symposium on Logic Programming, pager 336-350. MIT Press, 1992.
For an introduction to termination analysis for logic programs, please refer to "Termination Analysis for Logic Programs" by Chris Speirs. Technical Report 97/23, Department of Computer Science, The University of Melbourne, Melbourne, Australia, 1997. Available from http://www.cs.mu.oz.au/mercury/papers/mu_97_23.ps.gz
The analyser is enabled by the option `--enable-termination', which can be abbreviated to `--enable-term'. When termination analysis is enabled, any predicates or functions with a `check_termination' pragma defined on them will have their termination checked, and if termination cannot be proved, the compiler will emit an error message detailing the reason that termination could not be proved.
The option `--check-termination' option, which may be abbreviated to `--check-term' or `--chk-term', forces the compiler to check the termination of all predicates in the module. It is common for the compiler to be unable to prove termination of some predicates and functions because they call other predicates which could not be proved to terminate or because they use language features (such as higher order calls) which cannot be usefully analysed. In this case, the compiler only emits a warning for these predicates and functions if the `--verbose-check-termination' option is enabled. For every predicate or function that the compiler cannot prove the termination of, a warning message is emitted, but compilation continues. The `--check-termination' option implies the `--enable-termination' option.
The accuracy of the termination analysis is substantially degraded if intermodule optimization is not enabled. Unless intermodule optimization is enabled, the compiler must assume that any imported predicate may not terminate.
Currently the compiler assumes that all procedures defined using the C interface (`pragma c_code') terminate for all input. If this is not the case, a `pragma does_not_terminate' declaration should be used to inform the compiler that this C code may not terminate.
The following declarations can be used to inform the compiler of the termination properties of a predicate or function, or to force the compiler to prove termination of a given predicate or function.
:- pragma terminates(Name/Arity).This declaration may be used to inform the compiler that this predicate or function is guaranteed to terminate for any input. This is useful when the compiler cannot prove termination of some predicates or functions which are in turn preventing the compiler from proving termination of other predicates or functions.
:- pragma does_not_terminate(Name/Arity).This declaration may be used to inform the compiler that this predicate does not necessarily terminate. This is useful for procedures defined using the C interface, which the compiler assumes to terminate by default.
:- pragma check_termination(Name/Arity).This pragma forces the compiler to prove termination of this predicate. If it cannot prove the termination of the specified predicate or function then the compiler will quit with an error message.