                 INTERNAL ENCODING OF DATA ITEMS
                 ===============================

There are various categories of items that need to be represented on
the heap and (possibly) on the stack and in machine registers:

   a. terms

   b. types

   c. disagreement pairs

The data items in each category can be classified into sub-categories, and
the representation chosen for each depends on the information that needs to
be retained and on overall convenience in processing.

For the sake of the system's portability, C structures are adopted for encoding
data items. The first two bytes of the leading word of each sort of term
are used as a tag field containing the category information together with 
a mark field to be used by a garbage collector. 

typedef struct 
{
    Byte   categoryTag;
    Byte   mark;                    //to be used in garbage collection
} DF_Tag;

Further, the garbage collector should be able to recognize each category of
terms without auxiliary information. A tag is therefore provided for each
(sub)category of data item, and all those tags are collected at the top
level as follows:

/* The tags of heap items */
typedef enum
{
    //type categories
    DF_TY_TAG_SORT = 0,       //sort
    DF_TY_TAG_REF,            //reference
    DF_TY_TAG_SKVAR,          //skeleton variable
    DF_TY_TAG_ARROW,          //type arrow
    DF_TY_TAG_STR,            //type structure
    DF_TY_TAG_FUNC,           //functor of type structure

    //term categories
    DF_TM_TAG_VAR  = 6,       // existential variables
    DF_TM_TAG_CONST,          // constants 
    DF_TM_TAG_INT,            // integers
    DF_TM_TAG_FLOAT,          // floats
    DF_TM_TAG_NIL,            // empty lists
    DF_TM_TAG_STR,            // strings
    DF_TM_TAG_STREAM,         // streams
    DF_TM_TAG_BVAR,           // lambda bound variables (de Bruijn index)
                              // -- atoms above
    DF_TM_TAG_REF,            // references
                              // -- complex terms below
    DF_TM_TAG_CONS,           // list constructors
    DF_TM_TAG_LAM,            // abstractions
    DF_TM_TAG_APP,            // applications
    DF_TM_TAG_SUSP,           // suspensions

    //suspension environment items 
    DF_ENV_TAG_DUMMY = 19,    //dummy environment
    DF_ENV_TAG_PAIR,          //pair environment 
    
    //disagreement pair
    DF_DISPAIR = 21           
} DF_HeapDataCategory;
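
Because the tags are numbered contiguously within each category, a tag value
alone is enough to tell which category a heap item belongs to. The following
is a minimal sketch of such a check; the names HeapItemKind, heapItemKind and
isAtomicTermTag are hypothetical and are not part of the data format itself.

```c
#include <assert.h>

/* Tags copied from DF_HeapDataCategory above. */
typedef enum
{
    DF_TY_TAG_SORT = 0, DF_TY_TAG_REF, DF_TY_TAG_SKVAR,
    DF_TY_TAG_ARROW, DF_TY_TAG_STR, DF_TY_TAG_FUNC,
    DF_TM_TAG_VAR = 6, DF_TM_TAG_CONST, DF_TM_TAG_INT, DF_TM_TAG_FLOAT,
    DF_TM_TAG_NIL, DF_TM_TAG_STR, DF_TM_TAG_STREAM, DF_TM_TAG_BVAR,
    DF_TM_TAG_REF, DF_TM_TAG_CONS, DF_TM_TAG_LAM, DF_TM_TAG_APP,
    DF_TM_TAG_SUSP,
    DF_ENV_TAG_DUMMY = 19, DF_ENV_TAG_PAIR,
    DF_DISPAIR = 21
} DF_HeapDataCategory;

/* Hypothetical coarse classification of heap items (illustration only). */
typedef enum { KIND_TYPE, KIND_TERM, KIND_ENV, KIND_DISPAIR } HeapItemKind;

/* Map a tag to its top-level category by comparing against range ends. */
HeapItemKind heapItemKind(int tag)
{
    assert(tag >= DF_TY_TAG_SORT && tag <= DF_DISPAIR);
    if (tag <= DF_TY_TAG_FUNC)  return KIND_TYPE;
    if (tag <= DF_TM_TAG_SUSP)  return KIND_TERM;
    if (tag <= DF_ENV_TAG_PAIR) return KIND_ENV;
    return KIND_DISPAIR;
}

/* Atomic terms are the term tags listed before DF_TM_TAG_REF. */
int isAtomicTermTag(int tag)
{
    return tag >= DF_TM_TAG_VAR && tag <= DF_TM_TAG_BVAR;
}
```

For example, heapItemKind(DF_TM_TAG_LAM) yields KIND_TERM, and
isAtomicTermTag(DF_TM_TAG_REF) is false since references sit between the
atoms and the complex terms.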

The formats of the different data items are described below by categories.


                      I. REPRESENTATION OF TERMS
                      ==========================

There are actually further sub-categories of objects to be represented
here: 

  a. terms 

  b. environment terms

  c. environment lists

Each of these sub-categories is dealt with separately below. 


Representation of terms
=======================

The particular kinds of term items whose representation is considered
in this version are the following:

  a. unbound variables

  b. constants of two kinds --- with and without type associations

  c. bound variables (de Bruijn indices) 

  d. references 

  e. abstractions 

  f. applications 

  g. suspensions 

  h. special categories of constants: integers, floats, nil, cons,
     strings and streams.

Different items require different numbers of words.
We assume a generic term of atomic size (2 words regardless of machine 
architecture), and any item whose actual size is smaller is forced to take 
the atomic size. Items that cannot fit in the atomic size are assumed to 
have sizes that are integral numbers of words. 

Term category tags
------------------
The category tags belonging to terms are the following:

    DF_TM_TAG_VAR,            // existential variables
    DF_TM_TAG_CONST,          // constants 
    DF_TM_TAG_INT,            // integers
    DF_TM_TAG_FLOAT,          // floats
    DF_TM_TAG_NIL,            // empty lists
    DF_TM_TAG_STR,            // strings
    DF_TM_TAG_STREAM,         // streams
    DF_TM_TAG_BVAR,           // lambda bound variables (de Bruijn index)
                              // -- atoms above
    DF_TM_TAG_REF,            // references
                              // -- complex terms below
    DF_TM_TAG_CONS,           // list constructors
    DF_TM_TAG_LAM,            // abstractions
    DF_TM_TAG_APP,            // applications
    DF_TM_TAG_SUSP,           // suspensions

Representation of generic term
-------------------------------

//a generic term (head) for every category
typedef struct               
{
    DF_Tag       tag;        /* the common field for every term (head); its
                                categoryTag can be any one of the term
                                category tags listed above.
                                rely on struct alignment */ 
    void         *dummy;     /* a place holder which enforces the size of the 
                                generic term to be 2 words. */
} DF_Term;

typedef DF_Term *DF_TermPtr;  //term pointer
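
The atomic-size requirement can be checked at compile time, and a specific
term can be written into a generic cell through a cast. The sketch below
assumes that Byte and TwoBytes are one- and two-byte unsigned integers, that
a word has the size of a pointer, and that a C11 compiler is available; the
helpers mkBV and bvIndex are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  Byte;        /* assumption: Byte is one byte       */
typedef uint16_t TwoBytes;    /* assumption: TwoBytes is two bytes  */

typedef struct { Byte categoryTag; Byte mark; } DF_Tag;
typedef struct { DF_Tag tag; void *dummy; } DF_Term;
typedef DF_Term *DF_TermPtr;
typedef struct { DF_Tag tag; TwoBytes index; } DF_BVTerm;

enum { DF_TM_TAG_BVAR = 13 }; /* value taken from DF_HeapDataCategory */

/* The generic term should occupy exactly two (pointer-sized) words, and
   an atom such as a de Bruijn index should fit inside it. */
_Static_assert(sizeof(DF_Term) == 2 * sizeof(void *), "atomic size is 2 words");
_Static_assert(sizeof(DF_BVTerm) <= sizeof(DF_Term), "atom fits in a cell");

/* Write a de Bruijn index into a generic cell (relies on struct layout). */
void mkBV(DF_TermPtr cell, TwoBytes ind)
{
    DF_BVTerm *bv = (DF_BVTerm *)cell;
    bv->tag.categoryTag = DF_TM_TAG_BVAR;
    bv->tag.mark = 0;
    bv->index = ind;
}

/* Read the index back after checking the tag through the generic view. */
TwoBytes bvIndex(DF_TermPtr cell)
{
    assert(cell->tag.categoryTag == DF_TM_TAG_BVAR);
    return ((DF_BVTerm *)cell)->index;
}
```

The tag is always read through the generic DF_Term view, and the remaining
fields only through the specific view selected by that tag.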


Common data fields
------------------
There are some data fields that are common to some of the term representations,
and each of them is assumed to occupy two bytes:

typedef TwoBytes  DF_UnivInd;   //universe count
typedef TwoBytes  DF_CstTabInd; //constant symbol table index
typedef TwoBytes  DF_Arity;     //application arity
typedef TwoBytes  DF_DBInd;     //de Bruijn ind, embed level and num of lams
typedef TwoBytes  DF_StreamTabInd; //stream symbol table index


Unbound variables
-----------------
typedef struct                  //logic variables          
{
    DF_Tag         tag;
    DF_UnivInd     univCount;
} DF_VarTerm;

We assume the field univCount contains the universe count of the unbound
variable.

De Bruijn indices
-----------------
typedef struct                  //de Bruijn indices
{
    DF_Tag         tag;
    DF_DBInd       index;
} DF_BVTerm;

We assume the field index contains the value of the de Bruijn index.

Constants    (2 kinds)
---------

1. Constants are partitioned into two sets: one with type associations and the 
   other without. Both sorts have the category tag DF_TM_TAG_CONST.
   An additional byte, holding the value 0 or 1, indicates whether a type is
   associated with the constant.
2. References to the constant symbol table are encoded as table indices. An 
   alternative method is to use the absolute addresses of the corresponding 
   symbol table entries. The reason for using the index approach is discussed 
   in detail in dataformat_comments.txt.
3. The universe count and symbol table index are grouped together, and an alias 
   of integer type is given to this combination. This reduces the cost
   of constant comparison; details can be found in dataformat_comments.txt.

typedef struct {                //name and universe count field for constants
    DF_UnivInd     univCount;        
    DF_CstTabInd   symTabIndex; 
} DF_NameAndUC;

We assume that the field univCount contains the universe count and 
the field symTabIndex contains the index of this constant in the constant 
symbol table.

typedef struct {                //constant without type association
    DF_Tag         tag; 
    Boolean        withType;        
    union {
        unsigned int       value;
        DF_NameAndUC       nameAndUC;
    } data;
} DF_ConstTerm;

typedef struct {                //constant with type association
    DF_Tag         tag; 
    Boolean        withType;        
    union {
        unsigned int    value;
        DF_NameAndUC    nameAndUC;
    } data;
    DF_TypePtr     typeEnv;
} DF_TConstTerm;

We assume that the field withType contains 0 or 1 to indicate whether types
are associated with the constant, and the field typeEnv contains a pointer
to its type environment when types are associated.
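
The integer alias of the (universe count, symbol table index) pair allows two
constants to be compared with a single integer comparison. The following is
a sketch only: it assumes that TwoBytes is a 16-bit unsigned integer, that
Boolean is a one-byte type, and that unsigned int is exactly as wide as
DF_NameAndUC; the helpers mkConst and sameConst are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  Byte;          /* assumption */
typedef uint16_t TwoBytes;      /* assumption */
typedef Byte     Boolean;       /* assumption */
typedef TwoBytes DF_UnivInd;
typedef TwoBytes DF_CstTabInd;

typedef struct { Byte categoryTag; Byte mark; } DF_Tag;
typedef struct { DF_UnivInd univCount; DF_CstTabInd symTabIndex; } DF_NameAndUC;
typedef struct {
    DF_Tag   tag;
    Boolean  withType;
    union { unsigned int value; DF_NameAndUC nameAndUC; } data;
} DF_ConstTerm;

enum { DF_TM_TAG_CONST = 7 };   /* value taken from DF_HeapDataCategory */

/* The alias only works if it covers both fields exactly. */
_Static_assert(sizeof(unsigned int) == sizeof(DF_NameAndUC),
               "integer alias must cover univCount and symTabIndex");

/* Fill in a constant without type association. */
void mkConst(DF_ConstTerm *c, DF_UnivInd uc, DF_CstTabInd ind)
{
    c->tag.categoryTag = DF_TM_TAG_CONST;
    c->tag.mark = 0;
    c->withType = 0;
    c->data.nameAndUC.univCount = uc;
    c->data.nameAndUC.symTabIndex = ind;
}

/* Two constants agree when both name and universe count agree; reading the
   union through its integer member compares both fields at once. */
int sameConst(const DF_ConstTerm *a, const DF_ConstTerm *b)
{
    return a->data.value == b->data.value;
}
```

Two constants built with mkConst(&a, 1, 42) and mkConst(&b, 1, 42) compare
equal, while changing either the universe count or the index makes the
comparison fail.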

Integers
--------
typedef struct                  //integers
{
    DF_Tag        tag;     
    long int      value;
} DF_IntTerm;

Floats
------
typedef struct                  //floats
{
    DF_Tag        tag;     
    float         value;
} DF_FloatTerm;

String
------
typedef struct                  //string
{
    DF_Tag        tag;
    char          *charList;
} DF_StrTerm;

Note that we assume the character list is the C encoding of the string.


Stream
------
typedef struct                  //stream
{
    DF_Tag        tag;     
    DF_StreamTabInd index;
} DF_StreamTerm;

We assume the field index contains an index to the stream symbol table.

Nil
---
typedef struct                  //empty list
{
    DF_Tag        tag;     
} DF_NilTerm;

Reference
---------
typedef struct                  //reference
{
    DF_Tag        tag;     
    DF_TermPtr    target;
} DF_RefTerm;
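
A consumer of terms normally follows a chain of references until it reaches
a non-reference cell. The sketch below illustrates this under the assumption
that reference chains are acyclic; the helpers mkRef, mkNil and deref are
hypothetical names, and Byte is assumed to be a one-byte unsigned integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t Byte;           /* assumption */

typedef struct { Byte categoryTag; Byte mark; } DF_Tag;
typedef struct { DF_Tag tag; void *dummy; } DF_Term;
typedef DF_Term *DF_TermPtr;
typedef struct { DF_Tag tag; DF_TermPtr target; } DF_RefTerm;

/* values taken from DF_HeapDataCategory */
enum { DF_TM_TAG_NIL = 10, DF_TM_TAG_REF = 14 };

/* Turn a generic cell into a reference to target. */
void mkRef(DF_TermPtr cell, DF_TermPtr target)
{
    DF_RefTerm *r = (DF_RefTerm *)cell;
    r->tag.categoryTag = DF_TM_TAG_REF;
    r->tag.mark = 0;
    r->target = target;
}

/* Turn a generic cell into the empty list. */
void mkNil(DF_TermPtr cell)
{
    cell->tag.categoryTag = DF_TM_TAG_NIL;
    cell->tag.mark = 0;
}

/* Follow references until a non-reference term is reached. */
DF_TermPtr deref(DF_TermPtr t)
{
    assert(t != NULL);
    while (t->tag.categoryTag == DF_TM_TAG_REF)
        t = ((DF_RefTerm *)t)->target;
    return t;
}
```

With three cells where the first refers to the second and the second to a
nil cell, deref applied to the first cell returns the address of the nil
cell.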

Cons
----
typedef struct                  //list cons
{
    DF_Tag        tag;     
    DF_TermPtr    args;
} DF_ConsTerm;

We assume that the field args contains a pointer to the argument vector with
two entries.

Abstractions
------------
typedef struct                  //abstractions
{
    DF_Tag        tag;     
    DF_DBInd      numOfLams;      
    DF_TermPtr    body;
} DF_LamTerm;

We assume that the field numOfLams contains the number of lambdas at the front
of an abstraction, and the field body is a pointer to the abstraction body.

An alternative way to encode the function component of an application and
the body component of an abstraction is to use an atomic term instead of a
term pointer. The pros and cons are discussed in dataformat_comments.txt.

Applications
------------
typedef struct                  //applications
{
    DF_Tag        tag;     
    DF_Arity      arity;
    DF_TermPtr    functor;
    DF_TermPtr    args;
} DF_AppTerm;

We assume that the field arity contains the number of arguments of this 
application, the field functor is a pointer to the function component and 
the field args is a pointer to the argument vector (the number of elements is 
that in the arity field).
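
The following sketch shows how an argument of an application might be
reached. It assumes that the argument vector is a contiguous array of
atomic-size cells, so that the i-th argument is found by pointer arithmetic
on args; the helper appArg is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  Byte;          /* assumption */
typedef uint16_t TwoBytes;      /* assumption */
typedef TwoBytes DF_Arity;

typedef struct { Byte categoryTag; Byte mark; } DF_Tag;
typedef struct { DF_Tag tag; void *dummy; } DF_Term;
typedef DF_Term *DF_TermPtr;

typedef struct
{
    DF_Tag        tag;
    DF_Arity      arity;
    DF_TermPtr    functor;
    DF_TermPtr    args;
} DF_AppTerm;

/* values taken from DF_HeapDataCategory */
enum { DF_TM_TAG_CONST = 7, DF_TM_TAG_INT = 8, DF_TM_TAG_APP = 17 };

/* An application does not fit in the atomic size, so its size should be
   an integral number of (pointer-sized) words. */
_Static_assert(sizeof(DF_AppTerm) % sizeof(void *) == 0,
               "application size is a whole number of words");

/* Return the i-th argument (0-based) of an application. */
DF_TermPtr appArg(const DF_AppTerm *app, int i)
{
    assert(app->tag.categoryTag == DF_TM_TAG_APP);
    assert(i >= 0 && i < app->arity);
    return app->args + i;       /* arguments are contiguous atomic cells */
}
```

The bound check against arity is the only protection here; the argument
vector itself carries no length information.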

Suspensions
-----------
typedef struct                  //suspensions
{
    DF_Tag         tag;   
    DF_DBInd       ol;
    DF_DBInd       nl;
    DF_TermPtr     termSkel;
    DF_EnvPtr      envList;
} DF_SuspTerm;

We assume that the fields ol, nl contain the old and new embedding levels,
the field termSkel contains a pointer to the term skeleton and envList
contains a pointer to the environment list of the suspension. 


Representation of environment terms
===================================

There are two kinds of environment terms: dummy terms that essentially
record an embedding level and a pair item that records a term and
an embedding level.  Their corresponding heap data tags are the following:
    DF_ENV_TAG_DUMMY,         //dummy environment
    DF_ENV_TAG_PAIR,          //pair environment 

In detail, these two sorts of environment items take the following forms:

Dummy environment item
----------------------
typedef struct                  //dummy environment item
{
    DF_Tag            tag;
    DF_DBInd          embedLevel;
    DF_EnvPtr         rest;
} DF_DummyEnv;

Pair environment item
---------------------
typedef struct                  //pair environment item
{
    DF_Tag           tag;
    DF_DBInd         embedLevel;
    DF_EnvPtr        rest;
    DF_TermPtr       term;
} DF_PairEnv;

The field embedLevel is assumed to contain the embedding level, the field
rest is assumed to contain a pointer to the next environment item, and
the field term in a pair environment item is assumed to be a pointer to the
term component.
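
Since both kinds of item begin with the same three fields, an environment
list can be walked without knowing the kind of each item in advance. The
sketch below makes several assumptions that are not fixed by this document:
that DF_EnvPtr points to a generic head with the layout of DF_DummyEnv
(called DF_Env here), that a NULL rest pointer ends the list, and that
positions are counted from 1. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  Byte;          /* assumption */
typedef uint16_t TwoBytes;      /* assumption */
typedef TwoBytes DF_DBInd;

typedef struct { Byte categoryTag; Byte mark; } DF_Tag;
typedef struct { DF_Tag tag; void *dummy; } DF_Term;
typedef DF_Term *DF_TermPtr;

/* values taken from DF_HeapDataCategory */
enum { DF_ENV_TAG_DUMMY = 19, DF_ENV_TAG_PAIR = 20 };

/* Assumed generic head shared by both kinds (same layout as DF_DummyEnv). */
typedef struct DF_env
{
    DF_Tag         tag;
    DF_DBInd       embedLevel;
    struct DF_env *rest;
} DF_Env;
typedef DF_Env *DF_EnvPtr;

typedef struct
{
    DF_Tag     tag;
    DF_DBInd   embedLevel;
    DF_EnvPtr  rest;
    DF_TermPtr term;
} DF_PairEnv;

void mkDummyEnv(DF_EnvPtr e, DF_DBInd level, DF_EnvPtr rest)
{
    e->tag.categoryTag = DF_ENV_TAG_DUMMY;
    e->tag.mark = 0;
    e->embedLevel = level;
    e->rest = rest;
}

void mkPairEnv(DF_PairEnv *p, DF_DBInd level, DF_EnvPtr rest, DF_TermPtr term)
{
    DF_EnvPtr head = (DF_EnvPtr)p;  /* shared leading fields */
    head->tag.categoryTag = DF_ENV_TAG_PAIR;
    head->tag.mark = 0;
    head->embedLevel = level;
    head->rest = rest;
    p->term = term;
}

/* Return the n-th item (1-based) of an environment list, or NULL. */
DF_EnvPtr envNth(DF_EnvPtr env, int n)
{
    assert(n >= 1);
    while (env != NULL && n > 1) {
        env = env->rest;
        n--;
    }
    return env;
}
```

The term field of the returned item may be read only after its tag has been
checked to be DF_ENV_TAG_PAIR.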


                      II. REPRESENTATION OF TYPES
                      ===========================


Types are represented on the heap like first-order terms in the WAM. Type
arrows are treated as a special binary operator like list cons in the WAM.
An alternative encoding of type arrows (Boehm tree representation) is 
discussed in dataformat_comments.txt. 

In addition to creating types on the heap, we have to deal with type
skeletons. A similar representation is used for these except that
variables in skeletons are natural numbers.

The detailed representations are presented below.

Representation of types
=======================

The particular kinds of type items whose representation is considered
in this version are the following:

  a. sorts (constants)

  b. type variables or references

  c. type variables in type skeletons

  d. type structures (constructed types) and type constructors

  e. type arrows

An atomic head of 2 words (regardless of machine architecture) is used for
each category of (the heads of) type, and we assume a generic type of this size.

Type category tags
------------------
The category tags belonging to types are the following:
    DF_TY_TAG_SORT,           //sort
    DF_TY_TAG_REF,            //reference
    DF_TY_TAG_SKVAR,          //skeleton variable
    DF_TY_TAG_ARROW,          //type arrow
    DF_TY_TAG_STR,            //type structure
    DF_TY_TAG_FUNC,           //functor of type structure

Representation of generic type
-------------------------------

//generic type (head) for every category
typedef struct               
{
    DF_Tag          tag;     /* the common field for every type (head); its
                                categoryTag can be any one of the type
                                category tags listed above.
                                rely on struct alignment */ 
    void            *dummy;  /* a place holder which enforces the size of the 
                                generic type to be 2 words. */
} DF_Type;

typedef DF_Type *DF_TypePtr;      //type pointer

Common data fields
------------------
There are some data fields that are common to some of the type representations,
and each of them is assumed to occupy two bytes:

typedef TwoBytes      DF_KstTabInd;    //kind symbol table index
typedef TwoBytes      DF_StrTypeArity; //arity of type structure
typedef TwoBytes      DF_SkelInd;      //offset of variables in type skeletons

Sorts (constants)
-----------------
typedef struct                         //type sort
{
    DF_Tag            tag;
    DF_KstTabInd      kindTabIndex;
} DF_SortType;

We assume that the field kindTabIndex contains the index of this sort in the 
kind symbol table. 

Type variables or references
----------------------------
Only one sort of cell is used for both, as in the WAM, with an unbound
variable being a reference to itself.

typedef struct                         //type reference
{
    DF_Tag            tag;      
    DF_TypePtr        target;
} DF_RefType;
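
The self-reference convention can be sketched as follows: dereferencing
stops either at a non-reference type or at a reference cell that points to
itself, which is then an unbound variable. The helper names below are
hypothetical, and Byte is assumed to be a one-byte unsigned integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t Byte;           /* assumption */

typedef struct { Byte categoryTag; Byte mark; } DF_Tag;
typedef struct { DF_Tag tag; void *dummy; } DF_Type;
typedef DF_Type *DF_TypePtr;
typedef struct { DF_Tag tag; DF_TypePtr target; } DF_RefType;

/* values taken from DF_HeapDataCategory */
enum { DF_TY_TAG_SORT = 0, DF_TY_TAG_REF = 1 };

/* Bind a variable cell to a target type. */
void bindTyVar(DF_TypePtr var, DF_TypePtr target)
{
    DF_RefType *r = (DF_RefType *)var;
    r->tag.categoryTag = DF_TY_TAG_REF;
    r->tag.mark = 0;
    r->target = target;
}

/* An unbound type variable is a reference to itself. */
void mkFreeTyVar(DF_TypePtr cell)
{
    bindTyVar(cell, cell);
}

/* Follow references until a non-reference or an unbound variable. */
DF_TypePtr derefType(DF_TypePtr ty)
{
    assert(ty != NULL);
    while (ty->tag.categoryTag == DF_TY_TAG_REF) {
        DF_TypePtr next = ((DF_RefType *)ty)->target;
        if (next == ty) break;  /* self-reference: unbound variable */
        ty = next;
    }
    return ty;
}

/* True when ty is a reference cell that points to itself. */
int isFreeTyVar(DF_TypePtr ty)
{
    return ty->tag.categoryTag == DF_TY_TAG_REF
        && ((DF_RefType *)ty)->target == ty;
}
```

Binding a variable is then a single overwrite of its target field, and no
separate tag is needed to distinguish bound from unbound variables.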

Type variables in type skeletons
--------------------------------

typedef struct                         //variables in type skeletons
{
    DF_Tag            tag;
    DF_SkelInd        offset;
} DF_SkVarType;

We assume that the field offset contains a (pseudo) offset into the type
environment of the type skeleton in which this variable occurs. 

Type arrows
-----------
typedef struct                         //type arrows
{
    DF_Tag            tag;       
    DF_TypePtr        args;
} DF_ArrowType;

We assume that the field args contains a pointer to the two arguments in
contiguous locations.

Type structures (constructed types) and type constructors
---------------------------------------------------------

type constructors:

typedef struct                         //type functors
{
    DF_Tag            tag;
    DF_StrTypeArity   arity;
    DF_KstTabInd      kindTabIndex;
} DF_FuncType;

We assume that the field arity contains the arity of this type constructor
and the field kindTabIndex contains the index of this type constructor in
the kind symbol table.

typedef struct                         //type structures
{
    DF_Tag            tag;       
    DF_FuncType       *funcAndArgs;
} DF_StrType;


We assume the funcAndArgs field is a pointer to the functor followed by the 
arguments in contiguous locations.
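
As an illustration of this layout, the sketch below reaches the i-th
argument of a type structure. It assumes that the functor is stored in a
slot of atomic size, so that the arguments start exactly one generic cell
after funcAndArgs; this padding is an assumption of the sketch, and the
helper strTypeArg is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  Byte;          /* assumption */
typedef uint16_t TwoBytes;      /* assumption */
typedef TwoBytes DF_KstTabInd;
typedef TwoBytes DF_StrTypeArity;

typedef struct { Byte categoryTag; Byte mark; } DF_Tag;
typedef struct { DF_Tag tag; void *dummy; } DF_Type;
typedef DF_Type *DF_TypePtr;

typedef struct
{
    DF_Tag            tag;
    DF_StrTypeArity   arity;
    DF_KstTabInd      kindTabIndex;
} DF_FuncType;

typedef struct
{
    DF_Tag            tag;
    DF_FuncType       *funcAndArgs;
} DF_StrType;

/* values taken from DF_HeapDataCategory */
enum { DF_TY_TAG_SORT = 0, DF_TY_TAG_STR = 4, DF_TY_TAG_FUNC = 5 };

/* The functor must fit in one atomic slot for the layout assumed here. */
_Static_assert(sizeof(DF_FuncType) <= sizeof(DF_Type),
               "functor fits in an atomic slot");

/* Return the i-th argument (0-based) of a type structure. */
DF_TypePtr strTypeArg(const DF_StrType *s, int i)
{
    /* assumption: slot 0 holds the functor, slots 1..arity the arguments */
    DF_TypePtr slots = (DF_TypePtr)s->funcAndArgs;
    assert(s->funcAndArgs->tag.categoryTag == DF_TY_TAG_FUNC);
    assert(i >= 0 && i < s->funcAndArgs->arity);
    return slots + 1 + i;
}
```

Under this assumption a structure of arity n occupies n + 1 contiguous
atomic slots in addition to the DF_StrType head that points to them.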


               III. DISAGREEMENT PAIRS
               =======================

The disagreement set is represented as a linked list, and changes to it are
assumed to be performed non-destructively, based on the assumption that 
those changes rarely occur when computation is organized around pattern 
unification. In detail, each node (a disagreement pair) in this set has the
form:

typedef struct DF_disPair   //each node in the disagreement set
{
    DF_Tag               tag;
    struct DF_disPair    *next;
    DF_TermPtr           firstTerm;
    DF_TermPtr           secondTerm;
} DF_DisPair;

We assume that the heap category tag DF_DISPAIR is stored in the tag field,
that the two terms of the pair are referenced by the pointers firstTerm and 
secondTerm respectively, and that the pointer next refers to the next node in
the set.
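
A non-destructive addition to the disagreement set can be sketched as
prepending a new node that shares the old list as its tail, so that the old
set remains valid. The sketch assumes that an empty set is represented by
NULL and uses malloc as a stand-in for heap allocation; the helpers
addDisPair and disSetSize are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t Byte;           /* assumption */

typedef struct { Byte categoryTag; Byte mark; } DF_Tag;
typedef struct { DF_Tag tag; void *dummy; } DF_Term;
typedef DF_Term *DF_TermPtr;

typedef struct DF_disPair
{
    DF_Tag               tag;
    struct DF_disPair    *next;
    DF_TermPtr           firstTerm;
    DF_TermPtr           secondTerm;
} DF_DisPair;

enum { DF_DISPAIR = 21 };       /* value taken from DF_HeapDataCategory */

/* Return a new set containing (t1, t2) in front of the given set.
   The given set is not modified; NULL is returned if allocation fails. */
DF_DisPair *addDisPair(DF_DisPair *set, DF_TermPtr t1, DF_TermPtr t2)
{
    DF_DisPair *node = malloc(sizeof *node);
    if (node == NULL) return NULL;
    node->tag.categoryTag = DF_DISPAIR;
    node->tag.mark = 0;
    node->next = set;
    node->firstTerm = t1;
    node->secondTerm = t2;
    return node;
}

/* Count the pairs in a set by following the next pointers. */
int disSetSize(const DF_DisPair *set)
{
    int n = 0;
    for (; set != NULL; set = set->next)
        n++;
    return n;
}
```

After two additions the older one-element set is still reachable and
unchanged, which is what makes it cheap to restore an earlier set.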


