Calculator Design Specification

John Kalucki, 2/22/2004

Overview

Calculator is a portable C++ component that executes small programs very efficiently. Programs consist of a list of instructions, input data, literals (constants), and scratch memory. Generally programs result in some output data. A simple virtual machine (VM) executes programs and terminates.

A typical Calculator object life-cycle & use-pattern:

Instantiate Calculator
Assemble Instructions: Create and Insert Instructions into Calculator
Allocate Required Memory
Bind Input, Output, and Literals to Calculator
Execute Program
Check Status Register
Utilize Output
As Needed: Bind New Input.
As Needed: Bind new Output
As Needed: Jump to #5
Destroy Calculator
De-allocate memory

Calculator will be installed as a component of Fennel. Coding standards, build environment, etc. are to follow the Fennel projectÂ’s example. Calculator will introduce no additional dependencies to Fennel.

The primary goals of this Calculator design are Execution Performance and, as a pair: Extensibility and Maintainability. Overall performance for the single execution case will be considered as well as case where the same program is executed repeatedly over a varying set of inputs and outputs.

Calculator can call external functions for extensibility and to efficiently handle complex operations. For example, string concatenation could involve a call to strncpy() instead of having the VM perform a painful byte-by-byte copy of the string. A strongly-typed version of external functions called extended instructions allows Calculator to be extended in a type-safe way.

Calculator will interact directly and indirectly with a number of other components. A compiler will create programs. Typically the compiler will instantiate instruction objects and insert them into a list. Once finished, the compiler will ask the instruction for a serial representation and create a text version of the program. The compiler may, at its option, also create the program text directly, but will give up the implicit type checking and other features.

An Execution Object (XO) will call the assembler with a text program. The assembler will allocate some of the register sets, and re-create the instruction object list, inserting the instructions into a Calculator object. Later, the XO can bind data, and execute Calculator.

Memory Architecture

Register Sets

Memory is modeled as a number of register sets. Each register set is stored in an array. Individual registers are accessed by an 0-based index into the set. Register sets are currently a simple abstraction from Fennel Tuples.

Calculator will perform all memory allocation during instantiation and configuration and no memory allocation during typical execution paths. Warning Messages and Error Messages may be allocated at runtime, however. See Warnings & Errors below. The caller will be responsible for allocating and de-allocating register sets.

Note: External functions and extended instructions may perform resource allocation as a side-effect. Calculator makes no attempt to track resource allocation. The program writer must explicitly call appropriate external functions to de-allocate such memory.

There are five register sets defined:

1. Input:

a. Input of program data. (i.e. function arguments)

b. Generally contains tuple data handed to XO.

c. Intended to be the main variant when Calculator programs are executed repeatedly.

2. Output:

a. Output of program data. (i.e. function return values)

b. Generally contains the tuple to be passed out of the XO.

c. Storage of intermediate results between program executions. (e.g. Aggregation Functions)

d. A compiler might also choose to store metadata such as exceptions, errors, warnings, overflow indicator bits, etc. in the output register set, but this usage is discouraged. Storing such data here requires the XO to filter the tuple before passing it on to the next XO.

3. Local: (Â“IntermediatesÂ”)

a. General purpose intermediate storage.

b. Intermediate storage for integral types. (i.e. main memory)

c. Pre-allocated Â“heap spaceÂ” for scratch storage of strings and other Large Objects.

4. Literals: (Â“ConstantsÂ”)

a. When Calculator programs are executed repeatedly, static literals need not be mixed with changing input tuples.

b. Literal register set is allocated and set by the assembler, not by the XO.

c. For a very small increase in Calculator complexity, there is potentially a win in reduction of binding overhead.

d. Explicitly supporting literals in a register set obviates the need for a literal addressing mode and its associated complexity (e.g. ADD Local(4), $0x200 is not needed. $0x200 is loaded from the Literals register instead: ADD Local(4), Literal(3))

5. Status:

a. Provide a generic and flexible location for programs to communicate with the XO.

b. Prevents status information from mixing with the output register set, obviating the need for filtering the output tuple in the XO.

c. Allows for rich interaction between programs and XO without also requiring modification to Calculator.

Register sets will not be cleared or otherwise manipulated by Calculator object. Only two types of objects within the Calculator component will manipulate register sets: the assembler and subclasses of the Instruction class. By not clearing register sets, the output register set may also be used as input as well as output for instructions.

Addressing Modes

Genuflecting to John Hennessy and Dave Patterson, CalculatorÂ’s addressing modes will be orthogonal:

1. All operands of all instructions that reference memory can use all addressing modes

2. All register sets are readable and writeable by all instructions. However, sensible programming conventions would discourage inappropriate use of sets. (e.g. writing to the input and literal register sets)

3. All memory locations of a given type (and/or length, see Storage, below) can be used interchangeably.

4. No Â“specialÂ” registers.

5. All memory accesses will be typed. (Instructions will cast all accesses as the desired type.)

Efficiency is paramount! Operands in programs will be indexes into a given register set. For example: Output[4] or Input[0]. The RegisterReferenceclass will provide an abstraction to allow various implementations of, experiments with, and debug-level instrumentation of, all memory access.

A consideration: RegisterReference may cache the abstract register set reference Â“Local [4]Â” as a direct pointer to memory by setting the CachePointer property. Tuning may indicate that this is valuable when applied to register sets that do not change, such as the Literal set and the Local set, but reference invalidation costs for sets that typically re-bound before each execution may not be worth the cost. The implication is that local memory access may be somewhat faster than Input or Output access. TBD.

A consideration: Add an additional indexing addressing mode that allows easier manipulation of one-dimensional and multi-dimensional arrays, jump tables for CASE statements, and perhaps the register sets themselves. For example, the 80386 architecture supports a Scale/Index/Base addressing mode. Addresses are calculated as (Scale * Index) + Base. TBD

Nested rows issues? TBD

Multiple rows per execution? (e.g. batch) TBD.

Storage

Portability Invariant: A given program and input should produce (nearly) identical output on each and every supported hardware type. Calculator must be independent of underlying hardware Â– with sole and notable exception of rounding errors in floating point numbers.

Native type support, storage, typedefs, etc. are provided by Fennel Tuples. Fennel supports types beyond the native C++ types, such as VARCHAR, VARBINARY, and so forth. Therefore Calculator types are defined by the Fennel enum StandardTypeDescriptorOrdinal.

Types

The Calculator is strongly typed:

Default execution mode will assume that a program is type correct.
Behavior may be undefined if type violations occur.
Calculator is strongly typed within instructions but not between instructions and underlying tuples. Runtime type checking is not supported!
Explicit type conversions are required for type promotion, demotion and transformation.
With the exception of extended instructions and external functions, the return type of all other instructions is implicitly determined by the associated input types. Note: The serialized version of programs may redundantly require output type to be declared to enforce a regular representation and make parsing easier for the assembler.
Instructions that take two operands typically require that both operands be of the same type, unless noted otherwise.
Trace and debug modes may or may/not enforce typing. TBD.

The implication is that the compiler must be aware of types, and must always Do The Right Thing.

Type Mappings

Mappings between SQL types and C++ native types is provided by the Fennel Tuple code. (See documentation for SQL to Fennel type mapping.) Some types, such as SQLÂ’s DECIMAL(p,s) and TIMESTAMP(p) types are not directly supported by Calculator. The compiler should handle this type by generating appropriate programs.

Tuple represent NULL values with a NULL pointer.

Non-native types supported by Fennel Tuples, such as VARCHAR, are simply represented by a pointer in the Calculator, and perhaps also an integer containing the length of the array.

When represented as strings, for example in serialized programs, the mapping from StandardTypeDescriptorOrdinal to strings is defined by standardTypetoString() located in fennel/calc/StandardTypeToString.cpp.

Fennel/Tuple/ StandardTypeDescriptorOrdinal	String Representation
`STANDARD_TYPE_INT_8`	`s1`
`STANDARD_TYPE_UINT_8`	`u1`
`STANDARD_TYPE_INT_16`	`s2`
`STANDARD_TYPE_UINT_16`	`u2`
`STANDARD_TYPE_INT_32`	`s4`
`STANDARD_TYPE_UINT_32`	`u4`
`STANDARD_TYPE_INT_64`	`s8`
`STANDARD_TYPE_UINT_64`	`u8`
`STANDARD_TYPE_REAL`	`r`
`STANDARD_TYPE_DOUBLE`	`d`
`STANDARD_TYPE_CHAR`	`c`
`STANDARD_TYPE_VARCHAR`	`vc`
`STANDARD_TYPE_BINARY`	`b`
`STANDARD_TYPE_VARBINARY`	`vb`

Internationalization

Internationalization issues are expected to be transparent to Calculator. Byte arrays are represented by pointers.

Isolation

ItÂ’s a Mad Mad Mad Mad World. Under normal running conditions, Calculator will make no attempt to isolate the C++ process from potentially ill side effects of Calculator program execution. This responsibility is completely up to the program compiler and the author of extended instructions.

Risks not handled, include, but are not limited to:

C++ memory corruption - Anywhere the address space, not just in register sets!
C++ stack corruption Â– Another example of memory corruption.
C++ program counter corruption Â– e.g. jump to hyperspace, segmentation fault, core dump.
VM program counter corruption Â– e.g. jump to hyperspace, segmentation fault, core dump.
VM program non-termination. Â– i.e. Calculator.exec() does not return.
Resource leakage from external function calls Â– Typically memory leakage, but potentially file descriptors or any other resource that an external function call could abandon.
Unaligned memory access Â– e.g. bus error, core dump.
Security, Access control violations, etc.

Some of these risks could be mitigated by only allowing a tightly controlled list of external functions to be called.

Risks handled:

Division by Zero Â– Generates a warning.
Modulus by Zero Â– Generates a warning.
Arithmetic Overflow: TBD
Arithmetic Underflow: TBD

Each instruction that may bump into these risks will have to handle them in a well defined manner and provide the appropriate error, warning or exception.

Calculator Programs

Representation

Calculator programs can be created and represented in several ways:

1) In-process C++ can access the instruction set OO hierarchy directly, cons up instruction objects, order them into programs, and have Calculator execute them. This is the typical Fennel access mode.

2) Out-of-process C++ can link in Calculator code to access the instruction set OO hierarchy directly. Each instruction can convert itself into a serialized (string) representation. The serial representation can be transmitted to another C++ process for assembly and execution. See Assembler section for details on the format of these strings.

3) In or Out of process Java creates, via some arbitrary method, a string representation of a program. The program can be transmitted to a C++ process.

4) Out-of-process C++ can also use method #3.

Flow Control

All jumps are to absolute instruction Â“addressesÂ”. The effective address of an instruction is determined solely by the order that it is inserted into Calculator. The first instruction inserted is at address 0 Â– that is to say, instruction addresses are 0-indexed

Jump labels, instead of absolute jump addresses would make programs easier to read and debug, but would add complexity to Calculator and perhaps to the compiler. For now, labels are not supported. A compiler and program writer might consider implementing labels as a convenience in creating programs.

A consideration: May want to provide some support for efficient processing of SQL CASE statements, such as the SIB addressing mode mentioned above.

Exceptions

Yet to be fully specified. TBD

Will use the RAISE instruction.

May want to support catch, throw, try, finally.

When an exception is thrown in a program, Calculator will eventually throw a C++ exception that the caller to Calculator.exec() is encouraged to catch and decode.

Format and values of object thrown by exception TBD.

Extended Instructions

Implemented, but not currently documented. TBD

External Functions

Yet to be specified. TBD

Will have some form of static lookup table of supported functions.

Assembler will have to convert a string function name into a function pointer.

Will have to support a pointer type.

John Sichi has recommended looking at Functors. Look at Boost Functors. May be useful here.

Provide/find optimized str(3) functions that handle SQL semantics correctly. (e.g. ignoring trailing spaces, varbinary blobs of unequal length)

Consider resource allocation issues.

Consider how to represent Null stings vs. empty strings, etc.

NULL

Fennel Tuples represent NULL values as a NULL pointer. RegisterReference handles this convention.

All instructions will be NULL-aware and will implement appropriate SQL NULL semantics.

If a Local register may be set to NULL, and if Calculator will be called more than once, the Local register set must have the PointerReset property set. This property causes Calculator to reset the tuple pointer back to its original location before the next call to exec().

Writing programs to test every location for IsNotNULL before each write could be a real bear. Program writers are encouraged to make liberal use of Local registers to avoid this problem. ItÂ’s possible that this representation of NULL is simply not strong enough, and Tuples will have to be changed to reflect this problem. TBD

Effect on strings and external functions? TBD

Warnings and Errors

The compiler must take warning recovery and error handling into account. The compiler can decide if it wishes to perform exception avoidance by testing for common errors within the program, or let Calculator execute and handle the issue.

Warning and error codes codes are taken from Section 22.1 SQLSTATE of the SQL99 specification. Arbitrarily, CalculatorÂ’s implementation-defined errors are in the Class 5Z. This choice should be rationalized. TBD.

Warnings are the result of correctable problems. Program execution always continues after warnings and side-effects are well defined. Warning messages are coded and placed onto a deque (double-ended queue) for later access. A deque is used to allow interpretation of warnings in the order that they occurred (iterate from deque.begin()) or to allow interpretation of the last warning first (iterate from deque.end())

Warnings:

Warning	Code Class/Subclass	Side-effect(s)	Comments
`Division By Zero`	`22012`	Result contains a NULL.	SQL99 spec appears to state that DIV/0 is an error. A side-effect is not specified. Consider making Division By Zero an Error. TBD
`Overflow`	`22003`	Round or Truncate? TBD	SQL99 says implementation may round or truncate
`Underflow`	`22003`	Round or Truncate? TBD	Does this even exist?

Errors are problems that are not correctable, or problems with side-effects that are not well defined. Program execution always terminates immediately after errors. Error status is indicated in the last element of the warning deque. An exception is thrown after the error message is added to the warning queue. Format of this exception TBD.

Errors:

Error	Code	Side-effect(s)	Comments
`Segmentation Fault`	`5Z001`		PC falls off end of program and/or jumps to a non-existent location.

Instruction Set

Base Class Name	Object Name	OpCode Mnemonic & String Representation	Operands	Comment	Type Support
Base Class Name	Object Name	OpCode Mnemonic & String Representation	Operands	Comment	`bool`	`long` `ulong`	`float` `long double`	`T *`
`·BooleanInstruction`	`BoolAnd`	`AND`	`result, operand, operand`		`Y`
	`BoolOr`	`OR`	`result, operand, operand`
	`BoolEqual`	`EQ`		Equivalent to SQL boolean `IS`
	`BoolNotEqual`	`NE`		Equivalent to SQL boolean `IS NOT`
	`BoolGreater`	`GT`		`TRUE` > `FALSE`. No need for GE, LE
	`BoolLess`	`LT`		`TRUE` > `FALSE`. No need for GE, LE
	`BoolNot`	`NOT`	`result, operand`
	`BoolMove`	`MOVE`
	`BoolIsNull`	`ISNULL`
	`BoolIsNotNull`	`ISNOTNULL`
	`BoolToNull`	`TONULL`	`result`
`·Â·NativeNative-Instruction`	`NativeAdd`	`ADD`	`result, operand, operand`			`Y`
	`NativeSub`	`SUB`
	`NativeMul`	`MUL`
	`NativeDiv`	`DIV`
	`NativeNeg`	`NEG`	`result, operand`
	`NativeMove`	`MOVE`
	`NativeToNull`	`TONULL`	`result`
`·Â·IntegralNativeInstruction`	`IntegralNativeMod`	`MOD`	`result, operand1, operand2`	modulus operator, op1 is dividend, op2 is divisor
	`IntegralNativeShiftLeft`	`SHFL`	`result, operand1, operand2`	shift operand1 left or right by operand2 number of bits. Shifts in zeros. operator2 must be positive long or ulong
	`IntegralNativeShiftRight`	`SHFR`	`result, operand1, operand2`
	`IntegralNativeAnd`	`AND`	`result, operand, operand`	Bitwise operations
	`IntegralNativeOr`	`OR`		Bitwise operations
	`IntegralNativeXor`	`XOR`		No SQL mapping. Dubious utility.
`·Â·BoolNative-` `Instruction`	`BoolNativeEqual`	`EQ`	`result, operand, operand`	returns boolean
	`BoolNativeNotEqual`	`NE`
	`BoolNativeGreater`	`GT`
	`BoolNativeGreaterEqual`	`GE`
	`BoolNativeLess`	`LT`
	`BoolNativeLessEqual`	`LE`
	`BoolNativeIsNull`	`ISNULL`	`result, operand`
	`BoolNativeIsNotNull`	`ISNOTNULL`	`result, operand`
`·Â·PointerPointer-Instruction`	`PointerMove`	`MOVE`	`result, operand1`	Result and Operand1 are pointers. Operand2 is Long				`Y`
	`PointerAdd`	`ADD`	`result, operand1, operand2`	Result and Operand1 are pointers. Operand2 is Long
	`PointerSub`	`SUB`	`result, operand1, operand2`
	`PointerToNull`	`TONULL`	`result`
`·Â·BoolPointer-Instruction`	`PointerEqual`	`EQ`	`result, operand, operand`	returns boolean
	`PointerNotEqual`	`NE`
	`PointerGreater`	`GT`
	`PointerGreaterEqual`	`GE`
	`PointerLess`	`LT`
	`PointerLessEqual`	`LE`
	`PointerIsNull`	`ISNULL`	`result,` `operand`
	`PointerIsNotNull`	`ISNOTNULL`	`result,` `operand`
`·JumpInstruction`	`Jump`	`JMP`	`location`	unconditional jump
	`JumpTrue`	`JMPT`	`location, operand`	conditional jump, operand is boolean	`Y`
	`JumpFalse`	`JMPF`
	`JumpNull`	`JMPN`
	`JumpNotNull`	`JMPNN`
`·Instruction`	`Raise`	`RAISE`	`literal long`	Exception. Operands TBD
`·Instruction`	`Return`	`RETURN`		Exit from Calculator. Operand TBD
`·ExtendedInstruction`	`ExtendedInstruction`	`EXT`	`signature`	Signature is a string: Â“Fname(type1,type2,type3)Â”	`?`	`Y`	`Y`	`Y`
`CalInstruction`	`Call`	`CALL`	`TBD`	TBD	`?`	`Y`	`Y`	`Y`

Assembler

The Assembler performs the following functions:

Creates Output, Literal, Local and Status Tuples.
Binds Literal, Local and Status Tuples to Calculator Register sets.
Initializes literal values in Literal Register Set.
Verifies the format of serialized programs as they are parsed, including type checking.
Converts (Â“assemblesÂ”) serialized programs (string representation of programs) into instruction objects.

Serialized Programs

Serialized programs support the following features:

A preamble that defines the size and shape of all register sets. This information is effectively a tuple definition for each register set.
A preamble that optionally contains literals to be initialized in the literal register set.
Support for all types of literals, including 8-bit arrays VARBINARY, Unicode strings, etc in a 6-bit clean manner.
A list of instructions, the actual program text, to be created and inserted into the Calculator

XML Format

For any number of good reasons, Calculator should support XML representations of programs. Unfortunately, an XML representation introduces the requirement for an XML parser into Fennel, which is not strictly necessary for Calculator to work. Therefore, to avoid adding another library dependency on Fennel, the text representation of programs will not be XML.

Boost::Spirit Parser

This parser was considered, but eventually disregarded. The template programming style that Spirit is written in is appealing, but contains a steep learning curve. Maintenance of this parser could be difficult. Also, compile-time errors are rather difficult to decode at times. The developer might be forced to only make small changes, and then compile and test. Otherwise it can be nearly impossible to tell where an error was introduced. Perhaps the Spirit parser can be re-investigated in the future.

Simple Serialized Format

All of the functionality required can be provided cheaply by using a very simple format to represent instructions. Expediency is the goal for now. The implementation details of this parser are, at the moment, outside the scope of this document. If the parser introduces an additional library dependency to Fennel, it must be possible to compile Fennel without the parser. For example, fennel/configure should take arguments, similar to Â–with-boostÂ–with-stlport, such as Â–with-ANTLR Â–with-XERCES, or the like, to enable serial program representation and the associated third-party parsing library.

BNF (EBNF?) of Simple Serialized Format

Note: < whitespace > is allowed between tokens. Tokenizer or skip-parser should consume < whitespace >

Note: Format is case-sensitive.

< whitespace > ::= Â“ Â“ | Â“\nÂ” | Â“\rÂ” | Â“\tÂ” < program > ::= <preamble> Â“TÂ” < program text > < preamble > ::= < registerset def > [ <literal set def > < literal values > ] < registerset def > ::= < set name > [ Â“,Â” ( < type size > )+ ] < terminator > < set name > ::= Â“OÂ” | # Output Â“IÂ” | # Input Â– Generally unused. Â“SÂ” | # Status Â“LÂ” | # Local < type size > ::= { < fixedlengthtype > } | { < variablelengthtype > Â“,Â” < bytes > } < fixedlengthtype > ::= < integers > | < reals > | < doubles > | < chars > | < binary > < variablelengthtype >:: = < varchar > | < varbinary > < literal set def > ::= Â“CÂ” # Literal set, think Constant. [ Â“,Â” ( < type size > )+ ] < terminator > < literal values > ::= Â“VÂ” # Values ( < literal def> ) * < terminator > < literal def > ::= < integerval > | < realval > | < bytearrayval > < integerval > ::= [ Â“-Â“ ] ( [0-9] ) + < realval > ::= [ Â“-Â“ ] ( [0-9] ) + Â“.Â” ( [0-9] ) + < bytearrayval> ::= # TBD See section Â“Binary EncodingÂ” < program text > ::= ( < instruction > ) + # a program has 1 or more instructions < instruction > ::= < opcode > Â“,Â” < operand list > < terminator > < operand list > ::= { < result > Â“,Â” < operand > [ Â“,Â” < operand > ] } | { < set size > } < terminator > :: = Â“;Â” < result > ::= < output address > | < local address > < operand > ::= < output address > | < input address > | < local address > | < literal address > | < status address > | < instruction address > /* < SIB address > */ < output address > ::= Â“OÂ” < type > < whitespace > < output index > < input address > ::= Â“IÂ” < type > < whitespace > < input index > < local address > ::= Â“LÂ” < type > < whitespace > < memory index > < literal address > ::= Â“CÂ” < type > < whitespace > < literal index > < status address > ::= Â“SÂ” < type > < whitespace > < literal index > < instruction address > ::= Â“@Â” <instruction index > /* < SIB address > ::= Â“*Â” < type > < scale > Â“,Â” < index > Â“,Â” < base > */ < type > ::= < assembly code type > # see Types section & # StandardTypeToString < set size > ::= < output index > | < input index> | < local index> | < literal index > | < status index > < output index > ::= 0 | 1 | ... | <size of output register set Â– 1> < input index > ::= 0 | 1 | ... | <size of input register set Â– 1> < local index > ::= 0 | 1 | ... | <size of local register set Â– 1> < literal index > ::= 0 | 1 | ... | <size of literal register set Â– 1> < status index > ::= 0 | 1 | ... | <size of status register set Â– 1> < instruction index > ::= 0 | 1 | . . . | <size of program Â– 1>

Note: May want additional < literal def > definitions, for example exponents: 6.0221 * 10^24. TBD

Array Encoding

Character arrays (strings) and binary arrays will be encoded to allow transmission and viewing in less than 8-bit clean environments. Typically strings passed through the calculator will be encoded with Unicode and will therefore contain many non-printable (at least when interpreted as ASCII) characters. The format must also support the initialization of VARBINARY and other explicitly binary types, even though their use is likely to be somewhat rare.

Design goals, in rough order are:

CPU Utilization
Size Inflation
Hand Decoding
Human Readable

Candidate encoding schemes include, but are not limited to:

Base64
Hex (each byte represented by two characters)
BinHex-like (minus the resource fork features)
Escaped, yet printable-ASCII.

ENCODING FORMAT IS TBD.

Example Programs

The following is a list of SQL statements and statement fragments and corresponding example compilations of those statements into Calculator programs. More efficient representations of the statements certainly exist. This list is not intended as a guide to the most efficient representation, rather as a possible representation.

CREATE TABLE COURSE (        Heading INT,        COG INT,        VMG REAL);

SELECT COG - Heading as Set        FROM COURSE      WHERE ( (Heading > 200 AND Heading < 220)                 OR                 VMG > 10.0)              );

I s4, s4, r;               # Input Register Set is:                            #   0: long "Heading", 1: long "COG"                            #   2: real "VMG" O s4;                      # Output Register Set:                            #   0: long "Set" S u1;                      # Status Register Set: unsigned char; C u1, u1, u1, u1, u1, u4; # Local Register Set: 5 booleans,                            #   represented by unsigned chars                            #   and on s4 "Set" L u1, u1, s4, s4, r;       # Literal Register Set:                            #   boolean, boolean, long,                            #   long, real; V 1, 0, 200, 220, 10.0;    # Values for Literal Register Set,                            #   including a TRUE and FALSE                            # Following could be done more                            #   efficently. Drawing out the example                            #   here on purpose: T;                            # Start program text GT   L u1 0, I s4 0, C s4 2; #00# Heading > 200, store result bool in local 0 LT   L u1 1, I s4 1, C s4 3; #01# Heading < 220, store result bool in local 1 AND L u1 2, L u1 0, L u1 1; #02# AND local 0 and local 1, result in local 2 GT   L u1 3, I r 3, C r 4; #03# VMG > 10.0, result in local 3 OR   L u1 4, L u1 2, L u1 3; #04# OR result of AND and SOG > 10.0 JMPT @ 8   , L u1 4;          #05# Jump if WHERE is satisfied MOVE S u1 0, C u1 0;          #06# Move Literal[0] "TRUE" to                               #    Status[0]. Filter this row. RETURN;                       #07# Exit, filtering row. SUB O s4 0, I s4 1, I s4 0; #08# Subtract Input[0] from Input[1],                               #    into Output[0] MOVE S u1 0, L u1 1;          #09# Move Literal[0] "FALSE" to                               #    Status[0]. Do not filter this row. RETURN;                       #10# Exit, passing row onward.

MORE EXAMPLES TO FOLLOW IN THIS SPACE

For completeness, a demonstration that the language can perform more complex algorithms, handle iteration, etc. Also, an argument for an indexing addressing mode:

// Fibonacci input(long 0) = n, n>=2; long prev, temp, curr, count ulong index; prev = 1; // mem(0) curr = 1; // mem(1) index = 0; // mem(2) // count mem(3) // temp mem(4) out(index++) = prev; out(index++) = curr; for (count = 0; count < input(0); count++) { temp = curr; curr = curr + prev; out(index++) = curr; prev = temp; }

Example register set layout:

inputs: [0] = n literals: [0] = long 0, [1] = long 1, [2] = ulong 0; locals: [0] = prev, [1] = curr, [2] = index, [3] = count, [4] = temp

Program:

(Note: Has not been updated to current instruction set.) MOVE &0, % long 1; // prev = 1 MOVE &1, % long 1; // curr = 1 MOVE &2, % ulong 0; // index = 0 MOVEI #0, & long 0, &2; // note: index is implicitly ulong in 3rd operand. ADD &2, & ulong 2, %ulong 1; // note: side effect! Read Opands read before writeback. MOVEI #0, & long 0, &2; ADD &2, & ulong 2, %ulong 1; // an inc operator might be nice, but would be redundant MOVE &3, % long 0; // count = 0 _BEGINLOOP; JMPGE _ENDLOOP, & long 3, $ long 0; MOVE &4, & long 1; // temp = cur ADD &1, & long 1, & long 0; // curr = curr + prev MOVEI #0, & long 0, &2; // out(index) = curr ADD &2, & ulong 2, % ulong 1; // index++ MOVE &0, & long 4; // prev = temp ADD &3, & long 3, % long 1; // count++ JUMP _BEGINLOOP; _ENDLOOP; RETURN;

Object Tree

The design goal is to group classes first by type then by operation to allow base classes to be templated by type. Better combinatorics for integral types this way.

Please excuse this diagram represented as an outline.

Outline gude:

Object : Base Class

contains

[Â…]

Member Function ()

[Â…]

Calculator

`Code`

Program counter

Registers

Register Reference

Register Set Index
Register Index
Pointer

Registers

Input Register Set
Output Register Set
Local Register Set
Literal Register Set
Status Register Set

Instruction

longName() pure virtual
shortName() pure virtual
describe() pure virtual
exec() pure virtual private (called from friend class Calculator)
bindCalc() pure virtual

BoolInstruction : Instruction
- BoolOr : BoolInstruction
- BoolAnd : BoolInstruction
- BoolNot : BoolInstruction
- BoolMove : BoolInstruction
- BoolEqual : BoolInstruction
- BoolNotEqual : BoolInstruction
- BoolLess : BoolInstruction
- BoolGreater : BoolInstruction
- BoolIsNull : BoolInstruction
- BoolNotNull : BoolInstruction
NativeInstruction <T> : Instruction
- NativeNativeInstruction <T> : NativeInstruction
  - NativeAdd <T> : NativeNativeInstruction // Concrete
  - NativeSub <T> : NativeNativeInstruction // Concrete
  - NativeMul <T> : NativeNativeInstruction // Concrete
  - NativeDiv <T> : NativeNativeInstruction // Concrete
  - NativeNeg <T> : NativeNativeInstruction // Concrete
  - NativeMove <T> : NativeNativeInstruction // Concrete
- IntegralNativeInstruction <T> : NativeInstruction
  - IntegralNativeMod <T> : IntegralNativeInstruction // Concrete
  - IntegralNativeAnd <T> : IntegralNativeInstruction // Concrete
  - IntegralNativeOr <T> : IntegralNativeInstruction // Concrete
  - IntegralNativeShiftLeft <T> : IntegralNativeInstruction // Concrete
  - IntegralNativeShiftRight <T> : IntegralNativeInstruction // Concrete
- BoolNativeInstruction <T> : NativeInstruction
  - BoolNativeEqual <T> : BoolNativeInstruction // Concrete
  - BoolNativeNotEqual <T> : BoolNativeInstruction // Concrete
  - BoolNativeGreater <T> : BoolNativeInstruction // Concrete
  - BoolNativeGreaterEqual <T> : BoolNativeInstruction // Concrete
  - BoolNativeLess <T> : BoolNativeInstruction // Concrete
  - BoolNativeLessEqual <T> : BoolNativeInstruction // Concrete
  - BoolNativeIsNull <T> : BoolNativeInstruction // Concrete
  - BoolNativeNotNull <T> : BoolNativeInstruction // Concrete
PointerInstruction <T> : Instruction
- PointerPointerInstruction <T> : PointerInstruction
  - PointerMove <T> : PointerInstruction // Concrete
  - PointerAdd <T> : PointerInstruction // Concrete
  - PointerSub <T> : PointerInstruction // Concrete
  - PointerToNull <T> : PointerInstruction // Concrete
- BoolPointerInstruction <T> : PointerInstruction
- PointerEqual <T> : PointerInstruction // Concrete
- PointerNotEqual <T> : PointerInstruction // Concrete
- PointerGreater <T> : PointerInstruction // Concrete
- PointerGreaterEqual <T> : PointerInstruction // Concrete
- PointerLess <T> : PointerInstruction // Concrete
- PointerLessEqual <T> : PointerInstruction // Concrete
- PointerIsNull <T> : PointerInstruction // Concrete
- PointerIsNotNull <T> : PointerInstruction // Concrete
JumpInstruction : Instruction
- Jump : FlowInstruction // Concrete
- JumpTrue: FlowInstruction // Concrete
- JumpFalse: FlowInstruction // Concrete
- JumpNull: FlowInstruction // Concrete
- JumpNotNull: FlowInstruction // Concrete
ReturnInstruction : Instruction // Concrete
ReturnException : Instruction // Concrete
CallInstruction : Instruction

TBD

Miscellaneous Topics

Programming Conventions

General Conventions

Avoid writing into Literal register set. This may be enforced with the ReadOnly RegisterReference property.
Writing into Local register set is strictly forbidden. XOs and the tuple unmarshalling code make assumptions that the underlying tuple will not be molested. This may be enforced with the ReadOnly RegisterReference property.
Program should explicitly clear status registers.
XO should check the warning from Calculator after each execution

Aggregation & Analytic Functions

SQL Aggregate Functions and Analytic Functions can be supported by the program writer by appropriate manipulation of Output register sets. The Â“aggregator bucketsÂ” (nee accumulators) would be modeled as Output sets that were passed to various Calculators. There is no requirement that Output sets be cleared before use. Reading from Output registers is legal and encouraged. Note that setting the CachePointer property for Output register sets is likely to be a very bad idea for this reason, and also for performance reasons.

Status Register Conventions

Register [0] is a boolean that signals to a Filter XO if the row should be Â“filteredÂ”, that is dropped, or Â“passedÂ”, that is sent on to the next XO. IFF S[0] == TRUE -> filter.
Register [1]: Add more definitions hereÂ…

Pros & Cons

Calculator setup is mid-weight, not light weight. Have to instantiate a number of objects to represent each instruction. This is mitigated somewhat by creating exactly one RegisterReference for each register, and passing pointers into each Instruction instance.

Threading Model

Calculator is not thread safe. Each calculator object is light weight enough that each thread can reasonably instantiate its own Calculator.

Handles

A handle primitive type might be useful for managing resources and utilizing external functions. TBD.

Tuple Descriptors

XOs will find it necessary to determine the format of the Input and Output tuples to perform their processing. getRegisterDescriptors() has been provided for this purpose.

Appendix A: Extended Instructions

Windowed Aggregation Support

Windowed Aggregation support requires a new group of extended instructions be defined and implemented. The main difference between the windowed aggregation functions and the equivalent standard aggregation functions is the data currently in the window must be "remembered". As processing proceeds new rows are added to the window and rows that meet the exit criteria are dropped. With the standard SUM function values are only added to the accumulator, but with the windowed aggregation version data values are also subtracted from the running sum as rows are dropped. The MIN and MAX functions also need special attention. Normally the MIN/MAX functions would only need to monitor the current row being processed to see if a new MIN/MAX value is presented. Windowed aggregation increases the complexity. As rows are dropped the current MIN/MAX value may be dropped. This forces the system to go back to the window and rescan the list for the "new" MIN/MAX value.

As with the rest of the calculator the windowed aggregation instructions are implemented to support different data types. Currently windowed aggregation supports 3 basic types of data. 64 bit Integer (INT64_T), 64 bit real (DOUBLE) and character string. BOOLEAN and BINARY are not supported data types. Smaller data types are supported through CASTing and must be CAST up to the supported size before submission to any windowed aggregation instructions. Return values are CAST down to match original data type.

The string data types are supported by MIN/MAX and COUNT. Calls to the SUM or AVG functions generate an appropriate calculator exception.

Support for the NUMERIC and DECIMAL data types is also provided through CASTing. Values defined with scale of zero are cast to INT64_T. Non-zero scaled values require a CAST to type DOUBLE before submission for windowed aggregation processing. Return values are cast back to match input data type.

While the COUNT, SUM and AVG functions are provided if the MIN and MAX functions are not being used then more efficient implementations of the functions are available that do not incur the overhead of managing the priority queue of window values.

Windowed Aggregation Functions

Instruction	Parameters	Comments
WinAggInit	result	On entry the data buffer for this register must contain the data type for this aggregation column. STANDARD_TYPE_INT_64, STANDARD_TYPE_DOUBLE and STANDARD_TYPE_VARCHAR are the only valid values. On return the data pointer points to a WinAggAccum object. It holds all entries in the window for the target column. This tuple(more specifically the buffer pointed to by pData) must be supplied to all subsequent aggregation calls for this column.
WinAggAdd	result, newValue, winAggAccum	result - on exit contains the row count currently in the Window. Same as WinAggCount newValue - Input register with the new value to be added to the window winAggAccum - Object holding current window information
WinAggDrop	result, oldValue, winAggAccum	result - on exit contains the new row count for the window. Same as WinAggCount oldValue - original tuple/value that was previously added to the window.
WinAggCount	result, winAggAccum	result - on exit contains the total number of lines in the window including NULL entries. Return type is always INT64_T.
WinAggSum	result, winAggAccum	result - on exit contains the sum over the current window entries data type matches column type. May return NULL
WinAggAvg	result, winAggAccum	result - on exit contains the average calculated over the current window contents. Is equivalent to WinAggSum()/WinAggCount(). Data type matches the column type.
WinAggMin	result, winAggAccum	result - on exit contains the Minimum value out of the current window contents. May return NULL
WinAggMax	result, winAggAccum	result - on exit contains the Maximum value out of the current window contents. May return NULL

$Id: //open/dt/dev/fennel/web/doc/doxygen/doc/calculator-compact.html#3 $