Calculator Design Specification


John Kalucki, 2/22/2004

Copyright (C) 2003-2009, SQLstream, Inc.

Overview

Calculator is a portable C++ component that executes small programs very efficiently. Programs consist of a list of instructions, input data, literals (constants), and scratch memory. Generally programs result in some output data. A simple virtual machine (VM) executes programs and terminates.

A typical Calculator object life-cycle & use-pattern:

  1. Instantiate Calculator
  2. Assemble Instructions: Create and Insert Instructions into Calculator
  3. Allocate Required Memory
  4. Bind Input, Output, and Literals to Calculator
  5. Execute Program
  6. Check Status Register
  7. Utilize Output
  8. As Needed: Bind New Input.
  9. As Needed: Bind new Output
  10. As Needed: Jump to #5
  11. Destroy Calculator
  12. De-allocate memory

Calculator will be installed as a component of Fennel. Coding standards, build environment, etc. are to follow the Fennel projectÂ’s example. Calculator will introduce no additional dependencies to Fennel.

The primary goals of this Calculator design are Execution Performance and, as a pair: Extensibility and Maintainability. Overall performance for the single execution case will be considered as well as case where the same program is executed repeatedly over a varying set of inputs and outputs.

Calculator can call external functions for extensibility and to efficiently handle complex operations. For example, string concatenation could involve a call to strncpy() instead of having the VM perform a painful byte-by-byte copy of the string. A strongly-typed version of external functions called extended instructions allows Calculator to be extended in a type-safe way.

Calculator will interact directly and indirectly with a number of other components. A compiler will create programs. Typically the compiler will instantiate instruction objects and insert them into a list. Once finished, the compiler will ask the instruction for a serial representation and create a text version of the program. The compiler may, at its option, also create the program text directly, but will give up the implicit type checking and other features.

An Execution Object (XO) will call the assembler with a text program. The assembler will allocate some of the register sets, and re-create the instruction object list, inserting the instructions into a Calculator object. Later, the XO can bind data, and execute Calculator.

Memory Architecture

Register Sets

Memory is modeled as a number of register sets. Each register set is stored in an array. Individual registers are accessed by an 0-based index into the set. Register sets are currently a simple abstraction from Fennel Tuples.

Calculator will perform all memory allocation during instantiation and configuration and no memory allocation during typical execution paths. Warning Messages and Error Messages may be allocated at runtime, however. See Warnings & Errors below. The caller will be responsible for allocating and de-allocating register sets.

Note: External functions and extended instructions may perform resource allocation as a side-effect. Calculator makes no attempt to track resource allocation. The program writer must explicitly call appropriate external functions to de-allocate such memory.

There are five register sets defined:

1. Input:

a. Input of program data. (i.e. function arguments)

b. Generally contains  tuple data handed to XO.

c. Intended to be the main variant when Calculator programs are executed repeatedly.

2. Output:

a. Output of program data. (i.e. function return values)

b. Generally contains the tuple to be passed out of the XO.

c. Storage of intermediate results between program executions. (e.g. Aggregation Functions)

d. A compiler might also choose to store metadata such as exceptions, errors, warnings, overflow indicator bits, etc. in the output register set, but this usage is discouraged. Storing such data here requires the XO to filter the tuple before passing it on to the next XO.

3. Local: (“Intermediates”)

a. General purpose intermediate storage.

b. Intermediate storage for integral types. (i.e. main memory)

c. Pre-allocated “heap space” for scratch storage of strings and other Large Objects.

4. Literals: (“Constants”)

a. When Calculator programs are executed repeatedly, static literals need not be mixed with changing input tuples.

b. Literal register set is allocated and set by the assembler, not by the XO.

c. For a very small increase in Calculator complexity, there is potentially a win in reduction of binding overhead.

d. Explicitly supporting literals in a register set obviates the need for a literal addressing mode and its associated complexity (e.g. ADD Local(4), $0x200 is not needed. $0x200 is loaded from the Literals register instead: ADD Local(4), Literal(3))

5. Status:

a. Provide a generic and flexible location for programs to communicate with the XO.

b. Prevents status information from mixing with the output register set, obviating the need for filtering the output tuple in the XO.

c. Allows for rich interaction between programs and XO without also requiring modification to Calculator.

Register sets will not be cleared or otherwise manipulated by Calculator object. Only two types of objects within the Calculator component will manipulate register sets: the assembler and subclasses of the Instruction class. By not clearing register sets, the output register set may also be used as input as well as output for instructions.

Addressing Modes

Genuflecting to John Hennessy and Dave Patterson, CalculatorÂ’s addressing modes will be orthogonal:

1. All operands of all instructions that reference memory can use all addressing modes

2. All register sets are readable and writeable by all instructions. However, sensible programming conventions would discourage inappropriate use of sets. (e.g. writing to the input and literal register sets)

3. All memory locations of a given type (and/or length, see Storage, below) can be used interchangeably.

4. No “special” registers.

5. All memory accesses will be typed. (Instructions will cast all accesses as the desired type.)

Efficiency is paramount! Operands in programs will be indexes into a given register set. For example: Output[4] or Input[0]. The RegisterReference class will provide an abstraction to allow various implementations of, experiments with, and debug-level instrumentation of, all memory access.

A consideration: RegisterReference may cache the abstract register set reference “Local [4]” as a direct pointer to memory by setting the CachePointer property. Tuning may indicate that this is valuable when applied to register sets that do not change, such as the Literal set and the Local set, but reference invalidation costs for sets that typically re-bound before each execution may not be worth the cost. The implication is that local memory access may be somewhat faster than Input or Output access. TBD.

A consideration: Add an additional indexing addressing mode that allows easier manipulation of one-dimensional and multi-dimensional arrays, jump tables for CASE statements, and perhaps the register sets themselves. For example, the 80386 architecture supports a Scale/Index/Base addressing mode. Addresses are calculated as (Scale * Index) + Base. TBD

Nested rows issues? TBD

Multiple rows per execution? (e.g. batch) TBD.

Storage

Portability Invariant: A given program and input should produce (nearly) identical output on each and every supported hardware type. Calculator must be independent of underlying hardware – with sole and notable exception of rounding errors in floating point numbers.

Native type support, storage, typedefs, etc. are provided by Fennel Tuples. Fennel supports types beyond the native C++ types, such as VARCHAR, VARBINARY, and so forth. Therefore Calculator types are defined by the Fennel enum StandardTypeDescriptorOrdinal.

Types

The Calculator is strongly typed:

The implication is that the compiler must be aware of types, and must always Do The Right Thing.

Type Mappings

Mappings between SQL types and C++ native types is provided by the Fennel Tuple code. (See documentation for SQL to Fennel type mapping.) Some types, such as SQLÂ’s DECIMAL(p,s) and TIMESTAMP(p) types are not directly supported by Calculator. The compiler should handle this type by generating appropriate programs.

Tuple represent NULL values with a NULL pointer.

Non-native types supported by Fennel Tuples, such as VARCHAR, are simply represented by a pointer in the Calculator, and perhaps also an integer containing the length of the array.

When represented as strings, for example in serialized programs, the mapping from StandardTypeDescriptorOrdinal to strings is defined by standardTypetoString() located in fennel/calc/StandardTypeToString.cpp.

Fennel/Tuple/ StandardTypeDescriptorOrdinal String Representation
STANDARD_TYPE_INT_8

s1

STANDARD_TYPE_UINT_8

u1

STANDARD_TYPE_INT_16

s2

STANDARD_TYPE_UINT_16

u2

STANDARD_TYPE_INT_32

s4

STANDARD_TYPE_UINT_32

u4

STANDARD_TYPE_INT_64

s8

STANDARD_TYPE_UINT_64

u8

STANDARD_TYPE_REAL

r

STANDARD_TYPE_DOUBLE

d

STANDARD_TYPE_CHAR

c

STANDARD_TYPE_VARCHAR

vc

STANDARD_TYPE_BINARY

b

STANDARD_TYPE_VARBINARY

vb

Internationalization

Internationalization issues are expected to be transparent to Calculator. Byte arrays are represented by pointers.

Isolation

ItÂ’s a Mad Mad Mad Mad World. Under normal running conditions, Calculator will make no attempt to isolate the C++ process from potentially ill side effects of Calculator program execution. This responsibility is completely up to the program compiler and the author of extended instructions.

 Risks not handled, include, but are not limited to:

Some of these risks could be mitigated by only allowing a tightly controlled list of external functions to be called.

Risks handled:

Each instruction that may bump into these risks will have to handle them in a well defined manner and provide the appropriate error, warning or exception.

Calculator Programs

Representation

Calculator programs can be created and represented in several ways:

1) In-process C++ can access the instruction set OO hierarchy directly, cons up instruction objects, order them into programs, and have Calculator execute them. This is the typical Fennel access mode.

2) Out-of-process C++ can link in Calculator code to access the instruction set OO hierarchy directly. Each instruction can convert itself into a serialized (string) representation.  The serial representation can be transmitted to another C++ process for assembly and execution. See Assembler section for details on the format of these strings.

3) In or Out of process Java creates, via some arbitrary method, a string representation of a program. The program can be transmitted to a C++ process.

4) Out-of-process C++ can also use method #3.

Flow Control

All jumps are to absolute instruction “addresses”. The effective address of an instruction is determined solely by the order that it is inserted into Calculator. The first instruction inserted is at address 0 – that is to say, instruction addresses are 0-indexed

Jump labels, instead of absolute jump addresses would make programs easier to read and debug, but would add complexity to Calculator and perhaps to the compiler. For now, labels are not supported. A compiler and program writer might consider implementing labels as a convenience in creating programs.

A consideration: May want to provide some support for efficient processing of SQL CASE statements, such as the SIB addressing mode mentioned above.

Exceptions

Yet to be fully specified. TBD

Will use the RAISE instruction.

May want to support catch, throw, try, finally.

When an exception is thrown in a program, Calculator will eventually throw a C++ exception that the caller to Calculator.exec() is encouraged to catch and decode.

Format and values of object thrown by exception TBD.

Extended Instructions

Implemented, but not currently documented. TBD

External Functions

Yet to be specified. TBD

Will have some form of static lookup table of supported functions.

Assembler will have to convert a string function name into a function pointer.

Will have to support a pointer type.

John Sichi has recommended looking at Functors. Look at Boost Functors. May be useful here.

Provide/find optimized str(3) functions that handle SQL semantics correctly. (e.g.  ignoring trailing spaces, varbinary blobs of unequal length)

Consider resource allocation issues.

Consider how to represent Null stings vs. empty strings, etc.

NULL

Fennel Tuples represent NULL values as a NULL pointer. RegisterReference handles this convention.

All instructions will be NULL-aware and will implement appropriate SQL NULL semantics.

If a Local register may be set to NULL, and if Calculator will be called more than once, the Local register set must have the PointerReset property set. This property causes Calculator to reset the tuple pointer back to its original location before the next call to exec().

Writing programs to test every location for IsNotNULL before each write could be a real bear. Program writers are encouraged to make liberal use of Local registers to avoid this problem. ItÂ’s possible that this representation of NULL is simply not strong enough, and Tuples will have to be changed to reflect this problem. TBD

Effect on strings and external functions? TBD

Warnings and Errors

The compiler must take warning recovery and error handling into account. The compiler can decide if it wishes to perform exception avoidance by testing for common errors within the program, or let Calculator execute and handle the issue.

Warning and error codes codes are taken from Section 22.1 SQLSTATE of the SQL99 specification. Arbitrarily, CalculatorÂ’s implementation-defined errors are in the Class 5Z. This choice should be rationalized. TBD.

Warnings are the result of correctable problems. Program execution always continues after warnings and side-effects are well defined. Warning messages are coded and placed onto a deque (double-ended queue) for later access. A deque is used to allow interpretation of warnings in the order that they occurred (iterate from deque.begin()) or to allow interpretation of the last warning first (iterate from deque.end())

Warnings:

Warning Code Class/Subclass Side-effect(s) Comments
Division By Zero 22012 Result contains a NULL. SQL99 spec appears to state that DIV/0 is an error. A side-effect is not specified. Consider making Division By Zero an Error. TBD
Overflow 22003 Round or Truncate? TBD SQL99 says implementation may round or truncate
Underflow 22003 Round or Truncate? TBD Does this even exist?

 

Errors are problems that are not correctable, or problems with side-effects that are not well defined. Program execution always terminates immediately after errors. Error status is indicated in the last element of the warning deque. An exception is thrown after the error message is added to the warning queue. Format of this exception TBD.

Errors:

Error Code Side-effect(s) Comments
Segmentation Fault 5Z001   PC falls off end of program and/or jumps to a non-existent location.

 

Instruction Set

Base Class Name Object Name OpCode Mnemonic
&
String Representation
Operands Comment

Type Support

bool long ulong float
long double
T *
·BooleanInstruction BoolAnd AND result, operand, operand

Y

 

 

 

  BoolOr OR

 

 

 

  BoolEqual EQ   Equivalent to SQL boolean IS

 

 

 

  BoolNotEqual NE   Equivalent to SQL boolean IS NOT

 

 

 

  BoolGreater GT TRUE > FALSE. No need for GE, LE

 

 

 

  BoolLess LT

 

 

 

  BoolNot NOT result, operand  

 

 

 

  BoolMove MOVE  

 

 

 

  BoolIsNull ISNULL  

 

 

 

  BoolIsNotNull ISNOTNULL    

 

 

 

  BoolToNull TONULL result  

 

 

 

·Â·NativeNative-Instruction NativeAdd ADD result, operand, operand  

 

Y

 

  NativeSub SUB  

 

 

  NativeMul MUL  

 

 

  NativeDiv DIV  

 

 

  NativeNeg NEG result, operand  

 

 

  NativeMove MOVE    

 

 

  NativeToNull TONULL result  

 

 

·Â·IntegralNativeInstruction IntegralNativeMod MOD result, operand1, operand2 modulus operator, op1 is dividend, op2 is divisor

 

 

 

 

  IntegralNativeShiftLeft SHFL result, operand1, operand2 shift operand1 left or right by operand2 number of bits. Shifts in zeros. operator2 must be positive long or ulong

 

 

 

  IntegralNativeShiftRight SHFR

 

 

 

  IntegralNativeAnd AND result, operand, operand Bitwise operations

 

 

 

  IntegralNativeOr OR

 

 

 

  IntegralNativeXor XOR No SQL mapping. Dubious utility.

 

 

 

·Â·BoolNative-
Instruction
BoolNativeEqual EQ result, operand, operand returns boolean

 

 

 

  BoolNativeNotEqual NE

 

 

  BoolNativeGreater GT

 

 

  BoolNativeGreaterEqual GE

 

 

  BoolNativeLess LT

 

 

  BoolNativeLessEqual LE

 

 

  BoolNativeIsNull ISNULL result, operand

 

 

  BoolNativeIsNotNull ISNOTNULL

 

 

·Â·PointerPointer-Instruction PointerMove MOVE result, operand1 Result and Operand1 are pointers.

Operand2 is Long

 

 

 

Y

  PointerAdd ADD result, operand1, operand2

 

 

 

  PointerSub SUB  

 

 

 

  PointerToNull TONULL result

 

 

 

·Â·BoolPointer-Instruction PointerEqual EQ result, operand, operand returns boolean

 

 

 

  PointerNotEqual NE

 

 

  PointerGreater GT

 

 

  PointerGreaterEqual GE

 

 

  PointerLess LT

 

 

  PointerLessEqual LE

 

 

  PointerIsNull ISNULL result,

operand

 

 

  PointerIsNotNull ISNOTNULL

 

 

·JumpInstruction Jump JMP location unconditional jump

 

 

 

 

  JumpTrue JMPT location, operand conditional jump, operand is boolean

Y

 

 

 

  JumpFalse JMPF

 

 

 

  JumpNull JMPN

 

 

 

  JumpNotNull JMPNN

 

 

 

·Instruction Raise RAISE literal long Exception. Operands TBD

 

 

 

 

·Instruction Return RETURN   Exit from Calculator. Operand TBD

 

 

 

 

·ExtendedInstruction ExtendedInstruction EXT signature Signature is a string:
“Fname(type1,type2,type3)”

?

Y

Y

Y

CalInstruction Call CALL TBD TBD

?

Y

Y

Y

Assembler

The Assembler performs the following functions:

Serialized Programs

Serialized programs support the following features:

XML Format

For any number of good reasons, Calculator should support XML representations of programs. Unfortunately, an XML representation introduces the requirement for an XML parser into Fennel, which is not strictly necessary for Calculator to work. Therefore, to avoid adding another library dependency on Fennel, the text representation of programs will not be XML.

Boost::Spirit Parser

This parser was considered, but eventually disregarded. The template programming style that Spirit is written in is appealing, but contains a steep learning curve. Maintenance of this parser could be difficult. Also, compile-time errors are rather difficult to decode at times. The developer might be forced to only make small changes, and then compile and test. Otherwise it can be nearly impossible to tell where an error was introduced. Perhaps the Spirit parser can be re-investigated in the future.

Simple Serialized Format

All of the functionality required can be provided cheaply by using a very simple format to represent instructions. Expediency is the goal for now. The implementation details of this parser are, at the moment, outside the scope of this document. If the parser introduces an additional library dependency to Fennel, it must be possible to compile Fennel without the parser. For example, fennel/configure should take arguments, similar to –with-boost–with-stlport, such as –with-ANTLR –with-XERCES, or the like, to enable serial program representation and the associated third-party parsing library.

BNF (EBNF?) of Simple Serialized Format

 

Note: < whitespace > is allowed between tokens. Tokenizer or skip-parser should consume < whitespace >

Note: Format is case-sensitive.

 

< whitespace >      ::= “ “ | “\n” | “\r” | “\t”

< program >         ::= <preamble> “T” < program text >

< preamble >        ::= < registerset def > [ <literal set def > < literal values > ]

< registerset def > ::= < set name >
                        [ “,” ( < type size > )+ ]
                        < terminator >
< set name >        ::= “O” | # Output
                        “I” | # Input – Generally unused.
                        “S” | # Status
                        Â“L” | # Local

< type size >       ::= { < fixedlengthtype > } |
                        { < variablelengthtype > “,” < bytes > }

< fixedlengthtype > ::= < integers > |
                        < reals > |
                        < doubles > |
                        < chars > |
                        < binary >

< variablelengthtype >:: = < varchar > | < varbinary >

< literal set def > ::= “C”   # Literal set, think Constant.
                        [ “,” ( < type size > )+ ]
                        < terminator >

< literal values >  ::= “V”    # Values
                        ( < literal def> ) *
                        < terminator >

< literal def >     ::= < integerval > |
                        < realval > |
                        < bytearrayval >

< integerval >      ::= [ “-“ ]
                        ( [0-9] ) +

< realval >         ::= [ “-“ ]
                        ( [0-9] ) +
                        “.”
                        ( [0-9] ) +

< bytearrayval>     ::=  # TBD See section “Binary Encoding”

< program text >    ::= ( < instruction > ) +   # a program has 1 or more instructions

< instruction >     ::=  < opcode > “,”
                         < operand list >
                         < terminator >

< operand list >    ::= { < result > “,” < operand > [ “,” < operand > ] } |
                        { < set size > }

< terminator >      :: = “;”

< result >          ::= < output address > |
                        < local address >

< operand >         ::= < output address > |
                        < input address > |
                        < local address > |
                        < literal address > |
                        < status address > |
                        < instruction address >
                           /* < SIB address > */

< output address >      ::= “O”
                            < type >
                            < whitespace >
                            < output index >

< input address >       ::= “I”
                            < type >
                            < whitespace >
                            < input index >

< local address >       ::= “L”
                            < type >
                            < whitespace >
                            < memory index >

< literal address >     ::= “C”
                            < type >
                            < whitespace >
                            < literal index >

< status address >      ::= “S”
                            < type >
                            < whitespace >
                            < literal index >

< instruction address > ::= “@” <instruction index >

/* < SIB address >      ::= “*” < type > < scale > “,” < index > “,” < base > */

< type >                ::= < assembly code type >
                            # see Types section &
                            # StandardTypeToString

< set size >            ::= < output index > |
                            < input index> |
                            < local index> |
                            < literal index > |
                            < status index >

< output index >        ::= 0 | 1 | ... | <size of output register set – 1>
< input index >         ::= 0 | 1 | ... | <size of input register set – 1>
< local index >         ::= 0 | 1 | ... | <size of local register set – 1>
< literal index >       ::= 0 | 1 | ... | <size of literal register set – 1>
< status index >        ::= 0 | 1 | ... | <size of status register set – 1>

< instruction index >   ::= 0 | 1 | . . . | <size of program – 1>

Note: May want additional < literal def > definitions, for example exponents: 6.0221 * 10^24. TBD

Array Encoding

Character arrays (strings) and binary arrays will be encoded to allow transmission and viewing in less than 8-bit clean environments. Typically strings passed through the calculator will be encoded with Unicode and will therefore contain many non-printable (at least when interpreted as ASCII) characters. The format must also support the initialization of VARBINARY and other explicitly binary types, even though their use is likely to be somewhat rare.

Design goals, in rough order are:

  1. CPU Utilization
  2. Size Inflation
  3. Hand Decoding
  4. Human Readable

Candidate encoding schemes include, but are not limited to:

  1. Base64
  2. Hex (each byte represented by two characters)
  3. BinHex-like (minus the resource fork features)
  4. Escaped, yet printable-ASCII.

ENCODING FORMAT IS TBD.

Example Programs

The following is a list of SQL statements and statement fragments and corresponding example compilations of those statements into Calculator programs. More efficient representations of the statements certainly exist. This list is not intended as a guide to the most efficient representation, rather as a possible representation.

CREATE TABLE COURSE (
       Heading INT,
       COG INT,
       VMG REAL);

SELECT COG - Heading as Set
       FROM COURSE
       WHERE ( (Heading > 200 AND Heading < 220)
                OR
                VMG > 10.0)
             ); 

I s4, s4, r;               # Input Register Set is:
                           #   0: long "Heading", 1: long "COG"
                           #   2: real "VMG"
O s4;                      # Output  Register Set:
                           #   0: long "Set"
S u1;                      # Status Register Set: unsigned char;
C u1, u1, u1, u1, u1, u4;  # Local Register Set: 5 booleans,
                           #   represented by unsigned chars
                           #   and on s4 "Set"
L u1, u1, s4, s4, r;       # Literal Register Set:
                           #   boolean, boolean, long,
                           #   long, real;
V 1, 0, 200, 220, 10.0;    # Values for Literal Register Set,
                           #   including a TRUE and FALSE
                           # Following could be done more
                           #   efficently. Drawing out the example
                           #   here on purpose:
T;                            # Start program text
GT   L u1 0, I s4 0, C s4 2;  #00# Heading > 200, store result bool in local 0
LT   L u1 1, I s4 1, C s4 3;  #01# Heading < 220, store result bool in local 1
AND  L u1 2, L u1 0, L u1 1;  #02# AND local 0 and local 1, result in local 2
GT   L u1 3, I r  3, C r  4;  #03# VMG > 10.0, result in local 3
OR   L u1 4, L u1 2, L u1 3;  #04# OR result of AND and SOG > 10.0
JMPT @ 8   , L u1 4;          #05# Jump if WHERE is satisfied
MOVE S u1 0, C u1 0;          #06# Move Literal[0] "TRUE" to
                              #    Status[0]. Filter this row.
RETURN;                       #07# Exit, filtering row.
SUB  O s4 0, I s4 1, I s4 0;  #08# Subtract Input[0] from Input[1],
                              #    into Output[0]
MOVE S u1 0, L u1 1;          #09# Move Literal[0] "FALSE" to
                              #    Status[0]. Do not filter this row.
RETURN;                       #10# Exit, passing row onward.

 

MORE EXAMPLES TO FOLLOW IN THIS SPACE

For completeness, a demonstration that the language can perform more complex algorithms, handle iteration, etc. Also, an argument for an indexing addressing mode:

// Fibonacci
input(long 0) = n, n>=2;

long prev, temp, curr, count
ulong index;
prev = 1;           // mem(0)
curr = 1;           // mem(1)
index = 0;          // mem(2)
                    // count mem(3)
                    // temp mem(4)
out(index++) = prev;
out(index++) = curr;
for (count = 0; count < input(0); count++) {
     temp = curr;
     curr = curr + prev;
     out(index++) = curr;
     prev = temp;
}

Example register set layout:

inputs:   [0] = n
literals: [0] = long 0, [1] = long 1, [2] = ulong 0;
locals:   [0] = prev, [1] = curr, [2] = index, [3] = count, [4] = temp

Program:

(Note: Has not been updated to current instruction set.)
MOVE   &0, % long 1;            // prev = 1
MOVE   &1, % long 1;            // curr = 1
MOVE   &2, % ulong 0;           // index = 0
MOVEI  #0, & long 0, &2;        // note: index is implicitly ulong in 3rd operand.
ADD    &2, & ulong 2, %ulong 1; // note: side effect! Read Opands read before writeback.
MOVEI  #0, & long 0, &2;
ADD    &2, & ulong 2, %ulong 1; // an inc operator might be nice, but would be redundant
MOVE   &3, % long 0;            // count = 0
_BEGINLOOP;
JMPGE  _ENDLOOP, & long 3, $ long 0;
MOVE   &4, & long 1;              // temp = cur
ADD    &1, & long 1, & long 0;    // curr = curr + prev
MOVEI  #0, & long 0, &2;          // out(index) = curr
ADD    &2, & ulong 2, % ulong 1;  // index++
MOVE   &0, & long 4;              // prev = temp
ADD    &3, & long 3, % long 1;    // count++
JUMP   _BEGINLOOP;
_ENDLOOP;
RETURN;

Object Tree

The design goal is to group classes first by type then by operation to allow base classes to be templated by type. Better combinatorics for integral types this way.

Please excuse this diagram represented as an outline.

Outline gude:

Object : Base Class

contains

contains

[Â…]           

Member Function ()

Member Function ()

[Â…]

 

Calculator

Code

Program counter

Registers

 

Register Reference

Registers

Instruction

 

TBD

 

 

Miscellaneous Topics

Programming Conventions

General Conventions

Aggregation & Analytic Functions

SQL Aggregate Functions and Analytic Functions can be supported by the program writer by appropriate manipulation of Output register sets. The “aggregator buckets” (nee accumulators) would be modeled as Output sets that were passed to various Calculators. There is no requirement that Output sets be cleared before use. Reading from Output registers is legal and encouraged. Note that setting the CachePointer property for Output register sets is likely to be a very bad idea for this reason, and also for performance reasons.

Status Register Conventions

Pros & Cons

Calculator setup is mid-weight, not light weight. Have to instantiate a number of objects to represent each instruction. This is mitigated somewhat by creating exactly one RegisterReference for each register, and passing pointers into each Instruction instance.

Threading Model

Calculator is not thread safe. Each calculator object is light weight enough that each thread can reasonably instantiate its own Calculator.

Handles

A handle primitive type might be useful for managing resources and utilizing external functions. TBD.

Tuple Descriptors

XOs will find it necessary to determine the format of the Input and Output tuples to perform their processing. getRegisterDescriptors() has been provided for this purpose.

Appendix A: Extended Instructions

Windowed Aggregation Support

Windowed Aggregation support requires a new group of extended instructions be defined and implemented.  The main difference between the windowed aggregation functions and the equivalent standard aggregation functions is the data currently in the window must be "remembered".  As processing proceeds new rows are added to the window and rows that meet the exit criteria are dropped.  With the standard SUM function values are only added to the accumulator, but with the windowed aggregation version data values are also subtracted from the running sum as rows are dropped.   The MIN and MAX functions also need special attention.  Normally the MIN/MAX functions would only need to monitor the current row being processed to see if a new MIN/MAX value is presented.  Windowed aggregation increases the complexity.  As rows are dropped the current MIN/MAX value may be dropped.  This forces the system to go back to the window and rescan the list for the "new" MIN/MAX value.

As with the rest of the calculator the windowed aggregation instructions are implemented to support different data types.  Currently windowed aggregation supports 3 basic types of data.  64 bit Integer (INT64_T), 64 bit real (DOUBLE) and character string.  BOOLEAN and BINARY are not supported data types.  Smaller data types are supported through CASTing and must be CAST up to the supported size before submission to any windowed aggregation instructions.  Return values are CAST down to match original data type.  

The string data types are supported by MIN/MAX and COUNT.  Calls to the SUM or AVG functions generate an appropriate calculator exception. 

Support for the NUMERIC and DECIMAL data types is also provided through CASTing.  Values defined with scale of zero are cast to INT64_T.  Non-zero scaled values require a CAST to type DOUBLE before submission for windowed aggregation processing.  Return values are cast back to match input data type.

While the COUNT, SUM and AVG functions are provided if the MIN and MAX functions are not being used then more efficient implementations of the functions are available that do not incur the overhead of managing the priority queue of window values.

Windowed Aggregation Functions

 

Instruction

Parameters

Comments

WinAggInit result On entry the data buffer for this register must contain the data type for this aggregation column.  STANDARD_TYPE_INT_64, STANDARD_TYPE_DOUBLE and STANDARD_TYPE_VARCHAR are the only valid values.  On return the data pointer points to a WinAggAccum object.  It holds all entries in the window for the target column.  This tuple(more specifically the buffer pointed to by pData) must be supplied to all subsequent aggregation calls for this column.
WinAggAdd result, newValue, winAggAccum result - on exit contains the row count currently in the Window.   Same as WinAggCount

newValue - Input register with the new value to be added to the window

winAggAccum - Object holding current window information

WinAggDrop result, oldValue, winAggAccum result - on exit contains the new row count for the window.  Same as WinAggCount

oldValue - original tuple/value that was previously added to the window.

WinAggCount result, winAggAccum result - on exit contains the total number of lines in the window including NULL entries.  Return type is always INT64_T.
WinAggSum result, winAggAccum result - on exit contains the sum over the current window entries data type matches column type. May return NULL
WinAggAvg result, winAggAccum result - on exit contains the average calculated over the current window contents.  Is equivalent to WinAggSum()/WinAggCount().  Data type matches the column type.
WinAggMin result, winAggAccum result - on exit contains the Minimum value out of the current window contents. May return NULL
WinAggMax result, winAggAccum result - on exit contains the Maximum value out of the current window contents. May return NULL

 

 

 


$Id: //open/dt/dev/fennel/web/doc/doxygen/doc/calculator-compact.html#3 $