John Kalucki, 2/22/2004
Copyright (C) 2003-2009, SQLstream, Inc.
Calculator is a portable C++ component that executes small programs very efficiently. Programs consist of a list of instructions, input data, literals (constants), and scratch memory. Generally programs result in some output data. A simple virtual machine (VM) executes programs and terminates.
A typical Calculator object life-cycle & use-pattern:
Calculator will be installed as a component of Fennel. Coding standards, build environment, etc. are to follow the Fennel projectÂ’s example. Calculator will introduce no additional dependencies to Fennel.
The primary goals of this Calculator design are Execution Performance and, as a pair: Extensibility and Maintainability. Overall performance for the single execution case will be considered as well as case where the same program is executed repeatedly over a varying set of inputs and outputs.
Calculator can call external functions for extensibility and to efficiently handle complex operations. For example, string concatenation could involve a call to strncpy() instead of having the VM perform a painful byte-by-byte copy of the string. A strongly-typed version of external functions called extended instructions allows Calculator to be extended in a type-safe way.
Calculator will interact directly and indirectly with a number of other components. A compiler will create programs. Typically the compiler will instantiate instruction objects and insert them into a list. Once finished, the compiler will ask the instruction for a serial representation and create a text version of the program. The compiler may, at its option, also create the program text directly, but will give up the implicit type checking and other features.
An Execution Object (XO) will call the assembler with a text program. The assembler will allocate some of the register sets, and re-create the instruction object list, inserting the instructions into a Calculator object. Later, the XO can bind data, and execute Calculator.
Memory is modeled as a number of register sets. Each register set is stored in an array. Individual registers are accessed by an 0-based index into the set. Register sets are currently a simple abstraction from Fennel Tuples.
Calculator will perform all memory allocation during instantiation and configuration and no memory allocation during typical execution paths. Warning Messages and Error Messages may be allocated at runtime, however. See Warnings & Errors below. The caller will be responsible for allocating and de-allocating register sets.
Note: External functions and extended instructions may perform resource allocation as a side-effect. Calculator makes no attempt to track resource allocation. The program writer must explicitly call appropriate external functions to de-allocate such memory.
There are five register sets defined:
a. Input of program data. (i.e. function arguments)
b. Generally contains tuple data handed to XO.
c. Intended to be the main variant when Calculator programs are executed repeatedly.
a. Output of program data. (i.e. function return values)
b. Generally contains the tuple to be passed out of the XO.
c. Storage of intermediate results between program executions. (e.g. Aggregation Functions)
d. A compiler might also choose to store metadata such as exceptions, errors, warnings, overflow indicator bits, etc. in the output register set, but this usage is discouraged. Storing such data here requires the XO to filter the tuple before passing it on to the next XO.
a. General purpose intermediate storage.
b. Intermediate storage for integral types. (i.e. main memory)
c. Pre-allocated “heap space” for scratch storage of strings and other Large Objects.
a. When Calculator programs are executed repeatedly, static literals need not be mixed with changing input tuples.
b. Literal register set is allocated and set by the assembler, not by the XO.
c. For a very small increase in Calculator complexity, there is potentially a win in reduction of binding overhead.
d. Explicitly supporting literals in a register set obviates the need for a
literal addressing mode and its associated complexity (e.g. ADD Local(4),
$0x200
is not needed. $0x200 is loaded from the Literals register
instead: ADD Local(4), Literal(3)
)
a. Provide a generic and flexible location for programs to communicate with the XO.
b. Prevents status information from mixing with the output register set, obviating the need for filtering the output tuple in the XO.
c. Allows for rich interaction between programs and XO without also requiring modification to Calculator.
Register sets will not be cleared or otherwise manipulated by Calculator object. Only two types of objects within the Calculator component will manipulate register sets: the assembler and subclasses of the Instruction class. By not clearing register sets, the output register set may also be used as input as well as output for instructions.
Genuflecting to John Hennessy and Dave Patterson, CalculatorÂ’s addressing modes will be orthogonal:
1. All operands of all instructions that reference memory can use all addressing modes
2. All register sets are readable and writeable by all instructions. However, sensible programming conventions would discourage inappropriate use of sets. (e.g. writing to the input and literal register sets)
3. All memory locations of a given type (and/or length, see Storage, below) can be used interchangeably.
4. No “special” registers.
5. All memory accesses will be typed. (Instructions will cast all accesses as the desired type.)
Efficiency is paramount! Operands in programs will be indexes into a given
register set. For example: Output[4] or Input[0]. The RegisterReference
class will provide an abstraction to allow various implementations of,
experiments with, and debug-level instrumentation of, all memory access.
A consideration: RegisterReference
may cache the abstract
register set reference “Local [4]” as a direct pointer to memory by setting the
CachePointer
property. Tuning may indicate that this is valuable
when applied to register sets that do not change, such as the Literal set and
the Local set, but reference invalidation costs for sets that typically re-bound
before each execution may not be worth the cost. The implication is that local
memory access may be somewhat faster than Input or Output access. TBD.
A consideration: Add an additional indexing addressing mode that allows easier manipulation of one-dimensional and multi-dimensional arrays, jump tables for CASE statements, and perhaps the register sets themselves. For example, the 80386 architecture supports a Scale/Index/Base addressing mode. Addresses are calculated as (Scale * Index) + Base. TBD
Nested rows issues? TBD
Multiple rows per execution? (e.g. batch) TBD.
Portability Invariant: A given program and input should produce (nearly) identical output on each and every supported hardware type. Calculator must be independent of underlying hardware – with sole and notable exception of rounding errors in floating point numbers.
Native type support, storage, typedefs, etc. are provided by Fennel Tuples.
Fennel supports types beyond the native C++ types, such as VARCHAR, VARBINARY,
and so forth. Therefore Calculator types are defined by the Fennel enum
StandardTypeDescriptorOrdinal.
The Calculator is strongly typed:
The implication is that the compiler must be aware of types, and must always Do The Right Thing.
Mappings between SQL types and C++ native types is provided by the Fennel
Tuple code. (See
documentation for SQL to Fennel type mapping.) Some types, such as SQLÂ’s
DECIMAL(p,s)
and TIMESTAMP(p)
types are not directly
supported by Calculator. The compiler should handle this type by generating
appropriate programs.
Tuple represent NULL values with a NULL pointer.
Non-native types supported by Fennel Tuples, such as VARCHAR
,
are simply represented by a pointer in the Calculator, and perhaps also an
integer containing the length of the array.
When represented as strings, for example in serialized programs, the mapping
from StandardTypeDescriptorOrdinal to strings is defined by
standardTypetoString()
located in
fennel/calc/StandardTypeToString.cpp.
Fennel/Tuple/ StandardTypeDescriptorOrdinal | String Representation |
---|---|
STANDARD_TYPE_INT_8 |
|
STANDARD_TYPE_UINT_8 |
|
STANDARD_TYPE_INT_16 |
|
STANDARD_TYPE_UINT_16 |
|
STANDARD_TYPE_INT_32 |
|
STANDARD_TYPE_UINT_32 |
|
STANDARD_TYPE_INT_64 |
|
STANDARD_TYPE_UINT_64 |
|
STANDARD_TYPE_REAL |
|
STANDARD_TYPE_DOUBLE |
|
STANDARD_TYPE_CHAR |
|
STANDARD_TYPE_VARCHAR |
|
STANDARD_TYPE_BINARY |
|
STANDARD_TYPE_VARBINARY |
|
Internationalization issues are expected to be transparent to Calculator. Byte arrays are represented by pointers.
ItÂ’s a Mad Mad Mad Mad World. Under normal running conditions, Calculator will make no attempt to isolate the C++ process from potentially ill side effects of Calculator program execution. This responsibility is completely up to the program compiler and the author of extended instructions.
Risks not handled, include, but are not limited to:
Some of these risks could be mitigated by only allowing a tightly controlled list of external functions to be called.
Risks handled:
Each instruction that may bump into these risks will have to handle them in a well defined manner and provide the appropriate error, warning or exception.
Calculator programs can be created and represented in several ways:
1) In-process C++ can access the instruction set OO hierarchy directly, cons up instruction objects, order them into programs, and have Calculator execute them. This is the typical Fennel access mode.
2) Out-of-process C++ can link in Calculator code to access the instruction set OO hierarchy directly. Each instruction can convert itself into a serialized (string) representation. The serial representation can be transmitted to another C++ process for assembly and execution. See Assembler section for details on the format of these strings.
3) In or Out of process Java creates, via some arbitrary method, a string representation of a program. The program can be transmitted to a C++ process.
4) Out-of-process C++ can also use method #3.
All jumps are to absolute instruction “addresses”. The effective address of an instruction is determined solely by the order that it is inserted into Calculator. The first instruction inserted is at address 0 – that is to say, instruction addresses are 0-indexed
Jump labels, instead of absolute jump addresses would make programs easier to read and debug, but would add complexity to Calculator and perhaps to the compiler. For now, labels are not supported. A compiler and program writer might consider implementing labels as a convenience in creating programs.
A consideration: May want to provide some support for efficient processing of SQL CASE statements, such as the SIB addressing mode mentioned above.
Yet to be fully specified. TBD
Will use the RAISE instruction.
May want to support catch, throw, try, finally.
When an exception is thrown in a program, Calculator will eventually throw a C++ exception that the caller to Calculator.exec() is encouraged to catch and decode.
Format and values of object thrown by exception TBD.
Implemented, but not currently documented. TBD
Yet to be specified. TBD
Will have some form of static lookup table of supported functions.
Assembler will have to convert a string function name into a function pointer.
Will have to support a pointer type.
John Sichi has recommended looking at Functors. Look at Boost Functors. May be useful here.
Provide/find optimized str(3) functions that handle SQL semantics correctly. (e.g. ignoring trailing spaces, varbinary blobs of unequal length)
Consider resource allocation issues.
Consider how to represent Null stings vs. empty strings, etc.
Fennel Tuples represent NULL values as a NULL pointer. RegisterReference handles this convention.
All instructions will be NULL-aware and will implement appropriate SQL NULL semantics.
If a Local register may be set to NULL, and if Calculator will be called more
than once, the Local register set must have the PointerReset
property set. This property causes Calculator to reset the tuple pointer back to
its original location before the next call to exec().
Writing programs to test every location for IsNotNULL before each write could be a real bear. Program writers are encouraged to make liberal use of Local registers to avoid this problem. ItÂ’s possible that this representation of NULL is simply not strong enough, and Tuples will have to be changed to reflect this problem. TBD
Effect on strings and external functions? TBD
The compiler must take warning recovery and error handling into account. The compiler can decide if it wishes to perform exception avoidance by testing for common errors within the program, or let Calculator execute and handle the issue.
Warning and error codes codes are taken from Section 22.1 SQLSTATE of the SQL99 specification. Arbitrarily, CalculatorÂ’s implementation-defined errors are in the Class 5Z. This choice should be rationalized. TBD.
Warnings are the result of correctable problems. Program execution always continues after warnings and side-effects are well defined. Warning messages are coded and placed onto a deque (double-ended queue) for later access. A deque is used to allow interpretation of warnings in the order that they occurred (iterate from deque.begin()) or to allow interpretation of the last warning first (iterate from deque.end())
Warnings:
Warning | Code Class/Subclass | Side-effect(s) | Comments |
---|---|---|---|
Division By Zero |
22012 |
Result contains a NULL. | SQL99 spec appears to state that DIV/0 is an error. A side-effect is not specified. Consider making Division By Zero an Error. TBD |
Overflow |
22003 |
Round or Truncate? TBD | SQL99 says implementation may round or truncate |
Underflow |
22003 |
Round or Truncate? TBD | Does this even exist? |
Errors are problems that are not correctable, or problems with side-effects that are not well defined. Program execution always terminates immediately after errors. Error status is indicated in the last element of the warning deque. An exception is thrown after the error message is added to the warning queue. Format of this exception TBD.
Errors:
Error | Code | Side-effect(s) | Comments |
---|---|---|---|
Segmentation Fault |
5Z001 |
PC falls off end of program and/or jumps to a non-existent location. |
Base Class Name | Object Name | OpCode Mnemonic & String Representation |
Operands | Comment |
Type Support |
|||
---|---|---|---|---|---|---|---|---|
bool |
long ulong |
float long double |
T * |
|||||
·BooleanInstruction |
BoolAnd |
AND |
result, operand, operand |
|
|
|
|
|
BoolOr |
OR |
|
|
|
||||
BoolEqual |
EQ |
Equivalent to SQL boolean IS |
|
|
|
|||
BoolNotEqual |
NE |
Equivalent to SQL boolean IS NOT |
|
|
|
|||
BoolGreater |
GT |
TRUE > FALSE . No
need for GE, LE |
|
|
|
|||
BoolLess |
LT |
|
|
|
||||
BoolNot |
NOT |
result, operand |
|
|
|
|||
BoolMove |
MOVE |
|
|
|
||||
BoolIsNull |
ISNULL |
|
|
|
||||
BoolIsNotNull |
ISNOTNULL |
|
|
|
||||
BoolToNull |
TONULL |
result |
|
|
|
|||
·Â·NativeNative-Instruction |
NativeAdd |
ADD |
result, operand, operand |
|
|
|
||
NativeSub |
SUB |
|
|
|||||
NativeMul |
MUL |
|
|
|||||
NativeDiv |
DIV |
|
|
|||||
NativeNeg |
NEG |
result, operand |
|
|
||||
NativeMove |
MOVE |
|
|
|||||
NativeToNull |
TONULL |
result |
|
|
||||
·Â·IntegralNativeInstruction |
IntegralNativeMod |
MOD |
result, operand1, operand2 |
modulus operator, op1 is dividend, op2 is divisor |
|
|
|
|
IntegralNativeShiftLeft |
SHFL |
result, operand1, operand2 |
shift operand1 left or right by operand2 number of bits. Shifts in zeros. operator2 must be positive long or ulong |
|
|
|
||
IntegralNativeShiftRight |
SHFR |
|
|
|
||||
IntegralNativeAnd |
AND |
result, operand, operand |
Bitwise operations |
|
|
|
||
IntegralNativeOr |
OR |
|
|
|
||||
IntegralNativeXor |
XOR |
No SQL mapping. Dubious utility. |
|
|
|
|||
·Â·BoolNative- Instruction |
BoolNativeEqual |
EQ |
result, operand, operand |
returns boolean |
|
|
||
BoolNativeNotEqual |
NE |
|
|
|||||
BoolNativeGreater |
GT |
|
|
|||||
BoolNativeGreaterEqual |
GE |
|
|
|||||
BoolNativeLess |
LT |
|
|
|||||
BoolNativeLessEqual |
LE |
|
|
|||||
BoolNativeIsNull |
ISNULL |
result, operand |
|
|
||||
BoolNativeIsNotNull |
ISNOTNULL |
|
|
|||||
·Â·PointerPointer-Instruction |
PointerMove |
MOVE |
result, operand1 |
Result and Operand1 are pointers. Operand2 is Long |
|
|
|
|
PointerAdd |
ADD |
result, operand1, operand2 |
|
|
|
|||
PointerSub |
SUB |
|
|
|
||||
PointerToNull |
TONULL |
result |
|
|
|
|||
·Â·BoolPointer-Instruction |
PointerEqual |
EQ |
result, operand, operand |
returns boolean |
|
|
||
PointerNotEqual |
NE |
|
|
|||||
PointerGreater |
GT |
|
|
|||||
PointerGreaterEqual |
GE |
|
|
|||||
PointerLess |
LT |
|
|
|||||
PointerLessEqual |
LE |
|
|
|||||
PointerIsNull |
ISNULL |
result,
|
|
|
||||
PointerIsNotNull |
ISNOTNULL |
|
|
|||||
·JumpInstruction |
Jump |
JMP |
location |
unconditional jump |
|
|
|
|
JumpTrue |
JMPT |
location, operand |
conditional jump, operand is boolean |
|
|
|
|
|
JumpFalse |
JMPF |
|
|
|
||||
JumpNull |
JMPN |
|
|
|
||||
JumpNotNull |
JMPNN |
|
|
|
||||
·Instruction |
Raise |
RAISE |
literal long |
Exception. Operands TBD |
|
|
|
|
·Instruction |
Return |
RETURN |
Exit from Calculator. Operand TBD |
|
|
|
|
|
·ExtendedInstruction |
ExtendedInstruction |
EXT |
signature |
Signature is a string: “Fname(type1,type2,type3)” |
|
|
|
|
CalInstruction |
Call |
CALL |
TBD |
TBD |
|
|
|
|
The Assembler performs the following functions:
Serialized programs support the following features:
For any number of good reasons, Calculator should support XML representations of programs. Unfortunately, an XML representation introduces the requirement for an XML parser into Fennel, which is not strictly necessary for Calculator to work. Therefore, to avoid adding another library dependency on Fennel, the text representation of programs will not be XML.
This parser was considered, but eventually disregarded. The template programming style that Spirit is written in is appealing, but contains a steep learning curve. Maintenance of this parser could be difficult. Also, compile-time errors are rather difficult to decode at times. The developer might be forced to only make small changes, and then compile and test. Otherwise it can be nearly impossible to tell where an error was introduced. Perhaps the Spirit parser can be re-investigated in the future.
All of the functionality required can be provided cheaply by using a very
simple format to represent instructions. Expediency is the goal for now. The
implementation details of this parser are, at the moment, outside the scope of
this document. If the parser introduces an additional library dependency to
Fennel, it must be possible to compile Fennel without the parser. For example,
fennel/configure should take arguments, similar to –with-boost–with-stlport
,
such as –with-ANTLR –with-XERCES, or the like, to enable serial program
representation and the associated third-party parsing library.
Note: < whitespace > is allowed between tokens. Tokenizer or skip-parser should consume < whitespace >
Note: Format is case-sensitive.
< whitespace > ::= “ “ | “\n” | “\r” |
“\t”
< program > ::= <preamble>
“T” < program text >
< preamble > ::= < registerset def
> [ <literal set def > < literal values > ]
< registerset def > ::= < set name >
[ “,” ( < type size > )+ ]
< terminator >
< set name > ::= “O” | # Output
“I” | # Input – Generally unused.
“S” | # Status
“L” | # Local
< type size > ::= { < fixedlengthtype >
} |
{ < variablelengthtype > “,” < bytes > }
< fixedlengthtype > ::= < integers > |
< reals > |
< doubles > |
< chars > |
< binary >
< variablelengthtype >:: = < varchar > | < varbinary >
< literal set def > ::= “C” # Literal set, think Constant.
[ “,” ( < type size > )+ ]
< terminator >
< literal values > ::= “V” # Values
( < literal def> ) *
< terminator >
< literal def > ::= < integerval > |
< realval > |
< bytearrayval >
< integerval > ::= [ “-“ ]
( [0-9] ) +
< realval > ::= [ “-“ ]
( [0-9] ) +
“.”
( [0-9] ) +
< bytearrayval> ::= # TBD See section “Binary
Encoding”
< program text > ::= ( < instruction > ) + # a
program has 1 or more instructions
< instruction > ::= < opcode > “,”
< operand list >
< terminator >
< operand list > ::= { < result > “,” < operand > [ “,” <
operand > ] } |
{ < set size > }
< terminator > :: = “;”
< result > ::= <
output address > |
< local address >
< operand > ::= < output
address > |
< input address > |
< local address > |
< literal address > |
< status address > |
< instruction address >
/* < SIB address > */
< output address > ::= “O”
< type >
< whitespace >
< output index >
< input address > ::= “I”
< type >
< whitespace >
< input index >
< local address > ::= “L”
< type >
< whitespace >
< memory index >
< literal address > ::= “C”
< type >
< whitespace >
< literal index >
< status address > ::= “S”
< type >
< whitespace >
< literal index >
< instruction address > ::= “@” <instruction index >
/* < SIB address > ::= “*” < type > < scale >
“,” < index > “,” < base > */
< type >
::= < assembly code type >
# see Types section &
# StandardTypeToString
< set size > ::= <
output index > |
< input index> |
< local index> |
< literal index > |
< status index >
< output index > ::= 0 | 1 | ... |
<size of output register set – 1>
< input index > ::= 0 | 1 |
... | <size of input register set – 1>
< local index > ::= 0 | 1 |
... | <size of local register set – 1>
< literal index > ::= 0 | 1 | ... |
<size of literal register set – 1>
< status index > ::= 0 | 1 | ... |
<size of status register set – 1>
< instruction index > ::= 0 | 1 | . . . | <size of program – 1>
Note: May want additional < literal def >
definitions, for
example exponents: 6.0221 * 10^24. TBD
Character arrays (strings) and binary arrays will be encoded to allow transmission and viewing in less than 8-bit clean environments. Typically strings passed through the calculator will be encoded with Unicode and will therefore contain many non-printable (at least when interpreted as ASCII) characters. The format must also support the initialization of VARBINARY and other explicitly binary types, even though their use is likely to be somewhat rare.
Design goals, in rough order are:
Candidate encoding schemes include, but are not limited to:
ENCODING FORMAT IS TBD.
The following is a list of SQL statements and statement fragments and corresponding example compilations of those statements into Calculator programs. More efficient representations of the statements certainly exist. This list is not intended as a guide to the most efficient representation, rather as a possible representation.
CREATE TABLE COURSE (
Heading INT,
COG INT,
VMG REAL);
SELECT COG - Heading as Set
FROM COURSE
WHERE ( (Heading > 200 AND Heading < 220)
OR
VMG > 10.0)
);
I s4, s4, r; # Input Register Set is:
# 0: long "Heading", 1: long "COG"
# 2: real "VMG"
O s4; # Output Register Set:
# 0: long "Set"
S u1; # Status Register Set: unsigned char;
C u1, u1, u1, u1, u1, u4; # Local Register Set: 5 booleans,
# represented by unsigned chars
# and on s4 "Set"
L u1, u1, s4, s4, r; # Literal Register Set:
# boolean, boolean, long,
# long, real;
V 1, 0, 200, 220, 10.0; # Values for Literal Register Set,
# including a TRUE and FALSE
# Following could be done more
# efficently. Drawing out the example
# here on purpose:
T; # Start program text
GT L u1 0, I s4 0, C s4 2; #00# Heading > 200, store result bool in local 0
LT L u1 1, I s4 1, C s4 3; #01# Heading < 220, store result bool in local 1
AND L u1 2, L u1 0, L u1 1; #02# AND local 0 and local 1, result in local 2
GT L u1 3, I r 3, C r 4; #03# VMG > 10.0, result in local 3
OR L u1 4, L u1 2, L u1 3; #04# OR result of AND and SOG > 10.0
JMPT @ 8 , L u1 4; #05# Jump if WHERE is satisfied
MOVE S u1 0, C u1 0; #06# Move Literal[0] "TRUE" to
# Status[0]. Filter this row.
RETURN; #07# Exit, filtering row.
SUB O s4 0, I s4 1, I s4 0; #08# Subtract Input[0] from Input[1],
# into Output[0]
MOVE S u1 0, L u1 1; #09# Move Literal[0] "FALSE" to
# Status[0]. Do not filter this row.
RETURN; #10# Exit, passing row onward.
For completeness, a demonstration that the language can perform more complex algorithms, handle iteration, etc. Also, an argument for an indexing addressing mode:
// Fibonacci
input(long 0) = n, n>=2;
long prev, temp, curr, count
ulong index;
prev = 1; // mem(0)
curr = 1; // mem(1)
index = 0; // mem(2)
// count mem(3)
// temp mem(4)
out(index++) = prev;
out(index++) = curr;
for (count = 0; count < input(0); count++) {
temp = curr;
curr = curr + prev;
out(index++) = curr;
prev = temp;
}
Example register set layout:
inputs: [0] = n
literals: [0] = long 0, [1] = long 1, [2] = ulong 0;
locals: [0] = prev, [1] = curr, [2] = index, [3] = count, [4] = temp
Program:
(Note: Has not been updated to current instruction set.)
MOVE &0, % long 1; // prev = 1
MOVE &1, % long 1; // curr = 1
MOVE &2, % ulong 0; // index = 0
MOVEI #0, & long 0, &2; // note: index is implicitly ulong in 3rd operand.
ADD &2, & ulong 2, %ulong 1; // note: side effect! Read Opands read before writeback.
MOVEI #0, & long 0, &2;
ADD &2, & ulong 2, %ulong 1; // an inc operator might be nice, but would be redundant
MOVE &3, % long 0; // count = 0
_BEGINLOOP;
JMPGE _ENDLOOP, & long 3, $ long 0;
MOVE &4, & long 1; // temp = cur
ADD &1, & long 1, & long 0; // curr = curr + prev
MOVEI #0, & long 0, &2; // out(index) = curr
ADD &2, & ulong 2, % ulong 1; // index++
MOVE &0, & long 4; // prev = temp
ADD &3, & long 3, % long 1; // count++
JUMP _BEGINLOOP;
_ENDLOOP;
RETURN;
The design goal is to group classes first by type then by operation to allow base classes to be templated by type. Better combinatorics for integral types this way.
Please excuse this diagram represented as an outline.
Outline gude:
Object : Base Class
contains
contains
[Â…]
Member Function ()
Member Function ()
[Â…]
Calculator
Code
Program counter
Registers
Register Reference
Register Set Index
Register Index
Pointer
Registers
Input Register Set
Output Register Set
Local Register Set
Literal Register Set
Status Register Set
Instruction
longName() pure virtual
shortName() pure virtual
describe() pure virtual
exec() pure virtual private (called from friend class Calculator)
bindCalc() pure virtual
BoolInstruction : Instruction
BoolOr : BoolInstruction
BoolAnd : BoolInstruction
BoolNot : BoolInstruction
BoolMove : BoolInstruction
BoolEqual : BoolInstruction
BoolNotEqual : BoolInstruction
BoolLess : BoolInstruction
BoolGreater : BoolInstruction
BoolIsNull : BoolInstruction
BoolNotNull : BoolInstruction
NativeInstruction <T> : Instruction
NativeNativeInstruction <T> : NativeInstruction
NativeAdd <T> : NativeNativeInstruction
// Concrete
NativeSub <T> : NativeNativeInstruction // Concrete
NativeMul <T> : NativeNativeInstruction // Concrete
NativeDiv <T> : NativeNativeInstruction // Concrete
NativeNeg <T> : NativeNativeInstruction // Concrete
NativeMove <T> : NativeNativeInstruction // Concrete
IntegralNativeInstruction <T> : NativeInstruction
IntegralNativeMod <T> : IntegralNativeInstruction
// Concrete
IntegralNativeAnd <T> : IntegralNativeInstruction
// Concrete
IntegralNativeOr <T> : IntegralNativeInstruction
// Concrete
IntegralNativeShiftLeft <T> : IntegralNativeInstruction
// Concrete
IntegralNativeShiftRight <T> : IntegralNativeInstruction
// Concrete
BoolNativeInstruction <T> : NativeInstruction
BoolNativeEqual <T> :
BoolNativeInstruction //
Concrete
BoolNativeNotEqual <T> : BoolNativeInstruction
// Concrete
BoolNativeGreater <T> : BoolNativeInstruction
// Concrete
BoolNativeGreaterEqual <T> : BoolNativeInstruction //
Concrete
BoolNativeLess <T> : BoolNativeInstruction
// Concrete
BoolNativeLessEqual <T> : BoolNativeInstruction
// Concrete
BoolNativeIsNull <T> : BoolNativeInstruction
// Concrete
BoolNativeNotNull <T> : BoolNativeInstruction
// Concrete
PointerInstruction <T> : Instruction
PointerPointerInstruction <T> : PointerInstruction
PointerMove <T> : PointerInstruction // Concrete
PointerAdd <T> : PointerInstruction //
Concrete
PointerSub <T> : PointerInstruction //
Concrete
PointerToNull <T> : PointerInstruction //
Concrete
BoolPointerInstruction <T> : PointerInstruction
PointerEqual <T> : PointerInstruction // Concrete
PointerNotEqual <T> : PointerInstruction //
Concrete
PointerGreater <T> : PointerInstruction //
Concrete
PointerGreaterEqual <T> : PointerInstruction //
Concrete
PointerLess <T> : PointerInstruction // Concrete
PointerLessEqual <T> : PointerInstruction //
Concrete
PointerIsNull <T> : PointerInstruction // Concrete
PointerIsNotNull <T> : PointerInstruction //
Concrete
JumpInstruction : Instruction
Jump : FlowInstruction
// Concrete
JumpTrue: FlowInstruction // Concrete
JumpFalse: FlowInstruction // Concrete
JumpNull: FlowInstruction // Concrete
JumpNotNull: FlowInstruction // Concrete
ReturnInstruction : Instruction // Concrete
ReturnException : Instruction // Concrete
CallInstruction : Instruction
TBD
ReadOnly
RegisterReference property.ReadOnly
RegisterReference property.SQL Aggregate Functions and Analytic Functions can be supported
by the program writer by appropriate manipulation of Output register sets. The
“aggregator buckets” (nee accumulators) would be modeled as Output sets that
were passed to various Calculators. There is no requirement that Output sets be
cleared before use. Reading from Output registers is legal and encouraged. Note
that setting the CachePointer
property for Output register sets is
likely to be a very bad idea for this reason, and also for performance reasons.
Calculator setup is mid-weight, not light weight. Have to instantiate a number of objects to represent each instruction. This is mitigated somewhat by creating exactly one RegisterReference for each register, and passing pointers into each Instruction instance.
Calculator is not thread safe. Each calculator object is light weight enough that each thread can reasonably instantiate its own Calculator.
A handle primitive type might be useful for managing resources and utilizing external functions. TBD.
XOs will find it necessary to determine the format of the Input and Output tuples to perform their processing. getRegisterDescriptors() has been provided for this purpose.
Windowed Aggregation support requires a new group of extended instructions be defined and implemented. The main difference between the windowed aggregation functions and the equivalent standard aggregation functions is the data currently in the window must be "remembered". As processing proceeds new rows are added to the window and rows that meet the exit criteria are dropped. With the standard SUM function values are only added to the accumulator, but with the windowed aggregation version data values are also subtracted from the running sum as rows are dropped. The MIN and MAX functions also need special attention. Normally the MIN/MAX functions would only need to monitor the current row being processed to see if a new MIN/MAX value is presented. Windowed aggregation increases the complexity. As rows are dropped the current MIN/MAX value may be dropped. This forces the system to go back to the window and rescan the list for the "new" MIN/MAX value.
As with the rest of the calculator the windowed aggregation instructions are implemented to support different data types. Currently windowed aggregation supports 3 basic types of data. 64 bit Integer (INT64_T), 64 bit real (DOUBLE) and character string. BOOLEAN and BINARY are not supported data types. Smaller data types are supported through CASTing and must be CAST up to the supported size before submission to any windowed aggregation instructions. Return values are CAST down to match original data type.
The string data types are supported by MIN/MAX and COUNT. Calls to the SUM or AVG functions generate an appropriate calculator exception.
Support for the NUMERIC and DECIMAL data types is also provided through CASTing. Values defined with scale of zero are cast to INT64_T. Non-zero scaled values require a CAST to type DOUBLE before submission for windowed aggregation processing. Return values are cast back to match input data type.
While the COUNT, SUM and AVG functions are provided if the MIN and MAX functions are not being used then more efficient implementations of the functions are available that do not incur the overhead of managing the priority queue of window values.
Instruction |
Parameters |
Comments |
WinAggInit | result | On entry the data buffer for this register must contain the data type for this aggregation column. STANDARD_TYPE_INT_64, STANDARD_TYPE_DOUBLE and STANDARD_TYPE_VARCHAR are the only valid values. On return the data pointer points to a WinAggAccum object. It holds all entries in the window for the target column. This tuple(more specifically the buffer pointed to by pData) must be supplied to all subsequent aggregation calls for this column. |
WinAggAdd | result, newValue, winAggAccum | result - on exit contains the row count currently in the Window.
Same as WinAggCount newValue - Input register with the new value to be added to the window winAggAccum - Object holding current window information |
WinAggDrop | result, oldValue, winAggAccum | result - on exit contains the new row count for the
window. Same as WinAggCount oldValue - original tuple/value that was previously added to the window. |
WinAggCount | result, winAggAccum | result - on exit contains the total number of lines in the window including NULL entries. Return type is always INT64_T. |
WinAggSum | result, winAggAccum | result - on exit contains the sum over the current window entries data type matches column type. May return NULL |
WinAggAvg | result, winAggAccum | result - on exit contains the average calculated over the current window contents. Is equivalent to WinAggSum()/WinAggCount(). Data type matches the column type. |
WinAggMin | result, winAggAccum | result - on exit contains the Minimum value out of the current window contents. May return NULL |
WinAggMax | result, winAggAccum | result - on exit contains the Maximum value out of the current window contents. May return NULL |
$Id: //open/dt/dev/fennel/web/doc/doxygen/doc/calculator-compact.html#3 $