LhxHashTable Class Reference

#include <LhxHashTable.h>

Public Member Functions

void init (uint partitionLevelInit, LhxHashInfo const &hashInfo, AggComputerList *aggList, uint buildInputIndex)

Initialize the hash table.

void init (uint partitionLevelInit, LhxHashInfo const &hashInfo, uint buildInputIndex)

Initialize the hash table.

bool allocateResources (bool reuse=false)

Allocate blocks to hold the number of slots needed for this hash table.

void releaseResources (bool reuse=false)

Release the blocks allocated.

void calculateSize (LhxHashInfo const &hashInfo, uint inputIndex, BlockNum &numBlocks)

Compute the number of blocks and slots required by the hash table and its contents for "nRows" rows with "cndKeys" distinct key values, for the specified key (aggs included) and data descriptions.

void calculateNumSlots (RecordNum cndKeys, uint usablePageSize, BlockNum numBlocks)

Compute the number of slots required by this hash table to store "cndKeys" distinct key values.

PBuffer findKey (TupleData const &inputTuple, TupleProjection const &inputKeyProj, bool removeDuplicateProbe)

Find key node based on key cols.

bool addTuple (TupleData const &inputTuple)

Insert a new tuple.

PBuffer * getSlot (uint slotNum)

Get the slot indexed by slotNum.

uint getNumSlots () const

Returns:
number of slots.

PBuffer * getFirstSlot () const

Returns:
the first slot in a chain of slots.

PBuffer * getNextSlot (PBuffer *curSlot)

Returns:
the next slot following curSlot in the slot chain.

bool isHashGroupBy () const

Returns:
true if this hash table aggregates its input.

string toString ()

Print the content of the hash table.

Static Public Attributes

static const uint LhxHashTableMinPages = 2

Private Member Functions

PBuffer allocBlock ()

Allocate a block.

PBuffer allocBuffer (uint bufSize)

Allocate a buffer of size bufSize.

string printSlot (uint slotNum)

Print the content of a slot, i.e. the content of all the keys and their data nodes.

bool addKeyData (TupleData const &inputTuple)

Add a key node, with data.

bool addData (PBuffer keyNode, TupleData const &inputTuple)

Add a data node, following an existing keyNode.

bool aggData (PBuffer destKeyLoc, TupleData const &inputTuple)

Aggregate a new tuple.

PBuffer findKeyLocation (TupleData const &inputTuple, TupleProjection const &inputKeyProj, bool isProbing, bool removeDuplicateProbe)

Find location that stores the key node based on key cols.

Static Private Member Functions

static uint slotsNeeded (RecordNum cndKeys)

Compute the number of slots required to hold "cndKeys" keys without significant collisions.

Private Attributes

uint numSlots

Size of the hash table, i.e. the number of slots.

std::vector< PBuffer > slotBlocks

Array of page buffers which have been allocated as index buffers.

PBuffer * firstSlot

PBuffer * lastSlot

SegmentAccessor scratchAccessor

Scratch accessor for allocating large buffer pages.

uint maxBlockCount

Maximum number of blocks to use for building this hash table.

PBuffer firstBlock

Linked list of blocks to fit the hash entry array and hash value nodes in.

PBuffer currentBlock

LhxHashBlockAccessor blockAccessor

This block accessor can be associated with any block.

LhxHashBlockAccessor nodeBlockAccessor

This block accessor is associated with the first block that contains key or data nodes.

SegPageLock bufferLock

Lock on scratch page.

uint currentBlockCount

Current number of scratch buffers in use.

bool filterNull

Special hash table property: the hash table filters out null keys.

TupleProjection filterNullKeyProj

bool removeDuplicate

Special hash table property: the hash table should remove duplicates.

uint partitionLevel

The hash generators used by this hash table: one for the current level and one for the sub-partition level (== partitionLevel + 1).

LhxHashGenerator hashGen

LhxHashGenerator hashGenSub

TupleProjection keyColsAndAggsProj

Fields in the inputTuple parameter of the addTuple() method that hold the key columns, aggregates, and data columns.

TupleProjection keyColsProj

TupleProjection aggsProj

TupleProjection dataProj

vector< LhxHashTrim > isKeyColVarChar

LhxHashKeyAccessor hashKeyAccessor

Accessors for the content of this hash table.

LhxHashDataAccessor hashDataAccessor

uint maxBufferSize

The maximum number of bytes writable in a scratch page.

bool isGroupBy

Marks if this hash table is built for Group-by.

bool hasAggregates

For group-bys, marks if there are any aggregates.

AggComputerList * aggComputers

Aggregate computers passed in from the agg exec stream.

TupleData aggWorkingTuple

TupleDataWithBuffer aggResultTuple

TupleData tmpKeyTuple

TupleData tmpDataTuple

Detailed Description

Definition at line 552 of file LhxHashTable.h.

Member Function Documentation

PBuffer LhxHashTable::allocBlock ( ) [private]

Allocate a block.

Returns:
pointer to the block, or NULL if maxBlockCount is exceeded.

Definition at line 358 of file LhxHashTable.cpp.

References SegPageLock::allocatePage(), blockAccessor, bufferLock, currentBlockCount, SegPageLock::getPage(), CachePage::getWritableData(), maxBlockCount, LhxHashBlockAccessor::setCurrent(), LhxHashNodeAccessor::setNext(), and SegPageLock::unlock().

Referenced by allocateResources(), and allocBuffer().

00359 {
00360     PBuffer resultBlock;
00361 
00362     if (currentBlockCount < maxBlockCount) {
00363         currentBlockCount ++;
00364         /*
00365          * Allocate a new block.
00366          */
00367         bufferLock.allocatePage();
00368         resultBlock = bufferLock.getPage().getWritableData();
00369         bufferLock.unlock();
00370 
00371         /*
00372          * The new block is not linked in yet.
00373          */
00374         blockAccessor.setCurrent(resultBlock, false, false);
00375         blockAccessor.setNext(NULL);
00376     } else {
00377         /*
00378          * Hash Table reached its maximum size.
00379          */
00380         resultBlock = NULL;
00381     }
00382     return resultBlock;
00383 }

PBuffer LhxHashTable::allocBuffer ( uint bufSize ) [private]

Allocate a buffer of size bufSize.

Parameters:

[in] bufSize

Returns:
pointer to the buffer, or NULL if there is no more space left.

Definition at line 385 of file LhxHashTable.cpp.

References allocBlock(), LhxHashBlockAccessor::allocBuffer(), currentBlock, LhxHashNodeAccessor::getNext(), nodeBlockAccessor, LhxHashBlockAccessor::setCurrent(), and LhxHashNodeAccessor::setNext().

Referenced by addData(), addKeyData(), and aggData().

00386 {
00387     PBuffer resultBuf = nodeBlockAccessor.allocBuffer(bufSize);
00388 
00389     if (!resultBuf) {
00390         /*
00391          * Current block out of memory
00392          */
00393         PBuffer nextBlock = nodeBlockAccessor.getNext();
00394         if (nextBlock) {
00395             currentBlock = nextBlock;
00396         } else {
00397             PBuffer newBlock = allocBlock();
00398             nodeBlockAccessor.setNext(newBlock);
00399             currentBlock = newBlock;
00400         }
00401 
00402         if (currentBlock) {
00403             nodeBlockAccessor.setCurrent(currentBlock, false, false);
00404             resultBuf = nodeBlockAccessor.allocBuffer(bufSize);
00405 
00406             assert (resultBuf);
00407         }
00408     }
00409 
00410     return resultBuf;
00411 }
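
The allocation pattern in the listing above (try the current block, move to a new block on failure, give up once the block budget is spent) can be sketched in isolation. This is a simplified illustration, not the Fennel code: `BlockArenaSketch` and all of its members are hypothetical names, heap arrays stand in for scratch pages, and the reuse of already linked blocks is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical arena: fixed-size blocks, bump allocation inside the newest
// block, and a hard cap on the number of blocks (compare maxBlockCount).
class BlockArenaSketch
{
public:
    BlockArenaSketch(std::size_t blockSize, std::size_t maxBlocks)
        : blockSize_(blockSize), maxBlocks_(maxBlocks), used_(0)
    {
        assert(blockSize > 0);
    }

    // Returns a pointer to `size` bytes, or nullptr when the request cannot
    // be satisfied without exceeding the block budget.
    unsigned char *alloc(std::size_t size)
    {
        if (size > blockSize_) {
            return nullptr;  // a request never spans blocks in this sketch
        }
        if (blocks_.empty() || used_ + size > blockSize_) {
            if (blocks_.size() == maxBlocks_) {
                return nullptr;  // budget exhausted
            }
            // Any space left in the previous block is abandoned.
            blocks_.push_back(std::make_unique<unsigned char[]>(blockSize_));
            used_ = 0;
        }
        unsigned char *result = blocks_.back().get() + used_;
        used_ += size;
        return result;
    }

    std::size_t blockCount() const
    {
        return blocks_.size();
    }

private:
    std::size_t blockSize_;
    std::size_t maxBlocks_;
    std::size_t used_;  // bytes consumed in the newest block
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```

As in the listing, a caller of such an arena has to treat a null result as "out of memory" and react to it, for example by partitioning the input.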

string LhxHashTable::printSlot ( uint slotNum ) [private]

Print the content of a slot, i.e. the content of all the keys and their data nodes.

Parameters:

[in] slotNum

Definition at line 964 of file LhxHashTable.cpp.

References LhxHashKeyAccessor::getFirstData(), LhxHashNodeAccessor::getNext(), getSlot(), hashDataAccessor, hashKeyAccessor, LhxHashDataAccessor::setCurrent(), LhxHashKeyAccessor::setCurrent(), LhxHashDataAccessor::toString(), and LhxHashKeyAccessor::toString().

Referenced by toString().

00965 {
00966     ostringstream slotTrace;
00967     PBuffer *slot = getSlot(slotNum);
00968 
00969     slotTrace << "[Slot] [" << slotNum << "] [" << slot <<"]\n";
00970 
00971     /*
00972      * Print all keys in this slot.
00973      */
00974     PBuffer currentHashKey = *slot;
00975     while (currentHashKey) {
00976         hashKeyAccessor.setCurrent(currentHashKey, true);
00977         slotTrace << "     " << hashKeyAccessor.toString() << "\n";
00978 
00979         /*
00980          * Print all data with the same key.
00981          */
00982         PBuffer currentHashData = hashKeyAccessor.getFirstData();
00983         while (currentHashData) {
00984             hashDataAccessor.setCurrent(currentHashData, true);
00985             slotTrace << "          " << hashDataAccessor.toString() << "\n";
00986             /*
00987              * next data.
00988              */
00989             currentHashData = hashDataAccessor.getNext();
00990         }
00991 
00992         /*
00993          * next key.
00994          */
00995         currentHashKey = hashKeyAccessor.getNext();
00996     }
00997     return slotTrace.str();
00998 }

bool LhxHashTable::addKeyData ( TupleData const & inputTuple ) [private]

Add a key node, with data.

Parameters:

[in] inputTuple

Returns:
false if the hash table is out of memory.

Definition at line 660 of file LhxHashTable.cpp.

Referenced by addTuple().

00661 {
00662     // REVIEW jvs 25-Aug-2006:  If we're not using a power of two to allow
00663     // for fast modulo, then it should probably be a prime number to
00664     // reduce collisions.  Broadbase had a table "BBPrime" which
00665     // allowed it to quickly find the closest prime number after doing
00666     // other calculations like resource estimation.
00667     uint slotNum =
00668         (hashGen.hash(inputTuple, keyColsProj, isKeyColVarChar)) % numSlots;
00669 
00670     PBuffer *slot = getSlot(slotNum);
00671     PBuffer *newLastSlot = NULL;
00672 
00673     if (!firstSlot) {
00674         firstSlot = slot;
00675         lastSlot  = slot;
00676     } else {
00677         if (!(*slot)) {
00678             // first time inserting into this slot
00679             // need to chain the slot in if insertion successful
00680             newLastSlot = slot;
00681         }
00682     }
00683 
00684     PBuffer newNextKey = *slot;
00685 
00686     PBuffer newKey = NULL;
00687 
00688     if (!isGroupBy) {
00689         tmpKeyTuple.projectFrom(inputTuple, keyColsProj);
00690         hashKeyAccessor.checkStorageSize(tmpKeyTuple, maxBufferSize);
00691         uint newKeyLen =
00692             hashKeyAccessor.getStorageSize(tmpKeyTuple);
00693         newKey = allocBuffer(newKeyLen);
00694     } else {
00695         aggResultTuple.resetBuffer();
00696         for (int i = 0; i < keyColsProj.size() ; i ++) {
00697             aggResultTuple[i].copyFrom(inputTuple[keyColsProj[i]]);
00698         }
00699 
00700         for (int i = 0; i < aggComputers->size(); i ++) {
00701             (*aggComputers)[i].initAccumulator(
00702                 aggResultTuple[aggsProj[i]], inputTuple);
00703         }
00704         hashKeyAccessor.checkStorageSize(aggResultTuple, maxBufferSize);
00705         newKey =
00706             allocBuffer(hashKeyAccessor.getStorageSize(aggResultTuple));
00707     }
00708 
00709     PBuffer newData = NULL;
00710 
00711     if (!isGroupBy) {
00712         /*
00713          * Tuple contains data portion. i.e. this is not a group by case.
00714          */
00715         tmpDataTuple.projectFrom(inputTuple, dataProj);
00716         hashDataAccessor.checkStorageSize(tmpDataTuple, maxBufferSize);
00717         uint newDataLen = hashDataAccessor.getStorageSize(tmpDataTuple);
00718         newData = allocBuffer(newDataLen);
00719     }
00720 
00721     if (!newKey || (!isGroupBy && !newData)) {
00722         /*
00723          * Ran out of memory.
00724          */
00725         return false;
00726     }
00727 
00728     PBuffer *nextSlot = NULL;
00729 
00730     if (newNextKey) {
00731         // if slot not empty
00732         // copy the nextSlot field from newNextKey to newKey
00733         hashKeyAccessor.setCurrent(newNextKey, true);
00734         nextSlot = hashKeyAccessor.getNextSlot();
00735         hashKeyAccessor.setNextSlot(NULL);
00736     }
00737 
00738     *slot = newKey;
00739     hashKeyAccessor.setCurrent(newKey, false);
00740     hashKeyAccessor.setMatched(false);
00741     hashKeyAccessor.setNext(newNextKey);
00742     hashKeyAccessor.setNextSlot(nextSlot);
00743     hashKeyAccessor.setFirstData(NULL);
00744 
00745     if (!isGroupBy) {
00746         /*
00747          * Store the key.
00748          */
00749         hashKeyAccessor.pack(tmpKeyTuple);
00750 
00751         /*
00752          * Add data portion to this key.
00753          */
00754         hashKeyAccessor.setCurrent(newKey, true);
00755         hashDataAccessor.setCurrent(newData, false);
00756         hashDataAccessor.pack(tmpDataTuple);
00757         hashKeyAccessor.addData(newData);
00758     } else {
00759         /*
00760          * Store the key and the aggs.
00761          */
00762         hashKeyAccessor.pack(aggResultTuple);
00763     }
00764 
00765 
00766     /*
00767      * Link this slot (if inserted to for the first time) into the linked list.
00768      */
00769     if (newLastSlot) {
00770         hashKeyAccessor.setCurrent((*lastSlot), true);
00771         hashKeyAccessor.setNextSlot(newLastSlot);
00772         lastSlot = newLastSlot;
00773     }
00774 
00775     return true;
00776 }
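
The slot bookkeeping in the listing above keeps all non-empty slots on a linked list in order of first use, with the link stored in the head key of each slot. The following is a minimal sketch of that idea under simplifying assumptions: `ChainedKeySketch` and `SlotChainSketch` are hypothetical names, keys are plain unsigned integers, the key modulo the slot count stands in for the hash function, and the caller owns the nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ChainedKeySketch
{
    unsigned key;
    ChainedKeySketch *next;       // next key in the same slot
    ChainedKeySketch **nextSlot;  // meaningful only on the head key of a slot
};

class SlotChainSketch
{
public:
    explicit SlotChainSketch(std::size_t numSlots)
        : slots_(numSlots, nullptr), firstSlot_(nullptr), lastSlot_(nullptr)
    {
        assert(numSlots > 0);
    }

    // Pushes `node` at the head of its slot. The node must outlive the table.
    void insert(ChainedKeySketch *node)
    {
        ChainedKeySketch **slot = &slots_[node->key % slots_.size()];
        ChainedKeySketch *oldHead = *slot;

        node->next = oldHead;
        // The new head inherits the slot-chain link from the old head.
        node->nextSlot = oldHead ? oldHead->nextSlot : nullptr;
        if (oldHead) {
            oldHead->nextSlot = nullptr;
        }
        *slot = node;

        if (!firstSlot_) {
            firstSlot_ = slot;
            lastSlot_ = slot;
        } else if (!oldHead) {
            // First key in this slot: append the slot to the chain.
            (*lastSlot_)->nextSlot = slot;
            lastSlot_ = slot;
        }
    }

    // Keys of the head nodes of all non-empty slots, in order of first use.
    std::vector<unsigned> headKeys() const
    {
        std::vector<unsigned> keys;
        for (ChainedKeySketch *const *slot = firstSlot_; slot;
             slot = (*slot)->nextSlot)
        {
            keys.push_back((*slot)->key);
        }
        return keys;
    }

private:
    std::vector<ChainedKeySketch *> slots_;
    ChainedKeySketch **firstSlot_;
    ChainedKeySketch **lastSlot_;
};
```

Chaining the slots this way lets a scan visit only the slots that were actually used instead of walking the whole slot array.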

bool LhxHashTable::addData ( PBuffer keyNode, TupleData const & inputTuple ) [private]

Add a data node, following an existing keyNode.

Parameters:

[in] keyNode the key node for this data node
[in] inputTuple

Returns:
false if the hash table is out of memory.

Definition at line 778 of file LhxHashTable.cpp.

References LhxHashKeyAccessor::addData(), allocBuffer(), LhxHashDataAccessor::checkStorageSize(), dataProj, LhxHashDataAccessor::getStorageSize(), hashDataAccessor, hashKeyAccessor, maxBufferSize, LhxHashDataAccessor::pack(), TupleData::projectFrom(), LhxHashDataAccessor::setCurrent(), LhxHashKeyAccessor::setCurrent(), and tmpDataTuple.

Referenced by addTuple().

00779 {
00780     /*
00781      * REVIEW: optimization possible here if dataProj is empty; i.e. key
00782      * contains all cols. We can keep a count in the key, instead of storing
00783      * empty data nodes following the key. See test case
00784      * LhxHashTableTest::testInsert1Ka().
00785      * Another case is to support setop ALL in future.
00786      */
00787     hashKeyAccessor.setCurrent(keyNode, true);
00788 
00789     tmpDataTuple.projectFrom(inputTuple, dataProj);
00790 
00791     hashDataAccessor.checkStorageSize(tmpDataTuple, maxBufferSize);
00792 
00793     uint newDataLen =
00794         hashDataAccessor.getStorageSize(tmpDataTuple);
00795     PBuffer newData = allocBuffer(newDataLen);
00796 
00797     if (!newData) {
00798         /*
00799          * Hash table out of memory.
00800          */
00801         return false;
00802     }
00803 
00804     hashDataAccessor.setCurrent(newData, false);
00805     hashDataAccessor.pack(tmpDataTuple);
00806     hashKeyAccessor.addData(newData);
00807     return true;
00808 }

bool LhxHashTable::aggData ( PBuffer destKeyLoc, TupleData const & inputTuple ) [private]

Aggregate a new tuple.

Parameters:

[in] destKeyLoc pointer to the destination key
[in] inputTuple

Definition at line 810 of file LhxHashTable.cpp.

Referenced by addTuple().

00811 {
00812     PBuffer destKey;
00813     /*
00814      * Need to copy destKey out as destKeyLoc might not be aligned.
00815      */
00816     memcpy((PBuffer)&destKey, destKeyLoc, sizeof(PBuffer));
00817 
00818     hashKeyAccessor.setCurrent(destKey, true);
00819 
00820     aggResultTuple.resetBuffer();
00821 
00822     hashKeyAccessor.unpack(aggWorkingTuple, keyColsAndAggsProj);
00823 
00824     for (int i = 0; i < keyColsProj.size() ; i ++) {
00825         aggResultTuple[i].copyFrom(inputTuple[keyColsProj[i]]);
00826     }
00827 
00828     for (int i = 0; i < aggComputers->size(); i ++) {
00829         (*aggComputers)[i].updateAccumulator(
00830             aggWorkingTuple[aggsProj[i]],
00831             aggResultTuple[aggsProj[i]],
00832             inputTuple);
00833     }
00834 
00835     hashKeyAccessor.checkStorageSize(aggResultTuple, maxBufferSize);
00836 
00837     uint newResultSize =
00838         hashKeyAccessor.getStorageSize(aggResultTuple);
00839 
00840     uint oldResultSize =
00841         hashKeyAccessor.getStorageSize(aggWorkingTuple);
00842 
00843     if (newResultSize > oldResultSize) {
00844         PBuffer newKey = NULL;
00845         PBuffer newNextKey = hashKeyAccessor.getNext();
00846 
00847         /*
00848          * The key buffer will not hold the new result. Need to allocate buffer
00849          * again.
00850          */
00851         newKey = allocBuffer(newResultSize);
00852 
00853         if (newKey) {
00854             /*
00855              * Save the current key's next slot so we can set it in the new
00856              * key
00857              */
00858             PBuffer *nextSlot = hashKeyAccessor.getNextSlot();
00859 
00860             /*
00861              * The old key buffer is not used any more. Write in the key
00862              * location the new key buffer.
00863              */
00864             memcpy(destKeyLoc, (PBuffer)&newKey, sizeof(PBuffer));
00865 
00866             hashKeyAccessor.setCurrent(newKey, false);
00867             hashKeyAccessor.setMatched(false);
00868             hashKeyAccessor.setNext(newNextKey);
00869             hashKeyAccessor.pack(aggResultTuple);
00870             hashKeyAccessor.setNextSlot(nextSlot);
00871             return true;
00872         } else {
00873             return false;
00874         }
00875     } else {
00876         /*
00877          * The key buffer can hold aggResultTuple.
00878          */
00879         hashKeyAccessor.pack(aggResultTuple);
00880         return true;
00881     }
00882 }
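
The listing above reads and writes the key pointer through memcpy because the location holding it may not be aligned for a pointer. A minimal sketch of that technique follows; `loadPointerSketch` and `storePointerSketch` are hypothetical helper names, and `unsigned char *` stands in for `PBuffer`.

```cpp
#include <cassert>
#include <cstring>

// Read a pointer value stored at a possibly unaligned byte address.
inline unsigned char *loadPointerSketch(unsigned char const *location)
{
    assert(location != nullptr);
    unsigned char *value;
    std::memcpy(&value, location, sizeof(value));
    return value;
}

// Write a pointer value to a possibly unaligned byte address.
inline void storePointerSketch(unsigned char *location, unsigned char *value)
{
    assert(location != nullptr);
    std::memcpy(location, &value, sizeof(value));
}
```

Copying byte by byte avoids dereferencing a misaligned pointer-to-pointer, which is undefined behavior in C++ and can fault on some architectures.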

uint LhxHashTable::slotsNeeded ( RecordNum cndKeys ) [inline, static, private]

Compute the number of slots required to hold "cndKeys" keys without significant collisions.

Parameters:

[in] cndKeys number of distinct keys

Returns:
number of slots needed.

Definition at line 1277 of file LhxHashTable.h.

References MAXU.

Referenced by calculateNumSlots(), and calculateSize().

01278 {
01279     RecordNum cKeys = RecordNum(ceil(cndKeys * 1.2));
01280     if (cKeys >= uint(MAXU)) {
01281         return uint(MAXU) - 1;
01282     } else {
01283         return uint(cKeys);
01284     }
01285 }
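
The sizing rule above (pad the distinct-key estimate by 20% and clamp the result) can be tried in isolation. The following standalone sketch is not the Fennel code: `slotsNeededSketch` is a hypothetical name, `std::uint64_t` stands in for `RecordNum`, `std::uint32_t` stands in for `uint`, and the maximum `std::uint32_t` value is assumed to play the role of `MAXU`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for slotsNeeded(): pad the distinct-key estimate by
// 20% and clamp the result so that it fits in a 32-bit slot count.
inline std::uint32_t slotsNeededSketch(std::uint64_t cndKeys)
{
    const std::uint64_t maxSlots = std::numeric_limits<std::uint32_t>::max();
    const double padded = std::ceil(static_cast<double>(cndKeys) * 1.2);
    if (padded >= static_cast<double>(maxSlots)) {
        return static_cast<std::uint32_t>(maxSlots - 1);
    }
    assert(padded >= 0.0);
    return static_cast<std::uint32_t>(padded);
}
```

The padding gives a load factor of roughly 0.83 when the distinct-key estimate is accurate, which keeps the chains in each slot short.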

PBuffer LhxHashTable::findKeyLocation ( TupleData const & inputTuple, TupleProjection const & inputKeyProj, bool isProbing, bool removeDuplicateProbe ) [private]

Find location that stores the key node based on key cols.

Parameters:

[in] inputTuple
[in] inputKeyProj key columns from the inputTuple.
[in] isProbing whether the hash table is being probed.
[in] removeDuplicateProbe

Returns:
the buffer that stores the address of the key

Definition at line 613 of file LhxHashTable.cpp.

References LhxHashNodeAccessor::getNext(), LhxHashNodeAccessor::getNextLocation(), getSlot(), LhxHashGenerator::hash(), hashGen, hashKeyAccessor, isKeyColVarChar, LhxHashKeyAccessor::isMatched(), LhxHashKeyAccessor::matches(), numSlots, LhxHashKeyAccessor::setCurrent(), and LhxHashKeyAccessor::setMatched().

Referenced by addTuple(), and findKey().

00618 {
00619     uint slotNum =
00620         (hashGen.hash(inputTuple, inputKeyProj, isKeyColVarChar)) % numSlots;
00621 
00622     PBuffer *slot = getSlot(slotNum);
00623     PBuffer keyLocation = (PBuffer)slot;
00624     PBuffer firstKey = *slot;
00625     PBuffer nextKey;
00626 
00627     if (firstKey) {
00628         /*
00629          * Keep searching if the key has already been linked to keys in the
00630          * same slot.
00631          */
00632         hashKeyAccessor.setCurrent(firstKey, true);
00633         while (!hashKeyAccessor.matches(inputTuple, inputKeyProj)) {
00634             nextKey = hashKeyAccessor.getNext();
00635             if (!nextKey) {
00636                 return NULL;
00637             }
00638 
00639             keyLocation = hashKeyAccessor.getNextLocation();
00640             hashKeyAccessor.setCurrent(nextKey, true);
00641         }
00642     } else {
00643         return NULL;
00644     }
00645 
00646     /*
00647      * Found a matching key
00648      */
00649     if (removeDuplicateProbe && hashKeyAccessor.isMatched()) {
00650         return NULL;
00651     }
00652 
00653     if (isProbing) {
00654         hashKeyAccessor.setMatched(true);
00655     }
00656 
00657     return keyLocation;
00658 }
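
The lookup above returns the location that holds the pointer to the matching key (either the slot itself or the link field of the preceding key), which is what lets a caller such as aggData() swap in a reallocated key node. A minimal sketch of that pointer-to-pointer search follows. `KeyNodeSketch` and `findKeyLocationSketch` are hypothetical names, keys are plain unsigned integers, the key modulo the slot count stands in for the hash function, and the matched flag and duplicate-probe handling are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct KeyNodeSketch
{
    unsigned key;
    KeyNodeSketch *next;  // next key in the same slot
};

// Returns the address of the pointer that refers to the matching node, or
// nullptr if the key is not present.
inline KeyNodeSketch **findKeyLocationSketch(
    std::vector<KeyNodeSketch *> &slots, unsigned key)
{
    assert(!slots.empty());
    KeyNodeSketch **location = &slots[key % slots.size()];
    while (*location) {
        if ((*location)->key == key) {
            return location;
        }
        // Advance to the link field of the current node.
        location = &(*location)->next;
    }
    return nullptr;
}
```

Writing through the returned location replaces the node without having to find its predecessor a second time.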

void LhxHashTable::init ( uint partitionLevelInit, LhxHashInfo const & hashInfo, AggComputerList * aggList, uint buildInputIndex )

Initialize the hash table.

Parameters:

[in] partitionLevelInit recursive partitioning level
[in] hashInfo
[in] aggList pointer to list of agg computers.
[in] buildInputIndex which input is the build side.

Definition at line 333 of file LhxHashTable.cpp.

References aggComputers, aggResultTuple, aggWorkingTuple, TupleData::compute(), TupleDataWithBuffer::computeAndAllocate(), hasAggregates, LhxHashInfo::inputDesc, and isGroupBy.

Referenced by LhxJoinExecStream::execute(), LhxAggExecStream::execute(), LhxPartitionWriter::open(), LhxJoinExecStream::open(), LhxAggExecStream::open(), and LhxHashTableTest::testInsert().

00338 {
00339     init(partitionLevelInit, hashInfo, buildInputIndex);
00340 
00341     aggComputers = aggList;
00342     /*
00343      * The last input is the build side. In the group by case, there is only
00344      * one input.
00345      */
00346     aggWorkingTuple.compute(hashInfo.inputDesc[buildInputIndex]);
00347     aggResultTuple.computeAndAllocate(hashInfo.inputDesc[buildInputIndex]);
00348 
00349     isGroupBy = true;
00350 
00351     if (aggList->size() > 0) {
00352         hasAggregates = true;
00353     } else {
00354         hasAggregates = false;
00355     }
00356 }

void LhxHashTable::init ( uint partitionLevelInit, LhxHashInfo const & hashInfo, uint buildInputIndex )

Initialize the hash table.

Parameters:

[in] partitionLevelInit recursive partitioning level
[in] hashInfo
[in] buildInputIndex which input is the build side.

Definition at line 249 of file LhxHashTable.cpp.

References SegPageLock::accessSegment(), LhxHashInfo::aggsProj, aggsProj, blockAccessor, bufferLock, calculateNumSlots(), LhxHashInfo::cndKeys, currentBlockCount, LhxHashInfo::dataProj, dataProj, LhxHashInfo::filterNull, filterNull, LhxHashInfo::filterNullKeyProj, filterNullKeyProj, LhxHashBlockAccessor::getUsableSize(), hashDataAccessor, hashGen, hashGenSub, hashKeyAccessor, LhxHashDataAccessor::init(), LhxHashKeyAccessor::init(), LhxHashGenerator::init(), LhxHashBlockAccessor::init(), LhxHashInfo::inputDesc, isGroupBy, LhxHashInfo::isKeyColVarChar, isKeyColVarChar, keyColsAndAggsProj, keyColsProj, LhxHashInfo::keyProj, maxBlockCount, maxBufferSize, LhxHashInfo::memSegmentAccessor, nodeBlockAccessor, LhxHashInfo::numCachePages, partitionLevel, SegmentAccessor::pSegment, LhxHashInfo::removeDuplicate, removeDuplicate, and scratchAccessor.

00253 {
00254     maxBlockCount = hashInfo.numCachePages;
00255     assert (maxBlockCount > 1);
00256     scratchAccessor = hashInfo.memSegmentAccessor;
00257     partitionLevel = partitionLevelInit;
00258     bufferLock.accessSegment(scratchAccessor);
00259     currentBlockCount = 0;
00260 
00261     /*
00262      * Recompute num slots based on hashInfo.numCachePages
00263      */
00264     RecordNum cndKeys = hashInfo.cndKeys[buildInputIndex];
00265     uint usablePageSize = scratchAccessor.pSegment->getUsablePageSize();
00266 
00267     calculateNumSlots(cndKeys, usablePageSize, maxBlockCount);
00268 
00269     /*
00270      * special hash table properties.
00271      */
00272     filterNull = hashInfo.filterNull[buildInputIndex];
00273 
00274     filterNullKeyProj = hashInfo.filterNullKeyProj[buildInputIndex];
00275     removeDuplicate = hashInfo.removeDuplicate[buildInputIndex];
00276 
00277     blockAccessor.init(usablePageSize);
00278     nodeBlockAccessor.init(usablePageSize);
00279     maxBufferSize = nodeBlockAccessor.getUsableSize();
00280 
00281     hashGen.init(partitionLevel);
00282     hashGenSub.init(partitionLevel + 1);
00283 
00284     uint i;
00285 
00286     /*
00287      * The last input is the build side.
00288      */
00289     TupleDescriptor const &buildTupleDesc = hashInfo.inputDesc[buildInputIndex];
00290     keyColsProj = hashInfo.keyProj[buildInputIndex];
00291 
00292     /*
00293      * Initialize varchar type indicator for the build side. (Assumed to be the
00294      * last input.)
00295      */
00296     isKeyColVarChar = hashInfo.isKeyColVarChar[buildInputIndex];
00297     aggsProj = hashInfo.aggsProj;
00298     dataProj = hashInfo.dataProj[buildInputIndex];
00299 
00300     isGroupBy = false;
00301 
00302     /*
00303      * These steps initialize the keyColsProjInKey and aggsProjInKey which are
00304      * based on the new keyColsAndAggs tuple.
00305      */
00306     TupleDescriptor keyDesc;
00307     TupleDescriptor dataDesc;
00308     TupleProjection keyColsProjInKey;
00309     TupleProjection aggsProjInKey;
00310 
00311     uint keyCount = keyColsProj.size();
00312     for (i = 0; i < keyCount; i++) {
00313         keyDesc.push_back(buildTupleDesc[keyColsProj[i]]);
00314         keyColsProjInKey.push_back(i);
00315     }
00316 
00317     keyColsAndAggsProj = keyColsProj;
00318     for (i = 0; i < aggsProj.size(); i++) {
00319         keyColsAndAggsProj.push_back(aggsProj[i]);
00320         keyDesc.push_back(buildTupleDesc[aggsProj[i]]);
00321         aggsProjInKey.push_back(i + keyCount);
00322     }
00323 
00324     hashKeyAccessor.init(keyDesc, keyColsProjInKey, aggsProjInKey);
00325 
00326     for (i = 0; i < dataProj.size(); i++) {
00327         dataDesc.push_back(buildTupleDesc[dataProj[i]]);
00328     }
00329 
00330     hashDataAccessor.init(dataDesc);
00331 }
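
The projection bookkeeping at the end of the listing above can be illustrated on its own. In this sketch, which assumes plain vectors of column indexes in place of `TupleProjection`, the hypothetical `buildKeyLayoutSketch` derives the combined key-plus-aggregate projection over the input tuple and the positions of the key and aggregate columns inside the stored key.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct KeyLayoutSketch
{
    std::vector<unsigned> keyColsAndAggsProj;  // input columns: keys, then aggs
    std::vector<unsigned> keyColsProjInKey;    // key positions in the stored key
    std::vector<unsigned> aggsProjInKey;       // agg positions in the stored key
};

inline KeyLayoutSketch buildKeyLayoutSketch(
    std::vector<unsigned> const &keyColsProj,
    std::vector<unsigned> const &aggsProj)
{
    KeyLayoutSketch layout;
    const std::size_t keyCount = keyColsProj.size();

    // Key columns come first in the stored key, in projection order.
    layout.keyColsAndAggsProj = keyColsProj;
    for (std::size_t i = 0; i < keyCount; i++) {
        layout.keyColsProjInKey.push_back(static_cast<unsigned>(i));
    }

    // Aggregate columns follow the key columns.
    for (std::size_t i = 0; i < aggsProj.size(); i++) {
        layout.keyColsAndAggsProj.push_back(aggsProj[i]);
        layout.aggsProjInKey.push_back(static_cast<unsigned>(i + keyCount));
    }

    assert(layout.keyColsAndAggsProj.size() == keyCount + aggsProj.size());
    return layout;
}
```

For example, key columns {3, 1} and one aggregate column {4} give a stored key whose fields 0 and 1 are the keys and whose field 2 is the aggregate.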

bool LhxHashTable::allocateResources ( bool reuse = false )

Allocate blocks to hold the number of slots needed for this hash table.

Parameters:

[in] reuse if true, reuse the blocks already allocated to this hash table.

Returns:
status; false if there is no more space left in the hash table.

Definition at line 413 of file LhxHashTable.cpp.

References allocBlock(), LhxHashBlockAccessor::allocSlots(), blockAccessor, currentBlock, firstBlock, firstSlot, LhxHashNodeAccessor::getNext(), LhxHashBlockAccessor::getSlotsPerBlock(), lastSlot, nodeBlockAccessor, numSlots, LhxHashBlockAccessor::setCurrent(), LhxHashNodeAccessor::setNext(), and slotBlocks.

Referenced by LhxPartitionWriter::aggAndMarshalTuple(), LhxPartitionWriter::allocateResources(), LhxJoinExecStream::execute(), LhxAggExecStream::execute(), LhxJoinExecStream::open(), LhxAggExecStream::open(), and LhxHashTableTest::testInsert().

00414 {
00415     assert (numSlots != 0);
00416 
00417     PBuffer newBlock;
00418 
00419     slotBlocks.clear();
00420     firstSlot = NULL;
00421     lastSlot = NULL;
00422 
00423     if (!reuse) {
00424         firstBlock = allocBlock();
00425     }
00426 
00427     currentBlock = firstBlock;
00428 
00429     /*
00430      * Should be able to allocate at least one block.
00431      */
00432     assert (currentBlock != NULL);
00433 
00434     uint numSlotsPerBlock = blockAccessor.getSlotsPerBlock();
00435 
00436     /*
00437      * Initialize the block (clear all bytes etc).
00438      */
00439     nodeBlockAccessor.setCurrent(currentBlock, false, true);
00440     slotBlocks.push_back(currentBlock);
00441 
00442     if (numSlots <= numSlotsPerBlock) {
00443         /*
00444          * This will be the first "node block", i.e. it contains key or
00445          * data nodes.
00446          * The allocate call sets the freePtr of the currentBlock
00447          * correctly.
00448          */
00449         nodeBlockAccessor.allocSlots(numSlots);
00450         return true;
00451     }
00452 
00453     /*
00454      * Need to allocate more than one block.
00455      */
00456     int numSlotsToAlloc = numSlots - numSlotsPerBlock;
00457 
00458     while (numSlotsToAlloc > 0) {
00459         newBlock = NULL;
00460         if (reuse) {
00461             newBlock = nodeBlockAccessor.getNext();
00462         }
00463 
00464         if (!newBlock) {
00465             newBlock = allocBlock();
00466             if (!newBlock) {
00467                 return false;
00468             }
00469         }
00470 
00471         /*
00472          * New block is linked to the end of the allocated block list.
00473          */
00474         nodeBlockAccessor.setNext(newBlock);
00475         currentBlock = newBlock;
00476         nodeBlockAccessor.setCurrent(currentBlock, false, true);
00477         slotBlocks.push_back(currentBlock);
00478 
00479         if (numSlotsToAlloc <= numSlotsPerBlock) {
00480             /*
00481              * This will be the first "node block", i.e. it contains key or
00482              * data nodes.
00483              * The allocate call sets the freePtr of the currentBlock
00484              * correctly.
00485              */
00486             nodeBlockAccessor.allocSlots(numSlotsToAlloc);
00487         }
00488 
00489         numSlotsToAlloc -= numSlotsPerBlock;
00490     }
00491     return true;
00492 }
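
The block arithmetic in the listing above can be checked with a small helper. This sketch assumes only what the loop shows: slots fill whole blocks in order, and the last block that holds slots is the one whose remaining space is used for key and data nodes. `SlotLayoutSketch` and `layoutSlotsSketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct SlotLayoutSketch
{
    std::size_t slotBlocks;        // blocks that hold at least one slot
    std::size_t slotsInLastBlock;  // slots placed in the final slot block
};

inline SlotLayoutSketch layoutSlotsSketch(
    std::size_t numSlots, std::size_t slotsPerBlock)
{
    assert(numSlots > 0 && slotsPerBlock > 0);
    SlotLayoutSketch layout;
    // Ceiling division: every started block counts.
    layout.slotBlocks = (numSlots + slotsPerBlock - 1) / slotsPerBlock;
    layout.slotsInLastBlock =
        numSlots - (layout.slotBlocks - 1) * slotsPerBlock;
    return layout;
}
```

With 10 slots and 4 slots per block, for instance, three blocks are needed and the third holds 2 slots, which matches the two iterations of the while loop in the listing.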

void LhxHashTable::releaseResources ( bool reuse = false )

Release the blocks allocated.

Parameters:

[in] reuse if true, do not release the scratch pages back to the cache.

Definition at line 494 of file LhxHashTable.cpp.

References blockAccessor, currentBlock, currentBlockCount, firstBlock, hashDataAccessor, hashKeyAccessor, nodeBlockAccessor, NULL_PAGE_ID, SegmentAccessor::pSegment, LhxHashBlockAccessor::reset(), LhxHashNodeAccessor::reset(), and scratchAccessor.

Referenced by LhxJoinExecStream::closeImpl(), LhxAggExecStream::closeImpl(), LhxJoinExecStream::execute(), LhxAggExecStream::execute(), LhxJoinExecStream::open(), LhxAggExecStream::open(), LhxPartitionWriter::releaseResources(), and LhxHashTableTest::testInsert().

00495 {
00496     /*
00497      * Note: User of hash table needs to supply it with a private
00498      * scratchAccessor; otherwise, this call here can deallocate pages from
00499      * other clients of the shared scratchAccessor.
00500      */
00501     if (!reuse && scratchAccessor.pSegment) {
00502         scratchAccessor.pSegment->deallocatePageRange(
00503             NULL_PAGE_ID,
00504             NULL_PAGE_ID);
00505         firstBlock = NULL;
00506         currentBlockCount = 0;
00507     }
00508 
00509     hashKeyAccessor.reset();
00510     hashDataAccessor.reset();
00511     blockAccessor.reset();
00512     nodeBlockAccessor.reset();
00513     currentBlock = NULL;
00514 }

void LhxHashTable::calculateSize ( LhxHashInfo const & hashInfo, uint inputIndex, BlockNum & numBlocks )

Compute the number of blocks and slots required by the hash table and its contents for "nRows" rows with "cndKeys" distinct key values, for the specified key (aggs included) and data descriptions.

Parameters:

[in] hashInfo
[in] inputIndex which input the hash table is built on
[out] numBlocks max number of blocks for this hash table. Set to MAXU when no stats are available to compute this value.

Definition at line 540 of file LhxHashTable.cpp.

References LhxHashInfo::cndKeys, LhxHashInfo::dataProj, dataProj, LhxHashDataAccessor::getAvgStorageSize(), LhxHashKeyAccessor::getAvgStorageSize(), LhxHashDataAccessor::init(), LhxHashKeyAccessor::init(), LhxHashInfo::inputDesc, isMAXU(), LhxHashInfo::keyProj, MAXU, LhxHashInfo::memSegmentAccessor, LhxHashInfo::numRows, TupleDescriptor::projectFrom(), SegmentAccessor::pSegment, and slotsNeeded().

Referenced by LhxJoinExecStream::prepare(), and LhxAggExecStream::prepare().

{
    uint usablePageSize =
        (hashInfo.memSegmentAccessor.pSegment)->getUsablePageSize()
        - sizeof(PBuffer);

    TupleDescriptor const &inputDesc = hashInfo.inputDesc[inputIndex];

    TupleProjection const &keyProj = hashInfo.keyProj[inputIndex];

    TupleProjection const &dataProj = hashInfo.dataProj[inputIndex];

    RecordNum cndKeys = hashInfo.cndKeys[inputIndex];
    RecordNum numRows = hashInfo.numRows[inputIndex];
    // if we don't have stats, don't bother trying to compute the hash table
    // size
    if (isMAXU(cndKeys) || isMAXU(numRows)) {
        numBlocks = MAXU;
        return;
    }

    TupleDescriptor keyDesc;
    keyDesc.projectFrom(inputDesc, keyProj);

    TupleDescriptor dataDesc;
    dataDesc.projectFrom(inputDesc, dataProj);

    LhxHashKeyAccessor tmpKey;
    LhxHashDataAccessor tmpData;

    TupleProjection tmpKeyProj;
    TupleProjection tmpAggsProj;

    /*
     * When estimating hash table size, ignore aggregate fields.
     */
    for (int i = 0; i < keyDesc.size(); i ++) {
        tmpKeyProj.push_back(i);
    }

    tmpKey.init(keyDesc, tmpKeyProj, tmpAggsProj);
    tmpData.init(dataDesc);

    double totalBytes =
        slotsNeeded(cndKeys) * sizeof(PBuffer)
        + cndKeys * tmpKey.getAvgStorageSize()
        + numRows * tmpData.getAvgStorageSize();
    double nBlocks = ceil(totalBytes / usablePageSize);
    if (nBlocks >= BlockNum(MAXU)) {
        numBlocks = BlockNum(MAXU) - 1;
    } else {
        numBlocks = BlockNum(nBlocks);
    }
}
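
The size estimate in the listing above reduces to one formula: slot bytes plus key bytes plus data bytes, divided by the usable page size and rounded up. The following is a minimal standalone sketch of that arithmetic, not the Fennel implementation. The function name `estimateBlocks`, its parameters, and the use of `std::uint64_t` for the block count are assumptions made for illustration; the slot count is passed in because `slotsNeeded()` is not shown in this section.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical standalone version of the estimate made by calculateSize().
// slots:          stand-in for slotsNeeded(cndKeys), supplied by the caller
// slotBytes:      stand-in for sizeof(PBuffer)
// avgKeyBytes:    stand-in for tmpKey.getAvgStorageSize()
// avgDataBytes:   stand-in for tmpData.getAvgStorageSize()
// usablePageSize: bytes per page, assumed to already exclude the per-page pointer
inline std::uint64_t estimateBlocks(
    std::uint64_t slots,
    std::uint64_t slotBytes,
    std::uint64_t cndKeys,
    double avgKeyBytes,
    std::uint64_t numRows,
    double avgDataBytes,
    std::uint64_t usablePageSize)
{
    // The largest value is reserved to mean "no statistics available".
    const std::uint64_t noStats = std::numeric_limits<std::uint64_t>::max();

    double totalBytes =
        static_cast<double>(slots) * static_cast<double>(slotBytes)
        + static_cast<double>(cndKeys) * avgKeyBytes
        + static_cast<double>(numRows) * avgDataBytes;
    double nBlocks = std::ceil(totalBytes / static_cast<double>(usablePageSize));

    // Saturate just below the reserved value instead of overflowing.
    if (nBlocks >= static_cast<double>(noStats)) {
        return noStats - 1;
    }
    return static_cast<std::uint64_t>(nBlocks);
}
```

For example, 1000 slots of 8 bytes, 1000 keys averaging 16 bytes, and 5000 rows averaging 24 bytes total 144000 bytes, which needs 36 pages of 4096 usable bytes.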

void LhxHashTable::calculateNumSlots	(	RecordNum	cndKeys,
		uint	usablePageSize,
		BlockNum	numBlocks
	)

Compute the number of slots required by this hash table to store "cndKeys" distinct key values.

Parameters:

`[in]`	cndKeys
`[in]`	usablePageSize	the usable page size.
`[in]`	numBlocks	maximum number of blocks budgeted for this hash table

Definition at line 516 of file LhxHashTable.cpp.

References isMAXU(), max(), min(), numSlots, and slotsNeeded().

Referenced by init(), LhxPartitionWriter::open(), and LhxHashTableTest::testInsert().

{
    // if we don't have stats for the number of distinct keys, just
    // use a default value
    if (isMAXU(cndKeys)) {
        cndKeys = RecordNum(10000);
    }

    /*
     * Use at least 1%, but no more than 10% of hash table cache pages to store
     * slots.
     */
    uint slotsLow = numBlocks * usablePageSize / sizeof(PBuffer) / 100;
    uint slotsHigh = numBlocks * usablePageSize / sizeof(PBuffer) / 10;

    numSlots =
        max(slotsNeeded(cndKeys), slotsLow);

    numSlots = min(numSlots, slotsHigh);
}
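
The 1% and 10% bounds in the listing above amount to clamping the desired slot count to a range derived from the cache budget. A minimal sketch of that clamp follows. The name `clampNumSlots` and the explicit `slotBytes` parameter (standing in for `sizeof(PBuffer)`) are assumptions for illustration, and 64-bit arithmetic is used here to sidestep overflow in the intermediate product.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: clamp the wanted slot count to between 1% and 10%
// of the slots that would fit in the whole hash table cache.
inline std::uint64_t clampNumSlots(
    std::uint64_t slotsWanted,     // stand-in for slotsNeeded(cndKeys)
    std::uint64_t numBlocks,       // blocks budgeted for the hash table
    std::uint64_t usablePageSize,  // usable bytes per block
    std::uint64_t slotBytes)       // stand-in for sizeof(PBuffer)
{
    std::uint64_t slotsInCache = numBlocks * usablePageSize / slotBytes;
    std::uint64_t slotsLow = slotsInCache / 100;   // lower bound: 1%
    std::uint64_t slotsHigh = slotsInCache / 10;   // upper bound: 10%

    // Same order as the listing: raise to the lower bound, then cap.
    return std::min(std::max(slotsWanted, slotsLow), slotsHigh);
}
```

With 100 blocks of 4096 usable bytes and 8-byte slots, the cache could hold 51200 slots, so the result always falls between 512 and 5120.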

PBuffer LhxHashTable::findKey	(	TupleData const &	inputTuple,
		TupleProjection const &	inputKeyProj,
		bool	removeDuplicateProbe
	)

Find key node based on key cols.

Parameters:

`[in]`	inputTuple
`[in]`	inputKeyProj	key columns from the inputTuple.
`[in]`	removeDuplicateProbe

Returns: the buffer that stores the address of the key

Definition at line 940 of file LhxHashTable.cpp.

References findKeyLocation().

Referenced by LhxJoinExecStream::execute(), and LhxHashTableTest::testInsert().

{
    PBuffer destKey;
    PBuffer destKeyLoc;
    bool isProbing = true;
    destKeyLoc =
        findKeyLocation(
            inputTuple, inputKeyProj, isProbing,
            removeDuplicateProbe);

    if (destKeyLoc) {
        /*
         * Need to copy destKey out as destKeyLoc might not be aligned.
         */
        memcpy((PBuffer)&destKey, destKeyLoc, sizeof(PBuffer));
        return destKey;
    } else {
        return NULL;
    }
}
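
The `memcpy` in the listing above is the usual way to read a pointer stored at an address that may not be suitably aligned. A minimal sketch of the technique in isolation follows. The names `ByteBuffer`, `loadPointer`, and `storePointer` are hypothetical, and `ByteBuffer` is only assumed to resemble `PBuffer`.

```cpp
#include <cstring>

// Hypothetical stand-in for PBuffer (assumed here to be a byte pointer).
typedef unsigned char *ByteBuffer;

// Read a pointer value from a location that may be unaligned.
// Dereferencing a cast pointer would be undefined behavior on a misaligned
// address; copying the bytes is well defined.
inline ByteBuffer loadPointer(const unsigned char *location)
{
    ByteBuffer value;
    std::memcpy(&value, location, sizeof(value));
    return value;
}

// Write a pointer value to a possibly unaligned location.
inline void storePointer(unsigned char *location, ByteBuffer value)
{
    std::memcpy(location, &value, sizeof(value));
}
```

Compilers typically turn such a fixed-size `memcpy` into a plain load or store on platforms that allow unaligned access.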

bool LhxHashTable::addTuple ( TupleData const & inputTuple )

Insert a new tuple.

Parameters:

`[in]`	inputTuple

Definition at line 884 of file LhxHashTable.cpp.

References addData(), addKeyData(), aggData(), TupleData::containsNull(), filterNull, filterNullKeyProj, findKeyLocation(), hasAggregates, isGroupBy, keyColsProj, and removeDuplicate.

Referenced by LhxPartitionWriter::aggAndMarshalTuple(), LhxJoinExecStream::execute(), LhxAggExecStream::execute(), and LhxHashTableTest::testInsert().

{
    if (filterNull && inputTuple.containsNull(filterNullKeyProj)) {
        /*
         * When null values are filtered, and this tuple does
         * contain null in its key columns, do not add to hash
         * table.
         */
        return true;
    }

    /*
     * We are building the hash table.
     */
    bool isProbing = false;
    bool removeDuplicateProbe = false;
    PBuffer destKeyLoc =
        findKeyLocation(
            inputTuple, keyColsProj, isProbing,
            removeDuplicateProbe);

    if (!destKeyLoc) {
        /*
         * Key is not present in the hash table. Add both the key and the data.
         */
        return addKeyData(inputTuple);
    } else if (removeDuplicate) {
        /*
         * Do not add duplicate keys.
         */
        return true;
    } else {
        /*
         * Key is present in the hash table.
         * If hash join, add to the data list corresponding to this key.
         * If hash aggregate, aggregate the new data.
         */
        if (!isGroupBy) {
            PBuffer destKey;
            /*
             * Need to copy destKey out as destKeyLoc might not be aligned.
             */
            memcpy((PBuffer*)&destKey, destKeyLoc, sizeof(PBuffer));

            assert (destKey);

            return addData(destKey, inputTuple);
        } else {
            if (!hasAggregates) {
                return true;
            }
            return aggData(destKeyLoc, inputTuple);
        }
    }
}
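
The branch order of addTuple() can be easier to follow in a toy model. The sketch below swaps the block-based slot layout for a `std::map` and uses a running sum as the only aggregate, so it illustrates the control flow only. The type `ToyHashTable` and its members are hypothetical, and unlike the real method it never returns false because it has no memory budget.

```cpp
#include <map>
#include <utility>
#include <vector>

// Toy model only: std::map replaces the slot/block layout of LhxHashTable.
struct ToyHashTable {
    bool filterNull;
    bool removeDuplicate;
    bool isGroupBy;
    // Key is (isNull, value). The mapped vector holds the rows for a join,
    // or a single running sum for a group-by.
    std::map<std::pair<bool, int>, std::vector<int> > entries;

    ToyHashTable(bool filterNullInit, bool removeDuplicateInit,
                 bool isGroupByInit)
        : filterNull(filterNullInit),
          removeDuplicate(removeDuplicateInit),
          isGroupBy(isGroupByInit)
    {
    }

    bool addTuple(bool keyIsNull, int key, int value)
    {
        if (filterNull && keyIsNull) {
            return true;                      // null keys are skipped
        }
        std::pair<bool, int> k(keyIsNull, keyIsNull ? 0 : key);
        std::map<std::pair<bool, int>, std::vector<int> >::iterator it =
            entries.find(k);
        if (it == entries.end()) {
            entries[k].push_back(value);      // new key: add key and data
        } else if (removeDuplicate) {
            // duplicate key: nothing is added
        } else if (!isGroupBy) {
            it->second.push_back(value);      // join: append to the data list
        } else {
            it->second[0] += value;           // group-by: fold into aggregate
        }
        return true;
    }
};
```

In join mode two tuples with the same key leave one entry holding two rows; in group-by mode they leave one entry holding their sum.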

PBuffer * LhxHashTable::getSlot ( uint slotNum )

Get the slot indexed by slotNum.

Parameters:

`[in]`	slotNum

Returns: pointer to the slot.

Definition at line 599 of file LhxHashTable.cpp.

References blockAccessor, LhxHashBlockAccessor::getSlot(), LhxHashBlockAccessor::getSlotsPerBlock(), LhxHashBlockAccessor::setCurrent(), and slotBlocks.

Referenced by addKeyData(), findKeyLocation(), and printSlot().

{
    PBuffer *slot;
    uint slotsPerBlock = blockAccessor.getSlotsPerBlock();

    blockAccessor.setCurrent(slotBlocks[slotNum / slotsPerBlock], true, false);

    slot = blockAccessor.getSlot(slotNum % slotsPerBlock);

    assert (slot);

    return slot;
}
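
getSlot() splits a slot number into a block index and a position inside that block using integer division and remainder. A minimal sketch of that decomposition follows; the helper name `locateSlot` is hypothetical.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper mirroring the index arithmetic in getSlot().
// Returns (index into the list of slot blocks, position within that block).
inline std::pair<std::size_t, std::size_t> locateSlot(
    std::size_t slotNum,
    std::size_t slotsPerBlock)
{
    return std::make_pair(slotNum / slotsPerBlock, slotNum % slotsPerBlock);
}
```

With 512 slots per block, slot 1030 lands in block 2 at position 6, which is why the order of the blocks in `slotBlocks` matters.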

uint LhxHashTable::getNumSlots ( ) const [inline]

Returns: number of slots.

Definition at line 1287 of file LhxHashTable.h.

References numSlots.

{
    return numSlots;
}

PBuffer * LhxHashTable::getFirstSlot ( ) const [inline]

Returns: the first slot in a chain of slots.

Definition at line 1292 of file LhxHashTable.h.

References firstSlot.

Referenced by LhxHashTableReader::advanceSlot().

{
    return firstSlot;
}

PBuffer * LhxHashTable::getNextSlot ( PBuffer * curSlot ) [inline]

Returns: the next slot following curSlot in the slot chain.

Definition at line 1297 of file LhxHashTable.h.

References LhxHashKeyAccessor::getNextSlot(), hashKeyAccessor, and LhxHashKeyAccessor::setCurrent().

Referenced by LhxHashTableReader::advanceSlot().

{
    hashKeyAccessor.setCurrent((*curSlot), true);
    return hashKeyAccessor.getNextSlot();
}
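
getFirstSlot() and getNextSlot() together let a reader walk the chain of slots. The listing shows that the link to the next slot is read through the key node the current slot points to. The sketch below models that with toy types; `ToyKeyNode`, `ToySlot`, and `walkSlotChain` are hypothetical, and the real node layout is managed by LhxHashKeyAccessor and is not shown in this section.

```cpp
#include <cstddef>
#include <vector>

struct ToyKeyNode;
// A slot holds a pointer to its first key node (toy stand-in for PBuffer).
typedef ToyKeyNode *ToySlot;

struct ToyKeyNode {
    int key;
    ToySlot *nextSlot;  // link to the next slot on the chain, or NULL
};

// Walk the chain from firstSlot and collect the key of each slot's first
// node. Like getNextSlot(), this assumes every slot on the chain is
// non-empty, because the link is read through the node the slot points to.
inline std::vector<int> walkSlotChain(ToySlot *firstSlot)
{
    std::vector<int> keys;
    for (ToySlot *slot = firstSlot; slot != NULL; slot = (*slot)->nextSlot) {
        keys.push_back((*slot)->key);
    }
    return keys;
}
```

Chaining the slots this way lets a reader skip empty slots rather than scanning the whole slot array.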

bool LhxHashTable::isHashGroupBy ( ) const [inline]

Returns: true if this hash table aggregates input.

Definition at line 1303 of file LhxHashTable.h.

References isGroupBy.

Referenced by LhxHashTableReader::init().

{
    return isGroupBy;
}

string LhxHashTable::toString ( )

Print the content of the hash table.

Returns: the string representation of this hash table.

Definition at line 1000 of file LhxHashTable.cpp.

References currentBlockCount, firstSlot, lastSlot, maxBlockCount, numSlots, partitionLevel, and printSlot().

Referenced by LhxHashTableDump::dump().

{
    ostringstream hashTableTrace;

    hashTableTrace << "\n"
        << "[Hash Table : maximum # blocks = " << maxBlockCount     << "]\n"
        << "[             current # blocks = " << currentBlockCount << "]\n"
        << "[             # slots          = " << numSlots          << "]\n"
        << "[             partition level  = " << partitionLevel    << "]\n"
        << "[             first slot       = " << firstSlot         << "]\n"
        << "[             last  slot       = " << lastSlot          << "]\n";

    for (int i = 0; i < numSlots; i ++) {
        hashTableTrace << printSlot(i);
    }

    return hashTableTrace.str();
}

Member Data Documentation

uint LhxHashTable::numSlots [private]

Size of the hash table, i.e., number of slots.

Definition at line 561 of file LhxHashTable.h.

Referenced by addKeyData(), allocateResources(), calculateNumSlots(), findKeyLocation(), getNumSlots(), and toString().

std::vector<PBuffer> LhxHashTable::slotBlocks [private]

Array of page buffers which have been allocated as index buffers.

These contain arrays of pointers to tuple data stored in separate data buffers. Order is significant, since an index entry is decomposed into a page and a position on that page.

Definition at line 569 of file LhxHashTable.h.

Referenced by allocateResources(), and getSlot().

PBuffer* LhxHashTable::firstSlot [private]

Definition at line 571 of file LhxHashTable.h.

Referenced by addKeyData(), allocateResources(), getFirstSlot(), and toString().

PBuffer* LhxHashTable::lastSlot [private]

Definition at line 572 of file LhxHashTable.h.

Referenced by addKeyData(), allocateResources(), and toString().

SegmentAccessor LhxHashTable::scratchAccessor [private]

Scratch accessor for allocating large buffer pages.

Definition at line 577 of file LhxHashTable.h.

Referenced by init(), and releaseResources().

uint LhxHashTable::maxBlockCount [private]

Maximum number of blocks to use for building this hash table.

Definition at line 582 of file LhxHashTable.h.

Referenced by allocBlock(), init(), and toString().

PBuffer LhxHashTable::firstBlock [private]

Linked list of blocks to fit the hash entry array and hash value nodes in.

A new block is linked to the head of the list.

Definition at line 593 of file LhxHashTable.h.

Referenced by allocateResources(), and releaseResources().

PBuffer LhxHashTable::currentBlock [private]

Definition at line 595 of file LhxHashTable.h.

Referenced by allocateResources(), allocBuffer(), and releaseResources().

LhxHashBlockAccessor LhxHashTable::blockAccessor [private]

This block accessor can be associated with any block.

Definition at line 600 of file LhxHashTable.h.

Referenced by allocateResources(), allocBlock(), getSlot(), init(), and releaseResources().

LhxHashBlockAccessor LhxHashTable::nodeBlockAccessor [private]

This block accessor is associated with the first block that contains key or data nodes.

Definition at line 606 of file LhxHashTable.h.

Referenced by allocateResources(), allocBuffer(), init(), and releaseResources().

SegPageLock LhxHashTable::bufferLock [private]

Lock on scratch page.

Definition at line 611 of file LhxHashTable.h.

Referenced by allocBlock(), and init().

uint LhxHashTable::currentBlockCount [private]

Current number of scratch buffers in use.

Definition at line 616 of file LhxHashTable.h.

Referenced by allocBlock(), init(), releaseResources(), and toString().

bool LhxHashTable::filterNull [private]

Special hash table property: the hash table filters out null keys.

Definition at line 621 of file LhxHashTable.h.

Referenced by addTuple(), and init().

TupleProjection LhxHashTable::filterNullKeyProj [private]

Definition at line 622 of file LhxHashTable.h.

Referenced by addTuple(), and init().

bool LhxHashTable::removeDuplicate [private]

Special hash table property: the hash table should remove duplicates.

Definition at line 627 of file LhxHashTable.h.

Referenced by addTuple(), and init().

uint LhxHashTable::partitionLevel [private]

The hash generators used by this hash table: one for the current level; one for the sub-partition level (== partitionLevel + 1).

Definition at line 635 of file LhxHashTable.h.

Referenced by init(), and toString().

LhxHashGenerator LhxHashTable::hashGen [private]

Definition at line 636 of file LhxHashTable.h.

Referenced by addKeyData(), findKeyLocation(), and init().

LhxHashGenerator LhxHashTable::hashGenSub [private]

Definition at line 637 of file LhxHashTable.h.

Referenced by init().

TupleProjection LhxHashTable::keyColsAndAggsProj [private]

Fields in the inputTuple parameter to the addTuple() method that will hold keyCols, Aggs, and data columns.

inputTuple should have the same shape as hashInfo.inputDesc[1] used in the init() method.

Definition at line 644 of file LhxHashTable.h.

Referenced by aggData(), and init().

TupleProjection LhxHashTable::keyColsProj [private]

Definition at line 645 of file LhxHashTable.h.

Referenced by addKeyData(), addTuple(), aggData(), and init().

TupleProjection LhxHashTable::aggsProj [private]

Definition at line 646 of file LhxHashTable.h.

Referenced by addKeyData(), aggData(), and init().

TupleProjection LhxHashTable::dataProj [private]

Definition at line 647 of file LhxHashTable.h.

Referenced by addData(), addKeyData(), calculateSize(), and init().

vector<LhxHashTrim> LhxHashTable::isKeyColVarChar [private]

Definition at line 648 of file LhxHashTable.h.

Referenced by addKeyData(), findKeyLocation(), and init().

LhxHashKeyAccessor LhxHashTable::hashKeyAccessor [private]

Accessors for the content of this hash table.

Definition at line 653 of file LhxHashTable.h.

Referenced by addData(), addKeyData(), aggData(), findKeyLocation(), getNextSlot(), init(), printSlot(), and releaseResources().

LhxHashDataAccessor LhxHashTable::hashDataAccessor [private]

Definition at line 654 of file LhxHashTable.h.

Referenced by addData(), addKeyData(), init(), printSlot(), and releaseResources().

uint LhxHashTable::maxBufferSize [private]

The maximum number of bytes writable in a scratch page.

Definition at line 659 of file LhxHashTable.h.

Referenced by addData(), addKeyData(), aggData(), and init().

bool LhxHashTable::isGroupBy [private]

Marks if this hash table is built for Group-by.

A group-by hash table contains only keys (group-by keys plus aggregates) and has no data portion.

Definition at line 666 of file LhxHashTable.h.

Referenced by addKeyData(), addTuple(), init(), and isHashGroupBy().

bool LhxHashTable::hasAggregates [private]

For group-bys, marks if there are any aggregates.

Definition at line 671 of file LhxHashTable.h.

Referenced by addTuple(), and init().

AggComputerList* LhxHashTable::aggComputers [private]

Aggregate computers passed in from the agg exec stream.

Definition at line 676 of file LhxHashTable.h.

Referenced by addKeyData(), aggData(), and init().

TupleData LhxHashTable::aggWorkingTuple [private]

Definition at line 677 of file LhxHashTable.h.

Referenced by aggData(), and init().

TupleDataWithBuffer LhxHashTable::aggResultTuple [private]

Definition at line 679 of file LhxHashTable.h.

Referenced by addKeyData(), aggData(), and init().

TupleData LhxHashTable::tmpKeyTuple [private]

Definition at line 758 of file LhxHashTable.h.

Referenced by addKeyData().

TupleData LhxHashTable::tmpDataTuple [private]

Definition at line 759 of file LhxHashTable.h.

Referenced by addData(), and addKeyData().

const uint LhxHashTable::LhxHashTableMinPages = 2 [static]

Definition at line 767 of file LhxHashTable.h.

Referenced by LhxJoinExecStream::getResourceRequirements(), and LhxAggExecStream::getResourceRequirements().

The documentation for this class was generated from the following files:

/home/pub/open/dev/fennel/hashexe/LhxHashTable.h
/home/pub/open/dev/fennel/hashexe/LhxHashTable.cpp

Generated on Mon Jun 22 04:00:38 2009 for Fennel by doxygen 1.5.1

