UnalignedAttributeAccessor Class Reference

UnalignedAttributeAccessor is similar to AttributeAccessor, except that it provides a by-value access model intended for storing individual data values with maximum compression (hence unaligned), as opposed to the tuple-valued by-reference model of AttributeAccessor.

#include <UnalignedAttributeAccessor.h>

Public Member Functions

 UnalignedAttributeAccessor ()
 UnalignedAttributeAccessor (TupleAttributeDescriptor const &attrDescriptor)
 Creates an accessor for the given attribute descriptor.
void compute (TupleAttributeDescriptor const &attrDescriptor)
 Precomputes access for a descriptor.
void storeValue (TupleDatum const &datum, PBuffer pDataWithLen) const
 Stores a value by itself, including length information, encoding it into the buffer passed in.
void loadValue (TupleDatum &datum, PConstBuffer pDataWithLen) const
 Loads a value from a buffer containing data encoded via storeValue.
TupleStorageByteLength getStoredByteCount (PConstBuffer pDataWithLen) const
 Gets the length information corresponding to the data stored in a buffer.
TupleStorageByteLength getMaxByteCount () const
 Gets the maximum number of bytes required to store any value of the given attribute.

Private Member Functions

void compressInt64 (TupleDatum const &datum, PBuffer pDest) const
 Compresses and stores an 8-byte integer by stripping off leading zeros.
void uncompressInt64 (TupleDatum &datum, PConstBuffer pDataWithLen) const
 Uncompresses and loads an 8-byte integer, expanding it back to its original 8-byte value.
bool isInitialized () const

Private Attributes

uint cbStorage
bool omitLengthIndicator
bool isCompressedInt64

Static Private Attributes

static const TupleStorageByteLength ONE_BYTE_MAX_LENGTH = 127
static const TupleStorageByteLength TWO_BYTE_MAX_LENGTH = 32767
static const uint8_t ONE_BYTE_LENGTH_MASK = 0x7f
static const uint16_t TWO_BYTE_LENGTH_MASK1 = 0x7f00
static const uint16_t TWO_BYTE_LENGTH_MASK2 = 0x00ff
static const uint8_t TWO_BYTE_LENGTH_BIT = 0x80


Detailed Description

UnalignedAttributeAccessor is similar to AttributeAccessor, except that it provides a by-value access model intended for storing individual data values with maximum compression (hence unaligned), as opposed to the tuple-valued by-reference model of AttributeAccessor.

Note:
Two methods, storeValue and loadValue, store and load a TupleDatum to and from a preallocated buffer. The storage format differs from the marshalled format for a tuple (see TupleAccessor): because only one TupleDatum is involved, there is no need to store the offset that gives a tuple "constant seek time". The storage format depends on the type of the data stored; variable-width values are prefixed with leading bytes containing the length of the data.

If the data is an 8-byte integer (other than null), the leading zero bytes (or, for negative values, the redundant leading sign-extension bytes) are stripped, and the length of the remaining bytes is stored in the first byte, followed by the data.

If the data is fixed-width and non-nullable, only the data itself is stored. We do not need to store the length of the data in this case because it is fixed and can be determined from the type descriptor corresponding to the data.

In all other cases, a length is encoded in the leading bytes of the buffer, based on the number of bytes in the data. The byte format of the buffer after storeValue is:

One length byte encodes value lengths from 0 (0x0000) to 127 (0x007f):

  0xxxxxxx
  -------- -------- -------- -------- -------- ...
  |length  | data value bytes

Two length bytes encode value lengths from 128 (0x0080) to 32767 (0x7fff):

  1xxxxxxx xxxxxxxx
  -------- -------- -------- -------- -------- ...
  |     length      | data value bytes

where length (1 or 2 bytes) comes from TupleDatum.cbData (a 4-byte type) and data value bytes are copied from TupleDatum.pData. When storing NULL values, the one-byte length value of 0x00 is used; empty strings are special-cased as the two-byte length value of 0x8000 (because NULL values are much more common than empty strings).
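
The length-prefix rules described above can be sketched as a small stand-alone encoder and decoder. This is an illustration only, not the Fennel implementation: encodeWithLength, decodeLength and the Decoded struct are hypothetical names, and a null value is modelled here as a null data pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Fennel API): encodes one value with its length
// prefix. A null value is represented by data == nullptr.
std::vector<uint8_t> encodeWithLength(const uint8_t *data, uint16_t len)
{
    assert(len <= 0x7fff);  // the format has only 15 bits for the length
    std::vector<uint8_t> out;
    if (!data) {
        out.push_back(0x00);  // NULL: a single zero length byte
        return out;
    }
    if (len >= 1 && len <= 127) {
        out.push_back(static_cast<uint8_t>(len));  // 0xxxxxxx
    } else {
        // Lengths 128..32767, plus the empty value (length 0 -> 0x80 0x00).
        out.push_back(static_cast<uint8_t>(0x80 | (len >> 8)));  // 1xxxxxxx
        out.push_back(static_cast<uint8_t>(len & 0xff));
    }
    out.insert(out.end(), data, data + len);
    return out;
}

// Hypothetical result type for decoding just the length header.
struct Decoded {
    bool isNull;
    uint16_t len;           // number of data value bytes
    std::size_t headerLen;  // 1 or 2 length bytes
};

Decoded decodeLength(const uint8_t *buf)
{
    if (buf[0] == 0x00) {
        return {true, 0, 1};
    }
    if (buf[0] & 0x80) {
        uint16_t len = static_cast<uint16_t>(((buf[0] & 0x7f) << 8) | buf[1]);
        return {false, len, 2};
    }
    return {false, buf[0], 1};
}
```

Because the one-byte value 0x00 is reserved for NULL, the decoder can tell a null value apart from an empty one by looking at the first byte alone.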

TODO jvs 22-Oct-2006: unify this up at the TupleAccessor level as a new TUPLE_FORMAT_UNALIGNED.

Definition at line 81 of file UnalignedAttributeAccessor.h.


Constructor & Destructor Documentation

UnalignedAttributeAccessor::UnalignedAttributeAccessor (  )  [explicit]

Definition at line 29 of file UnalignedAttributeAccessor.cpp.

References cbStorage, and MAXU.

00030 {
00031     cbStorage = MAXU;
00032 }

UnalignedAttributeAccessor::UnalignedAttributeAccessor ( TupleAttributeDescriptor const &  attrDescriptor  )  [explicit]

Creates an accessor for the given attribute descriptor.

Parameters:
attrDescriptor descriptor for values which will be accessed

Definition at line 34 of file UnalignedAttributeAccessor.cpp.

References compute().

00036 {
00037     compute(attrDescriptor);
00038 }


Member Function Documentation

void UnalignedAttributeAccessor::compressInt64 ( TupleDatum const &  datum,
PBuffer  pDest 
) const [inline, private]

Compresses and stores an 8-byte integer by stripping off leading zeros.

The stored value includes a leading byte indicating the length of the data.

Parameters:
[in] datum datum to compress
[in,out] pDest pointer to the buffer where the data will be stored

Definition at line 62 of file UnalignedAttributeAccessor.cpp.

References TupleDatum::cbData, FixedBuffer, and TupleDatum::pData.

Referenced by storeValue().

00065 {
00066     // NOTE jvs 22-Oct-2006:  Although it may not be obvious,
00067     // this correctly handles both STANDARD_TYPE_INT_64
00068     // and STANDARD_TYPE_UINT_64 (very large unsigned values
00069     // are handled as if they were negative here, but
00070     // the consumer of the TupleDatum won't be aware of that,
00071     // and the sign-extension in uncompress will be a no-op).
00072 
00073     assert(datum.cbData == 8);
00074     int64_t intVal = *reinterpret_cast<int64_t const *> (datum.pData);
00075     uint len;
00076 
00077     if (intVal >= 0) {
00078         FixedBuffer tmpBuf[8];
00079         PBuffer pTmpBuf = tmpBuf + 8;
00080         len = 0;
00081         do {
00082             *(--pTmpBuf) = intVal & 0xff;
00083             len++;
00084             intVal >>= 8;
00085         } while (intVal);
00086 
00087         // if the high bit is set, add an extra zero byte to distinguish this
00088         // value from a negative one
00089         if (*pTmpBuf & 0x80) {
00090             *(--pTmpBuf) = 0;
00091             len++;
00092         }
00093         *pDest = static_cast<uint8_t>(len);
00094         memcpy(pDest + 1, pTmpBuf, len);
00095     } else {
00096         // negative case -- calculate the number of bytes based on the value
00097         if (intVal >= -(0x80)) {
00098             len = 1;
00099         } else if (intVal >= -(0x8000)) {
00100             len = 2;
00101         } else if (intVal >= -(0x800000)) {
00102             len = 3;
00103         } else if (intVal >= -(0x80000000LL)) {
00104             len = 4;
00105         } else if (intVal >= -(0x8000000000LL)) {
00106             len = 5;
00107         } else if (intVal >= -(0x800000000000LL)) {
00108             len = 6;
00109         } else if (intVal >= -(0x80000000000000LL)) {
00110             len = 7;
00111         } else {
00112             len = 8;
00113         }
00114         *pDest = static_cast<uint8_t>(len);
00115         PBuffer pTmpBuf = pDest + 1 + len;
00116         while (len--) {
00117             *(--pTmpBuf) = intVal & 0xff;
00118             intVal >>= 8;
00119         }
00120     }
00121 }

void UnalignedAttributeAccessor::uncompressInt64 ( TupleDatum &  datum,
PConstBuffer  pDataWithLen 
) const [inline, private]

Uncompresses and loads an 8-byte integer, expanding it back to its original 8-byte value.

Parameters:
[in,out] datum datum to receive decompression result
[in] pDataWithLen data buffer to load from

Definition at line 123 of file UnalignedAttributeAccessor.cpp.

References TupleDatum::cbData, and TupleDatum::pData.

Referenced by loadValue().

00126 {
00127     uint len = *pDataWithLen;
00128     assert(len != 0);
00129     PConstBuffer pSrcBuf = pDataWithLen + 1;
00130     uint signByte = *(pSrcBuf++);
00131     // sign extend the high order byte if it's a negative number
00132     int64_t intVal =
00133         int64_t(signByte) | ((signByte & 0x80) ? 0xffffffffffffff00LL : 0);
00134     while (--len > 0) {
00135         intVal <<= 8;
00136         intVal |= *(pSrcBuf++);
00137     }
00138     datum.cbData = 8;
00139 
00140     // REVIEW jvs 25-Oct-2006:  Do we really need memcpy here?  I
00141     // think datum.pData is guaranteed to be aligned.
00142     memcpy(const_cast<PBuffer>(datum.pData), &intVal, 8);
00143 }
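
The compression scheme used by compressInt64 and uncompressInt64 amounts to storing the shortest big-endian two's-complement form of the value behind a one-byte length. The following stand-alone sketch shows a round trip under that reading. packInt64 and unpackInt64 are hypothetical names, and the code is written for clarity rather than to mirror the listings above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: writes [len][len big-endian bytes] to dest and
// returns the total number of bytes written (2..9). dest needs 9 bytes.
std::size_t packInt64(int64_t v, uint8_t *dest)
{
    // Find the smallest byte count whose two's-complement range holds v.
    unsigned len = 8;
    for (unsigned n = 1; n < 8; ++n) {
        int64_t lo = -(int64_t(1) << (8 * n - 1));
        int64_t hi = (int64_t(1) << (8 * n - 1)) - 1;
        if (v >= lo && v <= hi) {
            len = n;
            break;
        }
    }
    dest[0] = static_cast<uint8_t>(len);
    uint64_t bits = static_cast<uint64_t>(v);
    for (unsigned i = 0; i < len; ++i) {
        // Least significant byte goes last (big-endian order).
        dest[len - i] = static_cast<uint8_t>(bits >> (8 * i));
    }
    return len + 1;
}

// Hypothetical helper: reverses packInt64.
int64_t unpackInt64(const uint8_t *src)
{
    unsigned len = src[0];
    // Start from all ones when the first data byte has its sign bit set,
    // so that the missing high-order bytes are sign-extended.
    uint64_t bits = (src[1] & 0x80) ? ~uint64_t(0) : 0;
    for (unsigned i = 1; i <= len; ++i) {
        bits = (bits << 8) | src[i];
    }
    // Relies on the usual two's-complement conversion.
    return static_cast<int64_t>(bits);
}
```

For example, 128 needs two data bytes (0x00 0x80) because a lone 0x80 would read back as -128, which is the case the extra zero byte in compressInt64 guards against.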

bool UnalignedAttributeAccessor::isInitialized (  )  const [private]

Definition at line 57 of file UnalignedAttributeAccessor.cpp.

References cbStorage, and isMAXU().

Referenced by getMaxByteCount(), getStoredByteCount(), loadValue(), and storeValue().

00058 {
00059     return !isMAXU(cbStorage);
00060 }

void UnalignedAttributeAccessor::compute ( TupleAttributeDescriptor const &  attrDescriptor  ) 

Precomputes access for a descriptor.

Must be called before any other method; the non-default constructor calls it automatically.

Parameters:
attrDescriptor descriptor for values which will be accessed

Definition at line 40 of file UnalignedAttributeAccessor.cpp.

References TupleAttributeDescriptor::cbStorage, cbStorage, StoredTypeDescriptor::getOrdinal(), isCompressedInt64, TupleAttributeDescriptor::isNullable, omitLengthIndicator, TupleAttributeDescriptor::pTypeDescriptor, STANDARD_TYPE_INT_64, STANDARD_TYPE_UINT_64, STANDARD_TYPE_UNICODE_VARCHAR, STANDARD_TYPE_VARBINARY, and STANDARD_TYPE_VARCHAR.

Referenced by LcsHash::init(), LcsRowScanExecStream::prepareResidualFilters(), and UnalignedAttributeAccessor().

00042 {
00043     cbStorage = attrDescriptor.cbStorage;
00044     StoredTypeDescriptor::Ordinal typeOrdinal =
00045         attrDescriptor.pTypeDescriptor->getOrdinal();
00046     isCompressedInt64 =
00047         (typeOrdinal == STANDARD_TYPE_INT_64) ||
00048         (typeOrdinal == STANDARD_TYPE_UINT_64);
00049     omitLengthIndicator =
00050         !attrDescriptor.isNullable
00051         && !isCompressedInt64
00052         && (typeOrdinal != STANDARD_TYPE_VARCHAR)
00053         && (typeOrdinal != STANDARD_TYPE_VARBINARY)
00054         && (typeOrdinal != STANDARD_TYPE_UNICODE_VARCHAR);
00055 }
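
The decision made by compute() can be restated as a small pure function. In this sketch the TypeKind enum is a hypothetical stand-in for Fennel's type ordinals, and Layout and chooseLayout are illustrative names, not part of the class.

```cpp
// Hypothetical stand-ins for the STANDARD_TYPE_* ordinals; only the
// distinction between the categories matters here.
enum class TypeKind {
    Int32, Int64, UInt64, Char, Varchar, Varbinary, UnicodeVarchar
};

struct Layout {
    bool compressedInt64;      // value is stored via the int64 compression
    bool omitLengthIndicator;  // value is stored with no length prefix
};

Layout chooseLayout(TypeKind kind, bool isNullable)
{
    Layout layout;
    layout.compressedInt64 =
        (kind == TypeKind::Int64) || (kind == TypeKind::UInt64);
    // Only fixed-width, non-nullable, non-int64 values can drop the prefix.
    layout.omitLengthIndicator =
        !isNullable
        && !layout.compressedInt64
        && kind != TypeKind::Varchar
        && kind != TypeKind::Varbinary
        && kind != TypeKind::UnicodeVarchar;
    return layout;
}
```

A nullable fixed-width column still needs the prefix, since the 0x00 length byte is the only way a null value is represented in this format.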

void UnalignedAttributeAccessor::storeValue ( TupleDatum const &  datum,
PBuffer  pDataWithLen 
) const

Stores a value by itself, including length information, encoding it into the buffer passed in.

The caller must allocate a buffer of sufficient size; use getMaxByteCount() to determine how large it needs to be.
Parameters:
[in] datum value to be stored
[in,out] pDataWithLen data buffer to store to

Definition at line 145 of file UnalignedAttributeAccessor.cpp.

References TupleDatum::cbData, compressInt64(), isCompressedInt64, isInitialized(), omitLengthIndicator, ONE_BYTE_MAX_LENGTH, TupleDatum::pData, TWO_BYTE_LENGTH_BIT, TWO_BYTE_LENGTH_MASK1, TWO_BYTE_LENGTH_MASK2, and TWO_BYTE_MAX_LENGTH.

Referenced by LcsHash::insert(), and LcsHash::undoInsert().

00148 {
00149     assert(isInitialized());
00150 
00151     PBuffer tmpDataPtr = pDataWithLen;
00152 
00153     if (!datum.pData) {
00154         /*
00155          * NULL is stored as a special one-byte length: 0x00
00156          */
00157         *tmpDataPtr = 0x00;
00158     } else {
00159         /*
00160          * Note:
00161          * This storage format can only encode values of at most 0x7fff bytes.
00162          */
00163         assert(datum.cbData <= TWO_BYTE_MAX_LENGTH);
00164 
00165         if (isCompressedInt64) {
00166             // strip off leading zeros from 8-byte ints
00167             compressInt64(datum, tmpDataPtr);
00168         } else {
00169             // for varying-length data and data that is nullable, store
00170             // a length byte in either 1 or 2 bytes, depending on the length
00171             if (!omitLengthIndicator) {
00172                 if (datum.cbData && (datum.cbData <= ONE_BYTE_MAX_LENGTH)) {
00173                     *tmpDataPtr = static_cast<uint8_t>(datum.cbData);
00174                     tmpDataPtr++;
00175                 } else {
00176                     uint8_t higherByte =
00177                         (datum.cbData & TWO_BYTE_LENGTH_MASK1) >> 8 |
00178                             TWO_BYTE_LENGTH_BIT;
00179                     uint8_t lowerByte  = datum.cbData & TWO_BYTE_LENGTH_MASK2;
00180                     *tmpDataPtr = higherByte;
00181                     tmpDataPtr++;
00182                     *tmpDataPtr = lowerByte;
00183                     tmpDataPtr++;
00184                 }
00185             }
00186 
00187             // store the value
00188             memcpy(tmpDataPtr, datum.pData, datum.cbData);
00189         }
00190     }
00191 }

void UnalignedAttributeAccessor::loadValue ( TupleDatum &  datum,
PConstBuffer  pDataWithLen 
) const

Loads a value from a buffer containing data encoded via storeValue.

Note:
See note on memCopyFrom method regarding why and how to preallocate the buffer.
Parameters:
[in,out] datum datum to receive loaded value; datum.pData must point to a preallocated buffer
[in] pDataWithLen data buffer to load from

Definition at line 193 of file UnalignedAttributeAccessor.cpp.

References TupleDatum::cbData, cbStorage, isCompressedInt64, isInitialized(), omitLengthIndicator, ONE_BYTE_LENGTH_MASK, TupleDatum::pData, TWO_BYTE_LENGTH_BIT, and uncompressInt64().

Referenced by LcsColumnReader::findVal(), LcsCompareColKeyUsingOffsetIndex::lessThan(), and LcsHash::search().

00196 {
00197     assert(isInitialized());
00198     assert(datum.pData);
00199 
00200     // fixed width, non-nullable data is stored without leading length byte(s)
00201     if (omitLengthIndicator) {
00202         datum.cbData = cbStorage;
00203         memcpy(const_cast<PBuffer>(datum.pData), pDataWithLen, datum.cbData);
00204     } else {
00205         uint8_t firstByte = *pDataWithLen;
00206         if (!firstByte) {
00207             // null value
00208             datum.pData = NULL;
00209         } else if (firstByte & TWO_BYTE_LENGTH_BIT) {
00210             // not null, so must have a length that requires 2 bytes to
00211             // store
00212             datum.cbData =
00213                 ((firstByte & ONE_BYTE_LENGTH_MASK) << 8)
00214                 | *(pDataWithLen + 1);
00215             memcpy(
00216                 const_cast<PBuffer>(datum.pData),
00217                 pDataWithLen + 2,
00218                 datum.cbData);
00219         } else {
00220             if (isCompressedInt64) {
00221                 // 8-byte integers are stored with leading zeros stripped off
00222                 uncompressInt64(datum, pDataWithLen);
00223             } else {
00224                 // data that requires 1 byte to store the length
00225                 datum.cbData = firstByte;
00226                 memcpy(
00227                     const_cast<PBuffer>(datum.pData),
00228                     pDataWithLen + 1,
00229                     datum.cbData);
00230             }
00231         }
00232     }
00233 }

TupleStorageByteLength UnalignedAttributeAccessor::getStoredByteCount ( PConstBuffer  pDataWithLen  )  const

Gets the length information corresponding to the data stored in a buffer.

Parameters:
[in] pDataWithLen the data buffer to get the length from
Returns:
length of the value in stored format including any length indicator overhead

Definition at line 235 of file UnalignedAttributeAccessor.cpp.

References cbStorage, isInitialized(), omitLengthIndicator, ONE_BYTE_LENGTH_MASK, and TWO_BYTE_LENGTH_BIT.

Referenced by LcsHash::computeKey(), LcsClusterDump::fprintVal(), LcsHash::insert(), and LcsHash::restore().

00237 {
00238     assert(isInitialized());
00239     assert(pDataWithLen);
00240 
00241     if (omitLengthIndicator) {
00242         return cbStorage;
00243     }
00244 
00245     if (*pDataWithLen & TWO_BYTE_LENGTH_BIT) {
00246         return
00247             (((*pDataWithLen & ONE_BYTE_LENGTH_MASK) << 8)
00248                 | *(pDataWithLen + 1))
00249             + 2;
00250     } else {
00251         return (*pDataWithLen + 1);
00252     }
00253 }
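
Because the stored length includes the length header, it can be used to step from one stored value to the next. The sketch below assumes, purely for illustration, a buffer holding several length-prefixed values back to back. storedByteCount mirrors the length-prefixed branch of the listing above, and countPackedValues is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>

// Total stored size (length header plus data) of one length-prefixed value.
std::size_t storedByteCount(const uint8_t *p)
{
    if (p[0] & 0x80) {
        // Two length bytes: 15-bit length, plus the 2 header bytes.
        return ((static_cast<std::size_t>(p[0] & 0x7f) << 8) | p[1]) + 2;
    }
    // One length byte; a NULL value (0x00) occupies exactly 1 byte.
    return static_cast<std::size_t>(p[0]) + 1;
}

// Hypothetical helper: counts the values packed back to back in
// [buf, buf + size). Assumes the buffer is well formed.
std::size_t countPackedValues(const uint8_t *buf, std::size_t size)
{
    std::size_t count = 0;
    std::size_t pos = 0;
    while (pos < size) {
        pos += storedByteCount(buf + pos);
        ++count;
    }
    return count;
}
```

This only applies to values stored with a length indicator; when the indicator is omitted, every value occupies the fixed storage size instead.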

TupleStorageByteLength UnalignedAttributeAccessor::getMaxByteCount (  )  const

Gets the maximum number of bytes required to store any value of the given attribute.

Returns:
maximum storage length required for this attribute

Definition at line 255 of file UnalignedAttributeAccessor.cpp.

References cbStorage, isInitialized(), and omitLengthIndicator.

Referenced by LcsHash::init().

00256 {
00257     assert(isInitialized());
00258 
00259     if (omitLengthIndicator) {
00260         return cbStorage;
00261     } else {
00262         return cbStorage + 2;
00263     }
00264 }


Member Data Documentation

const TupleStorageByteLength UnalignedAttributeAccessor::ONE_BYTE_MAX_LENGTH = 127 [static, private]

Definition at line 83 of file UnalignedAttributeAccessor.h.

Referenced by storeValue().

const TupleStorageByteLength UnalignedAttributeAccessor::TWO_BYTE_MAX_LENGTH = 32767 [static, private]

Definition at line 84 of file UnalignedAttributeAccessor.h.

Referenced by storeValue().

const uint8_t UnalignedAttributeAccessor::ONE_BYTE_LENGTH_MASK = 0x7f [static, private]

Definition at line 85 of file UnalignedAttributeAccessor.h.

Referenced by getStoredByteCount(), and loadValue().

const uint16_t UnalignedAttributeAccessor::TWO_BYTE_LENGTH_MASK1 = 0x7f00 [static, private]

Definition at line 86 of file UnalignedAttributeAccessor.h.

Referenced by storeValue().

const uint16_t UnalignedAttributeAccessor::TWO_BYTE_LENGTH_MASK2 = 0x00ff [static, private]

Definition at line 87 of file UnalignedAttributeAccessor.h.

Referenced by storeValue().

const uint8_t UnalignedAttributeAccessor::TWO_BYTE_LENGTH_BIT = 0x80 [static, private]

Definition at line 88 of file UnalignedAttributeAccessor.h.

Referenced by getStoredByteCount(), loadValue(), and storeValue().

uint UnalignedAttributeAccessor::cbStorage [private]

Definition at line 90 of file UnalignedAttributeAccessor.h.

Referenced by compute(), getMaxByteCount(), getStoredByteCount(), isInitialized(), loadValue(), and UnalignedAttributeAccessor().

bool UnalignedAttributeAccessor::omitLengthIndicator [private]

Definition at line 92 of file UnalignedAttributeAccessor.h.

Referenced by compute(), getMaxByteCount(), getStoredByteCount(), loadValue(), and storeValue().

bool UnalignedAttributeAccessor::isCompressedInt64 [private]

Definition at line 94 of file UnalignedAttributeAccessor.h.

Referenced by compute(), loadValue(), and storeValue().


The documentation for this class was generated from the following files:
UnalignedAttributeAccessor.h
UnalignedAttributeAccessor.cpp
Generated on Mon Jun 22 04:00:48 2009 for Fennel by  doxygen 1.5.1