Falcon::TokenizerParams Class Reference

Parameters for the tokenizer.

#include <tokenizer.h>

[Figure: inheritance diagram for Falcon::TokenizerParams]


Public Member Functions

TokenizerParams & bindSep (bool mode=true)
 Adds the separators to the token that precedes them.
TokenizerParams & groupSep (bool mode=true)
 Makes the Tokenizer return only once for a sequence of identical separators.
bool isBindSep () const
bool isGroupSep () const
bool isReturnSep () const
bool isTrim () const
bool isWsToken () const
int32 maxToken () const
TokenizerParams & maxToken (int32 size)
 Sets the maximum size of the returned tokens.
TokenizerParams & returnSep (bool mode=true)
 Returns found separators separately.
 TokenizerParams ()
TokenizerParams & trim (bool mode=true)
 Whitespace is trimmed from the returned tokens.
TokenizerParams & wsIsToken (bool mode=true)
 Treats a sequence of whitespace of any length as a single token.


Detailed Description

Parameters for the tokenizer.

This class is used to initialize the Tokenizer class through the variable parameter idiom. Pass an instance of this class directly to configure the target Tokenizer.

The setter methods in this class return a reference to the instance itself, so that several behaviors and settings can be set in cascade.
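
The cascading style described above can be illustrated with a minimal, self-contained sketch. ParamsSketch and makeParams are hypothetical stand-ins written for this example and are not the Falcon class; the default values in the constructor are assumptions, since the defaults are not stated here.

```cpp
#include <cstdint>

// Hypothetical stand-in that mirrors a few of the documented setters.
// It is NOT Falcon::TokenizerParams; default values are assumed.
class ParamsSketch {
public:
    ParamsSketch() : m_groupSep(false), m_trim(false), m_maxToken(0) {}

    // Each setter returns *this, which is what makes chaining possible.
    ParamsSketch& groupSep(bool mode = true) { m_groupSep = mode; return *this; }
    ParamsSketch& trim(bool mode = true) { m_trim = mode; return *this; }
    ParamsSketch& maxToken(std::int32_t size) { m_maxToken = size; return *this; }

    bool isGroupSep() const { return m_groupSep; }
    bool isTrim() const { return m_trim; }
    std::int32_t maxToken() const { return m_maxToken; }

private:
    bool m_groupSep;
    bool m_trim;
    std::int32_t m_maxToken;
};

// Builds a fully configured instance in a single expression.
inline ParamsSketch makeParams() {
    return ParamsSketch().groupSep().trim().maxToken(64);
}
```

Because every setter returns a reference, the temporary can be configured and passed on without naming an intermediate variable.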


Constructor & Destructor Documentation

Falcon::TokenizerParams::TokenizerParams (  )  [inline]


Member Function Documentation

TokenizerParams& Falcon::TokenizerParams::bindSep ( bool  mode = true  )  [inline]

Adds the separators to the token that precedes them.

This appends the separators to the token preceding them when the token is returned. If grouping is activated, more than one separator may be returned.
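
As a rough illustration of binding without grouping, the sketch below keeps each separator attached to the token before it. splitBindSketch is a hypothetical helper, not Falcon code, and its handling of trailing input is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Falcon: splits `text` on `sep` and
// leaves each separator attached to the token that precedes it.
std::vector<std::string> splitBindSketch(const std::string& text, char sep) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : text) {
        current.push_back(c);
        if (c == sep) {
            // The separator has already been appended, so emit the token now.
            tokens.push_back(current);
            current.clear();
        }
    }
    // Assumption: trailing text with no separator is emitted as a last token.
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```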

TokenizerParams& Falcon::TokenizerParams::groupSep ( bool  mode = true  )  [inline]

Makes the Tokenizer return only once for a sequence of identical separators.

For example, if the separator list includes a space, then only one token will be returned no matter how many consecutive spaces are encountered. If this option is not set, an empty string is returned as a token whenever two separators are found one after another.
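
The difference between the two modes can be sketched as follows. splitGroupSketch is a hypothetical helper written for this example, not the Falcon implementation, and its treatment of leading and trailing separators is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Falcon: splits `text` on `sep`.
// When `group` is true, a run of consecutive separators counts as one.
std::vector<std::string> splitGroupSketch(const std::string& text, char sep, bool group) {
    std::vector<std::string> tokens;
    std::string current;
    bool lastWasSep = false;
    for (char c : text) {
        if (c == sep) {
            // Without grouping, every separator closes a (possibly empty) token.
            if (!(group && lastWasSep)) {
                tokens.push_back(current);
                current.clear();
            }
            lastWasSep = true;
        } else {
            current.push_back(c);
            lastWasSep = false;
        }
    }
    tokens.push_back(current);
    return tokens;
}
```

With the input "a  b" (two spaces) and a space separator, the ungrouped call yields "a", "", "b", while the grouped call yields "a", "b".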

bool Falcon::TokenizerParams::isBindSep (  )  const [inline]

bool Falcon::TokenizerParams::isGroupSep (  )  const [inline]

bool Falcon::TokenizerParams::isReturnSep (  )  const [inline]

bool Falcon::TokenizerParams::isTrim (  )  const [inline]

bool Falcon::TokenizerParams::isWsToken (  )  const [inline]

int32 Falcon::TokenizerParams::maxToken (  )  const [inline]

TokenizerParams& Falcon::TokenizerParams::maxToken ( int32  size  )  [inline]

Sets the maximum size of the returned tokens.

If the size of the input data exceeds this size while searching for a token, an item is returned as if a separator had been found.
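
One way to picture this limit is the sketch below, which cuts a token as soon as it would grow past the maximum. splitMaxSketch is a hypothetical helper, not Falcon code; the exact cut-off point and the use of a character count for the size are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Falcon: splits `text` on `sep`, and also
// cuts a token when it would grow beyond `maxLen` characters.
// Assumption: maxLen is greater than zero.
std::vector<std::string> splitMaxSketch(const std::string& text, char sep, std::size_t maxLen) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : text) {
        if (c == sep) {
            tokens.push_back(current);
            current.clear();
        } else {
            if (current.size() == maxLen) {
                // Token is full: emit it as if a separator had been found.
                tokens.push_back(current);
                current.clear();
            }
            current.push_back(c);
        }
    }
    tokens.push_back(current);
    return tokens;
}
```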

TokenizerParams& Falcon::TokenizerParams::returnSep ( bool  mode = true  )  [inline]

Returns found separators separately.

This forces the tokenizer to return each separator in a separate call. For example, if "," is a separator:

         "a, b, c"
would be returned as "a" - "," - " b" - "," - " c".
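
The sequence above can be reproduced with a small sketch. splitReturnSepSketch is a hypothetical helper, not the Falcon implementation; it collects all items into a vector instead of returning them one call at a time.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Falcon: splits `text` on `sep` and
// emits each separator as an item of its own between the tokens.
std::vector<std::string> splitReturnSepSketch(const std::string& text, char sep) {
    std::vector<std::string> items;
    std::string current;
    for (char c : text) {
        if (c == sep) {
            items.push_back(current);              // token before the separator
            items.push_back(std::string(1, sep));  // the separator itself
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    items.push_back(current);
    return items;
}
```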

TokenizerParams& Falcon::TokenizerParams::trim ( bool  mode = true  )  [inline]

Whitespace is trimmed from the returned tokens.

Whitespace characters are tab, space, carriage return and line feed. If this option is activated, the returned tokens won't include whitespace found at the beginning or at the end of the token. For example, if the separator is ':', and trim is enabled, the following sequence:

         : a: b : :c
will be parsed as the sequence of tokens "a", "b", "", "c"; otherwise, it would be parsed as " a", " b ", " ", "c".
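
The trimming step on a single token can be sketched as below. trimSketch is a hypothetical helper, not Falcon code; it only strips the four whitespace characters listed above and does not perform the splitting itself.

```cpp
#include <string>

// Hypothetical helper, not part of Falcon: removes leading and trailing
// tab, space, carriage return and line feed characters from one token.
std::string trimSketch(const std::string& token) {
    const char* ws = " \t\r\n";
    const std::string::size_type first = token.find_first_not_of(ws);
    if (first == std::string::npos) {
        // The token is empty or consists only of whitespace.
        return std::string();
    }
    const std::string::size_type last = token.find_last_not_of(ws);
    return token.substr(first, last - first + 1);
}
```

Applied to the untrimmed tokens of the example, " a" becomes "a", " b " becomes "b", and " " becomes the empty string.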

TokenizerParams& Falcon::TokenizerParams::wsIsToken ( bool  mode = true  )  [inline]

Treats a sequence of whitespace of any length as a single token.

This separates words between spaces and other tokens. For example, a text analyzer may use this mode to get words and punctuation with a single "next" call.


The documentation for this class was generated from the following file:

Generated on Mon Oct 19 10:11:48 2009 for Falcon_Core by  doxygen 1.5.8