LlamaVocabMBS class
| Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets |
|---|---|---|---|---|---|---|---|---|
| class | Llama | MBS Tools Plugin | 25.5 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All |
This is an abstract class. You can't create an instance directly; instead, obtain one from the plugin, for example through the LlamaModelMBS.Vocab property.
- 8 properties
- 10 methods
  - Constructor (private)
  - Destructor
  - Detokenize(tokens() as Int32, removeSpecial as Boolean, unparseSpecial as Boolean) as String
  - isControl(Token as Int32) as Boolean
  - isEOG(Token as Int32) as Boolean
  - Text(Token as Int32) as String
  - TokenAttributes(Token as Int32) as Integer
  - Tokenize(text as String, AddSpecial as Boolean = true, ParseSpecial as Boolean = true) as Int32()
  - TokenScore(Token as Int32) as Single
  - TokenToPiece(Token as Int32, special as Boolean = true) as String
- 19 constants
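The following is a minimal Xojo sketch of a tokenize/detokenize round trip. It assumes `vocab` was obtained from a loaded model through `LlamaModelMBS.Vocab`. It also assumes the Boolean parameters behave like their llama.cpp counterparts (`add_special`, `parse_special`, `remove_special`, `unparse_special`). `ShowTokens` and `sourceText` are hypothetical names used only for illustration.

```xojo
// Hypothetical helper: log each token of sourceText, then rebuild the text.
// Assumes vocab came from LlamaModelMBS.Vocab on an already loaded model.
Sub ShowTokens(vocab As LlamaVocabMBS, sourceText As String)
  // AddSpecial = True: let the model add BOS/EOS if it is configured to (assumed llama.cpp semantics).
  // ParseSpecial = True: treat special-token text in the input as real tokens.
  Var tokens() As Int32 = vocab.Tokenize(sourceText, True, True)

  For Each t As Int32 In tokens
    // TokenToPiece returns the text fragment for one token
    Var entry As String = Str(t) + " -> """ + vocab.TokenToPiece(t, True) + """"
    If vocab.isControl(t) Then entry = entry + " (control)"
    System.DebugLog(entry)
  Next

  // removeSpecial = True: drop BOS/EOS again (assumed llama.cpp semantics).
  // unparseSpecial = False: keep special tokens out of the resulting text.
  Var roundTrip As String = vocab.Detokenize(tokens, True, False)
  System.DebugLog("Round trip: " + roundTrip)
End Sub
```

The round-tripped text does not always match the input byte for byte. Whitespace handling and special tokens depend on the model's tokenizer.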
Constants
| Constant | Value | Description |
|---|---|---|
| TokenNull | -1 | The value used for a null token. |
Token Attributes
| Constant | Value | Description |
|---|---|---|
| TokenAttrByte | 32 | Byte fallback token (e.g. used for bytes not covered by BPE merges). |
| TokenAttrControl | 8 | Control token (e.g. special tokens, separators, directives); such tokens don't map directly to output text. |
| TokenAttrLstrip | 128 | Token is left-stripped (whitespace before it is removed). |
| TokenAttrNormal | 4 | A "normal" token (non-special, non-control); a basic lexical token. |
| TokenAttrNormalized | 64 | Token has been normalized (e.g. transformed or canonicalized). |
| TokenAttrRstrip | 256 | Token is right-stripped (whitespace after it is removed). |
| TokenAttrSingleWord | 512 | Token represents a single word (an atomic word, not a subword). |
| TokenAttrUndefined | 0 | No attribute set (default, no classification). |
| TokenAttrUnknown | 1 | Token whose status is unknown (e.g. not in the vocabulary, or a fallback). |
| TokenAttrUnused | 2 | Token that is present but unused, deprecated, or reserved. |
| TokenAttrUserDefined | 16 | A user-defined token (e.g. a custom token inserted by the user or application). |
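Apart from TokenAttrUndefined, every attribute value is a distinct power of two. This suggests that `TokenAttributes` returns a bit mask in which several flags can be set at once, as with llama.cpp's `llama_token_attr`. The following minimal sketch decodes such a mask into names and relies on that assumption. `AddIfSet` and `DescribeAttributes` are hypothetical helper names.

```xojo
// Hypothetical helper: append name to names() if flag is set in attrs.
// Arrays are passed by reference in Xojo, so the caller sees the additions.
Sub AddIfSet(attrs As Integer, flag As Integer, name As String, names() As String)
  If (attrs And flag) <> 0 Then names.Add(name) // bitwise And on integers
End Sub

// Hypothetical helper: turn the TokenAttributes bit mask into readable names.
Function DescribeAttributes(vocab As LlamaVocabMBS, token As Int32) As String
  Var attrs As Integer = vocab.TokenAttributes(token)
  If attrs = LlamaVocabMBS.TokenAttrUndefined Then Return "Undefined"

  Var names() As String
  // Checked in ascending bit order
  AddIfSet(attrs, LlamaVocabMBS.TokenAttrUnknown, "Unknown", names)
  AddIfSet(attrs, LlamaVocabMBS.TokenAttrUnused, "Unused", names)
  AddIfSet(attrs, LlamaVocabMBS.TokenAttrNormal, "Normal", names)
  AddIfSet(attrs, LlamaVocabMBS.TokenAttrControl, "Control", names)
  AddIfSet(attrs, LlamaVocabMBS.TokenAttrUserDefined, "UserDefined", names)
  AddIfSet(attrs, LlamaVocabMBS.TokenAttrByte, "Byte", names)
  AddIfSet(attrs, LlamaVocabMBS.TokenAttrNormalized, "Normalized", names)
  AddIfSet(attrs, LlamaVocabMBS.TokenAttrLstrip, "Lstrip", names)
  AddIfSet(attrs, LlamaVocabMBS.TokenAttrRstrip, "Rstrip", names)
  AddIfSet(attrs, LlamaVocabMBS.TokenAttrSingleWord, "SingleWord", names)

  Return String.FromArray(names, ", ")
End Function
```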
Vocab Types
| Constant | Value | Description |
|---|---|---|
| TypeBPE | 2 | GPT-2 tokenizer based on byte-level BPE |
| TypeNone | 0 | For models without vocab |
| TypePLAMO2 | 6 | PLaMo-2 tokenizer based on Aho-Corasick with dynamic programming |
| TypeRWKV | 5 | RWKV tokenizer based on greedy tokenization |
| TypeSPM | 1 | LLaMA tokenizer based on byte-level BPE with byte fallback |
| TypeUGM | 4 | T5 tokenizer based on Unigram |
| TypeWPM | 3 | BERT tokenizer based on WordPiece |
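The vocabulary type constants are plain integers. This page does not list the 8 properties, so it does not show which property reports the type. In the following small sketch, the value is therefore simply passed in as `vocabType` and mapped to a readable name. `VocabTypeName` is a hypothetical helper.

```xojo
// Hypothetical helper: readable name for a vocabulary type constant.
// Where vocabType comes from is an assumption left to the caller.
Function VocabTypeName(vocabType As Integer) As String
  Select Case vocabType
  Case LlamaVocabMBS.TypeNone
    Return "None (model without vocab)"
  Case LlamaVocabMBS.TypeSPM
    Return "SPM (LLaMA, byte-level BPE with byte fallback)"
  Case LlamaVocabMBS.TypeBPE
    Return "BPE (GPT-2, byte-level BPE)"
  Case LlamaVocabMBS.TypeWPM
    Return "WPM (BERT, WordPiece)"
  Case LlamaVocabMBS.TypeUGM
    Return "UGM (T5, Unigram)"
  Case LlamaVocabMBS.TypeRWKV
    Return "RWKV (greedy tokenization)"
  Case LlamaVocabMBS.TypePLAMO2
    Return "PLaMo-2 (Aho-Corasick with dynamic programming)"
  Case Else
    Return "Unknown type " + Str(vocabType)
  End Select
End Function
```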
This class has no subclasses.
Release notes
- Version 26.1
  - Fixed a memory leak in the Tokenize and Detokenize methods of the LlamaVocabMBS class.
Some methods using this class:
- LlamaSamplerMBS.InitInfill(vocab as LlamaVocabMBS) as LlamaSamplerMBS
Some properties using this class:
- LlamaModelMBS.Vocab as LlamaVocabMBS
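The following minimal sketch combines `LlamaModelMBS.Vocab` with `isEOG` and `Detokenize`. It assumes `generated` holds token IDs produced by some generation loop, which is not shown here. The sketch keeps tokens only up to the first end-of-generation token. `TextUntilEOG` is a hypothetical name.

```xojo
// Hypothetical helper: text of the generated tokens up to the first
// end-of-generation token. Assumes model is an already loaded LlamaModelMBS.
Function TextUntilEOG(model As LlamaModelMBS, generated() As Int32) As String
  Var vocab As LlamaVocabMBS = model.Vocab
  If vocab = Nil Then Return ""

  Var kept() As Int32
  For Each t As Int32 In generated
    If vocab.isEOG(t) Then Exit For // e.g. EOS/EOT; stop here
    kept.Add(t)
  Next

  // Drop BOS/EOS and keep special tokens out of the returned text
  Return vocab.Detokenize(kept, True, False)
End Function
```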
The items on this page are in the following plugins: MBS Tools Plugin.