TessEngineMBS.AllWordConfidences as Integer()

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Returns all word confidences (between 0 and 100) in an array.

The number of confidences should correspond to the number of space-delimited words in GetText.
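
A minimal sketch of how the confidences might be inspected, assuming the engine is already initialized and an image has been set. The threshold of 60 is an arbitrary value chosen for illustration, not a recommendation.

```xojo
Dim OCR As TessEngineMBS // your instance of tesseract, initialized and with an image set

// GetText runs recognition if it has not been done yet
Dim text As String = OCR.GetText
Dim confidences() As Integer = OCR.AllWordConfidences

// Count the words below an arbitrary threshold (60 is an assumption)
Dim lowCount As Integer = 0
For Each c As Integer In confidences
  If c < 60 Then
    lowCount = lowCount + 1
  End If
Next

MsgBox Str(lowCount) + " of " + Str(confidences.Ubound + 1) + " words have low confidence"
```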

TessEngineMBS.AnalyseLayout as TessPageIteratorMBS

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Runs page layout analysis in the mode set by SetPageSegMode.

May optionally be called prior to Recognize to get access to just the page layout results. Returns an iterator to the results.

Returns nil on error or an empty page.

TessEngineMBS.Clear

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Free up recognition results and any stored image data, without actually freeing any recognition data that would be time-consuming to reload.

Afterwards, you must call SetImage or TesseractRect before doing any Recognize or Get* operation.

TessEngineMBS.Constructor

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
The constructor.

Please call Initialize after this to get started.

TessEngineMBS.GetAltoText(PageNumber as Integer) as String

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Make an XML-formatted string with Alto markup from the internal data structures.

TessEngineMBS.GetAvailableLanguages as String()

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Returns the available languages as an array.
Example
Dim OCR As TessEngineMBS // your instance of tesseract
Dim AvailableLanguages() As String = OCR.GetAvailableLanguages

TessEngineMBS.GetBoolVariable(Name as String, byref value as boolean) as Boolean

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Queries boolean variable value.

Returns true if the parameter was found among Tesseract parameters.
Fills in value with the value of the parameter.
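
A short sketch of the byref pattern shared by the Get*Variable methods, assuming an initialized engine. The parameter name classify_bln_numeric_mode is taken from the SetVariable description on this page.

```xojo
Dim OCR As TessEngineMBS // your instance of tesseract, already initialized

Dim value As Boolean
If OCR.GetBoolVariable("classify_bln_numeric_mode", value) Then
  // The function result only says the parameter exists; value holds its setting
  If value Then
    MsgBox "numeric mode is on"
  Else
    MsgBox "numeric mode is off"
  End If
Else
  MsgBox "parameter not found"
End If
```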

TessEngineMBS.GetBoxText(PageNumber as Integer) as String

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
The recognized text is returned as a string coded in the same format as a box file used in training.

Constructs coordinates in the original image - not just the rectangle.
PageNumber is a 0-based page index that will appear in the box file.

TessEngineMBS.GetDoubleVariable(Name as String, byref value as Double) as Boolean

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Queries double variable value.

Returns true if the parameter was found among Tesseract parameters.
Fills in value with the value of the parameter.

TessEngineMBS.GetHOCRText(PageNumber as Integer) as String

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Make an HTML-formatted string with hOCR markup from the internal data structures.

PageNumber is 0-based but will appear in the output as 1-based.

TessEngineMBS.GetIntVariable(Name as String, byref value as Integer) as Boolean

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Queries integer variable value.

Returns true if the parameter was found among Tesseract parameters.
Fills in value with the value of the parameter.

TessEngineMBS.GetLoadedLanguages as String()

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Returns the loaded languages as an array.
Example
Dim OCR As TessEngineMBS // your instance of tesseract
Dim LoadedLanguages() As String = OCR.GetLoadedLanguages

Includes all languages loaded by the last call to Initialize, including those loaded as dependencies of other loaded languages.

TessEngineMBS.GetLSTMBoxText(PageNumber as Integer) as String

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Make a box file for LSTM training from the internal data structures.

Constructs coordinates in the original image - not just the rectangle.
PageNumber is a 0-based page index that will appear in the box file.

TessEngineMBS.GetStringVariable(Name as String) as String

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Queries string variable value.

Returns the value of the parameter as a string.

TessEngineMBS.GetText as String

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
The recognized text is returned as UTF-8 text.

TessEngineMBS.GetTsvText(PageNumber as Integer) as String

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Make a TSV-formatted string from the internal data structures.

PageNumber is 0-based but will appear in the output as 1-based.

TessEngineMBS.GetUNLVText as String

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
The recognized text is returned as a string coded in UNLV format (Latin-1) with specific reject and suspect codes.

TessEngineMBS.GetWordStrBoxText(PageNumber as Integer) as String

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
The recognized text is returned as a string coded in the same format as a WordStr box file used in training.

PageNumber is a 0-based page index that will appear in the box file.

TessEngineMBS.Initialize(dataPath as String, language as String, Mode as Integer = 3, configs() as String = nil) as Boolean

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Initializes tesseract.
Example
Dim OCR As New TessEngineMBS // your instance of tesseract

If Not OCR.Initialize("C:\Program Files\Tesseract-OCR\tessdata", "eng") Then
  MsgBox "failed to initialize"
  Quit
End If

Returns true on success and false on failure.

The datapath must be the name of the tessdata directory.
The language is (usually) an ISO 639-3 code; an empty string "" defaults to eng.
It is entirely safe (and eventually will be efficient too) to call Initialize multiple times on the same instance to change the language, or just to reset the classifier.

The language may be a string of the form [~]<lang>[+[~]<lang>]*, indicating that multiple languages are to be loaded. E.g. hin+eng will load Hindi and English. Languages may specify internally that they want to be loaded with one or more other languages, so the ~ sign is available to override that. E.g. if hin were set to load eng by default, then hin+~eng would force loading only hin.
The number of loaded languages is limited only by memory, with the caveat that loading additional languages will impact both speed and accuracy, as there is more work to do to decide on the applicable language, and there is more chance of hallucinating incorrect words.

Warning: On changing languages, all Tesseract parameters are reset to their default values (which may vary between languages).
If you have a rare need to set a variable that controls initialization for a second call to Initialize, explicitly call End() and then use SetVariable before Initialize. This is a very rare use case, since very few uses require any parameters to be set before initialization.
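
A sketch of loading two languages at once and checking what was actually loaded. The tessdata path is a placeholder, and the hin and eng data files are assumed to be installed there.

```xojo
Dim OCR As New TessEngineMBS

// Placeholder path; adjust to your tessdata directory
If Not OCR.Initialize("C:\Program Files\Tesseract-OCR\tessdata", "hin+eng") Then
  MsgBox "failed to initialize"
  Return
End If

// Lists every loaded language, including ones pulled in as dependencies
Dim loaded() As String = OCR.GetLoadedLanguages
MsgBox Join(loaded, ", ")
```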

TessEngineMBS.IsValidWord(Word as String) as Boolean

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Check whether a word is valid according to Tesseract's language model.

Returns false if the word is invalid, true if it is valid.

TessEngineMBS.PrintVariablesToFile(File as FolderItem) as Boolean

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Print Tesseract parameters to the given file.

Returns true on success.
Fails if the file can't be created.

TessEngineMBS.PrintVariablesToPath(Path as String) as Boolean

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Print Tesseract parameters to the file at the given path.

Returns true on success.
Fails if the file can't be created.

TessEngineMBS.Recognize as Boolean

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Recognize the image set with SetImage (or one of the other SetImage* methods), generating Tesseract internal structures.

Returns true on success.

Optional. The Get*Text methods will call Recognize if needed.

After Recognize, the output is kept internally until the next SetImage.
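
A sketch of the full sequence from initialization to text output, using only methods described on this page. The tessdata path and the image file name are placeholders.

```xojo
Dim OCR As New TessEngineMBS

If Not OCR.Initialize("C:\Program Files\Tesseract-OCR\tessdata", "eng") Then
  MsgBox "failed to initialize"
  Return
End If

Dim f As FolderItem = SpecialFolder.Desktop.Child("test.jpg")
OCR.SetImageFile(f)

// Calling Recognize explicitly lets you detect a failure before querying text
If OCR.Recognize Then
  MsgBox OCR.GetText
Else
  MsgBox "recognition failed"
End If
```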

TessEngineMBS.ResultIterator as TessResultIteratorMBS

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Queries a result iterator.

Loop over it to query details.
The result iterator is only valid until you end the engine.

TessEngineMBS.SetImage(pic as picture)

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Sets input image via picture.
Example
Dim OCR As TessEngineMBS // your instance of tesseract

Dim f As FolderItem = SpecialFolder.Desktop.Child("test.jpg")
Dim p As Picture = f.OpenAsPicture
OCR.SetImage(p)

Pass a Xojo picture; the plugin copies the pixels.
Any mask or alpha channel is ignored.

TessEngineMBS.SetImageData(Data as MemoryBlock)

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Sets input image via data.

Image data can be an image file content like JPEG or PNG.
Supported formats depend on what Leptonica was compiled to support.

TessEngineMBS.SetImageData(Data as String)

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Sets input image via data.

Image data can be an image file content like JPEG or PNG.
Supported formats depend on what Leptonica was compiled to support.
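
A sketch of passing file content as a string, assuming an initialized engine. The file name is a placeholder, and reading through BinaryStream is just one way to obtain the bytes.

```xojo
Dim OCR As TessEngineMBS // your instance of tesseract, already initialized

Dim f As FolderItem = SpecialFolder.Desktop.Child("test.png")

// Read the whole file into a string of raw bytes
Dim b As BinaryStream = BinaryStream.Open(f)
Dim data As String = b.Read(b.Length)
b.Close

// The encoded PNG content is passed as is; the plugin decodes it
OCR.SetImageData(data)
```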

See also:

TessEngineMBS.SetImageFile(File as FolderItem)

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Sets input image via folderitem.
Example
Dim OCR As TessEngineMBS // your instance of tesseract

Dim f As FolderItem = SpecialFolder.Desktop.Child("test.jpg")
OCR.SetImageFile(f)

Point to an image file like JPEG or PNG.
Supported formats depend on what Leptonica was compiled to support.

TessEngineMBS.SetImageFile(Path as String)

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Sets input image via file path.

Point to an image file like JPEG or PNG.
Supported formats depend on what Leptonica was compiled to support.

TessEngineMBS.SetRectangle(Left as Integer, Top as Integer, Width as Integer, Height as Integer)

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Restrict recognition to a sub-rectangle of the image.

Call after SetImage.
Each SetRectangle call clears the recognition results, so multiple rectangles can be recognized with the same image.
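
A sketch of recognizing two regions of one image in turn, assuming an initialized engine. Splitting the picture into a top and bottom half is only an illustration, and the picture's Width and Height are assumed to match its pixel size.

```xojo
Dim OCR As TessEngineMBS // your instance of tesseract, already initialized

Dim f As FolderItem = SpecialFolder.Desktop.Child("test.jpg")
Dim p As Picture = f.OpenAsPicture
OCR.SetImage(p)

Dim half As Integer = p.Height \ 2

// Top half of the image
OCR.SetRectangle(0, 0, p.Width, half)
Dim topText As String = OCR.GetText

// Bottom half; the image does not need to be set again
OCR.SetRectangle(0, half, p.Width, p.Height - half)
Dim bottomText As String = OCR.GetText
```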

TessEngineMBS.SetVariable(Name as String, Value as String)

Type Topic Plugin Version macOS Windows Linux iOS Targets
method OCR MBS OCR Plugin 21.3 ✅ Yes ✅ Yes ✅ Yes ✅ Yes All
Set the value of an internal "parameter."
Example
Dim OCR As TessEngineMBS // your instance of tesseract

OCR.SetVariable("tessedit_char_blacklist", "xyz")

Supply the name of the parameter and the value as a string, just as you would in a config file.
Returns false if the name lookup failed.
e.g.
SetVariable("tessedit_char_blacklist", "xyz") to ignore x, y and z.
Or
SetVariable("classify_bln_numeric_mode", "1") to set numeric-only mode.

SetVariable may be used before Init, but settings will revert to defaults on End().

Note: Must be called after Initialize(). Only works for non-init variables (init variables should be passed to Initialize()).

The items on this page are in the following plugins: MBS OCR Plugin.

