DynaPDFParserMBS.Abort

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Cancels parsing.

The function can be used to abort parsing whenever needed. However, parsing can be aborted only if ParsePage() was called in a separate thread. Note that it is not allowed to execute different functions of the same PDF instance in different threads: either every thread gets its own PDF instance, or the function calls must be synchronized.

DynaPDFParserMBS.ChangeAltFont(FontHandle as integer) as Boolean

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Changes the alternate font to use.

Changes the active alternate font that is used by ReplaceSelText() when the original font is not available. The font handle must be a handle that was returned by SetFont(), SetFontEx(), SetCIDFont(), or SetAltFont().

DynaPDFParserMBS.Constructor(PDF as DynaPDFMBS, OptimizeFlags as Integer = 0, OptimizeParams as DynaPDFOptimizeParamsMBS = nil)

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
The constructor.

The function creates a parser context that can be used to edit and extract text, or to delete arbitrary operators of a page.

The content parser is used by Optimize() too. Therefore, the same flags and optimization parameters are supported. Please have a look at Optimize() for a description of the available flags and parameters. The parameter OptimizeParams can be set to nil and OptimizeFlags to kofDefault if nothing special should be achieved.

The flags and parameters for Optimize() are kept in properties so they can be reviewed later in the debugger.

Raises an exception if creating the context fails.

DynaPDFParserMBS.DeleteText(area as DynaPDFRectMBS) as Boolean

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Deletes text in the area.
Example
Dim pdf As New MyDynapdfMBS
// ... load some PDF

Dim count As Integer = pdf.GetPageCount
Dim parser As New DynaPDFParserMBS(pdf)
Dim ContentParsingFlags As Integer = DynaPDFParserMBS.kcpfEnableTextSelection

For i As Integer = 1 To count

  If parser.ParsePage(i, ContentParsingFlags) Then

    // rectangle at (200, 200) with a size of 200 x 200
    Dim area As New DynaPDFRectMBS(200, 200+200, 200+200, 200)
    Dim needWrite As Boolean = parser.DeleteText(area)

    // write the page back only if DeleteText succeeded
    If needWrite Then
      Call parser.WriteToPage
    End If
  End If

Next

The function deletes every glyph or character that touches or lies inside the rectangle Area.

Area must be defined as if the page were viewed in a PDF viewer, that is, in bottom-up coordinates and with the page orientation taken into account (see Orientation in DynaPDFPageMBS class). The width and height of a page must be calculated from the crop box if set, or from the media box otherwise (see BBox() in DynaPDFPageMBS class). Note also that the width and height must be exchanged if the orientation is 90, -90, 270, or -270 degrees.
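
The width/height exchange described above can be sketched as a small helper. This is only a sketch and not part of the plugin: GetViewSize and its parameter names are hypothetical, and boxWidth, boxHeight and orientation are assumed to have been read beforehand from the page's crop box (or media box) and its orientation.

```xojo
// Hypothetical helper, not part of the MBS plugin.
// boxWidth/boxHeight: size of the crop box if set, else of the media box.
// orientation: page orientation in degrees.
Sub GetViewSize(boxWidth As Double, boxHeight As Double, orientation As Integer, ByRef viewWidth As Double, ByRef viewHeight As Double)
  Select Case orientation
  Case 90, -90, 270, -270
    // quarter-turn rotation: exchange width and height
    viewWidth = boxHeight
    viewHeight = boxWidth
  Case Else
    viewWidth = boxWidth
    viewHeight = boxHeight
  End Select
End Sub
```

For a 595 x 842 box with an orientation of 90, the helper yields a view size of 842 x 595, so a rectangle covering the whole visible page would span 842 units horizontally and 595 vertically.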

Note that this function deletes text only. Text can also occur in the form of images or vector graphics. There are no functions yet to identify and delete text in such objects.

If the function succeeds the return value is true. If the function fails the return value is false.

DynaPDFParserMBS.ExtractText(TextExtractionFlags as Integer, byref Text as String, area as DynaPDFRectMBS = nil) as Boolean

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Extracts text from parser.

The function extracts the text of a page with the same algorithm that FindText() uses to find text on a page. In order to get exactly the same result the flag ktefSortTextX must be set.

The function ExtractText() of the PDF instance in fact calls this function internally.

The optional parameter Area can be set to restrict text extraction to that rectangle. The rectangle must be defined as if the page were viewed in a PDF viewer, that is, in bottom-up coordinates and with the page orientation taken into account. The page coordinate system is de-rotated before text extraction starts since this produces better results. The width and height must be calculated from the crop box if set, or from the media box otherwise. Note also that the width and height must be exchanged if the orientation is 90, -90, 270, or -270 degrees.

Returns true on success or false for failure.
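
A minimal sketch of extracting the text of one region might look as follows. It makes several assumptions: the parse flag is taken from the DeleteText example (whether text extraction strictly requires it is not confirmed here), ktefSortTextX is assumed to be exposed as a constant of DynaPDFParserMBS, and the rectangle values are arbitrary illustration values passed in the same order as in the DeleteText example.

```xojo
Dim pdf As New MyDynapdfMBS
// ... load some PDF

Dim parser As New DynaPDFParserMBS(pdf)
Dim parseFlags As Integer = DynaPDFParserMBS.kcpfEnableTextSelection

// Assumption: ktefSortTextX is a constant of DynaPDFParserMBS.
Dim extractFlags As Integer = DynaPDFParserMBS.ktefSortTextX

// Assumption: region at (50, 600) with a size of 300 x 150,
// given in the same order as in the DeleteText example.
Dim area As New DynaPDFRectMBS(50, 600 + 150, 50 + 300, 600)

If parser.ParsePage(1, parseFlags) Then
  Dim pageText As String
  If parser.ExtractText(extractFlags, pageText, area) Then
    // the extracted text is returned through the ByRef parameter
    System.DebugLog pageText
  End If
End If
```

Omitting the area argument leaves it at its default of nil, so extraction is no longer restricted to a rectangle.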

DynaPDFParserMBS.FindText(area as DynaPDFRectMBS, SearchType as Integer, findText as String, continueSearch as boolean = false) as Boolean

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Searches for text and stores the result so that further editing actions can be applied.

Area must be defined as if the page were viewed in a PDF viewer, that is, in bottom-up coordinates and with the page orientation taken into account (see GetPageOrientation()). The width and height of a page must be calculated from the crop box if set, or from the media box otherwise (see GetPageBBox()). Note also that the width and height must be exchanged if the orientation is 90, -90, 270, or -270 degrees.

The page coordinate system is de-rotated since this produces better results and it is much easier to find the location of text in rotated pages.

FindText() is usually called inside a loop until no more occurrences of the search string can be found. In the first call, continueSearch must be false, which means the search starts at the beginning. Then continue the search with continueSearch = true.
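
The loop described above can be sketched as follows, here combined with ReplaceSelText() to fill in a placeholder. The sketch rests on assumptions that should be checked against the plugin: a SearchType of 0 is assumed to select the default search, the page is assumed to be an upright A4 page (595 x 842 points) that is searched in full, and the placeholder and replacement strings are made up for illustration.

```xojo
Dim pdf As New MyDynapdfMBS
// ... load some PDF

Dim parser As New DynaPDFParserMBS(pdf)
Dim parseFlags As Integer = DynaPDFParserMBS.kcpfEnableTextSelection

// Assumption: 0 selects the default search type.
Dim searchType As Integer = 0

// Assumption: upright A4 page, searched in full; arguments are
// given in the same order as in the DeleteText example.
Dim area As New DynaPDFRectMBS(0, 842, 595, 0)

If parser.ParsePage(1, parseFlags) Then
  Dim hits As Integer = 0
  Dim continueSearch As Boolean = False // first call starts at the beginning

  While parser.FindText(area, searchType, "{{NAME}}", continueSearch)
    If parser.ReplaceSelText("John Doe") Then
      hits = hits + 1
    End If
    continueSearch = True // all further calls continue the search
  Wend

  // write the page back only if something was replaced
  If hits > 0 Then
    Call parser.WriteToPage
  End If
End If
```

The placeholder deliberately contains no space characters, since spaces are often not stored in PDF files.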

DynaPDFParserMBS.ParsePage(PageNum as Integer, ContentParseFlags as Integer = 0) as Boolean

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Parses a page.

The function parses a page and stores the page contents in objects internally. Once a page has been parsed, various functions can be called, e.g. to extract the text of a page, to find and replace text, or to delete arbitrary operators.

The page to be parsed should be closed; that is, it should not have been opened for editing beforehand with EditPage() or Append(). The function can parse an open page too, but this can lead to errors and is not recommended.

If the function succeeds the return value is true. If the function fails the return value is false.

DynaPDFParserMBS.ReplaceSelText(NewText as String) as Boolean

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Replaces text on the page.

The function replaces or deletes the text that was found by FindText(). Coordinates of surrounding text are not changed. This means that the new text can overlap surrounding text if the new text is wider than the original text.

Text replacement therefore has its limitations, since at some point text might need to be reformatted or realigned. However, replacing placeholders with text is usually no problem as long as there is enough room for the new text.

Note that placeholders should not contain space characters since spaces are often not stored in PDF files and this can lead to issues finding the text.

Font substitution
Text replacement depends on the availability of the fonts that are used in a PDF file. If the original font is not available, the function loads an alternate font that matches the characteristics of the original font as closely as possible. However, font substitution is not perfect, and a substituted font sometimes looks more different than expected.

To improve text replacement, it is possible to set one or more alternate fonts that should be used if the original font cannot be found on the system. Alternate fonts can be loaded with SetAltFont(). However, a font loaded by SetFont(), SetFontEx() or SetCIDFont() works too. To activate a font loaded by a regular font loading function, call ChangeAltFont().

It is possible to load more than one alternate font but only the active font will be used when replacing text. If more than one font must be loaded, store the handle returned by SetAltFont() and change the font with ChangeAltFont() whenever needed.

Returns true on success and false on failure.
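
Switching between several alternate fonts, as described under Font substitution, could be sketched like this. The font names are assumptions (any fonts installed on the system will do), and calling SetAltFont() right after creating the parser is a choice made for this sketch, not a documented requirement.

```xojo
Dim pdf As New MyDynapdfMBS
// ... load some PDF

Dim parser As New DynaPDFParserMBS(pdf)

// Load two alternate fonts once and keep their handles.
// Assumption: both fonts are installed on the system.
Dim serifFont As Integer = parser.SetAltFont("Times New Roman")
Dim sansFont As Integer = parser.SetAltFont("Arial")

// Only the active alternate font is used when replacing text,
// so activate the one needed before each group of replacements.
If parser.ChangeAltFont(serifFont) Then
  // ... FindText() / ReplaceSelText() calls for body text
End If

If parser.ChangeAltFont(sansFont) Then
  // ... FindText() / ReplaceSelText() calls for headings
End If
```

Keeping the handles and calling ChangeAltFont() avoids loading the same font again with SetAltFont() each time the font has to change.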

DynaPDFParserMBS.Reset

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Resets the parser.

DynaPDFParserMBS.SelBBox2 as DynaPDFPointMBS()   New in 24.1

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.1 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Returns the bounding box as quad points of a single node or glyph of the current selection. This function is useful for rotated text.

DynaPDFParserMBS.SetAltFont(Name as string, Style as integer = 0, Size as double = 12, Embed as boolean = true, CP as integer = &h27) as integer

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Sets the font as the alternate font that is used by ReplaceSelText() if the original font is not available.

Although the parameter Size must be greater than zero, the value is in fact not used when replacing text. The parameter is reserved for future use.
Please note that the replacement text must be defined in the code page with which the font was loaded.

To efficiently change the font whenever needed, call ChangeAltFont(). It is also possible to call SetAltFont() again, but this would require more processing time.

If the function succeeds the return value is the font handle, a value greater than or equal to zero. If the function fails the return value is 0.

DynaPDFParserMBS.WriteToPage(OptimizeFlags as Integer = 0, OptimizeParams as DynaPDFOptimizeParamsMBS = nil) as Boolean

Type | Topic | Plugin | Version | macOS | Windows | Linux | iOS | Targets
method | DynaPDF | MBS DynaPDF Plugin | 24.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | All
Writes changes back to the page.

Writes the objects that were created by ParsePage() back to the page. The flags and optional parameters were taken from Optimize() because Optimize() uses the very same parser to optimize pages. Please have a look at that function to determine which flags and parameters are available.

Unchanged pages can be left unchanged or written back to the page; this is up to you. If WriteToPage() is called, the content stream will be optimized. If nothing special should be achieved, set OptimizeFlags to kofDefault and OptimizeParams to nil.

The constructor already accepts its own flags and DynaPDFOptimizeParamsMBS object. The parameters that were passed to the constructor are used for all pages unless WriteToPage() is given its own.

If the OptimizeParams parameter is passed, it becomes the new default until it is overridden again by another WriteToPage() call. The flags are always overridden.
The function makes a copy of the parameters if set.
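
How the default parameters carry over between calls might be illustrated as follows. This sketch only restates the rules above in code: it assumes a document with at least three pages, leaves both DynaPDFOptimizeParamsMBS objects unconfigured, and passes 0 for the flags throughout.

```xojo
Dim pdf As New MyDynapdfMBS
// ... load some PDF with at least three pages (assumption)

// Parameters given to the constructor are the initial default.
Dim defaults As New DynaPDFOptimizeParamsMBS
Dim parser As New DynaPDFParserMBS(pdf, 0, defaults)
Dim parseFlags As Integer = DynaPDFParserMBS.kcpfEnableTextSelection

If parser.ParsePage(1, parseFlags) Then
  // no parameters passed: the constructor's parameters apply
  Call parser.WriteToPage
End If

If parser.ParsePage(2, parseFlags) Then
  // own parameters passed: they are copied and become the new default
  Dim special As New DynaPDFOptimizeParamsMBS
  Call parser.WriteToPage(0, special)
End If

If parser.ParsePage(3, parseFlags) Then
  // no parameters passed: the copy of "special" now applies
  Call parser.WriteToPage
End If
```

Because the function copies the parameters, changing the special object after the second call should not affect later pages.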

The items on this page are in the following plugins: MBS DynaPDF Plugin.

