public final class IgnoreBlocksAfterContentFilter extends java.lang.Object implements BoilerpipeFilter
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT. These marks are ignored unless a minimum number of words in content blocks occurs before the mark (default: 60).
This can be used in conjunction with an upstream TerminatingBlocksFinder.
See Also: TerminatingBlocksFinder
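The threshold rule described above can be illustrated with a small, self-contained sketch. It does not use boilerpipe: `EndOfTextSketch`, `Block`, and their fields are hypothetical stand-ins for the filter, `TextBlock`, and the `INDICATES_END_OF_TEXT` label. It also assumes that the marked block itself is dropped along with everything after it, and that only words in preceding content blocks count toward the minimum; the actual implementation may differ in these details.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in only; none of these types belong to boilerpipe. */
final class EndOfTextSketch {

    /** Hypothetical, simplified model of a text block. */
    static final class Block {
        final int numWords;
        final boolean endOfTextMark; // stands in for the INDICATES_END_OF_TEXT label
        boolean content;

        Block(int numWords, boolean content, boolean endOfTextMark) {
            this.numWords = numWords;
            this.content = content;
            this.endOfTextMark = endOfTextMark;
        }
    }

    /** Returns true if at least one block was switched to non-content. */
    static boolean process(List<Block> blocks, int minNumWords) {
        boolean changed = false;
        boolean cutting = false;
        int wordsSeen = 0; // words in content blocks before the current block
        for (Block b : blocks) {
            // A mark only takes effect once enough content words precede it.
            if (!cutting && b.endOfTextMark && wordsSeen >= minNumWords) {
                cutting = true;
            }
            if (cutting) {
                if (b.content) {
                    b.content = false;
                    changed = true;
                }
            } else if (b.content) {
                wordsSeen += b.numWords;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block(40, true, false));
        blocks.add(new Block(10, true, true));  // only 40 words so far: mark ignored
        blocks.add(new Block(30, true, false));
        blocks.add(new Block(5, true, true));   // 80 words so far: mark honoured
        blocks.add(new Block(20, true, false));

        System.out.println(process(blocks, 60)); // prints "true"
        for (Block b : blocks) {
            System.out.println(b.numWords + " words, content=" + b.content);
        }
    }
}
```

With a minimum of 60 words, the first mark is skipped because only 40 content words precede it, while the second mark cuts off the last two blocks. Raising the minimum (as INSTANCE_200 presumably does with 200) would make the filter more conservative on short documents.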
Modifier and Type | Field and Description |
---|---|
static IgnoreBlocksAfterContentFilter | DEFAULT_INSTANCE |
static IgnoreBlocksAfterContentFilter | INSTANCE_200 |
Constructor and Description |
---|
IgnoreBlocksAfterContentFilter(int minNumWords) |
Modifier and Type | Method and Description |
---|---|
static IgnoreBlocksAfterContentFilter | getDefaultInstance() Returns the default instance of IgnoreBlocksAfterContentFilter. |
protected static int | getNumFullTextWords(TextBlock tb) |
protected static int | getNumFullTextWords(TextBlock tb, float minTextDensity) |
boolean | process(TextDocument doc) Processes the given document doc. |
public static final IgnoreBlocksAfterContentFilter DEFAULT_INSTANCE
public static final IgnoreBlocksAfterContentFilter INSTANCE_200
public IgnoreBlocksAfterContentFilter(int minNumWords)
public static IgnoreBlocksAfterContentFilter getDefaultInstance()
public boolean process(TextDocument doc) throws BoilerpipeProcessingException
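Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by: process in interface BoilerpipeFilter
Parameters: doc - The TextDocument that is to be processed.
Returns: true if changes have been made to the TextDocument.
Throws: BoilerpipeProcessingException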
protected static int getNumFullTextWords(TextBlock tb)
protected static int getNumFullTextWords(TextBlock tb, float minTextDensity)