Class | Description |
---|---|
BoilerplateBlockFilter | Removes TextBlocks which have explicitly been marked as "not content". |
InvertedFilter | Inverts the "isContent" flag for all TextBlocks. |
LabelToBoilerplateFilter | Marks all blocks that contain a given label as "boilerplate". |
LabelToContentFilter | Marks all blocks that contain a given label as "content". |
MarkEverythingContentFilter | Marks all blocks as content. |
MinClauseWordsFilter | Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5). |
MinWordsFilter | Keeps only those content blocks which contain at least k words. |
SplitParagraphBlocksFilter | Splits TextBlocks at paragraph boundaries. |
SurroundingToContentFilter | |

The BoilerpipeFilters in this package are straightforward and probably not specific to English.
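
To illustrate the flag-based behaviour summarized in the table, here is a minimal, self-contained sketch. It does not use the boilerpipe API: `SimpleFilterSketch`, `Block`, `minWords`, `invert`, `removeBoilerplate`, and `sampleBlocks` are hypothetical stand-ins written only for this example, and whitespace-based word counting is an assumption. The sketch mimics what MinWordsFilter, InvertedFilter, and BoilerplateBlockFilter are described as doing; the real classes may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the boilerpipe classes.
final class SimpleFilterSketch {

    /** Minimal stand-in for a text block: some text plus an "isContent" flag. */
    static final class Block {
        final String text;
        boolean isContent;

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
        }

        /** Counts whitespace-separated tokens (an assumption made for this sketch). */
        int numWords() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    /** Like the MinWordsFilter description: content blocks with fewer than k words lose the flag. */
    static boolean minWords(List<Block> blocks, int k) {
        boolean changed = false;
        for (Block b : blocks) {
            if (b.isContent && b.numWords() < k) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    /** Like the InvertedFilter description: flips the "isContent" flag of every block. */
    static boolean invert(List<Block> blocks) {
        for (Block b : blocks) {
            b.isContent = !b.isContent;
        }
        return !blocks.isEmpty();
    }

    /** Like the BoilerplateBlockFilter description: drops blocks marked as "not content". */
    static boolean removeBoilerplate(List<Block> blocks) {
        return blocks.removeIf(b -> !b.isContent);
    }

    /** Builds a small mutable sample document. */
    static List<Block> sampleBlocks() {
        return new ArrayList<>(Arrays.asList(
                new Block("Home About Contact", true),
                new Block("The quick brown fox jumps over the lazy dog", true),
                new Block("Copyright 2010", false)));
    }

    public static void main(String[] args) {
        List<Block> blocks = sampleBlocks();
        minWords(blocks, 5);       // short navigation-like block loses its content flag
        removeBoilerplate(blocks); // non-content blocks are dropped
        for (Block b : blocks) {
            System.out.println(b.text);
        }
    }
}
```

Each method returns `true` when it changed something, so a caller can tell whether a later step needs to run. That return convention is a design choice of this sketch, not a statement about the library.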