Class | Description |
---|---|
AddPrecedingLabelsFilter | Adds the labels of the preceding block to the current block, optionally adding a prefix. |
ArticleMetadataFilter | |
BlockProximityFusion | Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit. |
ContentFusion | |
DocumentTitleMatchClassifier | Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain. |
ExpandTitleToContentFilter | Marks all TextBlocks "content" which are between the headline and the part that has already been marked content, if they are marked DefaultLabels.MIGHT_BE_CONTENT. |
KeepLargestBlockFilter | Keeps the largest TextBlock only (by the number of words). |
LabelFusion | Fuses adjacent blocks if their labels are equal. |
SimpleBlockFusionProcessor | Merges two subsequent blocks if their text densities are equal. |
The BoilerpipeFilters in this package are pure heuristics.
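
To make two of the heuristics in the table concrete, the following is a minimal, self-contained sketch of the ideas behind BlockProximityFusion and KeepLargestBlockFilter. It does not use the library's classes: `HeuristicsSketch`, `Block`, `fuseByProximity` and `keepLargest` are hypothetical names introduced only for illustration, and the gap calculation and tie-breaking rule are assumptions rather than the package's actual behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these types belong to the library.
final class HeuristicsSketch {

    /** Hypothetical stand-in for a text block; NOT the library's TextBlock. */
    static final class Block {
        final String text;
        final int offsetStart; // index of the first original block covered
        final int offsetEnd;   // index of the last original block covered
        boolean content;       // whether the block is currently marked as content

        Block(String text, int offsetStart, int offsetEnd, boolean content) {
            this.text = text;
            this.offsetStart = offsetStart;
            this.offsetEnd = offsetEnd;
            this.content = content;
        }

        /** Counts whitespace-separated words. */
        int numWords() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }
    }

    /**
     * Fuses neighbouring content blocks when the number of original blocks
     * lying between them does not exceed maxDistance (assumed definition).
     */
    static List<Block> fuseByProximity(List<Block> blocks, int maxDistance) {
        List<Block> out = new ArrayList<>();
        for (Block b : blocks) {
            Block prev = out.isEmpty() ? null : out.get(out.size() - 1);
            boolean closeEnough = prev != null && prev.content && b.content
                    && b.offsetStart - prev.offsetEnd - 1 <= maxDistance;
            if (closeEnough) {
                // Replace the previous block with the merged one.
                out.set(out.size() - 1, new Block(
                        prev.text + "\n" + b.text, prev.offsetStart, b.offsetEnd, true));
            } else {
                out.add(b);
            }
        }
        return out;
    }

    /**
     * Leaves only the content block with the most words marked as content.
     * On a tie the earliest block wins (an assumption of this sketch).
     * Returns true if any block's content flag was changed.
     */
    static boolean keepLargest(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            if (b.content && (largest == null || b.numWords() > largest.numWords())) {
                largest = b;
            }
        }
        boolean changed = false;
        for (Block b : blocks) {
            if (b != largest && b.content) {
                b.content = false;
                changed = true;
            }
        }
        return changed;
    }
}
```

A caller would typically apply the fusion step first and the largest-block step afterwards, so that a long article split across several nearby blocks is measured as one unit rather than losing to a single long but unrelated block.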