public final class CommonExtractors
extends java.lang.Object
Provides quick access to common BoilerpipeExtractors.

Modifier and Type | Field and Description |
---|---|
static ArticleExtractor | ARTICLE_EXTRACTOR: Works very well for most types of Article-like HTML. |
static CanolaExtractor | CANOLA_EXTRACTOR: Trained on krdwrd Canola (different definition of "boilerplate"). |
static DefaultExtractor | DEFAULT_EXTRACTOR: Usually worse than ArticleExtractor, but simpler/no heuristics. |
static KeepEverythingExtractor | KEEP_EVERYTHING_EXTRACTOR: Dummy Extractor; should return the input text. |
static LargestContentExtractor | LARGEST_CONTENT_EXTRACTOR: Like DefaultExtractor, but keeps the largest text block only. |
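public static final ArticleExtractor ARTICLE_EXTRACTOR
Works very well for most types of Article-like HTML.

public static final DefaultExtractor DEFAULT_EXTRACTOR
Usually worse than ArticleExtractor, but simpler/no heuristics.

public static final LargestContentExtractor LARGEST_CONTENT_EXTRACTOR
Like DefaultExtractor, but keeps the largest text block only.

public static final CanolaExtractor CANOLA_EXTRACTOR
Trained on krdwrd Canola (different definition of "boilerplate").

public static final KeepEverythingExtractor KEEP_EVERYTHING_EXTRACTOR
Dummy Extractor; should return the input text. Use this to double-check that your problem is within a particular BoilerpipeExtractor, or somewhere else.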
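
KEEP_EVERYTHING_EXTRACTOR is described as a dummy extractor that should return the input text, which makes it a natural baseline when an extraction result looks wrong. The following is a minimal sketch of that comparison, not part of the library: `ExtractorComparison`, `TextExtractor`, and `outputLengths` are hypothetical names, and the two registered extractors are stubs so the snippet runs without boilerpipe on the classpath. The commented lines assume that the fields on this page expose a `getText(String)` method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractorComparison {

    /** Hypothetical adapter type; assumed to match the shape of an extractor's getText(String). */
    @FunctionalInterface
    public interface TextExtractor {
        String getText(String html) throws Exception;
    }

    /**
     * Runs every extractor on the same input and records the length of its output.
     * An extractor that throws is recorded as -1 so the other results are still reported.
     */
    public static Map<String, Integer> outputLengths(Map<String, TextExtractor> extractors, String html) {
        Map<String, Integer> lengths = new LinkedHashMap<>();
        for (Map.Entry<String, TextExtractor> entry : extractors.entrySet()) {
            try {
                lengths.put(entry.getKey(), entry.getValue().getText(html).length());
            } catch (Exception e) {
                lengths.put(entry.getKey(), -1);
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        Map<String, TextExtractor> extractors = new LinkedHashMap<>();
        // Stubs only; they do not imitate boilerpipe's behaviour.
        // With the library available (assumption), register the real fields instead:
        //   extractors.put("keep-everything", CommonExtractors.KEEP_EVERYTHING_EXTRACTOR::getText);
        //   extractors.put("article", CommonExtractors.ARTICLE_EXTRACTOR::getText);
        extractors.put("keep-everything (stub)", html -> html);
        extractors.put("article (stub)", html -> html.substring(0, html.length() / 2));

        String html = "<html><body><p>Hello</p></body></html>";
        outputLengths(extractors, html)
            .forEach((name, length) -> System.out.println(name + ": " + length + " chars"));
    }
}
```

If the keep-everything baseline already returns empty or truncated text, the problem probably lies before extraction (fetching, encoding, parsing). If only a specific extractor drops the content, that extractor's filtering is the more likely cause.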