public class DensityRulesClassifier extends java.lang.Object implements BoilerpipeFilter
Classifies TextBlocks as content/not-content through rules that have
been determined using the C4.8 machine learning algorithm, as described in the
paper "Boilerplate Detection using Shallow Text Features", particularly using
text densities and link densities.

Modifier and Type | Field and Description |
---|---|
static DensityRulesClassifier | INSTANCE |

Constructor and Description |
---|
DensityRulesClassifier() |
Modifier and Type | Method and Description |
---|---|
protected boolean | classify(TextBlock prev, TextBlock curr, TextBlock next) |
static DensityRulesClassifier | getInstance() Returns the singleton instance of DensityRulesClassifier. |
boolean | process(TextDocument doc) Processes the given document doc. |
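
The summary above gives the signature of classify(prev, curr, next) but not the learned rules. As a rough illustration of how a rule over a block and its two neighbors could combine text density and link density, here is a self-contained sketch. Everything in it is an assumption made for illustration: `Block` is a hypothetical stand-in rather than the library's TextBlock, and the two thresholds are invented for the example, not the values the classifier actually uses.

```java
class DensityRulesSketch {
    /** Hypothetical stand-in for a text block; only two features are modeled. */
    static final class Block {
        final double textDensity; // how much text the block packs per line
        final double linkDensity; // share of the block's words inside links, 0.0 to 1.0
        boolean isContent;        // label assigned by classify()

        Block(double textDensity, double linkDensity) {
            this.textDensity = textDensity;
            this.linkDensity = linkDensity;
        }
    }

    // Illustrative thresholds chosen for this sketch, not values from the library.
    static final double MAX_LINK_DENSITY = 0.33;
    static final double MIN_TEXT_DENSITY = 10.0;

    /** Labels curr from its own features and its neighbors'; returns true if the label changed. */
    static boolean classify(Block prev, Block curr, Block next) {
        boolean isContent;
        if (curr.linkDensity > MAX_LINK_DENSITY) {
            // Mostly links: treat as navigation or similar boilerplate.
            isContent = false;
        } else if (curr.textDensity >= MIN_TEXT_DENSITY) {
            // Dense running text with few links: treat as content.
            isContent = true;
        } else {
            // Sparse block: keep it only if it sits next to dense text.
            isContent = prev.textDensity >= MIN_TEXT_DENSITY
                    || next.textDensity >= MIN_TEXT_DENSITY;
        }
        boolean changed = curr.isContent != isContent;
        curr.isContent = isContent;
        return changed;
    }

    public static void main(String[] args) {
        Block nav = new Block(2.0, 0.9);
        Block paragraph = new Block(14.0, 0.05);
        Block caption = new Block(5.0, 0.0);

        classify(paragraph, nav, paragraph);
        classify(nav, paragraph, nav);
        classify(paragraph, caption, nav);

        System.out.println("nav=" + nav.isContent
                + " paragraph=" + paragraph.isContent
                + " caption=" + caption.isContent);
    }
}
```

The nested if/else shape mirrors what a decision-tree learner produces: each branch tests one feature of the previous, current, or next block against a threshold.
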
public static final DensityRulesClassifier INSTANCE
public static DensityRulesClassifier getInstance()
public boolean process(TextDocument doc) throws BoilerpipeProcessingException
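Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by: process in interface BoilerpipeFilter
Parameters: doc - The TextDocument that is to be processed.
Returns: true if changes have been made to the TextDocument.
Throws: BoilerpipeProcessingException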
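
To make the return-value contract of process concrete (true only when the document was changed), the following self-contained sketch models a filter with a shared singleton that walks a document block by block. All types here (`Block`, `Filter`, `NeighborFilter`) are hypothetical stand-ins, the `EMPTY` sentinel for missing neighbors is an assumption about how the ends of a document might be handled, and the placeholder rule is not the library's.

```java
import java.util.List;

class ProcessContractSketch {
    /** Hypothetical stand-in for a text block. */
    static final class Block {
        final double linkDensity;
        boolean isContent;

        Block(double linkDensity) {
            this.linkDensity = linkDensity;
        }
    }

    /** Hypothetical stand-in for a filter interface exposing process(). */
    interface Filter {
        boolean process(List<Block> doc);
    }

    static final class NeighborFilter implements Filter {
        static final NeighborFilter INSTANCE = new NeighborFilter();

        static NeighborFilter getInstance() {
            return INSTANCE;
        }

        // Sentinel passed as the missing neighbor at either end of the document.
        private static final Block EMPTY = new Block(0.0);

        /** Placeholder rule (an assumption): link-heavy blocks are not content. */
        boolean classify(Block prev, Block curr, Block next) {
            // prev and next are unused here; a real rule set may consult them.
            boolean isContent = curr.linkDensity <= 0.33;
            boolean changed = curr.isContent != isContent;
            curr.isContent = isContent;
            return changed;
        }

        @Override
        public boolean process(List<Block> doc) {
            boolean changed = false;
            for (int i = 0; i < doc.size(); i++) {
                Block prev = i > 0 ? doc.get(i - 1) : EMPTY;
                Block next = i + 1 < doc.size() ? doc.get(i + 1) : EMPTY;
                changed |= classify(prev, doc.get(i), next);
            }
            return changed;
        }
    }

    public static void main(String[] args) {
        List<Block> doc = List.of(new Block(0.9), new Block(0.05));
        Filter filter = NeighborFilter.getInstance();
        System.out.println(filter.process(doc)); // first pass relabels the second block
        System.out.println(filter.process(doc)); // second pass has nothing left to change
    }
}
```

Because the filter holds no per-document state, sharing one instance across callers is safe in this sketch, which is the usual motivation for exposing a singleton alongside a public constructor.
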