public final class HTMLHighlighter
extends java.lang.Object
Modifier and Type | Method and Description |
---|---|
java.lang.String | getExtraStyleSheet(): Returns the extra stylesheet definition that will be inserted in the HEAD element. |
java.lang.String | getPostHighlight(): Returns the string that will be inserted after any highlighted HTML block. |
java.lang.String | getPreHighlight(): Returns the string that will be inserted before any highlighted HTML block. |
boolean | isOutputHighlightOnly(): If true, only HTML enclosed within highlighted content will be returned. |
static HTMLHighlighter | newExtractingInstance(): Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup. |
static HTMLHighlighter | newHighlightingInstance(): Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted. |
java.lang.String | process(TextDocument doc, org.xml.sax.InputSource is): Processes the given TextDocument and the original HTML text (as an InputSource). |
java.lang.String | process(TextDocument doc, java.lang.String origHTML): Processes the given TextDocument and the original HTML text (as a String). |
java.lang.String | process(java.net.URL url, BoilerpipeExtractor extractor) |
void | setExtraStyleSheet(java.lang.String extraStyleSheet): Sets the extra stylesheet definition that will be inserted in the HEAD element. |
void | setOutputHighlightOnly(boolean outputHighlightOnly): Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document. |
void | setPostHighlight(java.lang.String postHighlight): Sets the string that will be inserted after any highlighted HTML block. |
void | setPreHighlight(java.lang.String preHighlight): Sets the string that will be inserted prior to any highlighted HTML block. |
public static HTMLHighlighter newHighlightingInstance()
Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted.
public static HTMLHighlighter newExtractingInstance()
Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup.
public java.lang.String process(TextDocument doc, java.lang.String origHTML) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as a String).
Parameters:
doc - The processed TextDocument.
origHTML - The original HTML document.
Throws:
BoilerpipeProcessingException
public java.lang.String process(TextDocument doc, org.xml.sax.InputSource is) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as an InputSource).
Parameters:
doc - The processed TextDocument.
is - The original HTML document.
Throws:
BoilerpipeProcessingException
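The two overloads above differ only in how the original HTML is supplied. As a minimal sketch that uses JDK classes only, the helper below shows how HTML already held in memory could be wrapped in an org.xml.sax.InputSource before being passed to process(TextDocument, InputSource). The class HtmlSources and its methods are hypothetical names introduced for illustration; they are not part of boilerpipe.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of boilerpipe: builds org.xml.sax.InputSource
// objects from HTML that is already in memory.
final class HtmlSources {

    private HtmlSources() {}

    // Wraps an HTML string as a character-stream InputSource.
    static InputSource fromString(String html) {
        return new InputSource(new StringReader(html));
    }

    // Wraps raw bytes as a byte-stream InputSource and records their encoding,
    // so that a parser does not have to guess it.
    static InputSource fromBytes(byte[] bytes, String encoding) {
        InputSource is = new InputSource(new ByteArrayInputStream(bytes));
        is.setEncoding(encoding);
        return is;
    }

    // Drains the character stream of an InputSource into a String.
    // Demonstration only: expects a character-stream source, and leaves the
    // stream consumed and closed afterwards.
    static String readCharacters(InputSource is) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader r = is.getCharacterStream()) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello</p></body></html>";

        InputSource fromText = fromString(html);
        System.out.println(readCharacters(fromText).equals(html));

        InputSource fromRaw = fromBytes(html.getBytes(StandardCharsets.UTF_8), "UTF-8");
        System.out.println(fromRaw.getEncoding());
    }
}
```

A stream-backed InputSource can generally be read only once, so a fresh instance is needed for each pass over the same HTML.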
public java.lang.String process(java.net.URL url, BoilerpipeExtractor extractor) throws java.io.IOException, BoilerpipeProcessingException, org.xml.sax.SAXException
Throws:
java.io.IOException
BoilerpipeProcessingException
org.xml.sax.SAXException
public boolean isOutputHighlightOnly()
public void setOutputHighlightOnly(boolean outputHighlightOnly)
public java.lang.String getExtraStyleSheet()
public void setExtraStyleSheet(java.lang.String extraStyleSheet)
Parameters:
extraStyleSheet - Plain HTML
public java.lang.String getPreHighlight()
<span class="x-boilerpipe-mark1">
public void setPreHighlight(java.lang.String preHighlight)
public java.lang.String getPostHighlight()
</span>
public void setPostHighlight(java.lang.String postHighlight)
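To make the effect of the pre-highlight string, the post-highlight string, and the highlight-only flag concrete, here is a deliberately simplified, self-contained model. It is not boilerpipe's implementation: the class HighlightModel, its render method, and the representation of a page as a list of blocks carrying a content flag are assumptions made purely for illustration. The initial pre- and post-highlight values are the strings shown in this reference.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; NOT boilerpipe's implementation.
// Models a page as a sequence of HTML fragments, each flagged as content or not.
final class HighlightModel {

    // One fragment of HTML plus a flag saying whether it counts as content.
    static final class Block {
        final String html;
        final boolean isContent;

        Block(String html, boolean isContent) {
            this.html = html;
            this.isContent = isContent;
        }
    }

    // Initial values copied from the strings listed in this reference.
    private String preHighlight = "<span class=\"x-boilerpipe-mark1\">";
    private String postHighlight = "</span>";
    private boolean outputHighlightOnly = false;

    void setPreHighlight(String preHighlight) {
        this.preHighlight = preHighlight;
    }

    void setPostHighlight(String postHighlight) {
        this.postHighlight = postHighlight;
    }

    void setOutputHighlightOnly(boolean outputHighlightOnly) {
        this.outputHighlightOnly = outputHighlightOnly;
    }

    // Content blocks are wrapped in the pre/post strings. Non-content blocks
    // are copied through unchanged, unless highlight-only output is requested,
    // in which case they are dropped.
    String render(List<Block> blocks) {
        StringBuilder out = new StringBuilder();
        for (Block b : blocks) {
            if (b.isContent) {
                out.append(preHighlight).append(b.html).append(postHighlight);
            } else if (!outputHighlightOnly) {
                out.append(b.html);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Block> page = new ArrayList<>();
        page.add(new Block("<div>Menu</div>", false));
        page.add(new Block("Article text", true));

        HighlightModel model = new HighlightModel();
        System.out.println(model.render(page));   // whole page, content wrapped

        model.setOutputHighlightOnly(true);
        System.out.println(model.render(page));   // wrapped content only
    }
}
```

In this model, setting both strings to the empty string while enabling highlight-only output yields just the content fragments with no added markup.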