public abstract class ExtractorBase extends java.lang.Object implements BoilerpipeExtractor
Constructor and Description |
---|
ExtractorBase() |
Modifier and Type | Method and Description |
---|---|
java.lang.String | getText(org.xml.sax.InputSource is): Extracts text from the HTML code available from the given InputSource. |
java.lang.String | getText(java.io.Reader r): Extracts text from the HTML code available from the given Reader. |
java.lang.String | getText(java.lang.String html): Extracts text from the HTML code given as a String. |
java.lang.String | getText(TextDocument doc): Extracts text from the given TextDocument object. |
java.lang.String | getText(java.net.URL url): Extracts text from the HTML code available from the given URL. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from implemented interfaces: process
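
The summary above describes an abstract class that offers several getText entry points while leaving process to subclasses. The following is a minimal sketch of that general pattern, not the library's implementation: Doc, SketchExtractor, and TagStripper are hypothetical stand-ins invented for illustration, and the way the overloads delegate to one another is an assumption rather than something this page confirms.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical stand-ins only; none of these types are boilerpipe classes. */
public class OverloadSketch {

    /** Stand-in for a text document: it just holds mutable content. */
    static final class Doc {
        String content;
        Doc(String content) { this.content = content; }
    }

    /** Stand-in for an ExtractorBase-like class: one abstract step, several entry points. */
    static abstract class SketchExtractor {
        /** The only method a subclass supplies (assumed analogue of process). */
        abstract void process(Doc doc);

        /** Core overload: run the subclass step, then return what remains. */
        String getText(Doc doc) {
            process(doc);
            return doc.content;
        }

        /** Convenience overload: drain the Reader, then delegate to the core overload. */
        String getText(Reader r) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            try {
                for (int n; (n = r.read(buf)) != -1; ) {
                    sb.append(buf, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return getText(new Doc(sb.toString()));
        }

        /** Convenience overload: wrap the String in a Reader and delegate. */
        String getText(String html) {
            return getText(new StringReader(html));
        }
    }

    /** Toy subclass: removes anything that looks like a tag. Not a real HTML parser. */
    static final class TagStripper extends SketchExtractor {
        @Override
        void process(Doc doc) {
            doc.content = doc.content.replaceAll("<[^>]*>", "").trim();
        }
    }

    public static void main(String[] args) {
        System.out.println(new TagStripper().getText("<p>Hello, world</p>"));
    }
}
```

With this shape, a subclass only has to implement one method and still accepts every input type the base class supports.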
public java.lang.String getText(java.lang.String html) throws BoilerpipeProcessingException
Extracts text from the HTML code given as a String.
Specified by: getText in interface BoilerpipeExtractor
Parameters: html - The HTML code as a String.
Throws: BoilerpipeProcessingException
public java.lang.String getText(org.xml.sax.InputSource is) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given InputSource.
Specified by: getText in interface BoilerpipeExtractor
Parameters: is - The InputSource containing the HTML.
Throws: BoilerpipeProcessingException
public java.lang.String getText(java.net.URL url) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given URL.
NOTE: This method is mainly intended for showcase purposes. If you are going to crawl the Web, consider using getText(InputSource) instead.
Parameters: url - The URL pointing to the HTML code.
Throws: BoilerpipeProcessingException
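
The note above recommends getText(InputSource) for crawling, presumably so that fetching stays under the caller's control. The following sketch shows one way to wrap already-downloaded bytes in an org.xml.sax.InputSource using only the JDK. The class name InputSourceSketch and the helper toInputSource are hypothetical, and the call into a concrete extractor is left as a comment because this page does not name one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

public class InputSourceSketch {

    /**
     * Wraps bytes the caller has already downloaded in an InputSource.
     * The encoding is set explicitly so the parser does not have to guess it.
     */
    static InputSource toInputSource(byte[] body, String charsetName) {
        InputSource is = new InputSource(new ByteArrayInputStream(body));
        is.setEncoding(charsetName);
        return is;
    }

    public static void main(String[] args) {
        // In a real crawler these bytes would come from your own HTTP client.
        byte[] body = "<html><body><p>Hi</p></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        InputSource is = toInputSource(body, "UTF-8");
        System.out.println(is.getEncoding());
        // The result could then be passed to getText(InputSource) on a concrete extractor.
    }
}
```

Keeping the download separate from the extraction means timeouts, redirects, and character-set detection can be handled by whatever HTTP client the crawler already uses.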
public java.lang.String getText(java.io.Reader r) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given Reader.
Specified by: getText in interface BoilerpipeExtractor
Parameters: r - The Reader containing the HTML.
Throws: BoilerpipeProcessingException
public java.lang.String getText(TextDocument doc) throws BoilerpipeProcessingException
Extracts text from the given TextDocument object.
Specified by: getText in interface BoilerpipeExtractor
Parameters: doc - The TextDocument.
Throws: BoilerpipeProcessingException