- getCharset() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
-
- getContainedTextElements() - Method in class de.l3s.boilerpipe.document.TextBlock
-
Returns the containedTextElements BitSet, or null.
- getContent() - Method in class de.l3s.boilerpipe.document.TextDocument
-
- getData() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
-
- getDefaultInstance() - Static method in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
-
Returns the singleton instance for IgnoreBlocksAfterContentFilter.
- getDefaultInstance() - Static method in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
-
- getExtraStyleSheet() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
-
Returns the extra stylesheet definition that will be inserted in the HEAD element.
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.ArticleExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.CanolaExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.DefaultExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.LargestContentExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
-
Returns the singleton instance for DensityRulesClassifier.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
-
Returns the singleton instance for NumWordsRulesClassifier.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
-
Returns the singleton instance for TerminatingBlocksFinder.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
-
Returns the singleton instance for ExpandTitleToContentFilter.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
-
Returns the singleton instance for SimpleBlockFusionProcessor.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
-
Returns the singleton instance for BoilerplateBlockFilter.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
-
Returns the singleton instance for SplitParagraphBlocksFilter.
- getLabels() - Method in class de.l3s.boilerpipe.document.TextBlock
-
Returns the labels associated with this TextBlock, or null if no such labels exist.
- getLinkDensity() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getNumWords() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getNumWords() - Method in class de.l3s.boilerpipe.document.TextDocumentStatistics
-
Returns the overall number of words in all blocks.
- getNumWordsInAnchorText() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getOffsetBlocksEnd() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getOffsetBlocksStart() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getPostHighlight() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
-
Returns the string that will be inserted after any highlighted HTML block.
- getPotentialTitles() - Method in class de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
-
- getPreHighlight() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
-
Returns the string that will be inserted before any highlighted HTML block.
- getTagLevel() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getText(String) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code given as a String.
- getText(InputSource) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code available from the given InputSource.
- getText(Reader) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code available from the given Reader.
- getText(TextDocument) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
-
- getText() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getText(boolean, boolean) - Method in class de.l3s.boilerpipe.document.TextDocument
-
- getText(String) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code given as a String.
- getText(InputSource) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given InputSource.
- getText(URL) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given URL.
- getText(Reader) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given Reader.
- getText(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
-
- getTextBlocks() - Method in class de.l3s.boilerpipe.document.TextDocument
-
- getTextDensity() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getTextDocument() - Method in interface de.l3s.boilerpipe.BoilerpipeInput
-
- getTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeSAXInput
-
- getTextDocument(BoilerpipeHTMLParser) - Method in class de.l3s.boilerpipe.sax.BoilerpipeSAXInput
-
- getTitle() - Method in class de.l3s.boilerpipe.document.TextDocument
-
Returns the "main" title for this document, or null if no such title has been set.
- getTitle() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
- TA_ANCHOR_TEXT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Marks this tag as "anchor" (this should usually only be set for the <A> tag).
- TA_BLOCK_LEVEL - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Explicitly marks this tag as a simple "block-level" element, which always generates whitespace.
- TA_BODY - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Marks this tag as the body element (this should usually only be set for the <BODY> tag).
- TA_FONT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Special TagAction for the <FONT> tag, which keeps track of the absolute and relative font size.
- TA_IGNORABLE_ELEMENT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Marks this tag as "ignorable", i.e.
- TA_INLINE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
- TA_INLINE_NO_WHITESPACE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Marks this tag as a simple "inline" element, which generates neither whitespace nor a new block.
- TA_INLINE_WHITESPACE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Marks this tag as a simple "inline" element, which generates whitespace, but no new block.
- TagAction - Interface in de.l3s.boilerpipe.sax
-
Defines an action that is to be performed whenever a particular tag occurs
during HTML parsing.
- TagActionMap - Class in de.l3s.boilerpipe.sax
-
Base class for defining a set of TagActions that are to be used for the HTML parsing process.
- TagActionMap() - Constructor for class de.l3s.boilerpipe.sax.TagActionMap
-
- TerminatingBlocksFinder - Class in de.l3s.boilerpipe.filters.english
-
- TerminatingBlocksFinder() - Constructor for class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
-
- TextBlock - Class in de.l3s.boilerpipe.document
-
Describes a block of text.
- TextBlock(String) - Constructor for class de.l3s.boilerpipe.document.TextBlock
-
- TextBlock(String, BitSet, int, int, int, int, int) - Constructor for class de.l3s.boilerpipe.document.TextBlock
-
- TextBlockCondition - Interface in de.l3s.boilerpipe.conditions
-
Evaluates whether a given TextBlock meets a certain condition.
- TextDocument - Class in de.l3s.boilerpipe.document
-
A text document, consisting of one or more TextBlocks.
- TextDocument(List<TextBlock>) - Constructor for class de.l3s.boilerpipe.document.TextDocument
-
- TextDocument(String, List<TextBlock>) - Constructor for class de.l3s.boilerpipe.document.TextDocument
-
- TextDocumentStatistics - Class in de.l3s.boilerpipe.document
-
Provides shallow statistics on a given TextDocument.
- TextDocumentStatistics(TextDocument, boolean) - Constructor for class de.l3s.boilerpipe.document.TextDocumentStatistics
-
- TITLE - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
-
- toInputSource() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
-
- toInputSource() - Method in interface de.l3s.boilerpipe.sax.InputSourceable
-
- tokenize(CharSequence) - Static method in class de.l3s.boilerpipe.util.UnicodeTokenizer
-
Tokenizes the text and returns an array of tokens.
- toString() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- toString() - Method in class de.l3s.boilerpipe.labels.LabelAction
-
- toTextDocument() - Method in interface de.l3s.boilerpipe.BoilerpipeDocumentSource
-
- toTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
- toTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
-