public interface WordTokenizer
An interface for objects which take a text-based medium as input and iterate through the words in the text stored in that medium. Examples of such media include Strings, Documents, Files and TextComponents.
After the object is instantiated, and before the first call to nextWord() is made, the following methods should throw a WordNotFoundException: getCurrentWordEnd(), getCurrentWordPosition(), isNewSentence() and replaceWord().
A call to nextWord() when hasMoreWords() returns false should throw a WordNotFoundException.
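The contract above can be illustrated with a minimal sketch. This is not the library's implementation: SimpleStringTokenizer and TokenizerContractDemo are hypothetical names, WordNotFoundException is redefined locally as an unchecked exception (an assumption), the word pattern is a simplification, and getCurrentWordEnd() is assumed to return an exclusive end index. Only the methods needed to show the exception behaviour are included.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the library's exception; assumed to be unchecked.
class WordNotFoundException extends RuntimeException {
    WordNotFoundException(String message) { super(message); }
}

// Illustrative tokenizer over a String (not the real implementation).
class SimpleStringTokenizer {
    // Simplified notion of a word: a run of letters, digits or apostrophes.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}']+");
    private final Matcher matcher;
    private boolean hasNext;
    private int start = -1; // -1 until nextWord() has been called once
    private int end = -1;   // assumed exclusive end index
    private int count = 0;

    SimpleStringTokenizer(String text) {
        this.matcher = WORD.matcher(text);
        this.hasNext = matcher.find(); // look ahead to the first word
    }

    boolean hasMoreWords() { return hasNext; }

    String nextWord() {
        if (!hasNext) {
            throw new WordNotFoundException("No more words found.");
        }
        // Record the current word before looking ahead to the next one.
        start = matcher.start();
        end = matcher.end();
        String word = matcher.group();
        count++;
        hasNext = matcher.find();
        return word;
    }

    int getCurrentWordPosition() { requireCurrentWord(); return start; }

    int getCurrentWordEnd() { requireCurrentWord(); return end; }

    int getCurrentWordCount() { return count; }

    private void requireCurrentWord() {
        if (start < 0) {
            throw new WordNotFoundException("Current word has not yet been set.");
        }
    }
}

class TokenizerContractDemo {
    public static void main(String[] args) {
        SimpleStringTokenizer t = new SimpleStringTokenizer("Hello wrld");
        try {
            t.getCurrentWordPosition(); // no current word yet
        } catch (WordNotFoundException e) {
            System.out.println("Before nextWord(): " + e.getMessage());
        }
        while (t.hasMoreWords()) {
            String w = t.nextWord();
            System.out.println(w + " at " + t.getCurrentWordPosition()
                    + ".." + t.getCurrentWordEnd());
        }
        try {
            t.nextWord(); // hasMoreWords() is false here
        } catch (WordNotFoundException e) {
            System.out.println("After last word: " + e.getMessage());
        }
    }
}
```

Callers that always guard nextWord() with hasMoreWords(), as the loop does, should never see the second exception in normal use.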
Modifier and Type | Method and Description |
---|---|
java.lang.String | getContext(): Returns the context text that is being tokenized (should include any changes that have been made). |
int | getCurrentWordCount(): Returns the number of word tokens that have been processed thus far. |
int | getCurrentWordEnd(): Returns an index representing the end location of the current word in the text. |
int | getCurrentWordPosition(): Returns an index representing the start location of the current word in the text. |
boolean | hasMoreWords(): Indicates if there are more words left. |
boolean | isNewSentence(): Returns true if the current word is at the start of a sentence. |
java.lang.String | nextWord(): Returns the next word in the iteration. |
void | replaceWord(java.lang.String newWord): Replaces the current word token. When a word is replaced, care should be taken that the WordTokenizer repositions itself such that the words that were added aren't rechecked. |

java.lang.String getContext()

int getCurrentWordCount()

int getCurrentWordEnd()
Throws: WordNotFoundException - current word has not yet been set.

int getCurrentWordPosition()
Throws: WordNotFoundException - current word has not yet been set.

boolean isNewSentence()
Throws: WordNotFoundException - current word has not yet been set.

boolean hasMoreWords()

java.lang.String nextWord()
Throws: WordNotFoundException - search string contains no more words.

void replaceWord(java.lang.String newWord)
Parameters: newWord - the string which should replace the current word.
Throws: WordNotFoundException - current word has not yet been set.
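
The repositioning requirement on replaceWord() can be sketched as follows. This is one possible approach under stated assumptions, not the library's code: ReplacingTokenizer is a hypothetical class, WordNotFoundException is redefined locally as an unchecked exception, and a word is simplified to a run of letters or digits. The key step is moving the scan position to the end of the inserted text so that the replacement is never returned by nextWord().

```java
// Hypothetical stand-in for the library's exception; assumed to be unchecked.
class WordNotFoundException extends RuntimeException {
    WordNotFoundException(String message) { super(message); }
}

// Illustrative tokenizer that supports in-place replacement (not the real implementation).
class ReplacingTokenizer {
    private final StringBuilder text;
    private int start = -1; // start of the current word; -1 until nextWord() is called
    private int end = 0;    // exclusive end of the current word; scanning resumes here

    ReplacingTokenizer(String text) {
        this.text = new StringBuilder(text);
    }

    // Returns the text including any replacements made so far.
    String getContext() { return text.toString(); }

    boolean hasMoreWords() { return findWordStart(end) >= 0; }

    String nextWord() {
        int s = findWordStart(end);
        if (s < 0) {
            throw new WordNotFoundException("No more words found.");
        }
        int e = s;
        while (e < text.length() && Character.isLetterOrDigit(text.charAt(e))) {
            e++;
        }
        start = s;
        end = e;
        return text.substring(s, e);
    }

    void replaceWord(String newWord) {
        if (start < 0) {
            throw new WordNotFoundException("Current word has not yet been set.");
        }
        text.replace(start, end, newWord);
        // Reposition: resume scanning after the inserted text so that
        // the words just added are not tokenized (and rechecked) again.
        end = start + newWord.length();
    }

    // Index of the first word character at or after 'from', or -1 if none.
    private int findWordStart(int from) {
        for (int i = from; i < text.length(); i++) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }
}

class ReplaceWordDemo {
    public static void main(String[] args) {
        ReplacingTokenizer t = new ReplacingTokenizer("teh dog barks");
        t.nextWord();             // current word is "teh"
        t.replaceWord("the big"); // the replacement contains two words
        System.out.println(t.getContext());
        System.out.println(t.nextWord()); // continues after the replacement
    }
}
```

Because the replacement may be longer or shorter than the original word, the sketch stores offsets into the mutable buffer and recomputes the end offset after each replacement instead of caching word boundaries in advance.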