public interface HtmlParser extends Parser
Note: These are the exact methods exposed in the original C++ Parser. The names are simply modified to conform to Java.
Modifier and Type | Interface and Description |
---|---|
static class | HtmlParser.ATTR_TYPE: Indicates the type of HTML attribute that the parser is currently in or NONE if the parser is not currently in an attribute. |
static class | HtmlParser.Mode: The Parser Mode requested for parsing a given template. |
Modifier and Type | Field and Description |
---|---|
static ExternalState | STATE_ATTR |
static ExternalState | STATE_COMMENT |
static ExternalState | STATE_CSS_FILE |
static ExternalState | STATE_JS_FILE |
static ExternalState | STATE_TAG |
static ExternalState | STATE_TEXT: All the states in which the parser can be. |
static ExternalState | STATE_VALUE |

Fields inherited from interface Parser: STATE_ERROR
Modifier and Type | Method and Description |
---|---|
String | getAttribute(): Returns the name of the HTML attribute the parser is currently processing. |
HtmlParser.ATTR_TYPE | getAttributeType(): Returns the type of the attribute that the parser is in or ATTR_TYPE.NONE if we are not parsing an attribute. |
ExternalState | getJavascriptState(): Returns the state the Javascript parser is in. |
String | getTag(): Returns the name of the HTML tag if the parser is currently within one. |
String | getValue(): Returns the value of an HTML attribute if the parser is currently within one. |
int | getValueIndex(): Returns the current position of the parser within the HTML attribute value, zero being the position of the first character in the value. |
boolean | inAttribute(): Returns true if and only if the parser is currently within an attribute, be it within the attribute name or the attribute value. |
boolean | inCss(): Returns true if and only if the parser is currently within a CSS context. |
boolean | inJavascript(): Returns true if the parser is currently processing Javascript. |
void | insertText(): A specialized directive to tell the parser there is some content that will be inserted here but that it will not get to parse. |
boolean | isAttributeQuoted(): Returns true if and only if the parser is currently within an attribute value and that attribute value is quoted. |
boolean | isJavascriptQuoted(): Returns true if the parser is currently processing a Javascript literal that is quoted. |
boolean | isUrlStart(): Returns true if and only if the current position of the parser is at the start of a URL HTML attribute value. |
void | resetMode(HtmlParser.Mode mode): Resets the state of the parser, allowing for reuse of the HtmlParser object. |

Methods inherited from interface Parser: getColumnNumber, getLineNumber, getState, parse, parse, reset, setColumnNumber, setLineNumber
static final ExternalState STATE_TEXT
All the states in which the parser can be:
- STATE_TEXT: the parser is in HTML proper.
- STATE_TAG: the parser is inside an HTML tag name.
- STATE_COMMENT: the parser is inside an HTML comment.
- STATE_ATTR: the parser is inside an HTML attribute name.
- STATE_VALUE: the parser is inside an HTML attribute value.
- STATE_JS_FILE: the parser is inside Javascript code.
- STATE_CSS_FILE: the parser is inside CSS code.

All these states map exactly to those exposed in the C++ (original) version of the HtmlParser.
static final ExternalState STATE_TAG
static final ExternalState STATE_COMMENT
static final ExternalState STATE_ATTR
static final ExternalState STATE_VALUE
static final ExternalState STATE_JS_FILE
static final ExternalState STATE_CSS_FILE
boolean inJavascript()
Returns true if the parser is currently processing Javascript. Such is the case if and only if the parser is processing an attribute that takes Javascript, a Javascript script block, or the parser is (re)set with HtmlParser.Mode.JS.
Returns: true if the parser is processing Javascript, false otherwise

boolean isJavascriptQuoted()
Returns true if the parser is currently processing a Javascript literal that is quoted. The caller will typically invoke this method after determining that the parser is processing Javascript. Knowing whether the element is quoted or not helps determine which escaping to apply to it when needed.
Returns: true if and only if the parser is inside a quoted Javascript literal

boolean inAttribute()
Returns true if and only if the parser is currently within an attribute, be it within the attribute name or the attribute value.
Returns: true if and only if inside an attribute

boolean inCss()
Returns true if and only if the parser is currently within a CSS context. A CSS context is one of the below:
Returns: true if and only if the parser is inside CSS

HtmlParser.ATTR_TYPE getAttributeType()
Returns the type of the attribute that the parser is in or ATTR_TYPE.NONE if we are not parsing an attribute. The caller will typically invoke this method after determining that the parser is processing an attribute. This is useful to determine which escaping to apply based on the type of value this attribute expects.
Returns: the attribute type, an HtmlParser.ATTR_TYPE value
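
As an illustration of the call order described above (check inAttribute() first, then ask for the attribute type), here is a minimal sketch. It does not use the real library: AttrType and AttributeView are stand-ins written for this example, the Strategy names are hypothetical, and REGULAR is a placeholder constant, since only NONE and URI are named on this page.

```java
/** Stand-in for HtmlParser.ATTR_TYPE. Only NONE and URI are named on this page; REGULAR is a hypothetical placeholder. */
enum AttrType { NONE, URI, REGULAR }

/** Stand-in exposing just the two HtmlParser methods this sketch calls. */
interface AttributeView {
    boolean inAttribute();
    AttrType getAttributeType();
}

final class AttributeEscaping {
    /** Hypothetical escaping strategies; not part of the documented API. */
    enum Strategy { NO_ATTRIBUTE_ESCAPING, ATTRIBUTE_VALUE, URL }

    /** Picks a strategy from the parser's view of the current attribute. */
    static Strategy choose(AttributeView parser) {
        // First determine that the parser is inside an attribute at all.
        if (!parser.inAttribute()) {
            return Strategy.NO_ATTRIBUTE_ESCAPING;
        }
        // Then branch on the type of value the attribute expects.
        switch (parser.getAttributeType()) {
            case URI:
                return Strategy.URL;
            case NONE:
                return Strategy.NO_ATTRIBUTE_ESCAPING;
            default:
                return Strategy.ATTRIBUTE_VALUE;
        }
    }

    /** Builds a view that always reports the given fixed answers (for demonstration only). */
    static AttributeView fixed(final boolean inside, final AttrType type) {
        return new AttributeView() {
            @Override public boolean inAttribute() { return inside; }
            @Override public AttrType getAttributeType() { return type; }
        };
    }

    public static void main(String[] args) {
        System.out.println(choose(fixed(true, AttrType.URI)));
        System.out.println(choose(fixed(false, AttrType.NONE)));
    }
}
```

With a real parser, the same two calls would be made on the HtmlParser instance at the point where content is about to be inserted. A complete escaper would also need to consider the Javascript and CSS contexts, which this sketch ignores.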
boolean isAttributeQuoted()
Returns true if and only if the parser is currently within an attribute value and that attribute value is quoted.
Returns: true if and only if the attribute value is quoted

String getTag()
Returns the name of the HTML tag if the parser is currently within one. ... String if the parser is not in a tag, as determined by getCurrentExternalState.
Returns: ... String if we are not within an HTML tag

String getAttribute()
Returns the name of the HTML attribute the parser is currently processing. ... String if the parser is not in an attribute, as determined by getCurrentExternalState.
Returns: ... String if we are not within an HTML attribute

String getValue()
Returns the value of an HTML attribute if the parser is currently within one. ... getCurrentExternalState.
Returns: ... String if the parser is not in an HTML attribute value

int getValueIndex()
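Returns the current position of the parser within the HTML attribute value, zero being the position of the first character in the value. ... Parser.getState().

boolean isUrlStart()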
Returns true if and only if the current position of the parser is at the start of a URL HTML attribute value. This is the case when the following three conditions are all met:
- ... getAttributeType() returning HtmlParser.ATTR_TYPE.URI.

This method may be used by an HTML sanitizer or an auto-escape system to determine whether to validate the URL for well-formedness and to validate that the scheme of the URL (e.g. HTTP, HTTPS) is safe. In particular, it is recommended to use this method instead of checking that getValueIndex() is 0, in order to support attribute types where the URL does not start at index zero, such as the content attribute of the meta HTML tag.
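
The scheme validation mentioned above might look like the following minimal sketch. UrlStartView is a stand-in interface written for this example (it mirrors only isUrlStart()), and the allow-list of http and https is an assumption taken from the schemes named in the text. Relative URLs are rejected here, which a real sanitizer would probably want to handle differently.

```java
import java.util.Locale;

/** Stand-in exposing only the isUrlStart() method of HtmlParser. */
interface UrlStartView {
    boolean isUrlStart();
}

final class UrlStartCheck {
    /**
     * Returns true if this sketch would allow the value at the current parser position.
     * The scheme is only checked when the parser reports the start of a URL;
     * any other escaping is outside the scope of this example.
     */
    static boolean isSafeToInsert(UrlStartView parser, String value) {
        if (!parser.isUrlStart()) {
            return true; // not at the start of a URL, so no scheme check applies
        }
        return hasAllowedScheme(value);
    }

    /** Accepts only absolute http:// and https:// URLs (an assumed allow-list). */
    static boolean hasAllowedScheme(String url) {
        String lower = url.trim().toLowerCase(Locale.ROOT);
        return lower.startsWith("http://") || lower.startsWith("https://");
    }

    public static void main(String[] args) {
        UrlStartView atStart = () -> true;
        System.out.println(isSafeToInsert(atStart, "https://example.com/"));
        System.out.println(isSafeToInsert(atStart, "javascript:alert(1)"));
    }
}
```

Relying on isUrlStart() rather than on getValueIndex() being 0 keeps the check in one place for attributes such as the content attribute of meta, where the URL begins later in the value.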
Returns: true if and only if the parser is at the start of the URL

void resetMode(HtmlParser.Mode mode)
Resets the state of the parser, allowing for reuse of the HtmlParser object. See the HtmlParser.Mode enum for information on all the valid modes.
Parameters: mode - is an enum representing the high-level state of the parser

void insertText() throws ParseException
A specialized directive to tell the parser there is some content that will be inserted here but that it will not get to parse. Returns false if and only if the parser encountered a fatal error which prevents it from continuing further parsing.
Note: The return value is different from the C++ Parser which always returns true but in my opinion makes more sense.
Throws: ParseException - if an unrecoverable error occurred during parsing

ExternalState getJavascriptState()
Returns the state the Javascript parser is in. See JavascriptParser for more information on the valid external states. The caller will typically first determine that the parser is processing Javascript and then invoke this method to obtain more fine-grained state information.
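
The "first check for Javascript, then ask for detail" pattern described for the Javascript-related methods can be sketched as below. JavascriptView is a stand-in interface written for this example and the Strategy names are hypothetical. The sketch uses isJavascriptQuoted() as the finer-grained query; a call to getJavascriptState() would sit at the same point in real code.

```java
/** Stand-in exposing only the two Javascript-related HtmlParser methods used here. */
interface JavascriptView {
    boolean inJavascript();
    boolean isJavascriptQuoted();
}

final class JavascriptEscaping {
    /** Hypothetical strategy names; not part of the documented API. */
    enum Strategy { NOT_JAVASCRIPT, ESCAPE_AS_QUOTED_LITERAL, HANDLE_AS_UNQUOTED }

    /** Chooses how to treat inserted content based on the Javascript context. */
    static Strategy choose(JavascriptView parser) {
        // Only ask Javascript-specific questions once inJavascript() is true.
        if (!parser.inJavascript()) {
            return Strategy.NOT_JAVASCRIPT;
        }
        return parser.isJavascriptQuoted()
                ? Strategy.ESCAPE_AS_QUOTED_LITERAL
                : Strategy.HANDLE_AS_UNQUOTED;
    }

    /** Builds a view that always reports the given fixed answers (for demonstration only). */
    static JavascriptView fixed(final boolean inJs, final boolean quoted) {
        return new JavascriptView() {
            @Override public boolean inJavascript() { return inJs; }
            @Override public boolean isJavascriptQuoted() { return quoted; }
        };
    }

    public static void main(String[] args) {
        System.out.println(choose(fixed(true, true)));
        System.out.println(choose(fixed(true, false)));
        System.out.println(choose(fixed(false, false)));
    }
}
```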
Copyright © 2010–2016 Google. All rights reserved.