Package | Description |
---|---|
org.htmlparser |
The basic API classes which will be used by most developers when working with
the HTML Parser.
|
org.htmlparser.filters |
The filters package contains example filters to select only desired nodes.
|
org.htmlparser.lexer |
The lexer package is the base level I/O subsystem.
|
org.htmlparser.nodeDecorators |
The nodeDecorators package contains classes that use the Decorator pattern.
|
org.htmlparser.nodes |
The nodes package has the concrete node implementations.
|
org.htmlparser.parserapplications.filterbuilder | |
org.htmlparser.parserapplications.filterbuilder.wrappers | |
org.htmlparser.sax |
The sax package implements a SAX (Simple API for XML) parser for HTML.
|
org.htmlparser.scanners |
The scanners package contains classes responsible for the tertiary
identification of tags.
|
org.htmlparser.tags |
The tags package contains specific tags.
|
org.htmlparser.util |
Code that can be reused by many classes is located in this package.
|
org.htmlparser.visitors |
The visitors package contains classes that use the Visitor pattern.
|
Modifier and Type | Interface and Description |
---|---|
interface |
Remark
This interface represents a comment in the HTML document.
|
interface |
Tag
This interface represents a tag (<xxx yyy="zzz">) in the HTML document.
|
interface |
Text
This interface represents a piece of the content of the HTML document.
|
Modifier and Type | Method and Description |
---|---|
Node[] |
Parser.extractAllNodesThatAre(java.lang.Class nodeType)
Deprecated.
Use extractAllNodesThatMatch (new NodeClassFilter (cls)).
|
Node |
Node.getParent()
Get the parent of this node.
|
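The deprecated `Parser.extractAllNodesThatAre` entry above points to a filter built around a class instead. The idea behind such a filter can be sketched as follows. This is a minimal illustration that assumes stand-in types: `SketchNode`, `SketchText`, `SketchTag`, and `ClassFilterSketch` are hypothetical names and are not part of org.htmlparser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in node types for illustration only; these are not the org.htmlparser classes.
interface SketchNode {}
class SketchText implements SketchNode {}
class SketchTag implements SketchNode {}

class ClassFilterSketch {
    // Keep only the nodes that are instances of the requested class.
    static List<SketchNode> extractByClass(List<SketchNode> nodes, Class<?> type) {
        List<SketchNode> kept = new ArrayList<>();
        for (SketchNode node : nodes) {
            if (type.isInstance(node)) {
                kept.add(node);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SketchNode> nodes =
            Arrays.asList(new SketchText(), new SketchTag(), new SketchText());
        System.out.println(extractByClass(nodes, SketchText.class).size()); // prints 2
    }
}
```

Passing the class as a value keeps the selection logic in one place, which is presumably why the class-specific extraction method was retired in favor of a general filter.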
Modifier and Type | Method and Description |
---|---|
boolean |
NodeFilter.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
void |
Node.setParent(Node node)
Sets the parent of this node.
|
Modifier and Type | Field and Description |
---|---|
protected Node |
IsEqualFilter.mNode
The node to match.
|
Modifier and Type | Method and Description |
---|---|
boolean |
CssSelectorNodeFilter.accept(Node node)
Accept nodes that match the selector expression.
|
boolean |
NotFilter.accept(Node node)
Accept nodes that are not acceptable to the predicate filter.
|
boolean |
NodeClassFilter.accept(Node node)
Accept nodes that are assignable from the class provided in
the constructor.
|
boolean |
RegexFilter.accept(Node node)
Accept string nodes that match the regular expression.
|
boolean |
StringFilter.accept(Node node)
Accept string nodes that contain the string.
|
boolean |
LinkStringFilter.accept(Node node)
Accept nodes that are a LinkTag and
have a URL that matches the pattern supplied in the constructor.
|
boolean |
HasAttributeFilter.accept(Node node)
Accept tags with a certain attribute.
|
boolean |
AndFilter.accept(Node node)
Accept nodes that are acceptable to all of its predicate filters.
|
boolean |
OrFilter.accept(Node node)
Accept nodes that are acceptable to any of its predicate filters.
|
boolean |
TagNameFilter.accept(Node node)
Accept nodes that are tags and have a matching tag name.
|
boolean |
IsEqualFilter.accept(Node node)
Accept the node.
|
boolean |
HasSiblingFilter.accept(Node node)
Accept tags with a sibling acceptable to the filter.
|
boolean |
HasChildFilter.accept(Node node)
Accept tags with children acceptable to the filter.
|
boolean |
HasParentFilter.accept(Node node)
Accept tags with parent acceptable to the filter.
|
boolean |
LinkRegexFilter.accept(Node node)
Accept nodes that are a LinkTag and have a URL
that matches the regex pattern supplied in the constructor.
|
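The `accept` methods listed above all follow the same predicate shape, and the And, Or, and Not variants combine other predicates. A minimal sketch of that composition, assuming stand-in types (`NamedNode`, `NodePredicate`, and `FilterCompositionSketch` are hypothetical and not the org.htmlparser classes):

```java
// Stand-in types for illustration only; these are not the org.htmlparser interfaces.
interface NamedNode {
    String name();
}

interface NodePredicate {
    boolean accept(NamedNode node);
}

class FilterCompositionSketch {
    // Accept a node only when every given predicate accepts it.
    static NodePredicate and(NodePredicate... predicates) {
        return node -> {
            for (NodePredicate p : predicates) {
                if (!p.accept(node)) {
                    return false;
                }
            }
            return true;
        };
    }

    // Accept a node when at least one given predicate accepts it.
    static NodePredicate or(NodePredicate... predicates) {
        return node -> {
            for (NodePredicate p : predicates) {
                if (p.accept(node)) {
                    return true;
                }
            }
            return false;
        };
    }

    // Invert the decision of the wrapped predicate.
    static NodePredicate not(NodePredicate predicate) {
        return node -> !predicate.accept(node);
    }

    // Accept nodes whose name matches, ignoring case (a choice made for this sketch).
    static NodePredicate nameIs(String name) {
        return node -> node.name().equalsIgnoreCase(name);
    }

    public static void main(String[] args) {
        NamedNode link = () -> "A";
        NodePredicate linkOrDiv = or(nameIs("a"), nameIs("div"));
        System.out.println(linkOrDiv.accept(link));      // prints true
        System.out.println(not(linkOrDiv).accept(link)); // prints false
    }
}
```

How the sketch treats an empty predicate list (`and` accepts, `or` rejects) is a choice made here and may differ from the library's filters.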
Constructor and Description |
---|
IsEqualFilter(Node node)
Creates a new IsEqualFilter that accepts only the node provided.
|
Modifier and Type | Method and Description |
---|---|
protected Node |
Lexer.makeRemark(int start,
int end)
Create a remark node based on the current cursor and the one provided.
|
protected Node |
Lexer.makeString(int start,
int end)
Create a string node based on the current cursor and the one provided.
|
protected Node |
Lexer.makeTag(int start,
int end,
java.util.Vector attributes)
Create a tag node based on the current cursor and the one provided.
|
Node |
Lexer.nextNode()
Get the next node from the source.
|
Node |
Lexer.nextNode(boolean quotesmart)
Get the next node from the source.
|
Node |
Lexer.parseCDATA()
Return CDATA as a text node.
|
Node |
Lexer.parseCDATA(boolean quotesmart)
Return CDATA as a text node.
|
protected Node |
Lexer.parseJsp(int start)
Parse a java server page node.
|
protected Node |
Lexer.parseRemark(int start,
boolean quotesmart)
Parse a comment.
|
protected Node |
Lexer.parseString(int start,
boolean quotesmart)
Parse a string node.
|
protected Node |
Lexer.parseTag(int start)
Parse a tag.
|
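The `nextNode` and `parse...` entries above describe a cursor-driven scan that classifies input as tags, remarks, or strings. A toy version of that loop is sketched below. It is an assumption-laden illustration: `LexerSketch` is a hypothetical class, it returns labeled strings rather than nodes, and it ignores the quote handling, CDATA, and JSP cases the real lexer lists.

```java
// A toy lexer for illustration only; it is not the org.htmlparser Lexer.
class LexerSketch {
    private final String page;
    private int cursor = 0;

    LexerSketch(String page) {
        this.page = page;
    }

    // Return the next lexeme prefixed with its kind, or null at end of input.
    String nextNode() {
        if (cursor >= page.length()) {
            return null;
        }
        int start = cursor;
        if (page.startsWith("<!--", start)) {
            cursor = endAfter("-->", start + 4);
            return "remark:" + page.substring(start, cursor);
        }
        if (page.charAt(start) == '<') {
            cursor = endAfter(">", start + 1);
            return "tag:" + page.substring(start, cursor);
        }
        int next = page.indexOf('<', start);
        cursor = (next < 0) ? page.length() : next;
        return "text:" + page.substring(start, cursor);
    }

    // Position just past the first occurrence of marker at or after from,
    // or the end of input when the marker is missing.
    private int endAfter(String marker, int from) {
        int at = page.indexOf(marker, from);
        return (at < 0) ? page.length() : at + marker.length();
    }

    public static void main(String[] args) {
        LexerSketch lexer = new LexerSketch("<p>hi<!-- c --></p>");
        for (String node = lexer.nextNode(); node != null; node = lexer.nextNode()) {
            System.out.println(node);
        }
    }
}
```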
Modifier and Type | Class and Description |
---|---|
class |
AbstractNodeDecorator
Deprecated.
Use direct subclasses or dynamic proxies instead.
Use either direct subclasses of the appropriate node and set them on the
Here is an example of how to use dynamic proxies to accomplish the same effect as using decorators to wrap Text nodes:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

import org.htmlparser.Parser;
import org.htmlparser.PrototypicalNodeFactory;
import org.htmlparser.Text;
import org.htmlparser.nodes.TextNode;
import org.htmlparser.util.ParserException;

public class TextProxy implements InvocationHandler
{
    protected Object mObject;

    public static Object newInstance (Object object)
    {
        Class cls;

        cls = object.getClass ();
        return (Proxy.newProxyInstance (
            cls.getClassLoader (),
            cls.getInterfaces (),
            new TextProxy (object)));
    }

    private TextProxy (Object object)
    {
        mObject = object;
    }

    public Object invoke (Object proxy, Method m, Object[] args) throws Throwable
    {
        Object result;
        String name;

        try
        {
            result = m.invoke (mObject, args);
            name = m.getName ();
            if (name.equals ("clone"))
                result = newInstance (result); // wrap the cloned object
            else if (name.equals ("doSemanticAction")) // or other methods
                System.out.println (mObject); // do the needful on the TextNode
        }
        catch (InvocationTargetException e)
        {
            throw e.getTargetException ();
        }
        catch (Exception e)
        {
            throw new RuntimeException ("unexpected invocation exception: " + e.getMessage());
        }
        finally
        {
        }

        return (result);
    }

    public static void main (String[] args) throws ParserException
    {
        // create the wrapped text node and set it as the prototype
        Text text = (Text) TextProxy.newInstance (new TextNode (null, 0, 0));
        PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
        factory.setTextPrototype (text);
        // perform the parse
        Parser parser = new Parser (args[0]);
        parser.setNodeFactory (factory);
        parser.parse (null);
    }
}
```
|
class |
DecodingNode
Deprecated.
Use direct subclasses or dynamic proxies instead.
Use either direct subclasses of the appropriate node and set them on the
|
class |
EscapeCharacterRemovingNode
Deprecated.
Use direct subclasses or dynamic proxies instead.
Use either direct subclasses of the appropriate node and set them on the
|
class |
NonBreakingSpaceConvertingNode
Deprecated.
Use direct subclasses or dynamic proxies instead.
Use either direct subclasses of the appropriate node and set them on the
|
Modifier and Type | Method and Description |
---|---|
Node |
AbstractNodeDecorator.getParent()
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
void |
AbstractNodeDecorator.setParent(Node node)
Deprecated.
|
Modifier and Type | Class and Description |
---|---|
class |
AbstractNode
The concrete base class for all types of nodes (tags, text, remarks).
|
class |
RemarkNode
The remark tag is identified and represented by this class.
|
class |
TagNode
TagNode represents a generic tag.
|
class |
TextNode
Normal text in the HTML document is represented by this class.
|
Modifier and Type | Field and Description |
---|---|
protected Node |
AbstractNode.parent
The parent of this node.
|
Modifier and Type | Method and Description |
---|---|
Node |
AbstractNode.getParent()
Get the parent of this node.
|
Modifier and Type | Method and Description |
---|---|
void |
AbstractNode.setParent(Node node)
Sets the parent of this node.
|
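The `parent` field with its getter and setter is what lets code walk from any node up to the root of the tree. A minimal sketch of that walk, assuming a stand-in type (`TreeNodeSketch` is hypothetical and not the org.htmlparser AbstractNode):

```java
// Stand-in node type for illustration only; not the org.htmlparser AbstractNode.
class TreeNodeSketch {
    private final String name;
    private TreeNodeSketch parent; // null while the node has no parent

    TreeNodeSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    TreeNodeSketch getParent() {
        return parent;
    }

    void setParent(TreeNodeSketch node) {
        parent = node;
    }

    // Follow parent links until a node without a parent is reached.
    static TreeNodeSketch rootOf(TreeNodeSketch node) {
        while (node.getParent() != null) {
            node = node.getParent();
        }
        return node;
    }

    // Count the parent links between the node and its root.
    static int depthOf(TreeNodeSketch node) {
        int depth = 0;
        for (TreeNodeSketch p = node.getParent(); p != null; p = p.getParent()) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        TreeNodeSketch html = new TreeNodeSketch("html");
        TreeNodeSketch body = new TreeNodeSketch("body");
        TreeNodeSketch p = new TreeNodeSketch("p");
        body.setParent(html);
        p.setParent(body);
        System.out.println(rootOf(p).getName()); // prints html
        System.out.println(depthOf(p));          // prints 2
    }
}
```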
Modifier and Type | Field and Description |
---|---|
protected Node |
HtmlTreeModel.mRoot
The root Node. |
Modifier and Type | Method and Description |
---|---|
boolean |
AndFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
StringFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
OrFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
NotFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
TagNameFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
RegexFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
HasAttributeFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
HasChildFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
HasSiblingFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
NodeClassFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
boolean |
HasParentFilterWrapper.accept(Node node)
Predicate to determine whether or not to keep the given node.
|
protected void |
HasAttributeFilterWrapper.addAttributes(java.util.Set set,
Node node)
Add the attribute names from the node to the set of attribute names.
|
protected void |
HasAttributeFilterWrapper.addAttributeValues(java.util.Set set,
Node node)
Add the attribute values from the node to the set of attribute values.
|
protected void |
TagNameFilterWrapper.addName(java.util.Set set,
Node node)
Add the tag name and its children's tag names to the set of tag names.
|
Modifier and Type | Method and Description |
---|---|
protected void |
XMLReader.doSAX(Node node)
Process nodes recursively on the DocumentHandler.
|
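Processing nodes recursively on a handler, as `doSAX` is described above, amounts to a depth-first walk that reports a start event, then the children, then an end event. The sketch below illustrates only that shape. `SaxDemoNode` and `SaxWalkSketch` are hypothetical stand-ins, and events are recorded as strings rather than sent to a real SAX handler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in node type for illustration only; not an org.htmlparser class.
class SaxDemoNode {
    final String tagName; // null for a text node
    final String text;    // null for a tag node
    final List<SaxDemoNode> children = new ArrayList<>();

    private SaxDemoNode(String tagName, String text) {
        this.tagName = tagName;
        this.text = text;
    }

    static SaxDemoNode tag(String name, SaxDemoNode... kids) {
        SaxDemoNode node = new SaxDemoNode(name, null);
        node.children.addAll(Arrays.asList(kids));
        return node;
    }

    static SaxDemoNode text(String text) {
        return new SaxDemoNode(null, text);
    }
}

class SaxWalkSketch {
    // Record start/characters/end events in depth-first document order.
    static void walk(SaxDemoNode node, List<String> events) {
        if (node.tagName == null) {
            events.add("characters:" + node.text);
            return;
        }
        events.add("start:" + node.tagName);
        for (SaxDemoNode child : node.children) {
            walk(child, events);
        }
        events.add("end:" + node.tagName);
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        walk(SaxDemoNode.tag("p", SaxDemoNode.text("hi"), SaxDemoNode.tag("b")), events);
        System.out.println(events);
        // prints [start:p, characters:hi, start:b, end:b, end:p]
    }
}
```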
Modifier and Type | Method and Description |
---|---|
protected void |
CompositeTagScanner.addChild(Tag parent,
Node child)
Add a child to the given tag.
|
Modifier and Type | Class and Description |
---|---|
class |
AppletTag
AppletTag represents an <Applet> tag.
|
class |
BaseHrefTag
BaseHrefTag represents a <Base> tag.
|
class |
BodyTag
A Body Tag.
|
class |
Bullet
A bullet tag.
|
class |
BulletList
A bullet list tag.
|
class |
CompositeTag
The base class for tags that have an end tag.
|
class |
Div
A div tag.
|
class |
DoctypeTag
The HTML Document Declaration Tag can identify <!DOCTYPE> tags.
|
class |
FormTag
Represents a FORM tag.
|
class |
FrameSetTag
Identifies a frame set tag.
|
class |
FrameTag
Identifies a frame tag.
|
class |
HeadTag
A head tag.
|
class |
Html
An HTML tag.
|
class |
ImageTag
Identifies an image tag.
|
class |
InputTag
An input tag in a form.
|
class |
JspTag
The JSP/ASP tags like <%...%> can be identified by this class.
|
class |
LabelTag
A label tag.
|
class |
LinkTag
Identifies a link tag.
|
class |
MetaTag
A meta tag.
|
class |
ObjectTag
ObjectTag represents an <Object> tag.
|
class |
OptionTag
An option tag within a form.
|
class |
ScriptTag
A script tag.
|
class |
SelectTag
A select tag within a form.
|
class |
Span
A span tag.
|
class |
StyleTag
A StyleTag represents a <style> tag.
|
class |
TableColumn
A table column tag.
|
class |
TableHeader
A table header tag.
|
class |
TableRow
A table row tag.
|
class |
TableTag
A table tag.
|
class |
TextareaTag
A text area tag within a form.
|
class |
TitleTag
A title tag.
|
Modifier and Type | Method and Description |
---|---|
Node |
CompositeTag.childAt(int index)
Get the child at the given index.
|
Node |
CompositeTag.getChild(int index)
Get the child of this node at the given position.
|
Node[] |
CompositeTag.getChildrenAsNodeArray()
Get the children as an array of Node objects. |
Modifier and Type | Method and Description |
---|---|
int |
CompositeTag.findPositionOf(Node searchNode)
Returns the node number of a child node given the node object.
|
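Looking up a child by index and finding the index of a given child are inverse operations on the same child list. A minimal sketch of the pair, assuming a stand-in container (`ChildListSketch` is hypothetical and not the org.htmlparser CompositeTag; comparing by identity and returning -1 for a missing child are choices made for this sketch):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a tag that owns children; not the org.htmlparser CompositeTag.
class ChildListSketch {
    private final List<Object> children = new ArrayList<>();

    void addChild(Object child) {
        children.add(child);
    }

    // Return the child stored at the given position.
    Object childAt(int index) {
        return children.get(index);
    }

    // Return the position of the given child, compared by identity, or -1 if absent.
    int findPositionOf(Object searchNode) {
        for (int i = 0; i < children.size(); i++) {
            if (children.get(i) == searchNode) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ChildListSketch tag = new ChildListSketch();
        Object first = new Object();
        Object second = new Object();
        tag.addChild(first);
        tag.addChild(second);
        System.out.println(tag.findPositionOf(second)); // prints 1
    }
}
```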
Modifier and Type | Method and Description |
---|---|
Node |
NodeList.elementAt(int i) |
static Node[] |
ParserUtils.findTypeInNode(Node node,
java.lang.Class type)
Search given node and pick up any objects of given type.
|
Node |
SimpleNodeIterator.nextNode()
Get the next node.
|
Node |
IteratorImpl.nextNode()
Get the next node.
|
Node |
NodeIterator.nextNode()
Get the next node.
|
Node |
NodeList.remove(int index) |
Node[] |
NodeList.toNodeArray() |
Modifier and Type | Method and Description |
---|---|
void |
NodeList.add(Node node) |
void |
NodeList.copyToNodeArray(Node[] array) |
static Node[] |
ParserUtils.findTypeInNode(Node node,
java.lang.Class type)
Search given node and pick up any objects of given type.
|
void |
NodeList.prepend(Node node)
Insert the given node at the head of the list.
|
Constructor and Description |
---|
NodeList(Node node)
Create a one element node list.
|
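Several of the `NodeList` entries above carry no description, so the following sketch shows how such list operations typically relate to each other. It assumes a stand-in class backed by `java.util.ArrayList` and holding strings. `NodeListSketch` is hypothetical, and the real NodeList may differ in bounds handling and storage.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in list for illustration only; not the org.htmlparser NodeList.
class NodeListSketch {
    private final List<String> nodes = new ArrayList<>();

    // Create a one-element list.
    NodeListSketch(String node) {
        nodes.add(node);
    }

    // Append at the tail of the list.
    void add(String node) {
        nodes.add(node);
    }

    // Insert at the head of the list.
    void prepend(String node) {
        nodes.add(0, node);
    }

    String elementAt(int i) {
        return nodes.get(i);
    }

    // Remove the element at the index and return it.
    String remove(int index) {
        return nodes.remove(index);
    }

    // Copy the current contents into a new array.
    String[] toNodeArray() {
        return nodes.toArray(new String[0]);
    }

    public static void main(String[] args) {
        NodeListSketch list = new NodeListSketch("b");
        list.add("c");
        list.prepend("a");
        System.out.println(String.join(",", list.toNodeArray())); // prints a,b,c
    }
}
```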
Modifier and Type | Method and Description |
---|---|
Node[] |
ObjectFindingVisitor.getTags() |
Node[] |
TagFindingVisitor.getTags(int index) |
HTML Parser is an open source library released under LGPL.