org.apache.jmeter.protocol.http.parser
Class HTMLParser
public abstract class HTMLParser
HtmlParsers can parse HTML content to obtain URLs.
HTMLParser() - Protected constructor to prevent instantiation except from within subclasses.
Iterator | getEmbeddedResourceURLs(byte[] html, URL baseUrl) - Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, javascript files, applets, etc...
Iterator | getEmbeddedResourceURLs(byte[] html, URL baseUrl, Collection coll) - Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, javascript files, applets, etc...
abstract Iterator | getEmbeddedResourceURLs(byte[] html, URL baseUrl, URLCollection coll) - Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, javascript files, applets, etc...
static HTMLParser | getParser()
static HTMLParser | getParser(String htmlParserClassName)
protected boolean | isReusable() - Parsers should override this method if the parser class is reusable, in which case the class will be cached for the next getParser() call.
ATT_BACKGROUND
protected static final String ATT_BACKGROUND
ATT_HREF
protected static final String ATT_HREF
ATT_IS_IMAGE
protected static final String ATT_IS_IMAGE
ATT_REL
protected static final String ATT_REL
ATT_SRC
protected static final String ATT_SRC
ATT_STYLE
protected static final String ATT_STYLE
ATT_TYPE
protected static final String ATT_TYPE
DEFAULT_PARSER
public static final String DEFAULT_PARSER
PARSER_CLASSNAME
public static final String PARSER_CLASSNAME
STYLESHEET
protected static final String STYLESHEET
TAG_APPLET
protected static final String TAG_APPLET
TAG_BASE
protected static final String TAG_BASE
TAG_BGSOUND
protected static final String TAG_BGSOUND
TAG_EMBED
protected static final String TAG_EMBED
TAG_FRAME
protected static final String TAG_FRAME
TAG_IMAGE
protected static final String TAG_IMAGE
TAG_INPUT
protected static final String TAG_INPUT
TAG_LINK
protected static final String TAG_LINK
TAG_SCRIPT
protected static final String TAG_SCRIPT
HTMLParser
protected HTMLParser()
Protected constructor to prevent instantiation except from within
subclasses.
getEmbeddedResourceURLs
public Iterator getEmbeddedResourceURLs(byte[] html,
URL baseUrl)
throws HTMLParseException
Get the URLs for all the resources that a browser would automatically
download following the download of the HTML content, that is: images,
stylesheets, javascript files, applets, etc...
URLs should not appear twice in the returned iterator.
Malformed URLs can be reported to the caller by having the Iterator
return the corresponding URL String. Overall problems parsing the HTML
should be reported by throwing an HTMLParseException.
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
Returns: an Iterator for the resource URLs
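The contract above (no duplicates, malformed references handed back as Strings) can be illustrated with a minimal, standalone sketch. It does not use JMeter classes: `EmbeddedUrlSketch`, `resolve`, `demo`, and the example.com references are hypothetical names chosen for illustration, and the list of raw references stands in for whatever a real parser would extract from the HTML.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of JMeter.
class EmbeddedUrlSketch {

    // Resolves each raw reference against baseUrl and drops duplicates.
    // Well-formed references are returned as URL objects, malformed ones
    // as the original String, mirroring the contract described above.
    @SuppressWarnings("deprecation") // URL constructors are deprecated on recent JDKs
    static Iterator<Object> resolve(URL baseUrl, List<String> rawRefs) {
        Set<String> seen = new HashSet<>();      // keyed on text to avoid URL.hashCode() host lookups
        List<Object> results = new ArrayList<>();
        for (String ref : rawRefs) {
            Object entry;
            try {
                entry = new URL(baseUrl, ref);   // relative references resolve against the base
            } catch (MalformedURLException e) {
                entry = ref;                     // report the problem by returning the String
            }
            if (seen.add(entry.toString())) {
                results.add(entry);
            }
        }
        return results.iterator();
    }

    // Runs resolve() on a fixed input and returns each result as text.
    @SuppressWarnings("deprecation")
    static List<String> demo() {
        URL base;
        try {
            base = new URL("http://example.com/dir/page.html");
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
        List<String> refs = Arrays.asList("img/a.png", "/style.css", "img/a.png", "bogus://x");
        List<String> out = new ArrayList<>();
        Iterator<Object> it = resolve(base, refs);
        while (it.hasNext()) {
            out.add(it.next().toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        // http://example.com/dir/img/a.png
        // http://example.com/style.css
        // bogus://x
        demo().forEach(System.out::println);
    }
}
```

Because the iterator may yield either a URL or a String, a caller would need to check the type of each element before using it.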
getEmbeddedResourceURLs
public Iterator getEmbeddedResourceURLs(byte[] html,
URL baseUrl,
Collection coll)
throws HTMLParseException
Get the URLs for all the resources that a browser would automatically
download following the download of the HTML content, that is: images,
stylesheets, javascript files, applets, etc...
N.B. The Iterator returns URLs, but the Collection will contain objects
of class URLString.
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - Collection - will contain URLString objects, not URLs
Returns: an Iterator for the resource URLs
getEmbeddedResourceURLs
public abstract Iterator getEmbeddedResourceURLs(byte[] html,
URL baseUrl,
URLCollection coll)
throws HTMLParseException
Get the URLs for all the resources that a browser would automatically
download following the download of the HTML content, that is: images,
stylesheets, javascript files, applets, etc...
All URLs should be added to the Collection.
Malformed URLs can be reported to the caller by having the Iterator
return the corresponding URL String. Overall problems parsing the HTML
should be reported by throwing an HTMLParseException.
N.B. The Iterator returns URLs, but the Collection will contain objects
of class URLString.
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - URLCollection
Returns: an Iterator for the resource URLs
getParser
public static final HTMLParser getParser()
getParser
public static final HTMLParser getParser(String htmlParserClassName)
isReusable
protected boolean isReusable()
Parsers should override this method if the parser class is reusable, in
which case the class will be cached for the next getParser() call.
Returns: true if the Parser is reusable
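How isReusable() might interact with getParser(String) can be sketched as follows. This is an assumption-laden illustration of the caching idea, not JMeter's actual implementation: `ReusableParserSketch`, its nested `Parser`, `StatelessParser`, and `StatefulParser` classes, and the cache map are all hypothetical stand-ins, and the default return value of `false` is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of JMeter.
class ReusableParserSketch {

    // Stand-in for an abstract parser base class.
    abstract static class Parser {
        // Assumed default: instances are not shared between calls.
        protected boolean isReusable() {
            return false;
        }
    }

    // A parser with no per-call state opts in to caching.
    static class StatelessParser extends Parser {
        @Override
        protected boolean isReusable() {
            return true;
        }
    }

    // A parser that keeps the default is created afresh every time.
    static class StatefulParser extends Parser {
    }

    private static final Map<String, Parser> CACHE = new HashMap<>();

    // Returns a cached instance when one exists; otherwise instantiates the
    // named class and caches it only if it reports itself as reusable.
    static synchronized Parser getParser(String className) {
        Parser cached = CACHE.get(className);
        if (cached != null) {
            return cached;
        }
        Parser parser;
        try {
            parser = (Parser) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create parser " + className, e);
        }
        if (parser.isReusable()) {
            CACHE.put(className, parser);
        }
        return parser;
    }

    public static void main(String[] args) {
        String reusable = StatelessParser.class.getName();
        String fresh = StatefulParser.class.getName();
        System.out.println(getParser(reusable) == getParser(reusable)); // true
        System.out.println(getParser(fresh) == getParser(fresh));       // false
    }
}
```

Under this design a subclass should return true only if its instances hold no state that could leak between parse calls.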
Copyright © 1998-2010 Apache Software Foundation. All Rights Reserved.