NAME

Html2Wml - Program that can convert HTML pages to WML pages


SYNOPSIS

in a shell:

    html2wml.cgi [options] <file|url>

as a CGI:

    /cgi-bin/html2wml.cgi?url=<url>


DESCRIPTION

Html2Wml converts HTML pages to WML pages, suitable for being viewed on a Wap device. The conversion can be done either on the command line to create static WML pages or on-the-fly by calling this program as a CGI.

As of version 0.3, the resulting WML should be well-formed, and in most cases valid. This is not guarantied but it should work for most HTML pages. To be more precise, the validity of the WML depends on the quality of the input HTML. Pages created with softwares that conform to W3C standard are most likely to produce valid WML. To check your HTML pages, your can use W3C's excellent software HTML Tidy, written by Dave Raggett.


OPTIONS

Note that most of these options can be used when calling Html2Wml as a CGI. See the file form.html in the t/ directory for an example.


Conversion Options

ascii

When this option is on, named HTML entities are converted to US-ASCII using the same 7 bit approximations as Lynx. By default, this is off, so that named entities are converted into numeric entities.

collapse, nocollapse

This option tells Html2Wml to collapse redundant white space characters and empty paragraphs. This option is on by default, but you can desactivate this by using --nocollapse.

This behavior is not really standard, but the aim is to reduce the size of the output. WML pages are primarily intented for Wap devices, which usually have slow connections. The smaller the WML result is, the faster it can be downloaded. Furthermore, collapsing white spaces is the normal behavior for HTML pages.

Empty paragraphs are also collapsed (this is really not standard), but it should avoid empty screens: the display of a Wap device is usually small, and it can be annoying to scroll down a lot because of many empty lines.

compile

This option uses the WML compiler from WML Tools to convert the WML to a compact binary representation of the WML deck.

hreftmpl=template

This options sets the template that will be used to reconstruct the href links.

See Links Reconstruction for more information.

linearize, nolinearize

This options is on by default. It makes Html2Wml flattens the tables à la Lynx. I think it is better than trying to use WML tables because, contrary to HTML tables, they have extremely limited features (in particular, they can't be nested). Therefore it's quite difficult to decide what to do when you have three nested tables. Furthermore, calculations on tables are quite CPU consuming, and Wap devices are not supposed to be powerful.

nopre

This option tells Html2Wml not to use the &lt;pre&gt; tag. This is useful if you want to use the WML compiler from WML Tools 0.0.4, which doesn't recognize this tag.

srctmpl=template

This options sets the template that will be used to reconstruct the src links.

See Links Reconstruction for more information.


Card Splitting Options

max-card-size=size

This option allows you to limit the size of the generated cards. The value is given in bytes. Default is 2000 bytes, which should be small enought to be loaded on any Wap device.

card-split-threshold=size

Splitting can occur when the size of the current card is between max-card-size - card-split-threshold and max-card-size.

next-card-label=label

This option sets the label of the link that allows the user to go to the next card. Default is ``[&gt;&gt;]'' (which will be rendered as ``[>>]'').


Debugging Options

debug

This option activates the debug mode. This prints the output result in HTML with line numbering and with the result of the XML check. This option is very useful for debugging as you can use any web browser for that purpose.

xmlcheck

When this option is on, it send the WML output to XML::Parser to check its well-formedness.


FEATURES


Card Splitting

In order to match the low memory capabilities of many Wap devices, Html2Wml allows you to convert the HTML document as a WML deck that contains several cards. The upper limit size of these cards can be set using the max-card-size option. This is not a guaranty as the size is calculated in an approximated way (if you wonder why I don't do an exact calculation, it's because it would be difficult in the current architecture of Html2Wml).


Actions

Actions are a feature similar to the SSI (Server Side Includes) available on web servers like Apache. In order not to interfere with real SSI, but to keep their syntax easy to learn, it differs in very few points.

Syntax

The syntax to execute an action is:

    <!-- [action param1="value" param2='value'] -->

Note that the angle brackets are part of the syntax. Except for that point, Actions syntax is very similar to SSI syntax.

Available actions

include

Description

Includes a file in the document at the current point. Please note that Html2Wml doesn't check nor parse the file, and if the file cannot be found, will silently die (this is the same behavior as SSI).

Parameters

virtual=url

The file is get by http.

file=path

The file is read from the local disk.

Note

If you use the file parameter, an absolute path is recommend.

fsize

Description

Returns the size of a file at the current point of the document.

Parameters

You can use the same parameters as for the include action.

Examples

To include a small navigation bar:

    <!-- [include virtual="nav.wml"] -->


Links Reconstruction

This engine allows you to reconstruct the links of the HTML document being converted. It has two modes, depending upon whether Html2Wml was launched from the shell or as a CGI.

When used as a CGI, this engine will reconstructs the links of the HTML document so that all the urls will be passed to Html2Wml in order to convert the pointed files (pages or images). This is completly automatic and can't be customized for now (but I don't think it would be really useful).

When used from the shell, this engine reconstructs the links with the URL template (the parameter of the hreftmpl option). Note that absolute URLs will be left untouched. The template can be customized using the following syntax. If no template is supplied, the links will be left untouched.

Syntax

The template is a string that contains the new URL. You can interpolate parameters by simply including them in the template between curly brackets: {<em>param</em>}

If the URL contains a query part or a fragment part, they will be appended to the result of the template.

Available parameters

URL

This parameter contains the original URL from the href or src attribute.

FILENAME

This parameter contains the base name of the file.

FILEPATH

This parameter contains the leading path of the file.

FILETYPE

This parameter contains the suffix of the file.

Examples

To add a path option:

    {URL}$wap

Using Apache, you can then add a Rewrite directive so that URL ending with $wap will be redirected to Html2Wml:

    RewriteRule  ^(/.*)\$wap$  /cgi-bin/html2wml.cgi?url=$1

To change the base name of the file:

    {FILEPATH}{FILENAME}_wap{FILETYPE}

To change the extension of the file:

    {FILEPATH}{FILENAME}.wap

Note that FILETYPE contains all the extensions of the file, so its name is index.html.fr for example, FILETYPE contains ``.html.fr''.


CAVEATS

Currently, only the well-formedness of the resulting WML can be tested, not its validity.

Inverted tags (like ``<b>bold <i>italic</b></i>'') may produce unexpected results. But only bad softwares do bad stuff like this.


LINKS

Html2Wml -- HTML to WML converter

http://www.resus.univ-mrs.fr/~madingue/techie/html2wml.html

HTML Tidy

http://www.w3.org/People/Raggett/tidy

WML Tools

http://pwot.co.uk/wml/

wApua -- Wap Wml browser written in Perl/Tk

http://fsinfo.cs.uni-sb.de/~abe/wApua/

Tofoa -- Wap emulator written in Python

http://tofoa.free-system.com/

WML Browser -- Free WML browser for Linux

http://www.wmlbrowser.org/

MobiliX -- Linux-Mobile-Guide, Infrared-HOWTO

http://www.mobilix.org


ACKNOWLEDGEMENTS

Werner Heuser - for his numerous ideas, advices and his help for the debugging


AUTHOR

Sébastien Aperghis-Tramoni <madingue@resus.univ-mrs.fr>


COPYRIGHT

Html2Wml is Copyright (C)2000 Sébastien Aperghis-Tramoni.

This program is free software. You can redistribute it and/or modify it under the terms of either the Perl Artistic License or the GNU General Public License, version 2 or later.