#!/usr/bin/perl
# -*- coding: ascii -*-
###########################################################################
# clive, command line video extraction utility.
# Copyright 2007, 2008, 2009 Toni Gundogdu.
#
# This file is part of clive.
#
# clive is free software: you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
# clive is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
# details.
#
# You should have received a copy of the GNU General Public License along
# with this program. If not, see <http://www.gnu.org/licenses/>.
###########################################################################
use warnings;
use strict;

binmode( STDOUT, ":utf8" );
binmode( STDERR, ":utf8" );

use clive::App;

clive::App->main;

__END__

=head1 NAME

clive - command line video extraction utility

=head1 SYNOPSIS

clive [options]... [URL]...

=head1 DESCRIPTION

clive is a command line video extraction utility for Youtube and other similar
video-sharing websites. It was written to work around the Adobe Flash plugin
requirement as the technology is poorly supported on Unix-like systems.

clive is not a universal video extraction utility. In fact, it supports only
a number of video websites. Each website typically exposes access to the video
content in a very different way, meaning that clive has to be customized for
each website in order to download any videos from them.

=head1 OPTIONS

 -h, --help                     print help and exit
 -v, --version                  print version and exit
     --hosts                    print supported hosts and exit
     --upgrade-config           upgrade 2.0/2.1 config to 2.2+ format
 -l, --last                     recall last input
     --last-file=FILE           read/write FILE instead of default path
Output Options:
     --emit-csv                 emit video details in CSV to stdout
     --debug                    print cURL debug messages
 -q, --quiet                    turn off all output
     --stderr                   redirect all output to stderr
     --print-fname              print filename before download starts
HTTP Options:
     --agent=STRING             identify as STRING to http server
     --connect-timeout=SECS     max time allowed connection to take
     --connect-timeout-socks=S  same as above, tries to work around SOCKS
     --proxy=ADDR               use ADDR for http proxy
     --no-proxy                 disable all use of http proxy
     --cookie-jar=FILE          enable cookies, write them to FILE
Cache Options:
     --cache-file=FILE          read/write FILE instead of default path
 -r, --cache-read               enable reading from cache
 -d, --cache-dump               dump cache records to stdout
     --cache-dump-format=STRING print format for dumping cache records
     --cache-grep=PATTERN       grep cache records for PATTERN
 -i, --cache-ignore-case        ignore case while matching records
 -D, --cache-remove-record      remove matched records from cache
     --cache-clear              truncate cache records
     --no-cache                 disable all (read and write) cache use
Download Options:
 -f, --format=FORMAT            extract FORMAT of video
 -O, --output-file=FILE         write video to file
 -c, --continue                 continue partially downloaded file
 -n, --no-extract               do not extract any videos
     --save-dir=DIR             save video files to DIR
     --cclass=CLASS             use character CLASS to filter titles
 -C, --no-cclass                do not apply character class
     --filename-format=STRING   use STRING to format output filename
     --exec=CMD                 command to run when transfer finishes
 -e, --exec-run                 invoke command defined with --exec
     --stream-exec=CMD          stream command to be invoked
     --stream=PERCENT           invoke --stream-exec when transfer reaches %
 -s, --stream-pass              pass video link to --stream-exec command
     --limit-rate=AMOUNT        limit transfer rate to AMOUNT (KB/s)
     --stop-after=SIZE|PERCENT  stop file transfer after SIZE or PERCENT

=head1 OPTION SYNTAX

You may freely mix different option styles and specify options after
the command line arguments, e.g.:
  % clive -c URL --format=best

You may also put several options together that do not require arguments:
  % clive -cnrf best URL

Note that the "dashed" options have aliases. For example:
  % clive --no-extract --no_extract --noextract
  % clive --cache-read --cache_read --cacheread

=head1 OPTION DESCRIPTIONS

=over 4

=item B<-h, --help>

Print help and exit.

=item B<-v, --version>

Print version and exit.

=item B<--hosts>

Print supported hosts with available formats and exit.

=item B<--upgrade-config>

Upgrade clive 2.0/2.1 config to current 2.2+ format and exit.

=item B<-l, --last>

Re-feed the video page links that were fed to clive on the previous run.

=item B<--last-file>=I<path>

Use I<path> instead of the default path. See also L</FILES>.

=back

B<Output options>

=over 4

=item B<--emit-csv>

Print (or emit) video details in CSV format to standard output.
Implies --no-extract.

=item B<--print-fname>

Print filename on a separate line before download starts.

=item B<--debug>

Print cURL debug (or verbose) messages to standard error.

=item B<-q, --quiet>

Turn off all output to standard output and error.

=item B<--stderr>

Direct all output to standard error.

=back

B<HTTP Options>

=over 4

=item B<--agent>=I<string>

Identify clive as I<string> to HTTP servers. Defaults to "Mozilla/5.0".

=item B<--connect-timeout>=I<seconds>

Maximum time in I<seconds> that the connection is allowed to take.
Defaults to 30.

=item B<--connect-timeout-socks>=I<seconds>

Same as above but tries to work around the SOCKS proxy bug in cURL.
Defaults to 30.

=item B<--proxy>=I<address>

Use I<address> for HTTP proxy. Example: "http://foo:1234".

=item B<--no-proxy>

Disable all use of HTTP proxy, even if the http_proxy environment variable
is set.

=item B<--cookie-jar>=I<file>

Enable cookies, which are otherwise rejected by default, and have libcurl
write them to I<file>. Specify "-" to have the cookies written to stdout
instead.

=back

B<Cache Options>

=over 4

=item B<--cache-file>=I<path>

Use I<path> instead of the default path. See L</FILES>.

=item B<-r, --cache-read>

Read video details from cache record if it exists. Allows clive to
skip video page fetching and parsing again. See L</CACHE> section for more
on this.

=item B<-d, --cache-dump>

Dump cache records to standard output.

=item B<--cache-dump-format>=I<format-string>

Used to format the output of the above. Defaults to "%n: %t [%f, %mMB]".

Example:
  % clive --cache-dump --cache-dump-format="%d: %t"

Supported format specifiers:
  %t .. video page title
  %i .. video id
  %h .. video host
  %l .. video file length (bytes)
  %m .. video file length (MB)
  %d .. date (last update)
  %T .. time (last update)
  %s .. time stamp (same as "%d %T")
  %f .. video file format
  %n .. index

=item B<--cache-grep>=I<pattern>

Grep stored cache records for I<pattern>. See also L</EXAMPLES - ADVANCED USE>.

=item B<-i, --cache-ignore-case>

Ignore case differences while matching records.

=item B<-D, --cache-remove-record>

Remove matched records from cache.

=item B<--cache-clear>

Truncate cache records.

=item B<--no-cache>

Disable all (read and write) cache use.

=back

B<Download Options>

=over 4

=item B<-f, --format>=I<format>

Download the I<format> of the video. If I<format> is set to I<best>, clive
will attempt to download the best quality of the video.

Note that the I<format> is strictly host specific. See the L</FORMATS>
section for more on this.

=item B<-n, --no-extract>

Do not extract the video. In other words: simulate only to the point
that clive verifies the video link after fetching and parsing the
video page.

=item B<-O, --output-file>=I<file>

Write video to I<file>. Overwrites an already existing file.

Do not use this option when you are downloading more than one video
in one go.

See also the note under B<--continue> below.

=item B<-c, --continue>

Continue partially downloaded video file.

Note that, by default, clive appends a numeric suffix to the filename
if the file exists already. That is unless:

  * file is already completely retrieved, or:
  * -c or -O is used

=item B<--save-dir>=I<dir>

Save extracted videos to I<dir>. clive defaults to the current working
directory.

=item B<--cclass>=I<class>

Use character-I<class> to filter video page titles. Defaults to "\w".
This is a Perl regular expression character class. For example:
"[A-Za-z0-9]".
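
As an illustration only, the effect of a class such as "[A-Za-z0-9_]" can
be approximated outside of clive with C<tr(1)> (in POSIX sh terms). The
sketch assumes that filtering means "keep only the characters that match
the class". The function name C<filter_title> is hypothetical, and C<tr(1)>
stands in here for the Perl regular expression:

  ```shell
  # Hypothetical helper, not part of clive: keep only the ASCII letters,
  # digits and underscores of a title and drop everything else.
  filter_title() {
      printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9_'
  }
  ```

With this sketch, C<filter_title 'Funny Football Clips!'> prints
"FunnyFootballClips". Unlike "\w" in Perl, which may also match non-ASCII
word characters, the sketch drops every non-ASCII byte.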

=item B<-C, --no-cclass>

Disables the use of B<--cclass>. Causes clive to use the video page
title as it is for output filename.

=item B<--filename-format>=I<format-string>

Use I<format-string> to format output video filenames. Default is "%t.%s".

Supported format specifiers:
  %t .. video page title (after applying character-class filter)
  %s .. video file suffix (e.g. "flv")
  %i .. video id
  %h .. video host
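
The specifiers above can be pictured as a plain left-to-right substitution.
The following is only a sketch (in POSIX sh terms), not clive's own code.
The function name C<format_filename>, its argument order and the sample
values are made up for illustration:

  ```shell
  # Hypothetical helper: format_filename FORMAT TITLE SUFFIX ID HOST
  # Walks FORMAT from left to right and expands %t, %s, %i and %h.
  # Any other character is copied to the result unchanged.
  format_filename() {
      fmt=$1
      out=
      while [ -n "$fmt" ]; do
          case "$fmt" in
              %t*) out=$out$2; fmt=${fmt#%t} ;;
              %s*) out=$out$3; fmt=${fmt#%s} ;;
              %i*) out=$out$4; fmt=${fmt#%i} ;;
              %h*) out=$out$5; fmt=${fmt#%h} ;;
              *)   rest=${fmt#?}               # everything after char 1
                   out=$out${fmt%"$rest"}      # append char 1 itself
                   fmt=$rest ;;
          esac
      done
      printf '%s\n' "$out"
  }
  ```

With the default format, C<format_filename '%t.%s' My_Video flv ID HOST>
prints "My_Video.flv".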

=item B<--exec>=I<command>;

Defines the I<command> to run when video file transfer completes.
Note that B<--exec-run> must be used to actually cause clive
to invoke the defined I<command>.

Optional arguments may be passed to the command. The expression must be
terminated by a semicolon (";"). If the specifier "%i" appears anywhere
in the I<command>, it is replaced by the pathname of the extracted
video file.

=item B<--exec>=I<command>+

Same as above but "%i" is replaced with as many path names as
possible for the invocation of I<command>.

=item B<-e, --exec-run>

Causes clive to invoke the command defined with B<--exec> when
transfer finishes.

=item B<--stream-exec>=I<command>

Define the command to be invoked with B<--stream> and B<--stream-pass>.
If a "%i" specifier is used in the I<command>, it will be replaced with
either video file path name (B<--stream>) or parsed video link
(B<--stream-pass>).

=item B<--stream>=I<percent>

Causes clive to fetch, parse, start download and eventually invoke
the command defined with B<--stream-exec> when the transfer reaches
the percentage defined with this option. See also L</EXAMPLES - ADVANCED USE>.

Note that clive does nothing to check if there is enough data buffered
before invoking the B<--stream-exec> defined I<command>. If the transfer
rate drops significantly after starting the process and it runs out of data,
clive does nothing to fix this.

Also note that clive will not continue to download another file before the
child process exits.

This mode is supported for historical reasons. Consider using
B<--stream-pass> instead.

=item B<-s, --stream-pass>

Otherwise the same as above but B<instead of> starting the download,
clive passes the parsed video link to the command defined with
B<--stream-exec>. See also L</EXAMPLES - ADVANCED USE>.

This option was inspired by a C<clive(1)> wrapper script contributed
by Bill Squire.

=item B<--limit-rate>=I<amount>

Limit transfer rate to I<amount> KB/s.

=item B<--stop-after>=I<size|percent>

Stop file transfer after I<size> or I<percent>. The value must
be terminated by either '%' or 'M'.
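
A wrapper script may want to check the value before handing it to clive.
The following sketch (in POSIX sh terms) accepts only digits followed by
'%' or 'M'. The function name C<is_stop_after_value> is hypothetical, and
the restriction to whole numbers is an assumption of the sketch, not
something this manual states:

  ```shell
  # Hypothetical helper: succeed only if $1 looks like "25%" or "10M".
  is_stop_after_value() {
      number=${1%[%M]}                 # strip one trailing '%' or 'M'
      [ "$number" != "$1" ] || return 1    # no valid terminator found
      case "$number" in
          ''|*[!0-9]*) return 1 ;;     # empty or not all digits
      esac
      return 0
  }
  ```

For example, "10M" and "25%" pass the check, while "10" and "10MB" do not.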

=back

=head1 EXAMPLES - BASIC USE

=over 4

=item clive "http://youtube.com/watch?v=3HD220e0bx4"

Extracts video (flv) from the above video page link. You
can then play the flv video file in a media player.

=item cat E<gt> url.lst

  http://en.sevenload.com/videos/IUL3gda-Funny-Football-Clips
  http://youtube.com/watch?v=3HD220e0bx4
  http://break.com/index/beach-tackle-whip-lash.html
  http://www.liveleak.com/view?i=704_1228511265

=item cat url.lst | clive

You can feed clive multiple video page links through a pipe like this.
Be sure to separate each link with a newline.

=item clive URL1 URL2 URL3 URL4

You can also pass the links as command line arguments.

=item xclip -o | clive

There are many X clipboard utilities. The above example uses C<xclip(1)>
and a pipe to paste (or feed) the contents to clive.

=item clive -l

Recall the last video page link input, regardless of how the links were fed
to clive.

=back

=head1 EXAMPLES - ADVANCED USE

=over 4

=item clive -f best "http://youtube.com/watch?v=3HD220e0bx4"

Extract the best format of the video.

=item clive -r -f best "http://youtube.com/watch?v=3HD220e0bx4"

Same as above but read the cache record without fetching and
parsing the video page again.

=item clive --cache-dump

Dump all cache records to stdout. You can use --cache-dump-format
to format the output.

=item clive -ig 3hd2

Grep for "3hd2" pattern in cache records. If pattern matches, clive
continues to extract the matched videos. Note the use of "-i"
(--cache-ignore-case).

=item clive -ig 3hd2 -D

Same as above but removes the record from cache instead of extracting
the video.

=item clive --exec="ffmpeg -i %i -acodec libvorbis %i.ogg;" -e URL

Copy audio from downloaded video to ogg with C<ffmpeg(1)>.

=item clive --stream-exec="mplayer -really-quiet %i" --stream=25 URL

Start playing the video being extracted when the transfer reaches
25% complete.

=item echo '--stream-exec="mplayer -really-quiet %i"' >> ~/.cliverc

=item clive -s URL

Alternative to Adobe Flash. C<vlc(1)> and C<totem(1)> have been reported
to work also.

=back

=head1 FORMATS

clive downloads "flv" by default from all of the supported websites.

=over 4

=item B<youtube.com>

=item B<last.fm>

Format: (flv|fmt17|fmt18|fmt22|fmt35)

If the --format option is not used, clive defaults to whatever
Youtube defaults to. Technically speaking, clive leaves the "&fmt="
parameter out of the video link.

Youtube likes to rehash these from time to time so don't be
surprised if, for example, the quality is not what you expected.
The same applies to the suffixes listed below.

 YoutubeID Alias    Suffix  Resolution
 fmt22     hd       mp4     1280x720
 fmt35     hq       flv      640x380
 fmt18     mp4      mp4      480x360
 fmt34     -        flv      320x180 (quality reportedly varies)
 fmt17     3gp      3gp      176x144

You can use either the "alias" (e.g. "hd") or the "YoutubeID"
(e.g. "fmt22") with --format. The aliases exist for historical
reasons. The suffix is parsed from the content-type field of
the returned HTTP header.

clive can also download videos that last.fm lists as Youtube
hosted videos.

=item B<video.google.com>

Format: (flv|mp4)

mp4 format is available for a limited number of videos.

=item B<dailymotion.com>

Format: (flv|hq|hd)

The HD and HQ videos may not always be available.

  hd    (1280x720)
  hq     (848x480)
  flv    (320x240) aka "sd"

=item B<spiegel.de>

Format: (flv|vp6_(64|576|928)|h264_1400)

  h264_1400 .. mp4 (996x560)
  vp6_928   .. flv (996x560)
  vp6_576   .. flv (560x315)
  flv       .. flv (180x100)
  vp6_64    .. flv (180x100)

Format: (3gp|small|iphone|podcast)

The data that clive parses indicates that these formats should be available,
although we have yet to find a video that offers them.
If you find one, please let us know.

  3gp       .. 3gp (?)
  small     .. 3gp (?)
  iphone    .. mp4 (?)
  podcast   .. mp4 (?)

=item B<golem.de>

Format: (flv|high|ipod)

=item B<vimeo.com>

Format: (flv|hd)

HD should be available for the vimeo.com/hd channel videos at least.
Note that "flv" only means the "default flv". Some of the hosted
"default" videos are actually "mp4", not "flv".

For further reading:
  http://vimeo.com/help/hd

=item B<Other>

All other supported websites (see --hosts output) support
the flv format only.

=back

=head1 FILES

Should the HOME environment variable be undefined for some reason, clive will
use the current working directory instead.

=over 4

=item $HOME/.cliverc, $HOME/.clive/config, $HOME/.config/clive/config

User configuration file. For example:
  % cat >> ~/.cliverc
  -f best
  --proxy=http://foo:1234

=item $HOME/.cache/clive/last

File containing the last user input (video page links).

You can use --last-file to override the path, e.g.:
  --last-file=/path/to/last/file

You can also define this option in the config file.

See also CLIVE_CACHE notes below.

=item $HOME/.cache/clive/cache

BerkeleyDB based cache file containing the records of fetched
and parsed video pages.

You can use --cache-file to override the path, e.g.:
  --cache-file=/path/to/cache/file

You can also define this option in the config file.

See also CLIVE_CACHE notes below.

=item Notes: CLIVE_CACHE

clive uses $HOME/.cache/clive/ by default for the "last" and
"cache" files described above.

The default path can be overridden with the
CLIVE_CACHE environment variable. Note that clive
will attempt to create the specified path recursively.

Examples:
  setenv CLIVE_CACHE /home/user/cachedata (in csh terms)
  clive # will read/write /home/user/cachedata/(last|cache)

  unsetenv CLIVE_CACHE
  clive # read/write $HOME/.cache/clive/(last|cache)

  clive --last-file=mylast --cache-file=cachedata/mycache
    # read/write "mylast" file, read/write cachedata/mycache file

=back
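
The lookup order described in the notes above can be summarized with a
short sketch (in POSIX sh terms). It is an illustration, not clive's own
code. The function name C<clive_cache_dir> is hypothetical, and the sketch
assumes that the ".cache/clive" subdirectory is also used under the current
working directory when HOME is undefined. It treats an empty variable the
same as an undefined one:

  ```shell
  # Hypothetical helper: print the directory that would hold the
  # "last" and "cache" files.
  clive_cache_dir() {
      if [ -n "${CLIVE_CACHE:-}" ]; then
          # CLIVE_CACHE overrides the default path.
          printf '%s\n' "$CLIVE_CACHE"
      else
          # Fall back to the current working directory without HOME.
          printf '%s\n' "${HOME:-$PWD}/.cache/clive"
      fi
  }
  ```

Note that --last-file and --cache-file name the files directly, so they are
outside the scope of this sketch.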

=head1 CACHE

The purpose of the cache is to allow clive to skip fetching
and parsing the video page again. It does not contain any
actual video data so one should not expect to recover a deleted
video file from the cache. Only some of the parsed details
are stored as records to the cache.

By now, it should be a well known fact that the cache fails
with some of the supported hosts. For example, Youtube video links
expire after some time, which causes the re-extraction to fail if
the cached video link is used again later.

This was the main reason why reading from cache was disabled by
default in 2.2.0. Many users had previously reported the reuse of
expired video links as a bug, even though the manual page explained
that most of the HTTP 403/404 errors were actually caused by
expired video links.

It is, of course, still possible to read from cache. You can
enable this by invoking the --cache-read option. This causes
clive to look up a saved cache record and reuse the stored
video details if they are found instead of fetching the video
page.

The use of the cache can be disabled with the --no-cache option.
This disables both read and write. Note that if the BerkeleyDB Perl
module is not installed, clive will not use the cache.

See also the --cache-grep option.

=head1 UNICODE

Q: Why am I seeing mangled video filenames?

A: Make sure you have set an appropriate locale. For example (in csh/urxvt terms):
  % setenv LANG en_US.UTF-8
  % urxvt &

You can get a list of supported locales on your typical Unix-like system with:
  % locale -a

=head1 DEBUGGING

Some tips that we have found useful:

  % clive --debug URL

Causes B<libcurl> to be verbose.

  % clive -n URL

Simulates video extraction only.

=head1 BUGS

Sure to be some.

Please report them:
  <http://code.google.com/p/clive/issues/>

=head1 EXIT STATUS

clive exits 0 on success, and E<gt>0 if an error occurs.

 CLIVE_OK          = 0
 CLIVE_NOTHINGTODO = 1    # file already retrieved
 CLIVE_NOSUPPORT   = 2    # host not supported
 CLIVE_READ        = 3    # file open/read error
 CLIVE_GREP        = 4    # grep: nothing matched in cache
 CLIVE_OPTARG      = 5    # invalid option argument
 CLIVE_SYSTEM      = 6    # system call failed (e.g. fork)
 CLIVE_REGEXP      = 7    # regexp pattern matching failed
 CLIVE_FORMAT      = 8    # requested format unavailable
 CLIVE_NET         = 9    # network error
 CLIVE_STOP        = 10   # --stop-after
 CLIVE_MARKEDBROKEN = 11  # support marked broken
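
Scripts that wrap clive may find it handy to turn the numeric status back
into one of the names above. A minimal sketch (in POSIX sh terms) follows.
The function name C<clive_status_name> and the "UNKNOWN" fallback are
hypothetical and not part of clive:

  ```shell
  # Hypothetical helper: translate a clive exit status into the
  # symbolic name from the table above.
  clive_status_name() {
      case "$1" in
          0)  echo CLIVE_OK ;;
          1)  echo CLIVE_NOTHINGTODO ;;
          2)  echo CLIVE_NOSUPPORT ;;
          3)  echo CLIVE_READ ;;
          4)  echo CLIVE_GREP ;;
          5)  echo CLIVE_OPTARG ;;
          6)  echo CLIVE_SYSTEM ;;
          7)  echo CLIVE_REGEXP ;;
          8)  echo CLIVE_FORMAT ;;
          9)  echo CLIVE_NET ;;
          10) echo CLIVE_STOP ;;
          11) echo CLIVE_MARKEDBROKEN ;;
          *)  echo UNKNOWN ;;
      esac
  }
  ```

A wrapper could then report the outcome by name, e.g.:
  % clive URL; clive_status_name $?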

=head1 OTHER

Project page:
  <http://clive.googlecode.com/>

Front-end (GUI):
  <http://abby.googlecode.com/>

Development code:
  % git clone git://repo.or.cz/clive.git

=head1 HISTORY

  * Originally written in Python
  * Rewritten in Perl for 2.0.0

=head1 SEE ALSO

C<cclive(1)>

=head1 AUTHOR

Toni Gundogdu <legatvs@gmail.com>

Thanks to all those who have contributed to the project
by sending patches, reporting bugs and writing feedback.
You know who you are.

=cut
