#318232 - 16/01/2009 17:52
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
Yeah, this part:
    wget -c -O "$MOVIESAVEPATH/$NEWPREVIEWNAME" $PREVIEW1080p; PREVIEWOUT1080p=$?
    if [ $PREVIEWOUT1080p -eq 0 ]; then
is more commonly written as just:
    wget -c -O "$MOVIESAVEPATH/$NEWPREVIEWNAME" $PREVIEW1080p
    if [ $? -eq 0 ]; then
There's not really anything wrong with the way you have it, but it seems ... wasteful. There's no way to embed the wget inside the if. Well, I suppose you could do this:
    if [ `wget ....; echo $?` -eq 0 ]; then
but don't.
_________________________
Bitt Faulk
#318233 - 16/01/2009 18:14
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: wfaulk]
|
carpal tunnel
Registered: 24/12/2001
Posts: 5528
|
Implementing this in Perl would be cleaner IMO.
#318234 - 16/01/2009 18:15
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: tman]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
Absolutely. But Bruno keeps saying that he wants to stick with bash, sed, and awk.
_________________________
Bitt Faulk
#318235 - 16/01/2009 18:15
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
carpal tunnel
Registered: 24/12/2001
Posts: 5528
|
Or if it was something I needed to host remotely, I'd definitely do it in PHP. You can run PHP standalone and not as part of a webserver.
#318236 - 16/01/2009 18:20
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: wfaulk]
|
carpal tunnel
Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
|
There's no way to embed the wget inside the if. Well, I suppose you could do this: if [ `wget ....; echo $?` -eq 0 ]; then but don't.
If you only need the return code in order to drive the "if", what's wrong with: ...?
Peter
#318237 - 16/01/2009 19:05
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: wfaulk]
|
carpal tunnel
Registered: 12/11/2001
Posts: 7738
Loc: Toronto, CANADA
|
Nonono... I absolutely loathe bash. Hadn't I made that clear? What I said was that I had cygwin installed on my Windows machine and therefore bash was already there. And, importantly, that I had sourced this script, which was already in bash, and wanted to avoid having to re-write the whole thing myself.
Prior to you (Bitt) cleaning everything up, I was ready to install PHP so that I could rewrite the whole thing in PHP-CLI (still using XMLSTARLET though). Trevor, I found out last night about installing PHP for use without a web server.
If I had known that someone would take the time to pretty much re-write the whole thing for me, I would have said to feel free and do it in Perl or PHP. You guys are too nice, however, and I also wasn't trying to put anyone out. I thought maybe someone would just try it out to tell me if it was hanging for them or not, and then I'd just resign myself to doing it all from (mostly) scratch.
And Peter, with regards to the If wget... That's exactly how I'd do it in PHP, so that's what I was asking about the implementation in bash.
Anyway, just another amazing example of the empegBBS circle of friends.
Edited by hybrid8 (16/01/2009 19:06)
#318238 - 16/01/2009 19:30
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: peter]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
If you only need the return code in order to drive the "if", what's wrong with: ...?
Well, you can only check for 0 vs. not-0 that way. It will work in this case, but there's no way to test for other exit codes.
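The two styles discussed here can be put side by side. This is only a sketch, not code from the script in the thread: `fetch` is a hypothetical stand-in for the wget call that simply exits with whatever status it is given, so the example runs without network access. The meanings attached to statuses 4 and 8 follow the current GNU wget manual and are shown purely for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for the wget call: exits with the status passed in.
fetch() { return "$1"; }

# Style 1: let "if" run the command directly. Only 0 vs. non-0 is visible.
if fetch 0; then
    simple="ok"
else
    simple="failed"
fi

# Style 2: capture the exit status, then branch on specific values.
classify() {
    local status=0
    fetch "$1" || status=$?
    case $status in
        0) echo "ok" ;;
        4) echo "network failure" ;;
        8) echo "server error" ;;
        *) echo "failed with status $status" ;;
    esac
}

classify 4
```

The first form is enough when success or failure is all that matters; the second is the one to reach for when different failures need different handling.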
_________________________
Bitt Faulk
#318241 - 16/01/2009 22:11
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
Mojo
Unregistered
|
Nonono... I absolutely loathe bash. Hadn't I made that clear?
Me too. No offense to anyone who contributed to that script, but it's ugly and cryptic. I wanted a similar script after reading this thread, so I decided to write my own during my lunch break. Here it is.

Why do I like mine better? Well, it's a whole 27 lines shorter. Also, it doesn't have any external dependencies like xmlstarlet or awk or wget. And it's cross-platform; it'll run on Windows, Mac OS X, Linux, Solaris... I also find it to be much easier to read, which means much easier to edit and maintain.

What do you need? You need a Tcl interpreter. If you're on Mac OS X or Linux then you're all set because Tcl should already be installed. If you're on Windows, install Tcl: http://www.activestate.com/activetcl/downloads/

The destination folder for the trailers ($TargetDir) is set to the current working directory by default. So if you put the script in C:\whatever\trailers then you would do this:
    > cd C:\whatever\trailers
    > tclsh
    % source GetTrailers.tcl

The files are organized by each movie title being the name of a folder which contains the trailer, large & extra large poster images, and a movieinfo.xml file that contains the relevant xml data pertaining to that movie (so that you have all the good info in there for some future use).

#! /usr/bin/tclsh
# Location of the raw XML movie index.
set FeedsURL "http://www.apple.com/trailers/home/xml/current.xml"

# Download to the current directory.
set TargetDir [pwd]

# We'll use this global variable for the raw XML.
set FeedsXML ""

# And this will be for our organized movie data.
array set Movies [list]

# Load this standard Tcl package.
package require http

# Parses all relevant data for the next listed movie in the XML data, starting at the specified character index.
proc parseNextMovie {index} {
    global FeedsXML Movies

    set startIndex [string first {<movieinfo id="} $FeedsXML $index]
    if { $startIndex == -1 } {
        # There are no more movies to be parsed.
        return -1
    }

    # Look for the closing tag after the opening tag we just found.
    set endIndex [string first {</movieinfo>} $FeedsXML $startIndex]
    if { $endIndex == -1 } {
        # The record is truncated; stop parsing.
        return -1
    }
    incr endIndex 11
    set xml [string range $FeedsXML $startIndex $endIndex]

    # Parse the movie title.
    set index [string first {<title>} $xml]
    incr index 7
    set end [string first {</title>} $xml $index]
    incr end -1
    set title [cleanTitle [string range $xml $index $end]]

    # Parse the large movie poster URL.
    set index [string first {<poster><location>} $xml]
    incr index 18
    set end [string first {</location>} $xml $index]
    incr end -1
    set posterLargeURL [string range $xml $index $end]

    # Parse the extra large movie poster URL.
    set index [string first {<xlarge>} $xml]
    incr index 8
    set end [string first {</xlarge>} $xml $index]
    incr end -1
    set posterXLargeURL [string range $xml $index $end]

    # Parse the trailer URL.
    set index [string first {<preview>} $xml]
    set index [string first {">} $xml $index]
    incr index 2
    set end [string first {</} $xml $index]
    incr end -1
    set trailerURL [string range $xml $index $end]

    # Save all this info in our Movies array.
    set Movies($title) [list $xml $posterLargeURL $posterXLargeURL $trailerURL]

    # Return the ending character index for this movie within $FeedsXML.
    return $endIndex
}

# Downloads the specified movie trailer and posters.
proc downloadMovie {title} {
    global Movies ProgressBar TargetDir

    if { ![info exists Movies($title)] } {
        return
    }
    set xml [lindex $Movies($title) 0]
    set posterLargeURL [lindex $Movies($title) 1]
    set posterXLargeURL [lindex $Movies($title) 2]
    set trailerURL [lindex $Movies($title) 3]

    # Download the posters.
    # Use [catch] just in case the URLs are bad, which they would be if Apple
    # didn't provide posters for a certain movie for some reason.
    set fileToken [open temp_poster_l w]
    fconfigure $fileToken -translation binary
    catch {
        set httpToken [http::geturl $posterLargeURL -channel $fileToken]
        http::cleanup $httpToken
    }
    close $fileToken

    set fileToken [open temp_poster_xl w]
    fconfigure $fileToken -translation binary
    catch {
        set httpToken [http::geturl $posterXLargeURL -channel $fileToken]
        http::cleanup $httpToken
    }
    close $fileToken

    # Download the trailer.
    set ProgressBar -1
    set fileToken [open temp_trailer w]
    fconfigure $fileToken -translation binary
    catch {
        set httpToken [http::geturl $trailerURL -channel $fileToken -progress downloadProgress]
        http::cleanup $httpToken
    }
    close $fileToken

    # Create a new directory for our freshly downloaded movie.
    set dir $TargetDir/$title
    file mkdir $dir

    # Move all of our movie files into this directory.
    # [file extension] already includes the leading dot.
    file rename temp_poster_l $dir/poster_l[file extension $posterLargeURL]
    file rename temp_poster_xl $dir/poster_xl[file extension $posterXLargeURL]
    file rename temp_trailer $dir/[file tail $trailerURL]

    # Save the xml data pertaining to this movie as movieinfo.xml.
    set token [open $dir/movieinfo.xml w]
    puts $token $xml
    close $token
    return
}

# Callback procedure for downloads that keeps us informed of the download progress.
proc downloadProgress {token total current} {
    global ProgressBar

    # Without a known total size there is nothing to scale the bar against.
    if { $total <= 0 } {
        return
    }

    # Initiate ProgressBar if necessary.
    if { $ProgressBar < 0 } {
        set ProgressBar 0
        puts "<------------------>"
        flush stdout
    }

    # Calculate the number of progress bars that should be displayed.
    set bytesPerBar [expr { 1.0 * $total / 20 }]
    set bars [expr { int($current / $bytesPerBar) }]
    while { $ProgressBar < $bars } {
        puts -nonewline "|"
        flush stdout
        incr ProgressBar
    }
    return
}

# Replaces undesirable or incompatible characters with friendlier ones.
proc cleanTitle {title} {
    set title [string map {
        &gt; >
        &lt; <
        &quot; \"
        ” \"
        „ \"
        ‘ \"
        ’ \"
        ‚ ,
        &amp; &
        > )
        < (
        : -
        / -
        \\ -
        ? ""
        | -
        * +
    } $title]
    return $title
}

# Download the movie index.
set token [http::geturl $FeedsURL]
set FeedsXML [encoding convertfrom utf-8 [http::data $token]]
http::cleanup $token

# Loop through the XML, parsing movie data until there are no more movies to parse.
set index 0
while { $index > -1 } {
    set index [parseNextMovie $index]
}

# Let's see which movies we already have downloaded and remove them from our Movies array.
# We'll see what directories are in our $TargetDir, and assume each is the name of a movie.
foreach file [glob -directory $TargetDir -nocomplain -tails *] {
    if { [file isdirectory [file join $TargetDir $file]] } {
        if { [info exists Movies($file)] } {
            unset Movies($file)
        }
    }
}

# Now our Movies array only contains movies which haven't been downloaded yet. Let's download them one by one.
set titles [lsort -dictionary -increasing [array names Movies]]
set count 0
foreach title $titles {
    incr count
    puts "\nDownloading $count/[llength $titles] \"$title\""
    downloadMovie $title
}
Attachments
GetTrailers.tcl (194 downloads)
#318242 - 17/01/2009 00:36
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: ]
|
carpal tunnel
Registered: 12/11/2001
Posts: 7738
Loc: Toronto, CANADA
|
Umm, let me amend what I said before... I loathe bash, but tcl looks like a huge pain in the ass and makes literally no sense whatsoever to someone used to programming in a sensible programming language. I think I can easily shave 20-30 lines from the original bash script, especially by optimizing the conditionals (they're currently done in a really lame way), but that tcl version looks infinitely more complicated to maintain. No offense of course, as I imagine it's just the nature of the syntax.
Benefits of the bash/awk/xmlstarlet solution:
- Final xml metadata save file is formatted cleanly for human readability as well as programmatic processing.
- The script can handle getting HD versions of the trailers. I didn't see the ability to get anything but what was specified in the XML feed in the tcl version.
- You can delete trailers or trailer folders that have been downloaded and the script will not re-download them. That benefit is realized by storing a list of downloaded trailers in a data file. This, IMO, is a very important necessity unless you want to keep a copy of absolutely everything going forward.
- Not in the version I pasted, but easy to modify: all visual output can be hidden, making it even better suited for scheduling automated launching. I don't intend to run the script manually except while testing it initially.
I had a binary program for Windows that did a similar task, but it didn't keep track of what had already been downloaded using an external file, so it suffered the same problem as the tcl above, plus I didn't like the way it names the movies. It also didn't allow downloading any resolution like the bash solution does, nor did it save the XML data for the movies (though a version was released as a plugin for Media Portal which I think may have done that specific to that host app).
All things considered, this is a really trivial problem to solve, and my only problem was using the tools for the already existing script (bash, xmlstarlet, awk and sed), which I have only ever touched briefly in the past. I still think this implemented in Perl or PHP would be a lot easier to follow from a code perspective and could be done a lot cleaner. At this point, however, this is working fine, so I'm not concerned with redoing it. It does require the UTF-8 patch for cygwin, but it's for my own personal use anyway. Other people can feel free to take it to whatever next level they want.
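The "list of downloaded trailers in a data file" idea might look roughly like the sketch below. It is not taken from the actual script: the helper names `already_downloaded` and `mark_downloaded` are hypothetical, and the `###<id>.PREVIEW` record layout is only assumed from the pattern the script greps for. A temporary file stands in for the real database file.

```shell
#!/bin/bash
# Hypothetical helpers around a flat "already downloaded" list.
# Assumed record format: one line per trailer, "###<movie id>.PREVIEW".

already_downloaded() {   # args: movie id, database file
    [ -f "$2" ] && grep -qxF "###$1.PREVIEW" "$2"
}

mark_downloaded() {      # args: movie id, database file
    # Append a record only if it is not there yet.
    already_downloaded "$1" "$2" || printf '###%s.PREVIEW\n' "$1" >> "$2"
}

db=$(mktemp)
mark_downloaded 1234 "$db"
mark_downloaded 1234 "$db"            # second call adds nothing
record_count=$(grep -c '' "$db")      # number of lines in the file
already_downloaded 1234 "$db" && seen=yes || seen=no
already_downloaded 9999 "$db" && other=yes || other=no
rm -f "$db"
```

Because the record lives in the list rather than in the presence of a folder, deleting a trailer from disk does not cause it to be fetched again.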
#318248 - 17/01/2009 04:15
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
Mojo
Unregistered
|
Well, suit yourself. I obviously disagree that the bash script is more readable than Tcl. Also, if I used xmlstarlet and wget, I would shave about 130 lines off of that code. I could easily make it a quarter of the size of that bash script. Downloading the high-res trailers is a trivial modification.
Also, I think it's a much cleaner solution to check the folder for existing movies than to maintain a flat file. I'm not looking to delete a movie to save 20 MB of disk space, and the previews are rotated every so often anyways. If I was going to persist data, I would just go ahead and make a full-fledged sqlite database containing all of the movie info provided by Apple.
I suppose though that we are after two different solutions. You want to see new and upcoming movies on your TV. So I'm sure disk space is limited and you only want to see new stuff that you're interested in anyways and don't want to re-download trailers you've deleted because they don't interest you. I, on the other hand, would like to have the preview of every movie ever made if I could. I enjoy doing things like mirroring wikipedia.
Still, I'd take Tcl over bash for a 200 line script any day. By all means though, use what works for you.
#318252 - 17/01/2009 12:10
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: ]
|
carpal tunnel
Registered: 12/11/2001
Posts: 7738
Loc: Toronto, CANADA
|
The HD trailers are anywhere from 20-200MB each, but my disk space isn't that limited as I've got about 3TB on my media server right now. 1.5TB of that is mirrored RAID.
I will keep SOME trailers persistently, but I won't keep ALL trailers persistently. Basically this is how I'll manage the whole affair...
I will be downloading everything that gets listed in the Apple feed. This allows me to see what's coming up. I will only keep trailers for movies I have or that I would like to watch, which means some will be deleted. This allows me to review trailers for stuff that's already out as well.
As I purchase full length movies, I store them on my media server. I'm totally getting away from physical media. I did it for music years ago and now it's time to cut the cord for video.
Now, along with the full length feature, I have a good chance of also having the trailer for that feature saved along with it. This allows me (or other members of the house and friends) to take a quick peek at the trailer and read a synopsis to see if that's the movie they want to fire up and watch.
#318518 - 27/01/2009 21:59
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
carpal tunnel
Registered: 12/11/2001
Posts: 7738
Loc: Toronto, CANADA
|
Ok, hopefully someone is still feeling generous enough to help with a few new syntax questions... I'm modifying the script to check for trailers that have been updated (trailer 2 etc.) and I have the checks for that working fine using posting dates. Now I need to provide an OR condition when it comes time to save the downloads. The existing IF is this:
    if ! grep -q "###$MOVIEID.PREVIEW" $DLDBPATH.downloaded.db; then
I would like to include an OR for variable $getUpdate equal to 1. I've tried a couple of things that didn't work, such as this:
    if ( ! grep -q "###$MOVIEID.PREVIEW" $DLDBPATH.downloaded.db ) || [ "$getUpdate" -eq "1" ]; then
and this:
    if (( ! grep -q "###$MOVIEID.PREVIEW" $DLDBPATH.downloaded.db ) || [ "$getUpdate" -eq "1" ]); then
The error produced is: "integer expression expected", which is unexpected to me since an integer is what I thought I was expressing.
In an earlier part of the script I'd like to check if xmlstarlet fails to open an input file.
oldPostDate=(`$XMLSTARLET sel -E utf-8 -D -T -t -m "/records/movieinfo" \
-v "info/postdate" \
"$MOVIESAVEPATH/description.xml" `)
Even when this produces an error the script works, but I'd like to skip performing a few steps if this isn't able to pull the data I'm expecting. Lastly, with regards to the same call just posted above, does anyone know any way that I can pass in the variable $MOVIESAVEPATH if it contains an apostrophe (single quote)? Will I have to massage it first to escape out that character before using it with xmlstarlet? It's a valid filename character but if one exists, it causes xmlstarlet to interpret it as a quote which then causes the syntax and passed params to break.
#318519 - 27/01/2009 22:08
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
This is one of the reasons you shouldn't (IMO) get in the habit of using programs as arguments to if.
    grep -q "###$MOVIEID.PREVIEW" $DLDBPATH.downloaded.db
    if [ "$?" -ne 0 -o "$getUpdate" -eq "1" ]; then
Actually, "[" is a program. It's the same as "test". If you are obsessed with having the program inside the if, this is the correct syntax:
    if ! ( grep -q "###$MOVIEID.PREVIEW" $DLDBPATH.downloaded.db && [ "$getUpdate" -ne "1" ] ); then
Notice that I had to invert some boolean operators to get it to work.
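One plausible cause of the "integer expression expected" message, though the thread never confirms it, is that `$getUpdate` was empty or unset when the test ran, since `[ "" -eq 1 ]` is rejected as a non-integer comparison. The sketch below defaults the flag to 0 and combines it with the grep check. The function name `needs_download` and its argument order are hypothetical, and a temporary file stands in for the real `.downloaded.db`.

```shell
#!/bin/bash
# Hypothetical helper: succeed (return 0) when the preview should be fetched,
# i.e. when it is not yet recorded in the database OR an update was requested.
# Arguments: movie id, database file, update flag (may be empty).
needs_download() {
    local movie_id="$1" db="$2" get_update="${3:-0}"   # empty flag becomes 0
    if ! grep -qF "###$movie_id.PREVIEW" "$db" || [ "$get_update" -eq 1 ]; then
        return 0
    fi
    return 1
}

db=$(mktemp)
echo "###1234.PREVIEW" > "$db"
needs_download 1234 "$db" "" && r1=yes || r1=no   # recorded, flag empty
needs_download 1234 "$db" 1  && r2=yes || r2=no   # recorded, update requested
needs_download 5678 "$db" 0  && r3=yes || r3=no   # not recorded
rm -f "$db"
echo "$r1 $r2 $r3"   # prints: no yes yes
```

Here `!` negates only the grep, and `||` then joins it to the flag test, so no subshell or inverted operators are needed.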
_________________________
Bitt Faulk
#318520 - 27/01/2009 22:14
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: wfaulk]
|
carpal tunnel
Registered: 12/11/2001
Posts: 7738
Loc: Toronto, CANADA
|
I was editing/adding when you posted the reply, thanks Bitt.
With regards to the part I just added about checking to see if xmlstarlet fails to open its input file, I tried using the "$?" check as used elsewhere but this didn't work. It always seems to equal 0 when xmlstarlet fails to find the input file.
#318521 - 27/01/2009 22:18
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
$? relies on the program it's checking on to be well behaved and return a useful return value. If it always just exits with the return code of 0 (which is the default and signifies success), even if there is an error, then there's not much you can do. Other than checking the file yourself manually to begin with.
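Checking the file yourself could look something like the following. This is a sketch under assumptions, not the script's code: `read_post_date` is a hypothetical helper, and a small `sed` expression stands in for the xmlstarlet query so the example runs without xmlstarlet installed. The idea is to test readability first and then treat empty output as a failure, rather than trusting the tool's exit status.

```shell
#!/bin/bash
# Hypothetical helper: print the post date stored in a description file.
# Returns 1 if the file is missing or unreadable, 2 if no date was extracted.
read_post_date() {
    local file="$1" value
    [ -r "$file" ] || return 1
    # Stand-in for the xmlstarlet query: text of the first <postdate> element.
    value=$(sed -n 's:.*<postdate>\(.*\)</postdate>.*:\1:p' "$file" | head -n 1)
    [ -n "$value" ] || return 2
    printf '%s\n' "$value"
}

tmp=$(mktemp)
echo '<records><movieinfo><info><postdate>2009-01-16</postdate></info></movieinfo></records>' > "$tmp"

if old_post_date=$(read_post_date "$tmp"); then
    echo "previous post date: $old_post_date"
else
    echo "no usable description file; skipping update check"
fi

# A file that does not exist is reported through the return status.
missing_status=0
read_post_date "$tmp.does-not-exist" || missing_status=$?
rm -f "$tmp"
```

The steps that depend on the old post date can then sit inside the `if` branch and be skipped cleanly when the file is absent.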
_________________________
Bitt Faulk
#318522 - 27/01/2009 22:24
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
does anyone know any way that I can pass in the variable $MOVIESAVEPATH if it contains an apostrophe (single quote)? Will I have to massage it first to escape out that character before using it with xmlstarlet?
Unless there's some oddness in xmlstarlet, you should be good the way you are. Generally speaking, commands don't get interpolated twice. You're actually in a situation where they might because you're using backticks, but I'm pretty sure you're okay in this instance:
    % var="this's an apostrophe"
    % echo `echo "$var"`
    this's an apostrophe
_________________________
Bitt Faulk
#318524 - 27/01/2009 22:38
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: wfaulk]
|
carpal tunnel
Registered: 12/11/2001
Posts: 7738
Loc: Toronto, CANADA
|
I'll go with the grep outside the IF. I didn't write that conditional and wouldn't have done it that way if I had. Especially since that grep command is issued numerous times throughout the script. I'd rather do it once and then store the result in a variable.
#318525 - 27/01/2009 22:48
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
carpal tunnel
Registered: 12/11/2001
Posts: 7738
Loc: Toronto, CANADA
|
I know xmlstarlet fails when I pass that var containing an apostrophe.
It produces an XPATH error: Invalid Expression
The output shows an arrow pointing to the first character after the apostrophe. In this case the variable contained "He's Just Not That Into You" and the arrow (a caret) was below the first "s".
The whole error output echoed back the variable name in single quotes, so perhaps when xmlstarlet sees its passed parameters from bash it's seeing them in single quotes? This would then obviously cause the first open quote to be closed by the apostrophe.
#318528 - 28/01/2009 00:20
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
No, it's telling you that an apostrophe is invalid input.
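If the title is being spliced into an XPath expression between single quotes (an assumption; the thread does not show that call), the apostrophe ends the XPath string literal early, which would match the reported error. XPath 1.0 has no escape character inside string literals, but a literal may be delimited by either quote type, and `concat()` covers values that contain both. The helper name `xpath_literal` below is hypothetical; this is a sketch of that quoting rule, not something taken from the script.

```shell
#!/bin/bash
# Hypothetical helper: turn an arbitrary string into an XPath 1.0 string literal.
xpath_literal() {
    local s="$1"
    case $s in
        *"'"*'"'* | *'"'*"'"*)
            # Contains both quote types: split on each apostrophe and glue the
            # pieces back together with concat('...', "'", '...').
            printf "concat('%s')" "$(printf '%s' "$s" | sed "s/'/', \"'\", '/g")"
            ;;
        *"'"*)
            # Contains only apostrophes: delimit with double quotes.
            printf '"%s"' "$s"
            ;;
        *)
            # No apostrophes: single quotes are safe.
            printf "'%s'" "$s"
            ;;
    esac
}

title="He's Just Not That Into You"
echo "title = $(xpath_literal "$title")"
```

The resulting literal can then be placed in the expression wherever the hand-quoted title used to go.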
_________________________
Bitt Faulk
#319785 - 25/02/2009 18:05
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: wfaulk]
|
new poster
Registered: 25/02/2009
Posts: 2
|
That's actually a script I threw together for my media server, and I happened to stumble upon this thread while searching for updated Apple XML feeds. Apple has broken things several times now, which has required some work-arounds over time. The 720p feed appears to have recently become stale, so I'll be posting an updated version shortly to deal with this. Here is where I post updated versions of the script: http://majjix.com/luke/blog/081013/automatically-downloading-quicktime-trailers-and-posters
Edited by lstepnio (25/02/2009 18:37)
#319790 - 26/02/2009 00:04
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: lstepnio]
|
carpal tunnel
Registered: 12/11/2001
Posts: 7738
Loc: Toronto, CANADA
|
I stopped using the 720 feed, if you look closely at the last copies I posted. Your script didn't deal with the different filename standards and file extensions that were present in Apple's feed, so I look only at the base feed and create the correct filenames for the different sizes as required. The filename formats I've included in my changes are the ones necessary to get the correct/full downloads of the movie files. The filename formats included in the initial script you wrote wouldn't work for all the posted videos.
I've made a number of other changes that aren't included in the copy I last put up here, and it's been running flawlessly since that time. The only drawback stems from the simple fact that Apple doesn't post all its videos to the XML feed. The most robust solution would instead use their RSS feed, but it's a lot more complicated to parse, as it also receives content such as film excerpts. A very robust (I only did a quick look) script exists for MythTV that uses the RSS feed. Since I had what I needed working, I didn't look into porting it though.
Someday when I have some time I'll re-write this in PHP, since bash scripting is just about the most useless and ugly pile of crap I've ever had the displeasure of working with. No offense to anyone who happens to like bash scripting of course.
If I have the chance in the next few days I'll send up my current version and you can take a look at the changes I've made.
#319794 - 26/02/2009 03:33
Re: bash scripting (xml parsing) help... (mainly awk and sed)
[Re: hybrid8]
|
new poster
Registered: 25/02/2009
Posts: 2
|
The script from the MP forums was the initial script, which didn't deal with the issues that came up over time which you have described. The issues you describe have been addressed as they appeared in the feeds. You'll find over time that Apple seems to make what appear to be totally random and inconsistent changes. The latest issue was that the 720p feed appears to have been stale for the past week or so. I've worked around this recently by attempting to guess the HD file names based on the SD feed, which is working on all but one item in the current feeds. Please post your source, as it would be nice to incorporate any improvements into my usage.
Here's the source for the current version: http://majjix.com/code/090219/appletrailers
I was recently pointed to this website, which appears to be promising for a feed source: http://www.hd-trailers.net/
The feed they have available seems a bit chaotic and will be more trouble to parse. http://www.hd-trailers.net/blog/feed/
If the feed was a bit better, this would be a better source, as they list Apple and Yahoo trailers. :cheers: