AppleScript: Essential Sub-Routines

HTML Routines

AppleScript scripts are often used to read and write HTML text. The following sub-routines help automate some common tasks involving HTML markup.

Converting RGB to HTML Color

The following sub-routine can be used to convert RGB color values to the format used in HTML documents.

An RGB color is stated as a list of three numbers, each with a value between 0 and 65535. The following sub-routine scales each value to the 8-bit range (0 to 255) and then converts it to the corresponding two-digit hexadecimal value.
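
As a minimal sketch of that arithmetic for a single channel, the following standalone script scales one 16-bit value and looks up its two hexadecimal digits. The variable names and the sample value 32768 are chosen for illustration only; they do not appear in the sub-routine itself.

```applescript
-- Worked example for one color channel (names and value are illustrative)
set hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
set channel_value to 32768 -- a 16-bit channel value (0 to 65535)
set byte_value to channel_value div 256 -- scale to 0 to 255; here 128
set high_digit to item ((byte_value div 16) + 1) of hex_list -- 128 div 16 is 8, so "8"
set low_digit to item ((byte_value mod 16) + 1) of hex_list -- 128 mod 16 is 0, so "0"
return high_digit & low_digit
--> returns: "80"
```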

To use the sub-routine, pass it a list of RGB values and it will return the HTML code matching the passed RGB color.

A sub-routine to convert RGB values to HTML format:

on RBG_to_HTML(RGB_values)
    -- NOTE: this sub-routine expects the RGB values to be from 0 to 65535
    set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
    set the hex_value to ""
    repeat with i from 1 to the count of the RGB_values
        -- scale the 16-bit value down to the 8-bit range (0 to 255)
        set this_value to (item i of the RGB_values) div 256
        if this_value > 255 then set this_value to 255
        -- look up the high and low hexadecimal digits
        set x to item ((this_value div 16) + 1) of the hex_list
        set y to item ((this_value mod 16) + 1) of the hex_list
        set the hex_value to (the hex_value & x & y) as string
    end repeat
    return ("#" & the hex_value) as string
end RBG_to_HTML

Here's an example of how to call the sub-routine using values from a color picker dialog:

Using the values from a color picker dialog:

set the RGB_value to (choose color default color {65535, 0, 0})
set the HTML_colorvalue to my RBG_to_HTML(RGB_value)

Removing Markup Codes From Text

This sub-routine removes angle-bracket-enclosed tags from the text passed to it.

set this_text to "This is a <B>great</B> time to own a Mac!"
remove_markup(this_text)
--> returns: "This is a great time to own a Mac!"

Here's the sub-routine:

A sub-routine for removing tags from text:

on remove_markup(this_text)
    set copy_flag to true
    set the clean_text to ""
    repeat with this_char in this_text
        set this_char to the contents of this_char
        if this_char is "<" then
            set the copy_flag to false
        else if this_char is ">" then
            set the copy_flag to true
        else if the copy_flag is true then
            set the clean_text to the clean_text & this_char as string
        end if
    end repeat
    return the clean_text
end remove_markup

Parsing an HTML File

The following large sub-routine can be used to extract specific tags and their contents from HTML text.

The routine will return all matches of a specific opening and closing tag combination passed to the sub-routine.

There is also a parameter for indicating whether to include the specific enclosing tags with the returned text.

You can use this sub-routine to do the following:

Return All Links in an HTML Document

Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown.

read_parse(this_file, "<A HREF=", "</A>", false)
--> <A HREF="http://www.apple.com/fileA.html">click here</A>
--> <A HREF="http://www.apple.com/fileB.html">click here</A>

Return All Images in an HTML Document

Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown. Note that the value passed for the closing tag parameter is an empty string (""). When the closing tag parameter is empty, the sub-routine returns each match as a single tag.

read_parse(this_file, "<IMG ", "", false)
--> <IMG SRC="gfx/clipboard.gif" BORDER="0">
--> <IMG SRC="printer_stopped.gif" ALIGN=TOP WIDTH="32" HEIGHT="32" BORDER="0">
--> <IMG SRC="printer_on.gif" ALIGN=TOP WIDTH="32" HEIGHT="32" BORDER="0">

Return All Tables in an HTML Document

Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown.

read_parse(this_file, "<TABLE", "</TABLE>", false)
(*
<TABLE WIDTH="440">
<TR>
<TD ALIGN="CENTER" VALIGN="TOP">
<IMG SRC="gfx/clipboard.gif" BORDER="0">
</TD>
</TR>
</TABLE>
*)

Here's the sub-routine:

A sub-routine for extracting tags from an HTML file:

on read_parse(this_file, opening_tag, closing_tag, contents_only)
    try
        set this_file to this_file as text
        set this_file to open for access file this_file
        set the combined_results to ""
        set the open_tag to ""
        repeat
            read this_file before "<" -- start of a tag
            set this_tag to read this_file until ">" -- end of a tag
            -- to make up for a bug in the "read before" command
            if this_tag does not start with "<" then set this_tag to ("<" & this_tag) as string
            -- EXAMINE THE TAG
            if this_tag begins with the opening_tag then
                --store the complete tag, not just the search string
                set the open_tag to this_tag
                -- check for single tag indicator
                if the closing_tag is "" then
                    if the combined_results is "" then
                        set the combined_results to the combined_results & the open_tag
                    else
                        set the combined_results to the combined_results & return & the open_tag
                    end if
                else
                    -- reset the text buffer
                    set the text_buffer to ""
                    -- extract the contents between the open and close tags
                    repeat
                        set the text_buffer to the text_buffer & ¬
                            (read this_file before "<") -- start of a tag
                        set the tag_buffer to read this_file until ">" -- end of a tag
                        -- to make up for a bug in the "read before" command
                        if the tag_buffer does not start with "<" then set the tag_buffer to ("<" & the tag_buffer)
                        -- check for the closing tag
                        if the tag_buffer is the closing_tag then
                            if contents_only is false then
                                set the text_buffer to the open_tag & the text_buffer & the tag_buffer
                            end if
                            if the combined_results is "" then
                                set the combined_results to the combined_results & the text_buffer
                            else
                                set the combined_results to the combined_results & return & the text_buffer
                            end if
                            exit repeat
                        else
                            set the text_buffer to the text_buffer & the tag_buffer
                        end if
                    end repeat
                end if
            end if
        end repeat
        close access this_file
    on error error_msg number error_num
        try
            close access this_file
        end try
        if error_num is not -39 then return false
    end try
    return the combined_results
end read_parse
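
The sub-routine returns its matches as a single string, with the matches separated by return characters, or false if an error other than end-of-file occurs. As a sketch of one way to handle that result, the following standalone script splits a result into a list. The string assigned to parse_result is a made-up stand-in for an actual read_parse result, and the variable names are assumptions for illustration.

```applescript
-- parse_result is a hypothetical stand-in for a value returned by read_parse
set parse_result to "<A HREF=\"fileA.html\">click here</A>" & return & "<A HREF=\"fileB.html\">click here</A>"
if parse_result is false then
    -- read_parse returns false on any error other than end-of-file
    set the match_list to {}
else if parse_result is "" then
    -- an empty string means no matching tags were found
    set the match_list to {}
else
    -- each match sits on its own line when the matches contain no line breaks
    set the match_list to the paragraphs of parse_result
end if
return the count of the match_list
--> returns: 2
```

Splitting by paragraph suits single-line matches such as links and images. A match that spans several lines, such as a table, would be split into several list items.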
