HTML Routines
AppleScript scripts are often used to read and write HTML text. The following sub-routines help automate some common tasks involving HTML markup.
Converting RGB to HTML Color
The following sub-routine can be used to convert RGB color values to the format used in HTML documents.
An RGB color is stated as a list of three numbers, each with a value between 0 and 65535. The following sub-routine scales each value down to an 8-bit value (0 to 255) and then converts it to its two-digit hexadecimal equivalent.
To use the sub-routine, pass it a list of RGB values and it will return the HTML color code matching the passed RGB color.
on RBG_to_HTML(RGB_values)
    -- NOTE: this sub-routine expects the RGB values to be from 0 to 65535
    set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
    set the hex_value to ""
    repeat with i from 1 to the count of the RGB_values
        -- integer division maps 0 through 65535 onto 0 through 255
        set this_value to (item i of the RGB_values) div 256
        -- x is the high hex digit, y is the low hex digit
        set x to item ((this_value div 16) + 1) of the hex_list
        set y to item ((this_value mod 16) + 1) of the hex_list
        set the hex_value to (the hex_value & x & y) as string
    end repeat
    return ("#" & the hex_value) as string
end RBG_to_HTML
Here's an example of how to call the sub-routine using values from a color picker dialog:
set the RGB_value to (choose color default color {65535, 0, 0})
set the HTML_colorvalue to my RBG_to_HTML(RGB_value)
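The per-channel arithmetic can also be checked in isolation. The following is a minimal sketch, not part of the original routine: channel_to_hex is a hypothetical helper name, and it assumes a single channel value from 0 to 65535.

```applescript
-- Hypothetical helper (an illustration only): converts one 16-bit
-- channel value (0 to 65535) to a two-digit hexadecimal string.
on channel_to_hex(channel_value)
	set hex_digits to "0123456789ABCDEF"
	-- scale the 16-bit value down to 8 bits: 0..65535 becomes 0..255
	set byte_value to channel_value div 256
	-- the high digit counts whole sixteens, the low digit is the remainder
	set high_digit to character ((byte_value div 16) + 1) of hex_digits
	set low_digit to character ((byte_value mod 16) + 1) of hex_digits
	return high_digit & low_digit
end channel_to_hex

channel_to_hex(65535) --> "FF"
channel_to_hex(32768) --> "80"
channel_to_hex(0) --> "00"
```

For example, 32768 div 256 is 128, which is eight sixteens with a remainder of zero, so the result is "80".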
Removing Markup Codes From Text
This sub-routine removes tags enclosed in angle brackets from the text passed to it. For example:
set this_text to "This is a <B>great</B> time to own a Mac!"
remove_markup(this_text)
--> returns: "This is a great time to own a Mac!"
Here's the sub-routine:
on remove_markup(this_text)
    set copy_flag to true
    set the clean_text to ""
    repeat with this_char in this_text
        set this_char to the contents of this_char
        if this_char is "<" then
            set the copy_flag to false
        else if this_char is ">" then
            set the copy_flag to true
        else if the copy_flag is true then
            set the clean_text to the clean_text & this_char as string
        end if
    end repeat
    return the clean_text
end remove_markup
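Because the sub-routine only looks at angle brackets, character entities such as &lt; and &amp; pass through unchanged. If the cleaned text needs them decoded, something along these lines could be applied afterward. This is a sketch under assumptions: replace_text and decode_basic_entities are hypothetical helper names, and only three common entities are handled.

```applescript
-- Hypothetical helper: replaces every occurrence of search_string in this_text.
on replace_text(this_text, search_string, replacement_string)
	set saved_delimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to search_string
	set the item_list to every text item of this_text
	set AppleScript's text item delimiters to replacement_string
	set this_text to the item_list as string
	set AppleScript's text item delimiters to saved_delimiters
	return this_text
end replace_text

-- Hypothetical helper: decodes three common entities. "&amp;" is handled
-- last so that "&amp;lt;" becomes "&lt;" rather than "<".
on decode_basic_entities(this_text)
	set this_text to replace_text(this_text, "&lt;", "<")
	set this_text to replace_text(this_text, "&gt;", ">")
	set this_text to replace_text(this_text, "&amp;", "&")
	return this_text
end decode_basic_entities

decode_basic_entities("5 &lt; 6 &amp; 7 &gt; 6")
--> "5 < 6 & 7 > 6"
```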
Parsing an HTML File
The following large sub-routine can be used to extract specific tags and their contents from HTML text.
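The routine returns all matches of the opening and closing tag combination passed to it, separated by return characters. If an error other than reaching the end of the file occurs, it returns false.
The fourth parameter, contents_only, indicates whether to include the enclosing tags with the returned text: pass false to include them, or true to return only the text between them.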
You can use this sub-routine to do the following:
Return All Links in an HTML Document
Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown.
read_parse(this_file, "<A HREF=", "</A>", false)
--> <A HREF="http://www.apple.com/fileA.html">click here</A>
--> <A HREF="http://www.apple.com/fileB.html">click here</A>
Return All Images in an HTML Document
Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown. Note that the value passed for the closing tag parameter is an empty string (""). When the closing tag parameter is empty, the sub-routine returns each match as a single tag rather than as an opening and closing pair.
read_parse(this_file, "<IMG ", "", false)
--> <IMG SRC="gfx/clipboard.gif" BORDER="0">
--> <IMG SRC="printer_stopped.gif" ALIGN=TOP WIDTH="32" HEIGHT="32" BORDER="0">
--> <IMG SRC="printer_on.gif" ALIGN=TOP WIDTH="32" HEIGHT="32" BORDER="0">
Return All Tables in an HTML Document
Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown.
read_parse(this_file, "<TABLE", "</TABLE>", false)
(*
<TABLE WIDTH="440">
<TR>
<TD ALIGN="CENTER" VALIGN="TOP">
<IMG SRC="gfx/clipboard.gif" BORDER="0">
</TD>
</TR>
</TABLE>
*)
Here's the sub-routine:
on read_parse(this_file, opening_tag, closing_tag, contents_only)
    try
        set this_file to this_file as text
        set this_file to open for access file this_file
        set the combined_results to ""
        set the open_tag to ""
        repeat
            read this_file before "<" -- start of a tag
            set this_tag to read this_file until ">" -- end of a tag
            -- to make up for a bug in the "read before" command
            if this_tag does not start with "<" then set this_tag to ("<" & this_tag) as string
            -- EXAMINE THE TAG
            if this_tag begins with the opening_tag then
                -- store the complete tag, not just the search string
                set the open_tag to this_tag
                -- check for single tag indicator
                if the closing_tag is "" then
                    if the combined_results is "" then
                        set the combined_results to the combined_results & the open_tag
                    else
                        set the combined_results to the combined_results & return & the open_tag
                    end if
                else
                    -- reset the text buffer
                    set the text_buffer to ""
                    -- extract the contents between the open and close tags
                    repeat
                        set the text_buffer to the text_buffer & ¬
                            (read this_file before "<") -- start of a tag
                        set the tag_buffer to read this_file until ">" -- end of a tag
                        -- to make up for a bug in the "read before" command
                        if the tag_buffer does not start with "<" then set the tag_buffer to ("<" & the tag_buffer)
                        -- check for the closing tag
                        if the tag_buffer is the closing_tag then
                            if contents_only is false then
                                set the text_buffer to the open_tag & the text_buffer & the tag_buffer
                            end if
                            if the combined_results is "" then
                                set the combined_results to the combined_results & the text_buffer
                            else
                                set the combined_results to the combined_results & return & the text_buffer
                            end if
                            exit repeat
                        else
                            set the text_buffer to the text_buffer & the tag_buffer
                        end if
                    end repeat
                end if
            end if
        end repeat
        close access this_file
    on error error_msg number error_num
        try
            close access this_file
        end try
        -- error -39 is end of file, the normal way out of the repeat loop
        if error_num is not -39 then return false
    end try
    return the combined_results
end read_parse
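Since the sub-routine returns either false or a single string with the matches joined by return characters, a caller will often want to turn that value into a list. The following is a minimal sketch of one way to do so. matches_to_list is a hypothetical helper name, and the sketch assumes that no individual match contains a return character, which holds for single tags such as images but may not hold for multi-line tables.

```applescript
-- Hypothetical helper: turns the value returned by read_parse into a list.
on matches_to_list(parse_result)
	-- read_parse returns false on any error other than end of file
	if parse_result is false then return {}
	-- an empty string means the file contained no matches
	if parse_result is "" then return {}
	set saved_delimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to return
	set the match_list to every text item of parse_result
	set AppleScript's text item delimiters to saved_delimiters
	return the match_list
end matches_to_list

matches_to_list("<IMG SRC=\"a.gif\">" & return & "<IMG SRC=\"b.gif\">")
--> {"<IMG SRC=\"a.gif\">", "<IMG SRC=\"b.gif\">"}
```

In a script, the argument would be the result of a call such as read_parse(this_file, "<IMG ", "", false).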