Text Encoding | Decoding
A standard practice when creating URLs is to encode spaces and special characters (including high-level ASCII) as their hexadecimal equivalents. For example, a space in a URL is converted to: %20
The following sub-routines can be used to encode text in a variety of settings.
Character Encoding Sub-Routine
This sub-routine will encode a passed character. It is called by the other sub-routine examples and must be included in your script.
encode_char("$")
--> returns: "%24"
on encode_char(this_char)
    set the ASCII_num to (the ASCII number this_char)
    set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
    -- first and second hex digits; add 1 because list items are numbered from 1
    set x to item ((ASCII_num div 16) + 1) of the hex_list
    set y to item ((ASCII_num mod 16) + 1) of the hex_list
    return ("%" & x & y) as string
end encode_char
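As a rough illustration of the arithmetic inside encode_char, the sketch below walks through the "$" example by hand. The value 36 is the ASCII number of "$"; the variable names simply mirror those in the sub-routine.

```applescript
-- worked example: "$" has ASCII number 36
set ASCII_num to 36
set hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
-- 36 div 16 = 2, so the first digit is item 3 of the list: "2"
set x to item ((ASCII_num div 16) + 1) of hex_list
-- 36 mod 16 = 4, so the second digit is item 5 of the list: "4"
set y to item ((ASCII_num mod 16) + 1) of hex_list
return "%" & x & y
--> "%24"
```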
Text Encoding Sub-Routine
This sub-routine is used in conjunction with the character encoding sub-routine to encode spaces and high-level ASCII characters (those above 127) in passed text strings. Two parameters, encode_URL_A and encode_URL_B, control which characters are exempt from encoding.
The first parameter, encode_URL_A, is a true or false value that tells the sub-routine whether to also encode most of the special characters reserved for use by URLs.
In the following example the encode_URL_A value is false, thereby exempting the asterisk (*) character, which has a special meaning in URLs, from the encoding process. Only spaces and high-level ASCII characters, like the copyright symbol, are encoded.
encode_text("*smith-wilson© report_23.txt", false, false)
--> "*smith-wilson%A9%20report_23.txt"
In the following example both parameters are true, so the asterisk character, along with the hyphen, underscore, and period, is included in the encoding process:
encode_text("*smith-wilson© report_23.txt", true, true)
--> "%2Asmith%2Dwilson%A9%20report%5F23%2Etxt"
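The second parameter, encode_URL_B, is a true or false value that controls a second group of characters. In the following example encode_URL_B is false, thereby exempting periods (.), colons (:), underscores (_), and hyphens (-) from encoding: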
encode_text("*annual smith-wilson_report.txt", true, false)
--> "%2Aannual%20smith-wilson_report.txt"
-- this sub-routine is used to encode text
on encode_text(this_text, encode_URL_A, encode_URL_B)
    set the standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
    set the URL_A_chars to "$+!'/?;&@=#%><{}[]\"~`^\\|*"
    set the URL_B_chars to ".-_:"
    -- build the set of characters that pass through unencoded
    set the acceptable_characters to the standard_characters
    if encode_URL_A is false then set the acceptable_characters to the acceptable_characters & the URL_A_chars
    if encode_URL_B is false then set the acceptable_characters to the acceptable_characters & the URL_B_chars
    set the encoded_text to ""
    repeat with this_char in this_text
        if this_char is in the acceptable_characters then
            set the encoded_text to (the encoded_text & this_char)
        else
            set the encoded_text to (the encoded_text & encode_char(this_char)) as string
        end if
    end repeat
    return the encoded_text
end encode_text
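The sub-routine lists only lowercase letters in standard_characters, yet uppercase letters are not encoded. This follows from AppleScript's default text comparison, which ignores case. The sketch below reproduces the membership test for the case where encode_URL_A is true and encode_URL_B is false; the three test characters are chosen for illustration only.

```applescript
-- same character sets as encode_text, with encode_URL_B false
set standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
set URL_B_chars to ".-_:"
set acceptable_characters to standard_characters & URL_B_chars

-- a hyphen is exempt, an uppercase letter matches its lowercase form,
-- and a space is not in the set, so it would be passed to encode_char
return {("-" is in acceptable_characters), ("R" is in acceptable_characters), (space is in acceptable_characters)}
--> {true, true, false}
```

Wrapping the test in a "considering case" block would change the second result to false, so the sub-routine relies on the default behavior.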
Text Decoding Routines
The following sub-routines can be used to decode previously encoded text:
A sub-routine for decoding a three-character hex string:
on decode_chars(these_chars)
    -- split "%XY" into the percent sign and the two hex digits
    copy these_chars to {identifying_char, multiplier_char, remainder_char}
    set the hex_list to "123456789ABCDEF"
    -- for a letter digit, its position in hex_list is its value (A = 10 ... F = 15)
    if the multiplier_char is in "ABCDEF" then
        set the multiplier_amt to the offset of the multiplier_char in the hex_list
    else
        set the multiplier_amt to the multiplier_char as integer
    end if
    if the remainder_char is in "ABCDEF" then
        set the remainder_amt to the offset of the remainder_char in the hex_list
    else
        set the remainder_amt to the remainder_char as integer
    end if
    set the ASCII_num to (multiplier_amt * 16) + remainder_amt
    return (ASCII character ASCII_num)
end decode_chars
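To make the conversion concrete, here is a small sketch that traces decode_chars by hand for the string "%A9" (the encoded copyright symbol from the earlier example). It repeats only the arithmetic, not the final ASCII character step.

```applescript
-- worked example: the hex digits of "%A9" are "A" and "9"
set hex_list to "123456789ABCDEF"
-- "A" is the 10th character of hex_list, so its value is 10
set multiplier_amt to the offset of "A" in hex_list
-- a numeric digit is coerced directly
set remainder_amt to "9" as integer
set ASCII_num to (multiplier_amt * 16) + remainder_amt
return {multiplier_amt, remainder_amt, ASCII_num}
--> {10, 9, 169}
```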
-- this sub-routine is used to decode text strings
on decode_text(this_text)
    -- flag_A: the previous character was "%"
    -- flag_B: the first hex digit has been stored in temp_char
    set flag_A to false
    set flag_B to false
    set temp_char to ""
    set the character_list to {}
    repeat with this_char in this_text
        set this_char to the contents of this_char
        if this_char is "%" then
            set flag_A to true
        else if flag_A is true then
            set the temp_char to this_char
            set flag_A to false
            set flag_B to true
        else if flag_B is true then
            set the end of the character_list to my decode_chars(("%" & temp_char & this_char) as string)
            set the temp_char to ""
            set flag_A to false
            set flag_B to false
        else
            set the end of the character_list to this_char
        end if
    end repeat
    return the character_list as string
end decode_text
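The last line of decode_text joins the list with "as string", which uses AppleScript's text item delimiters. If another part of a script has changed the delimiters, the decoded characters would be joined with that delimiter between them. One possible safeguard, shown here as a sketch with a made-up three-item list rather than as part of the original sub-routine, is to set the delimiters explicitly around the coercion:

```applescript
-- hypothetical list standing in for character_list
set character_list to {"a", "*", "b"}

-- save the current delimiters, join with an empty delimiter, then restore
set saved_delimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to ""
set decoded_text to character_list as string
set AppleScript's text item delimiters to saved_delimiters

return decoded_text
--> "a*b"
```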