Eli Zaretskii
2010-09-01 08:26:11 UTC
I'm planning on implementing the following API for reordering a
logical-order string into visual order. Comments are welcome,
especially regarding the way the extra information is returned.
(defun bidi-reorder-string (string embedding &optional extra)
"Reorder the input STRING from logical to visual order.
This function reorders STRING according to the Unicode Bidirectional
Algorithm described in the Unicode Standard Annex #9 (UAX#9), see
http://unicode.org/reports/tr9/.
Second argument EMBEDDING provides the base paragraph embedding level
for the reordering. It is `left-to-right' for left-to-right
paragraphs, `right-to-left' for right-to-left paragraphs, or nil for
neutral paragraphs. In the latter case, the actual base paragraph
level is determined from the string itself, using the UAX#9 rules.
Value is the reordered string. For STRING without any characters
from right-to-left scripts, such as Arabic or Hebrew, this function
returns a copy of the original STRING, possibly with a text property
\(see below).
Optional argument EXTRA non-nil means return additional information
about the results of reordering. This information is recorded in the
value of the `bidi-info' text property of the returned string. The
value is a vector of the form:
[EMBEDDING LOG-TO-VIS VIS-TO-LOG LEVELS]
EMBEDDING is the actual base paragraph embedding level. It is
different from the input arg EMBEDDING only if the latter is nil.
LOG-TO-VIS is a vector of character indices mapping the original
logical-order STRING to the reordered visual-order string. The zeroth
element of LOG-TO-VIS gives the index of the first character of STRING
in the reordered string output by the function, the next element gives
the index of the second character, etc. For example, if the
logical-order string ABCDE is reordered into EDCBA, then the zeroth
element of LOG-TO-VIS will be 4, the 1st element will be 3, etc.
VIS-TO-LOG is a vector of character indices for the reverse mapping of
the characters in the visual-order string back to its logical-order
original.
LEVELS is a vector of resolved levels, one for each character in
the reordered string. Resolved levels are numbers between zero and
63, with even levels indicating the left-to-right directionality and
odd levels indicating the right-to-left directionality. Note that the
levels are specified in the visual order, i.e. they correspond to
characters in the reordered string returned by the function.")
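To make the relationship between LOG-TO-VIS, VIS-TO-LOG and the `bidi-info' property concrete, here is a minimal sketch in Emacs Lisp. Since `bidi-reorder-string' is only proposed, the sketch does not call it; the helpers `my-bidi-invert-map', `my-bidi-apply-log-to-vis' and `my-bidi-info', and the variable `my-bidi-example', are hypothetical names used for illustration. It also assumes that the property is present on the first character of the returned string and that EMBEDDING is stored as one of the symbols accepted as input; the description above does not settle either point.

```elisp
;; Sketch only: `bidi-reorder-string' is merely proposed, so these
;; hypothetical helpers work on hand-built data instead of calling it.

(defun my-bidi-invert-map (map)
  "Return the inverse of the index permutation MAP, a vector.
Given LOG-TO-VIS this yields VIS-TO-LOG, and vice versa."
  (let ((inverse (make-vector (length map) 0)))
    (dotimes (i (length map))
      ;; The character at index I on one side sits at index (aref MAP I)
      ;; on the other side, so record I in that slot.
      (aset inverse (aref map i) i))
    inverse))

(defun my-bidi-apply-log-to-vis (string log-to-vis)
  "Return STRING rearranged so that its Ith character lands at LOG-TO-VIS[I]."
  (let ((visual (make-vector (length string) 0)))
    (dotimes (i (length string))
      (aset visual (aref log-to-vis i) (aref string i)))
    ;; `concat' turns the vector of characters back into a string.
    (concat visual)))

(defun my-bidi-info (visual-string)
  "Return the `bidi-info' vector recorded on VISUAL-STRING, or nil.
Assumes the property is present on the first character."
  (and (> (length visual-string) 0)
       (get-text-property 0 'bidi-info visual-string)))

;; Hand-built stand-in for what the proposed function might return for
;; the ABCDE example when EXTRA is non-nil.  The values are illustrative.
(defvar my-bidi-example
  (propertize (my-bidi-apply-log-to-vis "ABCDE" [4 3 2 1 0])
              'bidi-info
              (vector 'right-to-left                    ; EMBEDDING (representation assumed)
                      [4 3 2 1 0]                       ; LOG-TO-VIS
                      (my-bidi-invert-map [4 3 2 1 0])  ; VIS-TO-LOG
                      [1 1 1 1 1])))                    ; LEVELS, in visual order
```

A caller would then fetch, say, the reverse mapping with (aref (my-bidi-info my-bidi-example) 2). Because VIS-TO-LOG can always be derived from LOG-TO-VIS as shown, returning both is a convenience rather than a necessity.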