Discussion:
[emacs-bidi] API for bidi reordering a string
Eli Zaretskii
2010-09-01 08:26:11 UTC
I'm planning on implementing the following API for reordering a
logical-order string into visual order. Comments are welcome,
especially regarding the way the extra information is returned.

(defun bidi-reorder-string (string embedding &optional extra)
"Reorder the input STRING from logical to visual order.

This function reorders STRING according to the Unicode Bidirectional
Algorithm described in the Unicode Standard Annex #9 (UAX#9), see
http://unicode.org/reports/tr9/.

Second argument EMBEDDING provides the base paragraph embedding level
for the reordering. It is `left-to-right' for left-to-right
paragraphs, `right-to-left' for right-to-left paragraphs, or nil for
neutral paragraphs. In the latter case, the actual base paragraph
level is determined from the string itself, using the UAX#9 rules.

Value is the reordered string. For STRING without any characters
from right-to-left scripts, such as Arabic or Hebrew, this function
returns a copy of the original STRING, possibly with a text property
\(see below).

Optional argument EXTRA non-nil means return additional information
about the results of reordering. This information is recorded in the
value of the `bidi-info' text property of the returned string. The
value is a vector of the form:

[EMBEDDING LOG-TO-VIS VIS-TO-LOG LEVELS]

EMBEDDING is the actual base paragraph embedding level. It is
different from the input arg EMBEDDING only if the latter is nil.

LOG-TO-VIS is a vector of character indices mapping the original
logical-order STRING to the reordered visual-order string. The zeroth
element of LOG-TO-VIS gives the index of the first character of STRING
in the reordered string output by the function, the next element gives
the index of the second character, etc. For example, if the
logical-order string ABCDE is reordered into EDCBA, then the zeroth
element of LOG-TO-VIS will be 4, the 1st element will be 3, etc.

VIS-TO-LOG is a vector of character indices for the reverse mapping of
the characters in the visual-order string back to its logical-order
original.

LEVELS is a vector of resolved levels, one each for every character in
the reordered string. Resolved levels are numbers between zero and
63, with even levels indicating the left-to-right directionality and
odd levels indicating the right-to-left directionality. Note that the
levels are specified in the visual order, i.e. they correspond to
characters in the reordered string returned by the function."
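
To make the shape of the proposed `bidi-info' value concrete, here is a small sketch in Emacs Lisp. Since `bidi-reorder-string' is only a proposal at this point, the returned string is simulated with `propertize'. The helper name `my-bidi-invert-mapping', the variable `my-bidi-sample', and the sample vectors are illustrative assumptions, not part of the proposal. The sample stands for a logical-order string "AB12" in a right-to-left paragraph, where A and B are right-to-left letters, so the visual order is "12BA".

```elisp
;; Sketch only: `bidi-reorder-string' is a proposal, so its return value
;; is simulated here.  All `my-' names are hypothetical.

(defun my-bidi-invert-mapping (mapping)
  "Return the inverse of the index permutation MAPPING (a vector)."
  (let ((inverse (make-vector (length mapping) 0)))
    (dotimes (i (length mapping))
      (aset inverse (aref mapping i) i))
    inverse))

;; Hand-built stand-in for a value returned with EXTRA non-nil.
(defvar my-bidi-sample
  (propertize "12BA"
              'bidi-info
              (vector 'right-to-left   ; EMBEDDING
                      [3 2 0 1]        ; LOG-TO-VIS
                      [2 3 1 0]        ; VIS-TO-LOG
                      [2 2 1 1]))      ; LEVELS, in visual order
  "Simulated result of reordering the logical-order string \"AB12\".")

(let* ((info (get-text-property 0 'bidi-info my-bidi-sample))
       (log-to-vis (aref info 1))
       (vis-to-log (aref info 2)))
  ;; VIS-TO-LOG is the inverse permutation of LOG-TO-VIS.
  (equal (my-bidi-invert-mapping log-to-vis) vis-to-log))
;; => t
```

Returning both vectors saves the caller this inversion, at the cost of some redundancy in the property value.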
Martin J. Dürst
2010-09-01 09:06:14 UTC
Hello Eli,

Overall, the API makes sense. I'd personally put the levels (when extra
is non-nil) in logical order, but that's a detail.
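
For what it's worth, either order can be derived from the other once VIS-TO-LOG is available. A minimal sketch in Emacs Lisp, assuming the vector layout from the proposed doc string; the function name `my-bidi-levels-to-logical' and the sample data are hypothetical:

```elisp
;; Sketch only: rearranges visual-order levels into logical order.
;; The function name and the sample vectors are assumptions.

(defun my-bidi-levels-to-logical (visual-levels vis-to-log)
  "Return VISUAL-LEVELS rearranged into logical order.
VIS-TO-LOG maps each visual index to its logical index."
  (let ((logical (make-vector (length visual-levels) 0)))
    (dotimes (v (length visual-levels))
      (aset logical (aref vis-to-log v) (aref visual-levels v)))
    logical))

;; Sample: two right-to-left letters followed by two digits in a
;; right-to-left paragraph; the digits resolve to level 2.
(my-bidi-levels-to-logical [2 2 1 1] [2 3 1 0])
;; => [1 1 2 2]

;; Odd levels are right-to-left, even levels are left-to-right.
(mapcar (lambda (level)
          (if (= (% level 2) 1) 'right-to-left 'left-to-right))
        (my-bidi-levels-to-logical [2 2 1 1] [2 3 1 0]))
;; => (right-to-left right-to-left left-to-right left-to-right)
```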

This function could be very useful for testing (assuming it uses the
same logic as the built-in display reordering). On the other hand, I'm
not sure what other uses you have in mind. I'm quite afraid that some
people may take the existence of such a function as a license to convert
a lot of text from logical to visual encoding. As I hope we all agree,
that would be a bad idea, because it's a one-way street, and visual is
much less flexible (e.g. reflowing lines,...).

Regards, Martin.
--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:***@it.aoyama.ac.jp
Eli Zaretskii
2010-09-01 09:24:20 UTC
Date: Wed, 01 Sep 2010 18:06:14 +0900

> Overall, the API makes sense. I'd personally put the levels (when extra
> is non-nil) in logical order, but that's a detail.

I don't mind producing it in logical order, but is there a reason to
do so?

> This function could be very useful for testing (assuming it uses the
> same logic as the built-in display reordering). On the other hand, I'm
> not sure what other uses you have in mind.

As a matter of fact, I don't see many uses for it, except for
testing and "educational" purposes. The only other use case I can
think of is encoding text into a visual-order encoding. But I need to
implement reordering of display strings anyway, so I thought it would
be nice to have a Lisp binding for that.

Thanks for the comments.