August 14, 2019

GTAC 2011: BidiChecker: Automated Bidi Testing of Web Applications

GTAC – Day 1
Yana Margolin and Jason Elbaum
October 26, 2011

>>James Whittaker: Our next speaker is the
hardest one to introduce because I don’t understand any of this, right? So there is — if you’re trilingual, if you
speak three languages, they call you trilingual. If you speak two languages, they call you
bilingual. And if you only speak one language, they call
you an American. [ Laughter ]
>>James Whittaker: That’s where I stand. But apparently there are languages that you
speak backwards. This is Yana Margolin. She is a Googler and she speaks backwards
and she’s going to teach you how to speak backwards as well. [ Applause ]
>>Yana Margolin: Thank you, James. Thank you, everybody. As James said, my name is Yana Margolin. I’m from the Google Israel office. And today myself and my colleague Jason Elbaum
from the same office as me will tell you a bit more about bidi testing and specifically
bidi testing for the web. Let’s see that this is working. Great. Okay. So first of all, what is this bidi thing? Pretty much any language you can think of
is written in a left-to-right order. For instance, English, German, Spanish and
so on. Some languages, however, insist on being special
and being written the opposite way, in a right-to-left order, such as Arabic, for instance. But not only that, these languages still include
some elements that are written in a left-to-right order. For instance, you can see here in the first
example there’s a phone number. The phone number is always written left to
right or embedded left to right strings. In the second example here there’s a Hebrew
sentence and the word “cool” in English appears in the middle of it. And obviously is written in the opposite direction
than the rest of the text. Because of that we refer to right-to-left
languages as being bidirectional, or bidi for short. So bidi languages include Arabic, which
I mentioned before, Farsi, Urdu, Hebrew and some other, less common languages. In total we’re talking about 500 million speakers
and about, I would say, 140 to 150 million internet users. Quite a difference. It’s important to note that the internet penetration
in right-to-left-speaking countries is about 20%, so there’s a lot of room to grow. It’s quite an important market share for us
going forwards. That’s why the bidi support in web applications
is so important. So this is a quick example to illustrate how
easy it is to come across bidirectional data on the web. What we have here is the Google search English
UI. Without user input, the UI contains only left-to-right
strings. It’s entirely in English. It doesn’t have any right-to-left or bidirectional
data. But here comes the user. He’s a right-to-left-speaking user and, like
many right-to-left-speaking users, he prefers to use the left-to-right UI. Why? Because it’s the more difficult one, it has
the features or any other reason. Okay? So he opens the Google search English UI,
and types in New York in Hebrew. And you can see that already within the snippet
of the first search result, most of the snippet is right to left, it’s in Hebrew, but we already
have three appearances of New York in English. So voila, that’s bidirectional data for you,
even though you thought that probably the chance to get bidirectional data in this English
UI is minimal, you can see how easy it is to trigger the appearance of such data. Okay. So how does a web browser know to display
bidirectional data? That’s what the Unicode bidi algorithm is
for. You can see the entire algorithm at the address
mentioned at the top of the slide. It’s pretty long and I won’t bore you with
the whole details of it. What is important for our tech talk today
is the following three key ingredients: First of all, this algorithm divides characters
into several types according to bidirectionality, right to left, left to right and several different
kinds of neutrals. The second thing that is important to know
about this algorithm is it determines the visual order of the characters, how they will
actually appear on your screen. And it tries to work properly without being
explicitly told where left-to-right or right-to-left text begin and end, but it still needs some
help to function properly as we will shortly see. Okay. So in addition to supporting the appearance
of bidirectional data within a regular left-to-right UI, if we want to support bidi properly in
an application we would usually want to create a right-to-left UI for it. To show you an example of how it’s done we
have here the regular left-to-right English UI of Google Books. You can see that the overall directionality
of the elements in this UI is left to right. The Google Books logo and the search result thumbnails
appear on the left side of the screen while the ads appear on the right side of the screen,
for instance. One more thing I want to note here is you
see that the UI is once again completely in English. The user even typed in an English query and
searched for Italy, books about Italy. But the ads here are in Hebrew and they contain
bidirectional data. Once again you can see how easy it is to trigger
the bidirectional data appearance within web applications. Okay. So we have this UI and we want to create a
right-to-left version of it, for instance, turn it into a Hebrew UI. What we get is pretty much a mirror image
of the UI we saw before. If we shift you can see it’s pretty much mirrored. The thumbnails are now on the right. The ads are now on the left. Looks pretty much like a mirror image. But it is not that simple, obviously. I want to illustrate this using these lovely
images of Mona Lisa, George W. Bush and Margaret Thatcher. We took these images and mirrored them. Can anyone see anything wrong with them other
than the fact that they’re upside down right now? They look... yeah.
>>>(inaudible).
>>Yana Margolin: Good. You must be a QA person. Okay. Although they may look quite all right like
this, if you look at them in the right direction and not being upside down you can see that
they don’t look that good. This is what happens when you mirror a UI
and don’t put much thought into it and don’t provide bidi support properly. You might think that the UI was mirrored properly,
but the user that will try to use it will immediately see what’s wrong with it. Okay. So how do we do it properly? First of all, we need to declare the overall
directionality of our UI. For instance, if we want to create an Arabic
UI, in order for the elements in the UI to appear correctly, the first thing we want
to do is add the dir equals RTL attribute to the body tag of the HTML of the pages. This attribute affects default paragraph alignment. After you add it, the paragraphs on the page
will by default be right-aligned instead of left-aligned. It affects items at the beginning and end
of blocks, punctuation, images, inputs and so on. And it affects tables. The order of the columns within the tables
is now right to left instead of left to right. And of course, all the data within the table,
all the text, is now aligned to the right and appears with right-to-left directionality. One important thing to note here, this attribute
doesn’t take care of most layout issues. The layout issues are determined by CSS and
we will not go any deeper into it in this talk. Okay. So this is our beloved Google+. We decided to create a Hebrew UI for it. We took the English version of the UI, translated
some strings and got something like that. Well, a right-to-left speaker user who sees
that, it looks as incorrect to him as this does to you. For instance, and you can see the data within
the post, it’s not very readable. It appears garbled. In order to fix it and create a proper right-to-left
version of this UI, what we need to do is add the body dir equals RTL to the HTML of
the page and you can see it looks much better now. The text here is readable, appears with proper
directionality and is properly aligned. Okay. So this is the first step that you need to
do when you create a right-to-left UI. You’ve done it. The user starts using it, he’s very happy. He types in right-to-left data. Everything works fine, but then once again
the user is not a fool. He wants to enter any kind of data he wants. So this is the Google+ profile page, the about
tab. As long as he enters right-to-left data, it
appears properly, but then he decides to enter his work address in English, 17 Main Street. And look at how it appears. It’s not really readable. The 17 appears on the incorrect side of the
phrase. So is the period at the end of the sentence. It doesn’t look too good. Okay. So we want to fix this problem, but without
breaking the rest of the UI because the rest of the UI is fine. How do we do it? We surround this part with a span tag with
once again the dir attribute. And this time we say dir equals LTR to signify
that this part is left-to-right text and should appear as such. You can see that now the address appears properly. It is readable and the rest of the UI remains
unchanged, still appears properly. This span dir equals LTR or RTL attribute
is also useful in the following cases. For instance, in the second row in the table
you can see garbled leading punctuation. Okay is not okay. The opening quotes appear in incorrect location. If we surround that with span dir equals LTR
this will now appear properly and resume its original meaning. Same thing happens if we have a left-to-right
page and we want to insert some Arabic data, for instance, to say hello in Arabic, followed
by an exclamation mark. If we don’t do anything it will appear incorrectly,
the exclamation mark will appear on the incorrect side. If we surround this piece of data with the
span tag with the dir equals RTL attribute, it will look right and once again be readable. Okay. So one last thing I want to mention, before
my colleague Jason comes and tells you how to find these types of bugs automatically,
is the text input boxes. What we have here is the Google Maps Hebrew
UI with the search box. And usually we would like to enable directionality
autodetection for these search boxes, which means that as the user types in the data,
the directionality of the data is determined on the fly, the user doesn’t have to do anything,
and the data is displayed properly. In the first example, 17 Main Street, as the
user typed it in, it got recognized as being a left-to-right string, and it appears with
proper directionality and aligned to the left. In the second example the user typed in a right-to-left
query, New York in Hebrew, it got recognized as being right to left and got aligned properly
and it appears as it should. Having said that, for some input boxes, we
would like to enable fixed directionality. No need for this autodetection to take place. For instance, input boxes that contain URL’s. URL only contains English letters. It should always appear with left-to-right
directionality, so the directionality for that field is supposed to be fixed. Okay. So thank you so much for your attention. I would like Jason to come up now and tell
you a bit more about how you find these bugs easily and automatically. Thank you. [ Applause ]
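The directionality autodetection Yana describes for input boxes can be sketched in a few lines of JavaScript. This is a minimal sketch, assuming a first-strong-character heuristic: the talk does not say which heuristic Google's code uses, the script-based test for right-to-left letters is only an approximation of the Unicode bidi character types, and the names `detectDirection` and `wrapWithDir` are made up for illustration.

```javascript
// Strong right-to-left letters, approximated by script. This is an assumption,
// not the full set of Unicode bidi character classes.
const RTL_LETTER = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;
const ANY_LETTER = /\p{L}/u;

// Returns 'rtl' or 'ltr' based on the first letter in the text. Digits, spaces
// and punctuation are skipped; if no letter is found, the fallback is returned.
function detectDirection(text, fallback = 'ltr') {
  for (const ch of text) {
    if (!ANY_LETTER.test(ch)) continue;
    return RTL_LETTER.test(ch) ? 'rtl' : 'ltr';
  }
  return fallback;
}

// Wraps a piece of user data in a span that declares its direction, as in the
// "17 Main Street." example. contextDir is the direction of the surrounding UI.
function wrapWithDir(text, contextDir = 'ltr') {
  const escaped = text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return `<span dir="${detectDirection(text, contextDir)}">${escaped}</span>`;
}

console.log(detectDirection('17 Main Street.')); // ltr
console.log(detectDirection('ניו יורק'));         // rtl
console.log(wrapWithDir('17 Main Street.', 'rtl'));
```

In a browser, the same function could be attached to a search box with something like `input.addEventListener('input', () => { input.dir = detectDirection(input.value, 'rtl'); })`, while a fixed-direction field such as a URL box would simply keep `dir="ltr"`.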
>>Jason Elbaum: Thanks, Yana. Hi. I’ll apologize, I have a bit of a cold, so
I’ll be steering clear of Alberto. So I understand that test is dead, but let’s
try it anyway. Let’s see if we can be manual bidi testers
for a moment. We’re looking at a right-to-left search interface
with a left-to-right query string. And what we’re looking for is opposite-directional
text, which is embedded somewhere. And typically we have punctuation or numbers
which surround it and which end up on the wrong side of the letters. Anybody? I think we have a few who spotted it. Now, it’s quite possible that A Space Odyssey
2001 may have been a better title. You will have to take that up with Kubrick
and Clarke. Meanwhile, what’s going on here essentially
is we have an embedded phrase, which is left-to-right directionality, and which is — basically
needs to be wrapped in a span with directionality LTR because otherwise essentially the browser
sees the 2001, the beginning of the phrase, and the beginning of the phrase from its perspective
here is right to left. And it gets it wrong. Apparently everywhere else on the page they
got it right and that’s pretty good. Let’s try another one. This one is more subtle. The one I’m thinking of is up there. The URL ends in a slash. Now, you might say this isn’t very significant. It might not be very significant in this case. In other cases depending on the actual data
that appears there, the garbling may be more severe, may interfere with the user’s actually
ability to use the page correctly. Here’s a map address, a search result for
maps. By now I imagine you probably have found this
one. So if you were trying to find your address
in New York City, you wouldn’t really be able to give this to a cab driver. He wouldn’t know how to find Fifth Avenue
350. He might, maybe if he spoke Hebrew. There’s another one. It’s harder to spot if you look a little closer. Just below that we have the subway station,
which is on 34th Street and it’s not Herald Square 34. So these are examples where you might not
be able to see a street sign to see where you’re going or give the address to somebody
in the real world to get to where you’re going. So bidi garbles can cause real problems for
real users. And you may notice that these are hard enough
to spot. Imagine if you’re trying to debug this in
the other case where we have a left-to-right user interface and the data is in a right-to-left
language. The data is in Hebrew or Arabic and you’re
trying to spot these instances when you don’t even speak the language. By this point you’re saying “Wow, I wish I
had an automated tool to find these problems for me.” Well, you’re out of luck. No, not quite. So we have a tool we call BidiChecker. It’s developed inside Google and it basically
runs — it’s a JavaScript library, it runs embedded in the page, scans for a number of
bidi errors, the most common of which is undeclared text, which is the opposite directionality
of its context. So here, for example, pops up a little UI
on your page. It lets you browse through the errors and
gives you the description of the error, its location on the page. It’s at street Herald Square, which is the
English text which is not declared and we have a hierarchical list of start HTML tags
to give you a sense of where exactly on the page to find it if you’re the developer. This is actually a somewhat primitive UI. We’re working on release number 2 of the UI
to let you organize and browse and categorize your errors more effectively, but it’s effective
for what it is and we’ll talk a little bit more about how it’s used. Okay. I’ll talk more about what it’s there for. So objectives, number one, to make it easier,
to facilitate the initial development of bidi support in a product. Which means as the developer can you run BidiChecker
on a page and get a list of potential errors, scan through them, identify what’s causing
them and go back, iterate your development and fix them. Objective 2, you might say, well, yes, we
can test for all these things. I can put in a unit test and check that this
field is declared left to right and that one is right to left, and that’s true, but that
only catches the things that you’re looking for. That will only find errors that you’re explicitly
searching for. BidiChecker scans the entire page and therefore
can find errors in features that have been added since the unit test and features which
you haven’t remembered to add tests for. So it will trap those for you as well. And finally there’s the whole regression test
use case where it’s running in your automated suite, somebody adds a new feature, forgets
to handle the bidi wrapping and your test fails, the test suite fails, the product alert
goes up and everybody goes and does something important about it. So here’s a bit of sample code in a JavaScript
unit test. We have a test function. We start with a little bit of code in your
application domain to navigate to wherever it is, navigate the page to the scenario you
want to test. And then we call BidiChecker. We’ve got a checkPage function. It takes a couple of parameters. One is a Boolean indicating the expected directionality
of the page left to right, right to left. And then there’s the element you want to test. Usually we’re testing the whole page. Sometimes we might want to test a particular
frame or a particular element, depending on your test environment and the structure of
your application. And it returns a list of error objects. Typically, we will assert that that list is
empty. If an error comes up in bidi support the list
won’t be empty, the test will fail, the test framework will give you a nice little message
telling you what went wrong and then you can get to work. I will say a few words about error suppression. Why would we want to suppress errors? And this is actually a problem common to many
automated testing tools. First of all, not everything that the test
tool identifies is a real error, false positives. Examples in our case are product names which
might always appear in English, for example, even in a right-to-left UI. Or an acronym like HTML file format, things
like that which appear on the page which we know what they are, they’re fixed, they’re
not going to change, and they’re not going to become garbled. They’re not going to be replaced by the user
with some phone number or address. So — but it’s not worth investing the engineering
effort to fix the directionality declaration of it. It makes it much easier, much more sensible
to just disable that error from appearing in our list. Case number 2 is known bugs. We’ve already found this bug, we know it exists. We’ve filed the report. Somebody is working on it or somebody isn’t
working on it, but there’s no point in breaking our test suite for it anymore. And the third case we’re referring to is somebody
else’s problem. We have, let’s say, text that we’re bringing
in from elsewhere on the web. There’s nothing we can do about it if its
bidi support is wrong. Or we may have a module that we’re inserting
from another team, another project area. It’s not our responsibility to fix it, so
we can filter that from our results. And an example, we have a little navigation
list with a number of other products. One is Gmail. Gmail is a name, a brand name. We don’t translate it, it just appears as
Gmail everywhere. But it’s not causing a garble. It’s not disrupting the flow of the page. So instead of getting this annoying error
from BidiChecker, we can add a little code to filter it out. We are creating a filter, calling our filter
factory, which will suppress all messages, all errors which appear at the particular string Gmail. And we just add our filters to the call and
now our tests will run clean again until a new error crops up that we haven’t yet encountered. And we can filter errors on basically all
their fields, on the text, on the severity which I haven’t talked about, on the location
on the page, either by the ID of an element that contains it or a class name or XPath. We can match on a regular expression or a literal
string. We can combine filters with and, or, and
not. It’s a flexible, powerful, and simple
error specification language. Finally, a few words about designing test
cases. In general, there are two main types of scenarios
that we want to test. One is when we have a left-to-right interface
and we’re going to handle right-to-left data. So we have a left-to-right Google+, and we
are inserting here all kinds of right-to-left content. And we want to make sure that that is correctly
handled. So we would need to set up a test case which
seeds the application with right-to-left data. And preferably seeds all of the fields, as
many fields as possible which might handle RTL data, we want to seed so that when we
run our test, we’re checking — our check is as comprehensive as possible. This sometimes is technically very difficult. Because some test suites, some application
test suites are not designed to be seeded with data for tests. It’s a problem, but it’s a fact that you often
have a test suite where you run all your tests on the same data set and you’re not going
to go generating special data just for your bidi tests. In that case, the only type of scenario we
can realistically test is the reverse case of a right-to-left interface and your existing
standard left-to-right data. And then, again, every field preferably filled
in, every field which handles left-to-right data in your right-to-left interface, that
handling will be checked and flagged. For our own convenience, it’s ideal if our
data also starts or ends in numbers and parentheses and punctuation and other sorts of things
which are visually garbled by the bidi algorithm. That’s — that doesn’t matter to the tool. BidiChecker will check any undeclared opposite-directionality
text. But you’ll look at that and say, I don’t see
the error there. It looks fine to me. The moment you have an exclamation mark or
parenthesis or number on the wrong side, you can also as a developer visually spot the
problem in a way you can’t if the text is visually unchanged. Why do I want to flag an error if it’s visually
unchanged? Because the data is going to change. The users are going to give me data which
has problems that my test suite might not be covering. So it’s important from our perspective as
the developers, or the testers, for that matter, that we’re seeding it with data which will
visually be garbled by the bidi algorithm. The tool doesn’t care, but we do as humans. And the resources, BidiChecker is an open
source project hosted on Google Code. And you’re welcome to check it out, download
it, give it a whirl, and contribute back, should you so desire. We also have a reasonably comprehensive document
about how to implement bidi support in a UI. It covers everything Yana covered plus a number
of topics that we didn’t get to here, call it a bidi how-to. And that’s also available. And, finally, along with the regression
tests, the automated testing scenario that I demonstrated, we have a bookmarklet that
you can install on your browser, you drag a link to the bookmarks toolbar. And then you just click on that on any page
in the browser, it will install the JavaScript, run it on the page, give you your little UI,
and let you figure out what’s wrong with your pages and your competitors’ pages, and everything
like that. Now, I have to remind you, of course, that
that’s only really useful when you have bidi data on the page. We can run this on Google Search in English
with an English query, and it’s going to do great, really. No problems there whatsoever. That’s what we have for you. Happy to take questions, comments, tomatoes. Sure. Oh, okay, yeah.
>>>This isn’t really a testing question. But you sparked my curiosity. You mentioned a couple times, she mentioned
a couple times auto detection of bidi languages. What processes and methods do you use to do
that kind of detection?
>>Jason Elbaum: Okay. The simple answer to that is that there is
code, JavaScript code, out there, the Google Closure Library, for example, has functions
which do this. And you can just hook that into the — the
— what’s it called? — the on- — whichever trigger it is you need to hook it onto in
the input box. And then every time the user adds text to
the search box, it will auto detect the directionality. To be a little bit more specific, there are
different types of detection, different algorithms, heuristics that are used to determine if something
is left-to-right or right-to-left. Because it could be mixed, for example, or
it could have numbers, and then you have to decide, do I want to call this right-to-left
or left-to-right. That’s a whole different question of how you
do it methodologically. Technically, there’s JavaScript which you
can hook into your page which will do that. And you can do that for every input box which
might be bidirectional.
>>>So when you were showing the Google+ UI,
I just saw that the picture of the icon of home is not correct. If you applied bidi...
>>Jason Elbaum: Which one?
>>>Now, like, avatars, that one’s left and
all the text is on the right. So the question is do you have any algorithms
for verifying, identifying... yeah, here it is.
>>Jason Elbaum: This picture. That’s interesting, okay.
>>>...bidi text or something on pictures? Because I can think of banners that
were not translated, actually, to bidi, and you can’t just mirror them, you know.
>>Jason Elbaum: Yes. So your question, in general, is what do you
do about images, some of which are directional...
>>>Yes.
>>Jason Elbaum: ...and most of which usually
are not directional.
>>>Mm-hmm.
>>Jason Elbaum: And as far as I know, the
only way you can really identify that is manually. Just because a picture has a directionality
doesn’t mean it has to be reversed for bidi. Like, those circles have a particular directionality,
but nobody would care if they were reversed, for example. But if you have arrows, then the direction
can matter a lot. You flip the order of them, and they look
wrong. So the only real way to trap that is manually. I’m not aware of an automated way to identify
it. Now, we do have automated tools to flip images. And I think, if I’m not mistaken, that HTML,
maybe HTML5, I don’t remember exactly, maybe has an automatic way to flip the directionality
of an image. I may be wrong on that.
>>>The problem is you can’t just flip any
image.
>>Jason Elbaum: You can’t flip all the images.
>>>You can’t flip images at all. So for the home icon, you can do that, actually. But mostly, you can’t.
>>Jason Elbaum: In this case, it might not
matter. The other answer is, yeah, that looks a little
backwards, but nobody’s going to notice it’s an icon. And that’s — it’s really a judgment call,
and it depends on the type of image, and it depends on the context and...
>>>So you’re basically only automating text, not, like, images.
>>Jason Elbaum: At the moment, this test tool,
BidiChecker, is only testing the correct handling of the flow of the text. There are other kinds of ideas we have for
further types of automated tests, but that’s currently what we’re covering, yeah.
>>>Okay. Thank you.
>>Jason Elbaum: Sure.
>>James Whittaker: Question over here.
>>>(Off mike.) You showed us a (Off mike.)
>>Jason Elbaum: Right.
>>>Why don’t you use that to auto correct?
>>Jason Elbaum: Yeah, auto correct, why don’t
we use the same tool to auto correct the problems. The short answer is that it’s not really possible;
if I go back to one of the examples, let’s say here; right? So we know that we have “A Space Odyssey,”
which itself should be declared as left-to-right. But where does the 2001 belong? Is it part of this introductory text in Hebrew? Or is it part of the movie title in English? Where does that break occur? And an automated tool has no way of knowing
what the correct answer is. In fact, the only way to really get this right
is for the application which embeds the data in the string, we have a string here that
says, here are the video results for this query. And the application knows that this is the
query string and this is the surrounding template text. But the automated tool, which doesn’t see
the template system and it doesn’t see the semantics of the different components, it
doesn’t know how to do that. It doesn’t understand how to break this down
between template and embedded and et cetera. So in the general case, there’s no real way
to fix it automatically. Yeah.
>>James Whittaker: Are there any more questions?
>>Jason Elbaum: Okay.
>>James Whittaker: Great.
>>Jason Elbaum: Thank you very much.
>>James Whittaker: Thanks, Jason, thanks,
Yana. [ Applause ]
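The test pattern Jason describes (check the page against an expected direction, filter out known false positives such as "Gmail", and assert that the remaining error list is empty) can be sketched without the real library. What follows is only a stand-in under stated assumptions: it walks a toy object tree instead of the DOM, it reports only undeclared letters of the opposite direction, and `checkTree`, `atText`, `withinId`, `and`, `or` and `not` are hypothetical helpers written for this sketch, not BidiChecker's actual API.

```javascript
// Strong right-to-left letters, approximated by script (an assumption).
const RTL_LETTER = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;
const ANY_LETTER = /\p{L}/u;

// True if the text contains at least one letter of the given direction.
function containsLetterOfDir(text, dir) {
  for (const ch of text) {
    if (!ANY_LETTER.test(ch)) continue; // skip digits, spaces, punctuation
    if ((RTL_LETTER.test(ch) ? 'rtl' : 'ltr') === dir) return true;
  }
  return false;
}

// Toy page check. Nodes look like { id?, dir?: 'ltr' | 'rtl', text?, children? }.
// A node without its own dir inherits the direction of its nearest ancestor.
function checkTree(root, expectRtl, filters = []) {
  const errors = [];
  const visit = (node, inheritedDir, path) => {
    const dir = node.dir || inheritedDir;
    const here = node.id ? [...path, node.id] : path;
    const opposite = dir === 'rtl' ? 'ltr' : 'rtl';
    if (typeof node.text === 'string' && containsLetterOfDir(node.text, opposite)) {
      errors.push({ type: 'undeclared opposite-directionality text', text: node.text, path: here });
    }
    (node.children || []).forEach((child) => visit(child, dir, here));
  };
  visit(root, expectRtl ? 'rtl' : 'ltr', []);
  // An error is dropped if any filter matches it.
  return errors.filter((error) => !filters.some((matches) => matches(error)));
}

// Hypothetical filter helpers: each returns a predicate over an error object.
const atText = (text) => (error) => error.text === text;
const withinId = (id) => (error) => error.path.includes(id);
const and = (a, b) => (error) => a(error) && b(error);
const or = (a, b) => (error) => a(error) || b(error);
const not = (a) => (error) => !a(error);

// A made-up Hebrew (right-to-left) page with one declared and two undeclared
// left-to-right strings.
const page = {
  id: 'body',
  children: [
    { id: 'nav', text: 'Gmail' },
    { id: 'title', text: 'תוצאות חיפוש' },
    { id: 'address', dir: 'ltr', text: '350 Fifth Avenue' },
    { id: 'station', text: '34 Street Herald Square' },
  ],
};

const unfiltered = checkTree(page, true);
const filtered = checkTree(page, true, [atText('Gmail')]);
console.log(unfiltered.map((e) => e.text)); // Gmail and the station name
console.log(filtered.map((e) => e.text));   // only the station name
```

In a regression test, the assertion would be that the filtered list is empty, so the undeclared station name above would fail the test until that text is wrapped with a direction declaration.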
