editorial: mark languageCode at risk #764

marcoscaceres · 2018-08-24T05:03:08Z

The following tasks have been completed:

Confirmed there are no ReSpec errors/warnings.
Added Web platform tests (link)
Added MDN Docs (link)

Implementation commitment:

Safari (link to issue)
Chrome
Firefox
Edge (public signal)

Impact on Payment Handler spec?

Preview | Diff

marcoscaceres · 2018-08-24T05:03:38Z

Based on teleconference discussion and #608.

marcoscaceres · 2018-08-24T05:10:18Z

@rsolomakhin, big ask... but maybe we can write a small MDN description of how to achieve the same behavior using JS? That will at least give us a good story for anyone that might actually need this.

marcoscaceres · 2018-08-24T05:11:09Z

Filed bug on Gecko to remove the attribute: https://bugzilla.mozilla.org/show_bug.cgi?id=1485881

ianbjacobs

Hi @marcoscaceres,

The text looks fine but I have a suggestion: add a note that says that even if the feature is removed, the party that receives the response data can use JS or other libraries to determine the language of the text in question.

I am approving this nonetheless, and I will live if you don't include such a note.

rsolomakhin · 2018-08-24T16:09:00Z

Filed an issue in Chromium: https://crbug.com/877521. Looking into doing this work in JavaScript for an MDN article may take a while for me personally.

stpeter · 2018-08-24T16:22:53Z

I still need to write up conclusions from the i18n WG meeting yesterday - might not get to that until the weekend.

marcoscaceres · 2018-08-27T01:59:53Z

@rsolomakhin,

Looking into doing this work in JavaScript for an MDN article may take a while for me personally.

No problem. It might not come up, and we can do that if it does. Happy to help when needed, too.

marcoscaceres · 2018-08-27T02:00:35Z

Merging this, as it's just editorial. I'll do a new PR for removal and move the various browser bugs over to that.

stpeter · 2018-09-05T20:58:54Z

For the sake of traceability, here are the conclusions of my i18n review (it's possible @aphillips might have more or better suggestions)...

First, the i18n WG guidelines for spec authors [0] don't say much about how to handle web forms. There are a few suggestions about character encoding [1] and text direction [2] in the guidelines for content authors and developers, but those are rather minimal, too. Because the Payment Request API is a strange beast (in essence it moves ecommerce checkout forms out of web content and into a browser dialog), it's likely an outlier for advice to spec developers. I've raised the issue of web forms guidance for discussion in the i18n WG.

Second, there are a few topics we might want to broach in the Payment Request API spec, such as:

(a) Recommend that the browser set a language tag for user input in the payment dialog. For instance, it could inherit the language tag from the html lang attribute [3] on the merchant site.

(b) Recommend that the browser be able to handle a locale value that is distinct from the language tag. As noted in [4] and relevant for our use cases, "the region code is also sometimes used to indicate the physical location, market, legal, or other governing policies for the user."

(c) Require the browser to treat all input from the payment dialog as UTF-8, consistent with [1].

(d) Mention that the user can set a base direction for textual input, as described at [2].

Third, there are probably easier ways to determine the script of user-inputted text than the algorithm Rouslan provided [5] (which I take it describes what libaddressinput [6] uses). For instance, the browser could simply inspect the characters themselves to see if they are in Latin script, Japanese script, etc. (I'll grant you that mixed-script input could be a challenge, though.)

Fourth, a scenario Addison mentioned on an i18n WG call is the need for the same address in multiple forms (e.g., an English-language version for delivery from the U.S. to an import handling location in China and a Chinese-language version for final delivery to the customer). We have not designed for this yet, but might want to open a tracking bug for multiple representations of the same address.

Fifth, a related scenario might be billing address in one script and shipping address in another script. This is simpler than multiple representations of the same address, but still requires support for two different scripts in the same set of input forms.

We might uncover additional issues in the future, but these are the ones we've discussed so far.

[0] https://w3c.github.io/bp-i18n-specdev/#loc_forms
[1] https://www.w3.org/International/questions/qa-forms-utf-8
[2] https://www.w3.org/International/questions/qa-html-dir#userexplicit
[3] https://www.w3.org/International/articles/language-tags/
[4] https://www.w3.org/TR/ltli/
[5] #608 (comment)
[6] https://github.com/googlei18n/libaddressinput/blob/3cefac503f6321f7f84a790939dc7cb022bce169/cpp/src/language.cc#L58

marcoscaceres · 2018-09-05T22:18:22Z

Thanks so much for this input, Peter and I18n folks. Just noting for (a), you’ll be happy to hear that’s literally what we recommend in the spec:

It is RECOMMENDED that the language of the user interface match the language of the body element.

I’ll write a full response for the other points, but the tl;dr is that, at a glance, we get most of what’s mentioned for free from the IDL layer (DOMStrings are already UTF-16, irrespective of payment dialog input fields). And we can defer to the merchant for script detection when they need it (hence the removal of this attribute).
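To illustrate the encoding point, here is a small sketch (not taken from the spec) showing that a JavaScript string, which is how a DOMString surfaces to script, is measured in UTF-16 code units, while conversion to UTF-8 only happens when the data is explicitly encoded. The sample address text is made up; `TextEncoder` is the standard Encoding API interface and is assumed to be available in the runtime.

```typescript
// Hypothetical address line: five Latin letters, a space, and three kanji.
const addressLine: string = "Tokyo 東京都";

// `length` counts UTF-16 code units, regardless of how the text was entered.
const utf16Units: number = addressLine.length;

// TextEncoder always produces UTF-8 bytes; this is the point at which
// UTF-8 enters the picture, e.g. when serializing the response to a server.
const utf8Bytes: Uint8Array = new TextEncoder().encode(addressLine);

console.log(utf16Units, utf8Bytes.length);
```

In other words, the encoding of the in-memory strings is fixed by the platform; the choice of UTF-8 matters at the serialization boundary.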

If there is to be a script/lang detection mechanism, we should add that as native functionality to ECMAScript via the Intl API, rather than as a one-off for this API. Then it would be globally useful, not just for addresses, but for any kind of input.

aphillips · 2018-09-05T22:43:03Z

@stpeter, @marcoscaceres Thanks for the summary. I don't necessarily see that removing languageCode is a good thing: we recommend [0] otherwise and for good reason.

One thing about the language tag is that it should not be used to indicate region/country or jurisdiction. That should be a separate bit of data, such as an ISO-3166 code or such. The region subtag in a language tag can indicate defaults for market, legal, or other locale-affected API usages. But it is a separate thing and it is a best practice not to use it as a proxy. That is, the language of an address has nothing to do with where the address is in the world. LTLI says something about this, but the quote @stpeter cites needs more context and explanation.

When it comes to text analysis, there are a number of APIs for determining the script of content. The key thing to recall here is that there is what we call the "common" script, consisting of characters shared between many different writing systems. Punctuation, for example. Understanding this reduces (but does not eliminate) cases where there are truly mixed script usages. Script is defined by Unicode and there are APIs that could be exposed in, e.g., Intl, although I caution that script isn't necessarily always useful in the way that this spec's usage seems to suggest.
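As a rough illustration of inspecting characters while setting aside the "common" script, the following sketch uses the Unicode `Script` property escapes that ECMAScript regular expressions support. The function name `detectScripts` and the short candidate list are hypothetical, chosen only for this example; this is not the algorithm from #608 or from libaddressinput, and it identifies scripts, not languages.

```typescript
// Scripts to look for; a hypothetical short list chosen for illustration.
const CANDIDATE_SCRIPTS: string[] = [
  "Latin", "Han", "Hiragana", "Katakana", "Cyrillic", "Arabic",
];

// Built with the RegExp constructor so the Unicode property escapes are
// resolved at run time. Each pattern matches exactly one code point.
const scriptTests: Array<[string, RegExp]> = CANDIDATE_SCRIPTS.map(
  (name): [string, RegExp] => [name, new RegExp(`^\\p{Script=${name}}$`, "u")]
);

// Returns the candidate scripts found in `text`, sorted by name.
// Characters outside the candidate list, including Common-script
// punctuation, digits, and spaces, match none of the tests and are skipped.
function detectScripts(text: string): string[] {
  const found = new Set<string>();
  // Array.from splits the string by code point, not by UTF-16 code unit.
  for (const ch of Array.from(text)) {
    for (const [name, re] of scriptTests) {
      if (re.test(ch)) {
        found.add(name);
        break;
      }
    }
  }
  return Array.from(found).sort();
}

console.log(detectScripts("東京都, 100-0001"));
console.log(detectScripts("Sony 株式会社"));
```

Because digits, commas, and hyphens are skipped, a postal code does not make an otherwise single-script address look mixed; an input that combines Latin and Han characters is still reported as two scripts, which is the genuinely mixed case.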

There is a need for more general I18N documentation for things such as field handling and definition, defining locale-neutral data structures, cultural awareness, etc. The LTLI document that you mention is actually one of the items that the I18N WG prioritized just below our current work and which I hope that we can get back to once Charmod/String-Meta are out of our systems. In the meantime, happy to help.

[0] https://w3c.github.io/string-meta/#

marcoscaceres · 2018-09-05T23:01:27Z

The challenge here is having a clear algorithm to identify the language of content - in this case, an address. Does [0] provide the algorithm? Apologies if it does and I missed it. Without such an algorithm, none of us can implement languageCode interoperably (that is the current situation, and why it’s now marked at risk).

At the risk of getting circular, if there is such an algorithm, whereby a string is given and out come one or more language tags, then IMO it should be part of Intl, because it would be universally useful to the web platform.

stpeter · 2018-09-05T23:15:40Z

I see the point that @aphillips makes: in the Payment Request API, the end user is a producer (as defined in the string-meta spec [0]) of a string or set of strings, and this is our chance to attach metadata about the language and base direction of the string. If we don't do that when the string is created, some other consumer of the string will need to figure it out later on, and they won't have as much context as we do at string creation time...

stpeter · 2018-09-05T23:39:29Z

Thinking about this further, I have a question for @aphillips - the answer to which might clear up some of the confusion around the current languageCode attribute in the Payment Request API. Right now, languageCode defines "the language in which the address is provided", which is "used to determine the field separators and the order of fields when formatting the address for display". But the PaymentAddress is a composite of multiple strings (country, region, city, organization, etc.). Should each of those strings be flagged with a langtag and base direction? What if the organization name is in kanji or hiragana and the other address fields are in romaji/Latin? It seems that we might be talking about two different things here: (1) the langtag/direction of each string (or potentially a set of strings), and (2) the layout of the set of strings representing an address in the input form of the payment dialog that the browser displays (remember that the Payment Request API essentially defines a set of forms, because it moves the ecommerce checkout flow from web content to an in-browser dialog). The layout property might actually be locale-based, not language-based or script-based.
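To make the distinction between (1) and (2) concrete, here is one possible shape for an address whose fields each carry their own language tag and base direction, with the layout locale and country held separately. All type and property names (`TaggedString`, `TaggedAddress`, `layoutLocale`, `distinctLangs`) are hypothetical and are not part of the Payment Request API or any other spec; the sample values are made up.

```typescript
// Hypothetical shapes for illustration; not part of the Payment Request API.
type Direction = "ltr" | "rtl" | "auto";

interface TaggedString {
  value: string;
  lang?: string;   // BCP 47 language tag, e.g. "ja" or "ja-Latn"
  dir?: Direction; // base direction of this one string
}

interface TaggedAddress {
  country: string;       // ISO 3166-1 alpha-2 code, separate from any language tag
  layoutLocale?: string; // drives field order and separators, as in (2)
  fields: { [name: string]: TaggedString }; // per-string metadata, as in (1)
}

// Collects the distinct language tags used across the address fields.
function distinctLangs(address: TaggedAddress): string[] {
  const langs = new Set<string>();
  for (const name of Object.keys(address.fields)) {
    const field = address.fields[name];
    if (field && field.lang !== undefined) {
      langs.add(field.lang);
    }
  }
  return Array.from(langs).sort();
}

// Organization name in Japanese script, remaining fields romanized.
const example: TaggedAddress = {
  country: "JP",
  layoutLocale: "ja-JP",
  fields: {
    organization: { value: "株式会社サンプル", lang: "ja", dir: "ltr" },
    city: { value: "Shibuya", lang: "ja-Latn", dir: "ltr" },
    addressLine: { value: "1-2-3 Jinnan", lang: "ja-Latn", dir: "ltr" },
  },
};

console.log(distinctLangs(example));
```

With this kind of structure, a single address-wide language attribute would not be needed to describe mixed-script input, and the layout decision would not have to be inferred from the language of any one field.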

marcoscaceres · 2018-09-06T00:23:37Z

Oh, can we please move this discussion to #608? The issue we are currently in was for the pull request to add “at risk” to the spec, and it’s been merged and closed.

marcoscaceres requested a review from adrianhopebailie August 24, 2018 05:03

marcoscaceres requested a review from ianbjacobs August 24, 2018 05:03

Mark languageCode at risk

7471998

marcoscaceres force-pushed the language_code_at_risk branch from afc08b6 to 7471998 Compare August 24, 2018 05:05

ianbjacobs approved these changes Aug 24, 2018


marcoscaceres changed the title ~~Mark languageCode at risk~~ editorial: mark languageCode at risk Aug 27, 2018

marcoscaceres merged commit 9f1244a into gh-pages Aug 27, 2018

marcoscaceres deleted the language_code_at_risk branch August 27, 2018 02:00

marcoscaceres mentioned this pull request Aug 27, 2018

Remove languageCode tests web-platform-tests/wpt#12684

Merged

w3c locked as off-topic and limited conversation to collaborators Sep 6, 2018
