GitHub - webmachinelearning/translation-api: A proposal for translator and language detector APIs
Explainer for the Translator and Language Detector APIs

This proposal is an early design sketch by the Chrome built-in AI team to describe the problem below and solicit feedback on the proposed solution. It has not been approved to ship in Chrome.

Browsers are increasingly offering language translation to their users. Such translation capabilities can also be useful to web developers. This is especially the case when the browser's built-in translation abilities cannot help, such as:

  • translating user input or other interactive features;
  • pages with complicated DOMs which trip up browser translation;
  • providing in-page UI to start the translation; or
  • translating content that is not in the DOM, e.g. spoken content.

To perform translation in such cases, web sites currently have to either call out to cloud APIs, or bring their own translation models and run them using technologies like WebAssembly and WebGPU. This proposal introduces a new JavaScript API for exposing a browser's existing language translation abilities to web pages, so that if present, they can serve as a simpler and less resource-intensive alternative.

An important supplement to translation is language detection. This can be combined with translation, e.g. taking user input in an unknown language and translating it to a specific target language. In a similar way, browsers today often already have language detection capabilities, and we want to offer them to web developers through a JavaScript API.

Goals

Our goals are to:

  • Help web developers perform real-time translations (e.g. of user input).
  • Help web developers perform real-time language detection.
  • Guide web developers to gracefully handle failure cases, e.g. translation not being available or possible.
  • Harmonize well with existing browser and OS translation technology (Brave, Chrome, Edge, Firefox, Safari), e.g. by allowing on-the-fly downloading of different languages instead of assuming all are present from the start.
  • Allow a variety of implementation strategies, including on-device vs. cloud-based translation, while keeping these details abstracted from developers.
  • Allow implementations to expose different capabilities for translation vs. language detection. For example, an implementation might be able to detect 30+ languages, but only be able to translate between 6.

The following are explicit non-goals:

  • We do not intend to force every browser to ship language packs for every language combination, or even to support translation at all. It would be conforming to implement this API by always saying translation and language detection are unavailable, or to implement this API entirely by using cloud services instead of on-device translation.
  • We do not intend to provide guarantees of translation and language detection quality, stability, or interoperability between browsers. These are left as quality-of-implementation issues, similar to the shape detection API. (See also a discussion of interop in the W3C "AI & the Web" document.)

The following are potential goals we are not yet certain of:

  • Allow web developers to know whether translations are done on-device or using cloud services. This would allow them to guarantee that any user data they feed into this API does not leave the device, which can be important for privacy purposes. (Similarly, we might want to allow developers to request on-device-only translation, in case a browser offers both varieties.)
  • Allow web developers to know some identifier for the translation and language detection models in use, separate from the browser version. This would allow them to allowlist or blocklist specific models to maintain a desired level of quality.

Both of these potential goals could be detrimental to interoperability, so we want to investigate further how important such functionality is to developers, in order to find the right tradeoff.

Examples

Note that in this API, languages are represented as BCP 47 language tags, as already used by the existing JavaScript Intl API or the HTML lang="" attribute. Examples: "ja", "en", "de-AT", "zh-Hans-CN".

See below for more on the details of how language tags are handled in this API, and the appendix for some helper code that converts between language tags and human-readable strings.

Translation

Here is the basic usage of the translator API, with no error handling:

const translator = await Translator.create({
  sourceLanguage: "en",
  targetLanguage: "ja"
});

const text = await translator.translate("Hello, world!");
const readableStreamOfText = translator.translateStreaming(`
  Four score and seven years ago our fathers brought forth, upon this...
`);

Note that the create() method call here might cause the download of a translation model or language pack. Later examples show how to get more insight into this process.
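As a minimal sketch of consuming the streaming variant, the hypothetical helper below drains a ReadableStream with a reader and reports the accumulated text after each chunk. It assumes that each chunk is a string fragment that extends the text received so far; this explainer does not spell out the chunking behavior, so treat that as an assumption.

```javascript
// Hypothetical helper (not part of the proposed API): drain a ReadableStream
// of string chunks, reporting the accumulated text after each chunk.
async function readTranslationStream(stream, onUpdate = () => {}) {
  const reader = stream.getReader();
  let text = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      text += value; // Assumption: each chunk extends the text so far.
      onUpdate(text);
    }
  } finally {
    // Release the reader whether the stream finished or errored.
    reader.releaseLock();
  }
  return text;
}
```

It could be called as readTranslationStream(translator.translateStreaming(longText), t => { outputEl.textContent = t; }), where longText and outputEl are placeholder names.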

Language detection

A similar simplified example of the language detector API:

const detector = await LanguageDetector.create();

const results = await detector.detect(someUserText);
for (const result of results) {
  console.log(result.detectedLanguage, result.confidence);
}

Here results will be an array of { detectedLanguage, confidence } objects, with the detectedLanguage field being a BCP 47 language tag and confidence being a number between 0 and 1. The array will be sorted by descending confidence. The final entry in the array will always be "und", representing the probability that the text is not written in any language the model knows.

The array will always contain at least 1 entry, although it could be for the undetermined ("und") language.

Very low-confidence results are excluded. See the specification for more details, as well as the discussions in issue #39 and issue #51.

Because of how very low-confidence results are excluded, the sum of all confidence values could be less than 1.
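Given that result shape, one way to reduce the array to a single answer is sketched below. The helper name and the 0.5 default threshold are illustrative choices, not part of the proposal.

```javascript
// Hypothetical helper: reduce detect() results to a single language tag,
// or null when the detector is not confident enough.
// The 0.5 default threshold is an arbitrary value chosen for illustration.
function pickDetectedLanguage(results, minConfidence = 0.5) {
  // Results are sorted by descending confidence, so the first entry is the best guess.
  const best = results[0];
  if (!best || best.detectedLanguage === "und" || best.confidence < minConfidence) {
    return null;
  }
  return best.detectedLanguage;
}
```

Returning null rather than "und" lets callers handle "no usable answer" with one check, whether the cause was an undetermined language or low confidence.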

Language detection with expected input languages

If there are certain languages you need to be able to detect for your use case, you can include them in the expectedInputLanguages option when creating a language detector:

const detector = await LanguageDetector.create({ expectedInputLanguages: ["en", "ja"] });

This will allow the implementation to download additional resources like language detection models if necessary, and will ensure that the promise is rejected with a "NotSupportedError" DOMException if the browser is unable to detect the given input languages.
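A sketch of handling that rejection follows. The helper name is hypothetical, and the second parameter exists only so the function can be exercised with a stand-in object; in a page it would default to the global LanguageDetector.

```javascript
// Hypothetical helper: create a detector for the given languages, returning
// null if the API is missing or the languages are not supported.
async function createDetectorFor(languages, LanguageDetectorImpl = globalThis.LanguageDetector) {
  if (!LanguageDetectorImpl) {
    return null; // The API is not exposed in this context.
  }
  try {
    return await LanguageDetectorImpl.create({ expectedInputLanguages: languages });
  } catch (e) {
    if (e.name === "NotSupportedError") {
      return null; // The browser cannot detect these languages.
    }
    throw e; // Other failures (e.g. a failed download) are left to the caller.
  }
}
```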

Checking before creation, and a more realistic combined example

Both APIs provide the ability to know, before calling create(), what is possible with the implementation. This is done via availability() methods, which take the same options as create(). They return a promise, which fulfills with one of the following values:

  • "unavailable" means that the implementation does not support translation or language detection of the given language(s).
  • "downloadable" means that the implementation supports translation or language detection of the given language(s), but it will have to download something (e.g., a machine learning model) as part of creating the associated object.
  • "downloading" means that the implementation supports translation or language detection of the given language(s), but it will have to finish an ongoing download as part of creating the associated object.
  • "available" means that the implementation supports translation or language detection of the given language(s), without performing any downloads.
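The four values above can be folded into a small decision, sketched here with a hypothetical helper. Treating unrecognized values as unusable is a defensive assumption, not something the proposal requires.

```javascript
// Hypothetical helper: turn an availability() result into a simple plan.
function planFromAvailability(availability) {
  switch (availability) {
    case "available":
      return { usable: true, showProgress: false };
    case "downloadable":
    case "downloading":
      // create() will work, but only after a download completes.
      return { usable: true, showProgress: true };
    default:
      // "unavailable", or any value this code does not recognize.
      return { usable: false, showProgress: false };
  }
}
```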

Here is an example that adds capability checking to log more information and fall back to cloud services, as part of a language detection plus translation task:

async function translateUnknownCustomerInput(textToTranslate, targetLanguage) {
  const detectorAvailability = await LanguageDetector.availability();

  // If there is no language detector, then assume the source language is the
  // same as the document language.
  let sourceLanguage = document.documentElement.lang;

  // Otherwise, let's detect the source language.
  if (detectorAvailability !== "unavailable") {
    if (detectorAvailability !== "available") {
      console.log("Language detection is available, but something will have to be downloaded. Hold tight!");
    }

    const detector = await LanguageDetector.create();
    const [bestResult] = await detector.detect(textToTranslate);

    if (bestResult.detectedLanguage === "und" || bestResult.confidence < 0.4) {
      // We'll just return the input text without translating. It's probably mostly punctuation
      // or something.
      return textToTranslate;
    }
    sourceLanguage = bestResult.detectedLanguage;
  }

  // Now we've figured out the source language. Let's translate it!
  const translatorAvailability = await Translator.availability({ sourceLanguage, targetLanguage });
  if (translatorAvailability === "unavailable") {
    console.warn("Translation is not available. Falling back to cloud API.");
    return await useSomeCloudAPIToTranslate(textToTranslate, { sourceLanguage, targetLanguage });
  }

  if (translatorAvailability !== "available") {
    console.log("Translation is available, but something will have to be downloaded. Hold tight!");
  }

  const translator = await Translator.create({ sourceLanguage, targetLanguage });
  return await translator.translate(textToTranslate);
}

Download progress

For cases where using the API is only possible after a download, you can monitor the download progress (e.g. in order to show your users a progress bar) using code such as the following:

const translator = await Translator.create({
  sourceLanguage,
  targetLanguage,
  monitor(m) {
    m.addEventListener("downloadprogress", e => {
      console.log(`Downloaded ${e.loaded * 100}%`);
    });
  }
});

If the download fails, then downloadprogress events will stop being emitted, and the promise returned by create() will be rejected with a "NetworkError" DOMException.

Note that in the case that multiple entities are downloaded (e.g., an en ↔ ja language pack and an en ↔ ko language pack to support the ja ↔ ko use case), web developers do not get the ability to monitor the individual downloads. All of them are bundled into the overall downloadprogress events, and the create() promise is not fulfilled until all downloads and loads are successful.

The event is a ProgressEvent whose loaded property is between 0 and 1, and whose total property is always 1. (The exact numbers of total and downloaded bytes are not exposed; see the discussion in webmachinelearning/writing-assistance-apis issue #15.)

At least two events, with e.loaded === 0 and e.loaded === 1, will always be fired. This is true even if creating the translator or language detector doesn't require any downloading.
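Since loaded is always a fraction of 1, a reusable monitor callback is straightforward to write. The factory below is a sketch; its name and the onPercent callback are hypothetical.

```javascript
// Hypothetical factory: build a monitor() callback that reports whole-number
// percentages to the supplied function.
function createProgressMonitor(onPercent) {
  return function monitor(m) {
    m.addEventListener("downloadprogress", (e) => {
      // e.loaded is a fraction between 0 and 1; e.total is always 1.
      onPercent(Math.round(e.loaded * 100));
    });
  };
}
```

It would be passed as the monitor option, e.g. Translator.create({ sourceLanguage, targetLanguage, monitor: createProgressMonitor(p => { progressEl.value = p; }) }), where progressEl is a placeholder for some progress element.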

What's up with this pattern?

This pattern is a little involved, and several alternatives have been considered. However, after asking around the web standards community, this one seemed best: it allows using standard event handlers and ProgressEvents, and it ensures that once the promise is settled, the translator or language detector object is completely ready to use.

It is also nicely future-extensible by adding more events and properties to the m object.

Finally, note that there is a sort of precedent in the (never-shipped) FetchObserver design.

Too-large inputs

It's possible that the inputs given for translation or language detection might be too large for the underlying machine learning model to handle. Although there are often techniques that allow implementations to break up the inputs into smaller chunks, and combine the results, the APIs have some facilities to allow browsers to signal such too-large inputs.

Whenever any API call fails due to too-large input, it is rejected with a QuotaExceededError. This is a proposed new type of exception, which subclasses DOMException, and replaces the web platform's existing "QuotaExceededError" DOMException. See whatwg/webidl#1465 for this proposal. For our purposes, the important part is that it has the following properties:

  • requested: how much "usage" the input consists of
  • quota: how much "usage" was available (which will be less than requested)

The "usage" concept is specific to the implementation, and could be something like string length, or language model tokens.

This allows detecting failures due to overlarge inputs and giving clear feedback to the user, with code such as the following:

const detector = await LanguageDetector.create();

try {
  console.log(await detector.detect(potentiallyLargeInput));
} catch (e) {
  if (e.name === "QuotaExceededError") {
    console.error(`Input too large! You tried to detect the language of ${e.requested} tokens, but ${e.quota} is the max supported.`);

    // Or maybe:
    console.error(`Input too large! It's ${e.requested / e.quota}x as large as the maximum possible input size.`);
  }
}

In some cases, instead of providing errors after the fact, the developer needs to be able to communicate to the user how close they are to the limit. For this, they can use the inputQuota property and the measureInputUsage() method on the translator or language detector objects:

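const translator = await Translator.create({
  sourceLanguage: "en",
  targetLanguage: "ja"
});
meterEl.max = translator.inputQuota;

// The handler must be async so that it can await measureInputUsage().
textbox.addEventListener("input", async () => {
  const usage = await translator.measureInputUsage(textbox.value);
  meterEl.value = usage;
  // Compare the raw usage, since a meter element clamps its value to max.
  submitButton.disabled = usage > translator.inputQuota;
});

submitButton.addEventListener("click", async () => {
  // translate() returns a promise, so await it before logging.
  console.log(await translator.translate(textbox.value));
});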

Note that if an implementation does not have any limits, e.g. because it uses techniques to split up the input and process it a bit at a time, then inputQuota will be +Infinity and measureInputUsage() will always return 0.
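A non-finite quota is not a usable maximum for a UI gauge, so code that drives a meter or progress bar may want to normalize first. The following is a sketch with a hypothetical helper name.

```javascript
// Hypothetical helper: express usage as a fraction of the quota, in [0, 1].
function inputUsageFraction(usage, inputQuota) {
  // Implementations without limits report inputQuota as +Infinity.
  if (!Number.isFinite(inputQuota) || inputQuota <= 0) {
    return 0;
  }
  return Math.min(usage / inputQuota, 1);
}
```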

Developers need to be cautious not to over-use this API, however, as it requires a round-trip to the underlying model. That is, the following code is bad, as it performs two round trips with the same input:

// DO NOT DO THIS

const usage = await translator.measureInputUsage(input);
if (usage < translator.inputQuota) {
  console.log(await translator.translate(input));
} else {
  console.error(`Input too large!`);
}

If you're planning to call translate() anyway, then using a pattern like the one that opened this section, which catches QuotaExceededErrors, is more efficient than using measureInputUsage() plus a conditional call to translate().
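The recommended single-round-trip pattern could look like the sketch below. The helper name and the shape of the returned object are illustrative assumptions.

```javascript
// Hypothetical helper: attempt the translation directly and turn a
// too-large-input failure into a value the caller can show to the user.
async function translateOrExplain(translator, input) {
  try {
    return { ok: true, text: await translator.translate(input) };
  } catch (e) {
    if (e.name === "QuotaExceededError") {
      return {
        ok: false,
        reason: `Input too large: it uses ${e.requested} units, but only ${e.quota} are allowed.`
      };
    }
    throw e; // Unrelated errors are not swallowed.
  }
}
```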

Destruction and aborting

The API comes equipped with a couple of signal options that accept AbortSignals, to allow aborting the creation of the translator/language detector, or the translation/language detection operations themselves:

const controller = new AbortController();
stopButton.onclick = () => controller.abort();

const languageDetector = await LanguageDetector.create({ signal: controller.signal });
await languageDetector.detect(document.body.textContent, { signal: controller.signal });

Destroying a translator or language detector will:

  • Reject any ongoing calls to detect() or translate().
  • Error any ReadableStreams returned by translateStreaming().
  • And, most importantly, allow the user agent to unload the machine learning models from memory. (If no other APIs are using them.)

Allowing such destruction provides a way to free up the memory used by the model without waiting for garbage collection, since machine learning models can be quite large.

Aborting the creation process will reject the promise returned by create(), and will also stop signaling any ongoing download progress. (The browser may then abort the downloads, or may continue them. Either way, no further downloadprogress events will be fired.)

In all cases, the exception used for rejecting promises or erroring ReadableStreams will be an "AbortError" DOMException, or the given abort reason.
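Putting aborting and destruction together, a one-shot detection with a time limit might look like the sketch below. It assumes that destruction is triggered by a destroy() method on the detector object, which the text above implies but does not name. The helper name and the third parameter (used only for substituting a stand-in object) are hypothetical. Because AbortSignal.timeout() aborts with a "TimeoutError" DOMException as its reason, that is presumably what a timed-out call would reject with.

```javascript
// Hypothetical helper: detect once with a time limit, then free the model.
async function detectWithTimeout(text, timeoutMs, LanguageDetectorImpl = globalThis.LanguageDetector) {
  // One signal covers both the creation step and the detection step.
  const signal = AbortSignal.timeout(timeoutMs);
  const detector = await LanguageDetectorImpl.create({ signal });
  try {
    return await detector.detect(text, { signal });
  } finally {
    // Assumption: destroy() is the method that performs the destruction
    // described above, letting the browser unload the model promptly.
    detector.destroy();
  }
}
```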

Detailed design

Language tag handling

If a browser supports translating from ja to en, does it also support translating from ja to en-US? What about en-GB? What about the (discouraged, but valid) en-Latn, i.e. English written in the usual Latin script? But translation to en-Brai, English written in the Braille script, is different entirely.

We're proposing that the API use the same model as JavaScript's Intl APIs, which tries to do best-fit matching of the requested language tag to the available language tags. The specification contains a more detailed example.
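To build intuition for why en-Latn adds nothing to en while en-Brai does, the sketch below compares tags after removing likely subtags with Intl.Locale. This is only an illustration using an existing Intl facility; it is not the matching algorithm the specification defines, and the helper name is hypothetical.

```javascript
// Illustration only: two tags are "the same after minimization" if removing
// their likely subtags yields an identical tag.
function sameMinimalTag(a, b) {
  const minimize = (tag) => new Intl.Locale(tag).minimize().toString();
  return minimize(a) === minimize(b);
}

sameMinimalTag("en", "en-Latn"); // true: Latin is the likely script for English
sameMinimalTag("en", "en-Brai"); // false: Braille is not the likely script
```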

Multilingual text

For language detection of multilingual text, we return detected language confidences in proportion to the languages detected. The specification gives an example of how this works. See also the discussion in issue #13.

A future option might be to instead have the API return back the splitting of the text into different-language segments. There is some precedent for this, but it does not seem to be common yet. This could be added without backward-compatibility problems by making it a non-default mode.

Downloading

The current design envisions that availability() methods will not cause downloads of language packs or other material like a language detection model, whereas the create() methods can. In all cases, whether or not creation will initiate a download can be detected beforehand by the corresponding availability() method.

After a developer has a Translator or LanguageDetector object, further calls are not expected to cause any downloads. (Although they might require internet access, if the implementation is not entirely on-device.)

This design means that the implementation must have all information about the capabilities of its translation and language detection models available beforehand, i.e. "shipped with the browser". (Either as part of the browser binary, or through some out-of-band update mechanism that eagerly pushes updates.)

Privacy and security considerations

Please see the Writing Assistance APIs specification, where we have centralized the normative privacy and security considerations that apply to these APIs as well as the writing assistance APIs.

Permissions policy, iframes, and workers

By default, these APIs are only available to top-level Windows, and to their same-origin iframes. Access to the APIs can be delegated to cross-origin iframes using the Permissions Policy allow="" attribute:

<iframe src="https://example.com/" allow="translator language-detector"></iframe>

These APIs are currently not available in workers, due to the complexity of establishing a responsible document for each worker in order to check the permissions policy status. See this discussion for more. It may be possible to loosen this restriction over time, if use cases arise.

Note that although the APIs are not exposed to web platform workers, a browser could expose them to extension service workers, which are outside the scope of web platform specifications and have a different permissions model.
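Because the APIs are not exposed in workers, and may be absent in browsers that do not implement them, code shared across contexts can feature-detect the globals before use. The helper below is a hypothetical sketch. This explainer does not say whether a disallowed cross-origin iframe sees the globals at all, so the check should not be treated as a permissions check.

```javascript
// Hypothetical helper: report which of the two APIs are exposed on a global
// object (the current global by default).
function detectBuiltInLanguageApis(globalObject = globalThis) {
  return {
    translator: typeof globalObject.Translator !== "undefined",
    languageDetector: typeof globalObject.LanguageDetector !== "undefined"
  };
}
```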

Alternatives considered and under consideration

Streaming input support

Although the API contains support for streaming output of a translation, via the translateStreaming() API, it doesn't support streaming input. Should it?

We believe it should not, for now. In general, translation works best with more context; feeding more input into the system over time can produce very different results. For example, translating "彼女の話を聞いて、驚いた" to English would give "I was surprised to hear her story". But if you then streamed in another chunk, so that the full sentence was "彼女の話を聞いて、驚いたねこが逃げた", the result changes completely to "Upon hearing her story, the surprised cat ran away." This doesn't fit well with how streaming APIs behave generally.

In other words, even if web developers are receiving a stream of input (e.g. over the network or from the user), they need to take special care in how they present such updating-over-time translations to the user. We shouldn't treat this as a usual stream-to-string or stream-to-stream API, because that will rarely be useful.

That said, we are aware of research on translation algorithms which are specialized for this kind of setting, and attempt to mitigate the above problem. It's possible we might want to support this sort of API in the future, if implementations are excited about implementing that research. This should be possible to fit into the existing API surface, possibly with some extra feature-detection API.

Flattening the API and reducing async steps

The current design requires multiple async steps to do useful things:

const translator = await Translator.create(options);
const text = await translator.translate(sourceText);

const detector = await LanguageDetector.create();
const results = await detector.detect(sourceText);

Should we simplify these down with convenience APIs that do both steps at once?

We're open to this idea, but we think the existing complexity is necessary to support the design wherein translation and language detection models might not be already downloaded. By separating the two stages, we allow web developers to perform the initial creation-and-possibly-downloading steps early in their page's lifecycle, in preparation for later, hopefully-quick calls to APIs like translate().
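One way to take advantage of that separation is to start creation early and share the resulting promise, as in the sketch below. The cache, the helper name, and the third parameter (used only for substituting a stand-in object) are all hypothetical.

```javascript
// Hypothetical cache of in-flight or completed create() calls, keyed by
// language pair.
const translatorPromises = new Map();

function getTranslator(sourceLanguage, targetLanguage, TranslatorImpl = globalThis.Translator) {
  const key = `${sourceLanguage}|${targetLanguage}`;
  let promise = translatorPromises.get(key);
  if (!promise) {
    promise = TranslatorImpl.create({ sourceLanguage, targetLanguage });
    translatorPromises.set(key, promise);
    // Forget failed attempts so that a later call can retry.
    promise.catch(() => translatorPromises.delete(key));
  }
  return promise;
}
```

Calling getTranslator("en", "ja") during page load starts any download, and a later (await getTranslator("en", "ja")).translate(text) reuses the same object.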

Another possible simplification is to make the availability() APIs synchronous instead of asynchronous. This would be implementable by having the browser proactively load the capabilities information into the main thread's process, upon creation of the global object. We think this is not worthwhile, as it imposes a non-negligible cost on all global object creation, even when the APIs are not used.

Allowing unknown source languages for translation

An earlier revision of this API included support for combining the language detection and translation steps into a single translation call, which did a best-guess on the source language. The idea was that this would possibly be more efficient than requiring the web developer to do two separate calls, and it could possibly even be done using a single model.

We abandoned this design when it became clear that existing browsers have very decoupled implementations of translation vs. language detection, using separate models for each. This includes supporting different languages for language detection vs. for translation. So even if the translation model supported an unknown-source-language mode, it might not support the same inputs as the language detection model, which would create a confusing developer experience and be hard to signal in the API.

Stakeholder feedback

Appendix: converting between language tags and human-readable strings

This code already works today and is not new to this API proposal. It is likely useful in conjunction with this API, for example when building user interfaces.

function languageTagToHumanReadable(languageTag, targetLanguage) {
  const displayNames = new Intl.DisplayNames([targetLanguage], { type: "language" });
  return displayNames.of(languageTag);
}

languageTagToHumanReadable("ja", "en");      // "Japanese"
languageTagToHumanReadable("zh", "en");      // "Chinese"
languageTagToHumanReadable("zh-Hant", "en"); // "Traditional Chinese"
languageTagToHumanReadable("zh-TW", "en");   // "Chinese (Taiwan)"

languageTagToHumanReadable("en", "ja");      // "英語"
