Closed
Labels: arch-wasm (WebAssembly architecture), area-System.Globalization, runtime-mono (specific to the Mono runtime)
The task is to remove as much data as possible from the ICU data files and to replace the ICU4C functions that use this data with platform-native functions (in the case of WASM, with Web APIs). Because we cannot get rid of the ICU data file completely (some functionality is not easily replaceable), we will keep loading `icudt.dat` in a reduced form. This mode is called HybridGlobalization and is switched off by default. Users can switch it on by setting the MSBuild property `<HybridGlobalization>` to true.
PoC branch is here: main...ilonatommy:runtime:icu-platform-native.
- Removing `collations` for WASM
  - Prepare `icudt_wasm.dat` and the corresponding sharded data files without collations/standard, enable setting `HybridGlobalization`, and write a WBT (Wasm.Build.Tests) test checking that the new file is loaded instead of the old one. See: Reduced ICU files for `HybridGlobalization`, icu#300. [ILONA]
  - Loading reduced ICU files to Blazor - this unified the startup for Blazor, so we could implement HG (HybridGlobalization) for it as well.
  - Finish implementing `GlobalizationNative_ChangeCase` and optimize memory usage: do not create a new string to return the value, but pass the address of a buffer reserved on the C# side that will hold the result. [ILONA]
    public API: `TextInfo.ToLower`, `TextInfo.ToUpper`, `TextInfo.ToTitleCase`.
  - Implement `GlobalizationNative_IndexOf` and `GlobalizationNative_LastIndexOf` - will not work for letters that consist of more than one grapheme, issue: Add locale sensitive substring matching functions to Intl.Collator, tc39/ecma402#506. [ILONA]
    public API: `CompareInfo.IndexOf`, `String.IndexOf`, `MemoryExtensions.IndexOf`, `CompareInfo.LastIndexOf`, `String.LastIndexOf`, `MemoryExtensions.LastIndexOf`.
  - Implement `GlobalizationNative_StartsWith` and `GlobalizationNative_EndsWith`. [ILONA]
    public API: `CompareInfo.IsSuffix`, `String.EndsWith`, `MemoryExtensions.EndsWith`, `CompareInfo.IsPrefix`, `String.StartsWith`, `MemoryExtensions.StartsWith`.
  - Implement `GlobalizationNative_CompareString` (without `Ordinal` and `OrdinalIgnoreCase`, `IgnoreKanaType`, `IgnoreWidth`). [ILONA]
    public API: `CompareInfo.Compare`, `String.Compare`.
  - Implement `IgnoreKanaType` and `IgnoreWidth` based on the `pal_collation.c` code. [ILONA]
  - Investigate `Ordinal` and `OrdinalIgnoreCase`. [ILONA]
  - Throw PNSE (`PlatformNotSupportedException`) on `GlobalizationNative_GetSortKey`.
  - How much do we gain by removing SortVersion? 32kB uncompressed. Not worth it. If the gain were large, throw PNSE on `GlobalizationNative_GetSortVersion`.
  - Document how to use the flag and what to expect when switching it on.
  - Coordinate the flow of `HybridGlobalization` from Blazor. Changes in dotnet/sdk might be needed.
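To illustrate the kind of Web API replacement the collation tasks above describe, here is a minimal sketch of case change, comparison and substring search on top of `toLocaleUpperCase`/`toLocaleLowerCase` and `Intl.Collator`. The helper names, parameters and the mapping of "ignore case" to `sensitivity: "accent"` are assumptions made for illustration; they are not taken from the runtime sources, and the real implementation writes into a caller-provided buffer rather than returning strings.

```typescript
// Hypothetical helpers; names and signatures are illustrative only.

// Case change through the locale-aware String methods.
function changeCase(input: string, locale: string | undefined, toUpper: boolean): string {
  return toUpper ? input.toLocaleUpperCase(locale) : input.toLocaleLowerCase(locale);
}

// Culture-aware comparison through Intl.Collator.
// Assumption: sensitivity "accent" (case ignored, diacritics significant)
// stands in for an ignore-case comparison; "variant" is fully sensitive.
function compareStrings(a: string, b: string, locale: string | undefined, ignoreCase: boolean): number {
  const collator = new Intl.Collator(locale, { sensitivity: ignoreCase ? "accent" : "variant" });
  return collator.compare(a, b);
}

// Naive culture-aware IndexOf: slide a window of the same UTF-16 length
// over the source and compare each window with the collator.
function indexOf(source: string, value: string, locale: string | undefined, ignoreCase: boolean): number {
  if (value.length === 0) {
    return 0;
  }
  const collator = new Intl.Collator(locale, { sensitivity: ignoreCase ? "accent" : "variant" });
  for (let i = 0; i + value.length <= source.length; i++) {
    if (collator.compare(source.substring(i, i + value.length), value) === 0) {
      return i;
    }
  }
  return -1;
}
```

Because the window has a fixed length, this search misses matches where collation-equal strings differ in length (for example a precomposed letter versus a base letter plus a combining mark). That is the limitation noted for `IndexOf`/`LastIndexOf` above and the motivation for tc39/ecma402#506.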
- Removing `normalization` for WASM:
  Removed from planned Hybrid features. Savings from normalization removal on WASM are ~60kB. The removal breaks public APIs: `string.Normalize`, `string.IsNormalized`, `IdnMapping.GetAscii`, `IdnMapping.GetUnicode`. `Normalize`/`IsNormalized` were successfully replaced in [browser][non-icu] `HybridGlobalization` normalization, #85510.
  For the `GetAscii`/`GetUnicode` replacement, the Invariant implementation enhanced by a normalization step was used, see branch https://github.com/ilonatommy/runtime/tree/idn-mapping. The mapping still lacked detection of disallowed/ignored/mapped characters and would need access to the mapping tables of the current Unicode version, e.g. to detect incorrect inputs and throw. The mapping table for one Unicode version weighs ~900kB in plain text. Even if we compressed it, we would still need to maintain it with every Unicode version. The development time needed for a correct implementation is too large, and the chances of a real size reduction, taking into consideration the need to keep the mapping tables, are too small to justify removing normalization data from ICU.
  - Update `icudt_wasm.dat` and the corresponding sharded data files.
  - Implement Punycode, possibly using the InvariantGlobalization algorithm plus a normalization function.
  - Use normalization from the PoC branch.
  - Update documentation.
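For context on why `Normalize`/`IsNormalized` could be replaced on the browser side, a minimal sketch using the standard `String.prototype.normalize` follows. The function names and the `NormalizationForm` type are hypothetical; the actual change in #85510 may be structured differently.

```typescript
// Hypothetical helpers; not taken from the runtime sources.
type NormalizationForm = "NFC" | "NFD" | "NFKC" | "NFKD";

// Normalize using the engine's built-in Unicode normalization.
function normalizeString(input: string, form: NormalizationForm = "NFC"): string {
  return input.normalize(form);
}

// A string is normalized in a given form when normalizing it changes nothing.
function isNormalized(input: string, form: NormalizationForm = "NFC"): boolean {
  return input.normalize(form) === input;
}
```

Nothing comparable exists in the Web API for the IDNA mapping step of `GetAscii`/`GetUnicode`, which is why that part would still require shipping mapping tables.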
- Investigate implications of removing further data batches, e.g. check the effect of removing all collations, `coll_ucadata`, `locales_tree` etc.
- Fix `no exception thrown` for `CultureInfoAll.LcidTest`, `CultureInfoAll.GetCultureTest`, `CultureInfoConstructor.Ctor_String` (we now support a wider range of locales, so we should not expect some of them to throw as they did with standard ICU).
- (optional) Enhancement of collations by manual workarounds:
  - Assess risks connected with manual workarounds.
  - Consider fixing some `IgnoreSymbols` by adding static data on JS side.
  - Consider shifting katakana/hiragana and high/low symbols, based on `pal_collation.c` code.
  - Check if we can implement invariant SortKey functionality for this scenario.
- Refactoring according to suggestions: https://github.com/dotnet/runtime/pull/86895/files#r1222395670, https://github.com/dotnet/runtime/pull/86895/files#r1222406677 ...
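As a rough illustration of the katakana/hiragana and width shifting considered in the optional collation workarounds, the sketch below folds strings before they are compared. It is not derived from `pal_collation.c`: the function name is hypothetical, and the ranges are simply the standard Unicode blocks (katakana U+30A1..U+30F6 sits 0x60 above hiragana; fullwidth ASCII variants U+FF01..U+FF5E sit 0xFEE0 above ASCII). Halfwidth katakana and other width variants are deliberately left out.

```typescript
// Illustrative only; ranges come from the Unicode code charts, not from pal_collation.c.
const KATAKANA_START = 0x30a1;
const KATAKANA_END = 0x30f6;
const KANA_OFFSET = 0x60; // katakana code point minus matching hiragana code point
const FULLWIDTH_START = 0xff01;
const FULLWIDTH_END = 0xff5e;
const WIDTH_OFFSET = 0xfee0; // fullwidth form minus matching ASCII character

// Fold katakana to hiragana and/or fullwidth ASCII to ASCII.
// All affected characters are in the BMP, so working on UTF-16 code units is safe;
// surrogate halves fall outside both ranges and pass through unchanged.
function foldKanaAndWidth(input: string, ignoreKanaType: boolean, ignoreWidth: boolean): string {
  let result = "";
  for (let i = 0; i < input.length; i++) {
    let code = input.charCodeAt(i);
    if (ignoreKanaType && code >= KATAKANA_START && code <= KATAKANA_END) {
      code -= KANA_OFFSET;
    } else if (ignoreWidth && code >= FULLWIDTH_START && code <= FULLWIDTH_END) {
      code -= WIDTH_OFFSET;
    }
    result += String.fromCharCode(code);
  }
  return result;
}
```

Both operands would be folded the same way and then handed to the regular comparison, so the folding itself does not need any ICU data.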
- (optional) Consider failing a build when a HybridGlobalization function is not supported
  - We throw PNSE when unsupported globalization functionality is used. Consider this idea: [browser][non-icu] `HybridGlobalization` compare, #84249 (comment), to catch such usages at build time.
Tracking issues: