Thank you very much for the pointer! I will take a look.
Replying to @cmuratori @mark_dev_
Are you using EGCs as the unit for caching glyphs? If so, that might be the real problem. To render many scripts correctly, EGC to glyph mapping must be treated as a many-to-many mapping (called “shaping”). EGC code like utf8proc won’t do that. But Uniscribe presumably does.
This page has some examples. (They use “character” to mean EGC!) https://gankra.github.io/blah/text-hates-you/ …
Replying to @kssreeram @mark_dev_
Yes - we already do all that, because as you can see from the demo we handle Arabic, Hindi, Hebrew, etc. That's why it is necessary to do more than simple grapheme clustering.
Replying to @cmuratori @mark_dev_
I could have communicated better. Refterm does indeed render correctly. It's just a matter of terminology. The clusters from Uniscribe's ScriptItemize/ScriptBreak are not "extended grapheme clusters". They also do "shaping".
To do that without Uniscribe, an algorithm for extended grapheme clusters isn't enough. Shaping is needed too. Harfbuzz can do that, for example.
To be more precise, the clustering algorithm must be “shaping” aware. The term “shaping” includes glyph generation/positioning too.
Replying to @kssreeram @mark_dev_
Yes - I find the terms to be kind of confusing and not well-defined :( So yes, we basically need "shaping-based breaking", meaning that the algorithm needs to tell us where to split inputs on boundaries where the glyphs won't change because of the split.
I guess the easiest way to say it is, we need an algorithm that will tell us where we can safely break inputs into pieces which will not "look different" when broken at those points.
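To make "will not look different when broken" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `toy_shape` and its `LIGATURES` table are hypothetical stand-ins for a real shaper such as Uniscribe or Harfbuzz, and the toy only models glyph substitution, not positioning. The idea is just that a break is safe when shaping the two halves separately gives the same glyphs as shaping the whole run.

```python
# Hypothetical ligature table standing in for a real font's substitution rules.
LIGATURES = {"ffi": "<ffi>", "fi": "<fi>", "fl": "<fl>"}
# Try longer sequences first so "ffi" wins over "fi".
_SEQUENCES = sorted(LIGATURES, key=len, reverse=True)


def toy_shape(text):
    """Toy shaper: greedily replace known sequences with one ligature glyph."""
    glyphs = []
    i = 0
    while i < len(text):
        for seq in _SEQUENCES:
            if text.startswith(seq, i):
                glyphs.append(LIGATURES[seq])
                i += len(seq)
                break
        else:
            # No ligature starts here: the character maps to itself.
            glyphs.append(text[i])
            i += 1
    return glyphs


def is_safe_break(text, index, shape=toy_shape):
    """A break is safe if shaping both halves equals shaping the whole run."""
    return shape(text[:index]) + shape(text[index:]) == shape(text)


def safe_breaks(text, shape=toy_shape):
    """All interior positions where the run can be split without changing glyphs."""
    return [i for i in range(1, len(text)) if is_safe_break(text, i, shape)]
```

With this toy table, `safe_breaks("office")` rejects the positions inside "ffi" and accepts the rest. Note that the check shapes the whole run to find the boundaries, so it only pays off if the results are cached afterwards.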
The short answer is that you have to run shaping to know such boundaries. Which... defeats the purpose. But then again, full shaping in a grid-layout is not well-defined... I'll leave a few pointers in the next tweet:
But that's actually fine, because we cache the results. Most of the problem here is just that Uniscribe is not good at letting us control the process. It's entirely possible that switching to something like Harfbuzz would give us the control we need to do the right thing...
Replying to @cmuratori @behdadesfahbod and
Stated more specifically, the ideal solution would just be a state machine that we feed characters to that occasionally says "OK, I found a glyph I can generate definitively" and then it does. And we cache it :)
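A rough sketch of what that state machine could look like, under heavy assumptions: `ClusterFeeder` and the `rasterize` callback are hypothetical names, and the "can the next character still change the pending glyph?" rule here is a deliberately crude stand-in (combining marks and ZWJ only). A real version would have to ask the shaper that question, since grapheme-style rules alone are not enough for many scripts.

```python
import unicodedata

ZWJ = "\u200d"  # zero width joiner


class ClusterFeeder:
    """Feed characters one at a time; get a glyph back once it is definitive."""

    def __init__(self, rasterize):
        self.pending = ""           # characters that may still combine
        self.cache = {}             # cluster string -> rendered glyph
        self.rasterize = rasterize  # hypothetical callback: cluster -> glyph

    def _can_extend(self, ch):
        # Simplified rule (an assumption, not real shaping): the pending
        # cluster keeps growing on combining marks and around ZWJ.
        if not self.pending:
            return True
        return (
            unicodedata.combining(ch) != 0
            or ch == ZWJ
            or self.pending.endswith(ZWJ)
        )

    def _emit(self):
        cluster, self.pending = self.pending, ""
        if cluster not in self.cache:
            self.cache[cluster] = self.rasterize(cluster)
        return self.cache[cluster]

    def feed(self, ch):
        """Return a glyph if the previous cluster is now final, else None."""
        if self._can_extend(ch):
            self.pending += ch
            return None
        glyph = self._emit()
        self.pending = ch
        return glyph

    def flush(self):
        """Emit whatever is still pending at end of input."""
        return self._emit() if self.pending else None
```

Usage is just calling `feed` per character and `flush` at the end. A cluster that shows up again is served from `cache`, so `rasterize` runs once per distinct cluster.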