Are you using EGCs as the unit for caching glyphs? If so, that might be the real problem. To render many scripts correctly, EGC to glyph mapping must be treated as a many-to-many mapping (called “shaping”). EGC code like utf8proc won’t do that. But Uniscribe presumably does.
This page has some examples. (They use “character” to mean EGC!) https://gankra.github.io/blah/text-hates-you/ …
Replying to @kssreeram @mark_dev_
Yes - we already do all that, because as you can see from the demo we handle Arabic, Hindi, Hebrew, etc. That's why it is necessary to do more than simple grapheme clustering.
Replying to @cmuratori @mark_dev_
I could have communicated better. Refterm does indeed render correctly. It's just a matter of terminology. The clusters from Uniscribe's ScriptItemize/ScriptBreak are not "extended grapheme clusters". They also do "shaping".
To do that without Uniscribe, an algorithm for extended grapheme clusters isn't enough. Shaping is needed too. HarfBuzz can do that, for example.
To be more precise, the clustering algorithm must be “shaping” aware. The term “shaping” includes glyph generation/positioning too.
Replying to @kssreeram @mark_dev_
Yes - I find the terms to be kind of confusing and not well-defined :( So yes, we basically need "shaping-based breaking", meaning that the algorithm needs to tell us where to split inputs on boundaries where the glyphs won't change because of the split.
I guess the easiest way to say it is, we need an algorithm that will tell us where we can safely break inputs into pieces which will not "look different" when broken at those points.
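One way to make "will not look different" concrete is to compare the result of shaping the whole input against the result of shaping the two pieces separately. The sketch below is only an illustration: `toy_shape` is a hypothetical stand-in for a real shaper that knows a single "fi" ligature, and the helper names are invented here. A real check would also have to compare glyph positions, not just glyph identities.

```python
from typing import Callable, List

# A shaper maps text to a list of glyph names (positions are ignored here).
Shaper = Callable[[str], List[str]]


def toy_shape(text: str) -> List[str]:
    """Hypothetical stand-in for a real shaper: it only knows an 'fi' ligature."""
    glyphs: List[str] = []
    i = 0
    while i < len(text):
        if text.startswith("fi", i):
            glyphs.append("f_i")  # two characters become one glyph
            i += 2
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs


def is_safe_break(text: str, index: int, shape: Shaper) -> bool:
    """A break is safe if shaping the two halves gives the same glyphs as shaping the whole."""
    return shape(text[:index]) + shape(text[index:]) == shape(text)


def safe_breaks(text: str, shape: Shaper) -> List[int]:
    """Brute force: test every interior offset by re-shaping both sides."""
    return [i for i in range(1, len(text)) if is_safe_break(text, i, shape)]
```

With this toy shaper, `safe_breaks("fish", toy_shape)` returns `[2, 3]`: offset 1 is rejected because it would split the ligature. Every candidate offset costs extra shaping calls, which is why a shaper that reports safe boundaries directly is more attractive.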
The short answer is that you have to run shaping to know such boundaries. Which... defeats the purpose. But then again, full shaping in a grid-layout is not well-defined... I'll leave a few pointers in the next tweet:
Replying to @behdadesfahbod @cmuratori and
On @HarfBuzz clusters: https://harfbuzz.github.io/clusters.html
HB_GLYPH_FLAG_UNSAFE_TO_BREAK: https://harfbuzz.github.io/harfbuzz-hb-buffer.html#hb-glyph-flags-t …
Discussion on the limitations of above flag: https://github.com/harfbuzz/harfbuzz/issues/1463 …
This is great! That's actually exactly what I wanted, TBH. I don't suppose Uniscribe supports that, although maybe it does, I will have to look. But those Harfbuzz flags seem like exactly what I want. Since glyph caching is per-font anyway, this is not a limitation.
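As a rough sketch of how those flags could drive per-run glyph caching: after shaping, each output glyph carries a cluster value and a flags field, and a cluster whose glyphs do not carry `HB_GLYPH_FLAG_UNSAFE_TO_BREAK` can start a new run without re-shaping. The code below works on plain tuples rather than calling HarfBuzz, so it is an assumption-laden model, not the library's API: the tuple layout, the helper names, and the sample data are hypothetical, and cluster values are assumed to be offsets into the input text.

```python
from typing import Dict, List, Tuple

# Numeric value of HB_GLYPH_FLAG_UNSAFE_TO_BREAK in HarfBuzz's hb-buffer.h.
UNSAFE_TO_BREAK = 0x00000001

# Hypothetical flattened form of one shaped glyph: (glyph_id, cluster, flags).
GlyphInfo = Tuple[int, int, int]


def safe_cluster_starts(infos: List[GlyphInfo]) -> List[int]:
    """Return cluster offsets where the input can be split without re-shaping."""
    unsafe: Dict[int, bool] = {}
    for _glyph_id, cluster, flags in infos:
        # Treat a cluster as unsafe if any of its glyphs carries the flag.
        unsafe[cluster] = unsafe.get(cluster, False) or bool(flags & UNSAFE_TO_BREAK)
    return sorted(c for c, bad in unsafe.items() if not bad)


def split_runs(text_len: int, infos: List[GlyphInfo]) -> List[Tuple[int, int]]:
    """Cut [0, text_len) into runs that begin at safe cluster starts."""
    cuts = sorted(set([0] + safe_cluster_starts(infos) + [text_len]))
    return list(zip(cuts[:-1], cuts[1:]))
```

Each `(start, end)` pair could then serve as a cache key together with the font, since the flags depend on the font used for shaping.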
Replying to @cmuratori @kssreeram and
Thanks Casey. I'll get back to you tomorrow.