i had no idea anyone needed convincing
-
-
-
addendum: microsoft needs convincing.
-
UCS-2 IN SOME OF THE PLACES



-
Apparently, in the Win10 April build you can set the codepage to UTF-8, and the old *A() functions and CRT functions (fopen...) will accept UTF-8 strings. I guess/hope that can be done per process. Hard to google for info, but it's mentioned here: https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8 …
-
my version of windows 10 still has intense allergic reactions to UTF-8 - it got worse, if anything; certain sequences now cause the process to abort regularly! I recently had to switch the whole process to unicode output.
-
Oh ok. I usually call the "unicode" *W() functions directly, keep all my strings in UTF-8, and only convert from/to UCS-2 on the Windows API boundary. But the only places where this has mattered are the window title and (more importantly) FS paths that contain the user's login name.
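A minimal sketch of that "UTF-8 everywhere, convert at the boundary" pattern, in Python for brevity. The helper names `utf8_to_wide` and `wide_to_utf8` are made up for illustration; in real Win32 C/C++ code the conversion would typically go through `MultiByteToWideChar` / `WideCharToMultiByte` with `CP_UTF8`. It also assumes the wide side is NUL-terminated UTF-16-LE.

```python
def utf8_to_wide(utf8: bytes) -> bytes:
    """Convert UTF-8 bytes to NUL-terminated UTF-16-LE (hypothetical helper)."""
    # Decode first so invalid UTF-8 fails here, not deep inside an API call.
    text = utf8.decode("utf-8")
    return text.encode("utf-16-le") + b"\x00\x00"


def wide_to_utf8(wide: bytes) -> bytes:
    """Convert UTF-16-LE bytes coming back from the API to UTF-8 (hypothetical helper)."""
    text = wide.decode("utf-16-le")
    # Drop the trailing NUL terminator(s) before re-encoding.
    return text.rstrip("\x00").encode("utf-8")


# A path containing a non-ASCII login name, stored as UTF-8 internally.
path_utf8 = "C:\\Users\\Zoë".encode("utf-8")
path_wide = utf8_to_wide(path_utf8)
round_trip = wide_to_utf8(path_wide)
```

Everything outside these two functions only ever sees UTF-8, which is the point of the approach.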
-
...and Input Method Editor support for text input, which is a whole topic on its own :/
-
I tried to get the command prompt to output Unicode, and failed miserably
-
-
One of my tats is in UTF-8 :D
-
The one, true Unicode. :)
-
Nonsense; UCS-4 (UTF-32) is the *true* Unicode! Equal treatment for all codepoints!
-
UTF-8 in the (digital superhighway) streets, UTF-32 in the (memory) sheets.
-
So you say, but once you start down the dark path of compression, forever will it dominate your data!
-
🀰🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢🁢
-
UTF-32 for strings would indeed solve some of the immediate problems i have with counting codepoints and such. but nobody likes transcoding data to screen.
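For what it's worth, counting codepoints in UTF-8 does not require transcoding. A rough sketch, assuming the input is already valid UTF-8 (the function name is just for illustration): every codepoint contributes exactly one byte that is not a continuation byte (`10xxxxxx`), so counting those bytes gives the codepoint count.

```python
def count_codepoints_utf8(data: bytes) -> int:
    """Count codepoints in valid UTF-8 by skipping continuation bytes."""
    # Continuation bytes look like 0b10xxxxxx; every other byte starts a codepoint.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


sample = "aé€🀰".encode("utf-8")  # 1 + 2 + 3 + 4 bytes
codepoints = count_codepoints_utf8(sample)
```

It is still a linear scan rather than the constant-time length that UTF-32 gives, so the tradeoff mentioned in the tweet stands.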
-
on the other hand, the bytelength of a UTF-32 string provides a neat upper bound for allocating UTF-8 memory. you can even do it in place. these are good properties.
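A sketch of why the in-place trick works, assuming little-endian UTF-32 in a mutable buffer (the function name is hypothetical). Each codepoint occupies 4 bytes in UTF-32 and at most 4 bytes in UTF-8, so the write cursor can never overtake the read cursor.

```python
def utf32le_to_utf8_in_place(buf: bytearray) -> int:
    """Overwrite UTF-32-LE data in buf with UTF-8; return the UTF-8 byte length."""
    if len(buf) % 4 != 0:
        raise ValueError("UTF-32 data must be a multiple of 4 bytes")
    write = 0
    for read in range(0, len(buf), 4):
        # Read the codepoint before anything at or after `read` is overwritten.
        cp = int.from_bytes(buf[read:read + 4], "little")
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid codepoint at byte offset {read}")
        if cp < 0x80:
            out = bytes([cp])
        elif cp < 0x800:
            out = bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out = bytes([0xE0 | (cp >> 12),
                         0x80 | ((cp >> 6) & 0x3F),
                         0x80 | (cp & 0x3F)])
        else:
            out = bytes([0xF0 | (cp >> 18),
                         0x80 | ((cp >> 12) & 0x3F),
                         0x80 | ((cp >> 6) & 0x3F),
                         0x80 | (cp & 0x3F)])
        # len(out) <= 4 and write <= read, so this never touches unread data.
        buf[write:write + len(out)] = out
        write += len(out)
    return write


text = "aé€🀰"
buffer = bytearray(text.encode("utf-32-le"))
utf8_len = utf32le_to_utf8_in_place(buffer)
utf8_bytes = bytes(buffer[:utf8_len])
```

The buffer keeps its original size; only the first `utf8_len` bytes are meaningful afterwards.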
-
Ideally, if indeed you need to segment the text at all, it is an array of Extended Grapheme Clusters. Or at the very least an array of codepoints. The reality is more like JavaScript, where it is an array of UCS-2/UTF-16 code units natively. So you Array.from() to codepoints.
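To make the three levels concrete, a small sketch in Python, where `len()` on a `str` counts codepoints. The helper `utf16_code_units` is made up for illustration and mimics what a JavaScript string's `.length` reports. Grapheme clusters are not covered by the standard library, so they are only noted in a comment.

```python
def utf16_code_units(s: str) -> int:
    """Number of UTF-16 code units needed for s (hypothetical helper)."""
    return len(s.encode("utf-16-le")) // 2


tile = "\U0001F030"   # a domino tile, outside the Basic Multilingual Plane
accented = "e\u0301"  # 'e' followed by a combining acute accent

tile_units = utf16_code_units(tile)          # surrogate pair: 2 code units
tile_codepoints = len(tile)                  # 1 codepoint
accented_units = utf16_code_units(accented)  # 2 code units
accented_codepoints = len(accented)          # 2 codepoints
# A reader would still see `accented` as one character: that is one
# extended grapheme cluster, which needs Unicode segmentation rules to count.
```

So code units, codepoints, and grapheme clusters can all disagree on the "length" of the same text.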
-
-
"we believe that the very popular UTF-16 encoding [...] has no place in library APIs except for specialized text processing libraries, e.g. ICU" Even in ICU, UTF-16 was a mistake, IMHO
-
-
-
This manifesto inspired my library.
https://bitbucket.org/knight666/utf8rewind …