I de-obfuscated a bunch of client code (tedious) and came to the same conclusion as people have reached via experimentation...
If we can find 1024 "good" characters (difficulty unclear), "Base1024" would express 10 bits per character; at 280 characters, that's 2800 bits = 350 bytes per Tweet.
Base2048 -> 11 bits/character -> 385 bytes/Tweet (highly unlikely)
Base4096 -> 12 bits/character -> 420 bytes/Tweet (definitely impossible)
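The arithmetic behind those figures can be sketched as follows, assuming the 280-character Tweet limit (the helper name `bytes_per_tweet` is mine, not from the thread):

```python
TWEET_CHARS = 280  # assumed Tweet length limit

def bytes_per_tweet(alphabet_size: int) -> int:
    """With 2**k usable characters, each character carries k bits,
    so a Tweet carries 280 * k bits = 280 * k // 8 bytes."""
    bits_per_char = alphabet_size.bit_length() - 1  # log2, for powers of two
    return TWEET_CHARS * bits_per_char // 8

print(bytes_per_tweet(1024))  # Base1024 -> 350
print(bytes_per_tweet(2048))  # Base2048 -> 385
print(bytes_per_tweet(4096))  # Base4096 -> 420
```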
Binary encodings using non-power-of-two Unicode character counts are possible but generally more inconvenient to implement.
With a power of two, you just dice the bits up and then put them back together in different groupings.
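A minimal sketch of that regrouping, assuming zero-padding of the final group (the function `regroup` and its padding choice are illustrative, not the thread's actual implementation):

```python
def regroup(data: bytes, out_bits: int) -> list[int]:
    """Slice a byte string into out_bits-sized integer groups,
    zero-padding the final partial group on the right."""
    buf = 0        # bit accumulator
    buf_bits = 0   # number of bits currently in buf
    groups = []
    for byte in data:
        buf = (buf << 8) | byte
        buf_bits += 8
        while buf_bits >= out_bits:
            buf_bits -= out_bits
            groups.append(buf >> buf_bits)   # emit the top out_bits bits
            buf &= (1 << buf_bits) - 1       # keep the remainder
    if buf_bits:
        groups.append(buf << (out_bits - buf_bits))  # pad final group
    return groups

# 16 one-bits regrouped into 11-bit chunks: 0b11111111111, then
# the remaining 5 ones padded to 0b11111000000
print(regroup(b"\xff\xff", 11))  # [2047, 1984]
```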
There are 0x1100 = 4352 good characters, and 1109760 bad ones (out of Unicode's 0x110000 = 1114112 code points).
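A quick sanity check on those counts, taking the thread's 0x1100 figure as given (what makes a character "good" is not re-derived here):

```python
TOTAL_CODE_POINTS = 0x110000  # all Unicode code points, U+0000..U+10FFFF
GOOD = 0x1100                 # count of "good" characters per the thread

print(GOOD)                     # 4352
print(TOTAL_CODE_POINTS - GOOD)  # 1109760
```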
A currently-unanswered question is how many of the good characters are "safe".
"Safeness" was a concept I defined while initially developing Base65536: https://github.com/qntm/base65536gen#what-makes-a-character-safe …
I suppose I shall have to dig that old code out and do some further number crunching...
I think I'm going to call these characters "light" and "heavy" rather than "good" and "bad".