Okay, so here's what's going on with the new Twitter client-side length check.
"Unicode characters 0x1100 upwards" includes every character used for Base65536, including the padding characters.
So the maximum number of Base65536 characters allowed in a Tweet remains 140, and the maximum amount of data so expressed remains 280 bytes.
(As far as I know, this limit is only enforced client-side. It is possible to circumvent the check and send 280 characters/560 bytes...
...but I'm taking the check seriously for the purposes of this piece of work.)
So, no, the new changes do not make it possible to send 560 bytes of text per Tweet. That limit stays at 280 bytes, at least with current encodings.
Twitter has divided Unicode up into two sets for our purposes: "good" (0x0000 to 0x10FF, weight 1) and "bad" (0x1100 to 0x10FFFF, weight 2).
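A minimal sketch of that weighting, assuming the check is nothing more than the two ranges above and a budget of 280. The names `weight`, `weighted_length` and `fits` are made up for illustration; this is not Twitter's actual code.

```python
# Assumed model of the client-side check, based only on the two ranges above:
# code points 0x0000-0x10FF weigh 1, code points 0x1100-0x10FFFF weigh 2.
LIGHT_MAX = 0x10FF
TWEET_LIMIT = 280  # total weight allowed per Tweet


def weight(ch: str) -> int:
    """Weight of a single code point under the assumed rule."""
    return 1 if ord(ch) <= LIGHT_MAX else 2


def weighted_length(text: str) -> int:
    """Sum of the weights of every code point in the text."""
    return sum(weight(ch) for ch in text)


def fits(text: str) -> bool:
    """True if the text stays within the weighted budget."""
    return weighted_length(text) <= TWEET_LIMIT
```

Under this model 280 ASCII characters fit, but only 140 characters from 0x1100 upwards do, which is where the 140-character figure for Base65536 comes from.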
We need a new encoding which makes the maximum possible use of "safe" characters from the "good" range and uses none from the "bad" range.
If we can find 256 "good" characters (trivial), "Base256" would allow 280 bytes per 280-character Tweet, same as Base65536.
If we can find 512 "good" characters (easy?), "Base512" would express 9 bits per character, or 315 bytes per 280-character Tweet.
If we can find 1024 "good" characters (difficulty unclear), "Base1024" would express 10 bits per character, or 350 bytes per 280-character Tweet.
Base2048 -> 11 bits/character -> 385 bytes/Tweet (highly unlikely)
Base4096 -> 12 bits/character -> 420 bytes/Tweet (definitely impossible)
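The arithmetic behind those figures, as a quick sketch. It assumes a power-of-two alphabet made entirely of weight-1 characters, so a full Tweet holds 280 of them; `bytes_per_tweet` is a hypothetical helper name.

```python
TWEET_CHARS = 280  # assumes every character in the alphabet has weight 1


def bytes_per_tweet(alphabet_size: int, chars: int = TWEET_CHARS) -> int:
    """Whole bytes that fit in `chars` characters of the given alphabet."""
    # For a power of two, bit_length() - 1 is exactly log2(alphabet_size).
    bits_per_char = alphabet_size.bit_length() - 1
    return (bits_per_char * chars) // 8


for size in (256, 512, 1024, 2048, 4096):
    print(f"Base{size}: {size.bit_length() - 1} bits/character, "
          f"{bytes_per_tweet(size)} bytes/Tweet")
```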
Binary encodings using non-power-of-two Unicode character counts are possible but generally more inconvenient to implement.
With a power of two, you just dice the bits up and then put them back together in different groupings.
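Roughly what "dicing the bits up" looks like, as a sketch rather than any real encoder. It maps bytes to plain integers of a chosen bit width instead of actual Unicode characters (picking the alphabet is a separate problem), and it sidesteps padding by making the decoder take the original byte length as an argument. Both simplifications, and the names `regroup`, `encode` and `decode`, are my assumptions.

```python
def regroup(values, from_bits, to_bits):
    """Re-slice a stream of from_bits-wide integers into to_bits-wide ones.

    A trailing partial group is zero-padded on the right.
    """
    acc = 0      # bits read but not yet emitted
    nbits = 0    # how many bits are currently held in acc
    out = []
    mask = (1 << to_bits) - 1
    for v in values:
        acc = (acc << from_bits) | v
        nbits += from_bits
        while nbits >= to_bits:
            nbits -= to_bits
            out.append((acc >> nbits) & mask)
        acc &= (1 << nbits) - 1  # keep only the leftover bits
    if nbits:
        out.append(acc << (to_bits - nbits))
    return out


def encode(data: bytes, bits: int) -> list:
    """Bytes -> list of integers, each in range(2 ** bits)."""
    return regroup(data, 8, bits)


def decode(symbols, bits: int, length: int) -> bytes:
    """Inverse of encode; `length` trims the bytes created by padding."""
    return bytes(regroup(symbols, bits, 8)[:length])


data = bytes(range(256)) + bytes(94)   # 350 bytes of sample data
symbols = encode(data, 10)             # the "Base1024" case: 10 bits each
print(len(data), len(symbols))
```

With 10-bit groups, 350 bytes come out as exactly 280 symbols, matching the Base1024 figure. A real encoding would need some in-band way to mark a final partial group instead of passing `length` around.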
There are 0x1100 = 4352 good characters, and 1109760 bad ones.
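Checking those counts, assuming Unicode's code space is 0x0000 to 0x10FFFF inclusive and the split is exactly at 0x1100. The variable names are just for illustration.

```python
UNICODE_SIZE = 0x110000  # code points 0x0000 through 0x10FFFF
LIGHT_SIZE = 0x1100      # code points 0x0000 through 0x10FF

heavy_size = UNICODE_SIZE - LIGHT_SIZE
print(LIGHT_SIZE, heavy_size)

# Upper bound on bits per character even if every light code point were usable.
max_bits = LIGHT_SIZE.bit_length() - 1
print(max_bits)
```

So 12 bits per character is a hard ceiling for a power-of-two encoding in this range, and only if almost none of the 4352 code points had to be thrown away.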
A currently-unanswered question is how many of the good characters are "safe".
"Safeness" was a concept I defined while initially developing Base65536: https://github.com/qntm/base65536gen#what-makes-a-character-safe …
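A very rough first pass at that number crunching might look like the sketch below. The filter is my own approximation, not the definition in the linked README: it assumes a character is unsafe if its general category is a control/format/unassigned type (C*), a mark (M*) or a separator (Z*), or if any of the four Unicode normalization forms changes it. The count it prints depends on the Unicode version bundled with your Python, so treat it as a ballpark only.

```python
import unicodedata

# Assumption: these category groups are a guess at "unsafe", not the README's list.
UNSAFE_CATEGORY_PREFIXES = ("C", "M", "Z")
NORMALIZATION_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def looks_safe(cp: int) -> bool:
    """Approximate safeness test for one code point (hypothetical helper)."""
    ch = chr(cp)
    if unicodedata.category(ch)[0] in UNSAFE_CATEGORY_PREFIXES:
        return False
    # Must survive every normalization form unchanged.
    return all(unicodedata.normalize(form, ch) == ch
               for form in NORMALIZATION_FORMS)


# Only the weight-1 range, 0x0000 to 0x10FF, is of interest here.
light_candidates = [cp for cp in range(0x1100) if looks_safe(cp)]
print(len(light_candidates))
```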
I suppose I shall have to dig that old code out and do some further number crunching...
I think I'm going to call these characters "light" and "heavy" rather than "good" and "bad".