OK, very interesting. Someone could theoretically send messages to another person that are readable by a human but unsearchable by Twitter search.
This Tweet was deleted by the Tweet author. Learn more
If you use Unicode and pick a different character set like gothic, a normal search can't find the tweet, even if you search for the exact words in it. You would probably have to type the search in the same character set. This could allow people to send public messages that can't get
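A rough sketch of why that happens, assuming "gothic" means the Unicode Mathematical Bold Fraktur letters (an assumption, since the thread doesn't name the exact block) and that search boils down to matching code points. `to_gothic` is a made-up helper for illustration, not anything Twitter provides.

```python
# Assumption: "gothic" = Mathematical Bold Fraktur, a contiguous block of
# 26 capitals followed by 26 small letters.
BOLD_FRAKTUR_CAPITAL_A = 0x1D56C
BOLD_FRAKTUR_SMALL_A = 0x1D586


def to_gothic(text: str) -> str:
    """Hypothetical helper: swap ASCII letters for their Bold Fraktur look-alikes."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_FRAKTUR_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_FRAKTUR_SMALL_A + ord(ch) - ord("a")))
        else:
            out.append(ch)  # spaces, digits, punctuation stay as they are
    return "".join(out)


styled = to_gothic("secret meeting")

# A human reads the same words, but the code points are different,
# so a plain ASCII query finds nothing in the styled text.
print("secret" in styled)             # False
print(to_gothic("secret") in styled)  # True: same character set matches
```

So the text only turns up if the query is typed in the same styled alphabet, which matches what the tweet describes.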
You know what? I wonder what language Twitter thinks it is. That will be interesting to check.
So in Elasticsearch, there is a token filter for ASCII folding ("asciifolding") that basically strips the accents from letters, so you can search for a word without the accent and still get a hit on the accented version.
But Twitter's search doesn't appear to fold the gothic Unicode characters.
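For comparison, here is a minimal sketch of what folding could look like for these characters. It uses Unicode NFKC normalization from Python's standard library, which is a stand-in technique and not the Elasticsearch filter itself or anything Twitter is known to run. The `fold` helper and the sample string are assumptions for illustration (the sample is "search" spelled in Mathematical Bold Fraktur).

```python
import unicodedata


def fold(text: str) -> str:
    """Hypothetical folding step: NFKC maps styled letters to plain ones,
    casefold() then removes case differences."""
    return unicodedata.normalize("NFKC", text).casefold()


# "search" written with Mathematical Bold Fraktur small letters.
styled = "\U0001D598\U0001D58A\U0001D586\U0001D597\U0001D588\U0001D58D"

print("search" in styled)        # False: raw code points differ
print(fold(styled) == "search")  # True: folded text matches the plain query
```

If an index folded both the tweet text and the query this way, the gothic tweet would be found by a normal search, which is apparently not what happens here.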
Interesting! I would think it would detect what language the majority of the text is in and assign that language. Is it just "und" for undefined?
Hmm, can you try Gothic with a more exotic language?
Maybe it defaults to EN or guesses EN from your profile?

