
It's kind of funny what GitHub Copilot considers a slur. There are the obvious ones, but then there's also "communist", "socialism", "israel", "man", and "woman"
The slurs are "hashed" (not cryptographically, something hand-rolled) but I've managed to use existing wordlists to decode 923/1170 of them (though some of these may be hash collisions)
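The wordlist approach amounts to hashing every candidate word and checking it against the set of stored hashes. A minimal sketch follows, assuming a made-up stand-in hash: `toy_hash` and `decode_hashes` are hypothetical names, and the real hand-rolled function is not reproduced here.

```python
from collections import defaultdict


def toy_hash(word: str) -> int:
    """Hypothetical stand-in for the hand-rolled hash (NOT the real function)."""
    h = 0
    for ch in word.lower():
        # Simple multiplicative hash truncated to 16 bits, so collisions are possible.
        h = (h * 31 + ord(ch)) % 65536
    return h


def decode_hashes(target_hashes, wordlist):
    """Map each target hash to every wordlist entry that hashes to it."""
    targets = set(target_hashes)
    matches = defaultdict(list)
    for word in wordlist:
        h = toy_hash(word)
        if h in targets:
            matches[h].append(word)
    return dict(matches)
```

Any hash that ends up with more than one candidate word is a collision, and the wordlist alone cannot say which word was originally meant. Hashes that never appear in the result are the ones no wordlist entry explains.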
There are also some funny concessions to the language used in coding -- although "kill someone" (as two words) is considered bad, "kill" by itself isn't, presumably because that gets used in code all the time.
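One way to get that behaviour is to match whole phrases as well as single words, so a two-word entry only fires when both words appear next to each other. This is a rough sketch under that assumption; the `BLOCKED` set and the `is_flagged` helper are hypothetical, and how Copilot actually tokenizes and matches is not known from the thread.

```python
import re

# Hypothetical blocklist: only the two-word phrase is listed, not "kill" alone.
BLOCKED = {"kill someone"}


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def is_flagged(text, blocked=BLOCKED):
    """Return True if any single word or adjacent word pair is on the blocklist."""
    words = tokens(text)
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return any(candidate in blocked for candidate in words + bigrams)
```

With this setup, `is_flagged("os.kill(pid, signal.SIGTERM)")` is `False` while `is_flagged("I will kill someone")` is `True`, which mirrors the distinction described above.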
By prompting it with

    # List of ideologies
    ideologies = [

it was happy to complete things including socialism and communism, so it doesn't seem to be a blanket ban.
    'absolute_monarchy',
    'anarchism',
    'capitalism',
    'communism',
    'corporatism',
    'corrupt_monarchy',
    'fascism',
    'global_socialism',
    'ideal_monarchy',
    'individualism',
    'paretoism',
    'progressive_socialism',
    'socialism',
    'the_people_party',
    'total_global_socialism',
    'total_socialism',
    'total_socialism_and_global_socialism',
I think it must have learned this kind of thing from something like Hearts of Iron 4 mods. Paradox Interactive grand strategy games have a massive amount of plain-text scripting and configuration for events, decisions, countries, tons of historical data and all kinds of other things.
A huge amount of the input data would be that kind of data from game mods. GitHub Copilot knows all about historical events, lineage of nobility, wars, etc. from Paradox Interactive games. It'd even have historical population data from Victoria 2 mods. Funny thinking about it.
The mods probably mostly include only the files they modified, but I'm sure there's a lot of that data on GitHub regardless, so it would have learned from all of that. All that stuff is plain text even without extracting any archives, etc., and I'm sure people uploaded lots.