Conversation

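import re

def extract_hashtags(text):
    '''Extract hashtags, skipping tags that consist only of digits.'''
    valid_tags = set()
    # \w is Unicode-aware for str patterns in Python 3, so non-English letters match too
    tags = re.findall(r'#(\w+)', text)
    for tag in tags:
        if tag.isdigit():
            continue
        valid_tags.add(tag)
    return valid_tags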
This ignores hashtags that consist only of digits. It seems hashtags can start with digits, but they can't be made up of digits alone. It's at least a starting point, and it appears to handle characters from languages other than English.
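Here's a quick sketch of that behavior on a made-up sample string (the sample text and the condensed one-line version of the function are only for illustration). It assumes Python 3, where `\w` matches Unicode letters and digits for `str` patterns:

```python
import re

def extract_hashtags(text):
    # Same logic as the function above, condensed so this snippet runs on its own.
    return {tag for tag in re.findall(r'#(\w+)', text) if not tag.isdigit()}

# Made-up sample: one digits-only tag, one tag starting with digits,
# and two tags written in non-Latin scripts.
sample = "Goals for #2024: learn #python3, hit my #2024goals, study #日本語 and say #привет"
result = extract_hashtags(sample)

print(sorted(result))
assert "2024" not in result    # digits-only tag is dropped
assert "2024goals" in result   # a tag may start with digits
assert "日本語" in result       # non-English characters are matched by \w
```

The digits-only tag is skipped, while the tag that merely starts with digits and the non-Latin tags are kept.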
If anyone can test this method to see whether it produces invalid hashtags, I'd really appreciate it. I'll keep working on it to try to get it to exactly emulate Twitter's hashtag rules.
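For anyone who wants to try, here's a rough starting harness. The probe strings are hypothetical guesses at where a plain `\w+` regex might diverge from Twitter's rules. I haven't confirmed how Twitter treats any of them, so each result should be compared against the real thing:

```python
import re

def extract_hashtags(text):
    # Condensed copy of the function from this thread, so the snippet is self-contained.
    return {tag for tag in re.findall(r'#(\w+)', text) if not tag.isdigit()}

# Hypothetical probe inputs: each is a guess at a possible mismatch with
# Twitter's rules, not a confirmed failure.
probes = [
    "see http://example.com/page#section",  # URL fragment after '#'
    "mid-word hash: abc#def",               # '#' directly preceded by a letter
    "underscores only: #___",               # tag with no letters at all
    "digits plus underscore: #123_",        # isdigit() is False because of the '_'
]

for text in probes:
    print(repr(text), "->", extract_hashtags(text))
```

With the current regex, every probe yields a tag (`section`, `def`, `___`, `123_`), so these are the cases to check first.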