@smoaddeli
Created April 29, 2014 20:50
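# -*- coding: utf-8 -*-
# Extract every hashtag from each line of a UTF-8 TSV file (Python 3).
import re

fpath = 'tail.tsv'
# With re.UNICODE, \w also matches non-ASCII letters and digits.
pattern = re.compile(r"#\w+", re.UNICODE)

# The original opened allHashtags.tsv but never wrote to it; writing each
# line's hashtags there is an assumption about the intended behavior.
with open(fpath, 'r', encoding='utf-8') as source, \
        open("allHashtags.tsv", "w", encoding='utf-8') as fo:
    for line in source:
        matches = pattern.findall(line)
        fo.write('\t'.join(matches) + '\n')
        # Echo the hashtags of this line, then a separator.
        print(' '.join(matches))
        print('---')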
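
To see what the `#\w+` pattern from the script picks up, here is a minimal sketch that runs it on an in-memory string instead of `tail.tsv`. The helper name `extract_hashtags` and the sample row are hypothetical and are not part of the original gist.

```python
import re

# Same pattern as the script: '#' followed by one or more word characters.
HASHTAG = re.compile(r"#\w+", re.UNICODE)


def extract_hashtags(line):
    """Return all hashtags found in a line, in order of appearance."""
    return HASHTAG.findall(line)


# Hypothetical sample row; not taken from tail.tsv.
sample = "123\tLoving the #sunset in #Zürich! #写真 #2014"
print(extract_hashtags(sample))
# → ['#sunset', '#Zürich', '#写真', '#2014']
```

Note that the pattern has no boundary check before the `#`, so a URL fragment such as `example.com/page#top` would also yield `#top`. If that matters for the data, a stricter pattern would be needed.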