This project aims to develop a news event detection framework that detects newsworthy events from tweets and generates headlines for them. We first pass the tweets through threshold-based clustering techniques, which group tweets corresponding to the same event based on their similarity, retweet map, and links. We then use those clusters to categorize the events by topic and generate a headline for each event using ML summarization techniques. Our clustering technique has a recall of around 89% and processes 2000 tweets per minute. Future work will aim to optimize this further.
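As a rough illustration of the threshold-based clustering step, the sketch below assigns each incoming tweet to the most similar existing cluster if the similarity reaches a threshold, and otherwise starts a new cluster. It is only a minimal sketch under stated assumptions: the Jaccard token overlap, the threshold of 0.5, the use of the first tweet as the cluster representative, and the names `tokenize`, `jaccard`, and `cluster_tweets` are all illustrative choices rather than the project's actual method, and the retweet map and links are not modeled.

```python
import re


def tokenize(text):
    """Lowercase a tweet and return its set of word-like tokens."""
    return set(re.findall(r"[a-z0-9#@']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_tweets(tweets, threshold=0.5):
    """Single-pass threshold clustering (assumed similarity: Jaccard).

    Each cluster is represented by the tokens of its first tweet.
    """
    clusters = []  # each cluster: {"tokens": set, "tweets": list}
    for tweet in tweets:
        tokens = tokenize(tweet)
        best, best_sim = None, 0.0
        # Find the existing cluster most similar to this tweet.
        for cluster in clusters:
            sim = jaccard(tokens, cluster["tokens"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["tweets"].append(tweet)
        else:
            # No cluster is similar enough, so this tweet seeds a new one.
            clusters.append({"tokens": tokens, "tweets": [tweet]})
    return clusters


if __name__ == "__main__":
    sample = [
        "Earthquake hits Nepal capital",
        "Earthquake hits Nepal capital today",
        "New phone launched by Apple",
    ]
    for i, cluster in enumerate(cluster_tweets(sample)):
        print(i, cluster["tweets"])
```

A single pass of this kind keeps the per-tweet cost proportional to the number of open clusters, which is one reason threshold-based schemes suit streaming input. A fuller version would presumably also merge tweets that share a retweet source or a link.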
Before news is aired on TV channels, it is already trending on social media sites such as Twitter. Witnesses to events post their discoveries online, and news channels subsequently pick up these posts. However, processing such a huge amount of information to filter newsworthy text fo