Java MapReduce for top N Twitter Hashtags

Back in 2015 I had to implement a MapReduce job to extract the top 15 hashtags from Twitter’s raw data in Hadoop. This was part of a Business Intelligence lecture exercise at Vienna University of Technology. The regex used for hashtag extraction (not the best one, in my opinion) has the following format: String regex = "text\":\\\"(.*)\\\",\"source"; The ‘source’ field comes ...
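To give a rough idea of how that regex can be applied, here is a minimal plain-Java sketch with no Hadoop dependencies. It is not the code of the original job: the class name HashtagSketch, the `#\w+` hashtag pattern, the sample input lines, and the in-memory counting that stands in for the map and reduce phases are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original exercise.
class HashtagSketch {
    // Regex from the post: captures whatever sits between "text":" and ","source.
    private static final Pattern TEXT = Pattern.compile("text\":\\\"(.*)\\\",\"source");
    // Assumed hashtag pattern: '#' followed by one or more word characters.
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Roughly what a mapper would do per input line: find the tweet text,
    // then emit every hashtag found inside it.
    static List<String> extractHashtags(String rawLine) {
        List<String> tags = new ArrayList<>();
        Matcher text = TEXT.matcher(rawLine);
        if (text.find()) {
            Matcher tag = HASHTAG.matcher(text.group(1));
            while (tag.find()) {
                tags.add(tag.group());
            }
        }
        return tags;
    }

    // Roughly what the reduce side would do: sum the counts per hashtag and
    // keep the n most frequent ones (ties broken alphabetically).
    static List<Map.Entry<String, Integer>> topN(List<String> rawLines, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : rawLines) {
            for (String tag : extractHashtags(line)) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines that only mimic the shape of raw tweet JSON.
        List<String> lines = Arrays.asList(
                "{\"id\":1,\"text\":\"Learning #Hadoop and #MapReduce\",\"source\":\"web\"}",
                "{\"id\":2,\"text\":\"More #Hadoop today\",\"source\":\"web\"}");
        for (Map.Entry<String, Integer> entry : topN(lines, 15)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

In a real Hadoop job the body of extractHashtags would live in the mapper, emitting each hashtag with a count of 1, and the summing and top-N selection would happen on the reduce side. Note that the capture group `(.*)` is greedy, so on a line containing more than one `","source` sequence it will match up to the last one.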