Twitter live data mining using Spark Streaming and Scala.

Want to learn live streaming data processing hands-on? The easiest way is to create a Twitter developer app and follow the code below to ingest the data and store it in your AWS S3 bucket for further analysis and processing with tools like Amazon EMR, or for machine learning projects.

For deeper concepts, connect with me on LinkedIn. You may also outsource your technical screening interviews to me.

/**
 * @author Gyanendra
 * @Date : 08/12/19
 */

import org.apache.spark.SparkConf
import org.apache.spark.streaming.twitter.TwitterUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import twitter4j.auth.OAuthAuthorization
import twitter4j.conf.ConfigurationBuilder

object TweeterStreamReaderApp {
  def main(args: Array[String]): Unit = {

    // OAuth credentials of your Twitter developer app.
    // Replace the placeholders with your own values and never publish real keys.
    val twitterCredentials = new Array[String](4)
    twitterCredentials(0) = "<consumer key>"
    twitterCredentials(1) = "<consumer secret>"
    twitterCredentials(2) = "<access token>"
    twitterCredentials(3) = "<access token secret>"

    val appName = "TweeterStreamReader"
    val conf = new SparkConf().setAppName(appName)
    val ssc = new StreamingContext(conf, Seconds(5))
    val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = twitterCredentials.take(4)
    // The credentials are set in code above, so every command-line argument is a filter keyword
    val filters = args.toSeq
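    // Build the twitter4j OAuth configuration from the four credentials
    val cb = new ConfigurationBuilder
    cb.setDebugEnabled(true)
      .setOAuthConsumerKey(consumerKey)
      .setOAuthConsumerSecret(consumerSecret)
      .setOAuthAccessToken(accessToken)
      .setOAuthAccessTokenSecret(accessTokenSecret)
    val auth = new OAuthAuthorization(cb.build)
    val tweets = TwitterUtils.createStream(ssc, Some(auth), filters)
    val englishTweets = tweets.filter(_.getLang() == "en")

    // Let's print every RDD. From here you can also store the data to S3.
    englishTweets.foreachRDD { (rdd, time) =>
      def p(rdd: org.apache.spark.rdd.RDD[_]) = rdd.foreach(println)
      p(rdd)
    }

    // Start the streaming job and keep it running until it is stopped
    ssc.start()
    ssc.awaitTermination()
  }
}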

Download this code from my repo

Love of Python : My first Python script

Every day my love of Python and Scala grows stronger, so I thought I would post a Python script here. It is a sample script that performs some common configuration tasks needed when deploying an application on production, test, or automation servers. I am going to use an updated version of this script in my project to cut roughly 15-20 minutes of manual configuration work per service deployment.

This Python script performs the following tasks:
(1) Add new configurations
(2) Update existing configurations
(3) Run Unix commands (native OS) on the server, plus a few other tasks
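
To give a rough idea of what these tasks involve, here is a minimal sketch. It is not the script from my repository: it assumes a plain key=value properties file, and the helper names set_config and run_command are hypothetical, chosen only for this illustration.

```python
import subprocess
from pathlib import Path


def set_config(path, key, value):
    """Add key=value to a properties file, or update the value if the key exists.

    Assumption: the file holds one key=value pair per line, '#' starts a comment.
    Returns "added" or "updated" so the caller can log what happened.
    """
    file = Path(path)
    lines = file.read_text().splitlines() if file.exists() else []
    updated = False
    for i, line in enumerate(lines):
        is_comment = line.lstrip().startswith("#")
        # Only touch real key=value lines whose key matches exactly
        if not is_comment and "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    file.write_text("\n".join(lines) + "\n")
    return "updated" if updated else "added"


def run_command(command):
    """Run a native OS command given as an argument list and return its stdout.

    check=True raises CalledProcessError if the command exits with an error,
    so a failed deployment step is not silently ignored.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, calling set_config("app.properties", "server.port", "8080") adds the key on the first call and updates it on later calls, and run_command(["uname", "-a"]) returns the kernel information on a Unix server. Passing the command as a list instead of a single shell string avoids shell quoting problems.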

You can customize this script to fit your project's requirements. The script is available in my GitLab repository. I would love to incorporate any of my readers' recommendations for specific changes or requirements.

PS: This script is still an initial draft. I will update it with more useful commands and comments to make it more user-friendly.