Thursday, June 22, 2017

Amazon Alexa - Intelligent Personal Assistant

As technology moves at a fast pace, things have evolved from web applications to mobile applications, and now we want everything on voice command. Thanks to Amazon, which has come up with an amazing artificially intelligent IoT device: Alexa.
Alexa is an intelligent personal assistant developed by Amazon, made popular by the Amazon Echo and Amazon Echo Dot devices developed by Amazon Lab126. It is capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, and other real-time information such as news. Alexa can also control several smart devices, acting as a home automation hub. Currently, interaction and communication with Alexa is only available in English and German. LG Electronics devices now come equipped with Alexa, letting you perform tasks by voice command. Similarly, the Ford-Amazon tie-up lets you control your car with your voice.
As an interesting use case, let's recharge a mobile phone using an Alexa device. Follow the steps to build the skill and the Lambda function as part of Alexa application development on Amazon AWS; a minimal handler sketch is given below.
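To make the Lambda side concrete, here is a minimal sketch in Scala using the Alexa Skills Kit Java SDK (v1). The intent name RechargeIntent, the application id placeholder, and the canned responses are illustrative assumptions; a real skill would call an actual recharge API from onIntent.

import java.util.Collections
import com.amazon.speech.speechlet._
import com.amazon.speech.speechlet.lambda.SpeechletRequestStreamHandler
import com.amazon.speech.ui.PlainTextOutputSpeech

class RechargeSpeechlet extends Speechlet {
  override def onSessionStarted(request: SessionStartedRequest, session: Session): Unit = ()

  override def onLaunch(request: LaunchRequest, session: Session): SpeechletResponse =
    tell("Welcome. Ask me to recharge your mobile phone.")

  override def onIntent(request: IntentRequest, session: Session): SpeechletResponse =
    request.getIntent.getName match {
      case "RechargeIntent" => tell("Your mobile phone recharge has been initiated.") // hypothetical intent
      case _                => tell("Sorry, I did not understand that.")
    }

  override def onSessionEnded(request: SessionEndedRequest, session: Session): Unit = ()

  // Wrap plain text in a "tell" response that ends the session.
  private def tell(text: String): SpeechletResponse = {
    val speech = new PlainTextOutputSpeech()
    speech.setText(text)
    SpeechletResponse.newTellResponse(speech)
  }
}

// Lambda entry point; replace the placeholder with your skill's application id.
class RechargeHandler extends SpeechletRequestStreamHandler(
  new RechargeSpeechlet(), Collections.singleton("amzn1.ask.skill.your-skill-id"))

The skill's interaction model (invocation name, intents, and sample utterances) is configured separately in the Amazon Developer Console; RechargeHandler is what you register as the Lambda function's handler class.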


Tuesday, April 11, 2017

Get Top, Bottom N Elements by Key

Given a set of (key-as-string, value-as-integer) pairs, say we want to create a
top N (where N > 0) list. Top N is a design pattern. For example, if key-as-string is a URL
and value-as-integer is the number of times that URL is visited, then you might
ask: what are the top 10 URLs for last week? This kind of question is common for
these types of key-value pairs. Finding a top 10 list is categorized as a filtering pattern
(i.e., you filter out data and find the top 10 list).

Approach 1 -
val inputRDD = sc.parallelize(Array(("A", 5), ("A", 10), ("A", 6), ("B", 67), ("B", 78), ("B", 7)))
// Sort each key's values in descending order and keep the 2 largest.
val resultsRDD = inputRDD.groupByKey.map { case (key, numbers) => key -> numbers.toList.sortBy(x => -x).take(2) }
resultsRDD.collect()

Approach 2 -
val inputRDD = sc.parallelize(Array(("A", 5), ("A", 10), ("A", 6), ("B", 67), ("B", 78), ("B", 7)))
inputRDD.groupByKey.mapValues(numbers => numbers.toList.sortBy(x => -x).take(2)).collect()

We can remove the negation (i.e., sort with x => x instead of x => -x) to get the bottom N elements instead.
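Note that groupByKey pulls every value for a key into memory, which can be expensive when some keys have many values. A more scalable sketch (the names N and topNRDD are just illustrative) uses aggregateByKey so that only N elements per key are kept at every step:

// Keep at most N largest values per key while aggregating,
// instead of collecting all values per key first.
val N = 2
val topNRDD = inputRDD.aggregateByKey(List.empty[Int])(
  (acc, v) => (v :: acc).sortBy(x => -x).take(N),         // fold a value into a partition-local top-N
  (acc1, acc2) => (acc1 ++ acc2).sortBy(x => -x).take(N)  // merge top-N lists across partitions
)
topNRDD.collect()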

Monday, March 13, 2017

SBT Project Creation

  1. Install SBT for Windows from http://www.scala-sbt.org/download.html
  2. After installation, set the environment variables:
    1. SBT_HOME
    2. Path
  3. Once the environment setup is done, create a directory with the project name, say DEMOSBT
  4. Inside DEMOSBT, create a file named build.sbt and a folder named project
  5. Inside build.sbt, write:
name := "ProjectName"
version := "1.0"
scalaVersion := "2.10.4"
Inside this file we will later add library dependencies and assembly plugin details to build the uber JAR (see the sample build.sbt after these steps).
  6. Inside the DEMOSBT\project folder, create a file named plugins.sbt
  7. In plugins.sbt, add the line below to pull in the Eclipse plugin:
 addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.5.0")
Similarly, we will add the sbt-assembly plugin dependency here for uber JAR creation.
  8. From the command window, move to the DEMOSBT path and run the sbt eclipse command, which resolves all the required dependencies and generates the Eclipse project files
  9. Import the project into Eclipse
  10. Down the line, if you want to add any dependency, modify the build.sbt file, run sbt eclipse again, and refresh the Eclipse project; the referenced libraries will then appear in the bundle
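As a minimal sketch of where this ends up (the Spark dependency, versions, and merge strategy below are illustrative assumptions, not part of the original steps), build.sbt might grow into:

name := "ProjectName"
version := "1.0"
scalaVersion := "2.10.4"

// Illustrative dependency; "provided" keeps Spark out of the uber JAR
// since the cluster supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3" % "provided"

// A common merge strategy for duplicate files when sbt-assembly builds the uber JAR.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

with the matching plugin line in DEMOSBT\project\plugins.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")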

Spark Installation on Windows

  • Choose a Spark pre-built package for Hadoop, i.e., "Pre-built for Hadoop 2.6 or later". Download and extract it to any drive, e.g., D:\spark-2.1.0-bin-hadoop2.6
  • Set SPARK_HOME and add %SPARK_HOME%\bin to PATH in the environment variables
  • Run the following command on the command line:
spark-shell
  • You’ll get an error for a missing winutils.exe:
      Though we aren’t using Hadoop with Spark, Spark still checks for the HADOOP_HOME variable in its configuration. So to overcome this error, download winutils.exe and place it in any location (e.g., D:\winutils\bin\winutils.exe).
P.S. The winutils.exe binary varies with the operating system version, so if one build doesn't work on your OS, find another one that does. You can refer to the "Problems running Hadoop on Windows" link for winutils.exe.
  • Set HADOOP_HOME = D:\winutils in the environment variables
  • Now re-run the command "spark-shell" and you’ll see the Scala shell (a quick verification snippet follows these steps). For the latest Spark releases, if you get the permission error for the /tmp/hive directory as given below:
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
you need to run the following command:
D:\spark>D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
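Once the shell comes up, a quick sanity check confirms the installation (a minimal sketch; the numbers are arbitrary):

// Run inside spark-shell; sc is the SparkContext the shell creates for you.
val rdd = sc.parallelize(1 to 100)
rdd.sum()                        // should return 5050.0
rdd.filter(_ % 2 == 0).count()   // should return 50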

Apache Kafka Installation on Windows

  1. Make sure the JDK and JRE are installed and their path is set in the environment variables
  2. Download Apache Kafka from https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.10-0.10.2.0.tgz
  3. Unpack kafka_2.10-0.10.2.0.tgz
  4. Copy the extracted kafka_2.10-0.10.2.0 directory to C:\
  5. Open C:\kafka_2.10-0.10.2.0\config\server.properties and update the existing log.dirs entry as mentioned below:
    log.dirs=c:/kafka_2.10-0.10.2.0/kafka-logs
  6. Open C:\kafka_2.10-0.10.2.0\config\zookeeper.properties and update the existing dataDir entry as mentioned below:
    dataDir=c:/kafka_2.10-0.10.2.0/zookeeper-data
    (Note: make sure you use forward slashes as depicted in step 5 and step 6)
  7. The installation of a single-node, single-broker Kafka on Windows is now done
  8. Start the ZooKeeper and Kafka servers
    Open a command prompt and start the ZooKeeper server using the following command:
    cd c:\kafka_2.10-0.10.2.0
    bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
  9. Now open another command prompt and start the Kafka server using the following command:
    .\bin\windows\kafka-server-start.bat .\config\server.properties
    Both the ZooKeeper and Kafka servers should now be running.
  10. Now let's create a Kafka topic using the following command:
    cd C:\kafka_2.10-0.10.2.0
    bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
  11. List the topics using the following command:
    cd C:\kafka_2.10-0.10.2.0
    bin\windows\kafka-topics.bat --list --zookeeper localhost:2181
  12. Send messages using the Kafka console producer (a programmatic Scala producer sketch follows these steps):
    cd C:\kafka_2.10-0.10.2.0
    bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic test
  13. Consume the above messages using the Kafka console consumer:
    cd C:\kafka_2.10-0.10.2.0
    bin\windows\kafka-console-consumer.bat --zookeeper localhost:2181 --topic test --from-beginning
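To send messages programmatically rather than from the console, here is a minimal Scala sketch against the kafka-clients 0.10.2.0 producer API (the topic test and broker localhost:9092 match the console examples above; the key and message text are arbitrary):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SimpleProducer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  // Publish one record to the "test" topic created in step 10.
  producer.send(new ProducerRecord[String, String]("test", "key-1", "Hello from Scala"))
  producer.close()
}

Messages sent this way appear in the console consumer from step 13 just like console-produced ones.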

Apache Spark – Catalyst Optimizer

The optimizer is the component that automatically finds the most efficient plan to execute the data operations specified in the user’s program. In...