Thursday, June 22, 2017

Amazon Alexa - Intelligent Personal Assistant

As technology moves at a fast pace, things have evolved from web applications to mobile applications, and now we want everything on voice command. Thanks to Amazon, which has come up with an amazing artificially intelligent IoT device: Alexa.
Alexa is an intelligent personal assistant developed by Amazon, made popular by the Amazon Echo and Amazon Echo Dot devices developed by Amazon Lab126. It is capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, and other real-time information such as news. Alexa can also control several smart devices, acting as a home automation hub. Currently, interaction and communication with Alexa is only available in English and German. LG Electronics devices now come equipped with Alexa, letting you perform tasks by voice command. Similarly, the Ford-Amazon tie-up lets you control your car with your voice.
As an interesting use case, let's recharge a mobile phone using an Alexa device. Follow the steps to build the skill and the Lambda function as part of Alexa application development on Amazon AWS; a minimal handler sketch is given below.
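To make the Lambda side concrete, here is a minimal sketch in Scala using the Alexa Skills Kit Java SDK (v1). The intent name RechargeIntent, the application id placeholder, and the canned responses are illustrative assumptions; a real skill would call an actual recharge API from onIntent.

import java.util.Collections
import com.amazon.speech.speechlet._
import com.amazon.speech.speechlet.lambda.SpeechletRequestStreamHandler
import com.amazon.speech.ui.PlainTextOutputSpeech

class RechargeSpeechlet extends Speechlet {
  override def onSessionStarted(request: SessionStartedRequest, session: Session): Unit = ()

  override def onLaunch(request: LaunchRequest, session: Session): SpeechletResponse =
    tell("Welcome. Ask me to recharge your mobile phone.")

  override def onIntent(request: IntentRequest, session: Session): SpeechletResponse =
    request.getIntent.getName match {
      case "RechargeIntent" => tell("Your mobile phone recharge has been initiated.") // hypothetical intent
      case _                => tell("Sorry, I did not understand that.")
    }

  override def onSessionEnded(request: SessionEndedRequest, session: Session): Unit = ()

  // Wrap plain text in a "tell" response that ends the session.
  private def tell(text: String): SpeechletResponse = {
    val speech = new PlainTextOutputSpeech()
    speech.setText(text)
    SpeechletResponse.newTellResponse(speech)
  }
}

// Lambda entry point; replace the placeholder with your skill's application id.
class RechargeHandler extends SpeechletRequestStreamHandler(
  new RechargeSpeechlet(), Collections.singleton("amzn1.ask.skill.your-skill-id"))

The skill's interaction model (invocation name, intents, and sample utterances) is configured separately in the Amazon Developer Console; RechargeHandler is what you register as the Lambda function's handler class.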


Tuesday, April 11, 2017

Get Top, Bottom N Elements by Key

Given a set of (key-as-string, value-as-integer) pairs, say we want to create a
top N (where N > 0) list. Top N is a design pattern. For example, if key-as-string is a URL
and value-as-integer is the number of times that URL is visited, then you might
ask: what are the top 10 URLs for last week? This kind of question is common for
these types of key-value pairs. Finding a top 10 list is categorized as a filtering pattern
(i.e., you filter out data and find the top 10 list).

Approach 1 -
val inputRDD = sc.parallelize(Array(("A", 5), ("A", 10), ("A", 6), ("B", 67), ("B", 78), ("B", 7)))
// Sort each key's values in descending order and keep the 2 largest.
val resultsRDD = inputRDD.groupByKey.map { case (key, numbers) => key -> numbers.toList.sortBy(x => -x).take(2) }
resultsRDD.collect()

Approach 2 -
val inputRDD = sc.parallelize(Array(("A", 5), ("A", 10), ("A", 6), ("B", 67), ("B", 78), ("B", 7)))
inputRDD.groupByKey.mapValues(numbers => numbers.toList.sortBy(x => -x).take(2)).collect()

We can remove the negation (i.e., sort with x => x instead of x => -x) to get the bottom N elements instead.
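Note that groupByKey pulls every value for a key into memory, which can be expensive when some keys have many values. A more scalable sketch (the names N and topNRDD are just illustrative) uses aggregateByKey so that only N elements per key are kept at every step:

// Keep at most N largest values per key while aggregating,
// instead of collecting all values per key first.
val N = 2
val topNRDD = inputRDD.aggregateByKey(List.empty[Int])(
  (acc, v) => (v :: acc).sortBy(x => -x).take(N),         // fold a value into a partition-local top-N
  (acc1, acc2) => (acc1 ++ acc2).sortBy(x => -x).take(N)  // merge top-N lists across partitions
)
topNRDD.collect()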

Monday, March 13, 2017

SBT Project Creation

  1. Install SBT for Windows from http://www.scala-sbt.org/download.html
  2. After installation, set the environment variables:
    1. SBT_HOME
    2. Path
  3. Once the environment setup is done, create a directory with the project name, say DEMOSBT
  4. Inside DEMOSBT, create a file named build.sbt and a folder named project
  5. Inside build.sbt, write:
name := "ProjectName"
version := "1.0"
scalaVersion := "2.10.4"
Inside this file we will later add library dependencies and assembly plugin details to build the uber JAR (see the sample build.sbt after these steps).
  6. Inside the DEMOSBT\project folder, create a file named plugins.sbt
  7. In plugins.sbt, add the line below to pull in the Eclipse plugin:
 addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.5.0")
Similarly, we will add the sbt-assembly plugin dependency here for uber JAR creation.
  8. From the command window, move to the DEMOSBT path and run the sbt eclipse command, which resolves all the required dependencies and generates the Eclipse project files
  9. Import the project into Eclipse
  10. Down the line, if you want to add any dependency, modify the build.sbt file, run sbt eclipse again, and refresh the Eclipse project; the referenced libraries will then appear in the bundle
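As a minimal sketch of where this ends up (the Spark dependency, versions, and merge strategy below are illustrative assumptions, not part of the original steps), build.sbt might grow into:

name := "ProjectName"
version := "1.0"
scalaVersion := "2.10.4"

// Illustrative dependency; "provided" keeps Spark out of the uber JAR
// since the cluster supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3" % "provided"

// A common merge strategy for duplicate files when sbt-assembly builds the uber JAR.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

with the matching plugin line in DEMOSBT\project\plugins.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")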

Spark Installation on Windows

  • Choose a Spark pre-built package for Hadoop, i.e., "Pre-built for Hadoop 2.6 or later". Download and extract it to any drive, e.g., D:\spark-2.1.0-bin-hadoop2.6
  • Set SPARK_HOME and add %SPARK_HOME%\bin to PATH in the environment variables
  • Run the following command on the command line:
spark-shell
  • You’ll get an error for a missing winutils.exe:
      Though we aren’t using Hadoop with Spark, Spark still checks for the HADOOP_HOME variable in its configuration. So to overcome this error, download winutils.exe and place it in any location (e.g., D:\winutils\bin\winutils.exe).
P.S. The winutils.exe binary varies with the operating system version, so if one build doesn't work on your OS, find another one that does. You can refer to the "Problems running Hadoop on Windows" link for winutils.exe.
  • Set HADOOP_HOME = D:\winutils in the environment variables
  • Now re-run the command "spark-shell" and you’ll see the Scala shell (a quick verification snippet follows these steps). For the latest Spark releases, if you get the permission error for the /tmp/hive directory as given below:
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
you need to run the following command:
D:\spark>D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
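Once the shell comes up, a quick sanity check confirms the installation (a minimal sketch; the numbers are arbitrary):

// Run inside spark-shell; sc is the SparkContext the shell creates for you.
val rdd = sc.parallelize(1 to 100)
rdd.sum()                        // should return 5050.0
rdd.filter(_ % 2 == 0).count()   // should return 50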

Apache Kafka Installation on Windows

  1. Make sure the JDK and JRE are installed and their path is set in the environment variables
  2. Download Apache Kafka from https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.10-0.10.2.0.tgz
  3. Unpack kafka_2.10-0.10.2.0.tgz
  4. Copy the extracted kafka_2.10-0.10.2.0 directory to C:\
  5. Open C:\kafka_2.10-0.10.2.0\config\server.properties and update the existing log.dirs entry as mentioned below:
    log.dirs=c:/kafka_2.10-0.10.2.0/kafka-logs
  6. Open C:\kafka_2.10-0.10.2.0\config\zookeeper.properties and update the existing dataDir entry as mentioned below:
    dataDir=c:/kafka_2.10-0.10.2.0/zookeeper-data
    (Note: make sure you use forward slashes as depicted in step 5 and step 6)
  7. The installation of a single-node, single-broker Kafka on Windows is now done
  8. Start the ZooKeeper and Kafka servers
    Open a command prompt and start the ZooKeeper server using the following command:
    cd c:\kafka_2.10-0.10.2.0
    bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
  9. Now open another command prompt and start the Kafka server using the following command:
    .\bin\windows\kafka-server-start.bat .\config\server.properties
    Both the ZooKeeper and Kafka servers should now be running.
  10. Now let's create a Kafka topic using the following command:
    cd C:\kafka_2.10-0.10.2.0
    bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
  11. List the topics using the following command:
    cd C:\kafka_2.10-0.10.2.0
    bin\windows\kafka-topics.bat --list --zookeeper localhost:2181
  12. Send messages using the Kafka console producer (a programmatic Scala producer sketch follows these steps):
    cd C:\kafka_2.10-0.10.2.0
    bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic test
  13. Consume the above messages using the Kafka console consumer:
    cd C:\kafka_2.10-0.10.2.0
    bin\windows\kafka-console-consumer.bat --zookeeper localhost:2181 --topic test --from-beginning
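To send messages programmatically rather than from the console, here is a minimal Scala sketch against the kafka-clients 0.10.2.0 producer API (the topic test and broker localhost:9092 match the console examples above; the key and message text are arbitrary):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SimpleProducer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  // Publish one record to the "test" topic created in step 10.
  producer.send(new ProducerRecord[String, String]("test", "key-1", "Hello from Scala"))
  producer.close()
}

Messages sent this way appear in the console consumer from step 13 just like console-produced ones.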

Apache Spark – Catalyst Optimizer

The optimizer is the component that automatically finds the most efficient plan to execute the data operations specified in the user’s program. In...