An introduction to Apache Spark MLlib


Presentation Description

An introduction to Apache Spark MLlib: what is it, how does it work, and what can it do?


Presentation Transcript

Apache Spark MLlib:

What is Apache Spark?
What is MLlib?
Functionality
Dependencies
Books
Ecosystem

Spark – What is it?:

Alternative to MapReduce for certain applications
A low-latency cluster computing system
For very large data sets
May be up to 100 times faster than MapReduce
Used with Hadoop / HDFS
Uses in-memory cluster computing
Memory access is faster than disk access
Has APIs for Scala, Java and Python
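The map and reduce-by-key pattern that MapReduce and Spark both express can be sketched in plain Python. This is only an illustration of the idea on a single machine, not Spark API code; the function names `map_phase` and `reduce_by_key` and the sample lines are made up for this example.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_by_key(pairs, combine):
    """Group values by key, then fold each group with `combine`."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(combine, values) for key, values in groups.items()}


# Hypothetical input; a real job would read from HDFS or another store.
lines = ["spark uses memory", "memory beats disk", "spark is fast"]

# The intermediate pairs stay in memory, so later steps can reuse them
# without re-reading the input. That reuse is the idea behind in-memory
# cluster computing, here shown on one machine only.
pairs = map_phase(lines)
word_counts = reduce_by_key(pairs, lambda a, b: a + b)
distinct_words = len(word_counts)
```

In this sketch `pairs` is computed once and could feed any number of further reductions, whereas a disk-based pipeline would write it out and read it back between steps.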

Spark MLlib – What is it?:

Spark's machine learning library
Provided with the Spark install
Code in Scala / Java / Python
Contains the spark.mllib library (V1.2)
Provides common functionality: classification, regression, clustering, collaborative filtering, dimensionality reduction

Spark MLlib – Functionality:

Basic statistics
Classification and regression
Collaborative filtering
Clustering
Dimensionality reduction
Feature extraction and transformation
Optimization
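To make one of the listed items concrete, here is a minimal sketch of clustering using k-means (Lloyd's algorithm) in plain Python. It is not MLlib's implementation or API, and it runs on one machine; the function names, the sample points, and the choice to start from the first k points are assumptions made for this example.

```python
import math


def closest(point, centers):
    """Return the index of the center nearest to `point`."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm, using the first k points as starting centers."""
    centers = list(points[:k])
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


# Hypothetical 2-D data with two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centers = kmeans(points, k=2)
labels = [closest(p, centers) for p in points]
```

A library such as MLlib distributes the assignment step across the cluster and uses a more careful initialization; the two alternating steps are the same idea.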

Spark MLlib – Dependencies:

NumPy for Python
Breeze (linear algebra)
netlib-java
jblas
gfortran runtime library
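Since NumPy is listed as the Python dependency, a quick check that it is importable can save a confusing failure later. The sketch below uses only the Python standard library; the helper name `numpy_version` and the message wording are made up for this example, and it does not check the other (JVM and native) dependencies.

```python
import importlib
import importlib.util


def numpy_version():
    """Return the installed NumPy version string, or None if NumPy is absent."""
    if importlib.util.find_spec("numpy") is None:
        return None
    return importlib.import_module("numpy").__version__


version = numpy_version()
if version is None:
    message = "NumPy not found; install it before using MLlib from Python"
else:
    message = "NumPy " + version + " found"
print(message)
```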

Available Books:

See our Hadoop book from Apress / Springer, “Big Data Made Easy”
Look out for our Apache Spark based book from Packt in 2015

Spark ecosystem:

[Slide image: Spark ecosystem]

Contact Us:

Feel free to contact us
We offer IT project consultancy
We are happy to hear about your problems
You can pay for just the hours you need to solve your problems
