An introduction to Databricks


Presentation Description

An introduction to Databricks: what is it, how does it work, and what can it do?

Presentation Transcript

Databricks:

What is Databricks?
Cloud services used
Functionality
Languages
Spark usage
3rd party apps
Architecture
Books
www.semtech-solutions.co.nz info@semtech-solutions.co.nz

Databricks – What is it?:

A cloud-based Apache Spark cluster service
Offers scalable Spark clusters based on AWS
Developed by the same people who created Spark
Multiple cluster management
Job scheduling and library import
Offers access to all Spark modules

Databricks – Cloud Services:

Currently uses Amazon AWS
Uses EC2 and has access to S3 buckets
Uses a minimum of 2 EC2 instances
Attempts to optimise EC2 usage
Plans to extend to other cloud providers

Databricks – Functionality:

Architecture based on Notebooks and folders
Has a cluster manager for Defined (min 54 GB) clusters, Spot clusters, and On Demand clusters
Has a job manager and scheduler
Has user management
Has full Spark functionality
Has strong data visualisation capability
Can export reports and dashboards

Databricks – Languages:

Can have Notebooks in Scala, Python, and SQL
SQL can be executed in non-SQL Notebooks
Markdown comments can be placed in Notebooks
Notebooks can be shared by multiple sessions
Libraries can be imported and called in Notebooks
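As a rough illustration of running SQL from a non-SQL Notebook, the Python sketch below sends a query through a Spark SQLContext and post-processes the result in Python. It assumes the Notebook provides a ready-made `sqlContext` object; the helper name `count_rows_by` and the table and column names are hypothetical and not from the slides.

```python
def count_rows_by(sql_context, table, column):
    """Run a SQL aggregate from Python and return (value, count) pairs.

    sql_context is expected to behave like Spark's SQLContext: it must
    offer sql(query), whose result offers collect().
    """
    # The table and column are interpolated as identifiers, so only
    # trusted names should be passed in.
    query = "SELECT {col}, COUNT(*) AS n FROM {tbl} GROUP BY {col}".format(
        col=column, tbl=table
    )
    rows = sql_context.sql(query).collect()
    # Each result row is indexable like a tuple; sort by count, largest first.
    pairs = [(row[0], row[1]) for row in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Assumed usage inside a Python Notebook attached to a cluster
# ("web_logs" and "status" are made-up names):
# count_rows_by(sqlContext, "web_logs", "status")
```

Because the helper only needs an object with a `sql` method, it can be exercised outside a cluster with a small stand-in object.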

Databricks – Spark Usage:

Latest Spark version available, e.g. DB 1.3.4 uses Spark 1.3.1 as of June 2015
All Spark modules available: SQL, GraphX, MLlib, Streaming
Strong integration between modules and visualisation
Extensive use of tables to import data
Tables available via SQL
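The table-based workflow can be sketched in Python as follows: data is turned into a DataFrame, registered as a temporary table, and then read back through SQL. This is a minimal sketch using the Spark 1.3-era calls `createDataFrame` and `registerTempTable` (later Spark releases favour `createOrReplaceTempView`); the helper name `import_as_table` and the sample names are hypothetical.

```python
def import_as_table(sql_context, records, columns, table_name):
    """Register Python records as a temporary table and count them via SQL.

    sql_context is expected to behave like Spark's SQLContext.
    """
    # Build a DataFrame from a list of tuples plus a list of column names.
    df = sql_context.createDataFrame(records, columns)
    # Make the DataFrame visible to SQL under the given (trusted) name.
    df.registerTempTable(table_name)
    # Query the table back through SQL; the single result row holds the count.
    result = sql_context.sql("SELECT COUNT(*) FROM " + table_name).collect()
    return result[0][0]

# Assumed usage inside a Python Notebook (names are made up):
# import_as_table(sqlContext, [("a", 1), ("b", 2)], ["name", "value"], "demo")
```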

Databricks – 3rd Party Apps:

Currently available, with more to come:
Pentaho
Qlik
Tableau
TIBCO Jaspersoft
PanTera
ZoomData

Databricks – Architecture:

[Architecture diagram; image not captured in the transcript]

Available Books:

See our Hadoop book from Apress / Springer, “Big Data Made Easy”
Look out for our Apache Spark based book from Packt in 2015

Contact Us:

Feel free to contact us at www.semtech-solutions.co.nz or info@semtech-solutions.co.nz
We offer IT project consultancy
We are happy to hear about your problems
You can pay for just those hours that you need to solve your problems
