Why Our World Would End If Apache Spark Disappeared

Category: Education
     
 

Presentation Description

Apache Spark and Hadoop are both Big Data frameworks that offer tools for Big Data tasks, though not exactly the same tasks. Originally developed at UC Berkeley’s AMPLab and later released as an open-source project, Apache Spark is a powerful processing engine for Big Data. It is a data analytics framework that provides a faster and more general data processing platform.

Presentation Transcript

slide 1:

Kovid Academy Catalyst for Digital Evolution Visit Us www.kovidacademy.com

slide 2:

Why Our World Would End If Apache Spark Disappeared

slide 3:

Apache Spark

In the current data analytics market there is a lot of buzz around Apache Spark. Many business experts rank Spark above Hadoop. If you are in the Big Data analytics business, or hope to enter the market soon, you should know to what extent Spark rules over Hadoop. This article tries to help you find answers to some of those questions. Before focusing on Spark vs Hadoop, let us first discuss what Spark and Hadoop are.

slide 4:

Apache Spark and Hadoop are both Big Data frameworks that offer tools for Big Data tasks, though not exactly the same tasks. Originally developed at UC Berkeley’s AMPLab and later released as an open-source project, Apache Spark is a powerful processing engine for Big Data. It is a data analytics framework that provides a faster and more general data processing platform.

Apache Hadoop: Hadoop, on the other hand, is a distributed data infrastructure that distributes huge data collections across several nodes within a cluster of commodity servers.

slide 5:

It also keeps track of that data, which makes Big Data processing and analytics more effective. Hadoop is largely considered a general-purpose framework that supports multiple processing models. For many years Hadoop was used mainly to run MapReduce jobs, which are usually long-running. To speed things up, Spark was designed to run on top of a Hadoop cluster for real-time stream processing and fast interactive queries that can complete within seconds. Today most projects use distributed storage, i.e. instead of keeping the data in a single location, businesses store it across multiple storage devices such as disks.
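The MapReduce model mentioned on this slide can be illustrated with a small word count. This is only a single-process sketch in plain Python, not the Hadoop API: the function names `map_phase`, `shuffle_phase` and `reduce_phase` and the sample lines are assumptions made up for illustration. In a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key (done by the framework between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values that share one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, standing in for files stored on a cluster.
sample = ["spark runs on hadoop", "hadoop stores data", "spark is fast"]
counts = word_count(sample)
print(counts)
```

The sketch keeps the three phases as separate functions because that separation is what lets a framework distribute the map and reduce work independently.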

slide 6:

For processing such distributed data spread across multiple devices, Hadoop with its Hadoop Distributed File System (HDFS) offers one of the most scalable options available in the open-source community. Spark does not include its own distributed storage system and relies on a third-party provider for it. This is the main reason most Big Data projects install Spark on top of Hadoop. Hadoop thus supports both traditional MapReduce and Apache Spark, and it is more accurate to regard Spark as an enhancement to Hadoop MapReduce than as a replacement for Hadoop. Let us look at the key features that make Apache Spark stand out in the world of ‘Big Data’.

slide 7:

1. Speed: Spark uses the concept of the RDD (Resilient Distributed Dataset), which lets it keep data in memory and thereby reduce the number of reads and writes to disk. Data is persisted to disk only when necessary. This allows applications in Hadoop clusters to run up to 100x faster in memory and up to 10x faster on disk.

2. Ease of Use: Spark lets you write applications in Scala, Python, Java and other languages, so developers can build and run applications in a language they already know. Spark also provides a set of more than 80 built-in operators that can be used to query data interactively from the shell.
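The in-memory idea behind item 1 can be sketched with a toy class. `ToyRDD` below is a hypothetical, single-process stand-in written for this illustration. It is not Spark's API and it ignores distribution and fault tolerance. It only shows how keeping a data set in memory avoids repeated reads from disk when several queries reuse it. The `read_from_disk` function and the `reads` counter are likewise assumptions.

```python
class ToyRDD:
    """Toy model of a lazily evaluated data set that can be kept in memory."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function that produces the records
        self._persist = False     # whether the result should be kept in memory
        self._cache = None        # in-memory copy, filled on first use

    def map(self, fn):
        # Nothing is computed here; the work happens when collect() is called.
        return ToyRDD(lambda: [fn(x) for x in self.collect()])

    def filter(self, predicate):
        return ToyRDD(lambda: [x for x in self.collect() if predicate(x)])

    def cache(self):
        self._persist = True
        return self

    def collect(self):
        if self._cache is not None:
            return self._cache    # served from memory, no recomputation
        data = self._compute()
        if self._persist:
            self._cache = data
        return data


reads = {"count": 0}  # counts how often the simulated disk is read


def read_from_disk():
    """Simulated expensive disk read."""
    reads["count"] += 1
    return list(range(1, 11))


numbers = ToyRDD(read_from_disk).cache()
evens_result = numbers.filter(lambda n: n % 2 == 0).collect()
squares_result = numbers.map(lambda n: n * n).collect()
print(evens_result, squares_result[:3], reads["count"])
```

Both queries reuse `numbers`, yet the simulated disk is read only once because the first `collect()` leaves the data in memory. Without the `.cache()` call, each query would trigger its own read.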

slide 8:

3. Effective Integration: Spark can run standalone, on Hadoop, on Mesos or in the cloud. Its ability to read data from Hadoop data sources such as HDFS, HBase and Cassandra makes it well suited for migrating existing Hadoop applications.

4. Real-time Stream Processing: MapReduce primarily processes stored data, whereas Spark can also handle real-time streaming data. Other existing frameworks can still be integrated with Hadoop to handle streaming as well.
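The difference between processing stored data and processing a stream, as described in item 4, can be shown with a small plain-Python sketch. It does not use Spark or Hadoop and says nothing about how either one implements streaming. The function names and the sample events are assumptions chosen for illustration.

```python
from collections import Counter


def batch_count(stored_events):
    """Batch style: the full data set already exists and is processed in one pass."""
    return Counter(stored_events)


def streaming_counts(event_stream):
    """Streaming style: update a running result as each event arrives."""
    running = Counter()
    for event in event_stream:
        running[event] += 1
        yield dict(running)  # snapshot of the result so far


# Hypothetical events; in practice a stream would arrive over time.
events = ["click", "view", "click", "purchase", "click"]

final_batch = batch_count(events)
snapshots = list(streaming_counts(iter(events)))

print(snapshots[0])    # result available after the first event
print(snapshots[-1])   # matches the batch result once all events are seen
```

The batch function returns nothing until it has seen all the data, while the streaming generator yields a usable intermediate result after every event.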

slide 9:

5. Expanding Community: Developers from more than 50 companies have helped build Apache Spark. The project started in 2009, and more than 250 developers have since contributed to it.

Conclusion: Apache Spark is the new and shiny player in the field of Big Data, whereas Hadoop is a much more experienced one. Speed, performance and ease of use give Spark an edge over Hadoop. Though Spark stands out as a big winner, it is the Hadoop Distributed File System that lets you win the game with the full Big Data package.

slide 10:

Contact Us: supportkovidacademy.com US: 609-436-9548 IND: +91 9700022933. Website: https://kovidacademy.com FB: https://www.facebook.com/kovidacademy/ Twitter: https://twitter.com/KovidAcademy LinkedIn: https://www.linkedin.com/company/kovid-academy YouTube: https://www.youtube.com/channel/UCbmkCnMoOUDsrS7O4bVpLjA To gain real-time expertise on the various industry based concepts of Apache Hadoop and Apache Spark join Kovid Academy and kick-start your career with splendid career prospects. Thank you Kovid Academy.
