Limitations of Apache Spark
0:00
In this video, we are discussing the limitations of Apache Spark.
0:04
We know that Apache Spark has certain advantages, but it also has some limitations.
0:10
So let us discuss them one by one. Here we have mentioned five different limitations.
0:16
The first one is the problem with small files. We know that each and every file will be represented as a small partition
0:25
in Apache Spark, and a large file will
0:29
be divided into multiple smaller partitions. So the problem with small files
0:35
arises there, and these particular partitions will be known as the small partitions.
0:40
Next, we have no file management system. In Apache Spark, there is no inbuilt
0:46
file management system; it depends on Hadoop. Then we have latency: Apache Spark's
0:52
latency is higher compared to Apache Flink, so latency is one of the problems.
0:57
We have manual optimization: there is no automatic optimization here.
1:03
The optimizations have to be done manually, and that is one of the limitations of Apache Spark.
1:09
Finally, we have the expense. We know that Apache Spark supports in-memory computation.
1:14
That means the data will be available in memory: during computation, the intermediate results that are produced will be kept
1:21
in memory at the same time. We know that memory,
1:26
that is, primary memory or RAM, is a very costly thing. So putting huge data into memory means we require a huge amount of memory, and that is a very expensive affair.
1:38
So let us discuss all of them in some more detail. The first limitation is the problem with small files. In the Spark RDD, each file is a small partition, and for a large
1:55
file there will be a large number of small partitions. To perform tasks in an
2:00
efficient way, we need to repartition them into a manageable format, and this particular repartitioning
2:08
will be time consuming.
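As a minimal sketch of the fix just described: the HDFS path and target partition count below are illustrative, not from the video.

```python
# a minimal sketch: many small input files yield many tiny partitions,
# which we then consolidate with repartition()
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("small-files-sketch").getOrCreate()

# textFile() creates at least one partition per input file, so a directory
# of thousands of small files means thousands of tiny partitions
logs = spark.sparkContext.textFile("hdfs:///data/logs/")
print(logs.getNumPartitions())

# consolidate into a manageable number; repartition() triggers a full
# shuffle, which is the time-consuming step mentioned above
logs = logs.repartition(16)
```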
2:15
Next one is no file management system. Spark has no inbuilt file management system; it depends on some other platform, such as Hadoop. So Spark itself has
2:23
no file management system and depends on other file management systems; we can consider
2:28
Hadoop as one of the examples.
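To make that dependency concrete, here is a hedged sketch: every read and write goes through an external file system such as HDFS. The namenode address and paths are made up for illustration.

```python
# a minimal sketch: Spark itself stores nothing; it reads from and writes to
# an external file system (HDFS here; the URI and paths are illustrative)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-sketch").getOrCreate()

df = spark.read.parquet("hdfs://namenode:8020/warehouse/sales/")
df.write.mode("overwrite").parquet("hdfs://namenode:8020/warehouse/sales_clean/")
```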
2:37
Next one is the expense. Spark becomes very expensive when we want cost-efficient processing of big data: keeping data in memory is costly, and we need lots of RAM to do such
2:48
work smoothly, because in-memory computation means that huge data will be loaded into memory, which
2:54
in turn requires a huge amount of primary memory, or RAM.
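One common way to soften the RAM cost, sketched here with an illustrative table path, is to persist with a storage level that spills to disk instead of holding everything in memory:

```python
# a minimal sketch: MEMORY_AND_DISK spills blocks that don't fit in RAM to
# disk, trading some speed for a smaller memory bill
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("memory-sketch").getOrCreate()

big = spark.read.parquet("hdfs:///data/big_table/")
big.persist(StorageLevel.MEMORY_AND_DISK)
big.count()  # an action materializes the persisted data
```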
3:01
Next one is manual optimization, which is a real headache:
3:06
a Spark job needs to be manually optimized for each specific data set,
3:12
and partitioning and caching in Spark also have to be set correctly by hand. So this manual optimization can be very troublesome for users.
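Here is a hedged sketch of the kind of hand-tuning meant above; the partition count, key, and table names are illustrative choices a developer would have to make per data set.

```python
# a minimal sketch: the developer, not Spark, picks the partition count,
# the partitioning key, and what to cache; all names here are illustrative
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("manual-tuning-sketch").getOrCreate()

orders = spark.read.parquet("hdfs:///data/orders/")

# hand-picked: 200 partitions keyed by customer_id
orders = orders.repartition(200, "customer_id")
orders.cache()  # cached manually because it is reused twice below

daily = orders.groupBy("order_date").count()
by_customer = orders.groupBy("customer_id").count()
```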
3:17
Finally, we have latency: Spark has higher latency compared to Apache Flink, since Spark processes streams in micro-batches while Flink processes events one at a time.
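As a hedged illustration of why micro-batching sets a latency floor (the source and trigger interval below are illustrative):

```python
# a minimal sketch: Structured Streaming runs in micro-batches, so end-to-end
# latency cannot drop below the trigger interval
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("latency-sketch").getOrCreate()

# 'rate' is a built-in test source that generates rows at a fixed rate
stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

query = (stream.writeStream
         .format("console")
         .trigger(processingTime="2 seconds")  # results at best every 2 seconds
         .start())
query.awaitTermination()
```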
3:23
So these are the different limitations of Apache Spark. We have discussed each of them with some diagrams and detail.
3:30
Thanks for watching this video
#Programming
#Software