Overview
Overall experience with Apache Spark
Apache Spark Likes & Dislikes
Deployment, less support needed, easy to use, open source
Modules are used through a simple set of programs. A programmer need not write large amounts of code, and SQL-like commands are also supported.
Some of the features of Apache Spark are:
1. It is compatible with SQL, which makes it accessible to users with little or no programming knowledge. It works with various formats such as JSON and Parquet.
2. Its in-memory processing allows it to handle large volumes of data, sometimes reaching petabyte scale.
3. It lets users track real-time data and react instantly to specific changes. It is widely used in the financial industry to run real-time trading and identify gaps instantly.
No native file storage; Python programs run slowly
Nothing as of now
1. The foremost drawback of this software is its high memory consumption. It relies heavily on RAM to provide high-speed data processing, which can require additional hardware investment.
2. Apache Spark's processing slows down when it works on many small files. This makes it less suitable for small-scale businesses with many small datasets.
3. Fetching data from different sources might affect data accuracy and data quality, which may lead to inaccurate analysis results.
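The small-files slowdown noted above is commonly mitigated by compacting many small inputs into fewer, larger files before analysis. Below is a minimal sketch in Python, assuming an existing PySpark session and Parquet input on a local path. The function names and the 128 MiB target size are illustrative assumptions, not something the reviewer describes.

```python
import math
import os

# Assumed target output file size (128 MiB); tune this for your storage layer.
TARGET_FILE_BYTES = 128 * 1024 * 1024


def total_input_bytes(path):
    """Sum the on-disk sizes of all files under a local directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def target_partitions(total_bytes, target_file_bytes=TARGET_FILE_BYTES):
    """Number of output files so that each is roughly target_file_bytes."""
    return max(1, math.ceil(total_bytes / target_file_bytes))


def compact_parquet(spark, src, dst):
    """Rewrite many small Parquet files under src as fewer files under dst.

    `spark` is an existing pyspark.sql.SparkSession (assumed, not created here).
    """
    n = target_partitions(total_input_bytes(src))
    # repartition(n) shuffles rows into n partitions; Spark typically writes
    # one output file per non-empty partition.
    spark.read.parquet(src).repartition(n).write.mode("overwrite").parquet(dst)
    return n
```

The partition count is only an estimate based on on-disk size, so the resulting files will not be exactly the target size.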
Apache Spark Reviews and Ratings
- Director of Finance, <50M USD, Banking
Efficient for Large Datasets But Faces Issues with Python Performance and Storage
It's excellent for big data analysis and easy to use and manage.
- IT Associate, 10B+ USD, Banking
Efficient Python Module Handles Huge Data with Ease
I used this via a Python module. It's great for handling large amounts of data efficiently. The module also provides easy coding options and optimization.
- Data Analytics Manager, 50M-1B USD, Finance (non-banking)
Processing Large Datasets using Apache Spark
Apache Spark is a unified engine for large-scale data analytics, maintained by the Apache Software Foundation. Its flexibility allows it to work with multiple languages and run data analytics and machine learning tasks.
- Data and Analytics Manager, 50M-1B USD, Retail
It is flexible, scalable, and fault tolerant, but difficult to use at an early stage
The main reason for my rating is that it is very convenient for handling big data at large scale. The functionality it offers, namely flexibility and scalability across multiple data sources, is commendable. One of the projects in which I used Apache Spark was a dynamic pricing algorithm with multiple statistical approaches, and the big data involved was handled very easily.
- Manager of IT Services, 1B-10B USD, Consumer Goods
Review on Apache Spark
As we handle more than 100 GB of data on a daily basis, I found the Spark framework to be the best solution for our business needs, as Spark is known for its speed and performance. It also supports both the DataFrame API and SQL queries in multiple languages such as Scala, Python, R, and Java.
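To illustrate the point about the DataFrame API and SQL, here is a minimal sketch in Python of one aggregation written both ways. It assumes PySpark is installed and that a session and DataFrame already exist. The `sales` view and the `region` and `amount` columns are hypothetical names chosen for illustration.

```python
# SQL form of the aggregation; `sales`, `region` and `amount` are hypothetical names.
SALES_BY_REGION_SQL = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
"""


def totals_with_dataframe_api(df):
    """Total amount per region using DataFrame methods."""
    # groupBy(...).sum("amount") produces a column named "sum(amount)".
    return (
        df.groupBy("region")
        .sum("amount")
        .withColumnRenamed("sum(amount)", "total")
    )


def totals_with_sql(spark, df):
    """The same aggregation expressed as SQL over a temporary view."""
    df.createOrReplaceTempView("sales")
    return spark.sql(SALES_BY_REGION_SQL)
```

With a session from `SparkSession.builder.getOrCreate()` and a DataFrame built by `spark.createDataFrame([("EU", 10.0), ("EU", 5.0), ("US", 7.0)], ["region", "amount"])`, both functions should return the same per-region totals.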


