Apache Spark (https://spark.apache.org/) is one of the most popular analytics engines in use today. It has been around since 2009, and one reason it became so popular is its speed compared with traditional MapReduce, specifically Hadoop MapReduce, which inspired it.