Apache Spark - Big Data Platform for All

by Sumit Mund | May 19, 2015

Apache Spark is a powerful open-source, in-memory cluster computing framework built around speed, ease of use, and sophisticated analytics. It runs on Hadoop (YARN), on Mesos, in standalone mode, or in the cloud. It can access diverse data sources, including HDFS, Cassandra, HBase, and S3. Spark powers a stack of high-level tools: Spark SQL, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for building scalable, fault-tolerant streaming applications.


Partnership with Hortonworks!

by Sumit Mund | Nov 22, 2013

Today, Hadoop is synonymous with big data, having become the platform of choice for big data processing. Apache™ Hadoop® is an open-source project governed by the Apache Software Foundation (ASF) that lets you gain insight from massive amounts of structured and unstructured data quickly and without significant investment.


@mucons / @sumitmund

sumitmund: If u want to set up a #DataScience env Just found a nice & quick post to Install #ApacheSpark on Windows7 http://t.co/ztEkTemOLG #analytics (7 days ago)

sumitmund: 5 Techniques To Understand #MachineLearning Algorithms Without the Background in Math http://t.co/dvPSnX6Imp via @josephsirosh #Analytics (9 days ago)

mucons: RT @sumitmund Gartner's suggestion - How to Get Started With Prescriptive Analytics https://t.co/j1F1n1AjW5 #predictiveanalytics #bigdata… (29 days ago)

mucons: RT @sumitmund #ApacheSpark now inbuilt on Microsoft Azure #Hadoop Cluster- HDInsight. Exciting! #BigDataAnalytics #DataScience http://t… (34 days ago)