Apache Spark - Big Data Platform for All

by Sumit Mund | May 19, 2015

Apache Spark is a powerful open-source, in-memory cluster computing framework built around speed, ease of use, and sophisticated analytics. It runs everywhere: on Hadoop (YARN), on Mesos, standalone, or in the cloud. It can access diverse data sources, including HDFS, Cassandra, HBase, S3, and more. Spark powers a stack of high-level tools, including Spark SQL, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for building scalable, fault-tolerant streaming applications.

Partnership with Hortonworks!

by Sumit Mund | Nov 22, 2013

Today, Hadoop is synonymous with big data, as it has become the platform of choice for big data processing. Apache™ Hadoop® is an open-source project governed by the Apache Software Foundation (ASF) that lets you gain insight from massive amounts of structured and unstructured data quickly and without significant investment.

@mucons / @sumitmund

sumitmund: If u want to set up a #DataScience env Just found a nice & quick post to Install #ApacheSpark on Windows7 http://t.co/ztEkTemOLG #analytics (7 days ago)

sumitmund: 5 Techniques To Understand #MachineLearning Algorithms Without the Background in Math http://t.co/dvPSnX6Imp via @josephsirosh #Analytics (9 days ago)

mucons: RT @sumitmund Gartner's suggestion - How to Get Started With Prescriptive Analytics https://t.co/j1F1n1AjW5 #predictiveanalytics #bigdata… (29 days ago)

mucons: RT @sumitmund #ApacheSpark now inbuilt on Microsoft Azure #Hadoop Cluster- HDInsight. Exciting! #BigDataAnalytics #DataScience http://t… (34 days ago)