Apache Spark - Big Data Platform for All

by Sumit Mund | May 19, 2015

Apache Spark is a powerful open-source, in-memory cluster computing framework built around speed, ease of use, and sophisticated analytics. It runs on Hadoop (YARN), on Mesos, in standalone mode, or in the cloud, and it can access diverse data sources including HDFS, Cassandra, HBase, and S3. Spark powers a stack of high-level tools: Spark SQL, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for building scalable, fault-tolerant streaming applications.

Partnership with Hortonworks!

by Sumit Mund | Nov 22, 2013

Today, Hadoop has become synonymous with big data because it has been the platform of choice for big data processing. Apache™ Hadoop® is an open-source project governed by the Apache Software Foundation (ASF) that lets you gain insight from massive amounts of structured and unstructured data quickly and without significant investment.

@mucons / @sumitmund

sumitmund: #ApacheSpark now inbuilt on Microsoft Azure #Hadoop Cluster- HDInsight. Exciting! #BigDataAnalytics #DataScience http://t.co/i8Z7BVzOq6 9Hrs ago

sumitmund: iPython Notebook available and well integrated with #Azure #MachineLearning http://t.co/aihbkamRAD #DataScience #Python #AzureML #Analytics 6 days ago