by Sumit Mund | Nov 08, 2015
What is an Enterprise Data Lake?
Back in 2010, Pentaho co-founder and CTO James Dixon coined the term 'Data Lake'. Although many interpretations of the term exist today, it usually means a repository that holds a vast amount of raw data in its native format until it is needed. Data is stored raw, at its most granular level, so that any ad-hoc analysis can be performed at any time.
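To make the "store raw, analyse later" idea concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any particular data lake product works: the `raw_store` list stands in for the repository, and the event fields (`user`, `action`, `ms`, `ref`) and the helper names `read_raw` and `count_by` are made up for this example.

```python
import json
from collections import Counter

# Hypothetical raw events, kept exactly as they arrived (one JSON string each).
# In a real lake these would be files in HDFS, S3 or similar; a list stands in here.
raw_store = [
    '{"user": "a", "action": "click", "ms": 120}',
    '{"user": "b", "action": "view"}',
    '{"user": "a", "action": "click", "ms": 95, "ref": "email"}',
]


def read_raw(store):
    """Parse records only at read time; nothing was reshaped or dropped on write."""
    for line in store:
        yield json.loads(line)


def count_by(store, field):
    """Ad-hoc question chosen at analysis time: records per value of `field`."""
    return Counter(rec[field] for rec in read_raw(store) if field in rec)


# Two different questions asked of the same untouched raw data.
actions = count_by(raw_store, "action")
referrers = count_by(raw_store, "ref")
```

Because the records were never forced into a fixed schema when they were written, the `ref` field that appears in only one event is still available when someone later thinks to ask about it.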
by Sumit Mund | May 19, 2015
Apache Spark is a powerful open-source, in-memory cluster computing framework built around speed, ease of use, and sophisticated analytics. It runs everywhere: on Hadoop (YARN), on Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, S3, and more.