Data Eng Weekly

Hadoop Weekly Issue #46

01 December 2013

In the U.S. we had a short week due to Thanksgiving, and thus this edition of Hadoop Weekly is one of the smallest ever. The big news this week is that Cloudera and WibiData announced new releases of their enterprise software. And to cure your Thanksgiving food-coma and get back into the swing of Hadoop, there are a number of meetups this week across the world.


Cloudera Impala, the low-latency SQL-on-Hadoop framework, supports access via ODBC. Many applications can act as ODBC clients, including the R statistical computing environment. In this tutorial, you'll learn how to configure Impala for ODBC access, load data into Impala, and pull data from Impala into R over ODBC.

A lot of the articles in this newsletter are told from the perspective of the Hadoop community, but there are more and more articles written by software companies consuming data in Hadoop. This article from Timo Elliot of SAP recaps two talks from Hadoop Summit Europe that took place earlier this year. Those talks, by two large enterprises, help to motivate the need for bridging Hadoop and existing systems like SAP. Even if you don't read the whole article, the summary of HSBC's talk on adopting Hadoop is a great read.

Software analyst Tony Baer has written a piece about the Hadoop ecosystem, which is one of the most all-encompassing posts that I've seen. He covers many of the companies in the ecosystem -- from established companies like SAS and MicroStrategy to newcomers Datameer and Platfora. Next, he talks about how Hadoop is integrating with existing software. There seem to be two strategies: write against a Hadoop API, or use one of the SQL-on-Hadoop solutions with your existing software. He also covers the role that Hadoop will play in the data center -- will it be the enterprise data hub, as Cloudera has suggested, or play a complementary role to low-latency solutions?


WibiData announced WibiEnterprise 3.0, the commercially licensed and supported SDK based on the Kiji project. Kiji is an open-source framework and datastore for building "entity-centric" applications. It is built atop Apache HBase and uses Apache Avro for serialization.

Cloudera released CDH 4.5, Cloudera Manager 4.8, Cloudera Impala 1.2.1 and Cloudera Search 1.1. CDH 4.5 includes fixes and/or improvements to Flume, Hive (SSL encryption for HiveServer2 and JDBC), HUE (SAML authentication), MapReduce (job token tracking), Oozie (Parquet integration), and Sentry. Of note, CDH 4.5 integrates Parquet 1.x throughout the Hadoop stack. Cloudera Impala 1.2.1 adds support for UDFs and UDAFs, automatic metadata refresh, and additional SQL support. If you're upgrading Impala, note that Impala and Cloudera Manager depend on each other, so they must be upgraded together.
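A UDAF (user-defined aggregate function) differs from a scalar UDF in that it accumulates state across many rows, and in a distributed engine the partial results computed on different nodes must then be combined. Impala's own UDA interface is written in C++ and is organized around init/update/merge/finalize steps. The Python sketch below is only a language-neutral illustration of that pattern for an average; the function names (`avg_init`, `avg_update`, `avg_merge`, `avg_finalize`, `run_distributed_avg`) are hypothetical and are not Impala's API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class AvgState:
    """Intermediate aggregate state: running sum and number of non-NULL rows."""
    total: float = 0.0
    count: int = 0


def avg_init() -> AvgState:
    # Called once per group/partition to create an empty state.
    return AvgState()


def avg_update(state: AvgState, value: Optional[float]) -> None:
    # Fold one input row into the state; SQL aggregates conventionally skip NULLs.
    if value is None:
        return
    state.total += value
    state.count += 1


def avg_merge(dst: AvgState, src: AvgState) -> None:
    # Combine a partial state (e.g. from another node) into dst.
    dst.total += src.total
    dst.count += src.count


def avg_finalize(state: AvgState) -> Optional[float]:
    # Produce the final value; AVG over zero non-NULL rows is NULL.
    if state.count == 0:
        return None
    return state.total / state.count


def run_distributed_avg(partitions: Iterable[List[Optional[float]]]) -> Optional[float]:
    # Hypothetical driver: build one partial state per partition ("node"),
    # then merge the partials and finalize once.
    partials = []
    for part in partitions:
        state = avg_init()
        for value in part:
            avg_update(state, value)
        partials.append(state)
    final = avg_init()
    for partial in partials:
        avg_merge(final, partial)
    return avg_finalize(final)


if __name__ == "__main__":
    print(run_distributed_avg([[1, 2, 3], [4, None], []]))  # 2.5
```

Keeping a (sum, count) pair rather than a running average is what makes the merge step correct: two partial averages cannot be combined without knowing how many rows each one covers.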


Curated by Mortar Data

Monday, December 2

Big Data Meetup at eBay Lab (Tel Aviv-Yafo, Israel)

Tuesday, December 3

Big Data World Conference: Big Data Science in the Cloud (Munich, Germany)

Our first Midwest Cloudera User Group! (Chicago, IL)

HBase London - December meetup @Qubit (London, England)

Hortonworks presentation on Hadoop 2.0 and YARN (Columbus, OH)

Wednesday, December 4

Agile Data Science (San Francisco, CA)

Spire: Real-Time SQL on Hadoop + Whitepages: Scaling DevOps (Seattle, WA)

Thursday, December 5

BigData Options on AWS (Herndon, VA)

Our first NY Cloudera User Group Meetup! (New York, NY)

Real-time Analytics using Cassandra, Spark and Shark at Ooyala (Cupertino, CA)

YARN: Overview, Tez: Faster query processing on Hadoop - Siddharth Seth/Hortonworks (Los Angeles, CA)

Inaugural Meeting -- Hands on Data Science & Unconference (Palo Alto, CA)

Friday, December 6

Big Analytics 2013 event in Manhattan (New York, NY)

Saturday, December 7

Bangalore Baby December Hadoop Meetup (Bangalore, India)