We talk with Jack Norris, VP of marketing at MapR, about the Big Data market in the UK and upcoming products
US Big Data specialist MapR has announced that it will open its first European office in London on 3 January. A smaller base in Munich, Germany, will follow later next year.
The company specialises in products based on the Apache Hadoop framework – which is becoming the de facto standard in distributed computing applications. MapR’s user base is already huge in the UK, but until now it has lacked a permanent address on this side of the Atlantic.
Another Hadoop-centric company, Cloudera, announced plans to set up a UK office earlier this year, and analysts predict that Hortonworks will follow suit.
TechWeekEurope met with Jack Norris, VP of marketing at MapR, to discuss the presence of the company in Europe and the new M7, the “ultimate decision machine” for Big Data operations, due to be launched in the first quarter of 2013.
Elephants come to London
MapR was founded in 2009 by some of the people who developed BigTable analytics at Google. According to Norris, Google’s success is in large part due to MapReduce – a programming model for processing large data sets, which Google first described publicly in a 2004 paper.
Since then, Hadoop – the open-source implementation of the model – has taken the world by storm, and has become almost synonymous with Big Data analytics.
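For readers unfamiliar with the model, the idea behind MapReduce can be illustrated with a toy word count. The sketch below is plain Python running in a single process, not Hadoop or MapR code, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. In a real cluster, the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data in London", "big elephants in London"]
    print(word_count(docs))
```

The programmer writes only the map and reduce functions; grouping the intermediate pairs and distributing the work is left to the framework.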
“The architecture of Hadoop is limited,” Norris told us. “Our mission is to innovate and create a multi-purpose platform that could widen the types of applications we can use the framework for.”
In some architectures, pulling data from storage can take more time than actually analysing it. With Hadoop, the analysis takes place directly on the nodes where the data is stored.
The framework’s popularity has soared because it relies on cheap local storage rather than expensive, high-performance SAN or NAS systems.
“We really view London as the logical headquarters for the EMEA operation. There is a lot going on in the Big Data community here, and not just from a Hadoop standpoint, but the data scientist standpoint,” said Norris.
“Ted Dunning [chief application architect at MapR] has been in contact with [the] Open Data Institute, and we are very excited to see what will come out of this initiative,” he added.
The wait for M7
Today, MapR partners include EMC, Cisco and Amazon, and its products are deployed at thousands of companies worldwide. They are aimed primarily at serious Hadoop users, those who have deployed the framework before but need more speed and reliability.
MapR currently offers two service packages – the free M3 and the more advanced M5, which requires an annual fee but comes with round-the-clock support. According to MapR, both are easy to use, satisfy strict data protection requirements and integrate into existing infrastructure.
In the first quarter of 2013, the company plans to release an even more advanced product – M7, which is currently in beta testing. With M7, MapR has simplified administration of the HBase database, further increased performance, lowered latency and eliminated bottlenecks related to the JVM (Java Virtual Machine).
“It’s the antithesis of the data silos we see today. Sure, data warehousing works, if you know what you are looking for, and if you have a lot of time, and if you are trying to figure out what happened last quarter. With MapR and HBase, you can get a real-time response and the scalability is fantastic,” explained Norris.
“This is the first system that places unstructured files and tables in the same data layer, so it’s really unique. Even in a large cluster, the table limit is about 100 per cluster, while our environment supports a trillion tables. A regular blob size per cell is limited to kilobytes, while we can extend it to 1GB.
“The secret of MapR is we have really tried to understand where the bottlenecks are and addressed those bottlenecks on a uniform basis, to get as much out of hardware as possible.”
Google and pals
You would be forgiven for thinking this is just marketing spiel, but it was MapR that was selected by Google to demonstrate the capabilities of its Compute Engine and break the analytics speed record at the Google I/O event in October.
Using the TeraSort benchmark, MapR shattered the previous record for Hadoop applications set by Yahoo, using just a sixth of the disks, a third of the cores and two-thirds of the servers.
“Being on stage with Google at the Google I/O, to unveil their IaaS offering, was really the apex of what we have done so far. It was their whitepaper that started this whole Hadoop process,” said Norris.
MapR has already forged some important links with organisations in the UK, not least among them the Big Data Partnership (BDP). “One of the challenges facing enterprise is that the support for those solutions is not always easily available from the vendor and can be frustrating working with account managers that are so separated from the organisation as they are across the globe,” commented Mike Merritt-Holmes from BDP.
“By doing this, MapR will now be able to offer the UK and to a certain degree EMEA a much more personal offering which will certainly be an attraction to many enterprise organisations. It will also allow training and SI partners such as ourselves to really engage with MapR on client engagements and this will only strengthen the proposition.”