My answer is based on hands-on experience with Hadoop MR and on what I have learned about MongoDB MR.
Let's look at the major differences and then try to define criteria for selection.
The differences are:
- Hadoop MR jobs can be written in Java, while MongoDB MR jobs are written in JavaScript.
- Hadoop MR can utilize all cores, while MongoDB's JavaScript MR is single-threaded within each mongod process.
- When the data lives in MongoDB, Hadoop MR will not be collocated with the data, while MongoDB MR will be.
- Hadoop MR has millions of engine-hours behind it and copes with many corner cases: massive output, data skew, etc.
- There are higher level frameworks like Pig, Hive, Cascading built on top of the Hadoop MR engine.
- Hadoop MR is mainstream and a lot of community support is available.
From the above I can suggest the following criteria for selection:
Select MongoDB MR if you need a simple group-by and filtering, and do not expect heavy shuffling between map and reduce. In other words, something simple.
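To make "simple group by and filtering" concrete, here is a minimal sketch of that job shape in plain Java, with no Hadoop or MongoDB dependency. The `Order` type, its fields, and the sample data are hypothetical and exist only for this illustration. The point is that the map step emits at most one small pair per input record, so little data moves between map and reduce.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SimpleGroupBy {
    // Hypothetical record type, used only for this illustration.
    public static final class Order {
        final String customer;
        final String status;
        final double amount;

        public Order(String customer, String status, double amount) {
            this.customer = customer;
            this.status = status;
            this.amount = amount;
        }
    }

    // Map step: filter out unwanted records and emit one (key, value) pair per kept record.
    public static List<Map.Entry<String, Double>> map(List<Order> orders) {
        List<Map.Entry<String, Double>> emitted = new ArrayList<>();
        for (Order o : orders) {
            if ("paid".equals(o.status)) {
                emitted.add(new AbstractMap.SimpleEntry<>(o.customer, o.amount));
            }
        }
        return emitted;
    }

    // Reduce step: group the emitted pairs by key and sum the values.
    public static Map<String, Double> reduce(List<Map.Entry<String, Double>> emitted) {
        Map<String, Double> totals = new TreeMap<>();
        for (Map.Entry<String, Double> e : emitted) {
            totals.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return totals;
    }

    // Made-up sample data.
    public static List<Order> sampleOrders() {
        return Arrays.asList(
            new Order("alice", "paid", 10.0),
            new Order("bob", "cancelled", 5.0),
            new Order("alice", "paid", 2.5),
            new Order("bob", "paid", 7.0));
    }

    public static void main(String[] args) {
        System.out.println(reduce(map(sampleOrders())));
    }
}
```

A job like this has no library needs and a small, predictable intermediate data set, which is the case where running the MR next to the data is attractive.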
Select Hadoop MR if you're going to run complicated, computationally intensive MR jobs (for example, regression calculations). A large or unpredictable amount of data between map and reduce also suggests Hadoop MR.
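As a rough illustration of "unpredictable size of data between map and reduce", the sketch below uses a hypothetical word-count style map step in plain Java (again without any Hadoop dependency). It emits one pair per token, so the intermediate volume depends on the content of the records rather than on how many records there are. The class name and sample documents are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShuffleVolume {
    // Map step of a hypothetical word-count job: emits one (token, "1") pair per token,
    // so the number of intermediate pairs grows with text length, not with record count.
    public static List<String[]> map(List<String> documents) {
        List<String[]> emitted = new ArrayList<>();
        for (String doc : documents) {
            for (String token : doc.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    emitted.add(new String[] {token, "1"});
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a b c d e f", "a", "b c");
        System.out.println(docs.size() + " input records -> "
            + map(docs).size() + " intermediate pairs");
    }
}
```

With real data the ratio between input records and intermediate pairs can vary widely from one input to the next, and that is the situation where an engine hardened against large shuffles and skew pays off.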
Java also has a much larger library ecosystem than JavaScript, especially for statistics. That should be taken into account.