MapReduce using Java

I haven’t coded in Java in eons. The assignment (MapReduce, Pig and Spark) I worked on over the last three weeks was a good way to jolt me out of my comfort zone.

Java is something I need to brush up on before taking the Software Development Process module, which requires me to write an Android app. Argh!

Back to MapReduce. It’s a useful framework if you have to summarise huge datasets (gigabytes, terabytes). Here are the common steps in a MapReduce program:

  • In the Mapper's map method, each unit of data (e.g. row, word) is tokenized and assigned a value
  • In the Reducer's reduce method, the data is aggregated (e.g. sum, average)
  • In the main function, MapReduce jobs are configured and managed through certain parameters
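The three steps above can be imitated in plain Java without a cluster. This is only a rough sketch of the data flow, not Hadoop code: the class name MiniMapReduce, its method names and the sample sentences are made up for illustration, and plain String and Integer stand in for the framework's own types. The shuffle method mimics the grouping by key that the framework performs between the map and reduce steps.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the map -> shuffle -> reduce flow.
public class MiniMapReduce {

    // Map step: tokenize one line and emit a (word, 1) pair for every word.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group all emitted values by key (sorted, thanks to TreeMap).
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: aggregate the values of a single key, here by summing them.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: plays the role of the main function that wires the steps together.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick fox", "the lazy dog", "the end");
        // prints {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
        System.out.println(wordCount(lines));
    }
}
```

In a real job the framework runs many map and reduce calls in parallel across machines; here everything happens in one loop, which is enough to see what each step contributes.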

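Note: MapReduce data types are different from Java data types! The framework uses its own serializable wrappers, e.g. Text instead of String and LongWritable instead of long. It can be confusing at times, but it gets easier once you get used to it.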

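// Imports assume Hadoop's org.apache.hadoop.mapreduce API
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MapReduceJobs {

    // Mapper
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        // <Deliberately left blank>
        // ...Each row of data is tokenized and assigned a number
    }

    // Reducer
    public static class IntSumReducer
            extends Reducer<Text, Text, Text, Text> {

        // <Deliberately left blank>
        // ...Aggregated data is collated here
    }

    // Main
    public static void main(String[] args) throws Exception {

        // <Deliberately left blank>
        // ...Jobs are managed here
    }
}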
