MapReduce using Java
I haven’t coded in Java in eons. The assignment (MapReduce, Pig and Spark) I worked on over the last 3 weeks was a good way to jolt me out of my comfort zone.
Java is something I need to brush up on before taking the Software Development Process module, which requires me to write an Android app. Argh!
Back to MapReduce. It’s a useful framework if you have to summarise huge datasets (gigabytes, terabytes). Here are the common steps in a MapReduce job:
- In the Mapper method, each unit of data (e.g. a row or a word) is tokenized and assigned a value
- In the Reducer method, the data is aggregated (e.g. sum, average)
- In the main method, the MapReduce job is configured and managed through a set of parameters
Note: MapReduce data types are different from Java data types! It can be confusing at times, but once you get used to it, it gets easier.
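For example, here is roughly how the usual Java types line up with Hadoop’s Writable wrappers (a quick illustrative sketch; WritableTypesDemo is just a made-up name, not part of the assignment):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WritableTypesDemo {
    public static void main(String[] args) {
        //Hadoop wraps Java primitives in Writable types so they can be
        //serialised and shipped between the map and reduce tasks
        IntWritable count = new IntWritable(1);       //wraps a Java int
        LongWritable offset = new LongWritable(42L);  //wraps a Java long
        Text word = new Text("hadoop");               //wraps a Java String
        int n = count.get();                          //back to a plain int
        long o = offset.get();                        //back to a plain long
        String s = word.toString();                   //back to a plain String
        System.out.println(s + " -> " + n + " (offset " + o + ")");
    }
}

With those types in mind, here is the rough skeleton of the job class (bodies deliberately left out):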
public class MapReduceJobs {

    //Mapper
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        <Deliberately left blank>
        ...Each row of data is tokenized and assigned a value
    }

    //Reducer
    public static class IntSumReducer
            extends Reducer<Text, Text, Text, Text> {

        <Deliberately left blank>
        ...Aggregated data is collated here
    }
    //end of IntSumReducer

    //Driver
    public static void main(String[] args) throws Exception {

        <Deliberately left blank>
        ...Jobs are managed here
    }
}
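To make the skeleton concrete, here is what the classic word-count version of the same pattern looks like. This is a sketch of the textbook Hadoop example rather than the assignment’s code, so the value type is IntWritable instead of the Text used above, and the class and path names are placeholders.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    //Mapper: split each input line into words and emit (word, 1)
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    //Reducer: sum up the counts emitted for each word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    //Driver: configure the job and submit it to the cluster
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   //optional local aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    //input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  //output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would package this into a jar and submit it with something like hadoop jar wordcount.jar WordCount /input/path /output/path, where the two paths are placeholders for HDFS directories.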