MapReduce using Java
I haven’t coded in Java in eons. The assignment (MapReduce, Pig and Spark) I worked on over the last 3 weeks was a good way to jolt me out of my comfort zone.
Java is something I need to brush up on before taking the Software Development Process module, which requires me to write an Android app. Argh!
Back to MapReduce. It’s a useful framework if you have to summarise huge datasets (gigabytes, terabytes). Here are the common steps in a MapReduce job:
- In the Mapper method, each unit of data (e.g. a row or a word) is tokenized and assigned a value
- In the Reducer method, the data is aggregated (e.g. sum, average)
- In the main method, the MapReduce job is configured and managed through a set of parameters
Note: MapReduce data types are different from Java data types! It can be confusing at times, but once you get used to it, it gets easier.
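For example, here is roughly how the usual Java types line up with Hadoop’s Writable wrappers (a quick illustrative sketch; WritableTypesDemo is just a made-up name, not part of the assignment):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WritableTypesDemo {
    public static void main(String[] args) {
        //Hadoop wraps Java primitives in Writable types so they can be
        //serialised and shipped between the map and reduce tasks
        IntWritable count = new IntWritable(1);       //wraps a Java int
        LongWritable offset = new LongWritable(42L);  //wraps a Java long
        Text word = new Text("hadoop");               //wraps a Java String
        int n = count.get();                          //back to a plain int
        long o = offset.get();                        //back to a plain long
        String s = word.toString();                   //back to a plain String
        System.out.println(s + " -> " + n + " (offset " + o + ")");
    }
}

With those types in mind, here is the rough skeleton of the job class (bodies deliberately left out):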
public class MapReduceJobs {

    //Mapper
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        <Deliberately left blank>
        ...Each row of data is tokenized and assigned a value
    }

    //Reducer
    public static class IntSumReducer
            extends Reducer<Text, Text, Text, Text> {

        <Deliberately left blank>
        ...Aggregated data is collated here
    }
    //end of IntSumReducer

    //Driver
    public static void main(String[] args) throws Exception {

        <Deliberately left blank>
        ...Jobs are managed here
    }
}
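To make the skeleton concrete, here is what the classic word-count version of the same pattern looks like. This is a sketch of the textbook Hadoop example rather than the assignment’s code, so the value type is IntWritable instead of the Text used above, and the class and path names are placeholders.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    //Mapper: split each input line into words and emit (word, 1)
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    //Reducer: sum up the counts emitted for each word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    //Driver: configure the job and submit it to the cluster
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   //optional local aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    //input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  //output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would package this into a jar and submit it with something like hadoop jar wordcount.jar WordCount /input/path /output/path, where the two paths are placeholders for HDFS directories.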