MapReduce Framework And Programming Model
MapReduce is a programming model for distributed computing. A Mapper is software that performs the assigned task after organizing the imported data blocks using a key. The key is specified in the Mapper's command line; the command maps the key to the data that an application uses.
A Reducer is software that reduces the mapped data using an aggregation, query, or user-specified function. The Reducer provides a concise, cohesive response to the application.
An aggregation function groups the values of multiple rows together to produce a single value of more significant meaning or measurement. Examples include count, sum, maximum, minimum, and standard deviation.
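To make this concrete, the common aggregation functions can be shown with plain Python built-ins; the sales figures below are invented purely for illustration, and nothing here is Hadoop-specific:

```python
import statistics

# Each aggregation collapses many row values into one summary value.
sales = [120, 340, 210, 95, 410]   # values drawn from many "rows"

total = sum(sales)                 # sum      -> 1175
count = len(sales)                 # count    -> 5
highest = max(sales)               # maximum  -> 410
lowest = min(sales)                # minimum  -> 95
spread = statistics.stdev(sales)   # sample standard deviation
```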
MapReduce allows writing applications that reliably process huge amounts of data, in parallel, on large clusters of servers. The cluster size as such does not limit parallel processing. MapReduce parallel programs are useful for performing large-scale data analysis using multiple machines in a cluster.
Features of the MapReduce framework are as follows -
- Provides automatic parallelization and distribution of computation across several processors.
- Processes data stored on distributed clusters of DataNodes and racks.
- Allows processing large amounts of data in parallel.
- Provides scalability for the use of a large number of servers.
- Provides the MapReduce batch-oriented programming model in Hadoop Version 1.
- Provides additional processing modes in the Hadoop 2 YARN-based system and enables the required parallel processing, for example, for queries, graph databases, streaming data, messages, real-time OLAP, and ad hoc analytics with Big Data 3V characteristics.
Hadoop MapReduce Framework
MapReduce provides two important functions. The first is distributing a job, based on the client application task or user query, to various nodes within a cluster. The second is organizing and reducing the results from each node into a cohesive response to the application, or the answer to the query.
Processing tasks are submitted to Hadoop. The Hadoop framework in turn manages the tasks of issuing jobs, tracking job completion, and copying data around the cluster between the DataNodes, with the help of the JobTracker.
MapReduce runs according to the job assigned by the JobTracker, which keeps track of jobs submitted for execution and runs a TaskTracker for tracking the tasks. MapReduce programming enables job scheduling and task execution.
A client node submits an application's request to the JobTracker. The steps to process a request in MapReduce are as follows:
- Estimate the resources needed to process the request.
- Analyze the states of the slave nodes.
- Place the map tasks in a queue.
- Monitor the progress of the tasks and, on failure, restart a task in the available time slots.
Job execution is controlled by two types of processes in MapReduce:
- The Mapper deploys map tasks on the slots. Map tasks are assigned to those nodes where the data for the application is stored. The Reducer output is transferred to the client node after data serialization using Avro.
- The Hadoop system sends the Map and Reduce jobs to the appropriate servers in the cluster. The Hadoop framework in turn manages the tasks of issuing jobs, tracking job completion, and copying data around the cluster between the slave nodes. Finally, the cluster collects and reduces the data to obtain the result and sends it back to the Hadoop server after completing the given tasks.
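The flow described above, map, then group the intermediate pairs by key (the shuffle), then reduce, can be sketched as a single-machine simulation in plain Python. This is only a sketch of the programming model; it stands in for, and does not use, the actual Hadoop runtime:

```python
from itertools import groupby
from operator import itemgetter

def run_mapreduce(records, mapper, reducer):
    """Single-machine simulation of map -> shuffle/sort -> reduce."""
    # Map phase: every record yields zero or more (key, value) pairs.
    intermediate = [pair for record in records for pair in mapper(record)]
    # Shuffle/sort phase: bring all values for the same key together.
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, [value for _, value in group])
            for key, group in groupby(intermediate, key=itemgetter(0))}

# Word count, the classic example: the mapper emits (word, 1) pairs
# and the reducer sums them.
counts = run_mapreduce(
    ["big data big cluster", "data nodes"],
    lambda line: [(word, 1) for word in line.split()],
    lambda key, values: sum(values),
)
# counts == {'big': 2, 'cluster': 1, 'data': 2, 'nodes': 1}
```

In the real framework, the map and reduce phases run on many slave nodes in parallel and the shuffle moves data across the network; the in-memory sort here only imitates that grouping step.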
MapReduce Programming Model
A MapReduce program can be written in any language, including Java, C++ (Pipes), or Python. The Map function of a MapReduce program computes on the input data and converts it into intermediate data sets. After the Mapper computations finish, the Reducer function collects the results of the map phase and generates the final output. A MapReduce program can be applied to any type of data stored in HDFS, i.e., structured or unstructured.
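For the Python route, Hadoop Streaming runs the mapper and reducer as separate scripts that exchange tab-separated key-value lines over standard input and output. The word-count pair below is a sketch that follows that convention; the job submission command and cluster configuration are omitted, and the local sort merely stands in for Hadoop's shuffle:

```python
from itertools import groupby

def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line per word (Streaming convention)."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word. Assumes input sorted by key,
    which Hadoop's shuffle/sort phase guarantees between map and reduce."""
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local demo: emulate Hadoop's shuffle with an in-memory sort.
mapped = sorted(map_lines(["hadoop map reduce", "map map"]))
result = list(reduce_lines(mapped))
# result == ['hadoop\t1', 'map\t3', 'reduce\t1']
```

Run as two scripts under Hadoop Streaming, each would instead read from `sys.stdin` and print to stdout, and the framework itself would sort the mapper's output by key before the reducer sees it.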