Apache Hadoop MapReduce
Hadoop MapReduce is a Yarn-based system for parallel processing of large data sets.
This integration installs and configures Telegraf and a custom Python script to send Hadoop MapReduce metrics into Wavefront. Telegraf is a light-weight server process capable of collecting, processing, aggregating, and sending metrics to a Wavefront proxy. The custom script uses the Hadoop HTTP REST API to gather metrics.
In addition to setting up the metrics flow, this integration also sets up a dashboard.
To see a list of the metrics for this integration, select the integration from https://github.com/influxdata/telegraf/tree/master/plugins/inputs.
Apache Hadoop MapReduce
Step 1. Install the Telegraf Agent
If you don’t have the Telegraf agent installed, follow the steps below. Otherwise, continue to step 2.
Log in to your Wavefront instance and follow the instructions in the Setup tab to install Telegraf and a Wavefront proxy in your environment. If a proxy is already running in your environment, you can select that proxy and the Telegraf install command connects with that proxy. Sign up for a free trial to check it out!
Step 2. Download the Script to Gather Hadoop MapReduce Metrics
- Download mapreduce.py onto your Telegraf agent server.
- Test the script execution using this command:
You should get a response similar to this:
usage: mapreduce.py [-h] [--username [USERNAME]] [--password [PASSWORD]] [server] mapreduce.py: error: server must be provided
If the script is not executing, adjust the file permission and the Python path.
Step 3. Configure Telegraf EXEC Input Plugin
First create a file called
/etc/telegraf/telegraf.d and enter the following snippet:
[[inputs.exec]] commands = [ "python <script location> <RM web UI address>:8088"] tag_keys = ["id","user","name","queue","applicationType","clusterId"] timeout = "5s" name_override = "hadoop.mapreduce" data_format = "json"
commands option, specify the location of the Python binary (if necessary), the location of the mapreduce.py script, and the address of the YARN ResourceManager web application (8088 is the default port).
commands = [ "/usr/bin/python /home/telegraf/hadoop/mapreduce.py http://yarn1.foo.com:8088"]
To monitor multiple Yarn servers, add entries within
commands = [ "/usr/bin/python /home/telegraf/hadoop/mapreduce.py http://yarn1.foo.com:8088", "/usr/bin/python /home/user/hadoop/mapreduce.py http://yarn2.foo.com:8088", "/usr/bin/python /root/hadoop/mapreduce.py http://yarn3.foo.com:8088" ]
Step 4. Restart Telegraf
sudo service telegraf restart to restart your Telegraf agent.