Simple Testing and Usage of Hadoop


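Following on from the previous article, Setting Up a Fully Distributed Hadoop Environment, this article walks through some simple tests and basic usage of Hadoop.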

Creating a directory

Create the directory /test/input on HDFS:

[hadoop@master ~]$ hadoop fs -mkdir -p /test/input

List the root directory to confirm that it was created:

[hadoop@master ~]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-12-12 17:58 /test

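Uploading a file to HDFS

Create a text file named words.txt, upload it to /test/input, and list the directory to confirm the upload: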

[hadoop@master ~]$ vim words.txt
[hadoop@master ~]$ hadoop fs -put words.txt /test/input

[hadoop@master ~]$ hadoop fs -ls /test/input
Found 1 items
-rw-r--r--   2 hadoop supergroup         35 2018-12-12 18:00 /test/input/words.txt
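The article creates words.txt interactively in vim and does not show its contents. For a scripted run the file can be written directly. The three lines below are an assumption: they were chosen only because they are consistent with the 35-byte size in the listing above and with the word counts shown later in the article, and the author's actual file may differ.

```shell
# Assumed sample content (not shown in the original article).
printf 'hello Ouer\nhello hadoop\nhello root\n' > words.txt

# Print the byte count to compare with the size column of `hadoop fs -ls`.
wc -c < words.txt
```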

Downloading a file from HDFS

Download the file that was just uploaded into the local ~/data directory:

[hadoop@master ~]$ hadoop fs -get /test/input/words.txt ~/data
[hadoop@master ~]$ ls data/
hadoopdata  words.txt
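Seeing the file name in ls shows that something was downloaded, but not that the bytes match. One way to check is to stream the HDFS copy and compare it with the local one. This is a sketch only; the helper name hdfs_matches_local is hypothetical.

```shell
# Hypothetical helper: succeed only if the HDFS file and the local file are byte-identical.
hdfs_matches_local() {
  hdfs_path="$1"
  local_path="$2"
  # `hadoop fs -cat` streams the HDFS file to stdout;
  # `cmp -s - FILE` silently compares stdin with FILE.
  hadoop fs -cat "$hdfs_path" | cmp -s - "$local_path"
}
```

For example: hdfs_matches_local /test/input/words.txt ~/data/words.txt && echo "download verified"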

Use the bundled wordcount demo to test whether the Hadoop cluster can run jobs properly:

Run the wordcount program, writing the results to the /test/output/ directory:

[hadoop@master ~]$ hadoop jar ~/apps/hadoop-2.9.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount /test/input /test/output
18/12/12 18:02:54 INFO client.RMProxy: Connecting to ResourceManager at slave3/172.20.2.110:8032
18/12/12 18:02:55 INFO input.FileInputFormat: Total input files to process : 1
18/12/12 18:02:56 INFO mapreduce.JobSubmitter: number of splits:1
18/12/12 18:02:56 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/12/12 18:02:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1544607847316_0001
18/12/12 18:02:57 INFO impl.YarnClientImpl: Submitted application application_1544607847316_0001
18/12/12 18:02:58 INFO mapreduce.Job: The url to track the job: http://slave3:8088/proxy/application_1544607847316_0001/
18/12/12 18:02:58 INFO mapreduce.Job: Running job: job_1544607847316_0001
18/12/12 18:03:09 INFO mapreduce.Job: Job job_1544607847316_0001 running in uber mode : false
18/12/12 18:03:09 INFO mapreduce.Job:  map 0% reduce 0%
18/12/12 18:03:17 INFO mapreduce.Job:  map 100% reduce 0%
18/12/12 18:03:24 INFO mapreduce.Job:  map 100% reduce 100%
18/12/12 18:03:25 INFO mapreduce.Job: Job job_1544607847316_0001 completed successfully
18/12/12 18:03:25 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=53
		FILE: Number of bytes written=395007
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=139
		HDFS: Number of bytes written=31
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=5738
		Total time spent by all reduces in occupied slots (ms)=4348
		Total time spent by all map tasks (ms)=5738
		Total time spent by all reduce tasks (ms)=4348
		Total vcore-milliseconds taken by all map tasks=5738
		Total vcore-milliseconds taken by all reduce tasks=4348
		Total megabyte-milliseconds taken by all map tasks=5875712
		Total megabyte-milliseconds taken by all reduce tasks=4452352
	Map-Reduce Framework
		Map input records=3
		Map output records=6
		Map output bytes=59
		Map output materialized bytes=53
		Input split bytes=104
		Combine input records=6
		Combine output records=4
		Reduce input groups=4
		Reduce shuffle bytes=53
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=217
		CPU time spent (ms)=1580
		Physical memory (bytes) snapshot=498122752
		Virtual memory (bytes) snapshot=4297453568
		Total committed heap usage (bytes)=292028416
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=35
	File Output Format Counters 
		Bytes Written=31
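Note that a MapReduce job refuses to start if its output directory already exists, so running the same command a second time fails until /test/output is removed. A minimal sketch of a rerun helper follows; the function name rerun_wordcount and the EXAMPLES_JAR variable are hypothetical, and the jar path is the one used in this article.

```shell
# Path to the examples jar as used in this article; adjust for other installs.
EXAMPLES_JAR="$HOME/apps/hadoop-2.9.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar"

# Hypothetical helper: clear the previous output directory, then run wordcount.
rerun_wordcount() {
  in_dir="$1"
  out_dir="$2"
  # -r removes recursively; -f suppresses the error if the directory is absent.
  hadoop fs -rm -r -f "$out_dir" &&
    hadoop jar "$EXAMPLES_JAR" wordcount "$in_dir" "$out_dir"
}
```

For example: rerun_wordcount /test/input /test/output. Since the helper deletes the output directory, it should only be pointed at paths whose contents are disposable.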

Check the job output:

[hadoop@master ~]$ hadoop fs -ls /test/output
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2018-12-12 18:03 /test/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup         31 2018-12-12 18:03 /test/output/part-r-00000

The result of the job is stored in /test/output/part-r-00000:

[hadoop@master ~]$ hadoop fs -cat /test/output/part-r-00000
Ouer	1
hadoop	1
hello	3
root	1
[hadoop@master ~]$
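On an input this small, the job result can be cross-checked locally with standard tools. The sketch below is not how Hadoop computes it; it is a plain shell pipeline that produces the same word<TAB>count layout. The helper name local_wordcount is hypothetical, and the sample input is an assumption, since the article does not show the contents of words.txt.

```shell
# Hypothetical helper: count whitespace-separated words from stdin and
# print "word<TAB>count", sorted in byte order (uppercase before lowercase).
local_wordcount() {
  tr -s '[:space:]' '\n' |
    LC_ALL=C sort |
    uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Assumed input; replace with `local_wordcount < words.txt` for the real file.
printf 'hello Ouer\nhello hadoop\nhello root\n' | local_wordcount
```

If the local output and hadoop fs -cat /test/output/part-r-00000 agree, the job processed the whole file.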

Use the bundled pi demo to test whether the Hadoop cluster can run jobs properly:

Run the pi program:

[hadoop@master ~]$ hadoop jar ~/apps/hadoop-2.9.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar pi 10 10

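This job estimates the value of π. Here pi is the name of the example program, the first 10 is the number of map tasks, and the second 10 is the number of sample points generated by each map. The estimate is derived from the fraction of points that fall inside a circle, so more points give a better estimate.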
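The idea behind the estimate can be illustrated in a single process. The sketch below is not Hadoop's code: it assumes a Halton sequence (bases 2 and 3) as the point generator, places the points in a unit square, and counts how many land inside the inscribed circle of radius 0.5. The function name pi_estimate is hypothetical.

```shell
# Hypothetical helper: estimate pi from N quasi-random points in the unit square.
pi_estimate() {
  awk -v n="$1" '
    # Radical inverse of index i in the given base (one Halton coordinate in [0,1)).
    function halton(i, base,    f, r) {
      f = 1; r = 0
      while (i > 0) {
        f = f / base
        r = r + f * (i % base)
        i = int(i / base)
      }
      return r
    }
    BEGIN {
      inside = 0
      for (i = 1; i <= n; i++) {
        # Shift the point so the circle is centred on the origin.
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if (x * x + y * y <= 0.25) inside++
      }
      # Area ratio circle/square = pi/4, so pi is about 4 * inside / n.
      printf "%.6f\n", 4 * inside / n
    }'
}

# 10 maps x 10 samples in the Hadoop command corresponds to 100 points in total.
pi_estimate 100
```

Calling pi_estimate with a larger argument, such as 100000, shows how the estimate tightens as the number of points grows, which is why small arguments like 10 10 are only suitable as a smoke test.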


Posted in: Linux

2018-12-12

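Reprint notice: unless otherwise stated, articles on this site are published under the CC-4.0 license. Please credit the source when reposting.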

