hadoop学习过程中,我们会遇到各种各样的问题,常见的有hadoop无法启动,集群不能正常工作,不停跳出报错信息等等,这里总结了常见的几个问题及除了方法,希望对大家有用。
hadoop无法正常启动(1)
执行 $ bin/hadoop start-all.sh之后,无法启动.
异常一
Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:214)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:135)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:119)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:481)
解决方法:此时是没有配置conf/mapred-site.xml的缘故. 在0.21.0版本上是配置mapred-site.xml,在之前的版本是配置core-site.xml,0.20.2版本中配置mapred-site.xml无效,只能配置core-site.xml文件
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
hadoop无法正常启动(2)
异常二、
starting namenode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-namenode-aist.out
localhost: starting datanode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-datanode-aist.out
localhost: starting secondarynamenode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-secondarynamenode-aist.out
localhost: Exception in thread "main" java.lang.NullPointerException
localhost: at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:134)
localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:156)
localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:160)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:131)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:115)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:469)
starting jobtracker, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-jobtracker-aist.out
localhost: starting tasktracker, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-tasktracker-aist.out
解决方法:此时是没有配置conf/mapred-site.xml的缘故. 在0.21.0版本上是配置mapred-site.xml,在之前的版本是配置core-site.xml , 0.20.2版本中配置mapred-site.xml无效,只能配置core-site.xml文件
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
hadoop无法正常启动(3)
异常三、
starting namenode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-namenode-aist.out
localhost: starting datanode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-datanode-aist.out
localhost: Error: JAVA_HOME is not set.
localhost: starting secondarynamenode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-secondarynamenode-aist.out
localhost: Error: JAVA_HOME is not set.
starting jobtracker, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-jobtracker-aist.out
localhost: starting tasktracker, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-tasktracker-aist.out
localhost: Error: JAVA_HOME is not set.
解决方法:
请在$hadoop/conf/hadoop-env.sh文件中配置JDK的环境变量
JAVA_HOME=/home/xixitie/jdk
CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME CLASSPATH
hadoop无法正常启动(4)
异常四:mapred-site.xml配置中使用hdfs://localhost:9001,而不使用localhost:9001的配置
异常信息如下:
11/04/20 23:33:25 INFO security.Groups: Group mapping impl=org.apache.hadoop.sec urity.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/04/20 23:33:25 WARN fs.FileSystem: "localhost:9000" is a deprecated filesyste m name. Use "hdfs://localhost:9000/" instead.
11/04/20 23:33:25 WARN conf.Configuration: mapred.task.id is deprecated. Instead , use mapreduce.task.attempt.id
11/04/20 23:33:25 WARN fs.FileSystem: "localhost:9000" is a deprecated filesyste m name. Use "hdfs://localhost:9000/" instead.
11/04/20 23:33:25 WARN fs.FileSystem: "localhost:9000" is a deprecated filesyste m name. Use "hdfs://localhost:9000/" instead.
解决方法:
mapred-site.xml配置中使用hdfs://localhost:9000,而不使用localhost:9000的配置
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9001</value>
</property>
hadoop无法正常启动(5)
异常五、no namenode to stop 问题的解决:
异常信息如下:
11/04/20 21:48:50 INFO ipc.Client: Retrying connect to server: localhost/127.0.0 .1:9000. Already tried 0 time(s).
11/04/20 21:48:51 INFO ipc.Client: Retrying connect to server: localhost/127.0.0 .1:9000. Already tried 1 time(s).
11/04/20 21:48:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0 .1:9000. Already tried 2 time(s).
11/04/20 21:48:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0 .1:9000. Already tried 3 time(s).
11/04/20 21:48:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0 .1:9000. Already tried 4 time(s).
11/04/20 21:48:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0 .1:9000. Already tried 5 time(s).
11/04/20 21:48:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0 .1:9000. Already tried 6 time(s).
11/04/20 21:48:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0 .1:9000. Already tried 7 time(s).
11/04/20 21:48:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0 .1:9000. Already tried 8 time(s).
解决方法:
这个问题是由namenode没有启动起来引起的,为什么no namenode to stop,可能之前的一些数据对namenode有影响,
你需要执行:
$ bin/hadoop namenode -format
然后
$bin/hadoop start-all.sh
hadoop无法正常启动(6)
异常五、no datanode to stop 问题的解决:
有时数据结构出现问题会产生无法启动datanode的问题。
然后用 hadoop namenode -format 重新格式化后仍然无效,/tmp中的文件并没有清楚。
其实还需要清除/tmp/hadoop*里的文件。
执行步骤:
一、先删除hadoop:///tmp
hadoop fs -rmr /tmp
二、停止 hadoop
stop-all.sh
三、删除/tmp/hadoop*
rm -rf /tmp/hadoop*
四、格式化hadoop
hadoop namenode -format
五、启动hadoop
start-all.sh
之后即可解决这个datanode没法启动的问题
++++++++++++++++++++++++++++++++++++++++
由于机器服务器维护需要,要求hadoop集群的一台服务器停止服务,于是我就到那台服务器去停止hadoop的datanode和tasktracker,运行以下命令:
bin/hadoop-daemon.sh stop datanode
竟然输出:
no datanode to stop
但是查看进程,却发现datanode和tasktracker都还在运行,尝试了好几次都是同样结果,最后我试图使用namenode的命令停止:
bin/stop_dfs.sh
还是输出:
no datanode to stop
不得已,只好使用暴力手段,直接kill -9 进程了。
在杀死hadoop进程之后,bin/hadoop-daemon.sh又可以正常使用了。不知道其他的hadoop使用者是否遇到过此问题??
但是问题不能就这么算了,在网上查了下资料,没找到满意的结果。没办法,自己看代码吧!
在看了hadoop-daemon.sh代码后,我发现脚本是通过pid文件来停止hadoop服务的,而我的集群配置是使用的默认配置,pid文件位于/tmp目录下,于是我对比了/tmp目录下hadoop pid文件中的进程id和ps ax查出来的进程id,发现两个进程id不一致,终于找到了问题的根源。
呵呵,赶紧去更新hadoop的配置吧!
修改hadoop-env.sh中的:HADOOP_PID_DIR = hadoop安装路径
然后根据集群hadoop进程的pid在hadoop安装路径下建立相应的pid文件:
hadoop-hadoop运行用户名-datanode.pid
hadoop-hadoop运行用户名-tasktracker.pid
hadoop-hadoop运行用户名-namenode.pid
hadoop-hadoop运行用户名-jobtracker.pid
hadoop-hadoop运行用户名-secondarynamenode.pid