HBase Installation
1. Download and extract the JDK on CentOS
The key is adding --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" so that wget accepts the Oracle license:
wget --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" https://download.oracle.com/otn-pub/java/jdk/12.0.1+12/69cfe15208a647278a19ef0990eea691/jdk-12.0.1_linux-x64_bin.tar.gz
2. Set environment variables
(1) Extract the JDK
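A minimal extraction sketch, assuming the archive was downloaded to the current directory and /root/jdk is the install root (matching the JAVA_HOME set below):
mkdir -p /root/jdk
tar -xzf jdk-12.0.1_linux-x64_bin.tar.gz -C /root/jdk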
(2) vi /etc/profile
JAVA_HOME=/root/jdk/jdk-12.0.1
CLASSPATH=.:$JAVA_HOME/lib
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME CLASSPATH PATH
(3) source /etc/profile
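If the profile was applied correctly, the new JDK should now be on the PATH; a quick check:
java -version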
3. Download and extract ZooKeeper
wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
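An extraction sketch, assuming /root/zookeeper as the target directory (matching the config path in the next step):
mkdir -p /root/zookeeper
tar -xzf zookeeper-3.4.14.tar.gz -C /root/zookeeper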
4. Edit the ZooKeeper configuration file
The config file is /root/zookeeper/zookeeper-3.4.14/conf/zoo.cfg. The main changes are pointing dataDir at a dedicated directory and adding dataLogDir:
dataDir=/root/zookeeper/zkdata
dataLogDir=/root/zookeeper/zkdatalog
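For reference, a minimal standalone zoo.cfg might look like the sketch below; ZooKeeper ships only zoo_sample.cfg, so copy it to zoo.cfg first (tickTime, initLimit, syncLimit, and clientPort here are the sample defaults):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/root/zookeeper/zkdata
dataLogDir=/root/zookeeper/zkdatalog
clientPort=2181
Creating the two data directories up front avoids startup surprises: mkdir -p /root/zookeeper/zkdata /root/zookeeper/zkdatalog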
5. Start ZooKeeper
cd /root/zookeeper/zookeeper-3.4.14/bin
./zkServer.sh start
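To confirm ZooKeeper is up, the same script reports its mode (it should print "Mode: standalone" for a single node):
./zkServer.sh status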
6. Download and extract Hadoop
wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.6/hadoop-2.7.6.tar.gz
For the reasoning behind this particular Hadoop version, see https://hbase.apache.org/2.1/book.html#basic.prerequisites
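An extraction sketch, assuming /root/hadoop as the target (matching the config path in the next step):
mkdir -p /root/hadoop
tar -xzf hadoop-2.7.6.tar.gz -C /root/hadoop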
7. Edit the Hadoop configuration
The main change is setting JAVA_HOME in /root/hadoop/hadoop-2.7.6/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/root/jdk/jdk-12.0.1
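A quick sanity check that the hadoop command runs against this JDK (run from /root/hadoop):
hadoop-2.7.6/bin/hadoop version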
8. Test Hadoop with the MapReduce example
(1) mkdir input
(2) cp hadoop-2.7.6/etc/hadoop/*.xml input
(3) Run the example job:
hadoop-2.7.6/bin/hadoop jar hadoop-2.7.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar grep input output 'dfs[a-z.]+'
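This example runs in local (standalone) mode, so the results are written to a local output directory; to inspect them:
cat output/*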
9. Hadoop pseudo-distributed installation
HBase needs HDFS, so we install Hadoop in pseudo-distributed mode and bring up HDFS. In this mode a single node runs the HDFS NameNode and DataNode plus the YARN ResourceManager and NodeManager, each as a separate Java process.
Pseudo-distributed mode reference: https://www.cnblogs.com/ee900222/p/hadoop_1.html
3.2.1 Edit the configuration files
# vi etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
# vi etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
3.2.2 Set up passwordless SSH to localhost
# ssh-keygen -t rsa
# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
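Depending on the OS defaults, you may also need to tighten permissions on the key file, then confirm login works without a password prompt:
# chmod 0600 ~/.ssh/authorized_keys
# ssh localhost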
3.2.3 Run Hadoop jobs
MapReduce v2 is also known as YARN; below we run the example job both ways.
3.2.4 Run a MapReduce job
3.2.4.1 Format the filesystem
# hdfs namenode -format
3.2.4.2 Start the NameNode and DataNode daemons
# sbin/start-dfs.sh
This starts one NameNode and one DataNode on localhost, plus a SecondaryNameNode bound to 0.0.0.0 (a checkpointing helper, not a second NameNode).
3.2.4.3 Verify
# jps
3.2.4.4 Visit the NameNode web UI
http://localhost:50070/
3.2.4.5 Create HDFS directories
# hdfs dfs -mkdir /user
# hdfs dfs -mkdir /user/test
3.2.4.6 Copy the input files into HDFS
# hdfs dfs -put etc/hadoop /user/test/input
Verify:
# hadoop fs -ls /user/test/input
3.2.4.7 Run the Hadoop job
# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar grep /user/test/input output 'dfs[a-z.]+'
3.2.4.8 Check the results
# hdfs dfs -cat output/*
Or copy them from HDFS to the local filesystem and view:
# bin/hdfs dfs -get output output
# cat output/*
3.2.4.9 Stop the daemons
# sbin/stop-dfs.sh
3.2.5 Run a YARN job
The MapReduce v2 framework is called YARN.
3.2.5.1 Edit the configuration files
# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
# vi etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
# vi etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
3.2.5.2 Start the ResourceManager and NodeManager daemons
# sbin/start-yarn.sh
3.2.5.3 Verify
# jps
3.2.5.4 Visit the ResourceManager web UI
http://localhost:8088/
3.2.5.5 Run the Hadoop job
# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar grep /user/test/input output 'dfs[a-z.]+'
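If the output directory from the earlier MapReduce run still exists in HDFS, this job will fail with an "output directory already exists" error; remove it first and rerun:
# hdfs dfs -rm -r output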
3.2.5.6 Check the results
# hdfs dfs -cat output/*
The results are the same as for the MapReduce job.
3.2.5.7 Stop the daemons
# sbin/stop-yarn.sh
3.2.5.8 Issues encountered
1. In this single-node test, with identical input, the YARN job ran noticeably slower than the plain MapReduce job. The logs showed frequent GC on the DataNode, which is probably related to the low-spec test VM.
2. The warning below appears because the job history server has not been started:
java.io.IOException: java.net.ConnectException: Call From test166/10.86.255.166 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused;
Start the job history daemon:
# sbin/mr-jobhistory-daemon.sh start historyserver
Verify:
# jps
Visit the Job History Server web UI:
http://localhost:19888/
3. The warning below appeared along with errors in the DataNode log; restarting the service resolved it:
java.io.IOException: java.io.IOException: Unknown Job job_1451384977088_0005
3.3 Start/stop
The combined commands below can be used instead; they are equivalent to start/stop-dfs.sh plus start/stop-yarn.sh (in Hadoop 2.x they are deprecated in favor of the separate scripts):
# sbin/start-all.sh
# sbin/stop-all.sh
3.4 Logs
Daemon logs are written to the logs directory under the Hadoop installation path.
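For example, to inspect recent DataNode activity (log file names follow the hadoop-<user>-<daemon>-<hostname>.log convention, so the exact name varies by user and host):
# tail -n 100 logs/hadoop-root-datanode-*.log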
