Installing Hadoop 3.3.0 and Hive 3.1.2 on Ubuntu 20.04.1 LTS
The installation process draws on various online tutorials; this post consolidates the steps.
First, the running state on my machine.
Check the system environment
:~$ uname -srmo
Linux 5.4.0-48-generic x86_64 GNU/Linux
Check the Java environment
:~$ java -version
openjdk version "1.8.0_265"
OpenJDK Runtime Environment (build 1.8.0_265-8u265-b01-0ubuntu2~20.04-b01)
OpenJDK 64-Bit Server VM (build 25.265-b01, mixed mode)
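hadoop-env.sh will need JAVA_HOME later on; a quick way to locate the JDK install path is to resolve the java symlink chain. On this machine it resolves to a path under /usr/lib/jvm/java-8-openjdk-amd64, which is the JAVA_HOME value used further below:
:~$ readlink -f $(which java)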
Start Hadoop
:~$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as *** in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [******]
Starting resourcemanager
Starting nodemanagers
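To confirm the daemons actually came up, jps (bundled with the JDK) lists the running JVM processes; alongside Jps itself you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:
:~$ jps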
Start MySQL
:~$ service mysql start
Start Hive
:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = cdacc36f-3cf7-40de-833b-503f04b0db09
Logging initialized using configuration in file:/opt/apache-hive-bin/conf/hive-log4j2.properties Async: true
Hive Session ID = b56e7d35-a955-456a-8d21-6712e92891ad
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
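The duplicated SLF4J binding warning above is harmless. If you want to silence it, one common workaround seen in other tutorials is to set aside Hive's bundled binding so only Hadoop's remains:
:~$ sudo mv /opt/apache-hive-bin/lib/log4j-slf4j-impl-2.10.0.jar /opt/apache-hive-bin/lib/log4j-slf4j-impl-2.10.0.jar.bak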
Run a Hive query
hive> use hive;
OK
Time taken: 0.997 seconds
hive> select * from env;
OK
1 hadoop hadoop-3.3. 3.3.0 /opt/hadoop
2 hive apache-hive-3.1. 3.1.2 /opt/apache-hive-bin
3 mysql mysql-server 8.0.21 /usr/bin/mysql
Time taken: 1.983 seconds, Fetched: 3 row(s)
hive>
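For reference, a table like env could have been created with DDL along these lines (a hypothetical sketch — the actual schema and field delimiter are not shown in this post):
hive> create table if not exists env (
    id int, name string, package string, version string, path string
) row format delimited fields terminated by '\t';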
Check the files under the Hive table path in HDFS
:~$ hdfs dfs -ls -R /user/hive/warehouse
drwxr-xr-x  - *** supergroup          0 2020-10-08 11:37 /user/hive/warehouse/hive.db
drwxr-xr-x  - *** supergroup          0 2020-10-08 11:51 /user/hive/warehouse/hive.db/env
-rw-r--r--  1 *** supergroup        897 2020-10-08 11:51 /user/hive/warehouse/hive.db/env/000000_0
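Since a Hive table is ultimately just files in HDFS, the data can be read back directly:
:~$ hdfs dfs -cat /user/hive/warehouse/hive.db/env/000000_0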
Check the install paths of Hadoop, Hive, and MySQL
:~$ which hadoop
/opt/hadoop/bin/hadoop
:~$ which hive
/opt/apache-hive-bin/bin/hive
:~$ which mysql
/usr/bin/mysql
Preparation
Device & machine
Machine (VM): Ubuntu 20.04.1 LTS, with OpenJDK 1.8 already installed
Packages & files
Hadoop installation file: hadoop-3.2.1.tar.gz
Hive installation file: apache-hive-3.1.2-bin.tar.gz
mysql-connector-java file: mysql-connector-java-8.0.21.jar
Software setup
Install ssh-server
:~$ sudo apt install openssh-server
Check whether ssh was installed successfully
:~$ ssh localhost
If the installation succeeded, you will be prompted for a password; after entering it, the connection is established:
***@localhost's password:
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-48-generic x86_64)
* Documentation:  https://help.ubuntu.com
* Management:     https://landscape.canonical.com
* Support:        https://ubuntu.com/advantage
0 updates can be installed immediately.
0 of these updates are security updates.
Your Hardware Enablement Stack (HWE) is supported until April 2025.
Last login: Tue Oct  6 14:37:47 2020 from 127.0.0.1
Exit the ssh connection
:~$ logout
Configure passwordless ssh login (generate the authorized_keys file)
:~$ cd ./.ssh
:~/.ssh$ ls
id_rsa  id_rsa.pub  known_hosts
:~/.ssh$ cat ./id_rsa.pub >> ./authorized_keys
:~/.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
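If id_rsa and id_rsa.pub do not exist yet, generate the key pair first (a minimal sketch; the empty passphrase keeps the login non-interactive) and make sure authorized_keys is only writable by you:
:~/.ssh$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
:~/.ssh$ chmod 600 ./authorized_keys
After this, ssh localhost should connect without prompting for a password.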
After openssh-server is installed, Ubuntu creates the .ssh folder under the user's home directory automatically, along with the key pair.
Install Hadoop
Extract the Hadoop archive
:~$ sudo tar -zxvf ./hadoop-3.2.1.tar.gz -C /opt
:~$ cd /opt
:/opt$ sudo mv ./hadoop-3.2.1 ./hadoop
:/opt$ sudo chgrp -R root ./hadoop
:/opt$ sudo chown -R root ./hadoop
:/opt$ sudo chmod -R 755 ./hadoop
:/opt$ ls -al | grep 'hadoop'
drwxr-xr-x  9 root root  4096 Sep 11  2019 hadoop
Configure the Hadoop environment variables
Add the Hadoop environment variables
:/opt$ cd
:~$ vim ./.bashrc
In vim, add the HADOOP_HOME, HADOOP_INSTALL, HADOOP_MAPRED_HOME, HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, YARN_HOME, PATH, and HADOOP_CONF_DIR settings, e.g.:
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
Reload the Hadoop environment variable configuration
:~$ source ./.bashrc
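To confirm the variables took effect in the current shell, hadoop should now resolve on the PATH and report its release:
:~$ which hadoop
/opt/hadoop/bin/hadoop
:~$ hadoop version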
The key configuration properties differ between the modes.
Configure Hadoop standalone mode
:~$ cd /opt/hadoop/etc/hadoop
:/opt/hadoop/etc/hadoop$ sudo vim ./hadoop-env.sh
In vim, set the JAVA_HOME variable and the HADOOP_LOG_DIR variable (log output path) in hadoop-env.sh, e.g.:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_LOG_DIR=/home/***/tmp/logs
Standalone mode needs no further settings; it can be verified with the example job below.
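A quick smoke test for standalone mode, adapted from the official Hadoop quickstart (the examples jar version must match your release, and the output directory must not exist yet):
:~$ mkdir ~/input
:~$ cp /opt/hadoop/etc/hadoop/*.xml ~/input
:~$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep ~/input ~/output 'dfs[a-z.]+'
:~$ cat ~/output/*
This runs a local MapReduce job that greps the copied config files and writes the matches to ~/output.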
Configure Hadoop pseudo-distributed mode
Configure core-site.xml
:/opt/hadoop/etc/hadoop$ sudo vim ./core-site.xml
The configured content is as follows:
:/opt/hadoop/etc/hadoop$ tail -n 10 ./core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/***/tmp/data/hadoop</value>
  </property>
</configuration>
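The directory referenced by hadoop.tmp.dir should exist and be writable by your user; if it does not, create it first (~ here corresponds to /home/*** above):
:~$ mkdir -p ~/tmp/data/hadoop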
Configure hdfs-site.xml
:/opt/hadoop/etc/hadoop$ sudo vim ./hdfs-site.xml
The configured content is as follows:
:/opt/hadoop/etc/hadoop$ tail -n 14 ./hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/***/tmp/data/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/***/tmp/data/hadoop/dfs/data</value>
  </property>
</configuration>
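After the XML files are configured, the NameNode has to be formatted once before the first start. Note that this initializes dfs.namenode.name.dir and wipes any existing HDFS metadata, so only run it on a fresh setup:
:~$ hdfs namenode -format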
Configure mapred-site.xml
:/opt/hadoop/etc/hadoop$ sudo vim ./mapred-site.xml
The configured content is as follows:
:/opt/hadoop/etc/hadoop$ tail -n 6 ./mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
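On Hadoop 3.x, MapReduce jobs on YARN may also fail with class-not-found errors unless the MapReduce classpath is visible to the containers; the official single-node guide adds a property like this to mapred-site.xml:
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>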
Configure yarn-site.xml
:/opt/hadoop/etc/hadoop$ sudo vim ./yarn-site.xml
The configured content is as follows:
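For reference, the official single-node guide's minimal yarn-site.xml for pseudo-distributed mode looks like this (a common baseline; the actual values used in this setup may differ):
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>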