Installing Hive on Ubuntu

Environment: ubuntu-12.04-server-amd64

JDK 1.7

For the distributed Hadoop setup, see http://blog.ziki.cn/984.html

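If Hadoop is treated as the storage layer of a database, Hive can be understood as its SQL interpreter, or as a shell that executes an SQL-like language.

In Hive, table operations such as select/insert/create map to a series of text-file operations on Hadoop's HDFS. In effect, Hive maps SQL onto HDFS behavior.

You can also implement custom UDFs and UDAFs for more complex selects. Hive automatically translates select statements that use these functions into Hadoop MapReduce jobs, runs them, and returns the result.

The result can be printed directly or inserted into a table created by Hive. In the latter case the final result is persisted back to Hadoop's HDFS, where it can be read directly with the hadoop command line.

It can of course also be queried through Hive.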

1. Get hive-0.8.1.tar.gz

2. Extract it to /opt/hive-0.8.1

3. vi /etc/profile

export HIVE_HOME=/opt/hive-0.8.1

export PATH=$HIVE_HOME/bin:$PATH
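Settings added to /etc/profile only take effect in a new login shell or after running `source /etc/profile`. A minimal sketch of a sanity check, assuming the same install path as above (the variable `first_path_entry` is just a name chosen for this example):

```shell
# Same two settings as above; /opt/hive-0.8.1 is the install path from step 2.
export HIVE_HOME=/opt/hive-0.8.1
export PATH=$HIVE_HOME/bin:$PATH

# The first PATH entry should now be Hive's bin directory.
first_path_entry=${PATH%%:*}
echo "first PATH entry: $first_path_entry"

# Report whether the hive launcher is actually present there.
if [ -x "$HIVE_HOME/bin/hive" ]; then
  echo "hive launcher found"
else
  echo "hive launcher not found under $HIVE_HOME/bin"
fi
```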

4. vi /opt/hive-0.8.1/conf/hive-env.sh

HADOOP_HOME=/opt/hadoop-1.0.1

5. Prepare a MySQL database named hive
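The post does not show the commands for this step. One possible way to do it is sketched below. It assumes MySQL 5.x syntax (`GRANT ... IDENTIFIED BY` was removed in MySQL 8.0), the hiveuser/hiveuser credentials that step 6 puts into hive-site.xml, and a `'%'` host wildcard, which is an assumption and may be too permissive for your setup. The script only writes the SQL to a temporary file so it can be reviewed first.

```shell
# Write the metastore bootstrap SQL to a file so it can be reviewed before it is run.
sql_file=$(mktemp)
cat > "$sql_file" <<'EOF'
CREATE DATABASE IF NOT EXISTS hive;
-- 'hiveuser'@'%' is an assumption; restrict the host if you can.
GRANT ALL PRIVILEGES ON hive.* TO 'hiveuser'@'%' IDENTIFIED BY 'hiveuser';
FLUSH PRIVILEGES;
EOF
cat "$sql_file"

# To apply it (assumes a MySQL root account on the database host):
#   mysql -u root -p < "$sql_file"
```

Hive does not bundle the MySQL JDBC driver, so the mysql-connector-java jar also has to be on Hive's classpath, for example in $HIVE_HOME/lib.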

6. vi /opt/hive-0.8.1/conf/hive-site.xml

<configuration>
    <property>
      <name>hive.metastore.local</name>
      <value>true</value>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://192.168.122.104:3306/hive</value>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hiveuser</value>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hiveuser</value>
    </property>

    <property>
      <name>datanucleus.fixedDatastore</name>
      <value>false</value>
    </property>
</configuration>
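A typo in this file usually only shows up later as a metastore connection error, so it can help to read a value back before starting Hive. The sketch below is one way to do that with awk. `hive_conf_value` is a hypothetical helper written for this example, not a Hive tool, and it assumes each `<name>` and `<value>` sits on its own line as in the file above. To stay self-contained it runs against a throwaway copy of one property. Point it at /opt/hive-0.8.1/conf/hive-site.xml to check the real file.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a Hive config file.
# $1 = config file, $2 = property name
hive_conf_value() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }' "$1"
}

# Demo against a temporary file holding one property from the config above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://192.168.122.104:3306/hive</value>
    </property>
</configuration>
EOF

url=$(hive_conf_value "$conf" javax.jdo.option.ConnectionURL)
echo "$url"
rm -f "$conf"
```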

7. Start Hive

hive

The Hive shell feels much like the MySQL client shell; HiveQL statements can be run from its prompt.

Create a table

hive> create table access_log(url STRING, count_no STRING);

Load data

hive>LOAD DATA LOCAL INPATH '/opt/access_log/access.log' OVERWRITE INTO TABLE access_log;

As you can see, this statement created files in Hadoop's HDFS

ubuntu@server1:/$ /opt/hadoop-1.0.1/bin/hadoop dfs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x   - ubuntu supergroup          0 2012-05-07 19:03 /user/hive/warehouse/access_log
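The post does not show what /opt/access_log/access.log looks like. Since the table was created without a ROW FORMAT clause, Hive reads it with its default text layout: one row per line, fields separated by the Ctrl-A character (\001). A small sketch that builds such a file follows. The URLs and counts are made-up sample values, and the file goes to a temporary path rather than the one used above.

```shell
# Build a two-row sample in Hive's default text layout:
# fields separated by Ctrl-A (\001), one row per line.
sample=$(mktemp)
printf '%s\001%s\n' '/index.html' '12' >  "$sample"
printf '%s\001%s\n' '/about.html' '3'  >> "$sample"

# Show the rows with the invisible delimiter replaced by '|'.
tr '\001' '|' < "$sample"

# Count the rows in the file (one row per line).
row_count=$(wc -l < "$sample" | tr -d ' ')
echo "rows: $row_count"
```

If the log uses another separator such as a tab, the table would need a matching ROW FORMAT DELIMITED clause, otherwise the whole line ends up in the first column.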
 

hive> show tables;

access_log
OK
Time taken: 4.003 seconds
 

hive>select count(*) from access_log;

Running select count(*) from access_log launches a Hadoop MapReduce job. Hive hides this process from the user.

hive> select count(*)  from access_log;                                                   
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201205090916_0001, Tracking URL = http://server1:50030/jobdetails.jsp?jobid=job_201205090916_0001
Kill Command = /opt/hadoop-1.0.1/libexec/../bin/hadoop job  -Dmapred.job.tracker=hdfs://server1:9001 -kill job_201205090916_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-05-09 09:46:07,510 Stage-1 map = 0%,  reduce = 0%
2012-05-09 09:46:12,621 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2012-05-09 09:46:13,633 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2012-05-09 09:46:14,649 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2012-05-09 09:46:15,680 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2012-05-09 09:46:16,707 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2012-05-09 09:46:17,728 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2012-05-09 09:46:18,748 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2012-05-09 09:46:19,763 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2012-05-09 09:46:20,774 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2012-05-09 09:46:21,785 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 0.99 sec
2012-05-09 09:46:22,811 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 0.99 sec
2012-05-09 09:46:23,883 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 0.99 sec
2012-05-09 09:46:24,894 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.83 sec
2012-05-09 09:46:25,914 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.83 sec
2012-05-09 09:46:26,930 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.83 sec
2012-05-09 09:46:27,960 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.83 sec
2012-05-09 09:46:29,009 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.83 sec
2012-05-09 09:46:30,025 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.83 sec
2012-05-09 09:46:31,045 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.83 sec
MapReduce Total cumulative CPU time: 3 seconds 830 msec
Ended Job = job_201205090916_0001
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Accumulative CPU: 3.83 sec   HDFS Read: 4951 HDFS Write: 3 SUCESS
Total MapReduce CPU Time Spent: 3 seconds 830 msec
OK
21
Time taken: 35.107 seconds
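As a rough local analogy for the job in the log above (one map stage feeding one reducer), the count can be pictured as two pipeline stages. This is only an illustration with made-up input rows. It is not how Hive implements count(*), and Hive's real plan is more involved.

```shell
# Three hypothetical input rows standing in for the table's file.
rows=$(printf '%s\n' row1 row2 row3)

# "Map" stage: emit a 1 for every input row.
# "Reduce" stage: a single reducer adds the 1s together.
total=$(printf '%s\n' "$rows" | awk '{ print 1 }' | awk '{ sum += $1 } END { print sum }')
echo "$total"
```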

Original article. When reposting, please credit the source: 海波无痕
