Welcome to aparke’s blog!
HIVE 中的复合数据类型
数组
有如下数据:战狼2,吴京:吴刚:龙母,2017-08-16
三生三世十里桃花,刘亦菲:痒痒,2017-08-20
普罗米修斯,苍老师:小泽老师:波多老师,2017-09-17
美女与野兽,吴刚:加藤鹰,2017-09-17
建表映射:create table t_movie(movie_name string,actors array<string>,first_show date)
row format delimited fields terminated by ','
collection items terminated by ':';
-- 导入数据
load data local inpath '/root/hive1/hivetest/movies.txt' into table t_movie;
查询select movie_name,actors[0],first_show from t_movie;
+-------------+------+-------------+--+
| movie_name | _c1 | first_show |
+-------------+------+-------------+--+
| 战狼2 | 吴京 | 2017-08-16 |
| 三生三世十里桃花 | 刘亦菲 | 2017-08-20 |
| 普罗米修斯 | 苍老师 | 2017-09-17 |
| 流浪地球 | 吴京 | 2019-01-01 |
| 飞驰人生 | 沈腾 | 2019-01-01 |
| 疯狂的外星人 | 黄渤 | 2019-01-01 |
+-------------+------+-------------+--+
查询演员数组中有吴京的select movie_name,actors,first_show
from t_movie where array_contains(actors,'吴刚');
+-------------+--------------------+-------------+--+
| movie_name | actors | first_show |
+-------------+--------------------+-------------+--+
| 战狼2 | ["吴京","吴刚","龙母"] | 2017-08-16 |
| 流浪地球 | ["吴京","刘启","韩朵朵"] | 2019-01-01 |
+-------------+--------------------+-------------+--+
查询每部电影由几个主演select movie_name
,size(actors) as actor_number
,first_show
from t_movie;
+-------------+------------+-------------+--+
| movie_name | actor_num | first_show |
+-------------+------------+-------------+--+
| 战狼2 | 3 | 2017-08-16 |
| 三生三世十里桃花 | 2 | 2017-08-20 |
| 普罗米修斯 | 3 | 2017-09-17 |
| 流浪地球 | 3 | 2019-01-01 |
| 飞驰人生 | 3 | 2019-01-01 |
| 疯狂的外星人 | 3 | 2019-01-01 |
+-------------+------------+-------------+--+
Map集合
有如下数据:1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28
2,lisi,father:mayun#mother:huangyi#brother:guanyu,22
3,wangwu,father:wangjianlin#mother:ruhua#sister:jingtian,29
4,mayun,father:mayongzhen#mother:angelababy,26
建表映射上述数据create table t_family(id int,name string,family_members map<string,string>,age int)
row format delimited fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';
-- 导入数据
load data local inpath '/root/hive1/hivetest/family.txt' into table t_family;
+--------------+----------------+----------------------------------------------------------------+---------------+--+ |
查出每个人的 爸爸、姐妹select id,name,family_members["father"] as father,family_members["sister"] as sister,age
from t_family;
+-----+-----------+--------------+-----------+------+--+
| id | name | father | sister | age |
+-----+-----------+--------------+-----------+------+--+
| 1 | zhangsan | xiaoming | NULL | 28 |
| 2 | lisi | mayun | NULL | 22 |
| 3 | wangwu | wangjianlin | jingtian | 29 |
| 4 | mayun | mayongzhen | NULL | 26 |
+-----+-----------+--------------+-----------+------+--+
查出每个人有哪些亲属关系 |
查出每个人的亲人名字select id,name,map_values(family_members) as relations,age
from t_family;
+-----+-----------+-------------------------------------+------+--+
| id | name | relations | age |
+-----+-----------+-------------------------------------+------+--+
| 1 | zhangsan | ["xiaoming","xiaohuang","xiaoxu"] | 28 |
| 2 | lisi | ["mayun","huangyi","guanyu"] | 22 |
| 3 | wangwu | ["wangjianlin","ruhua","jingtian"] | 29 |
| 4 | mayun | ["mayongzhen","angelababy"] | 26 |
+-----+-----------+-------------------------------------+------+--+
查出每个人的亲人数量select id,name,size(family_members) as relations,age
from t_family;
+-----+-----------+------------+------+--+
| id | name | relations | age |
+-----+-----------+------------+------+--+
| 1 | zhangsan | 3 | 28 |
| 2 | lisi | 3 | 22 |
| 3 | wangwu | 3 | 29 |
| 4 | mayun | 2 | 26 |
+-----+-----------+------------+------+--+
查出所有拥有兄弟的人及他的兄弟是谁
方案1:一句话写完
select id,name,age,family_members['brother']
from t_family where array_contains(map_keys(family_members),'brother');-- 将map_keys(family_members)作为数组查询是否有borther的 ,array_contains数组中是否包含brother方案2:子查询
-- 思路:子查询中先在家庭成员中查找出家庭关系查出来,返回一个数组,然后作为子查询,在外层在查出数组中包含brother的
select id,name,age,family_members['brother']
from
(select id,name,age,map_keys(family_members) as relations,family_members
from t_family) tmp
where array_contains(relations,'brother');
+-----+-----------+------+---------+--+
| id | name | age | _c3 |
+-----+-----------+------+---------+--+
| 1 | zhangsan | 28 | xiaoxu |
| 2 | lisi | 22 | guanyu |
+-----+-----------+------+---------+--+方案三
select id,name,age,family_members['brother']
from t_family
where family_members['brother'] is not null;
hive数据类型struct 相当于一个对象
假如有以下数据: |
建表映射上述数据drop table if exists t_user;
create table t_user(id int,name string,info struct<age:int,sex:string,addr:string>)
row format delimited fields terminated by ','
collection items terminated by ':';
-- 导入数据
load data local inpath '/root/hive1/hivetest/struct_user.txt' into table t_user;
查询每个人的id name和地址,跟java一样直接用对象名info去.使用select id,name,info.addr
from t_user;
+-----+-----------+-------+--+
| id | name | addr |
+-----+-----------+-------+--+
| 1 | zhangsan | 深圳 |
| 2 | lisi | 北京 |
| 3 | wangwu | 广州 |
| 4 | 赵六 | 上海 |
| 5 | 钱琪 | 杭州 |
| 6 | 王八 | 南京 |
+-----+-----------+-------+--+
6 rows selected (0.086 seconds)
条件控制函数:case when
0: jdbc:hive2://localhost:10000> select * from t_user; |
需求:查询出用户的id、name、年龄(如果年龄在30岁以下,显示年轻人,30-40之间,显示中年人,40以上老年人)select id,name,
case
when info.age<30 then '青年'
when info.age>=30 and info.age<40 then '中年'
else '老年'
end
from t_user;
条件控制函数:IF
0: jdbc:hive2://localhost:10000> select * from t_movie; |
需求: 查询电影信息,并且如果主演中有吴刚的,显示好电影,否则烂片select movie_name,actors,first_show,
if(array_contains(actors,'吴刚'),'好片儿','烂片儿')
from t_movie;
+-------------+------------------------+-------------+------+--+
| movie_name | actors | first_show | _c3 |
+-------------+------------------------+-------------+------+--+
| 战狼2 | ["吴京","吴刚","龙母"] | 2017-08-16 | 好片儿 |
| 三生三世十里桃花 | ["刘亦菲","痒痒"] | 2017-08-20 | 烂片儿 |
| 普罗米修斯 | ["苍老师","小泽老师","波多老师"] | 2017-09-17 | 烂片儿 |
| 流浪地球 | ["吴京","刘启","韩朵朵"] | 2019-01-01 | 烂片儿 |
| 飞驰人生 | ["沈腾","尹正","腾格尔"] | 2019-01-01 | 烂片儿 |
| 疯狂的外星人 | ["黄渤","沈腾","徐峥"] | 2019-01-01 | 烂片儿 |
+-------------+------------------------+-------------+------+--+