Welcome to aparke’s blog!
函数:explode 和 lateral view
行转列函数explode() –英语 炸裂的意思
可以将一个数组变成列
有如下数据1,zhangsan,化学:物理:数学:语文
2,lisi,化学:数学:生物:生理:卫生
3,wangwu,化学:语文:英语:体育:生物
创建表create table t_subject(id int,name string,subjects array<string>)
row format delimited fields terminated by ','
collection items terminated by ':';
load data local inpath '/root/hive1/fun/subject.txt' into table t_subject;
0: jdbc:hive2://master:10000> select subjects from t_subject;
+-----------------------------+--+
| subjects |
+-----------------------------+--+
| ["化学","物理","数学","语文"] |
| ["化学","数学","生物","生理","卫生"] |
| ["化学","语文","英语","体育","生物"] |
+-----------------------------+--+
3 rows selected (0.079 seconds)
求不重复的课表
第一步行转列 此时数据文件仍未表结构:
select explode(subjects) from t_subject;
+------+--+
| col |
+------+--+
| 化学 |
| 物理 |
| 数学 |
| 语文 |
| 化学 |
| 数学 |
| 生物 |
| 生理 |
| 卫生 |
| 化学 |
| 语文 |
| 英语 |
| 体育 |
| 生物 |
+------+--+第二步 直接distinct查询(查询不重复的数据)但是distinct针对的不能是一个表 只能是一个列所以采用子查询的方法
select distinct sub
from (select explode(subjects) as sub from t_subject) as tmp;
+------+--+
| sub |
+------+--+
| 体育 |
| 化学 |
| 卫生 |
| 数学 |
| 物理 |
| 生物 |
| 生理 |
| 英语 |
| 语文 |
+------+--+
表生成函数lateral view 补上之前explode出来缺少的别的列(属性)
select id,name,tmp.sub |
以上两个函数可以将结构化数据打散 知道改变成上一步的时候与原始表结构对比,以上两个函数结合使用更利于查询+---------------+-----------------+-----------------------------+--+
| t_subject.id | t_subject.name | t_subject.subjects |
+---------------+-----------------+-----------------------------+--+
| 1 | zhangsan | ["化学","物理","数学","语文"] |
| 2 | lisi | ["化学","数学","生物","生理","卫生"] |
| 3 | wangwu | ["化学","语文","英语","体育","生物"] |
+---------------+-----------------+-----------------------------+--+
HQL的wordcount
利用行专列函数可以实现wordcount
–有如下数据hello tom hello tim
hello rose hello tom
tom love rose rose love jim
jim love tom love is what
what is love
create table wordcount(sencen string); |
第一步:切分数据
select split(sencen," ") from wordcount;
+--------------------------------------------+--+
| _c0 |
+--------------------------------------------+--+
| ["hello","tom","hello","tim"] |
| ["hello","rose","hello","tom"] |
| ["tom","love","rose","rose","love","jim"] |
| ["jim","love","tom","love","is","what"] |
| ["what","is","love",""] |
+--------------------------------------------+--+第二步:行转列
select explode(split(sencen," ")) from wordcount;
+--------+--+
| col |
+--------+--+
| hello |
| tom |
| hello |
| tim |
| hello |
| rose |
| hello |
| tom |
| tom |
| love |
| rose |
| rose |
| love |
| jim |
| jim |
| love |
| tom |
| love |
| is |
| what |
| what |
| is |
| love |
| |
+--------+--+第三步:统计每个单词的次数
select word,count(1)
from
(select explode(split(sencen," ")) as word from wordcount) as tmp group by word;
+--------+------+--+
| word | _c1 |
+--------+------+--+
| | 1 |
| hello | 4 |
| is | 2 |
| jim | 2 |
| love | 5 |
| rose | 3 |
| tim | 1 |
| tom | 4 |
| what | 2 |
+--------+------+--+排序
desc倒序 不加为顺序select word,count(1) as ctns
from(
select explode(split(sencen," "))as word
from wordcount)tmp
group by word
order by ctns desc;
+--------+-------+--+
| word | ctns |
+--------+-------+--+
| love | 5 |
| tom | 4 |
| hello | 4 |
| rose | 3 |
| what | 2 |
| jim | 2 |
| is | 2 |
| tim | 1 |
| | 1 |
+--------+-------+--+