Welcome to aparke’s blog!
hive内置函数–窗口分析函数–row_number_over
row_number——over 分组TOP_N
row_number() over() 函数
需求:求出每种性别的最大的年龄两个
造数据:1,18,a,male
2,19,b,male
3,22,c,female
4,16,d,female
5,30,e,male
6,26,f,female
1.创建表create table t_rn(id int,age int,name string,sex string)
row format delimited fields terminated by ',';
2.导入造的数据load data local inpath '/root/hive1/rn_top.data' into table t_rn;
3.分组标记序号
重要 通过设置sex为分区 排序设置按age倒序select id,age,name,sex,
row_number() over(partition by sex order by age desc) as rn
from t_rn;
+-----+------+-------+---------+-----+--+
| id | age | name | sex | rn |
+-----+------+-------+---------+-----+--+
| 6 | 26 | f | female | 1 |
| 3 | 22 | c | female | 2 |
| 4 | 16 | d | female | 3 |
| 5 | 30 | e | male | 1 |
| 2 | 19 | b | male | 2 |
| 1 | 18 | a | male | 3 |
+-----+------+-------+---------+-----+--+
4.通过子查询 加条件 再查询 top2select *
from
(select id,age,name,sex,
row_number() over(partition by sex order by age desc) as rn
from t_rn) tmp
where rn<3
;
+---------+----------+-----------+----------+---------+--+
| tmp.id | tmp.age | tmp.name | tmp.sex | tmp.rn |
+---------+----------+-----------+----------+---------+--+
| 6 | 26 | f | female | 1 |
| 3 | 22 | c | female | 2 |
| 5 | 30 | e | male | 1 |
| 2 | 19 | b | male | 2 |
+---------+----------+-----------+----------+---------+--+