Steven's Blog

A Dream Land of Peace!

SparkSQL: Getting Started and Going Further

Commonly used functions:

select, lit, as, groupBy, agg, sum, where, withColumn, col, when, otherwise, join, withColumnRenamed, isin, cast, $, union, gt, struct, sort, desc, show, orderBy, asc, repartition, sortWithinPartitions, filter, selectExpr, pivot, expr, row_number, over, partitionBy

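import org.apache.spark.sql.functions.{lit, sum}
import spark.implicits._  // enables the $"col" syntax; assumes a SparkSession named spark

// Count the rows for each id, then keep the ids whose total is at least cnt2.
df
  .select($"id", lit(1).as("cnt"))
  .groupBy("id")              // must match a column produced by the select above
  .agg(sum("cnt").as("total"))
  .where("total >= " + cnt2)  // cnt2: a numeric threshold defined elsewhere
  .select("id", "total")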
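
Several of the functions listed above (when, otherwise, withColumn, row_number, over, partitionBy) do not appear in the snippet. The following is a minimal sketch of how they can be combined, not a definitive recipe. The column names uid and score, the pass mark of 60, and the object name WindowSketch are all assumptions made up for illustration, and the local SparkSession is only there to make the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc, row_number, when}

object WindowSketch {
  // Adds a "level" label and a per-user rank by score.
  // Column names "uid" and "score" are hypothetical.
  def rankScores(df: DataFrame): DataFrame = {
    // One window per uid, highest score first.
    val byUser = Window.partitionBy("uid").orderBy(desc("score"))
    df
      .withColumn("level", when(col("score") >= 60, "pass").otherwise("fail"))
      .withColumn("rn", row_number().over(byUser))
  }

  // Keeps only the highest-scoring row of each user.
  def topPerUser(df: DataFrame): DataFrame =
    rankScores(df).where(col("rn") === 1).drop("rn")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("window-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("u1", 80), ("u1", 55), ("u2", 40)).toDF("uid", "score")
    topPerUser(df).orderBy("uid").show()

    spark.stop()
  }
}
```

Filtering on row_number after a partitioned window is a common way to take the "top N per group"; a groupBy with max would return the top score but not the other columns of that row.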

UDF:

spark.udf.register
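
As a rough illustration of spark.udf.register, the sketch below registers a plain Scala function under a name so that it can be called from SQL text, and also wraps the same function with udf(...) for the DataFrame API. The function maskUid, the registered name mask_uid, the view name users, and the sample data are all hypothetical and exist only for this example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object UdfSketch {
  // Plain Scala function to expose as a UDF (hypothetical logic):
  // keeps the first two characters and masks the rest with '*'.
  def maskUid(uid: String): String =
    if (uid == null || uid.length <= 2) uid
    else uid.take(2) + "*" * (uid.length - 2)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Register under a name so it is callable from SQL and selectExpr.
    spark.udf.register("mask_uid", maskUid _)

    val df = Seq("alice", "bo").toDF("uid")
    df.createOrReplaceTempView("users")
    spark.sql("SELECT uid, mask_uid(uid) AS masked FROM users").show()

    // The same function wrapped with udf(...) for the DataFrame API.
    val maskUdf = udf(maskUid _)
    df.withColumn("masked", maskUdf(col("uid"))).show()

    spark.stop()
  }
}
```

Spark cannot see inside a UDF when optimizing the query, so the built-in functions listed earlier are usually preferable whenever they can express the same logic.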

Reference links: