Aggregations (aggs)
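Aggregations are generally used for statistical analysis of data, much like MySQL's GROUP BY.
Aggregations are built from two basic concepts: buckets and metrics.
A bucket groups the data in some way, and each group of documents forms one bucket. For example, grouping phones by brand yields a Xiaomi bucket and a Huawei bucket.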
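Bucket types
Date Histogram Aggregation: groups documents by date interval; for example, with an interval of one week, each week becomes one bucket
Histogram Aggregation: groups documents by fixed-size numeric interval, analogous to the date histogram
Terms Aggregation: groups documents by term; documents whose term matches exactly fall into the same bucket
Range Aggregation: groups numeric or date values into ranges; each range has a from bound (inclusive) and a to bound (exclusive)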
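As this list shows, ES offers flexible ways of bucketing. A plain MySQL GROUP BY on a column gives roughly what a Terms Aggregation does. To group by interval or by range in SQL, you have to group on a computed expression (for example FLOOR(price / 500) or a CASE expression), whereas ES supports interval and range bucketing directly.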
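Metrics
Metrics are similar to MySQL functions such as AVG and MAX. They compute values such as the average or the maximum within each bucket.
Commonly used metric aggregations:
Avg Aggregation: average
Max Aggregation: maximum
Min Aggregation: minimum
Percentiles Aggregation: percentiles, for example the price below which 95% of the values fall
Stats Aggregation: returns count, min, max, avg and sum in a single aggregation
Sum Aggregation: sum
Top hits Aggregation: the top matching documents within each bucket
Value Count Aggregation: the number of values of a field across the documents in the bucket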
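To make the Stats Aggregation entry concrete, here is a plain-Java illustration that uses no Elasticsearch client and shows the five values a stats aggregation reports for one bucket. It relies on the JDK's DoubleSummaryStatistics as a stand-in. The class name StatsSketch and the prices are hypothetical illustration values, not data from the goods index.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;

// Plain-Java illustration (no Elasticsearch client) of the values a stats
// aggregation reports for one bucket: count, min, max, avg and sum.
class StatsSketch {

    // Hypothetical prices used only for illustration.
    static final List<Double> PRICES = List.of(3500.0, 4500.0, 5500.0, 4500.0, 5500.0);

    // One pass over the values collects count, sum, min and max;
    // the average is derived as sum / count.
    static DoubleSummaryStatistics stats(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(PRICES);
        System.out.println("count=" + s.getCount() + " min=" + s.getMin() + " max=" + s.getMax()
                + " avg=" + s.getAverage() + " sum=" + s.getSum());
    }
}
```

In Elasticsearch, each of these values is computed per bucket when the stats aggregation is nested under a bucket aggregation.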
Terms buckets
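Let's start with the simplest case, the terms bucket. brand_aggs is a bucket name of our own choosing, terms selects the terms aggregation, and "field": "brand" assigns documents to buckets by the value of the brand field. This assumes brand is mapped as a keyword field, because terms aggregations on analyzed text fields are disabled by default. "size": 0 means no search hits are returned, only the aggregation results. Paging does not affect the aggregation result, because aggregations run over all documents matching the query rather than only over the returned page. As a result, a paged query and its aggregation results can be returned together in one request.
The following query groups goods by brand and counts them: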
GET /goods/_search
{
  "size" : 0,
  "aggs" : {
    "brand_aggs" : {
      "terms" : {
        "field" : "brand"
      }
    }
  }
}
Response:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brand_aggs" : {                      // name of the bucket aggregation
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [                       // one entry per bucket
        {
          "key" : "华为",                 // brand name, since we group by brand
          "doc_count" : 3                 // number of documents in this bucket
        },
        {
          "key" : "小米",
          "doc_count" : 2
        }
      ]
    }
  }
}
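Conceptually, the terms aggregation above is a group-by-and-count. The plain-Java illustration below reproduces that for a list of brand values. The brand strings Huawei and Xiaomi (standing in for 华为 and 小米) and the class name are hypothetical. Buckets are sorted by doc_count descending, which is Elasticsearch's default bucket order. Breaking ties by key is a choice made in this sketch to keep the output deterministic.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java illustration of a terms aggregation: one bucket per distinct value,
// with doc_count = number of documents that have that value.
class TermsSketch {

    // Returns value -> doc_count, ordered by doc_count descending;
    // ties are broken by key so the output order is deterministic.
    static Map<String, Long> terms(List<String> values) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Hypothetical brand values
        List<String> brands = List.of("Huawei", "Xiaomi", "Huawei", "Huawei", "Xiaomi");
        terms(brands).forEach((key, docCount) -> System.out.println(key + " doc_count=" + docCount));
    }
}
```

This sketch leaves out two behaviors. First, Elasticsearch returns only the top 10 buckets unless size is set inside the terms block. Second, on a multi-shard index it counts per shard and then merges the results, so counts can be approximate. The doc_count_error_upper_bound and sum_other_doc_count fields in the response report that error bound and the documents left out of the returned buckets.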
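Even without a metric, each bucket reports its document count (doc_count) by default. To compute something else, such as the average phone price per brand, you need to add a metric as a sub-aggregation.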
Average metric
GET /goods/_search
{
  "size" : 0,
  "aggs" : {
    "brand_aggs" : {
      "terms" : {
        "field" : "brand"
      },
      "aggs" : {
        "avg_price" : {
          "avg" : {
            "field" : "price"
          }
        }
      }
    }
  }
}
Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brand_aggs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "华为",
          "doc_count" : 3,
          "avg_price" : {
            "value" : 4500.0
          }
        },
        {
          "key" : "小米",
          "doc_count" : 2,
          "avg_price" : {
            "value" : 5000.0
          }
        }
      ]
    }
  }
}
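Java implementation (Spring Data Elasticsearch). The code uses the 4.x API in which SearchHits.getAggregations() returns an Elasticsearch Aggregations object. In newer releases the return type differs, so adapt the aggregation lookup accordingly.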
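// Imports assumed (Spring Data Elasticsearch 4.x on the Elasticsearch 7.x client):
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.ParsedAvg;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

public void testAggs() {
    // Terms bucket on brand, with an avg metric on price nested inside each bucket
    TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("brand_aggs").field("brand")
            .subAggregation(AggregationBuilders.avg("avg_price").field("price"));
    NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
            .withPageable(PageRequest.of(0, 1)) // PageRequest needs a page size of at least 1, so size 0 is not possible here
            .addAggregation(aggregationBuilder)
            .build();
    SearchHits<GoodsInfo> goodsInfos = elasticsearchRestTemplate.search(nativeSearchQuery, GoodsInfo.class);
    Terms brandTerms = goodsInfos.getAggregations().get("brand_aggs");
    brandTerms.getBuckets().forEach(bucket -> {
        System.out.println(bucket.getKeyAsString()); // brand name
        System.out.println(bucket.getDocCount());    // number of documents in the bucket
        ParsedAvg avgPrice = bucket.getAggregations().get("avg_price"); // average price in the bucket
        System.out.println(avgPrice.getValue());
    });
}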
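The nested avg_price sub-aggregation corresponds to a grouping collector with a downstream averaging collector. In the plain-Java illustration below, the Goods class and its prices are hypothetical. The prices were chosen only so that the averages come out as 4500.0 and 5000.0, as in the response above. They are not the documents actually stored in the goods index.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration of a terms bucket on brand with a nested avg metric on price.
class TermsAvgSketch {

    // Hypothetical document type holding only the two fields the query uses.
    static final class Goods {
        final String brand;
        final double price;

        Goods(String brand, double price) {
            this.brand = brand;
            this.price = price;
        }
    }

    // Hypothetical documents, chosen so the averages match the sample response.
    static final List<Goods> GOODS = List.of(
            new Goods("Huawei", 3500), new Goods("Huawei", 4500), new Goods("Huawei", 5500),
            new Goods("Xiaomi", 4500), new Goods("Xiaomi", 5500));

    // groupingBy plays the role of the terms bucket; averagingDouble plays the avg sub-aggregation.
    static Map<String, Double> avgPriceByBrand(List<Goods> goods) {
        return goods.stream().collect(Collectors.groupingBy(
                (Goods g) -> g.brand, TreeMap::new, Collectors.averagingDouble((Goods g) -> g.price)));
    }

    public static void main(String[] args) {
        avgPriceByBrand(GOODS).forEach((brand, avg) -> System.out.println(brand + " avg_price=" + avg));
    }
}
```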
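Histogram buckets
The following example counts phones per price band of 500. Each price goes into the bucket whose key is the price rounded down to a multiple of 500.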
GET /goods/_search
{
  "size" : 0,
  "aggs" : {
    "price_histogram" : {
      "histogram" : {
        "field" : "price",
        "interval" : 500
      }
    }
  }
}
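Response (buckets with no documents, such as 4000 and 5000, are still listed with doc_count 0, because min_doc_count defaults to 0 for histograms):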
{
  "took" : 103,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_histogram" : {
      "buckets" : [
        {
          "key" : 3500.0,
          "doc_count" : 1
        },
        {
          "key" : 4000.0,
          "doc_count" : 0
        },
        {
          "key" : 4500.0,
          "doc_count" : 2
        },
        {
          "key" : 5000.0,
          "doc_count" : 0
        },
        {
          "key" : 5500.0,
          "doc_count" : 2
        }
      ]
    }
  }
}
Code:
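// Imports assumed (Spring Data Elasticsearch 4.x on the Elasticsearch 7.x client):
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.HistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.histogram.ParsedHistogram;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

public void testHistogram() {
    // One bucket per 500 price units
    HistogramAggregationBuilder aggregationBuilder = AggregationBuilders.histogram("price_histogram").field("price").interval(500);
    NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
            .withPageable(PageRequest.of(0, 1)) // PageRequest needs a page size of at least 1
            .addAggregation(aggregationBuilder)
            .build();
    SearchHits<GoodsInfo> goodsInfos = elasticsearchRestTemplate.search(nativeSearchQuery, GoodsInfo.class);
    ParsedHistogram priceHistogram = goodsInfos.getAggregations().get("price_histogram");
    priceHistogram.getBuckets().forEach(bucket -> {
        System.out.println(bucket.getKey());      // bucket key: lower bound of the price band, e.g. 4500.0
        System.out.println(bucket.getDocCount()); // number of documents in the bucket
    });
}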
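The histogram bucket keys follow the rule key = floor((value - offset) / interval) * interval + offset, where offset defaults to 0. Below is a plain-Java illustration of that rule. It also adds the zero-count buckets that fill the gaps when min_doc_count is 0. The class name, method name and sample prices are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of histogram bucketing:
//   key = floor((value - offset) / interval) * interval + offset
// With min_doc_count 0 (the histogram default), empty buckets between the lowest
// and the highest key are returned with doc_count 0.
class HistogramSketch {

    static Map<Double, Long> histogram(List<Double> values, double interval, double offset) {
        TreeMap<Double, Long> buckets = new TreeMap<>();
        for (double value : values) {
            double key = Math.floor((value - offset) / interval) * interval + offset;
            buckets.merge(key, 1L, Long::sum); // count the document in its bucket
        }
        if (!buckets.isEmpty()) {
            // Fill the gaps between the first and the last non-empty bucket.
            double first = buckets.firstKey();
            long steps = Math.round((buckets.lastKey() - first) / interval);
            for (long i = 0; i <= steps; i++) {
                buckets.putIfAbsent(first + i * interval, 0L);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical prices
        List<Double> prices = List.of(3500.0, 4500.0, 5500.0, 4500.0, 5500.0);
        histogram(prices, 500, 0).forEach((key, docCount) -> System.out.println(key + " doc_count=" + docCount));
    }
}
```

With an interval of 500, a phone priced at 4999 lands in the 4500 bucket, because floor(4999 / 500) = 9.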
Range buckets (Range Aggregation)
Count the phones priced from 4000 up to, but not including, 6000 (from is inclusive, to is exclusive):
GET /goods/_search
{
  "size" : 0,
  "aggs" : {
    "price_range" : {
      "range" : {
        "field" : "price",
        "ranges" : [
          {
            "from" : 4000,
            "to" : 6000
          }
        ]
      }
    }
  }
}
Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_range" : {
      "buckets" : [
        {
          "key" : "4000.0-6000.0",
          "from" : 4000.0,
          "to" : 6000.0,
          "doc_count" : 4
        }
      ]
    }
  }
}
Code:
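// Imports assumed (Spring Data Elasticsearch 4.x on the Elasticsearch 7.x client):
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.range.ParsedRange;
import org.elasticsearch.search.aggregations.bucket.range.RangeAggregationBuilder;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

public void testRangeAggrs() {
    // A single range bucket: 4000 <= price < 6000
    RangeAggregationBuilder aggregationBuilder = AggregationBuilders.range("price_range").field("price").addRange(4000, 6000);
    NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
            .withPageable(PageRequest.of(0, 1)) // PageRequest needs a page size of at least 1
            .addAggregation(aggregationBuilder)
            .build();
    SearchHits<GoodsInfo> goodsInfos = elasticsearchRestTemplate.search(nativeSearchQuery, GoodsInfo.class);
    ParsedRange priceRange = goodsInfos.getAggregations().get("price_range");
    priceRange.getBuckets().forEach(bucket -> {
        System.out.println(bucket.getKey());      // bucket key, e.g. "4000.0-6000.0"
        System.out.println(bucket.getDocCount()); // number of documents in the range
    });
}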
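A range bucket counts values with from <= value < to: the lower bound is inclusive and the upper bound is exclusive. Here is a minimal plain-Java illustration of one such bucket, using hypothetical prices and a hypothetical class name:

```java
import java.util.List;

// Plain-Java illustration of one range bucket: from is inclusive, to is exclusive.
class RangeSketch {

    static long countInRange(List<Double> values, double from, double to) {
        return values.stream().filter(v -> v >= from && v < to).count();
    }

    public static void main(String[] args) {
        // Hypothetical prices
        List<Double> prices = List.of(3500.0, 4500.0, 5500.0, 4500.0, 5500.0);
        System.out.println("4000.0-6000.0 doc_count=" + countInRange(prices, 4000, 6000));
    }
}
```

A phone priced at exactly 6000 would therefore not be counted in the 4000.0-6000.0 bucket.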
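Date buckets (Date Histogram)
The next example uses a different index, cars, and counts the cars sold in each month based on the sold date field. An interval of 1M means one bucket per calendar month. Since Elasticsearch 7.2 the preferred parameter for this is calendar_interval, and the older interval parameter is deprecated. format controls how key_as_string is rendered, and time_zone makes the month boundaries fall at midnight UTC-01:00. min_doc_count: 1 omits months with no sales, which is why some months are missing from the result.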
GET /cars/_search
{
  "size" : 0,
  "aggs" : {
    "date" : {
      "date_histogram" : {
        "field" : "sold",
        "calendar_interval" : "1M",
        "format" : "yyyy-MM",
        "time_zone" : "-01:00",
        "min_doc_count" : 1
      }
    }
  }
}
Response (aggregations section only):
"aggregations" : {
  "date" : {
    "buckets" : [
      {
        "key_as_string" : "2013-12",
        "key" : 1385859600000,
        "doc_count" : 1
      },
      {
        "key_as_string" : "2014-02",
        "key" : 1391216400000,
        "doc_count" : 1
      },
      {
        "key_as_string" : "2014-05",
        "key" : 1398906000000,
        "doc_count" : 1
      },
      {
        "key_as_string" : "2014-07",
        "key" : 1404176400000,
        "doc_count" : 1
      },
      {
        "key_as_string" : "2014-08",
        "key" : 1406854800000,
        "doc_count" : 1
      },
      {
        "key_as_string" : "2014-10",
        "key" : 1412125200000,
        "doc_count" : 1
      },
      {
        "key_as_string" : "2014-11",
        "key" : 1414803600000,
        "doc_count" : 2
      }
    ]
  }
}
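Each key in the date histogram response is the start of a calendar month in the requested time zone, expressed as epoch milliseconds. For example, 1385859600000 is 2013-12-01T00:00 at UTC-01:00, which is 01:00 UTC. The plain-Java illustration below shows that mapping using java.time. The class name and sample timestamps are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;

// Plain-Java illustration of a monthly calendar bucket: the key is the start of
// the calendar month containing the timestamp, evaluated in the requested time zone.
class DateHistogramSketch {

    static final ZoneOffset ZONE = ZoneOffset.ofHours(-1); // corresponds to "time_zone": "-01:00"
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM");

    static ZonedDateTime monthStart(Instant sold) {
        return sold.atZone(ZONE)
                .with(TemporalAdjusters.firstDayOfMonth()) // first day of that month
                .truncatedTo(ChronoUnit.DAYS);             // midnight, local to ZONE
    }

    // Equivalent of "key": bucket start as epoch milliseconds
    static long key(Instant sold) {
        return monthStart(sold).toInstant().toEpochMilli();
    }

    // Equivalent of "key_as_string" with "format": "yyyy-MM"
    static String keyAsString(Instant sold) {
        return monthStart(sold).format(FORMAT);
    }

    public static void main(String[] args) {
        Instant sold = Instant.parse("2013-12-15T10:00:00Z"); // hypothetical sale time
        System.out.println(keyAsString(sold) + " " + key(sold));
    }
}
```

The time zone matters at month boundaries. For example, 2014-01-01T00:30Z is still December 2013 at UTC-01:00, so it is counted in the 2013-12 bucket.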