雨翔河
Grouping search results with Lucene
2016-04-23 23:38
I have recently been using Lucene for queries and needed to group the results, only to find that the core package does not provide this feature. Regrouping the results in memory with hand-written code is too inefficient. After going through the documentation, I found lucene-grouping.

Maven dependency:

```
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-grouping</artifactId>
    <version>4.0.0</version>
</dependency>
```

With this package, search results can be grouped directly.

First, build the index:

```
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

private Document createDocument(String id, String tag_value, String score, String city, String time,
                                String from, String type, String typeId, String gender, String birthday) {
    Document document = new Document();
    // userId is the field the results are grouped on below
    document.add(new TextField("userId", id, Field.Store.YES));
    document.add(new StringField("tagName", tag_value, Field.Store.YES));
    document.add(new StringField("tagScoreValue", score, Field.Store.YES));
    document.add(new StringField("usersCurrentCity", city, Field.Store.YES));
    document.add(new StringField("tagRelatedTime", time, Field.Store.YES));
    document.add(new StringField("dataFrom", from, Field.Store.YES));
    document.add(new StringField("dataType", type, Field.Store.YES));
    document.add(new StringField("dataId", typeId, Field.Store.YES));
    document.add(new StringField("gender", gender, Field.Store.YES));
    document.add(new StringField("birthday", birthday, Field.Store.YES));
    return document;
}
```

This is my index structure, used only for testing.

Below is my test code:

```
import java.io.File;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

// LuceneTool, TagRadarSearchBuilder, TagRadarSortedCategory and indexDirectoryPath
// belong to my own project and are not shown here; StringUtils is a string utility class.
public void testCustomSort() throws Exception {
    List<Document> docs = Arrays.asList(
            createDocument("1", "java", "55", "sz", "2015", "www.yuorfei.com", "blog", "0", "1", "1993"),
            createDocument("1", "php", "56", "sz", "2015", "www.yuorfei.com", "blog", "0", "1", "1993"),
            createDocument("2", "docker", "57", "sz", "2015", "www.yuorfei.com", "blog", "0", "1", "1993"),
            createDocument("2", "php", "55", "sz", "2015", "www.yuorfei.com", "blog", "0", "1", "1993"),
            createDocument("2", "java", "56", "sz", "2015", "www.yuorfei.com", "blog", "0", "1", "1993"),
            createDocument("3", "java", "10", "sz", "2015", "www.yuorfei.com", "blog", "0", "1", "1993"),
            createDocument("4", "javascript", "99", "sz", "2015", "www.yuorfei.com", "blog", "0", "1", "1993"),
            createDocument("4", "java", "99999", "szbjbj44444", "20154444", "www.yuorfei.com", "blog", "0", "1", "1993"));
    LuceneTool.writeDocument(indexDirectoryPath, docs);

    String groupField = "userId";
    String tagName = "java,php";

    // Group on userId and keep at most 10 documents per group
    GroupingSearch groupingSearch = new GroupingSearch(groupField);
    groupingSearch.setFillSortFields(true);
    groupingSearch.setCachingInMB(4.0, true);
    groupingSearch.setAllGroups(true);
    groupingSearch.setGroupDocsLimit(10);

    // Build the query (tagName is java or php) and the sort (tag score, descending)
    TagRadarSearchBuilder tagRadarSearchBuilder = new TagRadarSearchBuilder();
    String[] tagNames = StringUtils.split(tagName, ",");
    tagRadarSearchBuilder.tagName(tagNames).sortedBy(TagRadarSortedCategory.SORTED_BY_TAG_SCORE_VALUE_DESC);

    FSDirectory dir = FSDirectory.open(new File(indexDirectoryPath));
    dir.setReadChunkSize(104857600); // 100 MB
    IndexReader reader = DirectoryReader.open(dir);
    IndexSearcher indexSearch = new IndexSearcher(reader);

    groupingSearch.setGroupSort(tagRadarSearchBuilder.getSort());       // order of the groups
    groupingSearch.setSortWithinGroup(tagRadarSearchBuilder.getSort()); // order inside each group
    // Return up to 1000 groups, starting from the first one
    TopGroups<BytesRef> result = groupingSearch.search(indexSearch, tagRadarSearchBuilder.getQuery(), 0, 1000);

    System.out.println("Total hits: " + result.totalHitCount);
    System.out.println("Number of groups: " + result.groups.length);
    System.out.println("\n-------------------------\n");
    for (GroupDocs<BytesRef> groupDocs : result.groups) {
        System.out.println("Group userId: " + groupDocs.groupValue.utf8ToString());
        System.out.println("Hits in group: " + groupDocs.totalHits);
        for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
            System.out.println(indexSearch.doc(scoreDoc.doc));
        }
        System.out.println("\n-------------------------\n");
    }
    reader.close(); // release the index files once the results have been printed
}
```

The core part is:

```
GroupingSearch groupingSearch = new GroupingSearch(groupField);
groupingSearch.setGroupSort(tagRadarSearchBuilder.getSort());
groupingSearch.setSortWithinGroup(tagRadarSearchBuilder.getSort());
TopGroups<BytesRef> result = groupingSearch.search(indexSearch, tagRadarSearchBuilder.getQuery(), 0, 1000);
```

Set the sort orders, then call `search` on the `GroupingSearch` instance to run the query.

`setGroupSort` sets the order between groups; `setSortWithinGroup` sets the order of the documents inside each group.

That is all it takes to group Lucene search results, and it is much more efficient than regrouping the hits by hand in memory.
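To make the difference between the two sort settings concrete, here is a minimal plain-Java sketch that mimics their semantics on the java/php hits from the test data. It does not use Lucene at all: `GroupSortSketch`, `Hit` and `group` are hypothetical names made up for this illustration, and comparing scores as integers is an assumption, since the real sort is built inside `TagRadarSearchBuilder`, which is not shown. Documents inside each group are ordered by the within-group comparator, and groups are ordered by comparing each group's best document under the group comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupSortSketch {

    /** One matching document, reduced to the fields that matter for grouping. */
    public static final class Hit {
        public final String userId;
        public final String tagName;
        public final int score; // assumption: scores are compared numerically

        public Hit(String userId, String tagName, int score) {
            this.userId = userId;
            this.tagName = tagName;
            this.score = score;
        }
    }

    /** The documents from the test data whose tagName is java or php. */
    public static List<Hit> sampleHits() {
        return Arrays.asList(
                new Hit("1", "java", 55), new Hit("1", "php", 56),
                new Hit("2", "php", 55), new Hit("2", "java", 56),
                new Hit("3", "java", 10), new Hit("4", "java", 99999));
    }

    /**
     * Groups hits by userId, sorts each group with withinGroup, then orders
     * the groups by their best hit according to betweenGroups.
     */
    public static List<List<Hit>> group(List<Hit> hits,
                                        Comparator<Hit> withinGroup,
                                        Comparator<Hit> betweenGroups) {
        // LinkedHashMap keeps groups in first-seen order, so ties stay stable
        Map<String, List<Hit>> byUser = new LinkedHashMap<>();
        for (Hit hit : hits) {
            byUser.computeIfAbsent(hit.userId, k -> new ArrayList<>()).add(hit);
        }
        List<List<Hit>> groups = new ArrayList<>(byUser.values());
        for (List<Hit> g : groups) {
            g.sort(withinGroup); // plays the role of setSortWithinGroup
        }
        // Plays the role of setGroupSort: compare each group's best hit
        groups.sort((a, b) -> betweenGroups.compare(
                Collections.min(a, betweenGroups),
                Collections.min(b, betweenGroups)));
        return groups;
    }

    public static void main(String[] args) {
        Comparator<Hit> byScoreDesc = Comparator.comparingInt((Hit h) -> h.score).reversed();
        for (List<Hit> g : group(sampleHits(), byScoreDesc, byScoreDesc)) {
            System.out.println("userId=" + g.get(0).userId + " hits=" + g.size());
            for (Hit h : g) {
                System.out.println("  " + h.tagName + " " + h.score);
            }
        }
    }
}
```

Passing two different comparators to `group` shows why the settings are separate: the group order can follow one criterion while the documents inside each group follow another.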
Category: Daily
Tags: lucene, group, java
Copyright © 雨翔河