Fork me on GitHub

lucene索引时join和查询时join使用示例

lucene(18)—lucene索引时join和查询时join使用示例

了解sql的朋友都知道,我们在查询的时候可以采用join查询,即对有一定关联关系的对象进行联合查询来对多维的数据进行整理。这个联合查询的方式挺方便的,跟我们现实生活中的托人找关系类似,我们想要完成一件事,先找自己的熟人,然后通过熟人在一次找到其他,最终通过这种手段找到想要联系到的人。有点类似于”世间万物皆有联系“的感觉。

lucene的join包提供了索引时join和查询时join的功能;

Index-time join

大意是索引时join提供了查询时join的支持,且IndexWriter.addDocuments()方法调用时被join的documents以单个document块存储索引。索引时join对普通文本内容(如xml文档或数据库表)是方便可用的。特别是对类似于数据库的那种多表关联的情况,我们需要对提供关联关系的列提供join支持;

在索引时join的时候,索引中的documents被分割成parent documents(每个索引块的最后一个document)和child documents (除了parent documents外的所有documents). 由于lucene并不记录doc块的信息,我们需要提供一个Filter来标示parent documents。

在搜索结果的时候,我们利用ToParentBlockJoinQuery来从child query到parent document space来remap/join对应的结果。

如果我们只关注匹配查询条件的parent documents,我们可以用任意的collector来采集匹配到的parent documents;如果我们还想采集匹配parent document查询条件的child documents,我们就需要利用ToParentBlockJoinCollector来进行查询;一旦查询完成,我们可以利用ToParentBlockJoinCollector.getTopGroups()来获取匹配条件的TopGroups.

Query-time joins

查询时join是基于索引词,其实现有两步:

  • 第一步先从匹配fromQuery的fromField中采集所有的数据;
  • 从第一步得到的数据中筛选出所有符合条件的documents

查询时join接收一下输入参数:

  • fromField:fromField的名称,即要join的documents中的字段;
  • formQuery: 用户的查询条件
  • multipleValuesPerDocument: fromField在document是否是多个值
  • scoreMode:定义other join side中score是如何被使用的。如果不关注scoring,我们只需要设置成ScoreMode.None,此种方式会忽略评分因此会更高效和节约内存;
  • toField:toField的名称,即要join的toField的在对应的document中的字段

通常查询时join的实现类似于如下:

1
2
3
4
5
6
7
8
9
String fromField = "from"; // Name of the from field
boolean multipleValuesPerDocument = false; // Set only yo true in the case when your fromField has multiple values per document in your index
String toField = "to"; // Name of the to field
ScoreMode scoreMode = ScoreMode.Max // Defines how the scores are translated into the other side of the join.
Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values
Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher
// Render topDocs...

查询示例

这里我们模拟6组数据,示例代码如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
package com.lucene.index.test;
import static org.junit.Assert.assertEquals;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
import org.junit.Test;
public class TestJoin {
@Test
public void testSimple() throws Exception {
final String idField = "id";
final String toField = "productId";
Directory dir = FSDirectory.open(Paths.get("index"));
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig config = new IndexWriterConfig(analyzer);
config.setOpenMode(OpenMode.CREATE);
IndexWriter w = new IndexWriter(dir, config);
// 0
Document doc = new Document();
doc.add(new TextField("description", "random text", Field.Store.YES));
doc.add(new TextField("name", "name1", Field.Store.YES));
doc.add(new TextField(idField, "1", Field.Store.YES));
doc.add(new SortedDocValuesField(idField, new BytesRef("1")));
w.addDocument(doc);
// 1
Document doc1 = new Document();
doc1.add(new TextField("price", "10.0", Field.Store.YES));
doc1.add(new TextField(idField, "2", Field.Store.YES));
doc1.add(new SortedDocValuesField(idField, new BytesRef("2")));
doc1.add(new TextField(toField, "1", Field.Store.YES));
doc1.add(new SortedDocValuesField(toField, new BytesRef("1")));
w.addDocument(doc1);
// 2
Document doc2 = new Document();
doc2.add(new TextField("price", "20.0", Field.Store.YES));
doc2.add(new TextField(idField, "3", Field.Store.YES));
doc2.add(new SortedDocValuesField(idField, new BytesRef("3")));
doc2.add(new TextField(toField, "1", Field.Store.YES));
doc2.add(new SortedDocValuesField(toField, new BytesRef("1")));
w.addDocument(doc2);
// 3
Document doc3 = new Document();
doc3.add(new TextField("description", "more random text", Field.Store.YES));
doc3.add(new TextField("name", "name2", Field.Store.YES));
doc3.add(new TextField(idField, "4", Field.Store.YES));
doc3.add(new SortedDocValuesField(idField, new BytesRef("4")));
w.addDocument(doc3);
// 4
Document doc4 = new Document();
doc4.add(new TextField("price", "10.0", Field.Store.YES));
doc4.add(new TextField(idField, "5", Field.Store.YES));
doc4.add(new SortedDocValuesField(idField, new BytesRef("5")));
doc4.add(new TextField(toField, "4", Field.Store.YES));
doc4.add(new SortedDocValuesField(toField, new BytesRef("4")));
w.addDocument(doc4);
// 5
Document doc5 = new Document();
doc5.add(new TextField("price", "20.0", Field.Store.YES));
doc5.add(new TextField(idField, "6", Field.Store.YES));
doc5.add(new SortedDocValuesField(idField, new BytesRef("6")));
doc5.add(new TextField(toField, "4", Field.Store.YES));
doc5.add(new SortedDocValuesField(toField, new BytesRef("4")));
w.addDocument(doc5);
//6
Document doc6 = new Document();
doc6.add(new TextField(toField, "4", Field.Store.YES));
doc6.add(new SortedDocValuesField(toField, new BytesRef("4")));
w.addDocument(doc6);
w.commit();
w.close();
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher indexSearcher = new IndexSearcher(reader);
// Search for product
Query joinQuery = JoinUtil.createJoinQuery(idField, false, toField, new TermQuery(new Term("name", "name2")), indexSearcher, ScoreMode.None);
System.out.println(joinQuery);
TopDocs result = indexSearcher.search(joinQuery, 10);
System.out.println("查询到的匹配数据:"+result.totalHits);
joinQuery = JoinUtil.createJoinQuery(idField, false, toField, new TermQuery(new Term("name", "name1")), indexSearcher, ScoreMode.None);
result = indexSearcher.search(joinQuery, 10);
System.out.println("查询到的匹配数据:"+result.totalHits);
// Search for offer
joinQuery = JoinUtil.createJoinQuery(toField, false, idField, new TermQuery(new Term("id", "5")), indexSearcher, ScoreMode.None);
result = indexSearcher.search(joinQuery, 10);
System.out.println("查询到的匹配数据:"+result.totalHits);
indexSearcher.getIndexReader().close();
dir.close();
}
}

程序的运行结果如下:

1
2
3
查询到的匹配数据:3
查询到的匹配数据:2
查询到的匹配数据:1

以第一个查询为例:

我们在查询的时候先根据name=name2这个查询条件找到记录为doc3的document,由于查询的是toField匹配的,我们在根据doc3找到其toField的值为4,然后查询条件变为productId:4,找出除本条记录外的其他数据,结果正好为3,符合条件。

坚持原创技术分享,您的支持将鼓励我继续创作!