Fork me on GitHub

lucene搜索之自定义排序的实现原理和编写自己的自定义排序工具

lucene(13)—lucene搜索之自定义排序的实现原理和编写自己的自定义排序工具

自定义排序说明

我们在做lucene搜索的时候,可能会需要排序功能,虽然lucene内置了多种类型的排序,但是如果在需要先进行某些值的运算然后在排序的时候就有点显得无能为力了;

要做自定义查询,我们就要研究lucene已经实现的排序功能,lucene的所有排序都是要继承FieldComparator,然后重写内部实现,这里以IntComparator为例子来查看其实现;

IntComparator相关实现

其类的声明为 public static class IntComparator extends NumericComparator,这里说明IntComparator接收的是Integer类型的参数,即只处理IntField的排序;

IntComparator声明的参数为:

1
2
3
private final int[] values;
private int bottom; // Value of bottom of queue
private int topValue;

查看copy方法可知

  • values随着类初始化而初始化其长度
  • values用于存储NumericDocValues中读取到的内容

具体实现如下:

values的初始化

1
2
3
4
5
6
7
8
/**
* Creates a new comparator based on {@link Integer#compare} for {@code numHits}.
* When a document has no value for the field, {@code missingValue} is substituted.
*/
public IntComparator(int numHits, String field, Integer missingValue) {
super(field, missingValue);
values = new int[numHits];
}

values值填充(此为IntComparator的处理方式)

1
2
3
4
5
6
7
8
9
10
11
@Override
public void copy(int slot, int doc) {
int v2 = (int) currentReaderValues.get(doc);
// Test for v2 == 0 to save Bits.get method call for
// the common case (doc has value and value is non-zero):
if (docsWithField != null && v2 == 0 && !docsWithField.get(doc)) {
v2 = missingValue;
}
values[slot] = v2;
}

这些实现都是类似的,我们的应用实现自定义排序的时候需要做的是对binaryDocValues或NumericDocValues的值进行计算,然后实现FieldComparator内部方法,对应IntComparator就是如上的值copy操作;

然后我们需要实现compareTop、compareBottom和compare,IntComparator的实现为:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
@Override
public int compare(int slot1, int slot2) {
return Integer.compare(values[slot1], values[slot2]);
}
@Override
public int compareBottom(int doc) {
int v2 = (int) currentReaderValues.get(doc);
// Test for v2 == 0 to save Bits.get method call for
// the common case (doc has value and value is non-zero):
if (docsWithField != null && v2 == 0 && !docsWithField.get(doc)) {
v2 = missingValue;
}
return Integer.compare(bottom, v2);
}
1
2
3
4
5
6
7
8
9
10
    @Override
    public int compareTop(int doc) {
      int docValue = (int) currentReaderValues.get(doc);
      // Test for docValue == 0 to save Bits.get method call for
      // the common case (doc has value and value is non-zero):
      if (docsWithField != null && docValue == 0 && !docsWithField.get(doc)) {
        docValue = missingValue;
      }
      return Integer.compare(topValue, docValue);
    }

实现自己的FieldComparator

要实现FieldComparator,需要对接收参数进行处理,定义处理值的集合,同时定义BinaryDocValues和接收的参数等,这里我写了一个通用的比较器,代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
package com.lucene.search;
import java.io.IOException;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleFieldComparator;
import com.lucene.util.ObjectUtil;
/**自定义comparator
* @author lenovo
*
*/
public class SelfDefineComparator extends SimpleFieldComparator<String> {
private Object[] values;//定义的Object[],同IntComparator
private Object bottom;
private Object top;
private String field;
private BinaryDocValues binaryDocValues;//接收的BinaryDocValues,同IntComparator中的NumericDocValues
private ObjectUtil objectUtil;//这里为了便于拓展用接口代替抽象类
private Object[] params;//接收的参数
public SelfDefineComparator(String field, int numHits, Object[] params,ObjectUtil objectUtil) {
values = new Object[numHits];
this.objectUtil = objectUtil;
this.field = field;
this.params = params;
}
@Override
public void setBottom(int slot) {
this.bottom = values[slot];
}
@Override
public int compareBottom(int doc) throws IOException {
Object distance = getValues(doc);
return (bottom.toString()).compareTo(distance.toString());
}
@Override
public int compareTop(int doc) throws IOException {
Object distance = getValues(doc);
return objectUtil.compareTo(top,distance);
}
@Override
public void copy(int slot, int doc) throws IOException {
values[slot] = getValues(doc);
}
private Object getValues(int doc) {
Object instance = objectUtil.getValues(doc,params,binaryDocValues) ;
return instance;
}
@Override
protected void doSetNextReader(LeafReaderContext context)
throws IOException {
binaryDocValues = DocValues.getBinary(context.reader(), field);//context.reader().getBinaryDocValues(field);
}
@Override
public int compare(int slot1, int slot2) {
return objectUtil.compareTo(values[slot1],values[slot2]);
}
@Override
public void setTopValue(String value) {
this.top = value;
}
@Override
public String value(int slot) {
return values[slot].toString();
}
}

其中ObjectUtil是一个接口,定义了值处理的过程,最终是要服务于comparator的compare方法的,同时对comparator的内部compare方法进行了定义

ObjectUtil接口定义如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
package com.lucene.util;
import org.apache.lucene.index.BinaryDocValues;
public interface ObjectUtil {
/**自定义的获取处理值的方法
* @param doc
* @param params
* @param binaryDocValues
* @return
*/
public abstract Object getValues(int doc, Object[] params, BinaryDocValues binaryDocValues) ;
/**compare比较器实现
* @param object
* @param object2
* @return
*/
public abstract int compareTo(Object object, Object object2);
}

我们不仅要提供比较器和comparator,同时还要提供接收用户输入的FiledComparatorSource

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
package com.lucene.search;
import java.io.IOException;
import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.FieldComparatorSource;
import com.lucene.util.ObjectUtil;
/**comparator用于接收用户原始输入,继承自FieldComparatorSource实现了自定义comparator的构建
* @author lenovo
*
*/
public class SelfDefineComparatorSource extends FieldComparatorSource {
private Object[] params;//接收的参数
private ObjectUtil objectUtil;//这里为了便于拓展用接口代替抽象类
public Object[] getParams() {
return params;
}
public void setParams(Object[] params) {
this.params = params;
}
public ObjectUtil getObjectUtil() {
return objectUtil;
}
public void setObjectUtil(ObjectUtil objectUtil) {
this.objectUtil = objectUtil;
}
public SelfDefineComparatorSource(Object[] params, ObjectUtil objectUtil) {
super();
this.params = params;
this.objectUtil = objectUtil;
}
@Override
public FieldComparator<?> newComparator(String fieldname, int numHits,
int sortPos, boolean reversed) throws IOException {
//实际比较由SelfDefineComparator实现
return new SelfDefineComparator(fieldname, numHits, params, objectUtil);
}
}

相关测试程序,这里我们模拟一个StringComparator,对String值进行排序

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
package com.lucene.search;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import com.lucene.util.CustomerUtil;
import com.lucene.util.ObjectUtil;
import com.lucene.util.StringComparaUtil;
/**
*
* @author 吴莹桂
*
*/
public class SortTest {
public static void main(String[] args) throws Exception {
RAMDirectory directory = new RAMDirectory();
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
addDocument(indexWriter, "B");
addDocument(indexWriter, "D");
addDocument(indexWriter, "A");
addDocument(indexWriter, "E");
indexWriter.commit();
indexWriter.close();
IndexReader reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
Query query = new MatchAllDocsQuery();
ObjectUtil util = new StringComparaUtil();
Sort sort = new Sort(new SortField("name",new SelfDefineComparatorSource(new Object[]{},util),true));
TopDocs topDocs = searcher.search(query, Integer.MAX_VALUE, sort);
ScoreDoc[] docs = topDocs.scoreDocs;
for(ScoreDoc doc : docs){
Document document = searcher.doc(doc.doc);
System.out.println(document.get("name"));
}
}
private static void addDocument(IndexWriter writer,String name) throws Exception{
Document document = new Document();
document.add(new StringField("name",name,Field.Store.YES));
document.add(new BinaryDocValuesField("name", new BytesRef(name.getBytes())));
writer.addDocument(document);
}
}

其对应的ObjectUtil实现如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
package com.lucene.util;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.util.BytesRef;
public class StringComparaUtil implements ObjectUtil {
@Override
public Object getValues(int doc, Object[] params,
BinaryDocValues binaryDocValues) {
BytesRef bytesRef = binaryDocValues.get(doc);
String value = bytesRef.utf8ToString();
return value;
}
@Override
public int compareTo(Object object, Object object2) {
// TODO Auto-generated method stub
return object.toString().compareTo(object2.toString());
}
}
坚持原创技术分享,您的支持将鼓励我继续创作!