Saturday, March 31, 2018

Apache Solr: Personalized Rerank Query


What a re-rank query means in Solr:

Query Re-Ranking allows you to run a simple query (A) for matching documents and then re-rank the top N documents using the scores from a more complex query (B).
But sometimes you need deeper customization than a complex query (B) can offer, for example pulling some metrics from a database to calculate the new score. This post shows how to do that.

First, we need to be clear that the interface Solr provides does not allow this kind of customization. So we have to define another QParserPlugin. The original plugin is org.apache.solr.search.ReRankQParserPlugin, and we will use the code of this class as the starting point for our own.

The original plugin uses reRankQuery to calculate a new score for each document. We don't use a query to calculate the new score. Instead, we use user information queried from the db, combined with some of the document's metrics, to calculate the document's score per user. That means we need to access the db for each Personalized Rerank Query. We will remove reRankQuery because we don't need it. The original rescoring block looks like this:
    TopDocs rescoredDocs = new QueryRescorer(reRankQuery) {
        @Override
        protected float combine(float firstPassScore, boolean secondPassMatches,
                                float secondPassScore) {
            float score = firstPassScore;
            if (secondPassMatches) {
                score += reRankWeight * secondPassScore;
            }
            return score;
        }
    }.rescore(searcher, mainDocs, mainDocs.scoreDocs.length);


We can replace the block above with our custom code. There we can access the db to get the user's information and combine it with the document's information, loaded through IndexSearcher.doc(docId), to calculate the new score.
The new scores need to be written back to mainDocs.scoreDocs. If you also want the original values from the sort formula, set fillFields = true (the first boolean argument) in place of the false used in the original initialization:

    this.mainCollector = TopFieldCollector.create(sort, Math.max(this.reRankDocs, length),
            false, true, true, true);


Each ScoreDoc will then be an instance of FieldDoc, and you can read the original sort values from FieldDoc.fields.
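To make the replacement block more concrete, here is a minimal sketch in plain Java, written with stand-in types so it runs without Lucene on the classpath. Hit plays the role of ScoreDoc, and the two maps are hypothetical stand-ins for what you would load through IndexSearcher.doc(docId) and from the user store. The formula (first-pass score plus a weighted user affinity) is only an assumption for illustration, not the formula of any real plugin. The hits are re-sorted at the end, because the rescored list should again be in descending score order.

```java
import java.util.Arrays;
import java.util.Map;

class PersonalizedRescoreSketch {

    /** Stand-in for Lucene's ScoreDoc: a document id plus its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Hypothetical per-user formula: keep the first-pass score and add
     * weight * affinity, where affinity is the user's preference for the
     * document's category (0 when the category is unknown).
     */
    static float personalizedScore(float firstPassScore, String category,
                                   Map<String, Float> userAffinity, float weight) {
        Float affinity = (category == null) ? null : userAffinity.get(category);
        return firstPassScore + weight * (affinity == null ? 0f : affinity);
    }

    /**
     * Rescores every hit, then sorts by new score (descending),
     * breaking ties by ascending doc id.
     */
    static Hit[] rescore(Hit[] mainDocs, Map<Integer, String> docCategory,
                         Map<String, Float> userAffinity, float weight) {
        Hit[] rescored = new Hit[mainDocs.length];
        for (int i = 0; i < mainDocs.length; i++) {
            Hit hit = mainDocs[i];
            String category = docCategory.get(hit.doc);
            rescored[i] = new Hit(hit.doc,
                    personalizedScore(hit.score, category, userAffinity, weight));
        }
        Arrays.sort(rescored, (a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
        return rescored;
    }
}
```

In a real plugin the same loop would run over mainDocs.scoreDocs, and the category lookup would be replaced by whatever stored fields or cached metrics your formula needs.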

Note:
* Pay attention to the performance cost when you access an external system in a re-rank query. For example:
- If you need the user's information, you can store it in Redis and cache it with Guava (Guava is already bundled with Apache Solr).
- To get some document metrics, you can index them into the document, or just load them into heap memory.
* A re-rank query can't be used with the group function; you can use collapse and expand instead of group.
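To illustrate the caching advice in the note, here is a small sketch of an in-memory cache placed in front of a slow user lookup. The note suggests Guava. This sketch deliberately swaps in a plain LinkedHashMap in access order (a simple LRU) so it runs with no extra dependencies. The class name, the loader function, and the size limit are all hypothetical. In practice the loader would be your Redis or db call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class UserInfoCacheSketch<V> {

    private final Map<String, V> cache;
    private final Function<String, V> loader; // stand-in for the Redis/db lookup
    private int loads = 0;                    // counts how often the loader was hit

    UserInfoCacheSketch(final int maxEntries, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once the limit is exceeded.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, calling the loader only on a miss. */
    synchronized V get(String userId) {
        V value = cache.get(userId);
        if (value == null) {
            value = loader.apply(userId);
            loads++;
            cache.put(userId, value);
        }
        return value;
    }

    synchronized int loadCount() {
        return loads;
    }
}
```

With a cache like this, repeated re-rank queries for the same user only reach the external system once until the entry is evicted. Note that a loader returning null is not cached here and will be called again on the next request.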
