vim-patch:8.2.1726: fuzzy matching only works on strings

Problem:    Fuzzy matching only works on strings.
Solution:   Support passing a dict.  Add matchfuzzypos() to also get the match
            positions. (Yegappan Lakshmanan, closes vim/vim#6947)
4f73b8e9cc

Also remove some N/A and seemingly useless NULL checks -- Nvim allocs can't
return NULL. I'm not sure why the retmatchpos stuff in match_fuzzy checks for
NULL too, given that Vim checks for NULL alloc in do_fuzzymatch; assert that the
li stuff is not NULL as that's the one check I'm ever-so-slightly unsure about.

Adjust tests. Note that the text_cb tests actually throw E6000 in Nvim, but we
also can't assert that error due to v8.2.1183 not being ported yet.
Sean Dewar 2022-01-01 16:40:28 +00:00
parent fba00b5e7e
commit 960ea01972
8 changed files with 530 additions and 162 deletions


@ -296,7 +296,10 @@ matcharg({nr}) List arguments of |:match|
matchdelete({id} [, {win}]) Number delete match identified by {id} matchdelete({id} [, {win}]) Number delete match identified by {id}
matchend({expr}, {pat}[, {start}[, {count}]]) matchend({expr}, {pat}[, {start}[, {count}]])
Number position where {pat} ends in {expr} Number position where {pat} ends in {expr}
matchfuzzy({list}, {str}) List fuzzy match {str} in {list} matchfuzzy({list}, {str} [, {dict}])
List fuzzy match {str} in {list}
matchfuzzypos({list}, {str} [, {dict}])
List fuzzy match {str} in {list}
matchlist({expr}, {pat}[, {start}[, {count}]]) matchlist({expr}, {pat}[, {start}[, {count}]])
List match and submatches of {pat} in {expr} List match and submatches of {pat} in {expr}
matchstr({expr}, {pat}[, {start}[, {count}]]) matchstr({expr}, {pat}[, {start}[, {count}]])
@ -4858,12 +4861,25 @@ matchend({expr}, {pat} [, {start} [, {count}]]) *matchend()*
Can also be used as a |method|: > Can also be used as a |method|: >
GetText()->matchend('word') GetText()->matchend('word')
matchfuzzy({list}, {str}) *matchfuzzy()* matchfuzzy({list}, {str} [, {dict}]) *matchfuzzy()*
Returns a list with all the strings in {list} that fuzzy If {list} is a list of strings, then returns a list with all
match {str}. The strings in the returned list are sorted the strings in {list} that fuzzy match {str}. The strings in
based on the matching score. {str} is treated as a literal the returned list are sorted based on the matching score.
string and regular expression matching is NOT supported.
The maximum supported {str} length is 256. If {list} is a list of dictionaries, then the optional {dict}
argument supports the following items:
key key of the item which is fuzzy matched against
{str}. The value of this item should be a
string.
text_cb |Funcref| that will be called for every item
in {list} to get the text for fuzzy matching.
This should accept a dictionary item as the
argument and return the text for that item to
use for fuzzy matching.
{str} is treated as a literal string and regular expression
matching is NOT supported. The maximum supported {str} length
is 256.
If there are no matching strings or there is an error, then an If there are no matching strings or there is an error, then an
empty list is returned. If length of {str} is greater than empty list is returned. If length of {str} is greater than
@ -4874,11 +4890,38 @@ matchfuzzy({list}, {str}) *matchfuzzy()*
< results in ["clay"]. > < results in ["clay"]. >
:echo getbufinfo()->map({_, v -> v.name})->matchfuzzy("ndl") :echo getbufinfo()->map({_, v -> v.name})->matchfuzzy("ndl")
< results in a list of buffer names fuzzy matching "ndl". > < results in a list of buffer names fuzzy matching "ndl". >
:echo getbufinfo()->matchfuzzy("ndl", {'key' : 'name'})
< results in a list of buffer information dicts with buffer
names fuzzy matching "ndl". >
:echo getbufinfo()->matchfuzzy("spl",
\ {'text_cb' : {v -> v.name}})
< results in a list of buffer information dicts with buffer
names fuzzy matching "spl". >
:echo v:oldfiles->matchfuzzy("test") :echo v:oldfiles->matchfuzzy("test")
< results in a list of file names fuzzy matching "test". > < results in a list of file names fuzzy matching "test". >
:let l = readfile("buffer.c")->matchfuzzy("str") :let l = readfile("buffer.c")->matchfuzzy("str")
< results in a list of lines in "buffer.c" fuzzy matching "str". < results in a list of lines in "buffer.c" fuzzy matching "str".
matchfuzzypos({list}, {str} [, {dict}]) *matchfuzzypos()*
Same as |matchfuzzy()|, but returns the list of matched
strings and the list of character positions where characters
in {str} matches.
If {str} matches multiple times in a string, then only the
positions for the best match is returned.
If there are no matching strings or there is an error, then a
list with two empty list items is returned.
Example: >
:echo matchfuzzypos(['testing'], 'tsg')
< results in [['testing'], [[0, 2, 6]]] >
:echo matchfuzzypos(['clay', 'lacy'], 'la')
< results in [['lacy', 'clay'], [[0, 1], [1, 2]]] >
:echo [{'text': 'hello', 'id' : 10}]
\ ->matchfuzzypos('ll', {'key' : 'text'})
< results in [{'id': 10, 'text': 'hello'}] [[2, 3]]
matchlist({expr}, {pat} [, {start} [, {count}]]) *matchlist()* matchlist({expr}, {pat} [, {start} [, {count}]]) *matchlist()*
Same as |match()|, but return a |List|. The first item in the Same as |match()|, but return a |List|. The first item in the
list is the matched string, same as what matchstr() would list is the matched string, same as what matchstr() would
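
Review note: the help text above says matchfuzzypos() reports the character positions where {str} matches. A minimal sketch of what those zero-based positions mean follows, assuming ASCII input and a plain leftmost (greedy) subsequence match. The name `sketch_greedy_positions` is hypothetical and is not part of the patch. The patch itself uses a recursive search that keeps the best-scoring match, so the two can disagree (for example, the patch's tests expect [5, 6] for 'ab' in 'xabx_ab', where a greedy scan would report [1, 2]).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the patch: greedy, case-insensitive
 * subsequence match over ASCII text. Writes the zero-based index of each
 * matched character into out[] and returns how many were written, or 0 if
 * the pattern does not match (an empty pattern also yields 0). */
static size_t sketch_greedy_positions(const char *str, const char *pat,
                                      uint32_t *out, size_t out_cap)
{
  assert(str != NULL && pat != NULL && out != NULL);
  size_t n = 0;  /* number of pattern characters matched so far */
  for (uint32_t i = 0; str[i] != '\0' && pat[n] != '\0'; i++) {
    if (tolower((unsigned char)str[i]) == tolower((unsigned char)pat[n])) {
      if (n >= out_cap) {
        return 0;  /* supplied positions buffer is too short */
      }
      out[n++] = i;
    }
  }
  /* Only a fully consumed pattern counts as a match. */
  return pat[n] == '\0' ? n : 0;
}
```

For the documented examples this greedy scan happens to agree with the help text: 'tsg' in 'testing' gives 0, 2, 6, and 'la' gives 1, 2 for 'clay' and 0, 1 for 'lacy'.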


@ -609,6 +609,7 @@ String manipulation: *string-functions*
match() position where a pattern matches in a string match() position where a pattern matches in a string
matchend() position where a pattern match ends in a string matchend() position where a pattern match ends in a string
matchfuzzy() fuzzy matches a string in a list of strings matchfuzzy() fuzzy matches a string in a list of strings
matchfuzzypos() fuzzy matches a string in a list of strings
matchstr() match of a pattern in a string matchstr() match of a pattern in a string
matchstrpos() match and positions of a pattern in a string matchstrpos() match and positions of a pattern in a string
matchlist() like matchstr() and also return submatches matchlist() like matchstr() and also return submatches


@ -249,7 +249,8 @@ return {
matcharg={args=1, base=1}, matcharg={args=1, base=1},
matchdelete={args={1, 2}, base=1}, matchdelete={args={1, 2}, base=1},
matchend={args={2, 4}, base=1}, matchend={args={2, 4}, base=1},
matchfuzzy={args=2, base=1}, matchfuzzy={args={2, 3}, base=1},
matchfuzzypos={args={2, 3}, base=1},
matchlist={args={2, 4}, base=1}, matchlist={args={2, 4}, base=1},
matchstr={args={2, 4}, base=1}, matchstr={args={2, 4}, base=1},
matchstrpos={args={2,4}, base=1}, matchstrpos={args={2,4}, base=1},


@ -98,7 +98,6 @@ PRAGMA_DIAG_POP
#endif #endif
static char *e_listarg = N_("E686: Argument of %s must be a List");
static char *e_listblobarg = N_("E899: Argument of %s must be a List or Blob"); static char *e_listblobarg = N_("E899: Argument of %s must be a List or Blob");
static char *e_invalwindow = N_("E957: Invalid window number"); static char *e_invalwindow = N_("E957: Invalid window number");


@ -979,6 +979,7 @@ EXTERN char e_invalidreg[] INIT(= N_("E850: Invalid register name"));
EXTERN char e_dirnotf[] INIT(= N_("E919: Directory not found in '%s': \"%s\"")); EXTERN char e_dirnotf[] INIT(= N_("E919: Directory not found in '%s': \"%s\""));
EXTERN char e_au_recursive[] INIT(= N_("E952: Autocommand caused recursive behavior")); EXTERN char e_au_recursive[] INIT(= N_("E952: Autocommand caused recursive behavior"));
EXTERN char e_autocmd_close[] INIT(= N_("E813: Cannot close autocmd window")); EXTERN char e_autocmd_close[] INIT(= N_("E813: Cannot close autocmd window"));
EXTERN char e_listarg[] INIT(= N_("E686: Argument of %s must be a List"));
EXTERN char e_unsupportedoption[] INIT(= N_("E519: Option not supported")); EXTERN char e_unsupportedoption[] INIT(= N_("E519: Option not supported"));
EXTERN char e_fnametoolong[] INIT(= N_("E856: Filename too long")); EXTERN char e_fnametoolong[] INIT(= N_("E856: Filename too long"));
EXTERN char e_float_as_string[] INIT(= N_("E806: using Float as a String")); EXTERN char e_float_as_string[] INIT(= N_("E806: using Float as a String"));


@ -48,6 +48,8 @@
#include "nvim/vim.h" #include "nvim/vim.h"
#include "nvim/window.h" #include "nvim/window.h"
typedef uint32_t matchidx_T;
#ifdef INCLUDE_GENERATED_DECLARATIONS #ifdef INCLUDE_GENERATED_DECLARATIONS
# include "search.c.generated.h" # include "search.c.generated.h"
#endif #endif
@ -4811,24 +4813,109 @@ the_end:
/// There is not an explicit bonus for an exact match. Unmatched letters receive /// There is not an explicit bonus for an exact match. Unmatched letters receive
/// a penalty. So shorter strings and closer matches are worth more. /// a penalty. So shorter strings and closer matches are worth more.
typedef struct { typedef struct {
const listitem_T *item; listitem_T *item;
int score; int score;
list_T *lmatchpos;
} fuzzyItem_T; } fuzzyItem_T;
static bool fuzzy_match_recursive(const char_u *fuzpat, const char_u *str, int *const outScore, /// bonus for adjacent matches
const char_u *const strBegin, const char_u *const srcMatches, #define SEQUENTIAL_BONUS 15
char_u *const matches, const int maxMatches, int nextMatch, /// bonus if match occurs after a separator
int *const recursionCount, const int recursionLimit) #define SEPARATOR_BONUS 30
FUNC_ATTR_NONNULL_ARG(1, 2, 3, 4, 6, 9) FUNC_ATTR_WARN_UNUSED_RESULT /// bonus if match is uppercase and prev is lower
#define CAMEL_BONUS 30
/// bonus if the first letter is matched
#define FIRST_LETTER_BONUS 15
/// penalty applied for every letter in str before the first match
#define LEADING_LETTER_PENALTY -5
/// maximum penalty for leading letters
#define MAX_LEADING_LETTER_PENALTY -15
/// penalty for every letter that doesn't match
#define UNMATCHED_LETTER_PENALTY -1
/// Score for a string that doesn't fuzzy match the pattern
#define SCORE_NONE -9999
#define FUZZY_MATCH_RECURSION_LIMIT 10
/// Maximum number of characters that can be fuzzy matched
#define MAXMATCHES 256
/// Compute a score for a fuzzy matched string. The matching character locations
/// are in 'matches'.
static int fuzzy_match_compute_score(const char_u *const str, const int strSz,
const matchidx_T *const matches, const int numMatches)
FUNC_ATTR_NONNULL_ALL FUNC_ATTR_WARN_UNUSED_RESULT FUNC_ATTR_PURE
{
// Initialize score
int score = 100;
// Apply leading letter penalty
int penalty = LEADING_LETTER_PENALTY * matches[0];
if (penalty < MAX_LEADING_LETTER_PENALTY) {
penalty = MAX_LEADING_LETTER_PENALTY;
}
score += penalty;
// Apply unmatched penalty
const int unmatched = strSz - numMatches;
score += UNMATCHED_LETTER_PENALTY * unmatched;
// Apply ordering bonuses
for (int i = 0; i < numMatches; i++) {
const matchidx_T currIdx = matches[i];
if (i > 0) {
const matchidx_T prevIdx = matches[i - 1];
// Sequential
if (currIdx == prevIdx + 1) {
score += SEQUENTIAL_BONUS;
}
}
// Check for bonuses based on neighbor character value
if (currIdx > 0) {
// Camel case
const char_u *p = str;
int neighbor;
for (matchidx_T sidx = 0; sidx < currIdx; sidx++) {
neighbor = utf_ptr2char(p);
mb_ptr2char_adv(&p);
}
const int curr = utf_ptr2char(p);
if (mb_islower(neighbor) && mb_isupper(curr)) {
score += CAMEL_BONUS;
}
// Separator
const bool neighborSeparator = neighbor == '_' || neighbor == ' ';
if (neighborSeparator) {
score += SEPARATOR_BONUS;
}
} else {
// First letter
score += FIRST_LETTER_BONUS;
}
}
return score;
}
static bool fuzzy_match_recursive(const char_u *fuzpat, const char_u *str, matchidx_T strIdx,
int *const outScore, const char_u *const strBegin,
const int strLen, const matchidx_T *const srcMatches,
matchidx_T *const matches, const int maxMatches, int nextMatch,
int *const recursionCount)
FUNC_ATTR_NONNULL_ARG(1, 2, 4, 5, 8, 11) FUNC_ATTR_WARN_UNUSED_RESULT
{ {
// Recursion params // Recursion params
bool recursiveMatch = false; bool recursiveMatch = false;
char_u bestRecursiveMatches[256]; matchidx_T bestRecursiveMatches[MAXMATCHES];
int bestRecursiveScore = 0; int bestRecursiveScore = 0;
// Count recursions // Count recursions
(*recursionCount)++; (*recursionCount)++;
if (*recursionCount >= recursionLimit) { if (*recursionCount >= FUZZY_MATCH_RECURSION_LIMIT) {
return false; return false;
} }
@ -4839,119 +4926,59 @@ static bool fuzzy_match_recursive(const char_u *fuzpat, const char_u *str, int *
// Loop through fuzpat and str looking for a match // Loop through fuzpat and str looking for a match
bool first_match = true; bool first_match = true;
while (*fuzpat != '\0' && *str != '\0') { while (*fuzpat != NUL && *str != NUL) {
const int c1 = utf_ptr2char(fuzpat);
const int c2 = utf_ptr2char(str);
// Found match // Found match
if (mb_tolower(*fuzpat) == mb_tolower(*str)) { if (mb_tolower(c1) == mb_tolower(c2)) {
// Supplied matches buffer was too short // Supplied matches buffer was too short
if (nextMatch >= maxMatches) { if (nextMatch >= maxMatches) {
return false; return false;
} }
// "Copy-on-Write" srcMatches into matches // "Copy-on-Write" srcMatches into matches
if (first_match && srcMatches) { if (first_match && srcMatches != NULL) {
memcpy(matches, srcMatches, nextMatch); memcpy(matches, srcMatches, nextMatch * sizeof(srcMatches[0]));
first_match = false; first_match = false;
} }
// Recursive call that "skips" this match // Recursive call that "skips" this match
char_u recursiveMatches[256]; matchidx_T recursiveMatches[MAXMATCHES];
int recursiveScore = 0; int recursiveScore = 0;
if (fuzzy_match_recursive(fuzpat, str + 1, &recursiveScore, strBegin, matches, const char_u *const next_char = str + utfc_ptr2len(str);
recursiveMatches, sizeof(recursiveMatches), nextMatch, if (fuzzy_match_recursive(fuzpat, next_char, strIdx + 1, &recursiveScore, strBegin, strLen,
recursionCount, recursionLimit)) { matches, recursiveMatches,
sizeof(recursiveMatches) / sizeof(recursiveMatches[0]), nextMatch,
recursionCount)) {
// Pick best recursive score // Pick best recursive score
if (!recursiveMatch || recursiveScore > bestRecursiveScore) { if (!recursiveMatch || recursiveScore > bestRecursiveScore) {
memcpy(bestRecursiveMatches, recursiveMatches, 256); memcpy(bestRecursiveMatches, recursiveMatches, MAXMATCHES * sizeof(recursiveMatches[0]));
bestRecursiveScore = recursiveScore; bestRecursiveScore = recursiveScore;
} }
recursiveMatch = true; recursiveMatch = true;
} }
// Advance // Advance
matches[nextMatch++] = (char_u)(str - strBegin); matches[nextMatch++] = strIdx;
fuzpat++; mb_ptr2char_adv(&fuzpat);
} }
str++; mb_ptr2char_adv(&str);
strIdx++;
} }
// Determine if full fuzpat was matched // Determine if full fuzpat was matched
const bool matched = *fuzpat == '\0'; const bool matched = *fuzpat == NUL;
// Calculate score // Calculate score
if (matched) { if (matched) {
// bonus for adjacent matches *outScore = fuzzy_match_compute_score(strBegin, strLen, matches, nextMatch);
const int sequential_bonus = 15;
// bonus if match occurs after a separator
const int separator_bonus = 30;
// bonus if match is uppercase and prev is lower
const int camel_bonus = 30;
// bonus if the first letter is matched
const int first_letter_bonus = 15;
// penalty applied for every letter in str before the first match
const int leading_letter_penalty = -5;
// maximum penalty for leading letters
const int max_leading_letter_penalty = -15;
// penalty for every letter that doesn't matter
const int unmatched_letter_penalty = -1;
// Iterate str to end
while (*str != '\0') {
str++;
}
// Initialize score
*outScore = 100;
// Apply leading letter penalty
int penalty = leading_letter_penalty * matches[0];
if (penalty < max_leading_letter_penalty) {
penalty = max_leading_letter_penalty;
}
*outScore += penalty;
// Apply unmatched penalty
const int unmatched = (int)(str - strBegin) - nextMatch;
*outScore += unmatched_letter_penalty * unmatched;
// Apply ordering bonuses
for (int i = 0; i < nextMatch; i++) {
const char_u currIdx = matches[i];
if (i > 0) {
const char_u prevIdx = matches[i - 1];
// Sequential
if (currIdx == (prevIdx + 1)) {
*outScore += sequential_bonus;
}
}
// Check for bonuses based on neighbor character value
if (currIdx > 0) {
// Camel case
const char_u neighbor = strBegin[currIdx - 1];
const char_u curr = strBegin[currIdx];
if (islower(neighbor) && isupper(curr)) {
*outScore += camel_bonus;
}
// Separator
const bool neighborSeparator = neighbor == '_' || neighbor == ' ';
if (neighborSeparator) {
*outScore += separator_bonus;
}
} else {
// First letter
*outScore += first_letter_bonus;
}
}
} }
// Return best result // Return best result
if (recursiveMatch && (!matched || bestRecursiveScore > *outScore)) { if (recursiveMatch && (!matched || bestRecursiveScore > *outScore)) {
// Recursive score is better than "this" // Recursive score is better than "this"
memcpy(matches, bestRecursiveMatches, maxMatches); memcpy(matches, bestRecursiveMatches, maxMatches * sizeof(matches[0]));
*outScore = bestRecursiveScore; *outScore = bestRecursiveScore;
return true; return true;
} else if (matched) { } else if (matched) {
@ -4969,21 +4996,22 @@ static bool fuzzy_match_recursive(const char_u *fuzpat, const char_u *str, int *
/// normalized and varies with pattern. /// normalized and varies with pattern.
/// Recursion is limited internally (default=10) to prevent degenerate cases /// Recursion is limited internally (default=10) to prevent degenerate cases
/// (fuzpat="aaaaaa" str="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"). /// (fuzpat="aaaaaa" str="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa").
/// Uses char_u for match indices. Therefore patterns are limited to 256 /// Uses char_u for match indices. Therefore patterns are limited to MAXMATCHES
/// characters. /// characters.
/// ///
/// Returns true if fuzpat is found AND calculates a score. /// @return true if 'fuzpat' matches 'str'. Also returns the match score in
static bool fuzzy_match(const char_u *const str, const char_u *const fuzpat, int *const outScore) /// 'outScore' and the matching character positions in 'matches'.
static bool fuzzy_match(char_u *const str, const char_u *const fuzpat, int *const outScore,
matchidx_T *const matches, const int maxMatches)
FUNC_ATTR_NONNULL_ALL FUNC_ATTR_WARN_UNUSED_RESULT FUNC_ATTR_NONNULL_ALL FUNC_ATTR_WARN_UNUSED_RESULT
{ {
char_u matches[256];
int recursionCount = 0; int recursionCount = 0;
int recursionLimit = 10; const int len = mb_charlen(str);
*outScore = 0; *outScore = 0;
return fuzzy_match_recursive(fuzpat, str, outScore, str, NULL, matches, sizeof(matches), 0, return fuzzy_match_recursive(fuzpat, str, 0, outScore, str, len, NULL, matches, maxMatches, 0,
&recursionCount, recursionLimit); &recursionCount);
} }
/// Sort the fuzzy matches in the descending order of the match score. /// Sort the fuzzy matches in the descending order of the match score.
@ -4996,70 +5024,188 @@ static int fuzzy_item_compare(const void *const s1, const void *const s2)
return v1 == v2 ? 0 : v1 > v2 ? -1 : 1; return v1 == v2 ? 0 : v1 > v2 ? -1 : 1;
} }
/// Fuzzy search the string 'str' in 'strlist' and return the matching strings /// Fuzzy search the string 'str' in a list of 'items' and return the matching
/// in 'fmatchlist'. /// strings in 'fmatchlist'.
static void match_fuzzy(const list_T *const strlist, const char_u *const str, /// If 'items' is a list of strings, then search for 'str' in the list.
list_T *const fmatchlist) /// If 'items' is a list of dicts, then either use 'key' to lookup the string
FUNC_ATTR_NONNULL_ARG(2, 3) /// for each item or use 'item_cb' Funcref function to get the string.
/// If 'retmatchpos' is true, then return a list of positions where 'str'
/// matches for each item.
static void match_fuzzy(list_T *const items, char_u *const str, const char_u *const key,
Callback *const item_cb, const bool retmatchpos, list_T *const fmatchlist)
FUNC_ATTR_NONNULL_ARG(2, 4, 6)
{ {
const long len = tv_list_len(strlist); const long len = tv_list_len(items);
if (len == 0) { if (len == 0) {
return; return;
} }
fuzzyItem_T *const ptrs = xmalloc(sizeof(fuzzyItem_T) * len); fuzzyItem_T *const ptrs = xcalloc(len, sizeof(fuzzyItem_T));
long i = 0; long i = 0;
bool found_match = false; bool found_match = false;
matchidx_T matches[MAXMATCHES];
// For all the string items in strlist, get the fuzzy matching score // For all the string items in items, get the fuzzy matching score
TV_LIST_ITER_CONST(strlist, li, { TV_LIST_ITER(items, li, {
ptrs[i].item = li; ptrs[i].item = li;
ptrs[i].score = -9999; ptrs[i].score = SCORE_NONE;
// ignore non-string items in the list char_u *itemstr = NULL;
typval_T rettv;
rettv.v_type = VAR_UNKNOWN;
const typval_T *const tv = TV_LIST_ITEM_TV(li); const typval_T *const tv = TV_LIST_ITEM_TV(li);
if (tv->v_type == VAR_STRING && tv->vval.v_string != NULL) { if (tv->v_type == VAR_STRING) { // list of strings
int score; itemstr = tv->vval.v_string;
if (fuzzy_match(tv->vval.v_string, str, &score)) { } else if (tv->v_type == VAR_DICT && (key != NULL || item_cb->type != kCallbackNone)) {
ptrs[i].score = score; // For a dict, either use the specified key to lookup the string or
found_match = true; // use the specified callback function to get the string.
if (key != NULL) {
itemstr = (char_u *)tv_dict_get_string(tv->vval.v_dict, (const char *)key, false);
} else {
typval_T argv[2];
// Invoke the supplied callback (if any) to get the dict item
tv->vval.v_dict->dv_refcount++;
argv[0].v_type = VAR_DICT;
argv[0].vval.v_dict = tv->vval.v_dict;
argv[1].v_type = VAR_UNKNOWN;
if (callback_call(item_cb, 1, argv, &rettv)) {
if (rettv.v_type == VAR_STRING) {
itemstr = rettv.vval.v_string;
}
}
tv_dict_unref(tv->vval.v_dict);
} }
} }
int score;
if (itemstr != NULL
&& fuzzy_match(itemstr, str, &score, matches, sizeof(matches) / sizeof(matches[0]))) {
// Copy the list of matching positions in itemstr to a list, if
// 'retmatchpos' is set.
if (retmatchpos) {
const int strsz = mb_charlen(str);
ptrs[i].lmatchpos = tv_list_alloc(strsz);
for (int j = 0; j < strsz; j++) {
tv_list_append_number(ptrs[i].lmatchpos, matches[j]);
}
}
ptrs[i].score = score;
found_match = true;
}
i++; i++;
tv_clear(&rettv);
}); });
if (found_match) { if (found_match) {
// Sort the list by the descending order of the match score // Sort the list by the descending order of the match score
qsort(ptrs, (size_t)len, sizeof(fuzzyItem_T), fuzzy_item_compare); qsort(ptrs, len, sizeof(fuzzyItem_T), fuzzy_item_compare);
// Copy the matching strings with 'score != -9999' to the return list // For matchfuzzy(), return a list of matched strings.
// ['str1', 'str2', 'str3']
// For matchfuzzypos(), return a list with two items.
// The first item is a list of matched strings. The second item
// is a list of lists where each list item is a list of matched
// character positions.
// [['str1', 'str2', 'str3'], [[1, 3], [1, 3], [1, 3]]]
list_T *l;
if (retmatchpos) {
const listitem_T *const li = tv_list_find(fmatchlist, 0);
assert(li != NULL && TV_LIST_ITEM_TV(li)->vval.v_list != NULL);
l = TV_LIST_ITEM_TV(li)->vval.v_list;
} else {
l = fmatchlist;
}
// Copy the matching strings with a valid score to the return list
for (i = 0; i < len; i++) { for (i = 0; i < len; i++) {
if (ptrs[i].score == -9999) { if (ptrs[i].score == SCORE_NONE) {
break; break;
} }
const typval_T *const tv = TV_LIST_ITEM_TV(ptrs[i].item); tv_list_append_tv(l, TV_LIST_ITEM_TV(ptrs[i].item));
tv_list_append_string(fmatchlist, (const char *)tv->vval.v_string, -1); }
// next copy the list of matching positions
if (retmatchpos) {
const listitem_T *const li = tv_list_find(fmatchlist, -1);
assert(li != NULL && TV_LIST_ITEM_TV(li)->vval.v_list != NULL);
l = TV_LIST_ITEM_TV(li)->vval.v_list;
for (i = 0; i < len; i++) {
if (ptrs[i].score == SCORE_NONE) {
break;
}
tv_list_append_list(l, ptrs[i].lmatchpos);
}
} }
} }
xfree(ptrs); xfree(ptrs);
} }
/// "matchfuzzy()" function /// Do fuzzy matching. Returns the list of matched strings in 'rettv'.
void f_matchfuzzy(typval_T *argvars, typval_T *rettv, FunPtr fptr) /// If 'retmatchpos' is true, also returns the matching character positions.
static void do_fuzzymatch(const typval_T *const argvars, typval_T *const rettv,
const bool retmatchpos)
FUNC_ATTR_NONNULL_ALL
{ {
if (argvars[0].v_type != VAR_LIST) { // validate and get the arguments
emsg(_(e_listreq)); if (argvars[0].v_type != VAR_LIST || argvars[0].vval.v_list == NULL) {
return; semsg(_(e_listarg), retmatchpos ? "matchfuzzypos()" : "matchfuzzy()");
}
if (argvars[0].vval.v_list == NULL) {
return; return;
} }
if (argvars[1].v_type != VAR_STRING || argvars[1].vval.v_string == NULL) { if (argvars[1].v_type != VAR_STRING || argvars[1].vval.v_string == NULL) {
semsg(_(e_invarg2), tv_get_string(&argvars[1])); semsg(_(e_invarg2), tv_get_string(&argvars[1]));
return; return;
} }
match_fuzzy(argvars[0].vval.v_list, (const char_u *)tv_get_string(&argvars[1]),
tv_list_alloc_ret(rettv, kListLenUnknown)); Callback cb = CALLBACK_NONE;
const char_u *key = NULL;
if (argvars[2].v_type != VAR_UNKNOWN) {
if (argvars[2].v_type != VAR_DICT || argvars[2].vval.v_dict == NULL) {
emsg(_(e_dictreq));
return;
}
// To search a dict, either a callback function or a key can be
// specified.
dict_T *const d = argvars[2].vval.v_dict;
const dictitem_T *const di = tv_dict_find(d, "key", -1);
if (di != NULL) {
if (di->di_tv.v_type != VAR_STRING || di->di_tv.vval.v_string == NULL
|| *di->di_tv.vval.v_string == NUL) {
semsg(_(e_invarg2), tv_get_string(&di->di_tv));
return;
}
key = (const char_u *)tv_get_string(&di->di_tv);
} else if (!tv_dict_get_callback(d, "text_cb", -1, &cb)) {
semsg(_(e_invargval), "text_cb");
return;
}
}
// get the fuzzy matches
tv_list_alloc_ret(rettv, retmatchpos ? 2 : kListLenUnknown);
if (retmatchpos) {
// For matchfuzzypos(), a list with two items are returned. First item
// is a list of matching strings and the second item is a list of
// lists with matching positions within each string.
tv_list_append_list(rettv->vval.v_list, tv_list_alloc(kListLenUnknown));
tv_list_append_list(rettv->vval.v_list, tv_list_alloc(kListLenUnknown));
}
match_fuzzy(argvars[0].vval.v_list, (char_u *)tv_get_string(&argvars[1]), key, &cb, retmatchpos,
rettv->vval.v_list);
callback_free(&cb);
}
/// "matchfuzzy()" function
void f_matchfuzzy(typval_T *argvars, typval_T *rettv, FunPtr fptr)
{
do_fuzzymatch(argvars, rettv, false);
}
/// "matchfuzzypos()" function
void f_matchfuzzypos(typval_T *argvars, typval_T *rettv, FunPtr fptr)
{
do_fuzzymatch(argvars, rettv, true);
} }
/// Find identifiers or defines in included files. /// Find identifiers or defines in included files.
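
Review note: the scoring rules that this patch moves into fuzzy_match_compute_score() can be checked by hand with a standalone sketch. The version below is an approximation under stated assumptions: it handles ASCII only (the patch walks multibyte characters with utf_ptr2char() and uses mb_islower()/mb_isupper()), it takes the string length from strlen(), and the name `sketch_score` is hypothetical. The bonus and penalty values are copied from the #defines in the diff.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the #defines added by the patch. */
enum {
  SEQUENTIAL_BONUS = 15,            /* adjacent matches */
  SEPARATOR_BONUS = 30,             /* match follows '_' or ' ' */
  CAMEL_BONUS = 30,                 /* uppercase match after a lowercase char */
  FIRST_LETTER_BONUS = 15,          /* first character of str is matched */
  LEADING_LETTER_PENALTY = -5,      /* per character before the first match */
  MAX_LEADING_LETTER_PENALTY = -15, /* cap on the leading penalty */
  UNMATCHED_LETTER_PENALTY = -1     /* per unmatched character */
};

/* Hypothetical ASCII-only sketch of the scoring pass. matches[] holds the
 * ascending zero-based indices in str that matched the pattern. */
static int sketch_score(const char *str, const uint32_t *matches, int num_matches)
{
  assert(str != NULL && matches != NULL && num_matches > 0);
  const int len = (int)strlen(str);
  int score = 100;

  /* Leading letter penalty, capped. */
  int penalty = LEADING_LETTER_PENALTY * (int)matches[0];
  if (penalty < MAX_LEADING_LETTER_PENALTY) {
    penalty = MAX_LEADING_LETTER_PENALTY;
  }
  score += penalty;

  /* Every unmatched character costs one point. */
  score += UNMATCHED_LETTER_PENALTY * (len - num_matches);

  for (int i = 0; i < num_matches; i++) {
    const uint32_t cur = matches[i];
    if (i > 0 && cur == matches[i - 1] + 1) {
      score += SEQUENTIAL_BONUS;
    }
    if (cur > 0) {
      const unsigned char neighbor = (unsigned char)str[cur - 1];
      const unsigned char curr = (unsigned char)str[cur];
      if (islower(neighbor) && isupper(curr)) {
        score += CAMEL_BONUS;
      }
      if (neighbor == '_' || neighbor == ' ') {
        score += SEPARATOR_BONUS;
      }
    } else {
      score += FIRST_LETTER_BONUS;
    }
  }
  return score;
}
```

Worked by hand for the pattern 'onetwo', this sketch gives 190 for 'onetwo', 169 for 'xonetwo', 204 for 'one_two' and 220 for 'oneTwo'. That ordering is consistent with the "match preferences" assertions in the new test file.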


@ -1731,32 +1731,6 @@ func Test_nr2char()
call assert_equal("\x80\xfc\b\xfd\x80\xfeX\x80\xfeX\x80\xfeX\x80\xfeX\x80\xfeX", eval('"\<M-' .. nr2char(0x40000000) .. '>"')) call assert_equal("\x80\xfc\b\xfd\x80\xfeX\x80\xfeX\x80\xfeX\x80\xfeX\x80\xfeX", eval('"\<M-' .. nr2char(0x40000000) .. '>"'))
endfunc endfunc
" Test for matchfuzzy()
func Test_matchfuzzy()
call assert_fails('call matchfuzzy(10, "abc")', 'E714:')
" Needs v8.2.1183.
" call assert_fails('call matchfuzzy(["abc"], [])', 'E730:')
call assert_fails('call matchfuzzy(["abc"], [])', 'E475:')
call assert_equal([], matchfuzzy([], 'abc'))
call assert_equal([], matchfuzzy(['abc'], ''))
call assert_equal(['abc'], matchfuzzy(['abc', 10], 'ac'))
call assert_equal([], matchfuzzy([10, 20], 'ac'))
call assert_equal(['abc'], matchfuzzy(['abc'], 'abc'))
call assert_equal(['crayon', 'camera'], matchfuzzy(['camera', 'crayon'], 'cra'))
call assert_equal(['aabbaa', 'aaabbbaaa', 'aaaabbbbaaaa', 'aba'], matchfuzzy(['aba', 'aabbaa', 'aaabbbaaa', 'aaaabbbbaaaa'], 'aa'))
call assert_equal(['one'], matchfuzzy(['one', 'two'], 'one'))
call assert_equal(['oneTwo', 'onetwo'], matchfuzzy(['onetwo', 'oneTwo'], 'oneTwo'))
call assert_equal(['one_two', 'onetwo'], matchfuzzy(['onetwo', 'one_two'], 'oneTwo'))
call assert_equal(['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'], matchfuzzy(['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'], 'aa'))
call assert_equal([], matchfuzzy([repeat('a', 300)], repeat('a', 257)))
%bw!
eval ['somebuf', 'anotherone', 'needle', 'yetanotherone']->map({_, v -> bufadd(v) + bufload(v)})
let l = getbufinfo()->map({_, v -> v.name})->matchfuzzy('ndl')
call assert_equal(1, len(l))
call assert_match('needle', l[0])
endfunc
" Test for getcurpos() and setpos() " Test for getcurpos() and setpos()
func Test_getcurpos_setpos() func Test_getcurpos_setpos()
new new


@ -0,0 +1,203 @@
" Tests for fuzzy matching
source shared.vim
source check.vim
" Test for matchfuzzy()
func Test_matchfuzzy()
call assert_fails('call matchfuzzy(10, "abc")', 'E686:')
" Needs v8.2.1183; match the final error that's thrown for now
" call assert_fails('call matchfuzzy(["abc"], [])', 'E730:')
call assert_fails('call matchfuzzy(["abc"], [])', 'E475:')
call assert_fails("let x = matchfuzzy(v:_null_list, 'foo')", 'E686:')
call assert_fails('call matchfuzzy(["abc"], v:_null_string)', 'E475:')
call assert_equal([], matchfuzzy([], 'abc'))
call assert_equal([], matchfuzzy(['abc'], ''))
call assert_equal(['abc'], matchfuzzy(['abc', 10], 'ac'))
call assert_equal([], matchfuzzy([10, 20], 'ac'))
call assert_equal(['abc'], matchfuzzy(['abc'], 'abc'))
call assert_equal(['crayon', 'camera'], matchfuzzy(['camera', 'crayon'], 'cra'))
call assert_equal(['aabbaa', 'aaabbbaaa', 'aaaabbbbaaaa', 'aba'], matchfuzzy(['aba', 'aabbaa', 'aaabbbaaa', 'aaaabbbbaaaa'], 'aa'))
call assert_equal(['one'], matchfuzzy(['one', 'two'], 'one'))
call assert_equal(['oneTwo', 'onetwo'], matchfuzzy(['onetwo', 'oneTwo'], 'oneTwo'))
call assert_equal(['one_two', 'onetwo'], matchfuzzy(['onetwo', 'one_two'], 'oneTwo'))
call assert_equal(['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'], matchfuzzy(['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'], 'aa'))
call assert_equal(256, matchfuzzy([repeat('a', 256)], repeat('a', 256))[0]->len())
call assert_equal([], matchfuzzy([repeat('a', 300)], repeat('a', 257)))
" Tests for match preferences
" preference for camel case match
call assert_equal(['oneTwo', 'onetwo'], ['onetwo', 'oneTwo']->matchfuzzy('onetwo'))
" preference for match after a separator (_ or space)
call assert_equal(['one_two', 'one two', 'onetwo'], ['onetwo', 'one_two', 'one two']->matchfuzzy('onetwo'))
" preference for leading letter match
call assert_equal(['onetwo', 'xonetwo'], ['xonetwo', 'onetwo']->matchfuzzy('onetwo'))
" preference for sequential match
call assert_equal(['onetwo', 'oanbectdweo'], ['oanbectdweo', 'onetwo']->matchfuzzy('onetwo'))
" non-matching leading letter(s) penalty
call assert_equal(['xonetwo', 'xxonetwo'], ['xxonetwo', 'xonetwo']->matchfuzzy('onetwo'))
" total non-matching letter(s) penalty
call assert_equal(['one', 'onex', 'onexx'], ['onexx', 'one', 'onex']->matchfuzzy('one'))
%bw!
eval ['somebuf', 'anotherone', 'needle', 'yetanotherone']->map({_, v -> bufadd(v) + bufload(v)})
let l = getbufinfo()->map({_, v -> v.name})->matchfuzzy('ndl')
call assert_equal(1, len(l))
call assert_match('needle', l[0])
let l = [{'id' : 5, 'val' : 'crayon'}, {'id' : 6, 'val' : 'camera'}]
call assert_equal([{'id' : 6, 'val' : 'camera'}], matchfuzzy(l, 'cam', {'text_cb' : {v -> v.val}}))
call assert_equal([{'id' : 6, 'val' : 'camera'}], matchfuzzy(l, 'cam', {'key' : 'val'}))
call assert_equal([], matchfuzzy(l, 'day', {'text_cb' : {v -> v.val}}))
call assert_equal([], matchfuzzy(l, 'day', {'key' : 'val'}))
call assert_fails("let x = matchfuzzy(l, 'cam', 'random')", 'E715:')
call assert_equal([], matchfuzzy(l, 'day', {'text_cb' : {v -> []}}))
call assert_equal([], matchfuzzy(l, 'day', {'text_cb' : {v -> 1}}))
call assert_fails("let x = matchfuzzy(l, 'day', {'text_cb' : {a, b -> 1}})", 'E119:')
call assert_equal([], matchfuzzy(l, 'cam'))
" Nvim's callback implementation is different, so E6000 is expected instead,
" but we need v8.2.1183 to assert it
" call assert_fails("let x = matchfuzzy(l, 'cam', {'text_cb' : []})", 'E921:')
" call assert_fails("let x = matchfuzzy(l, 'cam', {'text_cb' : []})", 'E6000:')
call assert_fails("let x = matchfuzzy(l, 'cam', {'text_cb' : []})", 'E475:')
" call assert_fails("let x = matchfuzzy(l, 'foo', {'key' : []})", 'E730:')
call assert_fails("let x = matchfuzzy(l, 'foo', {'key' : []})", 'E475:')
call assert_fails("let x = matchfuzzy(l, 'cam', v:_null_dict)", 'E715:')
call assert_fails("let x = matchfuzzy(l, 'foo', {'key' : v:_null_string})", 'E475:')
" Nvim doesn't have null functions
" call assert_fails("let x = matchfuzzy(l, 'foo', {'text_cb' : test_null_function()})", 'E475:')
let l = [{'id' : 5, 'name' : 'foo'}, {'id' : 6, 'name' : []}, {'id' : 7}]
call assert_fails("let x = matchfuzzy(l, 'foo', {'key' : 'name'})", 'E730:')
" Test in latin1 encoding
let save_enc = &encoding
" Nvim supports utf-8 encoding only
" set encoding=latin1
call assert_equal(['abc'], matchfuzzy(['abc'], 'abc'))
let &encoding = save_enc
endfunc
" Test for the matchfuzzypos() function
func Test_matchfuzzypos()
call assert_equal([['curl', 'world'], [[2,3], [2,3]]], matchfuzzypos(['world', 'curl'], 'rl'))
call assert_equal([['curl', 'world'], [[2,3], [2,3]]], matchfuzzypos(['world', 'one', 'curl'], 'rl'))
call assert_equal([['hello', 'hello world hello world'],
\ [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]],
\ matchfuzzypos(['hello world hello world', 'hello', 'world'], 'hello'))
call assert_equal([['aaaaaaa'], [[0, 1, 2]]], matchfuzzypos(['aaaaaaa'], 'aaa'))
call assert_equal([[], []], matchfuzzypos(['world', 'curl'], 'ab'))
let x = matchfuzzypos([repeat('a', 256)], repeat('a', 256))
call assert_equal(range(256), x[1][0])
call assert_equal([[], []], matchfuzzypos([repeat('a', 300)], repeat('a', 257)))
call assert_equal([[], []], matchfuzzypos([], 'abc'))
" match in a long string
call assert_equal([[repeat('x', 300) .. 'abc'], [[300, 301, 302]]],
\ matchfuzzypos([repeat('x', 300) .. 'abc'], 'abc'))
" preference for camel case match
call assert_equal([['xabcxxaBc'], [[6, 7, 8]]], matchfuzzypos(['xabcxxaBc'], 'abc'))
" preference for match after a separator (_ or space)
call assert_equal([['xabx_ab'], [[5, 6]]], matchfuzzypos(['xabx_ab'], 'ab'))
" preference for leading letter match
call assert_equal([['abcxabc'], [[0, 1]]], matchfuzzypos(['abcxabc'], 'ab'))
" preference for sequential match
call assert_equal([['aobncedone'], [[7, 8, 9]]], matchfuzzypos(['aobncedone'], 'one'))
" best recursive match
call assert_equal([['xoone'], [[2, 3, 4]]], matchfuzzypos(['xoone'], 'one'))
let l = [{'id' : 5, 'val' : 'crayon'}, {'id' : 6, 'val' : 'camera'}]
call assert_equal([[{'id' : 6, 'val' : 'camera'}], [[0, 1, 2]]],
\ matchfuzzypos(l, 'cam', {'text_cb' : {v -> v.val}}))
call assert_equal([[{'id' : 6, 'val' : 'camera'}], [[0, 1, 2]]],
\ matchfuzzypos(l, 'cam', {'key' : 'val'}))
call assert_equal([[], []], matchfuzzypos(l, 'day', {'text_cb' : {v -> v.val}}))
call assert_equal([[], []], matchfuzzypos(l, 'day', {'key' : 'val'}))
call assert_fails("let x = matchfuzzypos(l, 'cam', 'random')", 'E715:')
call assert_equal([[], []], matchfuzzypos(l, 'day', {'text_cb' : {v -> []}}))
call assert_equal([[], []], matchfuzzypos(l, 'day', {'text_cb' : {v -> 1}}))
call assert_fails("let x = matchfuzzypos(l, 'day', {'text_cb' : {a, b -> 1}})", 'E119:')
call assert_equal([[], []], matchfuzzypos(l, 'cam'))
" Nvim's callback implementation is different, so E6000 is expected instead,
" but we need v8.2.1183 to assert it
" call assert_fails("let x = matchfuzzypos(l, 'cam', {'text_cb' : []})", 'E921:')
" call assert_fails("let x = matchfuzzypos(l, 'cam', {'text_cb' : []})", 'E6000:')
call assert_fails("let x = matchfuzzypos(l, 'cam', {'text_cb' : []})", 'E475:')
" call assert_fails("let x = matchfuzzypos(l, 'foo', {'key' : []})", 'E730:')
call assert_fails("let x = matchfuzzypos(l, 'foo', {'key' : []})", 'E475:')
call assert_fails("let x = matchfuzzypos(l, 'cam', v:_null_dict)", 'E715:')
call assert_fails("let x = matchfuzzypos(l, 'foo', {'key' : v:_null_string})", 'E475:')
" Nvim doesn't have null functions
" call assert_fails("let x = matchfuzzypos(l, 'foo', {'text_cb' : test_null_function()})", 'E475:')
let l = [{'id' : 5, 'name' : 'foo'}, {'id' : 6, 'name' : []}, {'id' : 7}]
call assert_fails("let x = matchfuzzypos(l, 'foo', {'key' : 'name'})", 'E730:')
endfunc
func Test_matchfuzzy_mbyte()
CheckFeature multi_lang
call assert_equal(['ンヹㄇヺヴ'], matchfuzzy(['ンヹㄇヺヴ'], 'ヹヺ'))
" reverse the order of characters
call assert_equal([], matchfuzzy(['ンヹㄇヺヴ'], 'ヺヹ'))
call assert_equal(['αβΩxxx', 'xαxβxΩx'],
\ matchfuzzy(['αβΩxxx', 'xαxβxΩx'], 'αβΩ'))
call assert_equal(['ππbbππ', 'πππbbbπππ', 'ππππbbbbππππ', 'πbπ'],
\ matchfuzzy(['πbπ', 'ππbbππ', 'πππbbbπππ', 'ππππbbbbππππ'], 'ππ'))
" preference for camel case match
call assert_equal(['oneĄwo', 'oneąwo'],
\ ['oneąwo', 'oneĄwo']->matchfuzzy('oneąwo'))
" preference for match after a separator (_ or space)
call assert_equal(['Ⅱa_bㄟㄠ', 'Ⅱa bㄟㄠ', 'Ⅱabㄟㄠ'],
\ ['Ⅱabㄟㄠ', 'Ⅱa_bㄟㄠ', 'Ⅱa bㄟㄠ']->matchfuzzy('Ⅱabㄟㄠ'))
" preference for leading letter match
call assert_equal(['ŗŝţũŵż', 'xŗŝţũŵż'],
\ ['xŗŝţũŵż', 'ŗŝţũŵż']->matchfuzzy('ŗŝţũŵż'))
" preference for sequential match
call assert_equal(['ㄞㄡㄤfffifl', 'ㄞaㄡbㄤcffdfiefl'],
\ ['ㄞaㄡbㄤcffdfiefl', 'ㄞㄡㄤfffifl']->matchfuzzy('ㄞㄡㄤfffifl'))
" non-matching leading letter(s) penalty
call assert_equal(['xㄞㄡㄤfffifl', 'xxㄞㄡㄤfffifl'],
\ ['xxㄞㄡㄤfffifl', 'xㄞㄡㄤfffifl']->matchfuzzy('ㄞㄡㄤfffifl'))
" total non-matching letter(s) penalty
call assert_equal(['ŗŝţ', 'ŗŝţx', 'ŗŝţxx'],
\ ['ŗŝţxx', 'ŗŝţ', 'ŗŝţx']->matchfuzzy('ŗŝţ'))
endfunc
func Test_matchfuzzypos_mbyte()
CheckFeature multi_lang
call assert_equal([['こんにちは世界'], [[0, 1, 2, 3, 4]]],
\ matchfuzzypos(['こんにちは世界'], 'こんにちは'))
call assert_equal([['ンヹㄇヺヴ'], [[1, 3]]], matchfuzzypos(['ンヹㄇヺヴ'], 'ヹヺ'))
" reverse the order of characters
call assert_equal([[], []], matchfuzzypos(['ンヹㄇヺヴ'], 'ヺヹ'))
call assert_equal([['αβΩxxx', 'xαxβxΩx'], [[0, 1, 2], [1, 3, 5]]],
\ matchfuzzypos(['αβΩxxx', 'xαxβxΩx'], 'αβΩ'))
call assert_equal([['ππbbππ', 'πππbbbπππ', 'ππππbbbbππππ', 'πbπ'],
\ [[0, 1], [0, 1], [0, 1], [0, 2]]],
\ matchfuzzypos(['πbπ', 'ππbbππ', 'πππbbbπππ', 'ππππbbbbππππ'], 'ππ'))
call assert_equal([['ααααααα'], [[0, 1, 2]]],
\ matchfuzzypos(['ααααααα'], 'ααα'))
call assert_equal([[], []], matchfuzzypos(['ンヹㄇ', 'ŗŝţ'], 'fffifl'))
let x = matchfuzzypos([repeat('Ψ', 256)], repeat('Ψ', 256))
call assert_equal(range(256), x[1][0])
call assert_equal([[], []], matchfuzzypos([repeat('✓', 300)], repeat('✓', 257)))
" match in a long string
call assert_equal([[repeat('♪', 300) .. '✗✗✗'], [[300, 301, 302]]],
\ matchfuzzypos([repeat('♪', 300) .. '✗✗✗'], '✗✗✗'))
" preference for camel case match
call assert_equal([['xѳѵҁxxѳѴҁ'], [[6, 7, 8]]], matchfuzzypos(['xѳѵҁxxѳѴҁ'], 'ѳѵҁ'))
" preference for match after a separator (_ or space)
call assert_equal([['xちだx_ちだ'], [[5, 6]]], matchfuzzypos(['xちだx_ちだ'], 'ちだ'))
" preference for leading letter match
call assert_equal([['ѳѵҁxѳѵҁ'], [[0, 1]]], matchfuzzypos(['ѳѵҁxѳѵҁ'], 'ѳѵ'))
" preference for sequential match
call assert_equal([['aンbヹcㄇdンヹㄇ'], [[7, 8, 9]]], matchfuzzypos(['aンbヹcㄇdンヹㄇ'], 'ンヹㄇ'))
" best recursive match
call assert_equal([['xффйд'], [[2, 3, 4]]], matchfuzzypos(['xффйд'], 'фйд'))
endfunc
" vim: shiftwidth=2 sts=2 expandtab