perf(filetype): implement parent pattern pre-matching (#29660)

Problem: calling `vim.filetype.match()` has a performance bottleneck in
  that it has to match many Lua patterns against several versions of the
  input file name. This can be a problem if users need to call it
  synchronously many times.

Solution: add "parent pattern pre-matching", which can be used to quickly
  reject several potential pattern matches at the (usually rare) cost of
  one extra Lua pattern match.

  A "parent pattern" is a manually added and tracked grouping of filetype
  patterns which should have two properties:
    - Match at least the same set of strings as its filetype patterns,
      but not many more.
    - Be fast to match.

  To be effective, a group should consist of at least three filetype
  patterns.

  Example: for a filetype pattern ".*/etc/a2ps/.*%.cfg", both "/etc/"
  and "%.cfg" are good parent patterns (prefer the one which can group
  more filetype patterns).

  After this commit, `vim.filetype.match()` on most inputs runs ~3.4
  times faster (while some inputs may see less impact if they match
  many parent patterns).
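
  The pre-matching idea described above can be sketched roughly as follows.
  This is a minimal illustration, not the code in
  `runtime/lua/vim/filetype.lua`: the `groups` table layout, the
  `match_grouped` helper, and every pattern and filetype name other than
  the a2ps pattern quoted above are assumptions made for the example.

```lua
-- Hypothetical grouped table: each entry pairs one cheap parent pattern
-- with the filetype patterns it covers. Only the a2ps pattern comes from
-- the commit message; the other entries are made up for illustration.
local groups = {
  {
    parent = '/etc/',
    patterns = {
      { '.*/etc/a2ps/.*%.cfg$', 'a2ps' },
      { '.*/etc/example/.*%.conf$', 'example_conf' }, -- hypothetical
      { '.*/etc/demo%.rules$', 'demo_rules' },        -- hypothetical
    },
  },
  {
    parent = '/log/',
    patterns = {
      { '.*/log/demo%.log$', 'demo_log' },            -- hypothetical
    },
  },
}

-- Return the filetype of the first matching pattern, or nil.
local function match_grouped(path)
  for _, group in ipairs(groups) do
    -- One cheap match decides whether the whole group can be skipped.
    if path:find(group.parent) then
      for _, item in ipairs(group.patterns) do
        if path:find(item[1]) then
          return item[2]
        end
      end
    end
  end
  return nil
end
```

  With this layout, a path such as "/home/user/notes.txt" costs one parent
  match per group instead of one match per filetype pattern. A path under
  "/etc/" pays for the extra parent match on top of the patterns in its
  group.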
This commit is contained in:
Evgeni Chasnovski 2024-07-18 18:26:27 +03:00 committed by GitHub
parent c69ea53c9d
commit f61efe3fe7
2 changed files with 641 additions and 510 deletions

@@ -302,4 +302,40 @@ used in new documentation:
- `{Only when compiled with ...}`: the vast majority of features have been
  made non-optional (see https://github.com/neovim/neovim/wiki/Introduction)
==============================================================================
FILETYPE DETECTION                                     *dev-vimpatch-filetype*

Nvim's filetype detection behavior matches Vim, but is implemented as part of
|vim.filetype| (see $VIMRUNTIME/lua/vim/filetype.lua).

Pattern matching has several differences:
- It is done using explicit Lua patterns (without implicit anchoring) instead
  of Vim regexes: >
    "*/debian/changelog" -> "/debian/changelog$"
    "*/bind/db.*" -> "/bind/db%."
<
- Filetype patterns are grouped by their parent pattern to improve matching
  performance. For this to work properly, a parent pattern should:
  - Match at least the same set of strings as the filetype patterns inside
    it, but not many more.
  - Be fast to match.

  When adding a new filetype with pattern matching, consider the following:
  - If there is already a group with an appropriate parent pattern, use it.
  - If there can be a fast and specific enough pattern to group at least
    3 filetype patterns, add it as a separate grouped entry.

  A good new parent pattern should be:
  - Fast. A good rule of thumb is that it should be a short explicit string
    (i.e. no quantifiers or character sets).
  - Specific. Good rules of thumb (from best to worst):
    - Full directory name (like "/etc/", "/log/").
    - Part of a rare enough directory name (like "/conf", "git/").
    - String reasonably rarely used in real full paths (like "nginx").

  Example:
  - Filetype pattern: ".*/etc/a2ps/.*%.cfg"
  - Good parent: "/etc/"; "%.cfg$"
  - Bad parent: "%." - fast, not specific; "/a2ps/.*%." - slow, specific

vim:tw=78:ts=8:noet:ft=help:norl:

File diff suppressed because it is too large