Speed up search index generation by caching word stemming calls.

Saves about 20 seconds when building the Python documentation.
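
The idea can be sketched outside of Sphinx as follows. This is only a minimal illustration, not the code from this commit: ``CachingStemmer`` and ``toy_stem`` are hypothetical names, and ``toy_stem`` is a stand-in for a real stemmer.

```python
class CachingStemmer:
    """Wrap a stem function and remember the result for each word."""

    def __init__(self, stem_func):
        self._stem_func = stem_func  # the real (slow) stemmer
        self._cache = {}             # word -> stemmed word
        self.misses = 0              # how often the real stemmer ran

    def stem(self, word):
        try:
            return self._cache[word]
        except KeyError:
            self.misses += 1
            result = self._cache[word] = self._stem_func(word)
            return result


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer:
    # lowercase the word and strip one trailing "s".
    word = word.lower()
    return word[:-1] if word.endswith("s") else word


stemmer = CachingStemmer(toy_stem)
words = ["Builds", "index", "builds", "Builds", "index"]
stems = [stemmer.stem(w) for w in words]
print(stems, stemmer.misses)
```

The cache is keyed on the exact input word, so only repeated occurrences of the same spelling skip the stemmer. Documentation tends to repeat the same words many times, which is presumably where the saving comes from.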

Here are some stats for building the Python docs on my machine with
and without stem caching.

Without stem caching::

    % rm -fr _build
    % \time sphinx-build -q . _build/html
    158.22user 0.87system 2:39.25elapsed 99%CPU (0avgtext+0avgdata 400800maxresident)k
    104inputs+180240outputs (1major+113472minor)pagefaults 0swaps

    % \time sphinx-build -a -q . _build/html
    91.00user 0.67system 1:31.73elapsed 99%CPU (0avgtext+0avgdata 330704maxresident)k
    0inputs+69864outputs (1major+106009minor)pagefaults 0swaps

With stem caching::

    % rm -fr _build
    % \time sphinx-build -q . _build/html
    137.90user 1.10system 2:20.50elapsed 98%CPU (0avgtext+0avgdata 413344maxresident)k
    18896inputs+180232outputs (1major+113779minor)pagefaults 0swaps

    % \time sphinx-build -a -q . _build/html
    70.04user 0.74system 1:10.87elapsed 99%CPU (0avgtext+0avgdata 345632maxresident)k
    16inputs+69864outputs (1major+108010minor)pagefaults 0swaps
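
As a quick check of the "about 20 seconds" figure, the elapsed times above can be subtracted directly. The helper ``to_seconds`` and the run labels are hypothetical names used only for this sketch; the numbers are copied from the output above.

```python
def to_seconds(elapsed):
    """Convert an 'M:SS.ss' elapsed string, as printed by time, to seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + float(seconds)


# (without stem caching, with stem caching)
runs = {
    "clean build": ("2:39.25", "2:20.50"),
    "rebuild with -a": ("1:31.73", "1:10.87"),
}

savings = {}
for name, (before, after) in runs.items():
    savings[name] = round(to_seconds(before) - to_seconds(after), 2)
    print("%s: %.2f seconds saved" % (name, savings[name]))
```

This gives roughly 18.75 seconds for the clean build and 20.86 seconds for the ``-a`` rebuild, in line with the estimate in the summary.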
Jonathan Waltman 2013-02-18 21:02:51 -06:00
parent 1c5638e40c
commit db9cfa4d1a
2 changed files with 13 additions and 1 deletions


@@ -6,6 +6,10 @@ Release 1.2 (in development)
 * Fix text builder did not respect wide/fullwidth characters:
   title underline width, table layout width and text wrap width.
+* Speed up building the search index by caching the results of the word
+  stemming routines. Saves about 20 seconds when building the Python
+  documentation.
 * #1062: sphinx.ext.autodoc use __init__ method signature for class signature.
 * PR#111: Respect add_autodoc_attrgetter() even when inherited-members is set.


@@ -170,6 +170,8 @@ class IndexBuilder(object):
         self._mapping = {}
         # stemmed words in titles -> set(filenames)
         self._title_mapping = {}
+        # word -> stemmed word
+        self._stem_cache = {}
         # objtype -> index
         self._objtypes = {}
         # objtype index -> (domain, type, objname (localized))
@@ -294,7 +296,13 @@ class IndexBuilder(object):
         visitor = WordCollector(doctree, self.lang)
         doctree.walk(visitor)
-        stem = self.lang.stem
+        # memoize self.lang.stem
+        def stem(word):
+            try:
+                return self._stem_cache[word]
+            except KeyError:
+                self._stem_cache[word] = self.lang.stem(word)
+                return self._stem_cache[word]
         _filter = self.lang.word_filter
         for word in itertools.chain(visitor.found_title_words,