Add developer notes on the JavaScript stemming code, tips for speeding up stemming, and a small bug fix

2025-02-25 18:55:22 -06:00 · 2014-01-22 02:12:04 -08:00
commit 6bda4586bd
parent 18722362db
3 changed files with 22 additions and 1 deletion
--- a/doc/config.rst
+++ b/doc/config.rst
@@ -747,6 +747,15 @@ that use Sphinx' HTMLWriter class.
   * ``sv`` -- Swedish
   * ``tr`` -- Turkish

+   .. admonition:: Accelerate build speed
+
+      Each language (except Japanese) provides its own stemming algorithm.
+      Sphinx uses a Python implementation by default. You can install a
+      C implementation to speed up building the search index file.
+
+      * `PorterStemmer <https://pypi.python.org/pypi/PorterStemmer>`_ (`en`)
+      * `PyStemmer <https://pypi.python.org/pypi/PyStemmer>`_ (all languages)
+
   .. versionadded:: 1.1

   .. versionchanged:: 1.3
--- a/doc/devguide.rst
+++ b/doc/devguide.rst
@@ -243,3 +243,15 @@ Debugging Tips

 * Set the debugging options in the `Docutils configuration file
  <http://docutils.sourceforge.net/docs/user/config.html>`_.
+
+* JavaScript stemming algorithms in `sphinx/search/*.py` (except `en.py`) are
+  generated by a
+  `modified snowballcode generator <https://github.com/shibukawa/snowball>`_.
+  The generated `JSX <http://jsx.github.io/>`_ files are
+  in `this repository <https://github.com/shibukawa/snowball-stemmer.jsx>`_.
+  You can get the resulting JavaScript files with the following commands:
+
+  .. code-block:: bash
+
+     $ npm install
+     $ node_modules/.bin/grunt build # -> dest/*.global.js
--- a/sphinx/search/__init__.py
+++ b/sphinx/search/__init__.py
@@ -89,7 +89,7 @@ var Stemmer = function() {
        Return true if the target word should be registered in the search index.
        This method is called after stemming.
        """
-        return not (((len(word) < 3) and (12353 < ord(word[0]) < 12436)) or
+        return len(word) == 0 or not (((len(word) < 3) and (12353 < ord(word[0]) < 12436)) or
            (ord(word[0]) < 256 and (len(word) < 3 or word in self.stopwords or
                                     word.isdigit())))
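
Note on the last hunk: the old expression indexes `word[0]` unconditionally, so an empty string (which stemming can apparently produce) would raise `IndexError`. The added `len(word) == 0 or` guard short-circuits before that index is reached. The following is a minimal sketch of that behaviour, not Sphinx's actual class: `WordFilterSketch`, `word_filter_before`, `word_filter_after`, and `_rejected` are hypothetical names introduced here for illustration, and only the boolean expression is copied from the diff.

```python
HIRAGANA_LOW, HIRAGANA_HIGH = 12353, 12436  # code point bounds copied from the diff


class WordFilterSketch:
    """Hypothetical stand-in, not Sphinx's real search-language class."""

    def __init__(self, stopwords=()):
        self.stopwords = set(stopwords)

    def _rejected(self, word):
        # Same rejection test as in the diff; it indexes word[0] unconditionally,
        # so it raises IndexError when word is empty.
        return (((len(word) < 3) and (HIRAGANA_LOW < ord(word[0]) < HIRAGANA_HIGH)) or
                (ord(word[0]) < 256 and (len(word) < 3 or word in self.stopwords or
                                         word.isdigit())))

    def word_filter_before(self, word):
        # Expression as it stood before the commit.
        return not self._rejected(word)

    def word_filter_after(self, word):
        # Expression after the commit: the length check short-circuits,
        # so word[0] is never evaluated for an empty string.
        return len(word) == 0 or not self._rejected(word)


if __name__ == "__main__":
    f = WordFilterSketch(stopwords={"the"})
    for candidate in ["sphinx", "the", "ab", "123", ""]:
        try:
            before = f.word_filter_before(candidate)
        except IndexError:
            before = "IndexError"
        print(repr(candidate), before, f.word_filter_after(candidate))
```

One consequence worth noting: with the guard in this position, an empty string makes the expression evaluate to true. Whether empty words are dropped elsewhere before indexing is not visible in this diff.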