Add development memo about stemming JS code, acceleration tips about stemming, small bug fix

shibukawa yoshiki 2014-01-22 02:12:04 -08:00
parent 18722362db
commit 6bda4586bd
3 changed files with 22 additions and 1 deletion


@ -747,6 +747,15 @@ that use Sphinx' HTMLWriter class.
* ``sv`` -- Swedish
* ``tr`` -- Turkish

.. admonition:: Accelerate build speed

   Each language (except Japanese) provides its own stemming algorithm.
   By default, Sphinx uses a pure-Python implementation. You can install a
   C implementation to speed up building the index file:

   * `PorterStemmer <https://pypi.python.org/pypi/PorterStemmer>`_ (``en``)
   * `PyStemmer <https://pypi.python.org/pypi/PyStemmer>`_ (all languages)

.. versionadded:: 1.1
.. versionchanged:: 1.3
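The admonition leaves implicit how an installed accelerator gets picked up. The usual Python idiom is to try importing the C extension and fall back to pure Python when it is missing. The sketch below illustrates that idiom with PyStemmer, which installs a module named ``Stemmer``. It is an assumption-based illustration, not Sphinx's own loading code: the names ``BACKEND`` and ``stem`` are hypothetical, and the fallback is a deliberately trivial placeholder rather than a real Porter stemmer.

```python
# Sketch of the "prefer the C extension, fall back to pure Python" idiom.
# BACKEND and stem() are hypothetical names used only for this illustration.
try:
    import Stemmer  # C implementation, provided by the PyStemmer package

    _english = Stemmer.Stemmer("english")
    BACKEND = "PyStemmer"

    def stem(word):
        # Delegate to the compiled Snowball stemmer.
        return _english.stemWord(word)

except ImportError:
    BACKEND = "fallback"

    def stem(word):
        # Hypothetical placeholder, NOT Sphinx's Python stemmer:
        # strip one common English suffix if the word is long enough.
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word
```

Because the choice happens once at import time, callers only ever see ``stem()`` and pay no per-word cost for the fallback logic.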


@ -243,3 +243,15 @@ Debugging Tips
* Set the debugging options in the `Docutils configuration file
  <http://docutils.sourceforge.net/docs/user/config.html>`_.
* JavaScript stemming algorithms in ``sphinx/search/*.py`` (except ``en.py``) are
  generated by a
  `modified Snowball code generator <https://github.com/shibukawa/snowball>`_.
  The generated `JSX <http://jsx.github.io/>`_ files are
  in `this repository <https://github.com/shibukawa/snowball-stemmer.jsx>`_.
  You can get the resulting JavaScript files with the following commands:

  .. code-block:: bash

     $ npm install
     $ node_modules/.bin/grunt build # -> dest/*.global.js
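If you regenerate the JavaScript stemmers often, the two commands can be wrapped in a small Python helper. This is a hypothetical sketch: the function name ``build_js_stemmers`` and the ``repo_dir`` argument (a local checkout of snowball-stemmer.jsx) are assumptions, it presumes POSIX paths with Node.js and npm available, and the ``run`` parameter exists only so the helper can be exercised without npm.

```python
import glob
import os
import subprocess


def build_js_stemmers(repo_dir, run=subprocess.run):
    """Run ``npm install`` and ``grunt build`` in repo_dir; return generated files.

    Hypothetical helper. ``run`` is injectable for testing; by default it is
    subprocess.run, and check=True makes a failing command raise an error.
    """
    run(["npm", "install"], cwd=repo_dir, check=True)
    # On POSIX, a relative executable path is resolved against cwd.
    run([os.path.join("node_modules", ".bin", "grunt"), "build"],
        cwd=repo_dir, check=True)
    # The build writes its output as dest/*.global.js.
    return sorted(glob.glob(os.path.join(repo_dir, "dest", "*.global.js")))
```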


@ -89,7 +89,7 @@ var Stemmer = function() {
Return true if the target word should be registered in the search index.
This method is called after stemming.
"""
- return not (((len(word) < 3) and (12353 < ord(word[0]) < 12436)) or
+ return len(word) == 0 or not (((len(word) < 3) and (12353 < ord(word[0]) < 12436)) or
(ord(word[0]) < 256 and (len(word) < 3 or word in self.stopwords or
word.isdigit())))
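The bug fixed in this hunk is an ``IndexError``: for an empty ``word``, ``len(word) < 3`` is true, so the old expression goes on to evaluate ``ord(word[0])``. The standalone sketch below restates the new filter as a plain function so it can be tried outside Sphinx. ``STOPWORDS`` is a hypothetical stand-in for ``self.stopwords``, and the early return mirrors the ``len(word) == 0 or ...`` short-circuit.

```python
# Standalone restatement of the updated word_filter expression.
# STOPWORDS is a hypothetical stand-in for self.stopwords.
STOPWORDS = {"the", "and", "for"}

# Exclusive bounds used by the filter: 12353 < ord(c) < 12436,
# i.e. hiragana code points U+3042 to U+3093.
HIRAGANA_LOW, HIRAGANA_HIGH = 12353, 12436


def word_filter(word, stopwords=STOPWORDS):
    """Return True if word should be registered in the search index."""
    if len(word) == 0:
        # The new guard: without it, word[0] raises IndexError.
        return True
    # Very short words starting with hiragana are dropped.
    short_hiragana = len(word) < 3 and HIRAGANA_LOW < ord(word[0]) < HIRAGANA_HIGH
    # Words starting with a code point below 256 are dropped if they are
    # short, a stopword, or purely numeric.
    noisy_latin1 = ord(word[0]) < 256 and (
        len(word) < 3 or word in stopwords or word.isdigit()
    )
    return not (short_hiragana or noisy_latin1)
```

With the guard, an empty string passes the filter (the function returns ``True``) instead of being rejected; the change only stops the crash.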