FEATURE: Split up text segmentation for Chinese and Japanese.

* Chinese segmentation will continue to rely on cppjieba
* Japanese segmentation will use our port of TinySegmenter
* Korean currently does not rely on segmentation, which was dropped in c677877e4f
* SiteSetting.search_tokenize_chinese_japanese_korean has been split
into SiteSetting.search_tokenize_chinese and
SiteSetting.search_tokenize_japanese respectively
Alan Guo Xiang Tan
2022-01-26 15:24:11 +08:00
parent 9ddd1f739e
commit 930f51e175
14 changed files with 406 additions and 72 deletions

@@ -0,0 +1,20 @@
# frozen_string_literal: true

class ChangeSegmentCjkSiteSetting < ActiveRecord::Migration[6.1]
  def up
    execute <<~SQL
      UPDATE site_settings
      SET name = 'search_tokenize_chinese'
      WHERE name = 'search_tokenize_chinese_japanese_korean'
    SQL

    execute <<~SQL
      DELETE FROM site_settings
      WHERE name = 'search_tokenize_chinese_japanese_korean'
    SQL
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end
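
For readers tracing what the migration does to existing data, the two statements can be sketched in plain Ruby against an in-memory list of rows. This is only an illustration under stated assumptions: the `rows` array and the `rename_setting` helper are hypothetical stand-ins for the `site_settings` table and are not part of this commit. The sketch shows that a site with the old combined setting ends up with `search_tokenize_chinese` carrying the same value, and that the migration as shown creates no row for `search_tokenize_japanese`.

```ruby
# Hypothetical in-memory stand-in for the site_settings table (not part of the commit).
OLD_NAME = "search_tokenize_chinese_japanese_korean"
NEW_NAME = "search_tokenize_chinese"

def rename_setting(rows)
  # Mirrors the UPDATE: rows using the old name are renamed, values are kept.
  renamed = rows.map do |row|
    row[:name] == OLD_NAME ? row.merge(name: NEW_NAME) : row
  end

  # Mirrors the DELETE: any row still using the old name is dropped.
  renamed.reject { |row| row[:name] == OLD_NAME }
end

rows = [
  { name: OLD_NAME, value: "t" },
  { name: "title", value: "My forum" },
]

result = rename_setting(rows)
result.each { |row| puts "#{row[:name]} = #{row[:value]}" }
```

In this sketch the second step finds nothing left to remove once the rename has run, so it acts only as a safeguard.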