Full-text search
Setting up Elasticsearch so users can search statuses they have authored, favourited, or been mentioned in.
Mastodon supports full-text search when Elasticsearch is available. Mastodon’s full-text search allows logged-in users to find results from their own statuses, their mentions, their favourites, and their bookmarks. It deliberately does not allow searching for arbitrary strings in the entire database.
Installing Elasticsearch
Elasticsearch requires a Java runtime. If you don’t have Java installed yet, install it now. Assuming you are logged in as root:
apt install openjdk-17-jre-headless
Add the official Elasticsearch repository to apt:
wget -O /usr/share/keyrings/elasticsearch.asc https://artifacts.elastic.co/GPG-KEY-elasticsearch
echo "deb [signed-by=/usr/share/keyrings/elasticsearch.asc] https://artifacts.elastic.co/packages/7.x/apt stable main" > /etc/apt/sources.list.d/elastic-7.x.list
Now you can install Elasticsearch:
apt update
apt install elasticsearch
By default, Elasticsearch listens only on localhost. You can check (and change) the address it binds to via the network.host setting within /etc/elasticsearch/elasticsearch.yml. Consider that anyone who can access Elasticsearch can access and modify any data within it, as there is no authentication layer, so it is really important that access is secured. Having a firewall that only exposes ports 22, 80 and 443 is advisable, as outlined in the main installation instructions. If you have a multi-host setup, you must know how to secure internal traffic.
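As a minimal sketch of the binding side of this, the relevant part of /etc/elasticsearch/elasticsearch.yml could look like the following (adjust to your own network layout; in a multi-host setup you would bind to an internal address instead):

```yaml
# /etc/elasticsearch/elasticsearch.yml
# Bind only to the loopback interface so Elasticsearch is not
# reachable from outside this machine.
network.host: 127.0.0.1
http.port: 9200
```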
Note: Elasticsearch instances shipping log4j versions between 2.0 and 2.14.1 are affected by an exploit in the log4j library (Log4Shell). If affected, please refer to the temporary mitigation from the Elasticsearch issue tracker.
To start Elasticsearch:
systemctl daemon-reload
systemctl enable --now elasticsearch
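Once started, you can confirm that Elasticsearch is answering on its default port with a quick health check (this assumes the default localhost:9200 binding):

```shell
# Query the cluster health endpoint; prints JSON with a "status"
# field (green/yellow/red) when Elasticsearch is up, or a fallback
# message when it is not reachable yet.
curl -fsS "http://localhost:9200/_cluster/health?pretty" 2>/dev/null \
  || echo "Elasticsearch is not reachable yet"
```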
Configuring Mastodon
Edit .env.production to add the following variables:
ES_ENABLED=true
ES_HOST=localhost
ES_PORT=9200
If you have multiple Mastodon servers on the same machine, and you are planning to use the same Elasticsearch installation for all of them, make sure that each of them has a unique REDIS_NAMESPACE in its configuration, to differentiate the indices. If you need to override the prefix of the Elasticsearch indices, you can set ES_PREFIX directly.
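For instance, two Mastodon servers sharing one Elasticsearch node could be kept apart like this (the namespace values are hypothetical; pick ones that match your deployment):

```
# .env.production on the first server
REDIS_NAMESPACE=mastodon1
ES_ENABLED=true
ES_HOST=localhost
ES_PORT=9200

# .env.production on the second server
REDIS_NAMESPACE=mastodon2
ES_ENABLED=true
ES_HOST=localhost
ES_PORT=9200
```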
After saving the new configuration, restart Mastodon processes for it to take effect:
systemctl restart mastodon-sidekiq
systemctl reload mastodon-web
Now it’s time to create the Elasticsearch indices and fill them with data:
su - mastodon
cd live
RAILS_ENV=production bin/tootctl search deploy
Search optimization for other languages
Chinese search optimization
The default analyzer of Elasticsearch is the standard analyzer, which may not be the best, especially for Chinese. To improve the search experience, you can install a language-specific analyzer. The diff below relies on the elasticsearch-analysis-ik and elasticsearch-analysis-stconvert extensions, which provide the ik_max_word tokenizer and the stconvert character filter respectively. Before creating the indices in Elasticsearch, install these extensions:
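As an installation sketch, both plugins can be added with elasticsearch-plugin; the version in each release URL must exactly match your installed Elasticsearch version (7.10.2 below is a placeholder, not a recommendation):

```shell
# Install the IK analyzer and STConvert plugins; replace 7.10.2
# with your exact Elasticsearch version, then restart the service
# so the plugins are loaded.
/usr/share/elasticsearch/bin/elasticsearch-plugin install \
  https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.10.2/elasticsearch-analysis-ik-7.10.2.zip
/usr/share/elasticsearch/bin/elasticsearch-plugin install \
  https://github.com/medcl/elasticsearch-analysis-stconvert/releases/download/v7.10.2/elasticsearch-analysis-stconvert-7.10.2.zip
systemctl restart elasticsearch
```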
And then modify Mastodon’s index definition as follows:
diff --git a/app/chewy/accounts_index.rb b/app/chewy/accounts_index.rb
--- a/app/chewy/accounts_index.rb
+++ b/app/chewy/accounts_index.rb
@@ -4,7 +4,7 @@ class AccountsIndex < Chewy::Index
settings index: { refresh_interval: '5m' }, analysis: {
analyzer: {
content: {
- tokenizer: 'whitespace',
+ tokenizer: 'ik_max_word',
filter: %w(lowercase asciifolding cjk_width),
},
diff --git a/app/chewy/statuses_index.rb b/app/chewy/statuses_index.rb
--- a/app/chewy/statuses_index.rb
+++ b/app/chewy/statuses_index.rb
@@ -16,9 +16,17 @@ class StatusesIndex < Chewy::Index
language: 'possessive_english',
},
},
+ char_filter: {
+ tsconvert: {
+ type: 'stconvert',
+ keep_both: false,
+ delimiter: '#',
+ convert_type: 't2s',
+ },
+ },
analyzer: {
content: {
- tokenizer: 'uax_url_email',
+ tokenizer: 'ik_max_word',
filter: %w(
english_possessive_stemmer
lowercase
@@ -27,6 +35,7 @@ class StatusesIndex < Chewy::Index
english_stop
english_stemmer
),
+ char_filter: %w(tsconvert),
},
},
}
diff --git a/app/chewy/tags_index.rb b/app/chewy/tags_index.rb
--- a/app/chewy/tags_index.rb
+++ b/app/chewy/tags_index.rb
@@ -2,10 +2,19 @@
class TagsIndex < Chewy::Index
settings index: { refresh_interval: '15m' }, analysis: {
+ char_filter: {
+ tsconvert: {
+ type: 'stconvert',
+ keep_both: false,
+ delimiter: '#',
+ convert_type: 't2s',
+ },
+ },
analyzer: {
content: {
- tokenizer: 'keyword',
+ tokenizer: 'ik_max_word',
filter: %w(lowercase asciifolding cjk_width),
+ char_filter: %w(tsconvert),
},
edge_ngram: {