discourse/spec/services/word_watcher_spec.rb

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

419 lines
14 KiB
Ruby
Raw Normal View History

# frozen_string_literal: true
RSpec.describe WordWatcher do
FEATURE: Add support for case-sensitive Watched Words (#17445) * FEATURE: Add case-sensitivity flag to watched_words Currently, all watched words are matched case-insensitively. This flag allows a watched word to be flagged for case-sensitive matching. To allow allow for backwards compatibility the flag is set to false by default. * FEATURE: Support case-sensitive creation of Watched Words via API Extend admin creation and upload of Watched Words to support case sensitive flag. This lays the ground work for supporting case-insensitive matching of Watched Words. Support for an extra column has also been introduced for the Watched Words upload CSV file. The new column structure is as follows: word,replacement,case_sentive * FEATURE: Enable case-sensitive matching of Watched Words WordWatcher's word_matcher_regexp now returns a list of regular expressions instead of one case-insensitive regular expression. With the ability to flag a Watched Word as case-sensitive, an action can have words of both sensitivities.This makes the use of the global Regexp::IGNORECASE flag added to all words problematic. To get around platform limitations around the use of subexpression level switches/flags, a list of regular expressions is returned instead, one for each case sensitivity. Word matching has also been updated to use this list of regular expressions instead of one. * FEATURE: Use case-sensitive regular expressions for Watched Words Update Watched Words regular expressions matching and processing to handle the extra metadata which comes along with the introduction of case-sensitive Watched Words. This allows case-sensitive Watched Words to matched as such. * DEV: Simplify type casting of case-sensitive flag from uploads Use builtin semantics instead of a custom method for converting string case flags in uploaded Watched Words to boolean. * UX: Add case-sensitivity details to Admin Watched Words UI Update Watched Word form to include a toggle for case-sensitivity. This also adds support for, case-sensitive testing and matching of Watched Word in the admin UI. * DEV: Code improvements from review feedback - Extract watched word regex creation out to a utility function - Make JS array presence check more explicit and readable * DEV: Extract Watched Word regex creation to utility function Clean-up work from review feedback. Reduce code duplication. * DEV: Rename word_matcher_regexp to word_matcher_regexp_list Since a list is returned now instead of a single regular expression, change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate this change. * DEV: Incorporate WordWatcher updates from upstream Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches that aren't at the beginning of the line.
2022-08-02 16:06:03 +08:00
let(:raw) { <<~RAW.strip }
Do you like liquorice?
I really like them. One could even say that I am *addicted* to liquorice. And if
you can mix it up with some anise, then I'm in heaven ;)
RAW
after { Discourse.redis.flushdb }
FEATURE: Add support for case-sensitive Watched Words (#17445) * FEATURE: Add case-sensitivity flag to watched_words Currently, all watched words are matched case-insensitively. This flag allows a watched word to be flagged for case-sensitive matching. To allow allow for backwards compatibility the flag is set to false by default. * FEATURE: Support case-sensitive creation of Watched Words via API Extend admin creation and upload of Watched Words to support case sensitive flag. This lays the ground work for supporting case-insensitive matching of Watched Words. Support for an extra column has also been introduced for the Watched Words upload CSV file. The new column structure is as follows: word,replacement,case_sentive * FEATURE: Enable case-sensitive matching of Watched Words WordWatcher's word_matcher_regexp now returns a list of regular expressions instead of one case-insensitive regular expression. With the ability to flag a Watched Word as case-sensitive, an action can have words of both sensitivities.This makes the use of the global Regexp::IGNORECASE flag added to all words problematic. To get around platform limitations around the use of subexpression level switches/flags, a list of regular expressions is returned instead, one for each case sensitivity. Word matching has also been updated to use this list of regular expressions instead of one. * FEATURE: Use case-sensitive regular expressions for Watched Words Update Watched Words regular expressions matching and processing to handle the extra metadata which comes along with the introduction of case-sensitive Watched Words. This allows case-sensitive Watched Words to matched as such. * DEV: Simplify type casting of case-sensitive flag from uploads Use builtin semantics instead of a custom method for converting string case flags in uploaded Watched Words to boolean. * UX: Add case-sensitivity details to Admin Watched Words UI Update Watched Word form to include a toggle for case-sensitivity. This also adds support for, case-sensitive testing and matching of Watched Word in the admin UI. * DEV: Code improvements from review feedback - Extract watched word regex creation out to a utility function - Make JS array presence check more explicit and readable * DEV: Extract Watched Word regex creation to utility function Clean-up work from review feedback. Reduce code duplication. * DEV: Rename word_matcher_regexp to word_matcher_regexp_list Since a list is returned now instead of a single regular expression, change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate this change. * DEV: Incorporate WordWatcher updates from upstream Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches that aren't at the beginning of the line.
2022-08-02 16:06:03 +08:00
describe ".words_for_action" do
it "returns words with metadata including case sensitivity flag" do
Fabricate(:watched_word, action: WatchedWord.actions[:censor])
word1 = Fabricate(:watched_word, action: WatchedWord.actions[:block]).word
word2 =
Fabricate(:watched_word, action: WatchedWord.actions[:block], case_sensitive: true).word
expect(described_class.words_for_action(:block)).to include(
word1 => {
case_sensitive: false,
},
word2 => {
case_sensitive: true,
},
)
end
it "returns word with metadata including replacement if word has replacement" do
word =
Fabricate(
:watched_word,
action: WatchedWord.actions[:link],
replacement: "http://test.localhost/",
).word
expect(described_class.words_for_action(:link)).to include(
word => {
case_sensitive: false,
replacement: "http://test.localhost/",
},
)
end
it "returns an empty hash when no words are present" do
expect(described_class.words_for_action(:tag)).to eq({})
end
end
describe ".word_matcher_regexp_list" do
let!(:word1) { Fabricate(:watched_word, action: WatchedWord.actions[:block]).word }
let!(:word2) { Fabricate(:watched_word, action: WatchedWord.actions[:block]).word }
FEATURE: Add support for case-sensitive Watched Words (#17445) * FEATURE: Add case-sensitivity flag to watched_words Currently, all watched words are matched case-insensitively. This flag allows a watched word to be flagged for case-sensitive matching. To allow allow for backwards compatibility the flag is set to false by default. * FEATURE: Support case-sensitive creation of Watched Words via API Extend admin creation and upload of Watched Words to support case sensitive flag. This lays the ground work for supporting case-insensitive matching of Watched Words. Support for an extra column has also been introduced for the Watched Words upload CSV file. The new column structure is as follows: word,replacement,case_sentive * FEATURE: Enable case-sensitive matching of Watched Words WordWatcher's word_matcher_regexp now returns a list of regular expressions instead of one case-insensitive regular expression. With the ability to flag a Watched Word as case-sensitive, an action can have words of both sensitivities.This makes the use of the global Regexp::IGNORECASE flag added to all words problematic. To get around platform limitations around the use of subexpression level switches/flags, a list of regular expressions is returned instead, one for each case sensitivity. Word matching has also been updated to use this list of regular expressions instead of one. * FEATURE: Use case-sensitive regular expressions for Watched Words Update Watched Words regular expressions matching and processing to handle the extra metadata which comes along with the introduction of case-sensitive Watched Words. This allows case-sensitive Watched Words to matched as such. * DEV: Simplify type casting of case-sensitive flag from uploads Use builtin semantics instead of a custom method for converting string case flags in uploaded Watched Words to boolean. * UX: Add case-sensitivity details to Admin Watched Words UI Update Watched Word form to include a toggle for case-sensitivity. This also adds support for, case-sensitive testing and matching of Watched Word in the admin UI. * DEV: Code improvements from review feedback - Extract watched word regex creation out to a utility function - Make JS array presence check more explicit and readable * DEV: Extract Watched Word regex creation to utility function Clean-up work from review feedback. Reduce code duplication. * DEV: Rename word_matcher_regexp to word_matcher_regexp_list Since a list is returned now instead of a single regular expression, change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate this change. * DEV: Incorporate WordWatcher updates from upstream Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches that aren't at the beginning of the line.
2022-08-02 16:06:03 +08:00
let!(:word3) do
Fabricate(:watched_word, action: WatchedWord.actions[:block], case_sensitive: true).word
end
FEATURE: Add support for case-sensitive Watched Words (#17445) * FEATURE: Add case-sensitivity flag to watched_words Currently, all watched words are matched case-insensitively. This flag allows a watched word to be flagged for case-sensitive matching. To allow allow for backwards compatibility the flag is set to false by default. * FEATURE: Support case-sensitive creation of Watched Words via API Extend admin creation and upload of Watched Words to support case sensitive flag. This lays the ground work for supporting case-insensitive matching of Watched Words. Support for an extra column has also been introduced for the Watched Words upload CSV file. The new column structure is as follows: word,replacement,case_sentive * FEATURE: Enable case-sensitive matching of Watched Words WordWatcher's word_matcher_regexp now returns a list of regular expressions instead of one case-insensitive regular expression. With the ability to flag a Watched Word as case-sensitive, an action can have words of both sensitivities.This makes the use of the global Regexp::IGNORECASE flag added to all words problematic. To get around platform limitations around the use of subexpression level switches/flags, a list of regular expressions is returned instead, one for each case sensitivity. Word matching has also been updated to use this list of regular expressions instead of one. * FEATURE: Use case-sensitive regular expressions for Watched Words Update Watched Words regular expressions matching and processing to handle the extra metadata which comes along with the introduction of case-sensitive Watched Words. This allows case-sensitive Watched Words to matched as such. * DEV: Simplify type casting of case-sensitive flag from uploads Use builtin semantics instead of a custom method for converting string case flags in uploaded Watched Words to boolean. * UX: Add case-sensitivity details to Admin Watched Words UI Update Watched Word form to include a toggle for case-sensitivity. This also adds support for, case-sensitive testing and matching of Watched Word in the admin UI. * DEV: Code improvements from review feedback - Extract watched word regex creation out to a utility function - Make JS array presence check more explicit and readable * DEV: Extract Watched Word regex creation to utility function Clean-up work from review feedback. Reduce code duplication. * DEV: Rename word_matcher_regexp to word_matcher_regexp_list Since a list is returned now instead of a single regular expression, change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate this change. * DEV: Incorporate WordWatcher updates from upstream Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches that aren't at the beginning of the line.
2022-08-02 16:06:03 +08:00
let!(:word4) do
Fabricate(:watched_word, action: WatchedWord.actions[:block], case_sensitive: true).word
end
context "when watched_words_regular_expressions = true" do
it "returns the proper regexp" do
SiteSetting.watched_words_regular_expressions = true
FEATURE: Add support for case-sensitive Watched Words (#17445) * FEATURE: Add case-sensitivity flag to watched_words Currently, all watched words are matched case-insensitively. This flag allows a watched word to be flagged for case-sensitive matching. To allow allow for backwards compatibility the flag is set to false by default. * FEATURE: Support case-sensitive creation of Watched Words via API Extend admin creation and upload of Watched Words to support case sensitive flag. This lays the ground work for supporting case-insensitive matching of Watched Words. Support for an extra column has also been introduced for the Watched Words upload CSV file. The new column structure is as follows: word,replacement,case_sentive * FEATURE: Enable case-sensitive matching of Watched Words WordWatcher's word_matcher_regexp now returns a list of regular expressions instead of one case-insensitive regular expression. With the ability to flag a Watched Word as case-sensitive, an action can have words of both sensitivities.This makes the use of the global Regexp::IGNORECASE flag added to all words problematic. To get around platform limitations around the use of subexpression level switches/flags, a list of regular expressions is returned instead, one for each case sensitivity. Word matching has also been updated to use this list of regular expressions instead of one. * FEATURE: Use case-sensitive regular expressions for Watched Words Update Watched Words regular expressions matching and processing to handle the extra metadata which comes along with the introduction of case-sensitive Watched Words. This allows case-sensitive Watched Words to matched as such. * DEV: Simplify type casting of case-sensitive flag from uploads Use builtin semantics instead of a custom method for converting string case flags in uploaded Watched Words to boolean. * UX: Add case-sensitivity details to Admin Watched Words UI Update Watched Word form to include a toggle for case-sensitivity. This also adds support for, case-sensitive testing and matching of Watched Word in the admin UI. * DEV: Code improvements from review feedback - Extract watched word regex creation out to a utility function - Make JS array presence check more explicit and readable * DEV: Extract Watched Word regex creation to utility function Clean-up work from review feedback. Reduce code duplication. * DEV: Rename word_matcher_regexp to word_matcher_regexp_list Since a list is returned now instead of a single regular expression, change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate this change. * DEV: Incorporate WordWatcher updates from upstream Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches that aren't at the beginning of the line.
2022-08-02 16:06:03 +08:00
regexps = described_class.word_matcher_regexp_list(:block)
expect(regexps).to be_an(Array)
expect(regexps.map(&:inspect)).to contain_exactly(
"/(#{word1})|(#{word2})/i",
"/(#{word3})|(#{word4})/",
)
end
end
context "when watched_words_regular_expressions = false" do
it "returns the proper regexp" do
SiteSetting.watched_words_regular_expressions = false
FEATURE: Add support for case-sensitive Watched Words (#17445) * FEATURE: Add case-sensitivity flag to watched_words Currently, all watched words are matched case-insensitively. This flag allows a watched word to be flagged for case-sensitive matching. To allow allow for backwards compatibility the flag is set to false by default. * FEATURE: Support case-sensitive creation of Watched Words via API Extend admin creation and upload of Watched Words to support case sensitive flag. This lays the ground work for supporting case-insensitive matching of Watched Words. Support for an extra column has also been introduced for the Watched Words upload CSV file. The new column structure is as follows: word,replacement,case_sentive * FEATURE: Enable case-sensitive matching of Watched Words WordWatcher's word_matcher_regexp now returns a list of regular expressions instead of one case-insensitive regular expression. With the ability to flag a Watched Word as case-sensitive, an action can have words of both sensitivities.This makes the use of the global Regexp::IGNORECASE flag added to all words problematic. To get around platform limitations around the use of subexpression level switches/flags, a list of regular expressions is returned instead, one for each case sensitivity. Word matching has also been updated to use this list of regular expressions instead of one. * FEATURE: Use case-sensitive regular expressions for Watched Words Update Watched Words regular expressions matching and processing to handle the extra metadata which comes along with the introduction of case-sensitive Watched Words. This allows case-sensitive Watched Words to matched as such. * DEV: Simplify type casting of case-sensitive flag from uploads Use builtin semantics instead of a custom method for converting string case flags in uploaded Watched Words to boolean. * UX: Add case-sensitivity details to Admin Watched Words UI Update Watched Word form to include a toggle for case-sensitivity. This also adds support for, case-sensitive testing and matching of Watched Word in the admin UI. * DEV: Code improvements from review feedback - Extract watched word regex creation out to a utility function - Make JS array presence check more explicit and readable * DEV: Extract Watched Word regex creation to utility function Clean-up work from review feedback. Reduce code duplication. * DEV: Rename word_matcher_regexp to word_matcher_regexp_list Since a list is returned now instead of a single regular expression, change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate this change. * DEV: Incorporate WordWatcher updates from upstream Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches that aren't at the beginning of the line.
2022-08-02 16:06:03 +08:00
regexps = described_class.word_matcher_regexp_list(:block)
expect(regexps).to be_an(Array)
expect(regexps.map(&:inspect)).to contain_exactly(
"/(?:\\W|^)(#{word1}|#{word2})(?=\\W|$)/i",
"/(?:\\W|^)(#{word3}|#{word4})(?=\\W|$)/",
)
end
it "is empty for an action without watched words" do
regexps = described_class.word_matcher_regexp_list(:censor)
expect(regexps).to be_an(Array)
expect(regexps).to be_empty
end
end
context "when regular expression is invalid" do
before do
SiteSetting.watched_words_regular_expressions = true
Fabricate(:watched_word, word: "Test[\S*", action: WatchedWord.actions[:block])
end
it "does not raise an exception by default" do
expect { described_class.word_matcher_regexp_list(:block) }.not_to raise_error
end
it "raises an exception with raise_errors set to true" do
expect {
described_class.word_matcher_regexp_list(:block, raise_errors: true)
}.to raise_error(RegexpError)
end
end
end
describe "#word_matches_for_action?" do
it "is falsey when there are no watched words" do
expect(described_class.new(raw).word_matches_for_action?(:require_approval)).to be_falsey
end
context "with watched words" do
fab!(:anise) do
Fabricate(:watched_word, word: "anise", action: WatchedWord.actions[:require_approval])
end
it "is falsey without a match" do
expect(
described_class.new("No liquorice for me, thanks...").word_matches_for_action?(
:require_approval,
),
).to be_falsey
end
it "is returns matched words if there's a match" do
matches = described_class.new(raw).word_matches_for_action?(:require_approval)
expect(matches).to be_truthy
expect(matches[1]).to eq(anise.word)
end
it "finds at start of string" do
matches =
described_class.new("#{anise.word} is garbage").word_matches_for_action?(
:require_approval,
)
expect(matches[1]).to eq(anise.word)
end
it "finds at end of string" do
matches =
described_class.new("who likes #{anise.word}").word_matches_for_action?(:require_approval)
expect(matches[1]).to eq(anise.word)
end
it "finds non-letters in place of letters" do
Fabricate(:watched_word, word: "co(onut", action: WatchedWord.actions[:require_approval])
matches =
described_class.new("This co(onut is delicious.").word_matches_for_action?(
:require_approval,
)
expect(matches[1]).to eq("co(onut")
end
it "handles * for wildcards" do
Fabricate(:watched_word, word: "a**le*", action: WatchedWord.actions[:require_approval])
matches =
described_class.new("I acknowledge you.").word_matches_for_action?(:require_approval)
expect(matches[1]).to eq("acknowledge")
end
it "handles word boundary" do
Fabricate(:watched_word, word: "love", action: WatchedWord.actions[:require_approval])
expect(
described_class.new("I Love, bananas.").word_matches_for_action?(:require_approval)[1],
).to eq("Love")
expect(
described_class.new("I LOVE; apples.").word_matches_for_action?(:require_approval)[1],
).to eq("LOVE")
expect(
described_class.new("love: is a thing.").word_matches_for_action?(:require_approval)[1],
).to eq("love")
expect(
described_class.new("I love. oranges").word_matches_for_action?(:require_approval)[1],
).to eq("love")
expect(
described_class.new("I :love. pineapples").word_matches_for_action?(:require_approval)[1],
).to eq("love")
expect(
described_class.new("peace ,love and understanding.").word_matches_for_action?(
:require_approval,
)[
1
],
).to eq("love")
end
context "when there are multiple matches" do
context "with non regexp words" do
it "lists all matching words" do
%w[bananas hate hates].each do |word|
Fabricate(:watched_word, word: word, action: WatchedWord.actions[:block])
end
matches =
described_class.new("I hate bananas").word_matches_for_action?(
:block,
all_matches: true,
)
expect(matches).to contain_exactly("hate", "bananas")
matches =
described_class.new("She hates bananas too").word_matches_for_action?(
:block,
all_matches: true,
)
expect(matches).to contain_exactly("hates", "bananas")
end
end
context "with regexp words" do
before { SiteSetting.watched_words_regular_expressions = true }
it "lists all matching patterns" do
Fabricate(:watched_word, word: "(pine)?apples", action: WatchedWord.actions[:block])
Fabricate(
:watched_word,
word: "((move|store)(d)?)|((watch|listen)(ed|ing)?)",
action: WatchedWord.actions[:block],
)
matches =
described_class.new("pine pineapples apples").word_matches_for_action?(
:block,
all_matches: true,
)
expect(matches).to contain_exactly("pineapples", "apples")
matches =
described_class.new(
"go watched watch ed ing move d moveed moved moving",
).word_matches_for_action?(:block, all_matches: true)
expect(matches).to contain_exactly(*%w[watched watch move moved])
end
end
end
context "when word is an emoji" do
it "handles emoji" do
Fabricate(:watched_word, word: ":joy:", action: WatchedWord.actions[:require_approval])
matches =
described_class.new("Lots of emojis here :joy:").word_matches_for_action?(
:require_approval,
)
expect(matches[1]).to eq(":joy:")
end
it "handles unicode emoji" do
Fabricate(:watched_word, word: "🎃", action: WatchedWord.actions[:require_approval])
matches =
described_class.new("Halloween party! 🎃").word_matches_for_action?(:require_approval)
expect(matches[1]).to eq("🎃")
end
it "handles emoji skin tone" do
Fabricate(
:watched_word,
word: ":woman:t5:",
action: WatchedWord.actions[:require_approval],
)
matches =
described_class.new("To Infinity and beyond! 🚀 :woman:t5:").word_matches_for_action?(
:require_approval,
)
expect(matches[1]).to eq(":woman:t5:")
end
end
context "when word is a regular expression" do
before { SiteSetting.watched_words_regular_expressions = true }
it "supports regular expressions on word boundaries" do
Fabricate(:watched_word, word: /\btest\b/, action: WatchedWord.actions[:block])
matches = described_class.new("this is not a test.").word_matches_for_action?(:block)
expect(matches[0]).to eq("test")
end
it "supports regular expressions as a site setting" do
Fabricate(
:watched_word,
word: /tro[uo]+t/,
action: WatchedWord.actions[:require_approval],
)
matches =
described_class.new("Evil Trout is cool").word_matches_for_action?(:require_approval)
expect(matches[0]).to eq("Trout")
matches =
described_class.new("Evil Troot is cool").word_matches_for_action?(:require_approval)
expect(matches[0]).to eq("Troot")
matches = described_class.new("trooooooooot").word_matches_for_action?(:require_approval)
expect(matches[0]).to eq("trooooooooot")
end
it "support uppercase" do
Fabricate(:watched_word, word: /a\S+ce/, action: WatchedWord.actions[:require_approval])
matches = described_class.new("Amazing place").word_matches_for_action?(:require_approval)
expect(matches).to be_nil
matches =
described_class.new("Amazing applesauce").word_matches_for_action?(:require_approval)
expect(matches[0]).to eq("applesauce")
matches =
described_class.new("Amazing AppleSauce").word_matches_for_action?(:require_approval)
expect(matches[0]).to eq("AppleSauce")
end
end
FEATURE: Add support for case-sensitive Watched Words (#17445) * FEATURE: Add case-sensitivity flag to watched_words Currently, all watched words are matched case-insensitively. This flag allows a watched word to be flagged for case-sensitive matching. To allow allow for backwards compatibility the flag is set to false by default. * FEATURE: Support case-sensitive creation of Watched Words via API Extend admin creation and upload of Watched Words to support case sensitive flag. This lays the ground work for supporting case-insensitive matching of Watched Words. Support for an extra column has also been introduced for the Watched Words upload CSV file. The new column structure is as follows: word,replacement,case_sentive * FEATURE: Enable case-sensitive matching of Watched Words WordWatcher's word_matcher_regexp now returns a list of regular expressions instead of one case-insensitive regular expression. With the ability to flag a Watched Word as case-sensitive, an action can have words of both sensitivities.This makes the use of the global Regexp::IGNORECASE flag added to all words problematic. To get around platform limitations around the use of subexpression level switches/flags, a list of regular expressions is returned instead, one for each case sensitivity. Word matching has also been updated to use this list of regular expressions instead of one. * FEATURE: Use case-sensitive regular expressions for Watched Words Update Watched Words regular expressions matching and processing to handle the extra metadata which comes along with the introduction of case-sensitive Watched Words. This allows case-sensitive Watched Words to matched as such. * DEV: Simplify type casting of case-sensitive flag from uploads Use builtin semantics instead of a custom method for converting string case flags in uploaded Watched Words to boolean. * UX: Add case-sensitivity details to Admin Watched Words UI Update Watched Word form to include a toggle for case-sensitivity. This also adds support for, case-sensitive testing and matching of Watched Word in the admin UI. * DEV: Code improvements from review feedback - Extract watched word regex creation out to a utility function - Make JS array presence check more explicit and readable * DEV: Extract Watched Word regex creation to utility function Clean-up work from review feedback. Reduce code duplication. * DEV: Rename word_matcher_regexp to word_matcher_regexp_list Since a list is returned now instead of a single regular expression, change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate this change. * DEV: Incorporate WordWatcher updates from upstream Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches that aren't at the beginning of the line.
2022-08-02 16:06:03 +08:00
context "when case sensitive words are present" do
before do
Fabricate(
:watched_word,
word: "Discourse",
action: WatchedWord.actions[:block],
case_sensitive: true,
)
end
context "when watched_words_regular_expressions = true" do
it "respects case sensitivity flag in matching words" do
SiteSetting.watched_words_regular_expressions = true
Fabricate(:watched_word, word: "p(rivate|ublic)", action: WatchedWord.actions[:block])
matches =
described_class.new(
"PUBLIC: Discourse is great for public discourse",
).word_matches_for_action?(:block, all_matches: true)
expect(matches).to contain_exactly("PUBLIC", "Discourse", "public")
end
end
context "when watched_words_regular_expressions = false" do
it "repects case sensitivity flag in matching" do
SiteSetting.watched_words_regular_expressions = false
Fabricate(:watched_word, word: "private", action: WatchedWord.actions[:block])
matches =
described_class.new(
"PRIVATE: Discourse is also great private discourse",
).word_matches_for_action?(:block, all_matches: true)
expect(matches).to contain_exactly("PRIVATE", "Discourse", "private")
end
end
end
end
end
describe ".apply_to_text" do
fab!(:censored_word) do
Fabricate(:watched_word, word: "censored", action: WatchedWord.actions[:censor])
end
fab!(:replaced_word) do
Fabricate(
:watched_word,
word: "to replace",
replacement: "replaced",
action: WatchedWord.actions[:replace],
)
end
fab!(:link_word) do
Fabricate(
:watched_word,
word: "https://notdiscourse.org",
replacement: "https://discourse.org",
action: WatchedWord.actions[:link],
)
end
it "replaces all types of words" do
text = "hello censored world to replace https://notdiscourse.org"
expected =
"hello #{described_class::REPLACEMENT_LETTER * 8} world replaced https://discourse.org"
expect(described_class.apply_to_text(text)).to eq(expected)
end
FEATURE: Add support for case-sensitive Watched Words (#17445) * FEATURE: Add case-sensitivity flag to watched_words Currently, all watched words are matched case-insensitively. This flag allows a watched word to be flagged for case-sensitive matching. To allow allow for backwards compatibility the flag is set to false by default. * FEATURE: Support case-sensitive creation of Watched Words via API Extend admin creation and upload of Watched Words to support case sensitive flag. This lays the ground work for supporting case-insensitive matching of Watched Words. Support for an extra column has also been introduced for the Watched Words upload CSV file. The new column structure is as follows: word,replacement,case_sentive * FEATURE: Enable case-sensitive matching of Watched Words WordWatcher's word_matcher_regexp now returns a list of regular expressions instead of one case-insensitive regular expression. With the ability to flag a Watched Word as case-sensitive, an action can have words of both sensitivities.This makes the use of the global Regexp::IGNORECASE flag added to all words problematic. To get around platform limitations around the use of subexpression level switches/flags, a list of regular expressions is returned instead, one for each case sensitivity. Word matching has also been updated to use this list of regular expressions instead of one. * FEATURE: Use case-sensitive regular expressions for Watched Words Update Watched Words regular expressions matching and processing to handle the extra metadata which comes along with the introduction of case-sensitive Watched Words. This allows case-sensitive Watched Words to matched as such. * DEV: Simplify type casting of case-sensitive flag from uploads Use builtin semantics instead of a custom method for converting string case flags in uploaded Watched Words to boolean. * UX: Add case-sensitivity details to Admin Watched Words UI Update Watched Word form to include a toggle for case-sensitivity. This also adds support for, case-sensitive testing and matching of Watched Word in the admin UI. * DEV: Code improvements from review feedback - Extract watched word regex creation out to a utility function - Make JS array presence check more explicit and readable * DEV: Extract Watched Word regex creation to utility function Clean-up work from review feedback. Reduce code duplication. * DEV: Rename word_matcher_regexp to word_matcher_regexp_list Since a list is returned now instead of a single regular expression, change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate this change. * DEV: Incorporate WordWatcher updates from upstream Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches that aren't at the beginning of the line.
2022-08-02 16:06:03 +08:00
context "when watched_words_regular_expressions = true" do
it "replaces captured non-word prefix" do
SiteSetting.watched_words_regular_expressions = true
Fabricate(
:watched_word,
word: "\\Wplaceholder",
replacement: "replacement",
action: WatchedWord.actions[:replace],
)
text = "is \tplaceholder in https://notdiscourse.org"
expected = "is replacement in https://discourse.org"
expect(described_class.apply_to_text(text)).to eq(expected)
end
end
context "when watched_words_regular_expressions = false" do
it "maintains non-word character prefix" do
SiteSetting.watched_words_regular_expressions = false
text = "to replace and\thttps://notdiscourse.org"
expected = "replaced and\thttps://discourse.org"
expect(described_class.apply_to_text(text)).to eq(expected)
end
end
end
end