Commit Graph

37 Commits

Author SHA1 Message Date
Régis Hanol
73ce3589ad
PERF: improves TextSentinel's seems_unpretentious check (#28044)
by scanning the text for the first word that is bigger than `max_word_length` instead of extracting (segmenting) all the words, computing their size, and comparing the maximum with `max_word_length`.

Idea from @mentalstring in https://meta.discourse.org/t/body-seems-unclear-error-when-users-are-typing-in-chinese/88715/14
2024-07-23 17:12:29 +02:00
Régis Hanol
23aa88d203
FIX: Allow all caps within CJK text (#28018)
This improves the `TextSentinel` so that we don't consider CJK text as being uppercase and thus failing the validator.

It also optimizes the entropy computation by using native ruby `.bytes` to get all the bytes from the text.

It also tweaks the `seems_pronounceable?` and `seems_unpretentious?` check to use the `\p{Alnum}` unicode regexp group to account for non-latin languages.

Reference - https://meta.discourse.org/t/body-seems-unclear-error-when-users-are-typing-in-chinese/88715

Inspired by https://github.com/discourse/discourse/pull/27900

Co-authored-by: Paulo Magalhaes <mentalstring@gmail.com>
2024-07-22 17:35:52 +02:00
David Taylor
6417173082
DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
Josh Soref
59097b207f
DEV: Correct typos and spelling mistakes (#12812)
Over the years we accrued many spelling mistakes in the code base. 

This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change 

- comments
- test descriptions
- other low risk areas
2021-05-21 11:43:47 +10:00
Joffrey JAFFEUX
d16a39dc53
FIX: prevents exception when text input is nil (#12922)
nil was converted to "" and the matching regex would return [] and then be converted to nil with max usage.

Example exception:

```
NoMethodError (undefined method `<=' for nil:NilClass)

lib/text_sentinel.rb:71:in `seems_unpretentious?'
lib/validators/quality_title_validator.rb:13:in `validate_each'
lib/topic_creator.rb:25:in `valid?'
```
2021-05-03 09:21:35 +02:00
Bianca Nenciu
a48f7ba61c
FEATURE: Improve errors when title is invalid (#11149)
It used to simply say "title is invalid" without giving any hint what
the problem could be. This commit adds different errors messages for
all caps titles, low entropy titles or titles with very long words.
2020-11-11 15:11:36 +02:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Erick Guan
9dd325f805 FIX: skip some checks for CJK locale in TextSentinel (#7322) 2019-04-05 15:07:49 +02:00
Arpit Jalan
25ec077eca rename 'min_private_message_{post/title}_length' to 'min_personal_message_{post/title}_length' 2018-02-01 13:25:29 +05:30
Guo Xiang Tan
5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Arpit Jalan
e46204d195 FIX: allow long words if they contain periods 2016-09-13 09:15:05 +05:30
Rafael dos Santos Silva
5915929166 FIX: Unicode aware text sentinel (#4301)
* FIX: Handle unicode text on Text Sentinel

Uses active_support to properly handle unicode text

* Adds test cases to unicode Text Sentinel
2016-07-12 11:08:55 -04:00
Jeff Wong
cf86f27415 FEATURE: New setting to allow all caps posts
Adds a setting to ignore text_sentinel's check on all caps content.
2015-11-18 09:50:50 -08:00
Robin Ward
4ab9ef3497 FIX: Allow long words if they contain periods 2015-05-19 13:10:25 -04:00
Sam
1f59375c82 rename max_word_length to title_max_word_length 2015-04-02 16:46:53 +11:00
Nathan Nontell
d95172cb5d Allow TextSentinel#seems_unpretentious? to accept words joined with dashes or forward slashes. (Issue 1133) 2013-09-16 09:45:57 -04:00
Régis Hanol
1a38f66c7e removed warning about already existing constants 2013-08-30 17:28:18 +02:00
Neil Lalonde
f611a5d898 If min entropy setting is greater than min post/body length setting, then use a sensible min entropy value instead 2013-08-28 11:04:28 -04:00
Robin Ward
e6cd835928 Merge pull request #1040 from vipulnsward/remove_and
Remove unnecessary anding with true
2013-06-18 06:37:33 -07:00
Vipul A M
3167a152b2 Remove unnecessary anding with true 2013-06-18 11:19:10 +05:30
Neil Lalonde
876a570e3a Fix to prevent check for all upper case for non-ascii messages 2013-06-17 12:22:50 -04:00
Sam
f7de9f17d5 refactor validators
add a new setting for min pm body length
use that setting for flags
scale entropy check down for pms
2013-06-13 18:18:43 +10:00
Michael Brown
bb77d2c38b More entropy for foreign titles
* Treat strings with non-ASCII characters as having more entropy
2013-06-07 14:47:07 -04:00
Robin Ward
7d763a6f0c Bad merge. Oddly not caught by autospec. 2013-05-27 10:56:55 -04:00
Robin Ward
e1781240a6 Merge branch 'refactoring' of git://github.com/mattvanhorn/discourse
Conflicts:
	lib/text_sentinel.rb
2013-05-27 10:42:20 -04:00
Chris Hunt
13c4266c74 Allow Chinese characters in Topic titles 2013-05-26 13:56:42 -07:00
Matt Van Horn
c9fcee8490 simplify, clarify TextSentinel
codeclimate pointed this out. I agree it is better
to simplify and reveal intentions.
2013-05-24 13:36:33 -07:00
Matt Van Horn
806255b3c4 refactor Topic validation
introduce a couple of custom validators
fix minor discrepancies in tests
copy I18n error message keys to default location
clean up validation invocation
move some responsibilities out of validator into class
2013-05-22 22:31:52 -07:00
Régis Hanol
c5cf8be864 auto replace rules in titles 2013-04-10 11:00:50 +02:00
Régis Hanol
239cbd2d58 enforce coding convention
replaced every `and` by `&&` and every `or` by `||`
2013-03-05 01:42:44 +01:00
Gosha Arinich
cafc75b238 remove trailing whitespaces ❤️ 2013-02-26 07:31:35 +03:00
Jeremy Banks
eb2a5e4654 Merge branch 'origin/master'
Conflicts:
	lib/text_sentinel.rb
2013-02-18 21:41:20 -05:00
Jeremy Banks
6af69f7e77 Do not strip leading and trailing whitespace from raw posts. 2013-02-15 20:58:33 -05:00
Alexander
6c4ae05454 Removes iconv dependency
Fixes #100
2013-02-15 13:36:19 -08:00
Robin Ward
d740d7b25f Fix for foreign language titles: Only enforce upper case rule on english alphabet. 2013-02-14 16:09:57 -05:00
Robin Ward
12d3c3b66b Enforce entropy on flag text 2013-02-08 17:01:43 -05:00
Robin Ward
40da901e5d Introduction of TextSentinel to enforce title and body quality. 2013-02-06 20:53:34 -05:00