gitea/modules
Bruno Sofiato f64fbd9b74
Updated tokenizer to better matching when search for code snippets (#32261)
This PR improves the accuracy of Gitea's code search. 

Currently, Gitea does not count a statement such as
`console.log("hello")` as a hit when the user searches for `log`. The
culprit is how both ES and Bleve tokenize the file contents: in both
cases, `console.log` is indexed as a single token.

For ES, we changed the tokenizer to
[simple_pattern_split](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simplepatternsplit-tokenizer.html#:~:text=The%20simple_pattern_split%20tokenizer%20uses%20a,the%20tokenization%20is%20generally%20faster.),
so that tokens are words formed by digits and letters. For Bleve, it
employs a [letter](https://blevesearch.com/docs/Tokenizers/) tokenizer.

Resolves #32220

---------

Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
2024-11-06 20:51:20 +00:00
..
actions Fix wrong status of Set up Job when first step is skipped (#32120) 2024-09-24 18:34:08 +00:00
activitypub Remove SHA1 for support for ssh rsa signing (#31857) 2024-09-07 18:05:18 -04:00
analyze Rename code_langauge.go to code_language.go (#26377) 2023-08-07 15:00:53 -04:00
assetfs Use Set[Type] instead of map[Type]bool/struct{}. (#26804) 2023-08-30 06:55:25 +00:00
auth Add Passkey login support (#31504) 2024-06-29 22:50:03 +00:00
avatar Use crypto/sha256 (#29386) 2024-02-25 13:32:13 +00:00
badge Implement actions badge svgs (#28102) 2024-02-27 18:56:18 +01:00
base fix OIDC introspection authentication (#31632) 2024-07-23 12:43:03 +00:00
cache bump to go 1.23 (#31855) 2024-09-10 02:23:07 +00:00
charset refactor: remove redundant err declarations (#32381) 2024-10-30 19:36:24 +00:00
container Allow disabling authentication related user features (#31535) 2024-07-09 17:36:31 +00:00
csv Render embedded code preview by permlink in markdown (#30234) 2024-04-02 17:48:27 +00:00
dump Refactor "dump" sub-command (#30240) 2024-04-03 02:16:46 +00:00
emoji Update emoji set to Unicode 15 (#25595) 2023-06-29 16:29:48 +00:00
eventsource Final round of db.DefaultContext refactor (#27587) 2023-10-14 08:37:24 +00:00
generate Refactor JWT secret generating & decoding code (#29172) 2024-02-16 15:18:30 +00:00
git Fix git error handling (#32401) 2024-11-02 11:20:22 +00:00
gitgraph Fix milestone deadline and date related problems (#32339) 2024-11-05 07:46:40 +00:00
gitrepo Refactor markup package (#32399) 2024-11-04 10:59:50 +00:00
globallock Use global lock instead of NewExclusivePool to allow distributed lock between multiple Gitea instances (#31813) 2024-09-06 10:12:41 +00:00
graceful Remove unused error in graceful manager (#29871) 2024-03-18 21:14:51 +00:00
hcaptcha Consume hcaptcha and pwn deps (#22610) 2023-01-29 09:49:51 -06:00
highlight Add option to disable ambiguous unicode characters detection (#28454) 2023-12-17 14:38:54 +00:00
hostmatcher Support allowed hosts for migrations to work with proxy (#32025) 2024-09-11 05:47:00 +00:00
html Refactor backend SVG package and add tests (#26335) 2023-08-05 04:34:59 +00:00
httpcache Fix wrong last modify time (#32102) 2024-09-21 21:56:25 +00:00
httplib Fix wrong last modify time (#32102) 2024-09-21 21:56:25 +00:00
indexer Updated tokenizer to better matching when search for code snippets (#32261) 2024-11-06 20:51:20 +00:00
issue/template bump to go 1.23 (#31855) 2024-09-10 02:23:07 +00:00
json Replace interface{} with any (#25686) 2023-07-04 18:36:08 +00:00
label Make label templates have consistent behavior and priority (#23749) 2023-04-10 16:44:02 +08:00
lfs Use 8 as default value for git lfs concurrency (#32421) 2024-11-05 13:10:57 +00:00
lfstransfer Add pure SSH LFS support (#31516) 2024-09-27 10:27:37 -04:00
log Refactor markup package (#32399) 2024-11-04 10:59:50 +00:00
markup Refactor markup package (#32399) 2024-11-04 10:59:50 +00:00
mcaptcha Implement FSFE REUSE for golang files (#21840) 2022-11-27 18:20:29 +00:00
metrics Rename project board -> column to make the UI less confusing (#30170) 2024-05-27 08:59:54 +00:00
migration Support migrating GitHub/GitLab PR draft status (#32242) 2024-10-13 22:58:13 +03:00
nosql Update tool dependencies, lock govulncheck and actionlint (#25655) 2023-07-09 11:58:06 +00:00
optional Resolve lint for unused parameter and unnecessary type arguments (#30750) 2024-04-29 08:47:56 +00:00
options Use a general approach to access custom/static/builtin assets (#24022) 2023-04-12 18:16:45 +08:00
packages Refactor markup package (#32399) 2024-11-04 10:59:50 +00:00
paginator Use more specific test methods (#24265) 2023-04-22 17:56:27 -04:00
pprof Implement FSFE REUSE for golang files (#21840) 2022-11-27 18:20:29 +00:00
private Make git push options accept short name (#32245) 2024-10-12 05:42:10 +00:00
process Update misspell to 0.5.1 and add misspellings.csv (#30573) 2024-04-27 08:03:49 +00:00
proxy Use proxy for pull mirror (#22771) 2023-02-11 08:39:50 +08:00
proxyprotocol Implement FSFE REUSE for golang files (#21840) 2022-11-27 18:20:29 +00:00
public Refactor CORS handler (#28587) 2023-12-25 20:13:18 +08:00
queue bump to go 1.23 (#31855) 2024-09-10 02:23:07 +00:00
recaptcha Implement FSFE REUSE for golang files (#21840) 2022-11-27 18:20:29 +00:00
references Refactor to use UnsafeStringToBytes (#31358) 2024-06-14 01:26:33 +00:00
regexplru Upgrade go dependencies (#25819) 2023-07-14 11:00:31 +08:00
repository Make LFS http_client parallel within a batch. (#32369) 2024-11-04 04:49:08 +00:00
secret Use crypto/sha256 (#29386) 2024-02-25 13:32:13 +00:00
session Improve oauth2 client "preferred username field" logic and the error handling (#30622) 2024-04-25 11:22:32 +00:00
setting Use 8 as default value for git lfs concurrency (#32421) 2024-11-05 13:10:57 +00:00
sitemap Fix sitemap (#22272) 2022-12-30 23:31:00 +08:00
ssh Remove SSH workaround (#27893) 2023-11-03 15:21:05 +00:00
storage Add artifacts test fixture (#30300) 2024-11-01 10:29:54 +08:00
structs Make admins adhere to branch protection rules (#32248) 2024-10-23 12:39:43 +08:00
svg Refactor markdown attention render (#29984) 2024-03-22 12:16:23 +00:00
sync Use global lock instead of NewExclusivePool to allow distributed lock between multiple Gitea instances (#31813) 2024-09-06 10:12:41 +00:00
system Refactor to use UnsafeStringToBytes (#31358) 2024-06-14 01:26:33 +00:00
templates Fix milestone deadline and date related problems (#32339) 2024-11-05 07:46:40 +00:00
test Remove sub-path from container registry realm (#31293) 2024-06-09 16:29:29 +08:00
testlogger Refactor tests to prevent from unnecessary preparations (#32398) 2024-11-01 23:18:29 +08:00
timeutil Refactor DateUtils and merge TimeSince (#32409) 2024-11-04 11:30:00 +00:00
translation Render embedded code preview by permlink in markdown (#30234) 2024-04-02 17:48:27 +00:00
turnstile Add new captcha: cloudflare turnstile (#22369) 2023-02-05 15:29:03 +08:00
typesniffer Detect ogg mime-type as audio or video (#26494) 2023-08-15 10:31:25 +08:00
updatechecker Replace more db.DefaultContext (#27628) 2023-10-15 17:46:06 +02:00
uri Implement FSFE REUSE for golang files (#21840) 2022-11-27 18:20:29 +00:00
user Implement FSFE REUSE for golang files (#21840) 2022-11-27 18:20:29 +00:00
util Refactor markup package (#32399) 2024-11-04 10:59:50 +00:00
validation Check blocklist for emails when adding them to account (#26812) 2023-08-30 10:46:49 -05:00
web Refactor names (#31405) 2024-06-19 06:32:45 +08:00
webhook Fix schedule tasks bugs (#28691) 2024-01-12 21:50:38 +00:00
zstd Support compression for Actions logs (#31761) 2024-08-09 10:10:30 +08:00