27 Commits

Author SHA1 Message Date
Régis Hanol
0559a4736a FIX: don't double request when downloading a file 2018-02-24 12:35:57 +01:00
Gerhard Schlager
b6277e208b FIX: Cookies header didn't have the right format 2018-02-19 12:46:57 +01:00
Sam
fa5880e04f PERF: ability to crawl for titles without extra HEAD req
Also, introduces a much more aggressive timeout for title crawling
and introduces gzip to body that is crawled
2018-01-29 15:40:12 +11:00
Sam
1dd2b51059 remove redundant stubs 2017-10-18 12:10:30 +11:00
Sam Saffron
8185b8cb06 FEATURE: cache https redirects per hostname
If a hostname does an https redirect we cache that so next
lookup does not incur it.

Also, only rate limit per ip once per final destination

Raise final destination protection to 1000 ip lookups an hour
2017-10-17 16:22:54 +11:00
Sam
70bb2aa426 FEATURE: allow specifying s3 config via globals
This refactors handling of s3 so it can be specified via GlobalSetting

This means that in a multisite environment you can configure s3 uploads
without actual sites knowing credentials in s3

It is a critical setting for situations where assets are mirrored to s3.
2017-10-06 16:20:01 +11:00
Guo Xiang Tan
5324c01209 FIX: Don't raise an error if reading from URL times out. 2017-09-27 14:53:22 +08:00
Guo Xiang Tan
367fb1c524 FIX: Onebox fails on encoded URL.
https://meta.discourse.org/t/onebox-breaks-if-theres-chinese-text-in-url/67364
2017-09-26 18:34:54 +08:00
Joffrey JAFFEUX
6cd8203686 FIX: allows onebox to force GET hosts returning wrong headers on HEAD 2017-08-08 11:44:27 +02:00
Arpit Jalan
b059a0f789 extract url escaping to a dedicated class method and improved tests 2017-07-29 22:16:51 +05:30
Arpit Jalan
1fe553873c FIX: preserve fragment identifier when escaping url 2017-07-29 17:22:45 +05:30
Guo Xiang Tan
b534778f46 FIX: Escape URL before attempting to resolve it. 2017-07-18 10:04:24 +09:00
Robin Ward
db485ae0da FIX: Support for skipping redirects on certain domains (like steam) 2017-06-26 15:38:43 -04:00
Robin Ward
009f0921dc FEATURE: Whitelist hosts for internal crawling 2017-06-13 12:59:54 -04:00
Robin Ward
a3729b51eb FIX: Always allow the host the forum is hosted on 2017-06-12 13:22:51 -04:00
Robin Ward
53b95f009f FIX: If HEAD is not supported, try GET. Also set cookies 2017-06-06 13:53:49 -04:00
Guo Xiang Tan
56f98de7b2 Use webmock to stub external web requests. 2017-05-26 15:19:09 +08:00
Guo Xiang Tan
f8f1548fd4 Revert "FIX: Use Excon to do its own stubbing"
This reverts commit 80af54460a.
2017-05-26 13:04:25 +08:00
Robin Ward
3b0cbf7013 FIX: Always allow downloads from CDN 2017-05-23 16:32:54 -04:00
Robin Ward
b81e7be9a1 FEATURE: Rate limit how often we'll crawl a destination IP 2017-05-23 15:03:04 -04:00
Robin Ward
36e477750c FIX: Use same code path for downloading images 2017-05-23 14:51:30 -04:00
Robin Ward
e5e7a15a85 SECURITY: Never crawl by IP 2017-05-23 13:07:18 -04:00
Robin Ward
93a5fc62bf FEATURE: A site setting to prevent crawling on private IP blocks 2017-05-23 11:56:06 -04:00
Robin Ward
80af54460a FIX: Use Excon to do its own stubbing 2017-05-22 18:19:20 -04:00
Robin Ward
b51126dd5e FIX: Reset the WebMock after before every test 2017-05-22 17:52:31 -04:00
Robin Ward
4c690f7089 Use FinalDestination to ensure public redirects for onebox 2017-05-22 16:42:49 -04:00
Robin Ward
b23fc2bf84 Helper to find the final destination for a URL 2017-05-22 15:52:41 -04:00