
10 Commits

Author SHA1 Message Date
David Taylor
b0416cb1c1
FEATURE: Upload to s3 in parallel to speed up backup restores (#13391)
Uploading lots of small files can be made significantly faster by parallelizing the `s3.put_object` calls. In testing, an UPLOAD_CONCURRENCY of 10 made a large restore 10x faster. An UPLOAD_CONCURRENCY of 20 made the same restore 18x faster.

This commit is careful to parallelize as little as possible, to reduce the chance of concurrency issues. In the worker threads, no database transactions are performed. All modification of shared objects is controlled with a mutex.

Unfortunately, we do not have any existing tests for the `ToS3Migration` class. This change has been tested with a large site backup (120k uploads totalling 45 GB).
2021-06-16 10:34:39 +01:00
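A minimal Ruby sketch of the worker-pool pattern described in b0416cb1c1. It assumes a fixed number of threads pulling file paths from a shared queue. `upload_in_parallel` and the `upload_one` block, which stands in for an `s3.put_object` call, are hypothetical names and not the actual `ToS3Migration` code. As in the commit description, workers do no database work, and the only shared mutable state (the result arrays) is guarded by a `Mutex`.

```ruby
# Hypothetical default, mirroring the UPLOAD_CONCURRENCY setting named above.
UPLOAD_CONCURRENCY = 10

# Runs `upload_one` (a stand-in for s3.put_object) on every file using a
# fixed pool of threads. Returns [uploaded, failed] pairs.
def upload_in_parallel(files, concurrency: UPLOAD_CONCURRENCY, &upload_one)
  queue = Queue.new
  files.each { |file| queue << file }
  queue.close # once drained, pop returns nil instead of blocking

  mutex = Mutex.new
  uploaded = []
  failed = []

  workers = Array.new(concurrency) do
    Thread.new do
      while (file = queue.pop)
        begin
          # Only the network call runs concurrently; no DB transactions here.
          result = upload_one.call(file)
          mutex.synchronize { uploaded << [file, result] }
        rescue StandardError => e
          mutex.synchronize { failed << [file, e.message] }
        end
      end
    end
  end

  workers.each(&:join)
  [uploaded, failed]
end
```

Closing the queue lets `Queue#pop` return `nil` once it is empty, which ends each worker's loop without pushing one sentinel per thread.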
David Taylor
35e1e009fa
FIX: Allow restoring non-subfolder backup to subfolder site (#12537)
`GlobalSetting.relative_url_root` comes from the destination site. We
can't be sure whether it was the same on the original site. It's safer
to use a wildcard here, so we can backup/restore sites with different
relative_url_root values.
2021-04-12 14:00:52 +10:00
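The wildcard idea in 35e1e009fa can be illustrated with a pattern that accepts any path prefix before `/uploads/<db>/`, instead of one built from the destination site's `GlobalSetting.relative_url_root`. This is a hypothetical sketch; `local_upload_path_regex` is an illustrative name and the actual pattern in Discourse may differ.

```ruby
# Hypothetical: match a local upload path no matter which subfolder
# (relative_url_root) the original site was served from.
def local_upload_path_regex(db = "default")
  # (?:/[^/]+)*? is a lazy wildcard over zero or more leading path segments,
  # so "/uploads/..." and "/forum/uploads/..." both match.
  %r{\A(?<prefix>(?:/[^/]+)*?)/uploads/#{Regexp.escape(db)}/(?<path>.+)\z}
end
```

A pattern hard-coded with the destination's `relative_url_root` would reject URLs from a non-subfolder backup; the lazy wildcard keeps the match on the first `/uploads/<db>/` segment regardless of prefix.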
Martin Brennan
31e31ef449
SECURITY: Add content-disposition: attachment for SVG uploads
* strip out the `href` and `xlink:href` attributes from `use` elements
  in SVGs when they are _not_ anchors, since these can be used for XSS
* adding `content-disposition: attachment` ensures that uploaded SVGs
  cannot be opened and executed using the XSS exploit; SVGs embedded
  using an img tag do not suffer from the same exploit
2020-07-09 13:31:48 +10:00
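The two rules in 31e31ef449 can be sketched as plain functions over already-parsed data. This assumes "anchor" means an in-document fragment reference such as `#icon`, and that attributes arrive as a name-to-value hash from some XML parser. `sanitize_svg_attributes` and `force_attachment?` are hypothetical names, not Discourse's code.

```ruby
HREF_ATTRIBUTES = %w[href xlink:href].freeze

# Drop href/xlink:href from <use> elements unless the value is an
# in-document anchor ("#id"); external references are the XSS vector.
def sanitize_svg_attributes(element_name, attributes)
  return attributes unless element_name == "use"

  attributes.reject do |name, value|
    HREF_ATTRIBUTES.include?(name) && !value.to_s.start_with?("#")
  end
end

# SVG uploads are served with content-disposition: attachment so that
# opening the URL downloads the file instead of rendering it.
def force_attachment?(filename)
  File.extname(filename).casecmp?(".svg")
end
```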
Martin Brennan
e92909aa77
FIX: Use ActionDispatch::Http::ContentDisposition for uploads content-disposition (#10108)
See https://meta.discourse.org/t/broken-pipe-error-when-uploading-to-a-s3-clone-a-pdf-with-a-name-containing-e-i-etc/155414

When setting the content-disposition header for attachments, use the ContentDisposition class to format it. It correctly handles filenames containing special characters, including localized (accented) characters.
2020-06-23 17:10:56 +10:00
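For readers without Rails at hand, here is a rough plain-Ruby approximation of the header value that e92909aa77 delegates to `ActionDispatch::Http::ContentDisposition`: an ASCII `filename=` fallback plus an RFC 5987 `filename*=UTF-8''` parameter. This is a sketch, not the Rails implementation; in particular, the ASCII fallback here decomposes and drops accents via `unicode_normalize`, whereas Rails transliterates with I18n.

```ruby
# Characters outside these classes get percent-encoded (approximation).
TRADITIONAL_ESCAPED_CHAR = /[^ A-Za-z0-9!\#$+.^_`|~-]/
RFC_5987_ESCAPED_CHAR = /[^A-Za-z0-9!\#$&+.^_`|~-]/

# Percent-encode each matched character byte by byte (UTF-8).
def percent_escape(string, pattern)
  string.gsub(pattern) { |char| char.bytes.map { |byte| format("%%%02X", byte) }.join }
end

def format_content_disposition(disposition, filename)
  # ASCII fallback: split accents off (é -> e + U+0301), then drop non-ASCII.
  ascii = filename.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, "")
  ascii_part = %(filename="#{percent_escape(ascii, TRADITIONAL_ESCAPED_CHAR)}")
  utf8_part = "filename*=UTF-8''#{percent_escape(filename, RFC_5987_ESCAPED_CHAR)}"
  "#{disposition}; #{ascii_part}; #{utf8_part}"
end
```

Clients that understand `filename*` use the exact UTF-8 name; older ones fall back to the quoted ASCII name.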
Gerhard Schlager
c6b411f6c1
FIX: Restore to S3 didn't work without env variables
The `uploads:migrate_to_s3` rake task should always use the environment variables, because you usually don't want to break your site's uploads during the migration. But restoring a backup should work with site settings as well as environment variables; otherwise, you can't restore uploads to S3 from the web interface.
2020-04-19 20:24:40 +02:00
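The credential rule in c6b411f6c1 can be sketched as a small selector: the migration rake task accepts only environment variables, while a restore falls back to site settings when no environment configuration exists. The `DISCOURSE_S3_*` variable names and the `s3_options` helper are assumptions made for illustration.

```ruby
# Assumed ENV variable names; adjust to the real configuration keys.
S3_ENV_KEYS = {
  bucket: "DISCOURSE_S3_BUCKET",
  region: "DISCOURSE_S3_REGION",
  access_key_id: "DISCOURSE_S3_ACCESS_KEY_ID",
  secret_access_key: "DISCOURSE_S3_SECRET_ACCESS_KEY",
}.freeze

# Hypothetical helper: pick the S3 options for a migration or a restore.
def s3_options(env:, site_settings:, migration:)
  from_env = S3_ENV_KEYS.transform_values { |var| env[var] }
  env_configured = !from_env[:bucket].to_s.empty?

  if migration
    # The rake task only trusts ENV, so the live site's settings stay untouched.
    raise ArgumentError, "S3 must be configured via environment variables" unless env_configured
    from_env
  else
    # Restores accept either source, so restoring from the web UI works too.
    env_configured ? from_env : site_settings
  end
end
```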
Gerhard Schlager
baae0e7446
FIX: Infinite loop in migrate_to_s3 rake task
2020-04-19 20:24:40 +02:00
Gerhard Schlager
5bffb033df
FIX: The migrate_to_s3 rake task couldn't find the AWS SDK
2020-03-26 16:41:10 +01:00
Gerhard Schlager
93b8b04b06
FIX: Migrating uploads to S3 could miss files
The rake task aborted the migration with "Already migrated" when all upload URLs linked to the correct S3 bucket, even though the files didn't exist on S3. Removing that first check forces the rake task to check whether the uploads actually exist on S3.
2020-03-04 12:50:48 +01:00
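The bug fixed in 93b8b04b06 comes down to trusting URLs instead of the bucket's contents. The hypothetical sketch below contrasts the removed URL-only shortcut with an existence check against a listing of object keys. The upload records (`:url`, `:key`) and both function names are illustrative assumptions.

```ruby
require "set"

# The removed shortcut: every URL points at the bucket, so it reports
# "already migrated" even when some objects are missing.
def naive_already_migrated?(uploads, bucket_url)
  uploads.all? { |upload| upload[:url].start_with?(bucket_url) }
end

# The safer check: compare against keys actually present in the bucket.
def uploads_missing_on_s3(uploads, existing_keys)
  existing = existing_keys.to_set
  uploads.reject { |upload| existing.include?(upload[:key]) }
end
```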
Gerhard Schlager
0adab26e45
FIX: Don't count ignored, missing uploads in migration to S3
2020-02-12 16:18:52 +01:00
Gerhard Schlager
e474cda321
REFACTOR: Restoring of backups and migration of uploads to S3
2020-01-14 11:41:35 +01:00