discourse

mirror of https://github.com/discourse/discourse.git synced 2024-11-25 02:11:08 -06:00

Author	SHA1	Message	Date
Martin Brennan	e4350bb966	FEATURE: Direct S3 multipart uploads for backups (#14736 ) This PR introduces a new `enable_experimental_backup_uploads` site setting (default false and hidden), which when enabled alongside `enable_direct_s3_uploads` will allow for direct S3 multipart uploads of backup .tar.gz files. To make multipart external uploads work with both the S3BackupStore and the S3Store, I've had to move several methods out of S3Store and into S3Helper, including: * presigned_url * create_multipart * abort_multipart * complete_multipart * presign_multipart_part * list_multipart_parts Then, S3Store and S3BackupStore either delegate directly to S3Helper or have their own special methods to call S3Helper for these methods. FileStore.temporary_upload_path has also removed its dependence on upload_path, and can now be used interchangeably between the stores. A similar change was made in the frontend as well, moving the multipart related JS code out of ComposerUppyUpload and into a mixin of its own, so it can also be used by UppyUploadMixin. Some changes to ExternalUploadManager had to be made here as well. The backup direct uploads do not need an Upload record made for them in the database, so they can be moved to their final S3 resting place when completing the multipart upload. This changeset is not perfect; it introduces some special cases in UploadController to handle backups that was previously in BackupController, because UploadController is where the multipart routes are located. A subsequent pull request will pull these routes into a module or some other sharing pattern, along with hooks, so the backup controller and the upload controller (and any future controllers that may need them) can include these routes in a nicer way.	2021-11-11 08:25:31 +10:00
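The split described in this commit, where one store delegates straight to the helper and the other wraps it in its own method, can be sketched roughly as follows. This is a minimal illustration only: every `Sketch*` class, the return values, and the `backups` key prefix are hypothetical stand-ins, not the actual Discourse classes or S3 calls.

```ruby
require "forwardable"

# Hypothetical stand-in for S3Helper; the real class talks to S3.
class SketchS3Helper
  def create_multipart(key, content_type)
    { upload_id: "fake-upload-id", key: key, content_type: content_type }
  end

  def abort_multipart(key:, upload_id:)
    "aborted #{upload_id} for #{key}"
  end
end

# A store that delegates the multipart methods directly to the helper.
class SketchS3Store
  extend Forwardable
  def_delegators :@s3_helper, :create_multipart, :abort_multipart

  def initialize(s3_helper = SketchS3Helper.new)
    @s3_helper = s3_helper
  end
end

# A store with its own method that adapts the arguments before calling the helper.
class SketchS3BackupStore
  def initialize(s3_helper = SketchS3Helper.new)
    @s3_helper = s3_helper
  end

  def create_multipart(file_name, content_type)
    # The "backups" prefix is an assumption made for this sketch.
    @s3_helper.create_multipart(File.join("backups", file_name), content_type)
  end
end
```

Both stores end up calling the same helper method, so the multipart logic lives in one place while each store keeps control of its own key layout.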
Martin Brennan	6a68bd4825	DEV: Limit list multipart parts to 1 (#14853 ) We are only using list_multipart_parts right now in the uploads controller for multipart uploads to check if the upload exists; thus we don't need up to 1000 parts. Also adding a note for future explorers that list_multipart_parts only gets 1000 parts max, and adding params for max parts and starting parts.	2021-11-10 08:01:28 +10:00
Vinoth Kannan	c8d5c049eb	DEV: skip S3 CDN urls with different path in prefix. (#14488 ) Previously, when retrieving the upload URLs in a post, S3 CDN URLs with a different path prefix (technically external URLs) were treated as uploaded URLs. This caused issues when checking for missing uploads.	2021-10-01 12:25:17 +05:30
Martin Brennan	0d809197aa	FIX: Make sure S3 object headers are preserved on copy (#14302 ) When copying an existing upload stub temporary object on S3 to its final destination we were not copying across its additional headers such as content-disposition and cache-control, which led to issues like attachments not downloading with their original filename when clicking the download links in posts. This is because the metadata_directive = REPLACE option was not being passed to object.copy_from(), so only the source object's headers were being used. Added an option for apply_metadata_to_destination to apply this option conditionally, because we may not always want to replace this metadata, but we definitely do when copying a temporary upload.	2021-09-10 12:59:51 +10:00
Martin Brennan	99ec8eb6df	FIX: Capture S3 metadata when calling create_multipart (#14161 ) The generate_presigned_put endpoint for direct external uploads (such as the one for the uppy-image-uploader) records allowed S3 metadata values on the uploaded object. We use this to store the sha1-checksum generated by the UppyChecksum plugin, for later comparison in ExternalUploadManager. However, we were not doing this for the create_multipart endpoint, so the checksum was never captured and compared correctly. Also includes a fix to make sure UppyChecksum is the last preprocessor to run. It is important that the UppyChecksum preprocessor is the last one to be added; the preprocessors are run in order and since other preprocessors may modify the file (e.g. the UppyMediaOptimization one), we need to checksum once we are sure the file data has "settled".	2021-08-27 09:50:23 +10:00
Martin Brennan	e0102a533a	FIX: Restructure temp/ folders for direct S3 uploads (#14137 ) Previously we had temp/ in the middle of the S3 key path like so * /uploads/default/temp/randomstring/test.png (normal site) * /sitename/uploads/default/temp/randomstring/test.png (s3 folder path site) * /standard10/uploads/sitename/temp/randomstring/test.png (multisite site) However this necessitates making a lifecycle rule to clean up incomplete S3 multipart uploads for every site, something which we cannot do. It makes much more sense to have a structure with /temp at the start of the key, which is what this commit does: * /temp/uploads/default/randomstring/test.png (normal site) * /temp/sitename/uploads/default/randomstring/test.png (s3 folder path site) * /temp/standard10/uploads/sitename/randomstring/test.png (multisite site)	2021-08-25 09:22:36 +10:00
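The new key layout from this commit can be sketched as a small path builder. This is an illustration under assumptions: the helper name, its keyword arguments, and the `SKETCH_TEMP_PREFIX` constant are hypothetical, and the keys are built without a leading slash.

```ruby
require "securerandom"

# Assumed prefix; the commit only states that /temp now comes first in the key.
SKETCH_TEMP_PREFIX = "temp"

# Builds a temporary S3 key with the temp prefix at the start, so that a
# single lifecycle rule on "temp/" can cover every site.
def sketch_temporary_upload_path(file_name, upload_path:, folder_prefix: "", random: SecureRandom.hex)
  [SKETCH_TEMP_PREFIX, folder_prefix, upload_path, random, file_name]
    .reject(&:empty?)
    .join("/")
end

# Normal site
puts sketch_temporary_upload_path("test.png", upload_path: "uploads/default")
# S3 folder path site
puts sketch_temporary_upload_path("test.png", upload_path: "uploads/default", folder_prefix: "sitename")
# Multisite site
puts sketch_temporary_upload_path("test.png", upload_path: "uploads/sitename", folder_prefix: "standard10")
```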
Martin Brennan	d295a16dab	FEATURE: Uppy direct S3 multipart uploads in composer (#14051 ) This pull request introduces the endpoints required, and the JavaScript functionality in the `ComposerUppyUpload` mixin, for direct S3 multipart uploads. There are four new endpoints in the uploads controller: * `create-multipart.json` - Creates the multipart upload in S3 along with an `ExternalUploadStub` record, storing information about the file in the same way as `generate-presigned-put.json` does for regular direct S3 uploads * `batch-presign-multipart-parts.json` - Takes a list of part numbers and the unique identifier for an `ExternalUploadStub` record, and generates the presigned URLs for those parts if the multipart upload still exists and if the user has permission to access that upload * `complete-multipart.json` - Completes the multipart upload in S3. Needs the full list of part numbers and their associated ETags which are returned when the part is uploaded to the presigned URL above. Only works if the user has permission to access the associated `ExternalUploadStub` record and the multipart upload still exists. After we confirm the upload is complete in S3, we go through the regular `UploadCreator` flow, the same as `complete-external-upload.json`, and promote the temporary upload S3 into a full `Upload` record, moving it to its final destination. * `abort-multipart.json` - Aborts the multipart upload on S3 and destroys the `ExternalUploadStub` record if the user has permission to access that upload. Also added are a few new columns to `ExternalUploadStub`: * multipart - Whether or not this is a multipart upload * external_upload_identifier - The "upload ID" for an S3 multipart upload * filesize - The size of the file when the `create-multipart.json` or `generate-presigned-put.json` is called. This is used for validation. 
When the user completes a direct S3 upload, either regular or multipart, we take the `filesize` that was captured when the `ExternalUploadStub` was first created and compare it with the final `Content-Length` size of the file where it is stored in S3. Then, if the two do not match, we throw an error, delete the file on S3, and ban the user from uploading files for N (default 5) minutes. This would only happen if the user uploads a different file than what they first specified, or in the case of multipart uploads uploaded larger chunks than needed. This is done to prevent abuse of S3 storage by bad actors. Also included in this PR is an update to vendor/uppy.js. This has been built locally from the latest uppy source at `d613b849a6`. This must be done so that I can get my multipart upload changes into Discourse. When the Uppy team cuts a proper release, we can bump the package.json versions instead.	2021-08-25 08:46:54 +10:00
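The size check described above might look roughly like the sketch below. It assumes a simplified stub with only `key` and `filesize`; the error class, the method names, and the shape of the returned hash are hypothetical, and the actual S3 deletion and user ban are represented only as data in the result.

```ruby
class SketchSizeMismatchError < StandardError; end

# Simplified stand-in for an ExternalUploadStub record.
SketchStub = Struct.new(:key, :filesize)

# Raises when the size recorded at stub creation differs from what S3 reports.
def sketch_verify_size!(stub, content_length)
  return true if stub.filesize == content_length

  raise SketchSizeMismatchError,
        "expected #{stub.filesize} bytes, S3 reports #{content_length}"
end

# On mismatch, report what the caller should do: delete the object and
# ban the user from uploading for N minutes (default 5).
def sketch_complete_upload(stub, content_length, ban_minutes: 5)
  sketch_verify_size!(stub, content_length)
  { status: :ok }
rescue SketchSizeMismatchError => e
  { status: :rejected, delete_key: stub.key, ban_minutes: ban_minutes, error: e.message }
end
```

For example, `sketch_complete_upload(SketchStub.new("temp/a.png", 1024), 2048)` returns a `:rejected` result naming the key to delete, while a matching size returns `{ status: :ok }`.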
Martin Brennan	b500949ef6	FEATURE: Initial implementation of direct S3 uploads with uppy and stubs (#13787 ) This adds a few different things to allow for direct S3 uploads using uppy. These changes are still not the default. There are hidden `enable_experimental_image_uploader` and `enable_direct_s3_uploads` settings that must be turned on for any of this code to be used, and even if they are turned on only the User Card Background for the user profile actually uses uppy-image-uploader. A new `ExternalUploadStub` model and database table is introduced in this pull request. This is used to keep track of uploads that are uploaded to a temporary location in S3 with the direct to S3 code, and they are eventually deleted a) when the direct upload is completed and b) after a certain time period of not being used. ### Starting a direct S3 upload When an S3 direct upload is initiated with uppy, we first request a presigned PUT URL from the new `generate-presigned-put` endpoint in `UploadsController`. This generates an S3 key in the `temp` folder inside the correct bucket path, along with any metadata from the clientside (e.g. the SHA1 checksum described below). This will also create an `ExternalUploadStub` and store the details of the temp object key and the file being uploaded. Once the clientside has this URL, uppy will upload the file direct to S3 using the presigned URL. Once the upload is complete we go to the next stage. ### Completing a direct S3 upload Once the upload to S3 is done we call the new `complete-external-upload` route with the unique identifier of the `ExternalUploadStub` created earlier. Only the user who made the stub can complete the external upload. One of two paths is followed via the `ExternalUploadManager`. 1. If the object in S3 is too large (currently 100mb defined by `ExternalUploadManager::DOWNLOAD_LIMIT`) we do not download and generate the SHA1 for that file. 
Instead we create the `Upload` record via `UploadCreator` and simply copy it to its final destination on S3 then delete the initial temp file. Several modifications to `UploadCreator` have been made to accommodate this. 2. If the object in S3 is small enough, we download it. When the temporary S3 file is downloaded, we compare the SHA1 checksum generated by the browser with the actual SHA1 checksum of the file generated by ruby. The browser SHA1 checksum is stored on the object in S3 with metadata, and is generated via the `UppyChecksum` plugin. Keep in mind that some browsers will not generate this due to compatibility or other issues. We then follow the normal `UploadCreator` path with one exception. To cut down on having to re-upload the file again, if there are no changes (such as resizing etc) to the file in `UploadCreator` we follow the same copy + delete temp path that we do for files that are too large. 3. Finally we return the serialized upload record back to the client There are several errors that could happen that are handled by `UploadsController` as well. Also in this PR is some refactoring of `displayErrorForUpload` to handle both uppy and jquery file uploader errors.	2021-07-28 08:42:25 +10:00
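The two paths through the completion step can be sketched as a single decision function. This is a simplified illustration, not the `ExternalUploadManager` code: the function name, the `fetch_body` callable, the returned symbols, and the choice to skip the comparison when the browser supplied no checksum are all assumptions, as is treating "100mb" as 100 MiB.

```ruby
require "digest"

# Assumed value; the commit message says "currently 100mb".
SKETCH_DOWNLOAD_LIMIT = 100 * 1024 * 1024

# size:        object size reported by S3, in bytes
# client_sha1: checksum stored in the object's metadata by the browser (may be nil)
# fetch_body:  callable that downloads and returns the object body
def sketch_check_external_upload(size:, client_sha1:, fetch_body:)
  # Path 1: too large, so copy to the final destination without downloading.
  return :copy_without_checksum if size > SKETCH_DOWNLOAD_LIMIT

  # Path 2: small enough, so download and compare checksums.
  actual_sha1 = Digest::SHA1.hexdigest(fetch_body.call)
  return :checksum_skipped if client_sha1.nil? # some browsers send no checksum

  client_sha1 == actual_sha1 ? :checksum_ok : :checksum_mismatch
end
```

Passing the download as a callable keeps the large-file path from ever touching the body, which mirrors the point of the size limit.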
Gerhard Schlager	157f10db4c	FEATURE: Use path from existing URL of uploads and optimized images (#13177 ) Discourse shouldn't dynamically calculate the path of uploads and optimized images after a file has been stored on disk or S3. Otherwise it might calculate the wrong path if the SHA1 or extension stored in the database doesn't match the actual file path.	2021-05-27 17:42:25 +02:00
Josh Soref	59097b207f	DEV: Correct typos and spelling mistakes (#12812 ) Over the years we accrued many spelling mistakes in the code base. This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change - comments - test descriptions - other low risk areas	2021-05-21 11:43:47 +10:00
David Taylor	13e39d8b9f	PERF: Improve cook_url performance for topic thumbnails (#11609 ) - Only initialize the S3Helper when needed - Skip initializing the S3Helper for S3Store#cdn_url - Allow cook_url to be passed a `local` hint to skip unnecessary checks	2020-12-30 18:13:13 +00:00
Martin Brennan	4193eb0419	FIX: Respect force download when downloading secure media via lightbox (#10769 ) The download link on the lightbox for images was not downloading the image if the upload was marked secure, because the code in the upload controller route was not respecting the dl=1 param for force download. This PR fixes this so the download link works for secure images as well as regular lightboxed images.	2020-09-29 12:12:03 +10:00
Martin Brennan	31e31ef449	SECURITY: Add content-disposition: attachment for SVG uploads * strip out the href and xlink:href attributes from use element that are _not_ anchors in svgs which can be used for XSS * adding the content-disposition: attachment ensures that uploaded SVGs cannot be opened and executed using the XSS exploit. svgs embedded using an img tag do not suffer from the same exploit	2020-07-09 13:31:48 +10:00
Martin Brennan	8ef782bdbd	FIX: Increase time of DOWNLOAD_URL_EXPIRES_AFTER_SECONDS to 5 minutes (#10160 ) * Change S3Helper::DOWNLOAD_URL_EXPIRES_AFTER_SECONDS to 5 minutes, which controls presigned URL expiry and secure-media route cache time. * This is done because of the composer preview refreshing while typing causes a lot of requests sent to our server because of the short URL expiry. If this ends up being not enough we can always increase the time or explore other avenues (e.g. GitHub has a 7 day validity for secure URLs)	2020-07-03 13:42:36 +10:00
Sam Saffron	689568c216	FIX: invalid urls should not break store.has_been_uploaded? Breaking this method has wide ramification including breaking search indexing.	2020-06-25 15:00:15 +10:00
Martin Brennan	e92909aa77	FIX: Use ActionDispatch::Http::ContentDisposition for uploads content-disposition (#10108 ) See https://meta.discourse.org/t/broken-pipe-error-when-uploading-to-a-s3-clone-a-pdf-with-a-name-containing-e-i-etc/155414 When setting content-disposition for attachment, use the ContentDisposition class to format it. This handles filenames with weird characters and localization (accented characters) correctly.	2020-06-23 17:10:56 +10:00
Guo Xiang Tan	828ceab64b	DEV: Make rubocop happy.	2020-06-17 15:47:05 +08:00
Martin Brennan	e5da2d24e5	FIX: Add attachment content-disposition for all non-image files (#10058 ) This will make it so the original filename is used when downloading all non-image files, bringing S3Store into line with the to_s3 migration and local storage. Video and audio files will still stream correctly in HTML players as well. See https://meta.discourse.org/t/cannot-download-non-image-media-files-original-filenames-lost-when-uploaded-to-s3/152797 for a lot of extra context.	2020-06-17 11:16:37 +10:00
Roman Rizzi	b61a291cf3	FIX: returns false if the upload url is an invalid mailto link (#9877 )	2020-05-26 10:32:48 -03:00
Michael Brown	d9a02d1336	Revert "Revert "Merge branch 'master' of https://github.com/discourse/discourse "" This reverts commit `20780a1eee`. * SECURITY: re-adds accidentally reverted commit: `03d26cd6`: ensure embed_url contains valid http(s) uri * when the merge commit `e62a85cf` was reverted, git chose the `2660c2e2` parent to land on instead of the `03d26cd6` parent (which contains security fixes)	2020-05-23 00:56:13 -04:00
Jeff Atwood	20780a1eee	Revert "Merge branch 'master' of https://github.com/discourse/discourse " This reverts commit `e62a85cf6f`, reversing changes made to `2660c2e21d`.	2020-05-22 20:25:56 -07:00
Osama Sayegh	02f44def56	FIX: Don't blow up when trying to parse invalid or non-ASCII URLs (#9838 ) * FIX: Don't blow up when trying to parse invalid or non-ASCII URLs Follow-up to `72f139191e`	2020-05-20 12:46:27 +03:00
Martin Brennan	72f139191e	FIX: S3 store has_been_uploaded? was not taking into account s3 bucket path (#9810 ) In some cases, between Discourse forums the hostname of a URL could match if they are hosting S3 files on the same bucket but the S3 bucket path might not. So e.g. https://testbucket.somesite.com/testpath/some/file/url.png vs https://testbucket.somesite.com/prodpath/some/file/url.png. So has_been_uploaded? was returning true for the second URL, even though it may have been uploaded on a different Discourse forum. This is a very rare case but must be accounted for, because this impacts UrlHelper.is_local which mistakenly thinks the file has already been downloaded and thus allows the URL to be cooked, where we want to return the full URL to be downloaded using PullHotlinkedImages.	2020-05-20 10:40:38 +10:00
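The check this commit describes, matching on the bucket path as well as the hostname, can be sketched with the standard `URI` library. This is not the actual `has_been_uploaded?` implementation: the function name and its keyword arguments are hypothetical, and returning `false` for unparseable URLs is an assumption in line with the related fixes for invalid and non-ASCII URLs.

```ruby
require "uri"

# Returns true only when the URL's host matches the bucket host AND its
# path sits under this site's bucket folder path.
def sketch_has_been_uploaded?(url, bucket_host:, bucket_folder_path: "")
  return false if url.nil? || url.empty?

  uri = URI.parse(url)
  return false unless uri.host == bucket_host
  return true if bucket_folder_path.empty?

  uri.path.start_with?("/#{bucket_folder_path}/")
rescue URI::InvalidURIError
  # Invalid or non-ASCII URLs should not break the caller.
  false
end

host = "testbucket.somesite.com"
puts sketch_has_been_uploaded?("https://testbucket.somesite.com/testpath/some/file/url.png",
                               bucket_host: host, bucket_folder_path: "testpath")
puts sketch_has_been_uploaded?("https://testbucket.somesite.com/prodpath/some/file/url.png",
                               bucket_host: host, bucket_folder_path: "testpath")
```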
Gerhard Schlager	c6b411f6c1	FIX: Restore to S3 didn't work without env variables The `uploads:migrate_to_s3` rake task should always use the environment variables, because you usually don't want to break your site's uploads during the migration. But restoring a backup should work with site settings as well as environment variables, otherwise you can't restore uploads to S3 from the web interface.	2020-04-19 20:24:40 +02:00
Martin Brennan	7c32411881	FEATURE: Secure media allowing duplicated uploads with category-level privacy and post-based access rules (#8664 ) ### General Changes and Duplication * We now consider a post `with_secure_media?` if it is in a read-restricted category. * When uploading we now set an upload's secure status straight away. * When uploading if `SiteSetting.secure_media` is enabled, we do not check to see if the upload already exists using the `sha1` digest of the upload. The `sha1` column of the upload is filled with a `SecureRandom.hex(20)` value which is the same length as `Upload::SHA1_LENGTH`. The `original_sha1` column is filled with the _real_ sha1 digest of the file. * Whether an upload `should_be_secure?` is now determined by whether the `access_control_post` is `with_secure_media?` (if there is no access control post then we leave the secure status as is). * When serializing the upload, we now cook the URL if the upload is secure. This is so it shows up correctly in the composer preview, because we set secure status on upload. ### Viewing Secure Media * The secure-media-upload URL will take the post that the upload is attached to into account via `Guardian.can_see?` for access permissions * If there is no `access_control_post` then we just deliver the media. This should be a rare occurrence and shouldn't cause issues as the `access_control_post` is set when `link_post_uploads` is called via `CookedPostProcessor` ### Removed We no longer do any of these because we do not reuse uploads by sha1 if secure media is enabled. * We no longer have a way to prevent cross-posting of a secure upload from a private context to a public context. * We no longer have to set `secure: false` for uploads when uploading for a theme component.	2020-01-16 13:50:27 +10:00
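The `sha1` / `original_sha1` handling from this commit can be sketched in a few lines. The function name, the returned hash, and the behaviour of the non-secure branch (leaving `original_sha1` as `nil`) are assumptions made for illustration; only the use of `SecureRandom.hex(20)` and the real digest in `original_sha1` come from the commit message.

```ruby
require "securerandom"
require "digest"

# A SHA1 hex digest is 40 characters long, the same as SecureRandom.hex(20).
SKETCH_SHA1_LENGTH = 40

def sketch_upload_digests(file_contents, secure_media:)
  real_sha1 = Digest::SHA1.hexdigest(file_contents)

  if secure_media
    # A random value stops identical files from being deduplicated by sha1,
    # while the real digest is kept separately.
    { sha1: SecureRandom.hex(20), original_sha1: real_sha1 }
  else
    { sha1: real_sha1, original_sha1: nil }
  end
end

first  = sketch_upload_digests("same bytes", secure_media: true)
second = sketch_upload_digests("same bytes", secure_media: true)
puts first[:sha1].length                               # 40
puts first[:original_sha1] == second[:original_sha1]   # true
```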
Gerhard Schlager	e474cda321	REFACTOR: Restoring of backups and migration of uploads to S3	2020-01-14 11:41:35 +01:00
Penar Musaraj	102909edb3	FEATURE: Add support for secure media (#7888 ) This PR introduces a new secure media setting. When enabled, it prevents unauthorized access to media uploads (files of type image, video and audio). When the `login_required` setting is enabled, then all media uploads will be protected from unauthorized (anonymous) access. When `login_required` is disabled, only media in private messages will be protected from unauthorized access. A few notes: - the `prevent_anons_from_downloading_files` setting no longer applies to audio and video uploads - the `secure_media` setting can only be enabled if S3 uploads are already enabled and configured - upload records have a new column, `secure`, which is a boolean `true/false` of the upload's secure status - when creating a public post with an upload that has already been uploaded and is marked as secure, the post creator will raise an error - when enabling or disabling the setting on a site with existing uploads, the rake task `uploads:ensure_correct_acl` should be used to update all uploads' secure status and their ACL on S3	2019-11-18 11:25:42 +10:00
Penar Musaraj	067696df8f	DEV: Apply Rubocop redundant return style	2019-11-14 15:10:51 -05:00
Daniel Waterworth	55a1394342	DEV: pluck_first Doing .pluck(:column).first is a very common pattern in Discourse and in most cases, a limit clause isn't being added. Instead of adding a limit clause to all these callsites, this commit adds two new methods to ActiveRecord::Relation: pluck_first, equivalent to limit(1).pluck(*columns).first and pluck_first! which, like other finder methods, raises an exception when no record is found	2019-10-21 12:08:20 +01:00
Gerhard Schlager	24877a7b8c	FIX: Correctly encode non-ASCII filenames in HTTP header Backport of fix from Rails 6: `890485cfce`	2019-08-07 19:10:50 +02:00
Rafael dos Santos Silva	606c0ed14d	FIX: S3 uploads were missing a cache-control header (#7902 ) Admins still need to run the rake task to fix the files that were uploaded previously.	2019-08-06 14:55:17 -03:00
Gerhard Schlager	f2dc59d61f	FEATURE: Add hidden setting to include S3 uploads in backups	2019-07-09 14:04:16 +02:00
Penar Musaraj	03805e5a76	FIX: Ensure lightbox image download has correct content disposition in S3 (#7845 )	2019-07-04 11:32:51 -04:00
Penar Musaraj	f00275ded3	FEATURE: Support private attachments when using S3 storage (#7677 ) * Support private uploads in S3 * Use localStore for local avatars * Add job to update private upload ACL on S3 * Test multisite paths * update ACL for private uploads in migrate_to_s3 task	2019-06-06 13:27:24 +10:00
Guo Xiang Tan	a3938f98f8	Revert changes to `FileStore::S3Store#path_for` in `f0620e7118`. There are some places in the code base that assumes the method should return nil.	2019-05-29 18:39:07 +08:00
Guo Xiang Tan	f0620e7118	FEATURE: Support `[description\|attachment](upload://<short-sha>)` in MD take 2. Previous attempt was missing `post_uploads` records.	2019-05-29 09:26:32 +08:00
Penar Musaraj	7c9fb95c15	Temporarily revert "FEATURE: Support `[description\|attachment](upload://<short-sha>)` in MD. (#7603 )" This reverts commit `b1d3c678ca`. We need to make sure post_upload records are correctly stored.	2019-05-28 16:37:01 -04:00
Guo Xiang Tan	b1d3c678ca	FEATURE: Support `[description\|attachment](upload://<short-sha>)` in MD. (#7603 )	2019-05-28 11:18:21 -04:00
Sam Saffron	30990006a9	DEV: enable frozen string literal on all files This reduces chances of errors where consumers of strings mutate inputs and reduces memory usage of the app. Test suite passes now, but there may be some stuff left, so we will run a few sites on a branch prior to merging	2019-05-13 09:31:32 +08:00
Guo Xiang Tan	243fb8d9ad	Fix the build.	2019-03-13 17:39:07 +08:00
Vinoth Kannan	563b953224	DEV: Add 'backfill_etags_' to the method name since it also backfilling the etags	2019-02-19 21:54:35 +05:30
Vinoth Kannan	0472bd4adc	FIX: Remove 'backfill_etags' keyword argument from 'uploads:missing' rake task And etags backfilling code is optimized	2019-02-15 00:34:35 +05:30
Vinoth Kannan	7b5931013a	Update rake task to backfill etags from s3 inventory	2019-02-14 05:18:06 +05:30
Vinoth Kannan	b4f713ca52	FEATURE: Use amazon s3 inventory to manage upload stats (#6867 )	2019-02-01 10:10:48 +05:30
Vinoth Kannan	75dbb98cca	FEATURE: Add S3 etag value to uploads table (#6795 )	2019-01-04 14:16:22 +08:00
Rishabh	cae5ba7356	FIX: Ensure that multisite s3 uploads are tombstoned correctly (#6769 ) * FIX: Ensure that multisite uploads are tombstoned into the correct paths * Move multisite specs to spec/multisite/s3_store_spec.rb	2018-12-19 13:32:32 +08:00
Rishabh	503ae1829f	FIX: All multisite upload paths should start with /uploads/default/.. (#6707 )	2018-12-03 12:04:14 +08:00
Rishabh	05a4f3fb51	FEATURE: Multisite support for S3 image stores (#6689 ) * FEATURE: Multisite support for S3 image stores * Use File.join to concatenate all paths & fix linting on multisite/s3_store_spec.rb	2018-11-29 12:11:48 +08:00
Vinoth Kannan	bcdf5b2f47	DEV: improve missing uploads query and skip checking file size	2018-11-27 02:21:33 +05:30
Vinoth Kannan	4ccf9d28eb	Remove trailing whitespaces	2018-11-27 01:15:29 +05:30


100 Commits