Commit Graph

264 Commits

Author SHA1 Message Date
James Addison
e45fb5e61b
Reduce the lifetime of `response` in the linkcheck builder (#11432)
Co-authored-by: Adam Turner <9087854+aa-turner@users.noreply.github.com>
2023-07-20 21:14:00 +01:00
James Addison
c9d0933e5d
linkcheck: Use context managers for HTTP requests (#11318)
This closes HTTP responses when no content reads are required, as
when requests are made in streaming mode, ``requests`` doesn't know
whether the caller may intend to later read content from a streamed
HTTP response object and holds the socket open.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2023-05-09 17:09:35 +01:00
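The commit above describes closing streamed HTTP responses by using context managers. A minimal sketch of that pattern follows. `FakeStreamedResponse` and `status_of` are hypothetical stand-ins so the example runs without network access; this is not the builder's actual code. With the real library, `with requests.get(url, stream=True) as response:` has the same shape.

```python
class FakeStreamedResponse:
    """Hypothetical stand-in for a streamed HTTP response that holds a socket."""

    def __init__(self, status_code):
        self.status_code = status_code
        self.closed = False

    def close(self):
        # A real response would release its connection here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False  # do not swallow exceptions


def status_of(response):
    """Read only the status code, then close the response on block exit."""
    with response:
        return response.status_code


resp = FakeStreamedResponse(200)
status = status_of(resp)
```

The response is closed even though its body was never read, which is the situation the commit message describes.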
Adam Turner
97f07ca83c Speed up `test_linkcheck` 2023-03-24 00:29:27 +00:00
danieleades
2f03886d55
Remove deprecated code in `sphinx.builders.linkcheck` (#11089)
Co-authored-by: daniel.eades <daniel.eades@hotmail.com>
2023-03-17 15:24:38 +00:00
Adam Turner
31162a9b63 Handle exceptions for `get_node_source` and `get_node_line` 2023-01-10 15:51:37 +00:00
danieleades
2759c2c76b
Use `any` to find elements in iterable (#11053) 2023-01-02 04:52:46 +00:00
Adam Turner
4032070e81
Run pyupgrade (#11070) 2023-01-02 00:01:14 +00:00
Adam Turner
14a9289d78 Use PEP 604 types 2023-01-01 20:48:39 +00:00
Adam Turner
26f79b0d2d Use PEP 585 types 2023-01-01 20:48:38 +00:00
Adam Turner
f4c8a0a68e Insert `from __future__ import annotations` 2023-01-01 20:48:37 +00:00
danieleades
3c73efadab
shrink 'Any generics' mypy whitelist for builders module (#10846) 2022-09-29 17:26:53 +01:00
n-peugnet
1553cc3b36 linkcheck: Check the source URL of raw directives
Add raw directives' source URL to the list of links to check with linkcheck.
By the way, refactor HyperlinkCollector by adding `add_uri` function.
Add test for linkcheck raw directives source URL
2022-08-17 14:57:58 +02:00
danieleades
a504ac6100
Improve static typing strictness (#10569) 2022-07-18 22:08:16 +01:00
danieleades
25d379fb53
Lint with flake8-bugbear (#10602)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2022-07-12 22:55:57 +01:00
danieleades
12e86ff0e1
Use the flake8-comprehensions lint plugin (#10601) 2022-06-26 14:43:05 +01:00
Adam Turner
55669f6cfc Specify encoding 2022-04-22 04:21:12 +01:00
Takeshi KOMIYA
aa1bc83c2a Merge branch '4.x' 2022-03-19 22:58:15 +09:00
Adam Turner
5775912455 Collapse single line docstrings 2022-02-20 03:13:45 +00:00
Adam Turner
6bb7b891a1 Remove copyright and licence fields 2022-02-20 03:06:23 +00:00
Adam Turner
5694e0ce60 Fix module docstring indentation 2022-02-20 00:35:13 +00:00
Adam Turner
4f5a3269a6 Fix module docstring first line 2022-02-20 00:11:08 +00:00
Adam Turner
6b8bccec59 Remove module titles in docstrings 2022-02-19 23:17:29 +00:00
Takeshi KOMIYA
444dfc50aa Merge branch '4.x' 2022-01-17 00:22:09 +09:00
Takeshi KOMIYA
6a0215198f Merge branch '4.x' 2022-01-16 02:26:11 +09:00
Daniel Eades
6697ed62ed address some unused loop control variables (B007) 2022-01-12 20:19:18 +00:00
Daniel Eades
574e787bf1 use class-style syntax for 'NamedTuple's 2022-01-10 15:35:27 +00:00
Takeshi KOMIYA
b21b90d292 Merge branch '4.x' 2022-01-03 01:01:06 +09:00
Takeshi KOMIYA
05a898ecb4 Migrate to Node.findall() from Node.traverse()
Node.traverse() has been deprecated since docutils-0.18, and
Node.findall() was added as its successor.

This patches docutils-0.17 and older so that Node.findall() is
available there too, and switches to using it.
2022-01-03 00:35:29 +09:00
Takeshi KOMIYA
ce8039db1f Merge branch '4.x' 2022-01-01 20:04:19 +09:00
Takeshi KOMIYA
b84771dcd2 A happy new year! 2022-01-01 18:45:03 +09:00
Takeshi KOMIYA
980ccc4c2a Merge branch '4.x' 2021-12-10 01:45:15 +09:00
Christian Roth
10023da895 linkcheck: Exclude links from matched documents 2021-12-08 10:01:45 +01:00
Takeshi KOMIYA
3f51f1a6cf Merge branch '4.x' 2021-10-09 13:50:50 +09:00
oleg.hoefling
36a6fcaef1
don't print file extension twice in linkcheck warnings
Signed-off-by: oleg.hoefling <oleg.hoefling@gmail.com>
2021-09-27 22:12:57 +02:00
Takeshi KOMIYA
43517219b7 refactor: linkcheck: Remove objects marked as RemovedInSphinx50Warning 2021-09-06 02:14:26 +09:00
Takeshi KOMIYA
a35b009adc Fix #9435: linkcheck: Failed to check anchors in github.com
The approach of `rewrite_github_anchor` makes some anchors valid, but
it also makes other kinds of anchors invalid.  This disables the handler
to make them valid again (for the 4.1.x releases).
2021-07-18 13:03:57 +09:00
Takeshi KOMIYA
5e5bca98f7
Merge branch '4.x' into 6525_linkcheck_warn_redirects 2021-07-07 02:09:50 +09:00
Takeshi KOMIYA
7e71b759d7
Merge branch '4.x' into 4.0.x 2021-06-13 17:20:01 +09:00
Justin Mathews
ce305190c5
shorter explanatory comments
Co-authored-by: François Freitag <mail@franek.fr>
2021-06-10 11:40:44 -04:00
Justin Mathews
d804981a37
alphabetical ordering
Co-authored-by: François Freitag <mail@franek.fr>
2021-06-10 11:40:16 -04:00
Justin Mathews
193ea9153e
alphabetical ordering
Co-authored-by: François Freitag <mail@franek.fr>
2021-06-10 11:40:01 -04:00
Justin Mathews
935df33d95 comment explaining why try GET when HEAD got a ConnectionError 2021-06-09 23:56:02 -04:00
Justin Mathews
57c866caf1 catch ConnectionError on HEAD 2021-06-07 17:53:36 -04:00
Takeshi KOMIYA
4776cd329c Fix ImportError 2021-06-03 21:54:38 +09:00
Takeshi KOMIYA
bc0e3b4405
linkcheck: Use urlparse to check and reconstruct URI for github.com
Co-authored-by: François Freitag <mail@franek.fr>
2021-06-03 21:52:00 +09:00
Takeshi KOMIYA
92335bd6e6 Close #9016: linkcheck builder failed to check the anchors of github.com 2021-06-01 00:31:15 +09:00
Takeshi KOMIYA
988a79de65 linkcheck: Emit a warning for disallowed redirects
The linkcheck builder now integrates `linkcheck_warn_redirects` into
`linkcheck_allowed_redirects`.  As a result, the linkcheck builder
emits a warning when it detects a redirect that is not allowed by
`linkcheck_allowed_redirects`.
2021-05-31 21:53:09 +09:00
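As an illustration of the setting named in the commit above, here is a sketch of a `conf.py` entry and of how a pattern pair could be matched. The mapping of source-URI pattern to allowed destination pattern follows the documented form of `linkcheck_allowed_redirects`; `redirect_is_allowed` is a hypothetical helper written for this example, not the builder's implementation.

```python
import re

# Example conf.py setting: source-URI pattern -> allowed destination pattern.
linkcheck_allowed_redirects = {
    r"https://sphinx-doc\.org/.*": r"https://sphinx-doc\.org/en/master/.*",
}


def redirect_is_allowed(source, destination, allowed=linkcheck_allowed_redirects):
    """Hypothetical helper: True if some (source, destination) pattern pair matches."""
    return any(
        re.match(src, source) and re.match(dst, destination)
        for src, dst in allowed.items()
    )


allowed = redirect_is_allowed(
    "https://sphinx-doc.org/usage/",
    "https://sphinx-doc.org/en/master/usage/",
)
disallowed = redirect_is_allowed(
    "https://sphinx-doc.org/usage/",
    "https://example.org/elsewhere",
)
```

Under this sketch, the second redirect is the kind that would trigger the warning.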
Takeshi KOMIYA
ce9e2e6c74 Rename linkcheck_ignore_redirects to linkcheck_allowed_redirects 2021-05-20 23:26:16 +09:00
Takeshi KOMIYA
707319aab2 Close #6525: linkcheck: Add linkcheck_ignore_redirects
Add a new confval; linkcheck_ignore_redirects to ignore hyperlinks
that are redirected as expected.
2021-05-16 02:48:03 +09:00
Takeshi KOMIYA
05eb2ca06f Close #6525: linkcheck: Add linkcheck_warn_redirects
Add a new confval; `linkcheck_warn_redirects` to emit a warning when
the hyperlink is redirected.  It's useful to detect unexpected redirects
under the warn-is-error mode.
2021-05-16 02:29:03 +09:00
Takeshi KOMIYA
29038c9d4c refactor: linkcheck: Call write_linkstat() at the top of process_result() 2021-04-29 18:19:38 +09:00
Takeshi KOMIYA
a7d3e9684d refactor: linkcheck: Use attributes of CheckResult in process_result() 2021-04-29 18:01:42 +09:00
Takeshi KOMIYA
aeb9e42d2b refactor: Use PEP-526 based variable annotation (sphinx.builders) 2021-03-13 16:37:50 +09:00
Takeshi KOMIYA
725f74f5eb refactor: linkcheck: Remove next_check from Hyperlink object
To separate the hyperlink itself from its (intermediate) checking state,
this removes the `next_check` attribute from the Hyperlink object and adds
a new named tuple, `CheckRequest`.  The idea was rejected when
rate-limiting was first implemented (see #8467), but it is a better way
to split the large builder module into small components and keep the
linkcheck builder as a whole simple.

After this change, a `Hyperlink` object represents a hyperlink in the
document.  It has a URI and location info (docname and lineno).
2021-02-13 00:44:02 +09:00
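The separation described in the commit above can be sketched with two named tuples. The field names `uri`, `docname`, `lineno` and `next_check` come from the commit message; the exact field order, the types, and the `hyperlink` field on `CheckRequest` are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Hyperlink(NamedTuple):
    """A link as written in the document: URI plus location info."""
    uri: str
    docname: str
    lineno: Optional[int]


class CheckRequest(NamedTuple):
    """Intermediate checking state, kept apart from the link itself."""
    next_check: float      # earliest time at which the link should be checked
    hyperlink: Hyperlink   # assumed field: the link this request refers to


link = Hyperlink("https://example.org/", "index", 3)
request = CheckRequest(0.0, link)

# Rescheduling creates a new request; the Hyperlink is left untouched.
later = request._replace(next_check=60.0)
```

Because the scheduling time lives on the request, a link can be re-queued without mutating or copying its location data.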
Takeshi KOMIYA
7252abab1c refactor: linkcheck: Refine the constructor of Checker and CheckWorker
Make the constructors of Checker and CheckWorker classes less-coupled
with linkcheck builder.
2021-02-12 23:20:07 +09:00
Takeshi KOMIYA
5c223d20d6 refactor: linkcheck: Separate thread manager feature from builder class
To reduce the complexity of the linkcheck builder, this separates
the thread manager feature from the builder class as
HyperlinkAvailabilityChecker.
2021-02-12 23:19:01 +09:00
Takeshi KOMIYA
899ccfd40e refactor: linkcheck: Deprecate attributes of linkcheck builders
Move anchors_ignore, auth and to_ignore to
HyperlinkAvailabilityCheckWorker and deprecate them on the builder.
2021-02-07 02:39:35 +09:00
Takeshi KOMIYA
ad5b0babd7 refactor: linkcheck: Remove unused attribute HyperlinkAvailabilityCheckWorker.app 2021-02-06 01:34:20 +09:00
Takeshi KOMIYA
84130fff40
Fix typo
Co-authored-by: François Freitag <mail@franek.fr>
2021-02-06 01:24:16 +09:00
Takeshi KOMIYA
f02fb7a8cc refactor: linkcheck: Separate worker feature from builder class
To reduce the complexity of the linkcheck builder, this separates
the worker feature from the builder class.
2021-02-05 22:52:28 +09:00
François Freitag
b12a0f33ef
Formalize linkcheck CheckResult into a NamedTuple 2021-02-04 21:53:49 +01:00
Takeshi KOMIYA
421c6bb473 refactor: linkcheck: Skip queuing ignored URIs
To simplify the checker thread, this checks whether the URI is ignored
before queueing it.
2021-02-04 01:09:19 +09:00
Takeshi KOMIYA
88c6330900 linkcheck: The docname of hyperlink is not displayed (refs: #8791)
Currently, linkcheck displays the status of hyperlinks.  But it is hard
to search where the hyperlink is written because only line numbers are
shown as the location for the link.

This displays the docname of the link too.
2021-02-01 01:15:53 +09:00
Takeshi KOMIYA
434cc60ae5
Merge pull request #8794 from tk0miya/refactor_linkcheck3
refactor: linkcheck: Update type annotations
2021-01-31 23:45:59 +09:00
Takeshi KOMIYA
6701628e2e refactor: linkcheck: Update type annotations 2021-01-31 16:40:51 +09:00
Takeshi KOMIYA
d39fb5ce3a refactor: linkcheck: Access config object via self.config
Builder objects now have a `config` attribute that references the
live config object.  There is no longer any reason to access it via
`self.app.config`!
2021-01-31 16:22:04 +09:00
François Freitag
227955cbe8
linkcheck: Raise on unknown status in process_result()
Helps catch programming errors. The else clause should never be
reached.
2021-01-27 17:36:10 +01:00
Takeshi KOMIYA
6c01d7614b
Merge branch '3.x' into rm-n 2021-01-24 22:43:33 +09:00
François Freitag
1121e5e8bc
Linkcheck: Derive number of links from the post-transform result
The number of links to check is the number of links in self.hyperlinks,
populated by the post-transform.
2021-01-24 12:24:52 +01:00
François Freitag
54df51e86f
Linkcheck: Don’t repeatedly open/close log files
Opening and closing a file requires processing from the operating
system. Repeatedly opening and closing wastes system resources and
hinders buffering, causing a flush (disk I/O) after each write
operation.

Using a context manager ensures the logs are flushed to disk and files
are properly closed whether the program exits successfully or an
exception occurs. Compared to the previous implementation, a brutal
shutdown of the machine (e.g. power cord disconnected) could cause some
log lines not to be written. That should not be an issue in practice.

Now, files are created and truncated once linkcheck has submitted the
links to check to the threads and is ready to process the results,
instead of when the builder is constructed. This keeps the file
operations closer to their use.
2021-01-24 12:15:07 +01:00
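The commit above argues for opening the log files once and letting a context manager close them. A minimal sketch of that idea follows; `write_results`, the `(uri, status)` result shape, and the JSON row layout are hypothetical and chosen only for this example.

```python
import json
import os
import tempfile


def write_results(outdir, results):
    """Open both log files once, write every result, close on block exit."""
    txt_path = os.path.join(outdir, "output.txt")
    json_path = os.path.join(outdir, "output.json")
    with open(txt_path, "w", encoding="utf-8") as txt, \
         open(json_path, "w", encoding="utf-8") as jsn:
        for uri, status in results:
            txt.write(f"{uri}: {status}\n")
            jsn.write(json.dumps({"uri": uri, "status": status}) + "\n")
    return txt_path, json_path


with tempfile.TemporaryDirectory() as outdir:
    txt_path, json_path = write_results(
        outdir,
        [("https://example.org/", "working"), ("https://example.org/gone", "broken")],
    )
    with open(txt_path, encoding="utf-8") as f:
        text_lines = f.read().splitlines()
    with open(json_path, encoding="utf-8") as f:
        json_rows = [json.loads(line) for line in f]
```

Both files stay open for the whole loop, so writes can be buffered, and they are flushed and closed whether the loop finishes or raises.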
Takeshi KOMIYA
2308695d24
Merge branch '3.x' into unused-attrs-linkcheck 2021-01-22 02:41:32 +09:00
Takeshi KOMIYA
a5b0d96c70
Merge branch '3.x' into typo 2021-01-22 01:15:53 +09:00
François Freitag
aa5e4e2da0 Deprecate linkcheck builder {broken,good,redirected}
These attributes were used to cache checked links and avoid issuing
another web request to the same URI.

Since 82ef497a8c, links are pre-processed
to ensure uniqueness. Caching the results of checked links is no
longer useful.
2021-01-21 17:06:51 +01:00
François Freitag
52fde7e7b1 Match linkcheck deprecation warning version with deprecated.rst
Deprecated.rst states the node_line_or_0 helper will be removed in
Sphinx 5.0, use a RemovedInSphinx50Warning.
2021-01-21 16:39:40 +01:00
Takeshi KOMIYA
bd103a82c9 refactor: linkcheck: Make the linkcheck builder a subclass of DummyBuilder
After recent refactoring, the linkcheck builder does not do "writing".
So it would be better to inherit from DummyBuilder.
2021-01-20 21:37:23 +09:00
Takeshi KOMIYA
cead0f6ddf linkcheck: Fix race condition that could lead to checking the availability of the same URL twice
Until now, linkcheck scanned all references and images in the documents
and checked them in parallel.  As a result, a race condition could cause
some URLs to be checked twice (or more).

This collects the URLs via a post-transform and removes duplicate URLs
before checking availability.

refs: #4303
2021-01-20 20:58:27 +09:00
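The fix described above amounts to collecting every link first and removing duplicates before any checking starts. A minimal sketch, assuming links are `(uri, docname, lineno)` tuples and that the first occurrence of each URI is the one kept; `unique_links` is a hypothetical name, not a function from the builder.

```python
def unique_links(links):
    """Keep the first occurrence of each URI, preserving collection order."""
    seen = {}
    for uri, docname, lineno in links:
        # setdefault only stores the entry if the URI has not been seen yet.
        seen.setdefault(uri, (uri, docname, lineno))
    return list(seen.values())


collected = [
    ("https://example.org/", "index", 4),
    ("https://example.org/docs", "usage", 10),
    ("https://example.org/", "usage", 22),  # same URI, different location
]
to_check = unique_links(collected)
```

Since the worker threads only ever see `to_check`, no two of them can race to check the same URI.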
Takeshi KOMIYA
f996859420 A happy new year!
.. note::

   $ find sphinx tests LICENSE doc/conf.py -type f -exec sed -i '' -e 's/2007\-20../2007-2021/' {} \;
   $ git co sphinx/locale/**/*.js sphinx/templates/epub3/mimetype
2021-01-01 13:40:48 +09:00
François Freitag
a1b8b1febb
Ensure linkcheck items are comparable
Linkcheck organizes the URLs to check in a PriorityQueue. The items are
tuples (priority, url, docname, lineno).

Tuples where the lineno is `None` are not comparable with tuples that
have an integer lineno, and PriorityQueue items must be comparable (see
https://bugs.python.org/issue31145).

Fixes an issue when a document contains two links to the same URL, one
with an int line number and the other without line number metadata (such
as an image :target: attribute).

Using 0 instead of None to represent no line number should not lead to
observable changes, the result logger only logs the line number when it
is truthy.

Close #8565
2020-12-22 21:18:31 +01:00
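The comparability problem described above can be reproduced with plain tuples. This sketch uses the `(priority, url, docname, lineno)` layout from the commit message; the helper `comparable` and the sample values are made up for the example.

```python
from queue import PriorityQueue


def comparable(a, b):
    """Return True if `a < b` can be evaluated without a TypeError."""
    try:
        _ = a < b
    except TypeError:
        return False
    return True


with_none = (0, "https://example.org/", "index", None)
with_int = (0, "https://example.org/", "index", 12)
with_zero = (0, "https://example.org/", "index", 0)

# The first three fields are equal, so comparison falls through to lineno.
none_ok = comparable(with_none, with_int)
zero_ok = comparable(with_zero, with_int)

# With 0 in place of None, both items can share one PriorityQueue.
queue = PriorityQueue()
queue.put(with_int)
queue.put(with_zero)
first = queue.get()
```

Two links to the same URL from the same document tie on every field except the line number, which is why a `None` there only fails in that specific case.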
François Freitag
6b90a63f08 Fix #6629: linkcheck: Handle rate-limiting
Follow the Retry-After header if present, otherwise use an exponential
back-off.
2020-11-25 17:34:55 +01:00
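The policy in the commit above (honour `Retry-After` if present, otherwise back off exponentially) can be sketched as a small pure function. The name `next_delay`, the default and maximum delays, and the doubling factor are assumptions for illustration; only the numeric-seconds form of `Retry-After` is handled here, and the HTTP-date form falls through to the back-off.

```python
def next_delay(retry_after, previous_delay, default_delay=60.0, max_delay=3600.0):
    """Return how many seconds to wait before retrying a rate-limited URL."""
    if retry_after is not None:
        try:
            return float(retry_after)  # header given as a number of seconds
        except ValueError:
            pass  # e.g. an HTTP-date; not parsed in this sketch
    if previous_delay is None:
        return default_delay           # first retry without usable guidance
    return min(previous_delay * 2, max_delay)  # exponential back-off, capped


from_header = next_delay("30", None)
first_backoff = next_delay(None, None)
second_backoff = next_delay(None, first_backoff)
capped = next_delay(None, 3000.0)
```

Capping the delay keeps a persistently rate-limited host from stalling the whole run indefinitely.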
Takeshi KOMIYA
67846b1e93 Merge branch '3.x' into 8131_too_many_redirects 2020-11-23 01:42:10 +09:00
François Freitag
683635f5b4 linkcheck: Remove call to is_ssl_error()
This method always returns False, so it is dead code. The exception
checking stopped working because the Requests library wraps SSL errors in a
`requests.exceptions.SSLError` and no longer throws a
`urllib3.exceptions.SSLError`. The first argument to that exception is
a `urllib3.exceptions.MaxRetryError`.
2020-11-12 19:58:04 +01:00
François Freitag
640cc40b7e
linkcheck: Set allow_redirects in requests.head() call
Following redirects is the default for other methods.
https://requests.readthedocs.io/en/latest/api/#requests.request

Define the option closer to its use.
2020-11-11 18:32:54 +01:00
François Freitag
0949735210
Sort imports with isort
Keep imports alphabetically sorted and their order homogeneous across
Python source files.

The isort project has more features and is more active than the
flake8-import-order plugin.

Most issues caught were simply import ordering from the same module.
Where imports were purposefully placed out of order, tag with
isort:skip.
2020-11-11 13:19:05 +01:00
Takeshi KOMIYA
b415b25c09 Merge branch '3.2.x' into 3.x 2020-11-01 20:31:49 +09:00
Takeshi KOMIYA
eb3d9355f5
Merge pull request #8332 from sphinx-doc/8321_linkcheck_tel_links
Fix #8321: linkcheck: ``tel:`` schema hyperlinks are detected as errors
2020-10-25 18:48:50 +09:00
Takeshi KOMIYA
3171a44032 Fix #8321: linkcheck: `tel:` schema hyperlinks are detected as errors 2020-10-24 20:11:23 +09:00
Vasista Vovveti
72985c250b
Fix broken url not reporting error
Some links are printed as broken but do not error out the build.

This issue appeared when we included `tel:` links in our build.
2020-10-20 12:43:10 -05:00
François Freitag
55f7919531
Linkcheck: Use Thread daemon argument
Instead of using a separate call.
2020-10-11 11:41:42 +02:00
Takeshi KOMIYA
837a4d1173
Merge pull request #8245 from mgeier/linkcheck-sourcedir
linkcheck: take source directory into account for local files
2020-10-04 17:17:33 +09:00
Matthias Geier
6b3d445879 Pass docname instead of srcdir 2020-10-04 10:02:57 +02:00
François Freitag
5ea8ee133d
Fix #8268: make linkcheck report HTTP errors 2020-10-03 14:33:29 +02:00
Matthias Geier
786972e47f linkcheck: take source directory into account for local files 2020-09-27 20:54:50 +02:00
Sebastien Besson
33732a3147 Extend linkchecker GET fallback logic to handle Too Many Redirects
Some websites will enter infinite redirect loops with HEAD requests. In this
case, the GET fallback is ignored as the exception is of type TooManyRedirects
and the link is reported as broken.
This extends the except clause to retry with a GET request for such scenarios.
2020-08-17 09:27:29 +01:00
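The fallback described above can be sketched with injected request functions so that it runs without network access. `check_uri`, `looping_head` and `working_get` are hypothetical; the local `TooManyRedirects` class and the built-in `ConnectionError` stand in for the exceptions of the same names in `requests.exceptions`.

```python
class TooManyRedirects(Exception):
    """Stand-in for requests.exceptions.TooManyRedirects."""


def check_uri(uri, head, get):
    """Try a cheap HEAD request first; retry with GET if HEAD fails this way."""
    try:
        return head(uri)
    except (ConnectionError, TooManyRedirects):
        # Some servers mishandle HEAD (dropped connection, redirect loop)
        # yet answer GET normally, so the link may still be fine.
        return get(uri)


def looping_head(uri):
    raise TooManyRedirects(uri)


def working_get(uri):
    return 200


status = check_uri("https://example.org/", looping_head, working_get)
direct = check_uri("https://example.org/", lambda uri: 204, working_get)
```

Without `TooManyRedirects` in the `except` clause, the first call would propagate the exception and the link would be reported as broken.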
Takeshi KOMIYA
875346307f linkcheck: Fix a protocol relative URL is considered as a local file
Since #7985, a protocol-relative URL (a URL that starts with "//") has been
incorrectly treated as a local file.  This makes it an "unchecked" URL.

refs: #7985
2020-07-24 02:13:23 +09:00
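To illustrate why a protocol-relative URL is not a local file, here is a sketch using the standard library's `urlparse`. The helper `is_local_reference` is hypothetical and is not the check the builder itself performs.

```python
from urllib.parse import urlparse


def is_local_reference(uri):
    """Treat a URI as local only if it has neither a scheme nor a host."""
    parts = urlparse(uri)
    return not parts.scheme and not parts.netloc


protocol_relative = urlparse("//example.org/page.html")

relative_path_is_local = is_local_reference("guide/intro.html")
protocol_relative_is_local = is_local_reference("//example.org/page.html")
absolute_is_local = is_local_reference("https://example.org/")
```

A protocol-relative URL has an empty scheme but a non-empty host, so a test that looks only for a scheme would misclassify it.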
Takeshi KOMIYA
f95ba21f4a Close #5208: linkcheck: Support checks for local links 2020-07-19 19:08:14 +09:00
Takeshi KOMIYA
a7725ad8ca Close #7247: linkcheck: Add linkcheck_request_headers 2020-06-01 01:48:46 +09:00
Wes Turner
fd94270f1c ENH: linkcheck: also write all links to output.json
* TST: linkcheck: make tests more flexible
* CLN: linkcheck: flake8, mypy
* REF: linkcheck: docpath->filename, write_jsonline->write_linkstat
* REF: linkcheck: remove redundant call to doc2path
* TST: linkcheck: show JSON obj structure in test
* REF: linkcheck: remove docname from JSON obj because it's redundant (use path2doc(filename) if necessary)
* TST: linkcheck: don't test row[info] output (see comments for examples)
2020-02-12 16:29:26 -05:00
Takeshi KOMIYA
041435024f Fix #7055: linkcheck: redirect is treated as an error 2020-01-30 23:08:00 +09:00
Takeshi KOMIYA
fc523c3ccf A happy new year! 2020-01-01 11:15:42 +09:00
Georg Brandl
3398194135 builders/linkcheck: include "experimental" HTTP 308 as "permanently"
Also remove redundant "default" case.
2019-11-30 17:07:25 +01:00