So far, linkcheck scans all of references and images from documents, and
checks them parallel. As a result, some URL would be checked twice (or
more) by race condition.
This collects the URL via post-transforms, and removes duplicated URLs
before checking availability.
refs: #4303
Linkcheck organizes the URLs to checks in a PriorityQueue. The items are
tuples (priority, url, docname, lineno).
Tuples where the lineno is `None` are not comparable with tuples that
have an integer lineno, and PriorityQueue items must be comparable (see
https://bugs.python.org/issue31145).
Fixes an issue when a document contains two links to the same URL, one
with an int line number and the other without line number metadata (such
as an image :target: attribute).
Using 0 instead of None to represent no line number should not lead to
observable changes, the result logger only logs the line number when it
is truthy.
Close#8565
Test `test_connect_to_selfsigned_nonexistent_cert_file` serves two
purposes:
- verify the behavior when the CA bundle file path is incorrect
- flag a leak of REQUESTS_CA_BUNDLE
Assumes that test runs after
`test_connect_to_selfsigned_with_requests_env_var`. Not great, but
better than no testing.
`linkcheck` logic suggests that SSL errors were originally expected to
be ignored.
Blaming the corresponding lines point to issue #3008: linkcheck to
website with self-signed certificates. That issue was fixed by commit
4c7bec6460, which ignored SSL errors.
Probably because back then, users could not specify a CA bundle.
A broken SSL certificate is a real issue, it should not be ignored.
Users wishing to ignore issues for a specific link can use the
`linkcheck_ignore` option.
The current behavior is to report the site as broken, keep it.
Keep imports alphabetically sorted and their order homogeneous across
Python source files.
The isort project has more feature and is more active than the
flake8-import-order plugin.
Most issues caught were simply import ordering from the same module.
Where imports were purposefully placed out of order, tag with
isort:skip.
Makes the test more realistic by issuing an HTTP request.
Reduces coupling between test and the code under test.
The `http_server` helper was factored out into a new tests.utils module.
* TST: linkcheck: make tests more flexible
* CLN: linkcheck: flake8, mypy
* REF: linkcheck: docpath->filename, write_jsonline->write_linkstat
* REF: linkcheck: remove redundant call to doc2path
* TST: linkcheck: show JSON obj structure in test
* REF: linkcheck: remove docname from JSON obj because it's redundant (use path2doc(filename) if necessary)
* TST: linkcheck: don't test row[info] output (see comments for examples)
In Python 3, the default encoding of source files is utf-8. The encoding
cookie is now unnecessary and redundant so remove it. For more details,
see the docs:
https://docs.python.org/3/howto/unicode.html#the-string-type
> The default encoding for Python source code is UTF-8, so you can
> simply include a Unicode character in a string literal ...
Includes a fix for the flake8 header checks to stop expecting an
encoding cookie.
To avoid needing to turn off anchor checking across the entire
documentation allow skipping based on matching the anchor against a
regex.
Some sites/pages use JavaScript to perform anchor assignment in a
webpage, which would require rendering the page to determine whether
the anchor exists. Allow fine grain control of whether the anchor is
checked based on pattern matching, until such stage as the retrieved
URLs can be passed through an engine for deeper checking on the HTML
doctree.