Commit Graph

6854 Commits

Author SHA1 Message Date
Maxim Dounin
c251961c41 Fixed request termination with AIO and subrequests (ticket #2555).
When a request was terminated due to an error via ngx_http_terminate_request()
while an AIO operation was running in a subrequest, various issues were
observed.  This happened because ngx_http_request_finalizer() was only set
in the subrequest where ngx_http_terminate_request() was called, but not
in the subrequest where the AIO operation was running.  After completion
of the AIO operation normal processing of the subrequest was resumed, leading
to issues.

In particular, in case of the upstream module, termination of the request
called upstream cleanup, which closed the upstream connection.  Attempts to
further work with the upstream connection after AIO operation completion
resulted in segfaults in ngx_ssl_recv(), "readv() failed (9: Bad file
descriptor) while reading upstream" errors, or socket leaks.

In ticket #2555, issues were observed with the following configuration
with cache background update (with thread writing instrumented to
introduce a delay, when a client closes the connection during an update):

    location = /background-and-aio-write {
        proxy_pass ...
        proxy_cache one;
        proxy_cache_valid 200 1s;
        proxy_cache_background_update on;
        proxy_cache_use_stale updating;
        aio threads;
        aio_write on;
        limit_rate 1000;
    }

Similarly, the same issue can be seen with SSI, and can be caused by
errors in subrequests, such as in the following configuration
(where "/proxy" uses AIO, and "/sleep" returns 444 after some delay,
causing request termination):

    location = /ssi-active-boom {
        ssi on;
        ssi_types *;
        return 200 '
                   <!--#include virtual="/proxy" -->
                   <!--#include virtual="/sleep" -->
                   ';
        limit_rate 1000;
    }

Or the same with both AIO operation and the error in non-active subrequests
(which needs slightly different handling, see below):

    location = /ssi-non-active-boom {
        ssi on;
        ssi_types *;
        return 200 '
                   <!--#include virtual="/static" -->
                   <!--#include virtual="/proxy" -->
                   <!--#include virtual="/sleep" -->
                   ';
        limit_rate 1000;
    }

Similarly, issues can be observed with just static files.  However,
with static files potential impact is limited due to timeout safeguards
in ngx_http_writer(), and the fact that c->error is set during request
termination.

In a simple configuration with an AIO operation in the active subrequest,
such as in the following configuration, the connection is closed right
after completion of the AIO operation anyway, since ngx_http_writer()
tries to write to the connection and fails due to c->error set:

    location = /ssi-active-static-boom {
        ssi on;
        ssi_types *;
        return 200 '
                   <!--#include virtual="/static-aio" -->
                   <!--#include virtual="/sleep" -->
                   ';
        limit_rate 1000;
    }

In the following configuration, with an AIO operation in a non-active
subrequest, the connection is closed only after send_timeout expires:

    location = /ssi-non-active-static-boom {
        ssi on;
        ssi_types *;
        return 200 '
                   <!--#include virtual="/static" -->
                   <!--#include virtual="/static-aio" -->
                   <!--#include virtual="/sleep" -->
                   ';
        limit_rate 1000;
    }

Fix is to introduce r->main->terminated flag, which is to be checked
by AIO event handlers when the r->main->blocked counter is decremented.
When the flag is set, handlers are expected to wake up the connection
instead of the subrequest (which might be already cleaned up).

Additionally, now ngx_http_request_finalizer() is always set in the
active subrequest, so waking up the connection properly finalizes the
request even if termination happened in a non-active subrequest.
2024-01-30 03:20:05 +03:00
Maxim Dounin
b794465178 AIO operations now add timers (ticket #2162).
Each AIO (thread IO) operation being run is now accompanied with 1-minute
timer.  This timer prevents unexpected shutdown of the worker process while
an AIO operation is running, and logs an alert if the operation is running
for too long.

This fixes "open socket left" alerts during worker processes shutdown
due to pending AIO (or thread IO) operations while corresponding requests
have no timers.  In particular, such errors were observed while reading
cache headers (ticket #2162), and with worker_shutdown_timeout.
2024-01-29 10:31:37 +03:00
Maxim Dounin
cc4c3ee0a4 Silenced complaints about socket leaks on forced termination.
When graceful shutdown was requested, and then nginx was forced to
do fast shutdown, it used to (incorrectly) complain about open sockets
left in connections which weren't yet closed when fast shutdown
was requested.

Fix is to avoid complaining about open sockets when fast shutdown was
requested after graceful one.  Abnormal termination, if requested with
the WINCH signal, can still happen though.
2024-01-29 10:29:39 +03:00
Sergey Kandaurov
f255815f5d SSL: reasonable version for LibreSSL adjusted.
OPENSSL_VERSION_NUMBER is now redefined to 0x1010000fL for LibreSSL 3.5.0
and above.  Building with older LibreSSL versions, such as 2.8.0, may now
produce warnings (see cab37803ebb3) and may require appropriate compiler
options to suppress them.

Notably, this allows to start using SSL_get0_verified_chain() appeared
in OpenSSL 1.1.0 and LibreSSL 3.5.0, without additional macro tests.

Prodded by Ilya Shipitsin.
2023-12-25 21:15:48 +04:00
Sergey Kandaurov
d7923960a8 SSL: disabled renegotiation checks with LibreSSL.
Similar to 7356:e3ba4026c02d, as long as SSL_OP_NO_CLIENT_RENEGOTIATION
is defined, it is the library responsibility to prevent renegotiation.

Additionally, this allows to raise LibreSSL version used to redefine
OPENSSL_VERSION_NUMBER to 0x1010000fL, such that this won't result in
attempts to dereference SSL objects made opaque in LibreSSL 3.4.0.

Patch by Maxim Dounin.
2023-12-25 21:15:47 +04:00
J Carter
c0134ded9f Win32: extended ngx_random() range to 0x7fffffff.
rand() is used on win32. RAND_MAX is implementation defined. win32's is
0x7fff.

Existing uses of ngx_random() rely upon 0x7fffffff range provided by
POSIX implementations of random().
2023-12-09 08:38:14 +00:00
Sergey Kandaurov
5e74324284 QUIC: fixed format specifier after a6f79f044de5. 2023-12-16 03:40:01 +04:00
Sergey Kandaurov
386329d3cf QUIC: path aware in-flight bytes accounting.
On-packet acknowledgement is made path aware, as per RFC 9000, Section 9.4:
    Packets sent on the old path MUST NOT contribute to congestion control
    or RTT estimation for the new path.

To make this possible in a single congestion control context, the first packet
to be sent after the new path has been validated, which includes resetting the
congestion controller and RTT estimator, is now remembered in the connection.
Packets sent previously, such as on the old path, are not taken into account.

Note that although the packet number is saved per-connection, the added checks
affect application level packets only.  For non-application level packets,
which are only processed prior to the handshake is complete, the remembered
packet number remains set to zero.
2023-12-12 20:21:12 +04:00
Sergey Kandaurov
4ee2a48f3f QUIC: reset RTT estimator for the new path.
RTT is a property of the path, it must be reset on confirming a peer's
ownership of its new address.
2023-12-12 20:20:51 +04:00
Roman Arutyunyan
c1efb3a725 QUIC: path revalidation after expansion failure.
As per RFC 9000, Section 8.2.1:

    When an endpoint is unable to expand the datagram size to 1200 bytes due
    to the anti-amplification limit, the path MTU will not be validated.
    To ensure that the path MTU is large enough, the endpoint MUST perform a
    second path validation by sending a PATH_CHALLENGE frame in a datagram of
    at least 1200 bytes.
2023-11-29 10:58:21 +04:00
Roman Arutyunyan
209e8bc0c0 QUIC: ngx_quic_frame_t time fields cleanup.
The field "first" is removed.  It's unused since 909b989ec088.
The field "last" is renamed to "send_time".  It holds frame send time.
2023-11-30 15:03:06 +04:00
Roman Arutyunyan
ccca701dc6 QUIC: congestion control in ngx_quic_frame_sendto().
Previously ngx_quic_frame_sendto() ignored congestion control and did not
contribute to in_flight counter.

Now congestion control window is checked unless ignore_congestion flag is set.
Also, in_flight counter is incremented and the frame is stored in ctx->sent
queue if it's ack-eliciting.  This behavior is now similar to
ngx_quic_output_packet().
2023-11-29 21:41:29 +04:00
Roman Arutyunyan
0c0f340554 QUIC: ignore duplicate PATH_CHALLENGE frames.
According to RFC 9000, an endpoint SHOULD NOT send multiple PATH_CHALLENGE
frames in a single packet.  The change adds a check to enforce this claim to
optimize server behavior.  Previously each PATH_CHALLENGE always resulted in a
single response datagram being sent to client.  The effect of this was however
limited by QUIC flood protection.

Also, PATH_CHALLENGE is explicitly disabled in Initial and Handshake levels,
see RFC 9000, Table 3.  However, technically it may be sent by client in 0-RTT
over a new path without actual migration, even though the migration itself is
prohibited during handshake.  This allows client to coalesce multiple 0-RTT
packets each carrying a PATH_CHALLENGE and end up with multiple PATH_CHALLENGEs
per datagram.  This again leads to suboptimal behavior, see above.  Since the
purpose of sending PATH_CHALLENGE frames in 0-RTT is unclear, these frames are
now only allowed in 1-RTT.  For 0-RTT they are silently ignored.
2023-11-22 14:48:12 +04:00
Roman Arutyunyan
6c78bb9bb1 QUIC: fixed anti-amplification with explicit send.
Previously, when using ngx_quic_frame_sendto() to explicitly send a packet with
a single frame, anti-amplification limit was not properly enforced.  Even when
there was no quota left for the packet, it was sent anyway, but with no padding.
Now the packet is not sent at all.

This function is called to send PATH_CHALLENGE/PATH_RESPONSE, PMTUD and probe
packets.  For all these cases packet send is retried later in case the send was
not successful.
2023-11-22 14:52:21 +04:00
Roman Arutyunyan
0efe8db1d0 QUIC: avoid partial expansion of PATH_CHALLENGE/PATH_RESPONSE.
By default packets with these frames are expanded to 1200 bytes.  Previously,
if anti-amplification limit did not allow this expansion, it was limited to
whatever size was allowed.  However RFC 9000 clearly states no partial
expansion should happen in both cases.

    Section 8.2.1.  Initiating Path Validation:

    An endpoint MUST expand datagrams that contain a PATH_CHALLENGE frame
    to at least the smallest allowed maximum datagram size of 1200 bytes,
    unless the anti-amplification limit for the path does not permit
    sending a datagram of this size.

    Section 8.2.2. Path Validation Responses:

    An endpoint MUST expand datagrams that contain a PATH_RESPONSE frame
    to at least the smallest allowed maximum datagram size of 1200 bytes.
    ...
    However, an endpoint MUST NOT expand the datagram containing the
    PATH_RESPONSE if the resulting data exceeds the anti-amplification limit.
2023-11-29 18:13:25 +04:00
Vladimir Khomutov
d8fa024ef1 HTTP: uniform checks in ngx_http_alloc_large_header_buffer().
If URI is not fully parsed yet, some pointers are not set.  As a result,
the calculation of "new + (ptr - old)" expression is flawed.

According to C11, 6.5.6 Additive operators, p.9:

: When two pointers are subtracted, both shall point to elements
: of the same array object, or one past the last element of the
: array object

Since "ptr" is not set, subtraction leads to undefined behaviour, because
"ptr" and "old" are not in the same buffer (i.e. array objects).

Prodded by GCC undefined behaviour sanitizer.
2023-11-29 11:13:05 +03:00
Vladimir Khomutov
0db94ba96a HTTP: removed unused r->port_start and r->port_end.
Neither r->port_start nor r->port_end were ever used.

The r->port_end is set by the parser, though it was never used by
the following code (and was never usable, since not copied by the
ngx_http_alloc_large_header_buffer() without r->port_start set).
2023-11-28 12:57:14 +03:00
Sergey Kandaurov
f9a25736fd HTTP/3: added Huffman decoding error logging. 2023-11-14 15:26:02 +04:00
Sergey Kandaurov
6a4eb51f5e Adjusted Huffman coding debug logging, missed in 7977:336084ff943b.
Spotted by XingY Wang.
2023-11-14 14:50:03 +04:00
Vladimir Khomutov
a13ed7f5ed QUIC: improved packet and frames debug tracing.
Currently, packets generated by ngx_quic_frame_sendto() and
ngx_quic_send_early_cc() are not logged, thus making it hard
to read logs due to gaps appearing in packet numbers sequence.

At frames level, it is handy to see immediately packet number
in which they arrived or being sent.
2023-10-26 23:35:09 +03:00
Sergey Kandaurov
1f1bc17ba8 Version bump. 2023-10-27 01:29:28 +04:00
Sergey Kandaurov
b19bc2e0fa HTTP/2: fixed buffer management with HTTP/2 auto-detection.
As part of normal HTTP/2 processing, incomplete frames are saved in the
control state using a fixed size memcpy of NGX_HTTP_V2_STATE_BUFFER_SIZE.
For this matter, two state buffers are reserved in the HTTP/2 recv buffer.

As part of HTTP/2 auto-detection on plain TCP connections, initial data
is first read into a buffer specified by the client_header_buffer_size
directive that doesn't have state reservation.  Previously, this made it
possible to over-read the buffer as part of saving the state.

The fix is to read the available buffer size rather than a fixed size.
Although memcpy of a fixed size can produce a better optimized code,
handling of incomplete frames isn't a common execution path, so it was
sacrificed for the sake of simplicity of the fix.
2023-10-21 18:48:24 +04:00
Sergey Kandaurov
31620d1a89 QUIC: explicitly zero out unused keying material. 2023-10-20 18:05:07 +04:00
Sergey Kandaurov
b94f1fbee3 QUIC: removed key field from ngx_quic_secret_t.
It is made local as it is only needed now when creating crypto context.

BoringSSL lacks EVP interface for ChaCha20, providing instead
a function for one-shot encryption, thus hp is still preserved.

Based on a patch by Roman Arutyunyan.
2023-10-20 18:05:07 +04:00
Sergey Kandaurov
01bd8caceb QUIC: simplified ngx_quic_ciphers() API.
After conversion to reusable crypto ctx, now there's enough caller
context to remove the "level" argument from ngx_quic_ciphers().
2023-10-20 18:05:07 +04:00
Sergey Kandaurov
d15f8f2c85 QUIC: cleaned up now unused ngx_quic_ciphers() calls. 2023-10-20 18:05:07 +04:00
Sergey Kandaurov
4f60ee789e QUIC: reusing crypto contexts for header protection. 2023-10-20 18:05:07 +04:00
Sergey Kandaurov
52d50714eb QUIC: common code for crypto open and seal operations. 2023-10-20 18:05:07 +04:00
Sergey Kandaurov
80a695add8 QUIC: reusing crypto contexts for packet protection. 2023-10-20 18:05:07 +04:00
Sergey Kandaurov
885a02696e QUIC: renamed protection functions.
Now these functions have names ngx_quic_crypto_XXX():

  - ngx_quic_tls_open() -> ngx_quic_crypto_open()
  - ngx_quic_tls_seal() -> ngx_quic_crypto_seal()
  - ngx_quic_tls_hp() -> ngx_quic_crypto_hp()
2023-10-20 18:05:07 +04:00
Sergey Kandaurov
8e1217c46d QUIC: prevented generating ACK frames with discarded keys.
Previously it was possible to generate ACK frames using formally discarded
protection keys, in particular, when acknowledging a client Handshake packet
used to complete the TLS handshake and to discard handshake protection keys.
As it happens late in packet processing, it could be possible to generate ACK
frames after the keys were already discarded.

ACK frames are generated from ngx_quic_ack_packet(), either using a posted
push event, which envolves ngx_quic_generate_ack() as a part of the final
packet assembling, or directly in ngx_quic_ack_packet(), such as when there
is no room to add a new ACK range or when the received packet is out of order.
The added keys availability check is used to avoid generating late ACK frames
in both cases.
2023-10-20 18:05:07 +04:00
Sergey Kandaurov
fffd2823ba QUIC: added safety belt to prevent using discarded keys.
In addition to triggering alert, it ensures that such packets won't be sent.

With the previous change that marks server keys as discarded by zeroing the
key lengh, it is now an error to send packets with discarded keys.  OpenSSL
based stacks tolerate such behaviour because key length isn't used in packet
protection, but BoringSSL will raise the UNSUPPORTED_KEY_SIZE cipher error.
It won't be possible to use discarded keys with reused crypto contexts as it
happens in subsequent changes.
2023-10-20 18:05:07 +04:00
Sergey Kandaurov
cd5f4cd8d3 QUIC: split keys availability checks to read and write sides.
Keys may be released by TLS stack in different times, so it makes sense
to check this independently as well.  This allows to fine-tune what key
direction is used when checking keys availability.

When discarding, server keys are now marked in addition to client keys.
2023-08-31 19:54:10 +04:00
Maxim Dounin
c93cb45ae3 Core: changed ngx_queue_sort() to use merge sort.
This improves nginx startup times significantly when using very large number
of locations due to computational complexity of the sorting algorithm being
used: insertion sort is O(n*n) on average, while merge sort is O(n*log(n)).
In particular, in a test configuration with 20k locations total startup
time is reduced from 8 seconds to 0.9 seconds.

Prodded by Yusuke Nojima,
https://mailman.nginx.org/pipermail/nginx-devel/2023-September/NUL3Y2FPPFSHMPTFTL65KXSXNTX3NQMK.html
2023-10-18 04:30:11 +03:00
Maxim Dounin
284a0c7377 Core: fixed memory leak on configuration reload with PCRE2.
In ngx_regex_cleanup() allocator wasn't configured when calling
pcre2_compile_context_free() and pcre2_match_data_free(), resulting
in no ngx_free() call and leaked memory.  Fix is ensure that allocator
is configured for global allocations, so that ngx_free() is actually
called to free memory.

Additionally, ngx_regex_compile_context was cleared in
ngx_regex_module_init().  It should be either not cleared, so it will
be freed by ngx_regex_cleanup(), or properly freed.  Fix is to
not clear it, so ngx_regex_cleanup() will be able to free it.

Reported by ZhenZhong Wu,
https://mailman.nginx.org/pipermail/nginx-devel/2023-September/3Z5FIKUDRN2WBSL3JWTZJ7SXDA6YIWPB.html
2023-10-17 02:39:38 +03:00
Maxim Dounin
6ceef192e7 HTTP/2: per-iteration stream handling limit.
To ensure that attempts to flood servers with many streams are detected
early, a limit of no more than 2 * max_concurrent_streams new streams per one
event loop iteration was introduced.  This limit is applied even if
max_concurrent_streams is not yet reached - for example, if corresponding
streams are handled synchronously or reset.

Further, refused streams are now limited to maximum of max_concurrent_streams
and 100, similarly to priority_limit initial value, providing some tolerance
to clients trying to open several streams at the connection start, yet
low tolerance to flooding attempts.
2023-10-10 15:13:39 +03:00
Vladimir Khomutov
c37fdcdd1e QUIC: handle callback errors in compat.
The error may be triggered in add_handhshake_data() by incorrect transport
parameter sent by client.  The expected behaviour in this case is to close
connection complaining about incorrect parameter.  Currently the connection
just times out.
2023-09-22 19:23:57 +04:00
Roman Arutyunyan
027b681688 Modules compatibility: added QUIC to signature (ticket #2539).
Enabling QUIC changes ngx_connection_t layout, which is why it should be
added to the signature.
2023-09-13 17:48:15 +04:00
Roman Arutyunyan
196289ac18 QUIC: simplified setting close timer when closing connection.
Previously, the timer was never reset due to an explicit check.  The check was
added in 36b59521a41c as part of connection close simplification.  The reason
was to retain the earliest timeout.  However, the timeouts are all the same
while QUIC handshake is in progress and resetting the timer for the same value
has no performance implications.  After handshake completion there's only
application level.  The change removes the check.
2023-09-14 14:15:20 +04:00
Roman Arutyunyan
26e606a6bc HTTP/3: postponed session creation to init() callback.
Now the session object is assigned to c->data while ngx_http_connection_t
object is referenced by its http_connection field, similar to
ngx_http_v2_connection_t and ngx_http_request_t.

The change allows to eliminate v3_session field from ngx_http_connection_t.
The field was under NGX_HTTP_V3 macro, which was a source of binary
compatibility problems when nginx/module is build with/without HTTP/3 support.

Postponing is essential since c->data should retain the reference to
ngx_http_connection_t object throughout QUIC handshake, because SSL callbacks
ngx_http_ssl_servername() and ngx_http_ssl_alpn_select() rely on this.
2023-09-14 14:13:43 +04:00
Roman Arutyunyan
6ecf576e34 QUIC: do not call shutdown() when handshake is in progress.
Instead, when worker is shutting down and handshake is not yet completed,
connection is terminated immediately.

Previously the callback could be called while QUIC handshake was in progress
and, what's more important, before the init() callback.  Now it's postponed
after init().

This change is a preparation to postponing HTTP/3 session creation to init().
2023-09-21 19:32:38 +04:00
Roman Arutyunyan
ec37134416 HTTP/3: moved variable initialization. 2023-09-13 17:57:13 +04:00
Roman Arutyunyan
33dca88792 QUIC: "handshake_timeout" configuration parameter.
Previously QUIC did not have such parameter and handshake duration was
controlled by HTTP/3.  However that required creating and storing HTTP/3
session on first client datagram.  Apparently there's no convenient way to
store the session object until QUIC handshake is complete.  In the followup
patches session creation will be postponed to init() callback.
2023-09-13 17:59:37 +04:00
Sergey Kandaurov
b489ba83e9 QUIC: removed use of SSL_quic_read_level and SSL_quic_write_level.
As explained in BoringSSL change[1], levels were introduced in the original
QUIC API to draw a line between when keys are released and when are active.
In the new QUIC API they are released in separate calls when it's needed.
BoringSSL has then a consideration to remove levels API, hence the change.

If not available e.g. from a QUIC packet header, levels can be taken based on
keys availability.  The only real use of levels is to prevent using app keys
before they are active in QuicTLS that provides the old BoringSSL QUIC API,
it is replaced with an equivalent check of c->ssl->handshaked.

This change also removes OpenSSL compat shims since they are no longer used.
The only exception left is caching write level from the keylog callback in
the internal field which is a handy equivalent of checking keys availability.

[1] https://boringssl.googlesource.com/boringssl/+/1e859054
2023-09-01 20:31:46 +04:00
Sergey Kandaurov
0d6ea58ebb QUIC: refined sending CONNECTION_CLOSE in various packet types.
As per RFC 9000, section 10.2.3, to ensure that peer successfully removed
packet protection, CONNECTION_CLOSE can be sent in multiple packets using
different packet protection levels.

Now it is sent in all protection levels available.
This roughly corresponds to the following paragraph:

* Prior to confirming the handshake, a peer might be unable to process 1-RTT
  packets, so an endpoint SHOULD send a CONNECTION_CLOSE frame in both Handshake
  and 1-RTT packets.  A server SHOULD also send a CONNECTION_CLOSE frame in an
  Initial packet.

In practice, this change allows to avoid sending an Initial packet when we know
the client has handshake keys, by checking if we have discarded initial keys.
Also, this fixes sending CONNECTION_CLOSE when using QuicTLS with old QUIC API,
where TLS stack releases application read keys before handshake confirmation;
it is fixed by sending CONNECTION_CLOSE additionally in a Handshake packet.
2023-09-01 20:31:46 +04:00
Maxim Dounin
fa46a57199 Upstream: fixed handling of Status headers without reason-phrase.
Status header with an empty reason-phrase, such as "Status: 404 ", is
valid per CGI specification, but looses the trailing space during parsing.
Currently, this results in "HTTP/1.1 404" HTTP status line in the response,
which violates HTTP specification due to missing trailing space.

With this change, only the status code is used from such short Status
header lines, so nginx will generate status line itself, with the space
and appropriate reason phrase if available.

Reported at:
https://mailman.nginx.org/pipermail/nginx/2023-August/EX7G4JUUHJWJE5UOAZMO5UD6OJILCYGX.html
2023-08-31 22:59:17 +03:00
Roman Arutyunyan
ba30ff4c8d QUIC: ignore path validation socket error (ticket #2532).
Previously, a socket error on a path being validated resulted in validation
error and subsequent QUIC connection closure.  Now the error is ignored and
path validation proceeds as usual, with several retries and a timeout.

When validating the old path after an apparent migration, that path may already
be unavailable and sendmsg() may return an error, which should not result in
QUIC connection close.

When validating the new path, it's possible that the new client address is
spoofed (See RFC 9000, 9.3.2. On-Path Address Spoofing).  This address may
as well be unavailable and should not trigger QUIC connection closure.
2023-08-31 10:54:07 +04:00
Roman Arutyunyan
1bc204a3a5 QUIC: use last client dcid to receive initial packets.
Previously, original dcid was used to receive initial client packets in case
server initial response was lost.  However, last dcid should be used instead.
These two are the same unless retry is used.  In case of retry, client resends
initial packet with a new dcid, that is different from the original dcid.  If
server response is lost, the client resends this packet again with the same
dcid.  This is shown in RFC 9000, 7.3. Authenticating Connection IDs, Figure 8.

The issue manifested itself with creating multiple server sessions in response
to each post-retry client initial packet, if server response is lost.
2023-08-30 11:09:21 +04:00
Sergey Kandaurov
24f3cb795e QUIC: posted generating TLS Key Update next keys.
Since at least f9fbeb4ee0de and certainly after 924882f42dea, which
TLS Key Update support predates, queued data output is deferred to a
posted push handler.  To address timing signals after these changes,
generating next keys is now posted to run after the push handler.
2023-08-25 13:51:38 +04:00
Sergey Kandaurov
f42519ff54 Version bump. 2023-08-25 16:39:14 +04:00
Roman Arutyunyan
eeb8a9f56f QUIC: path MTU discovery.
MTU selection starts by doubling the initial MTU until the first failure.
Then binary search is used to find the path MTU.
2023-08-14 09:21:27 +04:00
Roman Arutyunyan
58fc5e2830 QUIC: allowed ngx_quic_frame_sendto() to return NGX_AGAIN.
Previously, NGX_AGAIN returned by ngx_quic_send() was treated by
ngx_quic_frame_sendto() as error, which triggered errors in its callers.
However, a blocked socket is not an error.  Now NGX_AGAIN is passed as is to
the ngx_quic_frame_sendto() callers, which can safely ignore it.
2023-08-08 10:43:17 +04:00
Roman Arutyunyan
8ab3889073 QUIC: removed explicit packet padding for certain frames.
The frames for which the padding is removed are PATH_CHALLENGE and
PATH_RESPONSE, which are sent separately by ngx_quic_frame_sendto().
2023-07-06 11:30:47 +04:00
Roman Arutyunyan
3990aaaa55 QUIC: removed path->limited flag.
Its value is the opposite of path->validated.
2023-07-06 17:49:01 +04:00
Roman Arutyunyan
4f3707c5c7 QUIC: fixed probe-congestion deadlock.
When probe timeout expired while congestion window was exhausted, probe PINGs
could not be sent.  As a result, lost packets could not be declared lost and
congestion window could not be freed for new packets.  This deadlock
continued until connection idle timeout expiration.

Now PINGs are sent separately from the frame queue without congestion control,
as specified by RFC 9002, Section 7:

  An endpoint MUST NOT send a packet if it would cause bytes_in_flight
  (see Appendix B.2) to be larger than the congestion window, unless the
  packet is sent on a PTO timer expiration (see Section 6.2) or when entering
  recovery (see Section 7.3.2).
2023-08-14 08:28:30 +04:00
Roman Arutyunyan
842a930b88 QUIC: fixed PTO expiration condition.
Previously, PTO handler analyzed the first packet in the sent queue for the
timeout expiration.  However, the last sent packet should be analyzed instead.
An example is timeout calculation in ngx_quic_set_lost_timer().
2023-08-01 11:21:59 +04:00
Roman Arutyunyan
57f87d6163 QUIC: avoid accessing freed frame.
Previously the field pnum of a potentially freed frame was accessed.  Now the
value is copied to a local variable.  The old behavior did not cause any
problems since the frame memory is not freed, but is moved to a free queue
instead.
2023-08-01 11:20:04 +04:00
Roman Arutyunyan
968293d5e7 QUIC: fixed congesion control in GSO mode.
In non-GSO mode, a datagram is sent if congestion window is not exceeded by the
time of send.  The window could be exceeded by a small amount after the send.
In GSO mode, congestion window was checked in a similar way, but for all
concatenated datagrams as a whole.  This could result in exceeding congestion
window by a lot.  Now congestion window is checked for every datagram in GSO
mode as well.
2023-07-27 13:35:42 +04:00
Roman Arutyunyan
6f5f17358e QUIC: always add ACK frame to the queue head.
Previously it was added to the tail as all other frames.  However, if the
amount of queued data is large, it could delay the delivery of ACK, which
could trigger frames retransmissions and slow down the connection.
2023-08-10 20:11:29 +04:00
Roman Arutyunyan
6e60e21ac0 QUIC: optimized ACK delay.
Previously ACK was not generated if max_ack_delay was not yet expired and the
number of unacknowledged ack-eliciting packets was less than two, as allowed by
RFC 9000 13.2.1-13.2.2.  However this only makes sense to avoid sending ACK-only
packets, as explained by the RFC:

  On the other hand, reducing the frequency of packets that carry only
  acknowledgments reduces packet transmission and processing cost at both
  endpoints.

Now ACK is delayed only if output frame queue is empty.  Otherwise ACK is sent
immediately, which significantly improves QUIC performance with certain tests.
2023-07-27 16:37:17 +04:00
Maxim Dounin
bdea5b703f SSL: avoid using OpenSSL config in build directory (ticket #2404).
With this change, the NGX_OPENSSL_NO_CONFIG macro is defined when nginx
is asked to build OpenSSL itself.  And with this macro automatic loading
of OpenSSL configuration (from the build directory) is prevented unless
the OPENSSL_CONF environment variable is explicitly set.

Note that not loading configuration is broken in OpenSSL 1.1.1 and 1.1.1a
(fixed in OpenSSL 1.1.1b, see https://github.com/openssl/openssl/issues/7350).
If nginx is used to compile these OpenSSL versions, configuring nginx with
NGX_OPENSSL_NO_CONFIG explicitly set to 0 might be used as a workaround.
2023-06-21 01:29:53 +03:00
Maxim Dounin
2038b46e25 SSL: provided "nginx" appname when loading OpenSSL configs.
Following OpenSSL 0.9.8f, OpenSSL tries to load application-specific
configuration section first, and then falls back to the "openssl_conf"
default section if application-specific section is not found, by using
CONF_modules_load_file(CONF_MFLAGS_DEFAULT_SECTION).  Therefore this
change is not expected to introduce any compatibility issues with existing
configurations.  It does, however, make it easier to configure specific
OpenSSL settings for nginx in system-wide OpenSSL configuration
(ticket #2449).

Instead of checking OPENSSL_VERSION_NUMBER when using the OPENSSL_init_ssl()
interface, the code now tests for OPENSSL_INIT_LOAD_CONFIG to be defined and
true, and also explicitly excludes LibreSSL.  This ensures that this interface
is not used with BoringSSL and LibreSSL, which do not provide additional
library initialization settings, notably the OPENSSL_INIT_set_config_appname()
call.
2023-06-21 01:29:55 +03:00
Maxim Dounin
9e1a000f2b Core: fixed environment variables on exit.
Similarly to 6822:c045b4926b2c, environment variables introduced with
the "env" directive (and "NGINX_BPF_MAPS" added by QUIC) are now allocated
via ngx_alloc(), and explicitly freed by a cleanup handler if no longer used.

In collaboration with Sergey Kandaurov.
2023-07-19 05:09:23 +03:00
Sergey Kandaurov
4d3a9cc11f HTTP/3: fixed $body_bytes_sent. 2023-07-12 15:27:35 +04:00
Roman Arutyunyan
6bdfd58f26 QUIC: use AEAD to encrypt address validation tokens.
Previously used AES256-CBC is now substituted with AES256-GCM.  Although there
seem to be no tangible consequences of token integrity loss.
2023-06-08 14:58:01 +04:00
Sergey Kandaurov
b051754d9d QUIC: removed TLS1_3_CK_* macros wrap up.
They were preserved in 172705615d04 to ease transition from older BoringSSL.
2023-06-16 17:13:29 +04:00
Sergey Kandaurov
dc9017ee48 QUIC: style. 2023-06-20 17:59:02 +04:00
Sergey Kandaurov
68690b0345 QUIC: unified ngx_quic_tls_open() and ngx_quic_tls_seal(). 2023-06-20 17:59:01 +04:00
Roman Arutyunyan
69826dd4f7 QUIC: TLS_AES_128_CCM_SHA256 cipher suite support. 2023-06-20 16:10:49 +04:00
Roman Arutyunyan
3ee1051912 QUIC: common cipher control constants instead of GCM-related.
The constants are used for both GCM and CHACHAPOLY.
2023-06-09 10:23:22 +04:00
Roman Arutyunyan
58c11ee714 QUIC: a new constant for AEAD tag length.
Previously used constant EVP_GCM_TLS_TAG_LEN had misleading name since it was
used not only with GCM, but also with CHACHAPOLY.  Now a new constant
NGX_QUIC_TAG_LEN introduced.  Luckily all AEAD algorithms used by QUIC have
the same tag length of 16.
2023-06-09 10:25:54 +04:00
Roman Arutyunyan
8b28fd7f26 Version bump. 2023-06-20 17:01:00 +04:00
Sergey Kandaurov
e3d2bd0a10 QUIC: fixed rttvar on subsequent RTT samples (ticket #2505).
Previously, computing rttvar used an updated smoothed_rtt value as per
RFC 9002, section 5.3, which appears to be specified in a wrong order.
A technical errata ID 7539 is reported.
2023-06-12 23:38:56 +04:00
Sergey Kandaurov
6915d2fb2e HTTP/2: removed server push (ticket #2432).
Although it has better implementation status than HTTP/3 server push,
it remains of limited use, with adoption numbers seen as negligible.
Per IETF 102 materials, server push was used only in 0.04% of sessions.
It was considered to be "difficult to use effectively" in RFC 9113.
Its use is further limited by badly matching to fetch/cache/connection
models in browsers, see related discussions linked from [1].

Server push was disabled in Chrome 106 [2].

The http2_push, http2_push_preload, and http2_max_concurrent_pushes
directives are made obsolete.  In particular, this essentially reverts
7201:641306096f5b and 7207:3d2b0b02bd3d.

[1] https://jakearchibald.com/2017/h2-push-tougher-than-i-thought/
[2] https://chromestatus.com/feature/6302414934114304
2023-06-08 16:56:46 +04:00
Roman Arutyunyan
d32f66f1e8 SSL: removed the "ssl" directive.
It has been deprecated since 7270:46c0c7ef4913 (1.15.0) in favour of
the "ssl" parameter of the "listen" directive, which has been available
since 2224:109849282793 (0.7.14).
2023-06-08 14:49:27 +04:00
Roman Arutyunyan
aefd862ab1 HTTP/2: "http2" directive.
The directive enables HTTP/2 in the current server.  The previous way to
enable HTTP/2 via "listen ... http2" is now deprecated.  The new approach
allows to share HTTP/2 and HTTP/0.9-1.1 on the same port.

For SSL connections, HTTP/2 is now selected by ALPN callback based on whether
the protocol is enabled in the virtual server chosen by SNI.  This however only
works since OpenSSL 1.0.2h, where ALPN callback is invoked after SNI callback.
For older versions of OpenSSL, HTTP/2 is enabled based on the default virtual
server configuration.

For plain TCP connections, HTTP/2 is now auto-detected by HTTP/2 preface, if
HTTP/2 is enabled in the default virtual server.  If preface is not matched,
HTTP/0.9-1.1 is assumed.
2023-05-16 16:30:08 +04:00
Roman Arutyunyan
cb70d5954c QUIC: fixed compat with ciphers other than AES128 (ticket #2500).
Previously, rec.level field was not uninitialized in SSL_provide_quic_data().
As a result, its value was always ssl_encryption_initial.  Later in
ngx_quic_ciphers() such level resulted in resetting the cipher to
TLS1_3_CK_AES_128_GCM_SHA256 and using AES128 to encrypt the packet.

Now the level is initialized and the right cipher is used.
2023-05-28 11:17:07 +04:00
Roman Arutyunyan
fddcc30e99 Version bump. 2023-05-29 15:03:31 +04:00
Sergey Kandaurov
05990c6bb0 QUIC: fixed OpenSSL compat layer with OpenSSL master branch.
The layer is enabled as a fallback if the QUIC support is configured and the
BoringSSL API wasn't detected, or when using the --with-openssl option, also
compatible with QuicTLS and LibreSSL.  For the latter, the layer is assumed
to be present if QUIC was requested, so it needs to be undefined to prevent
QUIC API redefinition as appropriate.

A previously used approach to test the TLSEXT_TYPE_quic_transport_parameters
macro doesn't work with OpenSSL 3.2 master branch where this macro appeared
with incompatible QUIC API.  To fix the build there, the test is revised to
pass only for QuicTLS and LibreSSL.
2023-05-23 00:45:18 +04:00
Roman Arutyunyan
5ac9da4577 QUIC: fixed post-close use-after-free.
Previously, ngx_quic_close_connection() could be called in a way that QUIC
connection was accessed after the call.  In most cases the connection is not
closed right away, but close timeout is scheduled.  However, it's not always
the case.  Also, if the close process started earlier for a different reason,
calling ngx_quic_close_connection() may actually close the connection.  The
connection object should not be accessed after that.

Now, when possible, return statement is added to eliminate post-close connection
object access.  In other places ngx_quic_close_connection() is substituted with
posting close event.

Also, the new way of closing connection in ngx_quic_stream_cleanup_handler()
fixes another problem in this function.  Previously it passed stream connection
instead of QUIC connection to ngx_quic_close_connection().  This could result
in incomplete connection shutdown.  One consequence of that could be that QUIC
streams were freed without shutting down their application contexts.  This could
result in another use-after-free.

Found by Coverity (CID 1530402).
2023-05-22 15:59:42 +04:00
Maxim Dounin
0400e3d5ce QUIC: better sockaddr initialization.
The qsock->sockaddr field is a ngx_sockaddr_t union, and therefore can hold
any sockaddr (and union members, such qsock->sockaddr.sockaddr, can be used
to access appropriate variant of the sockaddr).  It is better to set it via
qsock->sockaddr itself though, and not qsock->sockaddr.sockaddr, so static
analyzers won't complain about out-of-bounds access.

Prodded by Coverity (CID 1530403).
2023-05-21 04:38:45 +03:00
Roman Arutyunyan
4b02661748 Merged with the quic branch. 2023-05-19 21:46:36 +04:00
Roman Arutyunyan
e4edf78bac HTTP/3: removed server push support. 2023-05-12 10:02:10 +04:00
Roman Arutyunyan
0a3c796145 Common tree insert function for QUIC and UDP connections.
Previously, ngx_udp_rbtree_insert_value() was used for plain UDP and
ngx_quic_rbtree_insert_value() was used for QUIC.  Because of this it was
impossible to initialize connection tree in ngx_create_listening() since
this function is not aware what kind of listening it creates.

Now ngx_udp_rbtree_insert_value() is used for both QUIC and UDP.  To make
is possible, a generic key field is added to ngx_udp_connection_t.  It keeps
client address for UDP and connection ID for QUIC.
2023-05-14 12:30:11 +04:00
Roman Arutyunyan
779bfcff5f Stream: removed QUIC support. 2023-05-14 12:05:35 +04:00
Maxim Dounin
089d1f6530 QUIC: style. 2023-05-11 18:48:01 +03:00
Roman Arutyunyan
2ce3eeeeb7 HTTP/3: removed "http3" parameter of "listen" directive.
The parameter has been deprecated since c851a2ed5ce8.
2023-05-11 13:22:10 +04:00
Roman Arutyunyan
6cc803e713 QUIC: removed "quic_mtu" directive.
The directive used to set the value of the "max_udp_payload_size" transport
parameter.  According to RFC 9000, Section 18.2, the value specifies the size
of buffer for reading incoming datagrams:

    This limit does act as an additional constraint on datagram size in
    the same way as the path MTU, but it is a property of the endpoint
    and not the path; see Section 14. It is expected that this is the
    space an endpoint dedicates to holding incoming packets.

Current QUIC implementation uses the maximum possible buffer size (65527) for
reading datagrams.
2023-05-11 10:37:51 +04:00
Roman Arutyunyan
9ab5d15379 QUIC: resized input datagram buffer from 65535 to 65527.
The value of 65527 is the maximum permitted UDP payload size.
2023-05-11 09:49:34 +04:00
Roman Arutyunyan
885c488191 QUIC: keep stream sockaddr and addr_text constant.
HTTP and Stream variables $remote_addr and $binary_remote_addr rely on
constant client address, particularly because they are cacheable.
However, QUIC client may migrate to a new address.  While there's no perfect
way to handle this, the proposed solution is to copy client address to QUIC
stream at stream creation.

The change also fixes truncated $remote_addr if migration happened while the
stream was active.  The reason is addr_text string was copied to stream by
value.
2023-05-11 19:40:11 +04:00
Sergey Kandaurov
1a8ef991d9 Variables: avoid possible buffer overrun with some "$sent_http_*".
The existing logic to evaluate multi header "$sent_http_*" variables,
such as $sent_http_cache_control, as previously introduced in 1.23.0,
doesn't take into account that one or more elements can be cleared,
yet still present in a linked list, pointed to by the next field.
Such elements don't contribute to the resulting variable length, an
attempt to append a separator for them ends up in out of bounds write.

This is not possible with standard modules, though at least one third
party module is known to override multi header values this way, so it
makes sense to harden the logic.

The fix restores a generic boundary check.
2023-05-01 19:16:05 +04:00
Roman Arutyunyan
a4319bc496 QUIC: set c->socklen for streams.
Previously, the value was not set and remained zero.  While in nginx code the
value of c->sockaddr is accessed without taking c->socklen into account,
invalid c->socklen could lead to unexpected results in third-party modules.
2023-04-27 19:49:05 +04:00
Roman Arutyunyan
906e3b5dca QUIC: fixed addr_text after migration (ticket #2488).
Previously, the post-migration value of addr_text could be truncated, if
it was longer than the previous one.  Also, the new value always included
port, which should not be there.
2023-04-27 19:52:40 +04:00
Sergey Kandaurov
12fa86dd92 QUIC: reschedule path validation on path insertion/removal.
Two issues fixed:
- new path validation could be scheduled late
- a validated path could leave a spurious timer
2023-05-09 19:42:40 +04:00
Sergey Kandaurov
894679804b QUIC: lower bound path validation PTO.
According to RFC 9000, 8.2.4. Failed Path Validation,
the following value is recommended as a validation timeout:

  A value of three times the larger of the current PTO
  or the PTO for the new path (using kInitialRtt, as
  defined in [QUIC-RECOVERY]) is RECOMMENDED.

The change adds PTO of the new path to the equation as the lower bound.
2023-05-09 19:42:40 +04:00
Sergey Kandaurov
f706744165 QUIC: separated path validation retransmit backoff.
Path validation packets containing PATH_CHALLENGE frames are sent separately
from regular frame queue, because of the need to use a decicated path and pad
the packets.  The packets are sent periodically, separately from the regular
probe/lost detection mechanism.  A path validation packet is resent up to 3
times, each time after PTO expiration, with increasing per-path PTO backoff.
2023-05-09 19:42:39 +04:00
Sergey Kandaurov
f0537cf17c QUIC: removed check for in-flight packets in computing PTO.
The check is needed for clients in order to unblock a server due to
anti-amplification limits, and it seems to make no sense for servers.
See RFC 9002, A.6 and A.8 for a further explanation.

This makes max_ack_delay to now always account, notably including
PATH_CHALLENGE timers as noted in the last paragraph of 9000, 9.4,
unlike when it was only used when there are packets in flight.

While here, fixed nearby style.
2023-05-09 19:42:38 +04:00
Roman Arutyunyan
1465a34067 QUIC: disabled datagram fragmentation.
As per RFC 9000, Section 14:

  UDP datagrams MUST NOT be fragmented at the IP layer.
2023-05-06 16:23:27 +04:00
Roman Arutyunyan
13f81f9b88 QUIC: fixed encryption level in ngx_quic_frame_sendto().
Previously, ssl_encryption_application was hardcoded.  Before 9553eea74f2a,
ngx_quic_frame_sendto() was used only for PATH_CHALLENGE/PATH_RESPONSE sent
at the application level only.  Since 9553eea74f2a, ngx_quic_frame_sendto()
is also used for CONNECTION_CLOSE, which can be sent at initial level after
SSL handshake error or rejection.  This resulted in packet encryption error.
Now level is copied from frame, which fixes the error.
2023-05-04 19:29:34 +04:00
Roman Arutyunyan
2187e5e1d9 QUIC: optimized immediate close.
Previously, before sending CONNECTION_CLOSE to client, all pending frames
were sent.  This is redundant and could prevent CONNECTION_CLOSE from being
sent due to congestion control.  Now pending frames are freed and
CONNECTION_CLOSE is sent without congestion control, as advised by RFC 9002:

  Packets containing frames besides ACK or CONNECTION_CLOSE frames
  count toward congestion control limits and are considered to be in flight.
2023-05-02 17:54:53 +04:00
Sergey Kandaurov
af18ce3506 QUIC: fixed split frames error handling.
Do not corrupt frame data chain pointer on ngx_quic_read_buffer() error.
The error leads to closing a QUIC connection where the frame may be used
as part of the QUIC connection tear down, which envolves writing pending
frames, including this one.
2023-05-04 15:52:23 +04:00
Sergey Kandaurov
ea51d2fce8 HTTP/3: fixed ngx_http_v3_init_session() error handling.
A QUIC connection is not usable yet at this early stage of spin up.
2023-05-04 15:52:22 +04:00
Maxim Dounin
25c546ac37 Fixed segfault if regex studies list allocation fails.
The rcf->studies list is unconditionally accessed by ngx_regex_cleanup(),
and this used to cause NULL pointer dereference if allocation
failed.  Fix is to set cleanup handler only when allocation succeeds.
2023-04-18 06:28:46 +03:00
Sergey Kandaurov
c2f2b313a7 Version bump. 2023-04-17 14:06:43 +04:00
Roman Arutyunyan
cc00acfe74 Stream: allow waiting on a blocked QUIC stream (ticket #2479).
Previously, waiting on a shared connection was not allowed, because the only
type of such connection was plain UDP.  However, QUIC stream connections are
also shared since they share socket descriptor with the listen connection.
Meanwhile, it's perfectly normal to wait on such connections.

The issue manifested itself with stream write errors when the amount of data
exceeded stream buffer size or flow control.  Now no error is triggered
and Stream write module is allowed to wait for buffer space to become available.
2023-04-06 15:39:48 +04:00
Sergey Kandaurov
ba15b2af1b HTTP/3: fixed CANCEL_PUSH handling. 2023-04-06 18:18:41 +04:00
Roman Arutyunyan
c136324721 QUIC: optimized sending stream response.
When a stream is created by client, it's often the case that nginx will send
immediate response on that stream.  An example is HTTP/3 request stream, which
in most cases quickly replies with at least HTTP headers.

QUIC stream init handlers are called from a posted event.  Output QUIC
frames are also sent to client from a posted event, called the push event.
If the push event is posted before the stream init event, then output produced
by stream may trigger sending an extra UDP datagram.  To address this, push
event is now re-posted when a new stream init event is posted.

An example is handling 0-RTT packets.  Client typically sends an init packet
coalesced with a 0-RTT packet.  Previously, nginx replied with a padded CRYPTO
datagram, followed by a 1-RTT stream reply datagram.  Now CRYPTO and STREAM
packets are coalesced in one reply datagram, which saves bandwidth.

Other examples include coalescing 1-RTT first stream response, and
MAX_STREAMS/STREAM sent in response to ACK/STREAM.
2023-04-03 16:17:12 +04:00
Sergey Kandaurov
e8fbc96747 Merged with the default branch. 2023-03-29 11:14:25 +04:00
Maxim Dounin
87471918b2 Gzip: compatibility with recent zlib-ng versions.
It now uses custom alloc_aligned() wrapper for all allocations,
therefore all allocations are larger than expected by (64 + sizeof(void*)).
Further, they are seen as allocations of 1 element.  Relevant calculations
were adjusted to reflect this, and state allocation is now protected
with a flag to avoid misinterpreting other allocations as the zlib
deflate_state allocation.

Further, it no longer forces window bits to 13 on compression level 1,
so the comment was adjusted to reflect this.
2023-03-27 21:25:05 +03:00
Maxim Dounin
7b24b93d67 SSL: enabled TLSv1.3 by default. 2023-03-24 02:57:43 +03:00
Maxim Dounin
2ca4355bf0 Mail: fixed handling of blocked client read events in proxy.
When establishing a connection to the backend, nginx blocks reading
from the client with ngx_mail_proxy_block_read().  Previously, such
events were lost, and in some cases this resulted in connection hangs.

Notably, this affected mail_imap_ssl.t on Windows, since the test
closes connections after requesting authentication, but without
waiting for any responses (so the connection close events might be
lost).

Fix is to post an event to read from the client after connecting to
the backend if there were blocked events.
2023-03-24 02:53:21 +03:00
Roman Arutyunyan
25d8ab363b QUIC: style. 2023-03-15 19:57:15 +04:00
Sergey Kandaurov
4d472cd792 HTTP/3: fixed OpenSSL compatibility layer initialization.
SSL context is not present if the default server has neither certificates nor
ssl_reject_handshake enabled.  Previously, this led to null pointer dereference
before it would be caught with configuration checks.

Additionally, non-default servers with distinct SSL contexts need to initialize
compatibility layer in order to complete a QUIC handshake.
2023-03-24 19:49:50 +04:00
Maxim Dounin
11ed95bb53 Syslog: introduced error log handler.
This ensures that errors which happen during logging to syslog are logged
with proper context, such as "while logging to syslog" and the server name.

Prodded by Safar Safarly.
2023-03-10 07:43:50 +03:00
Maxim Dounin
853912986d Syslog: removed usage of ngx_cycle->log and ngx_cycle->hostname.
During initial startup the ngx_cycle->hostname is not available, and
previously this resulted in incorrect logging.  Instead, hostname from the
configuration being parsed is now preserved in the syslog peer structure
and then used during logging.

Similarly, ngx_cycle->log might not match the configuration where the
syslog peer is defined if the configuration is not yet fully applied,
and previously this resulted in unexpected logging of syslog errors
and debug information.  Instead, cf->cycle->new_log is now referenced
in the syslog peer structure and used for logging, similarly to how it
is done in other modules.
2023-03-10 07:43:40 +03:00
Maxim Dounin
ff9e426337 HTTP/2: finalize request as bad if header validation fails.
Similarly to 7192:d5a535774861, this avoids spurious zero statuses
in access.log, and in line with other header-related errors.
2023-03-10 06:47:53 +03:00
Maxim Dounin
3c949f7c40 HTTP/2: socket leak with "return 444" in error_page (ticket #2455).
Similarly to ticket #274 (7354:1812f1d79d84), early request finalization
without calling ngx_http_run_posted_requests() resulted in a connection
hang (a socket leak) if the 400 (Bad Request) error was generated in
ngx_http_v2_state_process_header() due to invalid request headers and
"return 444" was used in error_page 400.
2023-03-10 06:47:48 +03:00
Maxim Dounin
5c480f9173 SSL: logging levels of errors observed with BoringSSL.
As tested with tlsfuzzer with BoringSSL, the following errors are
certainly client-related:

SSL_do_handshake() failed (SSL: error:10000066:SSL routines:OPENSSL_internal:BAD_ALERT)
SSL_do_handshake() failed (SSL: error:10000089:SSL routines:OPENSSL_internal:DECODE_ERROR)
SSL_do_handshake() failed (SSL: error:100000dc:SSL routines:OPENSSL_internal:TOO_MANY_WARNING_ALERTS)
SSL_do_handshake() failed (SSL: error:10000100:SSL routines:OPENSSL_internal:INVALID_COMPRESSION_LIST)
SSL_do_handshake() failed (SSL: error:10000102:SSL routines:OPENSSL_internal:MISSING_KEY_SHARE)
SSL_do_handshake() failed (SSL: error:1000010e:SSL routines:OPENSSL_internal:TOO_MUCH_SKIPPED_EARLY_DATA)
SSL_read() failed (SSL: error:100000b6:SSL routines:OPENSSL_internal:NO_RENEGOTIATION)

Accordingly, the SSL_R_BAD_ALERT, SSL_R_DECODE_ERROR,
SSL_R_TOO_MANY_WARNING_ALERTS, SSL_R_INVALID_COMPRESSION_LIST,
SSL_R_MISSING_KEY_SHARE, SSL_R_TOO_MUCH_SKIPPED_EARLY_DATA,
and SSL_R_NO_RENEGOTIATION errors are now logged at the "info" level.
2023-03-08 22:22:47 +03:00
Maxim Dounin
13987c88c3 SSL: logging levels of errors observed with tlsfuzzer and LibreSSL.
As tested with tlsfuzzer with LibreSSL 3.7.0, the following errors are
certainly client-related:

SSL_do_handshake() failed (SSL: error:14026073:SSL routines:ACCEPT_SR_CLNT_HELLO:bad packet length)
SSL_do_handshake() failed (SSL: error:1402612C:SSL routines:ACCEPT_SR_CLNT_HELLO:ssl3 session id too long)
SSL_do_handshake() failed (SSL: error:140380EA:SSL routines:ACCEPT_SR_KEY_EXCH:tls rsa encrypted value length is wrong)

Accordingly, the SSL_R_BAD_PACKET_LENGTH ("bad packet length"),
SSL_R_SSL3_SESSION_ID_TOO_LONG ("ssl3 session id too long"),
SSL_R_TLS_RSA_ENCRYPTED_VALUE_LENGTH_IS_WRONG ("tls rsa encrypted value
length is wrong") errors are now logged at the "info" level.
2023-03-08 22:22:34 +03:00
Maxim Dounin
a3a94f7534 SSL: logging levels of various errors reported with tlsfuzzer.
To further differentiate client-related errors and adjust logging levels
of various SSL errors, nginx was tested with tlsfuzzer with multiple
OpenSSL versions (3.1.0-beta1, 3.0.8, 1.1.1t, 1.1.0l, 1.0.2u, 1.0.1u,
1.0.0s, 0.9.8zh).

The following errors were observed during tlsfuzzer runs with OpenSSL 3.0.8,
and are clearly client-related:

SSL_do_handshake() failed (SSL: error:0A000092:SSL routines::data length too long)
SSL_do_handshake() failed (SSL: error:0A0000A0:SSL routines::length too short)
SSL_do_handshake() failed (SSL: error:0A000124:SSL routines::bad legacy version)
SSL_do_handshake() failed (SSL: error:0A000178:SSL routines::no shared signature algorithms)

Accordingly, the SSL_R_DATA_LENGTH_TOO_LONG ("data length too long"),
SSL_R_LENGTH_TOO_SHORT ("length too short"), SSL_R_BAD_LEGACY_VERSION
("bad legacy version"), and SSL_R_NO_SHARED_SIGNATURE_ALGORITHMS
("no shared signature algorithms", misspelled as "sigature" in OpenSSL 1.0.2)
errors are now logged at the "info" level.

Additionally, the following errors were observed with OpenSSL 3.0.8 and
with TLSv1.3 enabled:

SSL_do_handshake() failed (SSL: error:0A00006F:SSL routines::bad digest length)
SSL_do_handshake() failed (SSL: error:0A000070:SSL routines::missing sigalgs extension)
SSL_do_handshake() failed (SSL: error:0A000096:SSL routines::encrypted length too long)
SSL_do_handshake() failed (SSL: error:0A00010F:SSL routines::bad length)
SSL_read() failed (SSL: error:0A00007A:SSL routines::bad key update)
SSL_read() failed (SSL: error:0A000125:SSL routines::mixed handshake and non handshake data)

Accordingly, the SSL_R_BAD_DIGEST_LENGTH ("bad digest length"),
SSL_R_MISSING_SIGALGS_EXTENSION ("missing sigalgs extension"),
SSL_R_ENCRYPTED_LENGTH_TOO_LONG ("encrypted length too long"),
SSL_R_BAD_LENGTH ("bad length"), SSL_R_BAD_KEY_UPDATE ("bad key update"),
and SSL_R_MIXED_HANDSHAKE_AND_NON_HANDSHAKE_DATA ("mixed handshake and non
handshake data") errors are now logged at the "info" level.

Additionally, the following errors were observed with OpenSSL 1.1.1t:

SSL_do_handshake() failed (SSL: error:14094091:SSL routines:ssl3_read_bytes:data between ccs and finished)
SSL_do_handshake() failed (SSL: error:14094199:SSL routines:ssl3_read_bytes:too many warn alerts)
SSL_read() failed (SSL: error:1408F0C6:SSL routines:ssl3_get_record:packet length too long)
SSL_read() failed (SSL: error:14094085:SSL routines:ssl3_read_bytes:ccs received early)

Accordingly, the SSL_R_CCS_RECEIVED_EARLY ("ccs received early"),
SSL_R_DATA_BETWEEN_CCS_AND_FINISHED ("data between ccs and finished"),
SSL_R_PACKET_LENGTH_TOO_LONG ("packet length too long"), and
SSL_R_TOO_MANY_WARN_ALERTS ("too many warn alerts") errors are now logged
at the "info" level.

Additionally, the following errors were observed with OpenSSL 1.0.2u:

SSL_do_handshake() failed (SSL: error:1407612A:SSL routines:SSL23_GET_CLIENT_HELLO:record too small)
SSL_do_handshake() failed (SSL: error:1408C09A:SSL routines:ssl3_get_finished:got a fin before a ccs)

Accordingly, the SSL_R_RECORD_TOO_SMALL ("record too small") and
SSL_R_GOT_A_FIN_BEFORE_A_CCS ("got a fin before a ccs") errors are now
logged at the "info" level.

No additional client-related errors were observed while testing with
OpenSSL 3.1.0-beta1, OpenSSL 1.1.0l, OpenSSL 1.0.1u, OpenSSL 1.0.0s,
and OpenSSL 0.9.8zh.
2023-03-08 22:21:59 +03:00
Maxim Dounin
a976e6b9ef SSL: switched to detect log level based on the last error.
In some cases there might be multiple errors in the OpenSSL error queue,
notably when a libcrypto call fails, and then the SSL layer generates
an error itself.  For example, the following errors were observed
with OpenSSL 3.0.8 with TLSv1.3 enabled:

SSL_do_handshake() failed (SSL: error:02800066:Diffie-Hellman routines::invalid public key error:0A000132:SSL routines::bad ecpoint)
SSL_do_handshake() failed (SSL: error:08000066:elliptic curve routines::invalid encoding error:0A000132:SSL routines::bad ecpoint)
SSL_do_handshake() failed (SSL: error:0800006B:elliptic curve routines::point is not on curve error:0A000132:SSL routines::bad ecpoint)

In such cases it seems to be better to determine logging level based on
the last error in the error queue (the one added by the SSL layer,
SSL_R_BAD_ECPOINT in all of the above example example errors).  To do so,
the ngx_ssl_connection_error() function was changed to use
ERR_peek_last_error().
2023-03-08 22:21:53 +03:00
Yugo Horie
2c5fccd469 Core: stricter UTF-8 handling in ngx_utf8_decode().
An UTF-8 octet sequence cannot start with a 11111xxx byte (above 0xf8),
see https://datatracker.ietf.org/doc/html/rfc3629#section-3.  Previously,
such bytes were accepted by ngx_utf8_decode() and misinterpreted as 11110xxx
bytes (as in a 4-byte sequence).  While unlikely, this can potentially cause
issues.

Fix is to explicitly reject such bytes in ngx_utf8_decode().
2023-02-23 08:09:50 +09:00
Maxim Dounin
4ace957c4e Win32: non-ASCII names in ngx_fs_bsize(), ngx_fs_available().
This fixes potentially incorrect cache size calculations and non-working
"min_free" when using cache in directories with non-ASCII names.
2023-02-23 20:50:03 +03:00
Maxim Dounin
6c5fe80bc6 Win32: removed attempt to use a drive letter in ngx_fs_bsize().
Just a drive letter might not correctly represent file system being used,
notably when using symlinks (as created by "mklink /d").  As such, instead
of trying to call GetDiskFreeSpace() with just a drive letter, we now always
use GetDiskFreeSpace() with full path.

Further, it looks like the code to use just a drive letter never worked,
since it tried to test name[2] instead of name[1] to be ':'.
2023-02-23 20:50:00 +03:00
Maxim Dounin
2062ddef39 Win32: non-ASCII names support in ngx_open_tempfile().
This makes it possible to use temporary directories with non-ASCII characters,
either explicitly or via a prefix with non-ASCII characters in it.
2023-02-23 20:49:57 +03:00
Maxim Dounin
4d84bc4929 Win32: non-ASCII names support in ngx_rename_file().
This makes it possible to upload files with non-ASCII characters
when using the dav module (ticket #1433).
2023-02-23 20:49:55 +03:00
Maxim Dounin
1a9e5c8376 Win32: non-ASCII names support in ngx_delete_file().
This makes it possible to delete files with non-ASCII characters
when using the dav module (ticket #1433).
2023-02-23 20:49:54 +03:00
Maxim Dounin
dc4957485e Win32: reworked ngx_win32_rename_file() to use nginx wrappers.
This ensures that ngx_win32_rename_file() will support non-ASCII names
when supported by the wrappers.

Notably, this is used by PUT requests in the dav module when overwriting
existing files with non-ASCII names (ticket #1433).
2023-02-23 20:49:52 +03:00
Maxim Dounin
94d8cea620 Win32: reworked ngx_win32_rename_file() to check errors.
Previously, ngx_win32_rename_file() retried on all errors returned by
MoveFile() to a temporary name.  It only make sense, however, to retry
when the destination file already exists, similarly to the condition
when ngx_win32_rename_file() is called.  Retrying on other errors is
meaningless and might result in an infinite loop.
2023-02-23 20:49:50 +03:00
Maxim Dounin
f8075f1ef5 Win32: non-ASCII directory names support in ngx_delete_dir().
This makes it possible to delete directories with non-ASCII characters
when using the dav module (ticket #1433).
2023-02-23 20:49:47 +03:00
Maxim Dounin
89719dc5c1 Win32: non-ASCII directory names support in ngx_create_dir().
This makes it possible to create directories under prefix with non-ASCII
characters, as well as makes it possible to create directories with non-ASCII
characters when using the dav module (ticket #1433).

To ensure that the dav module operations are restricted similarly to
other file operations (in particular, short names are not allowed), the
ngx_win32_check_filename() function is used.  It improved to support
checking of just dirname, and now can be used to check paths when creating
files or directories.
2023-02-23 20:49:45 +03:00
Maxim Dounin
1edc23cc84 Win32: non-ASCII directory names support in ngx_getcwd().
This makes it possible to start nginx without a prefix explicitly set
in a directory with non-ASCII characters in it.
2023-02-23 20:49:44 +03:00
Maxim Dounin
99d5ad72a4 Win32: non-ASCII names support in "include" with wildcards.
Notably, ngx_open_glob() now supports opening directories with non-ASCII
characters, and pathnames returned by ngx_read_glob() are converted to UTF-8.
2023-02-23 20:49:41 +03:00
Maxim Dounin
1f8a66f199 Win32: non-ASCII names support in autoindex (ticket #458).
Notably, ngx_open_dir() now supports opening directories with non-ASCII
characters, and directory entries returned by ngx_read_dir() are properly
converted to UTF-8.
2023-02-23 20:49:39 +03:00
Maxim Dounin
2485681308 Lingering close for connections with pipelined requests.
This is expected to help with clients using pipelining with some constant
depth, such as apt[1][2].

When downloading many resources, apt uses pipelining with some constant
depth, a number of requests in flight.  This essentially means that after
receiving a response it sends an additional request to the server, and
this can result in requests arriving to the server at any time.  Further,
additional requests are sent one-by-one, and can be easily seen as such
(neither as pipelined, nor followed by pipelined requests).

The only safe approach to close such connections (for example, when
keepalive_requests is reached) is with lingering.  To do so, now nginx
monitors if pipelining was used on the connection, and if it was, closes
the connection with lingering.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=973861#10
[2] https://mailman.nginx.org/pipermail/nginx-devel/2023-January/ZA2SP5SJU55LHEBCJMFDB2AZVELRLTHI.html
2023-02-02 23:38:48 +03:00
Maxim Dounin
384a8d8dfb Fixed "zero size buf" alerts with subrequests.
Since 4611:2b6cb7528409 responses from the gzip static, flv, and mp4 modules
can be used with subrequests, though empty files were not properly handled.
Empty gzipped, flv, and mp4 files thus resulted in "zero size buf in output"
alerts.  While valid corresponding files are not expected to be empty, such
files shouldn't result in alerts.

Fix is to set b->sync on such empty subrequest responses, similarly to what
ngx_http_send_special() does.

Additionally, the static module, the ngx_http_send_response() function, and
file cache are modified to do the same instead of not sending the response
body at all in such cases, since not sending the response body at all is
believed to be at least questionable, and might break various filters
which do not expect such behaviour.
2023-01-28 05:23:33 +03:00
Maxim Dounin
856a01860e Style. 2023-01-28 05:20:23 +03:00
Maxim Dounin
dad65f3e44 Added warning about redefinition of listen socket protocol options.
The "listen" directive in the http module can be used multiple times
in different server blocks.  Originally, it was supposed to be specified
once with various socket options, and without any parameters in virtual
server blocks.  For example:

    server { listen 80 backlog=1024; server_name foo; ... }
    server { listen 80; server_name bar; ... }
    server { listen 80; server_name bazz; ... }

The address part of the syntax ("address[:port]" / "port" / "unix:path")
uniquely identifies the listening socket, and therefore is enough for
name-based virtual servers (to let nginx know that the virtual server
accepts requests on the listening socket in question).

To ensure that listening options do not conflict between virtual servers,
they were allowed only once.  For example, the following configuration
will be rejected ("duplicate listen options for 0.0.0.0:80 in ..."):

    server { listen 80 backlog=1024; server_name foo; ... }
    server { listen 80 backlog=512; server_name bar; ... }

At some point it was, however, noticed, that it is sometimes convenient
to repeat some options for clarity.  In nginx 0.8.51 the "ssl" parameter
was allowed to be specified multiple times, e.g.:

    server { listen 443 ssl backlog=1024; server_name foo; ... }
    server { listen 443 ssl; server_name bar; ... }
    server { listen 443 ssl; server_name bazz; ... }

This approach makes configuration more readable, since SSL sockets are
immediately visible in the configuration.  If this is not needed, just the
address can still be used.

Later, additional protocol-specific options similar to "ssl" were
introduced, notably "http2" and "proxy_protocol".  With these options,
one can write:

    server { listen 443 ssl backlog=1024; server_name foo; ... }
    server { listen 443 http2; server_name bar; ... }
    server { listen 443 proxy_protocol; server_name bazz; ... }

The resulting socket will use ssl, http2, and proxy_protocol, but this
is not really obvious from the configuration.

To emphasize such misleading configurations are discouraged, nginx now
warns as long as the "listen" directive is used with options different
from the options previously used if this is potentially confusing.

In particular, the following configurations are allowed:

    server { listen 8401 ssl backlog=1024; server_name foo; }
    server { listen 8401 ssl; server_name bar; }
    server { listen 8401 ssl; server_name bazz; }

    server { listen 8402 ssl http2 backlog=1024; server_name foo; }
    server { listen 8402 ssl; server_name bar; }
    server { listen 8402 ssl; server_name bazz; }

    server { listen 8403 ssl; server_name bar; }
    server { listen 8403 ssl; server_name bazz; }
    server { listen 8403 ssl http2; server_name foo; }

    server { listen 8404 ssl http2 backlog=1024; server_name foo; }
    server { listen 8404 http2; server_name bar; }
    server { listen 8404 http2; server_name bazz; }

    server { listen 8405 ssl http2 backlog=1024; server_name foo; }
    server { listen 8405 ssl http2; server_name bar; }
    server { listen 8405 ssl http2; server_name bazz; }

    server { listen 8406 ssl; server_name foo; }
    server { listen 8406; server_name bar; }
    server { listen 8406; server_name bazz; }

And the following configurations will generate warnings:

    server { listen 8501 ssl http2 backlog=1024; server_name foo; }
    server { listen 8501 http2; server_name bar; }
    server { listen 8501 ssl; server_name bazz; }

    server { listen 8502 backlog=1024; server_name foo; }
    server { listen 8502 ssl; server_name bar; }

    server { listen 8503 ssl; server_name foo; }
    server { listen 8503 http2; server_name bar; }

    server { listen 8504 ssl; server_name foo; }
    server { listen 8504 http2; server_name bar; }
    server { listen 8504 proxy_protocol; server_name bazz; }

    server { listen 8505 ssl http2 proxy_protocol; server_name foo; }
    server { listen 8505 ssl http2; server_name bar; }
    server { listen 8505 ssl; server_name bazz; }

    server { listen 8506 ssl http2; server_name foo; }
    server { listen 8506 ssl; server_name bar; }
    server { listen 8506; server_name bazz; }

    server { listen 8507 ssl; server_name bar; }
    server { listen 8507; server_name bazz; }
    server { listen 8507 ssl http2; server_name foo; }

    server { listen 8508 ssl; server_name bar; }
    server { listen 8508; server_name bazz; }
    server { listen 8508 ssl backlog=1024; server_name foo; }

    server { listen 8509; server_name bazz; }
    server { listen 8509 ssl; server_name bar; }
    server { listen 8509 ssl backlog=1024; server_name foo; }

The basic idea is that at most two sets of protocol options are allowed:
the main one (with socket options, if any), and a shorter one, with options
being a subset of the main options, repeated for clarity.  As long as the
shorter set of protocol options is used, all listen directives except the
main one should use it.
2023-01-28 01:29:45 +03:00
Roman Arutyunyan
a5f9b45aee HTTP/3: trigger more compatibility errors for "listen quic".
Now "ssl", "proxy_protocol" and "http2" are not allowed with "quic" in "listen"
directive.  Previously, only "ssl" was not allowed.
2023-01-26 15:25:33 +04:00
Roman Arutyunyan
815ef96124 HTTP/3: "quic" parameter of "listen" directive.
Now "listen" directve has a new "quic" parameter which enables QUIC protocol
for the address.  Further, to enable HTTP/3, a new directive "http3" is
introduced.  The hq-interop protocol is enabled by "http3_hq" as before.
Now application protocol is chosen by ALPN.

Previously used "http3" parameter of "listen" is deprecated.
2023-02-27 14:00:56 +04:00
Roman Arutyunyan
a36ebf7e95 QUIC: OpenSSL compatibility layer.
The change allows to compile QUIC with OpenSSL which lacks BoringSSL QUIC API.

This implementation does not support 0-RTT.
2023-02-22 19:16:53 +04:00
Sergey Kandaurov
76adb91913 QUIC: improved ssl_reject_handshake error logging.
The check follows the ngx_ssl_handshake() change in 59e1c73fe02b.
2023-02-23 16:26:38 +04:00
Sergey Kandaurov
d9610b40a6 QUIC: using ngx_ssl_handshake_log(). 2023-02-23 16:17:29 +04:00
Sergey Kandaurov
5149620d6d QUIC: moved "handshake failed" reason to send_alert.
A QUIC handshake failure breaks down into several cases:
- a handshake error which leads to a send_alert call
- an error triggered by the add_handshake_data callback
- internal errors (allocation etc)

Previously, in the first case, only error code was set in the send_alert
callback.  Now the "handshake failed" reason phrase is set there as well.
In the second case, both code and reason are set by add_handshake_data.
In the last case, setting reason phrase is removed: returning NGX_ERROR
now leads to closing the connection with just INTERNAL_ERROR.

Reported by Jiuzhou Cui.
2023-02-23 16:16:56 +04:00
Sergey Kandaurov
1ccba18f00 QUIC: using NGX_QUIC_ERR_CRYPTO macro in ALPN checks.
Patch by Jiuzhou Cui.
2023-02-23 15:49:59 +04:00
Sergey Kandaurov
40af97fdec QUIC: fixed indentation. 2023-02-13 14:01:50 +04:00
Roman Arutyunyan
b7ccca0eb0 QUIC: fixed broken token in NEW_TOKEN (ticket #2446).
Previously, since 3550b00d9dc8, the token was allocated on stack, to get
rid of pool usage.  Now the token is allocated by ngx_quic_copy_buffer()
in QUIC buffers, also used for STREAM, CRYPTO and ACK frames.
2023-01-31 15:26:33 +04:00
Roman Arutyunyan
341c21c9f6 QUIC: ngx_quic_copy_buffer() function.
The function copies passed data to QUIC buffer chain and returns it.
The chain can be used in ngx_quic_frame_t data field.
2023-01-31 14:12:18 +04:00
Maxim Dounin
f20c6e0eb5 Fixed handling of very long locations (ticket #2435).
Previously, location prefix length in ngx_http_location_tree_node_t was
stored as "u_char", and therefore location prefixes longer than 255 bytes
were handled incorrectly.

Fix is to use "u_short" instead.  With "u_short", prefixes up to 65535 bytes
can be safely used, and this isn't reachable due to NGX_CONF_BUFFER, which
is 4096 bytes.
2023-01-26 03:34:44 +03:00
Maxim Dounin
5c18b5bc3f Gzip static: ranges support (ticket #2349).
In contrast to on-the-fly gzipping with gzip filter, static gzipped
representation as returned by gzip_static is persistent, and therefore
the same binary representation is available for future requests, making
it possible to use range requests.

Further, if a gzipped representation is re-generated with different
compression settings, it is expected to result in different ETag and
different size reported in the Content-Range header, making it possible
to safely use range requests anyway.

As such, ranges are now allowed for files returned by gzip_static.
2023-01-24 03:01:51 +03:00