Reworked connections reuse, so closing connections is attempted in
advance, as long as number of free connections is less than 1/16 of
worker connections configured. This ensures that new connections can
be handled even if closing a reusable connection requires some time,
for example, for a lingering close (ticket #2017).
The 1/16 ratio is selected to be smaller than 1/8 used for disabling
accept when working with accept mutex, so nginx will try to balance
new connections to different workers first, and will start reusing
connections only if this won't help.
Previously, reusing connections happened silently and was only
visible in monitoring systems. This was shown to be not very user-friendly,
and administrators often didn't realize there were too few connections
available to withstand the load, and configured timeouts (keepalive_timeout
and http2_idle_timeout) were effectively reduced to keep things running.
To provide at least some information about this, a warning is now logged
(at most once per second, to avoid flooding the logs).
Sending shutdown when ngx_http_test_reading() detects the connection is
closed can result in "SSL_shutdown() failed (SSL: ... bad write retry)"
critical log messages if there are blocked writes.
Fix is to avoid sending shutdown via the c->ssl->no_send_shutdown flag,
similarly to how it is done in ngx_http_keepalive_handler() for kqueue
when pending EOF is detected.
Reported by Jan Prachař
(http://mailman.nginx.org/pipermail/nginx-devel/2018-December/011702.html).
Without the flag, SSL shutdown is attempted on such connections,
resulting in useless work and/or bogus "SSL_shutdown() failed
(SSL: ... bad write retry)" critical log messages if there are
blocked writes.
Previously, bidirectional shutdown never worked, due to two issues
in the code:
1. The code only tested SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE
when there was an error in the error queue, which cannot happen.
The bug was introduced in an attempt to fix unexpected error logging
as reported with OpenSSL 0.9.8g
(http://mailman.nginx.org/pipermail/nginx/2008-January/003084.html).
2. The code never called SSL_shutdown() for the second time to wait for
the peer's close_notify alert.
This change fixes both issues.
Note that after this change bidirectional shutdown is expected to work for
the first time, so c->ssl->no_wait_shutdown now makes a difference. This
is not a problem for HTTP code which always uses c->ssl->no_wait_shutdown,
but might be a problem for stream and mail code, as well as 3rd party
modules.
To minimize the effect of the change, the timeout, which was used to be 30
seconds and not configurable, though never actually used, is now set to
3 seconds. It is also expanded to apply to both SSL_ERROR_WANT_READ and
SSL_ERROR_WANT_WRITE, so timeout is properly set if writing to the socket
buffer is not possible.
If some additional data from a pipelined request happens to be
read into the body buffer, we copy it to r->header_in or allocate
an additional large client header buffer for it.
This ensures that copying won't write more than the buffer size
even if the buffer comes from hc->free and it is smaller than the large
client header buffer size in the virtual host configuration. This might
happen if size of large client header buffers is different in name-based
virtual hosts, similarly to the problem with number of buffers fixed
in 6926:e662cbf1b932.
After 05e42236e95b (1.19.1) responses with extra data might result in
zero size buffers being generated and "zero size buf" alerts in writer
(if f->rest happened to be 0 when processing additional stdout data).
Previously, the document generated by the xslt filter was always fully sent
to client even if a range was requested and response status was 206 with
appropriate Content-Range.
The xslt module is unable to serve a range because of suspending the header
filter chain. By the moment full response xml is buffered by the xslt filter,
range header filter is not called yet, but the range body filter has already
been called and did nothing.
The fix is to disable ranges by resetting the r->allow_ranges flag much like
the image filter that employs a similar technique.
The slice filter allows ranges for the response by setting the r->allow_ranges
flag, which enables the range filter. If the range was not requested, the
range filter adds an Accept-Ranges header to the response to signal the
support for ranges.
Previously, if an Accept-Ranges header was already present in the first slice
response, client received two copies of this header. Now, the slice filter
removes the Accept-Ranges header from the response prior to setting the
r->allow_ranges flag.
As long as the "Content-Length" header is given, we now make sure
it exactly matches the size of the response. If it doesn't,
the response is considered malformed and must not be forwarded
(https://tools.ietf.org/html/rfc7540#section-8.1.2.6). While it
is not really possible to "not forward" the response which is already
being forwarded, we generate an error instead, which is the closest
equivalent.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Also this
directly contradicts HTTP/2 specification requirements.
Note that the new behaviour for the gRPC proxy is more strict than that
applied in other variants of proxying. This is intentional, as HTTP/2
specification requires us to do so, while in other types of proxying
malformed responses from backends are well known and historically
tolerated.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
Additionally, we now also issue a warning if the response is too
short, and make sure the fact it is truncated is propagated to the
client. The u->error flag is introduced to make it possible to
propagate the error to the client in case of unbuffered proxying.
For responses to HEAD requests there is an exception: we do allow
both responses without body and responses with body matching the
Content-Length header.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
This change covers generic buffered and unbuffered filters as used
in the scgi and uwsgi modules. Appropriate input filter init
handlers are provided by the scgi and uwsgi modules to set corresponding
lengths.
Note that for responses to HEAD requests there is an exception:
we do allow any response length. This is because responses to HEAD
requests might be actual full responses, and it is up to nginx
to remove the response body. If caching is enabled, only full
responses matching the Content-Length header will be cached
(see b779728b180c).
Previously, additional data after final chunk was either ignored
(in the same buffer, or during unbuffered proxying) or sent to the
client (in the next buffer already if it was already read from the
socket). Now additional data are properly detected and ignored
in all cases. Additionally, a warning is now logged and keepalive
is disabled in the connection.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
If a memcached response was followed by a correct trailer, and then
the NUL character followed by some extra data - this was accepted by
the trailer checking code. This in turn resulted in ctx->rest underflow
and caused negative size buffer on the next reading from the upstream,
followed by the "negative size buf in writer" alert.
Fix is to always check for too long responses, so a correct trailer cannot
be followed by extra data.
After sending the GOAWAY frame, a connection is now closed using
the lingering close mechanism.
This allows for the reliable delivery of the GOAWAY frames, while
also fixing connection resets observed when http2_max_requests is
reached (ticket #1250), or with graceful shutdown (ticket #1544),
when some additional data from the client is received on a fully
closed connection.
For HTTP/2, the settings lingering_close, lingering_timeout, and
lingering_time are taken from the "server" level.
Using SSL_CTX_set_verify(SSL_VERIFY_PEER) implies that OpenSSL will
send a certificate request during an SSL handshake, leading to unexpected
certificate requests from browsers as long as there are any client
certificates installed. Given that ngx_ssl_trusted_certificate()
is called unconditionally by the ngx_http_ssl_module, this affected
all HTTPS servers. Broken by 699f6e55bbb4 (not released yet).
Fix is to set verify callback in the ngx_ssl_trusted_certificate() function
without changing the verify mode.
Clearing cache based on free space left on a file system is
expected to allow better disk utilization in some cases, notably
when disk space might be also used for something other than nginx
cache (including nginx own temporary files) and while loading
cache (when cache size might be inaccurate for a while, effectively
disabling max_size cache clearing).
Based on a patch by Adam Bambuch.
With XFS, using "allocsize=64m" mount option results in large preallocation
being reported in the st_blocks as returned by fstat() till the file is
closed. This in turn results in incorrect cache size calculations and
wrong clearing based on max_size.
To avoid too aggressive cache clearing on such volumes, st_blocks values
which result in sizes larger than st_size and eight blocks (an arbitrary
limit) are no longer trusted, and we use st_size instead.
The ngx_de_fs_size() counterpart is intentionally not modified, as
it is used on closed files and hence not affected by this problem.
NFS on Linux is known to report wsize as a block size (in both f_bsize
and f_frsize, both in statfs() and statvfs()). On the other hand,
typical file system block sizes on Linux (ext2/ext3/ext4, XFS) are limited
to pagesize. (With FAT, block sizes can be at least up to 512k in
extreme cases, but this doesn't really matter, see below.)
To avoid too aggressive cache clearing on NFS volumes on Linux, block
sizes larger than pagesize are now ignored.
Note that it is safe to ignore large block sizes. Since 3899:e7cd13b7f759
(1.0.1) cache size is calculated based on fstat() st_blocks, and rounding
to file system block size is preserved mostly for Windows.
Note well that on other OSes valid block sizes seen are at least up
to 65536. In particular, UFS on FreeBSD is known to work well with block
and fragment sizes set to 65536.
When validating second and further certificates, ssl callback could be called
twice to report the error. After the first call client connection is
terminated and its memory is released. Prior to the second call and in it
released connection memory is accessed.
Errors triggering this behavior:
- failure to create the request
- failure to start resolving OCSP responder name
- failure to start connecting to the OCSP responder
The fix is to rearrange the code to eliminate the second call.
The flush flag was not set when forwarding the request body to the uwsgi
server. When using uwsgi_pass suwsgi://..., this causes the uwsgi server
to wait indefinitely for the request body and eventually time out due to
SSL buffering.
This is essentially the same change as 4009:3183165283cc, which was made
to ngx_http_proxy_module.c.
This will fix the uwsgi bug https://github.com/unbit/uwsgi/issues/1490.
This ensures that certificate verification is properly logged to debug
log during upstream server certificate verification. This should help
with debugging various certificate issues.
Listening UNIX sockets were not removed on graceful shutdown, preventing
the next runs. The fix is to replace the custom socket closing code in
ngx_master_process_cycle() by the ngx_close_listening_sockets() call.
When changing binary, sending a SIGTERM to the new binary's master process
should not remove inherited UNIX sockets unless the old binary's master
process has exited.
Previously, invalid connection preface errors were only logged at debug
level, providing no visible feedback, in particular, when a plain text
HTTP/2 listening socket is erroneously used for HTTP/1.x connections.
Now these are explicitly logged at the info level, much like other
client-related errors.
When enabled, certificate status is stored in cache and is used to validate
the certificate in future requests.
New directive ssl_ocsp_cache is added to configure the cache.
OCSP validation for client certificates is enabled by the "ssl_ocsp" directive.
OCSP responder can be optionally specified by "ssl_ocsp_responder".
When session is reused, peer chain is not available for validation.
If the verified chain contains certificates from the peer chain not available
at the server, validation will fail.
Previously only the first responder address was used per each stapling update.
Now, in case of a network or parsing error, next address is used.
This also fixes the issue with unsupported responder address families
(ticket #1330).