Changed the name of the unit tests CI job to accuratly reflect
that both AMD64 and ARM64 tests are executed.
Modified the contribution guide to specify that contributors
should work inside their own fork of the project.
Because veilid-server listens on 127.0.0.1 TCP port 5959, it is
potentially open to attacks from websites if a user runs an ordinary
web browser (e.g., Chrome or Firefox) on the same computer.
Specifically, any https website can include JavaScript code that
begins with
let message = 'WASTE_YOUR_VEILID_SERVER_DISK_SPACE_'.repeat(1000);
fetch('http://127.0.0.1:5959/' + message)
and the web browser will then send many KB of data to veilid-server,
where it may typically be logged to disk by this code:
2ab51ae3e9/veilid-core/src/veilid_api/serialize_helpers/serialize_json.rs (L6-L12)
(Because Veilid hasn't even reached 1.0, it's very common for users to
enable a large amount of logging.)
The threat model is that someone creates a website that's apparently
of interest to any Veilid user, but the actual purpose of the website
is to leverage the user's web browser to silently tunnel an attack
payload into another application that is local to the user. An attack
that sends more than 1 MB of data (for each fetch API call) is
feasible, and the patch in this MR tries to address that.
Note that the most common web browsers always allow JavaScript on
arbitrary https websites to send data to 127.0.0.1 port 5959, there is
no configuration menu to shut this off, and the user is not alerted
that this is occurring. Brave 1.54 (June 2023) was the first popular
web browser to block this:
https://brave.com/privacy-updates/27-localhost-permission/
This does not mean that an adversary can just as easily setup a
website to send:
{"op":"Control","args":["Shutdown"],"id":1}
to 127.0.0.1 TCP port 5959 and thereby terminate a veilid-server
process. A web browser using http will always send requests that begin
with specific strings (such as GET or OPTIONS) on the first line, and
the code at:
2ab51ae3e9/veilid-server/src/client_api.rs (L367)2ab51ae3e9/veilid-server/src/client_api.rs (L244)2ab51ae3e9/veilid-server/src/client_api.rs (L202)
seems to work together to ensure that no JSON object results in
command execution unless the first line of the input is a JSON object.
(Not sure if this was a design goal, or simply how it turned out.)
A web browser can do other things besides cleartext http (e.g., try to
start a TLS session to 127.0.0.1 TCP port 5959), but it's perhaps
unlikely that the initial bytes of the network traffic, in the context
of the above threat model, would ever be a JSON object.
Note that, although veilid-server is not speaking the HTTP protocol on
127.0.0.1 TCP port 5959, it is still able to read the data sent by any
web browser to http://127.0.0.1:5959, send that data to a JSON parser,
and write the data to the server logs. In limited testing, the HTTP
client typically saw zero bytes of application layer response;
however, if the HTTP client sent a huge amount of data (e.g., 16 MB),
the HTTP client would sometimes receive a large response with JSON
data about veilid-server's internal state. That might be a separate
bug. In the context of the threat model, this may not matter because
that JSON data isn't accessible by the operator of the website (that
hosts the JavaScript code).
There may be many ways to resolve this. First, the Veilid
documentation could recommend never running a web browser on any
machine that has veilid-server with 127.0.0.1 TCP port 5959 open.
Second, the existence of a realistically probe-able service on
127.0.0.1 TCP port 5959 might be considered much too large an attack
surface for an application of Veilid's sensitivity, and interprocess
communication could be replaced with something other than
unauthenticated TCP.
This MR is intended to improve Veilid for an ordinary user who wants
to help the project by installing veilid-server on their primary
personal machine, and wants veilid-cli to remain usable, but needs to
continue routine web browsing on that machine. It provides safer
behavior for such a person. The MR is not intended to benefit experts
who already understand localhost TCP risks, and either avoid all web
browsing on the same machine or have their own countermeasures. These
experts will not see any attacker-controlled traffic on port 5959, and
thus the reduction in logging should be of no concern to them.
Without the patch (and with logging on), data sent by a web browser is
always logged by veilid-server in the form:
Connection processing failure: Parse error: 'expected value at line 1 column 1' with value 'deserialize_json:
---
GET /<attacker_controlled_data> HTTP/1.1
---
to type veilid_core::veilid_api::json_api::Request'
regardless of how long the attacker controlled data is. Some browsers
such as Chrome start by sending OPTIONS instead of GET.
With the patch, long malformed input is discarded and the log instead
contains:
Connection processing failure: Parse error: 'expected value at line 1 column 1' with value 'deserialize_json:
---
:skipped long input that's not a JSON object
---
to type veilid_core::veilid_api::json_api::Request'
The patch allows logging of anything where the first non-whitespace
character is a '{' - this is considered safe (at the moment) because
no web browser (realistically used by a local user) can send '{' at
the beginning of the first line. Also, the patch allows logging of
requests smaller than 50 bytes to support two use cases. First, if a
node operator is sending one of the simple JSON API requests by hand
and is accidentally omitting the initial '{' from the JSON object,
they'll be able to see the failure in their logs. Second, non-expert
node operators may want some limited visibility into the details of
adversarial activity on http://127.0.0.1:5959. Of course, this default
logging policy could be made more flexible later if Veilid decides to
stay with unauthenticated TCP. The patch only aims to defeat a simple
DoS attack against the out-of-the-box code.
set_subkey corrupts record_data_size in a Record struct by calling
set_record_data_size with a value that depends only on the length of
the new subkey value. This leads to various undesirable outcomes, such
as: applications can write more than MAX_RECORD_DATA_SIZE without
encountering the intended "veilid.error.VeilidAPIErrorGeneric:
label='Generic' message='dht record too large'" error message, and
"panicked at 'attempt to subtract with overflow'" (i.e., an attempt to
set a negative value of a size) if a subkey's new length is less than
a subkey's old length. Typically, record_data_size in a Record struct
will be incorrect if a value was set for more than one subkey. Some
users might want to start over with a table_store that doesn't have
any incorrect record_data_size values.
The issue begins here:
6f71c6a00a/veilid-core/src/storage_manager/record_store.rs (L583-L586)
and is triggered here:
6f71c6a00a/veilid-core/src/storage_manager/record_store.rs (L613-L615)
It should be clear that new_record_data_size is only related to the
subkey that is currently being set. The amount of data in the record,
before set_subkey is called, is ignored. It appears that
new_total_size, not new_record_data_size, was intended to be used for
set_record_data_size, and this change succeeds for me in limited
testing but I don't have a comprehensive test suite.
One way to reproduce is by running the code from
https://gitlab.com/vatueil/veilid-file on a greater than 1 MB file
while watching variable values within
veilid-core/src/storage_manager/record_store.rs. For example: "poetry
run file put /usr/bin/tcpdump" (1.3 MB on Ubuntu 23.04). With the
original Veilid code, each of the dozens of subkey writes is checking
whether a roughly 32K value is greater than 1048576, it never is, and
thus there is never a "dht record too large" error. With the patch in
this MR, each of the dozens of subkey writes is checking whether an
ever-increasing value is greater than 1048576, it eventually is, and
the "dht record too large" error is printed. With the patch, one can
work with smaller files, e.g., do "poetry run file put /usr/bin/ssh"
(0.8 MB) followed by "poetry run file get VLD0:<_insert_key_here_>
ssh-copy" and the retrieved file ssh-copy is identical to
/usr/bin/ssh.
The more detailed behavior is that the modified code has
record.total_size of 350 on the first iteration, then 33596, 66842,
100088, etc. The original code also has record.total_size of 350 on
the first iteration, but then stays at 33246 forever (33246 is the
user-supplied subkey size of 32768, plus 128, plus the minimum record
size of 350),