When using short-lived certs and regular issuance, the expired certs can build up in the PKI database and cause issues with replication, performance and overall database size.
PKI 11.3.0 introduced pruning, a job that can be run on a schedule or manually to remove expired certificates and requests.
Random Serial Numbers v3 (RSNv3) is mandatory to enable pruning.
Both pruning and RSNv3 require PKI 11.3.0 or higher.
## Use Cases
ACME certificates in particular are generally short-lived, and expired certificates can build up quickly in a dynamic environment. An example is a CI system that requests one or more certificates per run. These will build up indefinitely without a way to remove the expired certificates.
Another case is simply a very long-lived installation. Over time, as hosts come and go, certificates build up.
## How to Use
See https://github.com/dogtagpki/pki/wiki/Configuring-CA-Database-Pruning for a thorough description of the capabilities of the pruning job.
The default configuration is to remove expired certificates and incomplete requests after 30 days.
Pruning is disabled by default.
Configuration is a four-step process:
1. Configure the expiration thresholds
2. Enable the job
3. Schedule the job
4. Restart the CA
The job will be scheduled to use the PKI built-in cron-like timer. It is configured nearly identically to `crontab(5)`. On execution it will remove certificates and requests that fall outside the configured thresholds. LDAP search/time limits can be used to control how many are removed at once.
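The four steps above could be scripted roughly as follows. This is a minimal sketch, not the tool's implementation: the helper `pruning_commands` is hypothetical, the key names under the `jobsScheduler.job.pruning` prefix are assumptions based on the PKI wiki, and the restart command assumes the default `pki-tomcat` instance.

```python
from typing import List

PREFIX = "jobsScheduler.job.pruning"  # prefix used by the pruning job in CS.cfg


def pruning_commands(retention_days: int = 30,
                     cron: str = "0 0 1 * *") -> List[List[str]]:
    """Return the argv lists for the four configuration steps, in order."""
    def set_key(key: str, value: str) -> List[str]:
        return ["pki-server", "ca-config-set", key, value]

    return [
        # 1. Expiration thresholds (key names are assumptions).
        set_key(f"{PREFIX}.certRetentionTime", str(retention_days)),
        set_key(f"{PREFIX}.certRetentionUnit", "day"),
        set_key(f"{PREFIX}.requestRetentionTime", str(retention_days)),
        set_key(f"{PREFIX}.requestRetentionUnit", "day"),
        # 2. Enable the job (assumed key name).
        set_key(f"{PREFIX}.enabled", "true"),
        # 3. Schedule it with a cron-like expression (assumed key name).
        set_key(f"{PREFIX}.cron", cron),
        # 4. Restart the CA so the changes take effect (assumed unit name).
        ["systemctl", "restart", "pki-tomcatd@pki-tomcat.service"],
    ]


if __name__ == "__main__":
    for argv in pruning_commands():
        print(" ".join(argv))
```

Returning argv lists rather than shell strings keeps the cron expression as a single argument without any quoting concerns.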
In addition to the automated schedule it is possible to manually run the pruning job.
The tool will not restart the CA. Restarting is left to the user, who will be notified when a restart is needed.
### Where to use
The pruning configuration is not replicated. It should not be necessary to enable this task on more than one IPA server.
Running the task simultaneously on multiple servers has a few downsides:
* Additional stress on the LDAP server searching for expired certificates and requests
* Unnecessary replication load deleting the same entries on multiple servers
While enabling this on a single server creates a single point of failure, there should be no catastrophic consequences other than expired certificates and requests potentially building up. The backlog can be cleared by enabling pruning on a different server. Depending on the size of the backlog, it could take a couple of executions to catch up.
## Design
There are several operations, most of which act locally and one of which uses the PKI REST API.
1. Updating the job configuration (enable, thresholds, etc). This will be done by running the `pki-server ca-config-set` command which modifies CS.cfg directly per the PKI wiki. A restart is required.
2. Retrieving the current configuration for display. The `pki-server ca-config-find` command returns the entire configuration so the results will need to be filtered.
3. Managing the job. This can be done using the REST API, https://github.com/dogtagpki/pki/wiki/PKI-REST-API . Operations include enabling the job and triggering it to run now.
Theoretically for operations 1 and 2 we could use existing code to manually update `CS.cfg` and retrieve values. For future-proofing purposes calling `pki-server` is probably the better long-term option given the limited number of times this will be used. Configuration is likely to be one and done.
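There are four values that can be managed for each of certificate pruning and request pruning:
* retention time of an expired certificate or incomplete request
* unit of the retention time
* LDAP search size limit
* LDAP search time limit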
The first two configure when an expired certificate or incomplete request will be deleted. The unit can be one of: minute, hour, day, year. By default it is 30 days.
The LDAP limits control how many entries are returned and how long the search can take. By default it is 1000 entries and unlimited time (0 == unlimited, unit is seconds).
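The retention rule described above can be illustrated with a small check. This is a sketch under stated assumptions: `is_prunable` is a hypothetical helper, a year is approximated as 365 days, and the comparison is taken to be strictly "older than the retention period"; the CA's exact boundary behavior may differ.

```python
from datetime import datetime, timedelta

# Approximate unit lengths; treating a year as 365 days is an assumption.
UNIT_SECONDS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "year": 365 * 86400,
}


def is_prunable(not_after: datetime, now: datetime,
                retention_time: int = 30,
                retention_unit: str = "day") -> bool:
    """Return True if the entry expired more than the retention period ago."""
    if retention_unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit: {retention_unit!r}")
    retention = timedelta(seconds=retention_time * UNIT_SECONDS[retention_unit])
    return now - not_after > retention


if __name__ == "__main__":
    now = datetime(2024, 3, 1)
    # Expired 60 days ago with the default 30-day retention.
    print(is_prunable(datetime(2024, 1, 1), now))
```

With the defaults, a certificate that expired 15 days ago would be kept, while one that expired 60 days ago would be eligible for removal.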
The configuration values will be set by running `pki-server ca-config-set`, which should give the best forward compatibility. The options are case-sensitive. They are neither validated nor applied by the CA until it is restarted.
### Configuring job execution time
The CA provides a cron-like interface for scheduling jobs. To run the job at midnight on the first of every month, the schedule expression is `0 0 1 * *`.
This will be the default when pruning is enabled. A separate configuration option will be available for fine-tuning execution time.
The format is defined at https://access.redhat.com/documentation/en-us/red_hat_certificate_system/9/html/administration_guide/setting_up_specific_jobs#Frequency_Settings_for_Automated_Jobs
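To make the cron-like format concrete, the sketch below checks whether a five-field expression matches a given time. It is an illustration only, not PKI's scheduler: it assumes the `crontab(5)` field order (minute, hour, day of month, month, day of week with Sunday as 0), supports only `*`, single numbers, ranges and comma-separated lists, and requires every field to match.

```python
from datetime import datetime

# (low, high) bounds for minute, hour, day of month, month, day of week.
BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(text: str, low: int, high: int) -> set:
    """Expand '*', 'N', 'N-M' and comma-separated combinations into a set."""
    values = set()
    for part in text.split(","):
        if part == "*":
            values.update(range(low, high + 1))
            continue
        start, sep, end = part.partition("-")
        first = int(start)
        last = int(end) if sep else first
        if not (low <= first <= last <= high):
            raise ValueError(f"{part!r} outside {low}-{high}")
        values.update(range(first, last + 1))
    return values


def matches(expr: str, when: datetime) -> bool:
    """Return True if `when` satisfies every field of the expression."""
    fields = expr.split()
    if len(fields) != len(BOUNDS):
        raise ValueError("expected five space-separated fields")
    minute, hour, day, month, weekday = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, BOUNDS))
    # Python counts Monday as 0; shift so that Sunday is 0.
    cron_weekday = (when.weekday() + 1) % 7
    return (when.minute in minute and when.hour in hour
            and when.day in day and when.month in month
            and cron_weekday in weekday)


if __name__ == "__main__":
    print(matches("0 0 1 * *", datetime(2024, 5, 1, 0, 0)))
```

A check like this could also serve as client-side validation before the expression is written to the configuration, since the CA does not validate it until restart.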
### REST Authentication and Authorization
The REST API for pruning is documented at https://github.com/dogtagpki/pki/wiki/PKI-Start-Job-REST-API
A PKI job can define an owner that can manage the job over the REST API. We will automatically define the owner as `ipara` when pruning is enabled.
Manually running the job will be done using the PKI REST API. Authentication to this API for our purposes is done at the `/ca/rest/account/login` endpoint. A cookie is returned which will be used in any subsequent calls. The IPA RA agent certificate will be used for authentication and authorization.
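The login-then-run flow might look like the following standard-library sketch. Several details are assumptions rather than confirmed API: the start path `/ca/rest/jobs/pruning/start` and the HTTP methods are taken from a reading of the PKI wiki, and the helper names `request_plan` and `run_pruning_job` are hypothetical. The planned implementation uses IPA's own `RestClient` instead.

```python
import ssl
import urllib.request
from http.cookiejar import CookieJar

LOGIN_PATH = "/ca/rest/account/login"       # named in this document
START_PATH = "/ca/rest/jobs/pruning/start"  # assumption, per the PKI wiki


def request_plan(base_url: str):
    """Return the (method, URL) pairs needed to trigger the job, in order."""
    base = base_url.rstrip("/")
    return [("GET", base + LOGIN_PATH), ("POST", base + START_PATH)]


def run_pruning_job(base_url: str, certfile: str, keyfile: str, cafile: str):
    """Log in with a client certificate, then start the job.

    Returns the HTTP status of each request.
    """
    context = ssl.create_default_context(cafile=cafile)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context),
        # The cookie jar carries the login cookie into the second request.
        urllib.request.HTTPCookieProcessor(CookieJar()),
    )
    statuses = []
    for method, url in request_plan(base_url):
        req = urllib.request.Request(url, method=method)
        with opener.open(req) as resp:
            statuses.append(resp.status)
    return statuses


if __name__ == "__main__":
    for method, url in request_plan("https://ipa.example.test:8443"):
        print(method, url)
```

`run_pruning_job` would be called with the PEM certificate and key of the IPA RA agent; the host name above is a placeholder.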
### Commands
This will be implemented in the `ipa-acme-manage` command. While pruning is not strictly ACME-related, ACME is the primary driver for it.
A new verb will be added, pruning, to be used for enabling and configuring pruning.
The size and time limit options set the client-side limits. The server imposes its own search size and look-through limits. These can be tuned for the `uid=pkidbuser,ou=people,o=ipaca` user as described at https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/ldapsearch-ex-complex-range
### Showing the Configuration
To display the current configuration run `pki-server ca-config-find` and filter the results to only those that contain `jobsScheduler.job.pruning`.
Default values are not included in the output, so `ipa-acme-manage` will need to supply them before displaying the configuration.
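Filtering the output and filling in defaults could be done along these lines. This is a sketch that assumes `pki-server ca-config-find` prints one `key=value` pair per line; the key names under the prefix are assumptions, the default values are the ones stated in this document (with pruning disabled), and `pruning_config` is a hypothetical helper.

```python
PREFIX = "jobsScheduler.job.pruning."

# Defaults stated in this document; the key names themselves are assumptions.
DEFAULTS = {
    "enabled": "false",
    "certRetentionTime": "30",
    "certRetentionUnit": "day",
    "certSearchSizeLimit": "1000",
    "certSearchTimeLimit": "0",
    "requestRetentionTime": "30",
    "requestRetentionUnit": "day",
    "requestSearchSizeLimit": "1000",
    "requestSearchTimeLimit": "0",
}


def pruning_config(config_find_output: str) -> dict:
    """Extract pruning keys from key=value lines, overlaid on the defaults."""
    config = dict(DEFAULTS)
    for line in config_find_output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.startswith(PREFIX):
            config[key[len(PREFIX):]] = value
    return config


if __name__ == "__main__":
    sample = "ca.example=1\njobsScheduler.job.pruning.enabled=true\n"
    for name, value in sorted(pruning_config(sample).items()):
        print(f"{name}: {value}")
```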
For online REST operations (login, run job) we will use the `ipaserver/plugins/dogtag.py::RestClient` class to manage the requests. This will take care of the authentication cookie, etc.
The class uses `dogtag.https_request()`, which can take PEM cert and key files as arguments. These will be used for authentication.
For the non-REST operations (configuration, cron settings) the tool will fork out to `pki-server ca-config-set`.
### UI
This will only be configurable on the command-line.
Configuration changes will be made to /etc/pki/pki-tomcat/ca/CS.cfg
## Upgrade
No expected impact on upgrades.
## Test plan
Testing will consist of:
* using the default configuration
* enabling the pruning job
* issuing one or more certificates
* moving time forward to one day after expiration
* manually running the job
* validating that the certificates are removed
For size/time limit testing, create a large number of certificates/requests and set the search size limit to a low value, then ensure that the number of deleted certificates equals the search limit. Testing the time limit this way may be less predictable, since it may take a massive number of entries for the search to time out on a non-busy server.
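The expected outcome of the size-limit test can be modelled with a toy simulation. This is not the CA's implementation: `prune_once` and `runs_to_clear` are hypothetical helpers that assume each run deletes at most the search size limit worth of entries.

```python
import math
from typing import List, Tuple


def prune_once(expired: List[int], size_limit: int) -> Tuple[List[int], List[int]]:
    """Simulate one run: return (deleted, remaining) serial numbers."""
    return expired[:size_limit], expired[size_limit:]


def runs_to_clear(backlog: int, size_limit: int) -> int:
    """Number of runs needed to remove a backlog of expired entries."""
    if size_limit <= 0:
        raise ValueError("size_limit must be positive")
    return math.ceil(backlog / size_limit)


if __name__ == "__main__":
    deleted, remaining = prune_once(list(range(25)), size_limit=10)
    print(len(deleted), len(remaining), runs_to_clear(25, 10))
```

Under this model a test that creates 25 expired certificates with a limit of 10 would expect exactly 10 deletions from the first manual run and three runs in total to clear the backlog.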