When troubleshooting issues, especially ones that come in through the SSDT Support Desk, it can be difficult to know whether to focus on the server, the instance, or a particular report/process. Depending on where the problem originates, different information and troubleshooting techniques are needed. I've also included some common troubleshooting steps (links to other Wiki documents) that the ITCs can take. This way they may be able to resolve the situation, or at least provide as much information as possible. For at least the first two situations, it is more productive if someone from the technical staff at the ITC takes the lead on the ticket.
In the first situation, all the instances on the server seem to be running slowly. Note that a specific instance could still be causing the entire server to be slow, but there are certain metrics to check to diagnose the slowness.
Linux and troubleshooting tips are listed here. Some specific things to check:
- Server Memory (use free command)
- Nagios proxy timeout configuration
- Soft lockup bug
- Use the top command in conjunction with /ssdt/scripts/pid2name.sh to see if a specific instance is using significant memory (if so, see the next step).
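As a rough illustration of the server memory check in the list above, the sketch below reads `free -m` output and prints the share of total memory reported as available. It assumes a procps-ng version of `free` whose `Mem:` row has `available` as the seventh field; the function name `mem_available_pct` is made up for this example and is not an SSDT script.

```shell
#!/bin/sh
# Print the percentage of total memory that `free -m` reports as available.
# Reads the `free -m` output on stdin, so it also works on saved output.
# Assumption: procps-ng column layout, where field 2 is "total" and
# field 7 is "available" on the "Mem:" row.
mem_available_pct() {
    awk '/^Mem:/ { printf "%d\n", ($7 / $2) * 100 }'
}

# Typical use on the server:
#   free -m | mem_available_pct
```

For the per-process view, `top -b -n 1 -o %MEM | head -n 20` prints a single batch-mode snapshot sorted by memory use on procps-ng systems. The PIDs it lists can then be matched to instances with /ssdt/scripts/pid2name.sh, whose arguments are not shown here.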
If the ITC cannot determine the cause, one of the SSDT technical personnel will usually take ownership of the ticket.
One Particular Instance
Troubleshooting steps here apply when a particular instance (a specific district and USAS or USPS) is reporting slowness, OR when research from the section above indicates that one specific instance/process is consuming too much memory, causing server-wide issues. If it is determined that the cause is a specific task or report (or a group of them) and not the entire instance, see the next step.
Note that if the URL cannot be reached (for example, users are getting a 404 error), this is usually an issue with the container not starting/restarting, and not an issue with NGINX. The first thing the ITC should do is look at the logs. The Docker container might be up, but that does not mean the instance itself is up and accessible.
If the application appears completely hung, there are tools here to help with diagnosing the issue.
Use the system monitor to help determine if the instance settings are correct or need tuning. Parameters to set via the docker-compose.override.yml file are listed here. Specific settings that can affect performance:
- Memory allocated to the instance (JAVA_OPTS)
- DB Pool Max (DB_POOL_MAX)
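Before tuning either setting, it can help to confirm what is currently configured. The sketch below simply searches an override file for the two parameter names listed above. The function name `show_tuning_settings` is hypothetical, and the default path assumes the command is run from the directory that holds the instance's docker-compose.override.yml.

```shell
#!/bin/sh
# Show the lines (with line numbers) of a Compose override file that
# mention the two performance-related parameters.
# $1: path to the override file; defaults to ./docker-compose.override.yml
show_tuning_settings() {
    grep -nE 'JAVA_OPTS|DB_POOL_MAX' "${1:-docker-compose.override.yml}"
}

# Typical use from the instance's project directory:
#   show_tuning_settings
```

If the function prints nothing, the override file does not set either parameter, so the values come from wherever the instance's defaults are defined.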
In addition to looking at other metrics via the application, the ITC may need to look at the application logs, in particular at the time of the reported slowness. Here is a listing of common commands and parameters. To look at logs for a specific application, for example USAS, the command is:
If the instance is not starting, there is also a possibility that there is a lock on the database. The log file may contain entries similar to: liquibase.exception.LockException: Could not acquire change log lock. This can be due to the application stopping in the middle of attempting to apply database updates. During database updates, the application must have exclusive access to the database. If something interrupts this update process, the lock may not be released. The SSDT has supplied a script for releasing these locks, but it should only be used at the request of, and with guidance from, the SSDT.
If the ITC cannot determine the cause by looking at the logs, one of the SSDT technical personnel will take ownership of the ticket.
The ITC should provide logs and possibly a database backup. This describes how to send files to the SSDT. System logs can be sent from within the application or via the command line. It is important to note that the procedures do not send the SSDT a notification. The customer should note on the ticket the full name of the file they sent, as reported by the procedure. We get files from multiple ITCs, and the procedures provide a unique naming scheme. If the debugging level needs to be changed (either at the request of the SSDT or when an ITC is trying to diagnose the problem), see here for instructions.
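To check whether a log contains the lock error quoted above, a fixed-string search is usually enough. This is only a sketch for confirming the symptom, and it does not release any lock. The function name `find_changelog_lock_error` and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print, with line numbers, any log lines on stdin that contain the
# Liquibase lock error. Returns 0 if at least one line matches and
# non-zero otherwise, which is grep's normal exit status.
find_changelog_lock_error() {
    grep -nF 'liquibase.exception.LockException: Could not acquire change log lock'
}

# Typical use against a saved copy of the application log:
#   find_changelog_lock_error < saved-app.log
```

A match is worth noting on the ticket along with the timestamp of the log entry, so the SSDT can decide whether releasing the lock is appropriate.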
It is helpful to include the time the slowness was happening and, if possible, what the district was doing at the time. This helps when looking at the log.
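Knowing the time of the slowness makes it possible to cut a large log down to the relevant window. The sketch below does that for a saved log file. It assumes every log entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which may not match the actual log format, and the function name `log_window` is made up for this example.

```shell
#!/bin/sh
# Print log entries from stdin whose timestamp falls on a given date
# between two HH:MM times (inclusive).
# Assumption: each entry begins with "YYYY-MM-DD HH:MM:SS". Lines that do
# not begin with a date (for example, stack trace lines) are treated as
# part of the entry above them.
# Usage: log_window DATE START END < logfile
log_window() {
    awk -v d="$1" -v s="$2" -v e="$3" '
        $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
            t = substr($2, 1, 5)                 # HH:MM part of the time
            keep = ($1 == d && t >= s && t <= e)
        }
        keep { print }
    '
}

# Typical use:
#   log_window 2024-03-01 10:15 10:45 < saved-app.log
```

Because the times are compared as text, they need to be zero-padded 24-hour values such as 09:05.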
Specific Report or Task
Sometimes a specific report or task is causing slowness. This can occur within one instance, or it can be the same report/task having problems within multiple instances.
The district/ITC should check the report parameters and the underlying data, since either can cause the issue. They should also look at the system monitor and logs, as described in the previous step for troubleshooting one particular instance.
If the ITC cannot determine or resolve the issue, someone from the programming staff may initially work on the ticket. Follow the instructions in the previous step to send information to the SSDT.