If there is an issue with a USS gateway, first establish the version of the gateway and collect the debug logs. This helps identify which services are degrading.
TABLE OF CONTENTS
- Check if Squid is running
- Overriding files
- Commands for performance analysis
- Top Breakdown (simplified)
- HTOP breakdown
- Excessive Proxy Connections
- Running out of storage
- Change interface properly
- View routing table
- Navigating Logs
- Updating the Gateway using packages
- USS Gateway Authentication issues
- Failure Joining Domain altogether
Check if Squid is running
When troubleshooting any gateway issue, it is important to verify that squid is running, check which version is running, and know how to restart it:
Command to check if squid is running:
ps aux | grep squid
If squid is not running, the output looks like this:
If it is running, the output looks like this:
The squid version is highlighted in yellow:
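One thing to watch with the check above is that grep can match its own command line, so a "squid" hit does not always mean squid is up. Below is a minimal sketch of a stricter check. The function names squid_procs and squid_is_running are made up for illustration and are not part of the gateway.

```shell
# Filter `ps aux` output (read from stdin) down to squid processes.
# The [s]quid pattern stops grep from matching its own command line.
squid_procs() {
    grep '[s]quid'
}

# Return 0 if at least one squid process is listed on stdin, 1 otherwise.
squid_is_running() {
    squid_procs > /dev/null
}

# Usage on the gateway:
#   ps aux | squid_procs
#   ps aux | squid_is_running && echo "squid is running" || echo "squid is NOT running"
```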
Overriding files
Squid override files are located in this directory: /usr/local/uss-squid5/etc
NOTE: Changes to the override files must be made in the uss-squid4 directory on proxy versions 1.x and in the uss-squid5 directory on versions 2.x.
Commands for performance analysis
Check available disk space:
df -h
If you encounter file systems that show 100% usage in the "Use%" column of this command, those file systems are completely full and could be causing performance issues.
du --max-depth=1 -h /
This command estimates disk space usage for directories and subdirectories, limiting the depth of the analysis to just the immediate subdirectories of the specified path ('/' in this case). Use it in conjunction with the mount and df -h commands.
df -i
The df -i command displays information about inode usage on file systems. Inodes are data structures in a Unix-like file system that store metadata about files and directories, including file ownership, permissions, timestamps, and pointers to the actual data blocks on the storage device. Each file or directory on a file system is associated with an inode. The output of the df -i command shows the inode-related information for each mounted file system. Ensure that no file system has run out of inodes (100% in the IUse% column).
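To avoid reading the df output by eye, the check can be scripted. This is a rough sketch only: the function name flag_full and the 90% default threshold are assumptions, not gateway settings. It relies on the POSIX output format (df -P), where column 5 is the usage percentage and column 6 is the mount point.

```shell
# Read `df -P` or `df -Pi` output on stdin and print every mount point whose
# usage percentage is at or above the threshold in $1 (assumed default: 90).
# Mount points containing spaces are not handled in this sketch.
flag_full() {
    local limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)              # "97%" -> "97"
        if (pct + 0 >= limit) print $6, $5
    }'
}

# Usage on the gateway:
#   df -P  | flag_full 90    # disk space
#   df -Pi | flag_full 90    # inodes
```

No output means nothing is at or above the threshold.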
Monitor processes:
top
or you can use:
htop
Note: htop is the better choice as it is easier to read. Pay attention to the CPU%, MEM% and Uptime values shown.
Check RAM usage
free -h
Confirm that memory is not full or close to full.
free -s 3
This checks the RAM usage every 3 seconds.
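If a single number is easier to work with than the free table, memory usage can be derived from /proc/meminfo. This is a sketch under the assumption that the kernel exposes MemAvailable (it does on current Ubuntu releases). The function name mem_used_pct is illustrative.

```shell
# Read /proc/meminfo-formatted text on stdin and print the percentage of RAM
# that is NOT available, rounded down to a whole number.
mem_used_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }'
}

# Usage on the gateway:
#   mem_used_pct < /proc/meminfo
```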
TOP Breakdown (Simplified)
The top command shows the live processes on the USS gateway along with their associated load.
- Load Average - this displays the CPU load on the gateway. Generally, none of the load averages should exceed the number of CPUs on the appliance. As a rule of thumb, if a gateway has 4 CPUs the load shouldn't exceed 4.0, otherwise there will be latency.
- Table below - the table below the summary shows a live list of processes (COMMAND column) and their associated memory and CPU usage (very similar to the Windows Resource Monitor).
Top should be the first point of reference when troubleshooting a slow gateway.
Below is an example of two proxies, 80 and 81.
We can see normal and expected load averages on gateway 80 (below 2.5), while on gateway 81 we can see abnormal load averages of 9+.
We can then use the live process table to try to identify an issue, such as a process which is using too much CPU:
HTOP breakdown
HTOP is used to examine processes, sometimes referred to as tasks.
Highlighted RED below are CPU cores:
The numbers at the top left, from 1 to 8, represent the CPUs/cores in this example system, and the progress bar next to each number shows the load on that CPU/core. The progress bars can be made up of different colours. The following list explains what each colour means.
- Blue: low priority processes (nice > 0)
- Green: normal (user) processes
- Red: kernel processes
- Yellow: IRQ time
- Magenta: Soft IRQ time
- Grey: IO Wait time
Highlighted RED below are Memory and Swap progress bars:
Like the CPU progress bars, the memory and swap progress bars can be made up of different colours. Here is what each colour means for the memory and swap progress bars.
- Green: Used memory pages
- Blue: Buffer pages
- Yellow: Cache pages
Highlighted RED below are Load averages:
The system load is a measure of the amount of computational work that a computer system performs. The load average represents the average system load over a period of time. 1.0 on a single-core CPU represents 100% utilization. Note that loads can exceed 1.0; this just means that processes have to wait longer for the CPU. 4.0 on a quad-core represents 100% utilization. Anything under a 4.0 load average on a quad-core is OK, as the load is distributed over the 4 cores.
The first number is the 1-minute load average, the second is the 5-minute load average and the third is the 15-minute load average.
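The rule of thumb above (load should not exceed the number of cores) can be checked with a few lines of shell. This is a sketch only: load_status is a made-up name, and it looks at the 1-minute average alone, which can spike briefly on a healthy system.

```shell
# Compare the 1-minute load average against the CPU count.
# $1: contents of /proc/loadavg (e.g. "2.10 1.95 1.80 2/345 6789")
# $2: number of CPUs/cores
load_status() {
    local one_min="${1%% *}"          # first field = 1-minute average
    awk -v load="$one_min" -v cpus="$2" 'BEGIN {
        status = (load + 0 > cpus + 0) ? "overloaded" : "ok"
        print status
    }'
}

# Usage on the gateway:
#   load_status "$(cat /proc/loadavg)" "$(nproc)"
```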
HTOP Task information breakdown
Htop lists all the running processes (tasks) on a system, with information about how much CPU and memory each process is using as well as the command used to start the process.
Here is a list that explains what each column means.
- PID: A process’s process ID number.
- USER: The process’s owner.
- PR: The process’s priority. The lower the number, the higher the priority.
- NI: The nice value of the process, which affects its priority.
- VIRT: How much virtual memory the process is using.
- RES: How much physical RAM the process is using, measured in kilobytes.
- SHR: How much shared memory the process is using.
- S: The current status of the process (zombied, sleeping, running, uninterruptedly sleeping, or traced).
- %CPU: The percentage of the processor time used by the process.
- %MEM: The percentage of physical RAM used by the process.
- TIME+: How much processor time the process has used.
- COMMAND: The name of the command that started the process.
The difference between VIRT, RES & SHR:
VIRT stands for the virtual size of a process, which is the sum of memory it is actually using, memory it has mapped into itself (for instance the video card's RAM for the X server), files on disk that have been mapped into it (most notably shared libraries), and memory shared with other processes.
VIRT represents how much memory the program is able to access at the present moment.
RES stands for the resident size, which is an accurate representation of how much actual physical memory a process is consuming. (This also corresponds directly to the %MEM column)
SHR indicates how much of the VIRT size is actually sharable memory or libraries. In the case of libraries, it does not necessarily mean that the entire library is resident. For example, if a program only uses a few functions in a library, the whole library is mapped and will be counted in VIRT and SHR, but only the parts of the library file containing the functions being used will actually be loaded in and be counted under RES.
HTOP PROTIPS:
- Press f5 to enable tree view. this helps massively
- Navigate the processes using arrow keys
- Kill a process using f9 key
- List open files used by a process by pressing the lowercase 'l' key ('q' to return from that view; requires lsof to be installed)
- Display only processes of single user by pressing 'u'
- Sort any HTOP column by pressing f6
- SUPER ADVANCED: You can customise how htop functions and displays by pressing the f2 key. (escape key to exit)
- HTOP IS CLICKABLE, you can LEFT CLICK entries! (handy for sorting)
- Lastly, for a really good breakdown please refer to this document: https://peteris.rocks/blog/htop/
Excessive Proxy Connections
A proxy may slow down if an excessive number of connections are open and active. In that scenario, investigate the number of connections per device and address any device which may be spamming connections. Alternatively, upgrade to a load-balanced setup or add another standalone proxy to the mix.
The following commands should be run as the root user from the CensorNet server command line.
Display a list of connections from unique IP addresses to the proxy (first column connection count, second column IP address):
netstat -n | grep -E '8080|8081|8082' | awk '{print $5;}' | cut -d : -f 1 | sort | uniq -c
Display a count of unique IP addresses connected to the proxy (also includes closed connections):
netstat -n | grep -E '8080|8081|8082' | awk '{print $5;}' | cut -d : -f 1 | sort | uniq | grep -c .
Display the number of open and active (established) connections to the proxy for each unique IP address:
netstat -n | grep -E '8080|8081|8082' | grep ESTAB | awk '{print $5;}' | cut -d : -f 1 | sort | uniq -c
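The grep in the commands above matches the port numbers anywhere on the line, so a client whose source port happens to contain 8080 is counted too. Below is a sketch of a stricter variant that matches only the local address column and lists the busiest clients first. The function name busy_clients and the minimum-count argument are illustrative. The ports are the ones used above.

```shell
# Read `netstat -n` output on stdin. Print "count ip" for every client with at
# least $1 ESTABLISHED connections to proxy ports 8080-8082, busiest first.
busy_clients() {
    local min="${1:-1}"
    awk '$6 == "ESTABLISHED" && $4 ~ /:(8080|8081|8082)$/ {
             ip = $5
             sub(/:[0-9]+$/, "", ip)   # strip the client port
             print ip
         }' | sort | uniq -c | sort -rn |
    awk -v min="$min" '$1 >= min { print $1, $2 }'
}

# Usage on the gateway (as root):
#   netstat -n | busy_clients 50
```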
Running out of Storage
Firstly, check the syslog and postgresql log to verify that the gateway is running out of space. The errors should be very verbose and easy to spot. A clear sign that the gateway is completely out of space is squid refusing to stay up.
These commands will let you check the system storage:
df -h
This displays general storage usage; /dev/sda1 is the main partition to worry about.
df -i
This displays inode usage. Every file consumes an inode, so a large number of tiny files can fill up the gateway even when disk space is still free.
If the gateway is indeed full, you will need to figure out what is causing it. If the storage was simply low to begin with, the user will need to increase the size of the partition.
However, if the usage seems unusual, check what is filling up the storage; it could be an unrotated log or something spamming or looping.
The following command will display the 20 largest files on the server in order of size:
sudo find / -type f -exec du -h {} + 2>/dev/null | sort -rh | head -n 20
It should produce output like this:
Resizing the existing sda partition with unused space
In general the main partition will have unused space.
We can see that, using the logical volume manager, the sda3 partition is 79GB but there is a logical volume sub-partition which is 39GB and not being used.
We can free up some space in a pinch by re-partitioning:
This command extends the main logical volume by 20GB using the unused space and resizes its file system to match.
sudo lvresize -L +20G --resizefs /dev/ubuntu-vg/ubuntu-lv
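Before running lvresize it is worth confirming that the volume group really has the space free, otherwise the command fails. This is a sketch of such a check. The function name vg_has_free is made up, and ubuntu-vg is simply the volume group name taken from the command above, so yours may differ.

```shell
# Read the free space of a volume group in GiB on stdin, as printed by
#   vgs --noheadings --nosuffix --units g -o vg_free <vg>
# and succeed (exit 0) only if at least $1 GiB are free.
vg_has_free() {
    awk -v want="$1" '{ free = $1 } END { exit !(free + 0 >= want + 0) }'
}

# Usage on the gateway:
#   sudo vgs --noheadings --nosuffix --units g -o vg_free ubuntu-vg | vg_has_free 20 \
#       && echo "enough free space to extend by 20G"
```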
Change Interface Properly
This is not normally necessary, but it is important to know if the gateway reverts to an old IP address or the internet connection isn't working.
Firstly, check the interfaces and change them in the USS gateway UI:
On the proxy itself, you can check them as follows.
Firstly using ifconfig:
Then in the database. To access the database use these commands:
su - postgres
psql cloudgw
\dt
(The \dt command isn't necessary as it just lists the DB tables; it is useful if you want to see what else is there.)
select * from interfaces;
To edit interfaces using the DB (which you might need to do if the gateway UI is not reachable) use a simple SQL query:
update interfaces set address='192.168.1.x' where if_name='interface_name';
For example:
Finally, the location where interfaces and IP settings tend to get stuck is netplan:
To access the netplan file use the following command:
sudo nano /etc/netplan/00-installer-config.yaml
the file will look like this:
Editing the existing entries in the file is self-explanatory. However, adding new entries (which you likely won't have to do) is a bit more complex, as the YAML format is indentation-sensitive.
- Use Spaces, Not Tabs: YAML requires spaces for indentation. Tabs are not allowed.
- Consistent Number of Spaces: Use a consistent number of spaces for each level of indentation. The most common choices are 2 spaces, 4 spaces, or even 8 spaces.
- Maintain Alignment: Keep items at the same level aligned properly. This helps with readability.
- Nested Structures: When you have nested structures (like lists within lists or dictionaries within dictionaries), indent them further.
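The "spaces, not tabs" rule is the one that most often breaks a netplan file, and tabs are hard to see in nano. This is a small sketch of a check to run after editing. The function name yaml_tab_free is illustrative. Following it with netplan try (which rolls the change back unless confirmed) is a suggestion rather than a documented gateway procedure.

```shell
# Print (with line numbers) any lines in file $1 that contain a tab character.
# Exit status is 0 when the file is tab-free, 1 when tabs were found.
yaml_tab_free() {
    local tab
    tab=$(printf '\t')
    if grep -n "$tab" "$1"; then
        return 1
    fi
    return 0
}

# Usage on the gateway:
#   yaml_tab_free /etc/netplan/00-installer-config.yaml && sudo netplan try
```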
View routing table
To view the gateway routing table run the command:
route -n
Normally the gateway should route automatically between NICs, but in rare cases – for example when customers use public IP ranges instead of private ones (not 10.0.0.0, 192.168.0.0 etc) a static route may be necessary.
- SSH into the gateway using putty or a similar tool
- run “sudo su” (without quotes) to gain root privileges
- edit (or create if it doesn't exist) /etc/ussgw_custom_firewall
- if newly created add "#!/bin/bash" on top (without quotes)
- Add your route command, e.g.
route add -net 40.0.0.0 netmask 255.0.0.0 gw 10.0.5.1 dev eth0
- Save the file. It should now look like this:
#!/bin/bash
route add -net 40.0.0.0 netmask 255.0.0.0 gw 10.0.5.1 dev eth0
- if the file is newly created, make it executable by running:
chmod +x /etc/ussgw_custom_firewall
- Finally, restart the sysmond process by running:
service ussgw_sysmond restart
- Or by rebooting the gateway
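The route command used in the steps above comes from the older net-tools package. Where only the ip command is available, the same route is written with a prefix length instead of a netmask. This is a sketch of the conversion. The helper name mask_to_prefix is made up, and it assumes a normal contiguous netmask.

```shell
# Convert a dotted netmask such as 255.0.0.0 into a prefix length such as 8.
# Assumes a contiguous mask; octet order is not validated in this sketch.
mask_to_prefix() {
    local IFS=. octet bits=0
    for octet in $1; do
        case "$octet" in
            255) bits=$((bits + 8)) ;;
            254) bits=$((bits + 7)) ;;
            252) bits=$((bits + 6)) ;;
            248) bits=$((bits + 5)) ;;
            240) bits=$((bits + 4)) ;;
            224) bits=$((bits + 3)) ;;
            192) bits=$((bits + 2)) ;;
            128) bits=$((bits + 1)) ;;
            0)   ;;
            *)   echo "invalid netmask: $1" >&2; return 1 ;;
        esac
    done
    echo "$bits"
}

# Equivalent of the route example above, using iproute2:
#   ip route add 40.0.0.0/$(mask_to_prefix 255.0.0.0) via 10.0.5.1 dev eth0
```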
Navigating Logs
The easiest way to go through logs in detail is to export them in the debug zip file and investigate in a good editor. However, if you’re troubleshooting and need to go through the logs and info on the gateway here is the list of locations and ways of using the ubuntu viewers:
Viewers:
- less - views the entire file; use the arrow keys or PageUp/PageDown to navigate
- cat - prints the whole file and leaves you at the end of the output
- tail - shows only the last lines of the file (10 by default), good for quick analysis
- nano - not really for logs, but a useful text editor to view and/or make changes to files
Logs: (prefix the location with any of the above less/cat/tail)
- /var/log/syslog - system log
- /var/log/postgresql/postgresql-12-main.log - database log (the version number may differ depending on the gateway version; list the directory with ls -la to check)
- /usr/local/uss-squid5/var/logs/access.log - squid access log
- /usr/local/uss-squid5/var/logs/cache.log - squid cache log
- /var/www/api/application/logs/ - directory containing API logs for checking the icap connection
- dmesg | less - prints the message buffer of the kernel, which contains information about the system's hardware and drivers. This log provides an up-to-date view of the system's kernel activity.
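When a log is too long to page through, filtering for problem lines first saves time. This is a rough sketch. The function name recent_errors and the keyword list (error, fail, fatal) are assumptions, so adjust the pattern to whatever the log in question actually prints.

```shell
# Show the last $2 lines (default 20) of log file $1 that mention
# "error", "fail" or "fatal" in any letter case.
recent_errors() {
    grep -iE 'error|fail|fatal' "$1" | tail -n "${2:-20}"
}

# Usage on the gateway:
#   recent_errors /var/log/syslog 50
#   recent_errors /usr/local/uss-squid5/var/logs/cache.log
```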
Updating the Gateway using packages
Most of the time the gateway can be updated using the standard “apt-get update && apt-get upgrade” command, however if a gateway release is not yet available on the stable repo you may need to download the Debian packages and install them manually:
Firstly, repositories (including the agent ones for quick access):
- USS Gateway: https://downloads.clouduss.com/gateway/packages/
- USS Agent WIN: https://downloads.clouduss.com/windows/
- USS Agent MAC: https://downloads.clouduss.com/macosx/
In Putty, do the following:
sudo su
cd /tmp
wget https://downloads.clouduss.com/gateway/packages/ussgateway_2.0.59_amd64.deb
(Note: you can use a link to a different deb file; the above is just an example.)
Lastly:
sudo apt-get -f install ./ussgateway_2.0.59_amd64.deb
The new version should then install. You can check the version using this command:
dpkg -l | grep ussgateway
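If only the version number is needed (for example to paste into a ticket), it can be pulled out of the dpkg listing. This is a minimal sketch. The function name pkg_version is illustrative, and it assumes the package is listed under the plain name ussgateway with status "ii" (installed).

```shell
# Read `dpkg -l` output on stdin and print the installed version of package $1.
# Prints nothing if the package is not in the "ii" (installed) state.
pkg_version() {
    awk -v pkg="$1" '$1 == "ii" && $2 == pkg { print $3 }'
}

# Usage on the gateway:
#   dpkg -l | pkg_version ussgateway
```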
USS Gateway Authentication issues
If users get the authentication pop-up or any kind of squid cache-access-denied message, there is likely to be an issue with AD authentication (note: this has nothing to do with the AD syncing in the USS dashboard).
The customer should be able to work through it themselves using the public guide available here: https://censornet.freshdesk.com/a/solutions/articles/80000809322
But it's important that you know and understand these steps. Also, remember to enable debug mode before testing the join and keys in the USS gateway UI, and when joining, in order to get full output on the errors.
Failure Joining Domain altogether
WARNING: This is a last resort.
Join the domain via the command line (reference ticket: 67869).
Notes: The domain used in testing is JOTUNHEIM.LOCAL; please replace this with the domain you're using.
The commands on the command line MUST match the case shown: where upper case is used here, use upper case too. It won't work otherwise.
Pre-requisite – please install the Kerberos utilities using:
apt-get -y install krb5-user
1. Create the Active Directory in the UI as normal (replacing the fields with details for your own domain)
2. Join the domain using this command (note, use your own user and replace JOTUNHEIM.LOCAL with your own domain in caps here in the filename):
net ads join -U mick@jotunheim.local -s /etc/samba/JOTUNHEIM.LOCAL_smb.conf
It will prompt for a password, which you won’t see echoed to the screen while typing. If the join is successful, you will see an error about adding the DNS record, but this can be ignored as this will need to be done manually.
3. Update the proxy db with this info (Note, replace JOTUNHEIM.LOCAL here with your own domain):
sudo su -c "psql -d cloudgw -c \"update ad_auth_domains set is_joined='1' where realm='JOTUNHEIM.LOCAL'\";" postgres
It should look like this if successful:
4. Create the keys (replacing the user and JOTUNHEIM.LOCAL in the filename with your own info):
net ads keytab add_update_ads HTTP -U mick@JOTUNHEIM.LOCAL -s /etc/samba/JOTUNHEIM.LOCAL_smb.conf
Verify that the keys have been created:
klist -ke /etc/krb5.keytab
5. Update the proxy db with this info:
sudo su -c "psql -d cloudgw -c \"update ad_auth_domains set has_keys='1' where realm='JOTUNHEIM.LOCAL'\";" postgres
It should look like this if successful:
6. Enable AD Auth:
sudo su -c "psql -d cloudgw -c \"update global_config set val='1' where key='ad_authentication'\";" postgres
7. Restart the proxy here:
Finally, configure the proxy in the Active Directory DNS. Once done, log out and back into a test machine, change the proxy settings to suit, and try it out.
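The klist check in step 4 can be turned into a pass/fail test, which is handy when repeating the procedure. This is a sketch only. The function name keytab_has_http is made up, and it assumes klist prints principals in the usual HTTP/host@REALM form.

```shell
# Read `klist -ke /etc/krb5.keytab` output on stdin and succeed (exit 0) only
# if it contains at least one HTTP service principal for realm $1.
keytab_has_http() {
    grep 'HTTP/' | grep -qF "@$1"
}

# Usage on the gateway (replace JOTUNHEIM.LOCAL with your own domain):
#   klist -ke /etc/krb5.keytab | keytab_has_http JOTUNHEIM.LOCAL \
#       && echo "HTTP keys present" || echo "HTTP keys missing"
```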