Tokyo PC Users Group

Ruling the Webcache

by Kurt Keller

The Task

A reliable HTTP/FTP cache for a company: about 300 users in the first stage, about 1,000 in the second, and possibly more than 5,000 in the end.

Only selected users are to be able to use the webcache. One group of users is allowed access from 07:00 to 19:00 weekdays, another group 7 x 24 hours and a third group 7 x 24 hours including FTP.

It must be possible to exactly verify who accessed what resources.

There must be a way to block certain sites.

Time to implement is only two weeks.

The Solution: Squid.

Sorry? You want to know more about it? Well then...

Squid is a free webcache available from

Originally the cache was to be implemented with Lotus Notes, making use of an existing user database. This database would have been split into three to handle the access rights requested for the different groups. Logging looked fine with Notes as well. The plan was to access the web through Lotus Notes, but to use an external browser for actually viewing pages.

Two weeks before the launch date, a Notes guru found that such an implementation was technically impossible. BUMMER! This meant either going back to the drawing board, with the considerable delays that entails, or finding a completely different solution. Vaguely recalling having read something about user authentication while setting up Squid on another system, I skimmed the Squid manuals again. It seemed possible to meet most of the given criteria. With only two weeks until it had to work, including all the administrative work that goes with it, there was no time to verify the assumption or even test it.

Squid has a very versatile ACL (Access Control List) configuration. It is top down, meaning the first matching entry decides whether access is granted or denied. You can set rules depending on destination IP and domain, source IP and domain, time, requested URL (also with regular expressions), port, protocol, method (GET / POST) or user. However, the user ACL rules were not usable in this case, since they require an ident server to be running on the machine accessing the webcache.
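To make the "first matching entry decides" behaviour concrete, here is a minimal Python sketch of such a top-down rule list. It is only a model of the idea, not Squid's implementation; the Rule class, the first_match function, the request fields, the example domains and the default verdict when nothing matches are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Request = Dict[str, str]


@dataclass
class Rule:
    """One entry in a top-down access list (hypothetical model)."""
    allow: bool                          # verdict if this rule matches
    matches: Callable[[Request], bool]   # test applied to the request


def first_match(rules: List[Rule], request: Request, default: bool = False) -> bool:
    """Return the verdict of the first matching rule; later rules are ignored."""
    for rule in rules:
        if rule.matches(request):
            return rule.allow
    return default  # assumption of this sketch: deny when nothing matches


# Hypothetical rule list: order matters, because the first hit wins.
rules = [
    Rule(False, lambda r: r["dst_domain"].endswith("blocked.example")),
    Rule(True, lambda r: r["protocol"] == "http"),
    Rule(False, lambda r: True),  # catch-all deny
]

print(first_match(rules, {"dst_domain": "www.blocked.example", "protocol": "http"}))  # False
print(first_match(rules, {"dst_domain": "www.example.org", "protocol": "http"}))      # True
print(first_match(rules, {"dst_domain": "ftp.example.org", "protocol": "ftp"}))       # False
```

Swapping the first two rules would let the blocked site through, which is why the order of entries in such a list deserves care.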

But if you compile Squid with the USE_PROXY_AUTH switch, it is possible to use the same method of authentication known from web pages: the browser pops up a small window asking for a user name and password. Even though this cannot be mixed with the aforementioned ACLs, it is still a way to authenticate users and keep anyone without access rights out. The names of users authenticated this way also show up in the logs, making it possible to satisfy the given logging requirements.
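For the curious, the pop-up window corresponds to HTTP Basic authentication: the proxy answers with status 407 (Proxy Authentication Required) and the browser repeats the request with a Proxy-Authorization header carrying the base64 encoding of "user:password". The Python sketch below shows how such a header value can be decoded. It illustrates the header format only and is not Squid's code; the function name and the sample credentials are made up for the example.

```python
import base64
import binascii
from typing import Optional, Tuple


def parse_proxy_authorization(header_value: str) -> Optional[Tuple[str, str]]:
    """Decode the value of a 'Proxy-Authorization: Basic ...' header.

    Returns (user, password), or None if the value is not valid Basic auth.
    """
    scheme, _, encoded = header_value.strip().partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # Only the first colon separates user and password.
    user, sep, password = decoded.partition(":")
    if not sep:
        return None
    return user, password


# Hypothetical credentials, encoded the way a browser would send them.
header = "Basic " + base64.b64encode(b"alice:secret").decode("ascii")
print(parse_proxy_authorization(header))  # ('alice', 'secret')
```

Note that base64 is an encoding, not encryption, so anyone who can watch the traffic between browser and proxy can read the password.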

Getting the other access restrictions in place needed a little trickery. It would have been nice to use the user ID from the proxy authentication, but since it can't be used in ACLs, a different way was necessary. Luckily, the whole company network uses fixed IP addresses. This means restrictions can be tied to individual machines. Rather than restricting certain machines, the default is set to the minimum access for authenticated users, and certain machines are given extra privileges.

Access Rules

So the chosen setup provides free access without authentication to the intranet and the company's domain on the internet. This is done with a "domain ignore list" on the proxy-authentication.

Then any request is compared to a list of sites which are not allowed to be accessed by anyone.

For anything else, a user must go through authentication. For password changes, an Apache webserver has been set up on the proxy machine. Any user can access a CGI script on this webserver and change his or her password this way.

If successfully authenticated, access to external HTTP resources is granted weekdays from 07:00 to 19:00.

If an FTP request is made, the IP address of the requesting computer is compared to a list of authorized IP addresses, and access is granted or denied accordingly.

Likewise, the IP address is checked against a list when someone tries to access the cache outside the default time range.
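The chain of checks described in this section can be sketched in Python as follows. This is a model of the policy, not the actual Squid configuration: every domain name, IP address and function name is a hypothetical placeholder, and applying the FTP check and the time check independently of each other is an assumption of the sketch.

```python
from datetime import datetime
from typing import Optional, Sequence

# Hypothetical placeholder values, not taken from the real setup.
FREE_DOMAINS = ("intranet.example", "company.example")  # no authentication needed
BLOCKED_DOMAINS = ("blocked.example",)                  # denied to everyone
FTP_HOSTS = {"10.0.0.5"}                                # machines allowed to use FTP
ANYTIME_HOSTS = {"10.0.0.5", "10.0.0.6"}                # machines allowed 7 x 24


def in_domains(host: str, domains: Sequence[str]) -> bool:
    """True if host equals one of the domains or is a subdomain of it."""
    return any(host == d or host.endswith("." + d) for d in domains)


def is_default_hours(when: datetime) -> bool:
    """Weekdays (Monday to Friday), 07:00 up to but not including 19:00."""
    return when.weekday() < 5 and 7 <= when.hour < 19


def decide(protocol: str, dst_host: str, src_ip: str,
           user: Optional[str], when: datetime) -> bool:
    """Apply the checks in the order the text gives them."""
    if in_domains(dst_host, FREE_DOMAINS):
        return True                      # intranet and own domain: always free
    if in_domains(dst_host, BLOCKED_DOMAINS):
        return False                     # banned sites: nobody gets in
    if user is None:
        return False                     # everything else needs authentication
    if protocol == "ftp" and src_ip not in FTP_HOSTS:
        return False                     # FTP only from privileged machines
    if not is_default_hours(when) and src_ip not in ANYTIME_HOSTS:
        return False                     # off-hours only from privileged machines
    return True


monday_morning = datetime(1998, 3, 2, 10, 0)
saturday = datetime(1998, 3, 7, 10, 0)

print(decide("http", "www.example.org", "10.0.0.9", "alice", monday_morning))  # True
print(decide("http", "www.example.org", "10.0.0.9", "alice", saturday))        # False
print(decide("http", "www.example.org", "10.0.0.6", "alice", saturday))        # True
print(decide("ftp", "ftp.example.org", "10.0.0.6", "alice", monday_morning))   # False
```

The third call already hints at a weakness: the extra privilege follows the machine's IP address, not the user name.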

Criteria met?

If you think carefully, you'll find that this does not exactly meet the given requirements, as privileges are not actually assigned to users but to individual computers. A user with default privileges could log in with his own user ID on a privileged machine to work around certain restrictions. This hole is known, but it is the best we can currently come up with. And as long as users are not told about this possibility, they are unlikely to notice it. (I trust you won't tell them, otherwise I'll get you off the AJ mailing list!!!)

© Algorithmica Japonica Copyright Notice: Copyright of material rests with the individual author. Articles may be reprinted by other user groups if the author and original publication are credited. Any other reproduction or use of material herein is prohibited without prior written permission from TPC. The mention of names of products without indication of Trademark or Registered Trademark status in no way implies that these products are not so protected by law.

Algorithmica Japonica

Month, 1998

The Newsletter of the Tokyo PC Users Group

Tokyo PC Users Group, Post Office Box 103, Shibuya-Ku, Tokyo 150-8691, JAPAN