If you’re looking at a website in your web browser (Firefox, Internet Explorer, etc.), it is being served by a web server.
According to Netcraft, over 50% of web servers now run Apache. For logging, most Apache sites use the combined log format. Here is how a standard log entry looks when someone goes to http://www.privasectech.com:
127.0.0.1 - - [11/Jan/2010:17:21:09 -0800] "GET / HTTP/1.1" 200 3245 "-" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8pre) Gecko/20100106 Ubuntu/9.10 (karmic) Shiretoko/3.5.8pre"
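What does all of this mean? The first field is the visitor’s IP address. It’s the internet’s version of your home address. The two dashes that follow stand for the remote identity and the logged-in user name, both empty here.
Then we have the date and time of the request. Next come the request sent from the browser, the status code of the response (200 means it was successful), and the size of the transfer in bytes. After that come the referring page (a dash here, because there was none) and the User-Agent string, which usually reveals the web browser and operating system version.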
This user was using Ubuntu Linux 9.10 and Firefox 3.5.8pre.
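To make the field breakdown concrete, here is a minimal Python sketch that splits a combined-format entry into named fields. The pattern and the field names (host, ident, user and so on) are illustrative choices, not anything Apache defines, and the sketch does not handle escaped quotes inside the request or User-Agent.

```python
import re

# Illustrative pattern for the combined log format: host, identity, user,
# timestamp, request line, status, size, referer, user agent.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# Shortened version of the sample entry (User-Agent trimmed for readability).
sample = ('127.0.0.1 - - [11/Jan/2010:17:21:09 -0800] "GET / HTTP/1.1" '
          '200 3245 "-" "Mozilla/5.0 (X11; U; Linux x86_64; en-US)"')

entry = parse_line(sample)
print(entry["host"], entry["time"], entry["status"], entry["size"])
print(entry["agent"])
```

Every field comes back as a string, so the status and size would need converting with int() before doing arithmetic on them.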
This information goes to every website you visit.
Your information was just logged:
<?php echo htmlspecialchars($_SERVER["REMOTE_ADDR"]); ?> using <?php echo htmlspecialchars($_SERVER["HTTP_USER_AGENT"]); ?>
A lot of information can be gathered from these logs, for purposes both useful and malicious. For example, one could write software that is delivered only to visitors with a certain operating system or browser, or only to people coming from a certain address.
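As a rough illustration of how little it takes to treat visitors differently, the sketch below picks a page variant from the same two values the log records. The function name, the variant labels and the address prefix are all made up for the example; real User-Agent detection is considerably messier than a substring test.

```python
def choose_variant(user_agent, remote_addr):
    """Pick a (hypothetical) page variant from the visitor's address and User-Agent."""
    # 192.0.2.0/24 is an address range reserved for documentation examples.
    if remote_addr.startswith("192.0.2."):
        return "special-address"
    if "Windows" in user_agent:
        return "windows"
    if "Linux" in user_agent:
        return "linux"
    return "default"

print(choose_variant("Mozilla/5.0 (X11; U; Linux x86_64; en-US)", "127.0.0.1"))
```

The address check runs first, so a visitor from the chosen range gets the special variant whatever browser they use.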
Think about how much information Google must have on you: every Google search you have ever done. Now think about Google’s advertising. For every website (millions?) that carries Google ads, which publishers place through AdSense, the publisher side of AdWords, Google gets all of that information as well.
Then there are websites that run Google Analytics. You might not even know that a website you visit is handing your surfing habits to Google at no charge.
Just to put this into perspective, Google made 12 billion dollars in 2008 with this information. It is valuable.
This is not to pick on Google; the same concerns apply to Facebook. They log every interaction you’ve ever had on the site, and will never delete it. What might they do with that information in the future? What would you do if you had the entire web surfing history of millions of people and/or their online social interactions?
We do keep web server logs here at PrivaSecTech. After one year we archive them; after two years we delete them.
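A retention rule like this is easy to express in code. The following Python sketch is only an illustration of the rule, not the script actually used here: the function name and the action labels are invented, and a year is approximated as 365 days.

```python
from datetime import date, timedelta

# Thresholds taken from the policy; a year is approximated as 365 days.
ARCHIVE_AFTER = timedelta(days=365)
DELETE_AFTER = timedelta(days=2 * 365)

def retention_action(log_date, today):
    """Return 'keep', 'archive' or 'delete' for a log written on log_date."""
    age = today - log_date
    if age >= DELETE_AFTER:
        return "delete"
    if age >= ARCHIVE_AFTER:
        return "archive"
    return "keep"

today = date(2010, 1, 11)
for log_date in (date(2009, 6, 1), date(2008, 12, 1), date(2007, 12, 1)):
    print(log_date, retention_action(log_date, today))
```

Checking the delete threshold before the archive threshold matters, since any log old enough to delete is also old enough to archive.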
If you use Firefox, you might be interested in User-Agent Switcher. It allows you to fake your browser and operating system. This is handy if you prefer to keep this information private. It also helps when a website says “This website only works for this type of operating system or browser”: you can configure User-Agent Switcher to provide that information.
If you try it out, hit F5 to reload this page. Did the information logged about you above change at all?
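The User-Agent is just a request header that the client chooses, which is why it is so easy to change. As a small sketch using Python’s standard urllib.request module, the following builds a request with a made-up User-Agent string; the helper name and the "ExampleBrowser" value are hypothetical, and nothing is actually sent.

```python
import urllib.request

def build_request(url, user_agent):
    """Create a request object that announces the given User-Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request("http://www.privasectech.com/", "ExampleBrowser/1.0 (hypothetical)")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))

# Sending it would be: urllib.request.urlopen(req)  (not executed in this sketch)
```

A server that logs the request would record whatever string was supplied, exactly as it does for a browser.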
The other information you may wish to hide is your IP address, but that is a little more difficult. For that you will want a proxy, a machine that sits between you and the target website, so the website logs the proxy’s address instead of yours. Your ISP may offer a web proxy; ask them. You can also use an anonymizing network such as Tor. Just note that the more proxies you go through, the slower your web surfing will be.
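For scripted requests, routing through a plain HTTP proxy can be sketched with Python’s standard urllib.request module as below. The proxy address is a placeholder, not a real service, and the helper name is invented; the sketch only builds the opener and does not contact anything.

```python
import urllib.request

def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)

# Placeholder proxy address, for illustration only.
opener = build_proxy_opener("http://proxy.example.net:3128")

# opener.open("http://www.privasectech.com/") would then go via the proxy,
# and the site's log would show the proxy's address rather than yours.
```

This covers only an ordinary HTTP proxy. Tor is normally reached through a SOCKS interface, which urllib does not support on its own.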