These developments pose a problem when it comes to obtaining feedback about the usage of web applications. The data left by many interactive applications in the server's log file is minimal and not sufficient for extracting detailed information about the actual usage of the application. For instance, it is not possible to say in which order the fields of a form were filled in.
In this work, we investigate means for obtaining more information about the usage of websites and web applications. This includes a detailed tracking of all interaction with the displayed browser page, such as moving the mouse pointer around or scrolling the page. Additionally, the interaction should be tracked at the widget level, i.e. the mouse coordinates are mapped to elements like buttons, links etc. Combined with knowledge about the layout of the HTML pages, complete tracking of all user activity becomes possible.
The information which can be obtained using activity tracking is interesting for a number of scenarios. So far, the main application has been usability tests of websites, but with a tracking approach that is flexible enough, it is also possible to use the tracking during web application development, beta-testing or constant/repeated evaluation of live websites. On a more abstract level, it can be employed for profiling users and for implicit interaction with websites.
The term "implicit interaction" is explained in detail in section 3.2. In contrast to explicit interaction with a website which the user is aware of (such as filling in a form field), implicit interaction usually happens unconsciously. For example, the user may hesitate before filling in the field because he is not certain about the correct answer - a fact which we can observe and make use of.
As part of our research, we present an advanced, non-intrusive tracking solution which does not require special setup at the client or server side. Working as an HTTP proxy, the software performs detailed user activity tracking while modifying HTML pages "on the fly" before passing them on to the client browser.
This paper is organised as follows: In section 2, our requirements for user tracking and the chosen approach are described. Next, section 3 discusses the types of data that can be obtained using activity tracking as well as its use for implicit interaction and usability testing. In section 4, the implementation of our HTTP proxy for activity tracking is explained, followed by a case study in section 5 which illustrates its usefulness. Section 6 discusses previous work, it is followed by the conclusion in section 7.
In this section, we will outline our requirements for a flexible, non-invasive tracking technology for web page usage, and briefly describe the approach taken by our solution. Our overall goal was to design a tracking solution which records detailed data for the analysis of user actions, without many of the restrictions of existing tools. Section 4 discusses the implementation in detail.
General requirements The general requirements for the user tracking approach are as follows. The technical requirements resulting from these are listed in section 4.1.
Approach: A proxy for user tracking Some of the requirements above may seem contradictory; the tracking must take place either at the client or server end, which implies significant changes in the setup. However, our proposed solution addresses this issue simply by allowing the tracking to take place either client-side or server-side - this way, the type of setup can be chosen on a case-to-case basis depending on the task for which the tracking is employed.
In order to collect data about users' actions, we use the approach of an HTTP proxy which is inserted between the client and server (see figure 1). It intercepts all traffic and outputs log data with details about any requests sent to servers as well as the replies that a server sends back to the client.
However, if only the requests were logged, the data collected by the proxy would be comparable to standard web server logs. Only things like the URLs of requested pages and input data typed into forms by the user would be available. This is not sufficient for our purpose of logging the user actions in detail, e.g. including mouse movement.
Thus, the next step towards a complete solution is the addition of client-side code which records the user's behaviour in detail. The most straightforward way to achieve this would be via special software running on the client machine. However, installation of software is not an option because we also want to support setups where the user does not need to change his configuration at all.
In a human computer interaction class at University of Munich in spring 2005, we had students manually implement detailed tracking for pages that contained larger forms. This assignment showed that it is feasible to implement this kind of tracking without altering the user's experience.
When information about user actions is recorded, it is always very concrete (e.g. "the left mouse button was clicked on link x"). This means that more abstract information must be inferred from the concrete information. In general, the following types of data can be collected with our user tracking solution:
Our approach of using a HTTP proxy is flexible enough to be used in a variety of different scenarios. The following is only a selection of possible areas of application:
Many other applications can be envisioned. For example, an advanced form of customer support for websites is possible: If a customer has a problem using a web application, they can contact a support hotline, either by telephone or over the Internet, using an instant messaging functionality built into the application's website. Customer support is able to perform live monitoring of all actions of the user to understand the problem - even a live replay of the customer's mouse movements would be feasible.
In the following sections, we will look at two areas of application in greater detail: Usability evaluation of websites and implicit interaction with websites.
So far, usability evaluation for websites has been the main purpose of systems which are similar to our approach. Section 6 gives an overview of related work, and compares the different available tools. In this section, we show that our solution is well suited for this task. In fact, it is flexible enough to support many different usability testing scenarios, whereas other tools usually concentrate on only one scenario, such as testing a specific, prepared website in a usability lab on an appropriately configured computer.
An expert who conducts a usability test is typically confronted with the following dilemma: On one hand, as much data as possible needs to be logged in order to produce reliable statistics. This implies inviting a large number of test users and using advanced technology (e.g. video recordings or eye tracking) during the test. On the other hand, the amount of money, time and manpower available for tests is limited. This is especially true for commercial suppliers of usability testing expertise.
Due to this dilemma, the goal when designing our usability proxy was to reduce the cost of activities which are not directly related to actually performing the test. With this in mind, we had a look at the typical tasks that normally arise when performing a 'classical' usability test:
In order to reduce the cost of website usability evaluation, we decided that our user action tracking approach should have the following properties:
No test lab necessary In the discussion above, it quickly becomes clear that much of the cost is caused by the fact that the test user, usability expert and the equipment need to come to the same place for the test. This is not always necessary, because the Internet can be used for remote usability tests. In those cases where remote tests are not sufficient, it is often possible only to invite a certain portion of users to the lab.
No special hardware or software requirements While additional video footage or eye tracking data helps to identify some problems more easily, its usefulness must always be weighed against its costs. We believe it is acceptable to do without special hardware support in many cases - see section 6.2 for a more detailed discussion. Furthermore, user tests can be parallelised much better (i.e. performed by several test users at the same time) if it is not necessary to use special equipment.
If no special software needs to be installed, the amount of technical problems can be reduced: The expert may not be present in person to install special software, whereas the test user cannot be expected to have sufficient computer skills.
Our HTTP proxy can be used in different ways to perform usability tests:
"Classical" usability evaluation Some of the existing tools are only designed for this type of user test: A test user is told to perform certain tasks on a set of web pages. Often, the content of the pages is static, and the usability expert has full control over the server from which they originate.
Evaluation of interactive sites Due to the proxy approach, it is possible to evaluate websites which are not under the control of the usability expert, and to have test users interact with other, unknown users on these sites. For example, this includes the monitoring of online auctions or real purchases in online shops. (However, it should be noted that our implementation does not support encrypted HTTP connections at the moment, which will prevent its being used with some of these sites.)
Parallel user tests The performance of our solution is sufficient to allow its use by many test users in parallel. This allows the collection of much more data than with setups where only one user can take part in the test at a time.
Remote user tests Related to the previous point, the test users can take part in the experiment from their usual desktop machine at home or work. For websites with an international audience, the test users can be located all over the world.
Inviting test users over the Internet Instead of giving the website URL to a number of users and telling them to perform tasks, it may sometimes be desirable to have a website's real users participate in a test. For this scenario, our proxy needs to be run on the website server. With appropriate server configuration (e.g. using the Apache server's mod_proxy module), the proxy can be used to track a user's actions on the website once this user has agreed to the user test, e.g. by clicking on a button. This way, the user does not even need to reconfigure his browser. Only some users (rather than all site visitors) can be tracked, and no change is necessary to the HTML code on the server.
How people interact with an application provides additional information. Typing speed or pointing precision may be good indicators on the proficiency level of a user. The time a user spends on a specific question in a questionnaire may indicate that this question makes her think a lot. Such information is not provided on purpose by the user. However, operating an application, clicking on a button or filling in a form will inevitably provide such parameters. Up to now, they are largely ignored in classical computer systems with graphical user interfaces.
In the area of context-awareness   and physical user interfaces, the notion of implicit human computer interaction is well established . It is defined as the behaviour and interaction of a user with the environment and artifacts to reach a goal. In this work, the focus is on interaction beyond the computer with the real world. The notion of implicit interaction can be extended back to the traditional computer system. People often use applications as tools to achieve a goal, without focussing on the interaction with the computer. For example, if someone orders cinema tickets on a web page (and provided the page is well engineered), she will not think consciously where to click or which form field to fill in first. These minimal interactions will happen unconsciously and automatically.
In the context of web applications, we extend the notion of implicit interaction to the following: Implicit interaction is the observable interaction behaviour of the user with an application that is not done consciously while focusing on reaching a goal.
It is obvious that no clear distinction between implicit and explicit interaction is possible. Figuring out what is done consciously is not straightforward. However, our experience shows that users do much of the interaction of a website or in a form without thinking much about the small steps in the interaction process, and hence having the notion of implicit interaction helps to assess usage. This information can be used in the small (e.g. finding additional information needs with regard to a form field) as well as in the large (estimating the type of user).
In the following example, the concept of implicit interaction is illustrated: A user wants to buy a flight ticket at an online travel agency. Typically, the user fills in a form with the origin and destination and provides the dates of travel. That is essentially all the information that is sent back to the server. For booking a flight, this is sufficient. However, additional information on how the user filled in the form is lost. This information is not essential to book the flight, but could be used to improve the service or to customise it. If implicit interaction tracking is present, it can detect that the user changed dates more than once before submitting the form. This may indicate that he or she is not yet sure about the date, so it can be an option for the website not only to provide flight information for the selected date, but also for earlier and later days.
Implicit input from the user does not provide clear information, it acts more as an additional source of information. The reason why a user types in a certain field much slower than the others may be due to the fact that he or she needs to think longer on this question, but it could also mean that the user needed to answer the phone while doing the question. The developer has to be aware that the additional information needs to be used carefully. Similarly, implicit interaction can be used to monitor usability over a longer period of time. This allows us to continuously look at where people stop using an application (e.g. which form field makes them go away from a website) or to identify part of an application where users are slowed down and where help may be needed.
Collecting and using implicit input raises privacy concerns, ethical questions, and to some extent legal issues. When analysing the information about how precise a user clicks, how often she needs to correct an input field or how long it takes her to write her name, do we have to treat this data as personal information? We think that this is the case and therefore we designed our system in such a way that users usually have to explicitly opt in to use the proxy server by reconfiguring their browser. In a setup where the proxy runs at the server's end, we strongly recommend that a user be asked for his approval before tracking his actions. The user should also be informed in detail about the type of data that will be collected. In our opinion, not doing do so would not only be unethical, but even illegitimate in some jurisdictions.
In this section, we describe the UsaProxy application that provides website usage tracking functionality using a HTTP proxy approach. First, we will look at the general requirements from section 2 at a more technical level. Next, we show how the implementation of UsaProxy meets these requirements. This is followed by a discussion of methods for the visualisation of the aggregated tracking data.
Our primary objective was to enable automated tracking of user activity on web pages that is not perceptible by the user nor intrusive. This requires an application which operates transparently in a way which does not alter the user's browsing habits and experience. It must be possible for test users to take part in website evaluation remotely from their home/office environments, from any location and using their own equipment and network access. For this purpose, the core proxy application as well as the tracking client-side part must be platform-independent with regard to the server technology, and compatible with major browsers and operating systems at the client side. Furthermore, the system must not require any installation on the client machine or changes to the technology used to create the pages. All these requirements are met by the HTTP proxy approach as described in section 2 and below.
For the purposes of a usability engineer, the monitoring and logging of user actions must proceed accurately, in real time and in a reproducible way. It is important to provide a range of features and possibilities that is comparable to those of a standard usability test, resulting in an accurate listing of what the user actually did while performing the predefined tasks. In order to be able to reconstruct website usage in a qualitatively adequate way, we decided to track the following user actions:
Additionally, the Content-Length header's value is increased by the number of bytes that were added. The addition of the above line is the only change made to the HTML.
Some servers do not send plain text/html content, but compress their response and use a Content-Encoding: gzip header. Due to the fact that adding the monitoring HTML in compressed streams would be very difficult, data compression is simply suppressed by overriding the client's Accept-Encoding header and using a value of "identity" instead. If a web server receives a request marked with that value, data will always be sent uncompressed.
To address this problem, two advanced event registration models were added to browsers: The W3C model works with Netscape 6 and Konqueror/Safari, the Microsoft model with Internet Explorer 5+. Both models work in Opera 7. With the new models, multiple event handling functions can be added to elements without problems. With the W3C model, handlers are attached using the addEventListener() method whereas Microsoft's model uses the attachEvent() method. By using these models, it is possible to invoke the UsaProxy-specific monitoring functions without influencing the page's event handlers.
Events are objects with properties. Examples of properties which are of interest for user activity tracking are the target element of a click, hover events or the current mouse position. Since the UsaProxy script is not aware of what elements are used in the document and what names or IDs have been assigned, the event handlers are defined for the root element of the page which is the document element. The Document Object Model (DOM) of every browser gives access to all lower-level elements of HTML documents and their properties such as name, ID, href, src and so on. Events triggered on a certain DOM level are usually forwarded to the root element. Any click on a button or link will automatically "bubble" to the document element. This way, every lower-level event can be recognised and the respective event properties are available for capturing at the root level without having to touch the existing code. This results in the following data that may be monitored by the UsaProxy script together with a timestamp and the user's IP:
However, not only the client-side information is tracked. In order to correlate usage data with the web server's actual reaction to client requests, the server responses (both HTTP headers and text/html content) are also captured and stored on the proxy. Additionally, as shown in figure 2 a log entry is composed for identifying the file the captured response was stored in. This way, the exact HTML code output by dynamic websites is available for later inspection. Furthermore, server instructions such as redirects to another location can be identified and merged with the user inputs that evoked the server reaction.
Possible visualisations of the aggregated UsaProxy usage data range from ordinary listings of web metrics and website statistics to complex screenshot annotations. Some of these are also mentioned in section 6. The design of the proxy allows for the following scenarios:
To analyse the usefulness of the HTTP proxy approach, we conducted a small user test with our implementation. For this, 12 test participants used a computer whose web browser had been reconfigured to use the usability proxy instead of connecting to websites directly. The users were told to perform tasks on two different websites. Due to the test subjects' background (10 students of media informatics, 2 members of the media informatics staff), they can be regarded as experienced web users. The proxy itself was running on a nearby computer to modify the web pages and to log user actions.
The test users were presented with two tasks which are difficult to track with some of the other available user action tracking approaches (see section 6):
Example proxy log Figure 5 shows a short excerpt from the log that the proxy produced during the test. Only a small selection of lines from the log are shown to illustrate the different log events. The events in the figure include mousemove (pointer position changed), mouseover (the pointer was moved over a div HTML element or similar), focus (the cursor moved into an input field) and others. The serverdata line is followed by an ID which can be used to retrieve the HTTP headers and content sent by the server, which is stored by the proxy for later inspection.
188.8.131.52 2005-10-25,11:5:57 http://www.kiko.com/ serverdata 12
184.108.40.206 2005-10-25,11:5:58 http://www.kiko.com/ load width=1280;height=867
220.127.116.11 2005-10-25,11:6:2 http://www.kiko.com/ mousemove x=672;y=7
18.104.22.168 2005-10-25,11:6:2 http://www.kiko.com/ mouseover x=731;y=457 target=link:http://www.kiko.com/contact.htm+linktext:Contact
22.214.171.124 2005-10-25,11:6:6 http://www.kiko.com/ click x=815;y=231 target=id:SPAN16
126.96.36.199 2005-10-25,11:6:37 http://www.kiko.com/app.htm?use auth=678397351 mousemove x=849;y=352
188.8.131.52 2005-10-25,11:6:37 http://www.kiko.com/app.htm?use auth=678397351 mouseover x=472;y=296 target=id:DIV144
184.108.40.206 2005-10-25,11:6:37 http://www.kiko.com/app.htm?use auth=678397351 mouseover x=161;y=229 target=id:left bar
220.127.116.11 2005-10-25,11:6:38 http://www.kiko.com/app.htm?use auth=678397351 click x=147;y=183 target=unknown:scrollbar
18.104.22.168 2005-10-25,11:6:40 http://www.kiko.com/app.htm?use auth=678397351 mousemove x=148;y=138
22.214.171.124 2005-10-25,11:6:50 http://www.kiko.com/app.htm?use auth=678397351 click x=26;y=507 target=id:IMG14
126.96.36.199 2005-10-25,11:6:50 http://www.kiko.com/app.htm?use auth=678397351 focus
188.8.131.52 2005-10-25,11:6:56 http://www.kiko.com/app.htm?use auth=678397351 keypress key=T
184.108.40.206 2005-10-25,11:6:56 http://www.kiko.com/app.htm?use auth=678397351 keypress key=e
220.127.116.11 2005-10-25,11:6:56 http://www.kiko.com/app.htm?use auth=678397351 keypress key=s
18.104.22.168 2005-10-25,11:47:45 http://de.wikipedia.org/wiki/Hauptseite scrolledTo y=399
Test results Due to the detailed logging, it is easy to extract data about the test from the log. For example, a visit to specific pages marks the start and end of each task. It is also possible to determine if the user visited a certain area of a page, either by moving the mouse pointer over it or by scrolling to a certain position.
This way, it was no problem to determine the average time taken to complete the Wikipedia task (1:47 minutes; 1 out of the 12 participants failed to complete the task). An analysis of the different navigation paths shows that only 4 users took the optimal path. For the 6 users who used Wikipedia's search facility, the exact search strings are recorded in the logs. The proxy also continued to track one user who temporarily left http://de.wikipedia.orgwikipedia.org and used Google search to find the desired page. Additionally, the proxy proved very useful in determining how exactly users completed the task. For example, the logs show that 3 users pressed Ctrl+F and typed in a search query using their browser's built-in search facility. This appears in the logs as a keypress of "F", followed by just a single scroll event, whereas users who scrolled manually through the page created several scroll events.
A number of previous publications discuss solutions to problems which are relevant to our work. This includes the task of tracking user actions on web pages using client-side scripting, and the development of solutions which map tracking data to the GUI elements on a web page.
Using an HTTP proxy for tracking In , a proxy concept is used to log the pages visited by users on the web. In contrast to our solution, no client-side tracking takes place, so only information about visited URLs is available to the proxy. The focus of the work is on visualisation techniques for the recorded log data.
Mouse tracking using client-side scripting In , Mueller and Lockerd introduce their idea to use embedded scripting to track mouse movement (position and associated timestamps) and to send the logged data to a server for later analysis. Due to its being an extended abstract, the paper is low on details, but it appears that the approach is restricted to the logging of mouse coordinates (i.e. no scrolling, keypresses etc) and that individual objects on the web pages (such as buttons) are not identified by the client-side code. HTML pages appear to have been prepared manually for the user study by inserting scripting commands.
Our own logging approach avoids the problems with browsers' security models by modifying pages on their way through the HTTP proxy.
Client-side tracking software The work by Goecks and Shavlik  allows tracking of user actions for Microsoft's Internet Explorer (MSIE) by means of a program which needs to be installed on the user's computer. The level of detail of the logs is quite coarse due to limitations of the implementation: For each page that is visited, only the number of clicks and the amount of scrolling and mouse activity are recorded. However, this is sufficient for the paper's goal of measuring the user's level of interest in the page.
Client-side tracking software in addition to eye tracking In , two different tools for user tracking on web pages are introduced. The WebLogger application tracks users' actions on web pages, such as navigating between pages and scrolling. Additionally, it records data from an eye tracking device. It is not clear whether tracking of mouse movement and keypresses is supported, but this seems likely. The tracking is achieved with a program which is installed on the machine. As the browser to use, only Internet Explorer is supported. WebEyeMapper is later used to analyse the log data, which among other things involves mapping window coordinates to links, buttons etc. on web pages. It also allows an exact playback of the test user's browsing session.
The work from  is similar to our own approach. The two tools allow for detailed logging including eye tracking data. However, they are more restricted with regard to the platform on which tests can be conducted (Windows with MSIE), and with regard to the scenarios of their use - for example, it is not possible to invite arbitrary users from the Internet to take part in a user study. Furthermore, an additional post-processing step is necessary to obtain useful data, such as which button was clicked rather than that there was a click at certain coordinates. This makes it difficult to use the tools in situations where real-time access to the data is necessary.
Our approach to user activity tracking does not include eye tracking. This could be regarded as a drawback because eye tracking can provide detailed information about how users scan and read web pages. On the other hand, eye tracking technology is expensive and requires the test user to leave their usual web browsing environment in order to use a computer with eye tracking capabilities. In the light of these arguments, the possibilities of deducing the user's gaze direction in the absence of eye tracking have been the topic of previous work.
Correlation between eye and mouse movements How closely related are the position of the mouse pointer and the position of the user's gaze on the screen? According to , predictions about the probable direction of the user's gaze can be made in a number of circumstances. For instance, in the experiment that was conducted, if a region on a web page was visited by the mouse pointer, there was a 84% chance that it was also visited by the user's gaze. Similarly, in the case of sudden mouse movements within or between regions of the page (saccade), the user's gaze was inside the involved region(s) in over 70% of all cases. The figure of 70% only applies if the destination region of the mouse movement contains content, i.e. is not a blank or ornamental part of the page.
Related to this, the authors of  observed that users would sometimes move the mouse pointer over an empty part of the web page. The cited reason for this is that a user did not want to click on a link accidentally while reading the page. During the user study, it also became clear that the mouse pointer is often used as a reading aid when scanning through menus on the web page.
The paper only discusses the implementation of this "poor man's eye tracking" approach in an e-learning system. Unfortunately, it does not include a study to examine how much the changes to the displayed page impair the user's browsing experience, and how much this could influence his behaviour. For example, the missing text will prevent the user from making a quick scan of a page upon first seeing it. Furthermore, a delay of 0.7 seconds is suggested - only after this time, the text below the mouse pointer becomes legible. This might make the user impatient, or might even annoy him.
In this paper, detailed tracking of user interaction on web pages was discussed from several perspectives. Going beyond the usual application of tracking technologies for user tests, we have looked at a large number of possible fields of use, ranging from inviting Internet users for usability tests, self-adapting websites, enhanced customer support through real-time tracking and user profile creation to advanced visualisation techniques for usage data.
This work raises privacy concerns: Using our technology, the actions of users on web pages can be observed with an accuracy which is unprecedented for tracking solutions without client-side installation of software. If our approach is abused, this can happen without the users' knowledge. It is the responsibility of anyone performing user tracking to inform the subjects of the tracking about its use. At a minimum, we recommend that a user be asked for his consent before any logging takes place. Furthermore, if arbitrary visitors of a website agree to their actions being logged, their consent should only be considered valid for a few hours, or until the end of their browser session.
Acknowledgement This work was funded by the BMBF (intermedia project) and by the DFG (Embedded Interaction Research Group).