Studies of browsing behavior have gained increasing attention in web analysis as a means of providing better services. Most conventional approaches focus on simple indices such as average dwell time and conversion rate. These indices assign similar evaluations to websites even when their features differ significantly. Moreover, such statistical indices are not sensitive to the dynamics of users’ interests. In this paper, we propose a new framework for measuring a website’s attractiveness that takes into account both the distribution and the dynamics of users’ interests. Within the framework, we define a new index for websites, called the Attractiveness Factor, which evaluates the degree of users’ attention. The framework consists of three procedures. First, we capture the transition of users’ interests during browsing by solving nonnegative matrix factorization and constrained network flow problems. To accommodate a user’s multiple types of interest, we apply soft clustering rather than hard clustering to model the attributes of users and websites. Second, for each website, the features of each cluster are obtained by fitting a Weibull distribution to the dwell time distribution. Finally, we calculate the Attractiveness Factor of a website from the results of the clustering and fitting. The Attractiveness Factor depends on the distribution of the dwell time of users interested in the website, and therefore reflects changes in users’ interests. Numerical experiments on real web access data from Yahoo Japan News, which require solving extremely large-scale optimization problems, show that the Attractiveness Factor captures exceptional information about browsing behavior more effectively than widely used indices. The Attractiveness Factor gives low ratings to category pages but high ratings to websites that attract many people, such as hot-topic news about the 2018 FIFA World Cup, Japan’s new imperial era “Reiwa,” and the North Korea–United States Hanoi Summit.
Moreover, we demonstrate that the Attractiveness Factor can detect trends in users’ attention to each website during a given time interval of the day.
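The soft-clustering step can be illustrated with a plain nonnegative matrix factorization. This is a minimal sketch, assuming a nonnegative user × website matrix (for example, access counts) and the standard squared-Frobenius NMF solved with Lee and Seung’s multiplicative updates; the paper’s actual objective, its constraints, and the constrained network flow step are not reproduced here. The function names `nmf` and `soft_memberships` and the toy matrix are hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix V (users x websites) by W @ H.

    Uses Lee and Seung's multiplicative updates for the squared Frobenius
    loss ||V - W H||^2; W and H stay nonnegative throughout.
    """
    rng = np.random.default_rng(seed)
    n_users, n_sites = V.shape
    W = rng.random((n_users, k)) + eps
    H = rng.random((k, n_sites)) + eps
    for _ in range(n_iter):
        # Update H, then W; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def soft_memberships(W, H, eps=1e-10):
    """Turn the factors into soft cluster weights.

    Row u of the first result gives user u's weight on each of the k
    clusters; row s of the second gives website s's weights. Each row sums
    to (approximately) 1, so one user or site can belong to several clusters.
    """
    users = W / (W.sum(axis=1, keepdims=True) + eps)
    sites = (H / (H.sum(axis=0, keepdims=True) + eps)).T
    return users, sites

# Hypothetical toy data: 4 users x 4 websites, e.g. access counts.
V = np.array([[5, 3, 0, 0],
              [4, 2, 0, 1],
              [0, 0, 4, 5],
              [0, 1, 3, 4]], dtype=float)
W, H = nmf(V, k=2)
user_w, site_w = soft_memberships(W, H)
```

Normalizing the rows of `W` and the columns of `H` yields membership weights rather than a single label per user, which is the contrast between soft and hard clustering drawn in the abstract.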
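The fitting step can be illustrated with a two-parameter Weibull maximum-likelihood fit. In this sketch we assume, hypothetically, that each dwell time is weighted by the soft cluster membership of the user who produced it, so that one fit per (website, cluster) pair can reuse the memberships from the clustering step; the abstract does not state this weighting, and `fit_weibull_weighted` is an illustrative name. With weights $w_i$, the scale satisfies $\lambda^k = \sum_i w_i x_i^k / \sum_i w_i$, and substituting it leaves a one-dimensional equation in the shape $k$, solved here by bisection.

```python
import numpy as np

def fit_weibull_weighted(x, w=None, tol=1e-10):
    """Weighted maximum-likelihood fit of a two-parameter Weibull distribution.

    x: positive dwell times; w: nonnegative weights (hypothetically, soft
    cluster memberships of the users who produced each dwell time).
    Returns (shape k, scale lambda).
    """
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    keep = w > 0                      # zero-weight samples do not affect the fit
    x, w = x[keep], w[keep]
    if x.size == 0 or np.any(x <= 0):
        raise ValueError("need positive dwell times with positive weight")
    if np.ptp(x) == 0:
        raise ValueError("need at least two distinct dwell times")

    c = x.max()
    y = x / c                         # rescale so y <= 1 and y**k cannot overflow
    log_y = np.log(y)
    mean_log = np.sum(w * log_y) / np.sum(w)

    def score(k):
        # Profile-likelihood equation for the shape; strictly increasing in k,
        # negative near k = 0 and positive for large k, so the root is unique.
        wy = w * y**k
        return np.sum(wy * log_y) / np.sum(wy) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:              # widen the bracket until it holds the root
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:              # plain bisection
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    # With k fixed, the scale has a closed form: lambda^k = sum(w x^k) / sum(w).
    scale = c * (np.sum(w * y**k) / np.sum(w)) ** (1.0 / k)
    return k, scale

# Usage on synthetic dwell times in seconds (not data from the paper).
rng = np.random.default_rng(0)
dwell = 30.0 * rng.weibull(1.5, size=5000)
shape, scale = fit_weibull_weighted(dwell)
```

For unweighted data, SciPy’s `scipy.stats.weibull_min.fit(x, floc=0)` gives the same two-parameter estimate. As a reading aid, a fitted shape below 1 means the hazard of leaving a page decreases with time spent on it, while a shape above 1 means it increases.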
All Science Journal Classification (ASJC) codes
- Computer Science Applications