Saturday, September 26, 2009

Intelligent Social Networks. Part 2

See the beginning of the article here.

D. Among the numerous discussions in various groups there may be a lot of information that interests me, and I would prefer that the network helped me find it. But without an analysis of my profile, where I describe my interests and preferences (what I read and like, what movies I watch, and so on), it is again difficult to achieve much. The problems with my profile are that: a) it is incomplete, i.e. it is far from fully representing me; b) to describe myself, I may not choose standard semantics, or even generally accepted terminology and ontology (see the problems that led to the paradigm of the Semantic Web); and c) in my description I am subjective, or I may even try to pass off my wishful self as my real self. For example, in my profile I can describe myself as an expert in some area, although I have only read a few books on the subject and persuaded my friends, who have no experience with the topic, that I am in fact an expert and that they should give me recommendations "confirming" my "expertise". Does this mean the system should "trust" my profile and recommend me to someone seeking advice or help in this area? One approach to this problem is to determine the degree of credibility of my profile (and consequently of my posts), based on trust analysis inside and outside the social network (e.g. ratings of scientific publications). Another is to auto-generate a synthesized profile, computed from my contacts, posts and discussions, much like Google's PageRank pulls "semantics" out of how websites are used. Such a profile, different from the one I would write myself, would be more normalized, "semantically" clear to the network, and comparable with others; it would allow the site to find similarities and differences, classify it, and deduce common interests, yielding more accurate recommendations. Furthermore, such a profile would make online advertising more targeted and effective.
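To make the credibility idea a little more concrete, here is a minimal sketch that treats endorsements between users as a directed graph and uses a PageRank-style score as a stand-in for profile credibility. The endorsement data and the use of networkx are purely illustrative assumptions, not a description of any existing network.

```python
# Minimal sketch: credibility as PageRank over an endorsement graph.
import networkx as nx

# (endorser, endorsed) pairs: "endorser confirms endorsed's expertise".
endorsements = [
    ("alice", "bob"), ("carol", "bob"), ("dave", "bob"),
    ("bob", "dave"), ("dave", "alice"), ("frank", "carol"),
]

graph = nx.DiGraph(endorsements)

# Endorsements coming from users who are themselves well endorsed count
# for more than endorsements from unknown accounts.
credibility = nx.pagerank(graph, alpha=0.85)

for user, score in sorted(credibility.items(), key=lambda kv: -kv[1]):
    print(f"{user:>8}: {score:.3f}")
```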

E. The network should be able to analyze my correspondence (closer to the way a human would) to make my synthesized profile more accurate. To do this, it may use my discussions in various groups and feedback from other users, as well as forward to me, as a person competent in some areas, inquiries or requests for assistance from other users. My responses, and the feedback on them, may then be used to gauge my competence, and the system may even test me by forwarding requests intended for someone else. "Knowing" so much about me, the network should at this point be able to help me find the right contacts, opportunities and resources, for example for completing a particular project. Or it may find experts who could answer my questions, give me appropriate links, or even solve my problem within my budget, as well as give me the opportunity to find projects that match my interests and participate in them, much like InnoCentive.
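As a toy illustration of how feedback on forwarded inquiries could feed a per-topic competence estimate, here is one simple possibility: a smoothed ratio of helpful to total answers. The class, topic names and prior are invented for the sketch and are not a prescription.

```python
# Toy competence tracker: per-topic smoothed ratio of helpful answers.
from collections import defaultdict

class CompetenceTracker:
    def __init__(self):
        # topic -> [helpful_count, unhelpful_count]
        self.feedback = defaultdict(lambda: [0, 0])

    def record(self, topic: str, helpful: bool) -> None:
        self.feedback[topic][0 if helpful else 1] += 1

    def competence(self, topic: str) -> float:
        helpful, unhelpful = self.feedback[topic]
        # Beta(1, 1) prior: unknown topics start at 0.5 instead of 0 or 1.
        return (helpful + 1) / (helpful + unhelpful + 2)

tracker = CompetenceTracker()
tracker.record("machine translation", helpful=True)
tracker.record("machine translation", helpful=True)
tracker.record("machine translation", helpful=False)
print(tracker.competence("machine translation"))  # ~0.6
```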

But where do we get the necessary computing resources? Today social networks are implemented as centralized systems, using the modern concept of virtualization: cloud computing. I think we should move to a hybrid architecture, combining centralized services with intelligent software agents that use the computing power of individual users. This way the agents, and the resources needed to shape and polish the synthesized profiles of their owners and to personalize services and make them more intelligent, scale with the number of active users. By the way, the social network's developer can generate profit by offering agent templates of varying degrees of intelligence. And the most advanced agents can cooperate with each other to solve problems whose complexity exceeds the capabilities of any individual agent, for example the search described earlier for a person based on a rough description of him.

And if such an agent is already up and running on one social network, we can agree on a standard like OpenSocial and give it the ability to work across all social networks (the interoperability problem). Or, at worst, we can create an agent proxy for each network, a "personality" corresponding to that social network's environment.
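A rough sketch of what such proxy "personalities" might look like in code: one agent core plus a thin adapter per network that translates the agent's intent into that network's conventions. Every class, method and return value here is hypothetical.

```python
# Hypothetical adapter pattern: one agent, one proxy "personality" per network.
from abc import ABC, abstractmethod

class NetworkProxy(ABC):
    """Adapter that hides one network's peculiarities from the agent core."""

    @abstractmethod
    def search_people(self, facts: dict) -> list[str]:
        ...

class ProfessionalNetworkProxy(NetworkProxy):
    def search_people(self, facts: dict) -> list[str]:
        # Would call the professional network's people-search API here.
        return [f"professional-network hit for {facts}"]

class DiscussionForumProxy(NetworkProxy):
    def search_people(self, facts: dict) -> list[str]:
        # Would crawl group discussions for authors matching the facts.
        return [f"forum author matching {facts}"]

class PersonalAgent:
    def __init__(self, proxies: list[NetworkProxy]):
        self.proxies = proxies

    def find_person(self, facts: dict) -> list[str]:
        # The same request fans out to every network the owner belongs to.
        results = []
        for proxy in self.proxies:
            results.extend(proxy.search_people(facts))
        return results

agent = PersonalAgent([ProfessionalNetworkProxy(), DiscussionForumProxy()])
print(agent.find_person({"employer": "Acme", "years": "1998-2003"}))
```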

Finally, a question arises: is it possible to implement this model within the framework of Semantic Social Networks? I am convinced that it is not, and that is why it is necessary to supplement the capabilities of the Semantic Web (Web 3.0) with natural language processing, which, along with the concept of intelligent software agents, is a paradigm better suited for the Intelligent Web (Web 4.0). And if we require from these agents greater autonomy, adaptability to the surrounding social environment, and cooperation with other agents, then we step into the paradigm of the Adaptive Web (Web 5.0).

P.S. What other ways are there to make money on social networks? I see one more: intelligent marketing, i.e. analyzing how the network forms opinions around certain brands, how these opinions can be influenced and predicted, how to choose the best marketing strategy, and how to determine which brands the network is favorable toward and vice versa (see the article "Identifying influential spreaders in complex networks").
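If I read the referenced article correctly, it relates a user's spreading power to the k-shell (k-core) index of their position in the network rather than to raw connection count. Here is a toy sketch with networkx; the karate-club graph is just a stand-in for a real social graph.

```python
# Toy sketch: find candidate influential spreaders via k-core decomposition.
import networkx as nx

g = nx.karate_club_graph()          # stand-in for a real social graph
core = nx.core_number(g)            # k-shell index of every node
degree = dict(g.degree())

# Candidate "influential spreaders": nodes sitting in the innermost core.
k_max = max(core.values())
spreaders = [n for n, k in core.items() if k == k_max]
print(f"innermost core (k={k_max}):", spreaders)

# A high-degree node on the periphery would rank lower here than a
# modest-degree node embedded in the dense core.
```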

Intelligent Social Networks. Part 1

This article is written in collaboration with Maxim Gorelkin.

Collecting users into one's social network is tricky business. Features that only let people solve their problems "manually", for example finding contacts and information, are tedious and cumbersome. Many such networks measure their success by the number of profiles, even if those profiles are inactive, fake, duplicated, or created once to simply "check out" the site and never used again. Such systems should be analyzed consistently in order to measure their complexity and growth rate accurately, but the key measure should be how dynamic they are, since it is the activity on the network and its intensity that determine the network's current popularity. And one major way to make a network popular is to constantly evolve its intelligence: it should offer users the ability to solve their social and professional problems by combining their own intelligence with the artificial kind, in the spirit of Collective Intelligence, as demonstrated by digg.com (lots of people liked this story, so you might too), last.fm (people who like Madonna also like this artist), and others (see my first article, "Adaptive Web Sites").
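As a tiny illustration of measuring a network by its dynamics rather than its profile count, one could track the share of recently active profiles and the intensity of their activity. The fields, dates and thresholds below are made up.

```python
# Toy dynamics metric: breadth (active share) and depth (actions per active user).
from datetime import datetime, timedelta

profiles = [
    {"id": 1, "last_action": datetime(2009, 9, 20), "actions_30d": 42},
    {"id": 2, "last_action": datetime(2009, 3, 1),  "actions_30d": 0},
    {"id": 3, "last_action": datetime(2009, 9, 25), "actions_30d": 7},
]

now = datetime(2009, 9, 26)
active = [p for p in profiles if now - p["last_action"] <= timedelta(days=30)]

activity_rate = len(active) / len(profiles)                       # breadth
intensity = sum(p["actions_30d"] for p in active) / len(active)   # depth

print(f"active share: {activity_rate:.0%}, actions per active user: {intensity:.1f}")
```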

Here I will describe some properties of Intelligent Social Networks that I would be willing to pay to use.

A. A search for a person by name does not always work: the name may have changed, or I may have forgotten it or remembered it incorrectly. However, I can describe certain facts about the individual, such as when, where, as whom and with whom he worked, each of which alone is insufficient to identify exactly the person I am seeking. On the other hand, even if the combination of facts is not unique, it may narrow the number of matching profiles enough to allow a quick browse. Or perhaps there is a set of individuals who can add details about this person and relay the request down the line until the contact information I am looking for is found and returned, or until someone passes my information directly to the person I seek.
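A naive sketch of what such a fact-based search could look like: score every profile by how many remembered facts it matches and return the short list worth browsing. The profile fields and matching rule are invented for illustration.

```python
# Naive fact-based person search: rank profiles by number of matching facts.
def match_score(profile: dict, facts: dict) -> int:
    return sum(1 for key, value in facts.items()
               if str(profile.get(key, "")).lower() == str(value).lower())

def find_candidates(profiles: list[dict], facts: dict, min_matches: int = 2):
    scored = [(match_score(p, facts), p) for p in profiles]
    scored = [(s, p) for s, p in scored if s >= min_matches]
    return [p for s, p in sorted(scored, key=lambda sp: -sp[0])]

profiles = [
    {"name": "I. Petrov", "employer": "Acme", "city": "Boston", "role": "QA"},
    {"name": "J. Smith",  "employer": "Acme", "city": "Boston", "role": "developer"},
    {"name": "J. Smit",   "employer": "Acme", "city": "Austin", "role": "developer"},
]

# "I don't remember the name, but he was a developer at Acme in Boston."
print(find_candidates(profiles, {"employer": "Acme", "city": "Boston", "role": "developer"}))
```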

B. Networks often use names as identifiers, and as a result contain dozens of duplicate entities denoting the same physical instance, which complicates search. In one network, for example, my profile ended up referring to four (!) universities, all of them the same one under different names. Standard classification does not usually work for larger networks, but there is a simple decentralized solution to this problem: if a sufficient number of people who use different names in their profiles indicate that these names denote the same entity, the names should be joined under a common identifier and kept as different values of its "names" attribute. If any uncertainty remains, this assumption can be formulated as a hypothesis and tested on a sample of users with these names.
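Here is one possible sketch of that decentralized merging rule, using a union-find structure and a vote threshold. Both the threshold and the names are illustrative assumptions.

```python
# Sketch: merge name aliases under one identifier once enough users agree.
from collections import Counter

class AliasMerger:
    def __init__(self, votes_required: int = 3):
        self.votes_required = votes_required
        self.votes = Counter()      # (name_a, name_b) -> confirmations
        self.parent = {}            # union-find forest over names

    def _find(self, name):
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def assert_same(self, a: str, b: str) -> None:
        key = tuple(sorted((a, b)))
        self.votes[key] += 1
        if self.votes[key] >= self.votes_required:
            self.parent[self._find(a)] = self._find(b)

    def canonical(self, name: str) -> str:
        return self._find(name)

merger = AliasMerger()
for _ in range(3):
    merger.assert_same("MIT", "Massachusetts Institute of Technology")
print(merger.canonical("MIT") ==
      merger.canonical("Massachusetts Institute of Technology"))  # True
```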

C. Most of the emails I receive every day from my groups have no relation to the interests I described in my profile. This stream is closer to noise than to information; I rarely come across anything interesting in it, so more often than not, instead of even skimming the messages, I simply delete them all. I would prefer that the social network take on the task of filtering and re-categorizing my email, possibly with an importance indicator. Of course, this would require natural language processing, but not necessarily in real time. Moreover, if I did find something interesting in these lists, I would like the network to suggest other relevant discussions, similar in content, as well as other groups where such discussions occur. By the way, the search for groups is another difficult problem that cannot be solved by name and keyword search alone. For example, one group may match my interests perfectly yet have had no activity in the last six months, while another group, with a name that means nothing to me, may be extremely active, with people discussing subjects I would find fascinating. Hence the problem of groups is a semantic problem. And of course, I would prefer to get not only the information relevant to my stated interests, but also that which only MAY interest me, and which I may not even be aware of.
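As a rough sketch of the filtering step, one could rank incoming group messages by their textual similarity to the interests in my profile. The snippet below uses scikit-learn's TF-IDF purely for brevity; it is one concrete possibility under invented data, not the definitive approach, and as noted above it need not run in real time.

```python
# Sketch: rank group messages by TF-IDF similarity to the profile interests.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

interests = ("recommender systems, collective intelligence, "
             "semantic web, natural language processing")

messages = [
    "Join our webinar on multi-level marketing opportunities!",
    "New paper on collaborative filtering for large-scale recommender systems",
    "Looking for a co-author on natural language processing for ontology matching",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([interests] + messages)

# Similarity of each message to the profile (row 0 of the matrix).
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
for score, msg in sorted(zip(scores, messages), reverse=True):
    flag = "keep" if score > 0.1 else "noise"
    print(f"{flag:>5}  {score:.2f}  {msg}")
```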

News: On September 21st, the international team "BellKor's Pragmatic Chaos" (Bob Bell, Martin Chabbert, Michael Jahrer, Yehuda Koren, Martin Piotte, Andreas Töscher and Chris Volinsky) received the $1M Grand Prize for winning the Netflix Prize contest by improving on Netflix's recommendation algorithm, Cinematch.

See the final part here.

Thursday, September 3, 2009

Website Engineering

This article is written in collaboration with Maxim Gorelkin.

There are two basic ways to build bridges: from the bottom up and from the top down. The former is the one more familiar to us and, unfortunately, still employed: build a bridge and then see what happens; if it withstands the loads (at least for a while), so much the better. The latter is a much harder approach belonging to a different discipline altogether, engineering: we define the requirements for the bridge and the metric for its evaluation, the load-carrying capacity, and using strength of materials we design, build and test a model, and... only after we are satisfied with the result do we decorate the built bridge with, for example, statues of lions lying on their front paws. My preference for the second way explains my preference for the term "website engineering" over the traditional "website development".

We shall start with the definition of the key metric of a website's effectiveness. Typically this is the conversion rate: the percentage of visitors who take a desired action (e.g. buy something from your online store or spend at least a specified dollar amount). The essence of the engineering approach is that, when building websites, we must guarantee specified levels of their effectiveness. Adaptive websites, described in my previous article, define the model that should solve this problem.

The issue with today's web development is the lack of an engineering approach and of such models. Yes, you can construct several alternative landing pages for a/b split or multivariate testing and collect statistics for several months in order to find the best solution. However, as Tim Ash has demonstrated, your result may depend on the chosen testing method and data analysis technique! Or there may be no statistically significant differences between the alternatives, and consequently you may be unable to choose the best page. Suppose you get lucky and, after months of testing, optimize your website, only to discover that its traffic has changed and you must start the process from scratch. The same applies to web analytics: yes, you have found that, say, some number of users visited certain pages of your site and made certain clicks, but how do you interpret this? What motivations led them to do it? What actions does such "knowledge" suggest you take to improve your site? And if you find completely chaotic user behavior on your site, what do you do then?
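For concreteness, here is a minimal sketch of the statistical side of such a test: a chi-square test on invented conversion counts for two landing-page variants, using scipy. With differences this small, the test rightly refuses to declare a winner, which is exactly the situation described above.

```python
# Minimal a/b significance check on invented conversion counts.
from scipy.stats import chi2_contingency

visitors    = {"A": 4800, "B": 5100}
conversions = {"A":  192, "B":  221}

for v in ("A", "B"):
    print(f"variant {v}: conversion rate {conversions[v] / visitors[v]:.2%}")

table = [
    [conversions["A"], visitors["A"] - conversions["A"]],
    [conversions["B"], visitors["B"] - conversions["B"]],
]
chi2, p_value, _, _ = chi2_contingency(table)
print(f"p-value: {p_value:.3f}")
if p_value >= 0.05:
    print("No statistically significant difference; don't crown a winner yet.")
```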

Web testing (preferably adaptive, for example adaptive multivariate testing), web analytics and web usage mining (discovering patterns of user behavior on your site) should become part of the website itself; put another way, your website must be self-testing, self-analyzing, and "intelligent" enough to extract practical knowledge of user behavior from these tests and analyses and use it for its own adaptivity. By the way, for the mined patterns to become knowledge about user behavior on your website, they must be formulated as statistical hypotheses and constantly verified for accuracy.
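One concrete way to make testing adaptive, though by no means the only one and not prescribed by anything above, is a Thompson-sampling bandit: the site keeps a Beta posterior per page variant and gradually routes more traffic to whatever appears to convert better, while still exploring. A toy sketch with invented conversion rates:

```python
# Toy adaptive test: Thompson sampling over three page variants.
import random

random.seed(1)
variants = ["A", "B", "C"]
true_rates = {"A": 0.040, "B": 0.055, "C": 0.035}   # unknown in reality
wins = {v: 0 for v in variants}
losses = {v: 0 for v in variants}

for _ in range(20_000):
    # Sample a plausible conversion rate for each variant; show the best one.
    sampled = {v: random.betavariate(wins[v] + 1, losses[v] + 1) for v in variants}
    shown = max(sampled, key=sampled.get)
    converted = random.random() < true_rates[shown]
    wins[shown] += converted
    losses[shown] += not converted

for v in variants:
    n = wins[v] + losses[v]
    print(f"{v}: shown {n:>6} times, observed rate {wins[v] / max(n, 1):.3f}")
```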

Next, let's assume you have defined metrics for your website's effectiveness and measure them regularly, so you can tell how effective (or ineffective) the site is. The harder problem, however, is learning how to manage these metrics to achieve sustainable improvement. How can that be done? One area of quality control, statistical process control, developed a technique for stabilizing a process before bringing it under control and improving production quality. It seems to me that there is a direct analogy here with web traffic and with controlling it to improve the website's effectiveness.
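One concrete form this analogy could take is a p-chart over the daily conversion rate: days that fall outside the control limits signal that the "process" (the traffic or the site itself) has changed and needs attention before any optimization effort is meaningful. The data below is invented.

```python
# Sketch of a p-chart over daily conversion rate (statistical process control).
from math import sqrt

daily = [  # (visitors, conversions) per day
    (1000, 38), (1120, 45), (980, 41), (1050, 40), (1010, 39),
    (990, 42), (1075, 18), (1030, 44),   # day 7 looks suspicious
]

total_visitors = sum(v for v, _ in daily)
p_bar = sum(c for _, c in daily) / total_visitors   # centre line

for day, (v, c) in enumerate(daily, start=1):
    p = c / v
    sigma = sqrt(p_bar * (1 - p_bar) / v)            # limits depend on that day's n
    lcl, ucl = max(0.0, p_bar - 3 * sigma), p_bar + 3 * sigma
    status = "ok" if lcl <= p <= ucl else "OUT OF CONTROL"
    print(f"day {day}: rate {p:.3f}, limits [{lcl:.3f}, {ucl:.3f}] -> {status}")
```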

Summing up: website engineering is about the computability of a website's effectiveness from the characteristics of its web traffic, and of its web traffic from the website's elements: its content, navigation, and so on.

P.S. Another example comes from the field of algorithmic trading: this type of trading has become a money-making machine, and a lot of money at that, without direct human intervention. That ability stems from the fact that the machines are becoming more intelligent and adaptive; their development today draws on disciplines such as complexity theory, chaos theory, mathematical game theory, cybernetics, models of quantum dynamics, and so forth. For e-commerce to set itself similarly ambitious goals, it must attain a similar level of sophistication, and the application of artificial intelligence algorithms is a modest start at best. But here we are treading on the territory of advanced engineering, based on modern science.