Social Networks and the Semantic Web. Vrije Universiteit. Academic dissertation submitted to obtain the degree of Doctor.
Purpose: to investigate the way that the Semantic Web is being used to represent and process social network information.
Firantas, M. Jusevicius: Towards a This may result in a growing dissatisfaction of the user community and a reduced usability of the websites. This article shows through the implementation of a prototype that Semantic Web technologies can be used to build a next generation of social networks that overcome limitations of current social network applications and enable new features currently not exploited by them. The article emphasizes the need for semantics and mechanisms to better structure this information and make it interoperable.
Research combining social networks and the Semantic Web is an interdisciplinary field, attracting researchers from both social and computer sciences. More research combining social networks and the Semantic Web is required to address the above-mentioned limitations. An important line of research combining social networks and the Semantic Web focuses on the extraction of semantic data from existing social applications, its representation and its analysis.
Existing work in this area explores the possibilities of extracting ontologies from user contributed folksonomies through collaborative tagging systems and of integrating ontologies with folksonomies [Specia, '07; Xu, '06], while other approaches propose the development and evolution of lightweight ontologies in a collaborative way [Angeletou, '07; Mika, '07].
Researchers seem to agree that folksonomies and lightweight ontologies have more properties in common than differences and will be further integrated, and thus, in the future, a community-based bottom-up approach might prevail over top-down controlled engineering efforts.
A related initiative is the Semantically-Interlinked Online Communities (SIOC) project, which provides an ontology for describing items and relationships from Internet discussion methods such as blogs, forums, and mailing lists to facilitate interconnection of these methods by publishing metadata [J. Breslin, '07; J. Breslin, '05]. Theoretical work combines the Semantic Web (SW) and social networks, especially for the analysis of social networks and the extraction of knowledge from existing data [Ding, '04].
However, neither the creation of new end-user semantic social applications nor their design and implementation is well explored. Existing social network applications do not employ SW technologies, although most of the standards infrastructure is already in place. This article discusses some of the limitations of current social network applications and shows how Semantic Web technologies can be used to build a new generation of social networks which overcome these limitations and enable new features currently not exploited.
The article is structured into six sections. The section below discusses a number of common features and technological limitations of current social networking applications. Section 3 identifies and describes a number of issues that need to be handled in the design and implementation of the next generation of semantic social networks.
Section 4 presents a concrete usage scenario for a semantic social application. Section 5 presents a semantic social network prototype featuring semantic mashups.
The last section summarizes the role of semantics for the development of a new generation of social networks in order to better harness and integrate the collective knowledge made available by the various social applications.
These sites have become central points on the Web for sharing personal information and online socialization. They allow users to create a personal profile and link to profiles of their friends.
The resulting network can be browsed to find common friends or friends that have been lost, or to forge potential new friendships based on shared interests [Mika, '07]. The core feature in all social networks is the user community, which is tightly integrated with the application domain. We have analysed applications which we personally use and which we think reflect the current state of the art in social networking: Last. As an outcome of a review of social networks, we have identified a number of generic features that are common to the majority of social networks as well as a number of technological limitations.
Lists can be compiled on a personal or group basis, or for the whole website. They are usually set up by people with similar interests or a circle of friends. As a recent Web 2. Most sites publish RSS or custom XML data feeds, while others provide programmatic methods to control the site to the same extent as using the graphical interface.
Our observations fit well with statements by initiatives such as Open Social Web, Social Network Portability, DataPortability, OpenID, and OpenSocial, which have emerged as a result of a growing dissatisfaction in user communities. Semantic social networks will still focus on the community dimension while drawing on Semantic Web technologies to aggregate content.
Some examples of semantic social networks are further discussed in section 2.
Freebase acquires structured data spanning different domains such as music, people and locations from various sources such as Wikipedia and MusicBrainz. In addition, users in the community can add, edit, and even upload data. Topics in Freebase are organized by types which are grouped into domains.
An important feature is that users can not only fill already predefined types with instance data or edit it, but can also create their own types and define their properties, i. Furthermore, it provides an open but proprietary API for its data and encourages its use in applications and mashups. DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web via a semantic representation.
DBpedia is a prime example of Linked Data publishing and can be browsed using semantic browsers. It is interlinked with other semantic datasets such as Geonames, MusicBrainz etc. These four aspects are further elaborated below.
Folksonomies are the primary sources of metadata on Web 2. However, they have issues with consistency, ambiguity and lack of synonymy. A next step beyond Web 2. Semantic Web or Web 3. On the Semantic Web, the creation of metadata, that is, data with machine-processable meaning, is as important as the data itself. Metadata derived from tag clouds and folksonomies may help aggregate various fragmented pieces of data and information into collective knowledge. It has been observed that folksonomy tags evolve into property:value triple-tags, which serve the same purpose as subject-property-object triple statements in RDF, and thus folksonomies move towards becoming lightweight ontologies.
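The step from triple-tags to RDF-style statements can be made concrete with a small sketch. It assumes a simple `property:value` tag syntax and a made-up resource name (`photo42`); real tagging systems use their own conventions, so this is only an illustration of the idea, not any particular site's format.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """A subject-property-value statement, in the spirit of an RDF triple."""
    subject: str
    prop: str
    value: str


def tag_to_triple(resource: str, tag: str) -> Optional[Triple]:
    """Turn a 'property:value' tag attached to a resource into a triple.

    Plain keyword tags (no colon) carry no property and yield None.
    """
    prop, sep, value = tag.partition(":")
    if not sep or not prop.strip() or not value.strip():
        return None
    return Triple(resource, prop.strip(), value.strip())


# Hypothetical tags attached to one hypothetical photo.
tags = ["paris", "place:Paris", "genre:jazz"]
triples = [t for t in (tag_to_triple("photo42", tag) for tag in tags) if t is not None]
```

The plain tag `paris` stays an ambiguous keyword, while `place:Paris` states which property of the photo the value describes.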
Social networks will provide more sophisticated means to directly create RDF metadata, and collaborative tagging may evolve into lightweight ontology development and may be integrated into collaborative modelling of the social network domain. However, this kind of presentation is probably too advanced for mainstream Web users (see Figure 1).
Figure 1: Tabulator view
It can be assumed that a Semantic Web application interface visualizes its domain ontology in such a way that each class and instance has its own page, linked to others through class-instance and instance-instance relationships. This generic approach is used in many semantic websites, and is probably best illustrated by Freebase. Another approach, which we call specific, is used by conventional Web applications, as well as social networks.
Every type of information such as a car, a user, or an event has its own specific user interface.
For each new type a new interface has to be created; the same interface cannot be used for different types, and interfaces have to be fixed when the schema changes.
This approach is obviously not feasible on the Semantic Web, where ontologies are meant to be extended, reused, and integrated from different sources. If social networks are to become extensible semantic applications, it is likely that they will have to adopt the generic approach. However, they share a common property: the domains are fixed and non-extensible. Users are encouraged to contribute and improve application data, but this is restricted to instance data for predefined types.
Semantic applications such as Freebase take a different approach and allow users to edit the domain model itself: not only to fill in instance data, but to extend and edit types, add new types, and define properties in the underlying ontology.
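A minimal sketch may help to show how a user-editable domain model and a generic interface fit together. The `Schema` class, the `render` function, and all type and property names below are hypothetical illustrations; they do not reproduce how Freebase or any other application is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical user-editable domain model: type name -> property names."""
    types: dict[str, list[str]] = field(default_factory=dict)

    def add_type(self, name: str, properties: list[str]) -> None:
        """Let a user introduce a new type with its initial properties."""
        self.types[name] = list(properties)

    def add_property(self, type_name: str, prop: str) -> None:
        """Let a user extend an existing type with a further property."""
        if prop not in self.types[type_name]:
            self.types[type_name].append(prop)


def render(schema: Schema, type_name: str, instance: dict[str, str]) -> str:
    """Generic view: one function renders instances of any type in the schema."""
    lines = [f"[{type_name}]"]
    for prop in schema.types[type_name]:
        lines.append(f"{prop}: {instance.get(prop, '(not set)')}")
    return "\n".join(lines)


schema = Schema()
schema.add_type("Person", ["name"])
schema.add_type("Concert", ["artist", "venue"])      # a type added by a user
schema.add_property("Person", "favourite_band")      # a property added later
page = render(schema, "Person", {"name": "Alice", "favourite_band": "Some Band"})
```

Because `render` reads the property list from the schema at display time, adding a type or a property requires no new interface code, which is the point of the generic approach.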
Following this approach, social network applications would empower users to express their identities by creating or reusing concepts and relationships relevant to them, and share them with others. The domain model could be left to the community to control and further develop it in a direction which is currently of most interest to it, keeping it relevant over time. People would connect L. In the future, this may be achieved by integrating lightweight ontology development into the means of user collaboration and content contribution.
This data model is based on formal semantics and is therefore interpreted unambiguously by different agents. Furthermore, they need to reuse the FOAF and SIOC ontologies, which are currently the state-of-the-art representations of social networks on the Semantic Web, as well as other relevant ontologies. Most current SW applications are also static and fixed in the sense that ontologies are known and mapped manually at design time [Razmerita, '03].
Although semantic technologies are designed with extensibility and openness in mind, current programming languages and tools are not able to fully exploit these qualities. It is expected that future semantic applications will use multiple ontologies, discovering and integrating them on request.
However, many social networks do not offer interfaces and APIs to access application data. Others make the contents of the website such as lists of users, songs, or pictures available via a simple read-only REST interface in a software-processable data format, usually a custom schema of XML, Atom, or RSS.
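As an illustration of what consuming such a read-only feed involves, the sketch below parses a small RSS 2.0 document with Python's standard library. The feed content and URLs are invented for the example; a real site's feed would be fetched over HTTP and may use a custom schema that needs its own parsing code.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed standing in for a site's "recent items" export.
RSS_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Recently played (made-up feed)</title>
    <item><title>Song A</title><link>http://example.org/songs/a</link></item>
    <item><title>Song B</title><link>http://example.org/songs/b</link></item>
  </channel>
</rss>"""


def read_items(rss_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> element in an RSS document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]


items = read_items(RSS_SAMPLE)
```

The parser recovers titles and links, but nothing in the feed says that an item is a song, who performed it, or how it relates to a user; that meaning lives only in the consuming program.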
The variety of publishing formats, especially non-standard ones, makes reuse difficult. Our hope is then to inspire further creative experimentation toward a better understanding of both online social interaction and the nature of human knowledge. Such understanding will be indispensable in a world where the border between these once far-flung disciplines is expected to shrink rapidly through more and more socially immersive online environments such as the virtual worlds of Second Life.
Only when equipped with the proper understanding will we succeed in designing systems that show true intelligence in both reasoning and social capabilities and are thus able to guide us through an ever more complex online universe. But why would the Web need any extension or fixing? We will argue that the reason we do not often raise this question is that we got used to the limitations of accessing the vast information on the Web.
We learned not to expect complete or correct answers and not to ask certain questions at all. In the following, we will demonstrate this effect on the example of some specific queries Section 1. What is common to these questions is that in all cases there is a knowledge gap between the user and the computer: we are asking questions that require a deeper understanding of the content of the Web on the part of our computers or assume the existence of some background knowledge. As our machines are lacking both our knowledge and our skills in interpreting content of all kinds (text, images, video), the computer falls short of our expectations when it comes to answering our queries.
Knowledge technologies from the field of Artificial Intelligence provide the necessary means to fill the knowledge gap. Information that is missing or hard to access for our machines can be made accessible using ontologies.
As we will see in Section 4. On the one hand, ontologies are formal, which allows a computer to emulate human ways of reasoning with knowledge. On the other hand, ontologies carry a social commitment toward using a set of concepts and relationships in an agreed way. The Semantic Web adds another layer on the Web architecture that requires agreements to ensure interoperability and thus social adoption of this new technology is also critical for an impact on the global scale of the Web.
As the Semantic Web community is also the subject of this thesis we will describe the development of the Semantic Web from its recent beginnings in Section 1. We discuss the recent parallel and complementary development of Web technologies known as Web 2. We will enter into the details of ontology-based representation, the core of Semantic Web technology, in Chapter 4. In Chapter 5 we will show how to use Semantic Web technology for the management of data sources in the social domain, which we later apply in our case study of the Semantic Web community in Chapter 8.
But could it be better?
The reason that we do not often raise this question any more has to do with our unusual ability to adapt to the limitations of our information systems. In the case of the Web this means adaptation to our primary interface to the vast information that constitutes the Web: the search engine. In the following we list four questions that search engines currently cannot answer satisfactorily, or cannot answer at all.
The questions below are specific for the sake of example, but they represent very general categories of search tasks. Who is Frank van Harmelen? To answer such a question using the Web one would go to the search engine and enter the most logical keyword: harmelen.
The results returned by Google are shown in Figure 1. Note that the results are slightly different depending on whether one enters Google through the main site or a localized version.
If this question and answer were part of a conversation, the dialogue would sound like this: Q: Who is Frank van Harmelen? Further, you can download Harmelen at site. Free Delivery on Orders Over Not only does the advertisement make little sense, but of the top ten results only six are related to the Frank van Harmelen we are interested in.
Upon closer inspection the problem becomes clear: the word Harmelen means a number of things. Six of the hits from the top ten are related to the first person, one to the latter. Harmelen is also a small town in the Netherlands (one hit) and the site of a tragic train accident (one hit). The problem is thus that the keyword harmelen, and even the term Frank van Harmelen, is polysemous.
The reason for the variety of the returned results is that designers of search engines know that users are not likely to look at more than the top ten results.
Search engines are thus programmed in such a way that the first page shows a diversity of the most relevant links related to the keyword.
Search results for the keyword harmelen using Google.
This allows the user to quickly realize the ambiguity of the query and to make it more specific. Studying the results and improving the query, however, is up to the user. This is a task we take for granted; in fact, most of us who are using search engines on a daily basis would expect this confusion to happen and would immediately start with a more specific query such as Frank van Harmelen.
While this excludes pages related to the municipality of Harmelen, it is important to note that this would not solve our problem completely. If we browse further in the results we notice that the overwhelming majority of the results are related to prof. Frank van Harmelen of the Vrije Universiteit, but not all of them: there are other people named Frank van Harmelen. In fact, finding them would be a lot more difficult: all of the high ranking pages are related to prof. Harmelen, who has a much larger representation on the Web due to his work related to Semantic Web technology.
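The behaviour described here can be reproduced with a toy keyword search. The four "pages" below are invented stand-ins for the kinds of results discussed in the text, and the matching rule (every query term must occur as a word on the page) is a deliberate simplification of what a real search engine does.

```python
import re

# Invented miniature "Web"; page3 describes a hypothetical namesake.
PAGES = {
    "page1": "Frank van Harmelen professor Vrije Universiteit Semantic Web research",
    "page2": "Harmelen is a small town in the Netherlands",
    "page3": "Frank van Harmelen results of the local chess club",
    "page4": "van Harmelen, F. and colleagues on ontologies",
}


def words(text: str) -> set[str]:
    """Lower-cased word tokens of a text, punctuation removed."""
    return set(re.findall(r"\w+", text.lower()))


def search(query: str) -> list[str]:
    """Return the ids of pages containing every term of the query."""
    terms = words(query)
    return sorted(pid for pid, text in PAGES.items() if terms <= words(text))


ambiguous = search("harmelen")                               # all four pages
narrower = search("frank van harmelen")                      # page1 and page3
strict = search("frank van harmelen vrije universiteit")     # page1 only
```

The more specific query still mixes the professor with the namesake, and adding further terms drops `page4`, which is about the professor but never spells out his first name. String matching alone cannot tell which person a page refers to.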
Again, what we experience is an ambiguity of our query that we could solve by adding additional terms such as Vrije Universiteit or research. This leads to another problem: our request becomes overspecified. First, it is not guaranteed that every mention of Frank van Harmelen is accompanied by any or all of these words. Worse yet, pages about Frank van Harmelen may not even mention him by name. None of our queries would return pages about him where he is only mentioned by his first name, for example, or as van Harmelen, F. This holds even if it would be blatantly obvious to the human reader that the Frank in question could only be Frank van Harmelen. Most advanced search engines, however, have specific facilities for image search where we can drop the term photo from the query.
Some of the results returned by Google Image Search are shown in Figure 1.
Figure 1. Search results for the keyword paris using Google Image Search.
Again, what we immediately notice is that the search engine fails to discriminate between two categories of images: those related to the city of Paris and those showing Paris Hilton, the heiress to the Hilton fortune whose popularity on the Web could hardly be disputed. While the search engine does a good job with retrieving documents, the results of image searches in general are disappointing.
For the keyword Paris most of us would expect photos of places in Paris or maps of the city. In reality only about half of the photos on the first page, a quarter of the photos on the second page and a fifth on the third page are directly related to our concept of Paris. The rest are about clouds, people, signs, diagrams etc. The problem is that associating photos with keywords is a much more difficult task than simply looking for keywords in the texts of documents.
Search engines attempt to understand the meaning of the image solely from its context, e. Inevitably, this leads to rather poor results. First, from the perspective of automation, music retrieval is just as problematic as image search.
As in the previous case, a search engine could avoid the problem of understanding the content of music and look at the filename and the text of the web page for clues about the performer or the genre. We suspect that such search engines do not exist for different reasons: most music on the internet is shared illegally through peer-to-peer systems that are completely out of reach for search engines.
Music is also a fast moving good; search engines typically index the Web once a month and are therefore too slow for the fast moving world of music releases. Google News, the news search engine of Google, addresses this problem by indexing well-known news sources at a higher frequency than the rest of the Web. But the reason we would not attempt to pose this query mostly has to do with formulating the music we like.
Most likely we would search for the names of our favorite bands or music styles as a proxy, e. This formulation is awkward on the one hand because it forces us to query by example.
It will not make it possible to find music that is similar to the music that we like but from different artists. In other words it will not lead us to discover new music. On the other hand, our musical taste might change in which case this query would need to change its form. A description of our musical taste is something that we might list on our homepage but it is not something that we would like to keep typing in again for accessing different music-related services on the internet.
Ideally, we would like the search engine to take this information from our homepage or to grab it —with our permission— from some other service that is aware of our musical taste such as our online music store, internet radio stations we listen to or the music player of our own mp3 device. Tell me about music players with a capacity of at least 4GB. This is a typical e-commerce query: we are looking for a product with certain characteristics.
One of the immediate concerns is that translating this query from natural language to the boolean language of search engines is almost impossible. Such a query would return only pages where these terms occur as they are. The problem is that general purpose search engines do not know anything about music players or their properties and how to compare such properties. They are good at searching for specific information e.
An even bigger problem is the one our machines face when trying to collect and aggregate product information from the Web. Again, a possibility would be to extract this information from the content of web pages. The information extraction methods used for this purpose have a very difficult task, and it is easy to see why if we consider what a typical product description page looks like to the eyes of the computer.
Even if an algorithm can determine that the page describes a music player, information about the product is very difficult to spot. However, elements that appear close to each other on the page may not be close in the HTML source where text and styling instructions are mixed together. If we make the rules specific to a certain page, our algorithm will not be able to locate the price on other pages or, worse, will extract the price of something else. Price as a number is still among the easiest information to locate.
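The fragility of page-specific extraction rules can be seen in a short sketch. The three HTML snippets and the rule itself are invented for illustration; the rule is written to fit the first page and is then applied unchanged to the other two.

```python
import re
from typing import Optional

# Invented product-page fragments from three hypothetical shops.
PAGE_A = '<td class="price">Price: <b>$149.99</b></td>'
PAGE_B = '<div>Our price <span>EUR 139,-</span></div><p>Shipping: $5.00</p>'
PAGE_C = '<span>139 euro</span>'

# Rule tuned to PAGE_A: the first dollar amount with two decimals is the price.
RULE = re.compile(r"\$(\d+\.\d{2})")


def extract_price(html: str) -> Optional[float]:
    """Apply the page-specific rule; return None when nothing matches."""
    match = RULE.search(html)
    return float(match.group(1)) if match else None


price_a = extract_price(PAGE_A)   # the actual product price
price_b = extract_price(PAGE_B)   # picks up the shipping cost instead
price_c = extract_price(PAGE_C)   # finds nothing at all
```

On the second page the rule silently returns the shipping cost, which is the "price of something else" case, and on the third it finds no price even though a human reader sees one immediately.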
In order to compare music players from different shops we need to determine that these two properties are actually the same and we can directly compare their values. In practice, information extraction is so unreliable that it is hardly used for product search.
It appears in settings such as searching for publications on the Web. Google Scholar and CiteSeer are two of the most well-known examples. They suffer from the typical weaknesses of information extraction, e.
The cost of such errors is very low, however: most of us just ignore the incorrect results. In the case of e-commerce search engines the cost of such mistakes is prohibitive. In the first case, the search is limited to the stores known by the system. On the other hand, the second method is limited by the human effort required for maintaining product categories as well as locating websites and implementing methods of information extraction.
As a result, these comparison sites feature only a selected number of vendors, product types and attributes. Namely, in all five cases we deal with a knowledge gap: what the computer understands and is able to work with is much more limited than the knowledge of the user.
The handicap of the computer is mostly due to technological difficulties in getting our computers to 1. Even if the information is there, and is blatantly obvious to a human reader, the computer may not be able to see anything else of it other than a string of characters.
In that case it can still compare the string to the keywords provided by the user, but without any understanding of what those keywords would mean. This problem affects all of the above queries to some extent. A human can quickly skim the returned snippets showing the context in which the keyword occurs and realize that the different references to the word Harmelen do not all refer to persons and even the persons named Harmelen cannot all be the same.
In the second query, it is also blatantly obvious for the human observer that not all pictures are of cities.
However, even telling cities and celebrities apart is a difficult task when it comes to image recognition. In most cases, however, the knowledge gap is due to the lack of some kind of background knowledge that only the human possesses. The background knowledge is often completely missing from the context of the Web page and thus our computers do not even stand a fair chance by working on the basis of the web page alone.
Answering the third query requires the kind of extensive background knowledge about musical styles, genres etc. This kind of knowledge is well beyond the information that is in the database of a typical music store. The third case is also interesting because background knowledge about the user is lacking as well.
There has to be a way of providing this knowledge to the search engine in a way that it understands it. The fourth query is noteworthy because it highlights the problem of aggregating information. The factual knowledge about particular products can be more or less extracted from the content of web pages, but if not, shop owners could be asked to provide it. It is unrealistic to expect, however, that all shops on the Web would agree to one unified product catalog (a listing of product types, properties, models, etc.) and provide information according to that schema.
But if each shop provides information using its own classification we need additional knowledge in order to merge data from different catalogs.
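The "additional knowledge" needed to merge catalogs can be as simple as a mapping from each shop's property names and units to a shared vocabulary. The sketch below assumes two invented shops, invented field names (`capacity_gb`, `storage_mb`), and a hand-written mapping; it is only meant to show why the fourth query becomes answerable once such a mapping exists.

```python
# Two invented catalogs describing the same kind of product differently.
SHOP_A = [{"model": "Player X", "capacity_gb": 8}, {"model": "Player Y", "capacity_gb": 2}]
SHOP_B = [{"name": "Player Z", "storage_mb": 4096}, {"name": "Player W", "storage_mb": 1024}]

# Hand-written mapping knowledge: shop field -> (shared field, unit conversion).
MAPPINGS = {
    "shop_a": {
        "model": ("model", lambda v: v),
        "capacity_gb": ("capacity_gb", float),
    },
    "shop_b": {
        "name": ("model", lambda v: v),
        "storage_mb": ("capacity_gb", lambda mb: mb / 1024),
    },
}


def normalize(shop: str, record: dict) -> dict:
    """Rewrite one shop-specific record into the shared vocabulary."""
    out = {"shop": shop}
    for key, value in record.items():
        target, convert = MAPPINGS[shop][key]
        out[target] = convert(value)
    return out


merged = [normalize("shop_a", r) for r in SHOP_A] + [normalize("shop_b", r) for r in SHOP_B]

# "Music players with a capacity of at least 4GB", asked over both shops at once.
at_least_4gb = sorted(p["model"] for p in merged if p["capacity_gb"] >= 4)
```

Without the mapping, a program has no way of knowing that `storage_mb` and `capacity_gb` denote the same property, let alone that their values differ by a factor of 1024.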
This means providing knowledge in forms that computers can readily process and reason with. This knowledge can either be information that is already described in the content of the Web pages but difficult to extract or additional background knowledge that can help to answer queries in some way. In the following we describe the improvement one could expect in case of our four queries based on examples of existing tools and applications that have been implemented for specific domains or organizational settings.
In the case of the first query the situation can be greatly improved by providing personal information in a semantic format. Although we will only cover the technological details in Chapters 4 and 5, an existing solution is to attach a semantic profile to personal web pages that describes the same information that appears in the text of the web page but in a machine-processable format.
FOAF profiles listing attributes such as the name, address, interests of the user can be linked to the web page or even encoded in the text of the page. As we will see, several profiles may also exist on the Web describing the same person.
As all profiles are readable and comparable by machines, all knowledge about a person can be combined automatically. For example, Frank van Harmelen has such a profile attached to his homepage on the Web. This allows a search engine to determine that the page in question is about a person with specific attributes.
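How several profiles about one person might be combined can be sketched with plain triples. The example assumes two invented profiles for a fictional person and uses the FOAF properties `foaf:name`, `foaf:mbox` and `foaf:interest`; it merges on the mailbox because FOAF treats `foaf:mbox` as identifying a single person. A real application would use an RDF library and full URIs rather than the prefixed strings used here.

```python
from collections import defaultdict

# Two invented profiles of the same fictional person, published on different sites.
PROFILE_1 = [
    ("http://example.org/a#me", "foaf:name", "Alice Example"),
    ("http://example.org/a#me", "foaf:mbox", "mailto:alice@example.org"),
    ("http://example.org/a#me", "foaf:interest", "http://example.org/topics/semantic-web"),
]
PROFILE_2 = [
    ("http://example.net/people/alice", "foaf:mbox", "mailto:alice@example.org"),
    ("http://example.net/people/alice", "foaf:interest", "http://example.org/topics/jazz"),
]


def merge_by_mbox(*profiles):
    """Group all statements by the mailbox of their subject.

    Subjects that share a foaf:mbox are treated as the same person;
    subjects without a mailbox are kept under their own identifier.
    """
    triples = [t for profile in profiles for t in profile]
    mbox_of = {s: o for s, p, o in triples if p == "foaf:mbox"}
    merged = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        merged[mbox_of.get(s, s)][p].add(o)
    return merged


person = merge_by_mbox(PROFILE_1, PROFILE_2)["mailto:alice@example.org"]
```

After merging, the name from the first profile and the interests from both are attached to one person, even though the two sites used different identifiers for her.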
She likes to go out several days a week and hang out in bars or clubs. This provides unprecedented opportunities for building socially-aware information systems. Since the exchange of knowledge in standard languages is crucial for the interoperability of tools and services on the Semantic Web, these languages have been standardized by the W3C as a layered set of languages (see Chapter 4).