All posts by Helmut Hirner

W3C Completes Bridge Between HTML/Microformats and Semantic Web

GRDDL Gives Web Content Hooks to Powerful Reuse and Data Integration

https://www.w3.org/ –(BUSINESS WIRE)– Today, the World Wide Web Consortium completed an important link between Semantic Web and microformats communities. With "Gleaning Resource Descriptions from Dialects of Languages", or GRDDL (pronounced "griddle"), software can automatically extract information from structured Web pages to make it part of the Semantic Web. Those accustomed to expressing structured data with microformats in XHTML can thus increase the value of their existing data by porting it to the Semantic Web, at very low cost.
"Sometimes one line of code can make a world of difference," said Tim Berners-Lee, W3C Director. "Just as stylesheets make Web pages more readable to people, GRDDL makes Web pages, microformat tags, XML documents, and data more readable to Semantic Web applications, opening more data to new possibilities and creative reuse."
Getting Data into and out of the Web; how is it happening today?
One aspect of recent developments some people call "Web 2.0" involves applications based on combining, in "mashups", various types of data that are spread all around the Web. A number of active communities innovating on the Web share the goal of exchanging data such as calendar information, contact information, and geopositioning information. These communities have developed diverse social practices and technologies that satisfy their particular needs. For instance, search engines have had great success using statistical methods, while people who share photos have found it useful to tag their photos manually with short text labels. Much of this work can be captured via "microformats". Microformats are sets of simple, open data formats built upon existing and widely adopted standards, including HTML, CSS and XML.
This wave of activity has direct connections to the essence of the Semantic Web. Semantic Web communities have pursued ways to improve the quality and availability of data on the Web, making possible more intensive data integration and more diverse applications that can scale to the size of the Web and allow even more powerful mashups. The Web-based set of standards that supports this work is known as the Semantic Web stack. The foundations of the Semantic Web stack meet the formality requirements of applications such as managing bank statements or combining volumes of medical data.
Each approach to "getting your data out there" has its place. But why limit yourself to just one approach if you can benefit, at low cost, from more than one? As microformats users consider more uses that require data modelling, or validation, how can they take advantage of their existing data in more formal applications?
A Bridge from Flexible Web Applications to the Semantic Web
GRDDL is the bridge for turning data expressed in an XML format (such as XHTML) into Semantic Web data. With GRDDL, authors transform the data they wish to share into a format that can be used and transformed again for more rigorous applications.
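To make that mechanism concrete, here is a minimal sketch (not W3C code) of what a GRDDL-aware client does: it looks up the transformation a document declares and applies it to obtain RDF. It assumes Python with lxml, handles only the grddl:transformation attribute case (XHTML pages more commonly declare the transformation via a profile and a link element), and the URL is a placeholder.

    # Minimal GRDDL-style "gleaning": read the XSLT transformation a document
    # declares and apply it to get RDF/XML out. Placeholder URL; real XHTML
    # pages more often declare the transformation via a profile + <link>.
    from urllib.request import urlopen
    from lxml import etree

    GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

    def glean(xml_url):
        """Fetch an XML/XHTML page, find its GRDDL transformation, run it."""
        doc = etree.parse(urlopen(xml_url))
        transform_url = doc.getroot().get("{%s}transformation" % GRDDL_NS)
        if transform_url is None:
            raise ValueError("document declares no GRDDL transformation")
        xslt = etree.XSLT(etree.parse(urlopen(transform_url)))
        return str(xslt(doc))  # serialized RDF/XML

    if __name__ == "__main__":
        print(glean("http://example.org/contacts.xhtml"))  # placeholder URL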
GRDDL Use Cases provides insight into why this is useful through a number of real-world scenarios, including scheduling a meeting, comparing information from various retailers before making a purchase, and extracting information from wikis to facilitate e-learning. Once data is part of the Semantic Web, it can be merged with other data (for example, from a relational database, similarly exposed to the Semantic Web) for queries, inferences, and conversion to other formats.
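Once the gleaned RDF exists, merging and querying it is straightforward with any RDF toolkit. The following is a small illustrative sketch using rdflib; the file names and the vCard properties are assumptions for the example, not taken from the press release.

    # Merge gleaned RDF with data exposed from another source and query it.
    # Requires rdflib; file names and vocabulary below are only illustrative.
    from rdflib import Graph

    g = Graph()
    g.parse("contacts_from_hcards.rdf")                     # GRDDL output (RDF/XML)
    g.parse("contacts_from_database.ttl", format="turtle")  # same data from an RDBMS export

    query = """
    PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
    SELECT ?name ?email WHERE {
        ?person vcard:fn ?name ;
                vcard:hasEmail ?email .
    }
    """
    for name, email in g.query(query):
        print(name, email)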
The Working Group has reported on implementation experience, and its members have come forward with statements of support and commitments to implement GRDDL.
Also published today is GRDDL Test Cases, which describes and includes test cases for software agents that support GRDDL. The Working Group has produced a GRDDL service that lets users submit a GRDDL'd file and extract the important data.
About the World Wide Web Consortium [W3C]
The World Wide Web Consortium (W3C) is an international consortium where Member organizations, a full-time staff, and the public work together to develop Web standards. W3C primarily pursues its mission through the creation of Web standards and guidelines designed to ensure long-term growth for the Web. Over 400 organizations are Members of the Consortium. W3C is jointly run by the MIT Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) in the USA, the European Research Consortium for Informatics and Mathematics (ERCIM) headquartered in France and Keio University in Japan, and has additional Offices worldwide. For more information see https://www.w3.org/


SiloMatic – Latent Semantic Indexing

The days of keyword stuffing, single-phrase optimization and concentrating only on incoming links to gain traffic are slowly being phased out as a more holistic approach to judging website content comes online. This new concept has many webmasters hopping, as it should. Latent semantic indexing is quickly becoming the approach of the moment.
Latent semantic indexing, as Google applies it, is meant to better gauge the content of a web page in relation to the entire site and so discover the overall theme. It is a more sophisticated measure of what sites and their pages are all about. While it doesn't mean webmasters need to completely retool all of their keyword optimization efforts, it does mean depth needs to be a greater consideration.
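For readers who want to see the underlying idea, here is a toy latent semantic analysis run, which is what LSI boils down to: documents are mapped into a low-dimensional "topic" space via an SVD of their term matrix, so pages on the same theme land close together even with little keyword overlap. This uses scikit-learn and made-up documents; it says nothing about how Google actually implements its ranking.

    # Toy latent semantic analysis with scikit-learn: TF-IDF vectors reduced
    # by a truncated SVD so that thematically similar pages end up close
    # together even when they share few literal keywords. Documents are made up.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    pages = [
        "buy cheap running shoes and trainers online",
        "our sneaker store ships trainers worldwide",
        "recipes for homemade pasta and fresh tomato sauce",
    ]

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(pages)
    topic_space = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

    # Pages 0 and 1 (shoes) should score high against each other and low
    # against page 2 (cooking), despite their different wording.
    print(cosine_similarity(topic_space))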
The history behind latent semantic indexing is rather interesting. Google's current ranking system, which relies on incoming links (or votes) and keywords to judge a page's relevance to a search, has been known to penalize perfectly good sites. The system was set up to scan for relevance and quality. In the process, it has a habit of knocking down new sites and those that add too much content too quickly. Although some of these sites are, naturally, the product of link farming and quick keyword-stuffed content generators, not all of them are.
Google wanted a better way, and found one. Latent semantic indexing is meant to scan the overall theme of a site, so as not to penalize sites that have fresh, relevant and good content even if they do happen to pop up overnight.
Who says so, and what belongs in a silo? SiloMatic!
Latent Semantic Analysis is nevertheless interesting.


Scientists Use the "Dark Web" to Snag Extremists and Terrorists Online

The Dark Web project team catalogues and studies places online where terrorists operate.
Terrorists and extremists have set up shop on the Internet, using it to recruit new members, spread propaganda and plan attacks across the world. The size and scope of these dark corners of the Web are vast and disturbing. But in a nondescript building in Tucson, a team of computational scientists is using cutting-edge technology and novel approaches to track their moves online, providing an invaluable tool in the global war on terror.
Funded by the National Science Foundation and other federal agencies, Hsinchun Chen and his Artificial Intelligence Lab at the University of Arizona have created the Dark Web project, which aims to systematically collect and analyze all terrorist-generated content on the Web.
This is no small undertaking. The speed, ubiquity, and potential anonymity of Internet media (email, Web sites, and Internet forums) make them ideal communication channels for militant groups and terrorist organizations. As a result, terrorist groups and their followers have created a vast presence on the Internet. A recent report estimates that there are more than 5,000 Web sites created and maintained by known international terrorist groups, including Al-Qaeda, the Iraqi insurgencies, and many home-grown terrorist cells in Europe. Many of these sites are produced in multiple languages and can be hidden within innocuous-looking Web sites.
Because of the Web's vital role in coordinating terror activities, analyzing Web content has become increasingly important to the intelligence agencies and research communities that monitor these groups. Yet the sheer amount of material to be analyzed is so great that it can quickly overwhelm traditional methods of monitoring and surveillance.
This is where the Dark Web project comes in. Using advanced techniques such as Web spidering, link analysis, content analysis, authorship analysis, sentiment analysis and multimedia analysis, Chen and his team can find, catalogue and analyze extremist activities online. According to Chen, scenarios involving vast amounts of information and data points are ideal challenges for computational scientists, who use the power of advanced computers and applications to find patterns and connections where humans cannot…
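None of the project's tooling is shown in the article, but one of the techniques it names, link analysis, is easy to illustrate in miniature: rank sites in a made-up collection by how heavily the others link to them. The sketch below uses networkx and invented data; it is only an illustration, not the Dark Web project's code.

    # Toy link analysis: rank sites in a made-up collection by how heavily
    # the others link to them. Purely illustrative.
    import networkx as nx

    links = [
        ("forum_a", "site_x"), ("forum_b", "site_x"),
        ("forum_c", "site_x"), ("site_x", "site_y"),
        ("forum_a", "forum_b"),
    ]

    graph = nx.DiGraph(links)
    scores = nx.pagerank(graph)  # importance derived from incoming links

    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.3f}")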

Read more at Scientists Use the "Dark Web" to Snag Extremists and Terrorists Online

In any case, an interesting project in more than one respect.


Dept of Homeland Security: inexcusable IT waste on ADVISE project

Following its $30 billion virtual fence debacle, the Department of Homeland Security (DHS) has disclosed another failed IT-related project, this one costing $42 million. DHS has suspended, and will likely cancel, a massive data-mining initiative on grounds that it violated privacy standards. Significantly, the program has also suffered from dramatic, severe, and systematic project management failures.
The ADVISE (Analysis, Dissemination, Visualization, Insight and Semantic Enhancement) program, which is still in the prototype and testing stage, is part of a large-scale, anti-terrorism data analysis operation run by DHS. As reported by Mark Clayton in the Christian Science Monitor, ADVISE is intended to "display data patterns visually as 'semantic graphs' – a sort of illuminated information constellation – in which an analyst's eye could spot links between people, places, events, travel, calls, and organizations worldwide." For additional background, see another Christian Science Monitor article written by Mark Clayton.
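As a purely illustrative sketch of the "semantic graph" idea (not ADVISE itself): entities become nodes, reported relations become edges, and a simple path query surfaces an indirect connection an analyst might want to inspect. The data and relation names below are invented.

    # Illustrative "semantic graph": entities as nodes, reported relations as
    # edges, and a path query that surfaces an indirect link between entities.
    import networkx as nx

    g = nx.Graph()
    g.add_edge("Person A", "Flight 123", relation="booked")
    g.add_edge("Flight 123", "City X", relation="arrives_in")
    g.add_edge("Person B", "City X", relation="lives_in")
    g.add_edge("Person B", "Org Q", relation="member_of")

    path = nx.shortest_path(g, "Person A", "Org Q")
    print(" -> ".join(path))  # Person A -> Flight 123 -> City X -> Person B -> Org Q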
Read more on ZDNet.
