Taalee
 
Content Discovery and Mapping
An innovative repository-building solution



PROJECT REFERRAL

Taalee Inc., Athens is a leading player in the digital entertainment market. AmSoft Systems designed and implemented an advanced Content Discovery and Mapping system for Taalee. The aim of the project was to build a knowledge base and index of rich media content (audio and video streams) available on various web sites. The system crawled the web, extracted the relevant pages and their metadata, and mapped them onto Taalee's WorldView (schema) of content.

HTML is the most common mark-up language, but documents that conceptually follow a common schema are often marked up differently for web rendering because authoring, editorial and target-audience goals differ. This makes it difficult to automate data retrieval, indexing and integration over HTML documents. AmSoft created a novel approach to integrating topic-specific HTML documents into a repository of XML documents. The approach combined crawling, extraction, mapping, wrapping and schema discovery to transform HTML documents into XML documents.
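
To illustrate the wrapping and mapping idea, here is a minimal Python sketch, not AmSoft's actual implementation. It assumes each source site gets a small table of wrapper rules that map a tag and class attribute to a field of a shared schema. The site names, the rules, the field names (title, creator) and the sample pages are all hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical per-site wrapper rules: (tag, class attribute) -> schema field.
WRAPPERS = {
    "site_a": {("h1", "title"): "title", ("span", "artist"): "creator"},
    "site_b": {("b", "track-name"): "title", ("i", "by"): "creator"},
}


class FieldExtractor(HTMLParser):
    """Collects the text of elements that match one site's wrapper rules."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.current = None  # schema field currently being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        self.current = self.rules.get((tag, css_class))

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.fields[self.current] = data.strip()


def html_to_xml(site, html):
    """Map one HTML page onto the shared schema and serialize it as XML."""
    parser = FieldExtractor(WRAPPERS[site])
    parser.feed(html)
    parser.close()
    record = ET.Element("media", source=site)
    for name in sorted(parser.fields):  # fixed field order for stable output
        ET.SubElement(record, name).text = parser.fields[name]
    return ET.tostring(record, encoding="unicode")


# Two pages with the same conceptual content but different mark-up.
page_a = '<div><h1 class="title">Blue Train</h1><span class="artist">John Coltrane</span></div>'
page_b = '<p><b class="track-name">Blue Train</b> <i class="by">John Coltrane</i></p>'
xml_a = html_to_xml("site_a", page_a)
xml_b = html_to_xml("site_b", page_b)
```

Apart from the source attribute, both pages yield the same XML record, which is what makes the resulting repository uniform to index and query.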

Solution Objectives:

  • Building a repository of rich media content organized around genres in a taxonomy.
  • Crawling the web for known sources of rich media content and discovering new sources of similar content.
  • Extracting metadata and domain-independent properties from the documents and mapping them in a derived schema to build an index.
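
The crawling objective above can be sketched as a breadth-first crawl that starts from known sources and records hosts it had not seen before whenever they serve media files. This is an illustrative Python sketch only. The in-memory FAKE_WEB graph, the example URLs and the list of media extensions are assumptions, and a real crawler would fetch and parse pages over HTTP instead.

```python
from collections import deque
from urllib.parse import urlparse

MEDIA_EXTENSIONS = (".mp3", ".rm", ".mov")  # assumed set of media file types

# Hypothetical in-memory stand-in for the web: URL -> links found on that page.
FAKE_WEB = {
    "http://known.example/index": [
        "http://known.example/a.mp3",
        "http://new.example/clips",
    ],
    "http://new.example/clips": [
        "http://new.example/b.mov",
        "http://other.example/about",
    ],
    "http://other.example/about": [],
}


def crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl; returns media links and newly discovered hosts."""
    known_hosts = {urlparse(url).netloc for url in seeds}
    queue, seen = deque(seeds), set(seeds)
    media, discovered = [], set()
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        for link in fetch_links(url):
            if link.lower().endswith(MEDIA_EXTENSIONS):
                media.append(link)
                host = urlparse(link).netloc
                if host not in known_hosts:
                    discovered.add(host)  # a source that was not a seed
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return media, discovered


media, discovered = crawl(
    ["http://known.example/index"],
    lambda url: FAKE_WEB.get(url, []),
)
```

Passing the link fetcher in as a function keeps the traversal logic separate from network access, so the same loop can be exercised against a fixture like FAKE_WEB.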



ENVIRONMENT, PROTOCOL, TOOLS

  • Xerces XML Parser
  • Verity's G2 Developer
  • Oracle
  • Apache
  • Cocoon framework for XML and XSL


 

 

 

 


Design Codesign © Copyright 1990 - 2009 Amsoft Systems Pvt Ltd. All rights reserved.