
Administrative POC: Michael Welge, welge@ncsa.uiuc.edu
Technical POC: Duane Searsmith, dsears@ncsa.uiuc.edu
Collaborators: Alan Craig, acraig@ncsa.uiuc.edu; Dan Kaulwell, kauwell@uiuc.edu
Funding Source: Strategic Industrial Partner


Problem Definition

Seventy percent of an analyst's time is spent monitoring competitive changes in the external environment. A significant portion of that time goes to retrieving specific competitive information from sources on the World Wide Web, using a number of conventional search engines. Because so many search engines are involved, duplicate searches are typically run and their results compared. When running specific queries, 90% of the analyst's time is spent collecting data. In addition, many of the free resources in use are now available only by subscription.
REVEAL is a system for interactive analysis of large information sources such as the web or other large distributed information stores. It is a collaborative fusion of three existing technologies under development at the University of Illinois at Urbana-Champaign. Together, these tools form a powerful, distributed, and interactive analysis environment that will learn over time to provide increasingly effective and efficient data retrieval and analysis.



Features of REVEAL

 Distributed Information Collection, Extraction, Retrieval, and Storage
 Automated Information Clustering and Classification
 Visualization of Search and Data Organization
 Leverage the Power of Large User Communities
 Means to Share Information and Alert Others with Similar Interests



Three Core Technologies

NCSA VIAS (Visualization and Virtual Reality Information Archival/Retrieval System)
VIAS automatically builds databases of information in specific areas of interest by crawling the World Wide Web, monitoring relevant electronic mailing lists and Usenet newsgroups, and retrieving other sources of electronic data. The resulting databases are then processed to automatically identify and extract metadata such as company names, personal names, and bibliographic references, as well as to create document summaries and categorize document types.
http://vias.ncsa.uiuc.edu
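The kind of metadata extraction described above can be illustrated with a very small sketch. This is not VIAS code and does not reflect how VIAS is implemented; it only assumes that some metadata (here, company names with a common corporate suffix and e-mail addresses) can be picked out with simple patterns. The function name `extract_metadata`, the patterns, and the sample text are all hypothetical.

```python
import re

# Hypothetical pattern: one to four capitalized words followed by a common
# corporate suffix. A real system would need far more robust name recognition.
COMPANY_PATTERN = re.compile(r"\b(?:[A-Z][\w&-]*\s+){1,4}(?:Inc|Corp|Ltd|LLC)\b")

# Hypothetical pattern for e-mail addresses.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_metadata(text):
    """Return the company names and e-mail addresses found in a document."""
    return {
        "companies": COMPANY_PATTERN.findall(text),
        "emails": EMAIL_PATTERN.findall(text),
    }


# Made-up sample document, for illustration only.
sample_text = (
    "In a statement, Acme Widgets Inc. announced a merger with Globex Corp, "
    "according to analysts. Contact press@acme.example for details."
)

print(extract_metadata(sample_text))
```

Pattern matching of this kind is only a starting point: it misses names without a recognizable suffix and will pick up a capitalized word that happens to precede a company name.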
NCSA T2K (Text to Knowledge)
T2K is a tool for knowledge discovery: the process of uncovering previously unknown relationships in data and extracting that knowledge from the data. It provides text mining and analysis capabilities designed to operate in, and capitalize on, the complexity-rich natural language domains of very large stores of text and multimedia documents. T2K is a library of D2K (Data to Knowledge) modules that implement sophisticated algorithms for text analysis. Available functionality includes:
 Automated Real-time Document Clustering and Classification
 Active Learning of Document Classifications
 Building (Distributed) Models for Very Large Document Stores
http://alg.ncsa.uiuc.edu/do/tools/t2k
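To make the idea of automated document clustering concrete, here is a minimal sketch that uses only the Python standard library. It is not the T2K or D2K API, and the algorithms T2K actually uses are not described here; the sketch assumes TF-IDF weighting, cosine similarity, and a greedy single-pass grouping rule. The function names, the similarity threshold, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    n = len(docs)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF, so terms found in every document keep a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
             for t, c in counts.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.2):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed (first) document is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    vectors = tfidf_vectors(docs)
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up sample documents, for illustration only.
sample_docs = [
    "stock market prices fall as investors sell shares",
    "investors buy shares as stock market prices rise",
    "new vaccine trial shows strong immune response",
    "immune response to vaccine studied in new trial",
]

print(cluster(sample_docs))
```

A single-pass scheme like this handles documents one at a time, which suits a real-time setting, but its result depends on document order and on the chosen threshold.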
UIUC VisIT
 VisIT is an interface to search engines and databases that renders search results in a graphical format.
 Search results can be saved, edited and shared with others.
 VisIT records user actions and sends this information back to its server (user ID information is encrypted to ensure privacy).
 The client/server architecture makes VisIT, in effect, a "distributed" information system that can reduce the computational load on the server side.
 The information gathered by VisIT can help us build the next generation information retrieval environment for the Internet or private intranets.
http://www.visit.uiuc.edu/


