News | March 17, 2010

dtSearch Expands File Parsers And Converters; Content Extraction Only Licenses Available

dtSearch Corp., a leading supplier of enterprise and developer text retrieval software, announces a new release of its enterprise and developer product line, including the dtSearch Engine. The dtSearch Engine for Win & .NET (32-bit & 64-bit) and the dtSearch Engine for Linux (32-bit & 64-bit) let developers add instant text searching and built-in file format and other data support to a wide range of Internet, Intranet and other commercial applications.

File format expansions. Responding to increased interest from developers in file parser and converter licenses, even for applications that do not involve search, dtSearch Corp. has been extending its proprietary parsers and converters. The new release includes a new XML-based conversion format that gives developers better access to document structures, such as document properties, nested attachments, and internal structural elements (like spreadsheets inside documents).

The new release also broadens the list of supported file types. The file parsers and converters now cover Adobe FrameMaker MIF, XFA form templates, and Visio XML, in addition to existing supported file types like HTML, PDF, XSL/XML, ZIP, OpenOffice and MS Office files (through current released versions). The parsers also support popular email formats, along with the full text of attachments. For a complete list of supported file types, see http://support.dtsearch.com/faq/dts0103.htm.

The dtSearch Engine embeds the file parsers for hit-highlighted WYSIWYG display of web-ready files and HTML conversion (with hit-highlighted display) of other file types. Content extraction only licenses are also available.

Fielded data enhancements. The new release also provides broader API access to "stored fields." The dtSearch Engine can generate stored fields from SQL databases (including BLOB data). It can also generate stored fields from XML (including extensive support for nested field hierarchies), from other supported document types, or from data added "on the fly" during indexing. The dtSearch Engine uses stored fields in its data classification and filtering objects.
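As a rough illustration of what a nested field hierarchy can look like once flattened into field/value pairs, the following Python sketch walks an XML record and emits one pair per text-bearing element. It does not use the dtSearch API. The function name `flatten_xml_fields`, the "/" path separator, and the sample record are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def flatten_xml_fields(xml_text, sep="/"):
    """Return (field_path, text) pairs for every element that has text.

    Hypothetical helper: the path naming scheme is an assumption for
    illustration, not a documented dtSearch convention.
    """
    root = ET.fromstring(xml_text)
    fields = []

    def walk(element, path):
        # Build the hierarchical field name, e.g. "record/address/city".
        current = path + sep + element.tag if path else element.tag
        text = (element.text or "").strip()
        if text:
            fields.append((current, text))
        for child in element:
            walk(child, current)

    walk(root, "")
    return fields


# Made-up sample record with one nested level.
sample = (
    "<record><name>Ann</name>"
    "<address><city>Oslo</city><zip>0150</zip></address></record>"
)
fields = flatten_xml_fields(sample)
for name, value in fields:
    print(name, "=", value)
```

Each nested element keeps its full path, so a filter could in principle target "record/address/city" without colliding with a "city" field elsewhere in the hierarchy.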

Spider. The Spider (included with most dtSearch products and as a .NET API in the dtSearch Engine) adds local or remote web content to a searchable data collection. Supported content can be static or dynamic (ASP.NET, PHP, SharePoint, etc.). The Spider indexes public sites, intranets, HTTPS sites, password-accessible sites, and forms-based authentication sites. The Spider indexes to any level of vertical or horizontal depth, with integrated hit-highlighted display of local and remote data.

Terabyte indexer. dtSearch products can index over a terabyte of text in a single index, as well as create and simultaneously search an unlimited number of indexes. Concurrent indexed search time is typically less than a second, even across terabytes of data.

International language support. Built-in Unicode support covers hundreds of international languages (including right-to-left languages and Chinese/Japanese/Korean character processing options). dtSearch's UK distributor offers a Language Extension Pack, including customized noise word lists and stemming rules for over 25 European languages.

Other search features. Full-text and fielded data search options include: distributed or federated search options with integrated hit-highlighted display, fuzziness adjustable from 0 to 10 (to sift through typographical and spelling errors), synonym/concept/thesaurus (through a built-in thesaurus and/or user-defined synonym rings), Boolean (and/or/not), phrase, phonic, wildcard, bilateral proximity, directed proximity, stemming, natural language/vector-space relevancy ranking, variable term weighting, positional scoring, field-based relevancy ranking, data classification and filtering objects, field value enumeration, numeric range searching, advanced date recognition, regular expression, unindexed search (in addition to indexed search), and special forensics search options (text filtering of forensically recovered data, credit card search, email search, etc.).
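The difference between bilateral and directed proximity in the list above can be sketched in a few lines of Python. This is a conceptual illustration only, assuming simple whitespace tokenization and an inclusive word-distance limit. The function names are hypothetical and do not come from the dtSearch API.

```python
def positions(tokens, term):
    """Indexes at which `term` occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == term]


def bilateral_proximity(tokens, a, b, n):
    """True if a and b occur within n words of each other, in either order."""
    return any(
        i != j and abs(i - j) <= n
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )


def directed_proximity(tokens, a, b, n):
    """True only if a occurs first and b follows within n words."""
    return any(
        0 < j - i <= n
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )


# Made-up sample text, tokenized on whitespace.
tokens = "the search engine indexes text for fast search".split()

print(bilateral_proximity(tokens, "engine", "search", 2))  # order ignored
print(directed_proximity(tokens, "engine", "search", 2))   # order enforced
print(directed_proximity(tokens, "search", "engine", 2))
```

In this sample, "search" appears one word before "engine", so the bilateral check matches regardless of argument order, while the directed check matches only when "search" is given as the first term.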

The core of the dtSearch product line, the dtSearch Engine for Win & .NET supports C++, Java and .NET, including a .NET Spider API. The dtSearch Engine for Linux supports C++ and Java. Both platforms include native 32-bit and 64-bit builds.

dtSearch Web with Spider quickly publishes a large volume of instantly searchable data to an IIS Internet or Intranet site. dtSearch Web works as a "point and click" solution, with no programming required. The Spider provides integrated support for local and remote web site data.

dtSearch Publish enables users to easily publish instantly searchable document collections or web site content to portable media (CDs, DVDs, external hard drives, etc.). dtSearch Publish uses a simple wizard-based setup to create the instantly searchable media. For end-users, the media can run with zero footprint, requiring no installation on the end-user's hard drive.

dtSearch Desktop with Spider instantly searches files on a PC. dtSearch Network with Spider searches across a network. Both instantly search and display, with highlighted hits, a wide variety of file types, including email messages along with the full text of email attachments. Through the Spider, both applications can also add web content to a local or network search.

About dtSearch
The Smart Choice for Text Retrieval since 1991, dtSearch offers 19 years of experience in text search. The dtSearch product line includes enterprise and developer text search products, meeting some of the largest-capacity text retrieval needs in the world. dtSearch products have received hundreds of excellent press reviews and case studies. The company has distributors worldwide, including coverage on six continents. For more information, visit: www.dtsearch.com.

SOURCE: dtSearch Corp.