News | March 15, 2010

Basis Technology Unveils Next-Generation Text Analysis Platform

Basis Technology Corporation, the leading provider of multilingual text analytics for search-based applications, recently unveiled the latest generation of the company's widely used linguistics platform, Rosette 7. With expanded language coverage, improved entity extraction accuracy, new name matching and translation modules, and native integration with Apache Lucene/Solr, Rosette 7 enables global enterprises, financial institutions, content providers, and intelligence agencies to quickly and accurately identify, analyze, search, and extract meaning from unstructured text in over twenty major languages.

Rosette supports a wide range of applications:

  • Global enterprises are deploying document management systems and XML databases capable of smart retrieval and navigation in multiple languages;
  • Legal teams are quickly locating relevant documents buried in multilingual repositories for e-discovery;
  • Financial institutions are increasing accuracy and reducing false positives for anti-money laundering and counter-terrorism financing regulatory compliance;
  • Intelligence officials are improving watch list implementation by screening documents in their language of origin rather than in translated form; and
  • Businesses of all sizes are exploiting unstructured data to discover trends and anticipate future problems.

"Text analytics is no longer an academic specialty. It has become a necessary component in most search and discovery software - from selling products, tracking terrorists, delivering news, or playing music- to improving communication among people worldwide," said Sue Feldman, research vice president at IDC. "IDC believes that we are in the midst of a profound change in the way text analytic and search software is created and deployed. Basis Technology's new Rosette7 platform ups the ante with its improvements in accuracy, enabling its customers to power a new breed of intelligent workspace applications."

Highlights of Rosette 7

  • Breakthrough Entity Extraction – Rosette Entity Extractor rapidly locates named entities in large volumes of unstructured text by employing three complementary detection algorithms: rule-based, list-based, and statistical. Rosette 7's improved extractor delivers breakthrough gains in speed and accuracy by dramatically shortening the length of time needed to train its statistical algorithms on new languages or entity types. Search-based applications are exploiting entity extraction to automatically generate metadata to filter search results, enable faceted navigation, deliver alerts, and feed downstream processes.
  • Integration with Lucene-based Applications – Agile businesses deploying the popular Apache Lucene/Solr open source search toolkits can now benefit from the same advanced linguistic processing used by high-end web and enterprise search engines. Rosette easily integrates with Lucene to index and search text in English, French, Italian, German, and Spanish as well as such complex languages as Arabic, Chinese, Farsi, Japanese, Korean, and Russian.
  • Name Matching and Indexing — Rosette Name Indexer matches names of people, places, or organizations, regardless of the language in which they are written, against entries in multilingual databases, while processing many types of intentional and unintentional name variants: script (Arabic vs. Hanzi vs. Latin); phonetic; orthographic; missing or disordered name components; formal and informal titles; initials; nicknames and aliases.
  • High-Accuracy Name Translation — Rosette Name Translator analyzes the fundamental linguistic structure of foreign personal names in Arabic, Chinese, Dari, Farsi, Korean, Pushto, or Urdu to produce highly accurate translations into English in compliance with applicable institutional or government standards.
  • Expanded Language Coverage — Rosette 7 offers Pushto and Dari to support peacekeeping and reconstruction efforts in Afghanistan. Improved language disambiguation between Cyrillic languages (such as Russian and Bulgarian) and more accurate name indexing for Arabic, Chinese, Farsi, and Urdu have also been added.

    "As an industry-leading provider of document exploitation technologies, CACI relies on Basis Technology for advanced linguistic technology to provide accurate document triage," said Carl Muller, Senior Vice President and Division Group Manager, CACI. "Integrating the Rosette Linguistics Platform allows us to deliver the most comprehensive language coverage required by our forward-deployed customers in the defense and intelligence community."

    Carl Hoffman, CEO of Basis Technology, said, "We've developed a reputation for expertise in computational linguistics, commitment to delivering effective high-quality technology, and dedication to serving our customers' needs with unparalleled support. Rosette 7 represents the most advanced and innovative linguistics analysis platform available, and it allows our customers to analyze their unstructured data – crucial for today's global businesses."

    Availability and Pricing
    Rosette 7 is available now for evaluation. Contact Basis Technology for license and price information at 617-386-2090.

    About Basis Technology
    Basis Technology provides software solutions for text analytics, information retrieval, and name resolution in over twenty languages. Our Rosette Linguistics Platform is a widely used suite of interoperable components that power search, business intelligence, e-discovery, and other enterprise applications. Our company is at the forefront of applied natural language processing using a combination of statistical modeling, expert rules, and corpus-derived data.

    Leading software vendors, content providers, financial institutions, and government agencies rely on Basis Technology's solutions for Unicode compliance, language identification, multilingual search, entity extraction, name matching, and name translation. Our products and services are used by over 250 major firms, including Cisco, EMC, Endeca, Hewlett-Packard, Microsoft, Oracle, and Symantec. Our text analysis products are widely used in the U.S. defense and intelligence industry by such firms as CACI, Lockheed Martin, MITRE, Northrop Grumman, SAIC, and SRI. We are the top provider of multilingual technology to web search engines such as Bing, Google, and Yahoo.

    Company headquarters are in Cambridge, Massachusetts, with branch offices in San Francisco, California; Herndon, Virginia; and Tokyo, Japan.

    SOURCE: Basis Technology Corporation