Dr. Alexander Goerke, ABBYY Vice President of Semantic Technology Products and a pioneer in the field of document classification and document management systems, discusses the current state of ABBYY’s revolutionary technology for semantic search and natural language processing.
Learn more about ABBYY and its products and solutions at https://www.abbyy.com
Thanks to nearly 20 years of linguistic research, ABBYY has developed an innovative approach to text analysis that gets at the meaning of a text and brings machine analysis a step closer to human text processing. Unlike technologies based on statistical algorithms, which do not actually “know” anything about the language and can therefore only learn from the frequencies and co-occurrences of concepts in the text, ABBYY Compreno Technology possesses knowledge about meanings and their relationships in natural language.
Many automatic classification systems today use a pure bag-of-words approach to find the relevant features that determine the meaning of a document. A few use correlation and collocation to account for the fact that words take on different meanings depending on their context. None of them uses full semantic analysis of the meaning of words. But this is very much needed to classify a document accurately.
The main reason is that language, English especially, is so ambiguous. English nouns have on average 5-8 close synonyms. Some words, for example “strike”, have more than 30 common meanings (striking a baseball, the strike price when buying stock, going on strike as an employee, etc.). If you use a simple bag of words as features, the software will never be able to make a clear distinction between an important fact (strike = work stoppage) and irrelevant information (baseball). Hence the classification result is also ambiguous and not very precise.
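To make the “strike” problem concrete, here is a minimal Python sketch of a bag-of-words feature extractor next to a naive context-window lookup. It is an illustration only and is not ABBYY Compreno or any ABBYY API: the function names `bag_of_words` and `context_words`, the two example sentences, and the window size of 3 are all assumptions made up for this example.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def context_words(text, target, window=3):
    """Collect the words found within `window` tokens of each `target` occurrence."""
    tokens = re.findall(r"[a-z]+", text.lower())
    context = set()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            context.update(tokens[lo:i] + tokens[i + 1:i + 1 + window])
    return context

# Hypothetical example sentences: same word, two unrelated meanings.
labor = "The union called a strike at the factory."
baseball = "The umpire called a strike at the plate."

# As a bag-of-words feature, "strike" looks identical in both documents.
print(bag_of_words(labor)["strike"], bag_of_words(baseball)["strike"])

# Only the surrounding words hint at which meaning is intended.
print(context_words(labor, "strike") ^ context_words(baseball, "strike"))
```

In both sentences the feature “strike” has the same count, so a classifier that relies on that feature alone cannot separate the work stoppage from the baseball pitch. The context-window lookup surfaces the words that differ (union and factory versus umpire and plate), which is roughly the kind of signal collocation-based systems exploit. It still knows nothing about what those words mean, which is the gap full semantic analysis is meant to close.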