Establishing a standard toolbox for text mining
Project duration: 01.09.2022 to 31.12.2024
Abstract
The rapid growth of digitally available text data and advances in natural language processing (NLP) have created enormous potential for text mining. In this project, the “Temi-Box” was developed: a user-friendly, modular text mining system that can be used without in-depth programming knowledge. The Temi-Box provides proven methods for text classification and text clustering and allows their results to be compared using various evaluation metrics. It was originally designed for the automated topic assignment and indexing of publications on the IAB Info Platform. The code developed in the project, together with comprehensive documentation, was made available as an open-source project. An accompanying research report explains the methodological background and provides illustrative application examples.