Machine Translation Tools: Respondents' Usage and Impact on Data Quality in Self-Administered Web Surveys
Project duration: 01.01.2024 to 31.12.2026
Abstract
The availability and use of machine translation tools, such as Google Translate and DeepL, have expanded greatly in recent years. Many web browsers, including Chrome, Safari, and Firefox, now incorporate plugins that automatically translate entire websites into a user's preferred language. These advancements have significantly improved accessibility for individuals whose native language differs from the dominant language of their country of residence. At the same time, the widespread use of these tools presents new challenges for self-administered web surveys. The TRAPD (Translation, Review, Adjudication, Pretesting, and Documentation) approach is considered the gold standard for translating surveys; it emphasizes standardized wording and strict linguistic protocols. The growing use of online translation tools complicates these established practices, because in self-administered web surveys respondents' use of such tools is often unobservable, raising concerns about its impact on data quality. To understand how these technologies affect data quality in web surveys, it is essential to determine whether respondents use translation tools during the survey and to implement software that detects the activation of browser translation plugins. In 2023, the Institute for Employment Research in Germany launched a new online panel survey of the German workforce (IAB-OPAL) using a push-to-web approach. The survey explicitly oversamples refugees and migrants from non-European countries who have recently arrived in Germany. Given the respondents' linguistic diversity, we designed a dedicated questionnaire module and an experiment to investigate the impact of machine translation tools on web surveys. Participants were asked about their general use of these tools, how frequently they used them while completing the survey, which specific tools they used, and which target languages they selected. Most web survey software allows browser-based translation plugins to operate by default. We developed a technical solution that blocks these plugins from translating the questionnaire's content, including question texts, explanatory notes, and response scales. We implemented an experimental design in which five survey items were blocked from translation for an experimental group, while the control group retained full access to translation functionality. By comparing these two groups and analyzing respondents' self-reports on their use of translation tools, we can assess the impact of online translation tools on key data quality indicators. Our experimental framework represents an initial effort to systematically evaluate how machine translation tools affect data quality in web surveys. Our findings advance the understanding of how these tools interact with established survey methodologies, particularly in monolingual web surveys conducted with linguistically diverse populations. Our research also highlights the need for methodological innovations that account for the hidden use of translation tools, ensuring the reliability and validity of data collected in self-administered web surveys.
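The project description does not specify how the blocking and detection were implemented; the following TypeScript sketch illustrates, under assumed names, one common way to realize both parts in a browser-based questionnaire: marking question elements so that translation features skip them (via the standard HTML translate attribute and the notranslate class recognized by Google Translate), and recording whether Chrome's built-in translation appears to have been activated (it typically adds a translated-ltr or translated-rtl class to the html element). The survey-item and experimental class names and the paradata logging are hypothetical illustrations, not the project's actual code.

    // Minimal sketch (assumed element classes, hypothetical paradata logging).

    // Ask translation tools to skip the selected elements: the HTML "translate"
    // attribute is the standard mechanism; the "notranslate" class is honored
    // by Google Translate specifically.
    function blockTranslation(selector: string): void {
      document.querySelectorAll<HTMLElement>(selector).forEach((el) => {
        el.setAttribute("translate", "no");
        el.classList.add("notranslate");
      });
    }

    // Chrome's built-in Google Translate usually adds one of these classes to
    // <html> once it has translated the page (behavior may vary across versions).
    function detectChromeTranslation(): boolean {
      const cls = document.documentElement.classList;
      return cls.contains("translated-ltr") || cls.contains("translated-rtl");
    }

    // Example usage: protect the experimental items and record a paradata flag
    // when the respondent leaves the page.
    blockTranslation(".survey-item.experimental");
    window.addEventListener("beforeunload", () => {
      console.log("page translated by browser:", detectChromeTranslation());
    });

Because not every browser plugin honors these attributes in the same way, a setup like this would typically be combined with respondents' self-reports, as in the questionnaire module described above.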