Current research in readability and text simplification

Accepted papers and publication information


This special issue on readability and text simplification has been scheduled to appear as the Fall 2014 issue of the ITL Journal. We received as many as 13 proposals for contribution and, since we were able to accept only 5 papers, competition was strong. The selected papers are the following:


  • Assessing Document and Sentence Readability in Less Resourced Languages and across Textual Genres (Felice Dell’Orletta, Simonetta Montemagni and Giulia Venturi) 
  • Readability for foreign language learning: The importance of cognates (Lisa Beinborn, Torsten Zesch and Iryna Gurevych)
  • Readability Assessment for Text Simplification: From Analyzing Documents to Identifying Sentential Simplifications (Sowmya Vajjala and Detmar Meurers)
  • Associative Lexical Cohesion as a factor in Text Complexity (Michael Flor and Beata Beigman Klebanov)
  • Making Numerical Information more Accessible: Implementation of a Numerical Expression Simplification System for Spanish (Susana Bautista and Horacio Saggion)

Furthermore, the special issue should also include up to 2 survey papers on readability and text simplification, which will aim to be both a good entry point for readers new to those domains and an excellent update on recent work.


Deadline extended to July 15, 2013


With the rapid growth in the number of online documents, the Internet has become a formidable source of information. However, most reliable websites are intended for rather experienced readers and can present comprehension difficulties for certain types of readers, such as children, learners of a foreign language, or readers with language deficits. To offer these kinds of readers better access to information, two strategies based on natural language processing (NLP) have been investigated in recent years: readability and text simplification.

Readability includes a range of methods targeted at the automatic evaluation of text difficulty in relation to a given population of readers. Throughout the 20th century, various readability formulas were developed (Dale and Chall, 1948; Flesch, 1948; Fry, 1968). They have recently been enhanced using various contributions from NLP, such as language models (Collins-Thompson, 2005; Schwarm and Ostendorf, 2005; Heilman et al., 2007; ...), various aspects of discourse (Pitler and Nenkova, 2008; Feng et al., 2010) or multiword expressions (Watrin and François, 2011). Moreover, criteria identified by researchers in psycholinguistics or language acquisition are increasingly taken into account.

These formulas have also been used within web platforms, such as Read-X (Miltsakaki, 2009) or REAP (Collins-Thompson and Callan, 2004), to help readers find documents on the web that match their level. However, for a given topic, there are sometimes no documents suitable for a specific user. In this context, automatic text simplification has emerged as a possible way to make existing texts accessible to poorer readers who would normally have trouble processing them.

Automatic text simplification aims at generating a simplified version of an input text. Two main types of automatic simplification methods can be distinguished: (i) lexical simplification, which replaces complex words or expressions with simpler synonyms or paraphrases, and (ii) syntactic simplification, which modifies the structure of sentences by deleting, moving or replacing difficult syntactic structures. In recent years, methods for automatic simplification have been largely influenced by other domains of NLP. Lexical simplification, as it consists in finding a suitable synonym or paraphrase for a complex expression, is closely related to the task of lexical substitution, which, in turn, is related to Word Sense Disambiguation (Specia et al., 2012). Statistical Machine Translation methods have also influenced text simplification, as large comparable corpora such as Wikipedia and the Simple English Wikipedia have become available (Zhu et al., 2010; Wubben et al., 2012; ...).

This special issue will therefore address these two areas of increasing importance, with a focus on systems informed by research on language acquisition or used in the context of computer-assisted language learning (CALL). More specifically, all of the following types of contribution are suitable for publication (the list is not exhaustive):

  • readability formulas for L1 or for L2 including some NLP components
  • readability formulas taking into account some characteristics of a specific population of learners (such as readers with a given L1, etc.)
  • NLP approaches to text understandability
  • lexical, syntactic, or semantic approaches to text simplification in the context of CALL
  • contributions of readability to text simplification
  • tools or techniques providing a more detailed difficulty diagnosis for users (e.g. difficult words or syntactic structures)

All contributions should explicitly discuss how they relate to current research in language acquisition, educational linguistics or computer-assisted language learning.



Thomas François (Université catholique de Louvain, Belgium)

Delphine Bernhard (Université de Strasbourg, France)
