Programme

Antony John Kunnan

Duolingo

Welcome Speech
Jun Liu

City University of Macau

Keynote Speech: The Nexus of Teaching, Learning and Assessment of English Language in the Era of AI

Abstract: In his book “Asian Students' Classroom Communication Patterns in U.S. Universities” (2001), Liu identified five factors that contribute to Asian students' silence in American university classrooms: linguistic, socio-cultural, cognitive, affective, and pedagogical. Among them, the linguistic factor is the most fundamental and crucial. Two decades later, assessing one's linguistic competence is further complicated by the permeation of World Englishes, or varieties of English, brought about by global language contact: more than 70% of English users worldwide now speak English as a lingua franca. In this talk, Liu will explain how teaching, learning, and assessment intersect and interrelate, and how the future of language learning, teaching, and assessment will be driven by World Englishes and influenced by the increased recognition and importance of diverse English language variations worldwide. Liu will conclude his talk by pointing out that in the AI era, the teaching, learning, and assessment of World Englishes must leverage technologies to provide inclusive learning experiences that expose learners to diverse English language variations and promote cultural understanding and communication proficiency beyond standard English.

Jianda Liu

Guangdong University of Foreign Studies

Plenary 1: The Application of ChatGPT in Classroom Assessment

Abstract: ChatGPT is an advanced language model developed by OpenAI. Since its release, it has found applications in various domains. In the realm of language testing and assessment, ChatGPT can be utilized for automated language proficiency tests, language test practice and preparation, oral proficiency interviews, test scoring and feedback, adaptive language testing, among other applications. This talk focuses on the application of ChatGPT in language classroom assessment. Integrating ChatGPT into language classroom assessment offers innovative ways to evaluate students' language skills and enhance their learning experience. The experience of employing ChatGPT to generate classroom activities tailored to students' needs and using it to assist in formative assessments is introduced, along with specific examples.

Ming Ming Chiu

The Education University of Hong Kong

Featured speaker 1: Applying Artificial Intelligence & Statistics to Big Data: Automatic Analysis of Conversations

Abstract: As people solve problems together that they cannot solve alone, automatic analysis of conversations can inform and enhance their design to aid learning and teaching. Such analyses must traverse the obstacle course of voice transcription, complex categorization, and statistical analysis. Automated transcription feeds automatic categorization via computational linguistics to create a database (Big Data). Automated statistical analysis integrates statistical discourse analysis (SDA) and artificial intelligence. SDA models (a) pivotal actions that radically change subsequent processes and (b) explanatory variables at multiple levels (sequences of turns/messages, time periods, individuals, groups, organizations, etc.) on multiple target actions. The artificial intelligence expert system translates my theory into a statistical model, tests it on the data, interprets the results, rewrites itself if needed to execute revised analyses, and prints a table of results. I showcase automated SDA on 321,867 words in 1,330 messages by 17 student-teachers in 13 weekly online discussions of lesson designs.

Xiaoming Xi

Hong Kong Examinations and Assessment Authority

Plenary 2: Have construct and validity lost their place? – A return to the fundamentals of tests of English for university admissions

Abstract: Tests of English for university admissions have attracted the most attention in language testing, given their high stakes and wide use. Traditionally, admissions test users placed great emphasis on issues of authenticity, validity, fairness and access. However, the unexpected pandemic caused major disruptions to the field, leading to the undifferentiated acceptance of English language tests by universities around the world. The priority for test users has shifted to increasing their international student enrolment, and fundamental issues of test validity, fairness and impact have largely been pushed aside in selecting tests to accept.

In the meantime, with the unprecedented growth of AI technologies and machine learning in the last few years, remote proctoring technology, AI scoring, and generative AI have seen growing dominance in the discourse and literature of language testing. It is imperative for language testers to distinguish between the reality and hype of AI and ground our work in fundamental considerations of construct, validity, fairness and impact.

In this talk, I will discuss current trends in admissions language testing in defining and operationalizing test constructs, test delivery and administration methods, scoring methods and technologies, and score reporting. While many of these developments have been empowered by advances in technology, I will contend that some of the limitations and pitfalls of technology could compromise score interpretation and use. I will then review future developments that could bring changes to the design and operations of admissions language tests, including the impact of English as a Lingua Franca and technology-mediated real-world communication on test constructs, the use of AI in the entire assessment process, and the potential shift of a test’s function from providing a snapshot of a candidate’s proficiency to predicting their future capacity to develop skills in an immersion environment. Arguing that robust admissions tests are expected to help facilitate examinees’ eventual success in university settings and promote positive impact on teaching and learning, I will call for a return to the essentials of language test design and validation.

Cecilia Guanfang Zhao

University of Macau

Featured speaker 2: Assessing Writing: Past, Present, and Future

Abstract: The last few decades have witnessed several major developments in our understanding of the construct of writing, particularly in academic contexts. Different models and theories have been proposed to describe and capture the nature of writing and writing competence from different perspectives. While approaches to writing assessment have certainly also undergone some significant changes over time, writing assessment practices overall still lag significantly behind our knowledge of what writing is and does, linguistically, socially, culturally and interculturally. This talk thus seeks to first provide an overview of the recent developments in our theoretical conceptions of writing, and then highlight the major themes and gaps in current writing assessment research and practice. It ends with a discussion of the challenges and new directions for future assessment research and the development of the next generation of writing assessment.

Shangchao Min

Zhejiang University

Featured speaker 3: Development of a CSE-based Intelligent English Learning and Teaching System

Abstract: This presentation reports on an attempt to develop a CSE (China’s Standards of English Language Ability)-based Intelligent English Learning and Teaching System, which aims to provide customized language education solutions for all students at a university in China. The project starts with the design and implementation of a CSE-based diagnostic assessment, administered to incoming undergraduates, graduates, and Ph.D. students. This assessment yields individualized feedback encompassing various aspects of language proficiency, including students’ ability in four domains (i.e., listening, reading, writing, speaking) and corresponding CSE levels, subskill mastery statuses, and suggestions for remedial actions. Following this, CSE-based subskill-oriented online courses will be offered to those who need to improve one or more of their language subskills, together with CSE-graded materials for daily practice. The learning outcomes will be periodically assessed through diagnostic assessments administered at three-month intervals. Based on performance in these assessments, the assigned courses and learning resources for each student will be automatically adjusted. In this presentation, I will first introduce the contextualized considerations in the development of the system. Then, I will present a longitudinal study investigating the effect of using such a system on learner achievement. Finally, I will conclude by discussing broader implications of utilizing technology to enhance tailored language teaching and learning.

Geoff LaFlair

Duolingo

Featured speaker 4: Leveraging Large Language Models for Complex Listening and Reading Tasks in Language Assessment: The Duolingo English Test Experience

Abstract: The advent of large language models (LLMs) has significantly transformed the landscape of content and item creation for language tests, enabling the development of complex listening and reading tasks. The presentation will discuss the role of LLMs in the automated generation of content for these tasks, as well as the human-in-the-loop approach to ensure quality and fairness. Additionally, the talk will explore the psychometric evaluation of the tasks, including the large-scale pilot studies conducted to assess their feasibility and validity. This talk will focus on the Duolingo English Test (DET) and its Interactive Listening and Interactive Reading tasks, which capitalize on the capabilities of LLMs. These tasks are grounded within the DET's theoretical language assessment design framework and its assessment ecosystem (Burstein et al., 2022), which emphasize evidence-centered design for task development and psychometric evaluation. By examining the DET's Interactive Listening and Interactive Reading tasks, this talk will provide insights into the application of LLMs for language assessment development.

David Qian

Hong Kong Polytechnic University

Featured speaker 5: Responding to the Emergence of Generative AI on the University Campus

Abstract: The recent development of generative AI tools and their widening emergence on university campuses pose an enormous challenge to university communities. While there are obstacles to overcome, such a challenge may also bring with it a huge opportunity for educators to update their conceptual systems and redesign their educational landscapes for teaching, learning and assessment. In this context, stakeholder groups in various universities are reportedly reacting in different ways. In this talk, I will (a) first review some institutional policies in response to the rapid emergence of Generative AI, (b) examine the attitudes and actions of frontline teachers in dealing with AI-related issues in their teaching and assessment activities, and (c) identify issues that need to be addressed in reshaping the educational system in the context of integrating Generative AI in learning, teaching and assessment.

Jirada Wudthayagorn

Chulalongkorn University

Featured speaker 6: Exploring the landscape of English language assessments in the Thai higher educational context

Abstract: Thailand, as a rapidly globalizing nation, has witnessed transnational business, economic competition, labor mobility, geopolitical negotiation, international study programs, and many other activities that require the English language as a tool for communication. This signifies that English has played a pivotal role in professional, academic, and social advancement. Given the importance of English, Thai universities have adopted various language assessments as gateways for higher education admission, benchmarks for academic progress, and indicators for exit decisions. This presentation will begin with an overview of English language policy in the Thai educational context, followed by a description of the English language proficiency assessments used for university admission, post-admission evaluation, and exit examination. The alignment between the assessments used for university admission and the subsequent post-admission and exit examinations will be discussed. Based on insights from a preliminary survey of 20 freshmen, the presentation will highlight potential sources of stress and challenges experienced during university entrance tests, while also analyzing the relationship between pre- and post-admission scores. Although the exit examination is in its early stage, I will present how language assessments may contribute to the development of students’ language proficiency throughout their academic journey and future careers. In sum, by showcasing the landscape of language assessments across various educational stages, attendees will walk away with an understanding of the advantages and disadvantages of the current English language policy and language assessment framework in Thailand. I will also propose recommendations aimed at improving the alignment of this framework with students' language development and academic success. It is hoped that this presentation will contribute to a better understanding of the role of English language assessments in shaping students' educational experiences and language proficiency within the Thai context.

Qin Xie

The Education University of Hong Kong

Featured speaker 7: Construct Representation vs. Predictive Validity: A Study on the Duolingo English Test

Abstract: This study examined whether two additional integrated reading-to-write tasks would broaden the construct representation of the writing component of the Duolingo English Test (DET). It also verified whether the two tasks could enhance DET’s prediction of test-taker writing performance in academia. The two tasks were (1) writing a summary based on two source texts and (2) writing a reading-to-write essay based on five texts. Both were given to a sample (N=204) of undergraduates from universities in Hong Kong. Further, each participant provided a representative essay submitted for assessment in a university course. Three professional raters double-marked all writing samples. Raw scores were first processed using Multi-Faceted Rasch Measurement to estimate inter- and intra-rater consistency and to generate adjusted (fair) measures. Based on these measures, descriptive, sequential multiple regression and Structural Equation Modeling analyses were conducted (in that order). The analyses first verified the writing tasks’ underlying component constructs and assessed their relative importance to the overall performance. Both tasks were found to contribute to DET’s construct representation and add moderate predictive power to the domain performance. These findings, along with their practical implications, are discussed in terms of the differences between construct representation and predictive power.

Patrick Wong and Xin Kang

The Chinese University of Hong Kong & Chongqing University

Plenary 3: What leads to the success of foreign language learning? Genetic and behavioral evidence from French, German, and Spanish learners in Hong Kong

Abstract: A high degree of individual variability has been observed in the learning success of a foreign language. In our research, we evaluate genetic and behavioral factors that are hypothesized to influence the success of foreign language learning by capitalizing on a large cohort of French, German, and Spanish learners in Hong Kong. All participants were native Cantonese speakers of Han Chinese descent who spoke English as a second language (L2) and learned one of the target languages as a third language (L3). With comprehensive data on their demographic information (e.g., age), parental socioeconomic status, music background, motivation, L1, L2, and L3 proficiency, and genetic variation, we tested two sets of hypotheses. The first hypothesis argues that the core language function is universal across languages and is independent of when and under what conditions learning occurs. The second hypothesis postulates that different languages, and languages learned at different times and under different conditions, have different genetic and behavioral underpinnings. For our genetic studies, a candidate gene approach was used. Our findings support the second hypothesis: additional language learning relies on the learning conditions shared with prior language learning experiences, and there was a decreasing influence of genetic contributions from L1 to L3.

Andrew Moody

University of Macau

Featured speaker 8: World Englishes and Applied Linguistics: Common Misconceptions within the Exploration of Language Norms

Abstract: This presentation will briefly introduce the “three circles” model of world Englishes (WE) that has defined research in the field for nearly 40 years. Although Kachru (1986) argued that new English varieties (a.k.a. new Englishes, world Englishes, etc.) should be understood within their acquisitional, sociocultural, motivational and functional contexts, the full breadth of these contexts has frequently been overlooked within many disciplines in applied linguistics. This paper will briefly introduce Kachru’s world Englishes model and illustrate three common misconceptions about the model in language teaching contexts: (1) the language proficiency fallacy, (2) the developmental cline fallacy and (3) the variability fallacy. In response to these fallacies, the presentation will also explore the centrality of norms within the WE model and illustrate how norms function differently in media Englishes across the three circles according to concerns related to language authority and authenticity.

Liying Cheng

City University of Macau

Plenary 4: Catalysing an Ecosystem for Language Assessment, Learning and Teaching

Abstract: We all now live in a very different world. The impact of the worldwide COVID-19 pandemic on education has been impossible to imagine, predict, or measure. As a result, traditional modes of assessment (testing included), learning, and teaching are being challenged, and the quality of our education systems questioned – on a global scale. We must therefore reconceptualise assessment, learning, and teaching individually; more importantly, we must reconceptualise the symbiotic relationships among the three. All these changes are happening at the same time as the exponential growth and development of artificial intelligence, natural language processing, and machine learning. This talk discusses the vast opportunities and challenges of redefining language constructs, assessment, and technology to build an ecosystem. Within such a system, symbiosis could be achieved through a community or group of organic entities, within which all educational stakeholders – humans and machines – interact with each other in a mutually supportive environment.

Punchalee Wasanasomsithi, Raveewan Viengsang, & Chanisara Tangkijmongkol

Chulalongkorn University

Featured speaker 9: Summative and Formative Language Assessments in Thailand: Current Trends and Future Directions

Abstract: At present, perspectives continue to vary on the purposes of language assessment and on the relationship between summative assessment, primarily used to evaluate learning, and formative assessment, mainly utilized to monitor learning progress. Language instructors, who are tasked with promoting learners’ learning as well as certifying their language performance, face daunting dilemmas and challenges of how, and how much, they should make use of both summative and formative assessments in their instruction. In Thailand, summative assessment continues to play a major role in language instruction in schools as well as in university admissions, where learners’ language proficiency needs to be validly and reliably determined. At the same time, language instructors have attempted to incorporate more formative assessment in class to enhance learning. The objective of this study was to systematically review published research on formative and summative language assessment conducted with Thai EFL learners in the past decade, so as to inform language instructors of existing practices, current trends, and future directions of summative and formative assessment methods that can be adopted and adapted in their classrooms. The selection criteria included research studies undertaken with Thai EFL learners at secondary and university levels, with the summative assessment methods focusing on traditional paper-and-pencil and computer-based proficiency tests, and the formative assessments encompassing alternative methods including self-assessment, peer-assessment, portfolio assessment, dynamic assessment, and performance-based assessment. In this presentation, the findings will be presented, and implications will be discussed for classroom language teachers who wish to help their learners achieve their learning goals by means of both summative and formative assessments.

Ricky Lam

Hong Kong Baptist University

Featured speaker 10: Using AI in e-Portfolio assessment: Implications for stakeholders

Abstract: Artificial intelligence (AI) has recently become the talk of the town, and its use in education has become even more controversial. Generative AI tools, such as ChatGPT, are systems pre-trained on enormous corpora that simulate human thinking, reasoning, and speech by creating verbal responses, essays, poems, lyrics, and songs, among many others. In close connection with AI technology, e-Portfolios refer to digital dossiers that assist learners to create, curate, reflect on, and disseminate their multimedia artefacts to fulfil diverse purposes. Since students can deploy digital resources to create portfolio artefacts, they may utilize AI-generated materials as makeshift e-Portfolio tasks improperly, if not unethically. Hence, this paper argues for the importance of developing AI literacy among key stakeholders, namely teachers, students, and researchers, when they engage in e-Portfolio assessment. The paper first introduces AI applications in language education and describes their affordances as well as challenges. It then delineates e-Portfolio assessment in terms of its definition, process, and integration in L2/EFL contexts, followed by how AI-powered software can be pedagogically linked to e-Portfolio tools. Drawing upon Davis’s (2008) componential language assessment literacy model, the paper goes on to propose major elements of AI literacy for teachers (architects of AI-based e-Portfolio programmes), students (consumers of AI tools in their digital dossiers), and researchers (trendsetters of AI use in e-Portfolios), and shows how these elements (i.e., knowledge, skills, principles, self-awareness, and beliefs) enable the three groups of stakeholders to implement, adopt, and innovate with AI in e-Portfolio assessment. The paper closes with recommendations and implications for future AI development in a wider alternative assessment context.
Keywords: artificial intelligence; e-Portfolio assessment; artificial intelligence literacy; stakeholders in language assessment; English education

Manqian (Mancy) Liao

Duolingo

Featured speaker 11: Quality Assurance in Digital-First High-Stakes Language Assessments

Abstract: Digital-first language assessments are a new generation of language assessments that can be taken anytime and anywhere in the world. The flexibility, complexity, and high-stakes nature of these assessments pose quality assurance challenges and require continuous data monitoring and the ability to promptly identify, interpret, and correct anomalous results. In this presentation, we illustrate the development of a quality assurance system, Analytics for Quality Assurance in Assessment (AQuAA), for anomaly detection in a high-stakes language assessment. The system is essential to ensure the validity of the test scores. Multiple control charts and models are employed to identify and flag any irregular changes in assessment statistics, which are subsequently reviewed by experts. The process of pinpointing the causes of a score anomaly is illustrated using a real-world example. Several statistical categories, such as scores, test taker profiles, and repeat test takers, are monitored to provide context and evidence for evaluating the score anomaly, as well as ensuring the quality of the assessment and the validity of the test scores.

Ping Li

The Hong Kong Polytechnic University

Plenary 5: Neurocognitive Studies of Language Learning in a Digital Era

Abstract: In an era of rapid developments in digital technology and AI, second language (L2) scholars need to consider how to leverage technological tools for both research and education. In this talk, I outline an approach that combines emerging technologies with current neurocognitive theories, with particular reference to embodied language learning. I highlight the differences in learning between children and adults, and suggest ‘social learning of L2’ (SL2) as a new framework for thinking about L1-L2 differences and the corresponding neural correlates. The neural evidence from our work shows that SL2-based learning, as compared with traditional classroom-based learning, can lead to brain network patterns in L2 that are more similar to those underlying L1. Theoretically, this approach can help us gain a deeper understanding of embodied learning and its neural mechanisms. Practically, this approach emphasizes context-based communicative abilities rather than classroom-based practice, and through the collection and analyses of real-time multimodal data, it sheds light on language assessment and personalized language education, thereby informing pedagogical designs and instructional innovations.

Matthew P. Wallace & Zhisheng (Edward) Wen

University of Macau & Hong Kong Shue Yan University

Featured Speaker 12: Testing Individual Differences in L2 Listening: The Roles of Working Memory and Language Aptitude

Abstract: In this talk, we will focus on the possible contributions of two key cognitive individual differences—working memory and language aptitude—to discuss and evaluate their relation to, and potential impacts on, the L2 listening process and comprehension ability. To this end, we will first provide critical reviews of current theoretical models and assessment paradigms of the two constructs in second language acquisition (e.g., Wen, Skehan & Sparks, 2023; Wen, Sparks, Biedron & Teng, 2023). Then, we will summarize major results and findings of selected empirical studies investigating their respective and combined effects, alongside other factors, on L2 listening comprehension and performance (e.g., Andringa et al., 2012; cf. Wallace, 2020 & 2022). Based on the empirical evidence and emerging patterns, the third part of the talk will discuss the theoretical and methodological implications for model building and assessment practice.

Gary Zhenguang Cai

The Chinese University of Hong Kong

Featured speaker 13: Does ChatGPT resemble humans in language use?

Abstract: Large language models (LLMs), such as ChatGPT and Vicuna, have demonstrated remarkable language understanding and generation capabilities. However, their internal mechanisms still lack transparency regarding cognitive processes, raising questions about their ability to acquire human-like linguistic knowledge and language usage. Cognitive scientists have conducted numerous experiments to investigate and explain how humans represent and process language, making significant progress in this field. In this study, we conducted a series of linguistic and psycholinguistic experiments with ChatGPT and Vicuna, ensuring transparency and rigor through preregistration. In most of these experiments, ChatGPT and Vicuna replicated language usage patterns observed in humans. They associated unfamiliar words with different meanings based on their forms, retained recently encountered meanings of ambiguous words, reused sentence structures from recent context, utilized context to resolve syntactic ambiguities, reinterpreted implausible sentences influenced by noise, overlooked errors, drew reasonable inferences, associated causality with different discourse entities based on verb semantics, accessed different meanings and retrieved different words depending on the interlocutor's identity, and demonstrated human-like syntactic knowledge. However, unlike humans, ChatGPT and Vicuna did not exhibit a preference for using shorter words to convey less informative content, and they did not effectively utilize contextual information for pragmatic processing. We explore the transformer architecture to understand how these similarities and differences may arise. Overall, these experiments demonstrate that ChatGPT and Vicuna can closely mimic human language processing and linguistic knowledge to a significant extent, offering insights into language learning and utilization. We discuss the potential of LLMs in facilitating language learning and teaching, particularly in the context of language assessment.

Moderators
Zhisheng (Edward) Wen

Hong Kong Shue Yan University

Di Zou

The Education University of Hong Kong

Round Table Discussion on The Impact of ChatGPT

Panelists:
Gavin Bui (President of HAAL; Hong Kong Hang Seng University)
Melinda Whong (Associate Dean, Director of CLE; Hong Kong University of Science and Technology)
Sherman Lee (Senior Lecturer, CAES, University of Hong Kong)
Mable Chan (Senior Lecturer, LC, Hong Kong Baptist University)
Wei Wei (Program Leader, Faculty of Applied Science, Macao Polytechnic University)
Dejin Xu (President, Guangdong Province Association of Languages and Education Assessment; Sun Yat Sen University)
Wesley Curtis (Head, Language Center, City University of Hong Kong)

Discussant:
Andy Curtis (50th President, TESOL International Association, Professor, City University of Macau)