Dr. Ameeta Agrawal, professor at Portland State University. Courtesy of Ameeta Agrawal

The ethics of computational language processing

Ethical research is more important than ever

Ethical considerations in language processing, and in machine learning more broadly, are becoming more important as the technology grows in scope and capability.


The Association for Computational Linguistics (ACL) is an organization that oversees and reviews ongoing research and development involving language processing models, language-related machine learning and any research that combines aspects of computer science and machine learning.


Most academic fields require an ethics statement to be posted alongside an article before it can be published. However, many fields in academia are siloed through their respective associations, so these ethics statements are often regulated and handled independently of each other.  


While fields like computational linguistics and linguistics proper overlap to some degree, each association or organization sets strict requirements for what should be included in academic journal submissions, and each peer-reviewed journal may have different guidelines and requirements for ethics.


The ACL does have a code of ethics, the most recent version of which was adopted June 22, 2018. Computational linguistics is a fairly new field, as the ACL only consolidated during the ’70s and ’80s. However, we have a growing dependence on systems built by computational linguists, and on machine learning as a whole.


Computational linguistics is the understanding and implementation of written and spoken language from a computational perspective. The ultimate goal of this field is to combine knowledge of language with that of computers, so we can better understand how language works—with the hope of creating mutual understanding between speakers of different languages.


In the modern era, computational linguistics is often used as a synonym for Natural Language Processing, or NLP. Natural Language Processing involves the use of machine learning (or AI) to understand text and speech in the same way human beings do. Natural Language Generation, another process that involves computational linguistics, is the use of similar models to have computers develop and produce language as human beings do.


Siri and Alexa are examples of the consumer products that these fields have contributed to. Google Translate and other AI-based translators are prime examples of the field’s applications.


However, language processing models extend beyond high-tech products, and beyond their use as the secret savior in a foreign language class, to powerful applications in forensic linguistics.


For example, on Feb. 19, 2022, forensic linguists in France used computational linguistics to determine the authorship of posts by Q, the anonymous poster responsible for the QAnon conspiracy theory movement. According to the linguists, tech journalist Paul Furber bears the same linguistic fingerprint as the Q messages, as does internet conspiracy theory forum poster Ron Watkins.


The field is advancing as fast as artificial intelligence itself is expanding, forcing the ACL to adapt rapidly to address the ethical pitfalls that can occur if computational linguistics advances to people’s detriment instead of their benefit.


Dr. Ameeta Agrawal, a member of the ACL and professor of NLP at Portland State University, discussed the role and importance of ethics within her field and the association.


“Some of the most important questions we should be asking are, ‘what should I be aware of if I have a new dataset?’” Agrawal said. “What is the extended use of this?”


As Agrawal points out, the ACL gives a detailed guide on its Ethics FAQ page about what constitutes ethical concerns, how papers are reviewed and flagged for these concerns and what the researcher should be thinking about, more broadly, with respect to their research. 


As of December 2020, the ACL has allotted an extra page on top of its usual seven-page guidelines, strictly for an ethics statement. However, unlike its peers at NeurIPS (Neural Information Processing Systems), which handles a broader range of machine-learning applications, contributors to papers published through the ACL are not required to provide an ethics statement, only strongly encouraged to include one. The difference is key, because papers can still be submitted without an ethics statement, and some have been published without one.


While there is a consensus about the benefit of having these ethical guidelines, Agrawal also acknowledged that these sections are not easy to write, and can be particularly challenging for individuals with a heavier computer science background.  


“You must think in a critical lens [when writing about these sections],” Agrawal said. “For example, [you should be] acknowledging limitations in computing power and its effects on the environment.”


Indeed, the ACL not only cites the importance of environmental concerns, but also restricts how researchers approach identity characteristics such as gender, race and nationality, so that any research done or applications developed can minimize negative impacts on a broader scale.


“I really like the fact we have this [ethical commitment],” Agrawal said. “It really does make your research more real.”