Like it or not, generative artificial intelligence (AI) is everywhere. With just a few keystrokes, anyone can generate text, images or videos of any niche scenario they can imagine. While the sheer computing power of generative AI enables previously impossible research, such developments have also fostered fear in academia.
Many U.S. states have enacted legislation concerning AI; however, no laws contain specific guidelines about generative AI and intellectual property. Instead, such cases are reviewed by the U.S. Copyright Office, which has only stated that humans cannot claim copyright over AI-generated content.
This precedent was set by its rulings in cases concerning “Zarya of the Dawn,” a graphic novel, and an artwork produced by the Creativity Machine, an AI model. Because both works incorporated AI-generated images, the Office denied their requests for full copyright protection. However, the Copyright Office has not safeguarded against the non-consensual use of copyrighted material in the training of AI models. These training datasets are used to build models capable of producing content similar to the data on which they were trained, sparking strong backlash from writers, artists and publishers.
The way generative AI is trained creates fundamental problems for its use in academia. The data fed to AI models is of such scale and variety that AI-generated information cannot be traced back to its sources. This creates a problem for researchers, who receive no credit when AI draws on their contributions. Furthermore, researchers may lose the incentive to continue their intellectual pursuits as society receives more and more of its information from AI.
Beyond obscuring recognition, intellectual property and precise sourcing are also tied to accountability. Citations are a crucial part of research and scholarly writing. Because generative AI often prevents readers from pinpointing where information originated, works that rely on it cannot abide by current academic standards.
Under the APA 7th edition citation format, generative AI can be cited by naming the AI model and the prompt given to it. In one of my classes, where the use of AI is permitted for gathering information, students are instructed to cite in this way. This is perhaps the most straightforward method of AI citation, but it also defeats the purpose of citing altogether. Not only are generative AI models unable to list the origins of their information, but identical models can generate different responses to the same prompt. Readers cannot use the citation to recreate the original AI output or to identify where the AI-generated information came from. The problem becomes even more concerning once issues like misinformation and AI bias, in which generative AI produces prejudiced outputs because of the nature of its training dataset, are taken into account. A generative AI model's outputs reflect the inputs it received during training, so inaccurate or biased data can misguide a model into producing hallucinations.
This fundamental incompatibility between academia's need for truth and review and generative AI's lack of accurate sourcing makes integrating the technology into academia harder than any advance that came before it. Unlike earlier developments, including the internet, generative AI offers no retraceable chain of sources, presenting unique obstacles to maintaining academic integrity and transparency.
Still, the convenience of generative AI is difficult to refute. The conversational nature of large language models lets users quickly gather and organize new information, and this efficiency is likely to improve research productivity as well. Generative AI is not something that can be locked back into Pandora's box. According to a study by James Zou, a Stanford University professor, up to 17.5% of sentences in computer science research papers were modified by generative AI, and that share has risen sharply across other disciplines since the commercial launch of large language models and chatbots such as ChatGPT and Llama.
We currently find ourselves in a difficult transitional period in which students, educators and researchers have not yet reached a consensus on how generative AI should be used and cited. How we identify, verify and cite AI-generated information, and how intellectual property regulation is applied to AI training, are urgent questions that need answers. Only then will students and scholars be able to expand the boundaries of their learning and research with generative AI while preserving the fundamental values of truth, accountability and creative dialogue in academia.