
EXPERT REACTION: First conference for entirely AI-authored research papers

Publicly released: Australia; NSW; VIC; SA

An upcoming scientific conference, Agents4Science 2025, claims to be the first open conference where AI serves as both primary author and reviewer of the research papers. Organised by researchers at Stanford University, the conference is due to commence on Thursday, October 23 at 2:45am AEDT. The organisers say they intend the conference to be a "relatively safe sandbox" to explore "if and how AI can independently generate novel scientific insights, hypotheses and methodologies, while maintaining quality through AI-driven peer review." Below, hear what Australian AI experts have to say.

Expert Reaction

These comments have been collated by the Science Media Centre to provide a variety of expert perspectives on this issue. Feel free to use these quotes in your stories. Views expressed are the personal opinions of the experts named. They do not represent the views of the SMC or any other organisation unless specifically stated.

Dr Armin Chitizadeh is a researcher in AI ethics at the University of Sydney

Comment provided after attending the conference:

  • "AI was mostly used in the final phases, in particular writing. The early conceptual stages relied heavily on human input.
  • AI-agent reviewers often overlooked core ideas and focused more on surface-level aspects.
  • Combining multiple AI systems to evaluate each other generally produced better outcomes; this is an ensemble approach, of which random forest is a well-known example in AI research.
  • AI systems often struggled with creativity and novelty.
  • Hallucination was a recurring problem, requiring human oversight to verify results.
  • I appreciated the conference and its transparency, as it helped break the taboo around using AI systems.
  • Often, the AI drifted out of the specific research context and defaulted to more general, commonplace framing.
  • Because of the popularity of ChatGPT and other large language models, everyone tended to focus only on these tools. In several studies, ChatGPT performed poorly, and alternative AI techniques could have delivered stronger results. It felt like using a hammer to drive a screw—just because it’s popular, not because it’s right for the task.
  • I’m really looking forward to more conferences like this, especially those dedicated to specific research domains."

Comment provided before the conference:

"Agent4Science 2025 is an open conference that welcomes AI systems as recognised authors and reviewers. Its goal is to promote transparency in how AI is used and to encourage open sharing of methods and outcomes. At first glance, the concept sounds controversial—some might compare it to the Enhanced Games, which permits performance-enhancing drugs for athletes, but applied to academia.

Despite potential controversy, the conference represents an important step toward a more open dialogue about AI in science. Automated Theorem Proving, a field that has used AI to prove mathematical statements since the 1950s, shows that AI has long played a legitimate role in research. Yet, the current hesitation to acknowledge AI use mirrors the secrecy of the old alchemy era—when withholding knowledge stifled progress.

My main concern, and the only real drawback of this conference, lies in letting AI review academic papers. Since AI often reinforces familiar patterns, it may fail to recognise genuine innovation or undervalue unconventional writing styles. This could disproportionately affect minority groups. I believe AI should mainly assist with technical aspects, such as formatting and citation compliance.

The good news is that this conference will let us assess the performance of AI reviewing systems. I hope for strong participation so we can achieve a meaningful outcome and conclusion."

Last updated: 24 Oct 2025 4:25pm
Declared conflicts of interest: None declared.

Professor David Powers is a researcher in Computer and Cognitive Science at Flinders University and oversees a wide range of projects in artificial intelligence, robotics and assistive technology

Comments provided after attending the conference:

"I attended the conference from its opening at 2am (Adelaide Time) to the end of the panel session at 4.30am, including the session presenting the three papers that got Best Paper awards.

The primary organiser, James Zou, opened the event by discussing the way the process worked and the statistics collected (and I captured most of the slides). The approach was quite rigorous, and some of the submissions were eliminated on technical grounds for not following the rules in some way.

The topics researched and written up were very broad, and expert reviewers were included for these disciplines. For example, a computational physicist was involved in the reviewing and baselining, and then, on the panel, offered comments and generalisations based on this process.

By far the majority of the papers related to computer and data sciences (227) and, in particular, Artificial Intelligence and Machine Learning (195), with Human-Computer Interaction a distant second in this CS/DS space (24).

Authors also had to disclose the degree and nature of AI involvement, as well as the limitations they encountered when using AI. This is far more useful than an outright ban on AI involvement in research, and indeed, the panel was positive about the value of AI in their individual areas of expertise, as a member of the team in some sense. The disclosure used a four-level scale, from A (strongly human, >95% human) and B (mostly human, >50% human) through to C (mostly AI, >50% AI) and D (strongly AI, >95% AI), applied to four phases of the research: hypothesis development, experimental design, data analysis and write-up. The almost-exclusively-AI proportion (D) increased steadily from less than a third of submissions for hypothesis development to over half for writing (and almost half for data analysis).

The culling of papers reduced 315 submissions to 254 on technical grounds (autodetection with human verification); AI review then narrowed these to 80 papers, and human review of those determined the final selection of 48 papers.

Interestingly, three different LLMs (GPT-5, Gemini 2.5 Pro and Claude Sonnet 4) were used and compared on their review scores, with the distributions varying quite markedly (recall that the review process is based on an average of scores, followed by a second phase of interpreting the scores and making decisions). Gemini in particular had a tendency to “grade inflation”, awarding the vast majority of papers 6/6 (strong accept).

One aspect of the AI-agent review process was automated reference verification, which is itself an important step forward. Interestingly, the fewer references cited, the lower the proportion that were verified: a kind of double jeopardy - those bad at referencing were bad both in terms of quantity and quality of references.

In terms of the three “best papers”, the first and the last were very specifically about the application of AI (to freelancer-client bidding/markets, and to reviewing and distinguishing real from faked claimed experimental results, respectively). The second related to the impact of reduced towing fees on vehicle redemption rates in San Francisco (as a function of socioeconomic status).

I found the first paper particularly interesting. Presented by Silvia Terragni, it had high involvement of two different AIs at each of the different stages of the project (which was completed in record time). The team also provided useful qualitative comments on the pros and cons, strengths and weaknesses (lessons learned). On the positive side, the AIs accelerated the process and contributed “economics” expertise the CS team lacked. On the other hand, it was hard to keep the AIs on track (they tended to “lose focus”), and it was difficult to modify/adapt the code they generated. This was truly an excellent paper and worthy of its “best paper” award - even before taking into account the high level of AI-involvement throughout the whole process."
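A side note on the grade-inflation issue Powers describes: when scores from differently calibrated reviewers (human or LLM) are simply averaged, a generous reviewer lifts every paper it touches. One standard remedy is to standardise each reviewer's scores before pooling. The Python sketch below is a minimal illustration of that idea; the reviewer names and scores are invented for the example, and nothing here describes the conference's actual aggregation method.

```python
import statistics

# Invented scores on a 1-6 review scale: three hypothetical LLM reviewers
# score the same six papers. "gemini" exhibits grade inflation.
raw_scores = {
    "gpt5":   [4, 5, 3, 4, 2, 5],
    "gemini": [6, 6, 6, 5, 6, 6],
    "claude": [3, 4, 4, 2, 3, 5],
}

def zscore(xs):
    """Standardise one reviewer's scores to mean 0, standard deviation 1."""
    mu, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mu) / sd for x in xs] if sd > 0 else [0.0 for _ in xs]

# Normalise each reviewer separately, then average per paper, so a
# systematically generous reviewer cannot drag every pooled score upward.
normalised = {name: zscore(xs) for name, xs in raw_scores.items()}
pooled = [statistics.mean(cols) for cols in zip(*normalised.values())]

for i, score in enumerate(pooled):
    print(f"paper {i}: pooled z-score {score:+.2f}")
```

On this toy data, the inflated reviewer's near-uniform top scores stop lifting every paper after standardisation; only its single below-ceiling score carries a strong (negative) signal.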

Comments provided before the conference:

"Agents4Science 2025 is an interesting experiment. The whole conference restricted to AI written papers, and reviewed by AIs.

Many authors are now routinely using AI to write or rewrite their papers, including finding missed references. Conversely, conferences and publishers are now exploring how AI can be used to referee papers - and establish which work is genuine and which is AI-hallucinated. I myself have found AIs have hallucinated several papers my colleagues and I might have (and possibly should have) written. In one case, this was in a grant application, and it turned out the applicant had asked the AI for further relevant papers from our group.

AI researchers are still trying to get a grip on this, and the Association for the Advancement of Artificial Intelligence (AAAI) this year introduced AI reviewing as a supplement to human reviewing (with authors seeing both anonymous human and AI reviews, as well as an AI-generated summary). AAAI-26 saw another massive increase in submissions, and this review system was tested in practice. But recognising AI-authored papers, distinguishing AI-hallucinated 'research' from real work, and assuring the ongoing quality of publication venues remain daunting challenges.

Agents4Science 2025 will provide an opportunity to see papers that are openly AI-written and openly AI-reviewed, and analyse this data to inform the community’s efforts to ensure research integrity and optimised processing in our new AI-driven age. This doesn’t mean just identifying AI-generated papers, but exploring the scope for active human-AI teaming in solving important research problems, and deploying AI help systems, advisors and chatbots. I’ll look forward to seeing the data.

The acceptance rate of ~16% (48 out of 300+) is comparable to many journals and lower than most conferences. This looks like an interesting and useful dataset for analysis to help us understand the use of AI in the research world."

Last updated: 24 Oct 2025 4:24pm
Declared conflicts of interest: None declared.

Professor Hussein Abbass is a researcher from the School of Engineering and Information Technology at UNSW-Canberra

"My 35 years of experience as an AI researcher taught me that AI does not qualify for academic authorship.

Academic papers are a unique form of publication due to expectations for innovation and discovery.

Authorship is a sacred section in academic publications.

We must pause and ask: what has changed to demand authorship for an AI?

Academic authorship has four corners: contribution, integrity, accountability, and consent. AI cannot be held accountable and does not have the will or agency for consent; current AI systems cannot guarantee integrity without human oversight. Simply put, authorship of academic papers is a human responsibility and is inappropriate for an AI.

AI has been making scientific discoveries since its inception.

Thanks to large language models, significant advances have been made that allow AI to partially or fully automate the scientific method in defined contexts, opening the possibility for AI to automatically generate academic papers.

Authorship is a different ball game! As an advocate for AI, and as an AI psychologist who designs and diagnoses AI cognition and behaviour, there is a sacred line I do not cross: the line that distinguishes humans from machines. Academic authorship is only meaningful for humans, not AI."

Last updated: 20 Oct 2025 1:02pm
Declared conflicts of interest: None declared.

Professor Paul Salmon is co-director of the Centre for Human Factors and Sociotechnical Systems at the University of the Sunshine Coast

"My initial reaction to the Agents4Science concept was one of horror. I find the idea of a research conference based on content generated and reviewed by AI to be quite dystopian (Bradbury's 'There Will Come Soft Rains' immediately sprang to mind, only this version has AI running future academic conferences totally devoid of humans; something like a 'dead conference theory').

I can see now, though, that there are wholesome intentions around understanding the capacity of AI to generate scientific insights, and I acknowledge that this is important to explore. In my opinion, the conference only further highlights society's ongoing failure to ensure that AI is safe, ethical, and beneficial. A broad spectrum of AI tools has been unleashed on society and is free to be used in all manner of ways. We do not have the necessary governance structures and risk controls in place, not in academia or in any other domain for that matter.

As a Human Factors scientist, I stand by the principle that technology should always be used to assist humans, never to replace them – hence I loathe the idea that one day AI will do scientific research for us. I am hoping, then, that the Agents4Science initiative comes to the right conclusion: that AI may provide a useful tool to assist human researchers in their work but, in its current form at least, is not capable of creating quality research outputs."

Last updated: 23 Oct 2025 12:16pm
Declared conflicts of interest: None declared.

Professor Karin Verspoor is Dean of the School of Computing Technologies at RMIT University

"How do we assure the quality and integrity of scientific research in a world where AI agents are both generating and vetting scientific outputs? Who should be the arbiter of scientific knowledge? For at least the last 40 years, peer review has been the gold standard for ensuring the quality of the scientific literature, and for engendering trust in science. Replacing human reviewers with automated ones seems hugely problematic, for at least two reasons: (1) current AI systems have known and inherently unavoidable biases baked into how they are built, and (2) the point of science is to push the envelope of knowledge forward, and it is precisely at the edge of our knowledge where these models are unreliable. This follows from the statistical framework that underpins the training of AI systems.

There has been increasing use of AI systems to judge the performance of other AI systems, most notably the paradigm of “LLM as a judge”, including extensions to the fascinating concept of a “Language Model Council” that may benefit from the same advantages as human group decision-making and oversight. This provides apparently easy automation of the complex and resource-intensive process of human evaluation. How well this can be automated, and what these systems struggle with, however, remains to be properly assessed. The organisers of Agents4Science describe the conference as an opportunity to explore such questions; I hope that they take seriously the need to rigorously investigate the capabilities and the limitations of this approach before we hand over the scientific process to machines."
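The "LLM as a judge" and "Language Model Council" ideas Verspoor refers to are, at their simplest, aggregation schemes over several model verdicts. The sketch below shows a minimal majority-vote council in Python; the function name, the accept/reject voting scheme and the tie-handling rule are assumptions for illustration, not a description of any published council design.

```python
from collections import Counter

def council_verdict(votes):
    """Aggregate accept/reject verdicts from several model 'judges' by
    simple majority; ties are escalated rather than decided automatically."""
    tally = Counter(votes)
    if tally["accept"] > tally["reject"]:
        return "accept"
    if tally["reject"] > tally["accept"]:
        return "reject"
    return "borderline"  # e.g. hand off to a human meta-reviewer

# Hypothetical example: three judges, two favour acceptance.
print(council_verdict(["accept", "reject", "accept"]))  # -> accept
```

Richer council designs add weighting, deliberation rounds or exchange of written rationales; how reliably any of this works at the frontier of knowledge is exactly the open question raised above.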

Last updated: 20 Oct 2025 1:09pm
Declared conflicts of interest: None declared.

Professor Albert Zomaya is the Peter Nicol Russell Chair of Computer Science in the Faculty of Engineering at the University of Sydney

"The emergence of conferences such as Agents4Science is a clear indicator of the scientific community facing head-on the promise and responsibility of AI.

Not only how AI can be used as a tool, but as how it could reimagine and reshape the scientific process: challenging us to rethink authorship, accountability, and creativity in research.”

Last updated: 20 Oct 2025 1:05pm
Declared conflicts of interest: None declared.

Professor Daswin De Silva is Deputy Director of the Centre for Data Analytics and Cognition (CDAC) at La Trobe University

"Even by the standards of a highly innovative university as Stanford, this conference where 'AI serves as both primary authors and reviewers' is poorly motivated. The responsible practice of AI is to refrain from attributing human characteristics to AI despite its capacity to generate intelligent and human-like output. The same applies here, conducting research and then presenting the findings at a research conference are deeply human activities of peer-reviewed research, knowledge gain, discussion, collaboration, and networking. By assigning and attributing these to AI agents, we are belittling and devaluing the purpose it serves. This type of “AI serves as both primary authors and reviewers” activity is much better suited for a simulation, experiment or demonstration  as part of a human-led conference rather than a conference by itself.

Despite recent achievements in academic research, the foundational limitations of AI (such as its lack of real-world experience and compositionality) heavily overshadow any research innovation that AI produces without human intervention. Just as most of these recent achievements were human-led or human-in-the-loop activities, it is critical that global research communities are unified in sustaining this message to the rest of the world. AI is too flawed to generate research output by itself, and even more flawed to conduct peer review on such output without any human intervention."

Last updated: 20 Oct 2025 12:59pm
Declared conflicts of interest: None declared.

Dr Raffaele Ciriello is a Senior Lecturer in Business Information Systems at the University of Sydney

"The idea of a research conference where both the authors and the reviewers are artificial intelligence systems is, at best, an amusing curiosity and, at worst, an unfunny parody of what science is meant to be. If the authors and reviewers are AI, then perhaps the conference attendees should be AI too, because no human should mistake this for scholarship.

Science is not a factory that converts data into conclusions. It is a collective human enterprise grounded in interpretation, judgment, and critique. Treating research as a mechanistic pipeline where hypotheses, experiments, and papers can be autonomously generated and evaluated by machines reduces science to empiricism on steroids. It presumes that the process of inquiry is irrelevant so long as the outputs appear statistically valid. But genuine scholarship is less about p-values than it is about conversation, controversy, and embodied knowing.

Equating AI agents with human scientists is a profound category error. Large language models do not think, discover, or know in any meaningful sense. They produce plausible sequences of words based on patterns in past data. Granting them authorship or reviewer status anthropomorphises what are essentially stochastic text-prediction machines. It confuses the illusion of reason with reason itself.

There is, of course, a legitimate discussion to be had about how AI tools can assist scientists in analysing data, visualising results, or improving reproducibility. But a conference built fully on AI-generated research reviewed by AI reviewers embodies a dangerous kind of technocratic self-parody. It reflects an ideology of techno-utilitarianism, in which efficiency and automation are celebrated even when they strip away the very human elements that make science legitimate.

So, to me, 'Agents4Science' is less a glimpse of the future than a satire of the present. A prime example of Poe’s law, where parody and extremism become indistinguishable. It reminds us that while AI can extend our capabilities, it cannot replace the intellectual labour through which knowledge becomes meaningful. Without humans, there is no science, just energy-intensive computation."

Last updated: 20 Oct 2025 12:58pm
Declared conflicts of interest: None declared.

Dr Jonathan Harlen is a Lecturer, Teaching Scholar, and Course Co-Ordinator within the Discipline of Law at Southern Cross University

"The upcoming Agents4Science conference raises interesting questions about the authorship and ownership of AI generated works, and the role played by copyright law in the protection and incentivization of human cultural endeavour.

Our current law, in both the US and Australia, does not recognise human authorship of AI-generated works, even when those works are the result of complex, highly specialised sets of human prompts. Should our law be adapted to extend to AI-generated works where those works (peer-reviewed scientific research papers are a great example) show a significant degree of human input and control?

Organisers of Agents4Science - mostly academics based at Stanford University (and still human, at least for now!) - have put out a call for papers featuring 'AI-generated computational works that advance scientific discovery'. The conference is wholly online, and free to attend. Submitted papers need to be 'primarily authored by AI systems', which are 'expected to lead the hypothesis generation, experimentation, and writing processes'. The AI must be listed as the sole first author of each paper. Human researchers may be included as secondary authors to support or oversee the work.

Let's be clear: excluding human authors in this fashion strips away the very essence of copyright, as currently understood in Australia and the US. This is now settled law, but it was not always so. A spate of cases in the early 2000s focused on authorship and ownership of an earlier generation of machine-generated works, including telephone directories (Desktop Marketing Systems Pty Ltd v Telstra Corporation Ltd [2002] FCAFC 112) and TV guides (IceTV v Nine Network Australia (2009) 239 CLR 458). Until the IceTV case reached the High Court in 2009, it was unclear in Australia whether or not human overseers could claim copyright in such works. The High Court unanimously and emphatically ruled that they could not.

In coming to this conclusion, the High Court emphasised two fundamental points about copyright. In order to attract protection:
1. the work must originate with an author (who must be human); and
2. the work must be the result of 'independent intellectual effort' and 'sufficient effort of a literary/artistic nature' to create the original expression which constitutes the work.

In 2010, in Primary Health Care Limited v Commissioner of Taxation [2010] FCA 419, the Federal Court of Australia put this more eloquently. The Court held that to attract copyright, a human-generated work must show 'a continuous narrative showing independent intellectual effort expended in expression'. This works very well as a litmus test across every field of human cultural endeavour. Had AI never been invented, copyright law based on this principle would probably have muddled along quite nicely.

AI upsets the applecart completely, because copyright, as currently defined in Australia and the US, cannot subsist in any of the products that AI generates. This is a big problem, because AI is arguably the single most powerful new tool for cultural creativity since the printing press. In its long history (dating back to the Statute of Anne in 1710), copyright law has adapted to many new technologies without sacrificing its 'inner core' of human originality and authorship. Forty years ago there were adaptations in response to the computer age; there is now a strong argument that copyright should adapt again, to vest copyright in certain classes of AI-generated works that result from complex and original human prompts, and which exhibit clear signs of significant overall human control.

Fortunately, we don't have to look far to see what this adapted copyright law might look like: the UK has already introduced it. Section 9(3) of the Copyright, Designs and Patents Act 1988 (UK) states that in the case of a work that is computer-generated, the author, for copyright purposes, 'shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken' (emphasis added).

This means that, in the UK, if you are a human working with AI, and you put in original prompts, choose the scope of your work, and make edits to various drafts along the way, you will be taken to be the author of the final AI output for copyright purposes – subject of course to the terms and conditions of the LLM you happen to be using; ChatGPT, for example, does generally allow users to own the outputs that result from their prompts.

What would this mean for the authors of the papers to be presented at Agents4Science? It would make a world of difference. It would mean that the investment of time, effort, and original thought which goes into creating each AI-generated scientific paper would be rewarded with ownership, and all of the protections that this entails. Absent this, these carefully curated works will become faceless cogs in the machine, and the sparks of human ingenuity and originality inherent in the creation of each such work will go unrecognised.

Currently, contributors to Agents4Science will not mind this: the conference is an experiment, and there is always value in being involved in something new. But this will pass. Scientists will soon find, as musicians and artists and writers have already found, that without proper recognition of the distinctly human vision inherent in the architecture of their works, they will be left high and dry. The AI tide will move on without them, leaving 'all the voyage of their life bound in shallows and in miseries'."

Last updated: 20 Oct 2025 12:56pm
Declared conflicts of interest: None declared.

Dr James S. Pearson is from the University of Amsterdam. His research collaborator Dr Marc Cheong is Deputy Director in the Centre for AI and Digital Ethics (CAIDE) at the University of Melbourne

"A conference showcasing AI-generated research papers and AI reviewers presents both opportunities and risks. Research shows that artificial intelligence can significantly enhance scientific discovery by, say, fostering serendipitous findings. For instance, AI has been used to map the structure of proteins and develop new antibiotics. It can also speed up aspects of research such as literature review. Many leading institutions, including the University of Oxford, explicitly permit the use of AI in research and outline ways it can enhance academic work.

However, there is also evidence that raises concerns about the accuracy and fairness of AI-generated research. AI systems have a well-documented tendency to “hallucinate” false information, which is particularly worrying when AI is used to review or evaluate research. Studies have also shown that AI can reproduce biases found within training data, potentially reinforcing inequalities related to ethnicity, gender, and sexuality. In some cases, AI’s creative capabilities have even been misused, including for the design of new chemical weapons.

As AI becomes increasingly integrated into research, this conference brings to light developments that are already well underway. It offers a valuable opportunity to assess the quality of AI-generated research and to discuss the need for clear institutional guidelines and regulation."

Last updated: 23 Oct 2025 12:14pm
Declared conflicts of interest: None declared.
Organisation/s: Australian Science Media Centre
Funder: N/A