ChatGPT's healthcare responses nearly indistinguishable from human providers', new study finds
An NYU study finds that ChatGPT's healthcare responses closely resemble those of human providers, making it a potential ally for clinicians, though caution and further research are needed before it is trusted with clinical roles.
A groundbreaking study conducted by the New York University (NYU) Tandon School of Engineering and Grossman School of Medicine reveals fascinating findings about the capabilities of artificial intelligence in healthcare.
According to the research, ChatGPT's responses to healthcare-related queries are nearly indistinguishable from those provided by human healthcare providers.
The research team, drawn from both schools, designed a study to evaluate how well ChatGPT performs when answering healthcare-related questions.
To achieve this, they presented a series of ten patient questions and responses to a diverse group of 392 participants aged 18 and above. Half of the responses were generated by human healthcare providers, while the other half were produced by ChatGPT, an AI language model developed by OpenAI.
Participants were asked to identify the source of each response and to rate their trust in ChatGPT's answers on a 5-point scale, ranging from completely untrustworthy to completely trustworthy.
The results were revealing: participants showed only a limited ability to distinguish chatbot-generated responses from those written by human healthcare providers.
On average, participants correctly identified chatbot responses 65.5 per cent of the time and human responses 65.1 per cent of the time. Accuracy varied by question, with correct identification rates ranging from 49.0 per cent to 85.7 per cent, indicating that some questions made it considerably harder to tell the two sources apart.
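For readers curious about the arithmetic behind these figures, the short Python sketch below shows how per-source and per-question identification rates could be computed from raw survey responses. The record layout and field names here are illustrative assumptions; the study's published findings do not describe its data format or analysis code.

```python
from collections import defaultdict

# Each record is one participant's guess about one response.
# The (question_id, true_source, guessed_source) layout is hypothetical.
responses = [
    (1, "chatbot", "chatbot"),
    (1, "human", "chatbot"),
    (2, "human", "human"),
    (2, "chatbot", "human"),
    # ... in the real study: 392 participants, ten questions each
]

def accuracy(records):
    """Fraction of records where the guess matches the true source."""
    correct = sum(1 for _, true, guess in records if true == guess)
    return correct / len(records) if records else 0.0

# Per-source accuracy (the study reports ~65.5% for chatbot answers
# and ~65.1% for human answers).
for source in ("chatbot", "human"):
    subset = [r for r in responses if r[1] == source]
    print(f"{source} responses identified correctly: {accuracy(subset):.1%}")

# Per-question accuracy, whose spread corresponds to the reported
# 49.0-85.7 per cent range across questions.
by_question = defaultdict(list)
for record in responses:
    by_question[record[0]].append(record)
for qid, records in sorted(by_question.items()):
    print(f"question {qid}: {accuracy(records):.1%}")
```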
The research team made a noteworthy observation: the ability to discern responses was consistent across all demographic categories of respondents. In other words, no group, whether defined by age, gender or background, was notably better or worse at telling ChatGPT's answers apart from a human provider's.
The study also examined how much trust patients place in chatbot-generated responses. Overall, participants exhibited mild trust in ChatGPT's answers, with an average score of 3.4 on the five-point scale, though trust varied significantly with the complexity of the healthcare task addressed in each question.
Logistical questions, such as scheduling appointments and insurance inquiries, received the highest trust rating with an average score of 3.94. It appears that patients feel comfortable relying on chatbots for handling administrative tasks that do not involve critical medical decisions.
Preventative care topics, including vaccines and cancer screenings, received a somewhat lower rating of 3.52. Patients showed reasonable confidence in chatbot responses on preventive measures, but were noticeably more cautious than they were with logistical questions.
In contrast, diagnostic and treatment advice garnered the lowest trust ratings, with scores of 2.90 and 2.89, respectively. This finding implies that patients are more inclined to trust human healthcare providers when it comes to critical medical decisions and treatments.
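To illustrate how such category-level scores are typically derived, here is a similarly hypothetical Python sketch that averages 5-point Likert ratings by task category. The sample ratings are invented for demonstration; only the reported category means come from the study.

```python
from statistics import mean

# Hypothetical 5-point trust ratings (1 = completely untrustworthy,
# 5 = completely trustworthy), grouped by each question's task category.
ratings = [
    ("logistical", 4), ("logistical", 4), ("logistical", 5),
    ("preventative", 4), ("preventative", 3),
    ("diagnostic", 3), ("diagnostic", 2),
    ("treatment", 3), ("treatment", 3),
]

# Group scores by category.
by_category = {}
for category, score in ratings:
    by_category.setdefault(category, []).append(score)

# The study reports means of 3.94 (logistical), 3.52 (preventative),
# 2.90 (diagnostic) and 2.89 (treatment) on this kind of scale.
for category, scores in by_category.items():
    print(f"{category}: mean trust {mean(scores):.2f} (n={len(scores)})")
```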
The study's findings have significant implications for patient-provider communication and healthcare delivery. The research team highlighted the potential of chatbots, like ChatGPT, in assisting healthcare providers with patient communication, particularly in administrative tasks and common chronic disease management.
Moreover, the study suggests that chatbots could play a valuable role in managing and monitoring chronic diseases. By providing patients with relevant information and reminders regarding medications, lifestyle changes and regular check-ups, chatbots can empower patients to take better control of their health, leading to improved health outcomes.
The researchers acknowledged that chatbots have inherent limitations and potential biases due to the underlying AI models. These limitations could result in inaccurate or misleading responses, especially in critical medical situations.
Thus, the study calls for further research and development to ensure the safe and effective implementation of chatbots in healthcare. Subsequent studies should investigate the performance of chatbots in handling more complex medical queries, their ability to adapt to individual patient needs and ways to mitigate biases and inaccuracies in their responses, the researchers noted.
As chatbots demonstrate their potential in assisting with administrative tasks and chronic disease management, the study prompts healthcare providers to consider incorporating these AI-driven solutions into their communication strategies.
However, the researchers stated that cautious adoption and continued research are essential to ensure that chatbots can truly enhance patient care and complement the expertise of human healthcare providers.