Currently, the replacement of healthcare professionals by artificial intelligence systems remains a remote possibility. However, this study has demonstrated that AI tools such as ChatGPT are worthy of investigation in the medical field.
Main findings
According to the study results, the percentage of ChatGPT’s correct treatment options (55.6%) was higher than that of the medical doctors (54.3%), and the percentage of ChatGPT’s incorrect treatment options (5.2%) was lower than that of the medical doctors (11%). As for the approximate cases, ChatGPT (24%) had a higher percentage than the medical doctors (17.1%). It is important to note that more cases were excluded from the medical doctors’ analysis (17.6%) than from ChatGPT’s (15.2%), which also contributed to the results (see Tables 2 and 3).
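As an illustrative check only, and assuming the four reported categories (correct, incorrect, approximate and excluded) are mutually exclusive and cover all cases, the percentages partition the full sample in each arm:

$$55.6\% + 5.2\% + 24.0\% + 15.2\% = 100\% \quad \text{(ChatGPT: correct + incorrect + approximate + excluded)}$$
$$54.3\% + 11.0\% + 17.1\% + 17.6\% = 100\% \quad \text{(medical doctors)}$$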
According to the analysis, when the ChatGPT and medical doctor treatment options did not coincide, the medical doctors were correct 9.9% of the time, while ChatGPT was correct 9.8% of the time. When the answers coincided, they were incorrect 3.8% of the time and correct 1.3% of the time. These results suggest that ChatGPT’s accuracy was almost the same as that of the physicians. Notably, a large proportion of cases (75.2%) was not used in this analysis: besides the previously excluded cases, the approximate cases were also not considered here, which explains the percentages reported above (see Table 4). In this context, the smaller sample size may introduce bias into this specific analysis.
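For orientation, and assuming these percentages are expressed relative to the full sample, only about a quarter of the cases entered this concordance analysis:

$$9.9\% + 9.8\% + 3.8\% + 1.3\% = 24.8\%, \qquad 24.8\% + 75.2\% = 100\%$$

Within this subset, ChatGPT would have been correct in roughly $(9.8\% + 1.3\%)/24.8\% \approx 44.8\%$ of cases and the medical doctors in $(9.9\% + 1.3\%)/24.8\% \approx 45.2\%$, consistent with the near-equivalence described above.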
It is also relevant to mention that the ChatGPT and medical doctor therapies may not coincide and yet both be correct according to the guidelines. This explains the discrepancy between the results in Tables 2, 3 and 4.
These data demonstrate a significant advancement in artificial intelligence, representing a crucial step forward in healthcare. These technological advancements could alter the paradigm of healthcare management, especially in primary healthcare, serving as an effective aid to medical practice at various levels.
Limitations
Although a thorough statistical analysis was conducted, incomplete clinical records can introduce bias into the analysis of diagnosis and treatment options. In other words, poor record-keeping can result in either underestimating or overestimating best practices.
One of the limitations of this study is the sample size. With a more diverse and broader sample, it would be possible to reduce analytical bias and draw conclusions with greater confidence in more specific analyses, such as the analysis presented in Table 4. Nevertheless, this study entailed meticulous work, particularly in the case-by-case verification of the therapeutic options suggested by ChatGPT and Dynamed. To minimize bias, this process was independently conducted by two authors, thereby contributing to the study’s adherence to best research practices.
Interpretation
This study aimed to evaluate the accuracy of an easy-to-use and accessible artificial intelligence platform, ChatGPT, to assess its usefulness in a primary healthcare context. Although the treatment options suggested by ChatGPT v3.5 were slightly more accurate than those of the medical doctors, these data should not be seen as a threat to medical professionals. Rather, they demonstrate ChatGPT’s capacity as an important auxiliary tool that can be integrated into medical practice. Medical practice is increasingly challenging due to the constant and rapid changes in medicine and the world in general. It is also crucial for healthcare professionals to understand and adapt to what will inevitably be the future of medicine, particularly in the field of primary healthcare.
Although artificial intelligence is increasingly being studied at various levels of medicine, there are still few studies in the current literature regarding the application of AI in primary healthcare, particularly in acute care consultations.
As previously described, this tool has the potential to enhance the quality of medical practice by increasing efficiency and improving therapeutic success rates. Furthermore, it can be used by patients in a complementary manner, helping them adhere to and comply with the therapeutic plan while also ensuring that the appropriate therapeutic approach is being followed.
Several studies across many different medical areas in recent years have shown the importance and enormous potential of AI. For instance, oncology has been one of the most analysed subjects in this context; one hepatocellular carcinoma (HCC) study demonstrated that AI served ‘to improve the accuracy of HCC risk prediction, detection and prediction of treatment response’ [16]. As another example, one study applying AI-based models to manage glioma demonstrated that they could ‘enhance clinical management of glioma through improving tumour segmentation, diagnosis, differentiation, grading, treatment, prediction of clinical outcomes’ [17].
Primary care settings have also been studied in this context. One study implemented an AI system to help primary care physicians screen for diabetes and diabetic retinopathy and to provide individualized diabetes management recommendations. The results showed that the average primary care physician’s accuracy was 81.0% unassisted and 92.3% when assisted by the AI system. Patients newly diagnosed with diabetes with the assistance of this AI system showed better self-management behaviours throughout follow-up and were more likely to adhere to referrals [18].
While there are extraordinary benefits, there are also disadvantages associated with this AI tool. One concern is the potential for patients to engage in self-treatment, essentially acting as their own doctors. This could lead to self-misdiagnosis, resulting in delayed diagnosis of serious or even life-threatening diseases and, consequently, higher morbidity rates. Conversely, patients may also incorrectly diagnose themselves with severe conditions, leading to heightened levels of anxiety.
As primary healthcare providers serve as the first point of medical contact, it is imperative for them to be adequately prepared to address these challenges in the near future. Furthermore, ‘responsible implementation, guided by transparent guidelines and patient-centric values, ensures that AI remains a tool that empowers clinicians while maintaining the fundamental human connection that defines family medicine’ [19]. There are several additional limitations that are important to address as well, such as bias and lack of personalization, in order to ensure responsible and effective implementation of AI in healthcare [20].
Moreover, comprehensive cybersecurity strategies and robust security measures must be adopted in order to protect patient data and critical healthcare operations. Guidelines for AI algorithms and their use in medical practice must also be established [20]. Trust-building and patient education are crucial for the successful integration of AI in healthcare practice. Overcoming challenges like data quality, privacy, bias, and the need for human expertise is essential for responsible and effective AI integration [20].
Artificial intelligence platforms such as ChatGPT have significant potential to be integrated into medical practice not only for their therapeutic efficiency but also to address and streamline the bureaucratic tasks that general family doctors take on daily. This integration would enhance the doctor–patient relationship and revolutionize primary healthcare.
This study does not advocate for the use of ChatGPT at the expense of the Dynamed platform. Rather, its primary aim is to critically examine an artificial intelligence platform that is widely utilized by both lay and professional users and to elucidate its potential for development and accuracy in the context of presenting medical treatments. Studies of this nature underscore the imperative and significance of rigorously investigating AI platforms and their impact on healthcare, especially in light of the considerable and ongoing integration of such technologies within medical practice.
Future research can build on this study by using a more robust sample to explore particular aspects of ChatGPT’s performance, including assessing its efficiency in specific medical fields or pathologies across different age groups. It could also be of great interest to examine the development of artificial intelligence software specifically designed for application within the context of primary healthcare. Furthermore, it would be highly valuable to study the platforms and sources accessed by ChatGPT to generate its therapeutic responses, as well as the accuracy of that information. Thus, this work paves the way for a deeper understanding of the ever-evolving world of artificial intelligence.