Developing a Canadian artificial intelligence medical curriculum using a Delphi study
Overall, 77% (n = 82) of the proposed AI curricular elements were deemed important for medical students to know in order to use AI proficiently, and 77% (n = 63) of these included elements reached consensus in the first round. Thematically, non-technical elements quickly achieved consensus for inclusion in the first round, with unanimous agreement for all elements in ethics (11/11), communication (7/7), collaboration (7/7), and quality improvement (6/6). This highlights the necessity for future physicians to understand AI in a capacity that allows them to engage with it safely, improve care for patients, and ensure transparency in the care they provide. Furthermore, these broader themes already exist and are taught in Canadian UGME, representing an avenue for integration rather than curricular replacement.
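To make the consensus tallying concrete, the following is a minimal sketch in Python of how round-one ratings could be classified into included, excluded, and undecided elements per theme. The 75% agreement threshold, the five-point rating scale, and the cut-offs of ≥4 for agreement and ≤2 for disagreement are illustrative assumptions, not the study's confirmed criteria (which are defined in its Methods).

```python
from collections import defaultdict

THRESHOLD = 0.75  # assumed consensus threshold (illustrative)

def classify(ratings, threshold=THRESHOLD):
    """Classify one curricular element from expert ratings on an assumed 1-5 scale."""
    agree = sum(r >= 4 for r in ratings) / len(ratings)     # rated as important
    disagree = sum(r <= 2 for r in ratings) / len(ratings)  # rated as unimportant
    if agree >= threshold:
        return "included"
    if disagree >= threshold:
        return "excluded"
    return "undecided"

def tally_by_theme(ratings_by_element):
    """ratings_by_element: dict mapping (theme, element_id) -> list of ratings."""
    counts = defaultdict(lambda: defaultdict(int))
    for (theme, _element_id), ratings in ratings_by_element.items():
        counts[theme][classify(ratings)] += 1
    return counts

# Hypothetical example: an ethics element rated highly by 7 of 8 experts is "included".
example = {("ethics", "E1"): [5, 5, 4, 4, 5, 4, 3, 5]}
print(dict(tally_by_theme(example)))
```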
The technical themes of theory and application were less decisively included, with only 21/36 and 3/26 elements selected, respectively. The included elements focused on the validation of AI and its strengths and limitations, likely guiding future physicians toward the proper and judicious use of AI. One expert emphasized the importance of medical students understanding the limitations of quantitative data, warning that “the high volumetric quantitative data should not be used to devalue the qualitative data, such as doctor–patient communication and relationship.” In a similar vein, training should include how to critically appraise AI models for appropriate use in clinical scenarios, akin to evaluating randomized controlled trials.
Regarding the application theme, which had the lowest number of included elements in the first round, we postulate that this reflects the complexity and technicality of these elements relative to the knowledge physicians use daily. One expert emphasized this point, highlighting that the role of the medical student is the delivery of medical knowledge, not programming. Another expert concurred, adding that clinicians should not “be responsible for data collection, cleaning, pre-processing, and the AI model training. These responsibilities deviate from the clinicians’ responsibility of caring for patients.” Programming and deep learning skills suit engineers, while physicians should validate AI and interpret its output. There will likely be a need for certain physicians to take on a larger role with respect to AI innovation and integration, but the vast majority will be using AI in their everyday practice8,18. This may explain the exclusion of specific data science techniques and the undecided status of legal elements related to intellectual property.
With respect to our analysis of differences in ratings between expert groups (M.D.s versus Ph.D.s), there was no overall difference in rating based on academic background, with all but three comparisons not reaching statistical significance. This similarity may be attributed to broad consensus on which core elements are important, underlining the complementary expertise of both groups. Additionally, by selecting Ph.D. researchers with exposure to the medical field, we ensured overlapping yet distinct perspectives. This result also points to the importance of opinions from both practicing M.D.s and Ph.D. researchers, suggesting that their combined insights can lead to a more comprehensive and balanced curriculum.
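The element-level comparison described above could be run, for instance, as a series of nonparametric tests on the two groups' ratings. The sketch below assumes a Mann-Whitney U test with a Bonferroni correction; both are illustrative choices rather than the study's confirmed analysis, and the data structures are hypothetical.

```python
from scipy.stats import mannwhitneyu

def compare_groups(md_ratings, phd_ratings, alpha=0.05):
    """Per-element comparison of M.D. versus Ph.D. ratings.

    md_ratings / phd_ratings: dicts mapping element_id -> list of ratings.
    """
    n_tests = len(md_ratings)
    results = {}
    for element, md in md_ratings.items():
        stat, p = mannwhitneyu(md, phd_ratings[element], alternative="two-sided")
        # Bonferroni adjustment across all element-level tests
        results[element] = {"U": stat, "p": p, "significant": p < alpha / n_tests}
    return results
```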
The leave-one-institution-out analysis showed a difference in the numbers of included, excluded, and undecided elements when UBC was included versus excluded, highlighting the effect of an increased number of experts from one institution. However, due to the limited sample size, all experts were retained, which may have affected the generalizability of our curricular elements. This is discussed further in the limitations below.
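A leave-one-institution-out sensitivity analysis of this kind can be sketched as follows: recompute consensus with each institution's experts removed and compare the resulting tallies against the full panel. The threshold and data shapes below are illustrative assumptions carried over from the earlier sketch, not the study's actual code.

```python
from collections import Counter

THRESHOLD = 0.75  # assumed consensus threshold, as in the earlier sketch

def classify(ratings, threshold=THRESHOLD):
    agree = sum(r >= 4 for r in ratings) / len(ratings)
    if agree >= threshold:
        return "included"
    if sum(r <= 2 for r in ratings) / len(ratings) >= threshold:
        return "excluded"
    return "undecided"

def leave_one_institution_out(ratings_by_expert, institution_of):
    """ratings_by_expert: dict expert_id -> {element_id: rating};
    institution_of: dict expert_id -> institution name."""
    summaries = {}
    for left_out in set(institution_of.values()):
        kept = [e for e in ratings_by_expert if institution_of[e] != left_out]
        elements = {el for e in kept for el in ratings_by_expert[e]}
        counts = Counter()
        for el in elements:
            ratings = [ratings_by_expert[e][el] for e in kept
                       if el in ratings_by_expert[e]]
            counts[classify(ratings)] += 1
        summaries[left_out] = counts  # compare each tally against the full panel
    return summaries
```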
Although there are no formal existing AI curricula for UGME, there have been efforts to supplement AI education for medical students and residents outside of the curriculum. Lindqwister et al. presented an AI curriculum for radiology residents with didactic sessions and journal clubs, aligning with our elements on AI strengths and limitations (T9) and regulatory issues (E1) to ensure a balanced technical and ethical education19. Hu et al. implemented an AI training curriculum for Canadian medical undergraduates focusing on workshops and project feedback, aligning with our inclusion of elements such as applying AI models to clinical decision-making (A2) and developing strategies to mitigate biases (E8)9. Krive et al. created a modular 4-week AI elective for fourth-year medical students, primarily delivered online, aligning with our elements on critical appraisal of AI research/technology (A7, A13), clinical interpretation of results from AI tools (A11), developing strategies to mitigate bias (E8), and communicating results to patients (COM3)20. As such, our study builds a framework for medical educators and future research. The UGME curriculum prepares students for generalist practice, covering physiology, anatomy, pathology, diagnostics, therapeutics, clinical decision-making, consultations, and counseling, leaving little room for a drastic overhaul. The University of Toronto’s UGME introduces fundamental AI concepts, discussing machine learning, AI’s role in healthcare, potential applications, and ethical challenges, showing how AI education can be integrated into UGME with an emphasis on core AI literacy and relevance to medicine. Our findings provide a nuanced, expert-endorsed view of which elements should and should not be taught.
The Delphi method, which relies on expert opinion, provided a robust and iterative framework that allowed us to tailor the curriculum to these specific needs, ensuring it is both comprehensive and practical for medical students17. We also based our approach on similar studies that have successfully used expert opinions to create or update curricula in medicine for different subject areas, leveraging their structures to ensure our process was thorough21,22,23,24,25.
In examining the UGME structure, there are several ways to include AI education without significantly impacting the existing curriculum. One approach is to incorporate AI literature into the current biostatistics curriculum, ensuring that students learn to critically appraise and validate AI literature and tools. This integration would also expose students to AI topics and new technologies. Additionally, incorporating AI into facilitator-led case-based learning (CBL) and problem-based learning (PBL) sessions would allow students to explore various AI tools and their impacts26,27,28,29,30. These sessions could also provide opportunities to discuss AI ethics topics, such as AI scribes, AI in clinical decision-making, AI policy, and novel AI research. For example, the framework for responsible healthcare machine learning could be discussed in these small groups to explore a simulated process from problem formulation to envisioned deployment14. Furthermore, providing hands-on sessions with AI tools currently used in the medical field during clinical rotations, such as point-of-care ultrasound guidance using AI31 or digital scribes for documentation32, can help students improve their technical skills and understand the benefits and risks of these tools. Inviting guest lecturers involved in AI and medicine to discuss the salient principles of AI that medical students need to know and current research in AI and medicine would further enrich their learning experience. Introducing annual modules on AI ethics or baseline knowledge, similar to those required for other rotations, would ensure that students remain up-to-date with the evolving field of AI. Encouraging students to engage in at least one AI-related research project during their 4 years of medical school would deepen their understanding of the subject matter. Additionally, it is important to acknowledge that the integration of an AI curriculum should be flexible and may need to be adapted to fit the specific educational frameworks and resources available at different institutions.
Each included element has been mapped to the AFMC’s EPAs and CanMEDS roles to underscore the importance of medical AI education. The elements span nearly all EPAs and CanMEDS roles, demonstrating that AI knowledge meets several exit competencies and could reasonably be justified for integration. Alternatively, as existing competencies are updated, specific competencies reflecting select included elements could be added. Endorsement of AI education by a national governing body, supported by a standardized AI curriculum, would encourage medical schools across Canada to integrate AI education into UGME curricula, enhancing future healthcare practitioners’ knowledge. As an initial set of suggestions, we mapped each learning objective to potential implementation strategies in Supplementary Table 2.
Our study faced several limitations. Selection bias was a concern with non-probability purposive sampling; despite our efforts to include a diverse and representative group of 106 individuals from across Canada, non-participation could still introduce bias. We recognize that this method can lead to systematic bias and possible over-representation of certain geographic regions or institutions with more established AI programs, skewing the curricular elements toward their perspectives. This was seen in our study, with the largest number of experts being from UBC (likely because the study was conducted at UBC) and lower response rates from the Atlantic region, Quebec, and the Prairies. The number of experts from UBC significantly influenced the number of included, excluded, and undecided elements throughout the rounds. These limitations could have been addressed by using random sampling techniques to ensure a more representative sample, and by expanding the pool of experts to a broader range of geographic locations, both nationally and internationally, and to institutions without medical schools but with other affiliated health-related programs. A larger sample size would also allow a better investigation into the heterogeneity of responses across centers, as seen in Supplementary Table 1. The small sample size of 18 respondents means that perhaps not all desired perspectives were included. The reasons for dropout at each stage were not specifically elicited and may include time constraints, lack of engagement, competing priorities, or insufficient interest. Our expert inclusion criteria were restricted to M.D.s and Ph.D.s, which may have further restricted the perspectives considered. Broadening the inclusion criteria to include industry and non-university-affiliated experts with relevant AI and medical education expertise could help mitigate this.