Bridging the mentorship divide: how large language models could reshape medical workforce equity
