Challenges in student assessment in medical and health sciences education in Northern Iraq | BMC Medical Education
This study highlights significant challenges in assessment practices in Iraqi medical education, particularly in addressing underperformance and grading inconsistencies. It revealed high rates of grading leniency (67.3%) and ‘failure to fail’ (38.3%) among medical educators. Faculty members working in clinical settings and those with less teaching experience were more likely to pass students who should have failed or to award extra marks. The most frequently reported influences on assessment decisions were personal (emotional) considerations and learner-related factors.
Assessment challenges in Iraqi medical education
Approximately 32% of educators reported failure rates below 5%, while only 11.3% reported rates exceeding this threshold, substantially lower than the 10–35% failure rates documented in university courses from the US, UK, and India [14, 15, 16]. This suggests notable differences, but more direct comparisons with regional studies are needed to fully understand these variations. Such comparisons are challenging, however, because standardized course failure rates are difficult to obtain and most Middle Eastern studies use attrition or dropout rates as proxies, with attrition rates ranging from 3.8% in Saudi Arabia to 31.7% in the UAE [17]. Furthermore, 29% of the educators in our study were unaware of their students’ failure rates. These low rates and uncertainties may reflect the absence of standardized protocols and inconsistent record-keeping rather than actual student performance levels. Institutional culture further reinforces this, as 39% of educators saw failing a student as a last resort, while only 11% considered it a necessary part of education. Additionally, only 20% of educators reported feeling confident after failing a student, while 80% either avoided responding or selected “other feelings”, indicating a sense of unease or stigma surrounding their decision. When failure is seen as a negative outcome rather than a learning tool, faculty members may hesitate to fail students, contributing to lower failure rates [18].
The prevalence of FtF (38.3%) and GL (67.3%) in Iraq is notably higher than in the US, Europe, and Australia (12.5–17.7% FtF, 18–43% GL) [19, 20, 21, 22], likely due to less strict assessment policies and other contributing factors. However, comparable FtF rates have been reported in other regions, including Pakistan (58.75%) and Australia (37%) [23, 24], suggesting that these issues extend beyond Iraq. Nonetheless, research on FtF and GL remains scarce in the Middle East, making direct regional comparisons challenging. Karadağ (2021) found rising grade inflation in Turkish medical education, while Almakadma et al. (2023) reported that Saudi students link lenient grading to better faculty evaluations [25, 26]. Although these studies highlight assessment challenges, they do not directly examine FtF and GL, reinforcing the need for more region-specific research. Our study contributes by providing empirical data from Iraq, offering a foundation for addressing this research gap and facilitating cross-country comparisons.
Despite these challenges, faculty members reported high confidence (4.1/5) in their assessment abilities, even though only 28.3% had received formal training. While experience may foster confidence, training ensures evidence-based evaluation. Without it, educators may struggle to provide effective feedback or intervene appropriately when students underperform [27]. Beyond training gaps, unclear institutional guidelines (rated 3.17/5) further undermine consistency, leaving faculty members dependent on informal norms and personal judgment in their assessment decisions [1, 28]. These challenges are further exacerbated by limited institutional support: only 18.9% of educators sought assistance when faced with challenges in their assessment practices. This indicates either that support mechanisms are not readily accessible or that a cultural expectation of grading as an independent responsibility discourages help-seeking and reinforces GL practices [1].
Faculty characteristics as predictors of failure to fail and grading leniency
A key finding was that faculty members with primary clinical responsibilities were significantly more likely to pass failing students and award extra marks than those teaching in non-clinical settings. This may be due to the subjectivity of clinical assessments, which rely on direct observation, professional judgment, and interaction-based evaluations rather than standardized examinations with strict pass/fail thresholds. Additionally, clinical training often occurs in small-group settings, where assessors work closely with students over extended periods. In such environments, faculty may feel personally invested in a student’s progress, making failure a difficult and emotionally charged decision. Furthermore, failing a student in a clinical setting can carry greater professional consequences, affecting their placement in postgraduate training and future career prospects [29, 30, 31]. Conversely, faculty members with more years of teaching experience were less likely to engage in GL. Although reports conflict, experienced faculty members with greater familiarity with academic standards and assessment principles may be less susceptible to emotional pressures or external influences, reinforcing fairer and more consistent assessment practices [32, 33, 34, 35, 36].
Formal assessment training and faculty perceptions of institutional guidelines did not predict FtF and GL behaviors. This suggests that training programs may be inadequate or inconsistently applied and that guidelines alone may be ineffective without active enforcement and clear communication. It is also possible that training and guidelines alone cannot counteract deeply embedded cultural and experiential factors influencing FtF and GL behaviors. Additionally, the small sample size may have limited the ability to detect significant associations, warranting further research with a broader participant pool.
Factors contributing to grading leniency
This study highlights multiple interconnected factors contributing to GL in Iraqi medical education. Faculty decisions were influenced by emotional factors, including feelings of guilt, concerns about appearing uncaring, fear of conflict, and giving students the benefit of the doubt, consistent with prior studies on faculty reluctance to fail students [1, 12, 13]. Surprisingly, only 30% of faculty reported experiencing direct pressure from students’ parents or relatives. Given Iraq’s socioeconomic conditions, external influences were expected to play a more significant role. This result suggests that faculty members do not perceive widespread external coercion, although they may hesitate to acknowledge such pressures due to concerns about professional autonomy.
Assessors’ professional concerns also shape faculty decision-making [1], with 41% of educators fearing that failing students could create an uncomfortable learning environment. In contrast, avoiding potential exam appeals was not a major concern (22%), even though it has been reported as one in other studies [1, 23, 37]. This could indicate that formal appeal processes in Iraq are less frequently utilized and less rigorous than in other countries, or that there is a lack of awareness or apprehension about appeals rather than an absence of such pressures.
Learner-related considerations also influence GL [1]. Beyond concerns about students’ self-esteem (46%), career development (48%), and the absence of critical errors (47%), about 43% of educators graded leniently because they believed the student was aware of their weaknesses and making genuine efforts to improve. While student motivation and self-awareness are valuable, assessment decisions should be based on competency rather than effort or the nature of the error.
Unsatisfactory assessor development can also lead to GL [1]. Our study highlights a disconnect between educators’ perceived assessment capabilities (4.1/5) and actual grading practices. Despite limited training opportunities, only 16% acknowledged lacking the expertise to assess competencies accurately, and only 19% attributed student failure to their own teaching practices, suggesting some overconfidence among faculty members. Both overconfidence and lack of training can contribute to GL [1, 27, 28].
Institutional culture further shapes grading practices, with 32% of faculty agreeing that institutional norms discourage failing students. This reluctance may stem from a focus on institutional reputation or a tendency to avoid confrontation, reinforcing the idea that failure reflects institutional shortcomings rather than student underperformance. By discouraging failure, institutions effectively shield students from harsh but necessary consequences. Furthermore, 30% of faculty members expressed indifference because their failure decisions had been overturned, raising concerns about institutional power dynamics. When faculty judgments are routinely disregarded, their authority is undermined, which can reinforce GL practices [1, 38].
Local cultural and socioeconomic factors further influence assessment integrity. About 35% of educators reported passing students due to the challenging socioeconomic conditions and uncertain future in Iraq, reflecting a moral obligation to support students facing adversity. While well-intentioned, this approach risks prioritizing social considerations and the alleviation of socioeconomic distress over the competency of the future workforce. Additionally, 45% of faculty cited external disruptions such as the COVID-19 pandemic and frequent strikes as reasons for GL. In such situations, leniency could be perceived as a compensatory measure to make up for lost time [39, 40]. While this may be a compassionate response, it highlights a failure to adapt to unforeseen challenges and to implement alternative assessment and support strategies that maintain academic standards during disruptions.
Contrary to findings from other studies [1, 37], dissatisfaction with remediation options (17%), timing (12%), academic support (18%), and non-academic support (19%), as well as the time and effort required for remediation (16%), were not major drivers of GL in this study. While this may seem positive, it raises concerns about faculty members’ awareness of the importance of structured remediation or their underestimation of the effort needed to support struggling students.
International perspectives on assessment integrity in medical education
The findings of this study align with global concerns regarding FtF and GL in medical education. Competency-Based Medical Education (CBME) has been widely adopted to establish clear performance benchmarks and ensure that student assessment is based on demonstrated competencies rather than subjective grading [41]. However, even within CBME frameworks, faculty members often struggle with grading consistency, particularly in clinical settings where assessments rely on professional judgment [42]. To minimize subjectivity in student assessment, regulatory bodies such as the General Medical Council (GMC) in the United Kingdom and the Accreditation Council for Graduate Medical Education (ACGME) in the United States have instituted standardized grading rubrics, structured remediation pathways, and Entrustable Professional Activities [43, 44]. However, Middle Eastern and South Asian countries largely lack such components in their educational systems [24, 45]. Placing our findings within this global context makes it evident that addressing FtF and GL practices in Iraqi medical education will require a dedicated regulatory body to modernize the system to standards comparable to those of developed countries and to oversee the implementation of the abovementioned components.
Recommendations
Addressing GL and FtF in Iraqi medical education requires a comprehensive approach that integrates institutional reforms, faculty development, and cultural shifts to enhance assessment integrity and establish competency-based evaluations [46, 47].
First, a nationwide framework for student assessment encompassing standardized rubrics and competency-based grading criteria must be created to minimize subjective decision-making and harmonize grading procedures across institutions. Additionally, transparency and accountability could be improved by introducing digital grading systems and disclosing course pass/fail rates as part of accreditation requirements.
Second, priority should be given to faculty training in student assessment, with an emphasis on competency-based assessment and effective remediation strategies. Since only 28.3% of faculty have received formal training in assessment, this lack of preparation may contribute to grading inconsistencies and leniency. Addressing this gap through targeted training programs could improve assessment practices. Furthermore, more faculty members might participate if some of these programs were made available online or in a hybrid format and linked to their academic or professional advancement.
Third, more robust institutional policies are needed to ensure consistency and fairness in student assessments. Employing peer-reviewed grading schemes, where multiple educators evaluate high-stakes exams, might improve reliability and lessen individual biases. Additionally, institutions must establish clear remediation pathways which should involve faculty-supervised interventions and tailored learning plans to support struggling students before they reach the point of failure.
Fourth, the cultural perception of failure needs to change as well. Institutions should recognize failure as an essential part of education and professional growth rather than seeing it as a reflection of institutional shortcomings. Public awareness campaigns and workshops could aid in changing this mindset by stressing that fair assessment practices will ultimately promote both student growth and patient safety.
Finally, greater external oversight and accreditation mechanisms should be introduced. Independent review boards could oversee grading integrity to ensure compliance with standardized assessment policies. Iraq’s medical education system would gain even more credibility if national accreditation criteria were in line with those of international organizations such as the World Federation for Medical Education (WFME) and the Educational Commission for Foreign Medical Graduates (ECFMG).
While the above recommendations provide a comprehensive framework for reform, their immediate implementation may be hampered by institutional resistance, cultural obstacles, and resource limitations. A stepwise approach should be taken to ensure sustainable progress. Short-term efforts could focus on implementing transparent grading criteria and faculty training workshops. The creation of national assessment guidelines and structured remediation programs could be the main medium-term priorities. For long-term reforms to be effective, priority should be given to accreditation standards and to embedding training on assessment into educators’ academic advancement criteria. Additionally, prior to wider adoption, a trial phase could evaluate the efficacy of these reforms in selected universities. Furthermore, efforts to mitigate grade inflation by reputable institutions such as Princeton University, as well as guidance from other nations and their regulating agencies (e.g., the GMC), offer valuable models that can be adapted for the Iraqi medical education system [44, 48, 49, 50].
Limitations
This study has several limitations. First, while the sample size is statistically sufficient, it remains small relative to Iraq’s broader medical educator population, affecting generalizability. The absence of a centralized database makes it difficult to determine the total number of eligible educators.
Second, the study used purposive and snowball sampling, which may introduce selection bias. Respondents might have stronger opinions on GL and FtF than non-respondents, potentially skewing prevalence estimates and limiting generalizability. However, given the absence of a centralized faculty database in Iraq, this approach was necessary to maximize participant reach. Randomized or stratified sampling in future research would improve representativeness across institutions and disciplines.
Third, participants were drawn from multiple programs rather than a single course or specialty. While this broadens perspectives, it may limit the depth of representation within specific disciplines. Future studies should adopt a course-specific approach to better examine how FtF and GL manifest in different medical subfields.
Fourth, the study lacks student outcome data, such as academic performance and progression. Without this, correlating educators’ grading decisions with student competency is challenging. Future research should include student performance metrics and post-graduation assessments to provide a more comprehensive evaluation of assessment integrity.
Fifth, quantitative approaches are useful for identifying trends but cannot capture the full depth of educators’ experiences, reasons, or challenges within their institutions. Qualitative methods, such as interviews and focus groups, or a mixed-methods approach are recommended for future studies.
Finally, self-reported survey data may suffer from social desirability bias as many respondents could provide answers that would be deemed socially acceptable. This could lead to an underestimation or overestimation of the true prevalence of FtF and GL. Future studies should complement survey results with official assessment records or direct observations for greater reliability.
Nevertheless, our study provides preliminary yet valuable insights into an underexplored topic in Iraqi medical education. Our findings demonstrate the urgent need for reforms and may form a foundation for future studies.
