Assessing the Impact of Educator Professional Development Programs


Introduction

The Illinois Mathematics and Science Academy (IMSA) is a state-funded learning enterprise dedicated to the transformation of mathematics and science teaching and learning. Such transformation is conducted through two distinct, albeit very interrelated, sets of programs. IMSA's residential high school (grades 10-12) serves gifted Illinois students talented in mathematics and science. The externally focused professional development and student enrichment programs, also targeted toward mathematics and science, serve educational systems, schools, teachers and students in Illinois and beyond.

As a provider of professional development experiences, we want to know the impact of our efforts, but are challenged by the need to identify what we want to measure and the subsequent need to select one or more appropriate assessments from the broad array of possible choices. Such decisions additionally must consider that some measures of impact are more authentic than others, but often at the expense of requiring greater time and other resources. We find ourselves asking "Do we simply count participants; survey their satisfaction levels; test them on what they should have learned; or examine their students to see what effect they've had?"

Kirkpatrick (1994) developed an evaluation framework of four levels (measuring reaction, learning, behavior, and results) to help guide assessment decisions. Although it was designed primarily for a business environment, it often has been adapted for educator professional development. Subsequent to the development of the model in this paper, Guskey (2000) published one that focused on education in his book, Evaluating Professional Development. His model's somewhat different approach to categorizing assessments provides a more theory-based complement to the one in this paper, which is designed primarily as a practitioner's toolkit.

The model below is based upon three areas of assessment content focus:
  • participant counts,
  • satisfaction/engagement indicators, and
  • measures of changes in knowledge, skills, and attitudes.
These components are integrated with the following additional assessment considerations:
  • objectivity/professional perspective - is the measurement conducted through self-assessment by program participants, or through assessments conducted by other professionals, using their expertise and/or professionally developed instruments;
  • place and time frame - is the focus on the measurement of learning within the short-term initial professional development experience or on longer-term examinations of the applications of that learning within the teacher's school; and
  • unit of analysis - are we measuring educators, institutions, and/or students?
The resulting matrix identifies a hierarchy of seven levels of assessment, with an array of specific possible measures for each level. Authenticity generally increases with the level number, with Level 1 being the least authentic and Level 7 the most. The accompanying text further details the matrix and its applications.




Assessment Authenticity Levels for Educator Professional Development Programs

Level | Assessment Content Focus | Unit of Analysis | Measures | Minimum Program Length

Assessment Focus: Professional Development Experience
1 | Initial Participation Levels | Institution, Educator, Student* | enrollment counts | none
2 | Engagement/Satisfaction | Educator, Student* | attendance throughout program, satisfaction surveys, program-related website hits, subsequent program enrollment counts | none
3 | Self-Assessment of Skills / Knowledge / Attitude and Their Application | Educator, Student* | surveys, interviews, journals | up to full day
4 | Skills / Knowledge / Attitude and Their Application | Educator, Student* | tests, other rating instruments, observations, interviews, journals, products | up to full day

Assessment Focus: Application Within the Educator's Home School
5 | Self-Assessment of Skills / Knowledge / Attitude and Their Application | Institution, Educator, Student** | surveys, interviews, journals | week / multi-day over semester
6 | Skills / Knowledge / Attitude and Their Application | Institution, Educator | surveys, tests, other rating instruments, observations, interviews, journals, products | week / multi-day over semester
7 | Skills / Knowledge / Attitude | Student** | surveys, tests, other rating instruments, observations, interviews, journals, products | multi-day over year

* - students participating in the professional development experience
** - students at the educator's home school
© Copyright 1999, 2001, 2001 Illinois Mathematics and Science Academy. All rights reserved.



Description of the Rows in the Model

The chart groups the first four rows into one cluster and the last three into another. This grouping denotes whether the focus of the assessment is on the professional development program experience (i.e., assessments administered during the experience) or on the subsequent application of what was learned in that experience within the educator's home school (i.e., the school in which the educator is employed). The latter cluster has greater inherent potential for authenticity, inasmuch as change at the educator's home school is the ultimate goal of any initiative.

Level 1 Counting the number of program enrollees and/or participants is the most basic measure of impact. Such counts may be an indicator of initial interest in a program and/or the program provider, as well as of the quality of the program's promotional efforts. This level also can include an analysis of target population subgroup counts (for example, "Did we draw a sufficient percentage of rural educators?").

Counts also should be considered in terms of potential impact in the home school. One perspective is the level of participation for any given institution or program, since institutional support is a critical element in changing teaching and learning. For example, there will be greater potential for long-term impact if the participants include two of the three teachers in an academic department rather than one of a department of ten. The second perspective is to consider the numbers of students that potentially will be reached in the educator's home school (extrapolated from educator counts).
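Both perspectives reduce to simple arithmetic. The sketch below is a minimal illustration, assuming hypothetical enrollment records that pair each participant with a department, plus an estimated number of students each participant teaches; the function names and figures are illustrative, not IMSA data.

```python
from collections import defaultdict

def department_participation(participants, department_sizes):
    """Share of each department's teachers who took part in the program.

    participants: list of (teacher_id, department_id) pairs (hypothetical records).
    department_sizes: dict mapping department_id -> number of teachers.
    """
    attended = defaultdict(set)
    for teacher_id, dept in participants:
        attended[dept].add(teacher_id)  # a set avoids double-counting repeat enrollees
    return {dept: len(ids) / department_sizes[dept] for dept, ids in attended.items()}

def students_potentially_reached(students_per_teacher):
    """Extrapolate home school students reached from educator counts.

    students_per_teacher: estimated student load for each participating educator.
    """
    return sum(students_per_teacher)

# The example from the text: two of three teachers versus one of ten.
records = [("t1", "deptA"), ("t2", "deptA"), ("t3", "deptB")]
shares = department_participation(records, {"deptA": 3, "deptB": 10})
print({d: round(s, 2) for d, s in shares.items()})  # {'deptA': 0.67, 'deptB': 0.1}
print(students_potentially_reached([120, 95, 140]))  # 355
```

The departmental share makes the "two of three versus one of ten" contrast explicit, while the student total is only an upper-bound estimate of reach, not evidence of impact.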

If students are provided an educational experience as part of the educator's professional development experience (for example, the offering of a summer science camp experience as a means of training teachers), they too can be counted. However, their relatively small numbers and the short-term nature of their experience provide minimal potential for impact in comparison to the teacher's potential for years of long-term relationships with home school students.

Level 2 Program participant satisfaction assessments frequently are utilized by professional development efforts (the "Did you like the donuts?" assessment). The usual means of measurement is the distribution of a satisfaction survey with items like "The program leader demonstrated enthusiasm for the subject matter" or "The content was relevant to my work."

Counts also can be used as a satisfaction measure, thus elevating them from Level 1 to Level 2 status, by using them at two or more points in an experience. For example, contrasting the count of program participants at the beginning of a day-long conference to one at the end may be a useful indicator of satisfaction, since early departures may indicate a lack of engagement. When appropriate, a potentially more powerful measure of satisfaction and/or engagement is whether participants partake in related programs offered at a later date, or subsequently utilize related program resources such as a website.
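The two count-based engagement indicators described above can be computed directly. This is a minimal sketch, assuming attendance is recorded as head counts at the start and end of a session and that later enrollments are available as lists of hypothetical participant IDs.

```python
def retention_rate(count_at_start, count_at_end):
    """Fraction of participants still present at the end of a session.

    A low value may signal early departures and, by extension, weak engagement.
    """
    if count_at_start <= 0:
        raise ValueError("count_at_start must be positive")
    return count_at_end / count_at_start

def follow_up_rate(original_participants, later_enrollees):
    """Fraction of original participants who enrolled in a related later program."""
    original = set(original_participants)
    if not original:
        raise ValueError("no original participants")
    return len(original & set(later_enrollees)) / len(original)

# Hypothetical day-long conference: 180 at the opening session, 135 at the close.
print(retention_rate(180, 135))  # 0.75
# Hypothetical follow-up: 2 of 4 original participants enroll in a related workshop.
print(follow_up_rate(["a", "b", "c", "d"], ["b", "d", "x"]))  # 0.5
```

Only enrollees who also attended the original program count toward the follow-up rate, so newcomers (such as "x" above) do not inflate it.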

Level 2 assessments focused upon student program participants can serve both as short-term customer satisfaction indicators and as pilot tests of home school student reactions.

Level 3 Self-assessments of knowledge, skill, and attitude levels are distinctly different from Levels 1 and 2 in that they specifically target a participant's learning during the professional development experience. They also can solicit predictions of whether such learning was sufficient to enable participants to apply it in their home schools. For example, at the end of a workshop, participants can be asked to rate how much more confident they are in their ability to incorporate a TI-83 calculator into their teaching. Such assessments can be conducted at both the beginning and the end of the experience to gauge growth. Additionally, self-assessment items can easily be incorporated into a survey alongside Level 2 satisfaction items.
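A beginning-and-end self-assessment yields paired ratings, and growth is the average change for participants who answered both times. The sketch below assumes a hypothetical 1-5 confidence scale and participant IDs; it measures self-perceived growth only, which is exactly what keeps it at Level 3.

```python
from statistics import mean

def mean_self_rated_gain(pre, post):
    """Average change in self-ratings for participants rated at both points.

    pre, post: dicts mapping a participant id to a rating on an assumed
    1-5 confidence scale (the scale is an illustrative assumption).
    """
    matched = pre.keys() & post.keys()  # only participants who answered both surveys
    if not matched:
        raise ValueError("no participants with both ratings")
    return mean(post[p] - pre[p] for p in matched)

pre = {"ann": 2, "ben": 3, "cal": 1}
post = {"ann": 4, "ben": 4, "dee": 5}  # "dee" missed the opening survey
print(mean_self_rated_gain(pre, post))  # 1.5
```

Matching on participant IDs avoids comparing two different groups, as would happen if late arrivals or early departures were averaged in.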

Level 3 assessments focused upon student program participants can serve both as short-term measures of learning during the student enrichment experience and as pilot tests of potential home school student learning.

Level 4 In contrast to Level 3, where the participant assesses her/his own growth, Level 4 assessments are conducted through the generally more time-consuming use of instruments that test such levels, or of professionals trained to observe, interview, or evaluate products developed as part of the experience. Using the Level 3 calculator example, teachers could be tested on their ability to use the calculator's functions, or graded on a lesson plan they write that incorporates it. Additionally, a Level 3 item can be transformed into a Level 4 item with relatively little effort by requesting information that supports the self-assessed rating, such as a sufficiently detailed example of knowledge/skill acquisition.

Level 4 assessments focused upon student program participants again can serve both as short-term measures of learning during the student enrichment experience and as pilot tests of potential home school student learning.

Level 5 This level is similar to Level 3 in its use of self-assessments, but is considered more authentic because such assessments are focused on actual experiences within the home school subsequent to the professional development experience. Participants often leave professional development experiences with high levels of enthusiasm and motivation. However, the passage of time and the real-world complications of incorporating that which was learned within the home school environment may alter the participant's perspectives significantly.

For changes targeted at the program, department, or school level, Level 5 assessments also can be used with educators not involved in the original professional development experience. Inasmuch as institutional support is vital for sustaining changes in pedagogy and curriculum, such broader-based assessments can be more meaningful than those focused on the work of one or two teachers. Also, while such home school follow-up probably is not worth the effort for student program participants, students at the educator's home school can conduct self-assessments of their learning focused on classroom experiences or other larger-scale changes.

Level 6/7 These levels are the educator's home school versions of Level 4, just as Level 5 is a home school version of Level 3. Instruments can be used to see the extent to which knowledge, skills, and attitudes have been retained over time. More importantly, observations of classroom teaching and reviews of curricular documents for implemented programs can be conducted. As with Level 5, Level 6 can also include broader institutional assessments (e.g., the review of a school schedule revised to accommodate interdisciplinary classes).

When Level 4-type measures are administered during a longer-term professional development experience, one which eventually focuses on a period of application within a home school setting, they would be classified as Level 6. As with the relationship between Levels 3 and 4, adding appropriate supporting examples to Level 5 assessments can elevate them to Level 6 status.

Level 7 has been separated from, and elevated above, Level 6 because of its focus on changes in student learning within the home school environment, which generally is the ultimate goal of educator professional development programs. Changes in teaching and institutional structure are of limited meaning if they do not produce a corresponding change in student learning. Hence the Level 7 focus on measures of student learning should be considered potentially more authentic than the Level 6 focus on measures of educators and their institutions.
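The row rules above can be summarized as a small decision procedure. The sketch below is one possible encoding, not a scoring rule published with the model; it assumes an assessment can be described by four hypothetical attributes (content, setting, assessor, unit) plus a flag for supporting evidence, and the string values are illustrative. Gray-area cases still call for judgment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assessment:
    content: str    # "counts", "engagement", or "learning" (skills/knowledge/attitude)
    setting: str    # "program" (during the experience) or "home_school" (later application)
    assessor: str   # "self" or "professional"
    unit: str       # "institution", "educator", or "student"
    supporting_evidence: bool = False  # self-rating backed by a detailed example

def authenticity_level(a: Assessment) -> int:
    """Place an assessment at Level 1-7 following one reading of the row descriptions."""
    if a.content == "counts":
        return 1
    if a.content == "engagement":
        return 2  # includes counts taken at several points or later enrollments
    if a.content != "learning":
        raise ValueError(f"unknown content focus: {a.content!r}")
    # A self-rating backed by a detailed example is treated like an external measure
    # (Level 3 -> 4 and Level 5 -> 6).
    self_only = a.assessor == "self" and not a.supporting_evidence
    if a.setting == "program":
        return 3 if self_only else 4
    if a.setting != "home_school":
        raise ValueError(f"unknown setting: {a.setting!r}")
    if a.unit == "student":
        # Assumption: home school student self-assessments stay at Level 5;
        # other measures of their learning are Level 7.
        return 5 if a.assessor == "self" else 7
    return 5 if self_only else 6

print(authenticity_level(Assessment("learning", "program", "self", "educator")))  # 3
print(authenticity_level(Assessment("learning", "program", "self", "educator", True)))  # 4
print(authenticity_level(Assessment("learning", "home_school", "professional", "educator")))  # 6
print(authenticity_level(Assessment("learning", "home_school", "professional", "student")))  # 7
```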


Description of the Columns in the Model

The model details the levels and differentiates between them via several assessment characteristics:
  • The column "Assessment Content Focus" describes the type of measure to be used. The lowest two levels focus upon counts and evidence of engagement/satisfaction, while the higher levels focus upon skills, knowledge, attitudes and their application. Also, within those top five levels, self-perceptions of such learning generally will constitute a lower level of authenticity than measurement by more objective trained professionals using appropriate tools.

  • The column "Unit of Analysis" denotes whether the institution, educator, or student is the focus of the assessment. Professional development initiatives generally target the educator initially. However, as discussed earlier, initiatives that focus upon meaningful change at the institutional level (department, school, or district) are likely to have a greater impact because of the greater breadth and potentially greater permanence of system-wide intervention. Nevertheless, student assessment usually holds the highest potential for authenticity, inasmuch as any initiative ultimately seeks to impact student learning.

    As mentioned above, some professional development programs involve student participants in order to create a more authentic teaching environment. Assessment of such students in Levels 1 through 4 is very relevant to meeting program goals, for student experiences within the program learning environment are meaningful short-term indicators of educator performance. However, the longer-term experiences of students at the educator's home school (Levels 5 and 7) ultimately are more authentic measures of educator professional development.

  • The column "Measures" lists examples of level-appropriate measures. The examples listed are not intended to be exhaustive and are given with the perspective that multiple and creatively-appropriate measures should be utilized whenever feasible. However, if a provider of multiple professional development experiences seeks to compare among such programs, some level of measurement commonality is necessary. Please note that the measure "products" can include a wide variety of materials, such as an individual teacher's LED, a brochure promoting an innovative learning program, a school reorganization chart, or a student portfolio.

  • The column "Minimum Program Length" lists rough guidelines for matching the duration of the program to the maximum appropriate level. The idea behind this column is that a more extensive and/or intensive relationship requires more program resources, and a greater commitment of resources warrants a proportionately greater investment in more meaningful (and generally more resource-consuming) assessments. These guidelines on program duration should be tempered by consideration of the size of the program: more time-consuming higher-level assessments may not be feasible when participant numbers are large.
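Read as a lookup, the "Minimum Program Length" column caps the highest level a program's duration supports. The sketch below assumes hypothetical category keys ("day", "semester", "year") standing in for the table's wording; it does not encode the size adjustment, which remains a judgment call.

```python
# Highest level supported by each program-length category, taken from the
# "Minimum Program Length" column (Levels 1-2 have no minimum length).
# The keys are hypothetical shorthand for the table's wording.
MAX_LEVEL_BY_LENGTH = {
    "day": 4,       # "up to full day"
    "semester": 6,  # "week / multi-day over semester"
    "year": 7,      # "multi-day over year"
}

def appropriate_levels(program_length: str) -> list[int]:
    """Return the assessment levels the guideline supports for a program length."""
    try:
        max_level = MAX_LEVEL_BY_LENGTH[program_length]
    except KeyError:
        raise ValueError(f"unknown program length: {program_length!r}") from None
    return list(range(1, max_level + 1))

print(appropriate_levels("day"))   # [1, 2, 3, 4]
print(appropriate_levels("year"))  # [1, 2, 3, 4, 5, 6, 7]
```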

Additional Considerations

The level of applicability, the degree to which the learning from the professional development experience has been applied, is inherent in determining the level of authenticity. For example, participation is necessary for learning to occur, and such learning in turn may lead to application within the educator's work setting. Nevertheless, Level 1 participant counts can be classified as having minimal potential use for measuring application. As another example, at Level 4, a test item or two can determine possession of knowledge, but developing an LED incorporating such knowledge demonstrates a higher potential for real-world application. In turn, evidence that the same LED was fully utilized as part of the educator's home school classroom curriculum (Level 6) clearly demonstrates application of the professional development experience. Finally, measurement of the effects of that LED on the home school students (Level 7) would provide the most authentic measure of such application.

As with any model, there can be exceptions and gray areas. The level at which an assessment is located does not always assure its level of authenticity. For example, a mediocre or off-target test can be a less authentic measure than a lower-level but well-written self-assessment. Similarly, the assessment of a well-developed but not yet applied LED (Level 4) may be as valuable as, or more valuable than, a Level 5 self-assessment focused upon general application of the knowledge gained from the professional development experience.

Additionally, the correlation between levels and resource demands is less than perfect. While the "Minimum Program Length" column can be used as a rough guideline to the appropriateness of implementing more resource-intensive higher-level assessments, some higher-level assessments may be both more appropriate and less demanding than some at lower levels. For example, a program on improving student performance on standardized statewide competence tests can be assessed most appropriately simply by reviewing the already mandated Level 7 test score data for the following year. Other considerations, such as whether the assessment can be embedded into program content rather than intrusively tacked on at the end, can result in measures that are less taxing on program resources (particularly time) as well as more authentic.


Closing

The above model was tailored to educator professional development. However, by reconsidering the building blocks of the assessment model listed at the beginning of the paper (areas of content focus, objectivity/professional perspective, place and time frame, and unit of analysis), we can adapt the framework to academic enrichment programs for students, within-school academic programs, and professional development programs for professionals outside of education.

The above model is intended to clarify and order assessment options for practitioners of educator professional development so that they can design experiences incorporating the most meaningful and authentic assessments possible. Furthermore, these practitioners and their organizations can better validate their efforts to their stakeholders by demonstrating that their assessments of their efforts are conducted at the highest feasible levels of authenticity.


References

Guskey, T. (2000). Evaluating Professional Development. Thousand Oaks, CA: Corwin Press.

Kirkpatrick, D. (1994). Evaluating Training Programs: The Four Levels. San Francisco, CA: Berrett-Koehler Publishers.



© Copyright 1999, 2001 Illinois Mathematics and Science Academy. All rights reserved.

Last Updated: June 13, 2001
Created by: Adam Van Den Boom ('98)
Content Design: Dr. Steve Cordogan