PRINTED FROM OXFORD HANDBOOKS ONLINE (www.oxfordhandbooks.com). © Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

An Overview of Music Tests and Their Uses

Abstract and Keywords

This chapter details the history and evolution of the use of music tests in both in-school and out-of-school situations. The focus is on their importance in supporting teaching and learning as well as providing data for science and culture. Music tests are also found in psychology, medicine, and sociology. Tests of talent and aptitude, music achievement, appreciation and preference, emotion, performance, and creativity are covered. The chapter discusses tests used in teacher evaluation, certification, and tests used for other purposes. The chapter concludes with a discussion of ongoing challenges of music testing and the use of music test results.

Keywords: test, aptitude, preference, performance, achievement, supportive, scientific

(p. 536) (p. 537) In some circles, the word “test” is pejorative, and in others, tests provide hope. Most negative descriptions of tests seem to originate with issues in education. The education historian Diane Ravitch (2016) states (slightly abridged but her exact words):

Our national leaders have turned our most important domestic duty into a quest of higher scores on standardized tests. The heavy reliance on tests began with George W. Bush’s No Child Left Behind. The punishments for not achieving higher test scores every year were increasingly onerous. After Bush left office and was replaced by Barack Obama, the obsession with testing grew even more intense. States, to qualify for money, had to evaluate teachers in relation to the rise or fall of the test scores of their students. (p. 34)

These tests, thus, are high-stakes tests. The purpose in testing has not abated with the Every Student Succeeds Act (S. 1177, 114 Cong., 2015); only the responsibility has shifted. As of 2015, as many as 27 states required administration of at least one end-of-course test (Domaleski, 2011); 13 states required students to pass a test in order to graduate; 44 states and the District of Columbia (DC) required classroom observations to be incorporated in teacher evaluations (Ruzek, Hafen, Hamre, & Pianta, 2014); and DC and 18 states had evaluation designs on how to measure student achievement in nontested grades. In addition, 35 states and DC required student achievement as a significant or the most significant factor in teacher evaluations, a policy that can be changed under the Every Student Succeeds Act (Doherty & Jacobs, 2013; Ferrara & Way, 2016; Penuel & Shepard, 2016). The influential criteria for teacher and student tests are: higher-order cognitive skills, high-fidelity assessment of critical abilities, real-world assessments that are internationally benchmarked, use of items that are instructionally sensitive and educationally valuable (not reflecting students’ out-of-school experiences), and assessments that are valid, reliable, and fair (Darling-Hammond et al., 2014). These criteria seem unreasonable for music education; we can only keep them in mind. The purposes (p. 538) (and assessment) of music education may change with new or different standards. In visual arts, Elliot Eisner (2001) raised such an issue when visual culture replaced discipline-based arts education, and David Burton (2016) has written about the disconnect between the 2014 standards and opportunity to learn (reported in the 2008 National Assessment of Educational Progress). Tests, whether used in formative or summative assessment, provide data more valuable to teachers than to school administrators and politicians; music tests are no exception.

Industry, medicine, and most research areas rely on tests as indicators of progress and failure. Positive results indicate that “something” is working satisfactorily. Failure of a test is important when safety issues are involved and may indicate that it is time to think anew—maybe even about learning. This chapter focuses on tests related to education and music education and, secondarily, tests used with music as an indicator for a range of outcomes in other professions. What are important about all tests are the purpose and the context in which they are used and the knowledge gained from interpreting the results.

Music has been used in ceremonies, theater, and in the reciting of epic poetry since the time of the Greeks and before. Plato was supposedly concerned about the impact of “new” music on youth, while other Greeks were investigating overtones as well as consonance and dissonance. Observation is the most common “test,” and data from observations the most troublesome to interpret. The music critic comments on the performance of the musicians, of the conductor, of the music and of the programming. The critic often makes comparisons of the musician’s growth over time and, occasionally, with other soloists and ensembles. The reviews in national media are important and can be considered “high stakes” for the performers. Observing a music teacher may be comparable, as it has become “high stakes,” and the consequences of interpreting the results of observation can range from dismissal to rewards. The accuracy of the critical test of observation has a large margin of error; thus, its use has been questioned. The Bill and Melinda Gates Foundation has invested $45 million to improve observation as a test (discussed later in this chapter) as part of a $335 million investment in improving teaching and learning (Bill and Melinda Gates Foundation, 2013, January; Kane, Kerr, & Pianta, 2014).

The history and use of tests in music teaching and learning can inform us of the expected curriculum goals, and of priorities for teaching and for student learning of knowledge and skills. Commercially published tests have been selected for this chapter. Published materials should be instructionally sensitive and provide evidence of the importance of musical outcomes tested. Categories of tests include the following: aptitude, achievement, appreciation, psychological, creativity, teacher evaluation, and “interesting uses.” Music teachers frequently create tests—these usually provide only a rough and incomplete estimate of student learning and should not be used for high-stakes purposes. Are music grades high-stakes? Certainly, for some teachers and students. We know that many tests designed for formative evaluation are used to justify music grades. If one’s philosophy is that music learning is primarily a process without a product, most, if not all, summative assessments are inappropriate.

In the years after the American Civil War, music was used primarily for dancing and entertainment. With community-sponsored contests (tests) among instrumental (p. 539) soloists, and with the founding of symphony orchestras, auditions became a type of test. Teacher certification tests at the beginning of the 20th century required candidates to know the fundamentals of music notation and have reasonable competence in singing. The emphasis on notation and singing was a continuation of the rationale for music begun by Lowell Mason and his colleagues. Students, of course, were expected to learn to sing and were tested.

At the beginning of the 20th century, music followed education’s interest in the role of talent (aptitude) in the schools. Wilhelm Wundt had established an experimental laboratory to conduct research on human abilities in 1879. The philosopher Carl Stumpf became interested in psychology as he observed the experimental work in aesthetics by Gustav Fechner, and in 1883 he published a music test (Stumpf, 1883/1890) that used data on consonance and dissonance as well as tone color. In addition to testing for degrees of pleasantness, Stumpf asked subjects to sing a note that he played on the piano. He also asked subjects to judge the higher of two notes. By the end of the century, psychologists were developing sensory tests such as those used in the Binet-Simon scale to determine the role of inheritance. The laboratory experimentation was reasonably scientific (IQ testing, begun at that time, has had a lengthy shelf life, as has music aptitude). Measures for identifying music talent exist in all countries, and new aptitude tests continue to be developed.

Tests of Talent and Aptitude

Carl Seashore (1932, 1936, 1938, 1946) believed that pure tasks from music stimuli matched the pure tasks in intelligence testing. This was wise, as the concern for the influence of culture (race and socioeconomic status [SES]) that marred other music talent tests as well as IQ testing is not an issue with discrimination competence. This interest in ability, aptitude, talent, or musicality dominated testing for more than half a century and continues to be present in new formats. It remains important, as does Seashore’s foundational research. McPherson and Hallam (2009) suggest that the types of musical talent are performing, improvising, composing, arranging, analyzing, appraising, conducting, and music teaching, but these would not change Seashore’s concepts, recognized or taught. Sound waves would be a necessary component of music ability, either native or acquired, a competence that James Mursell (1937) also accepted. Others with an interest in musical ability, like Géza Révész (1925/1999), were studying expert musicians and concluding that intuition, identifying notes in a chord and notes played on the piano, musical memory (singing back a melody), and imagery were factors in this ability. Sir Francis Galton (1869, 1883, 1889) had included music in his work with human competence, likely influencing Seashore to concentrate on using pure tasks that could be manipulated scientifically. Seashore states, “where there is no experiment there can be no science” (1946, p. 44). Seashore was a serious psychologist and experimented for some 20 years prior to publishing his list of tasks in 1919. His tasks, based on sound (p. 540) waves, were pitch, intensity, rhythm (added in 1925), time, consonance, and tonal memory. Intensity was later replaced by loudness and consonance, and these were replaced in 1939 with timbre. Seashore labeled his tasks as Measures of Musical Talent (1915) and cautioned that they were an incomplete test of music aptitude. 
Seashore gave his tasks to a wide variety of subjects, checking on any influence of gender, race, age, and experience. He found a wide variance in competence on the individual tasks as well as on the total test, data that he believed teachers could not ignore. This variance in competence for individual music tasks informed him that his tasks were not equal in difficulty and could not be weighted to obtain a total aptitude score. One could have a low score on one task and still be musical. The tasks are all tasks of discrimination (louder-softer, higher-lower, and so forth), skills that convinced him that the more talented individuals could make finer discriminations. Thus, he developed and published in 1939 a second test, part B, which required finer discrimination of intervals and patterns. For laboratory assessment, he developed but did not publish part C for research on perception competence. Seashore believed that music draws on math, physics, physiology, anatomy, genetics, anthropology, and general psychology (1946). The sound waves he investigated continue to be important in music psychology (Lahdelma & Eerola, 2016). A complex tone, like a piano tone, is made up of pure tones (sine waves) called partials, each with its own frequency, amplitude, and phase. Complex tones with different spectral contents are perceived as having different sound qualities, tone colors, or timbres. Harmonic overtones also affect the degree of sensory consonance and dissonance of tone combinations presented simultaneously. Tone combinations with fundamental frequencies that are not related by simple ratios, such as the minor second (16:15), lead to sensory dissonance, an important idea in listening (Thompson & Schellenberg, 2006). There is ongoing research on pure tones, as harmonic overtones are not perceived as individual pitches.
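The acoustical ideas in this paragraph can be illustrated with a small sketch. It is not drawn from Seashore or from any of the cited studies; the function names, the 1/k amplitude rolloff, and the 220 Hz fundamental are assumptions chosen only for illustration. The sketch builds a complex tone as a sum of sine-wave partials, each with its own frequency, amplitude, and phase, and converts a frequency ratio such as 16:15 into cents, a common unit for interval size.

```python
import math


def interval_cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


def complex_tone(partials, t):
    """Sample a complex tone at time t (in seconds).

    partials is a list of (frequency_hz, amplitude, phase_radians) tuples;
    the tone is the sum of the corresponding sine waves.
    """
    return sum(a * math.sin(2 * math.pi * f * t + p) for f, a, p in partials)


def harmonic_partials(f0, n, rolloff=1.0):
    """Hypothetical helper: n harmonic partials of f0 with zero phase.

    The k-th partial has frequency k * f0 and amplitude 1 / k**rolloff.
    The rolloff value is an illustrative assumption, not a model of any
    real instrument.
    """
    return [(f0 * k, 1.0 / k ** rolloff, 0.0) for k in range(1, n + 1)]


# Interval sizes for the minor second mentioned in the text and,
# for comparison, a simple-ratio interval (the perfect fifth, 3:2).
minor_second = interval_cents(16 / 15)
perfect_fifth = interval_cents(3 / 2)

# Two tones with the same fundamental but different spectral content.
bright = harmonic_partials(220.0, 8, rolloff=0.5)
mellow = harmonic_partials(220.0, 8, rolloff=2.0)
sample_bright = complex_tone(bright, 0.001)
sample_mellow = complex_tone(mellow, 0.001)
```

Changing only the rolloff alters the relative strength of the upper partials while the fundamental stays the same, which is one simple way to think about two tones of equal pitch that differ in timbre.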

James Mursell and Carl Seashore are seen as having opposite beliefs. This is only partially true; perception of sound is critical for both. Mursell’s interest was in the application of sound. He argued the art of music is a creation of the mind of man with everything depending on the action of the mind. For Mursell, the essential function of music is to express and objectify emotion. The aural experience is related to behavior and to mental life different in important respects from visual or tactile experience. This is a biological and psychological fact (Mursell, 1937, p. 19). Musicality does not depend directly on sensory abilities but on a complex of psychological functions that exist in varying degrees and relationships. The art of music depends on some of the most foundational and universal human characteristics. Mursell’s reaction to aptitude testing was to ascertain whether those who score high on a test display high musical behaviors like sight-singing, playing the piano, and doing well in music theory, all teachable. Mursell and Seashore had 30 consecutive articles on the psychology of music during the 1930s in the Music Educators Journal, an indication of the importance of these ideas. They agreed on the importance of four capacities: loudness, sense of pitch, sense of time, and sense of timbre. Mursell (Mursell & Glenn, 1931, 1938), suggested the results of psychological (p. 541) investigations in the field of music “can be of utmost value for the working music teacher, and can further the cause of music education in America” (p. iii). For Seashore, “psychology allows us to record, preserve, and interpret music in all forms of historical interest” (Seashore, 1946, p. 14). He believed,

We should think in orderly, specific, describable, repeatable, and verifiable terms, as the musician can see in the score an astonishing number of features of which he otherwise would not be aware. (p. 11)

If music is to keep up with other subjects, we have to adopt the scientific point of view. (p. 49)

All kinds of sounds have musical value; one only needs training and talent. (p. 184)

Students being assessed should be able to give reasons for their responses. (p. 191)

All genuine musicians have superior auditory imagery that can recall a tone so realistically and objectively that it can be scrutinized in all its detail just as in actual hearing. (p. 211)

If the public school instructor has a clear conception of the role of tonal imagery and can evaluate it to some degree, he can understand in large part the success or failure, and the likes and dislikes, of the students, and he can guide them more successfully. (p. 198)

Music is the life of feeling. One plays upon feeling to appreciate and create the beautiful in the tonal realm. (p. 204)

Seashore intended to establish the importance of his measures in the public schools. Abraham Flexner (a well-known educational reformer of the time), however, suggested that any research results would be more convincing if conducted at the Eastman School of Music. George Eastman and Howard Hanson became strong supporters of the validity of the measures. Seashore’s assistant, Hazel Stanton, worked at Eastman from at least 1921 to 1935 evaluating the measures.

Why such support? The public and the music education profession believed that knowing a student’s musical aptitude could avoid frustration and a waste of taxpayers’ money. To identify the most talented individuals and provide them with the necessary education would be more important than any attempt to educate the masses in areas for which they had limited talent. A similar belief exists today. Jose Abreu, the founder of the El Sistema program, believes that “if a thousand must be sacrificed so that four make it, the sacrifice is well worth it” (Baker, 2014, p. 203). Public school educators continue to express concern for the number of “dropouts” from the music programs on offer. A pleiad of music educators cite the low percentage of students enrolled in music in secondary schools as evidence of a problem. Music conservatories are talent-selective and there seems to be a concern for ignoring latent talent among the disadvantaged. “I’m not very musical” is a more common expression in America than “I’m not intelligent.” Belief is strong.

The Eastman validity study (Stanton, 1935) was based on more than 2,000 music students. Test results provided a basis for dividing the student body into five categories and for using graduation as the dependent variable. Sixty percent of the top fifth of the class graduated, but only 17% of the lowest fifth did. The faculty voted to drop any student (p. 542) identified as deficient in talent, perhaps influenced by a study by W. W. Larson (1938) that found a correlation of .59 between scores on the original Seashore test and music theory grades. Stanton was unable to use faculty ratings of student potential, as Eastman faculty did not discriminate among their students, and estimates of individual competence changed little over the 4 years of study. The consonance-dissonance test did not discriminate among Eastman students but remains valued in research by music psychologists and early education researchers. The original consonance test was scored by psychologists who determined the better of two sets of tones based on smoothness, blending, and purity.

Seashore objected to his published test being the sole measure of musicality. Stanton (1935) created an accompanying test of tonal imagery, an ability that Seashore argued was required for retention, recall, recognition, and anticipation of musical facts. In his Psychology of Music, Seashore states that one creates music by hearing it in one’s “mind’s ear”; “if one removes the aural image from the musical mind, one takes out the very essence of musicality” (1938/1967, p. 6). Surprisingly, Seashore believed that mental imagery could be developed to a marked degree, as music is essentially a play on “feeling with feeling.” (Some of these early beliefs/statements may remind the reader of the ideas of Bennett Reimer and Edwin Gordon.) Seashore describes two aspects of feeling, one aesthetic and the other creative, stating, “music can be appreciated only through active feeling” (1938/1967, p. 9). Stanton’s (1935) test of tonal imagery was scored on a six-point scale based on musical sensitivity, musical action, musical memory and imagination, musical intellect, and musical feeling. As the test was subjectively scored, suggestions were made to substitute either appreciation tests, attitude measures, or intelligence tests for tonal imagery. The Iowa Comprehension Test was selected, as it correlated with success. Stanton also used rating scales of musical environment, musical training and education, musical activity, musical feeling, and musical interests. These variables were scored in five ranks: safe, probably, possible, doubtful, and discouraged. She followed up with 10 discouraged students and found only 3 had graduated; the one graduate remaining in music became a public-school music supervisor in a small town (Stanton, 1935).

The importance of testing for talent is pervasive. The Seashore measures required about an hour to administer; many of the tests using similar tasks can be found in shorter or slightly modified tests including: Musical Talent Test, F.E. Olds & Sons, n.d.; Selmer Music Guidance Survey, n.d.; What Instrument Should I Play (WISIP), n.d.; Myers Music Aptitude Test, n.d.; Moyer Music Tests, n.d.; King Musical Instrument Appraisal Test, n.d.; C. L. McCreery Elementary Rhythm and Pitch Test, 1937; Conrad Instrument Talent Test, 1941; Tilson-Gretsch Musical Aptitude Tests, 1941; Pan American Music Aptitude Test, 1951; Kwalwasser Music Talent Test, 1953; Biondo Music Aptitude Test, 1957; Conn Music Aptitude Test, 1976; the Gordon Instrument Timbre Preference Tests, 1984; plus some unpublished and teacher-constructed tests (University of Maryland Libraries, 2005).

Some tests have recordings; others require a teacher to administer the task. Discrimination tasks include pitch, time, intensity, consonance, chords, melodies, and counting the number of times a tone appears. The Gordon Timbre Preference Test is a (p. 543) preference test of 7 timbres (woodwind and brass) organized into 42 recorded test items. The independent Kwalwasser test took only 10 minutes to administer and was very popular. The test had two forms, grades 4–6, and 7 and above, and consisted of short melodic patterns with the task being to decide if the second pattern differed in pitch, time, rhythm, or loudness from the first.

Interest in developing better, more holistic, and substantive measures of musical talent than the Seashore tests continued until at least 1989. Seashore’s student Jacob Kwalwasser (1926) could not resist the value of aptitude, but, likely influenced by James Mursell, labeled his melodic and harmonic sensitivity tests as aptitude. One judges the “better” of two-measure melodic progressions. In the harmonic sensitivity test, the task is to judge the better of three chords in conventional four-part harmonization. In 1930, Kwalwasser was joined by the music educator Peter Dykema in publishing a test that continued to use items from Seashore plus the addition of taste (e.g., Wing and Gordon) for tonal movement, melodic taste, pitch, and rhythm imagery. In a journal article in 1926, Lowery suggested that comparing cadences in various positions, same and different phrases, and musical memory for statements that contained irrelevant items was a more musical way of assessing musicality. Raleigh Drake’s (1957) interest in the role of memory began in 1934 and was last published in 1957 (University of Maryland Libraries, 2005, series 6.2, box 43, folder 10; also in the Talbot Library, Westminster Choir College). In this test one had to compare from two to seven melodies with the original, indicating whether any perceived change was in key, time values, or notes. The addition of each melody clearly was a more difficult memory task, allowing Walter Vispoel (1987) to use the test in computer adaptive testing.

Madison’s (1942) idea was that interval discrimination was the true aptitude test. The student compared four harmonic intervals to determine which one was different. James Froseth (1982, 1983) developed an ear-to-hand coordination test; the test was simplified by Mark Dickey (1991) for use in the public schools. The student is given a melodic pattern and required to match it on his or her instrument.

A Test of Musicality by E. Thayer Gaston appeared in 1942 and reached its fourth edition in 1957 (Gaston, 1942/1957). Instrumental teachers believed it to measure potential even though it contains notation. Students responded to whether a given note was in a chord, detected differences between sound and notation, and determined whether the final note of a phrase should be higher or lower; tonal memory was also tested, namely, identifying changes in pitch or rhythm. Herbert Wing’s (Wing, 1958) Standardized Tests of Musical Intelligence followed a year later and consisted of seven tests. In Wing’s test, the student is to identify the number of notes in a chord, detect a change of pitch between two chords played successively, and identify the note changed in a short melody. In addition, four tests that measure taste and/or perception were adopted 7 years later by Edwin Gordon; these four tasks were preferences for rhythmic accent, harmony, dynamic relationships, and phrasing. Bentley’s (1966) Measures of Musical Abilities, designed for ages 7–12, comprised four tests: pitch discrimination, tonal memory, chord analysis, and rhythmic memory. Scores improved only 3% on a second testing, thus providing (p. 544) data for the argument that the test measured talent. Edwin Gordon (1965), whose Musical Aptitude Profile was published a year earlier, also found that scores improved only from 3% to 4% after a semester of instruction. Gordon’s test is 115 minutes in length, an indication of the importance of aptitude and the difficulty in measuring it. Tonal, melody, and harmony imagery are assessed in part 1, with the respondent determining whether an embellished melody is the same, whether the harmony is changed by varying the lowest note, and, with rhythm, whether the tempo stays the same. In part 2 the task is to determine whether the metrical structure is the same or different. Three tasks are presented in part 3: select better phrasing, better ending, and better style.

Gordon uses original music to reduce influence from the culture. In a 3-year study, Gordon (1967) obtained reliabilities of .17 to .56 with a composite from .35 to .71. Shorter Gordon aptitude tests are available for younger students but without the preference/perception tests and the supporting validity research. Gordon’s (1979) Primary Measures of Music Aptitude has a tonal and a rhythm test. Tonal items differ due to a change of one or more tones. Meter differs due to grouping of tones within one meter. In 1982, an Intermediate Measure (Gordon, 1982) was published for students scoring above the 80th percentile on the Primary Measures. The main difference in the tests is a greater use of the minor mode in the tonal section. In 1989, the Advanced Measures of Music Audiation (Gordon, 1989) for college students was published. Although a short test, Gordon (1997) argues that it is not an achievement test. Two short, original, musical statements are given with the task being to indicate whether the second is the same as or different from the first; changes are either tonal or rhythmic. In Sweden, Holmstrom (1963) and Gabrielsson (1995) advanced the idea that musicality consisted of pitch, chord analysis, tonal memory, and rhythm. When he factor-analyzed his work, he found that musicality consisted of two factors, pitch perception and musical experience.

A different approach to aptitude has been taken by Kai Karma (1974, 1975, 1979, 1995), who defines music aptitude as the ability to hear temporal structure in sound material. He argues that temporal patterns (from the point of human psychology) are more essential than sound. Thus, there could be music without sound and musical ability without hearing. The argument is similar to that for visual temporal patterns. In his aptitude test, the task is to find the repeated pattern in a sound sequence and compare it with a single (not repeated) pattern, which may be similar or structurally changed. The structures are formed by varying either the pitches, intensities, or durations of the sounds. The Japanese researchers Umemoto, Mikuno, and Murase (1989) published an aptitude test that focused on pitch deviation of 50 Hz up or down on a diatonic scale using tonal and atonal sequences of four, five, and six tones. Gembris (2002, 2006) reports that meter recognition is thought not to develop until the age of 7, and to stabilize at 9. Five-year-olds, however, can maintain a given meter (p. 124).

Many of the aptitude tests were well researched by the authors and important to music educators if the skills measured are truly aptitude or very difficult to teach. One thread in music aptitude is the importance of musical memory. Discrimination is also important. One learns to hear. This half century of interest in aptitude testing established that students do differ greatly on these tasks; whether teachers are using this knowledge to (p. 545) improve their curricula is unknown. The human cognitive apparatus can discern the organization behind the music with comparative ease.

Music Achievement Tests

Published achievement tests should be an indicator of what has been or is being taught—a sort of curricular priorities checklist. If there is a sight-reading test as part of a music contest, the hint is that sight-reading should be part of the curriculum. With achievement tests, the curriculum and the measure must be aligned and the test sensitive to the instruction. In American music education there is little agreement on a graded curriculum; the music curriculum is influenced by the texts/materials used in general music and instruction books in instrumental music. In the United States, it is surprising that the National Assessment of Educational Progress (NAEP) specified 8th grade as a benchmark. Grade level tests are possible only with a standardized curriculum like those that exist in mathematics and language arts. National and state summative reports at the end of music study are infeasible, as are grade and grouped-grade levels, due to differences in opportunity to learn. Formative assessment is viable but difficult. The point of testing is to articulate any common knowledge and skills that should be known. Music achievement results from the intelligent and persistent use of capacities (Davidson & Scripp, 1992). As we write, there is interest in incorporating character skills, social-emotional skills, self-management skills, psychological skills, behavioral skills, interpersonal and intrapersonal skills (soft skills), and skills selected from a list of 21st-century skills, into all of instruction. Rating scales would be used in these noncognitive skills such as anchoring of vignettes, forced choice, rank, and preference methods, and situational judgment tests. Other measures include observation, questioning, written work, informal and formal testing, and self- and peer evaluations. There is always a danger of attainment’s taking precedence over progress.

In 1921, the Music Supervisors National Conference published a course of study for graded schools partially based on data from achievement tests. The interest was in “grading” the school (not the student) as being either average or good. Competence in singing was expected. National objectives were possible in tasks related to knowledge of notation. Fewer than 50% of 6th-grade students could recognize the national anthem from notation. Symbols, terms, and key and time signatures were poorly learned (Kwalwasser, 1927), with an appreciable loss of knowledge as students advanced in grade level. Professional educational organizations have no enforcement mechanism like the Food and Drug Administration or the Environmental Protection Agency, making for a loose alignment between what is taught and what is tested.

The early tests often contained items that had been established as aptitude. Which is aptitude and which is achievement is a decision for professionals. For example, the Beach Music Test of 1920 and 1930, reprinted in 1938 (Beach, 1921), not only tested knowledge of music symbols but also had an aural component that required identification of duple, (p. 546) triple, or quadruple meter, ascending or descending melodies, similarity of phrases, identification of the highest or lowest note, identifying aural stimuli by syllable names, judging whether the notation is correct, and selecting from several written melodies the ones heard. The student also had to match composers and artists. This eight-part test assumed a rigorous music curriculum and tasks that were accepted as important considering its more than two decades of shelf life. The Kwalwasser-Ruch Test of Music Accomplishment (Kwalwasser & Ruch, 1927) was similar, with more emphasis on knowledge of syllables. Alignment with the curriculum was supported by a high correlation with music grades for students in grades 6–12. Other achievement tests of the 1920s and 1930s by Hutchinson and Pressey (1927), Gildersleeve and Soper (1921), Strouse (1937), and Torgerson and Fahnestock (1927/1930) required dictation skills; detecting changes in pitch, meter, and key signatures; and recognition of song titles from notation.

Knuth’s (1936/1966) Achievement Tests in Music gave the notation for the first two of four measures played on the piano, with the student having to select the last two measures from four choices. In 1964, Marilyn Pflederer-Zimmerman investigated whether Piaget’s conservation stages applied to music (Pflederer, 1964). She continued the study with Lee Sechrest in 1968 under a federal grant, and later with Webster (1983). The deformations used were changes in instrument, tempo, harmony, mode, rhythm, contour, or interval of the phrases. Conservation of tonal patterns seems to appear earlier than that for rhythmic patterns, and alterations of instrumentation, tempo, and harmony are recognized earlier than those of mode, contour, and rhythm (Pflederer & Sechrest, 1968). Swinchoski’s (1965) middle school test was standardized in 1965.

The Colwell Music Achievement Tests 1–4 (1965/1970) were a deliberate attempt to align testing with the curriculum. Colwell analyzed contemporary teaching materials and convened a national teacher conference to verify that the objectives were being taught in most classrooms. In the following year, a similar alignment procedure was used to develop the first NAEP in music. The Colwell tests are structured similarly to the Aliferis college-level tests (1947/1954, 1962), with an initial basic task followed by the same competency presented in a more musical version. These tests assess pitch discrimination, melody moving by steps or leaps, meter discrimination, major-minor discrimination, feeling for tonal center, and auditory-visual discrimination (pitch and rhythm), the last part requiring a comparison of the aural stimulus with notation. Test 3 has three parts: tonal memory, melody recognition, and pitch and instrument recognition. Test 4 has four parts: musical style, auditory-visual discrimination, chord recognition, and cadence recognition. The Colwell tests are the only music tests where item difficulty and item discrimination are reported for each item on the four tests, based on a standardization sample of 20,000 students for Tests 1 and 2 and nearly 10,000 students for Tests 3 and 4.
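For readers unfamiliar with the two item statistics named above, the following minimal sketch shows how they are conventionally computed in classical test theory. It is an illustration only: the response matrix is invented, the function names are hypothetical, and nothing here reproduces the procedures or data reported for the Colwell tests. Item difficulty is taken as the proportion of examinees answering correctly, and item discrimination as the point-biserial correlation between the item score and the score on the remaining items.

```python
from statistics import mean, pstdev

def item_difficulty(responses, item):
    """Proportion of examinees who answered the item correctly (0 to 1)."""
    return mean(row[item] for row in responses)

def item_discrimination(responses, item):
    """Point-biserial correlation between the item and the rest-of-test score."""
    item_scores = [row[item] for row in responses]
    # Exclude the item itself so it does not inflate its own correlation.
    rest_scores = [sum(row) - row[item] for row in responses]
    sd_item, sd_rest = pstdev(item_scores), pstdev(rest_scores)
    if sd_item == 0 or sd_rest == 0:
        return 0.0  # correlation is undefined when either score has no variance
    m_item, m_rest = mean(item_scores), mean(rest_scores)
    cov = mean((i - m_item) * (r - m_rest)
               for i, r in zip(item_scores, rest_scores))
    return cov / (sd_item * sd_rest)

# Invented data: six students (rows) by four items (columns), 1 = correct.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]

for item in range(4):
    p = item_difficulty(responses, item)
    r = item_discrimination(responses, item)
    print(f"item {item}: difficulty={p:.2f}, discrimination={r:.2f}")
```

In this invented sample the second item is answered correctly by half the students and separates high scorers from low scorers more sharply than the fourth item does, which is the kind of per-item evidence a test manual can report.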

The Farnum Music Notation Test (1953/1969) consists of 40 printed melodic phrases with one of the four bars of each melody containing an error (compared to the aural stimulus) in either pitch or rhythm. The student marks the measure containing the error; most errors are in pitch. In the same year, Farnum (1969b) published a multiple-part Music Test with his notation test as the first part. Part 2 is a cadence test of 30 items where the last tone is missing and the task is to decide whether this tone should ascend (p. 547) or descend. To test for musical memory, patterns of four or five tones are performed twice with the task to identify which note has been changed in the second performance. In what is marketed as an eye-hand coordination test, Part 4 has nine notes indicated by a unique pattern of dots. The student is to match the dot pattern with traditional notation within 2 minutes.

The Gordon Iowa Tests of Musical Literacy (Gordon, 1970/1991) comprise six levels, each of which has two divisions and measures basically the same competencies. The two divisions, like all Gordon tests, are tonal concepts and rhythm concepts, each with three subtests: aural perception, reading recognition, and notational understanding. Aural perception at levels 1 and 2 consists of 22 items in either major or minor. To test reading recognition, students identify whether the notation is the same as or different from the performed melody. For the notational understanding items, students are presented with a written melody that is missing some notes; next, they listen to the complete melody performed three times, and then write in the missing notes. The rhythm concepts task requires the student to identify duple or triple meter, and the reading recognition items require students to determine whether the written notation matches the aural stimulus. Level 3 uses melodies in a usual or an unusual mode (unusual being Dorian, Phrygian, Lydian, Mixolydian, Aeolian, and Locrian tonalities) with the student distinguishing among the modes in listening and reading. The rhythm concepts part has tasks of usual meter or mixed meter. The tests are aligned with Edwin Gordon’s research; some specialized vocabulary is required.

The first NAEP (National Center for Education Statistics, 2018) in music was conducted in 1971–1972 and remains more complete than any of the subsequent national assessments. The alignment remains exemplary for NAEP tests. One hundred fifty (150) exercises were developed in 1965 by music professionals, based on four age levels (9, 13, 17, and young adult [26–35]). A booklet of the objectives (Norris & Bowes, 1970) was printed and widely distributed, and pretest information sessions were offered at national conferences. Approximately 24,000 school-age students were sampled at each age level, and about 5,000 young adults provided data, for a total of roughly 80,000 individuals. The assessment covered five areas. Performance required singing familiar songs, repeating unfamiliar musical material, improvising, performing from notation, and, on a second day, performing a prepared piece. Notation and terminology included vocabulary, basic notation, and score reading. Instrumental and vocal media tested aural recognition, visual recognition, and performance practices (recognizing instruments from sound). Students were also asked how sound on the various instruments was produced. Music history and literature required some knowledge of periods in music history, musical genres and styles, and music literature. The fifth part was not scored; it asked about student attitudes. A tape of student performances on the test was provided with examples of the scoring standards of what was considered good, acceptable, and poor. This format continues to be appropriate for 2016.

The NAEP assessment was repeated in 1978–1979 with many of the same tasks but without the performance or adult participation components. The omission was due to a lack of support from the US National Assessment Governing Board (NAGB) and not (p. 548) because performance had a lower instructional priority. The third NAEP assessment took place in 1997 (the voluntary national standards replacing the curriculum alignment stage); it was repeated in 2008 and 2016 with the same questions as on the earlier versions. Only 8th-grade students were tested due to budget constraints, and the reporting lacked the aural thoroughness of the original assessment.

Colwell (1979) published criterion-referenced tests marketed as the Silver Burdett Music Competency Tests. These were aligned with the Silver Burdett Music Series textbooks, and although criterion referenced, norms were provided due to teacher demand for comparisons and the availability of item discrimination. The tasks require the students to discriminate what they heard in a musical work: beat or no beat (or in doubt); fast or slow tempo; loud or soft dynamics; high or low pitch; whether the form is ABA, AB, or just A; and same or different style. Many of the same tasks are repeated at each grade level but in a more difficult format for a higher grade level. Other “what-to-listen-for” tasks include listening for accents, same or different rhythm patterns, ascending or descending melodies, tone color, tonality, style, and harmony–no harmony. At the higher grade levels, melody, duration, range, form, and tone color are combined to assess comprehensively what the student has learned to hear. As a caveat, the interpretation of achievement tests comes with the knowledge that tests that can be given in groups miss important individual objectives.

College-level achievement tests have been common for most of the 20th century and continue into the 21st, focusing on music theory tasks that include taking dictation, performing, writing a four-part accompaniment for a folk tune, writing excerpts of a rock/pop song in lead sheet format, and knowing the vocabulary and basic harmonization rules. The 1947 Aliferis Music Achievement Test: College Entrance Level (Aliferis, 1947/1954) was the first published and standardized test at this level. The test was endorsed by the National Association of Schools of Music (NASM), which facilitated procurement of a normative sample of some 1,700 cases. Part 1 requires the student to match one of four written intervals with the interval played on the piano. Melodic patterns are performed in the more musical task with the student, again, selecting from four choices. The variable is the last note of the pattern. Part 2 is chord recognition, with the student selecting the notation of the chord sounded from four choices. In the more musical section, a three-chord sequence is offered and the student selects from three possibilities. In Part 3, a rhythmic figure is defined as the rhythm within one beat duration. The figure is repeated three times in a C major scale and the student selects from four rhythm notations. The more musical version contains two rhythmic elements.

James Aliferis was joined by John Stecklein, a measurement specialist, in 1962 in the release of the Aliferis-Stecklein Music Achievement Test: College Midpoint Level (Aliferis & Stecklein, 1962). The test was designed to verify student music theory competence after 2 years of study in a college or university program. The melodic interval test consists of 34 items that each present 4 printed notes. The listener hears the pattern, and then identifies the fourth note in the pattern from a four-choice response set. The chord test consists of 26 items, each presenting a four-voice chord in notation. The student hears the chord, and then compares the aural presentation of the chord with the chord (p. 549) notated in the test booklet. One tone in the chord is different, and students identify the tone that is different. There are 19 items that present 6-beat rhythmic patterns in notation and played on one tone. The test taker then compares the notated pattern with the pattern played, and identifies the beat where the pattern played differs from the notation. Although norms are provided, there is scant evidence that the test was widely used. A graduation-level test was constructed but not published.

The Australian Test for Advanced Music Studies (Bridges & Rechter, 1974) is designed for grades 13–16 (ages 19–22). The test, in three parts, measures aural imagery and memory, score reading, and aural/visual discrimination, comprehension, and application of learned musical formats. The aural stimuli are selected from vocal and instrumental music common in Australia using a wide range of timbres and textures. Classical, folk, ethnic, jazz, and pop music are used. The student must be able to audiate the sounds represented by visual symbols through recognizing intervals, tonality, triads, and styles of particular composers. Dr. Bridges likens the test to the American Advanced Placement (AP) theory test, a test that is revised annually, sponsored by the College Board (based in the United States), and designed to assess the traditional content of the first-year college music theory course. The AP test also has objective questions on discrimination and perception. Students take dictation, realize a figured bass, compose, sight-sing, and demonstrate an understanding of the cultural issues in college music theory.

The Graduate Record Examination: Advanced Tests (Educational Testing Service, 1951/2001) was designed for college seniors in music, and focuses on music history and literature with sections on instrumentation and orchestration, and on music fundamentals. An aural part was added in 1965. This advanced test was discontinued in 2001. The former National Teacher Examination consisted of 125 multiple choice items and covered all phases of music education: vocal, instrumental, elementary school, and senior high school.

Computer-based systems such as SmartMusic, Music Prodigy, and IPas are not standardized achievement tests; they are learning systems that incorporate assessment. Researchers have established reliability data for SmartMusic.

A few states including Michigan, Minnesota, Illinois, Kentucky, Tennessee, Washington, New York, and Connecticut have attempted, since the 1970s, to develop their own competency test or item bank with mixed results and usually without important psychometric data. These tests have insufficient data to be inspiring.

Music Appreciation and Preference

Preference is of interest to music sociologists and music psychologists, and most of the tests in this area were designed to discriminate, to test one’s knowledge, perhaps to encourage listening to better music, or to connect the school with the concert hall. They may measure emotion or nonmusic outcomes. Appreciation tests are subjective, relate minimally to instruction, and are usually culturally biased. A few ask the student to justify why specific works were preferred. Sociologists also review programming over (p. 550) time by major ensembles (for an example, see the sample programs in Farnsworth [1969, p. 114]). Sociologists and psychologists use data from perception and appreciation of music to portray the image of a musician, and make the argument that music preference is an integral part of adolescents’ social identity. Such study leads to the identification of typicality in personality studies. At an instructional level of music appreciation, thinking in music is one form of music intellect and is involved in responding to music. Brophy (2000) suggests that responding uses critical thinking skills and acquired musical knowledge that are required for one to make reasonable and informed judgments about music and personal values with respect to music. Responding (intelligently) to music is an expectation of universal outcomes from music instruction (Juslin, Liljestrom, Vastfjall, Barradas, & Silva, 2008; Juslin, Liljestrom, Vastfjall, & Lundqvist, 2010). Emotion is only one part of responding; how one responds depends on multiple factors, thus making standardized testing difficult. One’s cultural background is important.

As early as 1910, C. Valentine discerned that students preferred the major third, then the minor third, followed by the octave (Valentine, 1913). Keston’s preference test first appeared in 1913 with a simplified version appearing as a 176-page textbook in 1953 (Keston & Pinto, 1913/1955). The test consisted of 30 questions with the student indicating which of four works was preferred. Authors of tests include Courtis, Schultz, Gernet, Bower, Adler, Mohler, Trabue, Schoen, Fisher, Kwalwasser, Crickmore, Simons, and Long. Kate Hevner’s (1936) descriptive adjective circle is the best-known list of adjective descriptors; it has been revised frequently, most recently by Asmus (1979). Hevner began her work as early as 1934, asking students to choose between an original and a distorted version (Hevner, 1935). Her published test is the University of Oregon Musical Discrimination Test (Hevner & Landsbury, 1935). With the additional assistance of R. Seashore, Hevner and Landsbury published five measures as the Oregon Music Tests. Like Carl Seashore, she used music psychologists to distort the music on rhythm, harmony, melody, and form. Newell Long (1965) obtained a federal grant to update and publish Hevner’s work using string quartets, woodwind quintets, and organ music from the Baroque to contemporary in addition to the earlier piano selections. A 4,000-student standardization of the test was completed in 1967 and a shorter version for younger students in 1970 (Long, 1978).

Aesthetic Judgment and Taste

Kyme’s (1954) doctoral dissertation was designed to measure aesthetic judgment and required content knowledge to justify answers. He used student solo performances, which were judged on the basis of intonation, tone quality, phrasing, interpretation, tempo, rhythm, and dynamics. Vocal and chamber music examples introduced factors of balance, diction, and blend. Orchestra performances were judged on tone quality, rhythm, balance, and so forth. Popular music items presented altered harmony, melody, and rhythm. Folksong examples were altered by tempo and style; one had to judge (p. 551) cadences on their finality; and a section on classical music required the listener to select a descriptive adjective such as “mischievous,” “exciting,” and “happy,” to describe the music. The interest in this area continues (Tan & Spackman, 2005).

Interest in taste seems universal. In Stockholm, Wedin (1972) investigated emotional expression in music by tying it to musical structure. In the USSR, Bochkarev (1989) aligned composers’ works with themes of melancholy, despair, delight, grief, and contemplation. He evaluated sensory and imagery characteristics of music as primary factors in a psychological and operational sense. In France, Arlette Zenatti (1991) reported on her career research of aesthetic judgment and musical cognition. Mateos-Moreno (2015) developed a latent dimension structural model to measure attitudes toward contemporary music. The three constructs identified were “perceived complexity and stridency,” “desire to discover,” and “aesthetic respect.”

Tests of Emotion Stimulated by Music

Related to music appreciation are tests of emotion stimulated by music. Most assessment is based on observation of behavior, verbal reports, self-report data, and performance on teacher-made assessments. (Self-reports are common, but little is known about their validity.) Berger and Karabenick (2016) found that with 7th-grade students in math, validity concerns included memory failure, attention control, difficulty in understanding, and a lack of knowledge about strategies. Self-reports, however, are essential if how long one listens to a radio or TV station or a CD before changing to a different stimulus is the datum of interest. Self-reports measure an aesthetic response, or a temporary feeling. Teacher-made instruments are usually questionnaires asking “What were you thinking when you listened?” or “How did the music make you feel?” “Think-Alouds” have been successful. Bundra (1994) examined verbal reports and identified 17 categories of behaviors, descriptions, comparisons, and attitudes. Concept maps have been used. The German scholar Behne (1997), using a five-point questionnaire, identified listening styles as concentrated, emotional, and sentimental. Veronica Cohen (1997) developed “musical mirrors,” where movements by children when seated were standardized to indicate what was heard.

A verbal report on the listening experience can indicate one’s sensitivity (or what one hears) that results in tension, feelings, ideas, and desires. Hickey (1991) used this approach in teacher education. The semantic differential provides quantitative data similar to a rating scale or a type of paired comparison. Cell phones and logs can record feelings and reactions over time, and can be randomly stimulated. The “time-series” is a tool that records feelings and emotions over time and events; it can include a variety of responses such as tears and shivers. The time series has been used to evaluate skin conductance during movement in piano improvisation. The Continuous Response Digital Interface is a tool (not a test) that has been used to measure responses to music, but it may not differentiate between hearing and listening, or the context and purpose of the (p. 552) stimulus. The Handbook of Music and Emotion: Theory, Research and Applications (Juslin & Sloboda, 2010) has an entire section of five chapters on assessment measures.

The Measurement of Emotional Response to Music

The physiological and subjective measures of emotion and affect represent music test development in the 21st century. A large number of “tests” assess the human impact of the elements of music identified by Carl Seashore and others in the 20th century. Electroencephalograms (EEG), magnetoencephalography (MEG), superconducting quantum interference devices (SQUID), event-related potentials (ERP), magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET) allow for data collection that defines tests focused on human response to music. The factors measured include amplitude, articulation, harmony, intervals, loudness, pitch, mode, and timbre. There are measures of the effects of music on biochemicals such as growth hormones, beta endorphins, blood glucose, dopamine, and more (Hodges, 2010). The assessment work follows Seashore’s prognostication about scientific aesthetics and allows insight into how music affects our mental and bodily responses so strongly.

Berlyne’s (1971, 1974) books on aesthetics and psychobiology initiated today’s scientific definition of the aesthetic response. The theoretical postulates of Meyer (1956, 1967) inspired music educators to think about assessing music “appreciation.” The psychologist Howard Gardner suggested musical development was more continuous than that of Piaget’s four stages, and he also introduced the idea of music intelligence (Davidson, McKernan, & Gardner, 1981; Gardner, 2006). A recent study (Norman-Haignere, Kanwisher, & McDermott, 2015), using voxel decomposition, revealed that music and speech have separate cortical pathways, possibly confirming music intelligence. Variables may be sound-music-noise-silence (sound waves) sensation, perception, cognition, or a Gestalt approach that organizes stimuli for coherence. The tests are brain waves. From this research, music psychologists determined that the area of the somatosensory cortex representing the fingers of the left (fingering) hand in violinists was larger than that in the contralateral hemisphere representing the right (bow) hand, and also larger than the area in nonmusicians. A continuing question is whether listening, performing, and composing are best understood in terms of neurons and networks or in terms of mental schemata and prototypes. Does emotion generated by music have a role in memory, reasoning, and problem-solving? Heiner Gembris (2006) suggests five areas for research by music psychologists: fetal learning and infant learning after birth, neurobiological research, expertise research, life-span development of musical abilities, and the emergence of developmental theories. Music psychologists suggest, as did Seashore a century earlier, that empirical findings can advise us as to how and when to teach so that mind, memory, perception, and cognition can be developed most effectively, a new discipline of “neurodidactics.”

Advances in qualitative research that support the use of self-report instruments (Likert scales, adjective checklists, visual analogue scales, continuous response versions of self-report instruments, nonverbal evaluation tasks, experience sampling methods, (p. 553) diaries, and portfolios), along with the controls on narratives as a research tool, opened the possibility of investigating the importance of questions about musical meaning.

Test development and research also found a home in curriculum changes in music education with emphases on music of all cultures, on responsibility for community music and lifelong music experiences, and on the incorporation of precepts from sociology into outcome discussions. This increased interest in the affective domain restored it to its rightful place in the hierarchy of music education. Cognition had reigned. Prominent music educators added their expertise to the interests of psychologists investigating music as critical to a meaningful life, taking exception to Steven Pinker (1997), who argued that music was not part of the evolution of man and was only auditory cheesecake. Among the music educators whose names appear in psychology handbooks are Harold Abeles (Abeles & Chung, 1999), John Sloboda (2010), Donald Hodges (2010), David Hargreaves and Adrian North (2010), Susan Hallam (2016), Tia DeNora (2011), Gary McPherson (McPherson & Hallam, 2009), and Robert Woody (Woody & McPherson, 2010). Lucy Green at the University of London Institute of Education and Jeanne Bamberger at MIT suggest that music education should be more informal (the opposite of in-school music). Green’s influence comes from her privately funded Musical Futures research with student outcomes of improved student motivation, enjoyment, and attitude. These three outcomes are assessed by teacher opinion of change in each (Green, 2008). A second assessment has been by student opinionnaire. Jeanne Bamberger (1978) agrees that school music promotes formal at the expense of intuitive understanding.

Harold Abeles with coauthor Jin Won Chung contributed to the change by documenting how responses to music related to the taxonomy of the affective domain (Krathwohl, Bloom, & Masia, 1964). Abeles cites 1974 research by Roderer, who provided two neurophysiological explanations of how music evokes meaningful and emotional responses. One neurological study clearly supports Meyer’s expectation theory that predictions are based on past experiences. A second study indicated that the limbic system is probably engaged during music processing (Roderer, 1974). Abeles and Chung (1999) provide a thorough description of tests designed to measure affective responses, mood, preferences, tastes, and attitudes. With a focus on music education, the use of the semantic differential, paired comparisons, rating scales, categorical judgements, and behavioral measures are analyzed to discern the confusion in terminology leading to different results from similar studies. In-depth test results are provided for specific mood-emotion responses such as anxiety and arousal. Abeles and Chung provide examples that have used the sophisticated bodily measures as well as the subjective tools that measure emotion and issues related to preference.

With the affective domain of equal importance to the cognitive domain, this new and different type of data collection is important, and only a brief description of its quality is given in this chapter (we do not suggest that affect exists apart from thought). The shift in test development toward the uses of music has been accompanied by changes in the curriculum. Music appreciation long had as its objective to broaden taste, to hear the subtleties in the classics, and to respond to the tonal qualities that were thought to lead to a love for exemplary performances in music’s historical periods. Responding to music with this orientation was (p. 554) an important outcome, but vague and subjective. Tests were based on “What did you hear?” Today, responding by the body and the brain is accurately measured by laboratory tests. Accompanying subjective measures have often been well researched.

Research on response to music in daily life is a new objective. Hargreaves and North (2010) monitor music listening preferences not only by questionnaires, interviews, ranking, experience sampling, and observations but also with cell phones and similar devices. Multiple authors have amassed reliable preference scales by age, class, and country. Ratings are analyzed by multivariate techniques, factor analysis, cluster analysis, multidimensional scaling, or correspondence analysis to find a limited number of fundamental dimensions in this domain. Teachers want to know as much as possible about the impact of music and have confidence in the results of qualitative research. The National Assessment has attempted, since the first assessment, to obtain a general idea of student attitude, interest, and preference. Statements were along a continuum. Self-reports are necessary, as one cannot identify a specific emotion from physiological measures. Self-report instruments include Likert scales, adjective checklists, visual analogue scales, continuous response, nonverbal evaluation tasks (according to similarity without the verbal), experience sampling method (ongoing activities related to emotion and causes), diary study, free report, and narrative method. Emotion in music is a scientific construct that includes feelings, behaviors, and reactions in all of life (Zentner & Eerola, 2010).
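As a rough illustration of how the most familiar of these self-report instruments, the Likert scale, is commonly turned into a score, the sketch below averages a respondent's ratings after flipping negatively worded statements. The statements, the five-point range, and the function name are assumptions made for the example; they are not drawn from the National Assessment or from any instrument cited in this chapter.

```python
def score_likert(ratings, reverse_keyed, scale_min=1, scale_max=5):
    """Return the mean rating after reversing negatively worded items.

    ratings       -- one integer rating per statement
    reverse_keyed -- set of positions whose wording is negative
    """
    adjusted = [
        (scale_min + scale_max - rating) if position in reverse_keyed else rating
        for position, rating in enumerate(ratings)
    ]
    return sum(adjusted) / len(adjusted)

# Hypothetical attitude statements rated 1 (strongly disagree) to 5 (strongly agree):
#   0: "I enjoy listening to music in class."
#   1: "I would like to learn an instrument."
#   2: "Music lessons are boring."            (negatively worded)
#   3: "I listen to music outside school."
student_ratings = [4, 5, 2, 4]
attitude_score = score_likert(student_ratings, reverse_keyed={2})
print(attitude_score)
```

Reversing the negatively worded statement keeps a higher score consistently meaning a more favorable attitude, which is the usual reason such items are flipped before averaging.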

The Geneva Emotional Music Scales are constructed each year on the basis of around 800 questionnaires. Confirmatory factor analysis found emotions of wonder, transcendence, tenderness, nostalgia, peacefulness, power, joyful activation, tension, and sadness (Zentner, Grandjean, & Scherer, 2008). A factor analysis of genre-free, cross-cultural music preferences conducted in 2011 resulted in five factors: (1) mellow, comprising smooth and relaxing styles; (2) urban, defined by rhythmic and percussive music; (3) sophisticated, which includes classical and jazz; (4) intense, defined by forceful and energetic music; and (5) campestral, a variety of music such as found in country and singer-songwriter genres (Rentfrow, Goldberg, & Levitin, 2011).

Changes in sociology and the sociology of music provide a rich resource for new measures. They have changed the emphasis from music production to how music is consumed and what it does in social life (WHOQOL, 1998). Sociology encompasses the production of culture, knowledge, institutions, organizations, and their conventions (DeNora, 2010, 2011). DeNora’s own research was based on 50 in-depth interviews seeking to understand music’s role in relation to the achieved character of feeling. The sociological approach stresses understanding of meaning as a result of experiences in music. In 1965, E. L. Rainbow conducted a pilot study to establish that a student’s socioeconomic status was a significant predictor of musical aptitude (Rainbow, 1965). Most tests in the sociology of music today were developed to investigate how music is used in consumer societies and how social influences combine to shape musical tastes. These tests were built on the earlier adjective work of Hevner and Farnsworth (1954), often using semantic differentials. The belief seemed to suggest that the physical setting and social dynamics were more important than the music itself. Building on the ideas of (p. 555) Csikszentmihalyi and Csikszentmihalyi (1988), Custodero (2002) found that musical flow experiences in early childhood tend to occur with active multisensory involvement presented in a socially playful or game-like context. Group flow exists with jazz structure and improvisation. Measurement occurs by eye contact and bodily gesture. In 1993, Madsen, Brittin, and Capperella-Sheldon reported on the development of a two-dimensional empirical method for measuring an emotional response, the CRDI or continuous response digital interface. It is not clear whether the CRDI measures feeling or judgment (Madsen, Brittin, & Capperella-Sheldon, 1993). The idea for the CRDI is based on the operant music listening recorder and a practice of simply squeezing. Robert Woody and Gary McPherson (2010) connect emotion with motivation, partially based on Woody’s research on learning to feel the music, where college students learned to feel the music by manipulating elements like tempo and dynamics.

Sociology of music includes research on folk music including that on Alan Lomax and how folksong styles in most cultures reflect economic and social conditions and attitudes. Sociology of music requires a definition of talents that are more extensive than those often championed. These were mentioned earlier and include performing, improvising, composing, arranging, analyzing, appraising, conducting, and teaching music, each requiring tests or test-like devices drawn from sociology and other related disciplines (McPherson & Hallam, 2009). The Department of Psychology at Uppsala University maintains a center, Appraisal in Music and Emotion (AMUSE), to describe and explain people’s responses to music using multiple measures (Gabrielsson, 1995). The use of music, of context, and of students’ ranges as an outcome from student to national identity is better. In 2005, US support for the war in Iraq was beginning to wane. The National Association for Music Education obtained support from more than a dozen corporations and launched the National Anthem Project, a 3-year campaign trumpeted as “restoring America’s voice through music education.” Students and others were to learn to sing the national anthem (Garofalo, 2010; Quigley, 2004). The test of the project was traditional and subjective; the public judged the quality of the performance at the 21st world’s largest concert.

Personality and attitude are also impacted by listening to music. The IPAT Music Preferences Test of Personality (Cattell & Anderson, 1953; Cattell, Eber, & Anderson, 1954; Cattell & Saunders, 1954) is based on the assumption that personality types should prefer a definite type of music. The scale provides 11 factors that might be used in a clinical situation. Cameron, Duffy, and Glenwright (2015a) studied personality types of student musicians. They used the Myers-Briggs Type Indicator (Meyers-Briggs, 1944/1975) and found that singers were extraverted and used that knowledge in planning lessons and activities. Also in 2015, Singers Take Center Stage (Cameron, Duffy, & Glenwright, 2015b), a similar study, was published in Psychology of Music.

Emotion is only one part of responding: How one responds depends on multiple factors, thus making standardized testing difficult. One’s cultural background is important. Personality tests and music experiences are commonly investigated. In their most recent study at the time of this writing, Cameron et al. (2015a) again used the Myers-Briggs Type Indicator.

(p. 556) Attitude seems to have three components: feeling, beliefs, and values. Researchers use Likert scales, semantic differential, Guttman scalograms, Q methodology, double-digit analysis, and some multidimensional scaling (Kemp, 1981, 1996). The personality characteristics of musicians by instrument and section have been investigated by John Davies (1978) through interviews that included how the brass section is viewed by string players. Kenny, Driscoll, and Ackermann (2014) report on the psychological well-being of professional orchestral musicians in eight Australian orchestras, using the State-Trait Anxiety Inventory, the Social Phobia Inventory, and the Anxiety and Depression detector. Kenny and Ackermann (2015) investigated performance anxiety and depression in orchestral players.

Performance

Probably no other component of teaching and learning is more important than performance, and observation is the primary “test” for gathering information. Hallam and Prince (2003) asked a sample of musicians, student musicians, educators, and the general public to define “musical ability.” Seventy-one percent (71%) viewed musical ability as being able to play a musical instrument or sing. The new Every Student Succeeds Act (S. 1177, 114th Cong., 2015) in the United States delegates curricular responsibilities to localities and states; this change will shift teaching priorities. One can observe the technical proficiency of a performance or one can “enjoy” a performance. One observes music performances in the classroom, and the music critic observes in the concert hall. Any large-scale assessment must have a performance component in order to fairly represent teaching and learning in music. The first NAEP in 1971–1972 (National Center for Education Statistics, 2018) required students to sing familiar songs, repeat unfamiliar musical material, improvise, perform from notation, and demonstrate proficiency by performing music that had been “learned.” “Testing” performance must usually be done one student at a time.

Sight-Singing

Although critically important, few formal tests for performance have been published. Hillbrand’s sight-singing test was published by the World Book Company in 1923 (Hillbrand, 1923). It consisted of a four-page folder of six original songs and a list of criteria: notes wrongly pitched, transpositions, times flatted, times sharped, notes omitted, errors in time, extra notes, repetitions, and hesitations. Teachers today would probably use a similar list in testing for sight-reading. In 1925, Raymond Mosher published a group method of measuring sight-singing in Teachers College Contributions to Teacher Education, 194 (Mosher, 1925) that applied a set of common criteria to groups of students. Otterstein and Mosher (1932) published their sight-singing test of 28 exercises in both major and minor modes, plus a few more difficult intervals. Lists of sight-singing (p. 557) materials (which could be considered or used as tests) continue to be published with multiple editions. Zhukov, Viney, Riddle, Teniswood-Harvey, and Fujimura (2016) reported on strategies to improve sight-reading of advanced pianists. The available materials indicate that sight-singing remains an important competency for many teachers (Kopiez, Weihs, Ligges, & Lee, 2006). Gutsch (1965) conducted a federally funded study on the ability to sight-read rhythm on an instrument.

Instrumental Performance

John Watkins (1941) developed an objective measurement of instrumental performance (trumpet) which was adapted for all band instruments by Stephen Farnum (1954). It was designed to cover 6 years of learning and consisted of 14 “sight-reading” exercises graded in difficulty. The objective grading system was based primarily on pitch and rhythm, but included observation of tempo, expression, and slur markings as well as the repeat sign. Fifteen years later, Farnum (1969a) released a string performance scale designed for grades 7–12. Farnum based the difficulty of 14 of the 17 string exercises on the performance competence of about 50 violinists but suggested that the test would be difficult to standardize. The Oregon State Department (1977) published a self-evaluation checklist for orchestra grades 4–12. Their argument was that students should check their own progress on concrete elements of orchestra performance. Gary McPherson and John McCormick have investigated the contribution of motivational factors to an instrumental performance examination (McPherson & McCormick, 2000; McCormick & McPherson, 2003). Self-efficacy was measured by obtaining strength of beliefs about self-performance competence on an 11-point scale. Using questionnaires on cognitive strategy use, anxiety/confidence, intrinsic value, and self-regulation, the authors conclude that how students think about themselves, the task, and their performance is just as important as the time they devote to practicing. Evans and McPherson (2015), in a 10-year study, asked questions about the students’ perception of their future in music. Those with a long-term view demonstrated higher achievement and more practicing than students with a shorter view of the role of music in their lives.

Choral Music

Choral music tests are primarily based on observation and a nonstandardized checklist. Phillips and Doneski (2011) report that the 1997 NAEP test found that only 25% of 8th-grade students could sing “Twinkle Twinkle Little Star” at the adequate level or above. They state that this fact should have been a wake-up call for testing performance competence. Joanne Rutkowski (1990) has developed an unpublished but respected Singing Voice Development Measure to assess singing progress. It has nine levels from presinger to singer. The fourth level is termed “inconsistent limited range singer,” where the student wavers between speaking and singing and uses a limited range, usually D3 to F3. The use of nine testing levels indicates that learning to sing could be a lengthy process.

(p. 558) Surprisingly, there are few attitude studies about choral music. Phillips and Doneski (2011) report only eight studies of children’s attitude toward vocal music prior to 1990 and fewer since that time. Only 16 of the 42 songs on NAfME’s common song repertoire list are found on any other vocal list or songbook (McGuire, 2000).

At the secondary level, Henry (2001, 2004) developed a vocal sight-reading inventory and tested it with about 180 students. Little additional information is available. The spectrograph has been tried as a measure of individual and group performances but is unreliable for testing because results vary with microphone placement and room acoustics.

State Music Contests in the United States

Performance testing, individual and group, occurs at state music contests, both instrumental and choral. The process is open-ended, although in some states music must be performed from an approved list and difficulty levels assigned. Usually three judges use a standard checklist to arrive at a final rating. The checklist may have weighted items. Judges are often experienced but not formally trained. A number of research projects have proposed rating scales, but none are used for these state contests. The rating of “overall effect” seems to take precedence over individual performance items.

Other Tests of Performance

There is limited research that combines performance research results; Jennifer Mishra (2014) conducted a meta-analysis of 92 studies on sight-reading. Eighty-one percent of the studies found no impact from the intervention that tried to improve sight-reading. The interventions that helped were focused on aural training, controlled reading (eye movement), creative activities, and singing/solfège (p. 143). Although significant, the differences were small.

The Associated Board of the Royal Schools of Music (ABRSM, http://us.abrsm.org) initiated a performance examination for piano, which has since been expanded to include all instruments. The ABRSM has seven practical tests for eight grades (these are grades of competence, not grade levels in school). Examiners are trained and some 700 examiners are based in 90 countries. Examiners use standardized rubrics for performance and for competence in scales, arpeggios, sight-reading, and aural tests. Jazz competence by instrument is assessed in addition to music theory. There are rubrics for ensembles and choirs, and diplomas in both teaching and conducting.

A few colleges have promoted tests of competency rather than course work to meet any requirements including graduation. New Hampshire allows competency tests in its public schools, including music. Performance may be the primary criterion at present. A student well versed in jazz can inform the school music teacher how his or her competence meets standards (Marion & Leather, 2015).

(p. 559) Creativity

Developing tests for the assessment of creativity in music has been difficult. The first such test may have been Moorhead and Pond (1941). Assessment of creative work is based generally on the tests developed by Torrance (1974) in general creative thinking, fluency, flexibility, elaboration, and originality. Vaughan (1973) suggested measures of fluency, rhythmic security, ideation, and synthesis. The dependent variable was an “interesting response.” Gorder (1980) developed creative tasks for an instrumentalist. Gorder’s three tasks are straightforward divergent thinking tasks using musical materials. The student composes or improvises on simple melodies according to stimuli of various complexity rendered on percussion instruments. Students can respond by singing, whistling, or playing a familiar instrument. There is also a musical staff with space for recording the task, and an opportunity to demonstrate the task with a contour line. Peter Webster (1987/1992) continues to work on developing measures of creativity. He has developed composition tasks like composing a phrase for a triangle, a stimulating one to think about variations on a phrase and creating an extended composition, along with analysis of a melody and a duet. The student is to improvise rhythms on claves and melodies on bells, and to reproduce a melody on bells that are scored. He suggests (1987) the measurement is one of musical extensiveness, musical flexibility, musical originality, and musical syntax. Kodály expert Zoltan Laczo (1995) investigated musical abilities, especially the relationship between intelligence and vocal improvisation. Teresa Amabile (1996) has replaced Torrance as the leader in creativity. In testing, she equates reliability with validity. Her position is that creativity is easily recognized and, if so, consensual technique is all that is needed for assessment. Stefanic and Randles (2014) have developed consensual materials in music. 
Amabile’s work is heuristic rather than algorithmic, requiring the student to have some factual knowledge, technical skills, and musical talent to engage in creating. Student personality and motivation are more influential than cognitive competence. Like other tasks, long hours of practice from an early age, belief in self, perseverance, unwillingness to accept the first solution, exposure to other composers, existence of mentors, and critics are possible ways to measure creative potential.

Tests Used for Teacher Evaluation

Observation is used to test novice and experienced teachers. The observation of student teachers is “high stakes,” as one can fail student teaching if the observer so deems. It is also subject to great error, as can be proven by varying stories of individuals who observe firsthand an accident but report important details incorrectly, or of subjects who mistakenly identify suspects in a line-up. Observation has become high-stakes in teacher (p. 560) evaluation. The interest of the US-based Bill and Melinda Gates Foundation in teaching resulted in a multiple-year study (Bill and Melinda Gates Foundation, 2013), mentioned earlier, on the validity of observation at a cost of at least 45 million dollars. The research on observation was conducted in mathematics (grades 6–8) and language arts (grades 4–6) accompanied by a student questionnaire (Tripod) developed by Ronald Ferguson (2012). The focus of the observation in the Gates’s research was limited to its use in evaluating teachers. Several different observation “tests” or strategies were used: the Classroom Assessment Scoring System (CLASS), the Framework for Teaching (FfT), the Protocol for Language Arts Teaching Observations (PLATO), the Mathematical Quality of Instruction (MQI), and the Quality of Science Teaching (QST); the primary one being the Danielson (FfT), which is also fundamental to the National Board for Teacher Proficiency Test, Praxis III. Despite considerable training in observation, observers did not discern differences between score points on a rubric. The FfT framework was recently revised to clarify the rubrics used with Danielson’s four phases of planning and preparation, classroom environment, instruction (including use of assessment), and professional responsibilities. Observation scores do differ by grade level but not by teacher characteristics, experience, education, demographics, classroom composition, or potential rater bias. 
Most observational scales focus on planning and preparation, classroom environment, instruction, and professional responsibilities. The MET study involved 3,000 volunteer teachers in six states and in urban situations (Bill and Melinda Gates Foundation, 2013).

What can music educators learn from this major study with a focus on two disciplines? Professional educators have accepted that there is a commonality to teaching. The Gates’s research, however, should raise a cautionary flag for assessment based on observation. Principal ratings were lower for middle school than for elementary. With the need to have a cut-score in all meaningful assessments, judgment of where to place the cut score is important. Even after training, observers had difficulty discerning differences between score points even with the use of exemplars on the borders of score points (Bell et al., 2014). Public school music teachers recognize the importance of also being objective about parent and possibly administrative opinion. Observers were better at assessing classroom organization and environment than on instructional and emotional aspects of classrooms. Student–teacher interaction and instruction of complex topics were the most difficult to assess. With all observation systems studied, the first 30 minutes of the class were adequate to obtain enough information for assessment, as long as observers focused on a small set of complementary traits.

Teacher Certification Tests

Many states have a subject matter examination (high stakes) as part of teacher certification using the ETS Praxis (Educational Testing Service, 2015). These may be local (some cities can certify teachers) or the state may contract with a professional organization (p. 561) like Pearson Education to construct and administer the test. The 2015 California subject matter test in music (California Educator Credentialing Examinations, 2015) is typical, requiring functional keyboard proficiency, vocal/instrumental proficiency, including proficiency on a required number on one’s major instrument, knowledge of conducting, score reading, including recognition of be-bop from a score, applied music theory, orchestration, music history, and a relationship between music and dance, with a few questions on relationships such as acoustics and community relations. In addition, certification may require that competence be established for preservice music teachers (students in tertiary music education teacher preparation programs) by passing the edTPA, a portfolio evaluation that contains a video segment of the students’ teaching. There is an accompanying list of questions that asks the student why they did what they did, what changes they would make the next time, and to reflect on their work. The statements are designed to have the student justify planning, teaching, and assessment decisions. The idea is to establish competence in curriculum goals, instructional strategies, assessment, use of research, rapport building, and fairness. The rubrics used to assess the teaching are those of the revised Danielson Framework (Ferguson & Danielson, 2014). In other states, different observation matrices similar to those listed earlier are used. Based on the philosophy that “teaching is teaching,” the music student must carefully select those music teaching tasks that best fit what is to be assessed. 
edTPA is an incomplete, and perhaps unsatisfactory, test of the basics that a music teacher must know and be able to do (Jordan & Hawley, 2016; Wilkerson, 2015). One assumes that a job analysis has been conducted as is common with government employees or fire and police personnel, and that rating scales describe work responsibilities as well as knowledge that applies to the arts. The visual arts have made an excellent case that critical thinking does not apply to the teaching of painting; critical thinking is an important tool for the art critic but not the artist. One senses this possible disconnect in some proposed standards for music education. The work of Smith and Wuttke (2016) is encouraging.

Other Uses of Music-Related Tests

Researchers who study intelligence, emotion, personality, and physical, behavioral, and psychological reactions have used music and/or music tests. Space allows for only a few examples. David Healey (2016) investigated the impact of marching band membership on engagement of college students with diversity, personal social responsibility, reflective learning, and other concepts based on George Kuh’s National Survey of Student Engagement (NSSE; Kuh, 2012). Zdzinski (2013) developed a test to measure parental involvement and home environment in music. That test, paired with Fortney’s 1992 music attitude scales that measured attitudes of students in high school instrumental programs, was used on a national scale with more than a thousand students. Zdzinski’s later research (Zdzinski et al., 2014–2015) was supported by the National Association of (p. 562) Music Merchants (NAMM), the Grammy Foundation, and the US Department of Education through a FIPSE grant. The finding was that parenting style was significantly related to all outcome variables in music as well as success in school. Martin-Santana, Muela-Molina, and Reinares-Lara (2015) reported on music in radio advertising and the effects on the spokesperson’s credibility and advertising effectiveness. Athletes purposely use music to manage their emotional state, and music listening may facilitate their subsequent reactive performance as measured with fMRI (Bishop, Karageorghis, & Kinrade, 2009; Bishop, Karageorghis, & Loizou, 2007; Bishop, Wright, & Karageorghis, 2014). Tempo and intensity of pretask music modulate neural activity during reactive task performance, as shown by fMRI. The same research group investigated the effects of musically induced emotions on choice reaction time performance (Bishop, Karageorghis, & Kinrade, 2009) and used grounded theory to assess young tennis players’ use of music to manipulate their emotional state (Bishop, Karageorghis, & Loizou, 2007). Anxiety in music interested Osborne & Kenny (2005, 2008) and Sarbescu & Dorgo (2014).

Perhaps the most interesting test is the effect of a music instrument in obtaining a female’s phone number. Three hundred females (ages 18–22) exiting a Paris underground station were approached by a handsome male of the same age who asked for her phone number, posing with a sports bag, no bag, or a guitar. Thirty-one percent responded positively to the male when he had the guitar; 14% with no bag, and 9% with the sports bag (Gueguen, Meineri, & Fischer-Lokou, 2014). Human judgment remains the ultimate test in music.

Rating scales are common in noncognitive assessment, but methods for moderating rating responses such as anchoring vignettes, forced choice, rank and preference methods, situational judgment tests, and performance measures are also used. Schellenberg (2016) identified more than 160 studies that related music to nonmusical abilities. Many of these investigated the relationship of music to intelligence. Others focused on language, speech, emotion, preferences, and memory. The measures used were reliable tests in each field as well as observation.

Conclusion

An exhaustive search of published and unpublished tests indicates that their value to teaching and learning has been underused for over 100 years. Many are not aware of what tests have to offer. Achievement tests have been developed for doctoral dissertations, an indicator of interest and importance, but few were influential or found a use in the stable of teaching/learning aids. One problem lies in defining what is meant by musicality and what aspects of musicality can be successfully taught and validly tested. Creating, performing, and responding are listed as three artistic processes that presumably could or should be taught. Process, however, is just that—a process, and incompatible with formative and summative tests. These three processes are too broad to provide guidance for a taxonomy of learning objectives.

(p. 563) A second problem is the lack of a felt need to test either talent or achievement. Music theory and music history are taught and evaluated in every school of music with no known demand for standardized and validated tests. Most of the colleges have an entrance/aptitude examination consisting of a one-shot performance that appears to satisfy any need. The aptitude tests developed in the 20th century identified the elements of sound and established that individuals differed in their ability to discriminate among these elements. There is agreement that artistry/musicality requires a good ear but that is insufficient; acute discrimination plus a vision of appropriateness or fitness is basic to musical interpretation. Improvisation and interpretation are creative musical processes that are not sufficiently generalizable for testing. Fitness requires explicit and implicit contextual knowledge along with the means to communicate. There is both a physical and mental component to fitness. These and other competencies are long-term goals requiring practice, which lacks instructional sensitivity.

Twenty-first-century psychologists are using tests to investigate brain and bodily activity resulting from human reaction to the musical elements identified by aptitude test developers. The work is exciting and interesting. Both James Mursell and Carl Seashore believed that aptitude was more holistic than simple measurement of the elements of sound. Mursell believed that an emotional reaction was necessary for sound to become music. Seashore (1947) believed that music was the language of emotion and as a means of communication it acquired its social value.

A third problem is the rise of qualitative thinking and the sociology of music in how music teaching should be conducted, and that any assessment should not be limited to the classroom. Bowman and Frega (2012) argue that it is more important to “do” music than to explore troubling and distracting questions about how it might be done more effectively. Elliott and Silverman (2015) pick up on John Dewey’s philosophy that educational objectives are not prespecifications of learning but rather the outcomes of teacher-learning interactions. This idea would certainly complicate achievement testing. They argue against separating community and school music and believe that curriculum is established by teachers reflecting on subject matter knowledge, resources and materials, students’ abilities, lesson aims and goals, teaching strategies, and evaluation procedures making teaching idiosyncratic. They also argue against the artificial mind-body split, and oppose standards suggesting that assessment is an interaction among students, teachers, and particular content. Penuel and Shepard (2016) also suggest that teaching is interactive and adaptive, suggesting that interim assessments do not have an extensive research base and were introduced only for No Child Left Behind as they lack the connection of embedded assessment. Riconscente, Mislevy, and Corrigan (2016) believe that assessment is a broad set of processes and instruments by which we arrive at inferences about learner proficiency. Behaviors and/or performances should reveal the constructs of the theory that facilitates communication, coherence, and efficiency in assessment. Fautley (2010), in a text on music assessment, states that it is difficult to write criteria for aesthetic quality and one should negotiate quality with the students. He also agrees with Bennett Reimer that understanding is a process, and one makes (p. 564) judgments along a continuum. 
Keith Swanwick (1999) criticizes present assessment practices in developing tests in the United States for standards. Difficulty is assessed by quantity, not quality. A performance rated difficult has more things, more key signatures, more sharps and flats, more variety of rhythm patterns, and more notes.

Assessment and the curriculum must be compatible, even matched, but this is difficult at best with a sociological orientation and music outcomes linked to community music. Describing student outcomes rather than providing comparison data has been encouraged by our most ardent supporters, including Robert Stake, Liora Bresler, and Elliot Eisner. Curriculum researchers admit the paucity of assessment instruments and hope the teacher understands and can build teaching/learning on the best available evidence. A careful reading of the material in this chapter indicates what is known, and what is known is the basis for better assessments.

References

Abeles, H. F., & Chung, J. W. (1999). Responses to music. In D. A. Hodges (Ed.), Handbook of music psychology (2nd ed., pp. 285–342). San Antonio, TX: IMR Press.

Aliferis, J. (1947/1954). Aliferis music achievement test: College entrance level. Minneapolis: University of Minnesota Press.

Aliferis, J., & Stecklein, J. (1962). Aliferis-Stecklein music achievement test: College midpoint level. Minneapolis: University of Minnesota Press.

Amabile, T. (1996). Creativity in context. Boulder, CO: Westview Press.

Asmus, E., Jr. (1979). The operational characteristics of adjectives as descriptors of musical affect (Unpublished doctoral dissertation). University of Kansas, Lawrence, KS.

Baker, G. (2014). El Sistema: Orchestrating Venezuela’s youth. New York, NY: Oxford University Press.

Bamberger, J. (1978). Intuitive and formal musical knowing: Parables of cognitive dissonance. In S. Madeja (Ed.), The arts, cognition, and basic skills (pp. 173–209). St. Louis, MO: CEMREL.

Beach, F. A. (1921). Beach standardized music test (2nd ed.). Emporia, KS: Bureau of Educational Measurements and Standards.

Behne, K. (1997). The development of “Musikerleben” in adolescence: How and why young people listen to music. In I. Deliege & J. Sloboda (Eds.), Perception and cognition of music (pp. 143–160). Hove, UK: Psychology Press.

Bell, C., Qi, Y., Croft, A., Leusner, D., McCaffrey, D., Gitomer, D., & Pianta, R. (2014). Improving observational score quality: Challenges in observer thinking. In T. Kane, K. Kerr, & R. Pianta (Eds.), Designing teacher evaluation systems: New guidance from the Measures of Effective Teaching Project (pp. 50–97). San Francisco, CA: Jossey-Bass.

Bentley, A. (1966). Musical ability in children and its measurement. London, UK: Harrap.

Berger, J., & Karabenick, S. (2016). Construct validity of self-reported metacognitive learning strategies. Educational Assessment, 21(1), 19–33.

Berlyne, D. E. (1971). Aesthetics and psychobiology. New York, NY: Appleton-Century-Crofts.

Berlyne, D. E. (Ed.). (1974). Studies in the new experimental aesthetics: Steps toward an objective psychology of aesthetic appreciation. New York, NY: Halsted Press.

Bill and Melinda Gates Foundation. (2013, January). Ensuring fair and reliable measures of effective teaching: Culminating findings from the MET Project’s three-year study. Retrieved (p. 565) from http://www.metproject.org/downloads/MET_Ensuring_Fair_and_Reliable_Measures_Practitioner_Brief.pdf

Bishop, D., Karageorghis, C., & Kinrade, N. (2009). Effects of musically-induced emotions on choice reaction time performance. The Sport Psychologist, 23, 59–76.

Bishop, D., Karageorghis, C., & Loizou, G. (2007). A grounded theory of young tennis players’ use of music to manipulate emotional state. Journal of Sport and Exercise Psychology, 29, 585–607.

Bishop, D., Wright, M., & Karageorghis, C. (2014). Tempo and intensity of pre-task music modulate neural activity during reactive task performance. Psychology of Music, 42, 714–727. doi: 10.3389/fnhum.2015.00508

Bochkarev, L. (1989). Psikhologicheskie mekhanizmy muzykal’nogo perezhivaniia [Psychological mechanisms of musical experience] (Doctoral dissertation). Kyiv, Ukraine: Taras Shevchenko National University.

Bowman, W., & Frega, A. L. (2012). What should the music education profession expect of philosophy? In W. Bowman & A. L. Frega (Eds.), The Oxford handbook of philosophy in music education (pp. 17–36). Oxford, UK: Oxford University Press.

Bridges, D., & Rechter, B. (1974/1978). Australian test for advanced music studies. Hawthorn, Australia: Australian Council for Educational Research.

Brophy, T. (2000). Assessing the developing child musician: A guide for general music teachers. Chicago, IL: GIA Publications.

                                              Bundra, J. (1994). A study of the music listening processes through verbal reports of school-aged children (Unpublished doctoral dissertation). Northwestern University, Evanston, IL.Find this resource:

                                                Burton, D. (2016). A quartile analysis of selected variables from the 2008 NAEP visual arts report. Studies in Art Education 57(2), 165–178.Find this resource:

California Educator Credentialing Examinations. (2015). California Subject Examinations for Teachers (CSET): Music. Retrieved from http://www.ctcexams.nesinc.com/TestView.aspx?f=HTML_FRAG/CA_CSET136_TestPage.html

Cameron, J., Duffy, M., & Glenwright, B. (2015a). Personality types of student musicians: A guide for music educators. Canadian Music Educator, 56(4), 13–17.

Cameron, J., Duffy, M., & Glenwright, B. (2015b). Singers take center stage! Personality traits and stereotypes of popular musicians. Psychology of Music, 43, 818–830. doi: 10.1177/0305735614543217

Cattell, R., & Anderson, J. (1953). The measurement of personality and behavior disorders by the IPAT Music Preference Tests. Journal of Applied Psychology, 37, 446–454.

Cattell, R., Eber, H., & Anderson, J. (1954). The IPAT Music Preference Test of Personality (The MPT). Champaign, IL: The Institute for Personality and Ability Testing.

Cattell, R., & Saunders, D. (1954). Music preferences and personality diagnosis: A factorization of 120 themes. Journal of Social Psychology, 39, 3–24.

Cohen, V. (1997). Exploration of kinesthetic analogues for musical schemes. Bulletin of the Council for Research in Music Education, 131, 1–13.

Colwell, R. (1965/1970). Music achievement tests, 1–4. Chicago, IL: Follett.

Colwell, R. (1979). Silver Burdett music competency tests. Morristown, NJ: Silver Burdett.

Csikszentmihalyi, M., & Csikszentmihalyi, I. (Eds.). (1988). Optimal experience: Psychological studies of flow in consciousness. Cambridge, UK: Cambridge University Press.

Custodero, L. A. (2002). Seeking challenge, finding skill: Flow experience and music education. Arts Education Policy Review, 103, 3–9.

Darling-Hammond, L., Herman, J., Pellegrino, J., Abedi, J., Aber, J. L., Baker, E., … Steele, C. (2014). Criteria for high quality assessment. Stanford Center for Opportunity Policy in Education: Stanford University; Center for Research on Student Standards and Testing, University of California at Los Angeles; Learning and Science Research Institute, University of Illinois at Chicago.

Davidson, L., McKernan, P., & Gardner, H. (1981). The acquisition of song: A developmental approach. In Documentary report of the Ann Arbor Symposium (pp. 301–315). Reston, VA: Music Educators National Conference.

Davidson, L., & Scripp, L. (1992). Surveying the coordinates of cognitive skills in music. In R. Colwell (Ed.), Handbook of research on music teaching and learning (pp. 392–413). New York: Schirmer Books.

Davies, J. (1978). The psychology of music. Stanford, CA: Stanford University Press.

DeNora, T. (2010). Emotion as social emergence: Perspectives from music sociology. In P. N. Juslin, & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 159–183). Oxford, UK: Oxford University Press.

DeNora, T. (2011). Emotion as social emergence: Perspectives from music psychology. In P. Juslin, & J. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 159–186). Oxford, UK: Oxford University Press. doi: 10.1093/acprof:oxfordhb/9780199230143.003.0007

Dickey, M. (1991). A comparison of verbal instruction and nonverbal teacher-student modeling in instrumental ensembles. Journal of Research in Music Education, 39, 132–142. doi: 10.2307/3344693

Doherty, K. M., & Jacobs, S. (2013). Connect the dots: Using evaluations of teacher effectiveness to inform policy and practice (State of the states 2013). Washington, DC: National Council on Teacher Quality (NCTQ).

Domaleski, C. (2011). State end of course testing program: A policy brief. Retrieved from http://www.wyoleg.gov/InterimCommittee/2011/SelectAccountability/State%20End%20of%20Course%20Test%20Programs%2091511.pdf.

Drake, R. (1957). Manual for the Drake Musical Aptitude Tests. Chicago, IL: Science Research Associates.

Educational Testing Service. (1951/2001). Graduate record examinations advanced tests: Music. Princeton, NJ: Author.

Educational Testing Service. (2015). Praxis performance assessment for teachers. Princeton, NJ: Author.

Eisner, E. (2001). Should we create new aims for art education? Art Education, 54(5), 6–10. doi: 10.1080/00043125.2001.11653461

Elliott, D., & Silverman, M. (2015). Music matters (2nd ed.). New York: Oxford University Press.

Evans, P., & McPherson, G. (2015). Identity and practice: The motivational benefits of a long-term musical identity. Psychology of Music, 43, 407–422. doi: 10.1177/0305735613514471

Farnsworth, P. R. (1954). A study of the Hevner adjective circle. Journal of Aesthetics and Art Criticism, 13, 97–103.

Farnsworth, P. R. (1969). The social psychology of music. Ames: Iowa State University Press.

Farnum, S. (1953/1969). Farnum music notation test. New York, NY: Psychological Corporation.

Farnum, S. (1954). The Watkins-Farnum performance scale. Winona, MN: Hal Leonard Music.

Farnum, S. (1969a). The Farnum string scale. Milwaukee, WI: Hal Leonard Music.

Farnum, S. (1969b). Farnum music test. New York, NY: Psychological Corporation.

Fautley, M. (2010). Assessment in music education. London, UK: Oxford University Press.

Ferguson, R. (2012). Can student surveys measure teacher quality? Phi Delta Kappan, 94(3), 24–28. doi: 10.1177/003172171209400306

Ferguson, R., & Danielson, C. (2014). How framework for teaching and tripod 7Cs evidence distinguish key components of effective teaching. In T. Kane, K. Kerr, & R. Pianta (Eds.), Designing teacher evaluation systems: New guidance from the measures of effective teaching project (pp. 98–143). San Francisco, CA: Jossey-Bass.

Ferrara, S., & Way, D. (2016). Design and development of end-of-course tests for student assessment and teacher evaluation. In H. Braun (Ed.), Meeting the challenges to measurement in an era of accountability (pp. 11–48). New York, NY: Routledge.

Fortney, P. (1992). The construction and validation of an instrument to measure attitudes of students in high school instrumental programs. Contributions to Music Education, 19, 32–45.

Froseth, J. (1982). Test of melodic ear to hand coordination (Unpublished doctoral dissertation). University of Michigan, Ann Arbor, MI.

Froseth, J. (1983). Ear-hand coordination test. Chicago, IL: GIA Publications.

Gabrielsson, A. (1995). Music psychology in Sweden. In M. Manturzewska, K. Miklaszewski, & A. Białkowski (Eds.), Psychology of music today. Warsaw, Poland: Fryderyk Chopin Academy of Music.

Galton, F. (1869). Hereditary genius: An inquiry into its laws and consequences. (reissued in 1892). London, UK: MacMillan Books.

Galton, F. (1883). Inquiries into human faculty and its development. London, UK: J.M. Dent.

Galton, F. (1889). Natural inheritance. London, UK: Macmillan/McGraw-Hill.

Gardner, H. (2006). Multiple intelligences: New horizons. New York, NY: Basic Books.

Garofalo, R. (2010). Politics, mediation, social context, and public use. In P. Juslin, & J. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 725–754). Oxford, UK: Oxford University Press.

Gaston, E. T. (1942/1957). A test of musicality. Lawrence, KS: Odell’s Instrumental Service.

Gembris, H. (2002). The development of musical abilities. In R. Colwell, & C. Richardson (Eds.), The new handbook of research on music teaching and learning (pp. 487–508). New York, NY: Oxford University Press.

Gembris, H. (2006). The development of musical abilities. In R. Colwell (Ed.), MENC handbook of musical cognition and development (pp. 124–164). New York, NY: Oxford University Press.

Gildersleeve, G., & Soper, W. (1921). Musical achievement test. New York: Columbia Teachers College Press. (Also listed as 1929 or n.d. in some sources.)

Gorder, W. (1980). Divergent production abilities as constructs of musical creativity. Journal of Research in Music Education, 28(1), 34–42. doi: 10.2307/3345051

Gordon, E. (1965). Musical aptitude profile. Boston, MA: Houghton Mifflin.

Gordon, E. (1967). A three-year longitudinal predictive validity study of the Musical Aptitude Profile. Iowa City: University of Iowa Press.

Gordon, E. (1970/1991). Iowa tests of musical literacy. Iowa City, IA: Bureau of Educational Research and Service.

Gordon, E. (1979). Primary measures of music audiation. Chicago, IL: GIA.

Gordon, E. (1982). Intermediate measures of music audiation. Chicago, IL: GIA.

Gordon, E. (1989). Advanced measures of music audiation. Chicago, IL: GIA.

Gordon, E. (1997). Taking another look at the established procedure for scoring the advanced measures of music audiation. GIML Monograph Series #2. Narberth, PA: Gordon Institute for Music Learning.

Green, L. (2008). Music, informal learning and the school: A new classroom pedagogy. Surrey, UK: Ashgate.

Gueguen, N., Meineri, S., & Fischer-Lokou, J. (2014). Men’s music ability and attractiveness to women in a real-life courtship context. Psychology of Music, 42, 545–549. doi: 10.1177/0305735613482025

Gutsch, K. (1965). Evaluation in instrumental performance: An individual approach. Music Educators Journal, 51(3), 2–5.

Hallam, S. (2016). Motivation to learn. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 463–478). Oxford, UK: Oxford University Press.

Hallam, S., & Prince, V. (2003). Conceptions of musical ability. Research Studies in Music Education, 20, 2–22. doi: 10.1177/1321103X030200010101

Hargreaves, D. J., & North, A. C. (2010). Experimental aesthetics and liking for music. In P. N. Juslin, & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 515–546). Oxford, UK: Oxford University Press.

Healey, D. (2016). E pluribus unum: An evaluation of student engagement and learning in the college marching band (Unpublished doctoral dissertation). Boston College, Boston, MA.

Henry, M. (2001). The development of a vocal sight-reading inventory. Bulletin of the Council for Research in Music Education, 150, 21–35.

Henry, M. (2004). The use of targeted pitch skills for sight-singing instruction in the choral rehearsal. Journal of Research in Music Education, 52, 206–217. doi: 10.2307/3345855

Hevner, K. (1935). Expression in music: A discussion of experimental studies and theories. Psychological Review, 42, 187–204.

Hevner, K. (1936). Experimental studies of the elements of expression in music. American Journal of Psychology, 48, 246–268. doi: 10.2307/1415746

Hevner, K., & Landsbury, J. (1935). Oregon musical discrimination tests. Chicago, IL: C.H. Stoelting.

Hickey, M. (1991). A comparison of verbal instruction and nonverbal teacher-student modeling in instrumental ensembles. Journal of Research in Music Education, 39, 132–142. doi: 10.2307/3344693

Hillbrand, E. (1923). Hillbrand sight-singing test. New York, NY: World Book Company.

Hodges, D. A. (2010). Psychophysiological measures. In P. N. Juslin, & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 279–311). New York, NY: Oxford University Press.

Holmstrom, L. G. (1963). Musicality and prognosis. Svenska Bokforlaget/Norstedts Publication #17. Stockholm, Sweden: Royal Swedish Academy of Music.

Hutchinson, H., & Pressey, L. C. (1927). Hutchinson music tests. Bloomington, IL: Public School Publishing Company.

Jordan, A., & Hawley, T. (2016, February 16). By the elite, for the vulnerable: The edTPA, academic oppression, and the battle to define good teaching. Teachers College Record. [ID Number 19461]. Retrieved from http://www.tcrecord.org.

Juslin, P. N., Liljestrom, S., Vastfjall, D., Barradas, G., & Silva, A. (2008). An experience sampling study of emotional reactions to music: Listener, music, and situation. Emotion, 8, 668–683. doi: 10.1037/a0013505

Juslin, P. N., Liljestrom, S., Vastfjall, D., & Lundqvist, L. (2010). How does music evoke emotions? Exploring underlying mechanisms. In P. Juslin, & J. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 605–642). Oxford, UK: Oxford University Press.

Juslin, P. N., & Sloboda, J. A. (Eds.). (2010). Handbook of music and emotion: Theory, research, applications. New York, NY: Oxford University Press.

Kane, T., Kerr, K., & Pianta, R. (Eds.). (2014). Designing teacher evaluation systems: New guidance from the measures of effective teaching project. San Francisco, CA: Jossey-Bass.

Karma, K. (1974, 1979, 1995). Auditory and visual temporal structuring. In M. Manturzewska, K. Miklaszewski, & A. Białkowski (Eds.), Psychology of music today. Warsaw, Poland: Chopin Academy.

Karma, K. (1975). Selecting students for music instruction. Bulletin of the Council for Research in Music Education, 75, 23–32.

Kemp, A. (1981). Personality differences between the players of string, woodwind, brass, and keyboard instruments, and singers. Bulletin of the Council for Research in Music Education, 66–67, 33–38.

Kemp, A. (1996). The musical temperament: Psychology and personality of musicians. Oxford, UK: Oxford University Press.

Kenny, D., & Ackermann, B. (2015). Performance-related musculoskeletal pain, depression and music performance anxiety in professional orchestral musicians: A population study. Psychology of Music, 43(1), 43–60. doi: 10.1177/0305735613493953

Kenny, D., Driscoll, T., & Ackermann, B. (2014). Psychological well-being in professional orchestral musicians in Australia: A descriptive population study. Psychology of Music, 42, 210–232. doi: 10.1177/0305735612463950

Keston, M., & Pinto, I. (1913/1955). Possible factors influencing musical preference. Journal of Genetic Psychology, 86, 101–113.

Knuth, W. (1936/1966). Knuth achievement tests in music. San Francisco, CA: Creative Arts Research Associates.

Kopiez, R., Weihs, C., Ligges, U., & Lee, J. (2006). Classification of high and low achievers in a music sight-reading task. Psychology of Music, 34(1), 5–26.

Krathwohl, D. R., Bloom, B. S., & Masia, B. B. (1964). Taxonomy of educational objectives, handbook II: Affective domain. New York, NY: David McKay Company.

Kwalwasser, J. (1926). Melodic and harmonic sensitivity tests. Iowa City, IA: Bureau of Educational Research and Service.

Kwalwasser, J. (1927). Tests and measurements in music. Boston, MA: C.C. Birchard.

Kwalwasser, J., & Ruch, G. (1927). Kwalwasser-Ruch test of musical accomplishment. Iowa City, IA: Bureau of Educational Research and Service.

Kyme, G. (1954). The value of aesthetic judgments in the assessment of musical capacity (Unpublished doctoral dissertation). University of California, Berkeley, CA.

Laczo, Z. (1995). Psychology of music in Hungary. In M. Manturzewska, K. Miklaszewski, & A. Białkowski (Eds.), Psychology of music today (pp. 50–51). Warsaw, Poland: Chopin Academy.

Lahdelma, I., & Eerola, T. (2016). Single chords convey distinct emotional qualities to both naïve and expert listeners. Psychology of Music, 44(1), 37–54. doi: 10.1177/0305735614552006

Larson, W. (1938). Practical experience with music tests. Music Educators Journal, 24(3), 70–84.

Long, N. (1965). A revision of the University of Oregon music discrimination test (Unpublished doctoral dissertation). Indiana University, Bloomington, IN.

Long, N. (1978). Indiana-Oregon music discrimination test. Bloomington, IN: Mid-West Tests.

Madison, T. H. (1942). Interval discrimination as a measure of musical aptitude. Archives of Psychology, No. 268 (entire issue).

Madsen, C., Brittin, R., & Capperella-Sheldon, D. (1993). An empirical method for measuring the aesthetic experience to music. Journal of Research in Music Education, 41, 57–69.

Marion, S., & Leather, P. (2015). Assessment and accountability to support meaningful learning: New Hampshire’s effort to move to competency education of PACE. Education Policy Analysis Archives, 23(9). doi: 10.14507/epaa.v23.1984

Martin-Santana, J., Reinares-Lara, E., & Muela-Molina, C. (2015). Music in radio advertising: Effects on radio spokesperson credibility and advertising effectiveness. Psychology of Music, 43(6), 763–778.

Mateos-Moreno, D. (2015). Latent dimensions of attitudes towards contemporary music: A structural model. Psychology of Music, 43(4), 545–562.

McCormick, J., & McPherson, G. (2003). The role of self-efficacy in a musical performance examination: An exploratory structural equation analysis. Psychology of Music, 31(1), 37–51. doi: 10.1177/0305735603031001322

McGuire, K. (2000). Common songs of the cultural heritage of the United States: A compilation of songs that most people “know” and “should know.” Journal of Research in Music Education, 48, 310–322. doi: 10.2307/3345366

McPherson, G., & Hallam, S. (2009). Musical potential. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 255–264). Oxford, UK: Oxford University Press.

McPherson, G., & McCormick, J. (2000). The contribution of motivational factors to instrumental performance in a music examination. Research Studies in Music Education, 15(1), 31–39. doi: 10.1177/1321103X0001500105

McPherson, G., & McCormick, J. (2006). Self-efficacy and music performance. Psychology of Music, 34, 325–339. doi: 10.1177/0305735606064841

Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.

Meyer, L. B. (1967). Music, the arts, and ideas. Chicago, IL: University of Chicago Press.

Mishra, J. (2014). Improving sightreading accuracy: A meta-analysis. Psychology of Music, 42, 131–156. doi: 10.1177/0305735612463770

Moorhead, G., & Pond, D. (1941). Music of young children: A three volume report. Santa Barbara, CA: Pillsbury Foundation for the Advancement of Music Education.

Mosher, R. (1925). A study of group method of measurement of sight-singing. Contributions to Education, 194 (entire issue). New York, NY: Teachers College, Columbia University.

Mursell, J. (1937). The psychology of music. New York, NY: Norton.

Mursell, J., & Glenn, M. (1931). The psychology of school music teaching. New York, NY: Silver Burdett.

Myers-Briggs, I. (1944/1975). The Myers-Briggs Type Indicator. Princeton: ETS, 1962; Gainesville, FL: Myers-Briggs Foundation, 1975.

National Center for Education Statistics. (2018). National Assessment of Educational Progress (NAEP) (1974/1978/1997/2008). Retrieved from https://nces.ed.gov/nationsreportcard/

Norman-Haignere, S., Kanwisher, N., & McDermott, J. (2015). Distinct cortical pathways for music and speech revealed by hypothesis-free voxel decomposition. Neuron, 88, 1281–1296. doi: 10.1016/j.neuron.2015.11.035

Norris, E. L., & Bowes, J. E. (Eds.). (1970). National Assessment of Educational Progress: Music objectives. Education Commission of the States, Denver, CO. Retrieved from ERIC database (ED063197).

Oregon State Department of Education. (1977). Self-evaluation checklist for orchestra, grades 4-12. Salem, OR: Author. Retrieved from ERIC database (ED152662).

Osborne, M., & Kenny, D. (2005). Development and validation of a music performance anxiety inventory for gifted adolescent musicians. Journal of Anxiety Disorders, 19, 725–751. doi: 10.1016/j.janxdis.2004.09.002

Osborne, M., & Kenny, D. (2008). The role of sensitizing experiences in MPA in adolescent musicians. Psychology of Music, 36, 447–462.

Otterstein, A., & Mosher, R. (1932). O-M Sight-Singing Test. Stanford, CA: Stanford University Press.

Penuel, W., & Shepard, L. (2016). Assessment and teaching. In D. Gitomer, & C. Bell (Eds.), Handbook of research on teaching (5th ed., pp. 787–850). Washington, DC: AERA.

Pflederer, M. (1964). The responses of children to musical tasks embodying Piaget’s principles of conservation (Unpublished doctoral dissertation). University of Illinois, Urbana, IL.

Pflederer, M., & Secrest, L. (1968). How children conceptually organize musical sounds. Council for Research in Music Education, 13, 19–36.

Phillips, K., & Doneski, S. (2011). Research on elementary and secondary school singing. In R. Colwell, & P. Webster (Eds.), MENC handbook of research on music learning (Vol. 2, pp. 176–232). New York, NY: Oxford University Press.

Pinker, S. (1997). How the mind works. New York, NY: W.W. Norton.

Quigley, S. L. (2004). Project to rekindle singing of the national anthem. American Forces Press Service. Retrieved from http://www.defenselink.mil/news/newsarticle.aspx?id=24915

Rainbow, E. L. (1965). A pilot study to investigate the constructs of musical aptitude. Journal of Research in Music Education, 13(1), 3–14.

Ravitch, D. (2016). The New York Review of Books, 34–36.

Rentfrow, P., Goldberg, L., & Levitin, D. (2011). The structure of musical preferences: A five-factor model. Journal of Personality and Social Psychology, 100, 1139–1157. doi: 10.1037/a0022406

Révész, G. (1999). The psychology of a musical prodigy. (Trans. unknown). London, UK: Routledge. (Original work published 1925.)

Riconscente, R. M., Mislevy, R. J., & Corrigan, S. (2016). Evidence-centered design. In S. Lane, M. Raymond, & T. Haladyna (Eds.), Handbook of test development (2nd ed., pp. 40–63). New York, NY: Routledge.

Roderer, J. G. (1974). The psychophysics of music perception. Music Educators Journal, 60(6), 20–30.

Rutkowski, J. (1990). The measurement and evaluation of children’s singing voice development. The Quarterly, 1(1–2), 81–95.

Ruzek, E., Hafen, C., Hamre, B., & Pianta, R. (2014). Combining classroom observations and value added for the evaluation and professional development of teachers. In T. Kane, K. Kerr, & R. Pianta (Eds.), Designing teacher evaluation systems: New guidance from the measures of effective teaching project (pp. 205–233). San Francisco, CA: Jossey-Bass.

S. 1177, 114th Cong. (2015, December 10). Every Student Succeeds Act, Public Law 114-95. Washington, DC: US Government Printing Office. Retrieved from https://www.congress.gov/bill/114th-congress/senate-bill/1177/text?overview=closed

Sarbescu, P., & Dorgo, M. (2014). Frightened by the stage or by the public? Exploring the multidimensionality of music performance anxiety. Psychology of Music, 42, 568–579. doi: 10.1177/0305735613483669

Schellenberg, G. (2016). Music and nonmusical abilities. In G. McPherson (Ed.), The child as musician: A handbook of musical development (2nd ed., pp. 149–176). New York, NY: Oxford University Press.

Seashore, C. (1915). The measurement of musical talent. New York, NY: Schirmer.

Seashore, C. (1932). The vibrato. University of Iowa studies in psychology of music (Vol. 1). Iowa City, IA: University Press.

Seashore, C. (1936). The vibrato. University of Iowa studies in psychology of music (Vol. 3). Iowa City, IA: University Press.

Seashore, C. (1938). Psychology of music. New York, NY: McGraw Hill.

Seashore, C. (1946). In search of beauty in music: A scientific approach to musical esthetics. (reprint). New York, NY: The Ronald Press Co.

Seashore, C. (1947). In search of beauty in music: A scientific approach to musical esthetics. New York, NY: Ronald Press.

Sloboda, J. A. (2010). Music in everyday life: The role of emotions. In P. N. Juslin, & J. A. Sloboda (Eds.), Handbook of music and emotions: Theory, research, and applications (pp. 493–514). Oxford, UK: Oxford University Press.

Smith, W., & Wuttke, B. (2016). Developing a model of the effective first-year secondary music teacher: Musical and teaching skills. In T. Brophy, J. Marlatt, & G. Ritcher (Eds.), Connecting practice, measurement, and evaluation (pp. 177–192). Chicago, IL: GIA Publications.

Stanton, H. (1935). Measurement of musical talent: The Eastman experiment. Studies in the Psychology of Music, 2. Iowa City, IA: University of Iowa Press.

Stefanic, N., & Randles, C. (2014). Examining the reliability of scores from the consensual assessment technique in the measurement of individual and small group creativity. Music Education Research, 17, 278–295. doi: 10.1080/14613808.2014.909398

Stumpf, C. (1883/1890). Tonpsychologie. Leipzig, Germany: Hirzel.

Strouse, C. (1937). Strouse music test. Emporia: Kansas State Teachers’ College, Bureau of Educational Measurements.

Swanwick, K. (1999). Teaching music musically. London, UK: Routledge.

Swinchoski, A. (1965). A standardized music achievement test battery for the intermediate grades. Journal of Research in Music Education, 13, 159–168. doi: 10.2307/3343670

Tan, S.-L., & Spackman, M. (2005). Listeners’ judgments of the musical unity of structurally altered and intact musical compositions. Psychology of Music, 33, 133–153. doi: 10.1177/0305735605050648

Thompson, W., & Schellenberg, G. (2006). Listening to music. In R. Colwell (Ed.), MENC handbook of musical cognition and development (pp. 72–113). New York, NY: Oxford University Press.

Torgerson, R. L., & Fahnestock, E. (1927/1930). Torgerson-Fahnestock music test. Bloomington, IL: Public School Publishing Company.

Torrance, E. P. (1974). Torrance tests of creative thinking. Bensenville, IL: Scholastic Testing Service.

Umemoto, T., Mikuno, M., & Murase, A. (1989). Development of tonal sense: A new test of cognition of pitch deviation. Human Developmental Research, 5, 155–174.

University of Maryland Libraries. (2005, July). Music tests. Paul Lehman Papers, Special Collections, Series 6 (processed by T. McKay). Retrieved from http://hdl.handle.net/1903.1/19477

Valentine, C. (1913). The aesthetic appreciation of musical intervals among school children and adults. British Journal of Psychology, 6, 190–216.

Vaughan, M. (1973). Cultivating creative behavior: Energy levels in the process of creativity. Music Educators Journal, 59(8), 35–37. doi: 10.2307/3394272

Vispoel, W. (1987). An adaptive test of musical memory: An application of item response theory to the assessment of musical ability (Unpublished doctoral dissertation). University of Illinois, Urbana, IL.

Watkins, J. (1941). Objective measurement of instrumental performance. New York, NY: Columbia University Press.

Webster, P. (1987/1992). Research on creative thinking in music: The assessment literature. In R. Colwell (Ed.), Handbook of research on music teaching and learning (pp. 266–280). New York, NY: Schirmer Books.

Wedin, L. (1972). Multidimensional scaling of emotional expression in music. Swedish Journal of Musicology, 54, 115–131.

WHOQOL Group. (1998). Development of the WHOQOL-BREF quality of life assessment. Psychological Medicine, 28, 551–558.

Wilkerson, J. (2015). Examining the internal structure evidence for the Performance Assessment for California Teachers. Journal of Teacher Education, 66, 184–192.

Wing, H. (1958). Standardized tests of musical intelligence. Windsor, UK: National Foundation for Educational Research.

Woody, R. H. (2000). Learning expressivity in music performance: An exploratory study. Research Studies in Music Education, 14, 14–23.

Woody, R. H., & McPherson, G. E. (2010). Emotion and motivation in the lives of performers. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotions: Theory, research, and applications (pp. 401–424). Oxford, UK: Oxford University Press.

Zdzinski, S. (2013). The underlying structure of parental involvement-home environment in music. Bulletin of the Council for Research in Music Education, 198, 69–88.

Zdzinski, S., Dell, C., Gumm, A., Rinnert, N., Orzolek, D., Yap, C. C., Cooper, S., … Russell, B. (2014–2015). Musical home environment, family background, and parenting style on success in school music and in school. Contributions to Music Education, 40(1), 71–90.

Zenatti, A. (1991). A comparative study in sample of French and British children and adults. Psychology of Music, 19, 63–73.

Zentner, M., & Eerola, T. (2010). Self-report measures and models. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotions: Theory, research, and applications (pp. 187–221). Oxford, UK: Oxford University Press.

Zentner, M., Grandjean, D., & Scherer, K. (2008). Emotions evoked by the sound of music: Characterization, classification, and measurement. Emotion, 8, 494–521. doi: 10.1037/1528-3542.8.4.494

Zhukov, K., Viney, L., Riddle, G., Teniswood-Harvey, A., & Fujimura, K. (2016). Improving sight-reading skills in advanced pianists: A hybrid approach. Psychology of Music, 44, 155–167. doi: 10.1177/0305735614550229