Please Note: Balanced Assessment printed materials are again available, as of December 6, 2005. Please use the order form and follow the ordering directions carefully as they have changed. Additionally, note that not all prices on the site are current. The current prices can be found on our order form.
Please also note that the Balanced Assessment Primary & Elementary Tasks have been published by Corwin Press. These tasks may still be viewed in .pdf format on this website but they may not be copied or printed.

An Interim Report
of the
Harvard Group
Balanced Assessment in Mathematics Project
September, 1995
Educational Technology Center
Harvard Graduate School of Education
This research was supported by a subcontract from the University of California at Berkeley, under National Science Foundation grant MDR‑9252902.
Let early education be a sort of amusement; you will then be better able to find out the natural bent.
Plato, The Republic, bk. VII, 537
Introduction
Assessing the mathematical performance of our students and the effectiveness of our mathematics instructional programs has become a major concern of a large part of the mathematics education community as well as a concern of several larger publics. The National Council of Teachers of Mathematics (NCTM) has addressed that concern in its recently released Assessment Standards for School Mathematics, which provides a set of six standards to guide the development of assessment instruments for school mathematics.
This NCTM document makes clear, however, that it is a guide and not a “how-to” document. Guides are necessary but not sufficient. One actually needs different models of assessment that instantiate the principles set down in guidelines such as those offered by the NCTM. Balanced Assessment in Mathematics (BA) is a National Science Foundation Project charged with developing new approaches to the assessment of mathematical competence in the elementary and secondary grades. The principal grantee is the University of California at Berkeley with subcontracts to Michigan State University, the Shell Mathematics Centre of the University of Nottingham, and the Educational Technology Center of the Harvard University Graduate School of Education. The Principal Investigator for the entire project is Alan Schoenfeld of the University of California at Berkeley.
The main goal of Balanced Assessment is to produce assessments that can be used in classrooms throughout the nation, assessments that reflect the values of the mathematics reform movement as articulated in the National Council of Teachers of Mathematics Curriculum and Evaluation Standards. The assessments created by Balanced Assessment are designed to provide students, teachers, schools and parents with useful information about how students and programs are doing with respect to those standards.
This document is an interim report of two years of work by the Harvard Group of the project. It is intended to both complement and supplement other reports issued by the project. This report addresses the following questions:
· What is mathematics about?
· What are the purposes of assessment?
· How should assessment in mathematics be done?
· What is Balanced Assessment in Mathematics about?
· What is the Harvard Group of Balanced Assessment in Mathematics about?
Our report also includes a complete archive of the work of the Harvard Group. It contains a packet of on-demand tasks and scoring rubrics at the elementary level, several packets of on-demand tasks and scoring rubrics at the secondary level, a secondary level portfolio packet containing problems and projects, and a technology resource package from which teachers may draw materials to supplement other on-demand assessment. Included with this report is a CD-ROM containing all of these materials in a form that can be used by anyone who has access to Microsoft Word on either a PC-compatible or Macintosh computer with a CD-ROM drive.
The work of the Balanced Assessment Project has been influenced in no small measure by the efforts of those who preceded it. We have been helped enormously by being able to draw on these efforts. A selected bibliography of the most important of these is included as Appendix C.
This document owes much to the work of many hundreds of students and dozens of teachers. We are indebted to all of them. We are particularly grateful to Joel Hillel for helping us think through many knotty issues. In addition we want to thank Walter Stroup and the Boston teachers and students with whom he worked for many of the tasks involving graphing calculators. Finally, we wish to thank our colleagues at the other project sites.
Judah L. Schwartz, Director
Joan M. Kenney, Coordinator
Kevin A. Kelly
Teresa Sienkiewicz
Yesha Sivan
Victor Steinbok
Michal Yerushalmy
Table of Contents
1. What is Mathematics About? (the way we see the structure of the subject)
The Objects of Mathematics
The Actions of Mathematics
What Can/Should be Expected of Students at the Elementary Level
What Can/Should be Expected of Students at the Secondary Level
2. What are the Purposes of Assessment?
3. How should Assessment in Mathematics be Done?
4. What is Balanced Assessment in Mathematics About?
5. What is the Harvard Group of Balanced Assessment About?
Why this Report?
Task Design
New Task Types
The “-Ness” tasks
Fermi tasks
Example generation
Weighting of Tasks
Writing Rubrics for Tasks
Scoring Student Performance
Balancing Assessment Packets
Appendix A: Mathematical Content Matrix for the Elementary Grades
Appendix B: An Analysis of the “Square-Ness” Task
Appendix C: Selected Assessment Bibliography
Appendix D: Balanced Assessment Packets
A Balanced Assessment Packet for the Elementary Grades
Balanced Assessment Packets for the Secondary Grades
The Technology Resource Packet
1. What is Mathematics About?
As in many subjects, it is possible to identify both content and process dimensions in mathematics. Unlike many subjects, where most of the process dimension refers to general reasoning, problem-formulating and problem-solving skills, the process dimension in mathematics refers to many skills that are mathematics-specific. As a result, many people tend to lump content and process together when speaking about mathematics, calling it all mathematics content.
We believe it is important to maintain the distinction between content and process. In part we say this because we believe that this distinction reflects something very deep about the way humans approach mental activity of all sorts. All human languages have grammatical structures that distinguish between noun phrases and verb phrases. They use these structures to express the distinction between objects and the actions carried out by or on these objects.
We believe that the content-process distinction in mathematics is best described by the words object and action. What are the mathematical objects we wish to deal with? What are the mathematical actions that we carry out with these objects? We will try to answer these questions in a way that makes clear the continuity of the subject from the earliest grades through post-secondary mathematics. Seen in the proper light there are really very few kinds of mathematical objects and actions.
The Objects of Mathematics
The first set of mathematical objects we need to consider is number and quantity. Indeed, elementary mathematics is largely about these objects and the actions we carry out with and on them.
integers (positive and negative whole numbers and zero)
rationals (fractions, decimals and all the integers)
measures (length, area, volume, time, weight)
reals (π, e, etc. and all the rationals)
complex numbers
vectors and matrices
Along with number and quantity we introduce very early a concern for another kind of mathematical object, namely shape and space.
topological spaces (concepts of connectedness and enclosure)
metric spaces (with such shapes as lines/segments, polygons, circles, conic sections, etc.)
From the beginning we try to make students aware of pattern in the worlds of number and shape. Pattern as a mathematical object matures into function which is the central mathematical object of the subjects we call algebra and calculus.
functions on real numbers (linear, quadratic, power, rational, periodic, transcendental)
functions on shapes
There are several other kinds of mathematical objects that have less prominent roles in the mathematics we expect our youngsters to study. These include Chance and Data, and Arrangement:
relative frequency and probability
discrete and continuous data
Some aspects of data collection, organization and presentation can be done in the earliest grades but little, if any, data analysis. Notions of probability are not realistically addressable until late middle school.
permutations, combinations, graphs, networks, trees, counting schemes
At the youngest grades, these topics tend to blend with the study of patterns of numbers and shapes.
The following table describes the kinds of mathematical objects in more detail, along with their properties, operations that can be performed on them, and their pragmatic uses.
| mathematical objects | properties of objects | operations on objects | semantics of pragmatic use |
| --- | --- | --- | --- |
| Number and Quantity: integers, rationals, reals; measures (length, area, volume, time, weight) | order, between-ness, part-whole relationships, units, dimensions | arithmetic operations: addition, subtraction, multiplication, division, exponentiation | counting or measuring anything in the world around us |
| Shape and Space: topological; metric (lines/segments, polygons, circles, conic sections); other (e.g. spherical geometry) | connectedness, enclosure; distance, location, symmetry, similarity | scaling, projection, translation, rotation, reflection, inversion, conformal mapping, homotopies/deformations, covering, packing and tessellating | designing and building objects; mapping and traveling |
| Pattern and Function: linear, quadratic, power, rational, periodic, transcendental | domain/range, continuity, boundedness, rate of change, curvature, etc., maxima and minima, rate of accumulation; linear: root, slope/intercept; quadratic: roots, axis of symmetry; power: roots, asymptotic behavior; rational: roots, singularities, asymptotic behavior; periodic: frequency, phase; transcendental: “growth” constant | arithmetic operations (functions on R^n); comparison: equations, inequalities, identities; composition, translation, reflection, dilation/contraction | expressing how something depends on one or more other things; resolving constraints (solving equations and inequalities) |
| Chance and Data: discrete, continuous | determinism, randomness, relative frequency, distribution, moments | sampling (by counts and/or measures), composing, representing | dealing with uncertainty; dealing with lack of precision |
| Arrangement: permutations, combinations, graphs/networks, trees | adjacency, enumeration, vertices and edges of graphs/networks | | organizing discrete information |
The Actions of Mathematics
As previously mentioned, the process dimension of mathematics has many actions that are mathematics-specific. It also involves actions that are properly regarded as general problem-formulating, problem-solving and reasoning skills. We divide these skills into four categories.
Modeling/Formulating
Transforming/Manipulating
Inferring/Drawing Conclusions
Communicating
With the exception of communication, each of these actions has aspects that are specific to mathematics and aspects that are not specific to mathematics but that are quite general in nature. We list below some of these aspects.
observation and evidence gathering
necessary and/but not sufficient conditions
analogy and contrast
deciding, with awareness, what is important and what can be ignored
deciding, with awareness, what can be mathematized and then doing so
formally expressing dependencies, relationships and constraints
understanding “the rules of the game”
understanding the nature of equivalence and identity
arithmetic computation
symbolic manipulation in algebra and calculus
formal proofs in geometry
shifting point of view
testing conjectures
exploitation of limiting cases
exploitation of symmetry and invariance
exploitation of “between-ness”
making a clear argument orally and in writing (using both prose and images)
It is evident that there is no reasonable way to separate, nor should there be any interest in separating, the domain-specific and the domain-general aspects of the process dimension of mathematics. We therefore come to the conclusion that it is better to parse the domain of mathematics as
object (Number and Quantity, Shape and Space, Pattern and Function, Chance and Data, Arrangement)
×
action (including both domain-specific and domain-general actions)
rather than by
content (usually defined by “topics” — an undifferentiated mixture of objects and domain-specific actions)
and
process (i.e. domain-general actions)
which is the usual procedure in mathematics education.
What Can/Should be Expected of Students at the Elementary Level
We view elementary mathematics (at about the 4th grade level) as being concerned with the following mathematical objects:
number and quantity
shape and space
pattern
data
For number and quantity, we expect youngsters to be able to demonstrate
· a robust understanding of the conceptual meaning of addition and subtraction of whole numbers and integers
Sarah has 3 apples and Joe gave her 2 more. How many apples does Sarah have now?
Sarah has 3 apples and Joe has 2 more apples than Sarah. How many apples do they have altogether?
· a growing understanding of the various meanings of both multiplication and division of whole numbers and integers
At a party 20 bags of candy were given out. Each bag contained 5 candies. How many candies were given out altogether?
Thelma has 5 skirts and 3 blouses. How many different outfits can Thelma put together?
· a reasonable degree of computational facility with the four arithmetic operations on whole numbers and integers
· an ability to make reasonable approximations for the results of arithmetic computations (this expectation is not currently realizable in most US fourth grade classrooms)
To the nearest hundred, what is 38 times 42?
To the nearest hundred, what is the sum of 716 and 879?
· a growing understanding of the order properties of decimals and other rational fractions (this expectation is not currently realizable in most US fourth grade classrooms)
Write a fraction that is larger than 1/3 and smaller than 1/2.
Write a decimal that is larger than 0.083 and smaller than 0.15.
· an ability to identify and measure continuous quantity such as length, area, weight and time
· an ability to make reasonable estimates of lengths, areas, weights and time in one’s environment
How much does a gallon of milk weigh?
How much time does it take you to say your name?
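The arithmetic behind several of the sample tasks above can be checked with a short Python sketch. It is an illustration only and not part of the project's materials: the helper name `round_to_nearest_hundred` is our own, the second estimation task is read here as asking for a sum, and the midpoint is just one of many acceptable answers to the fraction task.

```python
from fractions import Fraction
from itertools import product


def round_to_nearest_hundred(value: int) -> int:
    """Round a non-negative integer to the nearest hundred (halves round up)."""
    return ((value + 50) // 100) * 100


# Multiplication as repeated groups: 20 bags with 5 candies each.
candies = 20 * 5

# Multiplication as a count of pairings: 5 skirts and 3 blouses.
outfits = list(product(range(5), range(3)))

# Estimation: compute exactly, then round to the nearest hundred.
product_estimate = round_to_nearest_hundred(38 * 42)  # 38 * 42 = 1596
sum_estimate = round_to_nearest_hundred(716 + 879)    # 716 + 879 = 1595

# One fraction strictly between 1/3 and 1/2: their midpoint.
between = (Fraction(1, 3) + Fraction(1, 2)) / 2

print(candies, len(outfits), product_estimate, sum_estimate, between)
```

The two multiplication tasks use the same operation with different meanings (equal groups versus pairings), which is the distinction the expectation is pointing at.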
For shape and space, we expect youngsters to be able to demonstrate
· an ability to distinguish and name a variety of two- and three-dimensional shapes
Draw three different kinds of closed figures that have four straight lines.
· an understanding of the symmetries of these shapes
Find all the lines along which you can fold a paper hexagon so that the two parts lie exactly on top of each other.
· an ability to read and interpret simple maps
Which two rooms in your school are furthest apart? Figure out three different routes that go from one to the other and tell how you would decide which is the shortest route.
For pattern, we expect youngsters to be able to demonstrate
· an ability to recognize and generate numerical patterns
What numerical pattern could continue the sequence 1, 4, 7, 10, 13, ....?
What numerical pattern could continue the sequence 1, 2, 4, 8, ....?
· an ability to recognize and generate spatial patterns
Can you tile a floor with tiles like this so that the pattern is “regular”?
· an ability to enumerate and organize simple combinations and permutations
How many different ways can you seat four people at a square table so that there is one person on each side of the table?
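As a hedged illustration of the pattern and enumeration tasks above, the following Python sketch continues each sequence under one possible rule (adding 3, doubling) and lists the seatings at the square table. The function names are our own, and other continuation rules are equally legitimate answers to the tasks as posed. The seating count depends on whether turning the whole table counts as a new arrangement, so both counts are computed.

```python
from itertools import permutations


def continue_arithmetic(start: int, step: int, count: int) -> list[int]:
    """First `count` terms of the sequence that adds `step` each time."""
    return [start + step * i for i in range(count)]


def continue_geometric(start: int, ratio: int, count: int) -> list[int]:
    """First `count` terms of the sequence that multiplies by `ratio` each time."""
    return [start * ratio**i for i in range(count)]


print(continue_arithmetic(1, 3, 7))  # extends 1, 4, 7, 10, 13 by adding 3
print(continue_geometric(1, 2, 6))   # extends 1, 2, 4, 8 by doubling

# Seat four people, one per side of a square table.
people = ("A", "B", "C", "D")
seatings = list(permutations(people))

# If turning the whole table does not count as a new seating,
# keep one representative per rotation by fixing person "A" on the first side.
distinct_up_to_rotation = [s for s in seatings if s[0] == "A"]

print(len(seatings), len(distinct_up_to_rotation))
```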
For data, we expect youngsters to be able to demonstrate
· an ability to collect, organize and display simple data sets
Make a presentation of all the kinds of pets owned by the students in your class. Include their weights, ages, and length from nose to tip of tail (where appropriate).
In Appendix A, the interested reader can see how these expectations of elementary school mathematical competence relate to the earlier discussion of the structure of the subject of mathematics as a whole.
The traditional mathematics curriculum at the elementary levels concentrates on the acquisition of computational skills, specifically getting students to master with some degree of automaticity the algorithms for adding, subtracting, multiplying and dividing whole numbers, fractions, and decimals. We believe it is time to think carefully about that enterprise.
We live in an age when a simple four-function calculator can be bought for less than the cost of a weekly newsmagazine. With the exception of the elementary grades of the schools of our country, almost all the calculation done in the country is done electronically. Thus the schools, in preparing students to calculate “by hand” are not preparing our students for the world they will encounter.
The counter-argument is often made that students need to understand the conceptual underpinnings of the computations that are done in the world around them. Indeed they do! We claim that such conceptual understanding does not flow from mindless repetition of un-understood mathematical ceremonies, but rather from a direct addressing of the conceptual issues involved in computation with whole numbers, fractions and decimals. Thus, at the youngest levels, the reader will find that we have stressed the importance of the order properties of numbers and estimation much more than is normally done in the traditional curriculum. At more advanced levels there are other interesting and subtle conceptual issues about numbers, the differences between the way in which they have been traditionally treated, and the way in which they are treated electronically.
Repetitive computational exercises are often performed without understanding. For example, how many educated adults understand why the procedures for long division or for multiplication and division of fractions work? Filling school and homework time with tiresome computational drill
· does not prepare students for the kinds of applications of mathematics that they are likely to encounter
· deadens the students’ interest and curiosity about mathematics
· uses up time that might better be spent in helping students develop a conceptual understanding of, and appreciation for, the subject of mathematics
Accordingly, we would be well advised to reconsider what we think is important mathematics in the elementary grades.
What Can/Should be Expected of Students at the Secondary Level
By the end of secondary school we ought to have a much more reasonable set of expectations of the mathematical capabilities of our students than we now do. In particular, there is a set of expectations we ought to have of students going directly into the world of work as well as of students going on to further education in subjects that are not mathematically demanding. We expect any school-leaving young adult, no matter what their formal mathematical training at secondary level, to be able to meet these expectations.
In our analyses of this question we have relied heavily on the work of some of our secondary school teacher colleagues who regularly bring “blue-collar” people from their community into their algebra classes to talk with students about the mathematics they use in their work.
It is important to point out that our expectations of mathematical competence for this group of young people are not what is normally referred to as “basic skills,” a pastiche of rote-memorized computational procedures and formulae, but rather a much more conceptual set of understandings of how to use the fundamental mathematical content they have learned in the contexts they are likely to encounter.
We have another set of expectations for students going on to further education in the natural sciences and engineering, the social sciences, and business and economics. These expectations are not very different either rhetorically or in content from those put forward by the National Council of Teachers of Mathematics in their Curriculum and Evaluation Standards for School Mathematics. They do differ, however, in what we regard as an important way, i.e., their organization of the mathematics by object × action.
In this section we outline these two sets of expectations: a school-leaving standard, expected of all students, and an advanced standard, expected of some students.
By the end of Grade 12 we expect students to be comfortable with numerical computations, at least conceptually, and to be able to measure lengths, areas, volumes, weights and time. Minimally, we expect all students to be able to reason qualitatively about the order properties of integers, fractions and decimals, as well as to estimate length, weight, area, volume, time and number of things in their surroundings. We also expect students to be able to perform approximate numerical computations readily.
We expect some students to be able to undertake number-theoretic tasks, as well as intricate estimates of number, length, area, volume, weight and time that they encounter in their surroundings.
We expect all students to be able to scale lengths and read and interpret visual representations such as blueprints, maps, floor plans and clothing patterns.
We expect some students to be able to scale areas and volumes as well as to be familiar with geometrical concepts and constructs, and to be able to formulate, manipulate, and interpret geometrical models of situations in the world around them.
We expect all students to be able to evaluate algebraic expressions, to solve simple equations and inequalities, and to be able to read and interpret graphs. We also expect students to be able to model simple dependencies qualitatively and to sketch qualitative graphs of those dependencies.
We expect some students to be able to formulate and manipulate quantitative algebraic models of the world around them using symbolic, numerical and graphical representations, and to be able to reason, at least qualitatively, about both rates of change and accumulations of functions.
We expect all students to understand the consequences of the law of large numbers, elementary statistical analysis, and the use of statistical evidence in the communications media.
We expect some students to be comfortable with the concepts of statistical independence, conditional probability, and exploratory data analysis.
We expect all students to be able to generate and enumerate simple permutations and combinations.
We expect some students to be comfortable with iterative and recursive algorithms, discrete modeling, and optimization.
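One consequence of the law of large numbers mentioned above is that the relative frequency of an outcome settles near its probability as the number of trials grows. The Python sketch below illustrates this with a simulated fair coin. It is an assumption-laden illustration rather than a project task: the function name, the seed and the toss counts are all our own choices.

```python
import random


def relative_frequency_of_heads(tosses: int, rng: random.Random) -> float:
    """Toss a simulated fair coin `tosses` times; return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


rng = random.Random(0)  # fixed seed so the run is repeatable
for tosses in (10, 100, 10_000, 100_000):
    freq = relative_frequency_of_heads(tosses, rng)
    print(f"{tosses:>7} tosses: relative frequency of heads = {freq:.4f}")
```

Short runs can stray far from 0.5, while long runs typically stay close to it, which is the qualitative understanding expected of all students.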
2. What are the Purposes of Assessment?
As a society and as educators, we assess both performance and competence in education in a variety of ways and for a variety of purposes. Broadly speaking the purposes are
serving instruction
accountability
selection
licensure
Assessing student performance in order to inform instruction is something that all teachers do. It is often the case that an external agency of some sort gets involved in assessment, nominally to serve instruction. The time lapse between the administration of the tests and the reporting of “scores” to teachers who might be able to use the information is such that there is little reason to assume that any such testing by an external agency has much to contribute to assessment for instruction.
Assessing for the purpose of saying how well a student, or a class, or a school, or an instructional program is doing is the primary purpose of assessment for accountability. Traditionally such information has been presented in one of two quite different forms, norm-referenced and criterion-referenced. Norm-referenced accountability statements involve comparing students’ performance (or classes or schools) to one another and then presenting the results of those comparisons in rank order. It should be noted that this can only be done if the performance of the students can be encoded in a unidimensional measure.
Criterion-referenced accountability statements involve comparing students’ performance (or classes or schools) to some predetermined set of performance criteria without regard to how they compare to one another. It should be noted that this can only be done if one has a clearly defined set of performance criteria that reflect one’s theory of competence in the domain being assessed.
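The difference between the two forms of reporting can be made concrete with a small Python sketch. The student names, the scores and the cut score of 70 are hypothetical values invented purely for illustration, and a single pass/fail threshold is a deliberately simplified stand-in for a real set of performance criteria.

```python
# Hypothetical scores, invented purely for illustration.
scores = {"Ana": 72, "Ben": 85, "Cal": 64, "Dee": 91}

# Norm-referenced: students are compared with one another and ranked.
# This needs a single number per student to sort on.
ranking = sorted(scores, key=scores.get, reverse=True)

# Criterion-referenced: each student is compared with a fixed criterion,
# regardless of how the others performed. The cut score is an assumption.
CRITERION = 70
meets_criterion = {name: score >= CRITERION for name, score in scores.items()}

print(ranking)
print(meets_criterion)
```

Adding a stronger student to the group would change everyone's rank but would leave each existing student's criterion result untouched.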
Assessing for selection is normally done for the purpose of helping to ascertain whether a student will have access to limited resources. Such assessment is often employed in order to inform decisions about access to select universities, programs for gifted music students, special education programs, etc.
Assessing for the purposes of licensure is normally done in order to ascertain whether the people being assessed have exceeded some threshold of minimal competence and are thus permitted to practice in an unsupervised fashion the skill that they have demonstrated. Such skills include driving automobiles, swimming in the deep part of the pool, barbering, butchering, working as an electrician or a plumber, etc.
Although it has never clearly articulated its stance with respect to these purposes, the Balanced Assessment project has focused its attention primarily on assessment to serve instruction and assessment for accountability, largely through the mechanism of assessing the performance of students on collections of tasks that the BA sites devised or adapted.
3. How should Assessment in Mathematics be Done?
In 1992 the National Council of Teachers of Mathematics undertook the development of a report on assessment to complement its earlier Curriculum and Evaluation Standards for School Mathematics. At the end of that report is a table summarizing the shifts in assessment practice that the NCTM is calling for. We cite that table here.
Major Shifts in Assessment Practice
| toward | away from |
| --- | --- |
| assessing students’ full mathematical power | assessing only students’ knowledge of specific facts and isolated skills |
| comparing students’ performance with established criteria | comparing students’ performance with that of other students |
| giving support to teachers and credence to their informed judgment | designing “teacher-proof” assessment systems |
| making the assessment process public, participatory and dynamic | making the assessment process secret, exclusive and fixed |
| providing students multiple opportunities to demonstrate their full mathematical power | restricting students to a single way for demonstrating mathematical knowledge |
| developing a shared vision of what to assess and how to do it | developing assessment by oneself |
| using assessment results to ensure that all students have the opportunity to achieve their potential | using assessment to filter and select students out of the opportunities to learn mathematics |
| aligning assessment with curriculum and instruction | treating assessment as independent of curriculum or instruction |
| basing inferences on multiple sources of evidence | basing inferences on restricted or single sources of evidence |
| viewing students as active participants in the assessment process | viewing students as the objects of assessment |
| regarding assessment as continual and recursive | regarding assessment as sporadic and conclusive |
| holding all concerned with mathematics learning accountable for assessment results | holding only a few accountable for assessment results |
This summary of the past and desired future of assessment in mathematics is as clear a set of guidelines as one could ask for in designing mathematics assessment. However, there is little, if anything, in this summary that could not have been written, with appropriate changes of adjective, by a task force of the National Council of Teachers of English. The central problem of changing the nature of assessment in mathematics must be faced in the design of actual mathematics assessments that reflect these guidelines. Roughly speaking, that is what the Balanced Assessment in Mathematics Project is about.
4. What is Balanced Assessment in Mathematics About?
Balanced Assessment in Mathematics (BA) is a National Science Foundation project charged with developing new approaches to the assessment of mathematical competence in the elementary and secondary grades. The project is being carried out at four sites: the University of California at Berkeley, Michigan State University, the Shell Mathematics Centre of the University of Nottingham and the Educational Technology Center of the Harvard University Graduate School of Education. Support for the Berkeley and Nottingham sites of the project began in July of 1992; support for the Michigan State and Harvard sites began in the fall of 1993.
The main goal of Balanced Assessment is to produce assessments that can be used in classrooms throughout the nation, assessments that reflect the values of the mathematics reform movement as articulated in the National Council of Teachers of Mathematics Curriculum and Evaluation Standards. The assessments created by Balanced Assessment are designed to provide students, teachers, schools and parents with useful information about how students and programs are doing with respect to those standards.
By the end of 1995, BA will have completed the piloting of packages of assessment at each of three levels, elementary, middle school, and high school. The packages have assessment tasks and suggestions for longer projects to be done throughout the school year. Teachers and students use a package to create a set of selected works for each student which can be scored and used to document that student’s mathematical achievement, as well as to provide a balanced picture of that student as a learner of mathematics.
The types of assessment that BA is creating contrast sharply with traditional forms of testing, which rely primarily on multiple-choice questions. On standardized tests students are expected to answer each item in a minute or two. Such tests make no claim to assess a student’s problem-solving abilities, nor do these tests provide information about how a student reasons, communicates mathematically or makes connections across mathematical content.
BA’s focus is on rich, mathematically complex work that requires students to create a plan, make a decision or solve a problem — and then justify their thinking. The contents of each assessment package range from short tasks to extended investigations and projects involving a week or more of work, and which include evidence of student collaboration, reflection and growth.
Further the project believes that assessment that is worthwhile to teachers, students, and others with a valid interest in what students can do mathematically, must also have the following characteristics:
· Assessment focuses on important, grade-level appropriate mathematics. Since assessment can only sample from all that is learned, it must sample as effectively as possible — by concentrating on the most important and useful mathematics taught and learned at that grade level, as defined by the NCTM Standards.
· Assessments are worthwhile learning activities — not digressions from learning. For the student, assessment is a tool that helps further the understanding of important mathematical ideas. For the teacher, assessment is student work that informs and augments instruction. Worthwhile assessment is not something students and teachers “stop and do,” but a way to further what they are already doing.
· The assessment maintains a focus on accessibility and equity for all students. The student must have — and the teacher and student must perceive that the student has — a fair opportunity to do his or her best. Assessments are designed to provide a student of either gender and of any cultural, linguistic and socio-economic background with the means to do his or her strongest mathematical work.
· Assessment elicits scorable, informative student work. The assessments are designed to elicit more than just an answer from the student. Rather, students are asked to solve a problem, show their thinking, create a product. The information in the student’s response, and the features of the student’s work that are evaluated, give a picture of his or her understanding of mathematical concepts, strategies, tools and procedures.
Clearly, the stated intent of the Balanced Assessment in Mathematics Project is entirely consonant with the directions in which the NCTM would like to see assessment in mathematics move. More to the point, the products of its efforts are also consonant with these directions and, in our view, represent a major step forward.
While there are no differences between the members of the Harvard Group and the BA project as a whole with respect to overall strategic goals, there are several areas in which the work of the Harvard Group of BA differs from that of the other project sites. These include:
task design and new task types
“weighting” of tasks
writing rubrics for tasks
scoring student performance
balancing assessment packets
The Harvard Group went about the process of task design in a way that differed from the other groups and was largely informed by the object × action analysis of the domain that it had made at the outset. This led to a particular kind of analysis of task demands, and a particular strategy for writing scoring rubrics for tasks. It also led to an explicit procedure for the balancing of tasks.
In addition to these procedural differences, there are two philosophical differences:
1. We believe that human performance in any cognitive domain of interest, including mathematics, is too complex to be reduced to unidimensional measures. Scoring performance of students should reflect this complexity. In keeping with this position, we do not accept the idea of giving a student a single score on a task.
In addition to the trivialization of complexity that accompanies unidimensional measures, the use of such measures opens the door to a great deal of social mischief by making it easier to compare students to one another rather than to established criteria as called for by the NCTM and others.
2. Also in keeping with the NCTM Assessment Standards, we believe in making assessment public rather than secret. Well-crafted tasks aimed toward assessing well-defined skills and understandings need not be kept secret, either before or after they are administered. Indeed, if one wishes, as the draft NCTM Assessment Standards call for, to have assessment “...aligned with curriculum and instruction” and to be understood as “...continual and recursive” and for the community of mathematics educators and the public to have a “...shared vision of what to assess and how to do it,” then keeping the tasks secret is counterproductive.
We turn now to a detailed description of some of the ways the Harvard Group of the Balanced Assessment Project has gone about doing its work for the past two years.
For most of the period of the grant the primary responsibility of each of the sites of the BA project was to design assessment tasks, to try them in both classroom and clinical settings, and to revise them in the light of student reaction to those trials. The BA project undertook to design three quite different sorts of tasks. They are
skills: tasks that primarily test the ability to manipulate and compute
problems: tasks that primarily test the ability to model, infer, and generalize
projects: tasks that test the ability to analyze, organize, and manage complexity
The Harvard Group of BA approached the problem of Task Design within the framework of the Mathematical Content Matrix presented earlier in this document. First we decided what mathematical objects and actions we expect students to have mastered; this gave us a reasonably focused view of the mathematical playing fields within which we needed to design tasks.
If one needs to generate a large number of assessment tasks, it is clear that thinking in terms of task types, rather than in terms of individual tasks, is a useful strategy. Wherever possible we strove to make clusters of tasks that were linked by context, or mathematical structure, or both. By way of illustrating this strategy as well as demonstrating what we mean by each of the three categories of task, here is an example of each.
Here is a diagram of a new kind of race track. What is the total length of the track?

The combined length of all the curved sections is 100 meters.
The length of each straight section is 100 meters.
Two joggers set out at the same time and from the same place and in the same direction to jog on a circular track. Jogger A jogs at a constant speed which is exactly twice the speed of jogger B. They jog for the same period of time and stop after A has completed 6 laps around the track. (You may ignore the time it takes for the joggers to get up to speed at the outset and to slow down at the end.)
An observer at the geometric center of the track monitors the angle between the two joggers as a function of time. Sketch a graph of this observer’s data.
How would the graph of the observer’s data differ if the two runners had started off in opposite directions at the outset?
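One way to explore the joggers problem numerically is sketched below. It is an illustration under stated assumptions, not the project's scoring key: the function names `observed_angle` and `count_peaks` are hypothetical, time is expressed as a fraction of the total jogging time, and the observed angle is taken to be the smaller of the two angles between the joggers (0 to 180 degrees).

```python
from fractions import Fraction

A_LAPS = 6  # jogger A completes 6 laps (from the task statement)
B_LAPS = 3  # B runs at half A's speed, so B completes 3 laps in the same time


def observed_angle(s, opposite=False):
    """Angle in degrees (0 to 180) between the joggers, seen from the center.

    s is the fraction of the total jogging time that has elapsed (0 to 1).
    Assumption: the observer records the smaller of the two possible angles.
    """
    a = A_LAPS * s
    b = -B_LAPS * s if opposite else B_LAPS * s
    gap = (a - b) % 1  # separation as a fraction of one lap
    return 360 * min(gap, 1 - gap)


def count_peaks(opposite=False, steps=180):
    """Count sample times at which the joggers are diametrically opposite."""
    times = (Fraction(k, steps) for k in range(steps + 1))
    return sum(1 for s in times if observed_angle(s, opposite) == 180)
```

Under these assumptions the graph is a sawtooth that rises from 0 to 180 degrees and falls back again; `count_peaks()` and `count_peaks(opposite=True)` report how many times that happens in each scenario.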
Track of Dreams
The Situation
In the last twenty years, science has improved the conditions of competition in many sports. Some things, however, have not changed in ages. The dimensions of playing fields and courts in team sports, tennis and some other court sports have remained the same, largely to preserve the tradition of the sport. However, in track and field these unchanged dimensions are primarily due to the fact that the track and other venues are often associated with football or soccer fields and, therefore, must rely on their dimensions. If an average football field is about 140 by 65 yards, it allows for a track oval around it with the length of the inside lane of about 400 meters. A soccer field, which is shorter but wider, produces roughly the same length of the track.
For international competition, 400 meters is the standard length of the inside lane on the track. In addition, the following conditions are required for international competition:
· The width of each lane must accommodate the runners in such a way that two runners running side by side in adjacent lanes would not interfere with each other.
· The track must have 8 lanes.
· The 100 meter race must be run along a straight path.
· The finish line must be at the end of a straight part of the track, continuous across all lanes and perpendicular to the track.
· The starting line must also be perpendicular to the track, but need not be in the same place for all lanes (this is a “staggered” start necessitated by the different lengths of the lanes).
There are also restrictions about the type and quality of the surface and some other conditions necessary for accreditation, but these are not important here.
The Problem
You have been hired by the Santa Monica Track Club (the most prestigious in the country) to design a new track. Your job is to analyze a number of possible shapes and present the club with three alternative shapes for the new track and to also present arguments — pro and con — for each of these. The track is to be built solely for the track-and-field purposes so there will be no external constraints with respect to the dimensions or directions of the track.
Some of the international rules are a matter of tradition and could be relaxed for a new scientifically designed track. However, all the conditions listed above (except for the length of the inside lane) must be met. Furthermore, there are other physical and esthetic constraints that further limit the design possibilities:
· The track must accommodate races for 100, 200, 400, 800, 1500, 3000, 5000 and 10000 meters. Some other races, such as 1 mile or 50 meters, may be run there as well, but are not a priority, so they should not figure in the computations.
· The length of the shortest lane of the new track must be some multiple of 100 meters in length.
· For construction reasons, all parts of the track must be designed as parts of circles or straight lines; at the transition points from one part to another, these circles and lines must be tangent to each other.
· The lanes cannot separate at any point, that is, crossing the track at any point in the direction perpendicular to it must cut across all eight lanes in succession with no space between them.
· The faster you run the harder it is to turn, so races from 50 to 400 meters must be run in such a way that no one makes sharp turns; for longer races the turns could be made tighter; overall, no part of the course for a specific race can contain a part of a circle with a diameter (measured in meters) smaller than [formula not preserved in this copy], where 9,000 is a constant measured in square meters which was derived by observation. This constraint is designed to prevent runners from falling on turns or one runner having an advantage over another.
· An attempt must be made to minimize the overall dimensions of the track, for two reasons: Costs should be considered, although a superior track might be chosen despite its higher price. Spectators should be seated so as to allow them to see as much of the race as possible; for this reason a straight track would be out of the question.
1. Several designs have already been submitted by different parties. As the official design consultant you must sift through these and explain why some of them must be rejected. Even though you reject some or all of these, they may give you ideas about possible designs. Write a letter to your assistant, explaining which of these proposals are rejected and why, and point out some possible modifications which could make similar designs acceptable. (In all instances the distances are approximate and measured along the track between the marked points.)



The combined length of all the curved pieces is 100 meters.
2. Suppose now that you are the assistant who received the above letter. You must respond to the letter with proposed modifications of these designs. Note that in proposal E the combined length of all the curved parts could not only be 100 meters, but also 200, 300 meters or some other multiple of 100 meters. In your response, you will need to analyze several possible variations, including changing the length in the case of design E. Write such a letter with detailed mathematical analysis.
3. Most of the tracks under consideration have the unfortunate property that they require a staggered start. This happens because the lanes are not all of equal length and the athletes on the inside lanes are required to make a sharper turn than the athletes on the outside. Can you generate some possible designs that would satisfy all the conditions and would not require a staggered start for at least some of the races?
4. Having returned to your supervisory capacity, now is the time to find other possible shapes and write the final report on the proposal. If you believe that some of the conditions should be relaxed in favor of a specific design, you will need to convince a committee composed of athletes, administrators, architects and mathematicians. Therefore, your arguments must be clear, succinct and precise.
This third category of task, projects, deserves special comment because of the growing interest in the use of portfolio assessment. We found that many users of portfolio assessment took the position that the essential feature of such assessment was the fact that the student chose, with or without guidance from the teacher, what pieces of work to include in his or her portfolio. While this in itself is desirable, it can lead to portfolio content that is minimally demanding and hardly able to exhibit the student’s strengths and weaknesses. The situation may be compared to that of a student who applies for admission to a music conservatory and submits a tape of playing scales.
Harvard’s Balanced Assessment team thinks of projects as intellectual undertakings that require students to make an effort over an extended period of time to structure and formulate a problem, and then to analyze the problem as they have formulated it. Projects are not problems with unique, correct solutions. Projects are not long and complicated versions of problems that one normally assigns in the context of a classroom assignment. Projects are not problems that require tricky insights or inventions to solve. The essence of a Balanced Assessment project is that it is a task that requires a student to ruminate and reflect about a rich web of complexity, and to sort out some main threads that can serve as the basis for structuring a response.
In the course of addressing the problem that forms the core of the project as they have formulated it, students will have to perform a wide range of traditionally taught mathematical actions that might include manipulating algebraic symbols, plotting graphs, geometric constructions, compiling tables, and performing numerical computations. The accurate performance of these actions, as important as they are, is only a part of doing a Balanced Assessment project. Students are also asked to make inferences, draw conclusions, and present their work in both written and oral form.
In this section we describe some considerations that teachers should keep in mind as their students work on projects and as they, the teachers, assess the products of their students’ efforts.
Mathematics has traditionally been a subject in which we have insisted that students work alone. Whatever the merits of that viewpoint might be with respect to covering the content of the syllabus, it seems to us that project work is different. The essential issue in project work is development of desirable “habits of mind” about organizing and analyzing complexity. Adults, when confronted with tasks of this sort, often address them in groups. The reason for doing so is the intellectual resonance and symbiosis that leads groups of people to fashion far better solutions to complex problems when they work cooperatively than when they work in isolation. We suggest, therefore, that students undertake project work in small groups. You may want to give some thought to how groups ought to be composed and whether or not to juggle the composition of the small groups at various times during the school year.
This will vary with individual teachers. Some teachers will try to build a whole year’s work around projects. We find that it is often difficult to do this — there is always the nagging feeling that the curriculum is not being “covered.” On the other hand we feel that the kind of intellectual development that coping with a project offers is of sufficient importance that students ought to spend no less than 10% of their time, and probably as much as 25% of their time on such work.
One could imagine students spending one quarter of their time every week throughout the school year on their continuing project work. Alternatively, one can imagine intensive two week project periods distributed throughout the year. During these periods students would use all of their mathematics time for project work. Other time arrangements are also possible. Ultimately, it will be individual teachers who make this decision in light of their understanding of what best fits the needs of their classes.
We think that project work should be presented both in writing and orally. The written presentation should describe the contextual setting and how, within that setting, the problem is defined. The written presentation should present explicit arguments for why the project omits consideration of some factors and includes others. It should show clearly how the solution was approached. It should indicate clearly where there are further issues to investigate.
The oral presentation should be made publicly to the entire class after the teacher and at least some of the students have read the written presentation. Following a brief outlining of the written document, the presenting students should entertain questions and comments.
Here is one possible specific way you might organize the presenting of project work. Have each group submit a written report of its work. In addition, ask each group to read the written reports of two other groups. On the day of the oral presentations, have each group present its work — and then serve as a panel to answer questions put to them by the other groups that have read their work, as well as by the teacher and other students.
All Balanced Assessment projects ask the student to prepare a document with an intended purpose for an intended audience. Consequently, scoring of projects comes down to an analysis of two questions:
Is the document suitable for the specified audience?
Does the document fulfill the requested purpose?
To aid in evaluating student project work in the light of these two criteria, we suggest the following seven perspectives; each perspective may allow you to reach conclusions about some aspect of the student’s effort.
Organizing the Subject: How well does the student structure a large collection of interrelated issues and identify possible problematic areas? Are the constraints described in the problem made clear at the outset? How well does the student argue the relative importance of factors that are taken into consideration in addressing the problem, and the relative unimportance of factors that are ignored? Does the report proceed in a logical manner?
Analyzing the Problem: How well does the student define the problem? Is the student clear about all the resources, both tools and information, that will be needed to address the problem? Does the student draw appropriate implications about the given data?
Accuracy and Appropriateness of Computation/Manipulation: Are the symbolic manipulations carried out accurately? Are the graphs plotted correctly? Are the axes labeled sensibly? Is the scale reasonable? Are the geometric constructions “constructable”? Are the table columns properly labeled? Are the computations that underlie computed columns clearly defined? Are the numerical computations done accurately? Are the graphs, charts, tables used appropriately? Are they pertinent to and do they strengthen the argument?
Thoroughness of Inquiry: Is the work perfunctory or thorough? Is the reader left to fill in many missing steps? Are all the implications of the complexity of the problem followed up and examined? Is the reporting level of detail adequate?
Clarity of Communication: Can the student’s work be read by a colleague who is previously unacquainted with the work? Is the student’s written presentation intelligible to another teacher? to the principal? to a group of parents?
Drawing Conclusions: Do students draw reasonable conclusions from their work? Do they clearly explore the ways in which their work answers the questions they have posed?
Extending the Inquiry: Has the student identified interesting aspects of the problem that lend themselves to further exploration? Has the student noted related problems that might be approached in similar ways?
Finally, it should be said that we are mindful that teachers’ constraints and opportunities vary from place to place. Not everything we suggest will be desirable, or even possible, at every location. However, we are certain that challenging the students with a significant amount of project work will be rewarding and engaging to both teacher and student.
One of the strategies the Harvard Group of BA employed in order to design tasks that are fresh and engaging to students is to try to find new task types that students may not have encountered before. There are several such types that we found and/or developed. For the most part these are tasks that do not have unique correct answers. They are tasks that promote discussion, and occasionally debate, among students. We believe these to be important features of successful tasks.
The purpose of these tasks is to see how well students can mathematize a relationship that they are aware of perceptually but probably have never attempted to describe in any formal, not to mention quantitative, fashion.
Each of these tasks requires students to identify and describe formally a geometric property of some two- or three-dimensional shape. It is important to stress that properties such as “squareness” or “bumpiness” are not formal geometric properties. There are no formally correct and universally accepted answers to these questions. On the other hand, there are sensible (and non-sensible) answers.
For the sake of specificity, the following is an example.
Below is a collection of rectangles.
1. Which of the rectangles is the “squarest”?
2. Arrange the rectangles in order of “square-ness” from most to least square.
3. Devise a measure of “square-ness,” expressed algebraically, that allows you to order any collection of rectangles in order of “squareness.”
4. Devise a second measure of “square-ness” and discuss the advantages and disadvantages of each of your measures.

The elements of performance on these tasks are as follows:
a. choosing the most and least square, sharp, etc. figure
b. a verbal description of the geometric property being modeled
c. identifying the geometric elements that combine to form the measure
d. forming an algebraic relationship among these elements
e. computing values of the measure for various figures
f. discussing the advantages and disadvantages of the measure
The weight that should be assigned to the successful completion of these elements of the task increases as one goes down the list. In our view successful completion involves satisfactory completion of at least parts a through d.
Problems of this sort have several important virtues. Even very weak students can get started on them. At the same time, they provide an opportunity for strong students to display a great deal of sophistication. In addition, they exercise several quite important mathematical muscles that are rarely called upon in school mathematics. We refer here to the need for defining quantitative constructs — a central feature of the application of mathematics to both the natural and social sciences.
Appendix B contains a discussion of some of the directions one might go with the “square-ness” problem.
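To make parts 3 and 4 of the “square-ness” task concrete, here is a minimal sketch of two candidate measures. Both are illustrative choices, not the measures discussed in Appendix B: `side_ratio` compares the shorter side to the longer, and `area_to_perimeter` compares the rectangle's area to that of a square with the same perimeter. The function names and the sample rectangles are hypothetical, since the original figure is not available here.

```python
def side_ratio(width, height):
    """Shorter side divided by longer side; equals 1 for a square."""
    return min(width, height) / max(width, height)


def area_to_perimeter(width, height):
    """Area divided by the area of a square with the same perimeter.

    A square with perimeter P has area (P / 4) ** 2, so the ratio is
    16 * area / P ** 2, which equals 1 for a square.
    """
    perimeter = 2 * (width + height)
    return 16 * width * height / perimeter ** 2


def rank_by(measure, rectangles):
    """Order rectangles from most to least square under the given measure."""
    return sorted(rectangles, key=lambda r: measure(*r), reverse=True)


# Hypothetical (width, height) pairs standing in for the missing figure.
rectangles = [(1, 4), (3, 3), (2, 3), (5, 6)]
most_to_least_square = rank_by(side_ratio, rectangles)
```

These two measures always rank rectangles in the same order, because the second is an increasing function of the first, but they spread the values differently. That contrast is one possible starting point for the discussion of advantages and disadvantages in part 4.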
Another type of problem that we have introduced is known in some circles as “Fermi problems.” These problems ask for the estimation of quantities such as number, length, area, volume, weight, and time. Many of these quantities are measures of things that you are likely to encounter in everyday life, although they may not be quite in the form that you normally think of them.
These problems are called “Fermi problems” after Enrico Fermi, one of the great physicists of the twentieth century. Fermi taught for many years at the University of Chicago where he used to ask his beginning graduate students “...to estimate the number of piano tuners in Chicago.”
In order to make the kinds of estimates that respond to Fermi problems, one must often make use of information that is not contained in the statement of the problem. Sometimes the necessary information will be the sort of thing you might already know. Sometimes you may have to use reference materials in order to find the necessary pieces of information.
Let’s explore some ways of thinking about this sort of problem. Suppose we are asked “How many words are there in all the books in the school library?” How could we go about estimating this number?
If we knew
the number of shelves of books in the library, and
the average number of books on a shelf, and
the average number of pages in a book, and
the average number of words on a page
then we could estimate the number of words. Suppose a school library has
600 shelves
and that on the average there are
16 books on each shelf
Suppose further, that there are, on the average,
250 pages in each book, and
400 words on each page.
We can estimate the number of words by multiplying
600 shelves × 16 books per shelf × 250 pages per book × 400 words per page
The size of this product is 960,000,000 words. How good is this answer? How well does it apply to your school library?
People are not accustomed to seeing problems in mathematics that do not have exactly one correct answer. How, then, are they to think about these problems? In particular, when is the answer to such a problem good enough? A rough guide is the following:
Devise two different strategies for arriving at an estimate of the desired quantity. If the larger of the two estimates is no more than 10 times the smaller estimate then there is a strong likelihood that the estimate(s) you have made is (are) reasonable.
For example, consider the illustrative problem that we worked on before, i.e. the number of words in all the books in the school library. Suppose we had approached the problem differently. Suppose we said that the library, on the average, buys 50 books a month during each of the 10 months of the school year. Suppose, further, that the library has been doing this for the 20 years that the school has been in existence.
We can now compute the number of words in all the books in the library in a quite different way.
50 books per month × 10 months per year × 20 years × 250 pages per book × 400 words per page
This computation produces an estimate of 1,000,000,000 words.
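The two library estimates and the factor-of-10 rule of thumb can be checked with a short sketch. The helper name `within_factor_of_ten` is hypothetical, and the second estimate is assumed to reuse the same averages of 250 pages per book and 400 words per page as the first, which is what reproduces the stated figure of 1,000,000,000 words.

```python
from math import prod

# First strategy: shelves x books per shelf x pages per book x words per page.
shelf_estimate = prod([600, 16, 250, 400])

# Second strategy: books bought per month x months per year x years,
# then (assumed) the same pages-per-book and words-per-page averages.
purchase_estimate = prod([50, 10, 20, 250, 400])


def within_factor_of_ten(a, b):
    """Rule of thumb from the text: the larger estimate is at most 10x the smaller."""
    return max(a, b) <= 10 * min(a, b)


estimates_agree = within_factor_of_ten(shelf_estimate, purchase_estimate)
```

Here the two estimates differ by only about 4 percent, far inside the factor-of-10 tolerance, so by the rule above either one may be taken as reasonable.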
Here are three quite different ways of arriving at an estimate of the total distance a person walks in a day. One estimate attempts to take account of the total waking time of a person during the day, assumes that they are moving about a certain fraction of that time and that when they move, they do so at an assumed average speed.
Another estimate relies on the information that comes from a shoemaker who says that a pair of sneakers wears out after some number of miles. If you know how often you replace your sneakers then you can calculate an average distance walked (or run) in a day.
Finally, a third method of estimating the total distance a person walks in a day relies on following the person about and simply adding up the estimated distances they walk, including to and from school, store, ball field, around the house and school, and so on.
To be sure these estimates may not all yield the same number. Although they differ from one another, they are all reasonable ways of approaching the problem. If you estimate the distance a person walks by each of these three methods and they turn out to give numbers that are not very different from one another you may assume that you have made a reasonable estimate.
An effective way of probing a student’s understanding of the application of a concept in context is to ask the student to generate an example of the application of that concept. Typically, such questions do not have unique answers. For example, one might ask a student to give an example of a number that is an integer power of both 2 and 4, or an even function of x that is not constant but never exceeds 1 in absolute value.
This type of problem, i.e., of providing examples of a mathematical object that has a specified set of properties, is in our view underutilized in the assessment of mathematical competence. Here are some examples.
Give three examples of numbers that are evenly divisible by 2, 3, 4, and 5.
Write a number whose value is between 1/4 and 1/5.
Draw a 4-sided figure with two pairs of sides of equal length that is not a parallelogram.
Draw a triangle whose circumscribing circle’s center lies outside the triangle.
For each of the following pairs of functions, write a function which is everywhere at least as large as the smaller and no larger than the larger of the two functions

Design a red and blue painted dart board such that the chance of landing on red is three times the chance of landing on blue.
Devise two different techniques for alphabetizing a large list of names, for example all the students in your school. Contrast the efficiency of your techniques with one another.
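Because example-generation tasks have many acceptable answers, responses to the numerical ones are better checked against the stated properties than against a key. The sketch below shows one way to do that for three of the tasks mentioned above; the function names and the sample responses are hypothetical.

```python
from fractions import Fraction
from math import lcm


def divisible_by_all(n, divisors=(2, 3, 4, 5)):
    """True if n is evenly divisible by every listed divisor."""
    return all(n % d == 0 for d in divisors)


def strictly_between(x, low, high):
    """True if x lies strictly between low and high."""
    return low < x < high


def is_power_of(n, base):
    """True if n equals base raised to some non-negative integer power."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


# Every acceptable answer to the divisibility task is a multiple of this number.
smallest_common_multiple = lcm(2, 3, 4, 5)

# Hypothetical student responses.
divisibility_ok = all(divisible_by_all(n) for n in (60, 120, 180))
fraction_ok = strictly_between(Fraction(9, 40), Fraction(1, 5), Fraction(1, 4))
power_ok = is_power_of(16, 2) and is_power_of(16, 4)
```

Using `Fraction` keeps the comparison with 1/4 and 1/5 exact, so a response such as 9/40 is judged without floating-point rounding.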
In order to approach the problem of designing balanced assessment packages in mathematics one must have a clear view of the kinds of understandings and skills that we wish to assess in our students and the ways in which the tasks we design elicit demonstrable evidence of those skills and understandings. In what follows we shall describe how our view of the subject of mathematics, its objects and its actions, informs the design of tasks and the balancing of assessment packages.
Each task is classified according to domain, i.e. the mathematical objects that are prominent in the accomplishment of the task. Most of our tasks deal predominantly with a single sort of mathematical object although some deal with two. Each task offers students an opportunity to demonstrate a variety of kinds of skill and understanding.
In order to score student performance on a task one has to first analyze the task and decide on the nature of the demands that the task makes on the student. We considered the following four kinds of skill and understanding.
Modeling/Formulating: How well does the student take the presenting statement and formulate the mathematical problem to be solved? Some tasks make minimal demands along these lines. For example, a problem that asks students to calculate the length of the hypotenuse of a right triangle given the lengths of the two legs does not make serious demands along these lines. On the other hand, the problem of how many 3 inch diameter tennis balls can fit in a (rectangular parallelepiped) box that is 3" × 4" × 10", while exercising the same Pythagorean muscles in the solution, is rather different in the demands it makes on students’ ability to formulate problems.
Transforming/Manipulating: How well does the student manipulate the mathematical formalism in which the problem is expressed? This may mean dividing one fraction by another, making a geometric construction, solving an equation or inequality, plotting graphs, or finding the derivative of a function. Most tasks will make some demands along these lines. Indeed most traditional mathematics assessment consists of problems whose demands are primarily of this sort.
Inferring/Drawing Conclusions: How well does the student apply the results of his or her manipulation of the formalism to the problem situation that spawned the problem? Traditional assessments often pose problems that make little demand of this sort. For example, students may well be asked to demonstrate that they can multiply the polynomials (x+1) and (x–1) but not be expected to notice (or understand) that the numbers one cell away from the main diagonal of a multiplication table always differ from perfect squares by exactly 1.
Communicating: How well do students communicate to others what they have done in formulating the problem, manipulating the formalism, and drawing conclusions about the implications of their results?
Since we do not expect each task to make the same kinds of demands on students in each of the four skill/understanding areas, we assign a single-digit measure of the prominence of each skill/understanding in the problem according to the following scale of weighting codes:
Weighting codes
0 not present at all
1 present in small measure
2 present in moderate measure, and affects solution
3 a prominent presence
4 a dominant presence
Note that these numbers are not measures of student performance but measures of the demands of the task for a given performance action.
Most tasks will involve these skills and understandings in some combination. Needless to say, different tasks will call differently on these actions. Therefore, it is necessary in designing tasks to pay particular attention to the nature of the demands on performance that the tasks make.
What does all of this mean for the fashioning of both tasks and balanced assessment packages?
For each task one decides on the basis of experience, taste and judgment, ideology and philosophy, how the task’s demands should be distributed among the content domains:
Content domain weighting
| Number and Quantity | |
| Shape and Space | |
| Pattern and Function | |
| Chance and Data | |
| Arrangement | |
[Entries must sum to 1.]
and among the various sorts of performance actions:
Process weighting
| Modeling/Formulating | Transforming/Manipulating | Inferring/Drawing Conclusions | Communicating |
| | | | |
[Each entry is on a 0-4 scale.]
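The two weighting tables above can be pictured as a small record attached to each task. The following Python sketch is only an illustration and is not part of the project's materials: the class name `TaskWeighting`, its field names, and the example task are all hypothetical. It simply checks the two constraints stated above, namely that the content domain entries sum to 1 and that each process entry is a whole number from 0 to 4.

```python
import math
from dataclasses import dataclass

# Category names taken from the two tables above.
DOMAINS = (
    "Number and Quantity",
    "Shape and Space",
    "Pattern and Function",
    "Chance and Data",
    "Arrangement",
)
PROCESSES = (
    "Modeling/Formulating",
    "Transforming/Manipulating",
    "Inferring/Drawing Conclusions",
    "Communicating",
)


@dataclass
class TaskWeighting:
    """Hypothetical record of one task's weights (illustrative only)."""
    name: str
    domain: dict   # content domain -> fraction of the task's demand
    process: dict  # performance action -> weighting code 0..4

    def validate(self) -> None:
        # Reject category names that do not appear in the tables.
        unknown = (set(self.domain) - set(DOMAINS)) | (set(self.process) - set(PROCESSES))
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")
        # Content domain entries must be non-negative and sum to 1.
        if any(w < 0 for w in self.domain.values()):
            raise ValueError("domain weights must be non-negative")
        if not math.isclose(sum(self.domain.values()), 1.0, abs_tol=1e-9):
            raise ValueError("domain weights must sum to 1")
        # Each process entry must be one of the weighting codes 0-4.
        for action, code in self.process.items():
            if code not in range(5):
                raise ValueError(f"{action}: code {code!r} is not on the 0-4 scale")


# A made-up task used only to show the shape of the record.
example = TaskWeighting(
    name="hypothetical task",
    domain={"Shape and Space": 0.75, "Number and Quantity": 0.25},
    process={
        "Modeling/Formulating": 3,
        "Transforming/Manipulating": 2,
        "Inferring/Drawing Conclusions": 1,
        "Communicating": 2,
    },
)
example.validate()  # raises ValueError if either constraint is violated
```

In this sketch, a domain left out of the `domain` dictionary is treated as having weight 0, which matches a task that deals with only one or two sorts of mathematical object.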
This is precisely how we went about the design of our assessment packages. Each of the tasks we designed was weighted in this fashion. Balanced packages of assessments at different grade levels were then assembled by combining tasks into collections whose aggregated weights reflect our judgment about appropriate demands for that grade level.
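The report does not state how the weights of individual tasks are aggregated into a package profile. As one possible reading, the Python sketch below averages the content domain fractions and the process codes over the tasks in a collection. The averaging rule, the function name `aggregate_package`, and the two sample tasks are assumptions made for illustration, not the project's actual procedure.

```python
from collections import defaultdict


def aggregate_package(tasks):
    """Average per-task weights over a collection of tasks.

    `tasks` is a list of (domain_weights, process_weights) pairs, where each
    element is a dict. Averaging is an assumed aggregation rule; categories
    a task omits count as 0 for that task.
    """
    if not tasks:
        raise ValueError("a package needs at least one task")
    domain_totals = defaultdict(float)
    process_totals = defaultdict(float)
    for domain, process in tasks:
        for name, weight in domain.items():
            domain_totals[name] += weight
        for name, code in process.items():
            process_totals[name] += code
    n = len(tasks)
    domain_mean = {name: total / n for name, total in domain_totals.items()}
    process_mean = {name: total / n for name, total in process_totals.items()}
    return domain_mean, process_mean


# Two made-up tasks, for illustration only.
task_a = (
    {"Number and Quantity": 1.0},
    {"Modeling/Formulating": 3, "Transforming/Manipulating": 1,
     "Inferring/Drawing Conclusions": 2, "Communicating": 2},
)
task_b = (
    {"Shape and Space": 0.5, "Pattern and Function": 0.5},
    {"Modeling/Formulating": 1, "Transforming/Manipulating": 3,
     "Inferring/Drawing Conclusions": 2, "Communicating": 0},
)

domain_profile, process_profile = aggregate_package([task_a, task_b])
print(domain_profile)
print(process_profile)
```

Because each task's domain entries sum to 1, the averaged domain profile also sums to 1, so it can be compared directly with a target distribution of emphasis for a grade level.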
At the youngest grades we placed great emphasis on Number and Quantity and Shape and Space objects. We tried to stress the Modeling/Formulating and Inferring/Drawing Conclusions actions much more than do traditional assessment programs, which tend to concentrate on assessing students’ ability to manipulate numbers and shapes. In addition, we tried to devise fresh and interesting ways of having students communicate with one another as well as report to us on their efforts.
At the other end of the grade spectrum, the distribution of emphasis shifted. The Number and Quantity and Shape and Space objects are still present but the Chance and Data and Arrangement objects are now a significant portion of the assessment, and the Pattern and Function object assumes great importance. The demands on students are more subtle and nuanced. Apart from technical questions designed to ensure that transforming and manipulating skill exceeds some reasonable threshold, strong emphasis is placed on tasks that make extensive Modeling/Formulating and Inferring/Drawing Conclusions demands. Here too we have attempted to devise a wide variety of ways for students to communicate with one another and with us about their efforts.
The particular distribution of emphasis for each of the packages we designed is discussed in the introduction to that package.
Afte