In Part 1 on Cognitive Load Theory
(https://polymathtobe.blogspot.com/2023/02/learning-and-teaching-cognitive-load.html),
the framework of what Cognitive Load Theory is was laid out in principle,
following Oliver Lovell’s book Cognitive
Load Theory In Action (Lovell 2020).
Part 2 (https://polymathtobe.blogspot.com/2023/04/learning-and-teaching-cognitive-load.html)
is on how teachers can minimize extrinsic load on the learner through honing
their presentation.
Part 3 (https://polymathtobe.blogspot.com/2023/05/learning-and-teaching-cognitive-load.html)
is on how teachers can minimize the extrinsic load on
the learner through structuring their practices and lessons.
This article roughly follows Oliver Lovell’s book in
examining how the teacher, coach, and learner can apply Cognitive Load Theory to minimize the
extrinsic loading on the working memory and then learn to optimize the
intrinsic load in the working memory.
The definitions of intrinsic and extrinsic loads are restated
here for ease of reference.
The extrinsic cognitive loads are:
·
A part of
the manner and structure of how the information is conveyed to the
learners.
·
Disruptive to the learning task because they
distract the learner from learning by occupying valuable working memory space.
Whereas the intrinsic cognitive loads are those that are
critical to learning whatever it is that we need to learn. They are:
·
Part of the nature of the information that we
are learning.
·
Core learning.
·
Information that we WANT the learner to have in
their working memory.
The critical limitation is that the working memory has a
finite capacity; that is, the intrinsic and the extrinsic loads are vying for
the same finite resource. The emphasis is placed on minimizing the extrinsic
load; that is, to offload unnecessary extrinsic cognitive load, to make space
in the working memory before optimizing the intrinsic loads.
Note that even though Lovell’s book is relatively short, he
presents quite a few results from studies that
make definitive arguments, and he gives excellent implementation examples, so it is
worthwhile to read through the book.
Since I am coming from two familiar but different points of
view, teaching at a university level and coaching, I will try to illustrate the
points by giving simple examples from both milieus.
The defining difference between teaching and coaching is
that teaching can be effected over a longer time frame. Learners in the academic
milieu can take their time in building up the learning scaffolds because they
do not need to implement the material immediately. There is time to
spend on repeatedly going over the material, digging into the granularity as
well as examining the broad scope of the topics. In sports coaching, the
learner is expected to learn to act and react instantaneously to new situations
which are never identical to the contrived practice situations. This increases the
extrinsic load on their working memory as they struggle to learn.
Optimizing Intrinsic Load
The language used to describe the necessary actions for intrinsic
loads and extrinsic loads is different. Intrinsic loads need to be optimized,
which means that the intrinsic loads can be increased, decreased, or left the
same.
Optimizing the intrinsic load is the key to successful
learning. The difference in wording for intrinsic and extrinsic loading is
intentional and comes from the effects that intrinsic and extrinsic loads have
on the working memory. The extrinsic loads must be minimized because they are
extraneous and impede learning. The goal is to devote the maximum of the available working
memory to intrinsic loading, because it is the loading that facilitates
learning. Even when the extrinsic load is minimized, the amount of intrinsic
loading placed on the learner could still overwhelm the learner’s working
memory.
Intrinsic loads need to be optimized, i.e., adjusted to
accommodate the learner’s learning capacity. If the learner can handle more intrinsic
load, then the teacher must fill in the void in the learner’s working memory. If
the learner is overloaded with intrinsic loads, then the teacher must simplify
and remove some of the intrinsic loads to enable the learner to learn
effectively. The next question is how do you know when to add and when to
simplify? As with all things human, it depends on the person. Each learner is
different, and their learning experience needs to be accommodated. Slightly overloading the learner’s working
memory may be beneficial because stressing the learner can sometimes accelerate
their adaptation to the intrinsic load and allow them to automatically chunk
the topics that are loading them down. At the same time, overstressing the
working memory might diminish their ability to learn. There are many factors to
this: whether the learner is a novice or an expert; the amount of related
knowledge resident in the learner’s long-term memory which can be leveraged to
associate with the new knowledge; the amount of time and exposure
the learners are allowed to the knowledge; and the complexity and difficulty
of the topic. These factors make teaching a large number of
learners challenging, but not impossible.
The overall aim of teaching is not just to teach the
fundamentals of the topic, the tools of the craft; the overall aim is
to teach how to use the tools effectively and efficiently. That covers not just the
experiential aspect of learning (the hows, whys, and what-ifs) but also the
ability to extrapolate and to use knowledge and reasoning to adapt,
improvise, and overcome.
As stated before, there are two key reasons for optimizing
the intrinsic loads: to adjust the level of intrinsic load placed on the
working memory, which eases the learning process for the learner without
exceeding the finite working memory; and
to use the learner’s working memory to its fullest capability without
exceeding its capacity.
The former requires that the teacher adapt the material to
the different learning abilities of the learner. The latter requires that the
teacher structure the learning so that the learner can fully leverage their
finite working memory resource, preventing the learner’s attention from
wandering. Not too much, not too little, but just enough; optimize
rather than just maximize.
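The "not too much, not too little" rule can be written down as a small decision sketch. This is only an illustration: working memory load cannot actually be measured in neat units, and the function name, the numbers, and the slack threshold are all hypothetical.

```python
def recommend_adjustment(capacity, extrinsic, intrinsic, slack=1.0):
    """Suggest how to change the intrinsic load for one learner.

    All quantities are in made-up units (an assumption for illustration);
    `slack` is how much unused room is tolerated before adding more.
    """
    # Room left for intrinsic load once the extrinsic load is accounted for.
    available = capacity - extrinsic
    if intrinsic > available:
        return "simplify"   # learner is overloaded: remove some intrinsic load
    if available - intrinsic > slack:
        return "add"        # unused working memory: fill the void
    return "hold"           # just enough


print(recommend_adjustment(capacity=7, extrinsic=2, intrinsic=6))    # simplify
print(recommend_adjustment(capacity=7, extrinsic=1, intrinsic=3))    # add
print(recommend_adjustment(capacity=7, extrinsic=2, intrinsic=4.5))  # hold
```

Note that lowering the extrinsic load raises the available room, which is why minimizing extrinsic load comes before optimizing the intrinsic load.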
Expertise-Reversal Effect
The expertise-reversal effect compounds the complexity of the
strategies for optimizing intrinsic load.
The expertise-reversal effect
suggests that learners need differing amounts of support depending upon their
level of expertise.
This is explained by Guadagnoli and Lee in their paper (Guadagnoli 2004). In short,
learners who are learning new skills and concepts without any previous
experience are more likely to be lost and confused while learning the basics
because their long-term memory does not yet hold tangible memories that
can be retrieved to aid learning and leveraged to create new
knowledge.
The expert-reversal effect has a significant impact on how the
teacher needs to approach and implement their optimization strategy. That is,
they need to carefully consider the level of the learners they are trying to
reach and adjust their strategy accordingly.
Teaching
In the context of teaching engineering, the expert-reversal
effect implies that “…worked examples are better for novices, and problem
solving is better for experts…”; i.e., the novices need to be led through
the example in discrete steps and shown the reasoning for each step, while the
experts can be given the problem to solve directly.
It can mean that the teacher must be judicious while choosing
when to switch between solving examples and assigning problems. It is better to
nudge the novice learners a bit more aggressively to solve problems rather than
boring them with too many of the same examples; otherwise the bored
learners will tune out the teacher.
Coaching
In the coaching context, the expert-reversal effect means
that the coach needs to build up the novice learners’ long-term memory with
tangible experiences. Initial experiences are critical to the novice learner
because they are the scaffolding for future learning. Not only must the
initial experiences be practical for the moment, but they must also be useful in
the future.
This is asking for quite a bit. The expert-reversal effect
will be invoked later in the article to illustrate the impact it
has on the teaching strategies chosen
by the coaches.
A note on the term “tangible experiences”. There has been
discussion about the topic of specificity. Some literalists insist that unless
the experience is completely in the same domain and context, that experience is
not pertinent. According to the neurological explanation in Scott Grafton’s Physical
Intelligence (Grafton 2020), a field in which I am
an amateur, the neurons don’t know that they are firing for something
completely different from the purpose for which they initially acquired the “muscle
synergy” and the “basis set”. The neurons fire because they have that
experience to reuse in their “efference model”.
In other words, the neurons will fire for the arm motion of
throwing a ball, even if the movement required is not throwing the ball. The
bottom line is that the accrual of all motor skills could potentially be
beneficial sometime, if not specifically for a domain or context.
Pre-teaching
“Pre-teaching is delivering a
portion of the content before the main lesson, and reinforcing it through
revision over time, can reduce the intrinsic load experienced by the learner
when they attempt the final, complete task.” (Lovell 2020)
Pre-teaching focuses the learner’s attention on the
pre-taught material and places it in the working memory of the learner prior to
the formal instruction. It gives a basis for the learner to use for later learning.
Teaching
·
Teachers can foreshadow by introducing the coming
topics before diving into the details.
·
Teachers can also expose the learners to the big
picture of the elements of the topics prior to diving into the subtopics,
giving them a framework to place each topic.
Coaching
·
Coaches can introduce advanced skills in
practice, partly as incentives, and partly to expose the learners to the future
of their own athletic abilities.
Part-Whole or Whole-Part
Part-whole means building constituent skills and
knowledge before putting it all together. Whole-part requires providing a
general overview first, followed by more focused practice of individual
segments. (Lovell 2020)
On the surface, the difference between part-whole and
whole-part may seem like rhetoric but the decisions have a significant impact
on how well the learner can use their working memory to optimize the intrinsic load,
i.e., how well the learner learns.
Part-whole partitions a topic into constituent parts,
teaches the constituent parts in some imposed sequence, and reintegrates the
constituent parts into the whole. Whole-part teaches the whole topic first and
then selectively isolates the parts, partitioning and emphasizing them
as the whole topic is taught.
There are two questions to be asked before making the
decision: one is about the experience and maturity of the learners we are
trying to reach; the other is about the complexity of the topic that is being
taught. The expert-reversal effect tells us that novices do not have the
fundamental tools or experiences to effectively integrate the whole topic into
their intrinsic load, while the experts have already effectively integrated a
significant portion of the topic into their long-term memory. It seems obvious
that the part-whole approach is best for the novice and the whole-part is best
for the expert, because the expert can leverage their previous tools and
experiences to create new knowledge. I would say that the whole-part becomes
more beneficial to the learner as they progress from zero experience to
expertise. The usual mistake is to maintain the part-whole paradigm for too long.
Yet another qualifying criterion that needs to be considered
is whether the proposed partition of the topic is cognitively logical in
isolation; that is, whether the parts can be presented logically without taking
into consideration the cross-coupling effects that may be inherent in the
topic. If so, then taking a part-whole approach is better for the novice
because doing so would decrease the intrinsic loading of the learner to a
manageable level. If the individual partitioned parts do not make sense in
isolation, then the whole-part approach needs to be implemented. One major
caveat is that the act of partitioning a topic needs to be done carefully because
the couplings may not be obvious. It is the not-knowing-what-one-does-not-know
effect.
Part-Whole
The part-whole approach is to start simple and build
complexity as the learner adjusts to the increased load. The expense is that
the holistic view is ignored until later in the learning process.
The arguments for the part-whole approach are:
· “The
initial presentation of the part tasks helps consolidate procedures or rules,
which can be applied to the whole task at a later stage.” (Lovell 2020)
·
If the topic that is taught is complex, the
complexity can overload the learner’s working memory so that the integration of
the total task serves to the detriment of learning.
The thrust of the part-whole presentation approach is that the
lesson must meet the level of the learner instead of the teacher. Presenting
the whole skill at the beginning of the learning process may overwhelm the novice’s
working memory.
Lovell lists a few techniques that can be deployed to implement
the part-whole approach.
·
Chain forward: Forward chaining presents the partitioned
parts of the skill in the chronological order in which they appear in the
total skill.
·
Chain backward: Backward chaining is just the
reverse; it presents the partitioned parts of the skill in backwards
chronological order, starting from the ending.
·
Snowball: Snowballing is a variation on
forward and backward chaining. The idea is to perform each newly introduced
part of the skill in conjunction with the previously presented parts.
As each new part is added to
the repertoire, all the previously taught parts are integrated into the
practice. This helps the learner to chunk the partitioned
parts of the skill together. Chunking is explained later in this article.
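The three techniques differ only in the order and grouping of the parts, which a short sketch can make concrete. This is my own illustration, not code from Lovell; the function names and the example part names are hypothetical, and the snowball shown here builds forward (a backward snowball would start from the last part instead).

```python
def forward_chain(parts):
    """Practice each part alone, in the order it occurs in the whole skill."""
    return [[part] for part in parts]


def backward_chain(parts):
    """Practice each part alone, starting from the end of the whole skill."""
    return [[part] for part in reversed(parts)]


def snowball(parts):
    """Each session adds one new part to all previously practiced parts."""
    return [list(parts[:i]) for i in range(1, len(parts) + 1)]


# Hypothetical partition of a skill into four parts.
parts = ["approach", "jump", "swing", "landing"]

for session in snowball(parts):
    print(" + ".join(session))
# approach
# approach + jump
# approach + jump + swing
# approach + jump + swing + landing
```

The last snowball session is the whole skill, which is why snowballing lends itself to chunking: the learner always practices the new part attached to what came before.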
One note. Practicing partitioned parts of the skill may not
resemble the whole skill in the least. The teacher’s feedback needs to reflect
that fact, to not compare apples to oranges. The ultimate goal is to
successfully execute the whole skill, not the partitioned parts of skills.
Whole-Part
The whole-part approach is to present the whole topic first
and then, as the learner’s progress demands, simplify the complex whole skill
so that the learner can absorb it.
The arguments for the whole-part approach are:
·
“For complex motor tasks and many
professional real-life tasks, it is essential that the learner understand and
learn the relevant interactions and coordination between the various subtasks.
By learning the subtasks in isolation, these interactions may be missed.” (Lovell 2020)
·
Generally, presenting the whole skill first seems
to make more sense because we want the learner to be exposed to the holistic
view, so that the linkage and coupling of the parts can be demonstrated and illustrated.
This helps the learner to picture the whole topic and can allow them to
anticipate.
Lovell also
lists a few techniques that can be deployed to implement the whole-part approach.
·
Simplifying conditions: Simplifying the
conditions means that while the whole skill is being practiced and learned, the
conditions under which the practices are conducted are simplified to ease the
intrinsic loading on the learner. Rather
than focusing on the entire skill under real conditions, take away the real
constraints and open up the degrees of freedom under which the skills are being
practiced.
·
Manipulating the emphasis: Reduce the
total number of points of emphasis and have the learner focus only on the parts
of the whole skill that are giving them problems. This serves to unload some of
the working memory so that the learner is not overwhelmed.
·
Introducing variations: This technique does not
minimize the intrinsic loading as the previous techniques do. Indeed, adding
variations to a practice increases the intrinsic loading on the learners.
So why is this an effective technique? Introducing
variations stresses the learner’s working memory by exposing them to tasks that
are not in their repertory; it forces them to struggle with the complexity of performing
the skill with increased variation. The
learner is forced to develop new neuronal pathways, creating new working
memories that are based on existing experience. These newly created neuronal
pathways will be integrated into new long-term memory. One note of caution: the
teacher needs to be perspicacious about how the learner reacts to the
variation, because the variation could easily overload the learner’s working memory. There is a fuzzy
limit between overloading the working memory with enough intrinsic load to
force the learner to adapt to the challenge and, as an unintended consequence,
completely overwhelming the learner. I am of the belief that most learners
are more resilient than the teacher believes. The best option is to
just experiment.
Teaching
·
Teaching is usually conducted as a part-whole
exercise, with the teacher walking through examples step by step.
·
The students will often be ahead of the examples
and are able to anticipate the next step.
·
Backward chaining and snowballing would make an
interesting exercise in helping the learner learn how to anticipate and
connect the parts.
·
In teaching the qualitative topics, the
whole-part approach is taken by introducing the topic from a macro point of
view, giving the student the linkages, which connect the different topics and
fields within a broad topic. This way the learner can use the relationships taught
holistically to anticipate and extrapolate into future topics.
·
A particular problem is that teachers tend to
keep the learners in the part-whole realm for too long, repeatedly giving the
learners the same examples and problems to solve rather than giving the
learners opportunities to extemporize on the partitioned parts of the skills
that they have learned. Problem solving should be about the tools and abilities
to make connections between each step.
Coaching
·
For absolute novices, a part-whole approach is
best so as not to overwhelm them cognitively or overstress them emotionally. The
frustration from being unable to grasp the entirety of a skill is a showstopper
for many learners.
·
Coaches also tend to keep the learners in the
part-whole realm for too long, having the players drill on the same part of
the overall skill to perfection before allowing the players to extemporize and
to learn how to solve problems on the court or pitch. Problem solving should be
about the tools and abilities to make connections between each step. This
problem is especially acute when playing a sport because the time available for
problem solving and decision making is minuscule.
·
Start with part-whole for novices and proceed to
whole-part as soon as possible. There is no room in sports training for
perfection.
General Practices
Some general practices that are often recommended are listed
here to demonstrate how the Cognitive Load Theory is applied. These practices are particularly useful for
optimizing the intrinsic loading.
Chunking
According to Lemov’s The Coach’s Guide to Teaching (Lemov 2020), experts
process more information than novices because they process information in
chunks. This is a key goal of
teaching: to induce the learner to chunk their information, both knowledge and
experiences, together so that the working memory is less laden when
the chunks are recalled, because each chunk is multiple pieces of information
grouped together. According to Lemov, the practice of chunking is domain and
context specific; that is, the chunks will most likely be useless when taken to
a different domain and placed in a different context. This, by the way,
contradicts the adage: the game teaches the game. The game teaches the game if
and only if the learner understands the domain and context of their knowledge
and experiences; that understanding gives meaning to the chunks of knowledge.
The techniques that are mentioned in both the part-whole and
whole-part sections will all expose the learner to the logical chain that
underlies the main skill. Chaining, snowballing, simplification, manipulating
the emphasis, and introducing variations all link the partitioned parts of the
skills into the whole skill. Snowballing is particularly effective in creating
the circumstances under which the learner can link and associate the different parts
of the skills into the whole skills.
The limiting factor for chunking is the number of items
the working memory can handle at once. Once again, the expert-reversal effect
affects the total number of tasks a learner can handle without overloading the
working memory. The rule of thumb was that the average human working memory can
manage to focus on seven salient tasks, although I have read in various
sources that experts, especially while under duress, can only focus on
three to four salient tasks at once. It
may be that a novice is only able to focus on one or two tasks at once. Indeed, it is more
pragmatic to decide empirically, learner by learner, how many tasks a person can focus
on.
Retrieval, Spacing, and Interleaving
Retrieval, spacing, and interleaving practices go hand in
hand. I first read about the practice in Brown,
Roediger, et al., Make It Stick (Brown 2014),
and then it was reinforced in Lemov (Lemov 2020).
The idea is to plan and schedule the practice to give the
learners a chance to actively retrieve the memory of the skill as often
as possible from the long-term memory. Constant retrieval of the memories from
long term memory strengthens the memory and migrates the memory closer to the
top of the stack in the long-term memory, helping to make the memory permanent.
Spacing works in combination with retrieval practice.
Spacing segments of the same practice in time allows the memories of the practice
subject to fade from the working memory, so that when the same practice is re-initiated
later, whether it is within the same dedicated time segment or not, the memories
must be retrieved.
Interleaving accomplishes both retrieval and spacing by
practicing in cycles rather than in one continuous sequence. Rather than doing
one drill or studying one subject for a given amount of time or for the
accomplishment of a final goal, interleave the same drills or study period by
doing them in cycles. Examples are shown below.
Unfortunately, there are no general rules of thumb regarding
the number of times retrievals need to happen to make the knowledge permanent
in long-term memory. Nor is there a recommended spacing interval that is
optimal to guarantee forgetting and retrieval. I asked Prof. Brown about
this in an email, and he told me there had not been any studies in that regard;
it depends on the specific group of learners, with each learner having a
distinct timing, and on the complexity of the topic.
I have tried to make retrieval practices as prevalent and
numerous as possible in both teaching and coaching.
Teaching
·
Instituting short term assessments such as
quizzes at regular intervals to motivate the learners to retrieve prior
knowledge. Each quiz is comprehensive, not just focusing on the topic of the
week.
·
Question and answer periods within the
recitation where the teacher cold-calls students on previously learned topics,
giving the learners opportunities to actively retrieve previously learned
knowledge.
·
All assessments are comprehensive.
·
Alluding to previously covered topics and
integrating them into the new topics, creating connections and context for the
old and the new topics.
·
Encouraging the learners to modify their study
habits by committing to 20–25-minute blocks that are devoted to one topic, then
taking a 5-minute break before moving to another topic. But the learner must
return to the topic at least once before the studying session is over, to actively
retrieve the knowledge.
o
Study subject 1 for 20-25 minutes. Take a 5-minute
break.
o
Study subject 2 for 20-25 minutes. Take a 5-minute
break.
o
Study subject 3 for 20-25 minutes. Take a 5-minute
break.
o
Return to subject 1.
o
Return to subject 2.
o
Return to subject 3.
o
Repeat as convenient.
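The study plan above can be laid out as a timed schedule. The sketch below is only an illustration under my own assumptions: the return visits are given the same block and break lengths as the first pass (the list above does not specify them), and the function and parameter names are hypothetical.

```python
def study_schedule(subjects, block_minutes=25, break_minutes=5, cycles=2):
    """Return (start_minute, activity, duration) tuples for an interleaved session.

    Assumption: every cycle uses the same block and break lengths.
    """
    schedule = []
    clock = 0
    for cycle in range(cycles):
        for subject in subjects:
            # The first pass studies the subject; later passes return to it.
            label = f"Study {subject}" if cycle == 0 else f"Return to {subject}"
            schedule.append((clock, label, block_minutes))
            clock += block_minutes
            schedule.append((clock, "Break", break_minutes))
            clock += break_minutes
    return schedule


for start, activity, duration in study_schedule(["subject 1", "subject 2", "subject 3"]):
    print(f"{start:>3} min: {activity} ({duration} min)")
```

With three subjects and 25-minute blocks, the learner returns to subject 1 ninety minutes after first starting it, which provides the spacing needed for an active retrieval.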
Coaching
·
Retrieval, spacing, and interleaving can be
combined in effective practice planning, rather than planning numerous drills,
each lasting until a performance or timed target is achieved.
o
Traditionally:
§
Drill 1, with a time and/or performance goal.
§
Drill 2, with a time and/or performance goal.
§
Drill 3, with a time and/or performance goal.
§
Drill 4, with a time and/or performance goal.
o
Using interleaving
§
Drill 1, with a time and/or performance goal
that is adjusted to reduce the total time spent from the traditional way.
§
Drill 2, with a time and/or performance goal that
is adjusted to reduce the total time spent from the traditional way.
§
Drill 3, with a time and/or performance goal that
is adjusted to reduce the total time spent from the traditional way.
§
Drill 4, with a time and/or performance goal that
is adjusted to reduce the total time spent from the traditional way.
§
Drill 1 again, with a time and/or performance
goal that is adjusted and based on the time and performance goal from the first
time through.
§
Drill 2 again, with a time and/or performance
goal that is adjusted and based on the time and performance goal from the first
time through.
§
Drill 3 again, with a time and/or performance
goal that is adjusted and based on the time and performance goal from the first
time through.
§
Drill 4 again, with a time and/or performance
goal that is adjusted and based on the time and performance goal from the first
time through.
§
Drill 1 again, with a time and/or performance
goal that is adjusted and based on the time and performance goal from the second
time through.
§
And so on.
§
Note that the drill segments do not need to be
in sequence; changing the order randomly implements the introducing variations
technique from the whole-part paradigm. The key is to not put the same drills back-to-back,
which defeats the purpose.
§
By judiciously changing the goals, whether time goals
or performance goals, the intrinsic load is varied with each repetition of the
drill, creating an escalating desirable difficulty in the intrinsic loading.
Again, attention needs to be paid to the learner’s response to decide if the
variation is overloading their working memory.
§
Interleaving, if properly practiced, can be more
time effective than devoting a large block of time to each drill. Frequent changing
of emphasis and physical requirements ensures that the learners are constantly
being challenged and refreshed rather than being stuck in an interminable rut.
§
Depending on the performance goal selected, the
learners may achieve the final desired goal through the escalating intermediate
goals quicker than trying to achieve the desired goal at once.
§
Another element is to introduce a scrimmage or
play segment in between the drills, allowing the learners to quickly
incorporate the skills from the drills after the first cycle of the drill or
the sequence. This gives the teacher a chance to give them feedback on what they
are missing and remind them of the purpose of the drills, all in time to go
through the second cycle of the drill or drills. Repeat the scrimmage or play
segment as desired.
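The interleaved practice plan described in this list can be generated mechanically. The following is a minimal sketch under my own assumptions, not a prescribed method: the function name, the parameters, and the choice to place the scrimmage between cycles are hypothetical, and the goal adjustments for each repetition are left to the coach.

```python
import random


def interleaved_plan(drills, cycles=3, shuffle=False, scrimmage=False, seed=None):
    """Return a list of (cycle_number, activity) pairs.

    Assumes the drill names are unique and that there are at least two drills.
    """
    rng = random.Random(seed)
    plan = []
    last_drill = None
    for cycle in range(1, cycles + 1):
        order = list(drills)
        if shuffle:
            rng.shuffle(order)
            # Never start a cycle with the drill that ended the previous one,
            # so the same drill is not practiced back-to-back.
            if order[0] == last_drill:
                order.append(order.pop(0))
        plan.extend((cycle, drill) for drill in order)
        last_drill = order[-1]
        # Optional play segment between cycles, as a chance for feedback.
        if scrimmage and cycle < cycles:
            plan.append((cycle, "Scrimmage"))
    return plan


for cycle, activity in interleaved_plan(
    ["Drill 1", "Drill 2", "Drill 3", "Drill 4"], cycles=2, scrimmage=True
):
    print(f"Cycle {cycle}: {activity}")
```

Setting shuffle=True randomizes the order within each cycle while still keeping the same drill from appearing twice in a row.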
Block and Random
Much like the part-whole versus whole-part discussion, the
block and random debate has been controversial and long running,
especially in the sports context. While the vast majority of coaches agree that
random practices more closely resemble reality, either in the classroom or on
the sporting fields or courts, sometimes block training is necessary. Those
instances are when the expert-reversal effect comes into play.
Going back to the arguments from before, block practices are
necessary for those novices who need to integrate the basics through scaffolded
repetitions. Random repetitions, while much more beneficial for anyone who is not
a novice, impose too much extrinsic load on the novice and will retard the
initial learning. The problem is that most teachers will resort to block
training intuitively, partly because they have been taught this way and
partly because they want the learners to execute the skills immaculately prior to
exposing them to reality. Learners are by and large more resilient and adaptive
than teachers assume, which means that the optimal transition point to random
training comes much sooner than the teachers estimate.
Summary
This article lays out several topics that I found personally
convincing and that incorporate the ideas inherent in
the Cognitive Load Theory. My extemporizing has digressed somewhat, but it
encapsulates much of what I believe to be the foundation of my philosophy of
teaching and coaching. It took a long way to get there.
Learning is not a game of perfect; it is a process that
requires constant and consistent elevation of challenges to the learner,
creating desirable difficulties which stimulate their ability to
integrate existing knowledge and experiences and challenge their ability to create
new personal solutions through problem solving and decision making.
I am convinced that the Cognitive Load Theory is the best
model we have for understanding how the human learning process works. I have
tried to examine the topic synoptically through reading several references. Of
course, it is impossible to include every study and tome on the topic
in my reading or to encapsulate all the ideas in my mind. My working memory would
be overloaded. 😊
I welcome any discussion with experts in the areas of
learning, skill acquisition, neuroscience, and cognitive science because the topic
fascinates this dilettante.
References
Brown, Peter C., Henry L. Roediger III, and
Mark A. McDaniel. Make It Stick: The Science of Successful Learning.
Cambridge, MA: Belknap Press, 2014.
Grafton, Scott. Physical Intelligence. New
York: Pantheon Books, 2020.
Guadagnoli, Mark and Timothy D. Lee. "Challenge
Point: a Framework for Conceptualizing the Effects of Various Practice
Conditions in Motor Learning." Journal of Motor Behavior, June
2004: 212-224.
Lemov, Doug. The Coach’s Guide to Teaching.
Clearwater, FL: John Catt Educational Ltd., 2020.
Lovell, Oliver. Sweller's Cognitive Load Theory
in Action. Melton: John Catt Educational Ltd, 2020.