Extending Bloom's Taxonomy using Machine Learning

Project Overview

Bloom's Taxonomy is widely used to classify levels of cognitive activity in education, learning analytics, design research, and protocol analysis. The cognitive domain is typically organized into six levels: Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation. Each level is associated with action verbs that indicate different forms of thinking.

Traditional Bloom verb lists, however, are relatively small and were not designed for large-scale computational analysis of natural language. This project addresses that gap by building a machine-learning pipeline that extends the cognitive-domain verb inventory in a systematic, reproducible, and taxonomy-aware way.

The extended verb lists were then applied to think-aloud transcripts from a comparative study of AR-CAD and a traditional CAD tool. This allowed cognitive-process patterns to be analyzed over time, showing how different design environments shape expressed cognitive activity during CAD tasks.

6 Bloom cognitive-domain levels modeled as classification targets

4 pipeline stages from core verbs to final extended taxonomy outputs

20 participants in the design-study dataset used for downstream think-aloud analysis

My Role and Contributions

I designed and implemented the machine-learning workflow, including candidate verb extraction, semantic filtering, embedding generation, one-vs-rest classification, probability calibration, threshold-based acceptance, and application of the extended taxonomy to time-aligned think-aloud transcripts.

I also connected the taxonomy-extension work to my AR-CAD research by applying the resulting verb lists to analyze cognitive activity during immersive and traditional CAD modeling tasks. This made the project both a taxonomy-extension contribution and a practical analysis tool for human-computer interaction and design cognition research.

Supervisor

Dr. Ahmed Jawad Qureshi
Professor, University of Alberta

Collaborator

Jingchuan Shi
University of Alberta

Research Problem

Standard supervised model selection usually depends on labeled examples or gold-standard validation data. The target extension set, however, did not come with manually labeled Bloom categories. This created a methodological challenge: the model had to be selected and evaluated without relying solely on conventional supervised accuracy.

The project therefore required a pipeline that could combine supervised learning on the original Bloom-aligned verbs with structural checks for whether the predicted assignments were coherent and plausible across the taxonomy.

Core Challenge

How can an unlabeled set of candidate cognitive verbs be assigned to Bloom levels in a way that is computationally scalable, semantically meaningful, and useful for downstream think-aloud analysis?

Methodological Pipeline

The workflow was structured as a four-stage pipeline. The goal was to move from a small core Bloom-aligned verb list toward an expanded set of cognitively meaningful verbs that could support automated protocol analysis.

BT1: Core Verb Preparation

Prepared and standardized the original Bloom-aligned verbs across the six cognitive levels.

BT2: Candidate Extraction

Extracted candidate verbs from engineering and education-related text sources and filtered them using WordNet-based semantic constraints.

BT3: Model Training

Trained one-vs-rest classifiers using sentence-transformer embeddings and calibrated model scores for each Bloom category.

BT4: Taxonomy Extension

Applied the selected model to the extended verb set, generated hard and soft labels, and exported final lists for expert validation and downstream analysis.

Technical Design

Candidate verbs were embedded using sentence-transformer representations, allowing the model to capture semantic similarity beyond exact lexical overlap. A one-vs-rest classification strategy was used so that each Bloom level could receive a calibrated confidence score rather than forcing a single uncalibrated decision.
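The one-vs-rest idea can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the 2-D vectors stand in for real sentence-transformer embeddings, only three levels are shown, the verbs are made up, and the plain logistic regression is not necessarily the classifier used in the project. Calibration is a separate step and is not shown.

```python
import math

# Toy 2-D "embeddings" (hypothetical stand-ins for sentence-transformer vectors).
TRAIN = [
    ((1.0, 0.1), "Knowledge"), ((0.9, -0.1), "Knowledge"),
    ((0.1, 1.0), "Comprehension"), ((-0.1, 0.9), "Comprehension"),
    ((-1.0, -0.9), "Application"), ((-0.9, -1.0), "Application"),
]
LEVELS = ["Knowledge", "Comprehension", "Application"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_binary(w, x):
    # w holds one weight per feature plus a trailing bias term.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_binary(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target in zip(X, y):
            err = predict_binary(w, x) - target
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def train_one_vs_rest(data, levels):
    """Train one binary model per level: that level versus all others."""
    X = [x for x, _ in data]
    return {
        lvl: train_binary(X, [1.0 if label == lvl else 0.0 for _, label in data])
        for lvl in levels
    }


models = train_one_vs_rest(TRAIN, LEVELS)
scores = {lvl: predict_binary(w, (0.95, 0.05)) for lvl, w in models.items()}
top = max(scores, key=scores.get)
print(top, scores)
```

Because each level has its own binary model, the per-level scores are independent and need not sum to one, which is what lets a verb score highly on more than one level.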

Calibration and thresholding were important because Bloom categories are conceptually related and some verbs can plausibly support more than one cognitive process depending on context. The pipeline therefore supported both hard label acceptance and softer domain suggestions for expert review.
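A threshold-based acceptance rule of this kind might look like the sketch below. The function name, the two threshold values, and the example scores are hypothetical; the source does not state the thresholds that were actually used.

```python
def assign_labels(scores, hard_threshold=0.7, soft_threshold=0.4):
    """Split calibrated per-level scores into a hard label and soft suggestions.

    Both thresholds are illustrative assumptions, not values from the project.
    Returns (hard_label_or_None, list_of_soft_suggestions).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_level, top_score = ranked[0]
    # Accept a hard label only when the best score is confident enough.
    hard = top_level if top_score >= hard_threshold else None
    # Any other level above the softer bar is kept as a suggestion for review.
    soft = [lvl for lvl, s in ranked if s >= soft_threshold and lvl != hard]
    return hard, soft


confident = assign_labels({"Analysis": 0.82, "Evaluation": 0.55, "Knowledge": 0.05})
ambiguous = assign_labels({"Application": 0.50, "Synthesis": 0.45})
print(confident)  # ('Analysis', ['Evaluation'])
print(ambiguous)  # (None, ['Application', 'Synthesis'])
```

In the second case no level clears the hard threshold, so the verb is left unlabeled and both plausible levels are passed on for expert review.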

Since the extension target lacked gold labels, model selection also considered structural coherence: whether predicted assignments aligned with plausible taxonomy-level relationships rather than simply maximizing supervised scores on the small core set.
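The source does not specify how structural coherence was measured. Purely as an illustration, one possible proxy is the share of verbs whose two highest-scoring levels are neighbours in the Bloom ordering, on the assumption that confusion between adjacent levels is more plausible than confusion between distant ones. The metric, the function name, and the example scores below are all hypothetical.

```python
BLOOM_ORDER = [
    "Knowledge", "Comprehension", "Application",
    "Analysis", "Synthesis", "Evaluation",
]


def adjacency_rate(score_table):
    """Fraction of verbs whose top two levels are adjacent in BLOOM_ORDER.

    score_table maps verb -> {level: score}; each verb needs at least two levels.
    This is an assumed proxy for coherence, not the project's actual criterion.
    """
    adjacent = 0
    for scores in score_table.values():
        first, second = sorted(scores, key=scores.get, reverse=True)[:2]
        gap = abs(BLOOM_ORDER.index(first) - BLOOM_ORDER.index(second))
        adjacent += gap == 1
    return adjacent / len(score_table)


example = {
    "compare": {"Analysis": 0.8, "Synthesis": 0.6, "Knowledge": 0.1},
    "recall": {"Knowledge": 0.9, "Evaluation": 0.5, "Comprehension": 0.2},
}
rate = adjacency_rate(example)
print(rate)  # 0.5
```

A score like this could be compared across candidate models without any gold labels for the extended verb set.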

Application to Think-Aloud Analysis

After extending the taxonomy, the verb lists were applied to think-aloud transcripts from a comparative AR-CAD versus traditional CAD study. The analysis pipeline used transcription, verb extraction, Bloom-domain classification, and time-normalized progress bins to compare cognitive-process patterns across tools.

This enabled analysis of when participants expressed lower-level cognitive processes such as Knowledge, Comprehension, and Application, and when they moved into higher-level processes such as Analysis, Synthesis, and Evaluation. The result was a time-resolved view of design cognition rather than a single post-task score.
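The time-normalized binning step can be sketched as follows. The event format (timestamp in seconds plus an already-assigned Bloom level), the bin count, and the function names are assumptions for illustration; the lower/higher grouping follows the split described above.

```python
from collections import Counter

HIGHER_ORDER = {"Analysis", "Synthesis", "Evaluation"}


def bin_events(events, session_start, session_end, n_bins=10):
    """Count Bloom levels per progress bin.

    events: iterable of (timestamp_seconds, level). Time is normalized to
    session progress in [0, 1] so sessions of different length are comparable.
    """
    duration = session_end - session_start
    bins = [Counter() for _ in range(n_bins)]
    for timestamp, level in events:
        progress = (timestamp - session_start) / duration
        # Clamp so an event exactly at session_end lands in the last bin.
        index = min(int(progress * n_bins), n_bins - 1)
        bins[index][level] += 1
    return bins


def higher_order_share(bin_counts):
    """Share of higher-order verbs in one bin, or None if the bin is empty."""
    total = sum(bin_counts.values())
    if total == 0:
        return None
    return sum(c for lvl, c in bin_counts.items() if lvl in HIGHER_ORDER) / total


events = [
    (5, "Knowledge"), (20, "Comprehension"),
    (110, "Analysis"), (190, "Evaluation"), (200, "Synthesis"),
]
bins = bin_events(events, session_start=0, session_end=200, n_bins=4)
shares = [higher_order_share(b) for b in bins]
print(shares)  # [0.0, None, 1.0, 1.0]
```

Plotting the per-bin share for each tool gives the kind of time-resolved comparison described here, as opposed to a single aggregate score per session.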

Research Value

The project turns Bloom's Taxonomy from a static educational coding framework into a scalable computational tool for analyzing verbalized cognitive activity in design studies.

Key Contributions

ML Pipeline for Taxonomy Extension

Built a complete NLP/ML pipeline for extracting, filtering, classifying, calibrating, and accepting new Bloom-aligned verbs.

Taxonomy-Aware Model Selection

Addressed the challenge of unlabeled extension targets by considering structural plausibility rather than relying only on conventional supervised accuracy.

Think-Aloud Coding Workflow

Applied the extended taxonomy to classify time-aligned think-aloud transcripts from design tasks.

Integration with AR-CAD Research

Used the workflow to compare cognitive activity between immersive AR-CAD and a traditional CAD tool.

Publication

M. Talha, J. Shi, and A. J. Qureshi, “Extending the Cognitive Domain of Bloom's Taxonomy using Machine Learning,” Research Square preprint, 2026.

