Classplus Test and PDF Solution Fetcher: A Complete Technical Guide
Classplus Test Solution Fetcher
A suite of Python scripts for fetching test questions, solutions, and course assets from the Classplus student CMS platform, evolving from a simple two-step proof-of-concept into a robust, bulk-processing automation toolkit.
Table of Contents
- Overview
- Project File Structure
- How Authentication Works
- The API Flow
- Script Breakdown
- How the Scripts Work Together
- Key Concepts Explained
- Setup & Usage
- Configuration Reference
- Error Handling & Resilience
- Data Flow Diagrams
- Download Source Files
Overview
The Classplus platform stores course content as a deeply nested JSON tree. Each node in the tree has a contentType field:
| contentType | Meaning |
|---|---|
| 1 | Folder (has children) |
| 2 | Thumbnail image |
| 3 | PDF document |
| 4 | Test / Quiz |
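The mapping above can be expressed as an enum so that scripts avoid bare integers. This is a minimal sketch, not code from the project: the `ContentType` class, the `count_content_types` helper, and the sample tree are hypothetical names introduced for illustration, and any `contentType` value outside the four listed is simply ignored.

```python
from collections import Counter
from enum import IntEnum


class ContentType(IntEnum):
    """The four contentType values documented in the table (hypothetical enum)."""
    FOLDER = 1
    THUMBNAIL = 2
    PDF = 3
    TEST = 4


def count_content_types(node, counts=None):
    """Recursively tally nodes by content type; unlisted values are skipped."""
    if counts is None:
        counts = Counter()
    try:
        counts[ContentType(node.get("contentType")).name] += 1
    except ValueError:
        pass  # value not in the table (or missing), so it is not counted
    for child in node.get("children", []):
        count_content_types(child, counts)
    return counts


# Made-up sample tree, only for demonstration
sample_tree = {
    "contentType": 1,
    "children": [
        {"contentType": 3, "name": "notes.pdf"},
        {"contentType": 4, "testId": "t1"},
        {"contentType": 1, "children": [{"contentType": 4, "testId": "t2"}]},
        {"contentType": 99},
    ],
}
tally = count_content_types(sample_tree)
print(dict(tally))
```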
This project reverse-engineers the Classplus CMS API to automate fetching all test questions and model solutions, which is useful for students who want to review material offline or study without time pressure. One script handles binary assets and four handle the test API; the latter represent an iterative development journey from a simple proof-of-concept to a robust, resumable bulk processor. The two scripts that operate on a whole course are:
- downp.py: Downloads binary assets (PDFs and thumbnail images).
- bulk-processor.py: Hits the Classplus API to fetch test questions and solutions.
Project File Structure
After running all scripts, your working directory will look like this:
project/
├── course.json              # Raw course tree (your input, shared by both scripts)
├── master_course.json       # Pruned test-only tree (generated by bulk-processor.py --init)
├── file_map.json            # Download status log (generated by downp.py)
├── pdfs/                    # Downloaded PDF files
├── thumbnails/              # Downloaded thumbnail images
└── outputs/
    ├── <testId_1>/
    │   ├── questions.json
    │   └── solutions.json
    └── <testId_2>/
        ├── questions.json
        └── solutions.json
How Authentication Works
All requests to the Classplus API require a JWT (JSON Web Token) passed as the x-cms-access-token header.
x-cms-access-token: eyJhbGciOiJIUzM4NCIsInR5cCI6IkpXVCJ9...
A JWT has three parts separated by dots: a header, a payload, and a signature. The payload (the middle part) is Base64URL-encoded JSON and contains fields like:
| Field | Description |
|---|---|
| userId | Your Classplus user account ID |
| studentId | Your enrollment ID |
| testId | The specific test this token unlocks |
| courseId | The course the test belongs to |
| exp | Expiry timestamp (Unix epoch) |
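To see when a token expires, the payload can be decoded locally with the standard library. The sketch below is illustrative only: `decode_jwt_payload` and `is_expired` are hypothetical helper names, the demo token is fabricated on the spot, and the signature is not verified (which is fine for reading `exp`, but not for trusting the contents).

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Return the JWT payload as a dict. The signature is NOT verified."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload_b64 = parts[1]
    # Base64URL strings in JWTs omit padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_expired(payload, now=None):
    """Compare the exp claim (Unix epoch seconds) with the current time."""
    now = time.time() if now is None else now
    return payload.get("exp", 0) <= now


def _b64url(obj):
    """Encode a dict the way JWT segments are encoded (demo helper)."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Fabricated token for demonstration; real tokens come from the request header
demo_token = ".".join([
    _b64url({"alg": "HS384", "typ": "JWT"}),
    _b64url({"testId": "abc123", "exp": 2000}),
    "signature-placeholder",
])
demo_payload = decode_jwt_payload(demo_token)
print(demo_payload["testId"], is_expired(demo_payload, now=1000))
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it can be omitted.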
How to Get Your Token
- Open the Classplus test in Chrome.
- Press F12 → go to the Network tab.
- Click Start Test in the UI.
- Find the start request to cms-gcp.classplusapp.com.
- Under Request Headers, copy the value of x-cms-access-token.
⚠️ Tokens expire. The exp field in the JWT payload tells you when. Most tokens are valid for ~7 days. If you get 401 Unauthorized errors, you need to refresh your token.
The API Flow
Every script follows the same fundamental sequence of HTTP calls:
POST /test/start
        ↓
(optional) POST /question/submit  ← repeat per question
        ↓
POST /test/evaluate
        ↓
GET /test/{testId}/student/{studentTestId}/solutions
Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /student/api/v2/test/start | POST | Begins a test session, returns questions |
| /student/api/v2/question/submit | POST | Saves a single question's answer |
| /student/api/v2/test/evaluate | POST | Finalizes/submits the test |
| /student/api/v2/test/{id}/student/{sid}/solutions | GET | Retrieves full solutions with explanations |
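The four paths in the table can be assembled from one base URL. The following is a sketch under stated assumptions: `build_endpoints` is a hypothetical helper, and the default `BASE_URL` is only a guess that combines the host named in the token instructions with the `/student/api/v2` prefix from the table; the scripts may define it differently.

```python
# Assumed base URL: host from the DevTools step plus the path prefix in the table
BASE_URL = "https://cms-gcp.classplusapp.com/student/api/v2"


def build_endpoints(test_id, student_test_id, base_url=BASE_URL):
    """Return the four endpoint URLs keyed by the step they serve."""
    base = base_url.rstrip("/")
    return {
        "start": f"{base}/test/start",
        "submit": f"{base}/question/submit",
        "evaluate": f"{base}/test/evaluate",
        "solutions": f"{base}/test/{test_id}/student/{student_test_id}/solutions",
    }


# Placeholder IDs and host, for demonstration only
endpoints = build_endpoints("T1", "S9", base_url="https://example.invalid/student/api/v2/")
for step, url in endpoints.items():
    print(f"{step:10s} {url}")
```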
Script Breakdown
1. instant-fetch.py: The Proof of Concept
Purpose: Validates the core hypothesis: can we start a test, immediately evaluate it with no answers, and still get the solutions?
Approach: The instant method, which skips question submission entirely.
# Start the test
response = requests.post(start_url, json={"testId": TEST_ID_PAYLOAD}, headers=base_headers)
start_data = response.json()
student_test_id = start_data['data']['studentTestId']
# Immediately evaluate with zero answers
evaluate_payload = {
    "testId": test_id,
    "studentTestId": student_test_id,
    "questions": [],       # No answers submitted at all
    "timeTaken": 5000,     # Fake 5-second duration
    "autoSubmitType": 1    # Force-submit flag
}
requests.post(evaluate_url, json=evaluate_payload, headers=headers)
# Fetch solutions
requests.get(solution_url, headers=headers)
Why it might fail: Some tests validate that at least one question was interacted with before solutions are unlocked. Submitting zero questions can result in the solutions endpoint returning an error or empty data.
Output: solutions_{testId}.json
2. submit-autosubmit.py: The Optimized Single-Test Fetcher
Purpose: Adds a "session activation" step to work around the zero-question problem, while keeping things fast.
Key improvement: Submits only the first question (with a blank answer) before evaluating. This activates the session without wasting time looping over all questions.
# Submit ONLY the first question to activate the session
first_question = start_data['data']['test']['sections'][0]['questions'][0]
question_payload = {
    "questions": [{
        "_id": first_question['_id'],
        "selectedOptions": [],   # Blank answer
        "markForReview": False,
        "timeTaken": 1500
    }],
    "timeTaken": 1500
}
requests.post(submit_url, json=question_payload, headers=headers_submit)
# Now force evaluate with the "switched app" auto-submit reason
evaluate_payload = {
    "reasonForSubmit": "Student switched from test for 3 times",
    "autoSubmitType": 3   # Different flag: simulates tab-switching penalty
}
autoSubmitType values explained:
| Value | Meaning |
|---|---|
| 1 | Standard student-initiated submit |
| 3 | Auto-submit due to app-switching (cheating detection) |
Using autoSubmitType: 3 with the corresponding reasonForSubmit string mimics the platform's own cheating-detection auto-submit, which appears to bypass some solution-lock checks.
Output: test_solutions_optimized.json
3. submit-gracefully.py: The Reliable Fetcher
Purpose: Maximizes compatibility by fully mimicking a real student completing the test.
Key improvement: Submits every single question individually before evaluating. This is the most "legitimate" flow and is least likely to be rejected by the server.
total_time_taken = 0
for i, q in enumerate(questions_from_start):
    time_per_question = 1500  # 1.5 seconds per question
    total_time_taken += time_per_question
    question_payload = {
        "questions": [{
            "_id": q['_id'],
            "solution": "",          # Blank answer
            "selectedOptions": [],
            "fillUpsAnswers": [],
            "markForReview": False,
            "timeTaken": time_per_question
        }],
        "timeTaken": total_time_taken  # Cumulative time
    }
    # Keep the response so its status code can be checked below
    submit_response = requests.post(submit_url, json=question_payload, headers=headers_submit)
    print(f"  -> Submitting Q{i+1}/{len(questions_from_start)}... Status: {submit_response.status_code}")
    if submit_response.status_code != 200:
        print("  ⚠️ Submission failed. Stopping.")
        break
Trade-off: Slower than submit-autosubmit.py for tests with many questions, but much more reliable. The cumulative timeTaken counter is carefully maintained to look realistic.
Output: test_solutions.json
4. bulk-processor.py: The Production-Grade Bulk Processor
Purpose: Processes an entire course worth of tests automatically, with state management, retries, and resumability.
This is the most sophisticated script and introduces several new systems:
Master File Architecture
Instead of hardcoding a single test ID, bulk-processor.py works from a master_course.json file, a pruned, enhanced version of the raw course content tree.
course.json (raw, from API)
        ↓ --init flag
master_course.json (tests only, with processingInfo)
        ↓ normal run
outputs/{testId}/questions.json
outputs/{testId}/solutions.json
The --init process uses a recursive _prune_and_enhance_node() function that walks the course content tree and:
- Keeps items with contentType == 4 (tests) and adds a processingInfo block
- Discards items with other content types (videos, PDFs, etc.)
- Discards folders that become empty after pruning
- Extracts the JWT token from the test's URL automatically
from urllib.parse import urlparse, parse_qs

def _prune_and_enhance_node(node):
    # Keep & enhance tests
    if node.get("contentType") == 4 and node.get("testId") and node.get("URL"):
        node["processingInfo"] = {
            "status": "pending",
            "token": None,  # Extracted from URL
            "questionsPath": None,
            "solutionsPath": None,
            "lastAttemptTimestamp": None,
            "lastError": None
        }
        # Extract token from the test URL's query string
        parsed_url = urlparse(node["URL"])
        token = parse_qs(parsed_url.query).get('token', [None])[0]
        node["processingInfo"]["token"] = token
        return node
    # Discard non-folders (videos, PDFs)
    if node.get("contentType") != 1:
        return None
    # Recursively prune folder children
    pruned_children = [_prune_and_enhance_node(c) for c in node.get("children", [])]
    pruned_children = [c for c in pruned_children if c]  # Remove Nones
    if pruned_children:
        node["children"] = pruned_children
        return node
    return None  # Discard empty folders
Retry Mechanism
All HTTP requests go through make_request_with_retry():
def make_request_with_retry(method, url, **kwargs):
    kwargs.setdefault("timeout", 30)  # Callers may override without a duplicate-keyword error
    for attempt in range(MAX_RETRIES):  # Default: 3
        try:
            response = requests.request(method, url, **kwargs)
            if response.ok:  # Any status code below 400
                return response
            else:
                print(f"⚠️ Failed with {response.status_code}. Retrying ({attempt+1}/{MAX_RETRIES})")
        except requests.exceptions.RequestException as e:
            print(f"❌ Exception: {e}. Retrying ({attempt+1}/{MAX_RETRIES})")
        time.sleep(RETRY_DELAY_SECONDS)  # Default: 5s between retries
    return None
Atomic File Saves
To prevent data corruption if the script is interrupted mid-write:
def save_master_file(data, filepath):
    temp_filepath = filepath + ".tmp"
    with open(temp_filepath, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2)
    os.replace(temp_filepath, filepath)  # Atomic on most OS
Write to a .tmp file first, then atomically replace the real file. This means a crash during a write will leave either the old complete file or the new complete file, never a half-written one.
Graceful Interruption
try:
    for i, test_obj in enumerate(tests_to_process):
        # ... process test ...
        save_master_file(master_data, args.master_file)  # Save after EVERY test
except KeyboardInterrupt:
    print("\nInterrupted. Saving current state before exiting...")
    save_master_file(master_data, args.master_file)
    sys.exit(1)
Press Ctrl+C at any time and all progress up to the last completed test is saved. The next run will automatically skip already-completed tests.
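Skipping finished work on the next run amounts to filtering the master tree by each test's processingInfo status. The sketch below is a hypothetical illustration, not the script's actual code: `iter_tests` and `select_tests` are invented names, the master file's root is assumed to be a single node dict, and it is assumed that a questions-only run also skips tests already marked `questions_only`.

```python
def iter_tests(node):
    """Yield every test node (contentType == 4) in a master tree, depth-first."""
    if node.get("contentType") == 4:
        yield node
    for child in node.get("children", []):
        yield from iter_tests(child)


def select_tests(master_root, fetch_answers):
    """Return the tests that still need work for the chosen mode."""
    # Assumption: with --fetch-answers only "completed" counts as done;
    # without it, "questions_only" is treated as done as well.
    done = {"completed"} if fetch_answers else {"completed", "questions_only"}
    return [
        test for test in iter_tests(master_root)
        if test.get("processingInfo", {}).get("status") not in done
    ]


# Made-up master tree for demonstration
demo_master = {
    "contentType": 1,
    "children": [
        {"contentType": 4, "testId": "t1", "processingInfo": {"status": "pending"}},
        {"contentType": 4, "testId": "t2", "processingInfo": {"status": "questions_only"}},
        {"contentType": 1, "children": [
            {"contentType": 4, "testId": "t3", "processingInfo": {"status": "completed"}},
            {"contentType": 4, "testId": "t4", "processingInfo": {"status": "failed"}},
        ]},
    ],
}
with_answers = [t["testId"] for t in select_tests(demo_master, fetch_answers=True)]
questions_only = [t["testId"] for t in select_tests(demo_master, fetch_answers=False)]
print(with_answers, questions_only)
```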
CLI Flags
python bulk-processor.py --init # Build master file from course.json
python bulk-processor.py # Fetch questions only (safe, no submit)
python bulk-processor.py --fetch-answers # Submit + evaluate + get solutions (graceful)
python bulk-processor.py --fetch-answers --quick-submit # Submit + evaluate + get solutions (fast)
python bulk-processor.py --master-file custom.json # Use a custom master file path
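A command line with these four flags could be declared with argparse roughly as follows. This is a sketch rather than the script's real parser: `build_parser` and the help strings are illustrative, and the `master_course.json` default is taken from the configuration described in this guide.

```python
import argparse


def build_parser():
    """Declare the four documented flags (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="bulk-processor.py",
        description="Fetch test questions and solutions for a whole course.",
    )
    parser.add_argument("--init", action="store_true",
                        help="build the master file from course.json")
    parser.add_argument("--fetch-answers", action="store_true",
                        help="submit and evaluate tests to unlock solutions")
    parser.add_argument("--quick-submit", action="store_true",
                        help="use the fast submit path with --fetch-answers")
    parser.add_argument("--master-file", default="master_course.json",
                        help="path to the master file")
    return parser


# Parse an explicit argument list instead of sys.argv so the demo is repeatable
demo_args = build_parser().parse_args(["--fetch-answers", "--quick-submit"])
print(demo_args.init, demo_args.fetch_answers, demo_args.quick_submit, demo_args.master_file)
```

argparse converts the dashes in flag names to underscores, which is why the interruption handler above can read `args.master_file`.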
5. downp.py: The Asset Downloader
Purpose: Traverses course.json and downloads two types of static assets, PDFs and thumbnail images, into organized local directories. Maintains a persistent file_map.json to track download status so interrupted runs can be safely resumed.
sanitize_filename(name)
def sanitize_filename(name):
    return re.sub(r'[<>:"/\\|?*]', '_', name)
Replaces each of < > : " / \ | ? * with an underscore. These are the characters Windows forbids in filenames, a superset of what Linux and macOS forbid, so the result is safe on all three. It is applied to every filename before writing to disk.
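A quick usage check of the function, repeated here with its import so the snippet runs on its own. The sample titles are made up for illustration.

```python
import re


def sanitize_filename(name):
    """Replace characters that are reserved in Windows filenames with '_'."""
    return re.sub(r'[<>:"/\\|?*]', '_', name)


# Made-up course item titles
clean_a = sanitize_filename('Unit 1: Algebra/Geometry?.pdf')
clean_b = sanitize_filename('plain_name.pdf')
print(clean_a)  # Unit 1_ Algebra_Geometry_.pdf
print(clean_b)  # plain_name.pdf
```

Note that distinct titles can collapse to the same sanitized name (for example `a/b` and `a:b`), so a caller that needs uniqueness would have to add its own disambiguation.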
download_file(url, local_path, retries=5, delay=2)
The workhorse of the downloader. Uses requests.get in streaming mode (stream=True) so large files are written in 8 KB chunks, keeping memory usage flat regardless of file size.
with requests.get(url, stream=True, timeout=30) as r:
    r.raise_for_status()
    with open(local_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
Retries up to 5 times on any RequestException, with a configurable delay between attempts. Returns True on success, False after all retries are exhausted.
process_content_item(item, file_map_by_id, pdfs_dir, thumbnails_dir)
Dispatches a single course tree node based on its contentType:
For thumbnails (contentType == 2): reads thumbnailUrl, detects extension from name, falls back to .jpg if unrecognised, saves to thumbnails_dir.
For PDFs (contentType == 3 and format == "pdf"): reads the url field, appends .pdf if missing, saves to pdfs_dir.
If the item ID already exists in file_map_by_id with status: "success", it is skipped entirely, so reruns do no redundant work.
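The extension handling for thumbnails might look like the sketch below. Everything here is an assumption made for illustration: `thumbnail_filename` is a hypothetical helper, and the set of recognised extensions is a guess, since the guide only states that unrecognised names fall back to .jpg.

```python
import os

# Assumed set of recognised image extensions (not taken from the script)
KNOWN_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def thumbnail_filename(name, default_ext=".jpg"):
    """Keep a recognised image extension; otherwise append the default one."""
    _, ext = os.path.splitext(name)
    if ext.lower() in KNOWN_IMAGE_EXTENSIONS:
        return name
    return name + default_ext


thumb_a = thumbnail_filename("cover.PNG")       # recognised, kept as-is
thumb_b = thumbnail_filename("chapter-cover")   # no extension, gets .jpg
thumb_c = thumbnail_filename("cover.v2")        # unrecognised, gets .jpg
print(thumb_a, thumb_b, thumb_c)
```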
traverse_content(item, callback, **kwargs)
def traverse_content(item, callback, **kwargs):
    callback(item, **kwargs)
    if item.get('contentType') == 1 and 'children' in item:
        for child in item['children']:
            traverse_content(child, callback, **kwargs)
A recursive DFS (Depth-First Search) walker. Calls callback on every node, then descends into children only when contentType == 1 (a folder). This cleanly separates tree-walking logic from processing logic, so the callback can be swapped for other use-cases.
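As an example of swapping the callback, the walker can collect PDF names instead of downloading anything. The `collect_pdf_names` callback and the sample tree are hypothetical; the walker itself is repeated so the snippet is self-contained.

```python
def traverse_content(item, callback, **kwargs):
    """Visit a node, then recurse into its children if it is a folder."""
    callback(item, **kwargs)
    if item.get('contentType') == 1 and 'children' in item:
        for child in item['children']:
            traverse_content(child, callback, **kwargs)


def collect_pdf_names(item, names):
    """Hypothetical callback: record the name of every PDF node."""
    if item.get('contentType') == 3:
        names.append(item.get('name'))


# Made-up course tree for demonstration
demo_tree = {
    'contentType': 1,
    'name': 'Course',
    'children': [
        {'contentType': 3, 'name': 'syllabus.pdf'},
        {'contentType': 1, 'name': 'Unit 1', 'children': [
            {'contentType': 2, 'name': 'cover.png'},
            {'contentType': 3, 'name': 'unit1-notes.pdf'},
        ]},
    ],
}
pdf_names = []
traverse_content(demo_tree, collect_pdf_names, names=pdf_names)
print(pdf_names)  # ['syllabus.pdf', 'unit1-notes.pdf']
```

Because extra state travels through `**kwargs`, the callback needs no globals; the same list object is passed to every call.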
main(): Five Sequential Steps
- Create output directories (pdfs/, thumbnails/)
- Load file_map.json into a dict keyed by item ID for O(1) lookups
- Retry any previously failed downloads before processing new items
- Walk course.json and process every node via process_content_item
- Save the updated map back to file_map.json
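The load-and-index step and the retry selection from the list above can be sketched as follows. The on-disk layout of file_map.json is not documented here, so this assumes a JSON list of objects that each carry an `id` and a `status`; `load_file_map` and `failed_entries` are hypothetical helper names.

```python
import json
import os
import tempfile


def load_file_map(path):
    """Load the status log into a dict keyed by item ID ({} if the file is absent)."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)  # assumed shape: [{"id": ..., "status": ...}, ...]
    return {entry["id"]: entry for entry in entries}


def failed_entries(file_map_by_id):
    """Return the entries that should be retried before new items are processed."""
    return [e for e in file_map_by_id.values() if e.get("status") == "failed"]


# Demonstration against a throwaway file with made-up entries
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "file_map.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump([
            {"id": 101, "status": "success"},
            {"id": 102, "status": "failed"},
        ], f)
    demo_map = load_file_map(demo_path)
    missing_map = load_file_map(os.path.join(tmp, "absent.json"))

retry_ids = [e["id"] for e in failed_entries(demo_map)]
print(sorted(demo_map), retry_ids)
```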
How the Scripts Work Together
Both scripts share course.json as a read-only input. Neither modifies it. Each maintains its own separate state file, so they can be run independently or in any order.
course.json
│
├──► downp.py ──────────────────► pdfs/
│                                 thumbnails/
│                                 file_map.json
│
└──► bulk-processor.py --init ──► master_course.json
                                        │
                                        └──► bulk-processor.py ──► outputs/<testId>/questions.json
                                             [--fetch-answers]     outputs/<testId>/solutions.json
Key Concepts Explained
Why Submit Blank Answers?
The solutions endpoint only becomes accessible after a test has been "evaluated" (submitted). By submitting blank answers and forcing evaluation, the script unlocks the solutions without requiring correct answers. The selectedOptions: [] and solution: "" fields represent unanswered questions.
The timeTaken Field
All submit and evaluate payloads include a timeTaken field in milliseconds. The scripts use realistic-looking values (1500ms per question = 1.5 seconds) to avoid triggering any server-side anomaly detection. The evaluate payload adds extra time (total + 5000ms) to account for the time "spent" on the submission UI.
accept-encoding: br (Brotli)
bulk-processor.py requests Brotli-compressed responses (br). This requires the brotli Python package. If you remove this header, the server falls back to gzip, which requests handles automatically, but responses will be larger.
Atomic Writes with os.replace()
os.replace() is atomic on POSIX systems (Linux/macOS) and is atomic on Windows for files on the same drive. This is critical for the master file, which is the single source of truth β corruption here would mean losing all progress tracking.
Setup & Usage
Prerequisites
pip install requests brotli
Step 1: Download Assets with downp.py
python downp.py
On subsequent runs, already-downloaded files are skipped automatically. Previously failed downloads are retried first.
Step 2: Quick Start (Single Test)
Use submit-gracefully.py for the most reliable single-test experience:
- Open the relevant Classplus test in Chrome.
- Copy your x-cms-access-token from DevTools → Network → Headers.
- Edit the ACCESS_TOKEN and testId constants in the script.
- Run:
python submit-gracefully.py
The solution will be saved to test_solutions.json.
Step 3: Bulk Processing with bulk-processor.py
- Export your course data from the Classplus API and save it as course.json.
- Initialize the master file (only needs to be done once):
python bulk-processor.py --init
- Fetch questions only (no test submissions, safe for browsing):
python bulk-processor.py
- Fetch full solutions (submits tests to unlock answers):
# Graceful mode (reliable, slower)
python bulk-processor.py --fetch-answers
# Quick mode (faster, may fail on strict tests)
python bulk-processor.py --fetch-answers --quick-submit
- Resume after interruption: just re-run the same command. Tests already marked completed are skipped automatically.
Configuration Reference
downp.py
| Variable | Default | Description |
|---|---|---|
| json_input_file | course.json | Input course tree |
| pdfs_output_dir | pdfs | PDF download directory |
| thumbnails_output_dir | thumbnails | Thumbnail download directory |
| map_output_file | file_map.json | Download status log |
| retries | 5 | Download retry count |
| delay | 2 | Seconds between retries |
bulk-processor.py
| Variable | Default | Description |
|---|---|---|
| BASE_URL | Classplus API v2 | API base URL |
| INPUT_COURSE_FILE | course.json | Raw course input |
| MASTER_FILE | master_course.json | Pruned test-only tree |
| OUTPUT_BASE_DIR | outputs | Root for test output dirs |
| MAX_RETRIES | 3 | API request retry count |
| RETRY_DELAY_SECONDS | 5 | Seconds between retries |
Error Handling & Resilience
Both scripts are built to survive interruptions and partial failures:
- Idempotent reruns: Items with status: "success" are never re-processed.
- Atomic file writes (bulk-processor.py): Master file is never corrupted mid-write.
- Graceful Ctrl+C (bulk-processor.py): KeyboardInterrupt is caught; progress is saved before exit.
- Per-item retry logic: Failed items are flagged and retried on the next run.
- Streaming downloads (downp.py): Large files do not cause memory exhaustion.
Data Flow Diagrams
downp.py: Asset Download Flow
START
  │
  ▼
Load file_map.json (if exists)
  │
  ▼
Retry all items where status == "failed"
  │
  ▼
Load course.json
  │
  ▼
traverse_content() ◄────────────────────────────────┐
  │                                                 │
  ▼                                                 │
process_content_item()                              │
  │                                                 │
  ├── contentType == 2? ──► download thumbnailUrl   │
  ├── contentType == 3                              │
  │     + format == "pdf"? ──► download url         │
  └── anything else? ──► skip                       │
  │                                                 │
  ▼                                                 │
Update file_map_by_id {status: success|failed}      │
  │                                                 │
  ▼                                                 │
contentType == 1 (folder)? ─── YES ─────────────────┘
  │ NO
  ▼
Save file_map.json
  │
  ▼
END
bulk-processor.py: Status Lifecycle
pending
│
├──► questions_only (ran without --fetch-answers)
│        │
│        ├──► completed (re-ran with --fetch-answers)
│        └──► failed
│
├──► completed (ran with --fetch-answers)
└──► failed ──► retried on next run
Script Comparison
| Feature | instant-fetch.py | submit-autosubmit.py | submit-gracefully.py | bulk-processor.py | downp.py |
|---|---|---|---|---|---|
| Questions submitted | 0 | 1 (first only) | All | All (configurable) | N/A |
| Speed | ⚡ Fastest | ⚡ Fast | 🐢 Slow | Configurable | ⚡ Streamed |
| Reliability | ⚠️ Low | ✅ Medium | ✅ High | ✅ High | ✅ High |
| Retry logic | ❌ | ❌ | ❌ | ✅ (3 retries) | ✅ (5 retries) |
| Bulk processing | ❌ | ❌ | ❌ | ✅ | ✅ |
| Resumable | ❌ | ❌ | ❌ | ✅ | ✅ |
| Progress tracking | ❌ | ❌ | ❌ | ✅ master_course.json | ✅ file_map.json |
| CLI interface | ❌ | ❌ | ❌ | ✅ | ❌ |
Download Source Files
📦 Download all scripts as a ZIP
The archive contains:
- instant-fetch.py: Proof of concept
- submit-autosubmit.py: Optimized single-test fetcher
- submit-gracefully.py: Reliable single-test fetcher
- bulk-processor.py: Production bulk processor
- downp.py: PDF & thumbnail asset downloader
Built for educational and personal study purposes. Always respect the platform's terms of service.