TG Bulk Downloader — A High-Performance Telegram MTProto Bot
🤖 TG Bulk Downloader — main_v4.py
A production-grade, single-file Telegram bot built on Pyrogram + FastAPI that bulk-downloads and re-uploads entire Telegram channel ranges — preserving reply chains, handling FloodWait, and keeping you live-updated with a real-time status board.
📋 Table of Contents
- Why This Exists
- Architecture Overview
- Features
- Prerequisites
- Installation
- Environment Variables
- Deployment
- Bot Commands Reference
- Pipeline Deep-Dive
- Task Status Lifecycle
- Live Status Board
- Configuration Tuning
- Fault Tolerance & Retry System
- User Settings
- Code Structure
- Logging
- Known Limitations
- Download Source Files
💡 Why This Exists
HuggingFace Spaces blocks api.telegram.org (the standard Bot HTTP API), which means typical webhook bots simply cannot run there. This bot solves that by using Pyrogram over MTProto (raw TCP) — a lower-level protocol that HF Spaces does not block.
It also solves a common problem: copying large ranges of posts from one Telegram channel/group to another, including media albums, polls, stickers, voice notes, and documents — without losing message order or reply chains.
🏗 Architecture Overview
User sends /bdl <start_url> <end_url>
│
▼
┌─────────────────────────────────────────────────────┐
│ FastAPI Webhook │
│ POST /webhook ─► route handler ─► task launch │
└─────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Parallel Pipeline │
│ │
│ Phase 1 ─ FETCH │ Phase 2 ─ DOWNLOAD │ Phase 3 ─ UPLOAD │
│ ───────────────── │ ────────────────── │ ───────────────── │
│ Chunked metadata │ asyncio.Semaphore │ Ordered delivery │
│ get_messages() │ pool (N workers) │ ready_queue + │
│ batch=20 msgs │ MAX_CONCURRENT_ │ send_pointer │
│ │ DOWNLOADS=3 │ MAX_UPLOAD_ │
│ │ │ WORKERS=2 │
│ │
│ Phase 4 ─ RETRY │
│ ──────────────── │
│ /retry re-runs failed/abandoned slots │
│ Edits placeholder messages in-place │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ BatchStateManager │
│ Live Telegram UI — edits status every 5 seconds │
│ delete+resend after each file (always-bottom mode) │
└─────────────────────────────────────────────────────┘
The bot uses two Pyrogram clients simultaneously:
| Client | Auth type | Role |
|---|---|---|
user |
Session string (user account) | Reads source channel messages |
bot |
Bot token | Sends files to destination chat |
✨ Features
- Forward EVERYTHING — media (photos, videos, audio, documents), text, polls, stickers, voice notes, and forwarded messages
- Preserve reply chains — maps source
reply_to_msg_idto our group’s equivalent message - Ordered delivery — files always arrive in the correct sequence, even with parallel downloads
- Partial file fallback — if download stalls, sends whatever bytes arrived with a
⚠️ Partialwarning caption - Abandoned placeholder — on total failure, sends a text stub to hold sequence position (retryable)
/retrycommand — re-attempts allfailed/abandoned/partialslots and edits placeholders in-place/killall— hard-cancels all worker tasks for the active batch immediately/skip— skips the currently stalling download slot- Live status board — edits a single Telegram message every
STATE_LOG_INTERVALseconds with real-time counters - Upload progress bar — ephemeral
📤 Uploading...message showing%completion, auto-deleted on finish - Always-bottom status — status message auto-deleted and resent after each file so it stays newest
- Inline Refresh button — tap 🔄 on the status message for an instant redraw
/settingspanel — toggle bot behaviour per-chat via inline keyboard- FloodWait handling — respects
FloodWaitandFloodPremiumWaiterrors with automatic backoff - Rotating log file —
logs.txt(5 MB, 10 backups), downloadable via/logs
📦 Prerequisites
- Python 3.10+
- A Telegram API ID and API Hash from my.telegram.org
- A Telegram Bot Token from @BotFather
- A Pyrogram Session String for a user account (generated via
pyrogram.Client) - Optionally: a HuggingFace Spaces Docker environment
🚀 Installation
1. Clone / download
git clone https://github.com/yourname/tg-bulk-downloader.git
cd tg-bulk-downloader
2. Install dependencies
pip install -r requirements.txt
requirements.txt:
pyrofork
pyleaves
tgcrypto
python-dotenv
psutil
fastapi
uvicorn[standard]
| Package | Purpose |
|---|---|
pyrofork |
Pyrogram fork — MTProto Telegram client |
pyleaves |
Pyrogram leaves plugin system |
tgcrypto |
Native crypto acceleration for Pyrogram |
python-dotenv |
.env file loader for config |
psutil |
System stats for /stats command |
fastapi |
ASGI web framework for the webhook server |
uvicorn[standard] |
ASGI server (with uvloop + httptools) |
3. Configure environment
Create a config.env file (see Environment Variables below).
4. Run
uvicorn main:app --host 0.0.0.0 --port 8000
Or on HuggingFace Spaces (port 7860):
uvicorn main:app --host 0.0.0.0 --port 7860
🔑 Environment Variables
Create config.env or config.env.local in the project root:
# ── Required ──────────────────────────────────────────────────────────
BOT_TOKEN=123456789:AABBCCyour-bot-token-here
SESSION_STRING=BQAyour-pyrogram-session-string-here
API_ID=12345678
API_HASH=abcdef1234567890abcdef1234567890
# ── Worker concurrency ─────────────────────────────────────────────────
MAX_FETCH_WORKERS=3 # parallel metadata fetch threads
MAX_CONCURRENT_DOWNLOADS=3 # parallel download slots
MAX_UPLOAD_WORKERS=2 # parallel Telegram upload slots
# ── Retry behaviour ────────────────────────────────────────────────────
MAX_DOWNLOAD_RETRIES=3 # per-slot download retry limit
RETRY_DELAY_SECONDS=5 # wait between retries
# ── Timeouts / safety ─────────────────────────────────────────────────
CONSECUTIVE_ERROR_LIMIT=5 # abort batch after N consecutive errors
FLOOD_WAIT_DELAY=3 # extra seconds added on FloodWait
FETCH_BATCH_SIZE=20 # messages fetched per get_messages() call
# ── UI timings ─────────────────────────────────────────────────────────
STATE_LOG_INTERVAL=5.0 # status board edit interval (seconds)
UPLOAD_PROGRESS_INTERVAL=3.0 # upload progress message edit interval
Tip: Generate a session string with:
from pyrogram import Client with Client("my_account", api_id=API_ID, api_hash=API_HASH) as app: print(app.export_session_string())
☁️ Deployment
HuggingFace Spaces (Docker)
Add a Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY main_v4.py main.py
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
Set all environment variables as HF Secrets. The MTProto raw TCP connection used by Pyrogram is not blocked by HuggingFace, unlike the standard Bot API HTTP endpoint.
Local / VPS
# systemd service or just:
nohup uvicorn main:app --host 0.0.0.0 --port 8000 &
Webhook Setup
After the server is running, register the Telegram webhook:
curl "https://api.telegram.org/bot<BOT_TOKEN>/setWebhook?url=https://your-domain.com/webhook"
📟 Bot Commands Reference
| Command | Arguments | Description |
|---|---|---|
/start |
— | Welcome message |
/help |
— | Full command reference |
/dl |
<post_url> |
Download & re-upload a single post |
/bdl |
<start_url> <end_url> |
Bulk copy a range of posts |
/retry |
— | Re-attempt all failed/abandoned/partial slots |
/skip |
— | Skip the currently stalling download |
/killall |
— | Hard-cancel all workers for the active batch |
/refresh |
— | Force-redraw the status board immediately |
/stats |
— | Show CPU, RAM, disk, uptime stats |
/logs |
— | Upload logs.txt as a document |
/cleanup |
— | Delete all temp files in downloads/ |
/settings |
— | Open the per-chat settings panel |
/bdl Example
/bdl https://t.me/mychannel/100 https://t.me/mychannel/120
This copies posts #100 through #120 from mychannel to your chat.
🔬 Pipeline Deep-Dive
Phase 1 — Metadata Fetch
Messages are fetched in chunks of FETCH_BATCH_SIZE=20 using get_messages(). The fetch pool uses up to MAX_FETCH_WORKERS=3 concurrent tasks, each handling a slice of the total ID range. This avoids a single serial fetch loop on large batches.
Phase 2 — Parallel Download
Each fetched message is queued into a download pool protected by an asyncio.Semaphore(MAX_CONCURRENT_DOWNLOADS). Workers grab a permit, download the file to downloads/<chat_id>/<filename>, and upon completion place the result into a ready_queue.
Download progress is tracked via a Pyrogram progress callback, updating bytes_done on the slot’s TaskState. The BatchStateManager reads these live values to render per-file progress bars in the status board.
Phase 3 — Ordered Upload
A dedicated sender task monitors ready_queue and a send_pointer (the index of the next expected slot). Files are uploaded in strict sequential order: if slot #5 finishes before slot #3, slot #5 waits in the queue until the pointer reaches it.
Up to MAX_UPLOAD_WORKERS=2 uploads run in parallel once ordering is satisfied. Telegram-native albums (grouped media) are batched into send_media_group() calls.
Phase 4 — Retry
/retry iterates over all FAILED, ABANDONED, and PARTIAL slots. For each:
- Re-downloads the source message
- Attempts to re-upload the file
- On success: edits the placeholder message in-place with the actual content
- On failure: increments
still_failedcounter
🔄 Task Status Lifecycle
PENDING ──► FETCHING ──► DOWNLOADING ──► SENDING ──► UPLOADING ──► COMPLETED
│ │ │
│ └──────────────────────────────► PARTIAL
│ │
└───────────────────────────────────────────────► ABANDONED
│
(use /retry)
│
──► COMPLETED
──► FAILED
| Status | Icon | Meaning |
|---|---|---|
PENDING |
⏳ | Queued, not yet started |
FETCHING |
🔍 | Fetching message metadata |
DOWNLOADING |
⬇️ | Downloading file bytes |
SENDING |
📦 | Downloaded, queued for upload slot |
UPLOADING |
📤 | Actively uploading to Telegram |
COMPLETED |
✅ | Successfully sent |
FAILED |
❌ | All retries exhausted |
PARTIAL |
⚠️ | Partial file sent with warning caption |
ABANDONED |
🔲 | Placeholder sent; retry available |
SKIPPED |
⏭️ | Skipped via /skip |
📊 Live Status Board
The bot maintains a single Telegram message that it edits every STATE_LOG_INTERVAL seconds. The board looks like:
⚡ Batch Download — 20 posts
▓▓▓▓▓▓░░░░ 60% • ✅12 ⚠️1 🔲0 ❌0 ⏭️0 • 2m 14s
⬇️2 dl 📤1 up 📦0 queued
Active:
⬇️ ▓▓▓▓░░ video_clip.mp4
45.3 MB / 112.0 MB
📤 document.pdf 8.2 MB — uploading…
Recent:
✅ photo_album.jpg 2.1 MB
✅ voice_note.ogg 0.4 MB
✅ sticker.webp
⏳ 7 post(s) queued…
/skip — skip current • /killall — stop all • /refresh — refresh now
A 🔄 Refresh inline button allows instant manual redraw.
⚙️ Configuration Tuning
Throughput vs. Stability
| Goal | Adjust |
|---|---|
| Faster bulk downloads | ↑ MAX_CONCURRENT_DOWNLOADS |
| Fewer FloodWait errors | ↓ MAX_UPLOAD_WORKERS, ↑ FLOOD_WAIT_DELAY |
| Reduce Telegram API calls | ↑ FETCH_BATCH_SIZE |
| More responsive status UI | ↓ STATE_LOG_INTERVAL |
| Reduce edit rate on busy batches | ↑ STATE_LOG_INTERVAL |
Memory / Disk
Downloaded files are stored temporarily in downloads/<chat_id>/. Use /cleanup to purge them, or they are removed automatically after each successful upload. psutil is used by /stats to report live disk usage.
🛡 Fault Tolerance & Retry System
FloodWait
The bot catches both FloodWait and FloodPremiumWait errors. On FloodWait(x), it waits x + FLOOD_WAIT_DELAY seconds before retrying. This applies to both downloads and uploads.
Consecutive Error Limit
If CONSECUTIVE_ERROR_LIMIT slots fail back-to-back, the batch is aborted to avoid hammering the API on a persistent issue.
Partial File Fallback
If a download is interrupted mid-stream, the bot detects a truncated file and sends it anyway with a caption flagging the partial state. The slot is marked PARTIAL and is retryable.
Abandoned Slots
When a file cannot be sent (e.g. content too large, unsupported type), the bot sends a plain-text placeholder message to hold the sequence position. The slot is marked ABANDONED and can be recovered via /retry.
🎛 User Settings
Open /settings to see and toggle per-chat options:
| Setting | Default | Description |
|---|---|---|
| 📌 Status bar always at bottom | On | After each file, the status message is deleted and re-sent so it stays as the newest message |
Settings are stored in-memory and reset on redeployment. Future versions may persist them.
🗂 Code Structure
Since this is intentionally a single-file bot, all components live in main_v4.py:
main_v4.py
├── Logging setup (RotatingFileHandler, console)
├── PyroConf (all env var config in one place)
├── TaskStatus (Enum) (slot state machine)
├── UserSettings (per-chat preferences)
├── UX helpers (fmt_size, fmt_time, progress_bar, ...)
├── TaskState (dataclass) (per-slot mutable state)
├── BatchStateManager (live Telegram UI, update queue)
├── Batch Registry (_batch_registry, _batch_worker_tasks)
├── File helpers (get_download_path, cleanup_download, ...)
├── Pyrogram clients (user + bot)
├── Message handlers (handle_download, handle_batch_download, ...)
│ ├── Single /dl handler
│ ├── Batch /bdl handler
│ │ ├── fetch_worker()
│ │ ├── download_worker()
│ │ ├── upload_worker() (ordered sender)
│ │ └── status_logger() (live UI loop)
│ └── /retry handler
├── Command routers (/start, /help, /stats, /killall, /skip, ...)
├── FastAPI lifespan (start/stop Pyrogram clients)
└── FastAPI webhook (POST /webhook — routes all updates)
📝 Logging
Logs are written to both the console and logs.txt (5 MB rolling, 10 backups).
Format:
[DD-Mon-YY HH:MM:SS AM/PM - LEVEL] - function_name() - Line N: module - message
- Pyrogram’s own logs are suppressed to
ERRORlevel to reduce noise. - Send
/logsin Telegram to download the current log file as a document. - Old
logs.txtis deleted on each bot restart (clean slate).
⚠️ Known Limitations
- Settings are in-memory — toggled preferences reset on each redeploy.
- No multi-batch support per chat — starting a new
/bdlwhile one is running cancels the old one. - Large files — Telegram has a 2 GB upload limit for bots. Files exceeding this will be marked
ABANDONED. - Private channels — the user account (
SESSION_STRING) must be a member of the source channel. - Album ordering — grouped media (albums) are sent as a unit; individual photos within an album preserve their internal order.
📥 Download Source Files
You can download the full source package (includes main_v4.py and requirements.txt) below:
⬇️ Download tg_bulk_downloader.zip
🙏 Credits
Built with ❤️ for the Telegram automation community.