One-Click Screenshot to AI: My Extension Mod

My Journey: Hacking a Chrome Extension to Automate a GenAI Course 🚀

This story begins with a simple goal during my Tata micro-internship on Generative AI 🎓. I wanted a clever way to automate the course by using the course material itself, specifically the images and screenshots, as input for an AI.

The Spark of an Idea 💡

My plan was to take full-page screenshots of the course content and feed them to a powerful AI model that could summarize the material, analyze it, or even answer questions about it. This felt like a practical and exciting way to apply the concepts I was learning.

Assembling the Toolkit 🛠️

To bring this idea to life, I chose a few key tools:

The AI Brain: Google’s AI Studio (Gemini). It’s fantastic for multimodal tasks, including understanding images.
The Screenshot Tool: I needed a reliable way to capture full web pages. After a quick search, I settled on a popular Chrome extension: GoFullPage - Full Page Screen Capture.

Hitting a Wall 🚧

The plan seemed solid until I hit a roadblock. The GoFullPage extension was excellent at taking screenshots, but it had no built-in feature to directly upload or send the captured image to an AI platform like Google’s AI Studio. Manually saving each image and then uploading it would defeat the whole purpose of automation.

The “I’ll Do It Myself” Moment 👨‍💻

Instead of giving up, I decided to build the feature myself. Chrome extensions are just a collection of HTML, CSS, and JavaScript files. So, I thought, why not modify the existing extension to add the functionality I needed?

I downloaded the source code of the GoFullPage extension, rolled up my sleeves, and started modifying its files to create my own AI-enhanced version.

The Grand Reveal: It Works! 🎉

After some trial, error, and a lot of fun, I successfully integrated the “Send to AI Studio” functionality directly into the extension’s interface! Now, with a single click, I can send a full-page screenshot straight to Gemini.

Peek Under the Hood: Explore the Code 🔍

For those who want to dive into the technical details, I’ve packaged everything for you. You can explore the original code and my modified version, and read a detailed analysis of every change I made.

🤖 My AI-Extended Version: Download AI-Extended Version.zip
📄 Original GoFullPage Version: Download Original Version.zip

For your convenience, here are the diff analyses that break down the changes:

📊 Directory Comparison Overview

📊 Directory Comparison Overview: Click for PDF

🔬 Deep Diff Analysis

For a complete line-by-line comparison, explore the diff report below. It shows every code change between the original and my modified version.

🔬 Deep Diff Analysis: View as PDF

Technical Deep Dive: What Exactly Did I Change? 🔬

Based on the diff analysis, here’s a summary of the key modifications I made to bring this feature to life.

1. `manifest.json` (The Extension’s Blueprint)

This is the most critical file. I had to declare new permissions and features:

Added a New Button: A shiny “Send to AI Studio” button is now part of the UI.
Host Permissions: Granted the extension permission to interact with aistudio.google.com.
New Permissions: Added tabs and downloads permissions to manage the new tab and handle the image data.
Keyboard Shortcut: Introduced a Ctrl+Shift+Q shortcut to instantly capture the visible tab and send it to AI Studio.
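
As a rough illustration of the manifest changes listed above, the sketch below writes them as a JavaScript object and merges them into an existing manifest. It assumes the Manifest V3 key names (`permissions`, `host_permissions`, `commands`). The command name `send-visible-tab-to-aistudio`, its description, the example base manifest, and the `mergeManifest` helper are all hypothetical and not taken from the modified extension.

```javascript
// Sketch only: the manifest additions described above, as a plain object.
// Assumes Manifest V3 key names. The command name and description are
// hypothetical placeholders, not the ones used in the modified extension.
const manifestAdditions = {
  permissions: ["tabs", "downloads"],
  host_permissions: ["https://aistudio.google.com/*"],
  commands: {
    "send-visible-tab-to-aistudio": {
      suggested_key: { default: "Ctrl+Shift+Q" },
      description: "Capture the visible tab and send it to AI Studio",
    },
  },
};

// Hypothetical helper: merge the additions into an existing manifest object
// without dropping or duplicating entries that are already declared.
function mergeManifest(base, additions) {
  const union = (a = [], b = []) => [...new Set([...a, ...b])];
  return {
    ...base,
    permissions: union(base.permissions, additions.permissions),
    host_permissions: union(base.host_permissions, additions.host_permissions),
    commands: { ...(base.commands || {}), ...(additions.commands || {}) },
  };
}

// Usage: print the merged manifest as JSON (the base manifest is made up).
const merged = mergeManifest(
  { manifest_version: 3, name: "Example", permissions: ["activeTab", "tabs"] },
  manifestAdditions
);
console.log(JSON.stringify(merged, null, 2));
```

In the extension itself these keys live directly in manifest.json. A declared command is then handled in the background script through `chrome.commands.onCommand`.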

2. `capture.html` (The User Interface)

I added a new button to the main header, right next to the existing controls.

AI Studio Button: A new <a> tag with the ID btn-aistudio was added, featuring a “gleam” icon to represent AI.
Tooltip: Included a helpful tooltip that says “Send to AI Studio” on hover.

3. `js/background/index.js` (The Background Logic)

This is where the magic happens behind the scenes.

Message Listener: The script now listens for a sendToAIStudio message from the capture page.
Tab Management: When a message is received, it checks if an AI Studio tab is already open.
- If yes, it focuses on that tab.
- If no, it creates a new tab and navigates to AI Studio.
Script Injection: Once the tab is ready, it injects a script (simulateFileDrop) that programmatically “drops” the screenshot onto the page, just as if you had dragged and dropped the file yourself!
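
The background flow described above could look roughly like the following. This is a minimal sketch, not the code from the modified extension: it assumes a Manifest V3 service worker with the `scripting` permission available, a hypothetical `{ action, dataUrl }` message shape, and `document.body` as the drop target. Only the names `sendToAIStudio` and `simulateFileDrop` come from the diff; every other name is illustrative.

```javascript
// Sketch only. `chromeApi` is passed in so the logic can be exercised outside
// the browser; inside the extension it would be the global `chrome` object.
const AI_STUDIO_URL = "https://aistudio.google.com/prompts/new_chat";
const AI_STUDIO_PATTERN = "https://aistudio.google.com/*";

// Focus an existing AI Studio tab if one is open, otherwise create one.
async function findOrOpenAiStudioTab(chromeApi) {
  const tabs = await chromeApi.tabs.query({ url: AI_STUDIO_PATTERN });
  if (tabs.length > 0) {
    return chromeApi.tabs.update(tabs[0].id, { active: true });
  }
  return chromeApi.tabs.create({ url: AI_STUDIO_URL });
}

// Resolve once the tab reports that it has finished loading.
function waitForTabComplete(chromeApi, tabId) {
  return new Promise((resolve) => {
    const listener = (updatedId, changeInfo) => {
      if (updatedId === tabId && changeInfo.status === "complete") {
        chromeApi.tabs.onUpdated.removeListener(listener);
        resolve();
      }
    };
    chromeApi.tabs.onUpdated.addListener(listener);
    // An already-open tab may be fully loaded and never fire another update.
    chromeApi.tabs.get(tabId).then((tab) => {
      if (tab.status === "complete") {
        chromeApi.tabs.onUpdated.removeListener(listener);
        resolve();
      }
    });
  });
}

// Injected into the AI Studio page. Rebuilds a File from the data URL and
// dispatches a synthetic "drop" event. Using document.body as the drop target
// is an assumption; the real page may expect a more specific element.
function simulateFileDrop(dataUrl) {
  const [header, base64] = dataUrl.split(",");
  const mime = header.slice("data:".length, header.indexOf(";"));
  const bytes = Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
  const file = new File([bytes], "screenshot.png", { type: mime });
  const dataTransfer = new DataTransfer();
  dataTransfer.items.add(file);
  document.body.dispatchEvent(
    new DragEvent("drop", { bubbles: true, cancelable: true, dataTransfer })
  );
}

// Wire everything to the sendToAIStudio message. The message shape
// ({ action, dataUrl }) is hypothetical.
function registerSendToAiStudio(chromeApi) {
  chromeApi.runtime.onMessage.addListener((message) => {
    if (!message || message.action !== "sendToAIStudio") return;
    (async () => {
      const tab = await findOrOpenAiStudioTab(chromeApi);
      await waitForTabComplete(chromeApi, tab.id);
      await chromeApi.scripting.executeScript({
        target: { tabId: tab.id },
        func: simulateFileDrop,
        args: [message.dataUrl],
      });
    })().catch(console.error);
  });
}

// In the background service worker: registerSendToAiStudio(chrome);
```

Passing the `chrome` object in as a parameter is purely a testing convenience. The injected function receives the screenshot through `args` because a function run via `chrome.scripting.executeScript` cannot see variables from the background script's scope.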

4. `capture.*.js` (The Frontend Script)

This file connects the new UI button to the background logic.

Event Listener: A click event listener was added to the new #btn-aistudio button.
Image Processing: When clicked, it grabs the screenshot from the page, converts it into a data URL (a Base64-encoded string), and sends it to the background script for processing.
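
The click handling described above can be sketched as follows. This is an illustration under stated assumptions, not the actual code in `capture.*.js`: it assumes the screenshot is available on a `<canvas>` element, and the `wireAiStudioButton` helper, the `getScreenshotCanvas` accessor, and the `{ action, dataUrl }` message shape are all hypothetical.

```javascript
// Sketch only. `doc` and `runtime` are passed in so the wiring can be tested
// outside the browser; in capture.html they would be `document` and
// `chrome.runtime`. `getScreenshotCanvas` is a hypothetical accessor for
// wherever the capture page keeps the rendered screenshot.
function wireAiStudioButton(doc, runtime, getScreenshotCanvas) {
  const button = doc.getElementById("btn-aistudio");
  button.addEventListener("click", (event) => {
    event.preventDefault(); // the button is an <a> tag, so suppress navigation
    // toDataURL returns a string of the form "data:image/png;base64,...".
    const dataUrl = getScreenshotCanvas().toDataURL("image/png");
    // Hypothetical message shape; the background script listens for it.
    runtime.sendMessage({ action: "sendToAIStudio", dataUrl });
  });
}

// In the capture page (assuming the screenshot is drawn on a <canvas>):
// wireAiStudioButton(document, chrome.runtime, () => document.querySelector("canvas"));
```

A data URL keeps the whole image inside a single string, which makes it easy to pass through `chrome.runtime.sendMessage`, at the cost of roughly a third more bytes than the raw PNG because of the Base64 encoding.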

Conclusion 🚀

This project was an amazing learning experience. It started as a simple idea to automate a course and turned into a deep dive into how Chrome extensions work. It’s a great example of how, with a little curiosity, you can modify existing tools to perfectly fit your workflow.

Bonus: My Personal Lightweight Script 🧑‍🔧

While modifying GoFullPage was a fantastic learning experience, I also developed a separate, minimalist screenshot extension from scratch. This custom script became my primary tool for the internship course because it was perfectly streamlined for one specific task: quickly capturing screenshots without any extra overhead.

It serves as a great barebones example if you’re interested in building your own simple Chrome extension.

Download My Custom Extension: custom_extension.zip

References

GoFullPage Reference Links

Here are the official links for the GoFullPage extension:

Official Website: https://gofullpage.com/
Chrome Web Store: https://chromewebstore.google.com/detail/gofullpage-full-page-scre/fdpohaocaechififmbbbbbknoalclacl

Gemini Reference Link

Official Website: https://aistudio.google.com/prompts/new_chat

Tools Used for Documentation:

- Recursive Diff Tools:

Diffoscope: https://try.diffoscope.org/
Diffnow: https://www.diffnow.com/compare-files

- Document Conversion Tools:

Sodapdf: https://www.sodapdf.com/pdf-tools/html-to-pdf/

Extras:

Another Great Tool Worth Mentioning: FireShot 🔥

During my research, I came across another excellent screenshot extension called FireShot. In some scenarios, especially on very complex or dynamic pages, I found it to be highly effective and reliable. It offers a robust set of features, making it a powerful alternative to GoFullPage.

If you’re looking for another top-tier screenshot tool, I highly recommend giving it a try.

Official Website: https://getfireshot.com/
Chrome Web Store: https://chromewebstore.google.com/detail/take-webpage-screenshots/mcbpblocgmgfnpjjppndjkmgjaogfceg

Blog expansion on August 28, 2025:

I have since discovered a built-in feature in Google Chrome’s Developer Tools for capturing full-page screenshots. It captures an entire webpage, including the content that is only visible after scrolling.

This functionality is part of the Command Menu within Chrome DevTools, which provides a quick way to access various development and debugging tools.

Step-by-Step Guide to a Full-Page Screenshot in Chrome:

For those who want to follow this method, here are the detailed steps:

Open Developer Tools: On the desired webpage, press Ctrl+Shift+I (on Windows/Linux) or Cmd+Option+I (on Mac) to open the Developer Tools panel.
Open the Command Menu: With the Developer Tools open, press Ctrl+Shift+P (on Windows/Linux) or Cmd+Shift+P (on Mac). This will open a “Run” command prompt.
Find the Screenshot Command: In the command prompt, start typing the word “screenshot”. You will see a list of available screenshot options.
Capture the Full-Size Screenshot: From the list of options, select “Capture full size screenshot”. Chrome will then automatically capture the entire page and save it as a PNG file in your “Downloads” folder.

This developer feature is a handy alternative to using browser extensions or other software for taking scrolling screenshots. In addition to a full-page screenshot, you’ll also notice other options in the command menu, such as capturing a screenshot of a specific area or just the visible portion of the page.