Video Analysis: MVP & File API Implementation
Introduction: The Challenge of Video Analysis
Hey guys! Let's dive into a cool project: implementing video analysis for our gemini-media-mcp server. Right now, we rock at analyzing images and audio, but videos? Nope. Adding video analysis is a game-changer, but it's not a walk in the park. We're talking about potential token limits with large videos and keeping track of uploaded files. This is where our strategy comes in, breaking this down into smaller, more manageable steps.
Problem: Expanding Media Analysis Capabilities
Our current system, gemini-media-mcp, is pretty good at handling images and audio. But, come on, it's 2024! We need video analysis. This expansion promises a lot, but it also means dealing with some technical hurdles. We are going to address these problems.
Goal: A Robust and Scalable Video Analysis Feature
Our goal? To build a solid, scalable video analysis feature, following two key principles: YAGNI (You Ain't Gonna Need It) and SRP (Single Responsibility Principle). To keep things simple and get something working fast, we'll break this down into two phases, starting with an MVP.
- Phase 1 (MVP): Get the basics down by analyzing short videos (up to 20MB) using inline_data. This is our minimum viable product.
- Phase 2: Crank things up by adding support for longer videos (up to 2GB) using the File API. This tackles the problem of file management on the client side.

We're aiming for a solution that is both effective and easy to use. This is our target.
Architectural Changes: How We'll Build It
Design Principles: Keeping It Clean
- YAGNI (You Ain't Gonna Need It): We're starting small. We'll focus on the simplest solution for short videos first. We add complexity only when it's absolutely necessary.
- SRP (Single Responsibility Principle): The video analysis logic will live in its own separate module, tools/video_analyzer.py. This keeps everything organized and prevents the analysis from getting tangled up with other media types. We believe in keeping things simple.
Sequence Diagram (Phase 1: MVP): Short Video Processing
This diagram shows how a short video gets processed. The client sends a request. The server checks the file. The file's bytes go straight into the Gemini API. The API gives back structured data. The server then returns a Pydantic object with the results.
sequenceDiagram
    participant C as Client
    participant S as Server
    participant VA as VideoAnalyzer
    participant GC as GeminiClient
    participant API as Gemini API
    C->>S: Call analyze_video(path)
    S->>VA: analyze_video(path)
    VA->>VA: File validation (size < 20MB)
    VA->>GC: generate_content(media_bytes)
    GC->>API: Request with inline_data
    API-->>GC: JSON response
    GC-->>VA: Response text
    VA->>VA: Parsing into VideoAnalysisResponse
    VA-->>S: Return Pydantic object
    S-->>C: Analysis result
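The MVP sequence above can be walked through without touching the real API. In this sketch the GeminiClient step is a stand-in callable; all names here are illustrative, not the project's actual code:

```python
import os
from typing import Callable

MAX_INLINE_BYTES = 20 * 1024 * 1024  # MVP ceiling from the diagram

def analyze_short_video(video_path: str, generate_content: Callable[[bytes], str]) -> str:
    """Walk the MVP sequence: validate size, send the bytes, return the response text."""
    size = os.path.getsize(video_path)
    if size >= MAX_INLINE_BYTES:
        raise ValueError(f"{video_path} is {size} bytes; the MVP only handles videos < 20MB")
    with open(video_path, "rb") as f:
        media_bytes = f.read()
    # In the real flow, this text would then be parsed into VideoAnalysisResponse.
    return generate_content(media_bytes)
```

Injecting the client call like this also makes the branch trivially testable, which lines up with the testing items in the TODO list later on.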
Logical Diagram: Two-Phase Approach
This diagram shows how the server will decide how to process the video, depending on the file size, once both phases are fully implemented.
graph TD
    A[Video analysis request] --> B{File size < 20 MB?}
    B -- Yes --> C["Use inline_data (MVP)"]
    B -- No --> D["Use File API (Phase 2)"]
    C --> E[Analysis result]
    D --> F[Upload file, get file_uri]
    F --> G[Analysis using file_uri]
    G --> H[Analysis result + file_uri]
Concrete Implementation: Making It Real
Data Models and Signatures: The Building Blocks
1.  VideoAnalysisResponse Model (in models/analysis.py): This model defines the structure of the response we get back from the video analysis. It will include things like the video title, a summary, the full transcript, key events, hashtags, and a file_uri if the File API is used.
from pydantic import BaseModel, Field
from typing import List, Optional


class VideoEvent(BaseModel):
    """Describes a key event in the video with a timestamp."""
    timestamp: str = Field(..., description="Event timestamp in MM:SS format.")
    description: str = Field(..., description="Event description.")


class VideoAnalysisResponse(BaseModel):
    """Structured response with video analysis results."""
    title: str = Field(..., description="Short and informative video title.")
    summary: str = Field(..., description="Video summary.")
    transcription: str = Field(..., description="Full audio track transcription.")
    events: List[VideoEvent] = Field(..., description="List of key events with timestamps.")
    hashtags: List[str] = Field(..., description="List of relevant hashtags.")
    file_uri: Optional[str] = Field(None, description="File URI after upload via File API for reuse.")
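Assuming the model returns JSON matching this schema, parsing it with Pydantic v2's `model_validate_json` could look like the sketch below. The sample payload is made up, and the models are repeated minimally so the snippet runs on its own:

```python
from pydantic import BaseModel
from typing import List, Optional

# Minimal repetition of the models above, so this snippet is self-contained.
class VideoEvent(BaseModel):
    timestamp: str
    description: str

class VideoAnalysisResponse(BaseModel):
    title: str
    summary: str
    transcription: str
    events: List[VideoEvent]
    hashtags: List[str]
    file_uri: Optional[str] = None

# A made-up response payload, just to show the parsing step.
raw = (
    '{"title": "Demo clip", "summary": "A short demo.", '
    '"transcription": "Hello!", '
    '"events": [{"timestamp": "00:05", "description": "Intro"}], '
    '"hashtags": ["#demo"]}'
)
parsed = VideoAnalysisResponse.model_validate_json(raw)
print(parsed.events[0].timestamp)  # 00:05
print(parsed.file_uri)             # None (inline_data path, no upload)
```

If the model's JSON doesn't match the schema, `model_validate_json` raises a `ValidationError`, which is exactly where an `ErrorResponse` would come from in the real flow.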
2.  analyze_video Function Signature (in tools/video_analyzer.py): This is the main function that handles the video analysis. It takes a video path, an optional user prompt, model name, and system instructions as inputs. It automatically decides whether to use inline_data or the File API based on the file size.
from models.analysis import VideoAnalysisResponse, ErrorResponse


def analyze_video(
    video_path: str,
    user_prompt: str = "",
    model_name: str | None = None,
    system_instruction_name: str = "default",
    system_instruction_override: str | None = None,
    system_instruction_file_path: str | None = None,
) -> VideoAnalysisResponse | ErrorResponse:
    """
    Analyzes a video file, automatically selecting the method (inline_data or File API)
    depending on the file size.
    """
    # ... implementation ...
File API Strategy: Keeping It Stateless
To keep our server stateless (meaning it doesn't store information about uploaded files), we won't save any file data. Instead, when analyzing large videos using the File API, the VideoAnalysisResponse will include a file_uri. The client (you) can save this URI and reuse it in subsequent requests to re-analyze the same file for up to 48 hours, without re-uploading. That's pretty cool, right?
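On the client side, remembering that URI could be as simple as a small cache keyed by video path, with the 48-hour lifetime baked in. This is a hypothetical client-side sketch, not part of the server; the URI format in the test is made up:

```python
import time
from typing import Dict, Optional, Tuple

# File API uploads are retained for roughly 48 hours, per the strategy above.
FILE_URI_TTL_SECONDS = 48 * 60 * 60

class FileUriCache:
    """Client-side memo of file_uri values so the same video isn't re-uploaded."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, float]] = {}

    def remember(self, video_path: str, file_uri: str, now: Optional[float] = None) -> None:
        """Store the file_uri returned in VideoAnalysisResponse, with its expiry."""
        now = time.time() if now is None else now
        self._entries[video_path] = (file_uri, now + FILE_URI_TTL_SECONDS)

    def lookup(self, video_path: str, now: Optional[float] = None) -> Optional[str]:
        """Return a still-valid file_uri, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(video_path)
        if entry is None or now >= entry[1]:
            return None  # unknown or expired: the client must re-upload
        return entry[0]
```

The `now` parameter is only there to make expiry testable; in real use you'd just call `remember` and `lookup` with the defaults.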
Task TODO List: Your Checklist
- Phase 1: Short Video Analysis (<20MB)
[ ] Create the VideoAnalysisResponse model in models/analysis.py. We need to start here to define the structure for our results.
[ ] Add the is_video_valid function in utils/file_utils.py to check the format and size of the video. Gotta make sure we're only analyzing valid videos.
[ ] Implement analyze_video in tools/video_analyzer.py with the inline_data logic. This is the core of our MVP.
[ ] Integrate and register the new tool in server.py and tools/__init__.py. We need to make sure our server knows about this new function.
[ ] Add a test short video in tests/. Always test, test, test!
- Phase 2: Long Video Analysis (File API)
[ ] Extend analyze_video to support the File API for files > 20MB. Make it handle larger files.
[ ] Add file_uri to VideoAnalysisResponse when using the File API. We need this to allow clients to reuse files.
[ ] Update GeminiClient to support video uploads via the File API. The client needs to know how to upload the videos.
[ ] Add tests for both logic branches (short and long videos). More tests.
- Completion
[ ] Update the project documentation, describing the new feature. Let's make sure everyone knows how to use this.
[ ] Update the knowledge base with information about this solution. Let's document our work.
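For a head start on the is_video_valid item from Phase 1, here's one possible shape for it. The accepted extensions and the 20 MB ceiling are assumptions, not settled decisions:

```python
import os

# Hypothetical sketch of utils/file_utils.is_video_valid.
# Accepted extensions and the default size cap are assumptions for the MVP.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".webm", ".mkv"}
MAX_INLINE_SIZE_BYTES = 20 * 1024 * 1024

def is_video_valid(video_path: str, max_size: int = MAX_INLINE_SIZE_BYTES) -> bool:
    """Return True if the path is a supported video format under the size limit."""
    if not os.path.isfile(video_path):
        return False
    _, ext = os.path.splitext(video_path)
    if ext.lower() not in VIDEO_EXTENSIONS:
        return False
    return os.path.getsize(video_path) <= max_size
```

In Phase 2 the same helper could take a larger `max_size` (up to 2GB) when the File API branch is in play.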
 
That's the plan, guys! Let's get to work and make this video analysis a reality!