Getting Started with Vidgen
An introduction to Vidgen, covering its core purpose, key features, technology stack, and a step-by-step guide to setting up and running the application locally for the first time.
What is Vidgen?
Vidgen is a powerful web application designed to automate the creation of engaging short-form social media videos, such as TikToks, Instagram Reels, and YouTube Shorts, directly from a single text prompt. Leveraging advanced AI technologies, Vidgen streamlines the content creation process by generating video scripts, synthesizing audio, and producing dynamic, TikTok-style subtitles, all within an intuitive browser-based interface.
Whether you're a content creator looking to scale your output, a marketer aiming for rapid content deployment, or a developer interested in the fusion of AI and media processing, Vidgen offers an integrated solution for transforming ideas into ready-to-publish video content with minimal effort.
Key Features
Vidgen is packed with features that empower users to create compelling short-form videos efficiently. Here's a look at its core capabilities:
Single-Prompt Video Generation
Create entire short-form videos for TikTok, Reels, or Shorts from just one text prompt.
AI-Driven Script Generation
Utilizes multiple AI models (Gemini, Grok, GPT-4o Mini) with smart fallbacks and caching to generate engaging scripts.
AI-Driven Audio Generation
Integrates with the ElevenLabs API to transform scripts into high-quality, natural-sounding speech.
Local Transcription & Captions
Generates accurate transcriptions locally using Whisper-CPP and produces dynamic, TikTok-style subtitles for enhanced engagement.
Reddit-Style Overlay Creation
Composes videos with custom Reddit-style overlays, perfect for narrative-driven content.
Server-Side Video Compilation
Renders high-quality video outputs using Remotion CLI on the server, ensuring efficient and robust processing.
Modular UI Components
Built with Shadcn UI and TailwindCSS for a modern, responsive, and highly customizable user interface.
Web-Based Media Processing
All video and audio processing, from script to final render, is managed through the web interface for ease of use.
Technical Stack Overview
Vidgen is built on a robust and modern technology stack, designed for performance, scalability, and ease of development. Its architecture is modular, allowing for independent evolution of different components.
Core Technologies
- Next.js: The leading React framework for building fast, full-stack web applications.
- TypeScript: Ensures type safety and improves code quality and maintainability across the entire project.
- Remotion: A powerful React framework for programmatically creating videos, handling all video composition and rendering logic.
- AI-SDK: Facilitates seamless integration with various AI models for script generation, offering a unified interface.
- Google Gemini API: Primary AI model for high-quality script generation, focusing on speed and relevance.
- ElevenLabs API: Powers the text-to-speech functionality, providing realistic and expressive AI-generated audio.
- OpenAI Whisper (local via Whisper-CPP): Utilized for highly accurate, local transcription of audio, enabling dynamic captioning.
- Shadcn UI & TailwindCSS: Provide a beautiful, responsive, and highly customizable component library and styling framework for the frontend.
- Node.js (>=18.17.0): The runtime environment for the backend services and Remotion rendering.
- pnpm: The package manager of choice for efficient dependency management and faster installations.
Architecture Highlights
The system employs a multi-flow architecture to manage its complex processes:
- Script Generation (Flow 1): Utilizes AI-SDK with a preference for Gemini 2.5 Flash, falling back to Grok Beta and GPT-4o Mini if needed, and finally a hardcoded template for resilience. This flow includes input validation, 1-hour caching for efficiency, schema validation, and metadata calculation.
- Audio Generation (Flow 2): Integrates the ElevenLabs API for converting AI-generated scripts into high-fidelity audio.
- Video Compilation (Flow 3): Involves local transcription using Whisper-CPP, dynamic Reddit-style overlay generation, and server-side video bundling and rendering through the Remotion CLI. This flow specifically bypasses Next.js SSR to avoid bundling conflicts, ensuring a stable rendering process.
This architecture, coupled with robust fallback and caching strategies for script generation, ensures high resilience against API failures and provides excellent observability into the system's operations.
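The fallback-and-cache strategy of Flow 1 can be sketched roughly as follows. This is an illustrative sketch only: the `generateScript` signature, the `ScriptProvider` type, and the cache shape are assumptions, not Vidgen's actual AI-SDK implementation.

```typescript
// Rough sketch of Flow 1's resilience strategy: try providers in preference
// order, fall back to a hardcoded template, and cache results for one hour.
// Provider functions here are stand-ins for real AI-SDK calls.
type ScriptProvider = (prompt: string) => Promise<string>;

const ONE_HOUR_MS = 60 * 60 * 1000;
const cache = new Map<string, { script: string; expires: number }>();

async function generateScript(
  prompt: string,
  providers: ScriptProvider[], // e.g. [gemini, grok, gpt4oMini]
  fallbackTemplate: string,
): Promise<string> {
  // Serve a cached script if it is still fresh (1-hour TTL).
  const cached = cache.get(prompt);
  if (cached && cached.expires > Date.now()) return cached.script;

  for (const provider of providers) {
    try {
      const script = await provider(prompt);
      cache.set(prompt, { script, expires: Date.now() + ONE_HOUR_MS });
      return script;
    } catch {
      // Provider failed (rate limit, outage, etc.) — try the next one.
    }
  }
  // All providers failed: fall back to a static template for resilience.
  return fallbackTemplate.replace("{prompt}", prompt);
}
```

A real implementation would also validate the input prompt and the generated script against a schema, and compute metadata, as described above.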
Prerequisites
Before you begin, ensure your development environment meets the following requirements:
Node.js Version
Vidgen requires Node.js version 18.17.0 or higher. You can check your Node.js version by running `node -v` in your terminal. If you need to upgrade, consider using a version manager like `nvm`.
- pnpm: Ensure pnpm is installed globally. If not, you can install it via npm: `npm install -g pnpm`.
- API Keys: You will need API keys for the following services:
  - Google Gemini API: Required for AI script generation. Set `GEMINI_API_KEY` in your environment variables.
  - ElevenLabs API: Required for AI audio generation. Set `ELEVENLABS_API_KEY` in your environment variables.
  - (Optional) Grok API: For an additional script generation fallback. Set `GROQ_API_KEY`.
  - (Optional) OpenAI API: For an additional script generation fallback. Set `OPENAI_API_KEY`.
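Since a missing required key only surfaces at generation time, it can be useful to validate the environment at startup. The following is a minimal sketch; `requireEnv` and `loadConfig` are hypothetical helpers, not part of Vidgen's codebase.

```typescript
// Illustrative startup check for the API keys listed above.
// `requireEnv` and `loadConfig` are hypothetical helper names.
function requireEnv(name: string, optional = false): string | undefined {
  const value = process.env[name];
  if (!value && !optional) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig() {
  return {
    geminiKey: requireEnv("GEMINI_API_KEY"),         // required
    elevenLabsKey: requireEnv("ELEVENLABS_API_KEY"), // required
    groqKey: requireEnv("GROQ_API_KEY", true),       // optional fallback
    openAiKey: requireEnv("OPENAI_API_KEY", true),   // optional fallback
  };
}
```

Calling `loadConfig()` once during server startup fails fast with a clear error instead of producing a confusing failure mid-generation.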
Local Setup and Installation
Follow these steps to get Vidgen up and running on your local machine:
1. Clone the Repository
First, clone the Vidgen repository to your local machine using Git:
```bash
git clone [repository-url]
cd [repository-name]
```
Replace `[repository-url]` and `[repository-name]` with the actual URL and desired directory name of the Vidgen project.
2. Install Dependencies
Navigate into the project directory and install the necessary dependencies using pnpm:
```bash
pnpm install
```
3. Install Whisper-CPP Locally
Vidgen uses Whisper-CPP for local audio transcription, which is crucial for generating subtitles. Install it using the provided Remotion script:
```bash
pnpm exec tsx remotion/scripts/install-whisper.mjs
```
This script will download and compile the necessary Whisper-CPP binaries.
Whisper-CPP Installation
The Whisper-CPP installation might take a few minutes, as it involves downloading a language model and compiling C++ code. Ensure you have the necessary build tools (e.g., `make` and `g++` on Linux/macOS, or Visual Studio Build Tools on Windows) installed on your system.
4. Configure Environment Variables
Create a `.env.local` file in the root of your project and add your API keys, replacing the placeholder values with your actual keys:
```bash
GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY"
GROQ_API_KEY="YOUR_GROQ_API_KEY" # Optional
OPENAI_API_KEY="YOUR_OPENAI_API_KEY" # Optional
```
Running the Application
Once setup is complete, you can start the development server and begin using Vidgen.
1. Start the Next.js Development Server
Run the following command to start the Next.js application:
```bash
pnpm dev
```
This will typically start the application on http://localhost:3000. Open this URL in your browser to access the Vidgen user interface.
2. Generate and Render a Video
Vidgen's video compilation and rendering is handled server-side via the Remotion CLI because of a known conflict between Next.js's bundling and @remotion/tailwind-v4. Direct server-side rendering (SSR) through Next.js is therefore bypassed for video output.
To render a video, you will use the Remotion CLI directly after generating the script and audio through the UI.
Important Remotion Rendering Note
Due to an incompatibility between Remotion's server-side rendering and Next.js's bundling with @remotion/tailwind-v4, video rendering must be performed using the Remotion CLI directly. This means you will interact with the UI to generate the script and audio, and then use a command-line tool to compile the final video file.
Quickstart: Generate Your First Video
Let's walk through the process of generating your first short-form video with Vidgen.
1. **Access the UI**: Open your browser and navigate to `http://localhost:3000`.
2. **Input Your Prompt**: In the Vidgen interface, you will find an input field. Enter a detailed text prompt describing the video you want to create. For example: "A story about a cat who accidentally orders a thousand cans of tuna online and the hilarious aftermath."
3. **Initiate Generation**: Click the "Generate Video" (or similarly labeled) button in the UI. The application will then perform the following steps:
   - AI Script Generation: The AI generates a script based on your prompt.
   - AI Audio Generation: The script is converted into audio using ElevenLabs.
   - Local Transcription: Whisper-CPP transcribes the audio for captioning.
4. **Review and Prepare for Render**: After the script and audio are generated, the UI will display a preview or summary. At this point, the assets needed for video compilation are ready.
5. **Render the Video using Remotion CLI**: Open a new terminal window (keep your `pnpm dev` process running in the first terminal) and execute the following command to render your video:
```bash
npx remotion render remotion/index.ts MyVideo output.mp4
```
* `remotion/index.ts`: Specifies the entry point for your Remotion composition.
* `MyVideo`: This is the ID of the composition defined in `remotion/Composition.tsx` that Remotion should render. (You might need to verify the exact composition ID within the Remotion project files).
* `output.mp4`: The desired output filename for your video.
<Callout>
<CalloutTitle>Customizing Output</CalloutTitle>
<CalloutDescription>
You can customize the output filename (`output.mp4`) to something more descriptive for your video. You can also pass additional props to your Remotion composition via the CLI if your video requires dynamic data, though for initial quickstart, the default setup often suffices.
</CalloutDescription>
</Callout>
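To pass such props, the Remotion CLI accepts a `--props` flag that takes either inline JSON or a path to a JSON file. A hypothetical props file is shown below; the `title` and `script` field names are illustrative assumptions, so check your composition's actual prop schema in the Remotion project files before using them.

```json
{
  "title": "My First Vidgen Video",
  "script": "A cat accidentally orders a thousand cans of tuna online..."
}
```

Saved as `props.json`, it could be supplied with `npx remotion render remotion/index.ts MyVideo output.mp4 --props=./props.json`.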
6. **Locate Your Video**: Once the rendering process completes (which might take a few moments depending on your system and video length), your generated `.mp4` file will be saved in the root directory of your project (or a specified output path).
Congratulations! You've successfully generated your first AI-powered short-form video with Vidgen. Experiment with different prompts and explore the various features to unleash your creativity.