Getting Started with Vidgen
An introduction to Vidgen, covering its core purpose, key features, technology stack, and a step-by-step guide to setting up and running the application locally for the first time.
What is Vidgen?
Vidgen is a powerful web application designed to automate the creation of engaging short-form social media videos, such as TikToks, Instagram Reels, and YouTube Shorts, directly from a single text prompt. Leveraging advanced AI technologies, Vidgen streamlines the content creation process by generating video scripts, synthesizing audio, and producing dynamic, TikTok-style subtitles, all within an intuitive browser-based interface.
Whether you're a content creator looking to scale your output, a marketer aiming for rapid content deployment, or a developer interested in the fusion of AI and media processing, Vidgen offers an integrated solution for transforming ideas into ready-to-publish video content with minimal effort.
Key Features
Vidgen is packed with features that empower users to create compelling short-form videos efficiently. Here's a look at its core capabilities:
Single-Prompt Video Generation
Create entire short-form videos for TikTok, Reels, or Shorts from just one text prompt.
AI-Driven Script Generation
Utilizes multiple AI models (Gemini, Grok, GPT-4o Mini) with smart fallbacks and caching to generate engaging scripts.
AI-Driven Audio Generation
Integrates with the ElevenLabs API to transform scripts into high-quality, natural-sounding speech.
Local Transcription & Captions
Generates accurate transcriptions locally using Whisper-CPP and produces dynamic, TikTok-style subtitles for enhanced engagement.
Reddit-Style Overlay Creation
Composes videos with custom Reddit-style overlays, perfect for narrative-driven content.
Server-Side Video Compilation
Renders high-quality video outputs using Remotion CLI on the server, ensuring efficient and robust processing.
Modular UI Components
Built with Shadcn UI and TailwindCSS for a modern, responsive, and highly customizable user interface.
Web-Based Media Processing
All video and audio processing, from script to final render, is managed through the web interface for ease of use.
Technical Stack Overview
Vidgen is built on a robust and modern technology stack, designed for performance, scalability, and ease of development. Its architecture is modular, allowing for independent evolution of different components.
Core Technologies
- Next.js: The leading React framework for building fast, full-stack web applications.
- TypeScript: Ensures type safety and improves code quality and maintainability across the entire project.
- Remotion: A powerful React framework for programmatically creating videos, handling all video composition and rendering logic.
- AI-SDK: Facilitates seamless integration with various AI models for script generation, offering a unified interface.
- Google Gemini API: Primary AI model for high-quality script generation, focusing on speed and relevance.
- ElevenLabs API: Powers the text-to-speech functionality, providing realistic and expressive AI-generated audio.
- OpenAI Whisper (local via Whisper-CPP): Utilized for highly accurate, local transcription of audio, enabling dynamic captioning.
- Shadcn UI & TailwindCSS: Provide a beautiful, responsive, and highly customizable component library and styling framework for the frontend.
- Node.js (>=18.17.0): The runtime environment for the backend services and Remotion rendering.
- pnpm: The package manager of choice for efficient dependency management and faster installations.
Architecture Highlights
The system employs a multi-flow architecture to manage its complex processes:
- Script Generation (Flow 1): Utilizes AI-SDK with a preference for Gemini 2.5 Flash, falling back to Grok Beta and GPT-4o Mini if needed, and finally a hardcoded template for resilience. This flow includes input validation, 1-hour caching for efficiency, schema validation, and metadata calculation.
- Audio Generation (Flow 2): Integrates the ElevenLabs API for converting AI-generated scripts into high-fidelity audio.
- Video Compilation (Flow 3): Involves local transcription using Whisper-CPP, dynamic Reddit-style overlay generation, and server-side video bundling and rendering through the Remotion CLI. This flow specifically bypasses Next.js SSR to avoid bundling conflicts, ensuring a stable rendering process.
This architecture, coupled with robust fallback and caching strategies for script generation, ensures high resilience against API failures and provides excellent observability into the system's operations.
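The fallback-and-cache strategy of Flow 1 can be sketched roughly as follows. This is an illustrative sketch only: the `generateScript` signature, the `ScriptProvider` type, and the cache shape are assumptions, not Vidgen's actual AI-SDK implementation.

```typescript
// Rough sketch of Flow 1's resilience strategy: try providers in preference
// order, fall back to a hardcoded template, and cache results for one hour.
// Provider functions here are stand-ins for real AI-SDK calls.
type ScriptProvider = (prompt: string) => Promise<string>;

const ONE_HOUR_MS = 60 * 60 * 1000;
const cache = new Map<string, { script: string; expires: number }>();

async function generateScript(
  prompt: string,
  providers: ScriptProvider[], // e.g. [gemini, grok, gpt4oMini]
  fallbackTemplate: string,
): Promise<string> {
  // Serve a cached script if it is still fresh (1-hour TTL).
  const cached = cache.get(prompt);
  if (cached && cached.expires > Date.now()) return cached.script;

  for (const provider of providers) {
    try {
      const script = await provider(prompt);
      cache.set(prompt, { script, expires: Date.now() + ONE_HOUR_MS });
      return script;
    } catch {
      // Provider failed (rate limit, outage, etc.) — try the next one.
    }
  }
  // All providers failed: fall back to a static template for resilience.
  return fallbackTemplate.replace("{prompt}", prompt);
}
```

A real implementation would also validate the input prompt and the generated script against a schema, and compute metadata, as described above.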
Prerequisites
Before you begin, ensure your development environment meets the following requirements:
Node.js Version
Vidgen requires Node.js version 18.17.0 or higher. You can check your Node.js version by running `node -v` in your terminal. If you need to upgrade, consider using a version manager like `nvm`.
- pnpm: Ensure pnpm is installed globally. If not, you can install it via npm: `npm install -g pnpm`.
- API Keys: You will need API keys for the following services:
  - Google Gemini API: Required for AI script generation. Set `GEMINI_API_KEY` in your environment variables.
  - ElevenLabs API: Required for AI audio generation. Set `ELEVENLABS_API_KEY` in your environment variables.
  - (Optional) Grok API: For an additional script generation fallback. Set `GROQ_API_KEY`.
  - (Optional) OpenAI API: For an additional script generation fallback. Set `OPENAI_API_KEY`.
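Since a missing required key only surfaces at generation time, it can be useful to validate the environment at startup. The following is a minimal sketch; `requireEnv` and `loadConfig` are hypothetical helpers, not part of Vidgen's codebase.

```typescript
// Illustrative startup check for the API keys listed above.
// `requireEnv` and `loadConfig` are hypothetical helper names.
function requireEnv(name: string, optional = false): string | undefined {
  const value = process.env[name];
  if (!value && !optional) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig() {
  return {
    geminiKey: requireEnv("GEMINI_API_KEY"),         // required
    elevenLabsKey: requireEnv("ELEVENLABS_API_KEY"), // required
    groqKey: requireEnv("GROQ_API_KEY", true),       // optional fallback
    openAiKey: requireEnv("OPENAI_API_KEY", true),   // optional fallback
  };
}
```

Calling `loadConfig()` once during server startup fails fast with a clear error instead of producing a confusing failure mid-generation.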
Local Setup and Installation
Follow these steps to get Vidgen up and running on your local machine:
1. Clone the Repository
First, clone the Vidgen repository to your local machine using Git:
```bash
git clone [repository-url]
cd [repository-name]
```
Replace `[repository-url]` and `[repository-name]` with the actual URL and desired directory name of the Vidgen project.
2. Install Dependencies
Navigate into the project directory and install the necessary dependencies using pnpm:
```bash
pnpm install
```
3. Install Whisper-CPP Locally
Vidgen uses Whisper-CPP for local audio transcription, which is crucial for generating subtitles. Install it using the provided Remotion script:
```bash
pnpm exec tsx remotion/scripts/install-whisper.mjs
```
This script will download and compile the necessary Whisper-CPP binaries.
Whisper-CPP Installation
The Whisper-CPP installation might take a few minutes, as it involves downloading a language model and compiling C++ code. Ensure you have the necessary build tools (e.g., `make` and `g++` on Linux/macOS, or Visual Studio Build Tools on Windows) installed on your system.
4. Configure Environment Variables
Create a `.env.local` file in the root of your project and add your API keys, replacing the placeholder values with your actual keys:
```bash
GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY"
GROQ_API_KEY="YOUR_GROQ_API_KEY" # Optional
OPENAI_API_KEY="YOUR_OPENAI_API_KEY" # Optional
```
Running the Application
Once setup is complete, you can start the development server and begin using Vidgen.
1. Start the Next.js Development Server
Run the following command to start the Next.js application:
```bash
pnpm dev
```
This will typically start the application on http://localhost:3000. Open this URL in your browser to access the Vidgen user interface.
2. Generate and Render a Video
Vidgen's video compilation and rendering is handled server-side via the Remotion CLI because of a known conflict between Next.js's bundling and @remotion/tailwind-v4. Direct server-side rendering (SSR) through Next.js is therefore bypassed for video output.
To render a video, you will use the Remotion CLI directly after generating the script and audio through the UI.
Important Remotion Rendering Note
Due to an incompatibility between Remotion's server-side rendering and Next.js's bundling with @remotion/tailwind-v4, video rendering must be performed using the Remotion CLI directly. This means you will interact with the UI to generate the script and audio, and then use a command-line tool to compile the final video file.
Quickstart: Generate Your First Video
Let's walk through the process of generating your first short-form video with Vidgen.
1. **Access the UI**: Open your browser and navigate to `http://localhost:3000`.
2. **Input Your Prompt**: In the Vidgen interface, you will find an input field. Enter a detailed text prompt describing the video you want to create. For example: "A story about a cat who accidentally orders a thousand cans of tuna online and the hilarious aftermath."
3. **Initiate Generation**: Click the "Generate Video" (or similarly labeled) button in the UI. The application will then perform the following steps:
   - AI Script Generation: The AI generates a script based on your prompt.
   - AI Audio Generation: The script is converted into audio using ElevenLabs.
   - Local Transcription: Whisper-CPP transcribes the audio for captioning.
4. **Review and Prepare for Render**: After the script and audio are generated, the UI will display a preview or summary. At this point, the assets needed for video compilation are ready.
5. **Render the Video using Remotion CLI**: Open a new terminal window (keep your `pnpm dev` process running in the first terminal) and execute the following command to render your video:
```bash
npx remotion render remotion/index.ts MyVideo output.mp4
```
* `remotion/index.ts`: Specifies the entry point for your Remotion composition.
* `MyVideo`: This is the ID of the composition defined in `remotion/Composition.tsx` that Remotion should render. (You might need to verify the exact composition ID within the Remotion project files).
* `output.mp4`: The desired output filename for your video.
<Callout>
<CalloutTitle>Customizing Output</CalloutTitle>
<CalloutDescription>
You can customize the output filename (`output.mp4`) to something more descriptive for your video. You can also pass additional props to your Remotion composition via the CLI if your video requires dynamic data, though for initial quickstart, the default setup often suffices.
</CalloutDescription>
</Callout>
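To pass such props, the Remotion CLI accepts a `--props` flag that takes either inline JSON or a path to a JSON file. A hypothetical props file is shown below; the `title` and `script` field names are illustrative assumptions, so check your composition's actual prop schema in the Remotion project files before using them.

```json
{
  "title": "My First Vidgen Video",
  "script": "A cat accidentally orders a thousand cans of tuna online..."
}
```

Saved as `props.json`, it could be supplied with `npx remotion render remotion/index.ts MyVideo output.mp4 --props=./props.json`.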
6. **Locate Your Video**: Once the rendering process completes (which might take a few moments depending on your system and video length), your generated `.mp4` file will be saved in the root directory of your project (or a specified output path).
Congratulations! You've successfully generated your first AI-powered short-form video with Vidgen. Experiment with different prompts and explore the various features to unleash your creativity.