Customization, Extensibility, and Contribution
Guidance on how to customize Vidgen to fit specific needs, extend its functionality, troubleshoot common issues, and contribute to the project's development.
This document provides comprehensive guidance on customizing, extending, and contributing to the Vidgen project. Whether you're looking to modify the user interface, integrate new AI models, enhance video compositions, or contribute to the core development, this guide will walk you through the process.
Customizing UI Components
Vidgen leverages Shadcn UI components built with TailwindCSS for a modular and highly customizable user interface. This architecture makes it straightforward to modify existing components or introduce new ones to match specific branding or functional requirements.
Modifying Existing Components
Shadcn UI components are designed to be easily themeable and extendable. You can typically find component definitions within your project's components/ui directory or directly within app components that utilize them. Styles are managed via TailwindCSS classes.
For example, to change the default button style, you would locate the Button component and adjust its className properties or extend its variants:
```tsx
// components/ui/button.tsx
import * as React from "react";
import { Slot } from "@radix-ui/react-slot";
import { cn } from "@/lib/utils";
// ... (existing buttonVariants definition)

interface ButtonProps extends React.ButtonHTMLAttributes<HTMLButtonElement> {
  variant?: "default" | "destructive" | "outline" | "secondary" | "ghost" | "link" | "primaryCustom"; // Add your custom variant
  size?: "default" | "sm" | "lg" | "icon";
  asChild?: boolean;
}

const Button = React.forwardRef<HTMLButtonElement, ButtonProps>(
  ({ className, variant = "default", size = "default", asChild = false, ...props }, ref) => {
    const Comp = asChild ? Slot : "button";
    return (
      <Comp
        className={cn(
          buttonVariants({ variant, size, className }),
          variant === "primaryCustom" &&
            "bg-gradient-to-r from-purple-500 to-indigo-600 text-white hover:from-purple-600 hover:to-indigo-700"
        )}
        ref={ref}
        {...props}
      />
    );
  }
);
Button.displayName = "Button";
```

Tip: Using cn Utility
The cn utility (from lib/utils.ts) is a powerful helper for conditionally combining TailwindCSS classes. It's recommended to use it for maintaining clean and readable class strings.
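To illustrate the pattern, here is a minimal sketch of a cn-style helper. The real cn in lib/utils.ts combines clsx with tailwind-merge; this simplified stand-in only joins truthy values, but shows how conditional classes stay readable:

```typescript
// Minimal cn-style helper (illustrative only; the real implementation
// also merges conflicting Tailwind classes via tailwind-merge).
type ClassValue = string | false | null | undefined;

function cn(...inputs: ClassValue[]): string {
  // Drop falsy entries (false, null, undefined) and join the rest.
  return inputs.filter(Boolean).join(" ");
}

const isDestructive = false;
const classes = cn(
  "px-4 py-2 rounded-md",
  isDestructive && "bg-red-600 text-white" // only added when the flag is true
);
console.log(classes); // "px-4 py-2 rounded-md"
```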
Creating New UI Elements
To introduce entirely new UI elements, you can follow the Shadcn UI pattern. Create a new .tsx file in components/ui for your component, define its props, and apply TailwindCSS for styling. Then, simply import and use it in your application.
Adding New Story Genres
Vidgen's core script generation is driven by AI models and specific prompt templates. You can easily extend the application to support new story genres or content formats by defining new prompts and integrating them into the generation flow.
Understanding Script Generation
The script generation logic resides primarily in app/actions/generate-script.ts. This action orchestrates the AI model interaction, input validation, caching, and schema validation.
Existing story templates, such as the Reddit story prompt, are located in lib/prompts/reddit-story.ts.
Steps to Add a New Genre:

- Define a New Prompt Template: Create a new file, for example, lib/prompts/my-new-genre.ts. This file should export a prompt string (or a function returning a prompt string) that guides the AI on what kind of story to generate. Ensure the prompt clearly specifies the desired output format, ideally matching a schema.

```ts
// lib/prompts/my-new-genre.ts
import { z } from 'zod';

export const myNewGenreSchema = z.object({
  title: z.string().describe("The title of the story."),
  paragraphs: z.array(z.string()).describe("An array of paragraphs that make up the story body."),
  conclusion: z.string().describe("A concluding remark or summary."),
});

export const myNewGenrePrompt = `
You are an expert storyteller. Generate a short, engaging story about a whimsical adventure.
The story should be structured with a title, several paragraphs, and a clear conclusion.
Ensure the language is imaginative and suitable for a short-form video.

Output JSON in the following format:
{
  "title": "string",
  "paragraphs": ["string", "string", ...],
  "conclusion": "string"
}
`;
```

- Integrate with generate-script.ts: Modify app/actions/generate-script.ts to include your new prompt. You'll likely want to add a new genre parameter to the generateScript function or an enum to select different genres.

```ts
// app/actions/generate-script.ts
import { myNewGenrePrompt, myNewGenreSchema } from '@/lib/prompts/my-new-genre';
import { redditStoryPrompt, redditStorySchema } from '@/lib/prompts/reddit-story';
// ... other imports

export type StoryGenre = 'reddit' | 'my-new-genre'; // Define new genre type

export async function generateScript(
  input: z.infer<typeof generateScriptInputSchema>
) {
  // ... (existing logic)

  let promptToUse;
  let schemaToUse;

  switch (input.genre) {
    case 'reddit':
      promptToUse = redditStoryPrompt(input.prompt);
      schemaToUse = redditStorySchema;
      break;
    case 'my-new-genre':
      promptToUse = myNewGenrePrompt;
      schemaToUse = myNewGenreSchema;
      break;
    default:
      throw new Error('Unknown story genre');
  }

  // ... (rest of the script generation logic using promptToUse and schemaToUse)
}
```

- Update UI: Ensure your frontend interface allows users to select the new genre, passing the appropriate genre parameter to the generateScript action.
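If you expect to add several genres, a table-driven lookup can replace a growing switch statement. The sketch below is illustrative only; the names (genreRegistry, resolveGenre, the schema placeholders) are hypothetical, not actual Vidgen exports:

```typescript
// Hypothetical registry mapping each genre to its prompt and schema.
// In the real code the schema field would be a zod schema; a name string
// stands in here to keep the sketch self-contained.
type StoryGenre = "reddit" | "my-new-genre";

interface GenreConfig {
  prompt: string;
  schemaName: string;
}

const genreRegistry: Record<StoryGenre, GenreConfig> = {
  reddit: { prompt: "reddit story prompt", schemaName: "redditStorySchema" },
  "my-new-genre": { prompt: "whimsical adventure prompt", schemaName: "myNewGenreSchema" },
};

function resolveGenre(genre: StoryGenre): GenreConfig {
  const config = genreRegistry[genre];
  if (!config) throw new Error(`Unknown story genre: ${genre}`);
  return config;
}
```

Adding a genre then becomes a one-line registry entry instead of a new switch case.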
Integrating Alternative AI Models or APIs
Vidgen is built with flexibility in mind regarding AI model integration. The current setup utilizes AI-SDK for script generation with a multi-model fallback, and ElevenLabs API for audio generation. You can swap these out or add new services.
Script Generation (AI-SDK)
The app/actions/generate-script.ts file is the central point for AI-driven script generation. It currently supports Gemini 2.5 Flash, Grok Beta, and GPT-4o Mini via AI-SDK.
To integrate a new model:

- Install necessary AI-SDK packages: If the new model requires a different provider, install the corresponding @ai-sdk/* package (e.g., @ai-sdk/ollama for Ollama).
- Configure API Keys: Add new environment variables for the new model's API key if required (e.g., OLLAMA_API_KEY).
- Modify generate-script.ts: Adjust the createAIFunction call to include your new model or modify the fallback logic. AI-SDK allows you to configure different models easily.

```ts
// app/actions/generate-script.ts
import { createAIFunction, generate, tool } from 'ai';
import { geminiFromGoogle } from '@ai-sdk/google'; // Assuming you added a new model like 'google'
import { openai } from '@ai-sdk/openai';
import { groq } from '@ai-sdk/groq';

const ai = createAIFunction({
  // Example of adding a new AI model
  model: groq('grok-1-preview'), // Primary model
  fallback: [
    geminiFromGoogle('gemini-2.5-flash-latest'), // Fallback 1
    openai('gpt-4o-mini'), // Fallback 2
    // Add another fallback here, e.g., using a different Google model or another provider
    geminiFromGoogle('gemini-1.5-pro'),
  ],
});

// ... rest of the generateScript function
```
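Independent of any SDK, the fallback idea itself is simple: try each provider call in order and return the first success. This generic sketch (not Vidgen code) makes the control flow explicit:

```typescript
// Try each model call in order; the first success wins, and the last
// error is rethrown only when every attempt fails.
async function withFallback<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown = new Error("No models configured");
  for (const attempt of attempts) {
    try {
      return await attempt(); // first model that succeeds wins
    } catch (err) {
      lastError = err; // record and fall through to the next model
    }
  }
  throw lastError;
}
```

Usage would look like `withFallback([() => callPrimary(prompt), () => callBackup(prompt)])`, where the call functions wrap your provider requests.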
Audio Generation (ElevenLabs)
Audio generation is handled in app/actions/generate-audio.ts, which directly calls the ElevenLabs API. To switch to a different text-to-speech (TTS) service:
- Choose a New TTS API: Select an alternative TTS provider (e.g., Google Cloud Text-to-Speech, Amazon Polly).
- Configure API Keys: Add environment variables for the new service's credentials.
- Modify generate-audio.ts: Replace the ElevenLabs API call with the API call for your chosen service. Ensure the output format (typically an audio stream or URL) is handled correctly.

```ts
// app/actions/generate-audio.ts
// import { ElevenLabsClient } from 'elevenlabs'; // Remove or comment out
// import { ELEVENLABS_API_KEY } from '@/lib/env'; // Adjust if needed
import { TextToSpeechClient } from '@google-cloud/text-to-speech'; // Example for Google TTS
import * as fs from 'node:fs';
import * as util from 'node:util';
import { GOOGLE_TTS_API_KEY } from '@/lib/env'; // New env var

export async function generateAudio(text: string): Promise<string> {
  // Initialize the Google TTS client
  const client = new TextToSpeechClient({
    credentials: JSON.parse(GOOGLE_TTS_API_KEY), // Assuming the env var holds the JSON key file content
  });

  const [response] = await client.synthesizeSpeech({
    input: { text },
    voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
    audioConfig: { audioEncoding: 'MP3' },
  });

  // Write the binary audio content to a temporary file
  const writeFile = util.promisify(fs.writeFile);
  const audioFilePath = `/tmp/audio-${Date.now()}.mp3`;
  await writeFile(audioFilePath, response.audioContent as Buffer, 'binary');

  // In a real application, you might upload this to a CDN and return a URL
  return audioFilePath; // For local testing, return the path
}
```
Important: Environment Variables
Always store API keys and sensitive credentials in environment variables (e.g., in a .env.local file). Do not hardcode them directly into your codebase.
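For reference, a .env.local file is a plain list of KEY=value lines. The variable names below are the ones used elsewhere in this guide; the values are placeholders:

```shell
# .env.local -- loaded automatically by Next.js, never commit this file
GEMINI_API_KEY=your-gemini-key
ELEVENLABS_API_KEY=your-elevenlabs-key
```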
Extending Video Compositions
Vidgen uses Remotion for server-side video compilation and rendering. The core video logic resides in the remotion/ directory. Extending compositions involves modifying existing components or creating new ones.
Key Files for Video Compositions:
- remotion/index.ts: Defines the main Remotion composition(s).
- remotion/Composition.tsx: The main React component for the video, where different visual elements are orchestrated.
- remotion/RedditOverlay.tsx: A specific component for the Reddit-style overlay.
- remotion/CaptionText.tsx: Renders individual caption segments.
- remotion/TiktokCaptions.tsx: Manages the dynamic display of TikTok-style subtitles.
- remotion.config.ts: Remotion configuration file.
Adding New Visual Elements or Animations
To add new elements to your video:
- Create a New Remotion Component: Create a new .tsx file in remotion/ (e.g., remotion/MyCustomElement.tsx). This component will receive props for data and timing.

```tsx
// remotion/MyCustomElement.tsx
import React from 'react';
import { AbsoluteFill, interpolate, useCurrentFrame, useVideoConfig } from 'remotion';

interface MyCustomElementProps {
  text: string;
  startFrame: number;
  endFrame: number;
}

export const MyCustomElement: React.FC<MyCustomElementProps> = ({ text, startFrame, endFrame }) => {
  const frame = useCurrentFrame();
  const { fps } = useVideoConfig();

  // Fade in over the first half-second and fade out over the last half-second
  const opacity = interpolate(
    frame,
    [startFrame, startFrame + fps * 0.5, endFrame - fps * 0.5, endFrame],
    [0, 1, 1, 0],
    {
      extrapolateLeft: 'clamp',
      extrapolateRight: 'clamp',
    }
  );

  return (
    <AbsoluteFill style={{ opacity }} className="justify-center items-center">
      <h1 className="text-white text-6xl font-bold drop-shadow-lg">{text}</h1>
    </AbsoluteFill>
  );
};
```

- Integrate into Composition.tsx: Import your new component and render it within the main Composition.tsx, passing relevant data and timing information.

```tsx
// remotion/Composition.tsx
import React from 'react';
import { AbsoluteFill, Audio, Series, Video, staticFile } from 'remotion';
import { RedditOverlay } from './RedditOverlay';
import { TiktokCaptions } from './TiktokCaptions';
import { MyCustomElement } from './MyCustomElement'; // Import your new component

interface MyVideoProps {
  story: { title: string; paragraphs: string[]; conclusion: string };
  audioSrc: string;
  durationInFrames: number;
  // ... other props
}

export const Composition: React.FC<MyVideoProps> = ({ story, audioSrc, durationInFrames }) => {
  return (
    <AbsoluteFill className="bg-gray-900">
      {/* Background video or image */}
      <Video src={staticFile('background.mp4')} />
      <Series>
        {/* Existing elements */}
        <Series.Sequence durationInFrames={durationInFrames}>
          <RedditOverlay story={story} />
          <TiktokCaptions captions={story.captions} /> {/* Assuming captions are part of story data */}
          {/* Add your new element */}
          <MyCustomElement text="Welcome to Vidgen!" startFrame={0} endFrame={3 * 30} />
        </Series.Sequence>
      </Series>
      {/* Audio track */}
      <Audio src={audioSrc} />
    </AbsoluteFill>
  );
};
```
Important: Remotion Rendering
Due to known conflicts with Next.js's bundling and @remotion/tailwind-v4, server-side video rendering requires using the Remotion CLI directly. Run npx remotion render remotion/index.ts MyVideo output.mp4 to render your compositions. For production, consider using @remotion/lambda for scalable and conflict-free rendering.
Troubleshooting Common Issues
Here are some common issues you might encounter and their solutions:
Remotion and Next.js Bundling Conflicts
Issue: Remotion's server-side rendering (SSR) does not work seamlessly with Next.js when using @remotion/tailwind-v4 and certain bundler configurations.
Solution: Avoid direct Next.js SSR for video rendering. Instead, use the Remotion CLI for rendering. This is explicitly handled in the project by instructing users to run npx remotion render.
Missing or Incorrect Environment Variables
Issue: AI actions or audio generation fail with authentication errors or unexpected behavior.
Solution: Verify that all required environment variables (GEMINI_API_KEY, ELEVENLABS_API_KEY, etc.) are correctly set in your .env.local file and are being loaded by the application. Restart your development server after modifying .env.local.
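One way to catch this class of error early is a small startup guard. The helper below is hypothetical (not part of Vidgen's actual code): it reads each required variable once and fails fast with a clear message, so a missing key surfaces immediately instead of as a vague authentication error later:

```typescript
// Hypothetical fail-fast guard for required environment variables.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage, e.g. in a lib/env.ts-style module:
// export const GEMINI_API_KEY = requireEnv("GEMINI_API_KEY", process.env);
```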
Whisper-CPP Installation Issues
Issue: Local transcription fails or remotion/scripts/generate-captions.ts encounters errors.
Solution: Ensure Whisper-CPP is correctly installed. Run pnpm install and then node remotion/scripts/install-whisper.mjs to install it. Check the console output during installation for any errors. You might need specific build tools on your system (e.g., build-essential on Debian/Ubuntu, Xcode Command Line Tools on macOS).
API Rate Limits or Quotas
Issue: AI or audio generation requests start failing after a certain number of calls.
Solution: This typically indicates hitting API rate limits or exceeding usage quotas. Check the documentation and your dashboard for the respective AI/audio providers (Gemini, ElevenLabs) for current limits and your usage. The script generation flow includes caching, which helps mitigate this for repeated prompts.
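The idea behind that cache is simple: identical prompts reuse the stored result instead of spending another call against the provider's quota. A minimal sketch (Vidgen's actual caching may differ):

```typescript
// In-memory prompt cache: repeated prompts cost zero API calls.
const scriptCache = new Map<string, string>();

async function generateWithCache(
  prompt: string,
  callModel: (p: string) => Promise<string>
): Promise<string> {
  const hit = scriptCache.get(prompt);
  if (hit !== undefined) return hit; // cache hit: skip the API entirely
  const result = await callModel(prompt);
  scriptCache.set(prompt, result); // remember for next time
  return result;
}
```

A production version would typically also bound the cache size or expire entries, but even this form eliminates repeated calls for identical prompts.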
Contributing to Vidgen
We welcome contributions to Vidgen! By contributing, you help improve the project for everyone. Please follow these guidelines to ensure a smooth contribution process.
How to Contribute
- Fork the Repository: Start by forking the main Vidgen repository to your GitHub account.
- Clone Your Fork: Clone your forked repository to your local machine:

```shell
git clone https://github.com/YOUR_USERNAME/vidgen.git
```

- Install Dependencies: Navigate into the project directory and install dependencies using pnpm:

```shell
cd vidgen
pnpm install
```

- Install Whisper-CPP:

```shell
node remotion/scripts/install-whisper.mjs
```

- Create a New Branch: Create a new branch for your feature or bug fix:

```shell
git checkout -b feature/your-feature-name
```

- Make Your Changes: Implement your changes, following the existing code style and best practices.
- Test Your Changes: Thoroughly test your changes to ensure they work as expected and don't introduce regressions.
- Commit Your Changes: Commit your changes with a clear and concise commit message:

```shell
git commit -m "feat: Add new story genre for whimsical adventures"
```

- Push to Your Fork: Push your branch to your forked repository:

```shell
git push origin feature/your-feature-name
```

- Open a Pull Request (PR): Go to the original Vidgen repository on GitHub and open a new Pull Request from your branch. Provide a detailed description of your changes.
Code Style and Standards
- Language: TypeScript is primarily used throughout the project. Please adhere to TypeScript best practices.
- Formatting: The project uses Prettier for code formatting and ESLint for linting. Ensure your code passes lint checks before submitting a PR.
- Comments: Use comments sparingly to explain complex logic, but prefer self-documenting code.
Reporting Bugs and Suggesting Features
- Bug Reports: If you find a bug, please open an issue on the GitHub repository. Provide a clear description, steps to reproduce, and any relevant error messages or screenshots.
- Feature Requests: We welcome ideas for new features! Open an issue to describe your suggestion, its potential benefits, and how it might fit into the existing architecture.
Code of Conduct
Vidgen adheres to a Code of Conduct that all contributors are expected to follow. Please read it to understand the standards of behavior we uphold.
License Information
This project is open-source and distributed under the MIT License. By contributing, you agree that your contributions will be licensed under the same terms.