Customization, Extensibility, and Contribution

Guidance on how to customize Vidgen to fit specific needs, extend its functionality, troubleshoot common issues, and contribute to the project's development.

This document provides comprehensive guidance on customizing, extending, and contributing to the Vidgen project. Whether you're looking to modify the user interface, integrate new AI models, enhance video compositions, or contribute to the core development, this guide will walk you through the process.

Customizing UI Components

Vidgen leverages Shadcn UI components built with TailwindCSS for a modular and highly customizable user interface. This architecture makes it straightforward to modify existing components or introduce new ones to match specific branding or functional requirements.

Modifying Existing Components

Shadcn UI components are designed to be easily themeable and extendable. You can typically find component definitions within your project's components/ui directory or directly within app components that utilize them. Styles are managed via TailwindCSS classes.

For example, to change the default button style, you would locate the Button component and adjust its className properties or extend its variants:

// components/ui/button.tsx
import * as React from "react";
import { Slot } from "@radix-ui/react-slot";
import { cn } from "@/lib/utils";

// ... (existing buttonVariants definition and the rest of the component)

interface ButtonProps extends React.ButtonHTMLAttributes<HTMLButtonElement> {
  variant?: "default" | "destructive" | "outline" | "secondary" | "ghost" | "link" | "primaryCustom"; // Add your custom variant
  size?: "default" | "sm" | "lg" | "icon";
  asChild?: boolean;
}

const Button = React.forwardRef<HTMLButtonElement, ButtonProps>(
  ({ className, variant = "default", size = "default", asChild = false, ...props }, ref) => {
    const Comp = asChild ? Slot : "button";
    return (
      <Comp
        className={cn(
          buttonVariants({
            variant,
            size,
            className,
          }),
          variant === "primaryCustom" &&
            // "primaryCustom" is not registered in buttonVariants, so its
            // classes are appended here; alternatively, add it as a cva variant.
            "bg-gradient-to-r from-purple-500 to-indigo-600 text-white hover:from-purple-600 hover:to-indigo-700"
        )}
        ref={ref}
        {...props}
      />
    );
  }
);
Button.displayName = "Button";

Tip: Using cn Utility

The cn utility (from lib/utils.ts) is a powerful helper for conditionally combining TailwindCSS classes. It's recommended to use it for maintaining clean and readable class strings.
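As an illustration of the behavior, here is a minimal stand-in (not the actual lib/utils.ts implementation, which typically wraps clsx and tailwind-merge):

```typescript
// Minimal stand-in for the cn helper: drops falsy entries so classes
// can be toggled inline. The real cn additionally merges conflicting
// Tailwind classes via tailwind-merge.
type ClassValue = string | false | null | undefined;

export function cn(...inputs: ClassValue[]): string {
  return inputs.filter(Boolean).join(" ");
}

// Usage: the inactive branch disappears from the output.
const isActive = true;
export const buttonClasses = cn(
  "px-4 py-2 rounded",
  isActive && "bg-indigo-600 text-white",
  !isActive && "bg-gray-200 text-gray-700"
);
// buttonClasses === "px-4 py-2 rounded bg-indigo-600 text-white"
```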

Creating New UI Elements

To introduce entirely new UI elements, you can follow the Shadcn UI pattern. Create a new .tsx file in components/ui for your component, define its props, and apply TailwindCSS for styling. Then, simply import and use it in your application.
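For instance, a hypothetical Badge element could pair a base class with a per-variant map, mirroring the buttonVariants pattern (names and Tailwind classes below are illustrative, not part of the project):

```typescript
// Hypothetical variant map for a new Badge element, following the
// shadcn pattern: a shared base class plus per-variant Tailwind classes.
const badgeBase =
  "inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-semibold";

const badgeVariants = {
  default: "bg-indigo-600 text-white",
  outline: "border border-gray-300 text-gray-700",
} as const;

export function badgeClassName(
  variant: keyof typeof badgeVariants = "default"
): string {
  return `${badgeBase} ${badgeVariants[variant]}`;
}

// The component itself would simply apply this to a <span>:
// <span className={badgeClassName("outline")}>Draft</span>
```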

Adding New Story Genres

Vidgen's core script generation is driven by AI models and specific prompt templates. You can easily extend the application to support new story genres or content formats by defining new prompts and integrating them into the generation flow.

Understanding Script Generation

The script generation logic resides primarily in app/actions/generate-script.ts. This action orchestrates the AI model interaction, input validation, caching, and schema validation.

Existing story templates, such as the Reddit story prompt, are located in lib/prompts/reddit-story.ts.

Steps to Add a New Genre:

  1. Define a New Prompt Template: Create a new file, for example, lib/prompts/my-new-genre.ts. This file should export a prompt string (or a function returning a prompt string) that guides the AI on what kind of story to generate. Ensure the prompt clearly specifies the desired output format, ideally matching a schema.

    // lib/prompts/my-new-genre.ts
    import { z } from 'zod';
    
    export const myNewGenreSchema = z.object({
      title: z.string().describe("The title of the story."),
      paragraphs: z.array(z.string()).describe("An array of paragraphs that make up the story body."),
      conclusion: z.string().describe("A concluding remark or summary."),
    });
    
    export const myNewGenrePrompt = `
      You are an expert storyteller. Generate a short, engaging story about a whimsical adventure.
      The story should be structured with a title, several paragraphs, and a clear conclusion.
      Ensure the language is imaginative and suitable for a short-form video.
    
      Output JSON in the following format:
      {
        "title": "string",
        "paragraphs": ["string", "string", ...],
        "conclusion": "string"
      }
    `;
  2. Integrate with generate-script.ts: Modify app/actions/generate-script.ts to include your new prompt. You'll likely want to add a new genre parameter to the generateScript function or an enum to select different genres.

    // app/actions/generate-script.ts
    import { myNewGenrePrompt, myNewGenreSchema } from '@/lib/prompts/my-new-genre';
    import { redditStoryPrompt, redditStorySchema } from '@/lib/prompts/reddit-story';
    // ... other imports
    
    export type StoryGenre = 'reddit' | 'my-new-genre'; // Define new genre type
    
    export async function generateScript(
      input: z.infer<typeof generateScriptInputSchema>
    ) {
      // ... (existing logic)
    
      let promptToUse;
      let schemaToUse;
    
      switch (input.genre) {
        case 'reddit':
          promptToUse = redditStoryPrompt(input.prompt);
          schemaToUse = redditStorySchema;
          break;
        case 'my-new-genre':
          promptToUse = myNewGenrePrompt;
          schemaToUse = myNewGenreSchema;
          break;
        default:
          throw new Error('Unknown story genre');
      }
    
      // ... (rest of the script generation logic using promptToUse and schemaToUse)
    }
  3. Update UI: Ensure your frontend interface allows users to select the new genre, passing the appropriate genre parameter to the generateScript action.
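The switch statement in step 2 is essentially a lookup from genre to a (prompt, validator) pair. As genres accumulate, a registry object keeps that lookup flat; here is a hedged sketch with illustrative prompts and a simplified validator standing in for the zod schemas:

```typescript
// Hypothetical genre registry: each entry bundles the prompt with a
// validator, so adding a genre is one object entry rather than a new
// switch case. Real code would keep the zod schemas from the steps above.
type StoryGenre = "reddit" | "my-new-genre";

interface GenreConfig {
  prompt: string;
  isValid: (output: Record<string, unknown>) => boolean;
}

const genreRegistry: Record<StoryGenre, GenreConfig> = {
  reddit: {
    prompt: "Generate a Reddit-style story...",
    isValid: (o) => typeof o.title === "string",
  },
  "my-new-genre": {
    prompt: "Generate a whimsical adventure story...",
    isValid: (o) =>
      typeof o.title === "string" &&
      Array.isArray(o.paragraphs) &&
      typeof o.conclusion === "string",
  },
};

export function resolveGenre(genre: string): GenreConfig {
  const config = genreRegistry[genre as StoryGenre];
  if (!config) throw new Error(`Unknown story genre: ${genre}`);
  return config;
}
```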

Integrating Alternative AI Models or APIs

Vidgen is built with flexibility in mind regarding AI model integration. The current setup utilizes AI-SDK for script generation with a multi-model fallback, and ElevenLabs API for audio generation. You can swap these out or add new services.

Script Generation (AI-SDK)

The app/actions/generate-script.ts file is the central point for AI-driven script generation. It currently supports Gemini 2.5 Flash, Grok Beta, and GPT-4o Mini via AI-SDK.

To integrate a new model:

  1. Install necessary AI-SDK packages: If the new model comes from a different provider, install the corresponding provider package. Official @ai-sdk/* packages cover the major providers; others, such as Ollama, are served by community provider packages (e.g., ollama-ai-provider).

  2. Configure API Keys: Add new environment variables for the new model's API key if required (e.g., OLLAMA_API_KEY).

  3. Modify generate-script.ts: Adjust the model selection or fallback logic to include your new model. AI-SDK makes it straightforward to swap providers. The snippet below sketches a try-in-order fallback chain; the helper names are illustrative, and the project's actual wiring may differ.

    // app/actions/generate-script.ts
    import { generateObject } from 'ai';
    import { google } from '@ai-sdk/google';
    import { openai } from '@ai-sdk/openai';
    import { xai } from '@ai-sdk/xai';
    import { z } from 'zod';

    // Models are tried in order; the first successful call wins.
    const modelChain = [
      google('gemini-2.5-flash'), // Primary model
      xai('grok-beta'),           // Fallback 1
      openai('gpt-4o-mini'),      // Fallback 2
      // Add another model here, e.g. a different Google model or another provider
    ];

    async function generateWithFallback(prompt: string, schema: z.ZodTypeAny) {
      let lastError: unknown;
      for (const model of modelChain) {
        try {
          const { object } = await generateObject({ model, schema, prompt });
          return object;
        } catch (error) {
          lastError = error; // Fall through to the next model in the chain
        }
      }
      throw lastError;
    }

    // ... rest of the generateScript function

Audio Generation (ElevenLabs)

Audio generation is handled in app/actions/generate-audio.ts, which directly calls the ElevenLabs API. To switch to a different text-to-speech (TTS) service:

  1. Choose a New TTS API: Select an alternative TTS provider (e.g., Google Cloud Text-to-Speech, Amazon Polly).

  2. Configure API Keys: Add environment variables for the new service's credentials.

  3. Modify generate-audio.ts: Replace the ElevenLabs API call with the API call for your chosen service. Ensure the output format (typically an audio stream or URL) is handled correctly.

    // app/actions/generate-audio.ts
    // import { ElevenLabsClient } from 'elevenlabs'; // Remove or comment out
    // import { ELEVENLABS_API_KEY } from '@/lib/env'; // Adjust if needed

    import { TextToSpeechClient } from '@google-cloud/text-to-speech'; // Example for Google TTS
    import { writeFile } from 'node:fs/promises';
    import { GOOGLE_TTS_CREDENTIALS } from '@/lib/env'; // New env var holding the service-account JSON

    export async function generateAudio(text: string): Promise<string> {
      // Initialize the Google TTS client from service-account credentials
      const client = new TextToSpeechClient({
        credentials: JSON.parse(GOOGLE_TTS_CREDENTIALS),
      });

      const [response] = await client.synthesizeSpeech({
        input: { text },
        voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
        audioConfig: { audioEncoding: 'MP3' },
      });

      // Write the binary audio content to a temporary file
      const audioFilePath = `/tmp/audio-${Date.now()}.mp3`;
      await writeFile(audioFilePath, response.audioContent as Buffer);

      // In a real application, you might upload this to a CDN and return a URL
      return audioFilePath; // For local testing, return the path
    }

Important: Environment Variables

Always store API keys and sensitive credentials in environment variables (e.g., in a .env.local file). Do not hardcode them directly into your codebase.
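One way to make missing variables fail fast is a small loader that checks everything at startup; here is a sketch (the lib/env.ts naming is an assumption based on the imports shown earlier):

```typescript
import process from "node:process";

// Sketch of a lib/env.ts-style loader: throws at startup if a required
// key is missing, instead of failing deep inside an AI call with an
// opaque authentication error.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

export function loadEnv() {
  return {
    GEMINI_API_KEY: requireEnv("GEMINI_API_KEY"),
    ELEVENLABS_API_KEY: requireEnv("ELEVENLABS_API_KEY"),
  };
}
```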

Extending Video Compositions

Vidgen uses Remotion for server-side video compilation and rendering. The core video logic resides in the remotion/ directory. Extending compositions involves modifying existing components or creating new ones.

Key Files for Video Compositions:

  • remotion/index.ts: Defines the main Remotion composition(s).
  • remotion/Composition.tsx: The main React component for the video, where different visual elements are orchestrated.
  • remotion/RedditOverlay.tsx: A specific component for the Reddit-style overlay.
  • remotion/CaptionText.tsx: Renders individual caption segments.
  • remotion/TiktokCaptions.tsx: Manages the dynamic display of TikTok-style subtitles.
  • remotion.config.ts: Remotion configuration file.

Adding New Visual Elements or Animations

To add new elements to your video:

  1. Create a New Remotion Component: Create a new .tsx file in remotion/ (e.g., remotion/MyCustomElement.tsx). This component will receive props for data and timing.

    // remotion/MyCustomElement.tsx
    import React from 'react';
    import { AbsoluteFill, interpolate, useCurrentFrame, useVideoConfig } from 'remotion';
    
    interface MyCustomElementProps {
      text: string;
      startFrame: number;
      endFrame: number;
    }
    
    export const MyCustomElement: React.FC<MyCustomElementProps> = ({
      text, startFrame, endFrame
    }) => {
      const frame = useCurrentFrame();
      const { fps } = useVideoConfig();
    
      const opacity = interpolate(
        frame,
        [startFrame, startFrame + fps * 0.5, endFrame - fps * 0.5, endFrame],
        [0, 1, 1, 0],
        {
          extrapolateLeft: 'clamp',
          extrapolateRight: 'clamp',
        }
      );
    
      return (
        <AbsoluteFill style={{ opacity }} className="justify-center items-center">
          <h1 className="text-white text-6xl font-bold drop-shadow-lg">
            {text}
          </h1>
        </AbsoluteFill>
      );
    };
  2. Integrate into Composition.tsx: Import your new component and render it within the main Composition.tsx, passing relevant data and timing information.

    // remotion/Composition.tsx
    import React from 'react';
    import { AbsoluteFill, Audio, Series, Video, staticFile } from 'remotion';
    import { RedditOverlay } from './RedditOverlay';
    import { TiktokCaptions } from './TiktokCaptions';
    import { MyCustomElement } from './MyCustomElement'; // Import your new component

    interface MyVideoProps {
      story: {
        title: string;
        paragraphs: string[];
        conclusion: string;
        captions: string[]; // Assuming captions are delivered with the story data
      };
      audioSrc: string;
      durationInFrames: number;
      // ... other props
    }

    export const Composition: React.FC<MyVideoProps> = ({
      story, audioSrc, durationInFrames
    }) => {
      return (
        <AbsoluteFill className="bg-gray-900">
          {/* Background video or image */}
          <Video src={staticFile('background.mp4')} />

          <Series>
            {/* Existing elements */}
            <Series.Sequence durationInFrames={durationInFrames}>
              <RedditOverlay story={story} />
              <TiktokCaptions captions={story.captions} />
              {/* Add your new element: visible for the first 3 seconds at 30 fps */}
              <MyCustomElement text="Welcome to Vidgen!" startFrame={0} endFrame={3 * 30} />
            </Series.Sequence>
          </Series>
          {/* Audio track */}
          <Audio src={audioSrc} />
        </AbsoluteFill>
      );
    };
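The interpolate() call in MyCustomElement defines a fade-in/hold/fade-out curve. A dependency-free sketch of the same piecewise-linear shape helps sanity-check the timing math (this mimics Remotion's clamped interpolation; it is not the library function):

```typescript
// Piecewise-linear fade matching the interpolate() call above:
// 0 -> 1 over the first half second, hold at 1, then 1 -> 0 over the
// last half second, clamped to 0 outside [startFrame, endFrame].
export function fadeOpacity(
  frame: number,
  startFrame: number,
  endFrame: number,
  fps: number
): number {
  const fadeFrames = fps * 0.5;
  if (frame <= startFrame || frame >= endFrame) return 0;
  if (frame < startFrame + fadeFrames) return (frame - startFrame) / fadeFrames;
  if (frame > endFrame - fadeFrames) return (endFrame - frame) / fadeFrames;
  return 1;
}

// At 30 fps with startFrame 0 and endFrame 90:
// fadeOpacity(0, 0, 90, 30)  -> 0 (not yet visible)
// fadeOpacity(15, 0, 90, 30) -> 1 (fade-in complete)
// fadeOpacity(45, 0, 90, 30) -> 1 (fully visible)
```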

Important: Remotion Rendering

Due to known conflicts with Next.js's bundling and @remotion/tailwind-v4, server-side video rendering requires using the Remotion CLI directly. Run npx remotion render remotion/index.ts MyVideo output.mp4 to render your compositions. For production, consider using @remotion/lambda for scalable and conflict-free rendering.

Troubleshooting Common Issues

Here are some common issues you might encounter and their solutions:

Remotion and Next.js Bundling Conflicts

Issue: Remotion's server-side rendering (SSR) does not work seamlessly with Next.js when using @remotion/tailwind-v4 and certain bundler configurations.

Solution: Avoid direct Next.js SSR for video rendering. Instead, use the Remotion CLI for rendering. This is explicitly handled in the project by instructing users to run npx remotion render.

Missing or Incorrect Environment Variables

Issue: AI actions or audio generation fail with authentication errors or unexpected behavior.

Solution: Verify that all required environment variables (GEMINI_API_KEY, ELEVENLABS_API_KEY, etc.) are correctly set in your .env.local file and are being loaded by the application. Restart your development server after modifying .env.local.

Whisper-CPP Installation Issues

Issue: Local transcription fails or remotion/scripts/generate-captions.ts encounters errors.

Solution: Ensure Whisper-CPP is correctly installed. Run pnpm install, then node remotion/scripts/install-whisper.mjs. Check the console output during installation for any errors. You might need specific build tools on your system (e.g., build-essential on Debian/Ubuntu, Xcode Command Line Tools on macOS).

API Rate Limits or Quotas

Issue: AI or audio generation requests start failing after a certain number of calls.

Solution: This typically indicates hitting API rate limits or exceeding usage quotas. Check the documentation and your dashboard for the respective AI/audio providers (Gemini, ElevenLabs) for current limits and your usage. The script generation flow includes caching, which helps mitigate this for repeated prompts.
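The caching mentioned above can be pictured as a prompt-keyed map in front of the API call; here is a minimal sketch (the project's actual cache layer may differ in keying and persistence):

```typescript
// Minimal prompt-keyed cache: an identical prompt reuses the stored
// result instead of spending another request against the provider quota.
const scriptCache = new Map<string, string>();

export async function cachedGenerate(
  prompt: string,
  generate: (p: string) => Promise<string>
): Promise<string> {
  const hit = scriptCache.get(prompt);
  if (hit !== undefined) return hit;
  const result = await generate(prompt);
  scriptCache.set(prompt, result);
  return result;
}
```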

Contributing to Vidgen

We welcome contributions to Vidgen! By contributing, you help improve the project for everyone. Please follow these guidelines to ensure a smooth contribution process.

How to Contribute

  1. Fork the Repository: Start by forking the main Vidgen repository to your GitHub account.
  2. Clone Your Fork: Clone your forked repository to your local machine:
    git clone https://github.com/YOUR_USERNAME/vidgen.git
  3. Install Dependencies: Navigate into the project directory and install dependencies using pnpm:
    cd vidgen
    pnpm install
  4. Install Whisper-CPP:
    node remotion/scripts/install-whisper.mjs
  5. Create a New Branch: Create a new branch for your feature or bug fix:
    git checkout -b feature/your-feature-name
  6. Make Your Changes: Implement your changes, following the existing code style and best practices.
  7. Test Your Changes: Thoroughly test your changes to ensure they work as expected and don't introduce regressions.
  8. Commit Your Changes: Commit your changes with a clear and concise commit message:
    git commit -m "feat: Add new story genre for whimsical adventures"
  9. Push to Your Fork: Push your branch to your forked repository:
    git push origin feature/your-feature-name
  10. Open a Pull Request (PR): Go to the original Vidgen repository on GitHub and open a new Pull Request from your branch. Provide a detailed description of your changes.

Code Style and Standards

  • Language: TypeScript is primarily used throughout the project. Please adhere to TypeScript best practices.
  • Formatting: The project uses Prettier for code formatting and ESLint for linting. Ensure your code passes lint checks before submitting a PR.
  • Comments: Use comments sparingly to explain complex logic, but prefer self-documenting code.

Reporting Bugs and Suggesting Features

  • Bug Reports: If you find a bug, please open an issue on the GitHub repository. Provide a clear description, steps to reproduce, and any relevant error messages or screenshots.
  • Feature Requests: We welcome ideas for new features! Open an issue to describe your suggestion, its potential benefits, and how it might fit into the existing architecture.

Code of Conduct

Vidgen adheres to a Code of Conduct that all contributors are expected to follow. Please read it to understand the standards of behavior we uphold.

License Information

This project is open-source and distributed under the MIT License. By contributing, you agree that your contributions will be licensed under the same terms.