Welcome to Rosie’s
I updated my ComfyUI and discovered a few new templates. A new video generation model, LTX-2, caught my attention, so I decided to try it out. After some fiddling, I got some results.
This is the clip generated by the default image-to-video template. Click the image to play the clip in a new page.

The source image was generated in the same ComfyUI using a simple custom template with the JedPointReal checkpoint. The clip took about 4 minutes to generate. Clip length: 4 seconds.
Next Song
These results encouraged me, so I tried making another clip, this time trying to make the character sing. Well, singing was a problem, so I went back to talking. Here is the resulting clip. Click the image to play it in a new page. The source image for the LTX-2 i2v workflow was also generated in ComfyUI, this time using the default HiDream I1 Full workflow and its default fp8 checkpoint. Resolution: 720×1280. Generation time: about 10 minutes.

While the voice was generated along with the clip by the default LTX-2 i2v workflow, the guitar strumming was added afterwards using ffmpeg. This one, while not outright bad, is nothing to write home about. There is no mistaking that it was AI-generated. It also has some quality issues, such as changing facial features and skin that looks rubbery and too wrinkly. Perhaps fiddling with the prompts could improve the quality, but I am just giving this a quick once-over. In any case, I think it’s good enough for some purposes.
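For anyone wanting to do something similar, here is a minimal sketch of how an extra audio track could be mixed under a clip's own audio with ffmpeg, wrapped in Python. This is not the exact command used for the clip above. The file names (clip.mp4, guitar.wav, clip_with_guitar.mp4), the helper name build_mix_command, and the 0.5 volume level are all placeholders I made up for illustration.

```python
import subprocess
from pathlib import Path


def build_mix_command(clip, extra_audio, output, extra_volume=0.5):
    """Build an ffmpeg command that mixes extra_audio under the clip's own audio.

    All file names are supplied by the caller; nothing here is specific to LTX-2.
    """
    # Lower the extra track, then mix it with the clip's original audio.
    # duration=first keeps the mix as long as the clip's own audio track.
    filter_graph = (
        f"[1:a]volume={extra_volume}[bg];"
        "[0:a][bg]amix=inputs=2:duration=first[mix]"
    )
    return [
        "ffmpeg", "-y",
        "-i", clip,          # input 0: generated clip (video + voice)
        "-i", extra_audio,   # input 1: extra audio, e.g. guitar strumming
        "-filter_complex", filter_graph,
        "-map", "0:v",       # keep the video stream from the clip
        "-map", "[mix]",     # use the mixed audio as the audio stream
        "-c:v", "copy",      # do not re-encode the video
        output,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_mix_command("clip.mp4", "guitar.wav", "clip_with_guitar.mp4")
    print(" ".join(cmd))
    # Only call ffmpeg if the example inputs are actually present.
    if Path("clip.mp4").exists() and Path("guitar.wav").exists():
        subprocess.run(cmd, check=True)
```

Note that amix scales its inputs by default, so the overall loudness may need adjusting by ear.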
Taste of Cousins
Next, I decided to check how long the generated clips could be. I tried a couple of custom workflows using GGUF checkpoints, but they all ran out of memory (OOM) on anything longer than 5 seconds. I did not investigate why, because I got the idea to shift the memory load from the GPU to system RAM by starting ComfyUI with the --novram parameter. It turned out to be the right direction, and after several unsuccessful attempts I managed to generate this 10-second clip using the default LTX-2 i2v workflow. Click the image to watch the clip in a new page.

Not perfect, but good enough for memes and YouTube Shorts that don’t pretend to be realistic.
Using the --novram parameter turned out to be good enough to allow the standard Gemma text encoder (gemma_3_12B_it.safetensors), the heaviest checkpoint (ltx-2-19b-dev.safetensors), and even two LoRAs chained at each stage. Practically all the memory use was in RAM (up to 75-80% of the total 128 GB), but VRAM was also used, up to about 50% of the dedicated GPU’s 12 GB at the refiner stage.
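As a rough sketch, starting ComfyUI this way from a source checkout could look like the snippet below. It assumes ComfyUI is started through main.py in its install folder. The folder name "ComfyUI" and the helper name build_launch_command are placeholders, and portable or desktop installs are launched differently. The flag is written with two plain ASCII hyphens.

```python
import subprocess
import sys
from pathlib import Path


def build_launch_command(comfy_dir, extra_args=()):
    """Build the command that starts ComfyUI with the --novram flag.

    comfy_dir is wherever ComfyUI is installed (an assumption of this sketch).
    """
    main_py = str(Path(comfy_dir) / "main.py")
    # Run main.py with the current Python interpreter and pass --novram.
    return [sys.executable, main_py, "--novram", *extra_args]


if __name__ == "__main__":
    # "ComfyUI" is a hypothetical install folder, for illustration only.
    cmd = build_launch_command("ComfyUI")
    print(" ".join(cmd))
    # Only start the server if the entry script is actually there.
    if Path(cmd[1]).exists():
        subprocess.run(cmd, check=True)
```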