November 20, 2023
Here’s what caught our eye last week in AI.
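Emu Video is a text-to-video model that runs in two steps: it first generates the first frame as an image from the input text, then generates the rest of the video with a diffusion model conditioned on both the first frame and the input text.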
The diffusion model is a pretrained text-to-image model with additional 1D convolution and attention layers that let it process a temporal dimension (see the Make-a-Video paper for details). It is finetuned on text-to-video data, where it is tasked with predicting future frames given a starting frame and the input text. When combined with a temporal interpolation model, Emu Video can create 4-second-long videos at 16 fps and a resolution of 512x512. You can see some samples here.
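To make the idea of a temporal layer more concrete, here is a minimal pure-Python sketch of a 1D convolution applied along the time axis only. It is a toy illustration, not Emu Video's or Make-a-Video's actual code: the function name `temporal_conv1d`, the list-of-lists frame layout, and the zero padding are all assumptions made for this example.

```python
def temporal_conv1d(frames, kernel):
    """Mix each feature across neighboring frames (time axis only).

    frames: list of T frames, each a list of D floats (a flattened toy frame).
    kernel: odd-length list of weights, applied as a sliding window over time
            (cross-correlation, as in most deep learning libraries).
    Zero padding at both ends keeps the output at T frames.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    num_frames = len(frames)
    out = []
    for t in range(num_frames):
        mixed = [0.0] * len(frames[t])
        for k, weight in enumerate(kernel):
            src = t + k - half  # index of the neighboring frame
            if 0 <= src < num_frames:
                for d, value in enumerate(frames[src]):
                    mixed[d] += weight * value
        out.append(mixed)
    return out


# Three toy "frames" with two features each.
frames = [[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]]

# An identity kernel passes every frame through unchanged.
identity = temporal_conv1d(frames, [0.0, 1.0, 0.0])

# A smoothing kernel blends each frame with its neighbors in time.
smoothed = temporal_conv1d(frames, [0.25, 0.5, 0.25])
```

With the identity kernel the layer leaves each frame as it was, which hints at why temporal layers can be added to a pretrained image model without immediately disturbing its per-frame output. Any other kernel lets information flow between frames.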
Emu Edit is a diffusion model that can perform a variety of image-editing and computer-vision tasks. It is trained on a new dataset consisting of tasks like:
To create the dataset, they leveraged the Llama2 LLM and the DINO self-supervised vision model. For example:
See the appendix of the paper for full details.
You can see some sample image edits here.
Training data contamination is the problem of test-set data appearing in the training set. To properly evaluate the generalization capabilities of LLMs, it is important to have a reliable method for detecting this contamination, but that turns out to be a difficult problem. A new paper (Rethinking Benchmark and Contamination for Language Models with Rephrased Samples) introduces the following contamination detection technique:
They find that this approach outperforms other contamination detection methods like n-gram overlap and embedding similarity.
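For reference, the n-gram overlap baseline mentioned above can be sketched roughly as follows. This is not the paper's detection technique, only a simplified illustration of the kind of baseline it is compared against. The function names, the whitespace tokenization, the trigram size, and the 0.5 threshold are all assumptions chosen for the example.

```python
def ngrams(text, n):
    """Return the set of word n-grams in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(test_sample, train_sample, n=3):
    """Fraction of the test sample's n-grams that also occur in the train sample."""
    test_grams = ngrams(test_sample, n)
    if not test_grams:
        return 0.0
    return len(test_grams & ngrams(train_sample, n)) / len(test_grams)


def is_contaminated(test_sample, train_corpus, n=3, threshold=0.5):
    """Flag the test sample if any training document overlaps heavily with it."""
    return any(
        ngram_overlap(test_sample, doc, n) >= threshold for doc in train_corpus
    )


test_question = "What is the capital of France"
verbatim_copy = "Question: what is the capital of France"
rephrased_copy = "Name the city that serves as the French capital"

verbatim_score = ngram_overlap(test_question, verbatim_copy)
rephrased_score = ngram_overlap(test_question, rephrased_copy)
```

In this toy case the verbatim copy is flagged, while the rephrased copy shares no trigrams with the test question and goes undetected. That blind spot for rephrased samples is the kind of weakness the paper's title points at.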
A new PyTorch blog post highlights how the Segment Anything Model (SAM) can be accelerated up to 8 times using new PyTorch features: torch.compile, GPU quantization, Scaled Dot Product Attention, Semi-Structured Sparsity, NestedTensor, and a custom Triton kernel. See the details here.
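As a rough illustration of one item on that list, semi-structured sparsity usually refers to a 2:4 pattern, where at most two of every four consecutive weights are nonzero. The pure-Python sketch below shows the pattern itself using magnitude-based pruning. It is not the blog post's code and does not use PyTorch; the function name `prune_2_of_4` and the choice to keep the two largest-magnitude weights per group are assumptions made for the example.

```python
def prune_2_of_4(weights):
    """Zero out the two smallest-magnitude values in each group of four.

    weights: flat list of floats whose length is a multiple of 4.
    Returns a new list in which every group of four has at most two nonzeros.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Positions of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.1, -0.9, 0.4, 0.05, 2.0, -1.0, 3.0, 0.5]
sparse_row = prune_2_of_4(row)
```

Because the zeros follow a fixed pattern, hardware that supports it can skip them and store the matrix more compactly, which is where the speedup comes from.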
Language models are not just for language processing; they’re being used to analyze genome data too. A new paper reviews how these models are applied to genome data for tasks like disease risk prediction. Check out the paper: To Transformers and Beyond: Large Language Models for the Genome.
Interested in future weekly updates? Stay up to date by joining our Slack Community!