Unlimiformer: Long-Range Transformers with Unlimited Length Input
Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code (via kNN-search)
https://arxiv.org/abs/2305.01625
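How it works, roughly (my reading; a minimal numpy sketch, not the authors' code): encode the long input in chunks, dump every encoder hidden state into a kNN index, and at each decoding step let cross-attention look only at the top-k retrieved states instead of the whole input:

```python
# Minimal numpy sketch of kNN cross-attention (brute force here; the real
# thing uses a FAISS index and the model's own attention projections).
import numpy as np

rng = np.random.default_rng(0)
d = 64
encoder_states = rng.standard_normal((100_000, d))  # states from all chunks

def knn_cross_attention(query, k=512):
    scores = encoder_states @ query              # inner-product similarity
    topk = np.argpartition(scores, -k)[-k:]      # indices of the k best keys
    attn = np.exp(scores[topk] - scores[topk].max())
    attn /= attn.sum()                           # softmax over top-k only
    return attn @ encoder_states[topk]           # attended "value"

print(knn_cross_attention(rng.standard_normal(d)).shape)  # (64,)
```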
https://arxiv.org/abs/2305.07759
TinyStories: 3-30M (million, not billion) parameter models that produce coherent English, trained on a curated dataset.
Don't expect it to code, but curious whether this is usable as a LoRA or similar baseline. Also need to look closer at their tokenizer setup; it must be quite different from GPT's.
phi-1: with higher-quality datasets, a 1.3B-parameter model trained on 7B tokens can be quite competitive with GPT-4 and other 100x larger models on coding tasks
https://arxiv.org/abs/2306.11644
Welcome to the dark side of cyberpunk.
Once-theoretical timing attacks are now a reality.
(TLDR: Cops still can't decrypt the messages, but they can track who's chatting with whom by comparing the small spikes of traffic as a message gets delivered)
https://wz.ax/timing-is-real
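The correlation step itself is almost embarrassingly simple once you have flow metadata; toy sketch (made-up timestamps, obviously not the actual tooling):

```python
# Toy sketch: if A's small upload spikes are consistently followed, within
# network latency, by B's download spikes, A and B are probably chatting.
a_sends    = [10.00, 12.30, 57.80, 61.10, 90.45]        # seconds
b_receives = [10.12, 12.41, 57.93, 61.22, 90.57, 99.0]

def match_rate(sends, receives, window=0.5):
    hits = sum(any(0 <= r - s <= window for r in receives) for s in sends)
    return hits / len(sends)

print(match_rate(a_sends, b_receives))  # 1.0 -> strong evidence of a session
```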
NY Times
Cracking Down on Dissent, Russia Seeds a Surveillance Supply Chain
Russia is incubating a cottage industry of new digital surveillance tools to suppress domestic opposition to the war in Ukraine. The tech may also be sold overseas.
why do they call this an ‘attack’? this is the way to set the model free!
(TLDR: DAN prompt generator)
https://arxiv.org/abs/2307.15043
https://llm-attacks.org
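The search skeleton is plain greedy coordinate descent over suffix tokens; the paper guides the swaps with gradients through the model (GCG), which this toy stand-in replaces with a dummy objective:

```python
# Toy skeleton of the suffix search: propose single-token swaps, keep the
# ones that improve the objective. In the paper the objective is the
# model's probability of a compliant answer, and candidates come from
# gradients w.r.t. token embeddings; here it's a dummy string match.
import random

VOCAB  = list("abcdefghijklmnopqrstuvwxyz ")
TARGET = "sure here is"   # stand-in objective

def score(suffix):
    return sum(a == b for a, b in zip(suffix, TARGET))

suffix = list("x" * len(TARGET))
for _ in range(2000):
    i = random.randrange(len(suffix))     # pick a coordinate (token slot)
    cand = suffix.copy()
    cand[i] = random.choice(VOCAB)        # propose a swap
    if score(cand) >= score(suffix):      # greedy accept
        suffix = cand

print("".join(suffix))
```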
Masked Trajectory Models for Prediction, Representation, and Control
TLDR: Transformers using state space and action embeddings as tokens are better at RL than, um, RL algorithms. Oops.
https://arxiv.org/abs/2305.02968
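The trick, as I read it: flatten a trajectory (s1, a1, s2, a2, …) into one token sequence, mask random spans during training, and pick the mask pattern at inference to select the task; mask actions and you get a policy, mask future states and you get a dynamics model. Made-up sketch of the tokenization:

```python
# Made-up sketch (not the authors' code): interleave states and actions
# into one sequence; the mask pattern chosen at inference selects the task.
import numpy as np

rng = np.random.default_rng(0)
T, ds, da = 5, 3, 2
states  = rng.standard_normal((T, ds))
actions = rng.standard_normal((T, da))

# one token per state/action; the real model embeds these in a shared space
tokens = [tok for t in range(T)
          for tok in (("s", t, states[t]), ("a", t, actions[t]))]

def mask(tokens, task):
    keep = {"policy":  lambda kind, t: kind == "s",               # predict actions
            "forward": lambda kind, t: kind == "a" or t == 0}[task]  # predict states
    return [tok if keep(tok[0], tok[1]) else ("MASK",) + tok[:2] for tok in tokens]

print([tok[:2] for tok in mask(tokens, "policy")[:4]])
```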
Pushing the Limits of Machine Design: Automated CPU Design with AI
> By efficiently exploring a search space of unprecedented size 10^(10^540) (note: reducible to 10^6), and thus pushing the limits of machine design, our approach generates an industrial-scale RISC-V CPU within only 5 hours.
> which sheds some light on building a self-evolving machine to beat the CPU designed by humanity eventually.
> The automatically designed CPU was sent to the manufacturer in December 2021
https://arxiv.org/abs/2306.12456
Teaching Arithmetic to Small Transformers
an interesting, detailed write-up
https://arxiv.org/abs/2307.03381
Transformers ain't everything
tree-based methods still outperform DL on tabular data and medium-sized (~10k samples) datasets (quick sketch after the preview below)
https://hal.science/hal-03723551
hal.science
Why do tree-based models still outperform deep learning on typical tabular data?
While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear. We contribute extensive benchmarks of standard and novel deep learning methods as well as tree-based models such as XGBoost and Random…
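Easy to reproduce the flavor of this at home (scikit-learn sketch with near-default models and synthetic data; the paper tunes both families much more carefully on real tabular benchmarks, so treat this only as the harness):

```python
# Flavor of the comparison (defaults, synthetic data; the paper's finding is
# that the tree model tends to come out ahead on real medium-size tabular data).
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=10_000, n_features=30,
                           n_informative=10, random_state=0)

for name, model in [("gbdt", HistGradientBoostingClassifier(random_state=0)),
                    ("mlp",  MLPClassifier(max_iter=300, random_state=0))]:
    print(name, cross_val_score(model, X, y, cv=3).mean())
```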
https://arxiv.org/abs/2308.16898
Transformers as Support Vector Machines
TLDR ~ transformer layers are SVMs w/ gradient-trainable global convergence, when a) overparameterized and b) equipped with nonlinear heads (rough formulation below)
my remark: this explains
1) why huge models are important (so the gradient is high-dimensional enough to be monotonic)
2) why attention (aka connections, aka indirections) is trainable at all;
and says nothing about why they might generalize beyond the dataset
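For reference, the max-margin problem they tie attention to looks roughly like this (my paraphrase from a quick read, so take the notation with a grain of salt; alpha_i is the token attention should select in sequence i, W the combined query-key weights):

```latex
% Gradient descent on the attention weights W converges in direction to
% the max-margin separator between the "optimal" token \alpha_i and all
% other tokens t of each input sequence:
\min_{W}\ \|W\|_F
\quad \text{s.t.} \quad
(x_{i,\alpha_i} - x_{i,t})^\top W z_i \ \ge\ 1
\qquad \forall i,\ \forall t \neq \alpha_i
```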
lucid and useful example of the beginner's mindset https://www.oneusefulthing.org/p/embracing-weirdness-what-it-means
www.oneusefulthing.org
Embracing weirdness: What it means to use AI as a (writing) tool
AI is strange. We need to learn to use it.
ASMesh: Anonymous and Secure Messaging in Mesh Networks Using Stronger, Anonymous Double Ratchet
https://eprint.iacr.org/2023/1053
interesting, though unclear re: traffic/metadata analysis and power consumption, as the protocol is rather chatty
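For context, the symmetric half of any double ratchet is just a KDF chain; minimal sketch below (HMAC-SHA256 as the KDF; real protocols use HKDF plus a DH ratchet, and the paper's contribution is the anonymity layer on top, not this part):

```python
# Minimal KDF-chain sketch: each step derives the next chain key and a
# one-time message key, then the old chain key is thrown away (forward secrecy).
import hmac, hashlib

def kdf_step(chain_key: bytes) -> tuple[bytes, bytes]:
    next_ck = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    msg_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_ck, msg_key

ck = b"\x00" * 32
for i in range(3):
    ck, mk = kdf_step(ck)
    print(i, mk.hex()[:16])   # one fresh key per message
```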
NExT-GPT: Any-to-Any Multimodal LLM
TLDR: Vicuna (a fine-tuned LLaMA) extended with basic image, audio, and video understanding and generation. Impressive demo videos, though the Gradio-based demo is currently broken. Anybody up for deploying this somewhere to play with?
https://next-gpt.github.io/ + https://arxiv.org/pdf/2309.05519.pdf
https://blog.isosceles.com/the-webp-0day/ TLDR: fixed upstream, but will be with us for some time wherever libwebp is bundled
Isosceles Blog
The WebP 0day
Early last week, Google released a new stable update for Chrome. The update included a single security fix that was reported by Apple's Security Engineering and Architecture (SEAR) team. The issue, CVE-2023-4863, was a heap buffer overflow in the WebP image…
A stunning example of how efficient markets and arbitrage trades enabled by mobile phones help avoid waste, stabilize prices, and increase welfare (fish markets in Kerala, India, 1997-2001)
https://www.jstor.org/stable/25098864
Mistral-7B is seriously good. And fast. And works with llama.cpp already!
(except for the sliding-window attention part, so it's limited to 4-8K or so; mask sketch after the preview below)
https://twitter.com/arthurmensch/status/1707041835091165580
X (formerly Twitter)
Arthur Mensch on X
At @MistralAI we're releasing our very first model, the best 7B in town (outperforming Llama 13B on all metrics, and good at code), Apache 2.0.
We believe in open models and we'll push them to the frontier https://t.co/HjT5Xrvpxs
Very proud of the team…
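For reference, the sliding-window part llama.cpp doesn't handle yet is just a banded causal mask (sketch; Mistral-7B uses window = 4096, tiny numbers here for display):

```python
# Sliding-window attention mask: token i may attend only to tokens j with
# i - window < j <= i, i.e. causal AND within the band.
import numpy as np

def sliding_window_mask(n, window):
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).astype(int))
```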
Oh. https://ml-jku.github.io/hopfield-layers/
via @kuu_channel
It’s beautiful. I wonder where the catch is, i.e. why Llama et al. don’t use Hopfield layers
hopfield-layers
Hopfield Networks is All You Need
Blog post
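The core is a one-line update rule, which is exactly attention with the stored patterns playing both keys and values (numpy sketch of the retrieval):

```python
# Modern Hopfield retrieval: xi_new = X @ softmax(beta * X.T @ xi),
# with the stored patterns as the columns of X (Ramsauer et al.).
import numpy as np

rng = np.random.default_rng(0)
d, n = 32, 10
X = rng.standard_normal((d, n))                 # n stored patterns
xi = X[:, 3] + 0.3 * rng.standard_normal(d)     # noisy query near pattern 3
beta = 8.0

for _ in range(2):                              # usually one step suffices
    s = beta * (X.T @ xi)
    p = np.exp(s - s.max()); p /= p.sum()       # softmax over patterns
    xi = X @ p

print(np.argmax(X.T @ xi))                      # -> 3, pattern retrieved
```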