Transformers ain't everything
tree-based methods still outperform deep learning on tabular data and medium-sized (~10k-sample) datasets
https://hal.science/hal-03723551
hal.science
Why do tree-based models still outperform deep learning on typical tabular data?
While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear. We contribute extensive benchmarks of standard and novel deep learning methods as well as tree-based models such as XGBoost and Random…
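To sanity-check the claim on a dataset of your own, a quick scikit-learn comparison along these lines works (the data here is synthetic, so the printed numbers themselves mean nothing):
```python
# Rough sanity check of the "trees vs. deep learning on tabular data" claim.
# Synthetic data only -- swap in your own ~10k-row tabular dataset for a real comparison.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=10_000, n_features=30, n_informative=10, random_state=0)

models = {
    "gradient-boosted trees": HistGradientBoostingClassifier(random_state=0),
    "small MLP": MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=200, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```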
https://arxiv.org/abs/2308.16898
Transformers as Support Vector Machines
TLDR ~ transformer layers act as SVMs with globally convergent gradient training, provided they are a) overparameterized and b) topped with nonlinear heads
my remark: this explains
1) why huge models are important (so the gradient is high-dimensional enough to be monotonic)
2) why attention (aka connections, aka indirections) is trainable at all;
and it says nothing about why they might generalize beyond the training data
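For reference, the core claim as I read the abstract (my notation, so take the exact conditions from the paper): gradient descent on a single attention head's combined weights W = KQ^T is argued to converge, in direction, to the solution of a hard-margin SVM that separates each input's selected token from the rest:
```latex
% rough paraphrase in my notation -- not the paper's exact statement
\min_{W}\ \|W\|_F
\quad \text{s.t.} \quad
(x_{i,\alpha_i} - x_{i,t})^\top W\, z_i \ \ge\ 1
\qquad \forall\, t \neq \alpha_i,\ \forall\, i
% x_{i,t}: tokens of example i, z_i: its query token,
% \alpha_i: index of the token the trained head should select
```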
lucid and useful example of the beginner's mindset https://www.oneusefulthing.org/p/embracing-weirdness-what-it-means
www.oneusefulthing.org
Embracing weirdness: What it means to use AI as a (writing) tool
AI is strange. We need to learn to use it.
ASMesh: Anonymous and Secure Messaging in Mesh Networks Using Stronger, Anonymous Double Ratchet
https://eprint.iacr.org/2023/1053
interesting, though unclear re: traffic/metadata analysis and power consumption, since the protocol is rather chatty
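On the "chatty" point: even the plain symmetric half of a double-ratchet scheme forces both sides to advance a key chain for every single message, so loss and reordering in a mesh have to be handled explicitly. A toy KDF chain to illustrate (NOT ASMesh's actual construction):
```python
# Toy symmetric KDF chain in the double-ratchet spirit (NOT the ASMesh construction):
# every message advances the chain, so sender and receiver must stay in lockstep --
# one reason such protocols get chatty over a lossy mesh.
import hmac, hashlib

def kdf(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive (next chain key, per-message key) from the current chain key."""
    next_ck = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    msg_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_ck, msg_key

ck = hashlib.sha256(b"shared secret from the handshake (placeholder)").digest()
for i in range(3):
    ck, mk = kdf(ck)
    print(f"message {i}: key {mk.hex()[:16]}...")
```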
NExT-GPT: Any-to-Any Multimodal LLM
TLDR: Vicuna (fine-tuned Llama) extended with basic image, audio and video understanding and generation. Impressive demo videos, though the Gradio-based demo is currently broken. Anybody up for deploying this somewhere to play with?
https://next-gpt.github.io/ + https://arxiv.org/pdf/2309.05519.pdf
❤1
https://blog.isosceles.com/the-webp-0day/ TLDR: fixed upstream, but it will be with us for some time wherever libwebp is bundled
Isosceles Blog
The WebP 0day
Early last week, Google released a new stable update for Chrome. The update included a single security fix that was reported by Apple's Security Engineering and Architecture (SEAR) team. The issue, CVE-2023-4863, was a heap buffer overflow in the WebP image…
🔥1
A stunning example of how efficient markets and arbitrage trading enabled by mobile phones help avoid waste, stabilize prices and increase welfare (fish markets in Kerala, India, 1997-2001)
https://www.jstor.org/stable/25098864
Mistral-7B is seriously good. And fast. And works with llama.cpp already!
(except for the sliding-window attention part, so the context is limited to 4-8K tokens or so)
https://twitter.com/arthurmensch/status/1707041835091165580
X (formerly Twitter)
Arthur Mensch on X
At @MistralAI we're releasing our very first model, the best 7B in town (outperforming Llama 13B on all metrics, and good at code), Apache 2.0.
We believe in open models and we'll push them to the frontier https://t.co/HjT5Xrvpxs
Very proud of the team…
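If you want to poke at it locally, something along these lines should work, assuming the llama-cpp-python bindings and a GGUF conversion of the model (the file name below is a placeholder); keep n_ctx modest until sliding-window attention is supported:
```python
# Minimal local run of a GGUF build of Mistral-7B via llama-cpp-python.
# The model path is a placeholder -- point it at whatever GGUF file you converted or downloaded.
from llama_cpp import Llama

llm = Llama(model_path="./mistral-7b.Q4_K_M.gguf", n_ctx=4096)  # no sliding-window attention yet, keep context modest
out = llm("Q: Why might a well-trained 7B model beat a 13B one?\nA:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"].strip())
```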
Oh. https://ml-jku.github.io/hopfield-layers/
via @kuu_channel
It’s beautiful. I wonder where the catch is, i.e. why Llama et al. don’t use Hopfield layers
hopfield-layers
Hopfield Networks is All You Need
Blog post
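The retrieval rule from the post is one line: the new state is the stored patterns weighted by a softmax over similarities, which is exactly scaled dot-product attention with the stored patterns as keys and values. That is probably a big part of the answer to "why doesn't Llama use it": in a sense it already does. Tiny numpy sketch (toy dimensions of my choosing):
```python
# One step of modern (continuous) Hopfield retrieval: xi_new = X @ softmax(beta * X^T @ xi).
# With learned projections for queries/keys/values this is just transformer attention.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))                 # 16 stored patterns of dimension 64 (columns of X)
xi = X[:, 3] + 0.3 * rng.normal(size=64)      # noisy probe of pattern 3
xi_new = X @ softmax(4.0 * (X.T @ xi))        # one retrieval step, inverse temperature beta = 4
print("retrieved pattern:", int(np.argmax(X.T @ xi_new)))  # expect 3
```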
Reviewer #2, step back!
82% of authors found the GPT-4 feedback more useful than feedback from (at least some) human reviewers
https://arxiv.org/abs/2310.01783
Nvidia might be the king of the hill right now, but the future of AI is reconfigurable analog-like electronics (already ~100x more energy efficient, a gap silicon would need at least another 10 years of Moore’s law to close)
Caveat: no backprop :P though forward-forward and other algorithms exist (rough sketch below)
https://www.nature.com/articles/s41928-023-01042-7
Nature
Reconfigurable mixed-kernel heterojunction transistors for personalized support vector machine classification
Nature Electronics - Dual-gated van der Waals heterojunction transistors can provide Gaussian, sigmoid and mixed-kernel functions for use in low-power machine learning classification operations.
🔥2
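For reference, the forward-forward idea in a few lines (my simplification of Hinton's local objective, not anything from the Nature paper): each layer is trained on its own to produce high "goodness" (sum of squared activations) on real data and low goodness on negative data, so no gradients ever flow between layers, which is what makes it attractive for analog hardware.
```python
# Forward-forward in miniature: a purely local objective per layer,
# high "goodness" (sum of squared activations) on positive data, low on negative data.
# Gradients stay inside the layer; there is no end-to-end backward pass.
import torch
import torch.nn.functional as F

def ff_layer_step(layer, opt, x_pos, x_neg, theta=2.0):
    g_pos = layer(x_pos).relu().pow(2).sum(dim=1)   # goodness on real samples
    g_neg = layer(x_neg).relu().pow(2).sum(dim=1)   # goodness on negative samples
    loss = F.softplus(theta - g_pos).mean() + F.softplus(g_neg - theta).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

layer = torch.nn.Linear(784, 256)
opt = torch.optim.SGD(layer.parameters(), lr=0.03)
x_pos = torch.randn(32, 784)          # stand-in for real data
x_neg = torch.randn(32, 784) * 3.0    # stand-in for corrupted / negative data
for step in range(5):
    print(ff_layer_step(layer, opt, x_pos, x_neg))
```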
TLDR: Placebo effects are dose-dependent.
We need structured belief engineering then
https://www.biorxiv.org/content/10.1101/2022.07.15.500226v2
bioRxiv
A thalamic circuit represents dose-like responses induced by nicotine-related beliefs in human smokers
Could non-pharmacological constructs, such as beliefs, impact brain activities in a dose-dependent manner as drugs do? While beliefs shape many aspects of our behavior and wellbeing, the precise mapping between subjective beliefs and neural substrates remains…
💊1
https://www.nature.com/articles/s41586-023-06668-3
interesting. code/models: https://github.com/brendenlake/MLC
Nature
Human-like systematic generalization through a meta-learning neural network
Nature - The meta-learning for compositionality approach achieves the systematicity and flexibility needed for human-like generalization.
generalization, continued:
> We argue that Transformers will generalize to harder instances on algorithmic tasks iff the algorithm can be written in the RASP-L programming language (Weiss et al). By design, each line of RASP-L code can be compiled into weights of 1 Transformer layer.
https://arxiv.org/abs/2310.16028
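For a feel of what "writable in RASP" means, here is a toy numpy rendering of the two core primitives from the original RASP paper (select/aggregate), used to compute a running token count. This is my illustration of the flavor only, not the RASP-L dialect from the paper:
```python
# Toy numpy version of RASP-style select/aggregate (after Weiss et al.), computing for each
# position a running count of the token "a". Illustrative only -- not the RASP-L dialect.
import numpy as np

def select(keys, queries, predicate):
    # selector[i, j] is True iff predicate(keys[j], queries[i]) -- the role of an attention pattern
    return np.array([[predicate(k, q) for k in keys] for q in queries])

def aggregate(selector, values):
    # per query position, average the selected values (what uniform attention would compute)
    sel = selector.astype(float)
    return (sel @ values) / np.maximum(sel.sum(axis=1), 1)

tokens = list("abcabaa")
idx = np.arange(len(tokens))
prefix = select(idx, idx, lambda k, q: k <= q)                  # attend to earlier-or-equal positions
frac_a = aggregate(prefix, np.array([t == "a" for t in tokens], dtype=float))
count_a = np.round(frac_a * (idx + 1)).astype(int)              # fraction * length so far = running count
print(count_a)  # [1 1 1 2 2 3 4]
```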
now these are really hallucinations lol
> ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image
https://kylesargent.github.io/zeronvs/
👍1
LLMs are far from being the first technology met with fear, uncertainty and doubt
https://pessimistsarchive.org
pessimistsarchive.org
Pessimists Archive
Archive of historical technological pessimism
big if it works well: the first paper that claims relatively efficient search on encrypted data without revealing what is being searched
https://eprint.iacr.org/2022/1703
❤1🔥1🤩1
related: "The Value of Privacy" (2006), in plain English
https://www.schneier.com/blog/archives/2006/05/the_value_of_pr.html
R-Tuning: Teaching Large Language Models to Refuse Unknown Questions
TLDR: LLMs "hallucinate" because the training datasets never included the "I don't know" answer 🤷
https://arxiv.org/pdf/2311.09677.pdf
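The fix, as I read the abstract, amounts to making refusal an explicit training target, roughly along these lines (a sketch of the idea; `model_answer` and the refusal phrasing are placeholders, and the paper's actual recipe differs in details):
```python
# Sketch of refusal-aware fine-tuning data construction in the spirit of R-Tuning:
# keep questions the base model already answers correctly as-is, and relabel the rest
# with an explicit "I don't know"-style target, so refusal actually appears in the training data.
# `model_answer` stands in for one inference call against your base model.
REFUSAL = "I am not sure I know the answer to this."

def build_refusal_aware_dataset(qa_pairs, model_answer):
    dataset = []
    for question, gold in qa_pairs:
        predicted = model_answer(question)
        if predicted.strip().lower() == gold.strip().lower():
            dataset.append({"prompt": question, "target": gold})      # "known" question: keep the answer
        else:
            dataset.append({"prompt": question, "target": REFUSAL})   # "unknown" question: teach refusal
    return dataset
```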
A chance to train on huge piles of retail GPUs, and not just on superclusters?
https://huggingface.co/papers/2311.08105
huggingface.co
Paper page - DiLoCo: Distributed Low-Communication Training of Language Models
❤2
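My understanding of the recipe (check the paper; step counts and optimizer choices below are illustrative): each worker runs many local AdamW steps on its own data shard, then the averaged parameter delta is treated as a pseudo-gradient by an outer optimizer, so workers only need to talk once every few hundred steps. Skeleton:
```python
# Skeleton of a DiLoCo-style round (my reading of the paper, not its code): many local
# inner-optimizer steps per worker, then one rare sync where the averaged parameter delta
# is applied as a pseudo-gradient by an outer optimizer (e.g. SGD with Nesterov momentum).
import copy
import torch

def diloco_round(global_model, worker_shards, make_inner_opt, outer_opt, inner_steps=500):
    deltas = []
    for shard in worker_shards:                         # in reality these loops run in parallel on separate machines
        local = copy.deepcopy(global_model)
        inner_opt = make_inner_opt(local.parameters())  # e.g. torch.optim.AdamW
        for batch in shard[:inner_steps]:
            loss = local(batch).mean()                  # placeholder loss; use your real objective
            inner_opt.zero_grad(); loss.backward(); inner_opt.step()
        deltas.append([g.detach() - l.detach()
                       for g, l in zip(global_model.parameters(), local.parameters())])

    outer_opt.zero_grad()
    for p, *worker_deltas in zip(global_model.parameters(), *deltas):
        p.grad = torch.stack(worker_deltas).mean(dim=0)  # averaged delta acts as the pseudo-gradient
    outer_opt.step()
```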
System 2 Attention (S2A).
- Soft attention in Transformers is susceptible to irrelevant/biased info
- S2A uses LLM reasoning to generate what to attend to
Improves factuality & objectivity, decreases sycophancy.
https://arxiv.org/abs/2311.11829
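In practice S2A is a two-pass prompting scheme; a rough sketch against a generic text-in/text-out `llm` function (the prompt wording is mine, not the paper's):
```python
# System 2 Attention as two LLM passes (rough sketch; `llm` is any prompt -> completion
# function, and the prompt wording here is mine, not the paper's).

def s2a_answer(llm, context, question):
    # Pass 1: regenerate the context, keeping only material relevant to the question
    # and dropping opinions / leading statements.
    cleaned = llm(
        "Rewrite the following text, keeping only information relevant to the question "
        "and removing opinions or irrelevant details.\n"
        f"Question: {question}\nText: {context}\nRewritten text:"
    )
    # Pass 2: answer from the regenerated context only, never the original.
    return llm(f"Context: {cleaned}\nQuestion: {question}\nAnswer:")
```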