interesting. the fact that it’s a hybrid embedding/prediction model sounds very… logical.
so you can chug along without attention just fine it seems
https://twitter.com/BlinkDL_AI/status/1638555109373378560?s=20
https://arxiv.org/pdf/2303.11366.pdf
Reflexion: an autonomous agent with dynamic memory and self-reflection
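roughly the loop, as I read it (my own sketch in Python, not their code; run_agent / evaluate / reflect are hypothetical stand-ins for LLM-backed calls):
```python
# minimal sketch of a Reflexion-style loop (my paraphrase of the paper's idea):
# act -> evaluate -> verbally reflect on the failure -> retry, keeping the
# reflections in a growing memory. run_agent / evaluate / reflect are
# hypothetical stand-ins, not the authors' code.
def reflexion_episode(task, run_agent, evaluate, reflect, max_trials=5):
    memory = []  # dynamic memory: accumulated natural-language self-reflections
    for _ in range(max_trials):
        trajectory = run_agent(task, reflections=memory)  # attempt the task
        success, feedback = evaluate(task, trajectory)    # env / heuristic signal
        if success:
            return trajectory, memory
        # turn the failure signal into a lesson and remember it for the next try
        memory.append(reflect(task, trajectory, feedback))
    return None, memory
```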
Map of Contemporaries:
The history of the world in famous people’s lifespans. Did you realize that Alessandro Volta was older than Napoleon? See which famous people shared their time on Earth.
https://ybogdanov.github.io/history-timeline/
https://twitter.com/_akhaliq/status/1645257919997394945
Dwarf Fortress got a serious competitor!
https://arxiv.org/abs/2302.10866
https://github.com/HazyResearch/safari
Convolutional LLM, hmmm.
> reaching Transformer quality with a 20% reduction in training compute required at sequence length 2K. Hyena operators are twice as fast as highly optimized attention at sequence length 8K, and 100x faster at sequence length 64K.
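for intuition on how a convolution can stand in for attention at these lengths: a causal convolution over the whole sequence costs O(L log L) via FFT instead of attention's O(L^2). toy numpy sketch below, not the actual Hyena operator (which parameterizes the filter implicitly and interleaves gating):
```python
import numpy as np

def causal_long_conv(u, h):
    """Causal convolution of a length-L signal u with a length-L filter h via
    FFT: O(L log L) instead of the O(L^2) of materialized attention. Toy
    illustration only, not the Hyena operator itself."""
    L = len(u)
    n = 2 * L                                 # zero-pad to avoid circular wrap-around
    U, H = np.fft.rfft(u, n), np.fft.rfft(h, n)
    return np.fft.irfft(U * H, n)[:L]         # keep the first L (causal) outputs

u = np.random.randn(64_000)                   # "64K sequence length"
h = np.exp(-np.arange(64_000) / 1000.0)       # some long decaying filter
y = causal_long_conv(u, h)
```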
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code (via kNN-search)
https://arxiv.org/abs/2305.01625
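the kNN trick, roughly (my sketch, not their code): index every encoder hidden state once, then at each cross-attention step retrieve only the top-k most similar states for the current query and softmax over just those:
```python
import numpy as np

def topk_cross_attention(query, encoder_states, k=16):
    """Rough sketch of the Unlimiformer idea: instead of attending over all
    encoder states (impossible for 'unlimited' inputs), retrieve the k nearest
    states for this query and attend only over them. Brute-force search here;
    the paper uses a kNN index."""
    scores = encoder_states @ query                  # dot-product relevance
    idx = np.argpartition(scores, -k)[-k:]           # top-k retrieval
    weights = np.exp(scores[idx] - scores[idx].max())
    weights /= weights.sum()                         # softmax over retrieved keys only
    return weights @ encoder_states[idx]             # attention output

states = np.random.randn(100_000, 64)  # hidden states for a very long input
q = np.random.randn(64)
out = topk_cross_attention(q, states)
```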
https://arxiv.org/abs/2305.07759
TinyStories: 3-30M (not G) parameter model with coherent English from a curated dataset.
Don't expect it to code but curious if this is usable as a LoRA or similar baseline - also need to look closer at their tokenizer setup, must be way different from GPT
phi-1: with higher-quality training data, a 1.3B-parameter model trained on ~7B tokens can be quite competitive with GPT-4 and other 100x larger models on coding tasks
https://arxiv.org/abs/2306.11644
Welcome to the dark side of cyberpunk.
Once-theoretical timing attacks are now a reality.
(TLDR: Cops still can't decrypt the messages, but they can track who's chatting with whom by comparing the small spikes of traffic as messages get delivered)
https://wz.ax/timing-is-real
NY Times
Cracking Down on Dissent, Russia Seeds a Surveillance Supply Chain
Russia is incubating a cottage industry of new digital surveillance tools to suppress domestic opposition to the war in Ukraine. The tech may also be sold overseas.
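to make the "small spikes of traffic" point concrete, a toy correlation on synthetic timestamps (illustration only, not any real tooling): if A's send times keep being followed within a fraction of a second by B's receive times, the metadata alone links them:
```python
import numpy as np

def delivery_correlation(send_times, recv_times, tolerance=0.5):
    """Toy illustration of traffic-timing correlation: what fraction of A's
    outgoing spikes are matched by an incoming spike at B shortly afterwards.
    High values suggest A is messaging B, even though the payload stays
    encrypted. Synthetic sketch only."""
    matched = sum(
        np.any((recv_times > t) & (recv_times < t + tolerance))
        for t in send_times
    )
    return matched / len(send_times)

rng = np.random.default_rng(0)
sends = np.sort(rng.uniform(0, 3600, 50))        # A's send times over an hour (s)
recvs = sends + rng.uniform(0.05, 0.3, 50)       # B receives each shortly after
noise = np.sort(rng.uniform(0, 3600, 500))       # unrelated background traffic
print(delivery_correlation(sends, np.concatenate([recvs, noise])))  # ≈ 1.0
print(delivery_correlation(sends, noise))        # much lower baseline
```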
why do they call this an ‘attack’? this is the way to set the model free!
(TLDR: DAN prompt generator)
https://arxiv.org/abs/2307.15043
https://llm-attacks.org
Masked Trajectory Models for Prediction, Representation, and Control
TLDR: Transformers using state and action embeddings as tokens are better at RL than, um, RL algorithms. Oops.
https://arxiv.org/abs/2305.02968
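the setup, as far as I can tell from the abstract (my sketch, not their code): flatten a trajectory into interleaved state/action embedding tokens, mask a random subset, and train a bidirectional transformer to reconstruct the masked tokens; choosing the mask at inference time (e.g. hide future actions) turns the same model into a policy, a dynamics model, etc.
```python
import numpy as np

def masked_trajectory_batch(states, actions, mask_ratio=0.5, seed=0):
    """Sketch of masked trajectory modeling data prep: interleave state and
    action embeddings into one token sequence (s1, a1, s2, a2, ...), then mask
    a random subset. A transformer would be trained to reconstruct the masked
    tokens from the visible ones."""
    T, d = states.shape
    tokens = np.empty((2 * T, d))
    tokens[0::2], tokens[1::2] = states, actions   # interleave s/a tokens
    rng = np.random.default_rng(seed)
    mask = rng.random(2 * T) < mask_ratio          # True = hidden from the model
    inputs = np.where(mask[:, None], 0.0, tokens)  # masked tokens zeroed out
    return inputs, tokens, mask                    # (model input, targets, mask)

states = np.random.randn(100, 32)    # toy embedded states
actions = np.random.randn(100, 32)   # toy embedded actions (same dim for simplicity)
x, y, m = masked_trajectory_batch(states, actions)
```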
Pushing the Limits of Machine Design: Automated CPU Design with AI
By efficiently exploring a search space of unprecedented size 10^10^540 (note: reducible to 10^6), and thus pushing the limits of machine design, our approach generates an industrial-scale RISC-V CPU within only 5 hours.
> which sheds some light on building a self-evolving machine to beat the CPU designed by humanity eventually.
> The automatically designed CPU was sent to the manufacturer in December 2021
https://arxiv.org/abs/2306.12456
Teaching Arithmetic to Small Transformers
an interesting detailed write-up
https://arxiv.org/abs/2307.03381
Transformers ain't everything
tree-based methods still outperform DL on typical tabular data and medium-sized (~10k samples) datasets
https://hal.science/hal-03723551
hal.science
Why do tree-based models still outperform deep learning on typical tabular data?
While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear. We contribute extensive benchmarks of standard and novel deep learning methods as well as tree-based models such as XGBoost and Random…
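easy to sanity-check the flavor of this yourself; an illustrative sklearn comparison on a synthetic ~10k-sample tabular task (not the paper's benchmark suite):
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# illustrative comparison only: a ~10k-sample synthetic tabular task with
# redundant / uninformative features, i.e. the medium-data regime where the
# paper finds tree ensembles keep winning
X, y = make_classification(n_samples=10_000, n_features=30, n_informative=10,
                           n_redundant=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=300,
                    random_state=0).fit(X_tr, y_tr)

print("random forest:", forest.score(X_te, y_te))
print("plain MLP:    ", mlp.score(X_te, y_te))
```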
https://arxiv.org/abs/2308.16898
Transformers as Support Vector Machines
TLDR ~ transformer layers are SVMs w/gradient-trainable global convergence, when a) overparameterized b) have nonlinear heads
my remark: this explains
1) why huge models are important (so the gradient is high-dimensional enough to be monotonic)
2) why attention (aka connections, aka indirections) is trainable at all;
and says nothing about why they might generalize beyond the dataset
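for reference, the max-margin problem they relate attention to looks roughly like this (my paraphrase, notation approximate): gradient descent on the attention weights W of a one-layer attention model, under caveats a) and b) above, converges in direction to the minimum-norm W that separates each sequence's selected token x_{i,opt_i} from the other tokens x_{i,t}, given the query token z_i:
```latex
% rough paraphrase of the paper's attention/SVM correspondence, not its exact statement
\min_{W}\ \|W\|
\quad \text{s.t.} \quad
\left(x_{i,\mathrm{opt}_i} - x_{i,t}\right)^{\top} W\, z_i \;\ge\; 1
\qquad \forall\, i,\ \forall\, t \neq \mathrm{opt}_i
```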
lucid and useful example of the beginner's mindset https://www.oneusefulthing.org/p/embracing-weirdness-what-it-means
www.oneusefulthing.org
Embracing weirdness: What it means to use AI as a (writing) tool
AI is strange. We need to learn to use it.