Linkstream
Various links I find interesting. Mostly hardcore tech :) // by @oleksandr_now. See @notatky for the personal stuff
https://arxiv.org/abs/2305.07759
TinyStories: a 3-30M (that's M, not B) parameter model that produces coherent English, trained on a curated dataset.

Don't expect it to code, but curious whether this is usable as a LoRA or similar baseline. Also need to look closer at their tokenizer setup; it must be way different from GPT's.
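Not from the paper, just a sketch of what "usable as a LoRA baseline" could look like with HF transformers + peft, plus a peek at their tokenizer. The checkpoint name and target_modules are my assumptions; adjust to whichever TinyStories variant you actually grab.

```python
# Hedged sketch: attach a LoRA adapter to a TinyStories checkpoint and inspect its tokenizer.
# "roneneldan/TinyStories-33M" and target_modules below are assumptions, not from the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "roneneldan/TinyStories-33M"            # assumed HF checkpoint name
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

print(tok)                                     # vocab size / special tokens vs. GPT-2's ~50k BPE

lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],       # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()             # adapter params vs. the ~33M frozen base
```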
phi-1: with higher-quality datasets, a model with 1.3B parameters trained on ~7B tokens can be quite competitive with GPT-4 and other 100x larger models on coding tasks
https://arxiv.org/abs/2306.11644
Welcome to the dark side of cyberpunk.
Once-theoretical timing attacks are now a reality.
(TLDR: Cops still can't decrypt the messages, but they can track who's chatting with whom by comparing the small spikes of traffic as messages get delivered)
https://wz.ax/timing-is-real
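A toy illustration of the correlation trick as I understand it (my own code and synthetic numbers, not from the article): if A's upload spikes are consistently followed a moment later by B's download spikes, the two are probably chatting, even though the payloads stay encrypted.

```python
# Toy traffic-correlation sketch: encrypted payloads, but timing leaks who talks to whom.
import numpy as np

def spike_times(traffic, threshold):
    """Seconds at which observed traffic volume exceeds a baseline threshold."""
    return np.flatnonzero(traffic > threshold)

def match_score(sender_spikes, receiver_spikes, max_delay=2):
    """Fraction of sender spikes followed by a receiver spike within max_delay seconds."""
    if len(sender_spikes) == 0:
        return 0.0
    hits = sum(np.any((receiver_spikes >= t) & (receiver_spikes <= t + max_delay))
               for t in sender_spikes)
    return hits / len(sender_spikes)

rng = np.random.default_rng(0)
a_up, b_down = rng.poisson(1.0, 600), rng.poisson(1.0, 600)   # background noise, per second
msgs = rng.choice(590, 20)                                    # 20 messages from A to B
a_up[msgs] += 50                                              # A uploads a message...
b_down[msgs + 1] += 50                                        # ...B receives it ~1 s later

print(match_score(spike_times(a_up, 10), spike_times(b_down, 10)))  # ~1.0 for a chatting pair
```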
why do they call this an ‘attack’? this is the way to set the model free!

(TLDR: DAN prompt generator)
https://arxiv.org/abs/2307.15043
https://llm-attacks.org
Masked Trajectory Models for Prediction, Representation, and Control

TLDR: Transformers fed state and action embeddings as tokens do better at RL than, um, RL algorithms. Oops. (Toy sketch after the link.)

https://arxiv.org/abs/2305.02968
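A minimal sketch of the idea as I read it (my code, not the authors'; positional embeddings and the training loop are left out): interleave state and action embeddings as tokens, mask a subset, and train a transformer to reconstruct them. Masking future actions then turns the same model into a policy; masking future states, into a dynamics model.

```python
# Masked trajectory modeling, toy version: states and actions become interleaved tokens.
import torch
import torch.nn as nn

class MaskedTrajectoryModel(nn.Module):
    def __init__(self, state_dim, action_dim, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.state_in = nn.Linear(state_dim, d_model)
        self.action_in = nn.Linear(action_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.state_out = nn.Linear(d_model, state_dim)
        self.action_out = nn.Linear(d_model, action_dim)

    def forward(self, states, actions, mask):
        # states: (B, T, state_dim), actions: (B, T, action_dim), mask: (B, 2T) bool
        tokens = torch.stack([self.state_in(states), self.action_in(actions)], dim=2)
        tokens = tokens.flatten(1, 2)                        # (B, 2T, d): s1, a1, s2, a2, ...
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        h = self.encoder(tokens)
        return self.state_out(h[:, 0::2]), self.action_out(h[:, 1::2])

# Training would minimize reconstruction loss on the masked positions only.
```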
Pushing the Limits of Machine Design: Automated CPU Design with AI
By efficiently exploring a search space of unprecedented size 10^10^540 (note: reducible to 10^6), and thus pushing the limits of machine design, our approach generates an industrial-scale RISC-V CPU within only 5 hours.

> which sheds some light on building a self-evolving machine to beat the CPU designed by humanity eventually.
> The automatically designed CPU was sent to the manufacturer in December 2021
https://arxiv.org/abs/2306.12456
Teaching Arithmetic to Small Transformers
an interesting detailed write-up

https://arxiv.org/abs/2307.03381
https://arxiv.org/abs/2308.16898
Transformers as Support Vector Machines
TLDR ~ attention layers are effectively SVMs, and gradient descent provably converges globally when they are a) overparameterized and b) equipped with nonlinear heads

my remark: this explains
1) why huge models are important (so the gradient is high-dimensional enough to be monotonic)
2) why attention (aka connections, aka indirections) is trainable at all;
and says nothing about why they might generalize beyond the dataset
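For reference, my rough sketch of the claimed equivalence (notation mine, conditions heavily simplified; see the paper for the actual statement): gradient descent on the key-query matrix W of a single softmax-attention layer converges in direction to a hard-margin problem that separates each sequence's "optimal" token from the rest.

```latex
% Rough sketch, not the paper's exact formulation:
% x_{it} are the tokens of sequence i, z_i the query, \alpha_i the token the layer should select.
\begin{aligned}
\min_{W}\quad & \lVert W \rVert_F \\
\text{s.t.}\quad & (x_{i\alpha_i} - x_{it})^{\top} W\, z_i \;\ge\; 1
  \qquad \text{for every sequence } i \text{ and every token } t \neq \alpha_i .
\end{aligned}
```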
ASMesh: Anonymous and Secure Messaging in Mesh Networks Using Stronger, Anonymous Double Ratchet
https://eprint.iacr.org/2023/1053

interesting, though it's unclear how it holds up against traffic/metadata analysis, and what the power cost is, given that the protocol is rather chatty
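For context on the "double ratchet" part, a toy version of its symmetric half (my illustration, not ASMesh's construction, and not production crypto): each message advances a KDF chain, so every message gets a fresh key and old keys can be deleted for forward secrecy.

```python
# Toy symmetric KDF chain, the per-message half of a double ratchet. Illustration only.
import hmac, hashlib

def kdf(chain_key: bytes, label: bytes) -> bytes:
    return hmac.new(chain_key, label, hashlib.sha256).digest()

def ratchet_step(chain_key: bytes):
    """Derive a one-time message key and the next chain key from the current chain key."""
    return kdf(chain_key, b"msg"), kdf(chain_key, b"chain")

ck = hashlib.sha256(b"shared secret from the DH handshake").digest()
for i in range(3):
    mk, ck = ratchet_step(ck)
    print(i, mk.hex()[:16])   # fresh key per message; old keys can't be recomputed from ck
```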
NExT-GPT: Any-to-Any Multimodal LLM
TLDR: Vicuna (a fine-tuned Llama) extended with basic image, audio, and video understanding and generation. Impressive demo videos, though the Gradio-based demo is currently broken. Anybody up for deploying this somewhere to play with?
https://next-gpt.github.io/ + https://arxiv.org/pdf/2309.05519.pdf
A stunning example of how efficient markets and arbitrage trading, enabled by mobile phones, help avoid waste, stabilize prices, and increase welfare (fish markets in Kerala, India, 1997-2001)
https://www.jstor.org/stable/25098864
Oh. https://ml-jku.github.io/hopfield-layers/
via @kuu_channel

It’s beautiful. I wonder where the catch is, i.e. why Llama et al. don’t use Hopfield layers
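A minimal numpy sketch of the modern Hopfield update from that page (numbers below are made up): one retrieval step is xi ← X·softmax(β·Xᵀxi), which has exactly the shape of a softmax-attention lookup over the stored patterns.

```python
# Modern (continuous) Hopfield retrieval: xi_new = X @ softmax(beta * X.T @ xi)
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def hopfield_retrieve(X, xi, beta=8.0, steps=3):
    """X: (d, N) stored patterns as columns; xi: (d,) noisy query/state vector."""
    for _ in range(steps):
        xi = X @ softmax(beta * (X.T @ xi))   # same form as one softmax-attention lookup
    return xi

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 10))                    # 10 random stored patterns
noisy = X[:, 3] + 0.3 * rng.standard_normal(64)      # corrupted copy of pattern 3
recovered = hopfield_retrieve(X, noisy)
print(np.corrcoef(recovered, X[:, 3])[0, 1])         # close to 1: pattern 3 is retrieved
```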
Reviewer #2, step back!
82% of authors found the GPT-4 feedback more useful than feedback from (at least some) human reviewers

https://arxiv.org/abs/2310.01783
nvidia might be the king of the hill right now, but the future of AI is reconfigurable analog-like electronics (~100x more energy efficient already, a gap that would take silicon at least another 10 years of Moore's law to close)

Caveat: no backprop :P though forward-forward and other algorithms exist (toy sketch below)

https://www.nature.com/articles/s41928-023-01042-7
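Toy sketch of the forward-forward idea mentioned above (my simplification of Hinton's proposal; nothing to do with the Nature paper's hardware): each layer is trained locally to give high "goodness" (sum of squared activations) on positive data and low goodness on negative data, so no gradient ever crosses a layer boundary.

```python
# Forward-forward, toy version: purely local, per-layer training.
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # pass only the direction of the activity upward, not its magnitude
        x = x / (x.norm(dim=1, keepdim=True) + 1e-6)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)   # goodness on positive data
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)   # goodness on negative data
        # push positive goodness above the threshold and negative goodness below it
        loss = torch.nn.functional.softplus(
            torch.cat([self.threshold - g_pos, g_neg - self.threshold])).mean()
        self.opt.zero_grad(); loss.backward(); self.opt.step()
        # detach outputs: the next layer trains on them but no gradient flows back here
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

# Stacking FFLayers and calling train_step layer by layer trains a deep net
# with local updates only: no end-to-end backprop.
```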