Linkstream
Various links I find interesting. Mostly hardcore tech :) // by @oleksandr_now. See @notatky for the personal stuff
hmm, the folks at Stanford claim that LLaMA 7B is enough for instruction tuning. good! waiting for the fine-tuning code 🧐

https://crfm.stanford.edu/2023/03/13/alpaca.html
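A minimal sketch of what that supervised instruction-tuning step usually looks like, for when the code drops; the prompt template, tokenizer interface, and field names here are my assumptions, not Alpaca's released recipe:

```python
# Hedged sketch: generic instruction tuning on (instruction, output) pairs.
# The prompt template and the HF-style tokenizer are placeholders, not Alpaca's code.

PROMPT = "### Instruction:\n{instruction}\n\n### Response:\n"

def build_example(tokenizer, instruction, output, max_len=512):
    """Tokenize prompt + response; mask the prompt so the loss only covers the response."""
    prompt_ids = tokenizer.encode(PROMPT.format(instruction=instruction))
    response_ids = tokenizer.encode(output) + [tokenizer.eos_token_id]
    input_ids = (prompt_ids + response_ids)[:max_len]
    labels = ([-100] * len(prompt_ids) + response_ids)[:max_len]  # -100 = ignored by the loss
    return {"input_ids": input_ids, "labels": labels}
```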
while the public is ranting, Bellard ships

ts_server is a web server exposing a REST API to large language models. They can be used, for example, for text completion, question answering, classification, chat, translation, image generation, ...
https://wz.ax/textsynth-server
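A quick sketch of hitting a local ts_server instance for text completion; the port, endpoint path, engine name, and JSON fields follow the TextSynth-style API as I understand it, so check the ts_server README for the actual interface:

```python
# Hedged sketch of calling a completion endpoint on a local ts_server instance.
import json
import urllib.request

def complete(prompt, max_tokens=100, host="http://localhost:8080", engine="gptj_6B"):
    req = urllib.request.Request(
        f"{host}/v1/engines/{engine}/completions",
        data=json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]

print(complete("The three laws of robotics are"))
```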
interesting. the fact that it’s a hybrid embedding/prediction model sounds very… logical.

so you can chug along without attention just fine, it seems

https://twitter.com/BlinkDL_AI/status/1638555109373378560?s=20
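For context, the model behind that thread (RWKV) swaps attention for a recurrence over keys and values. A toy, numerically naive sketch of that kind of attention-free token mixing, not the exact RWKV formulation:

```python
# Toy sketch of attention-free token mixing via an exponential-decay recurrence,
# in the spirit of RWKV's WKV operator (simplified and numerically naive).
import numpy as np

def wkv_mix(k, v, w=0.9):
    """k, v: (seq_len, d). Running decayed sums stand in for the attention matrix."""
    num = np.zeros_like(v[0])   # decayed sum of exp(k_i) * v_i
    den = np.zeros_like(k[0])   # decayed sum of exp(k_i)
    out = np.empty_like(v)
    for t in range(len(k)):
        num = w * num + np.exp(k[t]) * v[t]
        den = w * den + np.exp(k[t])
        out[t] = num / (den + 1e-9)
    return out
```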
https://arxiv.org/pdf/2303.11366.pdf

Reflexion: an autonomous agent with dynamic memory and self-reflection
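The idea, roughly: the agent acts, checks whether it succeeded, writes a verbal self-reflection into memory, and retries with that memory in context. A hedged sketch of that loop; `agent`, `evaluate`, and `reflect` are placeholders for LLM calls and a task checker, not the authors' code:

```python
# Hedged sketch of a Reflexion-style trial loop with dynamic memory.
def reflexion_loop(task, agent, evaluate, reflect, max_trials=5):
    memory = []  # accumulated self-reflections, fed back into the next prompt
    for trial in range(max_trials):
        attempt = agent(task, reflections=memory)
        success, feedback = evaluate(task, attempt)
        if success:
            return attempt
        # Ask the model what went wrong and store the answer as dynamic memory.
        memory.append(reflect(task, attempt, feedback))
    return attempt
```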
Map of Contemporaries:
The history of the world in famous people’s lifespans. Did you realize that Alessandro Volta outlived Napoleon? See which famous people shared their time on Earth.
https://ybogdanov.github.io/history-timeline/
https://twitter.com/_akhaliq/status/1645257919997394945
Dwarf Fortress got a serious competitor!
https://arxiv.org/abs/2302.10866
https://github.com/HazyResearch/safari
Convolutional LLM, hmmm.
> reaching Transformer quality with a 20% reduction in training compute required at sequence length 2K. Hyena operators are twice as fast as highly optimized attention at sequence length 8K, and 100x faster at sequence length 64K.
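The core trick is replacing attention with implicit long convolutions evaluated via FFT, which is where the long-sequence speedups come from. A bare-bones sketch of such an FFT convolution; the paper's implicit filter parametrization and gating are omitted:

```python
# Bare-bones FFT-based causal long convolution, the kind of operator Hyena builds on.
import numpy as np

def fft_causal_conv(u, h):
    """u: input (seq_len, d); h: per-channel filter (seq_len, d). O(L log L) vs O(L^2) attention."""
    L = u.shape[0]
    n = 2 * L  # zero-pad so the circular convolution equals a linear (causal) one
    U = np.fft.rfft(u, n=n, axis=0)
    H = np.fft.rfft(h, n=n, axis=0)
    return np.fft.irfft(U * H, n=n, axis=0)[:L]
```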
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code (via kNN-search)
https://arxiv.org/abs/2305.01625
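The mechanism, as I read it: index all encoder hidden states, and at each decoding step retrieve only the top-k nearest ones for cross-attention instead of attending over the whole input. A toy sketch with brute-force kNN; the paper uses a real index and a reformulated attention:

```python
# Toy sketch of kNN-limited cross-attention in the spirit of Unlimiformer:
# attend only to the k most similar encoder states per query.
import numpy as np

def knn_cross_attention(query, encoder_states, k=16):
    """query: (d,); encoder_states: (n_tokens, d) from an arbitrarily long input."""
    scores = encoder_states @ query                 # similarity to every input token
    top = np.argpartition(scores, -k)[-k:]          # indices of the k best keys
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                        # softmax over the retrieved subset only
    return weights @ encoder_states[top]
```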
Insect–Machine Interface Based Neurocybernetics
Spy bugs! 2009!
https://wz.ax/cybugs09
https://arxiv.org/abs/2305.07759
TinyStories: 3-30M (not G) parameter model with coherent English from a curated dataset.

Don't expect it to code, but I'm curious whether it's usable as a base for LoRA or a similar baseline. Also need to look closer at their tokenizer setup; it must be quite different from GPT's.
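For reference, the LoRA idea mentioned above: freeze the base weight and train only a low-rank update on top of it. A minimal sketch, not tied to any particular library or to the TinyStories checkpoints:

```python
# Minimal sketch of a LoRA-style linear layer: the pretrained weight W stays frozen,
# only the low-rank factors A and B are trained.
import numpy as np

class LoRALinear:
    def __init__(self, W, r=8, alpha=16):
        self.W = W                                  # frozen pretrained weight, (d_out, d_in)
        d_out, d_in = W.shape
        self.A = np.random.randn(r, d_in) * 0.01    # trainable low-rank factor
        self.B = np.zeros((d_out, r))               # starts at zero, so the update begins as a no-op
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)
```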
phi-1: with higher-quality datasets, a model with 1.3B parameters trained on 7B tokens can be quite competitive with GPT-4 and other 100x larger models on coding tasks
https://arxiv.org/abs/2306.11644
Welcome to the dark side of cyberpunk.
Once-theoretical timing attacks are now a reality.
(TLDR: Cops still can't decrypt the messages, but they can track who's chatting with whom by comparing the small spikes of traffic as a message gets delivered)
https://wz.ax/timing-is-real
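The correlation itself is almost embarrassingly simple: take per-user timestamps of traffic spikes and count how often one user's upload spike is followed, within a small window, by another user's download spike. A toy sketch of that idea, with made-up numbers:

```python
# Toy sketch of the traffic-correlation idea behind such timing attacks.
# Timestamps are in seconds; real metadata would come from network captures.
def correlation_score(a_sends, b_receives, window=1.0):
    hits = sum(
        any(0 <= rx - tx <= window for rx in b_receives)
        for tx in a_sends
    )
    return hits / max(len(a_sends), 1)   # fraction of A's spikes with a matching spike at B

# High scores over many observations suggest A is messaging B,
# even though the message contents stay encrypted.
print(correlation_score([10.0, 42.5, 90.1], [10.3, 42.9, 91.0]))
```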
why do they call this an ‘attack’? this is the way to set the model free!

(TLDR: an automated DAN-style jailbreak prompt generator)
https://arxiv.org/abs/2307.15043
https://llm-attacks.org
Masked Trajectory Models for Prediction, Representation, and Control

TLDR: Transformers fed state and action embeddings as tokens are better at RL than, um, RL algorithms. Oops.

https://arxiv.org/abs/2305.02968
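The recipe, roughly: embed states and actions, interleave them into one token sequence, randomly mask part of it, and train the transformer to reconstruct what's hidden. A toy sketch of that data preparation; the embedding and masking details are my simplification, not the paper's exact scheme:

```python
# Toy sketch of masked-trajectory data prep: interleave state/action embeddings
# into one sequence and randomly mask positions for reconstruction.
import numpy as np

def make_masked_trajectory(states, actions, mask_ratio=0.5, seed=0):
    """states: (T, d), actions: (T, d), already embedded to a common width d."""
    tokens = np.empty((2 * len(states), states.shape[1]))
    tokens[0::2] = states                          # s_0, a_0, s_1, a_1, ...
    tokens[1::2] = actions
    rng = np.random.default_rng(seed)
    mask = rng.random(len(tokens)) < mask_ratio    # True = hide this token from the model
    inputs = np.where(mask[:, None], 0.0, tokens)  # masked positions zeroed (stand-in for a mask token)
    return inputs, tokens, mask                    # train the model to reconstruct `tokens` where masked
```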