hmm, guys from Stanford claim that for instruction-tuning LLaMA 7B is enough. good! waiting for the fine-tuning code 🧐
https://crfm.stanford.edu/2023/03/13/alpaca.html
Forwarded from Dmytro S
An educational software system of a tiny self-compiling C compiler, a tiny self-executing RISC-V emulator, and a tiny self-hosting RISC-V hypervisor. - cksystemsteaching/selfie
while the public is ranting, Bellard ships
ts_server is a web server proposing a REST API to large language models. They can be used for example for text completion, question answering, classification, chat, translation, image generation, ...
https://wz.ax/textsynth-server
interesting. the fact that it’s a hybrid embedding/prediction model sounds very… logical.
so you can chug along without attention just fine it seems
https://twitter.com/BlinkDL_AI/status/1638555109373378560?s=20
https://arxiv.org/pdf/2303.11366.pdf
Reflexion: an autonomous agent with dynamic memory and self-reflection
Map of Contemporaries:
The history of the world in famous people’s lifespans. Did you realize that Alessandro Volta was younger than Napoleon? See which famous people shared their time on Earth.
https://ybogdanov.github.io/history-timeline/
https://twitter.com/_akhaliq/status/1645257919997394945
Dwarf Fortress got a serious competitor!
https://arxiv.org/abs/2302.10866
https://github.com/HazyResearch/safari
Convolutional LLM, hmmm.
> reaching Transformer quality with a 20% reduction in training compute required at sequence length 2K. Hyena operators are twice as fast as highly optimized attention at sequence length 8K, and 100x faster at sequence length 64K.
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code (via kNN-search)
https://arxiv.org/abs/2305.01625
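a rough sketch of the kNN idea, as I understand it: instead of attending over every encoder state, each attention query retrieves only its top-k nearest keys and runs softmax over those. Everything below is an assumption for illustration (the function name `topk_attention`, the brute-force search, the toy vectors); it is not the paper's code, and a real setup would use an approximate nearest-neighbour index rather than scanning all keys.

```python
import heapq
import math


def topk_attention(query, keys, values, k):
    """Attend only to the k keys with the highest dot-product score.

    Hypothetical helper: brute-force search stands in for a kNN index.
    Returns the attended output vector and the sorted retrieved indices.
    """
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Retrieve the indices of the k best-scoring keys.
    top = heapq.nlargest(k, range(len(keys)), key=scores.__getitem__)
    # Softmax over the retrieved scores only (shifted by the max for stability).
    best = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - best) for i in top]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for d, v in enumerate(values[i]):
            out[d] += (w / total) * v
    return out, sorted(top)


# Toy data: four "encoder states", one query close to keys 0 and 2.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
output, retrieved = topk_attention([1.0, 0.0], keys, values, k=2)
print(retrieved, output)
```

the point is that cost per query depends on k, not on the input length, so the input can grow without touching the model weights.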
https://arxiv.org/abs/2305.07759
TinyStories: 3-30M (not G) parameter model with coherent English from a curated dataset.
Don't expect it to code but curious if this is usable as a LoRA or similar baseline - also need to look closer at their tokenizer setup, must be way different from GPT
phi-1: with higher-quality datasets, a model with 1.3B parameters trained on 7B tokens can be quite competitive with GPT-4 and other 100x larger models on coding tasks
https://arxiv.org/abs/2306.11644
Welcome to the dark side of the cyberpunk.
Once-theoretical timing attacks are now a reality.
(TLDR: Cops still can't decrypt the messages, but they can track who's chatting with whom by comparing the small traffic spikes that appear as each message gets delivered)
https://wz.ax/timing-is-real
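to make the spike comparison concrete, here is a toy sketch: given timestamps of outgoing traffic from one user and incoming traffic to another, count how often a send is followed by a receive within a short window. The function name, the 0.5 s window, and all timestamps are made up for illustration; this is not how any real surveillance tool is known to work, just the general shape of a timing correlation.

```python
def count_coincidences(sent_times, received_times, max_delay=0.5):
    """Count sends followed by a receive within max_delay seconds.

    Hypothetical helper: each receive event is matched to at most one send.
    """
    received = sorted(received_times)
    hits = 0
    j = 0
    for t in sorted(sent_times):
        # Skip receive events that happened before this send.
        while j < len(received) and received[j] < t:
            j += 1
        if j < len(received) and received[j] <= t + max_delay:
            hits += 1
            j += 1  # consume the matched receive event
    return hits


# Made-up traffic spike timestamps, in seconds.
alice_sends = [10.0, 42.5, 97.1, 130.4]
bob_receives = [10.2, 42.8, 97.3, 130.5]
carol_receives = [5.0, 60.0, 110.0, 200.0]

for name, spikes in [("bob", bob_receives), ("carol", carol_receives)]:
    score = count_coincidences(alice_sends, spikes) / len(alice_sends)
    print(name, score)
```

no message content is needed for this: only the timing of the spikes, which is why encryption alone doesn't help.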
NY Times
Cracking Down on Dissent, Russia Seeds a Surveillance Supply Chain
Russia is incubating a cottage industry of new digital surveillance tools to suppress domestic opposition to the war in Ukraine. The tech may also be sold overseas.
why do they call this an ‘attack’? this is the way to set the model free!
(TLDR: DAN prompt generator)
https://arxiv.org/abs/2307.15043
https://llm-attacks.org
Masked Trajectory Models for Prediction, Representation, and Control
TLDR: Transformers using state space and action embeddings as tokens are better at RL than, um, RL algorithms. Oops.
https://arxiv.org/abs/2305.02968
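a toy sketch of the "states and actions as tokens" setup, as I read it: a trajectory is interleaved into one token sequence, random positions are hidden during training, and at inference a chosen mask pattern decides what the model has to fill in. All names here (`interleave`, `random_mask`, `last_action_mask`) and the string placeholders are hypothetical, and the model itself is left out; this only shows the data layout, not the paper's implementation.

```python
import random


def interleave(states, actions):
    """Turn a trajectory into one sequence: s0, a0, s1, a1, ..."""
    tokens = []
    for s, a in zip(states, actions):
        tokens.append(("state", s))
        tokens.append(("action", a))
    return tokens


def random_mask(num_tokens, mask_ratio, rng):
    """Training-style mask: hide a random subset (True means visible)."""
    hidden = set(rng.sample(range(num_tokens), int(num_tokens * mask_ratio)))
    return [i not in hidden for i in range(num_tokens)]


def last_action_mask(num_steps):
    """Inference-style mask: hide only the final action, so filling it in
    acts like asking the model for the next action."""
    return [True] * (2 * num_steps - 1) + [False]


def apply_mask(tokens, visible):
    """Replace hidden token payloads with None, keeping the token kind."""
    return [tok if keep else (tok[0], None) for tok, keep in zip(tokens, visible)]


tokens = interleave(["s0", "s1", "s2"], ["a0", "a1", "a2"])
print(apply_mask(tokens, random_mask(len(tokens), 0.5, random.Random(0))))
print(apply_mask(tokens, last_action_mask(3)))
```

the same trained network can then be queried with different masks, which is what makes it look like a drop-in for several RL components at once.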