Singular Thinker – Telegram
Singular Thinker
In this channel we talk about things we find interesting, or share other people's interesting posts.
Contact:
@Singular_Thinker
"ریاضیات چیزی نیست جز بیان چیزهای یکسان به روش‌های مختلف."

This sentence, a quote from a mathematician, comes from the last part of Amir Asghari's blog post about the magic of the multiplication table. Both the sentence and the post are interesting; have a look if you like.

https://math.omidedu.org/magic-of-multiplication-table/

#math
@SingularThinker
👍6
Singular Thinker
Photo
#Memes are back again. I really liked the one about the interesting paper abstract and the Schmidhuber one.

@SingularThinker
What is a Hilbert space, really?

It is just a fancy name for a Cauchy-complete inner product vector space.

Wait what? I'll explain.

First, why do we need an inner product vector space?

Inner product vector spaces carry more physical meaning and are more useful than an arbitrary vector space: the inner product gives us notions of length, angle, and orthogonality.
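
For concreteness, here is the standard definition in LaTeX (my own quick summary, not tied to any particular reference): an inner product on a vector space V is conjugate-symmetric, linear in its first argument, and positive definite, and it automatically gives us a norm and a distance:

\langle x, y \rangle = \overline{\langle y, x \rangle}, \qquad
\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle, \qquad
\langle x, x \rangle \ge 0 \ \text{with} \ \langle x, x \rangle = 0 \iff x = 0
% the induced norm and distance:
\| x \| = \sqrt{\langle x, x \rangle}, \qquad d(x, y) = \| x - y \|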

Now let's consider combining many stones to understand why Cauchy completeness is necessary. You would expect the result to be a stone object, wouldn't you? I should say it depends on how many stones you combine: if there are infinitely many of them, the situation may change. For a more visual interpretation, I suggest watching this video (the first few minutes can be skipped).
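
In symbols (my own summary, not from the video): a sequence is Cauchy when its terms eventually get arbitrarily close to each other, and completeness demands that every such sequence has a limit inside the space. The classic counterexample is the rationals:

\text{Cauchy: } \forall \varepsilon > 0 \ \exists N \ \forall m, n \ge N : \ \| x_m - x_n \| < \varepsilon
\text{Complete: every Cauchy } (x_n) \text{ has some } x \in V \text{ with } \lim_{n \to \infty} \| x_n - x \| = 0
% \mathbb{Q} is not complete: 1,\ 1.4,\ 1.41,\ 1.414,\ \dots is Cauchy, but its limit \sqrt{2} \notin \mathbb{Q}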


In the end, why are Hilbert spaces useful?

In Hilbert spaces we can construct orthonormal sets and expand elements in terms of them.
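
As a concrete illustration (my own, assuming a complete orthonormal set (e_n), which is exactly what the separability remark in the P.S. below gives us): every element can be expanded against that set, just like a Fourier series, and completeness of the space is what makes the infinite sum converge:

\langle e_m, e_n \rangle = \delta_{mn}, \qquad
x = \sum_{n} \langle x, e_n \rangle \, e_n, \qquad
\| x \|^2 = \sum_{n} |\langle x, e_n \rangle|^2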

🔗 Reference
#video #math

P.S.: Some references also require a Hilbert space to be separable with respect to the norm induced by the inner product.
@SingularThinker
👍3
Last week I found a bit of free time, so I sat down and read these few papers about the new Mamba model, especially in the Vision field. I wrote a short note about it for a few of my professors and friends. Erfan suggested that I post it here too, in case anyone else has thoughts on it:

This weekend's power outage gave me some time to read some of these new trendy papers; the most interesting one was about a new NLP architecture called Mamba [1]. The paper came out a few months ago, took the AI community by storm, and soon earned nicknames such as "The New Transformer Killer". The architecture is based on state space sequence models (SSMs), which to me are just linear RNNs (Recurrent Neural Networks). So the idea is that you apply an RNN but take out all the activations, so the whole process can be done by something similar to a convolution; therefore it is much faster, and there is no more backpropagation through time (BPTT). It simply follows this equation:

[The equation is in the comments section]
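
For anyone who cannot open the comments, this is the generic discrete linear SSM recurrence such models follow (the exact notation in the comment may differ), plus my own unrolling of it into the convolutional form that makes fast training possible:

h_t = A h_{t-1} + B x_t, \qquad y_t = C h_t
% unrolling with h_{-1} = 0:
y_t = \sum_{k=0}^{t} C A^{k} B \, x_{t-k}
\;\Longrightarrow\; y = K * x, \quad K = (CB,\ CAB,\ CA^{2}B,\ \dots)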

In these equations, A, B, and C are usually just trainable parameters; however, in the Mamba paper [1], inspired by transformers, they generate B and C with some dense layers based on the input, so the model is now input-dependent. They show it achieves better results than transformers and, with a hardware-aware implementation, "5× higher throughput than Transformers" on language tasks. Also, since there is no T^2 attention matrix hanging around anymore, they could easily scale it up to million-length sequences.
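
To make the "input-dependent B and C" part concrete, here is a tiny NumPy sketch of my own; it is a toy version of the idea, not the actual Mamba implementation (the real model also makes the discretization step input-dependent, uses a structured A, and relies on a hardware-aware parallel scan):

import numpy as np

# Toy "selective" linear SSM: B_t and C_t are produced from the input itself,
# while the recurrence stays linear (no activation between steps).
rng = np.random.default_rng(0)

T, D, N = 16, 4, 8                      # sequence length, channels, state size
x = rng.normal(size=(T, D))             # input sequence
A = -np.exp(rng.normal(size=(D, N)))    # negative entries -> exp(A) in (0, 1), a stable decay
W_B = 0.1 * rng.normal(size=(D, N))     # dense projections that make B and C input-dependent
W_C = 0.1 * rng.normal(size=(D, N))

def selective_ssm(x):
    """h_t = exp(A) * h_{t-1} + outer(x_t, B_t),  y_t = h_t @ C_t  (per-channel states)."""
    A_bar = np.exp(A)                   # crude discretization of the continuous-time A
    h = np.zeros((D, N))
    ys = []
    for t in range(len(x)):
        B_t = x[t] @ W_B                # (N,) -- depends on the current input
        C_t = x[t] @ W_C                # (N,)
        h = A_bar * h + np.outer(x[t], B_t)   # purely linear recurrence step
        ys.append(h @ C_t)              # project the hidden state back to D outputs
    return np.stack(ys)                 # shape (T, D)

print(selective_ssm(x).shape)           # (16, 4)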

Soon after, not surprisingly, our Chinese comrades took over and released several Vision Mamba models. One main problem is that, as a result of its recurrent nature, the architecture has a built-in sense of "causality" and "order", which are meaningless in a vision task. To address this, this paper [3] uses a bi-directional Mamba SSM over the image tokens. Similarly, this one [4] applies four simultaneous SSM scans (top-left to bottom-right, top-right to bottom-left, ...). They both mostly keep the Vision Transformer architecture and only replace the attention. Both show performance superior to transformers, with faster computation, on ImageNet-1K.
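
And a toy version of the bi-directional trick from [3], just to show why it removes the one-way "causality" (my own sketch, with a single scalar decay in place of a learned SSM; [4] does the analogous thing with four scan orders over the 2-D grid of patches):

import numpy as np

def linear_scan(tokens, decay=0.9):
    # Causal linear recurrence: each output only sees tokens at or before its own position.
    h = np.zeros_like(tokens[0])
    out = []
    for tok in tokens:
        h = decay * h + tok
        out.append(h.copy())
    return np.stack(out)

patches = np.random.default_rng(0).normal(size=(14 * 14, 192))  # flattened ViT-style patch tokens
fwd = linear_scan(patches)                   # scan the flattened grid left-to-right
bwd = linear_scan(patches[::-1])[::-1]       # scan right-to-left, then flip back into place
features = fwd + bwd                         # every token now mixes information from both directions
print(features.shape)                        # (196, 192)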

The idea of applying a bi-directional RNN to get global (token-to-token) weights seems rather strange to me (and also stupid, tbh). Yet it mostly just shows how hard the costly attention mechanism is to kill. Maybe you can get some nice insights by looking into the papers.

PS: Since the SSM is a linear operation, I am very tempted to try it in an NNMF layer and see what happens. What do you think?

PS2: The very-famous-by-now Mamba paper is apparently getting rejected on ICLR OpenReview with an 8-8-5 and a 3! (https://openreview.net/forum?id=AL1fq05o7H). There was (like always) that one reviewer saying “Waah not enough experiments, what if you scale it more? Where are the Wikitext-103 dataset results?”. They said some of the experiments this reviewer is suggesting would cost up to $50,000 per run. People’s comments are just hilarious :)


[1] Mamba: Linear-Time Sequence Modeling with Selective State Spaces: https://arxiv.org/abs/2312.00752

[2] Hungry Hungry Hippos: Towards Language Modeling with State Space Models: https://arxiv.org/abs/2212.14052

[3] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model: https://arxiv.org/abs/2401.09417

[4] VMamba: Visual State Space Model: https://arxiv.org/abs/2401.10166
👍7🔥1