Singular Thinker
Photo
#Memes are back again. I really liked the one about the interesting paper abstract and the Schmidhuber one.
@SingularThinker
❤4
What is a Hilbert space, really?
It is just a fancy name for a Cauchy complete inner product vector space.
Wait what? I'll explain.
First, why do we need an inner product vector space?
Inner product spaces carry more physical meaning and are more useful than an arbitrary vector space: the inner product gives us lengths, angles, and a notion of orthogonality.
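To be concrete (this bit is my own addition, stated for a real inner product space), the inner product immediately gives us the geometry we actually use in physics:

$$\|x\| = \sqrt{\langle x, x\rangle}, \qquad \cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}, \qquad x \perp y \iff \langle x, y\rangle = 0,$$

i.e. lengths, angles, and orthogonality, none of which exist in a bare vector space.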
Now let's consider combining many stones together to understand why Cauchy completeness is necessary. You expect the result to be a stone object, don't you? Well, it depends on how many stones you combine: if there are infinitely many of them, the situation may change. For a more visual interpretation, I suggest watching this video (the first few minutes can be skipped).
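Here is a standard example (my own addition, not taken from the video) of the "infinitely many stones" situation: a Cauchy sequence that escapes an incomplete inner product space. Take the space $X$ of sequences with only finitely many nonzero entries, with $\langle a, b\rangle = \sum_k a_k b_k$, and set

$$x_n = \left(1, \tfrac{1}{2}, \tfrac{1}{3}, \dots, \tfrac{1}{n}, 0, 0, \dots\right).$$

For $m > n$ we have $\|x_m - x_n\|^2 = \sum_{k=n+1}^{m} \tfrac{1}{k^2} \le \sum_{k > n} \tfrac{1}{k^2}$, which tends to $0$ as $n \to \infty$, so $(x_n)$ is Cauchy. But its only candidate limit $(1, \tfrac{1}{2}, \tfrac{1}{3}, \dots)$ has infinitely many nonzero entries and therefore lies outside $X$. Completing $X$ (which gives $\ell^2$) is exactly what puts the missing limit back in.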
In the end, why are Hilbert spaces useful?
In a Hilbert space we can construct orthonormal sets and expand vectors over them, and completeness guarantees that these (possibly infinite) expansions actually converge to an element of the space.
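As a quick illustration of that last point (again my own addition), if $\{e_n\}$ is an orthonormal basis of a separable Hilbert space $H$, then every $x \in H$ satisfies

$$x = \sum_{n} \langle e_n, x\rangle\, e_n, \qquad \|x\|^2 = \sum_{n} \left|\langle e_n, x\rangle\right|^2,$$

and the partial sums of this series form a Cauchy sequence, so it is precisely completeness that lets the infinite expansion converge inside $H$.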
🔗 Reference
#video #math
P.S.: Some references also require a Hilbert space to be separable with respect to the norm induced by the inner product.
@SingularThinker
YouTube
Ch 3: Why do we need a Hilbert Space? | Maths of Quantum Mechanics
Hello!
This is the third chapter in my series "Maths of Quantum Mechanics." In this episode, we'll find that infinity brings up a few issues within our quantum framework, and we'll see how a Hilbert space fixes them.
If you have any questions or comments…
👍3
Telegraph
How to make a Hilbert space from scratch? | Part 1/2
Let's create our mathematical world from the beginning while trying to make things consistent (this way of thinking is not based on the history of math, but it is a good practice to think more abstractly). Now imagine any arbitrary set X. This set includes…
🔥4
Last week I found a bit of free time and sat down and read these few papers about this new model, Mamba, especially in the Vision field. I sent a short note about it to a few of my professors and friends. Erfan suggested I also post it here, in case anyone else has thoughts on it:
This weekend's power outage gave me some time to read a few of these new trendy papers; the most interesting one was about a new NLP architecture called Mamba [1]. The paper came out a few months ago, took the AI community by storm, and soon earned nicknames such as “The New Transformer Killer”. The architecture is based on state space sequence models (SSMs), which to me are just linear RNNs (Recurrent Neural Networks). The idea is that you apply an RNN but take out every activation, so the whole process can be done by something similar to a convolution; therefore it is much faster, and there is no more backpropagation through time (BPTT). It simply follows this equation:
[The equation is in the comments section]
In these equations, A, B, and C are usually just trainable parameters; however, in the Mamba paper [1], inspired by transformers, they generate B and C with dense layers applied to the input, so the recurrence is now input-dependent. They show that it achieves better results than transformers and, with some hardware-aware tricks, “5× higher throughput than Transformers” on language tasks. Also, since there is no T^2 attention matrix hanging around, they could easily scale it up to million-length sequences.
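To make the recurrence concrete, here is a minimal numpy sketch of one channel of such a selective SSM, written only from the description above. It is not the authors' implementation: the scalar-channel setup, the toy "dense layers" b_proj/c_proj, and the fixed A are all my own simplifications.

```python
import numpy as np

def selective_ssm_channel(u, A, b_proj, c_proj):
    """One channel of a selective SSM (sketch).
    u: (T,) scalar input sequence for this channel.
    A: (N, N) state-transition matrix (diagonal in practice).
    b_proj, c_proj: callables producing B_t (N,) and C_t (N,) from the current
    input, i.e. the input-dependent B and C described above."""
    N = A.shape[0]
    h = np.zeros(N)                              # hidden state
    y = np.empty(len(u))
    for t in range(len(u)):
        B_t, C_t = b_proj(u[t]), c_proj(u[t])    # generated from the input
        h = A @ h + B_t * u[t]                   # purely linear recurrence, no activation
        y[t] = C_t @ h                           # read-out
    return y

# toy usage: random projections stand in for trained dense layers
rng = np.random.default_rng(0)
N, T = 16, 32
wB, wC = 0.1 * rng.normal(size=N), 0.1 * rng.normal(size=N)
y = selective_ssm_channel(rng.normal(size=T), 0.9 * np.eye(N),
                          b_proj=lambda u: wB * u, c_proj=lambda u: wC * u)
print(y.shape)  # (32,)
```

Because everything inside the loop is linear, the whole scan can be unrolled into a convolution-like computation, which is where the speed-up over step-by-step RNN training comes from.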
Soon after, not surprisingly, our Chinese comrades took over and released several Vision Mamba models. One main problem is that, as a result of its recurrent nature, the architecture imposes a notion of “causality” and “order”, which is meaningless in a vision task. To address this, paper [3] runs a bi-directional Mamba SSM over the image tokens, and similarly [4] applies four simultaneous SSM scans (top-left to bottom-right, top-right to bottom-left, …). Both mainly keep the vision transformer architecture and only replace the attention, and both report better performance than transformers with faster computation on ImageNet-1K.
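For intuition, a bi-directional scan over the image tokens might look roughly like the sketch below. This is my own simplification, not code from [3] or [4]; in particular, merging the two directions by summation is an assumption, and the papers differ in how they combine the scans.

```python
import numpy as np

def bidirectional_scan(tokens, ssm_forward, ssm_backward):
    """tokens: (T, d) flattened image patches; ssm_forward / ssm_backward are two
    independent SSM blocks mapping (T, d) -> (T, d)."""
    out_fwd = ssm_forward(tokens)                # left-to-right pass over the patches
    out_bwd = ssm_backward(tokens[::-1])[::-1]   # right-to-left pass, flipped back
    return out_fwd + out_bwd                     # merge so every token sees both sides

# toy usage with identity "SSMs", just to check the shapes
x = np.arange(12.0).reshape(4, 3)
print(bidirectional_scan(x, lambda t: t, lambda t: t).shape)  # (4, 3)
```

The four-directional variant in [4] is the same idea applied along four different flattening orders of the 2D patch grid.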
The idea of applying a bi-directional RNN to obtain global (token-to-token) weights still seems rather strange to me (and also a bit stupid, tbh). Yet it simply shows the immortality of the costly attention mechanism. Maybe you can get some nice insights by looking into the papers.
PS: Since the SSM is a linear operation, I am very tempted to try it in an NNMF layer and see what happens. What do you think?
PS2: The very-famous-by-now Mamba paper is apparently getting rejected at ICLR, with an 8-8-5 and a 3 on OpenReview! (https://openreview.net/forum?id=AL1fq05o7H). There was (as always) that one reviewer saying “Waah, not enough experiments, what if you scale it more? Where are the WikiText-103 results?”. The authors said some of the experiments this reviewer is asking for would cost up to $50,000 per run. People’s comments there are just hilarious :)
[1] Mamba: Linear-Time Sequence Modeling with Selective State Spaces: https://arxiv.org/abs/2312.00752
[2] Hungry Hungry Hippos: Towards Language Modeling with State Space Models: https://arxiv.org/abs/2212.14052
[3] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model: https://arxiv.org/abs/2401.09417
[4] VMamba: Visual State Space Model: https://arxiv.org/abs/2401.10166
openreview.net
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many...
👍7🔥1
Singular Thinker
How to make a Hilbert space from scratch? | Part 1/2 🔗 Link #note #math @SingularThinker
Telegraph
How to make a Hilbert space from scratch? | Part 2/2
So far, we have defined a metric space as a set equipped with a map that takes two elements of the set, returns a non-negative number, and satisfies the positive definiteness, symmetry, and triangle inequality properties. And we talked about…
What is Bra in Quantum Mechanics?
As you might have noticed, in quantum mechanics they use the funny notations <x| and |y>.
The Bra-ket notation was first used by Paul Dirac in his 1939 publication A New Notation for Quantum Mechanics. But, why?
Let's not blame the physicists for being showy. There is an important theorem in functional analysis, the Riesz representation theorem, that might explain why this notation makes sense. Let (X, <.,.>) be a Hilbert space. Then for each continuous linear map L: X → F there is exactly one vector x_L in X such that L(x) = <x_L, x> for all x, and the operator norm of L equals the norm of x_L, i.e. ||L|| = ||x_L||.
Thus, in a Hilbert space, every continuous linear map to F can be identified with a vector, and vice versa. So when we write the bra <c|, we mean the linear map from the Hilbert space X to F whose value on a vector is the inner product of c with that vector.
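In symbols (my own summary of the above, using the convention that the inner product is linear in its second slot):

$$\langle c| : X \to F, \qquad \langle c|\,\bigl(|x\rangle\bigr) = \langle c, x\rangle, \qquad \bigl\|\langle c|\bigr\|_{\mathrm{op}} = \|c\|,$$

so the ket $|x\rangle$ is just the vector $x$, the bra $\langle c|$ is the continuous linear functional that the Riesz theorem attaches to $c$, and the bracket $\langle c|x\rangle$ is their pairing, which is the inner product.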
#math
@SingularThinker
🔥2
Today I encountered this amazing website, which is a medium for presenting research in a new way. It is actually similar to Medium, but only for machine learning articles, and the blog posts are peer-reviewed. When I checked the steering committee, I understood why it is so good: Yoshua Bengio, Andrej Karpathy, Ian Goodfellow, … are among its members. Unfortunately, they didn't continue the project, but it can still help a lot.
For instance, I found this introduction to Gaussian processes very interesting. Check it out.
@SingularThinker
Distill
A Visual Exploration of Gaussian Processes
How to turn a collection of small building blocks into a versatile tool for solving regression problems.
🔥9