📚 Welcome to our Telegram channel dedicated to Data Science Research Papers, created by Big Data Specialist 🧪🔬
🎉 Are you ready to embark on an exciting journey through the realm of cutting-edge research and advancements in the field of Data Science? 🌐💡
📊 In this channel, we aim to bring you a curated collection of the most fascinating and influential research papers in Data Science. From machine learning and artificial intelligence to data mining and predictive analytics, we cover a wide range of topics that define the future of data-driven insights. 📈🤖
💡 Whether you're a passionate data scientist, a researcher, or simply someone who wants to stay up-to-date with the latest trends and breakthroughs, you've come to the right place! 🚀✨
Programming, data science, ML - free courses by Big Data Specialist
Programming, Data and AI learning
Free courses, roadmaps and study materials.
Python, data science, ML, big data, AI, web, system design.
Join 👉 https://rebrand.ly/bigdatachannels
DMCA: @disclosure_bds
Contact: @mldatascientist
Going Denser with Open-Vocabulary Part Segmentation
Publication date: 18 May 2023
Topic: Object detection
Paper: https://arxiv.org/pdf/2305.11173v1.pdf
GitHub: https://github.com/facebookresearch/vlpart
Description:
Object detection has expanded from a limited number of categories to an open vocabulary. Moving forward, a complete intelligent vision system requires understanding more fine-grained object descriptions: object parts. In this work, we propose a detector that can predict both open-vocabulary objects and their part segmentation. This ability comes from two designs:
🔹 We train the detector on the joint set of part-level, object-level, and image-level data.
🔹 We parse the novel object into its parts via its dense semantic correspondence with the base object (a toy sketch of this idea follows below).
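A toy illustration of the second design, assuming DINO-style dense features (a hypothetical sketch, not the VLPart code): each pixel of the novel object inherits the part label of its most similar base-object pixel.

```python
# Hypothetical sketch: part-label transfer via dense semantic correspondence.
# Random arrays stand in for real dense features and part annotations.
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 16, 16, 32                          # feature-map size and channels
base_feats = rng.normal(size=(H * W, C))      # dense features, base object
novel_feats = rng.normal(size=(H * W, C))     # dense features, novel object
base_parts = rng.integers(0, 3, size=H * W)   # known part label per base pixel

# L2-normalize so the dot product below is cosine similarity.
base_feats /= np.linalg.norm(base_feats, axis=1, keepdims=True)
novel_feats /= np.linalg.norm(novel_feats, axis=1, keepdims=True)

# Each novel-object pixel inherits the part label of its nearest base pixel.
sim = novel_feats @ base_feats.T              # (H*W, H*W) similarities
novel_parts = base_parts[sim.argmax(axis=1)].reshape(H, W)
print(novel_parts.shape, np.unique(novel_parts))
```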
A Simple Framework for Contrastive Learning of Visual Representations
Publication date: 01 Jul 2020
Topic: Image Classification
Paper: https://arxiv.org/pdf/2002.05709v3.pdf
GitHub: https://github.com/open-mmlab/mmselfsup
Description:
We present SimCLR, a simple framework for contrastive learning of visual representations, and show that:
(1) composition of data augmentations plays a critical role in defining effective predictive tasks,
(2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations,
(3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet.
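To make finding (2) concrete, here is a minimal PyTorch sketch of a SimCLR-style projection head and the NT-Xent loss; the dimensions and random inputs are illustrative stand-ins, not the paper's ResNet-50 setup.

```python
# Minimal sketch of SimCLR's projection head g(.) and NT-Xent loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Learnable nonlinear map from encoder features h to the loss space."""
    def __init__(self, dim_in=512, dim_out=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, dim_in), nn.ReLU(), nn.Linear(dim_in, dim_out)
        )

    def forward(self, h):
        return self.net(h)

def nt_xent(z1, z2, temperature=0.5):
    """Each of the 2N projections treats its augmented twin as the positive."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d)
    sim = z @ z.t() / temperature                        # cosine similarities
    n = z1.shape[0]
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    # The positive for sample i is its other view at index (i + n) mod 2n.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

h1, h2 = torch.randn(8, 512), torch.randn(8, 512)  # features of two aug. views
g = ProjectionHead()
print(nt_xent(g(h1), g(h2)).item())
```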
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Publication date: 17 Mar 2022
Topic: Classification
Paper: https://arxiv.org/pdf/2103.10360v2.pdf
GitHub: https://github.com/THUDM/GLM
Description:
There have been various types of pretraining architectures, including autoencoding models (e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5). However, none of these pretraining frameworks performs best across all tasks in the three main categories: natural language understanding (NLU), unconditional generation, and conditional generation. We propose the General Language Model (GLM), based on autoregressive blank infilling, to address this challenge. GLM improves blank-infilling pretraining by adding 2D positional encodings and allowing spans to be predicted in an arbitrary order, which yields performance gains over BERT and T5 on NLU tasks. Meanwhile, GLM can be pretrained for different types of tasks by varying the number and lengths of blanks.
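A toy sketch of what autoregressive blank infilling with 2D positions can look like on a token sequence, simplified from the paper's description (illustrative only, not GLM's tokenizer or training code):

```python
# Toy GLM-style corruption: spans collapse to [MASK] in Part A, then are
# appended as Part B with 2D positions -- pos1 points at the span's [MASK]
# in the corrupted text, pos2 counts positions within the span.
def glm_infilling(tokens, spans, mask="[MASK]", start="[S]"):
    part_a, mask_pos = [], {}
    for i, tok in enumerate(tokens):
        span = next((s for s in spans if s[0] <= i < s[1]), None)
        if span is None:
            part_a.append(tok)
        elif i == span[0]:                  # first token of a masked span
            mask_pos[span] = len(part_a)
            part_a.append(mask)
    seq = list(part_a)
    pos1 = list(range(len(part_a)))         # index in the corrupted text
    pos2 = [0] * len(part_a)                # 0 = Part A, >0 = within a span
    for span in spans:                      # Part B: spans predicted in turn
        body = [start] + tokens[span[0]:span[1]]
        seq += body
        pos1 += [mask_pos[span]] * len(body)
        pos2 += list(range(1, len(body) + 1))
    return seq, pos1, pos2

toks = "the quick brown fox jumps over the lazy dog".split()
seq, p1, p2 = glm_infilling(toks, [(1, 3), (5, 6)])
print(seq, p1, p2, sep="\n")
```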
Masked Autoencoders Are Scalable Vision Learners
Publication date: 19 Dec 2021
Topic: Semantic Segmentation
Paper: https://arxiv.org/pdf/2111.06377v2.pdf
GitHub: https://github.com/facebookresearch/mae
Description:
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy.
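The first design hinges on random patch masking. Here is a minimal PyTorch sketch of MAE-style masking (shapes illustrative; not the facebookresearch/mae implementation):

```python
# Keep a random 25% of patch tokens for the encoder; the decoder would later
# fill the masked positions with a shared, learned mask token.
import torch

def random_masking(x, mask_ratio=0.75):
    """x: (batch, num_patches, dim) patch embeddings."""
    b, n, d = x.shape
    keep = int(n * (1 - mask_ratio))
    shuffle = torch.rand(b, n).argsort(dim=1)     # a random permutation
    keep_idx = shuffle[:, :keep]                  # indices of visible patches
    visible = torch.gather(x, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
    mask = torch.ones(b, n)                       # 1 = masked, 0 = visible
    mask.scatter_(1, keep_idx, 0)
    return visible, mask

patches = torch.randn(2, 196, 768)                # e.g. 14x14 ViT patch tokens
visible, mask = random_masking(patches)
print(visible.shape, mask.sum(dim=1))             # 49 visible, 147 masked
```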
AdaFace: Quality Adaptive Margin for Face Recognition
Publication date: 16 Feb 2023
Topic: Facial Recognition and Modelling
Paper: https://arxiv.org/pdf/2204.00964v2.pdf
GitHub: https://github.com/mk-minchul/adaface
Description:
Recognition in low-quality face datasets is challenging because facial attributes are obscured and degraded. Advances in margin-based loss functions have enhanced the discriminability of faces in the embedding space, and prior work has also studied adaptive losses that assign more importance to misclassified (hard) examples. In this work, we introduce another aspect of adaptiveness in the loss function: image quality. We argue that the strategy for emphasizing misclassified samples should be adjusted according to their image quality; specifically, the relative importance of easy and hard samples should depend on the sample's image quality.
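A simplified sketch of that idea as we read it: the feature norm serves as an image-quality proxy, standardized with batch statistics and used to set per-sample margins. This paraphrases the paper's adaptive margin functions and is not the official mk-minchul/adaface code:

```python
# Quality-adaptive margins from feature norms (simplified illustration).
import torch

def adaptive_margins(feat_norms, m=0.4, h=0.333, eps=1e-3):
    """feat_norms: (batch,) L2 norms of face embeddings (quality proxy)."""
    mu, std = feat_norms.mean(), feat_norms.std() + eps
    quality = ((feat_norms - mu) / (std / h)).clamp(-1, 1)  # roughly [-1, 1]
    g_angle = -m * quality        # angular margin, scaled by quality
    g_add = m * quality + m       # additive margin, scaled by quality
    return g_angle, g_add

norms = torch.tensor([10.0, 18.0, 25.0, 33.0])  # low norm ~ low quality
print(adaptive_margins(norms))
```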
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
Publication date: 12 Mar 2020
Topic: Representation Learning
Paper: https://arxiv.org/pdf/1908.10357v3.pdf
GitHub: https://github.com/HRNet/Higher-HRNet-Human-Pose-Estimation
Description:
Bottom-up human pose estimation methods have difficulty predicting the correct pose for small persons due to scale variation. In this paper, we present HigherHRNet, a novel bottom-up human pose estimation method that learns scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach addresses the scale-variation challenge in bottom-up multi-person pose estimation and localizes keypoints more precisely, especially for small persons. The feature pyramid in HigherHRNet consists of feature-map outputs from HRNet and higher-resolution outputs upsampled via transposed convolution.
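A toy PyTorch sketch of that pyramid: heatmaps are predicted at HRNet's 1/4-resolution output, and a transposed-convolution branch yields 2x higher-resolution heatmaps. Channel counts here are invented, not the repository's configuration:

```python
import torch
import torch.nn as nn

class HigherHead(nn.Module):
    def __init__(self, in_ch=32, num_joints=17):
        super().__init__()
        self.low = nn.Conv2d(in_ch, num_joints, 1)          # 1/4-res heatmaps
        self.deconv = nn.ConvTranspose2d(                   # 2x upsampling
            in_ch + num_joints, in_ch, kernel_size=4, stride=2, padding=1
        )
        self.high = nn.Conv2d(in_ch, num_joints, 1)         # 1/2-res heatmaps

    def forward(self, feat):
        hm_low = self.low(feat)
        up = self.deconv(torch.cat([feat, hm_low], dim=1))  # feats + heatmaps
        return hm_low, self.high(up)

feat = torch.randn(1, 32, 128, 128)   # HRNet output at 1/4 input resolution
low, high = HigherHead()(feat)
print(low.shape, high.shape)          # (1,17,128,128) and (1,17,256,256)
```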
Deductive Verification of Chain-of-Thought Reasoning
Publication date: 6 Jun 2023
Topic: Computation and Language
Paper: https://arxiv.org/pdf/2306.03872v2.pdf
GitHub: https://github.com/lz1oceani/verify_cot
Description:
Large Language Models (LLMs) benefit significantly from Chain-of-Thought (CoT) prompting when performing various reasoning tasks. While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can inadvertently introduce hallucinations and accumulated errors, limiting models' ability to solve complex reasoning tasks. Inspired by how humans engage in careful, meticulous deductive reasoning to solve tasks, we seek to enable language models to perform explicit and rigorous deductive reasoning and to ensure the trustworthiness of their reasoning process through self-verification. However, directly verifying the validity of an entire deductive reasoning process is challenging, even with advanced models like ChatGPT. In light of this, we propose to decompose a reasoning verification process into a series of step-by-step subprocesses, each receiving only its necessary context and premises. To facilitate this procedure, we propose Natural Program, a natural-language-based deductive reasoning format. Our approach enables models to generate precise reasoning steps in which subsequent steps are more rigorously grounded in prior steps, and it empowers language models to carry out reasoning self-verification step by step. Integrating this verification process into each deductive reasoning stage significantly enhances the rigor and trustworthiness of generated reasoning steps, and in the process also improves answer correctness on complex reasoning tasks.
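A hedged sketch of the decomposition: each step is verified in isolation, given only the premises it cites. Here `ask_llm` is a hypothetical stand-in for any LLM call, and the prompt is illustrative, not the paper's exact Natural Program template:

```python
# Step-by-step verification with minimal per-step context.
def verify_chain(premises, steps, ask_llm):
    """steps: list of (claim, indices of the premises/steps it relies on)."""
    context = list(premises)
    for i, (claim, refs) in enumerate(steps):
        cited = "\n".join(context[j] for j in refs)   # only the cited context
        prompt = (
            f"Premises:\n{cited}\n\n"
            f"Step: {claim}\nDoes this step follow deductively? Answer yes/no."
        )
        if not ask_llm(prompt).strip().lower().startswith("yes"):
            return False, i                 # index of the first invalid step
        context.append(claim)               # verified steps become context
    return True, None

# Usage with a trivial mock "LLM" that approves every step:
ok, bad_step = verify_chain(
    ["All men are mortal.", "Socrates is a man."],
    [("Socrates is mortal.", [0, 1])],
    ask_llm=lambda prompt: "yes",
)
print(ok, bad_step)
```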
SAM3D: Segment Anything in 3D Scenes
Publication date: 6 Jun 2023
Topic: Computer Vision and Pattern Recognition
Paper: https://arxiv.org/pdf/2306.03908v1.pdf
GitHub: https://github.com/pointcept/segmentanything3d
Description:
In this work, we propose SAM3D, a novel framework that predicts masks in 3D point clouds by leveraging the Segment Anything Model (SAM) on RGB images, without further training or finetuning. Given a point cloud of a 3D scene with posed RGB images, we first predict segmentation masks on the RGB images with SAM and then project the 2D masks onto the 3D points. We then merge the 3D masks iteratively with a bottom-up approach: at each step, the point-cloud masks of two adjacent frames are merged via bidirectional merging. In this way, 3D masks predicted from different frames are gradually merged into the 3D masks of the whole scene. Finally, we can optionally ensemble the SAM3D result with over-segmentation results based on the geometric information of the 3D scene. We evaluate our approach on the ScanNet dataset, and qualitative results demonstrate that SAM3D achieves reasonable and fine-grained 3D segmentation without any training or finetuning of SAM.
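To illustrate the 2D-to-3D projection step, here is a toy sketch (not the Pointcept code): every 3D point inherits the SAM mask id of the pixel it projects to in a posed frame. The intrinsics, pose, and data are made-up values:

```python
import numpy as np

def project_masks_to_points(points, mask_ids, K, world_to_cam):
    """points: (N,3) world coords; mask_ids: (H,W) per-pixel SAM mask ids."""
    N = points.shape[0]
    pts_h = np.concatenate([points, np.ones((N, 1))], axis=1)
    cam = (world_to_cam @ pts_h.T).T[:, :3]            # into the camera frame
    uv = (K @ cam.T).T
    uv = (uv[:, :2] / uv[:, 2:3]).round().astype(int)  # pixel coordinates
    H, W = mask_ids.shape
    inside = (cam[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < W) \
                             & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    labels = np.full(N, -1)                            # -1 = unobserved point
    labels[inside] = mask_ids[uv[inside, 1], uv[inside, 0]]
    return labels

K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])  # toy intrinsics
pose = np.eye(4)                                           # camera at origin
pts = np.random.default_rng(0).uniform(-1, 1, (1000, 3)) + [0, 0, 3]
masks = np.random.default_rng(1).integers(0, 5, (64, 64))  # fake SAM masks
print(np.bincount(project_masks_to_points(pts, masks, K, pose) + 1))
```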
SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
Publication date: 4 Jun 2023
Topic: Object Detection
Paper: https://arxiv.org/pdf/2306.03408v1.pdf
GitHub: https://github.com/dyzhang09/sam3d
Description:
In this paper, we explore adapting SAM's zero-shot ability to 3D object detection. We propose a SAM-powered BEV processing pipeline to detect objects and obtain promising results on the large-scale Waymo Open Dataset. As an early attempt, our method takes a step toward 3D object detection with vision foundation models and shows the opportunity to unleash their power on 3D vision tasks.
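A hedged sketch of the BEV rasterization step alone: turning a LiDAR point cloud into a bird's-eye-view image that a 2D segmenter such as SAM could consume. The grid extents and resolution are invented, not the paper's settings:

```python
import numpy as np

def points_to_bev(points, x_range=(-40, 40), y_range=(-40, 40), res=0.2):
    """points: (N, 4) array of x, y, z, intensity."""
    W = int((x_range[1] - x_range[0]) / res)
    H = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((H, W), dtype=np.float32)
    xs = ((points[:, 0] - x_range[0]) / res).astype(int)
    ys = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (xs >= 0) & (xs < W) & (ys >= 0) & (ys < H)
    # Keep the maximum intensity per cell -- a simple occupancy-style encoding.
    np.maximum.at(bev, (ys[keep], xs[keep]), points[keep, 3])
    return bev

rng = np.random.default_rng(0)
pts = rng.uniform([-50, -50, -2, 0], [50, 50, 2, 1], (5000, 4))
print(points_to_bev(pts).shape)   # (400, 400) image for the 2D segmenter
```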
Agents Explore the Environment Beyond Good Actions to Improve Their Model for Better Decisions
Publication date: 6 Jun 2023
Topic: Artificial Intelligence
Paper: https://arxiv.org/pdf/2306.03408v1.pdf
GitHub: https://github.com/enpasos/muzero
Description:
Improving the decision-making capabilities of agents is a key challenge on the road to artificial intelligence. To improve the planning skills needed to make good decisions, MuZero's agent combines prediction by a network model with planning by a tree search that uses those predictions. MuZero's learning process can fail when predictions are poor but planning requires them. We use this as an impetus to get the agent to explore parts of the decision tree in the environment that it would otherwise not explore. The agent achieves this in three steps: first, it plans normally to come up with an improved policy; second, it deviates randomly from this policy at the beginning of each training episode; and third, it switches back to the improved policy at a random time step to experience the rewards associated with the improved policy, which is the basis for learning the correct value expectation. The simple board game Tic-Tac-Toe is used to illustrate how this approach can improve the agent's decision-making ability.
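A pseudocode-level sketch of that three-step schedule, with a toy environment standing in for Tic-Tac-Toe; the environment, policy, and episode length are invented, not the enpasos/muzero implementation:

```python
import random

class ToyEnv:
    """Tiny random-walk stand-in with a reset/step interface."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        done = abs(self.state) >= 3
        reward = 1.0 if self.state >= 3 else 0.0
        return self.state, reward, done

def exploration_episode(env, improved_policy, max_steps=20):
    switch_t = random.randrange(max_steps)  # random step to rejoin the policy
    obs, trajectory = env.reset(), []
    for t in range(max_steps):
        # Deviate randomly before switch_t, then follow the improved policy
        # to experience the rewards it actually earns.
        action = random.choice([-1, 1]) if t < switch_t else improved_policy(obs)
        obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        if done:
            break
    return trajectory  # value-learning targets come from these rewards

print(exploration_episode(ToyEnv(), improved_policy=lambda s: 1))
```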
Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability
Publication date: 6 Jun 2023
Topic: Machine Learning
Paper: https://arxiv.org/pdf/2306.03715v1.pdf
GitHub: https://github.com/tmlr-group/unleashing-mask
Description:
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications. Previous paradigms either explore better scoring functions or use knowledge of outliers to equip models with OOD detection ability, but few pay attention to the intrinsic OOD detection capability of the given model. In this work, we find that, across different settings, a model trained on in-distribution (ID) data often passes through an intermediate stage with higher OOD detection performance than its final stage, and we identify learning with atypical samples as one critical data-level cause. Based on these insights, we propose a novel method, Unleashing Mask, which aims to restore the OOD-discriminative capability of a well-trained model using its ID data. Our method uses a mask to identify the memorized atypical samples and then finetunes the model, or prunes it with the introduced mask, to forget them. Extensive experiments and analysis demonstrate the effectiveness of our method.
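A hedged sketch of the masking step alone: flag the highest-loss (atypical) training samples so a later finetuning or pruning pass can forget them. The loss-threshold rule here is our simplification, not the paper's exact criterion:

```python
import torch

def atypical_mask(per_sample_loss, forget_ratio=0.1):
    """per_sample_loss: (N,) training losses; True marks samples to keep."""
    k = max(1, int(len(per_sample_loss) * forget_ratio))
    cutoff = per_sample_loss.topk(k).values.min()   # loss threshold
    return per_sample_loss < cutoff                 # False = memorized/atypical

losses = torch.tensor([0.1, 0.2, 3.5, 0.3, 2.8, 0.15, 0.25, 4.1, 0.05, 0.4])
keep = atypical_mask(losses, forget_ratio=0.3)
print(keep)   # finetune only on samples where keep is True
```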
