Machine Learning Research – Telegram
Official release of Taiga 2.0. A brief summary follows; see the link for more detail:
We have gathered resources for popular NLP problems:

topic modelling - news with topic tags, and all sites that sort their texts into rubrics (news, poems, prose)
readability of texts - the popular science magazine NPlus1 has a readability metric for each text, provided by the editor
NER and fact extraction - news with references to the mentioned person's page or wiki information, news with person tags
keyword extraction - news with keyword tags, hashtags on social media
authorship attribution - all texts with author information: magazines, news and, more importantly, social media with gender, age, city, time and education mark-up
chat-bot training - open-source film subtitles
text generation - any resource depending on genre
rare word studies, frequency dictionaries - literary magazines, social media
morphological and syntactic parsers - any resource with respect to the genre
The Taiga corpus is an ambitious project that aims to become the largest fully available web corpus built from open text sources. The Taiga corpus is:

open source, CC BY-SA 3.0
big - about 5 billion words by now
sorted into datasets applicable to different machine learning tasks
made by linguists experienced in text crawling, parsing and filtering
rich with metainformation
POS-tagged and syntactically tagged in Universal Dependencies
https://tatianashavrina.github.io/taiga_site/
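
Since the corpus is tagged in Universal Dependencies, here is a rough sketch of how such mark-up could be read. It assumes the files come in the standard 10-column CoNLL-U format, which the post does not actually confirm; the sample sentences, the `read_conllu` helper and the field names are made up for illustration and are not taken from Taiga itself.

```python
from collections import Counter

# Hypothetical two-sentence sample in CoNLL-U (not real Taiga data).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = (
    "# text = Кошка спит.\n"
    "1\tКошка\tкошка\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tспит\tспать\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# text = Собака лает.\n"
    "1\tСобака\tсобака\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tлает\tлаять\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(text):
    """Yield sentences as lists of token dicts (illustrative helper)."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed rows, multiword ranges, empty nodes
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence


sentences = list(read_conllu(SAMPLE))
upos_counts = Counter(tok["upos"] for sent in sentences for tok in sent)
print(len(sentences), dict(upos_counts))
```

For real files you would pass the file contents instead of `SAMPLE`; with 5 billion words it probably makes sense to stream line by line rather than load everything into memory.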

Creators:
Tatiana Shavrina (rybolos@gmail.com)
Yana Kurmachova (yana.kurmacheva@gmail.com)
Comparing Sentence Similarity Methods
http://nlp.town/blog/sentence-similarity/