A Logarithm Too Clever by Half — on functions which can never be computed exactly, even with arbitrary-precision arithmetic
https://people.eecs.berkeley.edu/~wkahan/LOG10HAF.TXT
Infiniswap - using free cluster memory as swap; great demonstration of the power of randomized and decentralized algorithms
https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/gu
Datasaurus Dozen: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing
https://www.autodeskresearch.com/publications/samestats
MapD: Interactively query and visualize massive datasets with the parallel power of GPUs. Now open source.
https://www.mapd.com/
Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Goods: Organizing Google’s Datasets '16
TLDR: Internal search engine operating over 26*10^9 *datasets* (e.g. DB tables), 5% daily churn, schema inference... interesting stuff
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45390.pdf
This common and unfortunate fact of the lack of an adequate presentation of basic ideas and motivations of almost any mathematical theory is, probably, due to the binary nature of mathematical perception: either you have no inkling of an idea or, once you have understood it, this very idea appears so embarrassingly obvious that you feel reluctant to say it aloud; moreover, once your mind switches from the state of darkness to the light, all memory of the dark state is erased and it becomes impossible to conceive the existence of another mind for which the idea appears nonobvious.
— Gromov
Source: M. Berger, Encounter with a geometer, Part II, Notices Amer. Math. Soc. 47 (2000), no. 3, 326--340.
via @atemerev
http://www.ams.org/notices/200003/fea-berger.pdf
Google support at its best
(TLDR: Firebase costs increased 70x because they started including SSL handshake in traffic calculation)
https://news.ycombinator.com/item?id=14356409
On the Turing Completeness of MS PowerPoint 😂
https://www.andrew.cmu.edu/user/twildenh/PowerPointTM/Paper.pdf
Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge
TLDR: It's best to split processing between device and cloud, here's how to do it automatically (for DNNs)
http://web.eecs.umich.edu/~jahausw/publications/kang2017neurosurgeon.pdf
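A rough sketch of the splitting idea, not the paper's actual method: given per-layer latency estimates for the device and the cloud, plus the size of each layer's output, try every split point and keep the one with the lowest end-to-end latency. The function name `best_split`, its parameters, and all numbers below are hypothetical, made up for illustration only.

```python
def best_split(device_ms, cloud_ms, out_bytes, input_bytes, bytes_per_ms):
    """Return (k, latency_ms): run the first k layers on the device, the rest in the cloud.

    device_ms[i] / cloud_ms[i]: assumed latency of layer i on each side.
    out_bytes[i]: assumed size of layer i's output.
    bytes_per_ms: assumed uplink bandwidth.
    """
    n = len(device_ms)
    best = None
    for k in range(n + 1):
        if k == n:
            upload_ms = 0.0  # everything runs on the device, nothing is uploaded
        else:
            # Upload the raw input (k == 0) or the output of the last on-device layer.
            payload = input_bytes if k == 0 else out_bytes[k - 1]
            upload_ms = payload / bytes_per_ms
        total = sum(device_ms[:k]) + upload_ms + sum(cloud_ms[k:])
        if best is None or total < best[1]:
            best = (k, total)
    return best


# Hypothetical 3-layer network: slow device, fast cloud, narrow uplink.
split, latency = best_split(
    device_ms=[5, 20, 40],
    cloud_ms=[1, 2, 4],
    out_bytes=[1000, 100, 10],
    input_bytes=5000,
    bytes_per_ms=10,
)
print(split, latency)
```

With these made-up numbers, neither "all cloud" nor "all device" wins: the early layers shrink the data enough that uploading an intermediate result is cheapest.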
DeepXplore: Automated Whitebox Testing of Deep Learning Systems '17
TLDR: Neuron coverage, use 3+ cross-referencing oracles to detect outliers
https://arxiv.org/pdf/1705.06640.pdf
> Design your construct, pick your deliverable - dsDNA, plasmid DNA or purified protein - and Serotiny will instantly price your order from various vendors.
https://serotiny.bio/notes/pinecone/
We live in interesting times :)
> Big data systems may scale well, but this can often be just because they introduce a lot of overhead
I believe I already posted this one, but somehow I can't find it in the history, so here it comes again
http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html
CNTK 2.0. Nice production-related features, note Model Compression, Keras support and impressive multi-GPU scaling.
> If your project involves sequence processing, such as speech, natural language understanding, machine translation, etc., CNTK should be your best choice. And if you are a vision researcher working on video processing, you should definitely give CNTK a try.
https://www.microsoft.com/en-us/cognitive-toolkit/blog/2017/06/microsofts-high-performance-open-source-deep-learning-toolkit-now-generally-available/
Unicode nuances you probably don't need to know about :)
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
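A small illustration of why grapheme clusters matter, using only the Python standard library: what a user sees as one character can be several code points. This is not an implementation of the UAX #29 rules (the standard library has none); the helper `count_non_combining` is a hypothetical, deliberately naive counter that shows where simple approaches fall short.

```python
import unicodedata

# "é" written as 'e' followed by COMBINING ACUTE ACCENT: one visible character.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# Family emoji: four emoji joined by three ZERO WIDTH JOINERs, rendered as one glyph.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"


def count_non_combining(text):
    """Naive counter: skip characters with a nonzero canonical combining class."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(len(decomposed))                  # code points in the decomposed form
print(len(composed))                    # code points after NFC normalization
print(count_non_combining(decomposed))  # the naive counter handles the accent
print(len(family))                      # code points in the ZWJ sequence
print(count_non_combining(family))      # the naive counter does not handle ZWJ
```

The naive counter gets the accented letter right but still reports seven "characters" for the family emoji, which is the kind of case the grapheme cluster boundary rules exist for.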
Chronology of Math
http://www-history.mcs.st-andrews.ac.uk/Chronology/full.html