Cortex – A Platform for Developers, Not Just Data Scientists

Some thoughts from the creator of an open source machine learning platform focused on developers.

Machine learning has, historically, been the purview of data science teams. This makes it a bit counter-intuitive that we built Cortex, our open source ML infrastructure platform, primarily for software engineers.

Going all the way back to machine learning’s roots in the 1950s, the field has historically been research-focused, with milestones like Arthur Samuel’s checkers-playing AI (1959) or IBM’s chess-playing Deep Blue (1997).

Starting around 2010, there was a renewed interest in deep learning, with major tech companies releasing breakthroughs. Projects like Google Brain, DeepMind, and OpenAI (among others) began publishing new, state-of-the-art results.

These breakthroughs manifested as features in big companies’ products:

  • Netflix’s recommendation engine
  • Gmail’s smart compose
  • Facebook’s facial recognition tags

In addition, this renewed focus on machine learning, and particularly deep learning, led to the creation of better tools and frameworks, like Google’s TensorFlow and Facebook’s PyTorch, as well as open source models and datasets, like OpenAI’s GPT-2 and ImageNet.

With better tools, open source models, and accessible data, it became possible for small teams to train models for production. As a consequence of this democratization, a wave of new products has emerged, all of which at their core are “just” ML models wrapped in software. We refer to these products as ML-native.

MLOps and DevOps

This article briefly outlines how, as machine learning (ML) becomes a larger part of corporate solutions, the need for MLOps will become more critical.

The term MLOps refers to a set of techniques and practices that help data scientists collaborate with operations professionals. MLOps aims to manage the deployment of machine learning and deep learning models in large-scale production environments.

The term DevOps comes from the software engineering world and is concerned with developing and operating large-scale software systems. DevOps introduces two concepts: Continuous Integration (CI) and Continuous Delivery (CD). DevOps aims to shorten development cycles, increase deployment velocity and create dependable releases.

Is an ‘AI Winter’ Coming?

A BBC post speculating on whether there is a cooling off coming for AI

The last decade was a big one for artificial intelligence but researchers in the field believe that the industry is about to enter a new phase.

Hype surrounding AI has peaked and troughed over the years as the abilities of the technology get overestimated and then re-evaluated.

The peaks are known as AI summers, and the troughs AI winters.

The 2010s were arguably the hottest AI summer on record with tech giants repeatedly touting AI’s abilities.

AI pioneer Yoshua Bengio, sometimes called one of the “godfathers of AI”, told the BBC that AI’s abilities were somewhat overhyped in the 2010s by certain companies with an interest in doing so.

There are signs, however, that the hype might be about to start cooling off.

TinyML = Big Opportunity

This post explores why TinyML may be the next big thing.

A coalescence of several trends has made the microcontroller not just a conduit for implementing IoT applications but also a powerful, independent processing mechanism in its own right. In recent years, hardware advancements have made it possible for microcontrollers to perform calculations much faster.  Improved hardware coupled with more efficient development standards have made it easier for developers to build programs on these devices. Perhaps the most important trend, though, has been the rise of tiny machine learning, or TinyML. It’s a technology we’ve been following since investing in a startup in this space.

TinyML broadly encapsulates the field of machine learning technologies capable of performing on-device analytics of sensor data at extremely low power. Between hardware advancements and the TinyML community’s recent innovations in machine learning, it is now possible to run increasingly complex deep learning models (the foundation of most modern artificial intelligence applications) directly on microcontrollers. A quick glance under the hood shows this is fundamentally possible because deep learning models are compute-bound, meaning their efficiency is limited by the time it takes to complete a large number of arithmetic operations. Advancements in TinyML have made it possible to run these models on existing microcontroller hardware.
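To make the "large number of arithmetic operations" point concrete, here is a rough back-of-the-envelope sketch in Python. It counts the multiply-accumulate operations (MACs) for one forward pass through a small fully connected network and turns that into a lower-bound latency. The layer sizes and the throughput figure are hypothetical numbers picked for illustration, not measurements of any real model or chip.

```python
def dense_macs(layer_sizes):
    """Count multiply-accumulate operations for one forward pass through
    a fully connected network with the given layer widths."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def estimated_latency_ms(macs, macs_per_second):
    """Lower-bound latency in milliseconds, assuming arithmetic is the only cost."""
    return 1000.0 * macs / macs_per_second


# Hypothetical small network: 490 inputs, two hidden layers, 4 outputs.
layers = [490, 128, 64, 4]
macs = dense_macs(layers)

# Hypothetical throughput: 10 million MACs per second (an assumption, not a spec).
latency = estimated_latency_ms(macs, 10_000_000)
print(f"{macs} MACs, at least {latency:.2f} ms per inference")
```

The estimate ignores memory access and everything else the device has to do, so treat it as a floor rather than a prediction.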

In other words, those 250 billion microcontrollers in our printers, TVs, cars, and pacemakers can now perform tasks that previously only our computers and smartphones could handle. All of our devices and appliances are getting smarter thanks to microcontrollers.

TinyML represents a collaborative effort between the embedded ultra-low power systems and machine learning communities, which traditionally have operated largely independently. This union has opened the floodgates for new and exciting applications of on-device machine learning. However, the knowledge that deep learning and microcontrollers are a perfect match has been pretty exclusive, hidden behind the walls of tech giants like Google and Apple. This becomes more obvious when you learn that this paradigm of running modified deep learning models on microcontrollers is responsible for the “Okay Google” and “Hey Siri” functionality that has been around for years.

But why is it important that we be able to run these models on microcontrollers? Much of the sensor data generated today is discarded because of cost, bandwidth, or power constraints – or sometimes a combination of all three. For example, take an imagery micro-satellite. Such satellites are equipped with cameras capable of capturing high resolution images but are limited by the size and number of photos they can store and how often they can transmit those photos to Earth. As a result, such satellites have to store images at low resolution and at a low frame rate. What if we could use image detection models to save high resolution photos only if an object of interest (like a ship or weather pattern) was present in the image? While the computing resources on these micro-satellites have historically been too small to support image detection deep learning models, TinyML now makes this possible.
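The satellite idea above boils down to a simple on-device filter: run a detector on each frame and keep the high resolution copy only when something interesting shows up. A minimal sketch in Python, assuming a hypothetical `detect` callable that returns `(label, score)` pairs; the stub detector, the labels, and the 0.5 threshold are all made up for illustration and do not come from the article.

```python
def frames_worth_keeping(frames, detect, labels_of_interest, min_score=0.5):
    """Return the indices of frames where the detector reports at least one
    object of interest with a confidence of min_score or higher."""
    kept = []
    for index, frame in enumerate(frames):
        detections = detect(frame)  # expected: iterable of (label, score)
        if any(label in labels_of_interest and score >= min_score
               for label, score in detections):
            kept.append(index)
    return kept


def stub_detect(frame):
    """Stand-in detector for illustration only: each 'frame' here is a dict
    that already carries its detections. A real system would run a model."""
    return frame["detections"]


frames = [
    {"detections": [("ship", 0.9)]},   # confident hit, keep
    {"detections": []},                # nothing found, discard
    {"detections": [("ship", 0.3)]},   # below threshold, discard
]
print(frames_worth_keeping(frames, stub_detect, {"ship"}))
```

Everything not selected could still be stored at low resolution, so the filter only decides where the scarce storage and bandwidth go.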

AI BS

or Artificial Intelligence Bull Shitake

There are a lot of claims being made, and as this article points out, not many of them are supported by strong evidence/math.

In Rebooting AI, Ernie Davis and I made six recommendations, each geared towards how readers, journalists, and researchers alike might assess each new result, asking the same set of questions in a limits section in the discussion of their papers:


  • Stripping away the rhetoric, what does the AI system actually do? Does a “reading system” really read?
  • How general is the result? (Could a driving system that works in Phoenix work as well in Mumbai? Would a Rubik’s cube system work in opening bottles? How much retraining would be required?)
  • Is there a demo where interested readers can probe for themselves?
  • If an AI system is allegedly better than humans, then which humans, and how much better? (A comparison with low-wage workers who have little incentive to do well may not truly probe the limits of human ability.)
  • How far does succeeding at the particular task actually take us toward building genuine AI?
  • How robust is the system? Could it work just as well with other data sets, without massive amounts of retraining? AlphaGo works fine on a 19×19 board, but would need to be retrained to play on a rectangular board; the lack of transfer is telling.

OpenCV Speed Cam

I really need to find the time to build this DIY speed cam. From my home office window, I have an excellent view of an intersection where I would estimate about 70% of the cars don’t even stop at the posted Stop sign. Further, I would guess that close to 90% of them are going faster than the 25 MPH speed limit. Data is good.

Computer vision itself isn’t anything new, but it has only recently reached a point where it’s practical for hobbyists to utilize. Part of that is because hardware has improved dramatically in recent years, but it also helps that good open-source machine learning and computer vision software has become available. More software options are becoming available, but OpenCV is one that has been around for a while now and is still one of the most popular. Over on PyImageSearch, Adrian Rosebrock has put together a tutorial that will walk you through how to detect vehicles and then track them to estimate the speed at which they’re traveling.

Rosebrock’s guide will show you how to make your very own DIY speed camera. But even if that isn’t something you have a need for, the tutorial is worth following just to learn some useful computer vision techniques. You could, for instance, modify this setup to count how many cars enter and exit a parking lot. This can be done with affordable and readily-available hardware, so the barrier to entry is low — perfect for the kind of project that is more of a learning experience than anything else.
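The speed estimate itself is just distance over time once the tracker tells you when a car crossed two reference points. A minimal sketch in Python, not taken from Rosebrock's tutorial: the function names are my own, and it assumes you have measured the real-world distance between the two points and have timestamps from the tracker.

```python
MPS_TO_MPH = 2.23694  # metres per second to miles per hour


def estimate_speed_mph(distance_m, t_start_s, t_end_s):
    """Average speed of a tracked vehicle between two reference points,
    given the measured distance and the two crossing timestamps."""
    elapsed = t_end_s - t_start_s
    if elapsed <= 0:
        raise ValueError("end timestamp must be later than start timestamp")
    return distance_m / elapsed * MPS_TO_MPH


def share_over_limit(speeds_mph, limit_mph=25.0):
    """Fraction of observed vehicles travelling faster than the limit."""
    if not speeds_mph:
        return 0.0
    return sum(1 for speed in speeds_mph if speed > limit_mph) / len(speeds_mph)


# Hypothetical observation: a car covers 20 m in 2 seconds.
print(round(estimate_speed_mph(20.0, 0.0, 2.0), 1), "mph")
```

Collect enough of these and `share_over_limit` would tell me whether my 90% guess about the 25 MPH limit holds up.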

Problems with AI Transparency

As more and more business decisions get handed over (sometimes blindly) to computer algorithms (aka ‘AI’), companies are very late to the game in considering the consequences of that delegation. As a buffer against these consequences, a company may want to be more transparent about how its algorithms work, but that is not without its challenges.

To start, companies attempting to utilize artificial intelligence need to recognize that there are costs associated with transparency. This is not, of course, to suggest that transparency isn’t worth achieving, simply that it also poses downsides that need to be fully understood. These costs should be incorporated into a broader risk model that governs how to engage with explainable models and the extent to which information about the model is available to others.

Second, organizations must also recognize that security is becoming an increasing concern in the world of AI. As AI is adopted more widely, more security vulnerabilities and bugs will surely be discovered, as my colleagues and I at the Future of Privacy Forum recently argued. Indeed, security may be one of the biggest long-term barriers to the adoption of AI.

Reading Is Good For Your Brain

Science has found that reading is essential for a healthy brain. We already know reading is good for children’s developing noggins: A study of twins at the University of California at Berkeley found that kids who started reading at an earlier age went on to perform better on certain intelligence tests, such as analyses of their vocabulary size.


Other studies show that reading continues to develop the brains of adults. One 2012 Stanford University study, where people read passages of Jane Austen while inside an MRI, indicates that different types of reading exercise different parts of your brain. As you get older, another study suggests, reading might help slow down or even halt cognitive decline.

https://www.popsci.com/read-more-books

And it doesn’t seem to matter if it is a physical book, an e-reader or an audiobook (although the audiobook has a slightly different impact on the brain).

As for audiobooks, the research so far has found that they stimulate the brain just as deeply as black-and-white pages, although they affect your gray matter somewhat differently. Because you’re listening to a story, you’re using different methods to decode and comprehend it. With print books, you need to provide the voice, called the prosody: you’re imagining the “tune and rhythm of speech,” the intonation, the stress on certain syllables, and so on. With audio, the voice actor provides that information for you, so your brain isn’t generating the prosody itself, but rather working to understand the prosody in your ears.

Clever ‘AI’ or Poor Definition?

These types of articles seem to come down to the insatiable need for writers to sensationalize things that they don’t necessarily understand.

For example, in the scenario outlined in the article, it is unlikely that the ‘AI’ (aka computer algorithm) was self-aware and said to itself, “hey, I have a comprehensive understanding of humans and their capabilities, so I will modify myself to ‘cheat’ at this task in a way that a human would find difficult to detect”.

More likely is that the algorithm was poorly defined and the brute force computational model (aka ‘AI’) found a way to ‘solve’ the problem in a way that wasn’t contemplated by the software developer.

This clever AI hid data from its creators to cheat at its appointed task

flickr OFF

Joined 2005… Left 2018…

I knew that flickr had been on the decline for a while. IMHO, Yahoo’s acquisition was the beginning of the end. SmugMug’s heavy-handed idiocy of late was the last straw for me.

After a few arrogant email demands from SmarmMug, I had had enough so I requested all of my data from flickr and it only took them a week and a half to provide the requested files.  I happily downloaded my content and deleted my account after 13 years of use.

Personal Data as an Asset

There is a well-worn axiom in business that ‘data should be treated as a corporate asset’. This is, of course, very true, and the advances in data science and ‘big data’ are giving that data the potential to become even more valuable.

This got me thinking about how personal data should be thought about in the same way. Think about all the data generated from what you watch, what you listen to, where you visit, what you review, data from wearables, etc. All of this data is currently consumed and analyzed by third parties, but what if individuals were able to take control of what is, after all, their data?

Would this give rise to data science companies marketing algorithms directly to consumers (much like pharmaceutical companies market drugs directly)? Could it also give rise to an equivalent ‘data quackery’ similar to the natural supplements and homeopathic industry? That is, junk algorithms that, at their most benign, do no harm and at their worst push you toward dangerous courses of action?

Would there also be a new industry for ‘personal data scientists’ (like financial counselors or tax advisers) who would help you assess all of the data assets you have and how best to combine or leverage them with third parties for your benefit (and not just the benefit of third parties)? Wouldn’t it be great to have some control over the hundreds of arbitrage-like transactions that go on behind the scenes while you wait for a page to load on a commercial web site, via browser settings that let you control what information about you gets shared (and with which companies)?