Ayasdi: What Constitutes an AI Platform
Posted: 2 April 2018 | Author: Ajith Warrier | Source: Ayasdi
One of the questions we often field revolves around whether we identify ourselves as an artificial intelligence platform or an intelligent application company.
For us, the answer is yes.
We are an artificial intelligence platform on which one would build intelligent applications. In reality, it is more nuanced than that, and in this post we explore the different interaction points to our software and where they ultimately fit.
Finding the starting point is a little like engaging in the chicken/egg debate. Do you begin with an application and work back down to the platform, or begin with the platform and construct an application from the ground up?
Given that interest in the subject is likely more technical than not, we are going to build from the ground up – starting with the data. One could (and probably should) argue that an AI platform starts with a business need but for the purposes of this post we will work up from the data.
An AI platform should be able to ingest data – with a focus on the 3Vs:
- Volume (big).
- Velocity (batch to streaming).
- Variety (structured, unstructured).
The ingest should be simple and executed programmatically, from the command line or via a well-designed interface.
In the case of Ayasdi, we use a collection of methods to ingest data into Hadoop. We use YARN as our resource manager for distribution and execution. There are other ways to do this, each with their advantages and disadvantages. What we like about Hadoop is its ubiquity, cloudy-ness, community and performance (in the cloud or on-prem). This layer, however, is very much a part of the “platform” from our perspective and gets first-class treatment.
Above this sits our special sauce from a data science perspective as well as our compute optimization. From a data science perspective, we have optimized around a framework for machine learning called topological data analysis (TDA). TDA combines and synthesizes other algorithms (machine learning, geometric, statistical, Bayesian etc.) in such a way that it reveals the structure or shape of the data. We will not cover this in any depth here as we have a few hundred papers in the resource section that do a superb job of that.
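For readers who want a feel for what "shape of the data" means in practice, here is a minimal sketch of Mapper, a widely published TDA construction. It is a generic illustration and not Ayasdi's implementation: the function names, the parameters (`n_intervals`, `overlap`, `eps`), the simple distance-threshold clustering and the toy circle data are all assumptions made for this example.

```python
import math
from itertools import combinations


def connected_components(indices, points, eps):
    """Single-linkage clustering: link points within eps, return the components."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            # Unvisited points whose squared distance to point i is within eps^2.
            near = {j for j in remaining
                    if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps ** 2}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, filter_fn, n_intervals=4, overlap=0.25, eps=0.5):
    """Return (nodes, edges). Nodes are clusters of point indices; an edge joins
    two clusters that share at least one point."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # guard against a constant filter
    pad = width * overlap
    nodes = []
    for k in range(n_intervals):
        # Overlapping slice of the filter range.
        start, end = lo + k * width - pad, lo + (k + 1) * width + pad
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(connected_components(members, points, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


# Toy data (assumption): 12 points evenly spaced on a unit circle.
circle = [(math.cos(math.radians(30 * k)), math.sin(math.radians(30 * k)))
          for k in range(12)]
nodes, edges = mapper(circle, filter_fn=lambda p: p[0],
                      n_intervals=3, overlap=0.3, eps=0.6)
print(len(nodes), "nodes,", len(edges), "edges")
```

On the toy circle the resulting graph has four nodes joined in a single loop, which is the sense in which the output summarizes the shape of the input.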
Given how TDA works, it is exceptionally effective at unsupervised learning tasks such as segmentation, anomaly detection and hotspot detection. Unsupervised learning is compute intensive and we have spent considerable engineering cycles optimizing for this. The result is that we take advantage of key instruction sets and memory management protocols to squeeze everything we can from the Intel infrastructure.
This is actually important. As a general rule, the more sophisticated the algorithm, the worse it will scale. In the case of linear learners this is not problematic, but for other approaches, the tradeoff is made in time, compute and accuracy. These tradeoffs often obscure what is going on, making justification exceptionally difficult. An inability to justify makes repair difficult, complicates production and limits scalability.
This “collection of algorithms”, in the traditional view of data science, is “the platform.” This view, however, is limited. What we just described, TDA and the collection of algorithms, is really an engine. Engines matter, but they are not platforms.
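A rough back-of-the-envelope sketch makes the scaling point concrete. It assumes, purely for illustration, that the more sophisticated method needs every pairwise distance between rows while a linear learner touches each cell once per pass; the function names and the 50-feature figure are hypothetical.

```python
def linear_cost(n_rows, n_features):
    """Operations for one pass of a linear learner: one multiply-add per cell."""
    return n_rows * n_features


def pairwise_cost(n_rows, n_features):
    """Operations to compute the distance between every pair of rows."""
    return n_rows * (n_rows - 1) // 2 * n_features


for n in (1_000, 10_000, 100_000):
    ratio = pairwise_cost(n, 50) / linear_cost(n, 50)
    print(f"{n:>7} rows: pairwise work is {ratio:,.1f}x the linear work")
```

Under these assumptions the gap grows in proportion to the number of rows, which is why compute optimization matters more as the algorithms become more sophisticated.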
A platform is something more, a collection of elements that are optimized to deliver end-to-end capabilities. This is where our compute optimization, YARN implementation and northbound interfaces come into play.
It is through the northbound interfaces that one accesses the engine and there are multiple levels to this.
Furthermore, any AI platform should support an application development ecosystem that exposes the underlying capabilities in such a way that they can be deployed against a flexible range of use cases.
Key interfaces should include:
- REST API
- Python SDK
- An application development framework for non-technical users.
Subject matter experts represent the most logical users of AI technology. Their knowledge of the data, analytical workflows and business problem positions them uniquely to extract the maximum value from autonomous, cognitive systems. To create the most leverage for these users, an enterprise AI platform should allow non-technical users to create and develop applications themselves.
To achieve this, the platform should support application development wizards, UI widgets and pre-built workflows to facilitate and accelerate the creation of applications.
Let’s drill down:
For any such platform, where the bulk of the processing is performed on cloud or on-premise servers, it is imperative that access controls and requests are bounded tightly. A REST interface is a standard requirement for such tasks. To that end, we provide a RESTful interface through which users can authenticate, run queries and initiate jobs that require machine intelligence. Our RESTful interface is exposed from an API server that uses Kong, which then communicates with the underlying interfaces written in Java or C++.
While a RESTful interface allows easy invocation of the libraries and methods available within the platform, it is at times necessary to work within the ecosystem that machine intelligence (MI) users are familiar with. Python is emerging as the de facto language for machine learning and numerical computation, courtesy of the excellent scikit-learn and NumPy packages. Additionally, the ecosystem integrates with a number of packages and tools that allow easy access to virtually any system out there. With this in mind, we have focused on making available a Python SDK package that users can leverage to develop mission-critical solutions while using a familiar ecosystem.
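As a rough illustration of what an authenticated job request against such an interface could look like, the sketch below builds (but does not send) an HTTP request using only the Python standard library. Everything specific in it is an assumption made for the example: the base URL, the `/jobs` path, the bearer-token scheme and the JSON field names are hypothetical and are not taken from the Ayasdi API.

```python
import json
import urllib.request


def build_job_request(base_url, token, source_id, algorithm):
    """Build a POST request that would start a job on a hypothetical endpoint."""
    payload = json.dumps({"source_id": source_id, "algorithm": algorithm})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/jobs",  # hypothetical path
        data=payload.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# Placeholder values; nothing is sent over the network here.
request = build_job_request(
    "https://platform.example.com/api/v1",
    "EXAMPLE_TOKEN",
    source_id="customers_2018",
    algorithm="segmentation",
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the request easy to inspect and test.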
While the REST API and the Python SDK are key parts of an application strategy, Ayasdi went one step further, developing Envision, a framework for accelerating the development of intelligent applications by allowing a far larger group of “data aware” resources to design and deploy these next generation applications.
Envision addresses the gap between data science, IT and the business. Intelligent application development is often a disjointed, iterative and plodding process. Some resources have ML experience, others data experience, others business experience and yet others development and deployment experience. The challenge is getting all of those people on the same page. It is difficult at best, so much so that many organizations do the natural thing and default to the familiar: PowerPoint, Excel or PDFs.
Envision changes that process by providing simple Python, pre-built UI libraries, collaboration features and AI platform connectivity, enabling more parts of the organization to create intelligent applications.
Now business analysts who understand analytic workflows can collaborate live with business owners and have their work checked by data science.
The Problem With Platforms
Platforms, and in particular AI platforms, offer some tremendous advantages: they scale to different types of problems while retaining consistency and operational efficiency. On the other hand, a platform is not solving a problem per se; it is in need of problems to solve.
A platform without discrete problems to solve becomes its own problem.
The selection of an AI platform is predicated on having a suite of business problems that lend themselves well to resolution using AI techniques. The following list details the attributes of a good AI problem and the technical capabilities that must be present to solve them:
- Complex Data
- Rapidly Generated or Constantly Evolving Data
- Signals That Are Difficult to Spot
- Models where Human-Powered Analytics are Slow or Highly Iterative
- Accuracy is Highly Valued
- Decreasing Model Generation Time is Critical
- Operationalization is Critical
In short, these are great AI problems. Whether you tackle them with a specific application or with an AI platform will depend on the nature of the problem. Either way, the key is that the outcome of that work informs some type of action, because intelligence that ends up in PowerPoint isn’t that smart.