The FreeDiscovery Project

Machine Learning for Everyone

FreeDiscovery is an open-source web service that gives GUI developers easy access to sophisticated text analytics without having to master a machine learning or artificial intelligence package. FreeDiscovery is built on top of existing machine learning libraries (scikit-learn) and exposes a REST API for information retrieval applications.

FreeDiscovery Engine

provides a REST API for information retrieval applications

FreeDiscovery Core

a Python package that aims to extend scikit-learn

Download from Anaconda


Powerful Features


Easy document categorization

Put sets of documents into categories, e.g., “medical” or “legal.” If you already have some documents categorized, you can use FreeDiscovery to teach the machine to categorize new ones.
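FreeDiscovery exposes categorization through its REST API, but the underlying idea is easy to see in scikit-learn, the library FreeDiscovery builds on. Here is a minimal sketch of supervised categorization; the documents, labels, and choice of classifier are illustrative assumptions, not FreeDiscovery's exact pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A few pre-categorized training documents (made-up examples).
train_docs = [
    "patient diagnosis and treatment records",
    "clinical trial results for the new drug",
    "contract terms and liability clauses",
    "court filing in the breach of contract case",
]
train_labels = ["medical", "medical", "legal", "legal"]

# Vectorize the text and fit a classifier in one pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_docs, train_labels)

# The trained model then categorizes documents it has never seen.
print(model.predict(["court order in the contract dispute"]))
```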

Document clustering

Give FreeDiscovery a set of documents, and it will organize them into natural clusters of related documents. Unlike many clustering tools, which require you to decide in advance how many clusters exist, FreeDiscovery simply clusters your collection and generates a logical name for each cluster. This is useful when you have a large collection and want to know what’s in it.
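A minimal sketch of this kind of grouping with scikit-learn appears below. For brevity it fixes the number of clusters up front (FreeDiscovery itself spares you that choice) and names each cluster after its most characteristic terms; the documents are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "invoice for office supplies",
    "invoice payment is overdue",
    "quarterly earnings report",
    "annual financial report for shareholders",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Generate a "logical name" for each cluster from its top terms.
terms = vec.get_feature_names_out()
for k in range(km.n_clusters):
    top = km.cluster_centers_[k].argsort()[::-1][:2]
    members = [d for d, lbl in zip(docs, km.labels_) if lbl == k]
    print("cluster:", " / ".join(terms[i] for i in top), "->", members)
```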


Duplicate detection

Duplicate detection identifies duplicates in your collection, and it does so in a smart way: a document doesn’t have to be a 100% copy to be flagged as a near-duplicate.
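One simple way to catch such near-duplicates is to compare TF-IDF vectors and flag pairs whose cosine similarity exceeds a threshold. The sketch below illustrates the idea; the threshold and documents are illustrative, not FreeDiscovery's internals:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Please review the attached quarterly report before Friday.",
    "Please review the attached quarterly report before Friday. Thanks!",
    "Lunch menu for the cafeteria",
]

X = TfidfVectorizer().fit_transform(docs)
sim = cosine_similarity(X)

THRESHOLD = 0.8  # tune to taste: lower values catch looser matches
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if sim[i, j] >= THRESHOLD:
            print(f"docs {i} and {j} are near-duplicates "
                  f"(similarity {sim[i, j]:.2f})")
```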

E-mail threading

If you have a collection of e-mails spanning several conversations, this algorithm will identify each conversation and group its messages into a thread.
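For mail that still carries its headers, threading can be sketched by linking each message to whatever it references, as in the toy example below. This is a simplification (production threading, such as the JWZ algorithm, also handles missing headers and subject-line heuristics), and the messages are made up:

```python
from collections import defaultdict

messages = [
    {"id": "<a1>", "refs": [],               "subject": "Budget"},
    {"id": "<a2>", "refs": ["<a1>"],         "subject": "Re: Budget"},
    {"id": "<b1>", "refs": [],               "subject": "Offsite"},
    {"id": "<a3>", "refs": ["<a1>", "<a2>"], "subject": "Re: Budget"},
]

# Union-find: every message joins the thread of anything it references.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

for m in messages:
    for ref in m["refs"]:
        union(m["id"], ref)

threads = defaultdict(list)
for m in messages:
    threads[find(m["id"])].append(m["subject"])
print(dict(threads))  # two threads: the Budget conversation and Offsite
```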

Free white paper – find out why most popular e-Discovery algorithms fail

E-Discovery vendors often use tools that apply only one method for categorizing text. We investigated these tools to find out how well they work and found wide variations in performance. Learn more by requesting a FREE white paper.

Download the FREE White Paper:

Effectiveness Results for Popular e-Discovery Algorithms

Originally Presented at the 2017 International Conference on Artificial Intelligence & Law at King’s College in London.

Algorithms available for document categorization

K-nearest neighbor
K-nearest neighbor classifies a document by finding the k documents closest to it in multidimensional feature space and letting those neighbors vote on the category.
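A minimal sketch with scikit-learn's k-NN classifier; the tiny training set is a made-up illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

docs = ["drug trial results", "patient care plan",
        "contract dispute letter", "licensing agreement terms"]
labels = ["medical", "medical", "legal", "legal"]

# Each document becomes a point in multidimensional space; a new
# document takes the majority category of its 3 nearest neighbors.
knn = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=3))
knn.fit(docs, labels)
print(knn.predict(["terms of the licensing contract"]))
```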
Random forests
A random forest is an ensemble of decision trees. Each tree generates explicit rules, e.g., if a certain word appears, then the document belongs in a certain category; the forest trains many such trees on random subsets of the data and lets them vote, which is usually more robust than any single tree.
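A minimal sketch with scikit-learn's RandomForestClassifier (illustrative data; real collections need far more training examples):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

docs = ["diagnosis and treatment notes", "clinical lab results",
        "court ruling on the appeal", "statute of limitations memo"]
labels = ["medical", "medical", "legal", "legal"]

# 100 decision trees, each fit on a random slice of the data, vote.
forest = make_pipeline(
    CountVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
forest.fit(docs, labels)
print(forest.predict(["ruling on the clinical trial"]))
```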
Support vector machine
This algorithm looks for a hyperplane that separates the documents of one category from those of another. It’s designed for binary classification, where there are only two categories, but it’s easy to extend to multiple categories by simply comparing each category against all the others.
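A minimal sketch with scikit-learn's LinearSVC, which applies exactly this one-versus-rest scheme when there are more than two categories (the data is illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

docs = ["surgery schedule", "patient chart notes",
        "deposition transcript", "appellate brief",
        "quarterly budget forecast", "travel expense report"]
labels = ["medical", "medical", "legal", "legal", "finance", "finance"]

# LinearSVC fits one separating hyperplane per category vs. the rest.
svm = make_pipeline(TfidfVectorizer(), LinearSVC())
svm.fit(docs, labels)
print(svm.predict(["deposition about the expense report"]))
```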
Recurrent neural networks
Neural networks are all the rage right now. This algorithm loosely models the brain by forming “neurons” and then training them on labeled exemplars. As learning progresses, the weights of each neuron are adjusted, resulting in a network that identifies categories. The recurrent variety processes a document one word at a time, carrying state forward so that word order matters.
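To make “recurrent” concrete, here is a toy forward pass of a single RNN cell in NumPy. The weights are random stand-ins (in practice they are learned from exemplars), and the output layer and training loop are omitted entirely:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 50, 16

W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # input weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent weights

def step(h, word_id):
    """One recurrent step: mix the current word with the running state."""
    x = np.zeros(vocab_size)
    x[word_id] = 1.0  # one-hot encoding of the current word
    return np.tanh(W_xh @ x + W_hh @ h)

h = np.zeros(hidden_size)
for word_id in [3, 17, 42]:  # a toy "document" of three word ids
    h = step(h, word_id)

print(h[:4])  # the final hidden state summarizes the whole sequence
```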
Latent semantic indexing
This refers to identifying the “latent” concepts in a collection, the idea being that there are deeper semantics beyond the words themselves. It starts with an M x N matrix, where M is the number of documents and N is the number of distinct words, then uses a matrix decomposition algorithm to represent that matrix in fewer dimensions.
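scikit-learn's TruncatedSVD performs this kind of decomposition; a minimal sketch follows (illustrative documents, with two “concept” dimensions chosen arbitrarily):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the car is driven on the road",
        "the truck is driven on the highway",
        "the judge read the legal brief",
        "the attorney filed the brief"]

# Build the M x N document-term matrix...
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# ...then decompose it into a handful of latent "concept" dimensions.
lsi = TruncatedSVD(n_components=2, random_state=0)
print(lsi.fit_transform(X))  # one 2-d concept vector per document
```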
Classic “bag of words” – each word is a feature
This represents a document as a bag of words: the order of the words doesn’t matter, which is why it’s called a “bag.” For categorization purposes, many algorithms are 90% accurate without regard to word order, and building word-order rules into an algorithm makes it much more complicated and difficult to keep efficient.
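The representation is one line of scikit-learn; note how two sentences with opposite meanings become identical once order is discarded:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the dog bit the man", "the man bit the dog"]
vec = CountVectorizer()
X = vec.fit_transform(docs)

print(vec.get_feature_names_out())  # ['bit' 'dog' 'man' 'the']
print(X.toarray())                  # identical rows: word order is lost
```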

Looking for an enterprise-class data retrieval, machine learning, or AI application, or a more customized approach?

FreeDiscovery is committed to providing simple solutions to complex search problems.

Ask Us (almost) Anything!