AI Projects

Recommendation Systems

For a master's project, I built a recommendation system called RelRec. I trained multiple models and compared them; the winning model used Latent Dirichlet Allocation (LDA). The recommender is built as a stepped microservice pipeline for easy reuse. The steps are: obtain documents, parse, sanitize, remove stop words, run the algorithm, and benchmark.

Research: https://scholarsarchive.byu.edu/etd/6195/

Code: http://linguistics.byu.edu/thesisdata/relrec.html
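
The preprocessing steps listed above (parse, sanitize, remove stop words) can be sketched as a chain of small functions. This is only a minimal illustration, not RelRec's actual code: the function names, the tiny stop-word list, and the HTML-like input are all assumptions made for the example.

```python
import re

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}

def parse(raw):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", raw)

def sanitize(text):
    """Lowercase the text and keep only letters and whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower())

def remove_stop_words(text):
    """Tokenize on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]

# Each step consumes the previous step's output, mirroring a stepped pipeline.
STEPS = [parse, sanitize, remove_stop_words]

def run_pipeline(raw):
    result = raw
    for step in STEPS:
        result = step(result)
    return result

tokens = run_pipeline("<p>The Art of Topic Modeling, in 3 steps!</p>")
print(tokens)
```

Keeping each step as its own function makes it easy to swap one out or reuse the rest, which is the point of a stepped design.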

Data Indexing

Sometimes you need an index instead of recommendations. This is where my topic index comes in. It is a natural by-product of the LDA algorithm used for RelRec, my recommendation system. This one is trained on LDS General Conference articles. It takes a while to load, but once loaded, you can click through the topic groups and inspect the strongest document matches for each.

LDS Talk Topic Index
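
To show how an index can fall out of a topic model, here is a rough sketch that turns per-document topic weights into a "strongest documents per topic" lookup. The document names and weights are invented for the example, and `build_topic_index` is a hypothetical helper, not part of the actual index.

```python
from collections import defaultdict

# Hypothetical per-document topic weights, shaped like what a trained
# LDA model might produce (one weight per topic, per document).
doc_topics = {
    "talk_a": [0.80, 0.15, 0.05],
    "talk_b": [0.10, 0.70, 0.20],
    "talk_c": [0.60, 0.30, 0.10],
}

def build_topic_index(doc_topics, top_n=2):
    """Map each topic id to its top_n strongest documents."""
    by_topic = defaultdict(list)
    for doc, weights in doc_topics.items():
        for topic_id, weight in enumerate(weights):
            by_topic[topic_id].append((weight, doc))
    return {
        topic_id: [doc for _, doc in sorted(pairs, reverse=True)[:top_n]]
        for topic_id, pairs in by_topic.items()
    }

index = build_topic_index(doc_topics)
print(index)
```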

Classification

ML classifiers can be used to answer the question: is this what it purports to be?

This question can arise when it comes to what you eat. My team and I showed that a neural network can answer it. Our model performed well above baseline. Our data came from the FDA nutritional information database.

https://bean5.github.io/machine-learning/food-classification/index.html
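
For readers unfamiliar with what "above baseline" means, one common baseline is to always predict the most frequent class. The sketch below assumes that kind of baseline, which may not be the one the project used. The labels are toy values invented for the example, not the project's data or results.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Toy labels, invented for illustration only.
actual = ["fruit", "fruit", "fruit", "dairy", "grain"]
predicted = ["fruit", "fruit", "fruit", "dairy", "fruit"]

print(majority_baseline_accuracy(actual), accuracy(predicted, actual))
```

A classifier is only interesting if its accuracy clears the baseline figure by a meaningful margin.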

Human Training (Education)

Sometimes training the humans is important. ML could be used to select the dataset that an educational system presents to human learners.

Here's a tool for learning SAT words. It uses Perl, a language often touted as great for NLP work. I built it in 2012 as a way to learn Perl and build something useful for humanity. That was years before TensorFlow was on the scene, so it is not surprising that it runs from a command line (like DOS). It takes only 175 lines of code, and the whole project is under 1 MB.

http://bean5.github.io/learn_words/

Plagiarism Detection

I wanted a simple, portable system for paraphrase detection using n-gram/gene sequence alignment. It needed a GUI that lets the user adjust parameters on the fly. The result is a Java paraphrase detector.

http://bean5.github.io/paraphraseDetector/
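
Gene-style sequence alignment can be applied to words instead of nucleotides. The sketch below uses Smith-Waterman local alignment over word tokens as one possible instance of the idea. The choice of Smith-Waterman and the scoring parameters are assumptions for illustration, and this is written in Python rather than the detector's Java; it is not the detector's actual implementation.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score over two token lists."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if tok_a == tok_b else mismatch)
            # Scores never drop below zero, so an alignment can restart anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

source = "the quick brown fox jumps".split()
suspect = "a quick brown dog jumps".split()
print(local_alignment_score(source, suspect))
```

Exposing `match`, `mismatch`, and `gap` as parameters mirrors the goal of letting the user tune the detector on the fly: a harsher mismatch penalty tolerates fewer substituted words.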

Data Analysis

When you have a lot of documents spanning a lot of years, key questions invariably arise, such as:

  • which topics are trending?

  • which topics were trending 10 years ago?

Using machine learning, these questions can be answered. To read about a model and the conclusions I was able to draw from it, see Theological topics through time: An application of Gibbs sampling and other metrics to analyze topic venues in religious discourses.
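
Once each document has topic weights and a year, the two questions above reduce to averaging by year and picking the largest value. The sketch below shows only that aggregation step under simplifying assumptions: the years, topic names, and weights are invented, and it does not reproduce the Gibbs sampling or the other metrics from the paper.

```python
from collections import defaultdict

# Hypothetical (year, topic weights) pairs, invented for illustration.
docs = [
    (2000, {"faith": 0.7, "family": 0.3}),
    (2000, {"faith": 0.5, "family": 0.5}),
    (2010, {"faith": 0.2, "family": 0.8}),
    (2010, {"faith": 0.4, "family": 0.6}),
]

def mean_topic_weight_by_year(docs):
    """Average each topic's weight over all documents from the same year."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for year, weights in docs:
        counts[year] += 1
        for topic, weight in weights.items():
            totals[year][topic] += weight
    return {
        year: {topic: total / counts[year] for topic, total in topics.items()}
        for year, topics in totals.items()
    }

def trending_topic(docs, year):
    """Return the topic with the highest mean weight in the given year."""
    means = mean_topic_weight_by_year(docs)[year]
    return max(means, key=means.get)

print(trending_topic(docs, 2010), trending_topic(docs, 2000))
```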

Regex

Regex is a great way to locate interesting linguistic patterns in documents. Some of those regular expressions are better expressed as customizable automata. After conducting research in this area, I built a tool to locate alliterations in documents using orthographic cues.

Alliteration Locator
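
As a rough idea of what an orthographic cue looks like in regex form, the sketch below finds runs of consecutive words that start with the same letter. It is an assumption-laden illustration, not the Alliteration Locator's actual pattern: the pattern, the `min_words` threshold, and the function name are all made up here. Because it matches letters rather than sounds, it will both miss and over-report some cases.

```python
import re

# A word, followed by one or more words that start with the same letter.
# With IGNORECASE, the backreference \1 also matches the other letter case.
ALLITERATION = re.compile(r"\b([a-z])[a-z']*(?:\s+\1[a-z']*)+", re.IGNORECASE)

def find_alliterations(text, min_words=3):
    """Return runs of at least min_words words sharing an initial letter."""
    hits = []
    for match in ALLITERATION.finditer(text):
        run = match.group(0)
        if len(run.split()) >= min_words:
            hits.append(run)
    return hits

sample = "Peter Piper picked a peck of pickled peppers"
print(find_alliterations(sample))
print(find_alliterations(sample, min_words=2))
```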

 

Related to this is haiku detection.

Haiku detection in LDS Conference talks

Tooling

I think I ported a stemming tool to Java, or maybe I just optimized it for Java 7. It is over 11 years old. GitHub does not link it to anything it might have been forked from, which is strange. Being in Java, it probably still works.

At this point I recommend tools like the Python Natural Language Toolkit (NLTK). Not because they are better, but because they are more mainstream and therefore make code more reusable and maintainable. Or use lemmatizers.

https://github.com/bean5/Java-Porter-Stemmer
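
To make the difference between stemming and lemmatizing concrete, here is a deliberately tiny sketch. The suffix rules are toy rules, not the Porter algorithm, and the lemma dictionary is a made-up miniature; in practice a library such as NLTK would do both jobs.

```python
# Toy suffix rules for illustration only; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical miniature lemma dictionary; real lemmatizers use a full lexicon.
LEMMAS = {"ran": "run", "better": "good", "studies": "study", "mice": "mouse"}

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)

for word in ["jumping", "studies", "mice", "ran"]:
    print(word, toy_stem(word), toy_lemmatize(word))
```

The stemmer chops "studies" down to the non-word "studi" and leaves "mice" alone, while the dictionary lookup returns "study" and "mouse". That trade-off (cheap rules versus a lexicon) is the usual reason to pick one over the other.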

©2018 by Can Compute
