![regular_1708x683_0417-an-introduction-to-deep-learning-from-perceptrons-to-deep-networks-W](https://static.wixstatic.com/media/bb243c_82facd1c4d97437283ebf5f23c14b885~mv2.png/v1/crop/x_0,y_107,w_1305,h_469/fill/w_1305,h_469,al_c,q_90,enc_avif,quality_auto/regular_1708x683_0417-an-introduction-to-deep-learning-from-perceptrons-to-deep-networks-W.png)
AI Projects
Recommendation Systems
For a master's project, I built a recommendation system called RelRec. I trained multiple models and compared them. The best-performing model used Latent Dirichlet Allocation (LDA). The recommender is built as a stepped microservice pipeline for easy re-use. The steps are: obtain documents, parse, sanitize, remove stop words, run the algorithm, and benchmark.
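The stepped pipeline can be sketched roughly as a chain of small functions, each feeding the next. This is only an illustration, not RelRec's actual code: the function names, the sample documents, and the tiny stop-word list are all assumptions, the model step is a term-count placeholder rather than LDA, and the benchmark step is left out because it depends on evaluation data.

```python
import re
from collections import Counter
from functools import reduce

# Tiny illustrative stop-word list (an assumption, not the list RelRec uses).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}

def obtain(_):
    # Stand-in for fetching documents from disk or a service.
    return ["The cat sat in the hat.", "A dog and a cat!"]

def parse(docs):
    # Split each raw document into whitespace-separated tokens.
    return [doc.split() for doc in docs]

def sanitize(docs):
    # Lowercase, strip non-letter characters, and drop tokens that end up empty.
    cleaned = [[re.sub(r"[^a-z]", "", tok.lower()) for tok in doc] for doc in docs]
    return [[tok for tok in doc if tok] for doc in cleaned]

def remove_stop_words(docs):
    return [[tok for tok in doc if tok not in STOP_WORDS] for doc in docs]

def run_algorithm(docs):
    # Placeholder for the model step; here it only counts terms per document.
    return [Counter(doc) for doc in docs]

PIPELINE = [obtain, parse, sanitize, remove_stop_words, run_algorithm]

def run_pipeline(steps, seed=None):
    # Feed each step's output into the next step.
    return reduce(lambda data, step: step(data), steps, seed)

result = run_pipeline(PIPELINE)
```

Because every step takes the previous step's output and returns new data, a step can be swapped out or reused in another pipeline without touching the others.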
![recommender-system-provides-smart-recommendations-260nw-1978170500.webp](https://static.wixstatic.com/media/bb243c_98d7964158fd4c7a8dc61bcff13ef13b~mv2.webp/v1/fill/w_358,h_164,al_c,q_80,usm_0.66_1.00_0.01,enc_avif,quality_auto/recommender-system-provides-smart-recommendations-260nw-1978170500.webp)
Data Indexing
Sometimes you need an index instead of recommendations. This is where my topic index comes in. It is a natural by-product of applying the LDA algorithm used for RelRec, my recommendation system. This one is trained on LDS General Conference articles. It takes a while to load, but once loaded, you can click and navigate to inspect topic groups and the strongest document matches for each.
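To illustrate how an index can fall out of a topic model, here is a minimal sketch that turns per-document topic weights into a "topic to strongest documents" lookup. The weights, document names, and the `build_topic_index` helper are hypothetical; a real index would use the weights produced by the trained LDA model.

```python
from collections import defaultdict

# Hypothetical document-topic weights, as a topic model such as LDA might produce.
doc_topic_weights = {
    "doc_a": [0.80, 0.15, 0.05],
    "doc_b": [0.10, 0.70, 0.20],
    "doc_c": [0.55, 0.05, 0.40],
}

def build_topic_index(weights, top_n=2):
    """Map each topic id to its top_n (document, weight) pairs, strongest first."""
    index = defaultdict(list)
    for doc, topic_weights in weights.items():
        for topic_id, weight in enumerate(topic_weights):
            index[topic_id].append((doc, weight))
    return {
        topic_id: sorted(entries, key=lambda entry: entry[1], reverse=True)[:top_n]
        for topic_id, entries in index.items()
    }

topic_index = build_topic_index(doc_topic_weights)
```

Looking up `topic_index[0]` then gives the documents that most strongly match topic 0, which is the kind of list a browsable index would show for each topic group.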
![maxresdefault.jpg](https://static.wixstatic.com/media/bb243c_20b4e2e174dc4745b8f507b094fae23e~mv2.jpg/v1/fill/w_358,h_201,al_c,q_80,usm_0.66_1.00_0.01,enc_avif,quality_auto/maxresdefault.jpg)
Classification
ML classifiers can be used to answer the question: is this what it purports to be?
Well, this question can arise when it comes to what you eat. My team and I showed that a neural network can answer it. Our model performed well above baseline. Our data came from the FDA nutritional information database.
https://bean5.github.io/machine-learning/food-classification/index.html
![data classification.png](https://static.wixstatic.com/media/bb243c_38eba67f20e74830aca6eb363abb2c28~mv2.png/v1/fill/w_353,h_212,al_c,lg_1,q_85,enc_avif,quality_auto/data%20classification.png)
Human Training (Education)
Sometimes training the humans is important. ML could be used to select the dataset to provide to an educational system for training humans.
Here's a tool for learning SAT words. It uses Perl, a language often touted as great for NLP work. I built it in 2012 as a way to learn Perl and make something useful for humanity. That was years before even TensorFlow was on the scene, so it is not surprising that it runs from a command line (like DOS). It takes only 175 lines of code, and the total project is under 1 MB.
![Sat-Vocabulary-Words.png](https://static.wixstatic.com/media/bb243c_b27295369ad245c99fefd13c8e9d85b5~mv2.png/v1/fill/w_159,h_212,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/Sat-Vocabulary-Words.png)
Plagiarism Detection
I wanted a simple, portable system to perform paraphrase detection using n-gram/gene sequence alignment. It needed to have a GUI and allow the user to easily manipulate parameters on the fly. The result is a Java paraphrase detector.
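The n-gram side of the idea can be sketched in a few lines. This is not the Java detector's code: the function names, the Jaccard overlap measure, and the default values of `n` and `threshold` are assumptions chosen for illustration, and the sequence-alignment part is not shown.

```python
def ngrams(text, n):
    # Set of word n-grams in the text, compared case-insensitively.
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_similarity(a, b, n=2):
    # Jaccard overlap of the two n-gram sets: shared n-grams / all n-grams.
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

def is_paraphrase(a, b, n=2, threshold=0.5):
    # n and threshold are the kind of parameters a user might tune on the fly.
    return ngram_similarity(a, b, n) >= threshold

original = "the quick brown fox jumps"
suspect = "the quick brown fox sleeps"
score = ngram_similarity(original, suspect)
```

Raising `n` makes the match stricter (longer shared runs are required), while lowering `threshold` flags looser paraphrases.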
![Plagiarism-Detection-System.png](https://static.wixstatic.com/media/bb243c_878dbe5822f8499ca78369d854ac5b6b~mv2.png/v1/fill/w_297,h_212,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/Plagiarism-Detection-System.png)
Data Analysis
When you have a lot of documents spanning a lot of years, key questions invariably arise such as:
- which topics are trending
- which topics were trending 10 years ago
Machine learning makes these questions straightforward to answer. For the model and the conclusions I drew from it, see "Theological topics through time: An application of Gibbs sampling and other metrics to analyze topic venues in religious discourses".
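Once each document has a year and a topic, the trend questions above reduce to comparing topic shares between years. The sketch below assumes one dominant topic per document and uses made-up records and topic names; it does not reproduce the Gibbs-sampling model or the metrics from the paper.

```python
from collections import defaultdict

# Hypothetical (year, dominant_topic) pairs, one per document.
records = [
    (2010, "faith"), (2010, "faith"), (2010, "family"),
    (2020, "family"), (2020, "family"), (2020, "family"), (2020, "service"),
]

def topic_share_by_year(rows):
    # Fraction of each year's documents assigned to each topic.
    counts = defaultdict(lambda: defaultdict(int))
    for year, topic in rows:
        counts[year][topic] += 1
    shares = {}
    for year, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[year] = {t: c / total for t, c in topic_counts.items()}
    return shares

def trending(shares, earlier, later):
    # Change in share between two years, largest increase first.
    topics = set(shares[earlier]) | set(shares[later])
    change = {
        t: shares[later].get(t, 0.0) - shares[earlier].get(t, 0.0) for t in topics
    }
    return sorted(change.items(), key=lambda item: item[1], reverse=True)

shares = topic_share_by_year(records)
ranking = trending(shares, 2010, 2020)
```

The first entry of `ranking` is the topic that gained the most share between the two years, and the last entry is the one that lost the most.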
![Evolution-of-topics-over-time.png](https://static.wixstatic.com/media/bb243c_8d6e61fa4246461d9c942c862137c46a~mv2.png/v1/fill/w_318,h_212,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/Evolution-of-topics-over-time.png)
Regex
Regex is a great way to locate interesting linguistic patterns in documents. Some of those regular expressions are better expressed as customizable automata. After conducting research in this area, I built a tool to locate alliterations in documents using orthographic cues.
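As a rough illustration of an orthographic cue, the sketch below flags runs of consecutive words that start with the same letter. It is not the tool's implementation: the `find_alliterations` name and the `min_words` parameter are assumptions, and because it looks only at spelling it will treat "she sells" as alliterative even though the sounds differ. Words separated by punctuation are not joined.

```python
import re

def find_alliterations(text, min_words=2):
    # \1 refers back to the first letter of the run, so every following
    # word must start with that same letter (case-insensitively).
    pattern = re.compile(
        rf"\b([a-z])\w*(?:\s+\1\w*){{{min_words - 1},}}",
        re.IGNORECASE,
    )
    return [match.group(0) for match in pattern.finditer(text)]

runs = find_alliterations("Peter Piper picked a peck of pickled peppers")
```

Raising `min_words` to 3 keeps only the longer runs, which is one simple way to make the pattern customizable.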
Related to this is haiku detection.
![alliteration.jpeg](https://static.wixstatic.com/media/bb243c_59e46cd94ab24e7ea89d3283ead79bcd~mv2.jpeg/v1/fill/w_358,h_179,al_c,q_80,usm_0.66_1.00_0.01,enc_avif,quality_auto/alliteration.jpeg)
Tooling
I think I ported a stemming tool to Java, or maybe I just optimized it for Java 7. It is over 11 years old. Strangely, GitHub does not link it to anything it might have been forked from. Being in Java, it probably still works.
I recommend tools like Python's Natural Language Toolkit (NLTK) at this point, not because they are better, but because they are more mainstream and therefore make code more reusable and maintainable. Or use lemmatizers.
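To show the difference between the two approaches, here is a toy contrast written without any library. Both helpers and the tiny lemma table are made up for illustration; real stemmers (such as the Porter stemmer) apply far more rules, and real lemmatizers rely on a full dictionary and part-of-speech information.

```python
def crude_stem(word):
    # Strip the first matching suffix, keeping at least three characters.
    for suffix in ("ies", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lookup table standing in for a lemmatizer's dictionary.
LEMMAS = {"studies": "study", "better": "good", "ran": "run"}

def lookup_lemma(word):
    # Return the dictionary form if known, otherwise the word unchanged.
    return LEMMAS.get(word, word)

words = ["studies", "ran", "running"]
stems = [crude_stem(w) for w in words]
lemmas = [lookup_lemma(w) for w in words]
```

The stemmer chops "studies" down to a fragment that is not a real word and cannot relate "ran" to "run", while the lookup handles both but only for words it already knows.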
![stemming vs lemmatization.png](https://static.wixstatic.com/media/bb243c_0e0be8f524124404bb3bdc9fa943e173~mv2.png/v1/fill/w_358,h_199,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/stemming%20vs%20lemmatization.png)