
Thursday Seeds | Data that Samples
Let me preface this post with an important point: I’m not a data scientist. I do have training in systems and processes from my industrial engineering background. The beauty of this is that I have a different viewpoint than many classically trained scientists. They could look at my article and scoff. To each their own. But that is almost the very point of the last article I posted regarding data: Graphers that POV.
My viewpoint could be valuable to the point that it changes the way we look at certain types of data. It’s curious to think about how an algorithm can process a whole book we wrote and determine specific things about it. (Was it created by ChatGPT might be one of those questions answered.) What is even more amazing is to understand the structure and benefits of a knowledge graph: in its schema, it documents the relationships of the data in a form that is almost graphical itself.
There are humans who learn best in a text-to-oral manner. Other humans are graphical by nature. (Nerds who diagram sentences or excel at understanding graphs or geometry.) These second kinds of learners tend to get lost in academia unless they adapt to the text-to-oral manner of learning. What if we changed all that? These graph databases could be a beginning. What would young learners think of them? I can’t wait to find out.
In today’s day and age of big data captured in real time everywhere, we should know a little bit about why algorithms do certain things. What even is a large language model, or LLM? How many kinds are there? And are they all words?
Taking many different kinds of data from several sources that are not simple to correlate, or to plainly relate to, and presenting them in a succinct manner that anyone can understand is what every analyst is trying to do for senior management.
How do you relate to a single word, or words, unless you know a full sentence or a full concept? And then there are different points of view of the concept when the subject is a story. And it could be that everything is actually a story. One term for these kinds of data is unstructured. And to get this data to connect in a way that is useful for interpretation means we need some computing help. It’s a lot of 0’s and 1’s to keep track of.
Can our whole lives be measured down to 0 and 1? Not exactly. But we can use those two numbers to arrive at some conclusions that could astonish. The patterns are what we are looking for. We need to be able to slice and dice real-time data into bite-size pieces that make sense and tell the story from however many points of view we need. Spreadsheets with layers are not enough. Some of this data won’t sit nicely in a spreadsheet.
Here is an example of a model that deals with vectors. The beauty of vectors is that they can correlate in a way that is non-linear, where a word is a word is a word, but also a concept and a pattern.
Take the algorithm Word2Vec as an example. I’m not going to explain much, just give you a few details. A few sentences become related in a table through a series of 0’s and 1’s.
Consider the following 4 sentences:
I love apple
I like mango
I love cat
I have dog
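To give you a peek at what that table of 0’s and 1’s might look like, here is a minimal sketch in plain Python. This is not Word2Vec itself (the training is a whole other topic); it is just the one-hot vocabulary table that such a model starts from, built from the four sentences above:

```python
# The four example sentences.
sentences = [
    "I love apple",
    "I like mango",
    "I love cat",
    "I have dog",
]

# Collect the vocabulary in order of first appearance.
vocab = []
for sentence in sentences:
    for word in sentence.split():
        if word not in vocab:
            vocab.append(word)

# One-hot vector: a 1 in the word's own column, 0 everywhere else.
def one_hot(word):
    return [1 if w == word else 0 for w in vocab]

print(vocab)            # ['I', 'love', 'apple', 'like', 'mango', 'cat', 'have', 'dog']
print(one_hot("love"))  # [0, 1, 0, 0, 0, 0, 0, 0]
```

Every word gets its own row of mostly 0’s with a single 1, and from there an algorithm like Word2Vec learns which words keep showing up near each other.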
Now… let’s give this “I” a name. This is Maggy speaking. She loves apple, likes mango, loves her cat, and has a dog.
Maggy has a name. She is the center of this knowledge graph. Don’t forget her in all those 0’s and 1’s. She is the reason we are doing this.
She is the story. She gives life to the story and it changes how you look at this data.
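To make that picture concrete, here is a tiny sketch of Maggy’s knowledge graph as subject–relation–object triples. This is my own illustration in plain Python, not the schema of any particular graph database:

```python
# Maggy's knowledge graph as (subject, relation, object) triples.
triples = [
    ("Maggy", "loves", "apple"),
    ("Maggy", "likes", "mango"),
    ("Maggy", "loves", "cat"),
    ("Maggy", "has", "dog"),
]

# With Maggy at the center, each edge reads as a piece of her story.
def edges_from(node, triples):
    return [(rel, obj) for subj, rel, obj in triples if subj == node]

for relation, thing in edges_from("Maggy", triples):
    print(f"Maggy {relation} {thing}")
```

Each triple is one edge of the graph, and reading the edges out loud gives you back the four sentences we started with: the story, with Maggy at its center.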
More to come….

