Posts Tagged ‘data’

I have to start this essay with a simple statement: it is not lost on me that all of the above is 100% derived from my own history, studies, jobs, art works, and everything else that goes into me.  So maybe this is just a theory of myself, or not even a theory, but yet another expression in a lifetime of expressions.   At the very least I enjoyed 20 hrs of re-reading some great science, crafting what I think is a pretty neat piece of art work, and then summarizing some pondering.   Then again, maybe I’ve made strides on some general abstract level.  In either case, it’s just another contingent reconfiguration of things.

At the end I present all the resources I read and consulted during the writing (but not editing) and the making of the embedded 19×24 inch drawing and ink painting (which has most of this essay written and drawn into it).   I drank 4 cups of coffee over 5 hrs, had 3 tacos and 6 hotwings during this process. Additionally I listened to “The Essential Philip Glass” while sometimes watching the movie “The Devil Wears Prada” and the latest SNL episode.


There is a core problem with all theories and theory at large – they are not The Truth and do not interact in the universe like the things they refer to.   Theories are things unto themselves.  They are tools to help craft additional theories and to spur on revised dabbling in the world.


We have concocted an unbelievable account of reality across religious, business, mathematical, political and scientific categories.  Immense stretches of imagination are required to connect the dots between the category theory of mathematics to radical behaviorism of psychology to machine learning in computer science to gravitational waves in cosmology to color theory in art.  The theories themselves have no easy bridge – logical, spiritual or even syntactically.

Furthering the challenge is the lack of coherence and interoperability of measurement and crafting tools.   We have forever had the challenge of information exchange between our engineered systems.   Even our most finely crafted gadgets and computers still suffer from data exchange corruption.   Even when we seem to find some useful notion about the world it is very difficult for us to transmit that notion across mediums, toolsets and brains.

And yet, therein lies the reveal!

A simple yet imaginative re-think provides immense power.   Consider everything as a network – literally the simplest concept of a network: a set of nodes connected by edges.   Consider everything as part of a network, a subnetwork of the universe.  All subnetworks are connected, more or less, to other subnetworks.   From massive stars to a single boson, all are nodes in a network, and those networks form networks of networks.   Our theories are networks of language, logic, inference, experiment, context.  Our tools are just networks of metals, atoms, and light.   It’s not easy to replace a database of notions reinforced over the years with this simple idea.
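That “simplest concept of a network” can be sketched in a few lines of code. A minimal, illustrative sketch (the class and names are my own, not from any library) in which a node may itself be another network – networks of networks:

```python
# A minimal sketch of "everything as network": a network is just
# nodes plus edges, and a node may itself be another network.

class Network:
    def __init__(self, nodes=None, edges=None):
        self.nodes = set(nodes or [])   # nodes may be labels or Networks
        self.edges = set(edges or [])   # edges are (node, node) pairs

    def add_edge(self, a, b):
        self.nodes.update([a, b])
        self.edges.add((a, b))

# A subnetwork (say, an atom) nested inside a larger network (a star):
atom = Network({"proton", "electron"}, {("proton", "electron")})
star = Network()
star.add_edge(atom, "photon-field")   # the atom is itself a node

print(len(star.nodes))  # 2: the atom-network and the field
```

Nothing more is needed to start treating stars, bosons, theories and tools uniformly: they are all just values of `Network`.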

But really ask yourself why that is so hard when you can believe that black holes collide and send out gravitational waves that slightly wobble spacetime 1.3 billion light years away – or, if you believe in the Christian God, consider how it’s believable that woman was created from a guy named Adam’s rib.    It’s all a bit far-fetched, but we buy these other explanations because the large network of culture and tradition and language and semiotics has built our brains/worldviews up this way.

Long ago we learned that our senses are clever biological interpreters of internal and external context.  Our eyes do not see most of “reality” – just a pretty coarse (30 frames per second) and small chunk of the electromagnetic spectrum (visible light).   In the 1930s we learned that even mathematics itself, and the computers we’d eventually construct, cannot prove many of the claims they make; we just have to accept those claims (incompleteness and the halting problem).

These are not flaws in our current understanding or current abilities.  These are fundamental features of reality – any reality at all.  In fact, without this incompleteness and clever loose interpretations of information between networks there would be no reality at all – no existence.   This is a claim to return to later.

In all theories at the core we are always left with uncertainty and probability statements.   We cannot state or refer to anything for certain, we can only claim some confidence that what we’re claiming or observing might, more or less, be a real effect or relation.   Even in mathematics with some of the simplest theorems and their logical proofs we must assume axioms we cannot prove – and while that’s an immensely useful trick it certainly doesn’t imply that any of the axioms are actually true and refer to anything that is true or real.

The notion of probability and uncertainty is no easy subject either.   Probability is a measure of what?   Is it a measure of belief (Bayes) that something will happen given something else?  Is it a measure of lack of information – this claim carries only X% of the information?  Is it a measure of complexity?


Again, the notion of networks is incredibly helpful.  Probability is a measure of contingency.   Contingency, defined and used here, is a notion of connectivity of a network and nodes within the network.  There need be no hard and fast assignment of the unit of contingency – different measures are useful and instructive for different applications.  There’s a basic notion at the heart of all of them: contingency is a cost function of going from a configuration to another configuration of the network.
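The “cost function of going from a configuration to another configuration” can be made concrete with a toy measure. This is a hedged sketch, assuming the crudest possible unit of contingency – simply counting the edge insertions and deletions needed to reconfigure one edge set into another:

```python
# A toy "contingency" measure: the number of edge insertions and
# deletions needed to turn one network configuration into another
# (a crude edit distance; real measures could weight edges, nodes, etc.).

def contingency(edges_a, edges_b):
    """Cost of reconfiguring edge set A into edge set B."""
    return len(edges_a ^ edges_b)   # symmetric difference of edge sets

a = {("x", "y"), ("y", "z")}
b = {("x", "y"), ("x", "z")}
print(contingency(a, b))  # 2: delete (y, z), add (x, z)
```

As the essay notes, no single unit need be canonical – swapping in a different cost function gives a different, equally legitimate measure of contingency.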

And that leads to another startling idea.   Spacetime itself is just a network (an obvious intuition from my previous statements), and everything is really just a spacetime network.    Time is not the ticks on a clock nor an arrow marching forward.  Time is nothing but a measure of steps to reconfigure a network from state A to some state B.   Reconfiguration steps are not done in time; they are time itself.

(most of my initial thinking comes from Wolfram and others working on this long before my thinking about it: http://blog.stephenwolfram.com/2015/12/what-is-spacetime-really/ – Wolfram and others have done a ton of heavy lifting to translate the accepted theories and math into network terms).

This re-framing of everything into network thinking requires a huge amount of translation of notions of waves, light, gravity, mass, fields, etc into network conventions.  While attempting to do that in blog form is fun and I’ve attempted to keep doing it, the reality of the task is that no amount of writing about this stuff will make a sufficient proof or even useful explanation of the idea to people.

Luckily, it occurred to me (a contingent network myself!) that everyone is already doing this translation and even more startling it couldn’t go any other way.   Our values and traditions started to be codified into explicit networks with the advent of written law and various cultural institutions like religion and formal education.   Our communities have now been codified into networks by online social networks.  Our location and travels have been codified by GPS satellites and online mapping services.  Our theories and knowledge are being codified into Wikis, Programs (Wolfram Alpha, Google Graph, Deep Learning networks, etc).   Our physical interpretations of the world have been codified into fine arts, pop arts, movies and now virtual and augmented realities.   Our inner events/context are being codified by wearable technologies.    And now the cosmos has unlocked gravitational waves for us so even the mystery of black holes and dark matter will start being codified into knowledge systems.

It’s worth a few thoughts about Light, Gravity, Forces, Fields, Behavior, Computation.

  • Light (electromagnetic wave-particles) is the subnetwork encoding the total configurations of the entire universe and every subnetwork.
  • Gravity (and gravitational wave-particles) is the subnetwork of how all the subnetworks over a certain contingency level (mass) are connected.
  • The other 3 fundamental forces (electromagnetic, weak nuclear, strong nuclear) are also just subnetworks encoding how all subatomic particles are connected.
  • Field is just another term for network, hardly worth a mention.
  • Behavior observations are partially encoded subnetworks of the connections between subnetworks.  They do not encode the entirety of a connection except for the smallest, most simple networks.
  • Computation is time is the instruction set is a network encoding how to transform one subnetwork to another subnetwork.

These re-framed concepts allow us to move across phenomenal categories and up and down levels of scale and measurement fidelity.  They open up improved ways of connecting the dots between cross-category experiments and theories.   Consider radical behaviorism and schedules of reinforcement combined with the Probably Approximately Correct learning theory in computer science against a notion of light and gravity and contingency as defined above.

What we find is that learning and behavior based on schedules of reinforcement is actually the only way a subnetwork (say, a person) or a network of subnetworks (a community) could encode the vast contingent network (internal and external environments, etc.).   Some schedules of reinforcement maintain responses better than others, and again here we find the explanation.  Consider a variable ratio schedule reinforcing a network (see here for more details: https://en.wikipedia.org/wiki/Reinforcement#Intermittent_reinforcement.3B_schedules).   A variable ratio schedule (and variations/compositions on it) is a richer contingent network itself than, say, a fixed ratio network.  That is, as a network encoding information between networks (essentially a computer program and data), the variable ratio schedule has more algorithmic content to keep associations linked across many related network configurations.
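The difference in algorithmic content between the two schedules can be sketched directly. A toy simulation (the function names and the uniform 1..2n−1 draw are my own illustrative choices, not a standard from the behavioral literature): a fixed ratio schedule needs only a counter, while a variable ratio schedule also carries random state – a strictly richer program:

```python
import random

# FR-n: reinforce exactly every n-th response (one counter).
# VR-n: reinforce after a random count averaging n (counter + RNG state),
# i.e. a schedule carrying more algorithmic content.

def fixed_ratio(n, responses):
    return [i % n == n - 1 for i in range(responses)]

def variable_ratio(n, responses, rng):
    out, threshold, count = [], rng.randint(1, 2 * n - 1), 0
    for _ in range(responses):
        count += 1
        if count >= threshold:
            out.append(True)
            threshold, count = rng.randint(1, 2 * n - 1), 0
        else:
            out.append(False)
    return out

print(sum(fixed_ratio(5, 100)))                       # exactly 20 reinforcements
print(sum(variable_ratio(5, 100, random.Random(0))))  # roughly 20, but unpredictable
```

Both deliver reinforcement at the same average rate; the variable schedule is simply harder to compress, which is the sense of “richer network” used above.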

Not surprisingly this is exactly the notion of gravity explained above.  Richer, more complex networks with richer connections to other subnetworks have much more gravity – that is they attract more subnetworks to connect.  They literally curve spacetime.

To add another wrinkle to the theory, it has been observed in a variety of categories that the universe seems to prefer computational efficiency.  Nearly all scientific disciplines, from linguistics to evolutionary biology to physics to chemistry to logic, end up with some basic notion of a “Path of Least Effort” (https://en.wikipedia.org/wiki/Principle_of_least_effort).  In the space of all possible contingent situations, networks tend to connect in the computationally most efficient way – they encode each other efficiently.  That is not to say it happens that way all the time.  In fact, this idea led me to thinking that while all configurations of subnetworks exist, the most commonly observed ones (I use the term: robust) are the efficient configurations.  I postulate this explains mathematical constructs such as the Platonic solids and transcendental numbers and likely the physical constants.  That is, in the space of all possible things, the mean of the distribution of robust things are the mathematical abstractions.  While we rarely experience a perfect circle, we experience many variations on robust circular things… and right now the middle of them is the perfect circle.


Now, what is probably the most bizarre idea of all:  nothing is actually happening at the level of the universe, nor at the level of a photon.  The universe just is.   For a photon, which is just a single massless node, everything happens all at once – so nothing happens.

That’s right, despite all the words and definitions above with all the connotations of behavior and movement and spacetime… experience and happening and events and steps and reconfigurations are actually just illusions, in a sense, of subnetworks describing other subnetworks.   The totality of the universe includes every possible reconfiguration of the universe – which obviously includes all theories, all explanations, all logics, all computations, all behavior, all schedules in a cross product of each other.   No subnetwork is doing anything at all, it simply IS and is that subnetwork within the specific configuration of universe as part of the wider set of the whole.

This sounds CRAZY – until you look back on the history of ideas.  This notion has come up over and over regardless of the starting point, the condition of the observational tools, the fads of language and business of the day.  It is even observable in how so many systems “develop” first as “concrete” physical, sensory things… they end up yielding, time and time again, to what we call the virtual – strangely looping recursive networks.   Here I am not contradicting myself; instead, this is what exists within the fractal nature of the universe (multiverse!): it is self-similar all the way up and down scales and across all configurations (histories).

Theories tend to be ignored unless they are useful.   I cannot claim utility for everyone in this theory.  I do find it helpful for myself in moving between disciplines and not getting trapped in syntactical problems.   I find confirmation of my own cognitive bias in the fact that the technologies of loosely connecting the dots – GPS, hyperlinks, search engines, social media, citation analysis, Bayes, and now deep learning/PAC – have yielded a tremendous expansion of information and re-imagining of the world.


Currency, writing, art, music are not concrete physical needs, and yet they mediate our labor, property, government, nation states.   Even things we consider “concrete,” like food and water, are just encodings of various configurations.  Food can be redefined in many ways, and has been over the eons as our abstracted associations drift.   Water seems like a concrete requirement for us, but “us” is under constant redefinition.  Should people succeed in creating human-like intelligence (however you define it) in computers or the Internet, it’s not clear water would be any more concrete than solar power, etc.

Then again, if I believe anything I’ve said above, it all already exists and always has.




Chaitin on Algorithmic Information, just a math of networks.

Platonic solids are just networks

Real World Fractal Networks

Correlation for Network Connectivity Measures

Various Measurements in Transport Networks (Networks in general)

Brownian Motion, the network of particles

Semantic Networks


Probably Approximately Correct

Probability Waves

Bayes Theorem


Locality of physics

Complexity in economics


Gravity is not a network phenomenon?

Gravity is a network phenomenon?

Useful reframing/rethinking Gravity

Social networks and fields

Cause and effect

Human Decision Making with Concrete and Abstract Rewards

The Internet

Read Full Post »

From within the strange loop of self-reference the question “What is Data?” emerges.  Ok, maybe more practically the question arises from our technologically advancing world where data is everywhere, spouting from everything.  We claim to have a “data science” and now operate “big data” and have evolving laws about data collection and data use.   Quite an intellectual infrastructure for something that lacks identity or even a remotely robust and reliable definition.  Should we entrust our understanding and experience of the world to this infrastructure?   This question seems stupid and ignorant.  However, we have taken up a confused approach in all aspects of our lives by putting data ontologically on the same level as real, physical, actual stuff.    So now the question must be asked and must be answered and its implications drawn out.

Data is and Data is not.   Data is not data.   Data is not the thing the data represents or is attached to.   Data is but an ephemeral puff of exhaust from a limitless, unknowable universe of things and their relations. Let us explore.

Observe a few definitions and usage patterns:

Data According to Google



The Latin roots point to the looming mystery: “give” -> “something given.”   Even back in history data was just “something.”   Almost an anti-definition.

Perhaps we can find clues from clues:

Crossword Puzzle Clues for “Data”


Has there been a crossword puzzle word with broader use or more ambiguity than that?   “Food for thought?” seems to hit the nail on the head.   The clues boil down to data as: numbers, holdings, information, facts, figures, fodder, food, grist, bits.   Sometimes crunched and processed, sometimes raw.  Food for thoughts, disks, banks, charts and computers.


YouTube can usually tell us anything; here’s a video directly answering What Is Data:

Strong start in that video – Qualitative and Quantitative – and then by the end the video unwinds the definitions to include basically everything.

Maybe a technical lesson on data types will help elucidate the situation:

Data Types

Perhaps sticking to computers as a frame of reference helps us.   Data is stuff stored in a database, specified by data types.  What exactly is stored?   Bits on a magnetic or electric device (hard drive or memory chip) are arranged according to structure defined by this “data,” which is defined or created or detected by sensors and programs…   So is the data the bit?  The electric signal?  The magnetic structures on the disk?  A pure idea regardless of physical substrate?
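The bit-versus-interpretation question can be made concrete: the very same stored bytes, read under different data types, yield different “data.” A small sketch using Python’s standard `struct` module:

```python
import struct

# The same 4 bytes on disk, read under different data types - a sketch
# of why "the data" is not simply the bits but the interpretation.

raw = struct.pack("<I", 1078530011)           # store an unsigned int

as_int   = struct.unpack("<I", raw)[0]        # 1078530011
as_float = struct.unpack("<f", raw)[0]        # ~3.14159, same bits!
as_bytes = list(raw)                          # [219, 15, 73, 64]

print(as_int, round(as_float, 5), as_bytes)
```

One bit pattern, three answers to “what is stored?” – an integer, an approximation of pi, or four raw byte values, depending entirely on the interpreting pattern.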

The confusing self-referential nature of the situation is wonderfully exploited by Tupper’s formula:

Tupper's formula


What exactly is that?  It’s a pixel rendering (bits in memory turned into electrons shot at a screen or LED excitations) of a formula (which is a collection of symbols) that, when fed through a brain or a computer programmed by a brain, ends up producing a picture of the formula…
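For the curious, the inequality itself reduces to plain integer arithmetic: for integer k = y // 17 it just tests one bit of k. A minimal sketch (the tiny k = 5 example is my own illustration; the famous self-plotting constant is far too large to reproduce here):

```python
# Tupper's self-referential inequality: pixel (x, y) is "on" iff
#   1/2 < floor( mod( floor(y/17) * 2^(-17*x - y mod 17), 2 ) )
# which, for integer k = y // 17, reduces to testing one bit of k.

def tupper_pixel(x, y):
    k = y // 17
    return (k >> (17 * x + y % 17)) & 1 == 1

# k = 5 = 0b101 encodes a tiny on-off-on pixel column at x = 0:
print([tupper_pixel(0, 17 * 5 + r) for r in range(3)])  # [True, False, True]
```

Fed the right (enormous) k, this same three-line pattern draws a picture of its own formula – data and program collapsing into each other, which is exactly the point of the example.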

The further we dig the less convergence we seem to have.   Yet we have a “data science” in the world and employ “data scientists” and we tell each other to “look at the data” to figure out “the truth.”

Sometimes philosophy is useful in such confusing situations:

Information is notoriously a polymorphic phenomenon and a polysemantic concept so, as an explicandum, it can be associated with several explanations, depending on the level of abstraction adopted and the cluster of requirements and desiderata orientating a theory.


Er, that doesn’t seem like a convergence.  By all means we should read that entire essay, it’s certainly full of data.

Ok, maybe someone can define Data Science and in that we can figure out what is being studied:


That’s a really long article that points to data science as a duct-taped, loosely linked set of tools, processes, disciplines and activities to turn data into products and tell stories.   There’s clearly no simple definition or identification of the actual substance of data found there, or in any other description of data science readily available.

There’s a certain impossibility of definition and identification looming.   Data isn’t something concrete.  It’s “of” everything.  It appears to be a shadowy representational trace of phenomena and relations and objects that is itself encoded in phenomena and relations and objects.

There’s a wonderful aside in the great book “Things to Make and Do in the Fourth Dimension” by Matt Parker:

Finite Nature of Data



Data seems to have a finite, discrete property to it and yet is still very slippery.  It is reductive – a compression of the infinite patterns in the universe – and yet it is also a pattern: compressed traces of actual things.   Data is wisps of existence, a subset of existence.   Data is an optical and sensory illusion, an artifact of the limitedness of the sensor and the irreducibility of connections between things.

Data is not a thing.   It is of things, about things, traces of things, made up of things.

There can be no data science.   There is no scientific method possible.   Science is done with data, but cannot be done on data.  One doesn’t do experiments on data, experiments emit and transcode data, but data itself cannot be experimental.

Data is art.   Data is an interpretive literature.  It is a mathematics – an infinite regress of finite compressions.

Data is undefined and belongs in the set of unexplainables: art, infinity, time, being, event.

Data = Art

Read Full Post »

We are all programmers.   And I want to explain what programming really is.  Most people think of it as writing instructions that a computer will then interpret and go do what those instructions say.   Only in the simplest sense does this fully encompass what programming is.


Programming in the broadest sense is a search through the computational universe for interesting patterns that can be interpreted by other patterns.   A few definitions are in order.   A pattern is simply some set of data pulled from the computational universe (from my own investigations/research/logic, everything is computational).  Thus a pattern could be a sentence of English words, or a fragment of a program written in Java, or DNA strands, or a painting, or anything else.   Some patterns are able to interact with other patterns (information processing), as a laptop computer can interpret Microsoft Office documents or a replicated set of DNA (a human) can interpret Shakespeare and put on a play.   A program is simply a pattern that interacts with other patterns.


When we write programs we are simply searching through the space of symbolic representations in whatever programming language.   When a program doesn’t work, or doesn’t do what we want, we haven’t found a pattern of symbols that is interpreted the way we prefer, or that the processing pattern can interpret at all.  We sometimes call these “bugs” in the software.   Underneath it all, a buggy program is simply another program – just not the one we want.


I call it a search to bring particular activities to mind.  When we say we write a program or create a program it seems to engender only a limited set of methods to find programs by a limited set of people, called programmers.   Calling it a search reflects reality AND opens our eyes to the infinite number of ways to find interesting patterns and to interpret them.   The space of programs is “out there”, we just have to mine it for the programs/patterns we wish to interpret.
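This “search through the space of symbolic representations” can be taken quite literally. A toy sketch (the tiny expression grammar and the `search` helper are hypothetical illustrations of my own, and `eval` stands in for the interpreting pattern): enumerate small symbolic patterns over `x` and keep the first one the interpreter accepts as matching the desired behavior:

```python
from itertools import product

# Programming as search: enumerate tiny symbolic patterns over x and
# return the first one an interpreter (here, eval) agrees matches the
# desired input/output behavior. Everything else found along the way
# is still a program - just not the one we want (a "bug").

OPS, CONSTS = ["+", "*"], ["1", "2", "3", "x"]

def search(examples):
    for a, op, b in product(CONSTS, OPS, CONSTS):
        expr = f"{a} {op} {b}"
        if all(eval(expr, {"x": x}) == y for x, y in examples):
            return expr   # a pattern whose interpretation matches

print(search([(1, 2), (2, 4), (5, 10)]))  # finds "2 * x"
```

Writing a program by hand, running a genetic algorithm, or training a model are all just different strategies for walking this same space.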


Programs/patterns that become widely used owe that use to the frequency that those patterns can be interpreted.  For example, Windows or MacOS have billions of interpreting machines in which their programs can be interpreted.   Or on an even bigger scale, DNA “programs” have trillions of interpreters on just this planet alone.


Using a program is nothing more than interpreting it.  When you type a document in MS Word, the OS is interpreting your keystrokes and refreshing the screen with pixels that represent your words, all while MS Word itself is checking your grammar against rules put in place by programmers who interpreted a grammar reference, and so on.   For sufficiently complex programs we aren’t able to say whether a program “does the right thing.”  Only simple programs are completely verifiable.   This is why programs exist only as patterns that are interpreted.


Humans have become adept at interpreting patterns most useful for the survival of human genes.  With the advent of digital computers and related patterns (tech) we are now able to go beyond the basic survival of our genes and instead mine for other patterns that are “interesting” and interpretable by all sorts of interpreters.  I don’t know where the line is on good for survival and not, but it’s really not a useful point here.  My point is that with computers we’re able to just let machines go mining the space of existence in much grander ways and interpreting those results.   Obvious examples include the SETI project mining for signs of aliens, LHC mining the space of particle collisions, Google search mining the space of webpages and now human roadways, Facebook mining everyone’s social graph and so on.  Non obvious examples include artists mining the space of perceptively interesting things, doctors mining the space of symptoms, and businesses mining the space of sellable products and so on.


Let me consider that last one in a little more detail.  Every business is a program.  It’s a pattern (a pattern of patterns) interpreting the patterns closest to it (competition and the industry) and finding patterns for its customers (persons or governments or companies or other patterns) to buy (currency is just patterns interpreted).   Perhaps before computers and the explosion of “digital information” it wasn’t so obvious that this is what a business is.  But now that so much of the world is digital and electronic, how many businesses actually deal with physical goods and paper money?  How many businesses have ever seen all their employees or customers?  How many businesses exist really only as brief “ideas”?   What are all these businesses if not simply patterns of information interpreted as “valuable”?  And doesn’t every business at this point basically come down to how much data it can amass and interpret better/more efficiently than the competition? How are businesses funded, other than by algorithmic trading algorithms trading the stock market at high frequency, making banks and VCs wealthy so their analysts can train their models to identify the next program, er, business to invest in…


When you get down to it, everything is programming.  Everything we do in life, every experience is programming.  Patterns interpreting patterns.  


The implications of this are quite broad.   This is why I claim the next major “innovation” we will all really notice is an incredible leap in the capability of “programming languages.”   I don’t know exactly what they will look or feel like, but as the general population desires more programmability of the world in a “digital” or what I call “abstract” way, the programming languages will have to become patterns themselves that are more easily interpreted (written by anyone!).   The more the stuff we buy and sell is pure information (think of a future in which we’re all just trading software and 3D printer object designs – which is what industrial manufacturers basically do), the more we will all not want to wait for someone else to reprogram the world around us; we will all want to do it.   Education, health care, transportation, living, etc. are all becoming more and more modular and interchangeable, like little chunks of programs (usually called libraries or plugins).   So all these things we traditionally think of as “the real world” are actually becoming little patterns we swap in and out of.  Consider how many of you have taken an Uber from your phone, stayed at an Airbnb, ordered an eBook from Amazon, sent a digital happy birthday, and so on… Everything is becoming a symbolic representation, more and more easily programmed to be just how we want.


And so this is why big data is all the rage.  Not because it’s a cool fad or some new tech thing… it’s because it’s the ONLY THING.   All of these “patterns” and “programs” I’m talking about, taken as a whole, are just the SPACE OF DATA for us to mine for patterns.   The biggest program of all is EVERYTHING in EXISTENCE.  On a smaller scale, the more complicated and complex a program is, the more it looks indistinguishable from a huge pile of data.   The cleverer we find our devices, the more it turns out that each is an inseparable entanglement of data and programs (think of the autocorrect on your phone… when it messes up your spelling, that’s just the data it has gleaned from you…).  Data = programs.  Programs = data.   Patterns = patterns.   Our world is becoming a giant abstract ball of data; sometimes we’re symbolizing, but more and more often we’re able to directly compute (interpret natively, without translation) with objects as they exist (genetic modification, quantum computing, wetware, etc.).   In either case it’s all equivalent – only now we’re becoming aware of this equivalence, if not in “mind” then in behavior, or in what we expect to be able to do.


Face it: you are a programmer.   And you are big data.

Read Full Post »

It’s very evident to me that businesses, organizations and individuals who don’t handle data well (I’ll define that shortly) don’t end up making any difference (traffic, profit, buzz…).

Yeah, that’s probably not intellectual news to anyone.   Really, though, how many people actually handle data well?

Here are some common examples of bad analysis, bad data, bad labeling and bad process:

  • VCs seriously consider 3 year pro-formas on businesses that have yet to produce or sell a single unit
  • Ad Agencies blatantly ignore sources of traffic when reporting to their clients
  • The whole media world pays attention to comScore and Nielsen (and some even Alexa!)
  • Product managers never track down baselines and expectations
  • Ad sales teams routinely ignore inventory levels
  • Marketers talk about “brand value”
  • dotcoms install 5 or 6 tracking mechanisms and never sync them
  • analysts/bi people start analysis with false assumptions or no assumptions
  • home buyers don’t calculate property taxes or relative market value of their home
  • employees generally don’t consider all implications of FSA and 401k contributions when considering real take-home pay
  • employers evaluate employees on qualities and skills not results
  • traditional resumes feature dates and objectives not results and plans
  • Dow = market to the general public
  • subprime is word of the year
  • “backing into” a model is a well honed practice in most executive offices
  • Music labels pay attention to “money lost to piracy”

There are an infinite number of anecdotes on fishy data analysis.

For those that want actual facts – here’s how I know data analysis is a problem in industry and society:

Ok, ok.  I’ve done a good job of pointing out horrible data analysis and lots of fun factoids, but I haven’t demonstrated why poor analysis diminishes opportunities.

First, let me explain my qualifications for “good analysis”:

  • Data should be collected and analyzed in an appropriate timeframe (don’t take 10 years to graduate!)
  • A clear statement of analytic objective and methods is a must
  • The accuracy and depth of data and analysis should be relative to the importance of the subject matter
  • Prediction of human behavior is impossible; avoid absolutist statements
  • Explain relationships between variables; avoid overbearing causation arguments
  • Check and recheck (1 set of eyes is not enough)
  • Qualitative research should always accompany quantitative, and vice versa
  • Ask more questions

With some of those key statements established, I can now draw out why people and orgs miss out or flat out make huge mistakes so often.

[I will do so in a forthcoming post!]

Read Full Post »