In my early discussions and presentations regarding Wolfram|Alpha I often used Computational Journalism as the initial non-engineering use case. Most folks weren’t quite sure what I meant by Computational Journalism until I explained how, as a toe-in-the-water step, one could easily and automatically enhance articles and features with generated knowledge and visuals. It seems I won’t need to explain in great depth the utility and inevitability of computational journalism, because enough conference summaries, op-eds and journalists are starting to popularize the concept.

Here’s a great piece from PBS.

A new set of tools would help reporters find patterns in otherwise unstructured or unsearchable information. For instance, the Obama administration posted letters from dozens of interest groups providing advice on issues, but the letters were not searchable. A text-extraction tool would allow reporters to feed PDF documents into a Web service and return a version that could be indexed and searched. The software might also make it easy to tag documents with metadata such as people’s names, places and dates. Another idea is to improve automatic transcription software for audio and video files, often available (but not transcribed) for government meetings and many court hearings.
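The PBS excerpt describes turning unsearchable letters into documents that can be indexed, searched and tagged. Here is a minimal Python sketch of the indexing and tagging half only. It assumes the text has already been extracted from the PDFs, which is not shown. The letters, the single date format and every function name are hypothetical. Tagging people's names and places would need a real entity tagger.

```python
import re
from collections import defaultdict

# ASSUMPTION: text has already been pulled out of the PDFs; extraction is not shown.
WORD_RE = re.compile(r"[a-z0-9']+")
# Matches dates written like "March 1, 2009" (one format only, for illustration).
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}\b"
)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: each lowercase word -> ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in WORD_RE.findall(text.lower()):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Ids of documents containing every word of the query (AND semantics)."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


def tag_dates(text: str) -> list[str]:
    """Pull out date metadata so a document can be tagged with it."""
    return DATE_RE.findall(text)


# Hypothetical letters standing in for the interest-group documents.
DOCS = {
    "letter1": "Our coalition urges action on health care by March 1, 2009.",
    "letter2": "We support funding for health IT and broadband.",
}

if __name__ == "__main__":
    idx = build_index(DOCS)
    print(search(idx, "health care"))  # documents mentioning both words
    print(tag_dates(DOCS["letter1"]))
```

A real service would add stemming, phrase queries and persistent storage. The inverted index is simply the smallest structure that makes a pile of letters searchable.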

Wired UK goes a bit deeper into some specific companies and projects.

And here’s a nice presentation by Kurt Cagle that gives a good overview of some of the foundational computational technology out there.

I don’t think it’s unreasonable to expect that the vast majority of daily news will be completely machine generated and machine broadcast. Journalists will be increasingly involved in bigger, deeper features and in defining the computational logic that generates the news stream.

This WSJ article on estate taxes came across my email inbox today:

Under current laws in effect until the end of this year, the size of the exemption is $3.5 million per individual or up to $7 million per couple. The tax is slated to disappear entirely on Jan 1.

But estate planning in 2010 will be complicated by a new twist: a complex tax on capital gains, levied at death, that will affect a broader swath of taxpayers. The estate tax is scheduled to return in 2011 at a 55% rate with an exemption of slightly more than $1 million.

The looming lapse of the estate tax is presenting some families with unprecedented ethical quandaries.

“I have two clients on life support, and the families are struggling with whether to continue heroic measures for a few more days,” says Joshua Rubenstein, a lawyer with Katten Muchin Rosenman LLP in New York. “Do they want to live for the rest of their lives having made serious medical decisions based on estate-tax law?”

Let’s change the question a bit. Can we calculate the price of another day of life now? How much estate cash is at risk by expiring before Jan 1? We could probably come up with a dollar figure for both the estate holder and the medical team keeping folks alive.
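Here is a rough sketch of that dollar figure in Python. It uses a flat-rate simplification: tax = rate × (estate value above the exemption). The exemptions and the 2011 rate come from the WSJ excerpt. The other inputs are assumptions. The 2009 rate of 45% is not in the excerpt. The 2011 exemption is rounded to $1 million. Graduated brackets, state taxes and the new 2010 capital-gains-at-death tax are all ignored. Treat the output as an order-of-magnitude estimate, not tax advice.

```python
# Simplified, per-individual estate-tax regimes (flat rate above the exemption).
# ASSUMPTIONS: the 2009 rate of 45% is not in the excerpt; the 2011 exemption
# ("slightly more than $1 million") is rounded to $1,000,000; graduated brackets,
# state taxes and the 2010 capital-gains-at-death tax are all ignored.
REGIMES = {
    2009: (3_500_000, 0.45),  # (exemption, rate)
    2010: (0, 0.0),           # estate tax lapses for one year
    2011: (1_000_000, 0.55),
}


def estate_tax(value: float, year: int) -> float:
    """Approximate tax owed on an estate of `value` dollars for a death in `year`."""
    exemption, rate = REGIMES[year]
    return max(0.0, value - exemption) * rate


def cash_at_risk(value: float, year_of_death: int) -> float:
    """Extra tax from dying in `year_of_death` rather than during the 2010 lapse."""
    return estate_tax(value, year_of_death) - estate_tax(value, 2010)


if __name__ == "__main__":
    for estate in (2_000_000, 10_000_000):
        print(
            f"${estate:,} estate: dying before Jan 1 costs "
            f"${cash_at_risk(estate, 2009):,.0f}; waiting until 2011 costs "
            f"${cash_at_risk(estate, 2011):,.0f}"
        )
```

Under these assumptions, a $10 million estate has about $2.9 million at risk from a death before Jan 1. A death in 2011 would put about $4.95 million at risk. Dividing by the days remaining in the year would give the crude "price of another day" the question asks about.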

Where does all this fit in some bigger sense of human nature?

This question, and its variants, might be the most common question asked in literature, storytelling, laws, history and philosophy (less so in daily conversations!). It defies an answer not because it is too complicated or out of our reach, but because there is no such thing as Man (with a capital “M”), so the question is nonsense.

There is man – in the Linnaean taxonomy sense – you know, man is the creature with two hands, two feet, a biggish brain, two eyes and so on. Though if we push hard enough on that – trace the evolutionary line back a couple of million years or push it forward a bit – we’ll find that pinpointing the precise animal known as “man” gets increasingly hard.

This is definitely not a new idea or clever statement on my part. I call attention to it in attempting to synthesize the impact of improving technology to augment our biological weaknesses, confusion over shifts in religious beliefs, global warming concerns, health care reform and other big things going on in our world that call into question some universal sense of Man. My thesis is that clinging to a belief in Human Nature gets in the way of knowledge and impedes the progress of society on many fronts. It can also have grave consequences for each individual.

Cultures, societies, governments and various other collections of humans struggle to integrate big shifts within their lifetimes because learning is a long-term exercise (some patterns of behavior take a lifetime to integrate). The schedules we grow into throughout a lifetime are incredibly hard to change and sometimes require dramatic changes to the environment and/or our relation to it (body changes, for example). It’s made ever more difficult for most humans because our “blank slate” is so quickly filled with bad data, false assumptions and false positive patterns (aka superstition, religious dogma, good vs. evil, old wives’ tales, urban legends, irrational fears). All of these things get associated with more and more behavior patterns, very early and throughout life, so much so that we all spend a lifetime UNLEARNING and DISASSOCIATING the falsehoods, inefficient behavior, and counterproductive patterns.

The biggest false positive belief humans have is that there is a Human Nature and a definitive ideal of Man. Our cultural narratives and norms claim that there is some Platonic form, some universal concept of Man, and that if we look hard enough, think deep enough, and/or believe enough we will understand Man and figure out how to really live. This false positive concept of Man isn’t confined to religion or fading cultures – it pervades every modern institution too! Top universities teach it (“liberal arts”). Science chases it (google for scientific papers’ references to human nature). Art celebrates it (The Thinker!). Churches preach it (man was made in the image of God). Governments and courts enforce it (e.g. all men are created equal). This belief is maintained over generations because it mostly “works” to keep people alive and procreating (at least, I think it does). A useful fiction, perhaps. Truth, no.

If it ain’t broke, don’t fix it, right?

If we give up on Man, what changes? What contingencies go away? What schedules are no longer maintained?

Does stem cell research pick up?  Do we march ever more quickly towards machine enhanced bodies and brains?  Do robots really start to pervade our workplaces? Would we really continue to worry so much about global warming destroying the sensitive environment we require?

How much does this false belief really change our behavior or is it just “exhaust” we spew out when trying to synthesize all the behavior around us?  That is, does a well defined and earnest belief in Man actually contribute to what we do or don’t do?

It’s an important discussion.

  • Health care reform debates tend to fall into two camps: health care is a human right (Man is real and necessary) or health care is essentially an economic issue (Man is not relevant)
  • The penal system is built on perhaps not a concept of Universal Morality, but certainly a very strong concept of Character.
  • The debate on global warming rides on whether people believe we should keep the earth at a stable temperature for our current species’ biology (if we’re machines, or just digitized versions, or in space, is global warming as concerning?)
  • Abortion rights are obviously about whether you think a bundle of cells in a woman’s body constitutes Man
  • End of Life decisions – is the life supported body still a Man when the lights have gone out?

Beyond these big issues, consider the plots of many recent pop culture smashes (all are about “What is Man?”):

  • Avatar
  • Terminator
  • Twilight
  • Heroes
  • Harry Potter
  • The Secret
  • Eckhart Tolle

If we lose the belief in Man (the soul, autonomous man, in God’s image, human nature), is there a negative impact personally and in society? Do we all just become nihilists? Do we stop passionately pursuing things? Do we devalue our relationships?

As Google grows bigger and deeper, op-eds and various critics are calling for hard-core scrutiny and even regulation.

The latest piece I’ve come across is this rather drab call for “search neutrality” in the New York Times.

Without search neutrality rules to constrain Google’s competitive advantage, we may be heading toward a bleakly uniform world of Google Everything — Google Travel, Google Finance, Google Insurance, Google Real Estate, Google Telecoms and, of course, Google Books.


Does a consumer really have to use Google to find information on all these things?  No.   There are many much better providers of all those information sources.  Does Google actually make money directly on all those categories?  No.   Sure, people advertise on Google to bring people to transactions, but Google isn’t making money directly from the consumer on those ads.

The op-ed’s argument relies on agreeing that Google is a gatekeeper to information access, and that as a gatekeeper it unfairly restricts competition by promoting its own applications and information sources over third parties. This is a false representation of Google. Google is a search engine that a consumer may or may not choose to use. It just so happens that millions of consumers choose to use Google, and Google has negotiated enough deals to make it easier to choose Google. However, hopping online does not require you to use Google at all. There are many search engines, many mapping sites, many free email services, many in-browser applications and so on.

Web search is not the ONLY way to find things online. In fact it’s not even the number 1 way most people find information online. Word of mouth via social networks, email and IMs is still the number 1 way people get to things online. For websites and services that no one talks about or knows about, Google is the number 1 way people will find them. That’s not a problem caused by big bad Google… in fact, the only reason businesses that can only be found via Google exist is, well, because of Google.

The author of the op-ed is a co-founder of a service called FoundEm, a price comparison site (and seller of its underlying technology). Clearly, FoundEm has had some competitive issues with Google. That happens. And FoundEm should fight for its position in Google in any legal way. However, I don’t think the experience of FoundEm is in any way a justification for regulating Google in the form of enforced Neutrality. Google pays to build a big fat index of the web and provides it free to consumers. Nowhere in that business does Google guarantee it’s the best, most authoritative source of information or way to find it. It simply is useful enough to most people that they assume Google has it all. Again, that’s not Google’s fault, and Google should not be forced to include information and services it doesn’t think help its clients and consumers.

There’s a more legitimate bone to pick with ISPs that hijack mistyped addresses and query strings and send users to advertiser-only pages. That’s an actual abuse of gatekeeper status – the consumer in that case really doesn’t have a choice of information sources, AND in many areas in the US there is only 1 ISP available.

Rather than picking on Google via regulation, just out-innovate them – in product and marketing. Twitter, Facebook, Apple, Bing, LinkedIn and more have found ways to compete without Google. In a world less and less about finding webpages and more about connecting useful information and synthesizing live data, Google’s Web Search is losing relevance as a functional tool for users. We’re a decade away from the market seeing that en masse, but it’s happening. Web Search IS NOT a tractable problem long term and is constantly being thwarted by spam, new technologies, new presentation formats, the mobile world, and so forth. The Google folks are very smart and forward-thinking – they are investing in NON-SEARCH-based products and services, knowing that the gravy train will run out eventually.

I mean, think about it… what’s the web search market really worth? Google is spinning off $20 billion in revenue, the other major competitors much less. Let’s make a high estimate of $50 billion in direct revenue for the web search industry. That’s not that big. Barely a market at all in the grand scheme. Perhaps what Google is doing is bigger than the revenues imply. Maybe all the info they are collecting is a much scarier thing than being the dominant #1 search engine. Even by that measure, Google probably has less deep insight into important data than Facebook or even Yahoo!

Do I worry about Google?  Sure, personally I do with my own information.  As a company acting as an unfair monopoly, no, not at all.  I don’t have to use them.  I don’t have to buy ads on Google.  I can close my gmail account.  They don’t really even have aggressive retention methods like phone companies, insurance providers and ISPs (can’t cancel without 40 phone calls!).

The best way to beat a big business is to do what it grows too big to do – imagine and execute on that imagination. Google can’t disrupt the gravy train – but small businesses can. Build a great product, market aggressively and leave the regulation and activism to issues that really need it….

I’ve been asked many times about the size of Facebook’s infrastructure. Folks love to get a gauge of how much hardware/bandwidth is required to run high-traffic sites.

Here’s a recent report of the setup. Read the details there. In short, 30,000 or so servers with tons of optimizations to networking, MySQL, PHP, the web server, and lots and lots of caching.

There’s an interesting point here. 30,000 servers handle 300 million registered users and their 200 billion pageviews a month. That works out to about 7 million pageviews per server per month. Almost every company I have worked with has WAY overbuilt hardware and infrastructure. I’ve seen people deploy a new server for every 100,000 pageviews per month. Modern web servers and DBs, with the right setup, can handle far more load than most webmasters and IT folks realize.
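Here is the back-of-the-envelope arithmetic behind that comparison. The server and pageview counts are the reported figures from the post. The 30-day month and perfectly even traffic are simplifying assumptions. Real peaks run far above the average. The 30,000 servers also span web, cache and database tiers, so this is a blended figure.

```python
# Figures from the report summarized in the post (approximate, as reported).
SERVERS = 30_000
PAGEVIEWS_PER_MONTH = 200_000_000_000
# ASSUMPTION: a 30-day month and evenly spread traffic; real peaks are far higher.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

# Average pageviews each server handles (blended across web, cache and DB tiers).
per_server_month = PAGEVIEWS_PER_MONTH / SERVERS
per_server_second = per_server_month / SECONDS_PER_MONTH

# Fleet size the same traffic would need under the "one server per 100,000
# pageviews a month" provisioning habit described in the post.
RULE_OF_THUMB = 100_000
servers_at_rule_of_thumb = PAGEVIEWS_PER_MONTH // RULE_OF_THUMB

if __name__ == "__main__":
    print(f"{per_server_month:,.0f} pageviews per server per month")
    print(f"{per_server_second:.2f} pageviews per server per second (average)")
    print(f"{servers_at_rule_of_thumb:,} servers at 100k pageviews/server/month")
```

At roughly 6.7 million pageviews per server per month, Facebook's reported density is about 67 times the 100,000-per-server habit. Provisioning the same traffic that way would take around 2 million servers.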

One subtle point that’s hard to figure out from this data… the amount of compute/CPU time/power required to parse the metrics for this site.  Beyond serving the site up there’s a considerable amount of business intelligence to work through.  Logging and log parsing, without even the analysis part, has got to be a major effort not accounted for in these infrastructure details.

Seemingly random attacks and a shadowy, mysterious enemy are the hallmarks of insurgent wars, such as those being fought in Afghanistan and Iraq. Many social scientists, as well as the military, hold that, like conventional civil wars, these conflicts can’t be understood without considering local factors such as geography and politics. But a mathematical model published today in Nature (see Nature 462, 911–914; 2009) suggests that insurgencies have a common underlying pattern that may allow the timing of attacks and the number of casualties to be predicted.

A couple of issues to consider:

Say these researchers are “right”: what would we then do with a prediction of insurgency? How would we prevent attacks, and under what justification?

Does the data consider all failed insurgencies and near-insurgencies?

Are Nature and the authors going beyond the scope of scientific research by asserting that a model for human insurgency has been found before any real verification?

I really like this post on Good Math, Bad Math.

Beyond being mildly humorous in that cranky math person non-funny kinda way, it touches on lots of my favorite subjects: enumeration, Cantor, classic proofs, cranky math people.

The catch – and it’s a huge catch – is that the tree defines a representation, not an enumeration or mapping. As a representation, taken to infinity, it includes every possible real number. But that doesn’t mean that there’s a one-to-one correspondence between the natural numbers and the real numbers. There’s no one-to-one correspondence between the natural numbers and the nodes of this infinite tree. It doesn’t escape Cantor’s diagonalization. It just replaces “real number” with “node of this infinite tree”. The infinite tree contains uncountably many values – there’s a one-to-one correspondence between nodes of the infinite tree […]

To see the distinction, let’s look at it as an enumeration. In an enumeration of a set, there will be some finite point in time at which any member of the set will be emitted by the enumeration. So when will you get to 1/3rd, which has no finite representation as a base-10 decimal? When will you get to π?
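Here is a small Python sketch of the enumeration-versus-representation distinction. It swaps in an object the quoted post does not use: the Calkin-Wilf sequence, a known enumeration of the positive rationals. That enumeration emits 1/3 after a finite number of steps. Expanding 1/3 digit by digit in base 10, by contrast, never finishes, because the unexpressed remainder is still 1/3 after every step. π never appears in this enumeration at all, since it is irrational. Cantor's argument says no enumeration can reach every real. The function names are illustrative.

```python
import math
from fractions import Fraction
from itertools import islice


def calkin_wilf():
    """Enumerate every positive rational exactly once (Calkin-Wilf sequence)."""
    q = Fraction(1)
    while True:
        yield q
        # Next term: 1 / (2*floor(q) - q + 1)
        q = 1 / (2 * math.floor(q) - q + 1)


def first_step(target: Fraction, limit: int = 100_000):
    """Finite step at which the enumeration emits `target` (None if past `limit`)."""
    for step, q in enumerate(islice(calkin_wilf(), limit)):
        if q == target:
            return step
    return None


def decimal_expansion(x: Fraction, n: int):
    """First n base-10 digits of 0 <= x < 1, plus the part not yet expressed."""
    digits = []
    for _ in range(n):
        x *= 10
        d = math.floor(x)
        digits.append(d)
        x -= d
    return digits, x


if __name__ == "__main__":
    print(first_step(Fraction(1, 3)))              # 3: emitted at a finite step
    print(decimal_expansion(Fraction(1, 3), 10))   # remainder is still 1/3
```

The first five terms are 1, 1/2, 2, 1/3, 3/2, so 1/3 shows up at step 3 (counting from 0). No finite number of decimal digits ever exhausts it.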


I find simple equations sometimes help frame an opportunity.
In the case of software and media companies I have a very basic formula to gauge an opportunity that goes something like this…
M = Maximum possible number of users (consumers or members) a business could capture if they had 100% of the market
C = Average Cost to acquire a user
D = Research and Development cost to develop initial software or media property
L = Lifetime value of the user (can use advertising CPMs, licensing fees, subscription rates and lengths)
S = seed money or capital to attempt the business
R = Likely Top Market Share Attained (typically not more than 15%)
M * L = MaxRev, the maximum lifetime revenue of the business
D + (C * M) = MaxCost, the maximum cost to deliver MaxRev
MaxRev – MaxCost = MaxProfitLoss
You can repeat this exercise with R * M (the users you are likely to actually capture) in place of M to get the realistic model.
I like to make some guess as to how fast a business can get to that max so that I know the rev per year and what not. Yes, this is a trivial calculation, but I think it’s a really useful rule-of-thumb formula for sizing up an opportunity in software and media.
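Here is the napkin math as a small Python sketch. It uses the post's letters M, C, D, L and R, runs both the maximum and the realistic (R × M users) cases, and spreads revenue over a guessed ramp time. Spreading revenue evenly over that ramp is an assumption, as is the helper name `revenue_per_year`. The post only says it guesses how fast the business reaches its max. S (seed money) is left out because the formulas don't use it. All numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    """Napkin-math inputs, using the letters defined in the post."""
    M: float  # maximum possible users at 100% of the market
    C: float  # average cost to acquire one user ($)
    D: float  # R&D cost to build the initial software/media property ($)
    L: float  # lifetime value of one user ($)
    R: float  # likely top market share attained, as a fraction (e.g. 0.15)

    def profit_loss(self, users: float) -> float:
        """Lifetime revenue minus the cost to deliver it, for `users` users."""
        revenue = users * self.L          # M * L when users == M
        cost = self.D + self.C * users    # D + (C * M) when users == M
        return revenue - cost

    def max_profit_loss(self) -> float:
        return self.profit_loss(self.M)

    def realistic_profit_loss(self) -> float:
        # Realistic case: only the share R of the M possible users is captured.
        return self.profit_loss(self.R * self.M)


def revenue_per_year(lifetime_revenue: float, years_to_max: float) -> float:
    """Crude even spread of lifetime revenue over the guessed ramp time (assumption)."""
    return lifetime_revenue / years_to_max


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    opp = Opportunity(M=10_000_000, C=2, D=5_000_000, L=5, R=0.15)
    print(f"max:       ${opp.max_profit_loss():,.0f}")
    print(f"realistic: ${opp.realistic_profit_loss():,.0f}")
    print(f"realistic revenue/yr over 3 yrs: "
          f"${revenue_per_year(opp.R * opp.M * opp.L, 3):,.0f}")
```

With these made-up inputs, the 100% case shows a $25 million profit. The realistic 15% case loses about $0.5 million, because the fixed D no longer gets spread over enough users. That is the sense of scope the post is after.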
The key to this equation is estimating M accurately (which usually means being very honest about your market). It’s not too tough with today’s tools and open data to get a good look at demographics, buying histories, competitors and so forth.
Note that I make no attempt to account for market valuations and all that. That way of thinking is usually chasing after the wind.
Why does this equation help?
Well, the main point is that it gives me a great sense of scope.  Many of the media properties and software products out there cost a ton of money and have mediocre maximum markets or very low lifetime value.  Many businesses are eager to create a killer app but don’t have a grasp of what the scope of a killer app really has to be and/or they grossly underestimate how hard it is to make something.
What I find with this equation is that there’s a sweet spot in media and software. If you optimize this equation you find that you can’t make software or media that’s too esoteric or too complicated to build, nor can you make complete fluff. If you want to make something that appeals to everyone on the planet (not possible), it’s going to cost a lot and the cost to acquire users will be very high… so with this simple equation you learn you’ll be at it for a long time. On the other hand, if you want to make a high-end product for a niche, you’ll find that the overall opportunity might not be that big.
Again, this is hardly rocket science, sound economic theory or anything… simple napkin math.
What it doesn’t capture, but hints at, is the gross misestimation many entrepreneurs make – software and media are more art than science and can very quickly turn into something intractable.
A few other considerations I’ve accumulated over the years in and out of businesses:
  • If your media or software is uber mass market, the big guys are just going to make it and give it away.
  • If your project takes too long (a year or more), you’re going to have many more competitors working way faster than you before you ship.
  • A killer app or killer media property is often not the thing you set out to make, it’s usually the mistake, the tangent, the oddball idea.
  • More capital doesn’t improve the chances.  Capital only helps to scale once something is built, for MOST projects.
Perhaps you’ll find this useful as you move into 2010 and kick start your projects!

