Archive for July, 2008

Glenn Beck put out a commentary today that is really sadly apathetic. I get his argument.  His conclusion is weak, though.

“But with more information, and more candidates than ever before, I find myself in some ways less interested. With no clear answer for what’s best for the country, part of me has a strong desire to just withdraw from it all. Washington is so eternally and impossibly mangled, even if I found the perfect candidate who agreed with everything I believe in, would I be dumb enough to think that they wouldn’t fall victim to the beltway? It’s like going to see a Ben Affleck movie: I’m walking in knowing I’m going to be disappointed.”

Mr. Beck will never find a candidate who believes everything he believes, and there’s no possible way that a politician won’t be “a victim to the beltway”.  Every living thing is profoundly impacted and shaped by its environment (physical, cultural, intellectual, political…).   That’s no reason to withdraw from it all, and certainly not during this election vs. any other – it’s always been the truth that no candidate can fully represent us nor remain unaffected by the political setup.

Besides, Mr. Beck makes his living by people remaining involved.  If everyone withdrew, or if it were obvious who to vote for, why would anyone watch a political show?  Mr. Beck has to stay involved, for his living depends on it, and that’s as good a reason as any!

As for who one should vote for… just vote.  As long as people vote and stay engaged, things appear to change.  It’s one’s involvement that matters in the broader world and to the individual.  An individual learns and grows by engaging and struggling.  The broad system learns and grows the more its parts learn and grow.  The less engaged individual probably doesn’t see those changes as much.

I’m not being a rah! rah! let’s-vote cheerleader type.  Mr. Beck raised the question of who he should vote for and he answered it by saying “let’s just withdraw”.  I’m suggesting VOTING is what matters, not WHO YOU VOTE FOR.  Discussion is what matters, not just the content you discuss.

Read Full Post »

This morning’s web 2.0, vc, internety buzz is all about this new search engine, Cuil. Even CNN devoted the top space to it.

Here’s some more fanfare on TechCrunch.

Cuil’s front end is built with CherryPy.

This search engine is not good.

Touting the size of your index is a bit like pitching the world on GHz for chips or telling people what jet engine you use for the FedEx planes.  No one cares.  Does the product work?

Can I find what I’m looking for or be surprised by what I didn’t know I was looking for?

Cuil doesn’t work.  Usually it doesn’t bother me when a new internet product doesn’t work.  It’s the web.  Fly, be free.  So what’s my beef here? This engine is getting major buzz and it’s a complete letdown.  I want to believe.

Search Logic:

Try typing basic queries like restaurants in Chicago, then use their sidebar to drill down.  Ask.com works better than this.

Try your name.  If you have even a slightly non-obvious name you’ll get 0 results.

Try SOCRATES.  For goodness sake, SOCRATES.  0 results.

Layout:

Everyone tries alternative layouts for search.  I’ve seen and built so many myself.  Why? Perhaps if it looks different it will perform differently? Perhaps a different look will make it seem better? newer? improved?  Generally it boils down to “Google has been this way for 10 years it’s time for a change.”

I posit that most new search engine builders don’t actually look at the user behavior though.  If they did, they would never try these new layouts.  There’s simply no demand for them.  Users are confused by them. And the information these new engines showcase doesn’t require a new layout.

Images in non-image-based searches are highly distracting.  Most of the thumbnails an automated crawler picks up will not index correctly for the search at hand.  So the results’ relevance, visually, is diminished.  Images attract the eye too much and provide too few directional cues to drive a click (and you need clicks for the user to get the payoff).

The grid layout, versus the rank-ordered list, also provides no cues for the user on which links to explore first.  The grid makes it easier to visually scan the whole set, but again this reduces the click behavior of the user because they explore through scanning versus clicking.

The lack of bolded words in the title and URL is pretty bad too.  They took away yet another cue for the user to click.  Users don’t read on screen.  They rapidly scan, bouncing back and forth between interesting representations on the screen.  Without the bolded words, the eyes have no anchor points within the actual results.

Let me explain that point.  Finding the best results is only a small part of what makes a successful search engine.  Users need to visit the sites a search engine returns.  The only way to validate the relevance of the search results and to get what a user actually wants is to go to the sites.  Users often will find what they want and return to the results more satisfied.  Creating an interface that delays that behavior reduces the result relevance (perception) without having anything to do with the actual algorithms!

The tabbed results are interesting, and tabs are a well-known visual element now.  However, sometimes the tab contents bounce around and it’s slightly unnerving.  Worse, using the tabs often produces worse results.  This hurts the experience because the user isn’t forced into producing a new search behavior – essentially they never shake off the bad initial query.  Instead the tabs are used in one “continuous” behavior and provide feedback to the initial search behavior, which might lead to the user never trying a better query.

Pagination, results numbers and other numeric cues.  These are fairly inconsistent – that is, often the number of results bounces around page to page within the same result set.  It’s hard to trust the results when you have numbers changing and the layout already keeps you off balance.

Business:

If they did find a cheaper way to scale, that’s interesting.  If they can really handle query massaging better than google, that’s interesting.  This experiment might be worth a lot just for those two facets alone.  The user won’t appreciate those in the least, but acquirers and techs will.

Conclusions:

I think this is getting press partly because the media loves telling an underdog story and we love to take on the incumbent.  That’s great.  No harm in that.

Heck, there’s no harm if people think Cuil is great when it isn’t, except for Cuil and the folks who put in $33mil.  If they want to be a contender and not a “so what” like Clusty, Ask and all the other non-yahoo/non-googles they better keep it real.  If they had product bravery they would kill off access to google within their offices so they were forced to use Cuil for everything.  I’ve worked at a lot of search engines and not a single company was willing to kill off access to google and completely entrust their product to find what they need.

The pitch of privacy as a good reason to use Cuil is folly.  Few users hold back using google because it tracks you. No one stops using Facebook because your data flows freely.  These products deliver value.  If your pitch to users for using the search engine is not 100% because the search kicks ass you’re going to lose.  The behavior will be disconnected from the consequences and when that happens you have no hope of keeping the users searching.

I didn’t even get into the business model, advertising thing and how that is a huge feedback loop for google.  That’s for a later post.

Give me $33 mil and I’ll show you a search engine that has a shot 😉

No, really, if you have $33 mil and you want to try something that has a chance, call.

Read Full Post »

Well, I’m in the process of getting somewhere finally, let’s put it that way.  My initial efforts were piss poor, with only slightly interesting improvements using some ad hoc messing with means and brute force approaches.  Nothing I did would get me onto the leaderboard.

Of course, I could just take what all the leaders have done and build on that.  Some teams have done that.  I think there’s something simpler to do, and finding that simpler thing is hard when you are taking others’ complicated models as your starting point (you pick up all their shortcomings too!).  I’m starting with the most basic systems I can and searching through various simple tweaks.  Also, I want a system that will make sense to me if I have to put this on the shelf for a week or two to get other work done.

What I’ve done:

I use the pyflix package to handle the basic dataset and algorithm framework.

I borrowed the basic weighted kNN (k-nearest neighbors) algorithm from Toby Segaran’s book “Programming Collective Intelligence” (O’Reilly).

I’ve connected the Pythonika package to Mathematica 6 so that I can use the Mathematica front end to work out algorithm variations and visualizations.  (Mathematica 6 has very good clustering algorithms and features that are far easier to play with than building your own or using some Python, C, or Java library.  And once speed becomes the key, it’s possible to compile and speed things up in Mathematica, but I digress.)  Pythonika lets me run Python code from Mathematica (to retrieve the data from the pyflix data store).

I used the Python pickles from Ilya G. He did the hard work of creating feature vectors (coded descriptions) from IMDb data for things like release date, genre, cast and so forth.  This makes the kNN algorithms more interesting than just going off of the movie title, rating and year.

My basic outline so far of the algorithm system:

Create clusters of the data based on Genre, release year, number of ratings, director (as proxy for many other factors).  Further polish the clusters by user.

Add in some meta features of users like frequent rater, avid rater and so on for clustering the users.

Do multiple runs of rating predictions based on just movie clusters, just user clusters, and combinations of the two, with varying neighborhood sizes.
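To make the weighted kNN step in the outline above a bit more concrete, here is a minimal sketch in plain Python.  It is not my actual pyflix code: the function names, the toy feature vectors (comedy flag, drama flag, scaled release decade) and the inverse-distance weighting constant are all assumptions made up for illustration.  The idea is just that a user’s rating for an unseen movie is predicted as the distance-weighted average of that user’s ratings for the k most similar movies.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inverse_weight(dist, const=0.1):
    """Closer neighbors count more; const (an assumed value) avoids division by zero."""
    return 1.0 / (dist + const)


def weighted_knn_predict(rated, target, k=3):
    """Predict a rating for `target` from (feature_vector, rating) pairs.

    rated  -- movies the user has already rated (hypothetical toy data)
    target -- feature vector of the movie we want a prediction for
    k      -- neighborhood size, the knob to vary between runs
    """
    neighbors = sorted(
        (euclidean(vec, target), rating) for vec, rating in rated
    )[:k]
    total_weight = sum(inverse_weight(d) for d, _ in neighbors)
    if total_weight == 0:
        return None
    return sum(inverse_weight(d) * r for d, r in neighbors) / total_weight


# Hypothetical features: [is_comedy, is_drama, release decade scaled to 0..1]
rated_movies = [
    ([1, 0, 0.9], 5),
    ([1, 0, 0.8], 4),
    ([0, 1, 0.9], 2),
    ([0, 1, 0.3], 1),
]

# A recent comedy the user has not rated yet.
prediction = weighted_knn_predict(rated_movies, [1, 0, 0.85], k=3)
print(round(prediction, 2))
```

Varying k and swapping in different feature vectors (movie clusters, user clusters, or both) is the kind of cheap tweak the outline is meant to make easy to try.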

My particular challenges are:

Reducing the memory required to do trial runs on the data with the algorithms.

Reducing the code required to try new algorithms

Keeping track of all these different parts

I hope this is helpful or instructive.  It’s fun for me at the very least.

Read Full Post »

One of my great friends and confidants has pointed out, in a non-chiding way, that one of my favorite authors, Ben Stein, has been doing and saying some disturbing and annoying things lately. This phase of his career started to get weird with the movie he produced and starred in called “Expelled: No Intelligence Allowed.”

Six Things in Expelled That Ben Stein Doesn’t Want You to Know …

Apr 16, 2008 In the film Expelled: No Intelligence Allowed, narrator Ben Stein poses as a “rebel” willing to stand up to the scientific establishment in
http://www.sciam.com/article.cfm?id=six-things-benstein-doesnt-want-you-to-know

The Big Picture | Farewell To Ben Stein

Let us not forget that Ben Stein thinks that Nixon could have won the Vietnam War, and defeated the Kamar Rouge, but not for the dastardly deeds of Woodward
bigpicture.typepad.com/comments/2008/01/farewell-to-ben.html

Well, the movie was panned. It is not at all good or entertaining, or informative. But now his reinforcers come increasingly from Fox faculty and from those who have a lot to lose if things continue to move in several different directions…. In education, government, free speech, the courts, the reduction of the imperial presidency, science, etc.

He has some points of fairness, but some of the reviews in Scientific American were scathing. He benefits by being on the shows of bulldogs ’cause he is a known entity. But where is it going? He can’t be looking for another TV gig, can he??!! He has some agenda. Stay tuned…

The people that he, BS, pushes in academia have some good and some mediocre value to science… but in a peer-review-based community they don’t do well at all, so they don’t get the tenure tracks, the research $$, etc. The free speech argument is impotent if no one is listening! It’s like it was for the Black Panthers: people hear it but don’t listen twice… even if it was true.

Say that 5 or 15 scientists have a point… well, it is chalkboard science rather than electron microscopy level stuff, so the shock jocks are the only ones that BS gets to visit, because a guest lecturer series at MIT or U. of Kazakhstan would not be entertained. His movie assures that!

Free speech per se is not part of academic freedom any more than racial epithets are part of the shtick in a comedy club. The guy from the U. of Colorado who posed that the World Trade Center focus on 9/11 was retaliation for 120 years of subjugation and abuse in the Middle East by Westerners was bounced. Nothing to do with data… just research grants and academic respect.

It may be more about BS than the profs that question evolution. Just as Darwin’s work churned for 150 years to get traction, these belly-ache-ers are being used by BS and will have to fight to make their point. A media blitz works against it, so… what’s his agenda? Stay tuned!…..

They know what they are doing in academia. BS knows what he is doing in media mud. Both are faced with an uphill climb and a) getting someone in science in some country that doesn’t recognize voodoo to bridge their work to what we know about that is basic to chemistry, anatomy, paleontology, etc. that is more powerful than man and monkey having 99.997% of the same DNA… and b) getting BS to account for his loopiness or give him a show on conspiracy theories in the vein of John Stossel on 20/20 on ABC.

I can just see it now… he gets on TV with a hard-hitting conspiracy angle… and starts in on ‘morality of stem cell research’, intelligent design, elder care and extension of life and then ‘Self termination criteria you can live with’ and the ratings go over the top…. New reality based TV… a series: CSI – Geriatrics, or “Make a Wish in Area 51” and… oh, the horror…..Oh, Ben… Don’t be that way…

Don’t ask… No, I didn’t see the movie. I also didn’t see “Old Yeller.” I am as good at ‘suspending disbelief’ as the next guy but I didn’t need the practice that bad.

However, I was heavy into what was going on with him and the scientists.

http://www.expelledexposed.com/index.php/the-truth/hitler-eugenics

Read Full Post »

A couple of posts ago I asked if real numbers exist (like pi).

It really doesn’t matter, and I’ve come back around (one of these mental oscillations…) to the conclusion brought to my attention on the NKS blog.

Here is the key statement:

Mathematics is a symbolic language — you can argue that none of its elements “exist” in physical reality, yet they can be used to communicate information about things which are real.

I found another statement to this effect in the classic “What is Mathematics?” by Courant & Robbins, revised by Stewart.

Through the ages mathematicians have considered their objects such as numbers, points, etc., as substantial things in themselves.  Since these entities had always defied attempts at an adequate description, it slowly dawned on the mathematicians of the nineteenth century that the question of the meaning of these objects as substantial things does not make sense within mathematics, if at all.  The only relevant assertions concerning them do not refer to substantial reality; they state only the interrelations between mathematically “undefined objects” and the rules governing operations with them.  What points, lines, numbers “actually” are cannot and need not be discussed in mathematical science.  What matters and what corresponds to “verifiable” fact is structure and relationship, that two points determine a line, that numbers combine according to certain rules to form other numbers, etc.

I often forget that the abstraction is not the thing.  The metaphor is not the thing.  The symbol is not the thing. Mathematics never makes assertions that it is the thing.  It is an abstraction – a description of relationships devoid of many of the properties of specific objects and environments.  This abstraction (and simplification) is required to make progress.  If mathematicians were to create theory that was specific to every situation, object, and environment, the world would run out of shelf space for storing all the math books and we’d gain nothing over flat-out record keeping.  In a sense, mathematical abstraction is a wonderfully useful compression of information.  The application of mathematics to a specific situation is the decompression of the abstraction.

This abstraction is so useful because it lets us focus on key relations and make progress on understanding despite our lack of complete knowledge of specific objects, environments, and situations.

This abstraction is also dangerous and/or limiting.  Not all situations in the universe are able to be described by a simplified mathematical theory.  In fact, a surprising number of very simple phenomena (theoretical, biological, physical, financial, etc.) are not mathematically compressible.  That is, a purely mathematical theory will not be sufficient for understanding in many situations.

What a relief!

Some mathematicians already experienced this relief from needing to describe the universe in some ultimate truth.   That’s far too big for any discipline to bear.

Not all mathematicians recognize this limitation of mathematical theory (heck, and of many other theories!), and a majority of economists, business people, VCs, media, and lay people do not.  In the US (perhaps elsewhere), business models (pro formas), stock indexes, indicators, projections, forecasts, and formulas dominate our thinking on very complex phenomena.   We’ve explored this issue many times on this blog.

Understanding the universe we experience requires a combination of theories.  Math can sometimes point the way and get us going, keep us focused, or help us communicate.

Whether the real numbers exist doesn’t really matter.  The real numbers are useful for moving us forward on some problems in the real world.  Pi, as a compression of a really long number and challenging concept in geometric forms, is useful in helping us make wheels, explore space, and so much more.  That is what makes math great, even if it isn’t objective, ultimate truth.
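As a small illustration of pi as compression, here is a sketch of a well-known “spigot” generator (commonly attributed to Jeremy Gibbons) that produces the decimal digits of pi one at a time using only integer arithmetic.  The function name and the choice to print 20 digits are mine; the point is only that a dozen lines stand in for as many digits as you care to decompress.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) indefinitely.

    Unbounded spigot algorithm; all state is kept in exact integers,
    so no floating-point precision is lost along the way.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (
                10 * q,
                10 * (r - n * t),
                t,
                k,
                (10 * (3 * q + r)) // t - 10 * n,
                l,
            )
        else:
            # Not enough information yet: fold in another term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


# "Decompress" the first 20 digits.
print("".join(str(d) for d in islice(pi_digits(), 20)))
```

Whether or not pi “exists”, the short rule is what we actually carry around and use; the digits are just what falls out when we apply it.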

Read Full Post »

Mathematica is rad.

Machine learning is also rad.

Check out these fine demos and code files for some nice informatics and machine learning ideas.

Read Full Post »

First, we all can read the backgrounder on human rights history and technical definitions/explanations.

Now that we’ve got all that out of the way, is there anyone that can give a satisfying explanation for human rights and why we continue to insist on them as a thing unto themselves?

Define a human right?  Or just describe one.  Where does the right come from?  What authority do we appeal to?  How do you enforce the right?

Yes, I know poly sci and human rights activists and social scientists publish and shout endlessly on this subject, but… what’s there?  What’s the current summary?

I ask because almost every high impact political or military move appeals to some protection or enforcement of human rights.  I figure we all ought to agree on what we’re talking about.

Read Full Post »
