NoSQL

Now that we have just gotten used to SQL, let's move on to NoSQL. Have a look at this prominent introduction, which is good for perspective: Introduction to NoSQL by Martin Fowler. So, nothing lost...

Trees and Other Hierarchies in MySQL

Great chapter from the unmissable book by Peter Brawley and Arthur Fuller … http://www.artfulsoftware.com/ … Thank you boys!

Most non-trivial data is hierarchical. Customers have orders, which have line items, which refer to products, which have prices. Population samples have subjects, who take tests, which give results, which have sub-results and norms. Web sites have pages, which have links, which collect hits, which distribute across dates and times. With such data, we know the depth of the hierarchy before we sit down to write a query. The depth of the hierarchy of tables fixes the number of JOINs we need to write.

But if our data describes a family tree, or a browsing history, or a bill of materials, hierarchical depth depends on the data. We no longer know how many JOINs it will take to walk the tree. We need a different data model. That model is the graph (Fig 1), which is a set of nodes (vertices) and the edges (lines or arcs) that connect them. This chapter is about how to model and query graphs in a MySQL database.

Graph theory is a branch of topology. It is the study of geometric relations which aren't changed by stretching and compression (rubber sheet geometry, some call it). Graph theory is ideal for modelling hierarchies, like family trees, browsing histories, search trees and bills of materials, whose shape and size we can't know in advance.
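
To make the "unknown number of JOINs" point concrete, here is a minimal sketch of a graph stored as an edge list and walked with a recursive query. It is not taken from the book: the table name `edges`, the bill-of-materials rows and the helper `descendants` are all made up for illustration, and Python's built-in SQLite is used as a stand-in so the example runs without a MySQL server (MySQL 8.0 also accepts the WITH RECURSIVE syntax).

```python
import sqlite3

# Hypothetical bill of materials stored as an edge list:
# one row per parent -> child edge. SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (parent TEXT NOT NULL, child TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO edges (parent, child) VALUES (?, ?)",
    [
        ("bike", "frame"),
        ("bike", "wheel"),
        ("wheel", "rim"),
        ("wheel", "spoke"),
        ("rim", "valve"),
    ],
)


def descendants(conn, root):
    """Return (node, depth) pairs for every node below root, whatever the depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(node, depth) AS (
            -- direct children of the root
            SELECT child, 1 FROM edges WHERE parent = ?
            UNION ALL
            -- children of anything already found, one level deeper
            SELECT e.child, s.depth + 1
            FROM edges AS e
            JOIN subtree AS s ON e.parent = s.node
        )
        SELECT node, depth FROM subtree ORDER BY depth, node
        """,
        (root,),
    )
    return rows.fetchall()


bom = descendants(conn, "bike")
for node, depth in bom:
    print("  " * depth + node)
```

The query never mentions how deep the tree is; the recursion simply stops when no further edges are found. The sketch assumes the data is a tree, so a cycle in the edge list would make the recursion run forever.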
Data Transformation and Linear Algebra

The problem of data transformation has been solved in numerous ways, with different levels of smartness and in different flavors. ETL (extract, transform, load) is a buzzword closely related to this topic. Basically, the requirement is to get a defined set of data entities, that is, data structures such as records from tables in schemas, from one representation into another. That can be just a space-time transformation (trivial, as it maintains the structure, the shape) or a structural transformation, which changes the shape. The approach is based on concepts of linear algebra, where well-understood algorithms have been defined over the last centuries; most of the actual work is done on different representations of so-called vectors (well-defined sets of data within a representation, that is, a multi-dimensional space). So something like the image above. Now, the idea is to try presenting a data structure in a space or, what is equivalent, to provide a bi-directional transformation (mapping) onto that space. Impossible? I don't think so. Conclusion: do it then! OK, watch this blog and your curiosity will be satisfied...
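
As a toy illustration of the bi-directional mapping idea, the sketch below treats a record as a vector and a transformation as an invertible matrix. Everything in it is an assumption made for the example: the two-field record (net price, tax), the target layout (gross price, tax) and the helper names `apply` and `inverse_2x2` are invented here and say nothing about how the blog author plans to do it.

```python
from fractions import Fraction


def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def inverse_2x2(m):
    """Invert a 2x2 matrix exactly; raise if the mapping cannot be undone."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible, so the mapping is one-way")
    det = Fraction(det)
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical record layout: source = (net, tax), target = (gross, tax),
# where gross = net + tax. The matrix encodes that change of representation.
to_target = [[1, 1],
             [0, 1]]
to_source = inverse_2x2(to_target)

record = [100, 19]                    # net = 100, tax = 19
target = apply(to_target, record)     # gross = 119, tax = 19
restored = apply(to_source, target)   # back to the source representation

print(target, restored)
```

The mapping is bi-directional exactly when the matrix has an inverse. A transformation that drops or merges fields corresponds to a singular matrix and cannot be reversed this way.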

Loading half a billion rows into MySQL

Interesting post on the derwiki blog … Especially the comments are quite entertaining! Amazing how ignorance produces patronizing statements (-> Morg). See below the top of the post …

Background

We have a legacy system in our production environment that keeps track of when a user takes an action on Causes.com (joins a Cause, recruits a friend, etc). I say legacy, but I really mean a prematurely-optimized system that I'd like to make less smart. This 500m record database is split across monthly sharded tables. Seems like a great solution to scaling (and it is), except that we don't need it. And based on our usage pattern (e.g. to count a user's total number of actions, we need to query N tables), this leads to pretty severe performance degradation issues. Even with a memcache layer sitting in front of old month tables, new features keep discovering new N-query performance problems. Noticing that we have another database happily chugging along with 900 million records, I decided to migrate the existing system into a single table setup.

The goals were: (1) reduce complexity: querying one table is simpler than N tables; (2) push as much complexity as possible to the database: the wrappers around the month-sharding logic in Rails are slow and buggy; (3) increase performance: also related to one table query being simpler than N. …
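
The N-query problem described in the excerpt can be sketched in a few lines. This is only an illustration under stated assumptions: the table names (`actions_2012_01` and so on, plus the merged `actions`), the columns and the sample rows are invented, SQLite stands in for MySQL, and the plain INSERT ... SELECT copy is not claimed to be how the original post moved its half a billion rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
months = ["2012_01", "2012_02", "2012_03"]  # hypothetical shard suffixes

# One table per month, as in the sharded legacy layout.
for m in months:
    conn.execute(f"CREATE TABLE actions_{m} (user_id INTEGER, action TEXT)")
conn.executemany("INSERT INTO actions_2012_01 VALUES (?, ?)", [(1, "join"), (2, "join")])
conn.executemany("INSERT INTO actions_2012_02 VALUES (?, ?)", [(1, "recruit")])
conn.executemany("INSERT INTO actions_2012_03 VALUES (?, ?)", [(1, "join"), (2, "recruit")])


def count_sharded(conn, user_id):
    """Count a user's actions the old way: one query per monthly table."""
    total = 0
    for m in months:
        (n,) = conn.execute(
            f"SELECT COUNT(*) FROM actions_{m} WHERE user_id = ?", (user_id,)
        ).fetchone()
        total += n
    return total


# Merge every shard into a single table, keeping the month as a column.
conn.execute("CREATE TABLE actions (user_id INTEGER, action TEXT, month TEXT)")
for m in months:
    conn.execute(f"INSERT INTO actions SELECT user_id, action, ? FROM actions_{m}", (m,))
conn.execute("CREATE INDEX idx_actions_user ON actions (user_id)")


def count_single(conn, user_id):
    """Count a user's actions with a single indexed query."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM actions WHERE user_id = ?", (user_id,)
    ).fetchone()
    return n


print(count_sharded(conn, 1), count_single(conn, 1))
```

Both functions return the same totals, but the sharded version issues one query per month, so its cost grows with the age of the system, while the single-table version stays at one query regardless of how many months of data exist.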
Big data is better data

And here is another TED talk. This time we are listening to the wonderful Mr. Kenneth Cukier … Self-driving cars were just the start. What's the future of big data-driven technology and design? In a thrilling science talk, Kenneth Cukier looks at what's next for machine learning and human knowledge. Watch...