Learning the Ropes of SQL: Moving onto Aggregate Functions & Multiple Tables in SQL

    This week I finished up the beginner's tutorial on SQL from Codecademy. The final topics I covered this week were aggregate functions and multiple tables. I started with aggregate functions, which are calculations using SQL that are used to analyze large amounts of data in order to get a summary that can identify trends or patterns. I worked with the most common aggregate functions which take multiple input values and return a single output value. Some of the common functions I used were: AVG(), MAX(), MIN(), COUNT(), ROUND(), and SUM().  


    These functions can also be used to group data by a specific characteristic, for instance, the data set I was working with contained prices, downloads, and categories for fake apps, I was able to group the apps by each of those characteristics by using one query each. Previous to learning this tool, in order to achieve this same result I would have had to use the WHERE clause with several queries.

    One interesting thing I found what when trying to group data by multiple characteristics, you can use the order of the columns as a reference number. Meaning, instead of writing out the name of the second or eighth column, you can simply write GROUP BY 2, 8. This will group data first by the second column, then group them again (within the first group) by the eighth column. This clause can be further fine tuned by filtering the groups. It was in this realm that I was also able to learn how to filter groups by using the HAVING clause.


    I was able to use aggregate functions on a data set with about 1000 rows, and find information in a very quick and time efficient manner. Using this tool you are able to examine each single record to extract the information needed, a process that would be difficult to do by hand, not to mention lengthily. I can see how this could be used in the science field. For example, in Biology the AVG function could be used to find the average amount a certain gene is expressed. Further, the COUNT function could be used to see a genes response to a particular condition by counting the number of cells that have increased or decreased their response. 

    Next topic I learned about and was able to work with was the idea of having a large datasets that contain multiple tables. The dataset I worked with had about three tables that contained customer information, purchases, and subscriptions. One of my first tasks was to join two of the tables, purchases and subscriptions. While joining tables it is also possible to further specify which data you are referring to. Using such a tool simplifies the information you are returning.

🠗🠗🠗
To join tables you use JOIN keyword and then set the ON conditions (the common field between two or more tables. 

    There are a multiple ways to join tables that I was able to perform using a primary key from one table, and foreign key from another. INNER JOIN, which join the rows from the tables ONLY where matching common fields exist. LEFT JOIN, wherein the table on the left side of the equals operator in the ON statement returns ALL of the rows specified, but only the matching rows on the table noted on the right side of the operator return (all non-matching rows on right table return a value of NULL) . RIGHT JOIN, which is fundamentally the opposite of the LEFT JOIN process. FULL OUTER JOIN whereby ALL rows from the tables are returned, BUT only matching rows will contain data and non-matching rows return NULL. UNION, which combines tables with matching number of columns and data types into one table. CROSS JOIN, simply combines all the rows of one table, with all the rows of another into as many possible combinations as possible.

    In the science arena, an example of joining tables could be used when a dataset contains two tables. One table can have data on patients and their health, with fields for types of disease they have and severity. The other table could have data on genetic mutations, with fields covering types of mutation and their name. A scientist can then join these two tables to find complex relations between, say, the severity of disease and the certain gene mutation.

    This concludes my brief, but educational journey into learning about the SQL programming language. Overall I learned how SQL can be a strong tool for not only analyzing, but managing scientific data because it allows scientists the capability to extract patterns and trends from their data, and then make further project choices based on those results.

Comments

Popular posts from this blog

Building a Site Pt. 3: Team feedback and tweaks

Building a Site, Part 4: Adjusting the site

Creating a Webpage with Google Sites