I had so much fun working on this data science course!
One aspect of the fun I had was learning interesting information about the data I used. I share my learnings here and look forward to hearing about yours.
#julialang #datascience https://twitter.com/JuliaLanguage/status/1265278348005122049 …
The next time you visit Yellowstone National Park to see the Old Faithful geyser, know this: if you wait a long time for the geyser to go off, you are likely to witness a longer eruption. pic.twitter.com/gQ2dn3QEBt
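One simple way to put a number on "longer wait, longer eruption" is a least-squares line plus a correlation coefficient. This is a minimal sketch, not the course's code, and the (waiting, eruption) pairs are made-up toy values rather than the actual Old Faithful data.

```python
# Minimal sketch: fit eruption duration against waiting time by ordinary least
# squares and report the Pearson correlation.
# ASSUMPTION: the (waiting, eruption) pairs are made-up toy values, not the
# Old Faithful data used in the course.
from statistics import mean


def ols_and_r(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / (sxx * syy) ** 0.5


waiting = [54, 55, 62, 74, 79, 85, 88]          # minutes since the last eruption (toy)
eruption = [1.8, 2.0, 2.3, 3.9, 4.1, 4.5, 4.7]  # eruption length in minutes (toy)

slope, intercept, r = ols_and_r(waiting, eruption)
print(f"each extra minute of waiting adds ~{slope:.3f} min of eruption (r = {r:.2f})")
```

A positive slope together with an r close to 1 is what the claim in the tweet looks like numerically.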
We used a dataset of car models with features such as horsepower and number of cylinders (plus five more). After performing dimensionality reduction on this data, we found that European and Japanese cars cluster together, whereas American cars form two clusters of their own. But why? I'd love to find out. pic.twitter.com/pZBMWcMktH
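The thread doesn't say which dimensionality-reduction method was used, so here is a sketch using PCA as a stand-in: standardize the features, then project each car onto the top two principal components, found by power iteration with deflation. The toy rows (horsepower, cylinders, weight) are made up for illustration.

```python
# Minimal PCA sketch in pure Python: project standardized feature rows onto their
# top two principal components (power iteration plus deflation).
# ASSUMPTIONS: PCA stands in for whatever reduction the course used, and the toy
# rows (horsepower, cylinders, weight) are made up for illustration.
import random
from statistics import mean, pstdev


def standardize(rows):
    """Give every column zero mean and unit (population) standard deviation."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def covariance(rows):
    """Covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def matvec(mat, v):
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def top_eigenpair(mat, iters=1000):
    """Largest eigenvalue and its unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(0)
    v = [rng.random() + 0.1 for _ in mat]
    for _ in range(iters):
        w = matvec(mat, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, matvec(mat, v))), v


def pca_2d(rows):
    z = standardize(rows)
    cov = covariance(z)
    lam1, v1 = top_eigenpair(cov)
    # Deflation: subtract the first component so the next power iteration finds the second.
    d = len(v1)
    cov2 = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    _, v2 = top_eigenpair(cov2)
    return [(sum(a * b for a, b in zip(r, v1)), sum(a * b for a, b in zip(r, v2))) for r in z]


cars = [  # made-up (horsepower, cylinders, weight) rows
    [130, 8, 3504], [165, 8, 3693], [150, 8, 3436],
    [95, 4, 2372], [88, 4, 2130], [97, 4, 2226], [110, 6, 2962],
]
for x, y in pca_2d(cars):
    print(f"{x:+.2f} {y:+.2f}")
```

Plotting these 2D points and coloring them by origin is how cluster patterns like the one in the tweet become visible.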
We clustered houses in CA based on their geographic location. If anything, these clusters showed that housing prices aren't directly tied to neighborhood: there is a pattern in the prices themselves, but it seems to be determined mainly by proximity to the water. pic.twitter.com/qdcB5Z5Fy7
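Clustering by location can be sketched with k-means (Lloyd's algorithm): alternate between assigning each house to its nearest center and moving each center to the mean of its houses. This is an illustrative sketch, not the course's code; the coordinates are made up, and using raw degrees with Euclidean distance is a simplifying assumption.

```python
# Minimal k-means (Lloyd's algorithm) sketch on (latitude, longitude) pairs.
# ASSUMPTIONS: raw degrees with Euclidean distance (a rough stand-in for true
# geographic distance), farthest-first initialization, and made-up coordinates
# loosely placed around Los Angeles and the San Francisco Bay Area.
from math import dist


def farthest_first(points, k):
    """Deterministic init: start at points[0], then keep adding the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    centers = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center where it was
        if new_centers == centers:  # nothing moved, so we have converged
            break
        centers = new_centers
    return labels, centers


homes = [
    (34.05, -118.24), (34.10, -118.30), (33.98, -118.20),  # near Los Angeles (toy)
    (37.77, -122.42), (37.80, -122.27), (37.70, -122.45),  # near San Francisco (toy)
]
labels, centers = kmeans(homes, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Comparing the price distribution inside each resulting cluster is one way to check whether location alone explains prices.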
We ran several classification methods (Lasso, Ridge, Elastic net, Decision tree, Random forest, Nearest Neighbors, and Support Vector Machines) on the famous Iris dataset and built a scoreboard of these methods. I'd love to see what this scoreboard would look like on other data.
We used data from @zillow and built a regression model to see which states have the highest ratio of houses sold to houses listed. It turns out North Carolina seems to be the winner here (this is data from February 2020). pic.twitter.com/Euq0qnp4dX
We worked with a dataset of airports and flights within the United States. Spoiler alert: Atlanta has the most flights to and from it (duh!), and its PageRank value is one of the highest. pic.twitter.com/u353BWw5Ma
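PageRank treats airports as nodes and routes as directed edges: each airport passes its score along its outgoing routes, with a small "random jump" share spread over every node. Below is a hand-rolled power-iteration sketch, not the course's code; the route network is a made-up toy graph with ATL as the hub, and damping=0.85 is the conventional default, not a known course setting.

```python
# Minimal PageRank sketch (power iteration with damping) on a directed route graph.
# ASSUMPTIONS: the routes below are a made-up toy network, not the course's
# dataset, and damping=0.85 is the conventional default rather than a known
# course setting.


def pagerank(edges, damping=0.85, tol=1e-12, max_iter=1000):
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Every node gets the "random jump" share first.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v] or nodes  # dangling node: spread rank over everyone
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Toy routes: each pair is flown in both directions, with ATL as the hub.
pairs = [("ATL", "ORD"), ("ATL", "DFW"), ("ATL", "DEN"), ("ATL", "LAX"), ("ORD", "DEN")]
routes = pairs + [(b, a) for a, b in pairs]

ranks = pagerank(routes)
for airport, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{airport}: {score:.3f}")
```

In a toy graph like this, the hub with the most connections also ends up with the highest score, which mirrors the Atlanta observation.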
We got data from Google Finance (by the way, such data is really easy to get: check out the `GOOGLEFINANCE` function in Google Sheets) and solved a portfolio optimization problem. Of the three companies we picked (FB, MSFT, Apple), most of the investment went to Apple.
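The thread doesn't spell out which portfolio formulation was used, so here is one classic option as a sketch: the global minimum-variance portfolio, whose weights are w = inv(S) 1 / (1' inv(S) 1) for a return covariance matrix S. The covariance numbers are made up, not estimated from Google Finance data, and the output says nothing about the actual FB/MSFT/Apple result.

```python
# Minimal sketch of one classic portfolio-optimization formulation: the
# global minimum-variance portfolio, w = inv(S) 1 / (1' inv(S) 1).
# ASSUMPTIONS: this formulation may differ from the course's (which may have
# targeted a return or forbidden short selling); the covariance matrix below is
# made up, not estimated from Google Finance data.


def solve(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def min_variance_weights(cov):
    """Fully invested weights (summing to 1) that minimize portfolio variance."""
    z = solve(cov, [1.0] * len(cov))  # z = inv(cov) @ 1
    total = sum(z)
    return [zi / total for zi in z]


tickers = ["FB", "MSFT", "AAPL"]
cov = [  # made-up covariance of returns
    [0.090, 0.030, 0.035],
    [0.030, 0.060, 0.030],
    [0.035, 0.030, 0.070],
]
weights = min_variance_weights(cov)
for t, w in zip(tickers, weights):
    print(f"{t}: {w:.1%}")
```

This closed form allows negative (short) weights; a long-only version, or one that also targets a return, needs a quadratic-programming solver instead.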
We played around more with the @zillow data on housing prices and listings. Here is a bar plot (I'm not a big fan of bar plots, but if you must use one, go with Edward Tufte's style). Not surprisingly, California had the highest number of house listings in February 2020. pic.twitter.com/OQQBI8ug1o
Another thing I learned from @EdwardTufte is the idea of symmetry. A violin plot is symmetric, so you don't need both halves of it; instead, I use the two halves for data from two years that are 10 years apart. Interestingly, the price distribution looks very similar, except that the median has shifted upward. pic.twitter.com/Kj50p7cn2m
California: pic.twitter.com/OwLzd5OHli
Looks beautiful. But what is it? Interesting.