The biggest Python topics of 2023

Pandas and Polars

This topic covers comparing and using Pandas and Polars, two popular DataFrame libraries for Python: performance differences, memory usage, third-party integrations, and the move from Pandas to Polars. It also covers the adoption of Apache Arrow in Pandas 2.0 and the benefits of using PyArrow with Pandas for efficient data analysis.


pygwalker: Turn Pandas Into a Tableau-Style UI Project Started in 2023

PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

https://github.com/Kanaries/pygwalker

Finally—Pandas Practice That Isn’t Boring Article

You won’t get fluent with Pandas doing boring, irrelevant, toy exercises. Bamboo Weekly poses questions about current events, using real-world data sets, and offers clear, comprehensive solutions in Jupyter notebooks. Challenge yourself, and level up your Pandas skills every Wednesday.

https://www.bambooweekly.com/

Using Polars in a Pandas World Article

Pandas has far more third-party integrations than Polars. Learn how to use those libraries with Polars dataframes.

https://pythonspeed.com/articles/polars-pandas-interopability/

Comparing to None in Python and Pandas Article

Missing data is a frequent source of headaches and bugs. This post discusses three guidelines that make handling it less error-prone.

https://sourcery.ai/blog/python-pandas-compare-to-none/
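
As a rough illustration of why missing values trip people up, here is a minimal sketch using only pandas and NumPy. It is not taken from the article and does not claim to reproduce its three guidelines. The `describe` helper and the sample values are made up for the example.

```python
import numpy as np
import pandas as pd


def describe(value):
    """Hypothetical helper: report whether a plain Python value is missing."""
    # In plain Python, test identity with `is None` rather than `== None`.
    if value is None:
        return "missing"
    return f"value={value}"


# A float Series with one missing entry; pandas stores the None as NaN.
s = pd.Series([1.0, None, 3.0])

# Element-wise equality never matches a missing value, because NaN != NaN.
eq_mask = s == np.nan

# isna() is the pandas method for detecting missing values.
na_mask = s.isna()

print(describe(None))
print(eq_mask.tolist())
print(na_mask.tolist())
```

The equality mask is all `False`, while `isna()` flags the middle entry.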

Why Polars Uses Less Memory Than Pandas Article

Polars is an alternative to Pandas that can often run faster and use less memory! This article shows you how to go from Pandas to Polars.

https://pythonspeed.com/articles/polars-memory-pandas/

Python Polars: A Lightning-Fast DataFrame Library Article

Welcome to the world of Polars, a powerful DataFrame library for Python! In this showcase tutorial, you’ll get a hands-on introduction to Polars’ core features and see why this library is catching so much buzz.

https://realpython.com/polars-python/

Why Are There So Many Python Dataframes? Article

Ever wonder why there are so many libraries that have Dataframes in Python? This article discusses the different perspectives of the popular toolkits and why they are what they are.

https://ponder.io/why-are-there-so-many-python-dataframes/

Pandas 2.0 and the Arrow Revolution (Part I) Article

This article details the changes in the Pandas 2.0 release, with emphasis on the underlying adoption of Apache Arrow.

https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i

Exploring Pandas 2.0 & Targets for Apache Arrow Article

What are the new ways to describe your data in pandas 2.0? Will the addition of Apache Arrow to the data back end foster the growth of data interoperability? This week on the show, we talk with pandas core developer Marc Garcia about the release of pandas 2.0.

https://realpython.com/podcasts/rpp/167/

Pandas 2.1.0 Released Article

https://pandas.pydata.org/docs/whatsnew/v2.1.0.html

Pandas Illustrated: The Definitive Visual Guide to Pandas Article

“Pandas is an industry standard for analyzing data in Python. With a few keystrokes, you can load, filter, restructure, and visualize gigabytes of heterogeneous information.” Learn all about Pandas with key illustrations to help understand the core concepts.

https://betterprogramming.pub/pandas-illustrated-the-definitive-visual-guide-to-pandas-c31fa921a43

Boosting Efficiency in Pandas With Indexing Article

Pandas is the most widely used Python library for data manipulation, and it allows you to access and manipulate data efficiently. Its indexing techniques can significantly improve the speed and efficiency of your queries. Learn how.

https://stackabuse.com/the-power-of-indexing-boosting-data-wrangling-efficiency-with-pandas/
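
A minimal sketch of label-based indexing, assuming a small made-up table of tickers and prices (none of these names come from the article). It contrasts a boolean-mask lookup with a lookup through an index. Whether the indexed version is actually faster depends on data size and access pattern.

```python
import pandas as pd

# Made-up identifiers and prices, for illustration only.
df = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB", "CCC", "DDD"],
        "price": [10.0, 20.0, 30.0, 40.0],
    }
)

# Boolean mask: compares every value in the column on each lookup.
by_mask = df[df["ticker"] == "CCC"]["price"].iloc[0]

# Label lookup: make the column the index once, then select with .loc.
indexed = df.set_index("ticker")
by_label = indexed.loc["CCC", "price"]

# Label slices on the index include both endpoints.
sliced = indexed.sort_index().loc["BBB":"CCC"]

print(by_mask, by_label)
print(sliced)
```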

Replacing Pandas With Polars. A Practical Guide Article

Polars is becoming a popular alternative to Pandas. This article compares the two and shows you a path to Polars.

https://www.confessionsofadataguy.com/replacing-pandas-with-polars-a-practical-guide/

Create a Beautiful Polar Histogram With Python and Matplotlib Article

“Polar histograms are great when you have too many values for a standard bar chart. The circular shape where each bar gets thinner towards the middle allows us to cram more information into the same area.” Learn how to create one using Python and Matplotlib.

https://dev.to/oscarleo/how-to-create-a-beautiful-polar-histogram-with-python-and-matplotlib-400l

The Python Dataframe Interchange Protocol Article

The Python Dataframe Interchange Protocol is a mechanism for converting Dataframes between the libraries that implement it. It supports Vaex, cuDF, Modin, pandas, Polars, and more.

https://ponder.io/how-the-python-dataframe-interchange-protocol-makes-life-better/

Python for Finance: Pandas Resample, Groupby, and Rolling Article

When working with time series data such as financial information, the resample, grouping, and rolling features of Pandas can make your life easier. Read on to learn how.

https://ponder.io/python-for-finance-pandas-resample-groupby-and-rolling/
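
The three features named above can be sketched on a tiny made-up price table. The tickers, dates, and prices are invented for illustration and are not from the article.

```python
import pandas as pd

# Made-up daily closing prices for two hypothetical tickers.
dates = pd.date_range("2023-01-02", periods=6, freq="D")
prices = pd.DataFrame(
    {
        "date": list(dates) * 2,
        "ticker": ["AAA"] * 6 + ["BBB"] * 6,
        "close": [10, 11, 12, 13, 14, 15, 20, 22, 24, 26, 28, 30],
    }
)

# groupby: one aggregate value per ticker.
mean_close = prices.groupby("ticker")["close"].mean()

# resample: downsample one ticker from daily rows to 3-day buckets,
# keeping the last close in each bucket.
aaa = prices[prices["ticker"] == "AAA"].set_index("date")["close"]
three_day_last = aaa.resample("3D").last()

# rolling: 3-row moving average; the first two rows lack enough history.
rolling_mean = aaa.rolling(window=3).mean()

print(mean_close)
print(three_day_last)
print(rolling_mean)
```

Note that `resample` needs a datetime-like index, which is why the sketch calls `set_index("date")` first.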

tidypolars: tidyverse (R) Clone in Polars Article

https://tidypolars.readthedocs.io/en/latest/

Pandas 2.0 vs Pandas 1: Performance Comparison Article

Pandas 2.0 was recently released with the new pyarrow backend. In the article, we did a quick performance comparison between the new pyarrow backend in 2.0 and the standard backend in Pandas 1. The results were as expected: a big speedup in string processing and null value handling, but slower numeric processing and aggregations.

https://medium.com/@santiagobasulto/pandas-2-0-performance-comparison-3f56b4719f58

How to Iterate Over Rows in Pandas, and Why You Shouldn’t Article

In this tutorial, you’ll learn how to iterate over a pandas DataFrame’s rows, but you’ll also understand why looping is against the way of the panda. You’ll understand vectorization, see how to choose vectorized methods, and compare the performance of iteration against pandas.

https://realpython.com/pandas-iterate-over-rows/
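
To make the contrast concrete, here is a small sketch computing the same column twice: once with a row loop and once with a vectorized expression. The column names and values are made up, and the tutorial's own examples may differ.

```python
import pandas as pd

# Made-up order lines, for illustration only.
df = pd.DataFrame({"price": [2.0, 3.5, 4.0], "quantity": [3, 2, 5]})

# Row by row: works, but runs a Python-level loop over every row.
totals_loop = []
for row in df.itertuples(index=False):
    totals_loop.append(row.price * row.quantity)

# Vectorized: one expression over whole columns.
df["total"] = df["price"] * df["quantity"]

print(totals_loop)
print(df["total"].tolist())
```

Both approaches give the same numbers. On large frames, the vectorized form is usually much faster.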

The Fastest Way to Read a CSV File in Pandas 2.0 Article

The fastest way to read a CSV file into a Pandas DataFrame isn’t pd.read_csv(). This article shows you the alternative and how the result was benchmarked.

https://itnext.io/the-fastest-way-to-read-a-csv-file-in-pandas-2-0-532c1f978201

Python Parquet and Arrow: Using PyArrow With Pandas Article

Parquet and Arrow are two Apache projects available in Python via the PyArrow library. Parquet is a column-oriented storage format for arrays and tables of data, while Arrow is an in-memory columnar format for data analysis. This article describes how to use them and how they compare to Pandas DataFrames.

https://codesolid.com/python-pyarrow-and-parquet/

Analyzing Labor Markets in Python With LODES Data Article

This article shows step-by-step instructions on how to use pandas and pygris to analyze geographical data. The example uses the LODES (LEHD Origin-Destination Employment Statistics) data set, a synthetic data set with US Census block and job workplace data, to map the commute flow to Apple headquarters in Cupertino, California.

https://walker-data.com/posts/lodes-commutes/

Exploratory Spatial Data Analysis With Python Article

Kyle Walker is the author of “Analyzing US Census Data: Methods, Maps, and Models, in R”. In this article he translates some of the book’s examples into Python.

https://walker-data.com/posts/esda-with-python/

polars-cookbook: Recipes for Using Python’s Polars Library Project Started in 2023

Recipes for using Python's polars library

https://github.com/escobar-west/polars-cookbook

Polars-business: Polars Business Date Arithmetic Project Started in 2023

Polars plugin offering eXtra stuff for DateTimes

https://github.com/MarcoGorelli/polars-business