10 Pandas One-Liners Every Data Analyst Should Know
Pandas can either feel like magic or like fighting NumPy with both hands tied. Ten one-liners cover the vast majority of day-to-day analyst work — memorise these and your notebooks shrink by half.
The ten
- First rows per group: df.groupby('city').head(3) (call sort_values first if you want the top rows by value rather than the first rows in file order)
- Conditional column: df['tier'] = np.where(df['amount'] > 1000, 'high', 'low')
- Cross-tab: pd.crosstab(df['city'], df['product'], values=df['revenue'], aggfunc='sum') (use brackets here: df.product resolves to the DataFrame.product method, not the column)
- Rolling average: df['ma7'] = df['sales'].rolling(7).mean()
- Percent change: df['pct'] = df['sales'].pct_change()
- Pivot: df.pivot_table(index='date', columns='city', values='sales', aggfunc='sum')
- Drop duplicates by column: df.drop_duplicates(subset=['email'], keep='last')
- Filter by multiple values: df[df['city'].isin(['Chennai', 'Pune'])]
- String contains: df[df['title'].str.contains('Engineer', case=False, na=False)]
- Apply with axis: df.apply(lambda r: r['a'] + r['b'], axis=1) (for simple arithmetic like this, the vectorised df['a'] + df['b'] is faster; keep apply for row logic that cannot be vectorised)
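To see the ten in action, here is a minimal sketch that runs each one against a tiny made-up DataFrame. The data is hypothetical, and the column names simply mirror the ones used in the list. Two liberties are taken so the output is readable on six rows: the rolling window is 2 instead of 7, and the apply example adds sales and amount rather than columns a and b.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values are invented for illustration only.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02",
                            "2024-01-02", "2024-01-03", "2024-01-03"]),
    "city": ["Chennai", "Pune", "Chennai", "Pune", "Chennai", "Delhi"],
    "product": ["A", "B", "A", "A", "B", "B"],
    "sales": [100, 200, 150, 100, 300, 50],
    "amount": [500, 1500, 900, 1200, 2000, 100],
    "revenue": [1000, 2000, 1500, 1000, 3000, 500],
    "email": ["a@x.com", "b@x.com", "a@x.com", "c@x.com", "b@x.com", "d@x.com"],
    "title": ["Data Engineer", "Analyst", None,
              "software engineer", "Manager", "ENGINEER II"],
})

# 1. First rows per group (original row order is preserved).
first_two = df.groupby("city").head(2)
# Variant: sort first to get the top row by value in each group.
top_one = df.sort_values("sales", ascending=False).groupby("city").head(1)

# 2. Conditional column.
df["tier"] = np.where(df["amount"] > 1000, "high", "low")

# 3. Cross-tab. Bracket access matters: df.product is a DataFrame method.
xtab = pd.crosstab(df["city"], df["product"], values=df["revenue"], aggfunc="sum")

# 4. Rolling average (window of 2 here only because the sample is tiny).
df["ma2"] = df["sales"].rolling(2).mean()

# 5. Percent change from the previous row.
df["pct"] = df["sales"].pct_change()

# 6. Pivot: one row per date, one column per city.
pivot = df.pivot_table(index="date", columns="city", values="sales", aggfunc="sum")

# 7. Drop duplicates by column, keeping the last occurrence of each email.
deduped = df.drop_duplicates(subset=["email"], keep="last")

# 8. Filter by multiple values.
two_cities = df[df["city"].isin(["Chennai", "Pune"])]

# 9. Case-insensitive string match; na=False treats missing titles as no match.
engineers = df[df["title"].str.contains("Engineer", case=False, na=False)]

# 10. Row-wise apply (the vectorised df["sales"] + df["amount"] gives the same result).
row_sum = df.apply(lambda r: r["sales"] + r["amount"], axis=1)

print(xtab)
print(pivot)
```

Missing combinations, such as Delhi with product A, show up as NaN in both the cross-tab and the pivot; pass fill_value=0 to pivot_table if zeros suit your report better.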
The bonus skill
Learn .assign() for chainable column additions and you stop reassigning to df at every step. Method chaining makes notebooks read like a recipe.
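A chained version might look like the sketch below. The data and column names are hypothetical; the point is the shape. Each step returns a new DataFrame, the lambdas inside .assign() receive the frame as it exists at that point in the chain, and the original is never modified.

```python
import numpy as np
import pandas as pd

# Hypothetical input; values are invented for illustration only.
raw = pd.DataFrame({
    "city": ["Chennai", "Pune", "Chennai", "Delhi"],
    "sales": [100, 200, 300, 50],
    "amount": [500, 1500, 2000, 100],
})

result = (
    raw
    # Filter first, so later steps only see the rows we care about.
    .loc[lambda d: d["city"].isin(["Chennai", "Pune"])]
    # Add columns without reassigning to an intermediate variable.
    .assign(
        tier=lambda d: np.where(d["amount"] > 1000, "high", "low"),
        share=lambda d: d["sales"] / d["sales"].sum(),
    )
    .sort_values("sales", ascending=False)
    .reset_index(drop=True)
)

print(result)
```

Because share is computed after the filter, it is each row's fraction of the filtered total, not of the full dataset. Reordering steps in a chain changes results like this, which is part of why chains are easy to review.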
When to leave pandas
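Above roughly ten million rows, or whenever the data no longer fits comfortably in memory, pandas becomes painful. The exact threshold depends on column count and dtypes. Reach for DuckDB (SQL on Parquet, brilliant), Polars (a DataFrame library with its own expression syntax, often much faster), or just push the work into the warehouse. Knowing when to step off pandas is itself a senior signal.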
Practice tip
Take a Kaggle dataset and redo any familiar tutorial entirely with method chains. The discipline forces these one-liners into muscle memory.