Skip to content

Working with .csv files via pandas in Python

Read:

import pandas as pd

contents = pd.DataFrame(pd.read_csv('products.csv', delimiter=',',))
print(contents)

Read and name first row:

contents=pd.read_csv('products.csv', delimiter=',', names = ['product_id', 'product_title', 'product_description', 'price'])
print(contents)

Print first 5 rows:

contents=pd.read_csv('products.csv', delimiter=',', names = ['product_id', 'product_title', 'product_description', 'price']) 
print(contents.head())

Print last 5 rows:

contents=pd.read_csv('products.csv', delimiter=',', names = ['product_id', 'product_title', 'product_description', 'price']) 
print(contents.tail())

Print random 5 rows:

contents=pd.read_csv('products.csv', delimiter=',', names = ['product_id', 'product_title', 'product_description', 'price']) 
print(contents.sample())

Print specific columns:

contents=pd.read_csv('products.csv', delimiter=',', names = ['product_id', 'product_title', 'product_description', 'price']) 
print(contents[['product_title','price']])

Filter specific values:

print(contents[contents.product_title == 'Products ABC'])

Counts of rows:

print(contents.count())

Sum of data:

print(contents.price.sum())

Min of data:

print(contents.price.min())

Max of data:

print(contents.price.max())

Mean of data:

print(contents.price.mean())

Median of data:

print(contents.price.median())

Grouping of data(ex:mean of group):

print(contents.groupby('price').mean())

Merge of two dataset

print(contents.merge(price))

Sorting:

contents.sort_values('product_title')

Reset index after sorting:

#with deleting former indexing
contents.sort_values(by = ['product_title'], ascending = False).reset_index(drop = True) 
#without deleting former indexing
contents.sort_values(by = ['product_title'], ascending = False).reset_index() 

Discribe data:

  •     For numeric columns, describe() returns basic statistics: the value count, mean, standard deviation, minimum, maximum, and 25th, 50th, and 75th quantiles for the data in a column.
  •     For string columns, describe() returns the value count, the number of unique entries, the most frequently occurring value (‘top’), and the number of times the top value occurs (‘freq’)
contents['price'].describe()

Delete column: 

contents = contents.drop(columns='price')contents = contents.drop('price', axis=1) #axis=1 means columns
contents = contents.drop([0], axis=1) # delete according to column index
Delete Rows 

Delete rows:

contents = contents.drop([0], axis=0) # axis=0 means rows, delete according to row index
Exporting data:

Exporting Data:

# Output data to a CSV file# Typically, I don't want row numbers in my output file, hence index=False.# To avoid character issues, I typically use utf8 encoding for input/output.
data.to_csv("output_filename.csv", index=False, encoding='utf8')
# Output data to an Excel file.# For the excel output to work, you may need to install the "xlsxwriter" package.
data.to_csv("output_excel_file.xlsx", sheet_name="Sheet 1", index=False)
Published inData SciencePython

Be First to Comment

Leave a Reply

%d bloggers like this: