Programing

파이썬은 CSV를 목록으로 가져옵니다.

crosscheck 2020. 6. 3. 21:18
반응형

파이썬은 CSV를 목록으로 가져옵니다.


약 2000 개의 레코드가있는 CSV 파일이 있습니다.

각 레코드에는 문자열과 범주가 있습니다.

This is the first line, Line1
This is the second line, Line2
This is the third line, Line3

이 파일을 다음과 같은 목록으로 읽어야합니다.

List = [('This is the first line', 'Line1'),
        ('This is the second line', 'Line2'),
        ('This is the third line', 'Line3')]

이것을 csv파이썬을 사용하여 필요한 목록으로 어떻게 가져올 수 있습니까?


csv모듈을 사용하십시오 (Python 2.x).

import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print your_list
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]

튜플이 필요한 경우 :

import csv
with open('test.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = map(tuple, reader)

print your_list
# [('This is the first line', ' Line1'),
#  ('This is the second line', ' Line2'),
#  ('This is the third line', ' Line3')]

Python 3.x 버전 (아래 @seokhoonlee)

import csv

with open('file.csv', 'r') as f:
  reader = csv.reader(f)
  your_list = list(reader)

print(your_list)
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]

Python3 업데이트 :

import csv

with open('file.csv', 'r') as f:
  reader = csv.reader(f)
  your_list = list(reader)

print(your_list)
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]

팬더 는 데이터 처리에 능숙합니다. 사용 방법의 예는 다음과 같습니다.

import pandas as pd

# Read the CSV into a pandas data frame (df)
#   With a df you can do many things
#   most important: visualize data with Seaborn
df = pd.read_csv('filename.csv', delimiter=',')

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts
dicts = df.to_dict().values()

한 가지 큰 장점은 팬더가 헤더 행을 자동으로 처리한다는 것입니다.

Seaborn에 대해 들어 본 적이 없다면 자세히 살펴 보는 것이 좋습니다.

파이썬으로 CSV 파일을 읽고 쓰는 방법 도 참조하십시오 .

팬더 # 2

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
dicts = df.to_dict('records')

df의 내용은 다음과 같습니다.

     country   population population_time    EUR
0    Germany   82521653.0      2016-12-01   True
1     France   66991000.0      2017-01-01   True
2  Indonesia  255461700.0      2017-01-01  False
3    Ireland    4761865.0             NaT   True
4      Spain   46549045.0      2017-06-01   True
5    Vatican          NaN             NaT   True

dicts의 내용은

[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
 {'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
 {'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
 {'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
 {'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
 {'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]

팬더 # 3

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
tuples = [[row[col] for col in df.columns] for row in df.to_dict('records')]

The content of tuples is:

[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
 ['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
 ['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
 ['Ireland', 4761865.0, NaT, True],
 ['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
 ['Vatican', nan, NaT, True]]

Update for Python3:

import csv
from pprint import pprint

with open('text.csv', newline='') as file:
reader = csv.reader(file)
l = list(map(tuple, reader))
pprint(l)
[('This is the first line', ' Line1'),
('This is the second line', ' Line2'),
('This is the third line', ' Line3')]

If csvfile is a file object, it should be opened with newline=''.
csv module


If you are sure there are no commas in your input, other than to separate the category, you can read the file line by line and split on ,, then push the result to List

That said, it looks like you are looking at a CSV file, so you might consider using the modules for it


result = []
for line in text.splitlines():
    result.append(tuple(line.split(",")))

As said already in the comments you can use the csv library in python. csv means comma separated values which seems exactly your case: a label and a value separated by a comma.

Being a category and value type I would rather use a dictionary type instead of a list of tuples.

Anyway in the code below I show both ways: d is the dictionary and l is the list of tuples.

import csv

file_name = "test.txt"
try:
    csvfile = open(file_name, 'rt')
except:
    print("File not found")
csvReader = csv.reader(csvfile, delimiter=",")
d = dict()
l =  list()
for row in csvReader:
    d[row[1]] = row[0]
    l.append((row[0], row[1]))
print(d)
print(l)

A simple loop would suffice:

lines = []
with open('test.txt', 'r') as f:
    for line in f.readlines():
        l,name = line.strip().split(',')
        lines.append((l,name))

print lines

Extending your requirements a bit and assuming you do not care about the order of lines and want to get them grouped under categories, the following solution may work for you:

>>> fname = "lines.txt"
>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> with open(fname) as f:
...     for line in f:
...         text, cat = line.rstrip("\n").split(",", 1)
...         dct[cat].append(text)
...
>>> dct
defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})

This way you get all relevant lines available in the dictionary under key being the category.


Here is the easiest way in Python 3.x to import a CSV to a multidimensional array, and its only 4 lines of code without importing anything!

#pull a CSV into a multidimensional array in 4 lines!

L=[]                            #Create an empty list for the main array
for line in open('log.txt'):    #Open the file and read all the lines
    x=line.rstrip()             #Strip the \n from each line
    L.append(x.split(','))      #Split each line into a list and add it to the
                                #Multidimensional array
print(L)

Next is a piece of code which uses csv module but extracts file.csv contents to a list of dicts using the first line which is a header of csv table

import csv
def csv2dicts(filename):
  with open(filename, 'rb') as f:
    reader = csv.reader(f)
    lines = list(reader)
    if len(lines) < 2: return None
    names = lines[0]
    if len(names) < 1: return None
    dicts = []
    for values in lines[1:]:
      if len(values) != len(names): return None
      d = {}
      for i,_ in enumerate(names):
        d[names[i]] = values[i]
      dicts.append(d)
    return dicts
  return None

if __name__ == '__main__':
  your_list = csv2dicts('file.csv')
  print your_list

참고URL : https://stackoverflow.com/questions/24662571/python-import-csv-to-list

반응형