[Big Data Analysis Engineer] Practical Exam, Part 1 Practice Problems

ro-jun 2025. 5. 27. 20:06

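These are AI-generated practice problems, together with my own worked solutions, for Part 1 of the Big Data Analysis Engineer (빅데이터분석기사) practical exam.
They focus on data preprocessing and analysis with pandas, and I practiced with problem types that could appear on the actual exam.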

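Practice Problem 1

✅ Problem Overview

1. In the Gender column, standardize 'M' or 'Male' to 'Male' and 'F' or 'Female' to 'Female'. (Leave other values and missing values unchanged.)
2. Fill missing values in the Age column with the mean Age of all customers. (Exclude missing values when computing the mean.)
3. Select only the customers whose Age is 30 or older and store them in a new DataFrame.
4. Among the customers selected in step 3, fill missing values in the PurchaseAmount column with the mean PurchaseAmount of that group (customers aged 30 or older). (Exclude missing values when computing the mean.)
5. In the final processed data (aged 30 or older, PurchaseAmount missing values filled), compute the total PurchaseAmount of customers whose Gender is 'Female'.
6. Print the result as an integer. (Truncate the decimal part.)

🔍 Creating and Loading the Data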

import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'Age': [25, 35, 40, 22, 50, 33, None, 60, 28, 45, 31, 38, 29, 55, 30, 42, 37, 26, 48, 32],
    'Gender': ['Male', 'Female', None, 'M', 'F', 'Male', 'Female', 'M', 'F', 'Male', 'Female', 'Other', 'M', 'F', 'Male', 'Female', 'Male', 'F', None, 'Male'],
    'City': ['Seoul', 'Busan', 'Incheon', 'Seoul', 'Gwangju', 'Daegu', 'Daejeon', 'Suwon', 'Seoul', None, 'Busan', 'Incheon', 'Seoul', 'Gwangju', 'Daegu', 'Seoul', 'Busan', 'Incheon', 'Gwangju', 'Daegu'],
    'PurchaseAmount': [100, 150, 200, None, 300, 120, 180, 250, 90, 220, None, 160, 110, 320, 130, 210, None, 105, 280, 140],
    'MembershipLevel': ['Silver', 'Gold', 'Silver', 'Bronze', 'Gold', 'Silver', 'Bronze', 'Gold', None, 'Silver', 'Gold', 'Silver', 'Bronze', 'Gold', None, 'Silver', 'Gold', 'Bronze', 'Silver', None]
}

df = pd.DataFrame(data)

df.to_csv('customer.csv', index=False)

import pandas as pd

df = pd.read_csv('customer.csv')

🛠️ Preprocessing and Analysis

1. Standardize Gender

df['Gender'] = df['Gender'].replace('F', 'Female')
df['Gender'] = df['Gender'].replace('M', 'Male')
print(df['Gender'])
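
The two `replace` calls above can also be written as a single call with a mapping. A minimal sketch on a small made-up Series (the sample values and the name `gender_map` are illustrative assumptions, not part of the exercise data):

```python
import pandas as pd

# Hypothetical sample values covering every case named in the task
gender = pd.Series(['Male', 'F', 'M', None, 'Other', 'Female'])

# Map only the abbreviations; values not in the dict are left untouched
gender_map = {'M': 'Male', 'F': 'Female'}
standardized = gender.replace(gender_map)

print(standardized.value_counts(dropna=False))
```

`Series.replace` matches whole values rather than substrings, so 'Male' and 'Female' are not altered by the 'M' and 'F' keys.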

2. Fill missing Age values

Age_mean = df['Age'].mean()
df['Age'] = df['Age'].fillna(Age_mean)
df['Age']
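
The task says to exclude missing values when computing the mean. `Series.mean()` already skips NaN by default, which the following small sketch illustrates (the three values are made up for the illustration):

```python
import pandas as pd

# Hypothetical values with one missing entry
ages = pd.Series([20, None, 40])

age_mean = ages.mean()           # NaN is skipped: (20 + 40) / 2
filled = ages.fillna(age_mean)   # the missing entry receives the mean

print(age_mean)
print(filled.tolist())
```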

3. Select customers aged 30 or older

# Filter first, then copy, so later column assignments do not touch a slice of another frame
df_copy = df[df['Age'] >= 30].copy()
df_copy

4. Fill missing PurchaseAmount values with the group mean

PurchaseAmount_mean = df_copy['PurchaseAmount'].mean()
df_copy['PurchaseAmount'] = df_copy['PurchaseAmount'].fillna(PurchaseAmount_mean)
df_copy

5. Total purchase amount of Female customers

df_copy[df_copy['Gender'] == 'Female']['PurchaseAmount'].sum()
int(df_copy[df_copy['Gender'] == 'Female']['PurchaseAmount'].sum())
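
The same selection can be expressed with a single `.loc` call that takes the row mask and the column label together. A sketch on a hypothetical three-row table (not the exercise data), which also shows that `int()` truncates instead of rounding:

```python
import pandas as pd

# Hypothetical mini-table, not the exercise data
sample = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Female'],
    'PurchaseAmount': [100.5, 200.0, 50.25],
})

# Row mask and column label in one .loc call
female_total = sample.loc[sample['Gender'] == 'Female', 'PurchaseAmount'].sum()

print(female_total)        # float total
print(int(female_total))   # int() drops the fractional part
```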

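Practice Problem 2

✅ Problem Overview

1. Convert the OrderDate column to datetime and derive two new columns from it: 'OrderYear' (year) and 'OrderMonth' (month).
2. Fill missing values in the DiscountRate column with 0.
3. Multiply Quantity by UnitPrice to create the TotalPrice (total order amount) column.
4. Use TotalPrice and DiscountRate to create the FinalPrice (final payment amount) column. (Formula: FinalPrice = TotalPrice × (1 − DiscountRate))
5. Among orders placed in 2023, compute the mean FinalPrice of products whose ProductCategory is 'Electronics' or 'Apparel'.
6. Print the result rounded to two decimal places. (e.g., 123.45)

🔍 Creating and Loading the Data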

import pandas as pd
data = {
    'OrderID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    'OrderDate': ['2022-11-05', '2023-01-15', '2023-01-20', '2022-12-10', '2023-02-05', '2023-03-12', '2023-03-25', '2024-01-05', '2023-04-10', '2023-04-22', '2023-05-15', '2023-06-01', '2022-08-20', '2023-07-10', '2023-08-19'],
    'ProductCategory': ['Books', 'Electronics', 'Apparel', 'Home Goods', 'Electronics', 'Books', 'Apparel', 'Electronics', 'Home Goods', 'Electronics', 'Apparel', 'Electronics', 'Apparel', 'Books', 'Electronics'],
    'ProductName': ['Python Intro', 'SuperFast Charger', 'Winter Jacket', 'Scented Candle', 'Wireless Mouse', 'Data Science Handbook', 'Running Shoes', '4K Monitor', 'Desk Lamp', 'Gaming Keyboard', 'Summer T-Shirt', 'Noise Cancelling Headphones', 'Designer Jeans', 'Machine Learning Yearning', 'Smart Watch'],
    'Quantity': [2, 1, 1, 3, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1],
    'UnitPrice': [30000, 25001, 150000, 12000, 18000, 45000, 89000, 350000, 35000, 120000, 22000, 220000, 280000, 20000, 275000],
    'DiscountRate': [0.05, 0.1, 0.2, None, 0.15, 0.1, None, 0.05, 0.0, 0.2, 0.1, None, 0.3, 0.0, 0.1]
}

df_orders = pd.DataFrame(data)

df_orders.to_csv('sailes.csv', index=False)

df = pd.read_csv('sailes.csv')
print(df.describe())  # call the method; without () only the bound method is printed
print(df.dtypes)
print(df.index)
print(df.columns)
print(df.isna())

🛠️ Preprocessing and Analysis

1. Parse dates and create derived columns

import pandas as pd

df = pd.read_csv('sailes.csv')
print(df['OrderDate'].dtype)  # string/object dtype before conversion
print(df['OrderDate'][0:3])
df['OrderDate'] = pd.to_datetime(df['OrderDate'])
print(df['OrderDate'].dtype)  # a datetime64 dtype after conversion
print(df['OrderDate'][0:3])
df['OrderYear'] = df['OrderDate'].dt.year
df['OrderMonth'] = df['OrderDate'].dt.month
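
Every date in this exercise parses cleanly, but real files sometimes contain malformed entries. A hedged sketch of one way to handle them, using the `errors='coerce'` option of `pd.to_datetime` on a made-up two-element Series:

```python
import pandas as pd

# Hypothetical input with one malformed entry
raw = pd.Series(['2023-01-15', 'not a date'])

# errors='coerce' turns unparseable strings into NaT instead of raising
parsed = pd.to_datetime(raw, errors='coerce')

print(parsed.isna().tolist())
print(parsed.dt.year.iloc[0], parsed.dt.month.iloc[0])
```

Rows that become NaT can then be counted with `isna().sum()` before deciding whether to drop or repair them.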

2. Fill missing DiscountRate values with 0

print(df['DiscountRate'].isna().sum())
df['DiscountRate'] = df['DiscountRate'].fillna(0)
print(df['DiscountRate'].isna())

3. Compute TotalPrice

df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]
print(df["TotalPrice"])

4. Compute FinalPrice

df["FinalPrice"] = df["TotalPrice"] * (1 - df["DiscountRate"])
print(df["FinalPrice"])
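
As a check on the formula, here is the arithmetic for a single order, using the values of OrderID 5 from the data above (Quantity 2, UnitPrice 18000, DiscountRate 0.15). The second part is a sketch of why the missing discounts need to be filled first: any arithmetic with NaN yields NaN.

```python
import math

# Values of OrderID 5 in the data above
quantity, unit_price, discount_rate = 2, 18000, 0.15

total_price = quantity * unit_price                # 2 * 18000
final_price = total_price * (1 - discount_rate)    # 15% off the total

print(total_price)
print(round(final_price, 2))

# With an unfilled (NaN) discount the result is NaN as well
missing = float('nan')
print(math.isnan(total_price * (1 - missing)))
```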

5-6. Compute the conditional mean and round to two decimal places

filter_df = df[(df["OrderYear"] == 2023) & (df["ProductCategory"].isin(["Electronics", "Apparel"]))]
print(filter_df)
FinalPrice_avg = filter_df["FinalPrice"].mean()
print(FinalPrice_avg)
result = round(FinalPrice_avg, 2)
result

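Practice Problem 3

✅ Problem Overview

Convert the TransactionDate column to datetime.
Compute the mean Amount of transactions in the 'Seoul' region whose PaymentStatus is 'Completed'.
Find the ProductCategory that sold the most in March 2023. (By number of transactions.)
Print each result in the following format:
- Mean amount of completed 'Seoul' transactions: rounded to two decimal places
- Top-selling category in March 2023: string

🔍 Creating and Loading the Data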

%%writefile sales_data.csv
TransactionID,TransactionDate,CustomerID,ProductCategory,Amount,PaymentStatus,Region
T001,2023-01-10,C101,Electronics,120000,Completed,Seoul
T002,2023-01-15,C102,Books,35000,Pending,Busan
T003,2023-01-20,C101,Apparel,75000,Completed,Seoul
T004,2023-02-01,C103,Electronics,250000,Completed,Gyeonggi
T005,2023-02-05,C102,Home Goods,40000,Completed,Busan
T006,2023-02-10,C101,Books,20000,Completed,Seoul
T007,2023-03-01,C104,Apparel,90000,Completed,Incheon
T008,2023-03-05,C103,Electronics,180000,Pending,Gyeonggi
T009,2023-03-10,C105,Home Goods,60000,Completed,Busan
T010,2023-04-01,C101,Electronics,300000,Completed,Seoul
T011,2023-04-05,C104,Books,28000,Completed,Incheon
T012,2023-04-10,C105,Apparel,55000,Completed,Busan

import pandas as pd

df = pd.read_csv('sales_data.csv')
print(df.columns)
print(df.index)
print(df.describe())  # call the method; without () only the bound method is printed
print(df.dtypes)
print(df.isna())

🛠️ Preprocessing and Analysis

import pandas as pd

df = pd.read_csv('sales_data.csv')
print(df['TransactionDate'].dtypes)
df['TransactionDate'] = pd.to_datetime(df['TransactionDate'])
print(df['TransactionDate'].dtypes)

Practice Problem 3-1

print(df.columns)

result = df[(df["Region"] == "Seoul") & (df["PaymentStatus"] == "Completed")]["Amount"].mean()
result = round(result, 2)
print(result)

Practice Problem 3-2

print(df)

category_count = df[((df["TransactionDate"].dt.year == 2023) & (df["TransactionDate"].dt.month == 3))]["ProductCategory"].value_counts()
top_category = category_count.idxmax()
top_category
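
In this dataset the three March 2023 transactions (T007 to T009) each belong to a different category, so the counts are tied and `idxmax()` reports only one of them. The problem statement does not say how ties should be handled; the sketch below assumes that all tied categories should be reported.

```python
import pandas as pd

# March 2023 rows from sales_data.csv above (T007 to T009)
march = pd.DataFrame({
    'TransactionID': ['T007', 'T008', 'T009'],
    'ProductCategory': ['Apparel', 'Electronics', 'Home Goods'],
})

counts = march['ProductCategory'].value_counts()

# Keep every category whose count equals the maximum, not just one of them
top_categories = sorted(counts[counts == counts.max()].index)

print(top_categories)
```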

Practice Problem 4

✅ Problem Overview

1. Convert the TransactionDate column to datetime, then create the TransactionMonth (month) and TransactionDayOfWeek (day of week, 0=Monday, 6=Sunday) columns.
2. Compute the total Revenue of transactions whose ProductCategory is 'Electronics' and whose CustomerRating is 4.0 or higher.
3. Compute the mean UnitsSold of transactions whose PromotionApplied is 'Yes', and round the result to an integer. (For the rounding, consider using int() or np.round() instead of .round().)
4. Compute the mean Revenue per month (TransactionMonth) for each BranchID, and print the mean Revenue of branch B01 in February. (Print as an integer with no decimal part.)
5. Print which Salesperson recorded the most UnitsSold and how many UnitsSold that person sold in total.

🔍 Creating and Loading the Data

%%writefile sales_data_2.csv
SaleID,TransactionDate,BranchID,Salesperson,ProductCategory,Revenue,UnitsSold,CustomerRating,PromotionApplied
S001,2023-01-01,B01,Alice,Electronics,1200000,1,4.5,Yes
S002,2023-01-01,B02,Bob,Apparel,80000,2,3.8,No
S003,2023-01-02,B01,Alice,Home Goods,150000,1,4.0,No
S004,2023-01-03,B03,Charlie,Electronics,900000,1,4.2,Yes
S005,2023-01-03,B02,Bob,Books,50000,5,4.7,No
S006,2023-01-04,B01,Alice,Apparel,100000,1,3.5,Yes
S007,2023-01-05,B03,Charlie,Electronics,2000000,1,4.9,No
S008,2023-01-05,B02,Bob,Home Goods,70000,1,3.0,No
S009,2023-01-06,B01,Alice,Books,30000,3,4.1,No
S010,2023-01-07,B03,Charlie,Apparel,120000,1,4.3,Yes
S011,2023-02-01,B01,Alice,Electronics,1500000,1,4.6,No
S012,2023-02-02,B02,Bob,Apparel,95000,1,3.9,No
S013,2023-02-03,B01,Alice,Home Goods,200000,1,4.2,Yes
S014,2023-02-04,B03,Charlie,Books,40000,4,4.8,No
S015,2023-02-05,B02,Bob,Electronics,700000,1,4.1,Yes
S016,2023-03-01,B01,Alice,Apparel,110000,1,3.7,No
S017,2023-03-02,B02,Bob,Home Goods,85000,1,3.2,Yes
S018,2023-03-03,B03,Charlie,Electronics,1000000,1,4.5,No
S019,2023-03-04,B01,Alice,Books,25000,2,4.0,No
S020,2023-03-05,B02,Bob,Apparel,60000,1,3.6,Yes

import pandas as pd

df = pd.read_csv('sales_data_2.csv')
print(df.info())
print(df.head(5))

🛠️ Preprocessing and Analysis

Practice Problem 4-1

print(df.columns)
df["TransactionDate"] = pd.to_datetime(df["TransactionDate"])
df["TransactionMonth"] = df["TransactionDate"].dt.month
df["TransactionDayOfWeek"] = df["TransactionDate"].dt.weekday
print(df[["TransactionMonth", "TransactionDayOfWeek"]].head())

Practice Problem 4-2

print(df.columns)
result = df[(df["ProductCategory"] == "Electronics") & (df["CustomerRating"] >= 4.0)]["Revenue"].sum()
result

Practice Problem 4-3

result = df[df["PromotionApplied"] == 'Yes']["UnitsSold"].mean()
result = int(round(result))  # round first; int() alone would truncate
result
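
Task 3 asks for a rounded integer, and the available tools behave differently on fractional values. In the data above every promoted row has a UnitsSold of 1, so the mean is exactly 1.0 and all of them agree, but on other data they may not. A small sketch of the differences; `round_half_up` is a hypothetical helper, and whether half-up is the intended rule is an assumption:

```python
import math
import numpy as np

print(int(2.7))                 # 2: int() truncates toward zero
print(round(2.5), round(3.5))   # 2 4: built-in round() sends exact halves to the even integer
print(int(np.round(2.5)))       # 2: np.round() also rounds halves to even

# Hypothetical helper: round half up for a non-negative value
def round_half_up(x):
    return math.floor(x + 0.5)

print(round_half_up(2.5))       # 3
```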

Practice Problem 4-4

group_avg = df.groupby(["BranchID", "TransactionMonth"])["Revenue"].mean()
print(group_avg)
print(int(group_avg.loc[("B01", 2)]))  # mean Revenue of branch B01 in February
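
Grouping by two columns produces a Series with a two-level index, and a single cell can be selected with a `(BranchID, TransactionMonth)` tuple. A sketch on a subset of the rows above (one January row and both February rows of B01; the name `sample` is illustrative):

```python
import pandas as pd

# Subset of sales_data_2.csv: S001 (January) and S011, S013 (February), all branch B01
sample = pd.DataFrame({
    'BranchID': ['B01', 'B01', 'B01'],
    'TransactionMonth': [1, 2, 2],
    'Revenue': [1200000, 1500000, 200000],
})

group_avg = sample.groupby(['BranchID', 'TransactionMonth'])['Revenue'].mean()

# A (branch, month) tuple selects one cell of the MultiIndexed Series
feb_b01 = group_avg.loc[('B01', 2)]
print(int(feb_b01))

# unstack() pivots the months into columns, which can be easier to read
print(group_avg.unstack())
```

Because S011 and S013 are the only B01 transactions in February, the February value here matches what the full dataset gives.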

Practice Problem 4-5,6

salesperson_units = df.groupby("Salesperson")["UnitsSold"].sum()
print(salesperson_units)
print(salesperson_units.idxmax())
print(salesperson_units.max())
