1. Objectives
- To understand the string and list data types and how they are represented
- To become familiar with operations on strings and lists through built-in functions and methods
- To be able to apply string formatting to produce attractive output
2. Sequence
What is a Sequence?
- A positionally ordered collection of items
- Any item can be referred to by its index number
Why Sequences Are Needed
- Words in a document
- Students in a course
- Experimental data
- Business customers
Types of Sequences
- Strings: 'hello world!'
- Lists: ['h','e','l','l','o',' ','w','o','r','l','d','!']
- Files
Sequence Characteristics
- Homogeneous: all elements have the same type (e.g., strings)
- Heterogeneous: element types may be mixed (e.g., lists)
- Homogeneous sequences are more efficient
Iterables
- Any sequence is iterable
- Iterables are more general than sequences
- An iterable yields its elements one by one
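The difference between a sequence and a more general iterable can be sketched as follows. A set is used here only as one example of an iterable that is not a sequence; the variable names are illustrative.

```python
word = "spam"          # a string is a sequence
letters = set(word)    # a set is iterable but is not a sequence

# A sequence supports indexing by position.
first = word[0]
print(first)

# Both kinds of object can be traversed one element at a time.
from_sequence = [ch for ch in word]
from_iterable = sorted(ch for ch in letters)  # sorted, because a set has no fixed order
print(from_sequence, from_iterable)

# Indexing an iterable that is not a sequence raises TypeError.
try:
    letters[0]
    indexable = True
except TypeError:
    indexable = False
print("set supports indexing:", indexable)
```

The loop syntax is the same in both cases; only the sequence also offers positional access.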
3. Data Type: String
String Basics
- Sequence of characters enclosed in quotation marks
- Use " " or ' ' for single-line strings
- Use """ """ or ''' ''' for multi-line strings
>>> firstName = input("Please enter your name: ")
Please enter your name: John
>>> print("Hello", firstName)
Hello John
String Indexing
Forward index: 0, 1, 2, ...
Reverse index: -1, -2, -3, ...
text = "Hello World"
print(text[0]) # 'H'
print(text[-1]) # 'd'
Notes:
- An index that is out of range raises an IndexError
- An index must be an integer
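The two notes above can be checked with a short sketch that catches the exceptions instead of letting the program stop. The list name `errors` is illustrative.

```python
text = "Hello World"   # length 11, so valid indices are 0..10 and -11..-1

errors = []
for index in (11, -12, 1.0):
    try:
        text[index]
    except (IndexError, TypeError) as exc:
        # Record which kind of error each bad index produced.
        errors.append(type(exc).__name__)

print(errors)
```

The two out-of-range integers produce an IndexError, while the float produces a TypeError even though its value looks like a valid position.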
String Slicing
<string>[<start>:<end>:<step>]
text = "Hello World"
print(text[0:5]) # 'Hello'
print(text[6:]) # 'World' (unlike text[6], which is the single character 'W', text[6:] runs from index 6 to the end)
print(text[::2]) # 'HloWrd' (indices 0, 2, 4, ...)
print(text[::-1]) # 'dlroW olleH' (reversed: indices -1, -2, -3, ...)
print(text[::-2]) # 'drWolH' (indices -1, -3, -5, ...)
4. Simple String Processing
Username Generation
- First letter of first name + up to 7 letters from last name
- Example: Zaphod Beeblebrox → zbeebleb
# Username generation program
firstName = input("Please enter your first name (all lowercase): ")
lastName = input("Please enter your last name (all lowercase): ")
uname = firstName[0] + lastName[:7]
print("uname =", uname)
Month Abbreviation
# Program to get month abbreviation
months = "JanFebMarAprMayJunJulAugSepOctNovDec"
n = int(input("Enter a month number (1-12): "))
pos = (n-1) * 3
monthAbbrev = months[pos:pos+3]
print("The month abbreviation is", monthAbbrev)
Weakness: this approach only works when every output has the same length
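One possible workaround, sketched here under the assumption that padding is acceptable, is to pad every name to a common width so that the same slicing arithmetic still applies. The names `WIDTH`, `names`, and `month_name` are hypothetical and not part of the original program.

```python
WIDTH = 9  # length of the longest name, "September"
names = ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November", "December"]

# Build the lookup string by left-justifying every name in a 9-character field.
months = "".join(name.ljust(WIDTH) for name in names)

def month_name(n):
    """Return the full name of month n (1-12) by slicing the padded string."""
    pos = (n - 1) * WIDTH
    return months[pos:pos + WIDTH].rstrip()  # strip the padding spaces

print(month_name(5))
print(month_name(9))
```

The padding makes the string longer and harder to read, which is the cost of keeping the slice-based lookup.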
5. String Operators
Concatenation and Repetition Operators
Concatenation: +
Repetition: *
>>> "spam" + "eggs"
'spameggs'
>>> 3 * "spam"
'spamspamspam'
>>> (3 * "spam") + ("eggs" * 5)
'spamspamspameggseggseggseggseggs'
len() function:
>>> len("spam")
4
>>> for ch in "Spam!":
...     print(ch, end=" ")
...
S p a m !
Comparison Operators
- Based on character codes (Unicode code points, which match ASCII for the basic Latin characters)
- Strings are compared character by character
>>> "apple" < "banana"
True
>>> "Apple" < "apple" # 'A' (65) < 'a' (97)
True
Membership Operators
- in and not in
- Return Boolean values
>>> 'a' in 'apple'
True
>>> 'seed' not in 'apple'
True
Formatting Operators
'%2d/%2d string' % (var1, var2)
Format specifiers:
- %d: decimal integer
- %2d: width 2, right-justified, space-filled
- %02d: width 2, right-justified, zero-filled
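A minimal sketch of the difference between the space-filled and zero-filled specifiers, using an illustrative date. The variable names are assumptions made for this example.

```python
month, day, year = 3, 7, 2024

space_filled = "%2d/%2d/%d" % (month, day, year)
zero_filled = "%02d/%02d/%d" % (month, day, year)

# repr() makes the padding characters visible in the output.
print(repr(space_filled))
print(repr(zero_filled))
```

Both results occupy the same number of columns; only the fill character differs.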
6. String Representation
Character Encoding
- ASCII: 7-bit codes (128 characters, numbered 0-127), US-centric
- Extended ASCII: 8-bit codes that add further characters
- Unicode: universal standard, 100,000+ characters
- UTF-8: Unicode Transformation Format, a variable-length encoding that uses 1 to 4 bytes per character
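The relationship between a character's Unicode code and its UTF-8 bytes can be sketched with the built-in str.encode() method. The three sample characters are chosen only for illustration.

```python
samples = ["A", "é", "中"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the number of bytes, and the bytes themselves.
    print(ord(ch), len(encoded), encoded)

sizes = [len(ch.encode("utf-8")) for ch in samples]
round_trip = "中".encode("utf-8").decode("utf-8")
```

Characters in the ASCII range need a single byte, while characters with larger codes need more.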
Character Functions
>>> ord("A") # Get numeric code
65
>>> ord("a")
97
>>> chr(97) # Get character from code
'a'
>>> chr(65)
'A'
7. String Formatting
Format Method
<template-string>.format(<values>)
total = 1.5
print("The total value of your change is ${0:0.2f}".format(total))
# Output: The total value of your change is $1.50
Format specifier: <width>.<precision><type>
0.2f: width 0 (use only as much space as needed), precision 2, fixed-point number
Examples
>>> "Hello {0} {1}, you may have won ${2}".format("Mr.", "Smith", 10000)
'Hello Mr. Smith, you may have won $10000'
>>> 'This int, {0:5}, was placed in a field of width 5'.format(7)
'This int,     7, was placed in a field of width 5'
>>> 'This float, {0:10.5f}, is fixed at 5 decimal places'.format(3.1415926)
'This float,    3.14159, is fixed at 5 decimal places'
Justification
>>> "left: {0:<5}".format("Hi!")
'left: Hi!  '
>>> "right: {0:>5}".format("Hi!")
'right:   Hi!'
>>> "centered: {0:^5}".format("Hi!")
'centered:  Hi! '
8. Programming an Encoder
Encoding Algorithm
# Encode message to Unicode numbers
message = input("Please enter the message to encode: ")
print("Here are the Unicode codes:")
for ch in message:
    print(ord(ch), end=" ")
print() # End the line of output
Decoding Algorithm
# Decode Unicode numbers to message
inString = input("Please enter the Unicode-encoded message: ")
message = ""
for numStr in inString.split():
    codeNum = int(numStr)
    message = message + chr(codeNum)
print("\nThe decoded message is:", message)
Improved Decoder with Lists
# More efficient decoder using lists
inString = input("Please enter the Unicode-encoded message: ")
chars = [] # Create empty list
for numStr in inString.split():
    codeNum = int(numStr)
    chars.append(chr(codeNum)) # Append to list
message = "".join(chars) # Join list into string
print("\nThe decoded message is:", message)
9. String Methods
Common String Methods
s.capitalize() # Copy of s with only the first character capitalized
s.title() # Copy of s with the first character of each word capitalized
s.center(width) # Center s in a field of the given width
s.count(sub) # Count occurrences of sub in s
s.find(sub) # Position of the first occurrence of sub (-1 if not found)
s.join(list) # Join the strings in list, using s as the separator
s.lower() # Copy of s in lowercase
s.upper() # Copy of s in uppercase
Stripping Methods
s.strip() # Remove leading/trailing whitespace
s.lstrip() # Remove leading whitespace
s.rstrip() # Remove trailing whitespace
Replacement and Splitting
s.replace(old, new) # Replace occurrences of old with new
s.split() # Split s into a list of substrings (on whitespace by default)
Validation Methods
s.islower() # All cased characters lowercase?
s.isupper() # All cased characters uppercase?
s.isalpha() # All characters alphabetic?
s.isdigit() # All characters digits?
s.isalnum() # All characters alphanumeric?
Advanced Methods
# Find within an optional index range: the search covers s[start:end]
s.find(sub, start, end)
# Join with separator
s.join(sequence)
# Count occurrences
s.count(substring)
10. From Encoding to Encryption
Encryption
- Process of encoding information to keep it secret
- Cryptography: the study of encryption methods
Real-world Application
- Used in credit card transmissions
- Protects personal information
11. Data Type: List
List Basics
- Sequences of arbitrary values
- Can contain mixed data types
myList = [1, "Spam", 4, "U"] # Mixed types
myList = [] # Empty list
myList = [Point(1,1), Point(1,2)] # Objects (assumes a Point class is already defined)
student = [2020310912, 'Xiaoming', 'Male', 'China',
           ['Information School', 'Computer Science', 20020501]] # Nested list
List Indexing and Slicing
student = [2020310912, 'Xiaoming', 'Male', 'China', 'Information School', 'Computer Science', 20020501]
print(student[1]) # 'Xiaoming'
print(student[4]) # 'Information School'
print(student[:-3]) # [2020310912, 'Xiaoming', 'Male', 'China'] (from index 0 up to, but not including, the third-from-last element)
print(student[1:5:2]) # ['Xiaoming', 'China'] (slice with step 2)
print(student[:-3:-1]) # [20020501, 'Computer Science'] (from index -1 backwards, stopping before the third-from-last element)
# Nested indexing: student[4] is a string, so [0] selects its first character
print(student[4][0]) # 'I' from 'Information School'
12. Immutable vs. Mutable
Immutable Objects
- Numbers, strings, tuples
- Cannot be changed after creation
- An operation that appears to modify one creates a new object instead
Mutable Objects
- Lists, dictionaries, sets
- Can be changed in place
- The object's identity (its memory address in CPython, as reported by id()) stays the same after the change
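The contrast can be sketched by keeping a second name bound to the same object and then "modifying" it through the first name. The variable names are illustrative.

```python
# Immutable: concatenation builds a new string and leaves the original alone.
text = "spam"
alias = text
text = text + "!"
print(alias, text, alias is text)

# Mutable: append() changes the one list that both names refer to.
items = [1, 2, 3]
alias_list = items
identity_before = id(items)
items.append(4)
same_identity = (id(items) == identity_before)
print(alias_list, same_identity)

# Item assignment on a string is rejected outright.
try:
    alias[0] = "S"
    assignment_allowed = True
except TypeError:
    assignment_allowed = False
print("string item assignment allowed:", assignment_allowed)
```

After the string operation the two names refer to different objects, whereas after the list operation both names still see the same, updated list.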
13. Lists as Sequences
List Operations
>>> [1,2] + [3,4] # Concatenation
[1, 2, 3, 4]
>>> [1,2] * 3 # Repetition
[1, 2, 1, 2, 1, 2]
>>> grades = ['A', 'B', 'C', 'D', 'F']
>>> grades[0] # Indexing
'A'
>>> grades[2:4] # Slicing
['C', 'D']
>>> len(grades) # Length
5
Month Program with Lists
# Improved month program using lists
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
n = int(input("Enter a month number (1-12): "))
monthAbbrev = months[n-1]
print("The month abbreviation is", monthAbbrev)
# Can easily extend to full names
full_months = ["January", "February", "March", "April",
               "May", "June", "July", "August",
               "September", "October", "November", "December"]
14. List Methods
append() Method
# Create list of squares
squares = []
for i in range(1, 101):
    squares.append(i * i)
print(squares) # [1, 4, 9, ..., 10000]
join() Method
# Join list into string
words = ['Hello', 'world', '!']
message = " ".join(words) # 'Hello world !'
14.1 Exercises
Morse Code
Task: Convert an English sentence to Morse code
morse_code = [".-","-...","-.-.","-..",".","..-.","--.","....",
              "..",".---","-.-",".-..","--","-.","---",".--.",
              "--.-",".-.","...","-","..-","...-",".--","-..-",
              "-.--","--.."]
# Convert letters to Morse code
# Use list indexing based on each letter's position in the alphabet
Caesar Cipher
Task: Implement Caesar encryption with a variable shift
# Caesar cipher implementation
plaintext = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
shift = 19
ciphertext = ""
for char in plaintext:
    if char.isascii() and char.isalpha():
        # Shift within the letter's own case and handle wrap-around
        base = ord('A') if char.isupper() else ord('a')
        shifted = (ord(char) - base + shift) % 26
        ciphertext += chr(shifted + base)
    else:
        # Leave non-letters unchanged
        ciphertext += char
print("Ciphertext:", ciphertext)
15. Files: Multi-line Strings
File Basics
- Sequence of data stored in secondary memory (usually on disk)
- Typically contains multiple lines of text
- Uses the newline character \n to mark line breaks
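A small sketch of how the newline character delimits lines inside a single string, which is how the contents of a text file appear once they are read into memory. The variable names are illustrative.

```python
explicit = "This is a\nmulti-line\nstring"
triple = """This is a
multi-line
string"""

# Both spellings produce exactly the same characters.
print(explicit == triple)

# repr() shows the newline characters instead of acting on them.
print(repr(triple))

# Splitting on the newline recovers the individual lines.
lines = triple.split("\n")
print(lines)
```

Seen this way, a multi-line text is one long string in which \n marks where each line ends.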
Multi-line Strings
multi_line = """This is a
multi-line
string"""
print(multi_line)
16. File Processing
File Operations
- Opening: associate a file on disk with an object in memory
- Processing: read from or write to the file
- Closing: complete pending operations and bookkeeping
Opening Files
<filevar> = open(<name>, <mode>)
Modes:
- 'r': read
- 'w': write (overwrites any existing content)
- 'a': append
- 'r+': read and write
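The effect of the 'w' and 'a' modes can be sketched by writing to the same file several times. The example works in a temporary directory so that no real files are touched; the file name demo.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")

    f = open(path, "w")      # 'w' creates the file (or empties an existing one)
    f.write("first\n")
    f.close()

    f = open(path, "a")      # 'a' keeps the existing content and adds to the end
    f.write("second\n")
    f.close()

    f = open(path, "r")
    after_append = f.read()
    f.close()

    f = open(path, "w")      # reopening with 'w' discards the earlier content
    f.write("third\n")
    f.close()

    f = open(path, "r")
    after_overwrite = f.read()
    f.close()

print(repr(after_append))
print(repr(after_overwrite))
```

Opening an existing file with 'w' is therefore destructive, which is worth checking before a program writes to a user-supplied file name.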
infile = open("numbers.dat", "r")
outfile = open("output.txt", "w")
File Paths
Relative path:
f = open("../data.txt", "r") # Parent folder
Absolute path:
f = open("C:/users/data.txt", "r") # Full path
File Dialogs
from tkinter.filedialog import askopenfilename
infileName = askopenfilename()
if infileName: # Empty if the user cancels the dialog
    infile = open(infileName, "r")
    # Process file
    infile.close()
17. File Methods
Reading Methods
# printfile.py
def main():
    fname = input("Enter filename: ")
    infile = open(fname, 'r')
    data = infile.read() # Read entire file
    print(data)
    infile.close()
main()
Reading options:
- read(): the entire remaining file as a single string
- readline(): the next line, including its trailing newline
- readlines(): a list of all remaining lines
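The three reading options return different kinds of values, and each call continues from where the previous one stopped. A minimal sketch, using a temporary file with hypothetical contents:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "lines.txt")  # hypothetical file name

    outfile = open(path, "w")
    outfile.write("alpha\nbeta\ngamma\n")
    outfile.close()

    infile = open(path, "r")
    first_line = infile.readline()   # one line, newline included
    remaining = infile.readlines()   # list of the lines not yet read
    at_end = infile.read()           # nothing is left, so an empty string
    infile.close()

    infile = open(path, "r")
    whole = infile.read()            # the entire file as one string
    infile.close()

print(repr(first_line))
print(remaining)
print(repr(at_end))
print(repr(whole))
```

An empty string from read() or readline() is the usual signal that the end of the file has been reached.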
Reading Examples
# someFile is a placeholder for the name of an existing text file
# Read first 5 lines
infile = open(someFile, "r")
for i in range(5):
    line = infile.readline()
    print(line[:-1]) # Slice off the trailing newline
infile.close()
# Read all lines using readlines
infile = open(someFile, "r")
for line in infile.readlines():
    print(line.rstrip()) # Remove trailing whitespace, including the newline
infile.close()
# Treat file as sequence
infile = open(someFile, "r")
for line in infile:
    print(line, end="") # Line already ends with a newline
infile.close()
Writing Methods
outfile = open("mydata.out", "w")
print("Hello world!", file=outfile)
outfile.close()
# Using write method
outfile = open("data.txt", "w")
outfile.write("Hello world!\n")
outfile.close()
# Using writelines
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]
outfile = open("data.txt", "w")
outfile.writelines(lines)
outfile.close()
File Position
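# tell() - get current position
position = file.tell()
# seek() - move the file cursor
file.seek(offset, whence)
# whence: 0=beginning (default), 1=current position, 2=end
# In text mode, only seeks relative to the beginning are allowed (plus seek(0, 2) to jump to the end)
Using with Statement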
# Automatically closes file
with open('data.txt', 'r') as f:
    data = f.read()
# File automatically closed here
18. CSV Files
CSV Processing
- Tabular data made up of records (rows) and fields (columns)
- Fields are separated by commas or, in some variants, tabs
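The record-and-field structure can be sketched with plain string methods. This is only an illustration with made-up sample text and a hypothetical helper `parse`; it does not handle quoted fields that themselves contain the separator, which is one reason to use a dedicated parser for real data.

```python
comma_text = "Name,Age,City\nJohn,25,New York\n"
tab_text = "Name\tAge\tCity\nJohn\t25\tNew York\n"

def parse(text, separator):
    """Split text into records (lines) and each record into fields."""
    return [line.split(separator) for line in text.splitlines()]

comma_rows = parse(comma_text, ",")
tab_rows = parse(tab_text, "\t")

print(comma_rows)
print(comma_rows == tab_rows)
```

Both texts describe the same table; only the separator character differs.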
Using csv Module
import csv
# Writing CSV
with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Age', 'City'])
    writer.writerow(['John', '25', 'New York'])
# Reading CSV (newline='' lets the csv module handle line endings itself)
with open('data.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Using pandas (Third-party)
import pandas as pd
# Read CSV
df = pd.read_csv('data.csv')
# Write CSV
df.to_csv('output.csv', index=False)
18.1 Example Program: Batch Usernames
Task
- Process an input file containing first and last names
- Generate system usernames
- Format: first character of first name + up to 7 characters of last name
Implementation
def main():
    # Get file names
    infileName = input("What file are the names in? ")
    outfileName = input("What file should the usernames go in? ")
    # Open files
    infile = open(infileName, 'r')
    outfile = open(outfileName, 'w')
    # Process each line
    for line in infile:
        # Get first and last names from line
        first, last = line.split()
        # Create username
        uname = (first[0] + last[:7]).lower()
        # Write to output file
        print(uname, file=outfile)
    # Close files
    infile.close()
    outfile.close()
    print("Usernames have been written to", outfileName)
main()
18.2 Exercise: Stock Data Processing
Tasks
- Read and parse file
  - Remove extra spaces
  - Standardize date format to YYYY-MM-DD
- String processing
  - Convert company code to uppercase
  - Replace / with - in dates
- Basic analysis
  - Find highest/lowest closing prices
  - Calculate average closing price
- User input
  - Prompt for company code
  - Analyze data for specified company
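One way to approach the tasks is sketched below. The exercise does not specify the file layout, so the sketch assumes each line holds a company code, a date written as YYYY/MM/DD, and a closing price, separated by commas. The sample lines are made up, and the names `parse_line` and `analyze` are hypothetical. In a full solution the lines would come from a file and the company code from input().

```python
# Made-up sample lines in the assumed layout: code, date, closing price.
raw_lines = [
    " abc , 2024/01/03 , 10.0 ",
    "xyz,2024/01/03,50.0",
    " ABC, 2024/01/04, 12.0",
    "abc ,2024/01/05 ,11.0 ",
]

def parse_line(line):
    """Return (code, date, close) with spaces removed and formats standardized."""
    code, date, close = [field.strip() for field in line.split(",")]
    return code.upper(), date.replace("/", "-"), float(close)

def analyze(lines, wanted_code):
    """Return (highest, lowest, average) closing price for one company.

    Assumes at least one line matches wanted_code.
    """
    closes = []
    for line in lines:
        code, date, close = parse_line(line)
        if code == wanted_code.upper():
            closes.append(close)
    return max(closes), min(closes), sum(closes) / len(closes)

first_record = parse_line(raw_lines[0])
highest, lowest, average = analyze(raw_lines, "abc")
print(first_record)
print(highest, lowest, average)
```

Normalizing the code and date at parse time keeps the analysis step free of formatting concerns.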