
bigcode/bigcode-pii-dataset

Token Classification · CODE

bigcode/bigcode-pii-dataset is a token classification dataset for source code (language tag: CODE), published by bigcode in 2023. It contains 12,099 examples, which places it in the 10K<n<100K size category, and it has 18 downloads and 56 likes.

About bigcode/bigcode-pii-dataset

Dataset description: This is an annotated dataset for Personally Identifiable Information (PII) in code. The target entities are: Names, Usernames, Emails, IP addresses, Keys, Passwords, and IDs. The annotation process invo...
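
To make the token classification task concrete, here is a minimal Python sketch of how character-level PII annotations could be turned into per-token BIO tags. It is an illustration only, not the dataset's actual schema or annotation pipeline: the label strings, the `(start, end, label)` span format, the whitespace tokenizer, and the function name `spans_to_bio` are all assumptions made for this example.

```python
import re

# Hypothetical label names derived from the entity list above;
# the dataset's real tag strings may differ.
ENTITY_TYPES = {"NAME", "USERNAME", "EMAIL", "IP_ADDRESS", "KEY", "PASSWORD", "ID"}


def spans_to_bio(text, spans):
    """Map character-level (start, end, label) spans to per-token BIO tags.

    Tokens are whitespace-delimited (an assumption; real pipelines usually
    use a subword tokenizer). A token is tagged when it overlaps a span.
    """
    for _, _, label in spans:
        if label not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {label}")

    tokens, tags = [], []
    previous_span = None  # span covering the previous token, if any
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        # First span that overlaps this token, or None.
        hit = next(
            (s for s in spans if tok_start < s[1] and tok_end > s[0]),
            None,
        )
        if hit is None:
            tag = "O"
        else:
            # "B-" opens an entity, "I-" continues the same span.
            prefix = "I" if hit == previous_span else "B"
            tag = f"{prefix}-{hit[2]}"
        previous_span = hit
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Made-up code snippet containing a name and an email address.
snippet = 'author = "Jane Doe"  # contact: jane@example.com'
name, email = "Jane Doe", "jane@example.com"
spans = [
    (snippet.index(name), snippet.index(name) + len(name), "NAME"),
    (snippet.index(email), snippet.index(email) + len(email), "EMAIL"),
]

tokens, tags = spans_to_bio(snippet, spans)
for token, tag in zip(tokens, tags):
    print(f"{tag:10s} {token}")
```

Whitespace tokenization keeps the sketch short; with a subword tokenizer the same overlap test would be applied to each subword's character offsets instead.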

Details

Task: Token Classification
Language: CODE
Format: Parquet
Rows / instances: 12,099
Size: 10K<n<100K
Creator: bigcode
Year: 2023
Downloads: 18
Likes: 56
