Cellxgene Census Common Patterns
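# Common Query Patterns and Best Practices
## Query Pattern Categories
### 1. Exploratory Queries (Metadata Only)
Use these when exploring the available data without loading expression matrices.
**Pattern: Get the unique cell types in a tissue**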
```python
import cellxgene_census

with cellxgene_census.open_soma() as census:
    # Read only the cell_type column for primary brain cells
    cell_metadata = cellxgene_census.get_obs(
        census,
        "homo_sapiens",
        value_filter="tissue_general == 'brain' and is_primary_data == True",
        column_names=["cell_type"]
    )
    unique_cell_types = cell_metadata["cell_type"].unique()
    print(f"Found {len(unique_cell_types)} unique cell types")
```
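**Pattern: Count cells by condition**
```python
# Assumes `census` is an open handle from cellxgene_census.open_soma()
cell_metadata = cellxgene_census.get_obs(
    census,
    "homo_sapiens",
    value_filter="disease != 'normal' and is_primary_data == True",
    column_names=["disease", "tissue_general"]
)
# observed=True skips unused category combinations when the columns are categorical
counts = cell_metadata.groupby(["disease", "tissue_general"], observed=True).size()
```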
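**Pattern: Explore dataset information**
```python
# Read the datasets table
datasets = census["census_info"]["datasets"].read().concat().to_pandas()
# Filter by a criterion. This table describes datasets (titles, collections),
# not per-cell disease labels, so match on the dataset title here
covid_datasets = datasets[datasets["dataset_title"].str.contains("COVID", case=False, na=False)]
```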
### 2. Small-to-Medium Queries (AnnData)
Use `get_anndata()` when the results fit in memory (typically < 100,000 cells).
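One way to apply this rule of thumb is to count the matching cells with a metadata-only read before requesting the expression matrix. The sketch below is illustrative only: `count_matching_cells`, `fits_in_memory`, and the injectable `get_obs` parameter are hypothetical helpers introduced here, and the 100,000-cell cutoff is the guideline from the text, not a hard limit.

```python
def count_matching_cells(census, organism, value_filter, get_obs=None):
    """Return how many cells match value_filter, reading only soma_joinid."""
    if get_obs is None:
        # Imported lazily so the helper can be exercised without the package installed.
        import cellxgene_census
        get_obs = cellxgene_census.get_obs
    obs = get_obs(
        census,
        organism,
        value_filter=value_filter,
        column_names=["soma_joinid"],  # smallest column that still yields one row per cell
    )
    return len(obs)


def fits_in_memory(n_cells, max_cells=100_000):
    """Apply the rule of thumb: fewer than max_cells is treated as loadable."""
    return n_cells < max_cells
```

With an open `census` handle, calling `count_matching_cells(census, "homo_sapiens", "cell_type == 'B cell' and is_primary_data == True")` and passing the result to `fits_in_memory` gives a quick check before calling `get_anndata()`. The actual memory footprint also depends on how many genes are requested, so treat the cell count as a rough guide.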
**Pattern: Tissue-specific cell type query**
```python
# Assumes `census` is an open handle from cellxgene_census.open_soma()
adata = cellxgene_census.get_anndata(
    census=census,
    organism="Homo sapiens",
    obs_value_filter="cell_type == 'B cell' and tissue_general == 'lung' and is_primary_data == True",
    obs_column_names=["assay", "disease", "sex", "donor_id"],
)
```
**Pattern: Query for specific genes**
```python
marker_genes = ["CD4", "CD8A", "CD19", "FOXP3"]
# First look up the gene IDs
gene_metadata = cellxgene_census.get_var(
    census, "homo_sapiens",
    value_filter=f"feature_name in {marker_genes}",
    column_names=["feature_id", "feature_name"]
)
gene_ids = gene_metadata["feature_id"].tolist()
# Then query with a gene filter
adata = cellxgene_census.get_anndata(
    census=census,
    organism="Homo sapiens",
    var_value_filter=f"feature_id in {gene_ids}",
    obs_value_filter="cell_type == 'T cell' and is_primary_data == True",
)
```
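The filters above interpolate a Python list straight into an f-string, which works because the list's `repr` happens to match the filter syntax. A small helper can make that step explicit and reject inputs that would produce a malformed filter. This is a minimal sketch: `build_in_filter` and `and_filters` are hypothetical names introduced here, not part of `cellxgene_census`.

```python
def build_in_filter(column, values):
    """Build a `column in [...]` clause from a list of string values."""
    if not column.isidentifier():
        raise ValueError(f"not a valid column name: {column!r}")
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    if not all(isinstance(v, str) for v in values):
        raise TypeError("this sketch only handles string values")
    # repr() quotes each string the same way an interpolated list would
    quoted = ", ".join(repr(v) for v in values)
    return f"{column} in [{quoted}]"


def and_filters(*clauses):
    """Join clauses with `and`; callers should parenthesize any clause containing `or`."""
    return " and ".join(clauses)
```

For example, `and_filters(build_in_filter("tissue_general", ["lung", "liver"]), "is_primary_data == True")` yields a string that can be passed as `obs_value_filter`.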
**Pattern: Multi-tissue query**
python
adata = cellxgene_census.get_anndata(
census=census,
organism="Homo sapiens",
obs_value_filter="tissue_general in ['lung', 'liver', 'kidney'] and is_primary_data == True",
obs_column_names=["