Daily Science and Technology Report, Issue 48: Statistical Method for Genetic Studies Cut Computation Time

    Posted 2010-3-21 21:58
    Last edited by sea_star666 on 2010-3-21 22:03

    New Statistical Method for Genetic Studies Could Cut Computation Time from Years to Hours

    In the ongoing quest to identify the genetic factors involved in disease, scientists have increasingly turned to genome-wide association studies, or GWAS, which enable the scanning of up to a million genetic markers in thousands of individuals.

    These studies generally compare the frequency of genetic variants between two groups -- those with a particular disease and healthy individuals. Differences in the frequency of a given variant suggest the variant may be involved in the disease.
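The case/control frequency comparison described above can be illustrated with a plain Pearson chi-square test on allele counts. This is only a minimal sketch: the function name and the example counts are hypothetical, and the article does not say which test any particular study used.

```python
def allelic_chi_square(case_alt, case_total, control_alt, control_total):
    """Pearson chi-square statistic (1 d.f.) for a 2x2 table of allele counts.

    case_alt / control_alt: copies of the variant allele seen in each group.
    case_total / control_total: total alleles counted in each group.
    """
    case_ref = case_total - case_alt
    control_ref = control_total - control_alt
    n = case_total + control_total
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    row_totals = [case_total, control_total]
    observed = [[case_alt, case_ref], [control_alt, control_ref]]

    # A variant absent (or fixed) in both groups carries no signal.
    if col_totals[0] == 0 or col_totals[1] == 0:
        return 0.0

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical counts: 60 of 200 alleles in cases vs. 40 of 200 in controls.
stat = allelic_chi_square(60, 200, 40, 200)
```

A large statistic flags a frequency difference, but as the article goes on to explain, that difference can come from ancestry rather than disease.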
    Over the last few years, such studies have successfully implicated hundreds of genes in human disease, and the research has been used to identify risk and protective factors for asthma, cancer, diabetes, heart disease, mental illness and other conditions.
    But genome-wide association studies aren't perfect. In fact, the genealogy of study participants can sometimes prove a stumbling block to accurate findings.
    "Unfortunately, differences in frequencies can arise for reasons unrelated to the disease if the individuals collected have ancestry from different regions of the world," said Eleazar Eskin, associate professor of computer science at the UCLA Henry Samueli School of Engineering and Applied Science, who holds a joint appointment in the department of human genetics at the David Geffen School of Medicine at UCLA.
    "This problem, called 'population structure,' has led to many apparent discoveries of genes involved in disease which later turned out to be artifacts," he said.
    In a new study to be published in the April edition of the journal Nature Genetics, Eskin and his research group unveil a new computational strategy for GWAS that corrects for population structure and is both faster and easier to use.
    One of the basic assumptions in typical GWAS is that participating individuals are "unrelated," and investigators typically perform screening procedures to ensure that pairs of individuals are not close relatives. However, due to the complex history of the human population, no pair of individuals is perfectly unrelated; every pair is distantly related to some degree. This is referred to as "pairwise relatedness."
    "Such a variety in degrees of relatedness -- which we call 'sample structure' -- can be manifested into two different forms: population structure and hidden relatedness. While typical statistical methods for GWAS handle only either of the two forms, our method can handle both aspects of sample structure simultaneously in a computationally efficient manner," said Hyun Min Kang, an assistant research professor in biostatistics at the University of Michigan and an author of the study.
    "Moreover, if the samples come from a very homogeneous population, it is possible that some of the subjects are, in fact, distantly related," said Chiara Sabatti, professor of human genetics and statistics at UCLA and a corresponding author of the study. "In the analysis of GWAS, it is necessary to correct for such sample structure, which can lead to spurious association signals. The methods presented in our paper allow researchers to do this in a manner that is both fast and effective."
    Eskin's team worked with a data set of 5,000 people from Finland who were born in the same year, tracked over an extensive amount of time, and had a large amount of population relatedness.
    The 5,000 people produced a data set of 300,000 variants. From these 300,000 points of variation, the group examined pairwise relatedness between individuals, which means they compared the number of mutations each shared. From the mutations, Eskin's group could estimate how related individuals were to each other.
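One simple way to turn shared variants into a pairwise relatedness estimate is an identity-by-state (IBS) similarity matrix. The sketch below assumes genotypes coded as 0, 1 or 2 copies of a variant allele; the coding, the function name and the toy data are assumptions for illustration, and the article does not specify the exact estimator the group used.

```python
def ibs_similarity(genotypes):
    """Pairwise identity-by-state similarity between individuals.

    genotypes: list of per-individual lists, each entry 0, 1 or 2
               (copies of the variant allele at that marker).
    Returns an n x n symmetric matrix with values in [0, 1].
    """
    n = len(genotypes)
    m = len(genotypes[0])
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # At each marker two individuals share 2 - |a - b| alleles (0, 1 or 2).
            shared = sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            sim[i][j] = sim[j][i] = shared / (2 * m)
    return sim


# Toy data: three individuals, four markers (hypothetical values).
toy = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [2, 1, 0, 2],
]
K = ibs_similarity(toy)
```

With hundreds of thousands of markers the same computation is normally done with matrix operations rather than Python loops; the loop form is kept here only for readability.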
    "It was very interesting to see how much these pairwise relations explained of the trait," Eskin said. "So what we did in this paper is we proposed a statistical method that also allowed us to correct for a wide range of sample structure by explicitly accounting for pairwise relatedness between individuals using high-density markers in modeling the distribution of observable traits."
    The variance component in the new strategy, called EMMAX (Efficient Mixed Model Association Expedited), captures the complex mixture of population structure and hidden relatedness, both direct byproducts of genealogy, and corrects for these relationships when performing genetic mapping.
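For readers who want the shape of such a model, a variance component (linear mixed) model for a trait is commonly written as follows. The notation is an assumption chosen for illustration and is not taken from the article: y is the vector of trait values, X holds the tested marker and any covariates, K is a pairwise relatedness matrix estimated from the markers, and I is the identity matrix.

```latex
y = X\beta + u + e, \qquad
u \sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad
e \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I\right)

\operatorname{Var}(y) = \sigma_g^2 K + \sigma_e^2 I
```

The term u lets trait values covary in proportion to how related two individuals are, which is how a model of this kind can absorb both population structure and hidden relatedness instead of attributing them to the tested marker.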
    "Capitalizing on the characteristics of complex traits in humans, we made a few simplifying assumptions that allowed us to dramatically increase the speed of computations, making our approach readily applicable to genome-wide association studies with tens of thousands of samples," Eskin said.
    "Our variance component model is actually a widely known classical model for genetic mapping," Kang said. "However, it was too computationally costly to be applied to the current scale of GWAS involving thousands of individuals with hundreds of thousands of genetic variants because even the fastest method -- which we previously developed -- took years of computational time to analyze the data once. We further expedited the method by capitalizing on the characteristics of most human association studies, reducing the computational time from years to hours."
    According to Eskin, the method will also have a large impact on studies of admixed populations, which are samples of individuals who have ancestry from multiple regions around the world. Studies in Los Angeles, for example, would benefit greatly from this method, as people in the city are very ethnically diverse and it is difficult to obtain accurate estimates of their ancestry.
    The study was supported in part by the National Toxicology Program/National Institute of Environmental Health Sciences.
    huangjin        

    Hehehe... thank you!