

From Wikipedia, the free encyclopedia

In statistics and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden semantic structures in a text body. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, "cat" and "meow" will appear in documents about cats, and "the" and "is" will appear approximately equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words. The "topics" produced by topic modeling techniques are clusters of similar words. A topic model captures this intuition in a mathematical framework, which allows examining a set of documents and discovering, based on the statistics of the words in each, what the topics might be and what each document's balance of topics is.
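This generative intuition can be sketched directly. The snippet below is a toy illustration (not any particular model): it draws 1,000 words for a document that is 90% about dogs and 10% about cats, using a made-up four-word vocabulary per topic, and shows that dog words outnumber cat words by roughly nine to one.

```python
import random

random.seed(0)

# Hypothetical toy vocabularies for two topics.
topics = {
    "dogs": ["dog", "bone", "bark", "leash"],
    "cats": ["cat", "meow", "purr", "whisker"],
}
mixture = {"dogs": 0.9, "cats": 0.1}  # document is 90% dogs, 10% cats

def generate_document(n_words=1000):
    """Draw each word by first picking a topic, then a word from it."""
    words = []
    for _ in range(n_words):
        topic = random.choices(list(mixture), weights=list(mixture.values()))[0]
        words.append(random.choice(topics[topic]))
    return words

doc = generate_document()
dog_words = sum(w in topics["dogs"] for w in doc)
cat_words = sum(w in topics["cats"] for w in doc)
print(dog_words, cat_words)  # roughly a 9:1 ratio
```

Topic modeling runs this process in reverse: given only the observed word counts, it infers the mixture and the topic vocabularies.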

Topic models are also referred to as probabilistic topic models, which refers to statistical algorithms for discovering the latent semantic structures of an extensive text body. In the age of information, the amount of written material we encounter each day is simply beyond our processing capacity. Topic models can help organize large collections of unstructured text and offer insights for understanding them. Originally developed as a text-mining tool, topic models have been used to detect instructive structures in data such as genetic information, images, and networks. They also have applications in other fields such as bioinformatics[1] and computer vision.[2]

History

[edit]

An early topic model was described by Papadimitriou, Raghavan, Tamaki and Vempala in 1998.[3] Another, called probabilistic latent semantic analysis (PLSA), was created by Thomas Hofmann in 1999.[4] Latent Dirichlet allocation (LDA), perhaps the most common topic model currently in use, is a generalization of PLSA. Developed by David Blei, Andrew Ng, and Michael I. Jordan in 2002, LDA introduces sparse Dirichlet prior distributions over document-topic and topic-word distributions, encoding the intuition that documents cover a small number of topics and that topics often use a small number of words.[5] Other topic models are generally extensions of LDA, such as Pachinko allocation, which improves on LDA by modeling correlations between topics in addition to the word correlations which constitute topics. Hierarchical latent tree analysis (HLTA) is an alternative to LDA that models word co-occurrence using a tree of latent variables; the states of the latent variables, which correspond to soft clusters of documents, are interpreted as topics.
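As a concrete sketch of LDA's machinery, the following minimal collapsed Gibbs sampler fits two topics to a tiny synthetic corpus. Collapsed Gibbs sampling is a common inference method for LDA, though not the variational method of the original paper; the corpus, priors, and iteration count here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: four documents as lists of word ids over a 6-word
# vocabulary. Documents 0-1 use words 0-2; documents 2-3 use words 3-5.
docs = [[0, 1, 0, 2], [0, 2, 1, 1], [3, 4, 3, 5], [4, 5, 3, 3]]
V, K = 6, 2              # vocabulary size, number of topics
alpha, beta = 0.1, 0.01  # sparse Dirichlet priors (doc-topic, topic-word)

# Count tables and random initial topic assignments.
ndk = np.zeros((len(docs), K))  # topic counts per document
nkw = np.zeros((K, V))          # word counts per topic
nk = np.zeros(K)                # total word count per topic
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Collapsed Gibbs sampling: repeatedly resample each token's topic from
# its conditional distribution given all other assignments.
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = int(rng.choice(K, p=p / p.sum()))
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Posterior mean of each document's topic mixture.
theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
print(theta.round(2))  # documents 0-1 and 2-3 should favour different topics
```

Because the two word blocks never co-occur, the sampler separates them cleanly; the sparse priors (small alpha and beta) express the intuition above that documents use few topics and topics use few words.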

Animation of the topic detection process in a document-word matrix through biclustering. Every column corresponds to a document, every row to a word. A cell stores the frequency of a word in a document, with dark cells indicating high word frequencies. This procedure groups documents that use similar words, just as it groups words occurring in a similar set of documents. Such groups of words are then called topics. More typical topic models, such as LDA, only group documents, based on a more sophisticated probabilistic mechanism.

Topic models for context information

[edit]

Approaches for temporal information include Block and Newman's determination of the temporal dynamics of topics in the Pennsylvania Gazette during 1728–1800. Griffiths & Steyvers used topic modeling on abstracts from the journal PNAS to identify topics that rose or fell in popularity from 1991 to 2001, whereas Lamba & Madhusudhan[6] used topic modeling on full-text research articles retrieved from the DJLIT journal from 1981 to 2018. In the field of library and information science, Lamba & Madhusudhan[6][7][8][9] applied topic modeling to different Indian resources such as journal articles and electronic theses and dissertations (ETDs). Nelson[10] analyzed change in topics over time in the Richmond Times-Dispatch to understand social and political changes and continuities in Richmond during the American Civil War. Yang, Torget and Mihalcea applied topic modeling methods to newspapers from 1829 to 2008. Mimno used topic modeling with 24 journals on classical philology and archaeology spanning 150 years to examine how topics in the journals change over time and how the journals become more different or similar over time.

Yin et al.[11] introduced a topic model for geographically distributed documents, where document positions are explained by latent regions which are detected during inference.

Chang and Blei[12] included network information between linked documents in the relational topic model, to model the links between websites.

The author-topic model by Rosen-Zvi et al.[13] models the topics associated with authors of documents to improve the topic detection for documents with authorship information.

HLTA was applied to a collection of recent research papers published at major AI and Machine Learning venues. The resulting model is called The AI Tree. The resulting topics are used to index the papers at aipano.cse.ust.hk to help researchers track research trends and identify papers to read, and help conference organizers and journal editors identify reviewers for submissions.

To improve the qualitative aspects and coherency of generated topics, some researchers have explored the efficacy of "coherence scores", or otherwise how computer-extracted clusters (i.e. topics) align with a human benchmark.[14][15] Coherence scores are metrics for optimising the number of topics to extract from a document corpus.[16]
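As an illustration, the widely used UMass coherence score can be computed directly from document co-occurrence counts. The sketch below is a simplified version (toy corpus, only the +1 smoothing in the numerator, no frequency-based ordering of the word list), so it shows the shape of the metric rather than any library's exact implementation.

```python
import math
from itertools import combinations

# Toy corpus: each document represented as its set of words.
docs = [{"dog", "bone", "bark"}, {"dog", "bone"}, {"cat", "meow"},
        {"cat", "meow", "purr"}, {"dog", "bark"}]

def umass_coherence(topic_words):
    """Simplified UMass coherence: sum of log((D(wi, wj) + 1) / D(wj))
    over word pairs, where D counts the documents containing the word(s)."""
    score = 0.0
    for wj, wi in combinations(topic_words, 2):
        d_wj = sum(wj in d for d in docs)           # docs containing wj
        d_ij = sum(wi in d and wj in d for d in docs)  # docs with both
        score += math.log((d_ij + 1) / d_wj)
    return score

good = umass_coherence(["dog", "bone", "bark"])  # words that co-occur
bad = umass_coherence(["dog", "meow", "purr"])   # words that do not
print(good > bad)  # True: the coherent topic scores higher
```

In practice such scores are computed for models fitted with different topic counts, and the count that maximises average coherence is selected.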

Algorithms

[edit]

In practice, researchers attempt to fit appropriate model parameters to the data corpus using one of several heuristics for maximum likelihood fit. A survey by D. Blei describes this suite of algorithms.[17] Several groups of researchers starting with Papadimitriou et al.[3] have attempted to design algorithms with provable guarantees. Assuming that the data were actually generated by the model in question, they try to design algorithms that provably find the model that was used to create the data. Techniques used here include singular value decomposition (SVD) and the method of moments. In 2012 an algorithm based upon non-negative matrix factorization (NMF) was introduced that also generalizes to topic models with correlations among topics.[18]
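To show the matrix-factorization view concretely, the sketch below applies the classical Lee & Seung multiplicative updates for NMF to a toy document-word count matrix. The matrix, rank, and iteration count are invented for illustration, and this is plain NMF, not the correlated-topics algorithm of the 2012 paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy document-word count matrix (4 documents x 4 vocabulary words),
# with two visible word blocks. Values are invented for illustration.
X = np.array([[4, 3, 0, 0],
              [5, 2, 1, 0],
              [0, 0, 3, 4],
              [0, 1, 2, 5]], dtype=float)
k = 2  # number of topics

# Lee & Seung multiplicative updates for X ~ W @ H: W holds
# document-topic weights, H holds topic-word weights.
W = rng.random((X.shape[0], k))
H = rng.random((k, X.shape[1]))
for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-9)

print(np.round(W @ H, 1))  # approximates X; rows of H act as "topics"
```

Non-negativity is what makes the factors interpretable: each row of H is an additive bundle of word weights, and each row of W mixes those bundles, mirroring the document-topic and topic-word structure of probabilistic models.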

In 2017, neural networks were leveraged in topic modeling to make inference faster;[19] this approach has since been extended to a weakly supervised version.[20]

In 2018 a new approach to topic models was proposed: it is based on the stochastic block model.[21]

With the recent development of large language models (LLMs), topic modeling has leveraged LLMs through contextual embeddings[22] and fine-tuning.[23]
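The premise behind embedding-based approaches, which cluster contextual document embeddings into topics, can be illustrated without a real LLM. Below, two fabricated embedding clusters stand in for the vectors an encoder would produce; cosine similarity confirms that same-topic documents sit closer together, which is what the clustering step exploits. The vectors here are synthetic; a real pipeline would obtain them from a pretrained encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated stand-ins for contextual document embeddings: two tight
# clusters, as an encoder would ideally produce for two topics.
dog_docs = rng.normal(0.0, 0.1, (5, 4)) + np.array([1.0, 0.0, 0.0, 0.0])
cat_docs = rng.normal(0.0, 0.1, (5, 4)) + np.array([0.0, 1.0, 0.0, 0.0])

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Same-topic documents embed close together, so clustering these
# vectors recovers topic-like groups.
within = np.mean([cosine(dog_docs[i], dog_docs[j])
                  for i in range(5) for j in range(i + 1, 5)])
across = np.mean([cosine(a, b) for a in dog_docs for b in cat_docs])
print(within > across)  # same-topic pairs are more similar
```

Methods in this family typically add a clustering step (e.g. k-means or density-based clustering) over such embeddings and then extract representative words per cluster as the topic description.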

Applications of topic models

[edit]

To quantitative biomedicine

[edit]

Topic models are also used in other contexts. For example, uses of topic models in biology and bioinformatics research have emerged.[24] Recently, topic models have been used to extract information from datasets of cancer genomic samples.[25] In this case, topics are biological latent variables to be inferred.

To analysis of music and creativity

[edit]

Topic models can be used to analyze continuous signals such as music. For instance, they have been used to quantify how musical styles change over time and to identify the influence of specific artists on later music creation.[26]

See also

[edit]

References

[edit]
  1. ^ Blei, David (April 2012). "Probabilistic Topic Models". Communications of the ACM. 55 (4): 77–84. doi:10.1145/2133806.2133826. S2CID 753304.
  2. ^ Cao, Liangliang, and Li Fei-Fei. "Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes." 2007 IEEE 11th International Conference on Computer Vision. IEEE, 2007.
  3. ^ a b Papadimitriou, Christos; Raghavan, Prabhakar; Tamaki, Hisao; Vempala, Santosh (1998). "Latent semantic indexing". Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems - PODS '98. pp. 159–168. doi:10.1145/275487.275505. ISBN 978-0897919968. S2CID 1479546.
  4. ^ Hofmann, Thomas (1999). "Probabilistic Latent Semantic Indexing" (PDF). Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval.
  5. ^ Blei, David M.; Ng, Andrew Y.; Jordan, Michael I; Lafferty, John (January 2003). "Latent Dirichlet allocation". Journal of Machine Learning Research. 3: 993–1022. doi:10.1162/jmlr.2003.3.4-5.993.
  6. ^ a b Lamba, Manika jun (2019). "Mapping of topics in DESIDOC Journal of Library and Information Technology, India: a study". Scientometrics. 120 (2): 477–505. doi:10.1007/s11192-019-03137-5. ISSN 0138-9130. S2CID 174802673.
  7. ^ Lamba, Manika jun (2019). "Metadata Tagging and Prediction Modeling: Case Study of DESIDOC Journal of Library and Information Technology (2008-2017)". World Digital Libraries. 12: 33–89. doi:10.18329/09757597/2019/12103 (inactive 12 July 2025). ISSN 0975-7597.{{cite journal}}: CS1 maint: DOI inactive as of July 2025 (link)
  8. ^ Lamba, Manika may (2019). "Author-Topic Modeling of DESIDOC Journal of Library and Information Technology (2008-2017), India". Library Philosophy and Practice.
  9. ^ Lamba, Manika sep (2018). Metadata Tagging of Library and Information Science Theses: Shodhganga (2013-2017) (PDF). ETD2018:Beyond the boundaries of Rims and Oceans. Taiwan, Taipei.
  10. ^ Nelson, Rob. "Mining the Dispatch". Mining the Dispatch. Digital Scholarship Lab, University of Richmond. Retrieved 26 March 2021.
  11. ^ Yin, Zhijun (2011). "Geographical topic discovery and comparison". Proceedings of the 20th international conference on World wide web. pp. 247–256. doi:10.1145/1963405.1963443. ISBN 9781450306324. S2CID 17883132.
  12. ^ Chang, Jonathan (2009). "Relational Topic Models for Document Networks" (PDF). Aistats. 9: 81–88.
  13. ^ Rosen-Zvi, Michal (2004). "The author-topic model for authors and documents". Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence: 487–494. arXiv:1207.4169.
  14. ^ Nikolenko, Sergey (2017). "Topic modelling for qualitative studies". Journal of Information Science. 43: 88–102. doi:10.1177/0165551515617393. S2CID 30657489.
  15. ^ Reverter-Rambaldi, Marcel (2022). Topic Modelling in Spontaneous Speech Data (Honours thesis). Australian National University. doi:10.25911/M1YF-ZF55.
  16. ^ Newman, David (2010). "Automatic evaluation of topic coherence". Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics: 100–108.
  17. ^ Blei, David M. (April 2012). "Introduction to Probabilistic Topic Models" (PDF). Comm. ACM. 55 (4): 77–84. doi:10.1145/2133806.2133826. S2CID 753304.
  18. ^ Sanjeev Arora; Rong Ge; Ankur Moitra (April 2012). "Learning Topic Models—Going beyond SVD". arXiv:1204.1956 [cs.LG].
  19. ^ Miao, Yishu; Grefenstette, Edward; Blunsom, Phil (2017). "Discovering Discrete Latent Topics with Neural Variational Inference". Proceedings of the 34th International Conference on Machine Learning. PMLR: 2410–2419. arXiv:1706.00359.
  20. ^ Xu, Weijie; Jiang, Xiaoyu; Sengamedu Hanumantha Rao, Srinivasan; Iannacci, Francis; Zhao, Jinjin (2023). "vONTSS: vMF based semi-supervised neural topic modeling with optimal transport". Findings of the Association for Computational Linguistics: ACL 2023. Stroudsburg, PA, USA: Association for Computational Linguistics: 4433–4457. arXiv:2307.01226. doi:10.18653/v1/2023.findings-acl.271.
  21. ^ Martin Gerlach; Tiago Peixoto; Eduardo Altmann (2018). "A network approach to topic models". Science Advances. 4 (7): eaaq1360. arXiv:1708.01677. Bibcode:2018SciA....4.1360G. doi:10.1126/sciadv.aaq1360. PMC 6051742. PMID 30035215.
  22. ^ Bianchi, Federico; Terragni, Silvia; Hovy, Dirk (2021). "Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence". Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg, PA, USA: Association for Computational Linguistics. pp. 759–766. doi:10.18653/v1/2021.acl-short.96.
  23. ^ Xu, Weijie; Hu, Wenxiang; Wu, Fanyou; Sengamedu, Srinivasan (2023). "DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM". Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, PA, USA: Association for Computational Linguistics: 9040–9057. arXiv:2310.15296. doi:10.18653/v1/2023.findings-emnlp.606.
  24. ^ Liu, L.; Tang, L.; et al. (2016). "An overview of topic modeling and its current applications in bioinformatics". SpringerPlus. 5 (1): 1608. doi:10.1186/s40064-016-3252-8. PMC 5028368. PMID 27652181. S2CID 16712827.
  25. ^ Valle, F.; Osella, M.; Caselle, M. (2020). "A Topic Modeling Analysis of TCGA Breast and Lung Cancer Transcriptomic Data". Cancers. 12 (12): 3799. doi:10.3390/cancers12123799. PMC 7766023. PMID 33339347. S2CID 229325007.
  26. ^ Shalit, Uri; Weinshall, Daphna; Chechik, Gal (2013). "Modeling Musical Influence with Topic Models". Proceedings of the 30th International Conference on Machine Learning. PMLR: 244–252.

Further reading

[edit]