
The Relationship Between Deep Learning (DL) and Large Language Models (LLM)

오이시이 2025. 10. 31. 08:20


Deep learning is a subfield of machine learning that uses artificial neural networks to automatically learn complex patterns in data. Large language models (LLMs) build on this deep-learning-based natural language processing (NLP) technology, learning from vast amounts of text data to understand and generate language in a human-like way.

Regarding this topic, please organize the above with citations to the relevant papers added.

The Relationship Between Deep Learning and Large Language Models

Deep learning is a core branch of machine learning that uses multi-layer neural networks to automatically learn complex patterns in data, and large language models (LLMs) are natural language processing (NLP) models built on deep learning. The combination of advances in deep learning and a series of innovative architectures is what made today's powerful LLMs possible.

Foundations of Deep Learning

Backpropagation is the core technique underlying neural network training, systematized by Rumelhart, Hinton, and Williams (1986) in "Learning representations by back-propagating errors"[1][2][3]. The algorithm provides an efficient way to adjust a network's weights so as to minimize the output error, and it became the foundation of modern deep learning.
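
To make the mechanism concrete, here is a minimal NumPy sketch of a gradient-descent loop with hand-written backpropagation for a two-layer network; the toy data, layer sizes, and learning rate are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy regression problem: 4 samples, 3 input features, 1 target.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))
W1, W2 = rng.normal(size=(3, 5)), rng.normal(size=(5, 1))

for step in range(100):
    h = np.tanh(x @ W1)                          # forward pass, hidden layer
    y_hat = h @ W2                               # forward pass, output layer
    err = y_hat - y                              # dL/dy_hat for an MSE loss (up to a constant)
    grad_W2 = h.T @ err                          # chain rule back through the output layer
    grad_W1 = x.T @ ((err @ W2.T) * (1 - h**2))  # chain rule back through tanh
    W1 -= 0.01 * grad_W1                         # gradient descent updates
    W2 -= 0.01 * grad_W2
```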

In the development of convolutional neural networks (CNNs), LeCun et al. (1998) demonstrated, through the LeNet architecture in "Gradient-Based Learning Applied to Document Recognition"[4][5], how effective CNNs are for image recognition. Later, AlexNet by Krizhevsky, Sutskever, and Hinton (2012)[6][7] launched the deep learning era in earnest with its overwhelming performance in the ImageNet competition.

ResNet (Residual Networks) is the groundbreaking architecture proposed by He et al. (2015) in "Deep Residual Learning for Image Recognition"[8][9][10]. By introducing residual connections (skip connections), it made it possible to effectively train very deep networks of 152 layers and more. ResNet won ILSVRC 2015 and opened a new chapter in training deep neural networks.
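
Conceptually, a residual block computes y = F(x) + x rather than y = F(x). Below is a minimal NumPy sketch of that idea for fully connected layers (the actual ResNet uses convolutions and batch normalization); the weights and shapes are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x): the identity shortcut lets gradients bypass F,
    which is what makes very deep stacks trainable."""
    out = relu(x @ W1)        # F(x): two linear layers with a nonlinearity
    out = out @ W2
    return relu(out + x)      # skip connection: add the input back in

x = np.ones((2, 4))           # batch of 2, feature dimension 4
W1 = W2 = np.eye(4) * 0.1     # toy weights; shapes must preserve the dimension
print(residual_block(x, W1, W2))
```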

Deep Learning Optimization and Regularization Techniques

Dropout is a regularization technique proposed by Srivastava et al. (2014) in "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"[11][12][13]. By randomly dropping neurons during training, it prevents overfitting and substantially improved neural network performance across vision, speech, and document classification tasks.
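
The core of the technique fits in a few lines. The sketch below implements "inverted" dropout, the variant most frameworks use, with an assumed drop probability of 0.5:

```python
import numpy as np

def dropout(x, p_drop=0.5, training=True, rng=np.random.default_rng(0)):
    """Zero each unit with probability p_drop during training and rescale the
    survivors, so the expected activation matches what the layer sees at test time."""
    if not training or p_drop == 0.0:
        return x
    mask = rng.random(x.shape) >= p_drop   # keep a unit with probability 1 - p_drop
    return x * mask / (1.0 - p_drop)

print(dropout(np.ones((2, 6))))            # roughly half the entries become 0, the rest 2.0
```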

Batch Normalization was proposed by Ioffe and Szegedy (2015) in "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"[14][15][16]. By normalizing the inputs to each layer, it dramatically speeds up training and allows the use of much higher learning rates.
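
In training mode the operation is just a per-feature standardization followed by a learned scale and shift; a minimal NumPy sketch (omitting the running statistics used at inference time) looks like this:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then rescale and shift with
    the learnable parameters gamma and beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.arange(12, dtype=float).reshape(4, 3)     # batch of 4 samples, 3 features
print(batch_norm(x, gamma=np.ones(3), beta=np.zeros(3)))
```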

The Adam optimizer is the adaptive-learning-rate optimization algorithm proposed by Kingma and Ba (2014) in "Adam: A Method for Stochastic Optimization"[17][18][19]. It is effective for first-order, gradient-based optimization of stochastic objectives and remains one of the most widely used optimizers for training deep learning models.
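
A single Adam update combines a momentum-style first-moment estimate and an RMSProp-style second-moment estimate with bias correction. The sketch below follows the update rule from the paper; the default hyperparameters shown are the commonly used ones.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and its
    square, bias-corrected, then a per-parameter adaptive step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)                      # bias correction (t starts at 1)
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
w, m, v = adam_step(w, grad=np.array([0.1, -0.2, 0.3]), m=m, v=v, t=1)
print(w)
```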

ReLU (Rectified Linear Unit) is the activation function proposed by Nair and Hinton (2010) in "Rectified Linear Units Improve Restricted Boltzmann Machines"[20][21][22][23]. It mitigates the vanishing gradient problem of sigmoid and tanh activations and greatly improved the training speed and performance of deep models.
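
The function itself is trivial, which is part of its appeal: it is cheap to compute and its gradient is exactly 1 wherever the unit is active.

```python
import numpy as np

def relu(x):
    """max(0, x): zero for negative inputs, identity for positive ones, so the
    gradient does not shrink toward zero the way sigmoid/tanh gradients do."""
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))   # [0.  0.  0.  0.5 2. ]
```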

Word Embeddings for Natural Language Processing

Word2Vec is the word-embedding technique proposed by Mikolov et al. (2013) in "Efficient Estimation of Word Representations in Vector Space"[24][25][26]. Through its two architectures, CBOW (Continuous Bag of Words) and Skip-gram, it represents words efficiently in a low-dimensional vector space. By capturing semantic similarity between words, it revolutionized the NLP field.
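
In the Skip-gram setup, the training data is simply (center word, context word) pairs drawn from a sliding window; a model then learns vectors such that the center word predicts its context. The sketch below only builds those pairs from a made-up sentence, with an assumed window size of 2.

```python
# Hypothetical tokenized corpus; Skip-gram would train a classifier to
# predict each context word from the center word's vector.
corpus = "the quick brown fox jumps over the lazy dog".split()
window = 2

pairs = []
for i, center in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pairs.append((center, corpus[j]))   # (center word, context word)

print(pairs[:4])   # [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...]
```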

GloVe (Global Vectors for Word Representation) is the word-embedding method proposed by Pennington, Socher, and Manning (2014) in "GloVe: Global Vectors for Word Representation"[27][28][29]. It is a log-bilinear regression model trained on a global word-word co-occurrence matrix, combining the strengths of global matrix factorization and local context-window methods.
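
The model is fit by minimizing a weighted least-squares objective over the co-occurrence counts X_ij, as given in the paper:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})\left(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2,
\qquad
f(x) = \begin{cases} (x/x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
```

Here w_i and w̃_j are the word and context vectors, and the weighting function f keeps very frequent co-occurrences from dominating the loss.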

ELMo (Embeddings from Language Models) is the contextualized word representation proposed by Peters et al. (2018) in "Deep Contextualized Word Representations"[30][31][32][33]. By using the internal states of a bidirectional LSTM language model, it handles polysemy and significantly improved performance on a wide range of NLP tasks such as question answering and sentiment analysis.
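
For a downstream task, ELMo combines the biLM's layer representations for token k as a learned, softmax-normalized weighted sum, following the formulation in the paper:

```latex
\mathrm{ELMo}_k^{\,task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, \mathbf{h}_{k,j}^{LM}
```

where h_{k,j}^{LM} are the biLM layer outputs, s_j^{task} are softmax-normalized weights, and γ^{task} is a task-specific scaling factor.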

The Transformer and the Attention Mechanism

The attention mechanism was proposed by Bahdanau, Cho, and Bengio (2014) in "Neural Machine Translation by Jointly Learning to Align and Translate"[34][35][36]. To overcome the bottleneck of encoding a whole sentence into a single fixed-length vector, it lets the decoder automatically search for the relevant parts of the source sentence, which yielded a breakthrough improvement in machine translation.
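
Concretely, the decoder scores each source annotation h_j against its previous state s_{i-1}, turns the scores into weights with a softmax, and uses the weighted sum as a per-step context vector (additive attention, as formulated in the paper):

```latex
e_{ij} = v_a^{\top}\tanh(W_a s_{i-1} + U_a h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{ik})}, \qquad
c_i = \sum_{j} \alpha_{ij} h_j
```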

The Transformer architecture, proposed by Vaswani et al. (2017) in "Attention Is All You Need"[37][38][39], is built purely from attention mechanisms, without any RNNs or CNNs. Through self-attention and scaled dot-product attention it can be processed in parallel, and it became the foundational architecture of modern LLMs.
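
Scaled dot-product attention itself is only a few lines. The NumPy sketch below covers a single head with 2-D query/key/value matrices; real implementations add batching, multiple heads, and masking.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)         # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
print(scaled_dot_product_attention(Q, K, V).shape)       # (2, 8)
```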

The Emergence of Large Language Models

BERT (Bidirectional Encoder Representations from Transformers), proposed by Devlin et al. (2018) in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"[40][41][42], uses the Transformer encoder to learn bidirectional context. It is pre-trained with a Masked Language Model (MLM) objective and Next Sentence Prediction (NSP), and achieved state-of-the-art results on eleven NLP tasks.
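
The MLM objective is easiest to see in the input construction: roughly 15% of tokens are selected for prediction, and of those, 80% become [MASK], 10% a random token, and 10% stay unchanged (the rates follow the paper; the sentence and vocabulary below are made up).

```python
import random

random.seed(0)
tokens = "the cat sat on the mat".split()
vocab = ["the", "cat", "sat", "on", "mat", "dog"]

masked, labels = [], []
for tok in tokens:
    if random.random() < 0.15:                   # select ~15% of tokens
        labels.append(tok)                       # the model must recover this token
        r = random.random()
        if r < 0.8:
            masked.append("[MASK]")              # 80%: replace with [MASK]
        elif r < 0.9:
            masked.append(random.choice(vocab))  # 10%: replace with a random token
        else:
            masked.append(tok)                   # 10%: keep the original token
    else:
        labels.append(None)                      # not part of the MLM loss
        masked.append(tok)

print(masked)
print(labels)
```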

GPT-3 (Generative Pre-trained Transformer 3), introduced by Brown et al. (2020) in "Language Models are Few-Shot Learners"[43][44][45][46], is a large language model with 175 billion parameters that exhibits few-shot learning. GPT-3 achieves strong performance on tasks such as translation and question answering without any gradient updates or fine-tuning.
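
Few-shot learning here means the task is specified entirely in the prompt. The snippet below just assembles such a prompt string, in the style of the English-to-French demonstration in the paper; the model call itself is omitted.

```python
# In-context examples: no weights are updated, the examples live only in the prompt.
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]
query = "mint"

prompt = "Translate English to French:\n"
prompt += "".join(f"{en} => {fr}\n" for en, fr in examples)
prompt += f"{query} =>"
print(prompt)   # this string would be sent to the model as-is
```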

T5 (Text-to-Text Transfer Transformer), proposed by Raffel et al. (2019) in "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"[47][48][49][50], is a unified framework that casts every NLP task in a text-to-text format. Trained on the C4 (Colossal Clean Crawled Corpus) dataset, T5 achieved state-of-the-art results on a variety of benchmarks.
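
The text-to-text framing means every task is expressed as an input string with a task prefix and a target string; the examples below mirror the kinds of prefixes shown in the paper (the exact sentences are illustrative).

```python
# (input prefix + text, target text) pairs in T5's unified text-to-text format.
examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("summarize: state authorities dispatched emergency crews tuesday ...", "six people hospitalized after a storm ..."),
    ("cola sentence: The course is jumping well.", "not acceptable"),
]
for source, target in examples:
    print(f"input : {source}")
    print(f"target: {target}\n")
```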

GPT-4, presented by OpenAI (2023) in the "GPT-4 Technical Report"[51][52][53][54], is a multimodal model that accepts text and images as input and produces text as output. GPT-4 scores around the top 10% of test takers on a simulated bar exam, and it uses RLHF (Reinforcement Learning from Human Feedback) to improve factuality and safety.

LLM Training and Alignment Techniques

Scaling laws are the empirical relationships identified by Kaplan et al. (2020) in "Scaling Laws for Neural Language Models"[55][56][57], showing that a language model's loss follows a power law in model size, dataset size, and the amount of compute used for training. This provided an important foundation for developing large-scale models.
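
In its simplest form, the result says that when only one of these factors is the bottleneck, the test loss falls as a power law in that factor:

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}
```

where N is the number of model parameters, D the dataset size, and C the training compute, with the fitted constants and exponents reported in the paper.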

RLHF (Reinforcement Learning from Human Feedback) is the technique proposed by Christiano et al. (2017) in "Deep Reinforcement Learning from Human Preferences"[58][59][60]: a reward function is learned from human preference feedback, and the model is then optimized against it with reinforcement learning. RLHF is the core alignment technique behind modern LLMs such as InstructGPT and ChatGPT[61].
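
The reward-model stage typically reduces to a pairwise ranking loss: given a human preference between two outputs, the reward model is trained so that the preferred one scores higher. A minimal sketch of that loss (scalar rewards assumed already computed) is shown below.

```python
import numpy as np

def preference_loss(r_preferred, r_rejected):
    """Pairwise ranking loss used for reward modeling in RLHF-style pipelines:
    -log(sigmoid(r_preferred - r_rejected)), which is small when the reward
    model already ranks the human-preferred output higher."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_preferred - r_rejected))))

print(preference_loss(1.5, 0.3))   # ~0.26: preferred output is ranked higher
print(preference_loss(0.3, 1.5))   # ~1.46: reward model disagrees with the human label
```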

Conclusion

Deep learning progressed from foundational techniques and architectures such as backpropagation[1][3], CNNs[4][5], and ResNet[8][10], through optimization and regularization methods such as Dropout[11][13], Batch Normalization[14][16], and the Adam optimizer[17][19]. Word embedding techniques such as Word2Vec[24][26], GloVe[27][29], and ELMo[32][33], combined with the attention mechanism[34][35] and the Transformer architecture[37][39], gave rise to modern LLMs such as BERT[40][41], GPT-3[43][46], T5[48][49], and GPT-4[52][53]. More recent advances such as scaling laws[55][57] and RLHF[58][59] continue to improve the capability and safety of LLMs, and the relationship between deep learning and LLMs keeps evolving.

References:
[1] [PDF] Learning representations by back-propagating errors https://www.semanticscholar.org/paper/Learning-representations-by-back-propagating-errors-Rumelhart-Hinton/052b1d8ce63b07fec3de9dbb583772d860b7c769
[2] Backpropagation https://en.wikipedia.org/wiki/Backpropagation
[3] Learning representations by back-propagating errors https://www.nature.com/articles/323533a0
[4] Gradient-Based Learning Applied to Document Recognition http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf
[5] Gradient-based learning applied to document recognition https://ieeexplore.ieee.org/document/726791
[6] The 9 Deep Learning Papers You Need To Know About ... https://adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
[7] ImageNet Classification with Deep Convolutional Neural ... https://proceedings.neurips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[8] [1512.03385] Deep Residual Learning for Image Recognition https://arxiv.org/abs/1512.03385
[9] arXiv:1512.03385v1 [cs.CV] 10 Dec 2015 https://arxiv.org/pdf/1512.03385.pdf
[10] Deep Residual Learning for Image Recognition https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf
[11] A Simple Way to Prevent Neural Networks from Overfitting https://jmlr.csail.mit.edu/beta/papers/v15/srivastava14a.html
[12] A Simple Way to Prevent Neural Networks from Overfitting https://jmlr.org/papers/v15/srivastava14a.html
[13] Dropout: A Simple Way to Prevent Neural Networks from ... https://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf
[14] Batch Normalization: Accelerating Deep Network Training ... https://arxiv.org/abs/1502.03167
[15] Batch Normalization: Accelerating Deep Network Training ... http://research.google.com/pubs/archive/43442.pdf
[16] Batch Normalization: Accelerating Deep Network Training by ... https://proceedings.mlr.press/v37/ioffe15.html
[17] [1412.6980] Adam: A Method for Stochastic Optimization https://arxiv.org/abs/1412.6980
[18] [PDF] Adam: A Method for Stochastic Optimization https://www.semanticscholar.org/paper/Adam:-A-Method-for-Stochastic-Optimization-Kingma-Ba/a6cb366736791bcccc5c8639de5a8f9636bf87e8
[19] Adam: A Method for Stochastic Optimization https://arxiv.org/pdf/1412.6980.pdf
[20] Rectified linear unit https://en.wikipedia.org/wiki/Rectified_linear_unit
[21] ReLU Activation Function https://www.dremio.com/wiki/relu-activation-function/
[22] Activation Functions: Comparison of Trends in Practice and ... https://arxiv.org/pdf/1811.03378.pdf
[23] Rectified Linear Units Improve Restricted Boltzmann ... https://www.cs.toronto.edu/~hinton/absps/reluICML.pdf
[24] Efficient Estimation of Word Representations in Vector Space https://arxiv.org/abs/1301.3781
[25] Word2vec - YJJo - 티스토리 https://yjjo.tistory.com/13
[26] Efficient Estimation of Word Representations in Vector Space https://arxiv.org/pdf/1301.3781.pdf
[27] GloVe: Global Vectors for Word Representation https://aclanthology.org/D14-1162/
[28] [PDF] GloVe: Global Vectors for Word Representation https://www.semanticscholar.org/paper/GloVe:-Global-Vectors-for-Word-Representation-Pennington-Socher/f37e1b62a767a307c046404ca96bc140b3e68cb5
[29] GloVe: Global Vectors for Word Representation https://nlp.stanford.edu/pubs/glove.pdf
[30] Contextualized Word Embeddings https://www.csie.ntu.edu.tw/~miulab/s108-adl/doc/200331_ELMo.pdf
[31] Deep contextualized word representations - Hyunyoung2 https://hyunyoung2.github.io/2019/12/02/Deep_Contextualized_Word_Representations/
[32] Deep Contextualized Word Representations https://aclanthology.org/N18-1202/
[33] [1802.05365] Deep contextualized word representations https://arxiv.org/abs/1802.05365
[34] Neural Machine Translation by Jointly Learning to Align ... https://arxiv.org/abs/1409.0473
[35] Neural machine translation by https://arxiv.org/pdf/1409.0473.pdf
[36] Interactive Attention for Neural Machine Translation https://aclanthology.org/C16-1205.pdf
[37] Attention is All you Need https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf
[38] Attention Is All You Need https://en.wikipedia.org/wiki/Attention_Is_All_You_Need
[39] [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762
[40] arXiv:1810.04805v2 [cs.CL] 24 May 2019 https://arxiv.org/pdf/1810.04805.pdf
[41] BERT: Pre-training of Deep Bidirectional Transformers for ... https://aclanthology.org/N19-1423/
[42] [논문] BERT: Pre-training of Deep Bidirectional ... https://velog.io/@pyopyo/BERT-Pre-training-of-Deep-Bidirectional-Transformers-for-Language-Understanding
[43] Language Models are Few-Shot Learners https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
[44] Language Models are Few-Shot Learners https://papers.nips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
[45] Language Models are Few-Shot Learners (GPT-3) https://ysu1989.github.io/courses/au20/cse5539/GPT-3.pdf
[46] [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165
[47] Exploring Transfer Learning with T5: the Text-To ... https://research.google/blog/exploring-transfer-learning-with-t5-the-text-to-text-transfer-transformer/
[48] Exploring the Limits of Transfer Learning with a Unified ... https://jmlr.org/papers/volume21/20-074/20-074.pdf
[49] Exploring the Limits of Transfer Learning with a Unified ... https://arxiv.org/abs/1910.10683
[50] Exploring the limits of transfer learning with a unified text-to ... https://dl.acm.org/doi/abs/10.5555/3455716.3455856
[51] GPT-4 Technical Report (OpenAI, 2023) https://www.gpters.org/llm-service/post/gpt4-technical-report-SoHeOZHMqGein4I
[52] GPT-4 https://openai.com/index/gpt-4-research/
[53] [2303.08774] GPT-4 Technical Report https://arxiv.org/abs/2303.08774
[54] GPT-4 https://openai.com/ko-KR/index/gpt-4-research/
[55] [2001.08361] Scaling Laws for Neural Language Models https://arxiv.org/abs/2001.08361
[56] Scaling Laws from the Data Manifold Dimension https://jmlr.csail.mit.edu/papers/volume23/20-1111/20-1111.pdf
[57] Scaling Laws for Neural Language Models https://arxiv.org/pdf/2001.08361.pdf
[58] Deep Reinforcement Learning from Human Preferences http://papers.neurips.cc/paper/7017-deep-reinforcement-learning-from-human-preferences.pdf
[59] Deep reinforcement learning from human preferences https://arxiv.org/abs/1706.03741
[60] Illustrating Reinforcement Learning from Human Feedback ... https://huggingface.co/blog/rlhf
[61] Training language models to follow instructions with ... https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf
[62] [2402.06196] Large Language Models: A Survey https://arxiv.org/abs/2402.06196
[63] Transformer Architecture and Attention Mechanisms in ... https://scholarworks.bwise.kr/cau/bitstream/2019.sw.cau/69879/1/Transformer%20Architecture%20and%20Attention%20Mechanisms%20in%20Genome%20Data%20Analysis%20A%20Comprehensive%20Review.pdf
[64] A Review on Large Language Models: Architectures ... https://ieeexplore.ieee.org/document/10433480/
[65] Famous Deep Learning Papers https://papers.baulab.info
[66] 30 Important Research Papers to Understand Large ... https://youssefh.substack.com/p/30-important-research-papers-to-understand
[67] Seminal Papers of Deep Learning - by Hussein Lezzaik https://husseinlezzaik.substack.com/p/seminal-papers-of-deep-learning
[68] Evaluating large language models: a systematic review of ... https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2025.1523699/full
[69] terryum/awesome-deep-learning-papers: The most cited ... https://github.com/terryum/awesome-deep-learning-papers
[70] A Comprehensive Overview of Large Language Models https://arxiv.org/pdf/2307.06435.pdf
[71] Local Attention Mechanism: Boosting the Transformer ... https://arxiv.org/abs/2410.03805
[72] [1503.02531] Distilling the Knowledge in a Neural Network https://arxiv.org/abs/1503.02531
[73] LLM Research Papers: The 2025 List (January to June) https://magazine.sebastianraschka.com/p/llm-research-papers-2025-list-one
[74] Evaluation of Transformer model and Self-Attention ... https://www.sciencedirect.com/science/article/pii/S2214581823001258
[75] [Discussion] What are some papers you read which helped ... https://www.reddit.com/r/MachineLearning/comments/tht1qy/discussion_what_are_some_papers_you_read_which/
[76] itsual/Notable-LLM-Research-Papers https://github.com/itsual/Notable-LLM-Research-Papers
[77] Devlin et al 2018 - BERT: Pre-training of Deep Bidirectional ... https://czlwang.com/zettel/20200903163814-devlin_et_al_bert_2019.html
[78] Long short-term memory https://de.wikipedia.org/wiki/Long_short-term_memory
[79] 10.1. Long Short-Term Memory (LSTM) https://d2l.ai/chapter_recurrent-modern/lstm.html
[80] Long Short-Term Memory - an overview https://www.sciencedirect.com/topics/chemical-engineering/long-short-term-memory
[81] Long Short-Term Memory | Neural Computation https://direct.mit.edu/neco/article/9/8/1735/6109/Long-Short-Term-Memory
[82] Sentence Embeddings using Siamese BERT-Networks https://arxiv.org/abs/1908.10084
[83] LONG SHORT-TERM MEMORY 1 INTRODUCTION https://www.bioinf.jku.at/publications/older/2604.pdf
[84] A survey of GPT-3 family large language models including ... https://www.sciencedirect.com/science/article/pii/S2949719123000456
[85] BERT - Pretraining of Deep Bidirectional Transformers for ... https://rauleun.github.io/BERT
[86] Long Short-Term Memory | MIT Press Journals & Magazine https://ieeexplore.ieee.org/abstract/document/6795963
[87] Language models are few-shot learners https://dl.acm.org/doi/abs/10.5555/3495724.3495883
[88] BERT: Pre-training of Deep Bidirectional Transformers for ... https://www.semanticscholar.org/paper/BERT:-Pre-training-of-Deep-Bidirectional-for-Devlin-Chang/df2b0e26d0599ce3e70df8a9da02e51594e0e992
[89] Convolutional neural network https://en.wikipedia.org/wiki/Convolutional_neural_network
[90] Deep residual learning for image recognition; ResNet (2015) https://sjpyo.tistory.com/72
[91] Convolutional Networks and Applications in Vision http://yann.lecun.com/exdb/publis/pdf/lecun-iscas-10.pdf
[92] CNN Architecture(1)-LeNet - ITT's tech-blog https://kjhov195.github.io/2020-02-10-CNN_architecture_1/
[93] Learning representations by backpropagating errors https://gwern.net/doc/ai/nn/1986-rumelhart-2.pdf
[94] LeNet-5, 1998 - YJJo - 티스토리 https://yjjo.tistory.com/50
[95] [ResNet] He et al., 2015, Deep Residual Learning for Image ... https://pulsar-kkaturi.tistory.com/entry/CNN-%EB%AA%A8%EB%8D%B8Classification-ResNet-He-et-al-2015-Deep-Residual-Learning-for-Image-Recognition
[96] Learning representations by back-propagating errors https://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf
[97] [논문 리뷰 & 코드 구현] ResNet (Deep Residual Learning ... https://velog.io/@tony3ynot/%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0-ResNet-Deep-Residual-Learning-for-Image-Recognition
[98] Experiments on Learning by Back Propagation https://www.cnbc.cmu.edu/~plaut/papers/pdf/PlautNowlanHinton86TR.backprop.pdf
[99] (ResNet) Deep residual learning for image recognition 번역 및 ... https://sike6054.github.io/blog/paper/first-post/
[100] Back Propagation - I'm Lim - 티스토리 https://imlim0813.tistory.com/12
[101] RECURRENT BATCH NORMALIZATION https://openreview.net/pdf?id=r1VdcHcxx
[102] Dropout Regularization in Deep Learning https://www.digitalocean.com/community/tutorials/droput-regularization-deep-learning
[103] An Investigation of ADAM: A Stochastic Optimization Method https://michaelfperez.com/files/ADAM_Investigation_Report.pdf
[104] 8.5. Batch Normalization http://d2l.ai/chapter_convolutional-modern/batch-norm.html
[105] a simple way to prevent neural networks from overfitting https://www.semanticscholar.org/paper/Dropout:-a-simple-way-to-prevent-neural-networks-Srivastava-Hinton/34f25a8704614163c4095b3ee2fc969b60de4698
[106] Kingma, D.P. and Ba, J. (2014) Adam A Method ... https://www.scirp.org/reference/referencespapers
[107] Dropout: a simple way to prevent neural networks from ... https://dl.acm.org/doi/10.5555/2627435.2670313
[108] Diederik P. Kingma https://scholar.google.com/citations?user=yyIoQu4AAAAJ&hl=en
[109] Batch normalization | Proceedings of the 32nd International ... https://dl.acm.org/doi/10.5555/3045118.3045167
[110] Regularization of deep neural networks with spectral dropout https://www.sciencedirect.com/science/article/abs/pii/S0893608018302715
[111] Adam: A Method for Stochastic Optimization https://inspirehep.net/literature/1670744
[112] Word2Vec의 아이디어와 역할 - 데이널 『데이터 널리지』 https://bommbom.tistory.com/entry/Word2Vec%EC%9D%98-%EC%95%84%EC%9D%B4%EB%94%94%EC%96%B4%EC%99%80-%EC%97%AD%ED%95%A0
[113] arXiv:1411.5595v2 [cs.CL] 26 Nov 2014 https://arxiv.org/pdf/1411.5595.pdf
[114] Efficient Estimation of Word Representations in Vector Space https://secundo.tistory.com/88
[115] [술술 읽히는 논문 요약] GloVe: Global Vectors for Word ... https://jimmy-ai.tistory.com/11
[116] [논문리뷰] Deep Contextualized Word Representations (ELMo ... https://aiforeveryone.tistory.com/19
[117] Word2vec - 위키백과 https://translate.google.com/translate?u=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FWord2vec&hl=ko&sl=en&tl=ko&client=srp
[118] Deep contextualized word representations (2018) a.k.a. ... https://velog.io/@zvezda/Deep-contextualized-word-representations
[119] [술술 읽히는 논문 요약] Word2Vec 논문 - Skip-gram, CBOW https://jimmy-ai.tistory.com/7
[120] GloVe: Global Vectors for Word Representation https://nlp.stanford.edu/projects/glove/
[121] T5 모델 소개 및 실습 https://happygrammer.github.io/nlp/t5/
[122] GPT-4 OpenAI의 새롭게 향상된 챗봇 새 버전 발표 https://gpt4chatgpt.tistory.com/entry/GPT-4-OpenAI%EC%9D%98-%EC%83%88%EB%A1%AD%EA%B2%8C-%ED%96%A5%EC%83%81%EB%90%9C-%EC%B1%97%EB%B4%87-%EC%83%88-%EB%B2%84%EC%A0%84-%EB%B0%9C%ED%91%9C
[123] GPT-4 https://namu.wiki/w/GPT-4
[124] Switching Neurons On: Understanding ReLU Activation https://www.reinforcementlearningpath.com/relu-activation-function/
[125] Empirical study of the modulus as activation function in ... https://www.sciencedirect.com/science/article/pii/S0952197623000477
[126] Exploring the Limits of Transfer Learning with a Unified ... https://www.semanticscholar.org/paper/Exploring-the-Limits-of-Transfer-Learning-with-a-Raffel-Shazeer/6c4b76232bb72897685d19b3d264c6ee3005bc2b
[127] [PDF] Rectified Linear Units Improve Restricted Boltzmann ... https://www.semanticscholar.org/paper/Rectified-Linear-Units-Improve-Restricted-Boltzmann-Nair-Hinton/a538b05ebb01a40323997629e171c91aa28b8e2f
[128] T5: Exploring the Limits of Transfer Learning with a ... https://dlaiml.tistory.com/entry/T5-Exploring-the-Limits-of-Transfer-Learning-with-a-UnifiedText-to-Text-Transformer
[129] Reinforcement Learning from Human Feedback (RLHF) ... https://intuitionlabs.ai/articles/reinforcement-learning-human-feedback
[130] Explaining neural scaling laws https://www.pnas.org/doi/10.1073/pnas.2311878121
[131] garganm1/Neural-Machine-Translation-with-Bahdanau ... https://github.com/garganm1/Neural-Machine-Translation-with-Bahdanau-Attention
[132] NLP에서 scaling law는 어떻게 발전되었을까? https://cartinoe5930.tistory.com/entry/How-has-scaling-law-developed-in-NLP-%F0%9F%A4%94-NLP%EC%97%90%EC%84%9C-scaling-law%EB%8A%94-%EC%96%B4%EB%96%BB%EA%B2%8C-%EB%B0%9C%EC%A0%84%EB%90%98%EC%97%88%EC%9D%84%EA%B9%8C
[133] Reinforcement learning from human feedback https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback
[134] What Is Reinforcement Learning From Human Feedback ... https://www.ibm.com/think/topics/rlhf
[135] Neural scaling law https://en.wikipedia.org/wiki/Neural_scaling_law
[136] Bahdanau Attention - YJJo - 티스토리 https://yjjo.tistory.com/46
[137] RLHF: Reinforcement Learning from Human Feedback https://wikidocs.net/225547
[138] Effective Approaches to Attention-based Neural Machine ... https://supkoon.tistory.com/20
[139] "Scaling Laws for Neural Language Models", Kaplan et al ... https://www.reddit.com/r/mlscaling/comments/jl143s/scaling_laws_for_neural_language_models_kaplan_et/
