Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest
Pith reviewed 2026-05-10 03:16 UTC · model grok-4.3
The pith
Large language models are systematically tested on verifying post authors, generating user-like content, and inferring user attributes from Twitter data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study evaluates GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT on three tasks: social media authorship verification via a systematic sampling framework over users and posts, post generation assessed by multiple metrics plus a human perception study, and user attribute inference annotated with the IAB Tech Lab 2023 and 2018 U.S. SOC taxonomies. The authors argue that this unified evaluation provides new insights and establishes reproducible benchmarks for LLM-driven social media analytics.
What carries the argument
A multi-task evaluation framework built around systematic sampling of users and posts that tests generalization on post-2023 tweets while linking verification, generation, and attribute inference through shared data.
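The paper's sampling framework is not specified in detail here. As a minimal sketch of what a reproducible user-then-post sampler could look like, under the assumption of a fixed seed and per-user post lists (all names and parameters below are hypothetical, not the paper's actual API):

```python
import random

def sample_user_posts(users, posts_by_user, n_users=100, posts_per_user=10, seed=0):
    """Hypothetical sketch of a systematic sampling step: pick a
    reproducible subset of users, then a fixed number of posts per user.
    Sorting before sampling makes the draw deterministic for a given seed."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(users), min(n_users, len(users)))
    sample = {}
    for user in chosen:
        posts = posts_by_user.get(user, [])
        sample[user] = rng.sample(posts, min(posts_per_user, len(posts)))
    return sample
```

Fixing the seed is what makes such a benchmark rerunnable by other groups; varying the seed gives the alternative draws needed for stability checks.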
If this is right
- Models display distinct patterns of success and failure when asked to generate posts that humans judge as authentic.
- Standard taxonomies for occupations and interests allow consistent measurement of how well models infer user attributes from posts.
- Testing on tweets collected from January 2024 onward separates capabilities learned during training from memorization of earlier data.
- Public release of the dataset and evaluation code allows other researchers to run the same tests and track progress over time.
- The connection between authorship verification and post generation highlights shared challenges in style detection and style imitation.
Where Pith is reading between the lines
- Strong results on attribute inference could support more precise automated user profiling in research or moderation settings.
- The sampling approach could be reused to evaluate models on other user-generated content platforms beyond Twitter.
- Findings on human detection of generated posts may help design better tools for spotting synthetic social media content.
- Extending the same multi-task setup to additional analytics problems such as trend detection would create a fuller picture of model strengths.
Load-bearing premise
The chosen Twitter posts, sampling strategies, and mix of automatic and human evaluation metrics accurately capture real-world LLM performance in social media analytics without hidden selection biases or overfitting to the collection period.
What would settle it
If an independent collection of tweets from a later period produces substantially different performance orderings among the same models on any of the three tasks, or if a larger human study reaches opposite conclusions about the realism of generated posts, the reported benchmark insights would require revision.
Original abstract
In this study, we present the first comprehensive evaluation of modern LLMs - including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT - across three core social media analytics tasks on a Twitter (X) dataset: (I) Social Media Authorship Verification, (II) Social Media Post Generation, and (III) User Attribute Inference. For the authorship verification, we introduce a systematic sampling framework over diverse user and post selection strategies and evaluate generalization on newly collected tweets from January 2024 onward to mitigate "seen-data" bias. For post generation, we assess the ability of LLMs to produce authentic, user-like content using comprehensive evaluation metrics. Bridging Tasks I and II, we conduct a user study to measure real users' perceptions of LLM-generated posts conditioned on their own writing. For attribute inference, we annotate occupations and interests using two standardized taxonomies (IAB Tech Lab 2023 and 2018 U.S. SOC) and benchmark LLMs against existing baselines. Overall, our unified evaluation provides new insights and establishes reproducible benchmarks for LLM-driven social media analytics. The code and data are provided in the supplementary material and will also be made publicly available upon publication.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first comprehensive evaluation of modern LLMs (GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT) on three Twitter-based social media analytics tasks: (I) authorship verification via a systematic sampling framework over diverse user/post strategies with generalization tested on newly collected January 2024+ tweets to mitigate seen-data bias, (II) post generation assessed through comprehensive metrics plus a user study measuring real users' perceptions of LLM-generated content conditioned on their own writing, and (III) user attribute inference with occupations/interests annotated via IAB Tech Lab 2023 and 2018 U.S. SOC taxonomies and benchmarked against baselines. The central claim is that this unified multi-task evaluation yields new insights and establishes reproducible benchmarks for LLM-driven social media analytics, with code and data provided in supplementary material for public release.
Significance. If the reported results, error analyses, and robustness checks hold, the work would be significant as one of the first multi-task benchmarks spanning verification, generation, and inference on social media data, with the user study and standardized taxonomies adding practical value. The public release of code/data further strengthens potential impact for the field, though significance is moderated by the need to confirm that findings reflect genuine capabilities rather than dataset-specific artifacts.
Major comments (1)
- [Abstract] The headline claim that the unified evaluation 'establishes reproducible benchmarks' is load-bearing for the paper's contribution, yet the described systematic sampling and January 2024 hold-out set lack reported ablations testing whether LLM performance rankings remain stable under alternative user/post stratifications (e.g., by account age, follower count, or topic distribution) or a second independent temporal split. Without these, the reproducibility assertion risks being sensitive to the specific 2023-2024 collection window and X API effects.
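One concrete way to run the ranking-stability check the referee asks for is to compare model orderings across two splits with Kendall's tau: a value near 1 means the leaderboards agree, near -1 means they invert. A self-contained sketch (pure Python, illustrative only; the rankings here are placeholders, not the paper's results):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings given as dicts model -> rank
    (1 = best). Concordance-based, no tie correction; an illustrative
    stability check, not the paper's reported analysis."""
    models = sorted(rank_a)
    assert sorted(rank_b) == models, "rankings must cover the same models"
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        sign = (rank_a[m1] - rank_a[m2]) * (rank_b[m1] - rank_b[m2])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs
```

Reporting this coefficient between the original split and each alternative stratification (account age, follower count, topic) would directly quantify how robust the benchmark orderings are.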
Minor comments (2)
- [Abstract] 'Comprehensive evaluation metrics' for post generation are referenced but not enumerated; the main text should explicitly list and justify each metric (e.g., perplexity, human-likeness scores) so readers can assess their appropriateness.
- The manuscript should clarify the exact size and composition of the Twitter dataset (number of users/posts per task) and the precise annotation protocol for the IAB/SOC taxonomies to support the reproducibility claim.
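To make the metric-enumeration comment concrete, here is one commonly used candidate: unigram-overlap ROUGE-1 F1 between a generated post and a real one. This is an illustrative option only; the abstract does not say which metrics the paper actually uses.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a candidate (generated) text and a reference (real) text.
    Whitespace tokenization and lowercasing keep the sketch minimal."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For short, stylistically loose texts like tweets, surface-overlap scores of this kind are known to correlate weakly with human judgments, which is one reason the paper's accompanying user study matters.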
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's comments on our manuscript. We appreciate the careful reading and the focus on strengthening the reproducibility aspects of our work. Below we address the major comment point by point with a commitment to revisions where appropriate.
Point-by-point responses
- Referee: [Abstract] The headline claim that the unified evaluation 'establishes reproducible benchmarks' is load-bearing for the paper's contribution, yet the described systematic sampling and January 2024 hold-out set lack reported ablations testing whether LLM performance rankings remain stable under alternative user/post stratifications (e.g., by account age, follower count, or topic distribution) or a second independent temporal split. Without these, the reproducibility assertion risks being sensitive to the specific 2023-2024 collection window and X API effects.
Authors: We thank the referee for this constructive observation. Our systematic sampling framework was explicitly constructed to incorporate multiple diverse user and post selection strategies, and the January 2024 hold-out was collected independently to evaluate generalization beyond the original data window while mitigating seen-data bias. We agree, however, that we did not report explicit ablations confirming that LLM performance rankings remain invariant under further stratifications (e.g., account age, follower count, topic distribution) or an additional temporal split, nor did we isolate potential X API collection artifacts. In the revised manuscript we will add a dedicated robustness subsection that performs and reports such stability checks on available metadata attributes where computationally feasible, and we will revise the abstract language from 'establishes reproducible benchmarks' to 'contributes to establishing reproducible benchmarks' to more precisely reflect the scope of the presented evidence. These changes will be made while preserving the core multi-task evaluation and public data release.
Revision: partial
Circularity Check
No circularity: empirical evaluation on external data and new collections
Full rationale
The paper performs direct empirical benchmarking of LLMs across three tasks using newly collected January 2024 Twitter data, systematic sampling, human user studies, and standardized external taxonomies (IAB and SOC). No equations, parameter fitting presented as prediction, self-definitional constructs, or load-bearing self-citations appear in the derivation chain. All claims rest on observable experimental outcomes rather than reducing to inputs by construction, satisfying the criteria for a self-contained non-circular analysis.